The value of log file analysis for SEO

Log files are an important input to any SEO audit of a site. They are necessary to understand how the site is found and mapped by the various search engine crawlers.

Advantages compared to the well-known web analytics tools

Almost all computers and programs keep log files, and web servers are no exception. The server log stores important information such as the date and time of each visit, the IP address used, the page requested, the visitor's browser version and operating system, and cookie information. The log files record not only regular visitors but also the crawlers (also known as spiders or bots) that search engines such as Google use to map the site. That is something web analytics tools based on JavaScript cannot do. A further advantage is that log files need no extra setup: no JavaScript has to be added to every page that needs to be logged, which avoids configuration errors and delays in loading the page, and no special ‘tags’ are required, as is often the case with JavaScript-based tools. JavaScript tools nevertheless remain indispensable in general; log files are a welcome addition and, for some SEO audit activities, a better alternative.

In particular, an analysis of crawl activity can provide valuable insights into how the site and its individual pages are functioning. Crawler behavior changes regularly, so there is no point in looking for small changes every day. Log files are particularly useful for tracking trends over the longer term. For example, you can analyze whether crawler activity on the site is rising or falling and which pages are crawled most frequently.
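
The trend analysis described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the server writes the common ‘combined’ log format and that a crawler can be recognized by a token in its user agent string; the function name `crawler_hits` and the sample lines are invented for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (combined log format):
# IP identity user [timestamp] "request" status bytes "referer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_hits(lines, bot_token="Googlebot"):
    """Count hits per day and per path for requests whose user agent contains bot_token."""
    per_day, per_path = Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match or bot_token not in match["agent"]:
            continue  # skip unparseable lines and regular visitors
        day = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z").date()
        per_day[day.isoformat()] += 1
        per_path[match["path"]] += 1
    return per_day, per_path

# Invented sample lines, for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /products/ HTTP/1.1" 200 2326 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2023:14:01:02 +0000] "GET /about/ HTTP/1.1" 200 1200 "-" "{GOOGLEBOT}"',
    '203.0.113.7 - - [10/Oct/2023:14:05:00 +0000] "GET /products/ HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    f'66.249.66.1 - - [11/Oct/2023:09:12:44 +0000] "GET /products/ HTTP/1.1" 200 2326 "-" "{GOOGLEBOT}"',
]

per_day, per_path = crawler_hits(SAMPLE)
for day in sorted(per_day):
    print(day, per_day[day])       # crawler hits per day, oldest first
print(per_path.most_common(3))     # most frequently crawled pages
```

Comparing the daily counts over weeks or months, rather than day to day, gives the longer-term trend the text recommends.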

Log files are extremely extensive and record virtually all activity on the server, so they can contain a lot of useful information for an SEO audit. It is therefore important to get hold of those files. If the site is managed in-house, that will not be a problem, but even when the website is hosted by a third party (ISP), the files are usually accessible. Ask the provider whether they are available and how long they are kept. Providers often have log files that go far back in time, so the analysis can cover a longer period, which improves its reliability.

Crawl budget optimization

An obvious way to analyze log files is with Microsoft Excel. This requires some dexterity when importing, filtering and sorting the data. An imported log file yields a worksheet with many rows and columns. Filters make it clear which web crawlers access the site and how regularly. Search engine crawling follows certain rules. One of the most important is the so-called ‘crawl budget’, the number of pages per site that the crawler works through each day. The crawl budget is not a fixed number but depends on several factors. The crawler takes into account the negative effect that crawling may have on the speed of the host. That is why a prioritized list of pages is drawn up, based among other things on PageRank. The crawl priority of sections and pages can also be influenced through the XML sitemap.
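
The same filtering can be done outside Excel. The sketch below assumes the date and user agent have already been extracted from each log line, and it recognizes crawlers by a few well-known user agent tokens. The helper name `crawler_activity` and the sample records are hypothetical. Because user agent strings can be spoofed, the result is an approximation.

```python
from collections import defaultdict

# User agent tokens of some well-known search engine crawlers.
BOT_TOKENS = ("Googlebot", "bingbot", "YandexBot", "DuckDuckBot", "Baiduspider")

def crawler_activity(records):
    """records: iterable of (day, user_agent) pairs.

    Returns {bot: (total_hits, number_of_days_with_at_least_one_hit)}.
    """
    hits = defaultdict(int)
    days = defaultdict(set)
    for day, agent in records:
        agent_lower = agent.lower()
        for token in BOT_TOKENS:
            if token.lower() in agent_lower:
                hits[token] += 1
                days[token].add(day)
                break  # count each request for one crawler only
    return {bot: (hits[bot], len(days[bot])) for bot in hits}

# Invented records, for illustration only.
records = [
    ("2023-10-10", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("2023-10-10", "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"),
    ("2023-10-11", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("2023-10-11", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
]
print(crawler_activity(records))
```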

The pages on the crawl list are re-crawled regularly, starting with the pages that have the highest priority. Crawling stops when the crawler's activity causes delays on the host. As a result, the most important pages are re-crawled most often, and lower-priority pages may regularly be skipped. That is why it is important to know how the crawlers spend their time on the site. A limited ‘crawl budget’ may mean that the search engines do not crawl enough pages of the website, and an analysis of the log files can then be extremely useful. It shows which sections and pages are crawled and how often. It also shows where the crawler spends its time and on which parts of the site it wastes time, for instance on irrelevant or duplicate URLs. A common cause is ‘URL parameters’, which are often appended to multiple pages when they are part of the same marketing campaign. To exclude these URLs from the crawl process, you can log in to Google Search Console, select ‘Crawl’ and then ‘URL parameters’. There you can configure URL parameters and exclude them from crawling, leaving more time for crawling important pages.
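
To estimate how much crawl activity goes to parameterized URLs, the crawled paths can be split into those with and without a query string. This is a minimal sketch, assuming the requested URLs have already been filtered down to crawler requests; `parameter_crawl_share` and the sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def parameter_crawl_share(crawled_urls):
    """Return (share of hits on URLs with a query string, Counter of parameter names)."""
    param_names = Counter()
    with_params = 0
    total = 0
    for url in crawled_urls:
        total += 1
        query = urlsplit(url).query
        if query:
            with_params += 1
            # keep_blank_values so that "?ref=" still counts as a parameter
            for name, _value in parse_qsl(query, keep_blank_values=True):
                param_names[name] += 1
    share = with_params / total if total else 0.0
    return share, param_names

# Invented crawler requests, for illustration only.
urls = [
    "/shoes/",
    "/shoes/?utm_source=newsletter&utm_campaign=spring",
    "/shoes/?utm_source=partner",
    "/contact/",
]
share, names = parameter_crawl_share(urls)
print(f"{share:.0%} of crawler hits went to parameterized URLs")
print(names.most_common())
```

A high share, or one parameter name dominating the list, points to the URLs worth excluding first.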

HTTP response codes

Another important source of information in the log files is the ‘response codes’ of the web server. Valuable crawl time is lost when crawlers request pages that do not exist. When the log data is sorted on this column in Excel, it becomes apparent how often each response code is generated by crawler activity.

In principle, the only good response code is ‘200’, the message that the page was found. The ‘redirect’ codes (300, 301, 302) that ultimately end up at a code 200 are also in order, provided that there are not too many of them. Error codes such as 404 and 500 indicate other problems with the site.
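
The count that Excel produces by sorting on the status column can also be sketched in Python. The example assumes the status codes of crawler requests have already been extracted from the log; the function name `status_summary` and the sample codes are illustrative only.

```python
from collections import Counter

# Labels for the standard HTTP status code classes.
CLASS_LABELS = {2: "ok", 3: "redirect", 4: "client error", 5: "server error"}

def status_summary(status_codes):
    """Return (count per status code, count per status class)."""
    by_code = Counter(status_codes)
    by_class = Counter()
    for code, count in by_code.items():
        by_class[CLASS_LABELS.get(code // 100, "other")] += count
    return by_code, by_class

# Invented status codes from crawler requests, for illustration only.
codes = [200, 200, 301, 404, 404, 500]
by_code, by_class = status_summary(codes)
print(by_code.most_common())   # how often each response code occurs
print(dict(by_class))          # totals per class: ok, redirect, errors
```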


Are there tools on the market to facilitate this whole process? Hell yes. My preference is for Screaming Frog. This is a freemium spider tool that you can download from the Screaming Frog website. The free version has a limit of 500 URLs per crawl. For small sites, this is usually enough.

Do not forget the log files …

Log files can contain a wealth of data that can be crucial for the website’s search performance. Using log files alongside web-based analytics helps to determine the ‘condition’ of a site and can help to solve problems that the site experiences.

Jagdish Prajapat
Jagdish Prajapat is a professional digital marketer and SEO expert with the idea of digitizing the way brands talk to people. I breathe digital, think digital and talk digital.
