Analytics Market is a one-stop resource for anything you can do with Google Analytics. Tips and tricks, product news, API tools, and more!

Why Don't Log Files Match Google Analytics?

Think of it this way. Google Analytics is an apple. Log files are oranges.

Google Analytics users often run it in tandem with another tool to audit its reports or test its accuracy. Usually they compare it to other web analytics tools, but sometimes they compare it to tools that were never meant for web analysis.

This article will focus on one common question: Why doesn't Google Analytics line up with my weblogs?

Why Doesn't Google Analytics Line Up with My Weblogs?

It's not uncommon for companies or technical individuals to compare Google Analytics to their log files either as a casual audit of the data or because they suspect that something is wrong with the Google Analytics reports. When they do, they usually find that there are discrepancies (sometimes huge) between the two reports.

The reasons for discrepancies depend on which metrics the user is comparing as well as the method they are using to analyze the log files.

At a high level, the simplest explanation is that Google Analytics uses client-side code to gather information, whereas most log files contain only server-side information. This is an important distinction.

Visits

Google Analytics (and every other mainstream web analytics tool) rarely tracks 100% of your visitors. Treat the reports as a survey sample rather than a complete count. The reasons range from browsers that block JavaScript to visitors who delete their cookies. Even so, this unintentional sample rate is often still above 90%.

The other side to this coin is that Google Analytics does not track spiders and bots. Log files, however, record every time a file is requested, regardless of who requests it.
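To make the bot problem concrete, here is a minimal sketch of how a log analysis might try to drop spider traffic before counting anything. It assumes the server writes the common "combined" log format, and the `BOT_HINTS` list is a hypothetical, deliberately short heuristic rather than a real bot database. Bots that send a browser-like user agent pass straight through this kind of filter.

```python
import re

# Assumed layout: the "combined" access log format
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical substrings for illustration only; real bot lists are far longer.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def looks_like_bot(agent):
    """Crude check: does the user agent contain a known bot substring?"""
    agent = agent.lower()
    return any(hint in agent for hint in BOT_HINTS)


def human_hits(lines):
    """Keep only parsed entries whose user agent does not look like a bot."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and not looks_like_bot(entry["agent"]):
            hits.append(entry)
    return hits


SAMPLE = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "http://example.com/" "Mozilla/5.0 (Windows NT 10.0)"',
    '198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

print(len(human_hits(SAMPLE)))  # the Googlebot request is dropped
```

Both requests are in the log file, but only the first would ever have run a JavaScript tracking snippet.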

This combination of variables results in conflicting visit reports from Google Analytics and log files. Google Analytics reports tend to be more consistent.

Pageviews

Once a browser has loaded a page from a website, hitting the back button (and sometimes the refresh button) simply redisplays a cached copy of the page from memory. No new request goes to the server, so nothing gets logged.

Google Analytics, however, records even cached pageviews because the tracking code is forced to execute every time a page is displayed.
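The gap can be illustrated with a toy model. This is only a sketch under a simplifying assumption: every page display is labeled either `"request"` (the browser fetched the page from the server) or `"cached"` (the browser redisplayed a stored copy). The labels and the function name are made up for illustration.

```python
def count_pageviews(displays):
    """Return (server_log_count, client_side_count) for a list of page displays."""
    server_log = 0
    client_side = 0
    for kind in displays:
        # Client-side tracking code runs on every display in this model.
        client_side += 1
        # The server only sees displays that caused a real request.
        if kind == "request":
            server_log += 1
    return server_log, client_side


# A visitor opens three pages and uses the back button twice.
visit = ["request", "request", "cached", "request", "cached"]
print(count_pageviews(visit))  # (3, 5)
```

In this model the log file shows three pageviews and the client-side tool shows five for the same visit.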

Because of these variables, the number of total visits and pageviews will never line up between GA and log files.

Visitors

Google Analytics uses cookies to tell one visitor from another. The cookies hold a unique visitor ID so that Google Analytics can tell whether a visitor has been to the site before. This also allows it to tie pageviews throughout a visit to a single visitor.

When using log files to analyze traffic, however, the best identifier available is usually the visitor's IP address, or the IP address combined with the user agent. Even this is unreliable: two visitors can share an IP address and user agent, and one visitor can show up under several IP addresses. It also makes it more difficult to determine whether a visitor has been to the site before.
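A small sketch shows how the IP-plus-user-agent approach can undercount. The `person` field below is ground truth that exists only in this example; a real log file has no such column. All names and addresses are hypothetical.

```python
def estimate_visitors(hits):
    """Count visitors the way a log analyzer might: distinct (IP, user agent) pairs."""
    return len({(hit["ip"], hit["agent"]) for hit in hits})


def true_visitors(hits):
    """Count distinct people, using the ground-truth label available only here."""
    return len({hit["person"] for hit in hits})


OFFICE_IP = "203.0.113.5"  # three coworkers behind one shared address
BROWSER = "Mozilla/5.0 (Windows NT 10.0)"

HITS = [
    {"person": "A", "ip": OFFICE_IP, "agent": BROWSER},
    {"person": "B", "ip": OFFICE_IP, "agent": BROWSER},
    {"person": "C", "ip": OFFICE_IP, "agent": BROWSER},
    {"person": "D", "ip": "198.51.100.7", "agent": "Mozilla/5.0 (Macintosh)"},
]

print(true_visitors(HITS), estimate_visitors(HITS))  # 4 2
```

The opposite error is just as easy to produce: a single person whose IP address changes between requests would be counted as several visitors.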

As a result, unique visitor numbers in Google Analytics tend to be much more reliable than assumptions gleaned from log files.

Traffic Sources

Log files can determine the referring site for a visit most of the time, because the referrer is recorded with the first page request of the visit. However, a log file can't differentiate between paid advertisements and free referrals, nor can it store any information about the advertisement itself. And because a log analyzer may have difficulty stringing together which pages belong to which visit, it can't reliably attribute actions during a visit to a traffic source.
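As a rough sketch of what "stringing together" visits from a log involves, the code below groups hits by IP address and user agent and starts a new visit after 30 minutes of inactivity, crediting each visit to the referrer of its first hit. The 30-minute cutoff, the field names, and the `sessionize` function are assumptions for illustration, not something the log file itself defines.

```python
from datetime import datetime, timedelta

# Assumed inactivity cutoff; the log file itself has no notion of a visit.
TIMEOUT = timedelta(minutes=30)


def sessionize(hits):
    """Group hits into visits keyed by (ip, agent), split on inactivity gaps."""
    sessions = []
    latest = {}  # (ip, agent) -> most recent session for that key
    for hit in sorted(hits, key=lambda h: h["time"]):
        key = (hit["ip"], hit["agent"])
        current = latest.get(key)
        if current is None or hit["time"] - current["last_seen"] > TIMEOUT:
            # First hit of a new visit: its referrer becomes the visit's source.
            current = {"key": key, "referer": hit["referer"], "pages": []}
            latest[key] = current
            sessions.append(current)
        current["pages"].append(hit["path"])
        current["last_seen"] = hit["time"]
    return sessions


AGENT = "Mozilla/5.0 (Windows NT 10.0)"
LOG_HITS = [
    {"ip": "203.0.113.5", "agent": AGENT, "path": "/",
     "referer": "https://www.google.com/", "time": datetime(2023, 10, 10, 10, 0)},
    {"ip": "203.0.113.5", "agent": AGENT, "path": "/pricing",
     "referer": "http://example.com/", "time": datetime(2023, 10, 10, 10, 5)},
    {"ip": "203.0.113.5", "agent": AGENT, "path": "/",
     "referer": "-", "time": datetime(2023, 10, 10, 11, 0)},
]

for session in sessionize(LOG_HITS):
    print(session["referer"], session["pages"])
```

Every weakness of the visitor key carries over: if two people share the IP address and user agent, their pages are merged into one visit and credited to whichever referrer happened to come first.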

Google Analytics uses cookies to store the traffic source information. In addition to automatically labeling search engine traffic as separate from referrals, it also allows users to pass several levels of custom campaign information. All of this is stored in the cookies, which persist throughout a visit and in subsequent visits until they are overwritten by a different advertisement.
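The custom campaign information is typically passed as tags on the landing page URL. The article does not name the tags; the sketch below assumes the `utm_*` query parameters used by Google Analytics campaign tagging, and the example URL and `campaign_info` helper are hypothetical. It only shows how such tags can be read into structured fields, not how Google Analytics stores them internally.

```python
from urllib.parse import urlparse, parse_qs

# Assumed tag names, following Google Analytics campaign tagging conventions.
CAMPAIGN_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def campaign_info(landing_url):
    """Return the campaign tags present in a landing URL's query string."""
    query = parse_qs(urlparse(landing_url).query)
    # parse_qs returns a list per key; keep the first value of each tag.
    return {key: query[key][0] for key in CAMPAIGN_KEYS if key in query}


url = ("https://example.com/landing"
       "?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale")
print(campaign_info(url))
```

A URL without any tags yields an empty dictionary, which is the case where the tool has to fall back on the referrer alone.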

Google Analytics Never Matches Log Files

These are some of the most common reasons that Google Analytics reports don't match up with log file reports. In some ways, Google Analytics is a more accurate solution, because it handles things more reliably and consistently even if it misses some visits.

Log files and server-side solutions work well as click counters, but not for analysis. There are too many inherent inconsistencies with how things get recorded. That's because log files were never meant for anything other than IT reports.

Comments

Big differences

My logfile-based page views, after filtering out bots, are a factor of anywhere from 3 to 50 times higher than what Google Analytics is reporting.