
How Google Analytics Works

To understand Google Analytics reports and capabilities, you first need to understand the basic principles behind them. Knowing what data Google Analytics can capture and how it interprets that data is key to making sense of the software.

At its most basic level, Google Analytics consists of:

  1. JavaScript code on each page of a website
  2. A data collection service on Google's servers
  3. A processing engine that creates report data

1. Google Analytics JavaScript Code

When a visitor arrives at a page with tracking code, the visitor's browser executes the code. The code collects information about the visitor's browser and computer settings, such as screen resolution and operating system. The script's visibility is limited: it can typically see only what the browser and the page expose to it.

The script then sets a few cookies containing some basic visit information. Among other things, these cookies are used to determine whether the visitor is new or returning.
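
The new-versus-returning decision can be sketched in a few lines of Python. This is only an illustration: the cookie name "visitor", its three-field layout, and the function name classify_visit are assumptions made for this example (the real ga.js visitor cookie, __utma, carries more fields), and the function is assumed to run once at the start of each visit.

```python
import random
import time


def classify_visit(cookies, now=None):
    """Return (is_new_visitor, updated_cookies) for the start of one visit.

    `cookies` is a plain dict standing in for the browser's cookie jar.
    The "visitor" cookie layout (id.first_visit.visit_count) is hypothetical.
    """
    now = int(time.time()) if now is None else now
    updated = dict(cookies)
    existing = cookies.get("visitor")

    if existing is None:
        # No cookie found: treat this as a new visitor and issue a fresh ID.
        visitor_id = random.randint(1, 2**31 - 1)
        updated["visitor"] = f"{visitor_id}.{now}.1"
        return True, updated

    # Cookie found: returning visitor. Keep the ID and the first-visit
    # timestamp, and increase the visit count by one.
    visitor_id, first_visit, visits = existing.split(".")
    updated["visitor"] = f"{visitor_id}.{first_visit}.{int(visits) + 1}"
    return False, updated
```

Calling classify_visit({}) reports a new visitor and returns a cookie jar with a freshly issued ID; passing that jar back in on a later visit reports a returning visitor.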

2. Google Analytics Data Collection Service

Next, all of this information must be sent to the Google Analytics servers so it can be processed. The GA tracking code sends the information by requesting a very small file, named __utm.gif. It appends all the cookie data and information it just collected to the query string for __utm.gif. This way, Google's servers have a record of when a file was requested and all of the visitor information about that pageview.
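
As a rough sketch of what such a request looks like, the Python below assembles a __utm.gif URL with the collected values in the query string. The parameter names (utmwv, utmn, utmhn, utmsr, utmp, utmac, utmcc) follow the commonly documented ga.js request parameters, but the set is far from complete, and the function name and all values are made up for illustration.

```python
import random
from urllib.parse import urlencode

UTM_GIF = "http://www.google-analytics.com/__utm.gif"


def build_utm_gif_url(account, hostname, page, screen_resolution, cookies):
    """Build a simplified __utm.gif tracking request URL (illustrative only)."""
    params = {
        "utmwv": "5.4.3",                       # tracking code version (example value)
        "utmn": random.randint(1, 2**31 - 1),   # random number, unique per request
        "utmhn": hostname,                      # host name of the tracked site
        "utmsr": screen_resolution,             # screen resolution seen by the script
        "utmp": page,                           # path of the page being viewed
        "utmac": account,                       # account ID (placeholder in this sketch)
        "utmcc": cookies,                       # cookie values set by the script
    }
    # urlencode percent-escapes each value so it is safe inside a query string.
    return UTM_GIF + "?" + urlencode(params)


url = build_utm_gif_url(
    "UA-XXXXX-Y", "example.com", "/pricing?x=1", "1280x800", "__utma=1.2.3.4.5.6;"
)
```

Everything the collection service learns about the pageview travels in that one query string, which is why a single logged request is enough to reconstruct the hit later.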

Many organizations store a copy of every tracking request sent to the data collection service. This is accomplished with the _setLocalRemoteServerMode() function in ga.js, which sends each tracking request to your own web server as well as to Google. Once you have a local copy of Google Analytics tracking requests, you can process them with Angelfish Software and turn them into interactive reports.

***Update: Universal Analytics tracking code sends an HTTP request (GET or POST) to /collect instead of a GET for __utm.gif.

3. Processing Visit Information

In the last step, Google Analytics processes all of the __utm.gif requests, applies filters and config settings, and makes the data available to your account.
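
Google's processing engine is not public, so the following Python is only a toy model of this step: it reads logged __utm.gif requests, applies one filter (excluding certain client IP addresses), and counts pageviews per page. The function name, the (client_ip, request_url) input shape, and the filter are assumptions chosen for illustration.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse


def process_requests(requests, excluded_ips=()):
    """Count pageviews per page from (client_ip, request_url) pairs.

    Requests from any IP in `excluded_ips` are dropped first, which mimics
    an "exclude internal traffic" filter.
    """
    pageviews = Counter()
    for client_ip, url in requests:
        if client_ip in excluded_ips:
            continue  # filtered out before it reaches the report data
        params = parse_qs(urlparse(url).query)
        # utmp carries the page path; fall back to a marker if it is missing.
        page = params.get("utmp", ["(unknown)"])[0]
        pageviews[page] += 1
    return pageviews


logged = [
    ("203.0.113.5", "/__utm.gif?utmp=%2Fhome&utmac=UA-XXXXX-Y"),
    ("10.0.0.1", "/__utm.gif?utmp=%2Fhome&utmac=UA-XXXXX-Y"),
    ("203.0.113.9", "/__utm.gif?utmp=%2Fpricing&utmac=UA-XXXXX-Y"),
]
report = process_requests(logged, excluded_ips={"10.0.0.1"})
```

With the filter in place, the request from 10.0.0.1 never reaches the report, so /home is counted once instead of twice.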

Visit data is typically processed every few hours, although this frequency has been increasing over time. Google rolled out a set of Real-Time reports in 2011, although the data available in these reports is not comprehensive.

Pros and Cons of Google Analytics

There are a few important points to consider with this approach.

__utm.gif is Critical

Any visit to your site that doesn't execute the JavaScript code won't be counted. If the code doesn't run, then __utm.gif never gets requested from the GA servers and Google Analytics will never know about the visit. Likewise, if you take the code off your site or misconfigure it so that it's not working properly, visits during that period won't be counted.

Track Cached Pages

Google Analytics will track visits to a page even if the page has been cached (i.e. stored locally by the browser). Each individual __utm.gif request contains unique information, which means it won't be served from the browser cache. A new request for the file will be made every time a page is viewed, including when a visitor refreshes the page or hits the back button.
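
To see why a unique query string defeats caching, consider this small Python simulation of a cache keyed on the full URL. The helper names are hypothetical, and utmn stands in for the per-request unique value; a real browser cache is more involved, but the principle is the same.

```python
import random


def tracking_url(page, nonce=None):
    """Return a simplified __utm.gif URL with a per-request unique number."""
    if nonce is None:
        nonce = random.randint(1, 2**31 - 1)
    return f"/__utm.gif?utmn={nonce}&utmp={page}"


def count_cache_hits(urls):
    """Count how many requests a cache keyed on the full URL could serve."""
    seen = set()
    hits = 0
    for url in urls:
        if url in seen:
            hits += 1  # identical URL requested before: served from cache
        seen.add(url)
    return hits


# Three views of the same cached page: the page URL repeats, so it can be cached.
page_hits = count_cache_hits(["/index.html", "/index.html", "/index.html"])

# The three matching tracking requests all differ, so none can be cached.
gif_hits = count_cache_hits(
    [tracking_url("/index.html", nonce=n) for n in (11, 22, 33)]
)
```

Here page_hits is 2 while gif_hits is 0: the page itself can come from cache, but every tracking request still has to go out to the collection service.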

Cookie Manipulation

If a visitor deletes the Google Analytics cookies, they will be seen as a new visitor on their next visit, and the link to their previous visits will be lost.

Because the cookies belong to a browser rather than to a person, multiple users sharing a browser on one computer will be seen as the same visitor. Likewise, a visitor using two computers will be seen as two different visitors.
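
The effect on visitor counts is easy to illustrate. In the Python sketch below, each pageview is a (person, cookie_id) pair; the reporting side only ever sees the cookie ID, so it counts cookies, not people. The names and data are invented for this example.

```python
def count_unique_visitors(pageviews):
    """Count visitors the way a cookie-based tool sees them.

    `pageviews` holds (person, cookie_id) pairs. The person is unknown to
    the analytics tool, so only distinct cookie IDs are counted.
    """
    return len({cookie_id for _person, cookie_id in pageviews})


# Two people sharing one browser: one cookie, so one "visitor".
shared_browser = [("alice", "c1"), ("bob", "c1")]

# One person on two computers: two cookies, so two "visitors".
two_computers = [("carol", "c2"), ("carol", "c3")]

# One person who deleted cookies between visits: also two "visitors".
deleted_cookies = [("dave", "c4"), ("dave", "c5")]
```

count_unique_visitors returns 1 for the shared browser and 2 for each of the other two cases, even though the real number of people is 2, 1, and 1 respectively.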

This is one of many reasons that web analytics reports ought to be viewed as a survey sample and not as concrete fact.

No Reprocessing

It's important to remember that Google Analytics data is processed remotely, and users don't control when it's processed. That means that once data is in the account, it's there for good: mistakes in historical data can't be fixed by reprocessing.

To avoid mixing good data with bad, we recommend creating a duplicate profile to use as a sandbox. Apply filters to the sandbox to see what impact they will have before applying them to your production profile.

Google Analytics on Different Servers

All of the Google Analytics code is client-side, so it doesn't matter where the website is hosted. Different browsers might treat the code a little bit differently, but as long as the HTML references the JavaScript code correctly, the code never has to interact with your web server.

****Update: The Google Analytics Measurement Protocol is a framework for a Do-It-Yourself Google Analytics client, and is not browser dependent. Angelfish Software uses the Measurement Protocol to upload site errors, file downloads, and stolen bandwidth reports to Google Analytics.
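
As a minimal sketch of a non-browser client, the Python below prepares a Measurement Protocol event hit for a file download. The /collect endpoint and the v, tid, cid, t, ec, ea, and el parameters come from the Universal Analytics Measurement Protocol; the function name, the event category and action values, and the IDs are assumptions for illustration. This is not a description of how Angelfish Software builds its uploads, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

COLLECT_URL = "https://www.google-analytics.com/collect"


def build_download_hit(tracking_id, client_id, file_path):
    """Prepare (but do not send) a Measurement Protocol event hit."""
    payload = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID (placeholder in this sketch)
        "cid": client_id,    # anonymous client ID, normally a UUID
        "t": "event",        # hit type
        "ec": "download",    # event category (example value)
        "ea": "file",        # event action (example value)
        "el": file_path,     # event label: the downloaded file
    }
    body = urlencode(payload).encode("utf-8")
    return Request(COLLECT_URL, data=body, method="POST")


hit = build_download_hit(
    "UA-XXXXX-Y", "35009a79-1a05-49d7-b876-2b884d0f825b", "/files/report.pdf"
)
```

Passing the prepared request to urllib.request.urlopen would transmit it. Because nothing here depends on a browser or on cookies, the same approach works from a server, a log processor, or a script.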