How Google Analytics Works
Making sense of Google Analytics reports and capabilities requires an understanding of its basic principles, in particular what data Google Analytics can capture and how it interprets that data.
At its most basic level, Google Analytics consists of three parts:
- JavaScript tracking code on each page of your website
- A data collection service on Google's servers
- A processing engine that creates report data
1. Google Analytics Tracking Code
When a visitor arrives at a page containing the tracking code, the visitor's browser executes it. The code collects information about the visitor's browser and computer settings, such as screen resolution and operating system. Its visibility is limited: it can typically see only what the browser exposes and what the page explicitly passes to it.
The script then sets a few cookies containing basic visit information. Among other things, these cookies determine whether the visitor is new or returning.
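In classic ga.js, the visitor cookie is __utma, a dot-separated string holding a domain hash, a random visitor ID, three visit timestamps (first, previous and current visit) and a session counter. The sketch below parses that layout and applies one simple new-versus-returning rule. The rule and the sample cookie value are illustrative assumptions, not Google's exact logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UtmaCookie:
    """Fields of a classic ga.js __utma cookie, in order."""
    domain_hash: str
    visitor_id: str
    first_visit: int     # Unix timestamp of the first visit
    previous_visit: int  # Unix timestamp of the previous visit
    current_visit: int   # Unix timestamp of the current visit
    session_count: int   # number of visits (sessions) recorded so far


def parse_utma(value: str) -> UtmaCookie:
    """Split a dot-separated __utma value into its six fields."""
    parts = value.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields in __utma, got {len(parts)}")
    domain_hash, visitor_id, first, previous, current, count = parts
    return UtmaCookie(domain_hash, visitor_id,
                      int(first), int(previous), int(current), int(count))


def is_returning_visitor(utma: Optional[str]) -> bool:
    """Assumed rule: no cookie means new; a session count above 1 means returning."""
    if utma is None:
        return False
    return parse_utma(utma).session_count > 1


# Hypothetical cookie value for a visitor on their second visit
cookie = parse_utma("173272373.1234567890.1300000000.1300000000.1300500000.2")
```

Deleting the cookie resets everything: the next visit has no __utma, so the visitor is treated as new.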
2. Google Analytics Data Collection Service
Next, all of this information must be sent to the Google Analytics servers for processing. The tracking code does this by requesting a very small file named __utm.gif (a 1x1-pixel image), with all the cookie data and the information it just collected appended to the file's query string. Google's servers therefore have a record of when the file was requested, along with all of the visitor information for that pageview.
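As a rough sketch of such a request, the function below assembles a __utm.gif URL from a handful of ga.js query parameters (utmwv, utmn, utmhn, utmp, utmsr, utmac, utmcc). A real ga.js request carries many more parameters; the version string and all example values here are placeholders.

```python
import random
from urllib.parse import urlencode

# Endpoint used by classic ga.js; shown for illustration only.
UTM_GIF_ENDPOINT = "https://www.google-analytics.com/__utm.gif"


def build_utm_gif_url(account: str, hostname: str, page: str,
                      screen: str, cookies: str) -> str:
    """Assemble a simplified __utm.gif pageview request URL."""
    params = {
        "utmwv": "5.2.5",                          # tracking code version (placeholder)
        "utmn": random.randint(1, 2_147_483_647),  # random number so each URL is unique
        "utmhn": hostname,                         # host name of the page
        "utmp": page,                              # page path
        "utmsr": screen,                           # screen resolution read from the browser
        "utmac": account,                          # web property ID
        "utmcc": cookies,                          # cookie values, e.g. "__utma=...;"
    }
    # urlencode percent-encodes each value, so "/" in the path becomes %2F
    return UTM_GIF_ENDPOINT + "?" + urlencode(params)


url = build_utm_gif_url("UA-XXXXX-1", "www.example.com", "/index.html",
                        "1920x1080", "__utma=1.2.3.4.5.6;")
```

The browser never needs the image itself; the request, with its query string, is the data.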
Many organizations store their own copy of every tracking request sent to the data collection service. In ga.js this is done with the _setLocalRemoteServerMode() method, which sends each __utm.gif request to your own web server as well as to Google, so the requests also appear in your server logs. Once you have a local copy of Google Analytics tracking requests, you can process them with Angelfish Software and turn them into interactive reports.
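Once __utm.gif requests land in your own web server logs, extracting the tracking data is a matter of finding those requests and decoding their query strings. A minimal sketch, assuming logs in the common Apache/NGINX combined format; it is not how Angelfish Software parses logs.

```python
import re
from typing import Dict, Iterable, Iterator
from urllib.parse import parse_qs, urlsplit

# Matches the request portion of a Common/Combined Log Format line,
# e.g. "GET /__utm.gif?utmp=%2F HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<target>\S+) HTTP/[\d.]+"')


def utm_hits(log_lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the decoded query parameters of every __utm.gif request in a log."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a GET/HEAD request line
        target = urlsplit(match.group("target"))
        if not target.path.endswith("/__utm.gif"):
            continue  # an ordinary page or asset request
        # parse_qs returns a list per key; keep the first value of each parameter
        yield {key: values[0] for key, values in parse_qs(target.query).items()}
```

Usage might look like `with open("access.log") as log: hits = list(utm_hits(log))`, where `access.log` is a hypothetical file name.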
***Update: The Universal Analytics tracking code (analytics.js) sends its data to a /collect endpoint, as an HTTP GET or POST request, instead of requesting __utm.gif.
3. Processing Visit Information
In the last step, Google Analytics processes all of the __utm.gif requests, applies your profile's filters and configuration settings, and makes the resulting data available in your account.
Visit data is typically processed every few hours, although processing has become more frequent over time. Google rolled out a set of Real-Time reports in 2011, although the data available in these reports is not comprehensive.
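The processing step can be pictured as aggregation over the collected hits. This toy sketch counts pageviews per page path (the utmp parameter) from already-decoded hits; Google's actual processing engine also builds sessions, applies filters and computes many other metrics.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pageviews_by_page(hits: Iterable[Dict[str, str]]) -> Counter:
    """Count one pageview per hit, keyed by the page path in utmp."""
    return Counter(hit.get("utmp", "(not set)") for hit in hits)


def top_pages(hits: Iterable[Dict[str, str]], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most-viewed pages as (path, pageviews) pairs."""
    return pageviews_by_page(hits).most_common(n)
```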
Pros and Cons of Google Analytics
There are a few important points to consider with this approach.
__utm.gif is Critical
Track Cached Pages
Google Analytics will track visits to a page even if the page itself has been cached (i.e. stored in the browser's cache). Each __utm.gif request contains unique information, such as a random number, so the request is never served from the browser cache. A new request for the file is made every time a page is viewed, including when a visitor refreshes the page or hits the back button.
Because visitors are identified by cookies, a visitor who deletes the Google Analytics cookies will be seen as a new visitor on the next visit, and the link to all of their previous visits will be lost.
For the same reason, multiple people sharing one computer (and browser) will be seen as the same visitor, while one person using two computers will be seen as two different visitors.
This is one of many reasons that web analytics reports ought to be viewed as a survey sample and not as concrete fact.
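A toy model makes the cookie problem concrete: cookie-based analytics counts distinct cookie IDs, not people. The names and IDs below are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Pageview:
    person: str      # who really viewed the page (invisible to Google Analytics)
    visitor_id: str  # the ID stored in that browser's cookie, which is all GA sees


def counted_visitors(views: Iterable[Pageview]) -> int:
    """Unique visitors as cookie-based analytics sees them: one per cookie ID."""
    return len({view.visitor_id for view in views})


def actual_people(views: Iterable[Pageview]) -> int:
    """The true number of people, which the tracking code cannot observe."""
    return len({view.person for view in views})


views: List[Pageview] = [
    # Alice, Bob and Erin share one family computer, so they share one cookie
    Pageview("Alice", "111"), Pageview("Bob", "111"), Pageview("Erin", "111"),
    # Carol browses from two computers, so she has two cookies
    Pageview("Carol", "222"), Pageview("Carol", "333"),
]
```

Here the shared computer undercounts by two and Carol's second computer overcounts by one, so the report shows 3 visitors for 4 people.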
It's important to remember that Google Analytics data is processed remotely, and users don't control when that processing happens. Once data is in the account, it's there for good: mistakes in historical data can't be corrected by reprocessing.
To avoid mixing good data with bad, we recommend creating a duplicate profile to use as a sandbox. Apply filters to the sandbox to see what impact they will have before applying them to your production profile.
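The sandbox idea can be sketched as running the same hits through two filter chains and comparing the results. This simplified model treats a filter as a keep/drop predicate (real profile filters can also rewrite fields), and the hostname filter is a hypothetical example.

```python
from typing import Callable, Dict, List

Hit = Dict[str, str]
Filter = Callable[[Hit], bool]  # returns True to keep a hit, False to drop it


def apply_filters(hits: List[Hit], filters: List[Filter]) -> List[Hit]:
    """Keep only the hits that pass every filter in the chain."""
    return [hit for hit in hits if all(keep(hit) for keep in filters)]


def preview_filter(hits: List[Hit], production: List[Filter],
                   candidate: Filter) -> Dict[str, int]:
    """Compare production results with a sandbox that adds one candidate filter."""
    production_hits = apply_filters(hits, production)
    sandbox_hits = apply_filters(hits, production + [candidate])
    return {
        "production": len(production_hits),
        "sandbox": len(sandbox_hits),
        "dropped": len(production_hits) - len(sandbox_hits),
    }


def only_main_site(hit: Hit) -> bool:
    """Hypothetical include filter: keep hits from www.example.com only."""
    return hit.get("utmhn") == "www.example.com"
```

The candidate filter only changes the sandbox's numbers, so nothing irreversible happens until you decide to add it to the production chain.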
Google Analytics on Different Servers
****Update: The Google Analytics Measurement Protocol is a framework for building your own do-it-yourself Google Analytics client; because it sends data directly over HTTP, it does not depend on a browser. Angelfish Software uses the Measurement Protocol to upload site errors, file downloads, and stolen bandwidth reports to Google Analytics.
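For a sense of what such a client sends, the sketch below builds a Universal Analytics Measurement Protocol (version 1) event payload and defines, but does not call, a function that POSTs it to the /collect endpoint. The property ID, the "Download" category and the file path are placeholders, not the values Angelfish Software uses.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request, urlopen

COLLECT_URL = "https://www.google-analytics.com/collect"


def build_event_hit(tracking_id: str, client_id: str, category: str,
                    action: str, label: str = "") -> str:
    """Return a URL-encoded Measurement Protocol (v1) event payload."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. "UA-XXXXX-Y"
        "cid": client_id,    # anonymous client ID (normally a random UUID)
        "t": "event",        # hit type
        "ec": category,      # event category
        "ea": action,        # event action
    }
    if label:
        params["el"] = label  # optional event label
    return urlencode(params)


def send_hit(payload: str) -> int:
    """POST a payload to the collection endpoint and return the HTTP status."""
    request = Request(COLLECT_URL, data=payload.encode("ascii"), method="POST")
    with urlopen(request, timeout=10) as response:
        return response.status


# Placeholder values; no browser is involved at any point
payload = build_event_hit("UA-XXXXX-Y", str(uuid.uuid4()), "Download", "/files/report.pdf")
```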