Hacking Google Analytics

by Minishark

Introduction

Web sites have been tracking users since the very beginning of the web.

In recent years, methods for tracking users have matured considerably.  No longer are site owners limited to simple hit counters; they can now see where users came from, which pages they visited, how long they viewed each one, and hundreds of other metrics.  The most popular tool for collecting this data is Google Analytics.[1]

Web analytics can be a very useful marketing tool for running any site.

However, my concern is that site owners are too quick to hand all their analytics data over to third parties (in this case, Google).

The Google Analytics privacy policy states that it does not collect "personally identifiable information" about users.  However, Google does not clearly define what constitutes personally identifiable information.  We already know that other Google services log users' IP addresses, and Google Analytics is no exception.

While your IP address isn't necessarily personally identifiable, in many cases it's still uniquely identifiable.  Google now potentially has information about your habits not only on its own sites, but on the thousands of other sites that use Google Analytics as well.

Google promises that they aren't doing anything fishy with all this data about you, but you may not be willing to risk taking their word for it.  They're still not above the law, and recent cases have shown they have few qualms about turning over user data to the government if they're subpoenaed.[2]

Additionally, studies have shown that you can uniquely identify the majority of people based solely on a few pieces of "anonymous" demographic/geographic data.[3]

How Google Analytics Works

Google Analytics uses JavaScript and cookies to track users.

Site owners place the following snippet of JavaScript on each page of their site that they wish to track (it's usually placed at the bottom of the page):

<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try{
    var pageTracker = _gat._getTracker("UA-xxxxxx-x");
    pageTracker._trackPageview();
} catch(err) {}
</script>

The first <script> block references a file named ga.js from Google's servers (either ssl.google-analytics.com/ga.js or www.google-analytics.com/ga.js).  This is the main Google Analytics tracking code source.

In the next <script> block, the code instantiates a Google Analytics tracking object by calling the _gat._getTracker("UA-xxxxxx-x") function, which is defined in ga.js.

It takes UA-xxxxxx-x, the site administrator's unique Google Analytics account number, as a parameter.

The next line, pageTracker._trackPageview(), uses this tracking object to register a page view.

This is where the interesting things happen.  First, it checks a number of cookies, and sets or updates them as necessary:

__utma  - A persistent cookie that expires after 2 years.  It contains: a web site (domain) hash, a visitor hash, the timestamp of the first visit, the timestamp of the last visit, the timestamp of the current visit, and the total number of visits for this user.  The fields are separated by periods, e.g. 247248150.1037924604.1252115649.1252432081.1252444069.1

__utmb, __utmc  - These temporary (i.e., session) cookies are used to determine the length of a visit.  __utmb contains a timestamp of the first pageview, and __utmc contains a timestamp of the last pageview.

__utmz  - This cookie, which expires after six months, keeps track of where the user came from (it does this by looking at the Referer HTTP header).  It contains a number of pipe-separated fields, most notably: utmcsr (source - the site they came from), utmccn (campaign - the ad campaign or SEO campaign the referring link belongs to), and utmcmd (medium - e.g. referral, organic search, paid search).  The whole thing might look something like this: 247248150.1252444069.11.10.utmcsr=www.google.com|utmccn=(none)|utmcmd=organic
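To make the layout of the two persistent cookies concrete, here is a small JavaScript sketch that splits the example __utma and __utmz values above into their fields.  The property names (domainHash, visitorHash, counters, and so on) are my own labels, and I'm assuming the timestamps are Unix time in seconds, which fits the 10-digit values in the examples.  The meaning of the two numeric counters in __utmz isn't covered here, so the code only labels them as an assumption.

```javascript
// Parse the two persistent Google Analytics cookies described above.
// Field labels are hypothetical; timestamps are assumed to be Unix seconds.

function parseUtma(value) {
  const parts = value.split(".");
  if (parts.length !== 6) {
    throw new Error("expected 6 fields in __utma, got " + parts.length);
  }
  const [domainHash, visitorHash, first, previous, current, visits] = parts;
  const toDate = (secs) => new Date(Number(secs) * 1000);
  return {
    domainHash,
    visitorHash,
    firstVisit: toDate(first),
    lastVisit: toDate(previous),
    currentVisit: toDate(current),
    visitCount: Number(visits),
  };
}

function parseUtmz(value) {
  // Only the first four periods are separators; the campaign fields can
  // contain periods of their own (e.g. utmcsr=www.google.com).
  const parts = value.split(".");
  if (parts.length < 5) {
    throw new Error("unexpected __utmz format");
  }
  const campaign = {};
  for (const pair of parts.slice(4).join(".").split("|")) {
    const eq = pair.indexOf("=");
    if (eq > 0) campaign[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return {
    domainHash: parts[0],
    timestamp: new Date(Number(parts[1]) * 1000),
    // Assumption: two visit/source counters; their exact meaning is not
    // documented in this article.
    counters: [Number(parts[2]), Number(parts[3])],
    campaign,
  };
}

const utma = parseUtma("247248150.1037924604.1252115649.1252432081.1252444069.1");
const utmz = parseUtmz(
  "247248150.1252444069.11.10.utmcsr=www.google.com|utmccn=(none)|utmcmd=organic"
);
console.log(utma.visitCount);      // 1
console.log(utmz.campaign.utmcsr); // www.google.com
console.log(utmz.campaign.utmcmd); // organic
```

Note that both cookies begin with the same domain hash (247248150), which is how the values for one site are tied together.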

Once these cookies are set, the data is actually sent to Google Analytics.  The tracking code makes an HTTP GET request for a transparent 1x1 pixel GIF image located on Google's servers.

This image is named: __utm.gif

The __utma, __utmb, and __utmc cookies are appended to this GET request as query string parameters, along with other information such as browser type, screen resolution, and language.
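If you copy one of these request URLs out of a tool like Live HTTP Headers, a few lines of JavaScript can decode it so you can see exactly what your browser is handing over.  This is a sketch under some assumptions: the parameter names (utmcc for the cookie data, utmsr for screen resolution, utmul for language, utmp for the page, utmac for the account number) come from observed ga.js traffic rather than from this article, and the sample URL is made up, reusing the cookie values from the examples above.  The exact set of cookies carried in a real request may differ from this sample.

```javascript
// Break a captured __utm.gif request URL into its query parameters and
// pull the individual cookie values out of the utmcc parameter.
// Assumption: parameter names come from observed ga.js traffic.
function inspectUtmGif(requestUrl) {
  const url = new URL(requestUrl);
  // URLSearchParams percent-decodes every value for us.
  const params = Object.fromEntries(url.searchParams);
  const cookies = {};
  // After decoding, utmcc is assumed to look like "__utma=...;+__utmz=...;".
  for (const chunk of (params.utmcc || "").split(";")) {
    const entry = chunk.replace(/^[+\s]+/, ""); // drop the "+" separator
    const eq = entry.indexOf("=");
    if (eq > 0) cookies[entry.slice(0, eq)] = entry.slice(eq + 1);
  }
  return { params, cookies };
}

// A made-up request for illustration only.
const sample =
  "http://www.google-analytics.com/__utm.gif?utmwv=4.5.7&utmhn=example.com" +
  "&utmsr=1280x800&utmul=en-us&utmp=%2Findex.html&utmac=UA-xxxxxx-x" +
  "&utmcc=__utma%3D247248150.1037924604.1252115649.1252432081.1252444069.1%3B" +
  "%2B__utmz%3D247248150.1252444069.11.10.utmcsr%3Dwww.google.com" +
  "%7Cutmccn%3D(none)%7Cutmcmd%3Dorganic%3B";

const report = inspectUtmGif(sample);
console.log(report.params.utmsr);   // 1280x800
console.log(report.cookies.__utma); // 247248150.1037924604.1252115649.1252432081.1252444069.1
```

Everything in that request, including your screen resolution, language, and cookie identifiers, travels as ordinary query-string text.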

You can view this GET request as it happens using tools such as the Live HTTP Headers extension for Firefox.[4]

Google picks up all this data on their end, and processes it to generate the Google Analytics reports.

How to be Invisible to Google Analytics

Google Analytics requires both JavaScript and cookies in order to track you.

You can prevent the JavaScript from ever running, either by turning JavaScript off in your browser settings or by using an extension such as NoScript[5] for Firefox, which can be configured to selectively block the ga.js file.  If the JavaScript never runs, then no cookies will ever be set, and no data will ever be sent to Google.
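NoScript is configured through its own interface, not through code, but the decision a per-domain script blocker makes is simple enough to sketch.  The function below is hypothetical and is not NoScript's implementation; it just shows that matching the host name google-analytics.com (and anything under it) catches both places ga.js is loaded from.

```javascript
// Decide whether a script request should be blocked, the way a per-domain
// script blocker would. Any host at or under google-analytics.com matches,
// which covers both www.google-analytics.com and ssl.google-analytics.com.
function shouldBlockScript(scriptUrl) {
  let host;
  try {
    host = new URL(scriptUrl).hostname.toLowerCase();
  } catch (e) {
    return false; // not an absolute URL; leave it alone
  }
  return host === "google-analytics.com" || host.endsWith(".google-analytics.com");
}

console.log(shouldBlockScript("https://ssl.google-analytics.com/ga.js")); // true
console.log(shouldBlockScript("http://www.example.com/site.js"));        // false
```

Matching on the exact suffix ".google-analytics.com" rather than a plain substring avoids accidentally blocking an unrelated host such as evilgoogle-analytics.com.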

Another method is to disable cookies in your browser.

Keep in mind that Google Analytics uses first-party cookies, so simply blocking third-party cookies (as some browsers do by default) will not work.  If you only disable cookies, the tracking code will still run, and data will still be sent to Google.

However, there will be no cookie data appended to the __utm.gif GET request, and Google will simply disregard this data on its end.

These techniques will work for any analytics software that uses JavaScript and cookies to track users.

Another method for tracking users is called IP + User-Agent tracking, which uses your IP address and the browser's User-Agent to uniquely identify a visitor by parsing web server log files.

This method is less accurate than JavaScript/cookie tracking (for instance, many people have dynamic IPs), but it's still fairly popular.  Since this is done on the server side, you can't stop it from tracking you altogether, but you can use something like Tor[6] to at least prevent it from uniquely identifying you.
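To see why IP + User-Agent tracking needs nothing from your browser beyond the request itself, here is a rough JavaScript sketch of the counting step.  It assumes the server writes Apache's "combined" log format; the regular expression, function name, and sample lines are my own illustration (using documentation-reserved IP addresses), not code from any particular analytics package.

```javascript
// Count "unique visitors" the way IP + User-Agent tracking does: every
// distinct (IP address, User-Agent) pair in the log is treated as one visitor.
// Assumes Apache/NCSA "combined" log format lines.
const COMBINED_LOG =
  /^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"$/;

function countVisitors(logLines) {
  const visitors = new Map(); // key: "ip|userAgent" -> number of hits
  for (const line of logLines) {
    const m = COMBINED_LOG.exec(line);
    if (!m) continue; // skip lines that are not in combined format
    const key = m[1] + "|" + m[2];
    visitors.set(key, (visitors.get(key) || 0) + 1);
  }
  return visitors;
}

const sampleLog = [
  '192.0.2.10 - - [08/Sep/2009:21:14:29 -0400] "GET / HTTP/1.1" 200 5120 "http://www.google.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.5"',
  '192.0.2.10 - - [08/Sep/2009:21:15:02 -0400] "GET /about.html HTTP/1.1" 200 2048 "http://example.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.5"',
  '198.51.100.7 - - [08/Sep/2009:21:16:40 -0400] "GET / HTTP/1.1" 200 5120 "-" "Opera/9.80 (X11; Linux i686)"',
];
console.log(countVisitors(sampleLog).size); // 2
```

The two hits from 192.0.2.10 with the same browser collapse into one "visitor."  This also shows both weaknesses: a dynamic IP splits one person into several visitors, and routing through Tor makes your requests arrive from exit-node addresses shared with many other users.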

How to Exploit Google Analytics for Fun

As you've seen, everything Google Analytics collects about you is gathered in plain text in your browser.

This means it's fairly trivial to send whatever bogus information you want to Google Analytics.

For example, using something like the Web Developer Toolbar[7], you can change the values of the Google Analytics cookies.

Try changing the __utma visit count to 1 million.  Or you could change the __utmz cookie's source information to something like this: utmcsr=www.fbi.gov|utmccn=(referral)|utmcmd=referral

They'll be left scratching their heads wondering why the FBI is linking to their site.

You can also create your own page with the Google Analytics tracking code.

By design, Google Analytics will accept traffic from any domain, not just the one associated with the owner's account - all you need is their UA-xxxxxx-x number (which is right there on their site).  Then put the pageTracker._trackPageview() function in a loop to artificially inflate their pageview count.

The best part about all this is that site owners cannot remove data from their Google Analytics account once it's there.

Filters can be manually set up to exclude certain data, but they do not work retroactively.  Therefore, unless they had enough foresight to set up the filters initially (which most people don't), they'll be stuck with whatever bogus data you sent them.

Oh, the benefits of giving up your data to Google!

References

  1. www.google.com/analytics
  2. wikileaks.org/wiki/Gmail_may_hand_over_IP_addresses_of_journalists
  3. arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/
  4. addons.mozilla.org/en-US/firefox/addon/http-header-live/
  5. noscript.net
  6. www.torproject.org
  7. chrispederick.com/work/web-developer