Transmissions

by Dragorn

What does a guy have to do to not get noticed around here?

You are no longer a shadowy figure on the Internet.

A dynamic IP will not increase your mystique.  Your nested anonymity routing system will not hide who you really are.  The Internet Santa Claus knows if you've been Googling for naughty or nice, and he knows you haven't been sleeping at 3 am.  Internet Santa is coming to your town to make sure you've seen every possible advertisement for the latest TV show, gadget, or method for enlarging your nether regions for the holiday season, and Santa needs to get paid.

There are hundreds or thousands of bits of information about where you are, what you buy, what ads you've watched (and which ones you've skipped), what books you read, what search terms you look for, and what sort of email you get.  Each piece of information is of limited value until someone links them together - suddenly the disparate fragments of your behavior become a single record set revealing more about your habits and interests than you might think (or want).

The first ghost, that of privacy past, takes us back to 2006, when AOL released a large database of anonymized search data for public research.  Within days, several groups had correlated each user's search terms to build profiles of individual users, even of multiple users sharing the same system, and in some cases that was enough to trace individuals to real-world names and addresses.

AOL quickly realized its error and removed the search data, but it had obviously spread too far to contain and is still available.  Let's look at this again: After removing all user-identifiable information from the logs and hashing users down to a single number, it was still possible to track down someone in the real world.
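To see why a per-user number is not the same as anonymity, here is a minimal sketch in Python.  The log rows, the user numbers, and the build_profiles helper are all invented for illustration and are not taken from the AOL data; the only point is that grouping by the pseudonym puts one person's queries back together.

```python
from collections import defaultdict

# Hypothetical "anonymized" log rows: (pseudonymous user number, query).
# Every value here is made up for illustration.
LOG = [
    (1001, "landscapers in springfield"),
    (2002, "cheap flights to denver"),
    (1001, "knee pain after running"),
    (1001, "homes for sale shady oaks subdivision"),
    (2002, "denver hotels downtown"),
]

def build_profiles(rows):
    """Group queries by pseudonym; the number hides the name, not the pattern."""
    profiles = defaultdict(list)
    for user_id, query in rows:
        profiles[user_id].append(query)
    return dict(profiles)

profiles = build_profiles(LOG)
for user_id, queries in profiles.items():
    print(user_id, queries)
```

No single row says much, but the combined list for one number (a town, a neighborhood, a health concern) is the kind of record that narrows a search down to a person.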

The second ghost, that of privacy present, shows us what can happen when companies share data.  Monitoring the browsing habits of millions of users is trivial when those users volunteer their information, likes, dislikes, and friends.  Social networks have often been considered a major privacy risk, but the risks are directly tied to the information that the user is willing to share.

In November 2007, Facebook partnered with several companies to share behavior and purchasing data from other sites.  The "Beacon" feature links a user's Facebook identity with their behavior on other sites by allowing access to the Facebook information.

Multiple commercial sites, such as Overstock, Fandango, and The New York Times review sites, link to the Beacon system and aggregate purchase information with a user profile.  Most of the public outcry is due to this information being displayed to other users viewing the Facebook entry, but no matter how (or whether) the information is displayed, the behavior has been recorded and correlated.

The biggest privacy invader of modern systems is the web browser.  Browsers are large, complex pieces of code which handle untrusted (and frequently hostile) data from anonymous network sources.  Even setting aside vulnerabilities and exploits in the browser code itself, modern sites are attempting to turn a stateless, unauthenticated system into a stateful, strongly authenticated system to refer to dynamic data.

Browsing leaves a continual detritus of cookies and session data linking who you were with where you are now.  The browser is a constant across changing IP addresses: Who you were the last time is who you are now, regardless of how you got there.
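A rough sketch of why the cookie, not the address, is the identity: the toy Tracker class below is hypothetical (it does not model any real ad service), and the IP addresses are reserved documentation addresses.  It simply keys a visitor's history on the cookie value the browser sends back.

```python
import uuid

class Tracker:
    """Toy ad server: keys visitor history on a cookie value, not on IP."""

    def __init__(self):
        self.history = {}  # cookie id -> list of (ip, page) visits

    def visit(self, ip, page, cookie=None):
        # Unknown or missing cookie: mint a random identifier for this browser.
        if cookie is None or cookie not in self.history:
            cookie = uuid.uuid4().hex
            self.history[cookie] = []
        self.history[cookie].append((ip, page))
        return cookie  # the browser stores this and sends it next time

tracker = Tracker()
cookie = tracker.visit("203.0.113.5", "/news")
cookie = tracker.visit("198.51.100.7", "/shop", cookie)   # new IP, same browser
cookie = tracker.visit("192.0.2.44", "/forum", cookie)    # new IP again
trail = tracker.history[cookie]
print(trail)
```

Three different addresses still land in one history, because the browser kept presenting the same identifier.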

Our greatest convenience is our greatest downfall, as is often the case with security.  "Remember me" is the most innocuous and obvious of the risks - ad services each place a tracking cookie which can monitor your movement across multiple websites.

The most obvious tracking service, but by no means the only one, is Google Analytics.  Google achieved deep penetration by offering useful, free, and (to the average user) non-obtrusive tools.  Website maintainers include a bit of JavaScript and get a wealth of useful information about visitors.  Estimates of coverage are hard to find, but it is pervasive.

The downside?  Every site which contains an Analytics entry updates the bread crumb trail, building a model of who you are and where you go.  Privacy networks such as Tor can protect traffic and origin, but can't prevent an application on your system happily updating the bread crumb trail.

Sure, the majority of these services are anonymized so that no directly identifiable information is returned.  However, a look to the past shows that obfuscated information may not be enough to prevent identifying information from leaking, and the services you use may be actively working against your privacy interests: Providing advertising data is a lucrative business model.

Finally we come to the specter of privacy future, traditionally the most frightening of the trio and in this story no less so.

"So what," you may ask, "I don't care if they want to send me ads, I block pop-ups, and what's wrong with getting ads for products I might actually care about?"  Absolutely nothing.  But once that data modeling your behavior, inclinations, and opinions exists, it is there forever, simply a subpoena away from the next witch hunt for whatever are considered the latest unpatriotic activities.

In 2006, the U.S. government launched a subpoena process for search data from the major search providers: Google, Yahoo!, AOL, and Microsoft.  Of the four, only Google fought the request.  While the request was only for search terms, with absolutely no user-identifying information (not even the one-way hash AOL used to link queries by the same user in the previously released data), it shows that the courts are aware of the availability of this information.

In June 2007, federal prosecutors attempted to force Amazon to disclose customers who had purchased books from a specific seller.  The case centered around tax evasion on the part of the seller; however, it served as an additional harbinger of attempts to use online tracking data well beyond the presentation of advertisements.  The judge who ruled in favor of Amazon in November agreed, calling it "troubling because it permits the government to peek into the reading habits of specific individuals without their prior knowledge or permission."

How do we prevent this future from happening to us?

Unfortunately it's not going to be as easy as buying the biggest turkey in the store window (and that's where I'll end the holiday metaphors).

Browsers have begun to add privacy-enhancing features: Firefox can automatically clear the cookies, cache, and browsing history on exit, for example.  However, these measures won't help against tracking within a single browser session, and a significant model of behavior can still be built.

Disabling all tracking functionality in the browser by turning off cookies, JavaScript, Java, and Flash will prevent tracking by anything but IP address and HTTP referrers, but will render many sites unusable.

Some mitigation can also be found by using tools such as Greasemonkey or Adblock Plus to filter the URLs which provide the tracking information: www.google-analytics.com and ssl.google-analytics.com are easily blocked, but blocking them affects only tracking by Analytics and not by other services.
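The filtering idea can be sketched in a few lines of Python.  This is not how Greasemonkey or Adblock Plus work internally; it is a simplified exact-hostname match, and the is_blocked helper, the URL paths, and the example.net and example.org hosts are assumptions made up for the example.

```python
from urllib.parse import urlparse

# The two Analytics hosts named in the text.
BLOCKED_HOSTS = {"www.google-analytics.com", "ssl.google-analytics.com"}

def is_blocked(url, blocked=BLOCKED_HOSTS):
    """Return True if the URL's hostname is on the blocklist (exact match)."""
    host = urlparse(url).hostname  # lowercased by urlparse; None if absent
    return host is not None and host in blocked

# Hypothetical requests a page might trigger.
requests = [
    "http://www.google-analytics.com/script.js",
    "https://ssl.google-analytics.com/script.js",
    "http://tracker.example.net/pixel.gif",
    "http://www.example.org/index.html",
]
allowed = [url for url in requests if not is_blocked(url)]
print(allowed)
```

The third request slips through untouched, which is the limitation described above: a blocklist only stops the trackers you already know about.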

There is likely no silver bullet besides vigilance: Be vocal, hold the services which hold your personal information to the commitments in their privacy agreements, and avoid dealing with those who don't honor them or who have poor privacy policies.  Opt out of information sharing whenever possible, and complain when it isn't made possible.

Happy browsing to all, and to all a good night.
