Fishing with Squid

by Suborbital  (suboorbital@gmail.com)

Squid (www.squid-cache.org) is an open-source caching proxy server that runs on most operating systems, including Unix-like systems and Windows.

The configuration file is imposing, to say the least, but only because it contains basically the entire documentation for Squid.

Lines of default configuration file: 4984

Lines actually in use in my config file: 45

The Squid instance described in this article was installed via MacPorts on OS X 10.6.something (although I have also set it up under Windows XP).

I started out with the intention of blocking advertising on iPad applications.

Normally, you could use something like the Firefox add-on "Adblock Plus," but on an iPad, ads turn up all over the place, not just in web browsers.  The Atomic Web Browser has ad blocking, but I was interested in things like the ads in the BBC app.

Fortunately, for a given wireless network, you can manually define a proxy, and so I duly pointed this at my MacBook (IP address 192.168.0.9), running Squid on its default port, 3128.

Squid was set up to allow proxying access to anything on the local (i.e., home) network, with the line:

acl localnet src 192.168.0.0/16 # RFC 1918 possible internal network

And, most importantly, to log the terms in GET requests, with the line:

strip_query_terms off

As an example, the request http://www.google.com/search?q=2600 will be logged in its entirety, instead of just http://www.google.com/search?.
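To make the difference concrete, here is a minimal Python sketch of what the logged URL looks like with and without query stripping.  It only mimics the logged form described above; it is not Squid's own code, and the function name strip_query is my own.

```python
def strip_query(url):
    """Mimic the logged form when query terms are stripped:
    keep everything up to and including the '?', drop the rest."""
    head, sep, _ = url.partition("?")
    return head + sep


full = "http://www.google.com/search?q=2600"
print(full)               # what is logged with strip_query_terms off
print(strip_query(full))  # what would be logged otherwise
```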

POST requests cannot be handled in the same way, since their variables travel in the request body rather than in the URL.  To examine them, you could probably redirect all traffic (at least temporarily) to a custom script whose only function is to enumerate POST request variables and their values.
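The core of such a script might look something like the following Python sketch.  It assumes the POST body is ordinary form-encoded data (application/x-www-form-urlencoded) and that you have already captured it somehow; how the traffic gets redirected to the script is not shown, and the function name and sample body are made up for illustration.

```python
from urllib.parse import parse_qsl


def enumerate_post_fields(body):
    """Return the (name, value) pairs from a form-encoded POST body.

    Blank values are kept, since an empty field can be as telling
    as a filled-in one.
    """
    text = body.decode("utf-8", errors="replace")
    return parse_qsl(text, keep_blank_values=True)


# Hypothetical captured body, for illustration only.
for name, value in enumerate_post_fields(b"q=2600&zip=&age=31"):
    print(name, "=", repr(value))
```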

Secure requests (HTTPS requests, usually to port 443) are encrypted, so their contents are not available either; only the CONNECT to the server and port shows up in the log.  On the whole, this is a good thing, as every request to apple.com was made via HTTPS, including some which look quite advertisement-seeking, such as:

1293720754.249 2663 192.168.0.10 TCP_MISS/200 1512 CONNECT iadsdk.apple.com:443 - DIRECT/216.236.237.207 -

The fields here (the Squid default log format) are: the timestamp; the time taken to serve the request (in milliseconds); the requesting IP (i.e., the iPad); the cache result and HTTP status (TCP_MISS, i.e., not found in cache, and 200); the size of the result (bytes); the method (e.g., GET, CONNECT); the URL (here, address:port); the client identity from an RFC 931 ident lookup (" - " here, since none was available); the hierarchy code and peer host (i.e., how and where the data was fetched from); and the MIME type of the returned data (" - " here, since it was not logged, but, e.g., "image/jpeg").
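If you want to pick these logs apart programmatically, a line in this format splits cleanly on whitespace.  The following Python sketch assumes the default ten-field format shown above; the field names are my own labels, not anything Squid defines.

```python
from collections import namedtuple

# Labels are illustrative; Squid itself does not name the fields this way.
LogEntry = namedtuple(
    "LogEntry",
    "timestamp elapsed_ms client result_code status size "
    "method url ident hierarchy peer mime_type",
)


def parse_line(line):
    """Split one default-format access.log line into named fields."""
    f = line.split()
    if len(f) != 10:
        raise ValueError("expected 10 fields, got %d" % len(f))
    result_code, _, status = f[3].partition("/")   # e.g. TCP_MISS/200
    hierarchy, _, peer = f[8].partition("/")       # e.g. DIRECT/216.236.237.207
    return LogEntry(
        float(f[0]), int(f[1]), f[2], result_code, int(status),
        int(f[4]), f[5], f[6], f[7], hierarchy, peer, f[9],
    )


entry = parse_line(
    "1293720754.249 2663 192.168.0.10 TCP_MISS/200 1512 CONNECT "
    "iadsdk.apple.com:443 - DIRECT/216.236.237.207 -"
)
print(entry.client, entry.method, entry.url)
```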

So, the ads being served through various apps were fairly easy to pick up, although there was one false positive (tapjoyads.com, used to authenticate purchases; the WolframAlpha app does the same).  The ad servers that I saw in the Squid access.log (which logs every request passing through Squid, along with whether it was served from the Squid cache, a primary use of Squid) were added to a blacklist file.

This was included in the Squid config file with the lines:

include /opt/local/etc/squid/blacklist.txt 
http_access deny BlackList

The blacklist.txt file contained a list of the servers to block, each one a regular expression, albeit a trivial one (the unescaped dots match any character, which does no harm here), like:

acl BlackList url_regex -i google-analytics.com
acl BlackList url_regex -i googlesyndication.com
acl BlackList url_regex -i doubleclick.net
acl BlackList url_regex -i admob.com
acl BlackList url_regex -i ads.mp.mydas.mobi
acl BlackList url_regex -i google_custom_search_watermark.gif
acl BlackList url_regex -i greystripe.com
...
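Before adding a pattern to the blacklist, it can be handy to check which logged URLs it would actually catch.  The Python sketch below approximates the case-insensitive matching of url_regex -i using Python's re module; Squid uses its own regex library, so treat this as an approximation that holds for simple patterns like these.  The test URLs are made up.

```python
import re

# The patterns shown above, compiled case-insensitively (like -i).
PATTERNS = [
    "google-analytics.com",
    "googlesyndication.com",
    "doubleclick.net",
    "admob.com",
    "ads.mp.mydas.mobi",
    "google_custom_search_watermark.gif",
    "greystripe.com",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def is_blacklisted(url):
    """True if any pattern matches anywhere in the URL."""
    return any(rx.search(url) for rx in COMPILED)


print(is_blacklisted("http://ad.mo.doubleclick.net/some/ad"))
print(is_blacklisted("http://www.bbc.co.uk/news/"))
```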

The other servers currently in my blacklist are:

iphone.playhaven.com
m.pinger.com
ads.pinger.com
serve.vdopia.com
www.fluik.com
www.jampaq.com
www.myprivatebrowserapp.com
analytics.medu.com
cloudfront.net
adwhirl.com
medialytics.com
imrworldwide.com
2mdn.net

Not all of these servers are ad servers per se, but some provide tracking of various kinds (e.g., google-analytics.com) and so were denied too.
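Rather than typing each acl line by hand, the file can be generated from a plain list of hostnames.  This Python sketch is one way to do it; it also escapes the dots so that each one matches only a literal dot, which is a small tightening of the patterns I actually used.  The function name is my own, and the host list is just a sample from the list above.

```python
def acl_line(host, acl_name="BlackList"):
    """Build one blacklist.txt line for a hostname, escaping the dots."""
    pattern = host.replace(".", "\\.")
    return "acl {} url_regex -i {}".format(acl_name, pattern)


hosts = ["iphone.playhaven.com", "serve.vdopia.com", "2mdn.net"]
for host in hosts:
    print(acl_line(host))
```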

The cloudfront.net servers deliver content hosted on Amazon's cloud services and could conceivably serve up useful content, so this regex might need some refining, but in every case I saw, they were being used for ads.  Seen in the logs but left off this list was tapjoyads.com.  Doodle Buddy, a free drawing app with themed sets of stencils, backgrounds, and stamps (you get one free), uses it to check for purchased sets; the app also contains banner ads, but these were served by greystripe.com.

Note to Developers:  Please don't use servers with the term ads.com in them for serving legitimate content.  It's disingenuous.

As another example, the BBC News app ads were served by ad.mo.doubleclick.net.  All of these are easily dealt with using the above blacklist.  From their frequency, it appears that one or more of Greystripe, DoubleClick, and AdMob is serving ads from the iAd system (Apple's in-app ad server).

Of note is www.myprivatebrowserapp.com.

This free web browser promises "A simple web browser built for the iPad that removes all your web browser cookies and history when you open and close the browser."  Not all that secure, but better than nothing, right?  Well, when you open it, the default (unchangeable) home page is a custom Google search form, which immediately runs off and requests www.myprivatebrowserapp.com/app/big.gif.  Nice statistics gathering, Cooply Apps!  Welcome to the blacklist!

So, ads come from all over the place (including the usual suspects), and (at least at home) you can set up a proxy to deal with them.

What other strange requests are going out over the airwaves from your iDevice?  Only your Unique Device Identifier (UDID).  Only to ad servers (well, not only).

GET requests to the following servers passed my iPad's UDID:

ads2.greystripe.com
adsx.greystripe.com
mayhem.eamobile.com
serve.vdopia.com/adserver/...
ws.tapjoyads.com
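Hunting for these by eye gets tedious.  Since a UDID is a 40-character hexadecimal string, a rough way to flag suspicious requests is to look for query values of exactly that shape.  The Python sketch below does this for a single logged URL; it will also flag any other 40-hex-digit value (a SHA-1 hash, say), and the host and parameter name in the example are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qsl

UDID_SHAPE = re.compile(r"[0-9a-fA-F]{40}")


def udid_like_params(url):
    """Return (name, value) pairs whose value looks like a UDID."""
    query = urlsplit(url).query
    return [
        (name, value)
        for name, value in parse_qsl(query, keep_blank_values=True)
        if UDID_SHAPE.fullmatch(value)
    ]


# Made-up URL with a fake 40-hex-digit identifier.
fake_udid = "ab" * 20
print(udid_like_params("http://ads.example.com/ad?udid=" + fake_udid + "&age=31"))
```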

Gah!  Well, tapjoyads.com, checking what in-app add-ons I'd purchased... O.K.  EA Games (eamobile.com), seemingly informing them of in-game achievements... O.K.  But Greystripe?

WTF?  And here's an interesting one (line breaks inserted before each GET variable; X's added for anonymity):

http://ads.mp.mydas.mobi/getAd.php5?sdkapid=18754
&auid=b4585XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX23463
&mmisdk=3.5.8-10.6.29.i
&ua=iPad%204.2.1
&age=31
&vendor=adwhirl
&lat=0.000000
&zip=
&long=0.000000
&adtype=MMBannerAdBottom
&hswd=728
&hsht=90
&accelerometer=true

Here we have a request to an ad server which uniquely identifies my iPad, passes my age (well, that's not mine, but perhaps I entered this one somewhere?), the version of my iPad's OS, whether I have an accelerometer in my device (or whether it's on?), and, although not used, my latitude and longitude?
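For anyone wanting to pull a request like this apart without counting ampersands, Python's standard library will do it in a few lines.  This sketch reassembles the URL shown above (with the same X's) and decodes it into name/value pairs; nothing here is specific to this ad server.

```python
from urllib.parse import urlsplit, parse_qsl

url = (
    "http://ads.mp.mydas.mobi/getAd.php5?sdkapid=18754"
    "&auid=b4585XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX23463"
    "&mmisdk=3.5.8-10.6.29.i"
    "&ua=iPad%204.2.1"
    "&age=31"
    "&vendor=adwhirl"
    "&lat=0.000000"
    "&zip="
    "&long=0.000000"
    "&adtype=MMBannerAdBottom"
    "&hswd=728"
    "&hsht=90"
    "&accelerometer=true"
)

# keep_blank_values=True so that empty fields such as zip= are not dropped.
params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
for name, value in params.items():
    print("%-14s %r" % (name, value))
```

Note that the percent-encoded space in ua is decoded along the way, so it comes out as "iPad 4.2.1".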

If this were a useful app that happened to start up with the request "App XXX would like to use your current location," perhaps those might have been passed on to the ad company.

If anyone can find such an example, please write in.  All in all, it was no surprise that, in the middle of this project, a story appeared on the BBC News app (ha!) about a class action against Apple for allowing personally identifying data (i.e., the UDID) to be shared unnecessarily and without users' consent.

It's 2011.  Do you know where your ads are coming from?  The converse might just be true.
