Null-Routing Facebook - Using Small Tech to Fight Big Tech

by aestetix

I really hate Facebook.

Part of this hatred is a general dislike of "social" media websites which pollute and destroy civil discourse with haphazard policies and elusive "algorithms," but Facebook takes the evil to the next level.  In this article, we'll explore how they do this, and what you can do to fight back.

This evil began when Facebook got into the business of mass surveillance, starting with a website widget.  According to them, we could add "one line of JavaScript" to our website, and it would magically enable people to like and share things like blog entries that we had written.  This later expanded into other technologies, such as single sign-on, where we could enable people to use their Facebook accounts to "log in" to our website and use our services.

But there's something that Facebook didn't mention.

Every time a web browser loads a page, it requests all the resources on that page, such as images, CSS, and so on.  That includes the "one line of JavaScript," which makes a request to a Facebook URL and downloads the JavaScript that enables the promoted functionality.  And every time our web browser makes a request to Facebook, that request creates a log entry on Facebook's servers with all sorts of information, such as our User-Agent and our IP address.  For every website we visit that has this functionality enabled, Facebook can track us, even if we don't have a Facebook account.
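To make that concrete, here is a minimal sketch of the kind of request a browser sends for an embedded third-party script.  The URLs and header values are placeholders I made up for illustration (they are not Facebook's real widget URL), and nothing is actually sent over the network; the code only builds the request so we can see what rides along with it.  Browsers typically also send a Referer header naming the page we were reading, subject to the site's referrer policy.

```python
from urllib.request import Request

# Hypothetical values for illustration only: the page we are reading
# and the third-party script that page embeds.
PAGE_URL = "https://blog.example.org/my-post"
WIDGET_URL = "https://widgets.example.com/sdk.js"


def build_widget_request(page_url, widget_url, user_agent):
    """Build (but do not send) the request a browser would make for an embedded script."""
    return Request(
        widget_url,
        headers={
            "User-Agent": user_agent,  # identifies our browser and operating system
            "Referer": page_url,       # tells the third party which page we were on
        },
    )


req = build_widget_request(PAGE_URL, WIDGET_URL, "Mozilla/5.0 (X11; Linux x86_64)")
for name, value in req.header_items():
    print(f"{name}: {value}")
print("Host contacted:", req.host)
```

Our IP address isn't a header at all; the server learns it from the connection itself.  Repeat this across every site that embeds the widget, and the server logs start to look like a browsing history.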

This is probably how Facebook collected the data that became "shadow profiles."

The public first learned about these through a data access request filed by Max Schrems in 2011 (details at europe-v-facebook.org).

These shadow profiles - secret dossiers about people's Internet browsing activities compiled without their knowledge or consent - have been the source of a lot of controversy, even coming up in Congressional and Parliamentary questioning, although Facebook refuses to address any concerns.

Just as the Internet was designed to route around censorship, it was also designed to route around surveillance.

When the web browser requests an asset, such as a JavaScript file, it first has to resolve the domain name through the Domain Name System (DNS), because computers connect to IP addresses, not domain names.  When we type "facebook.com" into the browser, the browser will look in our local DNS cache to find the IP address that matches facebook.com.  If there isn't one cached, it will make a request to a DNS server with the domain, and the server will reply with a corresponding IP address.  The browser will then put this in its cache and use the IP address to access the website.
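The lookup-then-cache flow described above can be sketched in a few lines.  This is a simplified illustration, not how any particular browser implements it: the resolve function and the stand-in fake_lookup are names I invented, and a real cache would also expire entries according to each record's TTL.

```python
import socket


def resolve(hostname, cache, lookup=socket.gethostbyname):
    """Return an IP address for hostname, checking the local cache before asking DNS."""
    if hostname in cache:
        return cache[hostname]   # cache hit: no DNS query is sent
    address = lookup(hostname)   # cache miss: ask the configured DNS server
    cache[hostname] = address    # remember the answer for next time
    return address


# A stand-in lookup function so the example runs offline.
# 203.0.113.10 comes from an address range reserved for documentation.
queries = []


def fake_lookup(name):
    queries.append(name)
    return "203.0.113.10"


cache = {}
first = resolve("example.com", cache, lookup=fake_lookup)
second = resolve("example.com", cache, lookup=fake_lookup)
print(first, second, "DNS queries sent:", len(queries))
```

Leaving out the lookup argument makes the function use the system resolver instead.  The important part is the miss path: that single query to the DNS server is the point where a blocker can step in.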

In recent years, as websites filled up with annoying and distracting ads, people have started using ad blockers to prevent the browser from displaying the ads.

As of this writing, the problem is so bad that it's effectively unsafe to look at most websites without using an ad blocker.

And for those of us who don't want to rely on a browser-based solution, we can use Pi-hole (www.pi-hole.net).  Designed to run on a Raspberry Pi, Pi-hole is a self-hosted standalone "ad blocker" that runs at the DNS level, ensuring that requests to known bad websites don't even resolve.

This also means that the requests never make it to the servers, so the bad websites can't track us.

Pi-hole has a great feature: it lets us whitelist and blacklist sites.

Although we can do this on our local computer by modifying the hosts file (for example, setting facebook.com to point to localhost), the hosts file doesn't support wildcards.

Pi-hole, using dnsmasq as a back-end, does.  And we can use this feature to blacklist every URL with a hostname relating to Facebook, allowing our system to null-route all such requests, making ourselves effectively invisible to Facebook's all-seeing eyes.
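To see why wildcards matter, here is a small sketch that treats the hosts file as what it effectively is, an exact-match table, and compares it with a single regular expression.  The subdomains are illustrative, and the code only mimics the matching behavior in Python; it is not Pi-hole's own implementation.

```python
import re

# A hosts file maps one exact name to one address.
HOSTS = {"facebook.com": "127.0.0.1"}

# One regex rule that covers the bare domain and every subdomain.
WILDCARD = re.compile(r"(^|\.)facebook\.com$")


def blocked_by_hosts(hostname):
    """Exact match only, like a hosts file entry."""
    return hostname in HOSTS


def blocked_by_wildcard(hostname):
    """Match the domain itself or anything ending in .facebook.com."""
    return WILDCARD.search(hostname) is not None


for name in ["facebook.com", "www.facebook.com", "static.facebook.com", "notfacebook.com"]:
    print(f"{name:22} hosts={blocked_by_hosts(name)!s:5} wildcard={blocked_by_wildcard(name)}")
```

With the hosts file we would need a separate line for every subdomain, and we would have to know them all in advance.  The regex catches subdomains we have never seen, while still leaving an unrelated name like notfacebook.com alone.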

Using some simple regular expressions, we can block out a multitude of Facebook URLs, as well as their Content Delivery Networks (CDNs).

In my list, I'm also blocking out all Instagram URLs because they are owned by Facebook, as well as Twitter because, honestly, there are very few good reasons to ever look at Twitter.  You can put whatever websites you want in here, and metaphorically tell those companies to go steal someone else's usage data.

Here is the list I use:

(^|\.)facebook\.com$
(^|\.)facebook\.net$
(^|\.)fbcdn\.net$
(^|\.)fna\.fbcdn\.net$
(^|\.)ftxl1-1\.fna\.fbcdn\.net$
(^|\.)instagram\.com$
(^|\.)tfbnw\.net$
(^|\.)twitter\.com$
(^|\.)xx\.fbcdn\.net$
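Before loading a list like this, it can be worth checking what it actually catches.  The sketch below runs the same patterns through Python's re module against a few sample hostnames that I picked for illustration.  It assumes Pi-hole matches each pattern against the queried hostname in roughly the same way Python does for these simple expressions, and the answer_for helper is a hypothetical stand-in for the null-routing step.

```python
import re

BLOCKLIST = [
    r"(^|\.)facebook\.com$",
    r"(^|\.)facebook\.net$",
    r"(^|\.)fbcdn\.net$",
    r"(^|\.)fna\.fbcdn\.net$",
    r"(^|\.)ftxl1-1\.fna\.fbcdn\.net$",
    r"(^|\.)instagram\.com$",
    r"(^|\.)tfbnw\.net$",
    r"(^|\.)twitter\.com$",
    r"(^|\.)xx\.fbcdn\.net$",
]
COMPILED = [re.compile(pattern) for pattern in BLOCKLIST]


def matching_rules(hostname):
    """Return every pattern in the list that matches hostname."""
    return [p for p, rx in zip(BLOCKLIST, COMPILED) if rx.search(hostname)]


def is_blocked(hostname):
    return bool(matching_rules(hostname))


def answer_for(hostname, real_answer):
    """Hypothetical null-routing step: blocked names get 0.0.0.0."""
    return "0.0.0.0" if is_blocked(hostname) else real_answer


samples = [
    "facebook.com",
    "www.instagram.com",
    "ftxl1-1.fna.fbcdn.net",
    "example.org",
    "facebook.com.example.org",
]
for name in samples:
    print(f"{name:26} -> {answer_for(name, '203.0.113.10')}")
print(len(matching_rules("ftxl1-1.fna.fbcdn.net")), "rules match ftxl1-1.fna.fbcdn.net")
```

In this sketch, the three longer fbcdn entries are already covered by the plain fbcdn\.net rule, so they are redundant but harmless.  The trailing $ is what keeps a name like facebook.com.example.org from being caught by accident.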

Once we've set up Pi-hole, the last step is to get our system to use it as our default DNS.

On Linux systems, this is usually set in the /etc/resolv.conf file.

The easiest way is to add a line such as:

nameserver 192.168.0.2

above our automatically assigned nameserver, where 192.168.0.2 is the IP address of our Pi-hole server.
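If we'd rather script that edit than do it by hand, the logic is simple enough to sketch.  The function below works on the text of a resolv.conf file rather than touching /etc/resolv.conf directly, and the sample file contents are invented for the example.

```python
def prepend_nameserver(resolv_conf_text, address):
    """Return resolv.conf text with `nameserver address` ahead of any other nameserver."""
    new_line = f"nameserver {address}"
    lines = resolv_conf_text.splitlines()
    # Drop an existing entry for this address so it is never listed twice.
    lines = [line for line in lines if line.split() != ["nameserver", address]]
    # Insert before the first remaining nameserver line, or append if there is none.
    for index, line in enumerate(lines):
        if line.strip().startswith("nameserver"):
            lines.insert(index, new_line)
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Invented sample contents, similar to what a DHCP client might write.
sample = "# Generated by a DHCP client\nsearch lan\nnameserver 192.168.0.1\n"
updated = prepend_nameserver(sample, "192.168.0.2")
print(updated, end="")
```

Order matters here because the resolver normally tries nameservers in the order they are listed.  Keep in mind that with the old nameserver still in the file, lookups can fall through to it whenever the Pi-hole doesn't answer.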

If you have the means, I recommend installing Pi-hole on a cloud VPS, because then you can block Facebook no matter where your computer is.

It's worth noting that some systems, such as Ubuntu, automatically generate the resolv.conf file, so you should figure out how your system generates it (if it does) and modify the template files so you don't have to re-add the line every time you connect to a new network.

A few more technical notes.

First, disable logging.

Pi-hole logs how many requests it blocks and has pretty graphs, but if you don't care about that, the logging can slow the system down and waste disk space.

I also turn off the webserver:

$ sudo service lighttpd stop

because I don't need to look at the graphs.

Second, if you visit a lot of sites with Facebook embeds, your DNS resolution might take longer and longer over time due to DNS caching, causing all your web browsing to be fairly slow.

You can check whether Pi-hole is causing the slowness by running this command from your local shell:

$ dig facebook.com

If the response is an instant 0.0.0.0, Pi-hole is working as expected.  But if the response takes forever to resolve, then it is probably overloaded.
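The same check can be scripted if we want to run it regularly.  This is a rough sketch with names of my own choosing: it times a lookup and reports whether the answer is a null address.  Unlike dig, Python's socket.gethostbyname goes through the system resolver, so a hosts file entry would also show up here.  The demo uses a stand-in resolver so it runs without a network.

```python
import socket
import time

# Addresses that mean "nowhere" in IPv4 and IPv6.
NULL_ADDRESSES = {"0.0.0.0", "::"}


def check_block(hostname, lookup=socket.gethostbyname):
    """Resolve hostname and return (address, seconds_taken, null_routed)."""
    start = time.perf_counter()
    address = lookup(hostname)
    elapsed = time.perf_counter() - start
    return address, elapsed, address in NULL_ADDRESSES


# Offline demo: a stand-in resolver that answers the way a working Pi-hole would.
address, elapsed, null_routed = check_block("facebook.com", lookup=lambda name: "0.0.0.0")
print(address, f"{elapsed:.4f}s", "blocked" if null_routed else "NOT blocked")
```

Leaving out the lookup argument runs the check against whatever DNS the machine is really using.  A slow or timed-out answer there points to the same overload problem.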

To fix this, log into the server and run:

$ sudo pihole restartdns

This restarts Pi-hole's DNS resolver and should make your system run smoothly again.  It's a little annoying, but it's a small price to pay for privacy.

There is a major technology war afoot, and there are big questions about whether we even own our own data.

As someone who believes strongly that an individual's right to privacy overrides a corporation's desire to sell that privacy to the highest bidder, I think it is important that, when we are able, we use technology to ensure that our data cannot be bought, especially without our consent.

After all, what these corporations do not have can only make us stronger.
