Procedural Worlds, Statistical Analysis, Image Processing, and PRNG Exploitation for the Lulz - or Why IMDb Got a CAPTCHA

by sam

In February 2005, the Gay Nigger Association of America (GNAA) devised a cunning plan to troll IMDb users using various fancy hacks.  This is what happened.

The Plan

It was suggested on the #gnaa IRC channel that the movie Gayniggers from Outer Space, from which the organization takes its name, be upvoted to the IMDb Top 250 as an emotional tribute to this cult movie.  The GNAA, not being 4chan, did not have an army of idiots to carry out their deeds; they had to use skills and technology instead.

The first attempt was simple: everyone voted for Gayniggers from Outer Space, and asked people they knew to vote as well.  It went slowly.  In order to vote several times, a person had to go through a cumbersome process: only registered users can vote on IMDb, and a valid email address is required in order to register an account.  Manual account creation was slow.  The GNAA therefore decided to automate the IMDb account creation.

Creating a Procedural World of People

The following observations and guesses were made about the IMDb voting process:

  • For the Top 250, only votes from "regular voters" were considered.  This probably meant that in order to have an impact on the vote, they needed to:
  • A "weighting system" was applied to the votes, which probably included disfavoring votes from the same IP address, so they needed to use as many different IPs as possible.
  • Multiple email addresses from the same domain were more likely to attract attention, so using as many different domains deduce which other accounts were created using this process.
  • New users needed to fill out a form with their gender, birth year, country, postal code, etc.  Randomizing this information would reduce the odds of being detected through statistical analysis.

So the GNAA wrote an account creation library that, given a random seed, would create a unique identity comprising:

  • Full name, using data from the most common female and male names, as well as surnames in the U.S.
  • Email address, using the full name combined with a variety of free email providers such as spam.la, mailinator.com, fastmail.us...
  • Gender, country, year of birth, postal code.
  • A preferred password for use on websites.

Generated identities would then look like this:

SEED, FULL NAME, GENDER, EMAIL ADDRESS, PASSWORD
3480, Tracy Gilbert, F, TracyGilbert@spamhole.com, 26ACTR41
3481, Rene Reid, M, Rene_Reid@runbox.com, Re96RE14
3482, Sandra Silva, F, SANDRA63@swiftmail.com, UA75ED11
3483, Terrence Bowman, M, terrencebowman@spamhole.com, en29TETE
3484, Ian Wade, M, WADE5946@poboxed.com, 59DE28WA
3485, Barbara Burke, F, barbara_burke@spam.la, rb86BA13

People taking part in the operation would then be responsible for a seed range.  For instance, Gary would run a script with seeds 1400 to 1499 for several days.  But if Gary became busy, someone else could run the script with the same seed range and continue where he had left off.  There was no need to create a central database because all of the identity information was generated procedurally.
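
The key property here is determinism: the same seed always yields the same identity, so a seed range is all anyone needs to share.  A minimal Python sketch of that idea follows.  The name lists, address styles, birth-year range, and password alphabet are placeholders invented for illustration, and `make_identity` is a hypothetical name; the actual library's data and formats are not described beyond the samples above.

```python
import random
from dataclasses import dataclass

# Tiny stand-in lists; the article only says common U.S. names were used.
FEMALE_NAMES = ["Tracy", "Sandra", "Barbara"]
MALE_NAMES = ["Rene", "Terrence", "Ian"]
SURNAMES = ["Gilbert", "Reid", "Silva", "Bowman", "Wade", "Burke"]
# Domains that appear in the article's text and sample output.
DOMAINS = ["spam.la", "mailinator.com", "fastmail.us", "spamhole.com", "runbox.com"]
# Hypothetical password alphabet.
PASSWORD_CHARS = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


@dataclass(frozen=True)
class Identity:
    seed: int
    full_name: str
    gender: str
    email: str
    birth_year: int
    password: str


def make_identity(seed: int) -> Identity:
    """Derive every field from a private PRNG seeded with `seed` only."""
    rng = random.Random(seed)  # no global state, so results depend on the seed alone
    gender = rng.choice("FM")
    first = rng.choice(FEMALE_NAMES if gender == "F" else MALE_NAMES)
    last = rng.choice(SURNAMES)
    # Hypothetical address styles, loosely modelled on the sample output.
    styles = [
        first + last,
        f"{first}_{last}",
        f"{first.upper()}{rng.randint(10, 99)}",
        (first + last).lower(),
    ]
    email = f"{rng.choice(styles)}@{rng.choice(DOMAINS)}"
    birth_year = rng.randint(1950, 1987)  # assumed range
    password = "".join(rng.choice(PASSWORD_CHARS) for _ in range(8))
    return Identity(seed, f"{first} {last}", gender, email, birth_year, password)


if __name__ == "__main__":
    # Anyone given the range 1400-1402 regenerates exactly these identities.
    for seed in range(1400, 1403):
        print(make_identity(seed))
```

Because nothing is stored, "handing over" a seed range amounts to telling someone two integers.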

Operation imdbtroll

The GNAA combined the identity creation library with additional anonymizing features such as a regularly updated list of public HTTP proxies (Tor was barely usable back in 2005), and web user agent randomization.  The imdbtroll.py script was created.

People on IRC started running the script with a seed range assigned to them.  The script went through several iterations, but the final version worked roughly as follows:

  1. Choose a seed from the provided range, and create the corresponding identity.  For instance, seed 9432: John Blackman (john_blackman@runbox.com).
  2. Check whether the identity's email address is activated, by logging in if necessary.  For instance, a spam.la account didn't require any registration, but a mailinator.com account did.
    • If the email address is not active, register an account at the email provider.

  3. Check whether the IMDb account is present, by logging in if necessary.
    • If the IMDb account is not present but there is a confirmation email in the mailbox, activate the account using that e-mail.
    • If the IMDb account is not present and there is no email, create an IMDb account and wait for a confirmation email in the mailbox.

  4. Log in to IMDb.
  5. Vote for movies from IMDb's Top 250, from the bottom 100, or using its built-in search engine; random search words included "troll," "communists," or "nazis."
  6. Vote for Gayniggers from Outer Space, giving that movie 8, 9, or 10 stars.
  7. Vote for other movies some more, so as not to show an obvious pattern.
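
Steps 2 to 4 are driven by what the script observes rather than by stored progress, which is what lets a second person pick up a seed range midway.  The sketch below illustrates that decision logic as a pure function, with no network access at all.  `AccountState`, `next_action`, and the action names are hypothetical; they are not taken from imdbtroll.py.

```python
from dataclasses import dataclass


@dataclass
class AccountState:
    """What a run can observe about one identity (hypothetical fields)."""
    mailbox_active: bool
    imdb_account_exists: bool
    confirmation_email_present: bool


def next_action(state: AccountState) -> str:
    """Pick the next step from observed state alone, mirroring steps 2-4."""
    if not state.mailbox_active:
        return "register_mailbox"
    if not state.imdb_account_exists:
        if state.confirmation_email_present:
            return "activate_account"
        return "create_account_and_wait"
    return "log_in_and_vote"


if __name__ == "__main__":
    # An identity abandoned after account creation resumes at activation.
    print(next_action(AccountState(True, False, True)))
```

Since every step is re-derived from the current state, rerunning a seed that was half processed simply continues from the first unmet condition.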

The script also tried hard to simulate a real human using a real web browser, pausing between pages, using valid Referer information, clicking on links, sometimes not even voting for Gayniggers from Outer Space...

It worked well.  The weighted average vote for Gayniggers from Outer Space went from 5.9 stars to 8.7.

Feb. 2: 5.9/10
Feb. 3: 7.5/10
Feb. 4: 8.7/10

And here are the voting details:

STARS, FEB. 2, FEB. 3, FEB. 4
10, 605 (68.0%), 1391 (81.2%), 2913 (81.8%)
9, 26 (2.9%), 60 (3.5%), 224 (6.3%)
8, 24 (2.7%), 25 (1.5%), 85 (2.4%)
7, 28 (3.1%), 28 (1.6%), 55 (1.5%)
6, 28 (3.1%), 29 (1.7%), 51 (1.4%)
5, 33 (3.7%), 33 (1.9%), 51 (1.4%)
4, 18 (2.0%), 18 (1.1%), 37 (1.0%)
3, 27 (3.0%), 27 (1.6%), 35 (1.0%)
2, 30 (3.4%), 30 (1.8%), 37 (1.0%)
1, 71 (8.0%), 71 (4.1%), 72 (2.0%)
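
As a quick cross-check of the "weighting system" guess, the plain arithmetic mean of each day's votes can be compared with the weighted average IMDb displayed.  The sketch below only does that arithmetic; it makes no claim about how IMDb actually weights votes.  The Feb. 2 counts for 5 stars and 2 stars are taken as 33 and 30, the values implied by their 3.7% and 3.4% shares of that day's votes.

```python
# Vote counts per star rating, from the voting details table.
VOTES = {
    "Feb. 2": {10: 605, 9: 26, 8: 24, 7: 28, 6: 28, 5: 33, 4: 18, 3: 27, 2: 30, 1: 71},
    "Feb. 3": {10: 1391, 9: 60, 8: 25, 7: 28, 6: 29, 5: 33, 4: 18, 3: 27, 2: 30, 1: 71},
    "Feb. 4": {10: 2913, 9: 224, 8: 85, 7: 55, 6: 51, 5: 51, 4: 37, 3: 35, 2: 37, 1: 72},
}
# Weighted averages shown by IMDb on the same days.
DISPLAYED = {"Feb. 2": 5.9, "Feb. 3": 7.5, "Feb. 4": 8.7}


def arithmetic_mean(counts: dict) -> float:
    """Unweighted mean rating: sum(stars * votes) / sum(votes)."""
    total_votes = sum(counts.values())
    return sum(stars * n for stars, n in counts.items()) / total_votes


if __name__ == "__main__":
    for day, counts in VOTES.items():
        print(f"{day}: {sum(counts.values())} votes, "
              f"plain mean {arithmetic_mean(counts):.2f}, displayed {DISPLAYED[day]}")
```

On all three days the plain mean comes out above the displayed figure, which is consistent with some votes being discounted.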

Bantown Trolls the GNAA

On February 4th, Bantown, a rival trolling group, got ahold of the GNAA's script by lurking on the IRC channel and using powerful hacker tools such as GNU Wget to retrieve the publicly posted script updates.

Bantown started running imdbtroll.py, too, with their own secret seed ranges.  They made a single modification to it: instead of giving Gayniggers from Outer Space ten stars, they gave it one star.

A race had begun.  It was obvious that Bantown was running more instances of the script than the GNAA, enough to completely cancel out the GNAA's efforts.  One solution was to run even more instances than Bantown, but such an escalation could only lead to the eventual detection of unusual behavior by IMDb admins.

But the GNAA had a secret weapon: a logic bomb hidden in plain sight, right inside imdbtroll.py.

The GNAA Trolls Bantown Back

The library used for IMDb access had a lot of features, including changing a user's password.  It was not used by imdbtroll.py, but it was fully functional.  The GNAA therefore created a new script, fuckbantown.py, which did the following:

  • Create a new identity from a random seed.
  • Log into IMDb using the identity.
  • Change the user's password so that the account becomes unusable for Bantown's running scripts.
  • Change the vote for Gayniggers from Outer Space from 1 star back to 10.

There was only one small problem: the GNAA did not know what random seeds Bantown had been using.  They would potentially have to log in to billions of possible accounts in order to find out which users had been created.  That was not only guaranteed to raise alarms at IMDb, but also infeasible in a reasonable amount of time.

But there was another way, thanks to spam.la.  Some of the identities were using that domain for their email address.

One prominent feature of spam.la was that all emails sent to a spam.la address appeared on the website.  ("All email sent to any_address@spam.la is publicly readable right here" is what was said on their site.)  So the GNAA only had to monitor that website and look for unknown IMDb account activation emails!  Then, if the confirmation email was sent to, say, TRACEY49@spam.la, they only had to brute-force the Python pseudorandom number generator in order to find the seed that had created such an address.

That still meant testing all possible seeds, but without having to connect to any server.  If the seed was 215045, it probably meant that a Bantown person was using seeds 215000 to 215999.
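
The offline search can be sketched in a few lines of Python.  Everything about the generator below is a toy assumption (`email_for_seed`, its name lists, and its address format are invented), since the real library's internals are not given; only the idea of replaying seeds locally and the 1000-seed block from the example above come from the text.  Note that such a search only works when the searcher runs the same generator code, and in practice the same PRNG implementation, as the target.

```python
import random
from typing import Iterator, Tuple

# Toy generator inputs (hypothetical).
FIRST_NAMES = ["Tracey", "John", "Sandra", "Ian", "Barbara"]
SURNAMES = ["Gilbert", "Blackman", "Silva", "Wade", "Burke"]
DOMAINS = ["spam.la", "runbox.com", "spamhole.com"]


def email_for_seed(seed: int) -> str:
    """Toy stand-in for the identity library: seed -> email address."""
    rng = random.Random(seed)
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(SURNAMES)
    number = rng.randint(1000, 9999)
    return f"{first}_{last}{number}@{rng.choice(DOMAINS)}".lower()


def candidate_seeds(target_email: str, max_seed: int) -> Iterator[int]:
    """Replay every seed below max_seed locally; yield those that match."""
    for seed in range(max_seed):
        if email_for_seed(seed) == target_email:
            yield seed


def likely_range(seed: int, block: int = 1000) -> Tuple[int, int]:
    """Guess the assigned block a seed belongs to (block size is assumed)."""
    start = (seed // block) * block
    return start, start + block - 1


if __name__ == "__main__":
    observed = email_for_seed(4321)  # pretend this address was seen publicly
    for seed in candidate_seeds(observed, 10_000):
        print(seed, likely_range(seed))
```

A small toy generator like this one can map several seeds to the same address, which is why the sketch yields every candidate instead of stopping at the first match.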

Little by little, the GNAA secretly changed the votes for the users that Bantown had spent hours creating.

The IMDb CAPTCHA

Understandably, the Bantown people felt butthurt.  On February 5th, they decided to put an end to the whole operation and they alerted IMDb.  A wave of panic swept over the admins and one of them quickly set up a CAPTCHA composed of a random movie or actor name to protect account creation from automated scripts.

Back in 2005, CAPTCHA breaking was rather uncommon.  Some tools existed, but they only targeted simple CAPTCHAs with minor image distortions.  The one used by IMDb was considered hard to break.

However, the CAPTCHA had an unexpected weakness.  It took the GNAA some time to understand it but, with a few samples, it became visible:

Can you see it?  "Morgan Freeman" and "Hide and Seek" appeared twice each.  What were the odds that, given 16 movie and actor names chosen at random, two of them would appear more than once?  Pretty small, wouldn't you agree?  Well, yes, unless the list of movies and actors was unexpectedly short.  And a small dictionary is a serious CAPTCHA weakness.

In order to guess the size of the dictionary, the GNAA gathered 192 CAPTCHA samples and counted how many times duplicates appeared:

  • 66 names appeared once.
  • 39 names appeared twice.
  • 10 names appeared 3 times.
  • 3 names appeared 4 times.
  • 1 name appeared 6 times.

They then performed a statistical analysis and managed to compute the probability that the above distribution would appear given various dictionary sizes.
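
The exact analysis is not described, so the following is only one plausible way to do it: a maximum-likelihood estimate that assumes each CAPTCHA is drawn uniformly at random from a fixed dictionary of N names.  Under that assumption, the probability of seeing d distinct names in n samples is proportional, as a function of N, to N! / ((N - d)! * N^n).  The function names are hypothetical.

```python
import math

# Duplicate profile from the list above: times seen -> number of names.
PROFILE = {1: 66, 2: 39, 3: 10, 4: 3, 6: 1}

SAMPLES = sum(times * names for times, names in PROFILE.items())  # 192
DISTINCT = sum(PROFILE.values())                                  # 119


def log_likelihood(size: int, samples: int, distinct: int) -> float:
    """log of N! / ((N - d)! * N**n), dropping terms that do not depend on N."""
    return (math.lgamma(size + 1)
            - math.lgamma(size - distinct + 1)
            - samples * math.log(size))


def most_likely_size(samples: int, distinct: int, max_size: int = 2000) -> int:
    """Dictionary size in [distinct, max_size] that maximizes the likelihood."""
    return max(range(distinct, max_size + 1),
               key=lambda size: log_likelihood(size, samples, distinct))


if __name__ == "__main__":
    print("samples:", SAMPLES, "distinct names:", DISTINCT)
    print("most likely dictionary size:", most_likely_size(SAMPLES, DISTINCT))
```

Under the uniform assumption the estimate depends only on the number of distinct names, not on the full duplicate profile; a non-uniform selection would call for a different model.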

The most probable dictionary sizes were between 170 and 190.  As expected, that was small and allowed for a CAPTCHA-breaking attack that did not involve OCR.  Given the size of the corpus, they only had to count the characters in each word instead of decoding them.  For instance, four characters followed by seven characters could be "Pulp Fiction," "Ryan Gosling," or "Teri Hatcher."  Since three tries were allowed to solve the CAPTCHA, that one would always be guessed successfully.  On average, this led to a CAPTCHA breaker with a success rate of more than 60 percent.
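
The counting argument can be made concrete with a small sketch.  It assumes the per-word character counts have already been extracted from the image (the image-processing step is not shown) and that the dictionary is known.  The dictionary below is a tiny illustrative one: five names come from the text, and the four two-word titles of length 4+4 are added here only to show a group too large for three tries.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DICTIONARY = [
    "Pulp Fiction", "Ryan Gosling", "Teri Hatcher",     # from the text: 4 + 7
    "Morgan Freeman", "Hide and Seek",                  # from the text
    "Brad Pitt", "Star Wars", "Kill Bill", "True Lies"  # illustrative: 4 + 4
]

Signature = Tuple[int, ...]


def signature(name: str) -> Signature:
    """Length of each word, e.g. 'Pulp Fiction' -> (4, 7)."""
    return tuple(len(word) for word in name.split())


def build_index(names: Sequence[str]) -> Dict[Signature, List[str]]:
    """Group dictionary entries by their length signature."""
    index: Dict[Signature, List[str]] = defaultdict(list)
    for name in names:
        index[signature(name)].append(name)
    return index


def guesses(index: Dict[Signature, List[str]], lengths: Sequence[int],
            tries: int = 3) -> List[str]:
    """Candidate answers for the observed word lengths, capped at `tries`."""
    return index.get(tuple(lengths), [])[:tries]


def expected_success(index: Dict[Signature, List[str]], tries: int = 3) -> float:
    """Chance of solving a uniformly chosen entry within `tries` guesses."""
    total = sum(len(group) for group in index.values())
    return sum(min(len(group), tries) for group in index.values()) / total


if __name__ == "__main__":
    index = build_index(DICTIONARY)
    print(guesses(index, [4, 7]))
    print(f"expected success: {expected_success(index):.2f}")
```

Groups with at most three entries are always solved; larger groups are what pull the overall rate below 100 percent.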

Operation imdbtroll could carry on.

Epilogue

A few hours after the CAPTCHA breaker was integrated into imdbtroll.py, someone on #gnaa pointed out that the IMDb Top 250 only allowed movies that ran for more than 45 minutes.

Gayniggers from Outer Space was a short movie.  It would never enter the Top 250.

The whole operation had been in vain, but science progressed and lulz were had.
