Building the Better Brute-Force Algorithm

A Guide to Heuristic Password Cracking

James Penguin  (jamespenguin@gmail.com)

Let's begin with a brief overview of standard (non-decryption based) password cracking methods.

When faced with the task of cracking a password hash, and reverse engineering the encryption algorithm used to create the hash of the original password isn't a viable choice, you are left with a couple of different options.

Dictionary Attacks

A dictionary attack is carried out by iterating through a list of words, creating a hash of each one, and comparing the result to your target hash until you find a match.

While dictionary based cracking methods can be used to crack a lot of common passwords, they're not all that effective when things start to become more complicated.

Often people use one or more words in their password, and those words may or may not be separated by a space or contain numbers or punctuation.

Take, for example, the password: falconpunch

It contains the two words falcon and punch.  Both of these words are found in a standard list of dictionary words, but you wouldn't be able to crack this password using a dictionary attack because they're mashed together to form a single "word."

We also run into the same issue with passwords that are made up of, or include, portmanteaus that are not generally found in dictionary lists.

Take, for instance, the portmanteau made from combining the words char and lizard.  Chances are pretty good that your dictionary list doesn't include the word charizard, thus rendering a dictionary attack not very effective for that password.

Brute-Force Attacks

When dictionary attacks just won't do, you can always try cracking the password using the brute-force method.

A brute-force attack is carried out by cycling through every possible combination combination of a sequence of characters (aaaa, aaab, aaac, etc.) and hashing each one until you find the sequence whose hash matches your target hash.

The benefit of using a brute-force attack is that it has a 100 percent chance of success to crack your password hash.  The downside, though, is that to iterate through every possible combination of a sequence of characters until you find a match for your target hash, it could end up taking a very long time, and it only gets worse the longer and more complicated your target password is.

Take, for example, the password: charizard rules

That's a password that is 15-characters long, and only contains letters and spaces.  Simple, right?

Well, in order to crack this password using a brute-force attack, you would need to iterate through every possible combination of the letters A-Z and the space character (27 characters total) from a length of one to at least 15 characters.

Let's do some math to see just how many combinations (in theory) we would have to try:
Characters   Total Combinations
1  (27¹)     27
2  (27²)     729
3  (27³)     19,683
4  (27⁴)     531,441
5  (27⁵)     14,348,907
6  (27⁶)     387,420,489
7  (27⁷)     10,460,353,203
8  (27⁸)     282,429,536,481
9  (27⁹)     7,625,597,484,987
10 (27¹⁰)    205,891,132,094,649
11 (27¹¹)    5,559,060,566,555,520
12 (27¹²)    150,094,635,296,999,000
13 (27¹³)    4,052,555,153,018,980,000
14 (27¹⁴)    109,418,989,131,512,000,000
15 (27¹⁵)    2,950,000,000,000,000,000,000
That's a total of 3,067,940,118,341,250,379,359 combinations.

At a rate of testing 50 hashes per second, it would take about 1,945,674,859,424 years to try every possible combination. That's almost two trillion years!

Now, I'm not sure about you, but waiting a couple of trillion years to crack someone's password sucks.

So let's try cutting down that brute-forcing time by operating under the assumption that our target password has to be at least seven characters long. That means that we only need to calculate the combinations of the letters A-Z and the space from a length of seven characters to at least 15 characters.

Let's see how our math looks now:
Characters   Total Combinations
7  (27⁷)     10,460,353,203
8  (27⁸)     282,429,536,481
9  (27⁹)     7,625,597,484,987
10 (27¹⁰)    205,891,132,094,649
11 (27¹¹)    5,559,060,566,555,520
12 (27¹²)    150,094,635,296,999,000
13 (27¹³)    4,052,555,153,018,980,000
14 (27¹⁴)    109,418,989,131,512,000,000
15 (27¹⁵)    2,950,000,000,000,000,000,000
That's a total of 3,067,940,118,340,848,058,083 combinations.

At a rate of testing 50 hashes per second, it would only take about 465,101,560,021 years to try every possible combination. Hooray, that's only 465 billion years, which is a lot less time compared to two trillion years!

Conclusion

From the information above, we've learned that dictionary attacks are nice, but aren't much help when trying to crack a password more complicated than something your grandma might come up with (passwords consisting of more than just a single word or that include nonstandard portmanteaus).

We've also learned that using a brute-force attack will (eventually) crack any target password with a 100 percent rate of success, but it'll probably take a few eons to crack longer, more complicated passwords.

So, what's a devilishly good looking super hacker to do? Why, the answer is obvious. You need to use a smarter brute-forcing algorithm!

Before We Continue

You may have noticed that I've conveniently forgotten to mention anything about passwords that include numbers or funky punctuation ($, &, @, etc.).

It's not that I don't acknowledge the existence of passwords like these, it's just that at this point we're only going to focus on passwords that use the letters A-Z, spaces, hyphens, underscores, and common grammatical punctuation (. , ' ? !).

I'll address all those kooky complicated passwords later on.

The Psychology of Password Creation

The primary issue with using a brute-force attack to crack most real-world passwords is that they spend a ton of time comparing hashes generated from phrases like:
aaaaaaaaa
aaacccaad
xcv hjj abu
hhgdfgdrfg
Now these are all spiffy secure passwords, but you'd be hard pressed to find someone who would actually use a password like one of these.  Granted, there are a select group of paranoid types who I'm sure use passwords like these all the time, but, more often than not, people tend to pick passwords that they can actually remember; passwords that actually contain words.

These words may not necessarily always be separated by spaces, be spelled correctly, or even appear in a dictionary, but they are all still at least words.  They are phrases that can be read aloud, and phrases that people say aloud in their minds as they type them in.

Take for instance the words falcon and punch, and all of the different ways they could be arranged to make a password.

A few possible combinations would be: falcon punch, falconpunch, falcon punch!, falconpunch!, falcon, punch!

Looking at these passwords, no matter how they're put together, the resulting password always includes the words falcon and punch, and when you read it out loud, it's falcon punch.

So in order to crack a "regular password" (i.e., passwords that aren't random sequences of nonsense), we need a more heuristic brute-forcing algorithm that doesn't waste its time on unrealistic passwords like aaaa and asdxcv, but instead focuses exclusively on generating passwords made up of words that adhere to the rules of English-like words and phrases.

What is an English-Like Word or Phrase?

An English-like word is a word that may not necessarily be an actual English word, but still adheres to a series of rules that our brains use to determine whether or not a given sequence of characters qualifies as a "word."

Therefore, an English-like phrase is a grouping of two or more English-like words that adhere to the rules that determine whether or not a group of words qualifies as a valid phrase.

Take, for instance, this article you're reading right now.  As you take in each word, your brain is running a series of tests to make sure that the word you're looking at is actually a word and not just a bunch of random letters.  If I were to drop the word "kguifdgj" in the middle of a sentence, your brain would automatically flag that word as not being a valid word because it doesn't follow the "rules" of English-like words.

That is, certain rules that every word follows in order to be considered a valid word by our brains.  Therefore, you would conclude that that particular sentence was not a valid sentence because it wasn't made up entirely of valid words.

So in order to create a brute-forcing algorithm that doesn't waste its time on nonsense words like "sfdre" and "86ugkie65", we need to "teach" it how to perform at least some of those same tests that our brain does for us automatically so that it can determine whether a word or phrase is valid or just gibberish.

By doing this, we create a brute-force algorithm that only generates possible passwords that an everyday person would potentially use.  Which in turn drastically reduces the amount of time spent generating extremely unlikely possible passwords.

The Rules of English-like Words

Apostrophes

A word cannot include more than one apostrophe.

If a word includes an apostrophe, the apostrophe can only be positioned as the last character, or second to last character, in the word.  Moreover, an apostrophe's last bordering letter must be a "s".

Examples:

chuck's is a valid word.

chucks' is a valid word.

chuck'x is not a valid word.

ch'uks is not a valid word.

Hyphens and Underscores

A word cannot include both hyphens and underscores.

If a word includes hyphens or underscores, the word should be split at that punctuation, and each word should be tested independently for whether or not it is a valid English-like word.

Examples:

snape-kills_dumbledore is not valid.

lightsabers_are_awesome should be split at the underscores and the individual words should be tested separately to determine whether they are valid or not.  If any of the individual words are invalid, then the entire word is invalid as well.

Ending Punctuation (! ? , .)

A word cannot include more than one instance of ending punctuation, and any occurrence of such punctuation can only be positioned as the last character of a word.

Other Punctuation (&, @, etc.)

A word cannot include any instances of other punctuation.

Suffixes

If a word ends in a known suffix (ing, ist, scope, ology, etc.), the last character before the suffix cannot be the same as the first letter of the suffix.

Examples:

psychology is a valid word.

psychoology is not a valid word.

Note:  There are a very few words that don't follow this rule, like "zoology."  However, since words like these are the (rare) exception to the rule, it's more effective to just ignore them.

Vowels

Words must include at least one vowel.

Character Repetition Patterns

The same character can never be repeated more than twice in a row.

Examples:

books is a valid word.

boooks is not a valid word.

The same sequence of characters can never be repeated more than twice in a row.

Examples:

mahimahi is a valid word.

mahimahimahi is not a valid word.

Character Position Analysis

One of the great things about computers is that they're very good at performing simple tasks over and over really fast.  Because of this trait, there are certain tests we can have the computer perform to validate words that wouldn't be efficient if you were verifying a word by hand.

One such test is character position analysis.

A character position analysis is a test performed by iterating through each character in a word, and analyzing that character's relationship with its neighboring characters in order to determine whether or not certain characters "fit" next to each other.

To perform a character position analysis, you first need to build a database that documents how often characters appears directly next to, or one character apart from, each other.

This database is broken up into three separate tables that keep track of occurrence patterns for:

The first three characters of a word (starters table).

The last three characters of a word (enders table).

And the characters in a word as a whole (neighbors table).

Below is an example table documenting the overall character occurrence patterns for the word: AWESOME

Each cell holds two numbers.  The first number represents the number of times a character appears directly next to another character, and the second number represents the number of times a character appears one character apart from another character:
   A    E    M    O    S    W
A  0,0  0,1  0,0  0,0  0,0  1,0
E  0,1  0,0  1,0  0,1  1,0  1,0
M  0,0  1,0  0,0  1,0  0,1  0,0
O  0,0  0,1  0,0  0,0  1,0  0,0
S  0,0  1,0  0,1  1,0  0,0  0,1
W  1,0  1,0  0,0  0,0  0,1  0,0
From the data in the table above, we can conclude that the letters "A", "E", "M", "O", and "S" never appear directly after the letter "A".

We can then use this data to verify whether or not other words are valid.

For example, the word "AMBER" would be considered not valid, because the letter "M" appears directly after the letter "A" which our occurrence patterns table tells us isn't possible.

Well, obviously "AMBER" is a valid word (anyone who's seen Jurassic Park knows that), but the data we have in the table above says otherwise.

So in order to perform an accurate character position analysis, a very large list of words must be analyzed in order to build a useful set of character position occurrence tables.

Such a list of words can be found here at github.com/brannondorsey/naive-hashcat/releases/download/data/rockyou.txt.

Once you have a character position occurrence database, then you can perform a character position analysis.  A character position analysis is broken up into three separate tests, and a word is only valid if it passes all three tests.

Starting Characters Position Analysis

This test is performed by taking the first three characters of a word and checking in the starters table whether the occurrence count (a.k.a., neighbor score) for the first and second characters, or first and third characters, is equal to zero.  If either neighbor score is equal to zero, then the word is not valid.

Ending Characters Position Analysis

This test is performed by taking the last three characters of a word and checking in the enders table whether the occurrence count (a.k.a., neighbor score) for the third to last and second to last characters, or third to last and last characters, is equal to zero.  If either neighbor score is equal to zero, then the word is not valid.

General Character Position Analysis

This test is performed by iterating through each character in a word and checking in the neighbor's table whether the occurrence count (a.k.a., neighbor score) for each character being tested and the character next to it, and the character one character apart from the character being tested is equal to zero.  If any of the neighbor scores is equal to zero, then the word is not valid.

Getting More Accurate Results

One way to get more accurate results when performing a character position analysis is to raise the minimum required neighbor score from zero to a higher threshold.

Rules of English-like Phrases

Spaces

A valid English-like phrase cannot include any occurrence of three or more space characters in a row.  On instances of two spaces in a row, the phrase should be split at the doublespace and each sub-phrase should be tested separately.  If any of the sub-phrases are not valid, then the entire phrase is not valid.

Examples:

row row fight the power is a valid phrase.

it's dangerous   to go alone, take this is not a valid phrase (because there are three spaces in a row)

Word Repetition

The same word can never be repeated more than three times in a valid English-like phrase.

Examples:

row row fight the power is a valid phrase.

row row row row your boat is not a valid phrase.

Phrase Ending Punctuation (! ? .)

Phrase ending punctuation may only appear at the end of a phrase.

Examples:

zelda is so over powered in brawl! is a valid phrase.

zelda is so! over powered in brawl! is not a valid phrase.

Commas

Commas may only appear at the end of words, and never at the end of a phrase.

Examples:

charizard is cool, but so is blastoise is a valid phrase.

cooking is so fun, is not a valid phrase.

Words

In order for an English-like phrase to be considered valid, each word in the phrase must be a valid English-like word.

So what about those passwords that include numbers or goofy punctuation?

While heuristic brute-forcing algorithms are great for generating English-like words and phrases, things begin to be a lot more complicated if your target password includes numbers or funky punctuation ($, &, @, etc.)

Going back to the topic of the psychology of password creation, remember that in general people pick passwords that are actually made up of words.  Keeping this in mind, we can reasonably conclude that passwords that include numbers and/or funky punctuation still follow this rule, but the word(s) in the password are obfuscated by these nonstandard characters.

Another point to consider for passwords that include numbers only is that often they're just appended to the end of a password.

Take, for instance, the password: jalapeno

It's a valid English-like word and, using it as a base, you can obfuscate it with numbers and funky punctuation.

Examples:
jalapen0
jalap3no
j414p3n0
jalapeno1
jalapeno123
j@l@peno
All of the passwords above would not be considered valid English-like words, but under all of the numbers and punctuation, they actually are.

So in order to crack passwords that include numbers and/or funky punctuation using a heuristic brute-force algorithm, you need to develop a method that can mutate strings generated by the algorithm (making alterations like in the examples above) and then test all the password variants as well as the original password string against your target hash.

Are there any heuristic brute-forcing programs available?

Why, yes there are! I maintain a small proof-of-concept Ruby application that implements heuristic brute-forcing.

See github.com/jamespenguin/gentle-brute for more details.

Conclusion

Heuristic brute-forcing provides hackers with the ability to crack long and complicated passwords using brute-force style password cracking, while not wasting eons trying unrealistic passwords.

To illustrate my point, let's pit heuristic brute-forcing against standard brute-forcing to crack a five character password consisting of the letters A-Z.

Using heuristic brute-forcing (via the Ruby program above): 517,839 potential phrases

Using standard brute-forcing: 11,881,376 potential phrases

That's 96 percent fewer phrases to try using heuristic brute-forcing compared to standard brute-forcing!

Return to $2600 Index