Password Cracking in the Modern Age

by Yuval tisf Nativ and Tom Zahov

A long time ago we understood that storing passwords leads to many issues with security.

The idea behind the password or passphrase is to provide a layer of security for the user.  Basically, the computer or system can work perfectly without needing a password.  You can get to the login screen, see a list of available users, and just click on one of them and the machine will load the settings required for the specific user.

The issue is that sometimes, part of those settings are a bit more sensitive to the user.  Or maybe you, as an admin, want to know which user is which and not have them logging on as another user.  We can say today that most systems are interested in the segregation between users - sometimes due to sensitive content, sometime due to different privileges and features on the system and sometime it's just to load the content for that user.

For example, you can probably post your SoundCloud credentials online.  Most SoundCloud users are not content creators, but rather consumers and your playlist is most likely available to the public and you don't care about it.  The only reason you had those credentials in the first place was to get your playlists when you load the application.

Password Protection on Different Systems

As always in the real world, the solution is not a single solution for everyone and everything.

Take, for example, two systems which are completely different by nature.  The first will be your home desktop/laptop (we'll call it your PC) and a shopping site like eBay.  In your home PC, you will store a few user credentials, most likely no more than ten users.  The protection you expect your system to provide is to make it difficult on a user to perform actions or access data from another account.

eBay, however, is different by nature.  Let's first make something clear: eBay is not a shopping site.  eBay is a security company.  eBay is a system whose sole purpose is to allow sellers and customers to exchange goods with security - commercial security and information security.

On eBay, your needs concerning passwords are completely different than what you would expect on your home system.  You expect the system to use your credentials to identify you, keep others out of your account, and never to disclose your password.  They seem like the same thing at first, but your home laptop is something which offers services only to you and maybe a few other members of your family.  eBay, however, is offering services to millions of users.

In the Beginning

Let's take your home computer as an example to start understanding issues and how we commonly practice password storing today.

Let's imagine you are the administrator on your own network at work.  You have John, who you have just appointed as the new help desk manager.  John is a great guy and, in order to help him do his job, you give him administrative access to your main domain controller.  Remember, John needs those privileges not because he's a great guy, but because his job will require having significant changes to you network as part of his daily routine.  If John has such a high access, what keeps John from reading the file containing all of the users' passwords and logging in as one of them?  One of the main problems is that in almost every system you want a principal called non-repudiation.  Non-repudiation is a state where if you, the administrator, can see an action in the log of the system made by User A, User A cannot deny having taken that action.

One of the technologies used to solve this problem is hashing.

A hash is a mathematical function which takes a random length of bits and maps them out into a constant length of bits.  To better understand this, let's quickly go over the XOR function and a bit of your high school math classes.  The XOR function is a logical operand which takes two bits and outputs the difference.

For example: 1 and 0 going into XOR will give 1 since there is a difference.  0 and 0 going into XOR will output 0 since there is no difference.  The important thing to notice about XOR is that if you have two parts of the equation, you are able to easily map the missing part.

If, however, you only have the output, it is mathematically impossible to know the inputs.  If I say that the output of XOR is 0, the inputs could have been 0 and 0 or 1 and 1 and you have no way of knowing which, unless you have more information (statistical sample or other types of data).

Now your high school math classes will be handy since they will help you theoretically grasp the way hash functions work and therefore understand the features later on.  Remember that test you had and there was this question where you got that weird outcome of "-3.452x" and you knew it was wrong but you had no idea why?

Later on, when the teacher returned your exam, you noticed that you flipped a "-" or just mixed up in copying a number and instead of 2 you wrote 7?  That's another feature of hash functions.  They work in a mode we call block ciphers.  When you give the hash function an input to compute, it has a routine it has to follow, but this routine is not one.  It's comprised of blocks where the output from the previous block is then fed into the next block as input.  This will cause any minute change to the input to "drag" the "error" (more correctly - change) all across the computation progress and provide a significantly changed output.

So these are the features of hash functions we spoke of up until now:

They take any size of input and output a known (constant) size output.

Each change to the original input will result in a significant change to the output.

They are one-way functions.  You can easily compute a nonce to the hash sum of it, but it is infeasible to compute a nonce given the sum.

How Does This Work Then?!

Well, fine you should ask.

When you first enter your password for your user account, the operating system takes your password and hashes it.  Depending on your OS, it can be with LM, NTLMv1, NTLMv2, MD5, or other types.  After the password is hashed, the sum of it (e.g. the output of the hash function) is then stored into a file.

Next time you want to login, the machine gets your password, but it does not know the previous password (it knows the sum).  The machine uses your input as the nonce for the same hash function and then checks if the sum is identical to the hash stored in the file.

This allows the machine to store the passwords in a file on disk and, if an attacker gets a hold of these sums, the attacker cannot use them to know the original password for those users.  There are a few cryptographic attacks which can allow an attacker to leverage those sums and be able to then login to that system.

The first is if a weak hashing algorithm was used and is susceptible to collision attacks.  A collision attack is when given a sum of a hashing algorithm, you can compute a nonce that will result in the same sum.  Let's go over this again: we said that a hashing algorithm will compute a fixed-length sum for any given input.  That was not accurate.  Each hashing algorithm hashes its own nonce size limitations.

Let's take MD5 for example.  MD5 will give us a 128-bit sum every time, typically represented as a sequence of 32 hexadecimal digits.  Now the input is practically limited to the amount of computer memory you have while mathematically being infinite; therefore we have infinite set of inputs which will result in the same output.

There are two questions left:

How common are these collisions?

Is there a way to compute them or is there just random guessing?

For this example, we'll use the work of Peter Selinger of the Department of Mathematics and Statistics from Dalhousie University.  We'll add a story behind the data:

John has a remote connection to his security camera at home.  The security camera stores the password as an MD5 hash.  John is using a secure connection so that a "man-in-the-middle" will not be able to understand the data.  John chose an extremely long password:
d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
Darth is a hacker who was able to clone the hard drive of the camera while visiting John. Now Darth tries to read the image of the drive and finds that the password is hashed.

Darth finds the following hash:
79054025255fb1a26e4bc422aef54eb4
Now Darth wants to find the password. He knows that John used a very long password and now turns to a collision attack. During this, Darth find this value:
d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70
Which is different than the original but has the same sum under MD5. (Editor's Note: No they don't?)

Darth can now login to John's camera, since the camera does not know the original password, but only the MD5 sum of the password and in this case:
MD5(John's_password) == MD5(Darth_collision)
Practices

Hashes today are used in many places in many forms; they are used in local machines to store passwords, they are used in websites to protect sensitive information in case an attacker can ex-filtrate the data from the database, they are used in verification of certificates and in file integrity checks. Now let's look at a more practical view of hash cracking.

There are several attitudes towards cracking hashes:

Open-source cracking

Mathematical attacks

Brute-forcing

We won't go over each of them in great detail. The first is quite simple: many sites today offer the service of cracking hashes (we won't go into how they work) and you can just Google a sum.

For example; you can Google this hash and see what the plain text of it is yourself:
e10adc3949ba59abbe56e057f20f883e
Mathematical attacks depend on the hashing cipher used and, in any case, they are usually not valid when talking about modern ciphers.  Sure, MD5 has a known collision generation algorithm (referred to above), but they will not lead us to a plain text from the hash and there are no known attacks for SHA-256 or SHA-512 for now, so we'll just skip them.

Brute-Forcing

Assuming the hashing algorithm is a strong hashing algorithm, we cannot reverse it nor can we find a collision easily.  We would prefer getting the plain text anyway.  Our way of doing that would be by taking plain text values, computing the hash sum for them, and then comparing it with the original sum.  It might sound like hunting with a club, but only because it is.  There are ways to make this search smarter and smarter since computing hash sums consumes a lot of resources from most processors.

A word list is just an ASCII file containing words that we think might be used as the password.  Sometimes we can even "improve" the file by pre-computing the sum and saving it right next to the word so we can just search the file for a given hash.  This file will be called a rainbow table.  They are very big files and searching through them is not easy, but most of the time it's easier than computing the hash all over again and it's more cost-effective when testing several hashes and not just one.

This might not sound like a big improvement, but imagine you just hacked a database and stole 20,000 credentials which are MD5 hashed.  Most of the time, you are not interested in just one password but rather as many as possible.  Instead of trying to crack each and every one of the hashes in this list, you can use the list of 10,000 most used passwords to try and crack them.  A lot of them will probably fit and, again, you are rarely interested in recovering all of the hashes.  You can get the top 10,000 most commonly used passwords and even the statistics.

On Kali Linux, type these commands:
$ wget -O crypted-storage.lst http://pastebin.com/download.php?i=YULUgrnd
$ wget -O 10k-most-common.txt http://xato.net/files/10k%20most%20common.zip
Now we'll use John the Ripper to crack those hashes:
$ john --wordlist="10k-most-common.txt" crypted-storage.lst
You might say, "What are the probabilities that so many people will have the same password if it's not on the top 10k list?"  Well, this attack is based on the birthday problem in probability theory.  This theory tests the chances for a set of randomly chosen people of a pair of them having the same birthday.  Unlike common belief, the probability that two people out of a set of 23 having their birthday on the same day is close to 50 percent.  With a birthday attack, a hacker randomly generates output of a given cryptographic function until two inputs map to the same password.

Hashcat

Those of you who are familiar with this topic are probably a bit mad right now since I have titled this section after the name of a tool for cracking hashes.  I would like to say that by my standard, Hashcat is not just a tool but it is the tool for hash cracking.  The main reason this tool is unique for me is the way this tool is configured and works for GPUs (yes, there are other tools working with GPUs and I'm going to talk about Hashcat only).

I would like to refer back to an algorithm called LAN Manager (LM).

LM was an algorithm used to store passwords on the older versions of Microsoft's Windows.  In the newer products by Microsoft, we generally do not see LM used anymore.  The reason is because this algorithm was fine at the time that it was designed, but these days with our i5 and i7 processors, this algorithm is prone to attack and a 64-bit output is suddenly a very small range and we can easily find values.

The biggest limitation we have on hash cracking is our processors.  Storing and sorting through large rainbow tables is possible, but requires very large disks and very fast and large memory, so we mostly compute the hashes on the fly.  Though our i7s are strong, they are still not fast enough to allow us feasible cracking of strong passwords on MD5 or SHA-256.  Today, when we're talking about hash cracking, most of us are talking about hash cracking using graphical processing units rather than central processing units.  There are many differences between the two to make a GPU more suitable for hash cracking, but the main reason is the amount of cores.  Let's take the brand new Intel Core i7 fourth-generation 4550U processor and the AMD Radeon HD 7950 GPU.  The i7 has two cores with four threads at a speed of 3.00 GHz.  The Radeon 7950 has an engine clock of 875 MHz but 1792 stream processors!

GPUs are particularly good for hash cracking since they are really good in parallelism, especially if you are referring to identical operations, which is what hashing is.  Remember that block feature we talked about in the beginning?  Here you see it coming to life.

A Comparison Between GPU and CPU

Let's take a simple graphics card.

For this example, again, we'll use the AMD Radeon 7950.  There are a total of 1792 stream processors on the 7950.  Without optimizing the computation to the GPU architecture, you can still get a reasonable 160 x 10⁶ SHA-1 computations per second on this GPU.  Now let's compare this to a CPU:

We'll assume the new Intel Core i7 fourth-generation is here to make things easier.  So when referring to the technical spreadsheet, we notice the two cores and four threads.  To simplify things, we'll take it as a real eight-core processor.  A single SHA-1 computation will consume about 500 clock cycles.  If we are to create an optimized hashing function to use the 128-bit registers to try and require less and less computation, we might reach even a point where we can use 300 clock cycles to compute a single SHA-1.  Assuming we can run in parallel (this is not such a reasonable assumption since there is a limited number of registers we can use and we assume no other application will require any CPU time), we can get to eight computations with each using 300 clock cycles, which will leave us with 2,400 clock cycles resulting in 2,450 SHA-1 hashes per second.

Using Hashcat for Your Hash Cracking

Let's start with downloading and compiling Hashcat.  Yes, there is a version on Kali, but Hashcat is frequently updated and improved and you want the newest version of it.
# AMD Cards:
$ wget http://hashcat.net/files/oclHashcat-1.21.7z
# NVidia Cards:
$ wget http://hashcat.net/files/cudaHashcat-1.21.7z
$ 7z x *Hashcat-1.21.7z
$ cd *Hashcat-1.21
And now you're ready to start cracking hashes. Try, for example, the following MD5 hashes:
dbbac35898019d27b13c94fdb5cc82cd
816691aec055aae10d337decbe58d8a2
5538b84b6f7035fcdce5dca9dcc71fa4
0034c5e418ae4f2eba590a16696edbb3
f7bb639bbba868018149ce68a441324d [4everlost]
3cd652bda3ca64817ed90a866ff6eeee
abc3f57afa87e3bfd1be8adcf6fcd417
510de649803ef411d3b7571dfcea5734
c3ab409980f029231ef5231ed34f2a70 [ciribao]
bff5bf8132ce4e3bbdffa90110585b90 [agoslife]
e9a90c5a3940f0210719f876b7dc0557
77738a9ac7dba7fbe37dfe2828749b76
24451efb8375de1725c7c04c8ff8fdfd [2404ilona]
f4a9cb127ccecb6ea8663a40a0bcbe66
138fd9fd7142c26847f1ad3b5020047d
faa91447811f4e251dc8acf5955fe4c7
670678ca9f5888a78224b10207d7acaf
95bc7aa683fa36f91c478c8f07255830 [skelton1987]
3a7641135fd35d1281ac3d3297cc0a00
17e1a9143e600adccde43fd60f114d73
Test using this Hashcat command line:
$ hashcat -m0 -a7 hashes.txt ?a?a?a?a Top2Billion-probable-v2.txt
Use Hashcat's default dictionaries or better yet, grab a pre-compiled wordlists.

Static Salting

Another solution we have created to handle these hashes is called salting.  In the process, a system concatenates the original value before passing it to the hash function.

For this example, if we have a database of hashed credentials with MD5, we can find many rainbow tables and easily compute many of the passwords relatively easily.  A system can protect the users further on by salting the values of the passwords.

In this example, John entered the password 123456 which will be found easily by any rainbow table.  Our system will concatenate each password with the following format:
MD5("123" && user.password && "abc4rfdgf4")
This will result in each password being harder to crack.  The attacker needs to get a hold of the salting format before cracking the passwords and, even then, any existing rainbow tables will probably not fit the salting format and an attacker will have to calculate the hashsums again.

Dynamic Salting

Dynamic salting is the more recent evolution.

While computing power improved, we started seeing programs like Combina, which are very efficient and allow users to easily create rainbow tables whenever the need to arises.  The next evolution in the field is dynamic salting, which means that each value is salted with its own unique string.

When you salt each value with its own data, it means that a computed hash by the attacker cannot be used twice since the salting data for the others have changed and he needs to compute his wordlist for each and every value.  This is powerful, especially if you keep the salting information separate from where the passwords are stored, meaning that an attacker will have to gain access to both of the sections prior to being able to attack them.

Summary

Remember that the world changes.

Hash functions decay over time, not because they were designed wrong, but rather because the world changes and people have more computing power at home, plus new devices are invented for the sole purpose of cracking hashes (e.g. Bitcoin and Butterfly Labs).

Return to $2600 Index