Further Password Discourse - Fundamentals

by Modus Mundi

"If every copy is a meta-original, then who will benefit from being truly original?"

    - Scallops Hotel, Bookoo Bread Co.

There have been a lot of discussions around passwords in the past few issues.

While they have been well-intentioned, there have been flaws, and I felt the need to help further the readership's understanding.  Let's talk about why passwords even matter and get an idea of what passwords look like in the wild.

As a note, I'm going to drop some terms that are deeply couched in identity and access management.  A great resource for what I mean and in what context can be found in the IDPro Body of Knowledge¹.

Basics

At its most fundamental, a password is "something you know."

In a digital context, a password plus some uniquely identifying factor (such as a username, email, phone number, etc.) is utilized to determine whether or not you have access to a given logical entity.

This entity may be as simple as the user account you post flamebait on over Discord, or it may be an administrative user on a computer in your home.  This process is generally called authentication, and a password is called a "factor of authentication."

As a note, we need to understand that authentication to a system does not mean we are permitted to do anything on the system!

The at-runtime determination to allow an operation to happen is known as authorization.  Systems have moved from a "one and done" system of authorization to, in many cases, continuous authorization; every action is checked against permissions and if permissions are revoked at any time, the action fails.
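To make the distinction concrete, here is a minimal sketch in Python.  Everything in it (the PERMISSIONS table and the perform and revoke functions) is a hypothetical stand-in rather than any real product's API; the point is only that the permission check runs on every action, so revoking a permission takes effect on the very next call.

```python
# Hypothetical permission table: user -> set of allowed actions.
PERMISSIONS = {"alice": {"read", "post"}}


def is_authorized(user: str, action: str) -> bool:
    """Check the permission table at the moment the action is attempted."""
    return action in PERMISSIONS.get(user, set())


def perform(user: str, action: str) -> str:
    # Continuous authorization: the check runs on every call, not once at login.
    if not is_authorized(user, action):
        raise PermissionError(f"{user} may not {action}")
    return f"{user} did {action}"


def revoke(user: str, action: str) -> None:
    """Remove a permission; later calls to perform() for it will fail."""
    PERMISSIONS.get(user, set()).discard(action)
```

Being authenticated as alice says nothing about what alice may do; that question is answered separately, each time she tries to do it.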

There are multiple factors of authentication: NIST defines the others as "something you have" (e.g. a YubiKey), and "something you are" (e.g. biometrics).²

Depending on what vendor you ask though, there are additional factors such as "something you do" (e.g. behavior analysis) and "where you are" (e.g. location analysis).

These are all important points for the overall concept of authentication, and indeed massive swaths of industry want to eliminate passwords.  Before we go further into passwords, let's take a second to talk about the brave new world ahead of us, one without passwords.

What if Passwords, but Not?

It's no secret that the FIDO Alliance ("Fast IDentity Online") - whose major members include Apple, Google, Microsoft, and others - is pushing a concept called "passkeys."

I won't delve into it too much here (and I do recommend you all read up on FIDO2), but the general idea is to replace passwords with a secure credential (basically a public/private key pair, where the site only ever holds the public key) that is unlocked and utilized via a second factor - typically some biometric capability or a PIN.

This sounds great on the surface - we don't have to remember passwords anymore - and, depending on your risk profile, this could be great.

But we need to take a step back and consider what exactly this "unlocking" process for a passkey looks like.

We simply trade a password for each site for a single password or biometric identifier, localized to the device, that then authenticates.  This means that if a user has a fingerprint or facial unlock, police can require the unlock.

With a PIN, many jurisdictions have no direct key disclosure laws or have laws that protect against self-incrimination.

That said, unless steps have been taken to make the PIN more akin to a traditional password, the PIN can be brute-forced (Hello Cellebrite!  Hello Grayshift!), and even then there are ways around/through.
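For a rough sense of scale, the arithmetic can be sketched as below.  It assumes every symbol is chosen uniformly at random, and it ignores device-side rate limiting and wipe-after-too-many-failures features, so treat the numbers as a best case for the defender rather than a measurement.

```python
import math


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible secrets of exactly `length` symbols."""
    return alphabet_size ** length


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy if every symbol is chosen uniformly at random."""
    return length * math.log2(alphabet_size)


four_digit_pin = keyspace(10, 4)   # 10,000 candidates
six_digit_pin = keyspace(10, 6)    # 1,000,000 candidates
# 95 printable ASCII characters, 12 of them chosen at random.
long_password = keyspace(95, 12)
```

A four-digit PIN works out to about 13 bits, while twelve random printable-ASCII characters come to roughly 79 bits.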

What this means is that not only is your phone compromised, but authentication to the various services that were set up to utilize passkeys is now compromised as well.  Not great.

Passkeys are, eventually, supposed to be "sharable," but are still very kludgy as of this writing.  I do not feel great about the potential for interoperability between operating systems for this technology without substantial efforts from the open source community, and even then there are no guarantees the FIDO Alliance won't simply change the spec a few years from now to wall it in more.

Let's Talk About 39:2 for a Second

In 39:2, William Ben Bellamy Jr. noted that "When you type in your password, which is simply a string of characters, your system immediately calculates the hash value of that string."  This is an overly simple characterization of how passwords are handled.

Generally speaking, the modern password is not a string of characters that is then blindly converted into another string by way of a cryptographic hash function.

If this were the case, and we had direct access to a given system, pre-computational attacks such as rainbow tables would render password cracking trivial for large swaths of passwords.  They would also continue to render whole character spaces of hashing algorithms "dead," since cracking becomes a lookup against these sorts of tables.  For instance, CrackStation³ and other websites offer massive collections of pre-computed hashes, and from there an offline attack becomes a simple lookup.
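To see why an unsalted hash is so fragile, here is a sketch of the lookup idea.  It is a plain precomputed dictionary, not a true rainbow table (those use hash chains to trade lookup time for storage), and the four-word list is a made-up stand-in for the enormous lists real tables are built from.  SHA-256 is used here only because it is always available in Python's hashlib; the same approach applies to MD5.

```python
import hashlib
from typing import Optional

# Hypothetical wordlist; real tables cover vastly more candidates.
WORDLIST = ["password", "letmein", "hunter2", "qwerty"]


def sha256_hex(text: str) -> str:
    """Unsalted SHA-256 of the UTF-8 encoded string, as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Precompute once, then reuse against every leaked unsalted hash.
LOOKUP = {sha256_hex(word): word for word in WORDLIST}


def crack(leaked_hash: str) -> Optional[str]:
    """Return the original password if its hash was precomputed."""
    return LOOKUP.get(leaked_hash)
```

The expensive work happens once, up front; after that, every user who picked a password on the list falls to a single dictionary lookup.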

Even an arbitrarily complex password (such as Bellamy Jr.'s "nnood Ha(k [poi C0ffee" example) is made modest in MD5.

Bellamy Jr. states "So a hash is like an absolutely precise fingerprint of the original material," but has disregarded that when we consider collision attacks in hashes,⁴ we don't necessarily need to know the password.

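We just need to know a string that produces the same hash.  (Strictly speaking, finding such a string for one specific hash is a preimage attack, which is harder than producing an arbitrary colliding pair, but the point stands: the hash, not the password, is what gets compared.)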

There are mitigations to these issues: the first is adding cryptographically-aligned inputs to the string (we'll talk about these when you are older); the second is using robust hashing methods (or using a key derivation function instead; I'll simply refer to this as hashing for now, but there is a difference and you should read up on it) that make collisions infeasible; and the third is forcing authentication to be as "online" as possible (so that you can obtain additional context about the authentication event, lock out accounts that are potentially being attacked, and so on).
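As a taste of the key derivation function route, here is a minimal sketch using PBKDF2 from Python's standard library.  The choice of PBKDF2, the 16-byte random input, and the iteration count are all assumptions made for illustration, not a recommendation; the idea is simply that the function is deliberately slow, which makes every guess in an offline attack expensive.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Stretch a password with PBKDF2-HMAC-SHA256 from the standard library."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


# Random per-password input (one of the cryptographically-aligned inputs).
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
```

A plain hash function is built to be fast; a KDF lets you dial in how much work each guess costs.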

Bellamy Jr.'s article also makes an assumption that hashing is performed at the system where the password is collected.  This is a flawed assumption.

Comparison of a given string to a hash does not happen on the front end (except in the case of localized authentication, where the password does not leave the system we are authenticating to).

Why?  Consider the ramifications of this.  The front end would need to have knowledge of the following things:

The hashing method.

Any parameters of the hashing method.

Any cryptographically-aligned inputs to the string (you're almost old enough, hold on).

Then, after performing the hashing operation at the front-end, it would have to transmit the hash over the network to the back-end.  The back-end would have to understand it was being passed a hash for comparison, and then would have to determine equality from this.

There are substantial issues with this approach: the ability to generate a DDoS attack arbitrarily by calling the now-exposed hashing methods (which, given we're dealing with authentication, have to be available publicly) and the fact that an intercepted hash is no different than an intercepted password in a "raw" format, among other things.

And while the astute may argue back with Shannon's maxim (The enemy knows the system) on the second point, the fact that our hash now is no different than the string input is damning - should a "man-in-the-middle" attack be engaged or plaintext protocols be used, we have gained nothing but wasted compute at the front end and a false sense of security.
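The "hash is now the password" problem can be sketched in a few lines.  The STORED table and flawed_login function below are hypothetical, written only to demonstrate the flaw: a back-end that accepts a hash computed by the front-end will accept that same hash from anyone who captured it.

```python
import hashlib
import hmac

# What the back-end stores in this (flawed) design: the hash itself.
STORED = {"alice": hashlib.sha256(b"hunter2").hexdigest()}


def flawed_login(username: str, client_supplied_hash: str) -> bool:
    """Back-end that trusts a hash computed by the front-end."""
    expected = STORED.get(username, "")
    return hmac.compare_digest(expected, client_supplied_hash)


# An eavesdropper never sees "hunter2", only the hash on the wire...
intercepted = hashlib.sha256(b"hunter2").hexdigest()
# ...but replaying that hash logs them in just the same.
attacker_logged_in = flawed_login("alice", intercepted)
```

The attacker never learned the password and never needed to.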

Authentication is worthless without secure encrypted channels with which to exchange data.  A great example of this is SSH⁵ - it transmits the password to the host after negotiation of a secure tunnel.

Okay, But How Does This All Work, Really?

Something that is overlooked by people when discussing passwords is that encoding and character support play a massive role in not only the allowed input of the string but the results from the string when hashing.

Consider the following string: Frühjahrsmüdigkeit

In U.S. ASCII, this string might look more like Fr?hjahrsm?digkeit to the system, as it doesn't know how to interpret characters outside of its boundaries.

(Fun note - password systems that rely on ASCII that allow input of non-ASCII characters do really weird things.  Explore and see how you can break stuff if you ever get a chance.)

Extended ASCII was created to help with this (we get our umlauts in extended ASCII), but it isn't great.  In UTF-8, we would get the string as represented previously, not to mention the approximately 1.11 million code points in Unicode that could theoretically map to a character.

The point here is that choice of encoding matters, and we must be "speaking" the same encoding mechanism lest things not work the way you expect.  Because of the tendency for ASCII to absolutely demolish non-U.S. characters when you attempt to encode them, UTF-8 is the de facto standard for string encoding for authentication purposes.

This isn't always the case (I'm looking at you, databases), but it's good to understand the general case and then go looking for exceptions.
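The effect of encoding is easy to see for yourself.  This sketch encodes the same string three ways and hashes two of the results; the use of plain SHA-256 and of Latin-1 as the "extended ASCII" stand-in are my assumptions for the demonstration.

```python
import hashlib

# "Frühjahrsmüdigkeit", with each ü written as an escape so the source
# file's own encoding cannot interfere with the demonstration.
password = "Fr\u00fchjahrsm\u00fcdigkeit"

utf8_bytes = password.encode("utf-8")       # each ü becomes two bytes
latin1_bytes = password.encode("latin-1")   # each ü becomes one byte
# Strict ASCII cannot represent ü; with errors="replace" it becomes "?".
ascii_bytes = password.encode("ascii", errors="replace")

# Same characters, different bytes, therefore different hashes.
utf8_hash = hashlib.sha256(utf8_bytes).hexdigest()
latin1_hash = hashlib.sha256(latin1_bytes).hexdigest()
```

If the system that sets the password and the system that checks it disagree on encoding, the user types the right password and still gets locked out.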

Once we understand what we're encoding in and we understand where the data is going, we need to understand how a hash is really calculated.

As pointed out earlier, simply keeping around a hash of the string would lead to pre-computational attacks, and to mitigate the feasibility of pre-computational attacks, we would use specific cryptographically-aligned inputs to introduce additional user-independent entropy (as I alluded to earlier - congratulations, you are now older).

Many systems implement something called a "salt," which is commonly several bytes of random data added as an additional input to the string prior to hashing.

Let's use our prior example of:

Frühjahrsmüdigkeit

This could be any string, and I implore you to follow along at home using CyberChef⁶.

Anyway, back to our string.  If I take the string and use SHA-256 with 64 rounds (the algorithm's standard round count), I get the following hash:

4b6ee7182221d17332a25302a5225ffd86801547ed8bf0460a8be0597bcb920d

In a system that stores passwords, this hash is commonly prefixed with the hashing mechanism so that a given system knows how to treat the hash (as different users in the system may use different hashes, etc.).

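The above hash in many LDAP systems may be represented with a scheme tag in front (shown here in hex for readability; directories typically store the value Base64-encoded, and since we have not added a salt yet, the tag is {SHA256} rather than {SSHA256}):

{SHA256}4b6ee7182221d17332a25302a5225ffd86801547ed8bf0460a8be0597bcb920d

Generally speaking, the salt is generated through a Pseudo-Random Number Generator (PRNG) function, ideally a cryptographically secure one, and either prepended or appended (typically appended as it makes it way harder to perform a length extension attack) to the string.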

In many systems where hashing is performed, the salt is kept separate from the password itself so that it can be combined with the password prior to hashing.

Once the salt is appended, the hash function is run. Let's assume from a given PRNG function, I generate the salt of: k!X2x

(This is a "short salt" - you would not see something this small in the wild usually)

Unhashed, the string now looks like this:

Frühjahrsmüdigkeitk!X2x

When we perform the hashing operation and store the result in our LDAP system, it looks like this:

{SSHA256}9531d78266ecf43977fdbf311c3185a63ac266b5a9a9a31fa1e605535625f963

Generally, if another user uses the same password, the randomness of the salt will change the string input to the hash function, and the hash function will output an entirely different hash.

I won't go too in depth here on how salts are generated, but it's a lot of good reading ahead of you if you get into it.
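Here is a sketch of the salting step in Python.  It follows the append-then-hash order used above, and it assumes the common directory convention of storing Base64(digest + salt) after the scheme tag; the examples in this article show hex digests for readability instead.  The function names are mine, and the 16-byte salt length is an illustrative choice.

```python
import base64
import hashlib
import os


def salted_hash(password: str, salt: bytes) -> bytes:
    """SHA-256 over the UTF-8 password with the salt appended."""
    return hashlib.sha256(password.encode("utf-8") + salt).digest()


def ldap_style(password: str, salt: bytes) -> str:
    """Scheme tag followed by Base64 of the digest plus the salt."""
    blob = salted_hash(password, salt) + salt
    return "{SSHA256}" + base64.b64encode(blob).decode("ascii")


# Two users, same password, independent random salts.
salt_a, salt_b = os.urandom(16), os.urandom(16)
record_a = ldap_style("Fr\u00fchjahrsm\u00fcdigkeit", salt_a)
record_b = ldap_style("Fr\u00fchjahrsm\u00fcdigkeit", salt_b)
```

The two records differ even though the passwords are identical, which is exactly what defeats a precomputed table.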

Another cryptographic input (albeit not used as often) is commonly referred to as a "pepper" (or if you prefer NIST terminology, a "secret salt").

A pepper is like a salt in that it is a randomized value, but differs in implementation as it could be static for all users (not best practice) or could be a random but known value for each user.  A key difference between pepper and salt is where the value is generated and kept - commonly peppers are created by an HSM outside of the system where authentication occurs.

The technical details get a little wild, and I'm trying to keep this easy, so I leave it as an exercise for the reader to look into implementation details.

For sake of ease, let's assume in the above case we have a pepper, and the applied pepper is j.7DtT for all users (again, short pepper, not really a thing in the wild).

This means our string, pre-hash, now looks like this:

Frühjahrsmüdigkeitk!X2xj.7DtT

And the resultant hash, in our little LDAP server, looks like this:

{SSHA256}da3aac7ab5af642231c32191f54ffbe3ad02b8e9af6f45ad0c1a90fc39619cdd

This effectively makes rainbow tables too expensive to operate, and we are forced into attacks that require either direct access to the hashes or directly authenticating against the service.
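The pepper step can be sketched the same way.  The environment variable name APP_PEPPER is hypothetical, and falling back to the toy pepper from this article is purely so the example runs anywhere; the whole point of a pepper is that it lives somewhere other than the credential store.

```python
import hashlib
import os

# Hypothetical: in practice the pepper would come from an HSM or a secrets
# manager, never from the same database that holds the hashes.
PEPPER = os.environ.get("APP_PEPPER", "j.7DtT").encode("utf-8")


def hash_with_salt_and_pepper(password: str, salt: bytes, pepper: bytes = PEPPER) -> str:
    """SHA-256 over password + salt + pepper, in the order used above."""
    data = password.encode("utf-8") + salt + pepper
    return hashlib.sha256(data).hexdigest()
```

An attacker who steals only the database gets the hashes and the salts, but is still missing one input to the hash function.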

Putting It All Together

Taking the previous rant in totality, we see that authentication and passwords are more complex than we may have thought initially.

In an ideal world, the flow for a given single-factor, password-based authentication process looks something like this across systems:

User supplies credentials (username, password) to a front end system over a TLS'ed connection.

User does something to initiate the authentication process.

The front-end transmits the password over TLS to the back-end system that stores authentication data.

The back-end performs validation of the provided password, adding salt and pepper where appropriate, hashing the provided password, and comparing the result against the stored hash.

If the password is wrong, the hash values do not match, and we report back to the front-end that authentication failed.

If the password is right, the hash values match, and we return a successful authentication message to the front-end.
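The back-end half of the flow above can be sketched as follows.  This is an illustration under stated assumptions, not a production design: the in-memory USERS table, the hard-coded PEPPER, the use of PBKDF2 in place of a bare hash, and the choice to feed salt plus pepper in as the PBKDF2 salt are all mine.  TLS and the front-end are out of scope here.

```python
import hashlib
import hmac
import os

PEPPER = b"hypothetical-pepper"   # would live outside the credential store
USERS = {}                        # username -> (salt, stored_hash)


def _hash(password: str, salt: bytes) -> bytes:
    """Derive the stored value from password, salt, and pepper."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt + PEPPER, 100_000
    )


def register(username: str, password: str) -> None:
    """Generate a fresh salt and store it next to the derived hash."""
    salt = os.urandom(16)
    USERS[username] = (salt, _hash(password, salt))


def authenticate(username: str, password: str) -> bool:
    """Back-end check: re-derive the hash and compare in constant time."""
    record = USERS.get(username)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, _hash(password, salt))
```

Note that the password itself is never stored, and the comparison uses hmac.compare_digest rather than == so that the time taken does not leak how many leading bytes matched.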

The above flow entirely disregards a whole universe of access management, tokens, assertions about the user, anything like that.

The rabbit hole goes incredibly deep and smart organizations have a great deal of nuance in how they allow access to protected resources.  Just from passwords alone, there are a number of topics.

For instance, which hashing methods should be used?

What restrictions should we place on password selection?

Do we force password rotation?  If so, under what circumstances?

What's the process for a user to reset their password - how do we verify them?

Where do we keep the passwords?

How long should it take from password transmission to hash validation?

How do we recover in the event that our passwords become compromised?

Are there better ways to manage passwords than just throwing plain text over the wire?

Stay learning.  Semper Porro.  (Always Further.)

References

1. github.com/IDPros/bok/blob/master/terminology.md

2. pages.nist.gov/800-63-3/sp800-63-3.html#af

3. crackstation.net/hashing-security.htm

4. www.mscs.dal.ca/~selinger/md5collision

5. www.digitalocean.com/community/tutorials/understanding-the-ssh-encryption-and-connection-process

6. gchq.github.io/CyberChef
