Saturday, July 14, 2012

Six Simple Rules for Secure Storage of Passwords

A few weeks ago, password files from LinkedIn, Last.fm, and eHarmony were released by crackers. Yesterday it was Yahoo and today it's nVidia and Billabong, an Australian clothing retailer. The Yahoo and maybe Billabong passwords were stored as plain text, a particularly egregious mistake.

In (a tiny bit of) defense of the yahoos at Yahoo, the file that was stolen and released came from a system not developed at Yahoo. Yahoo bought Associated Content and renamed it Yahoo Voices. Apparently no one at Yahoo thought to do a security audit of the systems they acquired when they bought Associated Content.

Unhappily for over 400,000 people, Associated Content and later Yahoo used email addresses for user IDs.  Other organizations do this too.  All the time.  In at least some cases, it is likely that the hapless subscriber used the same password for Yahoo as for the email account.  Oops!

Securing passwords is not rocket science. Here are six simple rules for securing stored passwords. Toward the end, I'll give you a seventh rule that might have saved Yahoo's bacon even though they violated rules two through six and maybe rule one.
  1. Plain text passwords must be transmitted from user to host over an encrypted connection, e.g. SSL/TLS.
  2. No password will ever be stored in plain text; all passwords will be hashed. 
  3. The password hash will use a salt of non-trivial length, e.g. 128 bits. 
  4. Each password must have a different, random salt. 
  5. The hash function must be resistant to collision attacks. 
  6. The hash function must be computationally expensive and difficult to parallelize. 
Let's look at each one of these in turn.

Plain text passwords must be transmitted from user to host over an encrypted connection

This one ought to make sense right on its face.  If someone can install a network sniffer between a user's web browser and the the server they're trying to talk to, that someone can capture passwords unless they're encrypted.   I've never worried much about the garden variety hacker capturing passwords with a sniffer.  Unless they could get close either to my computer or the the server I want to talk to, it would be like drinking from a fire hose.  However, if the FBI, CIA, NSA, and/or Sheriff are after you, they can probably do it.  The same is likely to be true in certain other countries. Using an encrypted connection makes it much harder to snoop on passwords as they're sent to the server.

No password will ever be stored in plain text; all passwords will be hashed

"Hashed" means  that when we set up the account for someone on our server, we transform the password into gibberish, or "hash."  To produce the hash, we want to use a function that cannot be undone.  So, given the hash value, there is no way to "decrypt" it to recover the original password.  If an attacker succeeds in capturing the password file, he or she still cannot undo the hash; the passwords are safe.  (Not.)

After an account is set up with the hashed password, when our user wants to log in, we just run the hash function using the password supplied with the login attempt.  If the result of hash function gives the same value that was stored when the account was set up, we believe that the password given matches that when the account was set up, and allow access. (But, see the section on collision resistance below.)

The password hash will use a salt of non-trivial length

Although you cannot undo a password hash to recover passwords from the hashed file,  it isn't really necessary to do so.  An attacker can take a large electronic dictionary plus a list of commonly used passwords and run the whole list through the hash function. The attacker now has a list of dictionary words and their hashes.  If a hash from the stolen password file matches a hash in the attacker's list, the attacker has determined the password, and done so without inverting (or "decrypting") the hash.  How does an attacker know which hash algorithm to use?  Kerckhoffs' Principle says that, to be secure, we must assume that an attacker knows the system.  A determined attacker can try many hash algorithms.  As soon as one produces a hit, the attacker knows that's the one in use.

To defend against the use of dictionaries of plain text words and their hashed equivalents, we add some random bits to the password before hashing it.  Cryptographers call this a "salt."  If we just add a zero or a one bit, now an attacker must compute the hashed dictionary twice, once for the case of zero and once for the case of one.  That doesn't help much, but if we add 12 random bits, the attacker needs 4,096 encrypted dictionaries.  Adding 32 random bits to the password before hashing it means the hacker has to make 232 or more than four billion encrypted dictionaries.  With 128 bits, the attacker would need 2128 encrypted dictionaries.  That's a Very Large Number Indeed.  (It's about 3.4×1038.)  Pre-computing that many copies of a hashed dictionary may be theoretically impossible because of the amount of energy required for the computation.

Of course, we have to store the salt someplace, and if an attacker can get the password file, the attacker can probably get the salt, too.  That leads to the next rule.

Each password must have a different, random salt

Instead of just one salt for everyone, use a different, random value for each user. You only have to calculate it once, and you store it with the password hash.   By making the salt very large (see above), a system could have millions, or even hundreds of millions, of users but the chance that two users would get the same random salt remains very small.  Now the attacker has to hash the dictionary file for every user password to be tested.  The problem just got significantly harder.

Each salt must be as close as we can get to truly random; if you use something like user ID for the salt, an adversary can predict it.   Such random numbers are generated using an algorithm called a cryptographically secure pseudo-random number generator.

A second salt, called a "key," used system-wide and not stored with the hashed password adds still more work for the bad guys.  Such a scheme is called a "keyed hash." One place to store the key is as a constant in the programs that process passwords.

The hash function must be resistant to collision attacks

A hash collision occurs when two different passwords generate the same hash.  For example, if "icecream" and "darkbeer" both generate the hash "etaionshrdlu" then either password can be used to log on to the account which has that hash. (By the way, both of those are pretty miserable passwords.)  The salt helps, but we still need to avoid collisions.  If "darkbeer" isn't in the cracker's dictionary, but "icecream" is, they've still cracked your password!

Happily, cryptographers know all about collision attacks, and so they design hash functions that are resistant to such attacks.  Perhaps now is a good time to caution you against "roll your own" cryptography.  It turns out to be surprisingly hard to do right, and you don't need to.  There is software that will do what you need, freely available.

The hash function must be computationally expensive and difficult to parallelize

We should be more worried about criminals cracking passwords than about "hackers" cracking passwords.  Willie Sutton robbed banks because "that's where the money is."  Criminals want to steal your login credentials for the same reason, and some of them can devote impressive resources to the task.  When we say that the hash function must be computationally expensive, we only mean that it takes the computer a long time to turn the plain text password into the hash code.  You won't care, and probably won't even notice, if it takes a quarter second longer to log in to some server.  However, if each try at password guessing takes a quarter second, that's four guesses per second for the attacker, instead of potentially billions of guesses per second.

Graphics cards with multiple processors are available and cheap.  If an adversary can spread computation among multiple processors, then "computationally expensive" might no longer mean "slow."  So, the hash function should be designed so that it is difficult or impossible to divide the work among multiple processors.

About users and passwords

Your users will pick terrible passwords.  No amount of programming can completely prevent that.  Attackers know this, and they'll try common passwords even if you've made a brute force attack too difficult.  For some fun reading, type 100 most common passwords into your favorite search engine and have a look at a few of the lists. (Um, and if you find any of your passwords on any of the lists, change 'em right now!)

You can help users pick good passwords by providing some guidance and, most especially, providing for very long pass phrases.  To see what I mean, check this cartoon from XKCD.  Suggest that, when people register on your site, that they pick a pass phrase rather than a password, and maybe even link to XKCD.

Some security experts will disagree with me, but I think trying to force particular combinations, like "one uppercase letter, one number, one punctuation character" makes matters worse rather than better.  It encourages people to re-use passwords or to write them down.  Writing passwords down isn't necessarily bad in and of itself, but it provides an opportunity for something else to go wrong.

About these six simple rules

Following these six simple rules will make it much harder for attackers who steal your password file to recover plain text passwords.  Following the rules does not make guessing impossible, so if there's a breach, you have to notify people.  With luck, the things you've done will provide time for you to notify your subscribers in case of a breach and for them to change their passwords elsewhere.  They'll also keep people from saying (about us) "How could they be so stupid?"

None of these things will prevent your password file from getting stolen in the first place.  That's a subject for another day, but see below for one suggestion.  None of these things will save the user who chooses "keepout" for a password.

One last thing about the simple rules above.  Attackers can't easily recover a password even if they have the password file.  Neither can you!  So, you'll have to provide a mechanism for the hapless person who forgets a password to reset it.  There is no way to tell them what it was.  Another subject for another day is how to keep an adversary from subverting the password reset process, but it's something to worry about.

Password software

There is time-tested software, developed by real cryptographers, that will help you store passwords in a way that meets the last five tests.  (You have to use SSL or TLS to meet the first one.) I'm not going to include links because there's too much information, but if you're storing passwords, look into bcrypt, scrypt, or PBKDF2.  This really isn't rocket science.  One just needs to understand the need to do it.

The seventh rule: Parameterized queries

The Yahoo password file was stolen through an SQL injection attack.  It seems possible that others were stolen in the same way.  SQL injection is also a subject for another day, but one bit of advice will stop SQL injection attacks cold: use parameterized queries for all database operations.

May password file compromise never happen to you.