Password storage in summary

We discussed password storage in the article “Speaking of passwords…” and concluded that password storage requires a cryptographically strong, contemporary (as in “very, very slow”) one-way hash function with a randomly generated salt for every password.

This is pretty much all you need to take care of. Salting is fairly straightforward, but it is essential to make sure it always works. Achieving a good balance between the slowness of the hashing algorithm for the attacker and acceptable performance for the user is a bit more involved, but techniques like key stretching have been around for ages now too.
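To make the summary a bit more concrete, here is a minimal sketch of salting plus key stretching. It uses PBKDF2 only because it ships with the Python standard library; it is not the bcrypt scheme the quote below recommends. The function names, the iteration count and the salt length are assumptions chosen for illustration, not a recommendation.

```python
import hashlib
import hmac
import secrets

# Assumed parameters for illustration; tune ITERATIONS to your own hardware.
ITERATIONS = 200_000
SALT_BYTES = 16


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random salt for this password."""
    salt = secrets.token_bytes(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the stretched hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The salt and the digest are stored together in the user record. Raising the iteration count makes every guess more expensive for the attacker, at the price of a slower login for the user.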

It is rumored that Thomas Ptacek once said:

What have we learned? We learned that if it’s 1975, you can set the ARPANet on fire with rainbow table attacks. If it’s 2007, and rainbow table attacks set you on fire, we learned that you should go back to 1975 and wait 30 years before trying to design a password hashing scheme.

We learned that if we had learned anything from this blog post, we should be consulting our friends and neighbors in the security field for help with our password schemes, because nobody is going to find the game-over bugs in our MD5 schemes until after my Mom’s credit card number is being traded out of a curbside stall in Tallinn, Estonia.

We learned that in a password hashing scheme, speed is the enemy. We learned that MD5 was designed for speed. So, we learned that MD5 is the enemy. Also Jeff Atwood and Richard Skrenta.

Finally, we learned that if we want to store passwords securely we have three reasonable options: PHK’s MD5 scheme, Provos-Maziere’s Bcrypt scheme, and SRP. We learned that the correct choice is Bcrypt.

And I think that is a great summary.

News: Website and app security tips

TechRepublic has an interesting article “Website and app security tips for software developers” that talks in a very short space about a whole bunch of things, from the “shelf life of software developers” to the advice on security for the website developer.

It provides in particular an interesting insight into why a person thoroughly familiar with security made security mistakes again and again.

I know why I made those mistakes — it was either the hubris of “I can roll my own better than off-the-shelf,” or the idea that slapping something together quickly would be fine “for now” and I would pay the technical debt off later. I was wrong on both counts, every single time.

How often do we get trapped like that?

Philosophy of door locks

When working on security, there is something extremely important to keep in mind at all times. We are not trying to make systems impenetrable. We are trying to make it real, real hard for the attacker, that’s all.

Security guards everywhere

If an attacker has physical access to your system, you lost. All measures, passwords, firewalls, everything is there to deter an attacker that is attacking remotely. But the only thing that actually stands between your system and a determined attacker is your door lock. Never thought of that, did you? The security of your computer at home is only as good as your door lock.

Yes, there are smart cards that are physically secure computers. But their application is limited and most of the time we have to deal with systems that we protect in the “virtual world” while in the real world they are basically defenseless. So we make it harder for the attackers with door locks, security guards and CCTV cameras.

Again, we are just making it harder, not impossible. Impossible would be impossible, not to mention prohibitively expensive. Given that an attack is always possible and there are many avenues of attack, the attacker will always tend to choose the path that is most economical – the cheapest way to break into your system.

My task as I see it is to convince you to use such security measures that it becomes cheaper for the attacker to break into your house than to attack your computer through the software. Once we are at that point, you start looking into the well-understood world of physical security and my task is done. But we are far from there.

Speaking of passwords…

Wouldn’t it be quite logical to talk about passwords after user names? Most certainly. Trouble is, the subject is very, very large. Creating, storing, transmitting, verifying, updating, recovering, wiping… Did I get all of it? It is going to take a while to get through all of that, do you reckon? Let’s split the subject and talk about password storage now, as the subject that comes most often in the security discussions and in the news.

Speaking of which, some recent break-ins if you were not keeping track:

"Enter Password"LinkedIn  – 6.5 million passwords stolen, Yahoo – 450 thousand passwords stolen, Android Forums – 1 million, Last.fm – 8 million, Nvidia – 400 thousand, eHarmony – 1.5 million, Billabong – 21 thousand, TechRadar … the list is going on and on.

Out of 8 million passwords in the LinkedIn and Last.fm breaches, “It took a user on the forum less than 2½ hours to crack 1.2 million of the hashed passwords, Ars Technica reported.”

Oops. Is that supposed to be so easy? Actually… no.

There are a few easy rules for storing passwords. First of all, never store passwords in the clear, unencrypted, like Billabong did. Remember that any and every system was or eventually will be broken into. You have to assume that your password database will fall into the wrong hands sooner or later. Your password database has to be prepared for that eventuality to look good in the eyes of the press.

So, when your password database is in the hands of the attackers, it has to defend itself. A database full of unencrypted passwords does not provide any defense of course. What about an encrypted database?

Well, since you have to be able to use the database, you have to decrypt it when you need it. So the system will have the key to the database somewhere. Since the attacker got their hands on the database, there is no reason why the attacker should not get the encryption keys at the same time. So this is definitely not improving the situation.

Secure hashes (as in the name of this blog) are the ultimate answer. The important thing about hashes is that they do not require the use of a key and they can be easily computed only one way: from the clear piece of information into the hash. They cannot be reversed; one cannot easily compute the original piece of information from the hash. That’s why they are called one-way hashes.

Hashes were invented a long time ago and they have been improving over the years. The old hashes are not secure anymore given the increases in computing power. That’s what the news reports meant when they referred to recovering the plain text passwords: the attackers computed passwords that result in the hashes that are in the database.

Finding the passwords then given a database of password hashes boils down to taking a password, computing its hash according to the algorithm used, and comparing it to the hashes stored in the database. When a match is found – we have a good password. This is where the cost of computing the hashes comes in. Older hashes are much faster, newer hashes are much slower. With the advent of rental cloud computing services this is becoming a small distinction though. All SHA-1 passwords of up to 6 characters in length could be brute forced in 49 minutes with the help of Amazon EC2 for a cost of $2 two years ago. And it’s getting cheaper and faster. So here is where the speed matters but it has the opposite effect. The hash, to be secure, must be a very, very slow one. Almost too slow to be useful at all would be a good start.
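The guessing loop described above can be sketched in a few lines. Everything here is made up for illustration: the “stolen” hashes, the guess list, and the assumption that the database holds unsalted SHA-1 hashes over a 95-character printable ASCII alphabet.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Fast, unsalted hash: exactly what you do not want for passwords."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def crack(stolen_hashes: set[str], candidates: list[str]) -> dict[str, str]:
    """Hash every candidate and report those whose hash is in the stolen set."""
    found = {}
    for candidate in candidates:
        h = sha1_hex(candidate)
        if h in stolen_hashes:
            found[h] = candidate
    return found


# Made-up "database" of unsalted SHA-1 hashes.
stolen = {sha1_hex(p) for p in ["password", "letmein", "Tr0ub4dor&3"]}
guesses = ["123456", "password", "qwerty", "letmein"]
recovered = crack(stolen, guesses)
print(sorted(recovered.values()))

# Size of the search space for passwords of 1 to 6 characters drawn from
# the 95 printable ASCII characters (an assumed alphabet).
keyspace = sum(95 ** n for n in range(1, 7))
print(keyspace)
```

The attacker’s total cost is roughly the number of candidates times the cost of one hash, which is why making a single hash computation slow is the lever the defender has.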

Even if computer systems weren’t getting blisteringly fast compared to the blisteringly fast of five years ago all the time, a workaround was figured out a long time ago. If you are prepared to invest in some large storage, you can compute, slowly but surely, an enormous number of hashes and keep them somewhere. When the time comes, you just have to go and compare the hashes you computed in advance to the given hashes in the password database. This is called using rainbow tables. And it’s bloody effective.

Ok, ok, it is not all that gloomy. This fight is an old one and we have defenses. A very effective measure against the rainbow tables is to use a cryptographic salt. A salt is an additional piece of data supplied to the hash function together with the password. Since the attacker did not know the salt in advance, precomputed rainbow tables suddenly become useless. Great. Unfortunately, many sites use a fixed salt that is generated once and set in stone. This effectively makes rainbow tables useful again. One just has to compute them once with that salt again for the whole database. So the salt, to be useful, must be generated new for every password and stored together with the password.
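Here is a small sketch of why the salt has to be new for every password. The precomputed table is a plain lookup dictionary standing in for a real rainbow table, and SHA-256 is used only to keep the demo short; the names and the tiny password list are assumptions for illustration.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: bytes) -> str:
    # SHA-256 keeps the demo short; a real system would use a slow hash.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Table precomputed by an attacker who did not know any salt.
precomputed = {salted_hash(p, b""): p for p in ["password", "123456", "qwerty"]}

# Fixed salt: two users with the same password get the same hash, and one
# table computed with that salt covers the whole database.
FIXED_SALT = b"set-in-stone"
alice_fixed = salted_hash("password", FIXED_SALT)
bob_fixed = salted_hash("password", FIXED_SALT)

# Per-password salt: stored next to the hash, different for every record.
alice_salt = secrets.token_bytes(16)
bob_salt = secrets.token_bytes(16)
alice_hash = salted_hash("password", alice_salt)
bob_hash = salted_hash("password", bob_salt)

print(alice_fixed == bob_fixed)   # same password, same fixed salt, same hash
print(alice_hash == bob_hash)     # same password, but the salts differ
print(alice_hash in precomputed)  # does the unsalted table still help?
```

With per-password salts the attacker has to redo the work for every single record, which is the whole point.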

So, finally, the answer is simple: a cryptographically strong, contemporary (as in “very, very slow”) one-way hash function with a randomly generated salt for every password. And anything deviating from that is just plain tomfoolery.

Biometrics – any good?

I think I already talked about this subject previously but not here. Anyhow, the subject bears repeating.

Many go “yippee!” at the mention of biometrics and start to think their user authentication problem is solved. Do not pay attention, they will end up in the newspaper headlines fairly soon, either for massive security failures or being bankrupt, or both.

The problem is not a huge false negative rate, and it is not the huge false positive rate either. The problem is immutability of the characteristics. The biometric characteristics change slowly over time as you age and can be influenced by the environment but they cannot be changed at will (well, at least not easily). The problem is that whatever this thing is, it is a part of your body and is most of the time something you do not want to change even if you had a possibility to do so.

And that’s also why it is dangerous – you may end up losing a limb or two.

The first question that should be asked, then, is “what’s it good for, anyway?” A characteristic that is fairly stable and cannot easily be changed at will is a fairly reasonable user name, i.e. user identification. Even then, it is a questionable approach because it is a good idea to let users change names.

Biometrics is definitely not any good for authentication, that is, proving that you are who you say you are. If you compare it to the familiar authentication with passwords, you will notice that the means of authentication are supposed to be:

  • secret or concealable
  • non-degradable
  • easily changeable in a controlled manner
  • transferable in a controlled manner

And biometrics is none of that.

But why then? Oh, I do not expect to find a definitive answer to that but one thing could be that it looks cool in movies. The other is that biometrics were historically good for tracking people that are not actively resisting such tracking. But then we talk about politics and power and that is not the subject of this discussion at all. One thing is certain: whatever the reason to use biometrics is, it has absolutely nothing to do with security.

So when you see claims like “Biometric is the most secure and convenient authentication tool”, now you know that’s just utter nonsense and you should stay away from people (and companies) making such claims. Unless there isn’t enough nonsense in your life, of course.

When it’s your responsibility to implement a security system, try to stay away from biometrics, you’ll live a happier life. Leave it to Hollywood.

What’s in a name?

Here is something quite interesting. Nobody ever considers the user names. They are just sort of “given”. Well, are they? Most of the time, they are not. They are assumed and designed into the system one way or another. And they can have an impact on security.

An old saying goes that a secure Windows computer is the one that is switched off, locked in a safe and dumped into the ocean. That goes for any other system and still leaves a tiny chance of a deep water diver break-in. We cannot make a computer system impenetrable. What we can do is make it harder for an attacker. One of the ways to make it just that bit harder is to choose good user names.

What’s a good user name? What is a user name? What is a user id? What is user identification? What on Earth is the difference? Hold it there, it’ll be all right in a minute.

I promise I will try to be as unscientific about it as possible.

Let’s start by making a distinction between what you think of as your user name and what the system thinks of as your user identification. In a normal, traditional system design those are different things. And in security, conservatism is a good trait.

Historically, the users were identified with numbers, account numbers. Still, in most systems internally the users are identified with account references, usually numbers. We will call these “user id”. Originally, users used the account numbers, the user ids, to log in to the system. Nowadays they usually don’t. Typically users use some kind of aliases to log in, those being user names, e-mail addresses and so on. We will call these “user names”.

Functionally, only one property is important for both the user id and the user name. They must be unique within the boundaries of the system. This is quite normal, no? The system must be able to distinguish between users, so the ids must be unique. The login procedure must map the user name to a single user id, so the user name must also be unique. Simple.
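As a rough illustration of that uniqueness requirement, here is a toy in-memory registry. The class and method names are made up for this sketch; a real system would enforce the same two constraints in its database.

```python
class UserDirectory:
    """Toy in-memory registry: unique user names map to unique user ids."""

    def __init__(self) -> None:
        self._id_by_name: dict[str, int] = {}
        self._ids: set[int] = set()

    def register(self, user_name: str, user_id: int) -> None:
        """Add a user, refusing duplicate names and duplicate ids."""
        if user_name in self._id_by_name:
            raise ValueError("user name already taken")
        if user_id in self._ids:
            raise ValueError("user id already in use")
        self._id_by_name[user_name] = user_id
        self._ids.add(user_id)

    def resolve(self, user_name: str) -> int:
        """Return the single user id behind a login name (KeyError if unknown)."""
        return self._id_by_name[user_name]
```

The user logs in with the name, and everything else in the system works with the id returned by `resolve`.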

What are the security requirements for the user ids and user names?

The user names have to be unpredictable to help fight brute-force attacks. We know very well that the most popular password is “password”. If an attacker knows the user names on the system, he can try all of them with the password “password” and he may get lucky. Another important issue is the lockout of users, which is a form of denial-of-service attack. If the system locks users out after three unsuccessful attempts and the user names are known, an attacker can go through all user names and perform those three attempts, locking everyone out of the system. So unpredictable user names make an attacker’s life difficult. Mind you, unpredictable user names, not necessarily random user names.

We must distinguish between two very different cases now.

In one case the user base is naturally diverse and the user names, selected for some semi-obscure reason, are sufficiently varied. For example, you have a website where people register with their email addresses from all over the world. Those are hard to predict. They are not random but making an exhaustive search for user names is technically hard. So you are lucky and you can leave things as they are.

On the other hand, if your user base is fairly uniform, selecting something else as an identifier would work better. Suppose you run a company and all registered users belong to your company. If you select the email address as an identifier, the attacker’s life suddenly becomes much easier. Get the list of people and there you go, you know all user identifiers. In this case letting users select their own identifiers, for example, would work much better. Do not be tempted into assigning employee numbers, which are usually sequential, as user names. Letting people come up with aliases or nicknames is a really good idea then.

Let’s come back to the user ids now. For the user ids, unpredictability also matters. Suppose that there are methods in your application that take a user id as a parameter to perform some operation, perhaps viewing the user record. Then knowing the user id makes it easy to view user records and find information about users and the system. And don’t count too much on your access control; it may fail some time, and then the only thing standing between your users and the attacker will be the unpredictability of those numbers. Remember, that’s what went wrong at Citibank. And, really, you have nothing to lose – the computer does not particularly care what kind of numbers it is crunching.

So, what’s a good strategy for the user ids? There is only one: the user ids must be random. They do not really need to be generated from cryptographically strong random numbers but they must be sufficiently random as to be unpredictable for an external observer a.k.a. the attacker. Oh, and UUIDs are not necessarily unpredictable, just in case you were wondering: version 1 UUIDs, for example, are built from a timestamp and the machine’s network address.
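One possible way to hand out such ids is sketched below. The function name and the 64-bit size are assumptions for illustration. The sketch uses Python’s `secrets` module, which is stronger than strictly required here, simply because it costs nothing extra.

```python
import secrets


def new_user_id(existing: set[int], bits: int = 64) -> int:
    """Draw random ids until one is found that is not already taken."""
    while True:
        candidate = secrets.randbits(bits)
        if candidate not in existing:
            existing.add(candidate)
            return candidate


taken: set[int] = set()
first = new_user_id(taken)
second = new_user_id(taken)
print(first != second)  # ids never repeat within the same set
```

Compare that with sequential ids, where seeing one record number tells the attacker all the neighbouring ones.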

What’s a good strategy for user names? They must be unpredictable. Either they are already because you have a diverse user base and all you have to do is to force no additional scheme on them. Or your user base is uniform and then you must come up with a scheme to introduce unpredictability, the easiest being to use nicknames.

And that’s it for names, I think. Wasn’t that easy? ;)

For an extra bit of fun: Default Admin Username and Password list. May be useful, you know.

Why bother?

Hmm… Good question… Well, let’s get this straightened out before we jump into other interesting subjects. Every single website and application, every single computer system gets broken into. For fun, money, fame, accidentally. This is just the way it is and I have to accept this as the current reality. I may not like it but who cares about that?

Whether you are a large corporation or a student writing the first website, your system will get broken into. If your system has been around for a while, it was already broken into. My not-so-extremely-popular website was broken into already three times (that I know of) and I am not ashamed to admit it. Denial is futile. Take it as inevitable.

There is even a line of thought nowadays with some of the security people that we should not bother to concentrate so much on trying to protect things for we can’t prevent break-ins anyway. They say we should concentrate on detecting and containing the damage from break-ins. Ah, bollocks. We have to do both. Do not give up your defenses just because you know they will be eventually breached. But be prepared.

What I really want to say is that when you make a computer system, be it a website, corporate network, smart card or anything else, you have no choice. Thinking that security is somebody else’s problem is extremely common, second only to not thinking about security at all, and usually disastrous in a not-so-distant future. Don’t be like that. Come to the good side, protect your system, think of security long and hard, apply the Hash and the Crypto the Right Way™ and your system will run happily ever after (well, at least to the next major breakthrough in cryptography or something).

Welcome to “Holy Hash!”

This is a lighter software security blog. I am starting it now mainly for two reasons.

First, something has to be done. The recent break-ins at the likes of LinkedIn and Yahoo show that even at the large companies people do not understand the basics of security. Looking at what is proposed and advised under the guise of security to people starting out to write their own web applications, I understand that those are not far behind. Should their applications become famous, they will be broken as easily. There needs to be a place to discuss even the most basic things, so people do not keep making the same mistakes over and over again… as if it were bloody Groundhog Day.

Second, why do we have to talk about software security always with a grave face? Yes, it is a serious subject but that does not warrant the long faces. Lighten up, people! Relax, let the Force flow. Have a break and make a joke. Security can be an entertaining subject. Let’s not make it appear harder than it is.

So here we are, something has to be done and it better be done with a smile. Or a grin… a smirk, a beam, a crack. Not with a frown. I will write my thoughts on software security, you are welcome to comment, make fun of, ask questions and generally have a good time.