Data breach at LinkedIn

Apparently, there was a serious data breach at LinkedIn, and many customer records were stolen, including “member email addresses, hashed passwords, and LinkedIn member IDs”. LinkedIn sent out a notification informing users that their passwords had been invalidated. What is interesting is that the notification included a cryptic remark that the break-in was “not new”. What could they mean by that?

On May 17, 2016, we became aware that data stolen from LinkedIn in 2012 was being made available online. This was not a new security breach or hack. We took immediate steps to invalidate the passwords of all LinkedIn accounts that we believed might be at risk. These were accounts created prior to the 2012 breach that had not reset their passwords since that breach.

I can take a wild guess that passwords stored before 2012 were kept either unencrypted, without salt, or with some very weak algorithm. The security breach itself was, of course, “new”, but the only information at risk is the passwords in the database that were stored in this old-fashioned way.

So, according to my wild guess, more information must have been stolen than they tell us, but LinkedIn judged that the only information threatening them was those old passwords. So they finally invalidated them (which they should have done back in 2012) and told us they are happy with it. Unfortunately, there is no way to know for sure.

You can make your own wild guess at what happened.

They never learn password security: Domino’s Pizza

The password database of Domino’s Pizza in France and Belgium was stolen by the Rex Mundi hacker group, which demanded a 30,000 euro payment in exchange for not disclosing it. Well, Domino’s went to the police, so the 592,000 French and 58,000 Belgian customer records will be out in the open tonight.

What is interesting, though? This is 2014. Do you know how they stored passwords? They used MD5 without salt or stretching, as if the previous 20 years had never happened in their computer universe. We keep repeating the good ways of storing passwords over and over again, and nobody listens.

Domino said in a statement: ‘The security of customer information is very important to us. We regularly test our UK website for penetration as part of the ongoing rigorous checks and continual routine maintenance of our online operations.’

I sympathize with the challenges of securing a large network, where you have to get many things right. But if you only do penetration testing and ignore everything else, you will end up with a false sense of security. Judging by the fact that they did not know how to store passwords, that’s exactly what happened. Penetration testing can supplement, but never replace, an all-encompassing security program.

The only upside is that Domino’s apparently stored credit card details separately, so the financial data has not fallen into the hackers’ hands. So, that’s good.


More: at The Guardian, Daily Mail, The Register.

Password recovery mechanisms – Part 3

Passwords remain the main means of authentication on the internet. People often forget their passwords, and then they have to recover access to the website’s services through some kind of mechanism. We try to make that so-called “password recovery” simple and automated, of course. There are several ways to do it, and all but one of them are wrong. Let’s see how it is done.

If you did not read Part 1 – Secret questions and Part 2 – Secondary channel, I recommend you do so before reading on.

Part 3 – Example procedure: put it all together

Security - any lock matters as much as any other.

Let’s assume we are putting together a website where passwords are stored as salted hashes and users are identified by their e-mail addresses. I will describe what I think is a good strategy for password recovery in that case; you are welcome to comment and improve upon it.

Since we have the users’ e-mail addresses, that is the natural secondary authentication channel. So if a user needs password recovery, we will use their e-mail to authenticate them. Here is how.

The user comes to the login page and clicks the “forgot password” link or similar. They then have to provide an e-mail address. The e-mail submission form must have some means of countering automated exhaustive searches, both to lower the load on the server during an attack and to provide a small level of discouragement against such attacks. Two ways come to mind: using a CAPTCHA, and slowing down the form submission with a large random delay (on the order of seconds). Let’s not go into the holy war over CAPTCHA; you are welcome to use any other means you can think of and, please, suggest them so that others can benefit from your thoughts here. You should also add a hidden field that a human never sees but an automated form-scanning robot will fill in, so you can detect the robot and discard the request. Anyway, the important part is: slow down the potential attacker. The person going through recovery will not mind if it takes a while.
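A minimal sketch in Python of the two cheap measures from this step: a large random delay and a hidden "honeypot" field. The field name `website` and the 2 to 5 second delay range are assumptions for illustration; the `sleep` parameter is only there so the delay can be swapped out in tests.

```python
import random
import time

# Hypothetical name of the hidden field. It is hidden from humans with CSS,
# but naive form-filling robots tend to fill in every field they find.
HONEYPOT_FIELD = "website"


def accept_recovery_request(form: dict, sleep=time.sleep) -> bool:
    """Return True if a 'forgot password' submission should be processed."""
    # Slow every submission down by a random delay on the order of seconds.
    sleep(random.uniform(2.0, 5.0))
    # A filled-in honeypot field means a robot: discard the request.
    if form.get(HONEYPOT_FIELD, "").strip():
        return False
    # Nothing to do without an e-mail address.
    return bool(form.get("email", "").strip())
```

Note that sleeping inside the request handler ties up a server worker for those seconds; that cost is accepted here in exchange for slowing down the attacker.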

As the next step, we look up the user’s e-mail address in the database, create a random token, mail it out and give feedback to the user. The feedback must take constant time, so that an attacker cannot use your recovery mechanism to collect valid e-mail addresses from your website: the process should take the same time whether the user was found or not. This is difficult to get right, and the best solution is to store the request for off-line processing and return immediately. Another way is to ask for the user name instead and look up the e-mail address, but users are more likely to know their own e-mail address than to remember their user name, so there is a caveat. If you cannot (or would not) process requests off-line, you should at least measure your server and try to equalize the timing with delays. An attacker can measure server timing fairly precisely, and equalizing it is hard, especially under fluctuating load, but you must give it a try. Still, it is best to keep the submitted information, trigger off-line processing and tell the user something along the lines of “if your e-mail is correct, you will receive an automated e-mail with instructions within an hour”. The feedback should never say whether the e-mail is correct or not.
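The off-line variant can be sketched in Python as follows: the request path only stores the address and returns one fixed message, and a separate worker does the lookup later. The in-process `queue.Queue` and the `send_token` callback are hypothetical stand-ins; a real site would more likely use a database table or a job queue.

```python
import queue

# One fixed answer for every submission, valid address or not.
FEEDBACK = ("If your e-mail is correct, you will receive an automated "
            "e-mail with instructions within an hour.")

# Hypothetical in-process queue standing in for persistent storage.
pending_requests: "queue.Queue[str]" = queue.Queue()


def handle_recovery_form(email: str) -> str:
    """Request path: store the address and return at once, without any lookup."""
    pending_requests.put(email.strip())
    return FEEDBACK


def process_pending(known_emails: set, send_token) -> int:
    """Off-line worker: mail a token only to addresses that exist."""
    sent = 0
    while True:
        try:
            email = pending_requests.get_nowait()
        except queue.Empty:
            return sent
        if email in known_emails:
            send_token(email)  # hypothetical callback that creates and mails a token
            sent += 1
```

Because no database lookup happens on the request path, the response (and its timing) is the same for known and unknown addresses.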

Now we generate a long, random, cryptographically strong token. It must be cryptographically strong because the user may actually be an attacker, and if he can guess how we generate tokens and do the same, he will be able to reset passwords for arbitrary users. We generate the token, format it so that it can be e-mailed (base64 encoding, hex, whatever) and store it in a database together with a timestamp and its origin (the e-mail address). The same token is then e-mailed to the user’s address.
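A minimal Python sketch of token generation and storage, assuming an SQLite table named `tokens` with columns `token`, `email` and `created` (a hypothetical schema). `secrets.token_urlsafe` draws from the operating system’s cryptographically strong generator and returns URL-safe base64 text, so the token can be e-mailed or put in a link as-is.

```python
import secrets
import sqlite3
import time


def open_token_db() -> sqlite3.Connection:
    """In-memory database for illustration; a real site would use its own DB."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tokens (token TEXT PRIMARY KEY, email TEXT, created REAL)")
    return db


def issue_token(db: sqlite3.Connection, email: str) -> str:
    # 32 random bytes -> 43 characters of URL-safe base64 text.
    token = secrets.token_urlsafe(32)
    with db:  # commits the insert
        db.execute("INSERT INTO tokens VALUES (?, ?, ?)", (token, email, time.time()))
    return token  # the caller e-mails this to `email`
```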

The user receives the token, comes to the website and goes to the token verification form. Here he has to enter his e-mail address again, of course, the token, and the new password. In principle, some measure against automated searches is in order here too, to lower the load on the server in case of an attack. The token is looked up in our database first, and then the e-mail is checked. If we find the token, we remove it from the database regardless, then check whether the e-mail matches and continue only if it does. This way, tokens are single-use: once we have seen a token, it is gone from the database and cannot be used again.
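The single-use rule can be sketched in Python against a hypothetical SQLite table `tokens (token, email, created)`. The token is deleted as soon as it is found, before the e-mail comparison; checking the `DELETE` row count is an added precaution (not from the text) so that two simultaneous requests with the same token cannot both proceed.

```python
import sqlite3


def consume_token(db: sqlite3.Connection, token: str, email: str) -> bool:
    """Return True only if the token exists and belongs to `email`; burn it either way."""
    with db:
        row = db.execute("SELECT email FROM tokens WHERE token = ?", (token,)).fetchone()
        if row is None:
            return False
        # Remove the token as soon as we have seen it, whatever the e-mail says.
        # Only the request whose DELETE actually removed the row may continue.
        if db.execute("DELETE FROM tokens WHERE token = ?", (token,)).rowcount != 1:
            return False
    return row[0] == email
```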

Tokens also expire. The server must have a policy that sets the expiration period; let’s say 24 hours. Before any lookup in the token database, we run a query that removes all tokens whose creation timestamp is more than 24 hours old. That way, every expired token is gone from the database before we start looking.
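The expiry step is a single `DELETE` query. A sketch in Python against a hypothetical SQLite table `tokens (token, email, created)`, with `created` holding Unix timestamps; the optional `now` parameter exists only to make the function testable.

```python
import sqlite3
import time
from typing import Optional

TOKEN_LIFETIME = 24 * 60 * 60  # seconds: the 24-hour policy from the text


def purge_expired(db: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Delete every token created more than TOKEN_LIFETIME seconds ago."""
    if now is None:
        now = time.time()
    with db:
        cur = db.execute("DELETE FROM tokens WHERE created < ?", (now - TOKEN_LIFETIME,))
    return cur.rowcount  # number of expired tokens removed
```

Calling it at the start of every verification request, before any token lookup, means an expired token can never be matched.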

Well, now, if the token matches and the e-mail is correct, we can look up the user in our password database and update the password hash to the new value. Then we flush the authentication tokens and session identifiers for that user, forcing logout of all pre-existing sessions. Simple.
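A sketch in Python of this final step. The in-memory `users` and `sessions` dictionaries are hypothetical stand-ins for real tables, and PBKDF2-HMAC-SHA256 with 600,000 iterations is an assumed choice of salted slow hash; the point is a fresh salt, a new hash, and every session of that user dropped.

```python
import hashlib
import secrets

# Hypothetical in-memory stores standing in for the user and session tables.
users = {}     # email -> (salt, password_hash)
sessions = {}  # session_id -> email


def hash_password(password: str, salt: bytes) -> bytes:
    # Assumed work factor; tune it for your own hardware.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 600_000)


def complete_reset(email: str, new_password: str) -> None:
    # New random salt and new hash replace the old entry.
    salt = secrets.token_bytes(16)
    users[email] = (salt, hash_password(new_password, salt))
    # Force logout everywhere: drop every session that belongs to this user.
    for sid in [s for s, owner in sessions.items() if owner == email]:
        del sessions[sid]
```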

Password recovery mechanisms – Part 2

Passwords remain the main means of authentication on the internet. People often forget their passwords, and then they have to recover access to the website’s services through some kind of mechanism. We try to make that so-called “password recovery” simple and automated, of course. There are several ways to do it, and all but one of them are wrong. Let’s see how it is done.

If you did not read Part 1 – Secret questions, I recommend you do so before reading on.

Part 2 – Secondary channel

A second way to do recovery is to use a secondary channel for authentication. Once authenticated on this secondary channel, the password for the primary channel can be changed. The secondary channel may be slower and more cumbersome but since it is used rarely it is not a problem.

You could ask the person to call user support. The user support would ask some questions for personal information and compare the answers with what they have on file. That would effectively reduce the system to the “secret questions” described in Part 1. There are better (and cheaper) ways to do it.

Historically, the server usually stores the e-mail address the user provided at registration. That is what becomes the secondary channel. Although e-mail still travels over the Internet, capturing messages on their way to the intended recipient is not a trivial task unless you control one of the nodes through which the e-mail is routed.

Originally, passwords were stored in plaintext on the server, and the user could request the password to be e-mailed. Some services still operate like that: the notorious Mailman list server e-mails you your plaintext password once a month in case you forgot it. That is convenient but has a bit of a security problem, of course. Should an attacker obtain the password database, all passwords to all accounts are immediately known. On the other hand, it has the advantage that user passwords are not actually changed, so if someone else requests a password reminder, the original subscriber simply receives an e-mail and that’s all.

The inventive thought then turned to hashing the passwords for storage, which is a great idea in itself and protects the passwords in case the database gets stolen. It has a side effect: suddenly the password is no longer known to the server. Only the hash is. That is sufficient for authentication but isn’t very helpful if you want to mail out a password reminder. So someone had the bright idea that the password reminder should become a password reset: when a user requests it, the server generates a new password, sends it to the user and replaces the hash in the database with the new password’s hash. All secure and … very prone to denial-of-service attacks. Anyone may now request a password reset for any user at will, and that user’s password will get changed. Very annoying.

So we went further and decided that changing the password is not such a good idea. Instead, we keep a separate database of single-use tokens. When a user requests a password change, we generate a unique random token, store it in the database and send it to the user. If the user did not request the token, they need not react: the password was not changed, and the token will harmlessly expire some time later. When the user does need a password change, they provide the token back to the service in a password change form (or through a clicked URL), which lets us perform this secondary authentication and then change the primary password. And that’s the way to do it.

There are variations where the secondary channel can be an SMS, an automated telephone call, or even an actual letter from the bank. But the important thing is that those messages only provide a token that verifies your identity on the secondary channel before allowing a security relevant operation on the primary channel.

Next, we will look at an example procedure for a website in Part 3.

Password recovery mechanisms – Part 1

Passwords remain the main means of authentication on the internet. People often forget their passwords, and then they have to recover access to the website’s services through some kind of mechanism. We try to make that so-called “password recovery” simple and automated, of course. There are several ways to do it, and all but one of them are wrong. Let’s see how it is done.

Part 1 – Secret questions

A widespread mechanism is to use so-called “secret questions”. This probably originates with the banks and their telephone service where they ask you several questions to compare your knowledge of personal information with what they have on file. In the times before the internet this was a fair mechanism since coming up with all the personal information was a tough task that often required physically going there and rummaging through the garbage cans to find out things. Still, some determined attackers would do precisely that – dumpster diving – and could gain access to the bank accounts even in those times.

Right now this mechanism is, of course, a total fallacy. The internet possesses so much information about you … It is hard to imagine that questions about your private life would remain a mystery to an attacker for long. Your birthday, your dog, your school and schoolmates, your spouse and your doctor – they are all there. It is hard to come up with a generic question that would suit everyone and at the same time would not have its answer printed on your favorite social network page.

And even if the answer is not out there, imagine that the secret question is “what’s your dog’s name?” How many dog names are there? Far fewer than there are letter combinations in a password, and the most common dog names are probably only a handful. So it is far easier to brute-force a security question than a password.
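To put rough numbers on it, here is a back-of-the-envelope comparison in bits of entropy (log base 2 of the number of equally likely choices). Both figures are assumptions for illustration: a generous 1,000 plausible dog names versus a truly random 8-character password drawn from letters and digits. Real human-chosen passwords are far less random than that, but the gap is still enormous.

```python
import math


def bits(choices: int) -> float:
    """Bits of entropy when guessing uniformly among `choices` possibilities."""
    return math.log2(choices)


dog_names = 1_000          # hypothetical: a generous list of plausible dog names
password_space = 62 ** 8   # 8 random characters from a-z, A-Z, 0-9

print(f"dog name: {bits(dog_names):.1f} bits")        # dog name: 10.0 bits
print(f"password: {bits(password_space):.1f} bits")  # password: 47.6 bits
```

Each extra bit doubles the attacker’s work, so under these assumptions the random password needs roughly 2^37 times more guesses than the dog’s name.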

This mechanism of secret questions and answers is antiquated and should not be used.

There is a variation where you provide your own question and your own answer. This is not better. Most people will tend to pick the obvious questions anyway. The attacker will see the question and can dig for information, and the answer will usually be a single word that is easy to brute-force. So, no good.

And, by the way, what should you do when you are presented with this folly on a website you use? Provide a strong password instead of the answer, and store it however you store all your other recovery passwords. All the other rules of password management apply.
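If you want a quick way to produce such a "strong password instead of the answer", a random string from Python’s `secrets` module does the job. The 18-byte length (24 characters of URL-safe base64) is an arbitrary assumption; the result belongs in your password manager, not in your memory.

```python
import secrets


def random_secret_answer(nbytes: int = 18) -> str:
    """Random text to type in place of a 'secret answer'."""
    return secrets.token_urlsafe(nbytes)


print(random_secret_answer())  # different on every run
```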

So much for secret questions. In the next part, we will see how to do password recovery with a secondary channel.

On the utility of technical security

It is often said that a system is only as strong as its weakest link. When you have good security and strong passwords, the weakest link will be the human, as it always has been. Think about how the system can recover from a breach when the problem is not technical but human.

[youtube=http://youtu.be/W50L4UPfWsg]

Common passwords blacklist

Any system that implements password authentication must check that passwords are not too common. Every system faces brute-force attacks that try one list or another of the most common passwords (and usually succeed, by the way). The system must be able to slow down an attacker by any means available: slowing down its response every time an unsuccessful authentication is detected, blocking an account for a short time after a number of unsuccessful attempts, or throwing up CAPTCHAs.
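The first two countermeasures can be sketched in Python as a small per-account throttle: the response delay doubles with each consecutive failure, and the account is locked for a short time after a threshold. The class name and the default numbers (5 failures, 300-second lockout, 0.5-second base delay) are hypothetical; the caller is expected to actually wait `delay_for(...)` seconds before answering.

```python
import time


class LoginThrottle:
    """Per-account slowdown and short lockout after repeated failures (sketch)."""

    def __init__(self, max_failures=5, lockout_seconds=300, base_delay=0.5):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.base_delay = base_delay
        self.failures = {}      # account -> consecutive failure count
        self.locked_until = {}  # account -> time when the lockout ends

    def is_locked(self, account, now=None):
        now = time.time() if now is None else now
        return self.locked_until.get(account, 0.0) > now

    def delay_for(self, account):
        """Seconds to wait before answering; doubles with every failure."""
        n = self.failures.get(account, 0)
        return 0.0 if n == 0 else self.base_delay * 2 ** (n - 1)

    def record_failure(self, account, now=None):
        now = time.time() if now is None else now
        n = self.failures.get(account, 0) + 1
        self.failures[account] = n
        if n >= self.max_failures:
            # Block the account for a short time and start counting afresh.
            self.locked_until[account] = now + self.lockout_seconds
            self.failures[account] = 0

    def record_success(self, account):
        self.failures.pop(account, None)
```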

However, even the most sophisticated system fails if the user’s password is the most common word: “password”. The attacker then succeeds at once, because that is likely to be the first word tried. So we need a system for blacklisting the passwords most likely to be tried in a dictionary brute-force attack. This may annoy users who would prefer a simple word as their password, but this is the reality: any simple word used as a password is likely to be a security hole and must be banned.

While implementing the user login plugin for CakePHP, I came across a simple question: where do we get password lists to check newly entered passwords against? Here is a resource I can recommend: 62K Common Passwords by InfoSec Daily. Depending on your system’s speed, you could use a smaller 6 MB file, a 1.5 GB file that should take care of most common passwords, or fuse the files into your own list.
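A minimal Python sketch of fusing several word lists into one blacklist and checking a new password against it. The function names are hypothetical; each source is any iterable of lines, so an open file handle works directly (e.g. `with open("common.txt", encoding="utf-8") as f: bl = load_blacklist(f)`). The check is case-insensitive so that "Password" is rejected along with "password".

```python
from typing import Iterable, Set


def load_blacklist(*sources: Iterable[str]) -> Set[str]:
    """Fuse one or more word lists (one password per line) into a single set."""
    words = set()
    for source in sources:
        for line in source:
            word = line.strip().lower()
            if word:
                words.add(word)
    return words


def is_blacklisted(password: str, blacklist: Set[str]) -> bool:
    return password.lower() in blacklist
```

An in-memory set is fine for a list of tens of thousands of entries; for something the size of the 1.5 GB file, a sorted file or an indexed database table would probably be more practical.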

IEEE should be embarrassed

“The world’s largest professional association for the advancement of technology” has been thoroughly embarrassed by an incident in which its log files, containing user names and passwords, were left open to FTP access by anyone on the Net for more than a month, according to a DarkReading report. Or, at least, I think they should be embarrassed, although they do not seem to be very embarrassed.

The data of at least 100,000 members were exposed, and IEEE took care to close the access. However, the exposed log files are not what I think they should be embarrassed about. As things go, configuration mistakes happen and files may become exposed. That’s just life.

However, what is really troublesome is that IEEE, the “world’s largest professional association for the advancement of technology” (according to themselves), has logged the usernames together with passwords in plaintext. I mean, we know that’s bad, and that’s been bad for at least a couple of decades. They are definitely at least a couple of decades behind on good security practices. I think that’s really embarrassing.

Password storage in summary

We discussed password storage in the article Speaking of passwords… and concluded that a password implementation requires a cryptographically strong, contemporary (as in “very, very slow”) one-way hash function with a randomly generated salt for every password.

This is pretty much all you need to take care of. Salting is fairly straightforward, but it is essential to make sure it always works. Striking a good balance between a hashing algorithm that is slow for the attacker and acceptable performance for the user is a bit more involved, but techniques like key stretching have been around for literally ages now too.
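One simple way to find that balance is to measure: keep doubling the key-stretching work factor until a single hash takes about as long as you are willing to make a user wait at login. The sketch below uses PBKDF2-HMAC-SHA256 from Python’s `hashlib`. The 0.25-second target and the starting count are assumptions; the right target depends on your hardware and login volume.

```python
import hashlib
import os
import time


def calibrate_iterations(target_seconds: float = 0.25, start: int = 10_000) -> int:
    """Double the PBKDF2 iteration count until one hash takes at least target_seconds."""
    salt = os.urandom(16)
    iterations = start
    while True:
        t0 = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"calibration", salt, iterations)
        if time.perf_counter() - t0 >= target_seconds:
            return iterations
        iterations *= 2
```

Every doubling costs a legitimate user one extra hash’s worth of time per login, but costs an attacker that much more per guess, for every guess.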

It is rumored that Thomas Ptacek once said:

What have we learned? We learned that if it’s 1975, you can set the ARPANet on fire with rainbow table attacks. If it’s 2007, and rainbow table attacks set you on fire, we learned that you should go back to 1975 and wait 30 years before trying to design a password hashing scheme.

We learned that if we had learned anything from this blog post, we should be consulting our friends and neighbors in the security field for help with our password schemes, because nobody is going to find the game-over bugs in our MD5 schemes until after my Mom’s credit card number is being traded out of a curbside stall in Tallinn, Estonia.

We learned that in a password hashing scheme, speed is the enemy. We learned that MD5 was designed for speed. So, we learned that MD5 is the enemy. Also Jeff Atwood and Richard Skrenta.

Finally, we learned that if we want to store passwords securely we have three reasonable options: PHK’s MD5 scheme, Provos-Maziere’s Bcrypt scheme, and SRP. We learned that the correct choice is Bcrypt.

And I think that is a great summary.

Speaking of passwords…

Wouldn’t it be quite logical to talk about passwords after user names? Most certainly. Trouble is, the subject is very, very large: creating, storing, transmitting, verifying, updating, recovering, wiping… Did I get all of it? It is going to take a while to get through all of that, don’t you reckon? Let’s split the subject and talk about password storage now, as that is the topic that comes up most often in security discussions and in the news.

Speaking of which, some recent break-ins if you were not keeping track:

LinkedIn – 6.5 million passwords stolen, Yahoo – 450 thousand passwords stolen, Android Forums – 1 million, Last.fm – 8 million, Nvidia – 400 thousand, eHarmony – 1.5 million, Billabong – 21 thousand, TechRadar … the list goes on and on.

Out of the 8 million passwords in the LinkedIn and Last.fm breaches, “It took a user on the forum less than 2½ hours to crack 1.2 million of the hashed passwords, Ars Technica reported.”

Oops. Is that supposed to be so easy? Actually… no.

There are a few easy rules for storing passwords. First of all, never store passwords in the clear, unencrypted, like Billabong did. Remember that any and every system either has been or eventually will be broken into. You have to assume that your password database will fall into the wrong hands sooner or later, and it has to be prepared for that eventuality if you want to look good in the eyes of the press.

So, when your password database is in the hands of the attackers, it has to defend itself. A database full of unencrypted passwords does not provide any defense, of course. What about an encrypted database?

Well, since you have to be able to use the database, you have to decrypt it when you need it, so the system keeps the key to the database somewhere. Since the attacker got their hands on the database, there is no reason why they should not get the encryption keys at the same time. So this definitely does not improve the situation.

Secure hashes (as in the name of this blog) are the ultimate answer. The important thing about hashes is that they do not require a key and can be computed easily only one way: from the clear piece of information to the hash. They cannot be reversed; one cannot easily compute the original piece of information from the hash. That’s why they are called one-way hashes.

Hashes were invented a long time ago and have been improving over the years. The old hashes are no longer secure, given the increases in computing power. That’s what the reports meant when they referred to recovering the plaintext passwords: the attackers computed passwords that produce the hashes stored in the database.

Given a database of password hashes, finding the passwords boils down to taking a candidate password, computing its hash with the algorithm used, and comparing it to the hashes stored in the database. When a match is found, we have a correct password. This is where the cost of computing hashes comes in. Older hashes are much faster, newer hashes much slower. With the advent of rentable cloud computing services, though, this is becoming a small distinction: two years ago, all SHA-1 passwords of up to 6 characters could be brute-forced in 49 minutes on Amazon EC2 for $2. And it’s getting cheaper and faster. So here is where speed matters, but in the opposite direction: to be secure, the hash must be a very, very slow one. Almost too slow to be useful at all would be a good start.
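The guessing loop described here is only a few lines of Python. The stolen "database" below is made up (hypothetical users, unsalted SHA-1). Note that each guess is hashed once and then checked against every stored hash at the same time, which is why a fast hash works in the attacker’s favor.

```python
import hashlib


def sha1_hex(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()


# Hypothetical stolen table of unsalted SHA-1 hashes.
stolen = {"alice": sha1_hex("sunshine"), "bob": sha1_hex("x7#Qm!2pL")}


def crack(stolen_hashes: dict, candidates) -> dict:
    """Hash each candidate once and compare it to every stored hash."""
    by_hash = {h: user for user, h in stolen_hashes.items()}
    found = {}
    for guess in candidates:
        user = by_hash.get(sha1_hex(guess))
        if user is not None:
            found[user] = guess
    return found


print(crack(stolen, ["123456", "password", "sunshine", "qwerty"]))
# {'alice': 'sunshine'}
```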

Even if computers were not constantly getting blisteringly fast compared to the blisteringly fast machines of five years ago, a workaround was figured out a long time ago. If you are prepared to invest in some large storage, you can slowly but surely compute an enormous number of hashes and keep them somewhere. When the time comes, you just compare the hashes you computed in advance to the hashes in the password database. This is called using rainbow tables. And it’s bloody effective.
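The idea of paying for the hashing in advance can be shown with a plain precomputed lookup table in Python. This is a simplification: real rainbow tables store chains of hashes and reduction functions to trade lookup time for far less storage, but the principle is the same. The word list and the "leaked" hashes are hypothetical.

```python
import hashlib


def sha1_hex(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()


# Built once, ahead of time, and stored: hash -> password.
precomputed = {sha1_hex(p): p for p in ["123456", "password", "letmein", "dragon"]}


def lookup(leaked_hashes) -> dict:
    """At attack time there is no hashing at all: each leaked hash is one lookup."""
    return {h: precomputed[h] for h in leaked_hashes if h in precomputed}
```

The same table can be reused against every unsalted SHA-1 password database that ever leaks.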

Ok, ok, it is not all that gloomy. This fight is an old one, and we have defenses. A very effective measure against rainbow tables is a cryptographic salt: an additional piece of data supplied to the hash function together with the password. Since the attacker does not know the salt in advance, precomputed rainbow tables suddenly become useless. Great. Unfortunately, many sites use a fixed salt that is generated once and set in stone. This makes rainbow tables useful again: one just has to compute them once more with that salt, and they work for the whole database. So the salt, to be useful, must be generated anew for every password and stored together with the password hash.
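A small Python demonstration of why the salt must be random per password. SHA-1 is used here only to isolate the effect of the salt; on its own it is far too fast for password storage. The 16-byte salt length and the `store` helper are assumptions for illustration.

```python
import hashlib
import os


def salted_sha1(password: str, salt: bytes) -> str:
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


# The attacker's table was built in advance, i.e. without knowing any salt.
precomputed = {hashlib.sha1(p.encode("utf-8")).hexdigest(): p
               for p in ["123456", "password", "letmein"]}


def store(password: str):
    salt = os.urandom(16)  # fresh random salt for every password
    return salt, salted_sha1(password, salt)  # keep the salt next to the hash


salt_a, hash_a = store("letmein")
salt_b, hash_b = store("letmein")
# Same password, yet the two stored hashes differ, and neither appears in
# the precomputed table; verification still works using the stored salt.
print(hash_a == hash_b, hash_a in precomputed, salted_sha1("letmein", salt_a) == hash_a)
```

With a single fixed salt instead, `hash_a` and `hash_b` would be identical, and one table rebuilt with that salt would cover the whole database.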

So, finally, the answer is simple: a cryptographically strong, contemporary (as in “very, very slow”) one-way hash function with a randomly generated salt for every password. And anything deviating from that is just plain tomfoolery.
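Put together, a sketch of that answer in Python. The Ptacek quote above names bcrypt, which needs a third-party package in Python, so this sketch swaps in PBKDF2-HMAC-SHA256 from the standard library `hashlib` as the slow one-way function. The 600,000-iteration work factor and the 16-byte salt are assumptions to tune for your own hardware. The constant-time comparison via `hmac.compare_digest` is an extra precaution so that response timing does not reveal how many bytes matched.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; make it as slow as you can afford


def hash_password(password: str):
    """Return (salt, hash) for storage, with a new random salt every time."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)
```

Storing the iteration count alongside the salt and hash would let you raise the work factor later without invalidating existing passwords.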