Data breach at LinkedIn

linkedin-default-shareApparently, there was a serious data breach at LinkedIn and many customer records were stolen including “member email addresses, hashed passwords, and LinkedIn member IDs”. LinkedIn sent out a notification informing that the passwords were invalidated. What is interesting in the note is that they included a cryptic note that the break-in was “not new”. What could they mean by that?

On May 17, 2016, we became aware that data stolen from LinkedIn in 2012 was being made available online. This was not a new security breach or hack. We took immediate steps to invalidate the passwords of all LinkedIn accounts that we believed might be at risk. These were accounts created prior to the 2012 breach that had not reset their passwords since that breach.

I can take a wild guess that they passwords prior to 2012 were stored either unencrypted, without salt, or using some very weak algorithm. The security breach itself was, of  course, “new” but the only information at risk are those passwords in the database that were stored in this old-fashioned way.

So, according to my wild guess, there must be more information stolen than they tell us but LinkedIn judged that the only information that threatens themselves were those old passwords so they finally invalidated them (what they should have done back in 2012) and told us they are happy with it.Unfortunately, there is no way to know for sure.

You can make your own wild guess at what happened.

Cloud security

Let’s talk a little about the very popular subject nowadays – the so-called ‘cloud security’. Let’s determine what it is, what we are talking about, in fact, and see what may be special about it.

Magnificent cumulonimbus clouds

‘Cloud’ – what is it? Basically, the mainframes have been doing ‘cloud’ all along, for decades now. Cloud is simply a remotely accessible computer with shared resources. Typically, what most people ‘get from the cloud’ is either a file server with a certain amount of redundancy and fault tolerance, a web service with some database resources attached, or a virtual machine (VM) to do with as they please. Yes, it is all that simple. Even the most hyped-up services, like Amazon, boil down to these things.

So when you determine the basics, you are then talking about three distinctly different types of operation: a file server, a web/database application and a VM. The three have different security models and can be attacked in completely different ways. So it does not really make sense to speak about ‘cloud security’ as such. It only makes sense to speak about the security of a particular service type. And all three of them have been studied in depth and have the defenses worked out in detail.

Mind you, there is also another type of ‘cloud security’ – the security of the data center itself, where the physical computers accessible through the physical network are. And the security of data centers is an important subject of interest to the operators of those data centers. At the same time, consumers of the services rarely are concerned with that type of security, assuming (sometimes without a good cause) that the data center took good care of its security.

So, from the point of view of the developer or consumer of services it makes sense to talk about three types of security in three different security models that are fairly well understood and were analyzed many times over the decades. Not using that experience of, first, the mainframe developers, and then, of the open systems developers, is at the very least irresponsible.

Speaking of passwords…

Wouldn’t it be quite logical to talk about passwords after user names? Most certainly. Trouble is, the subject is very, very large. Creating, storing, transmitting, verifying, updating, recovering, wiping… Did I get all of it? It is going to take a while to get through all of that, do you reckon? Let’s split the subject and talk about password storage now, as the subject that comes most often in the security discussions and in the news.

Speaking of which, some recent break-ins if you were not keeping track:

"Enter Password"LinkedIn  – 6.5 million passwords stolen, Yahoo – 450 thousand passwords stolen, Android Forums – 1 million, Last.fm – 8 million, Nvidia – 400 thousand, eHarmony – 1.5 million, Billabong – 21 thousand, TechRadar … the list is going on and on.

Out of 8 million passwords in LinkedIn and Last.fm breach, “It took a user on the forum less than 2½ hours to crack 1.2 million of the hashed passwords, Ars Technica reported.”

Oops. Is that supposed to be so easy? Actually… no.

There are few easy rules for storing the passwords. First of all, never store passwords in clear, unencrypted, like Billabong did. You remember that any and every system was or will eventually be broken into. You have to assume that your password database will fall into wrong hands sooner or later. Your password database has to be prepared for that eventuality to look good in the eyes of the press.

So, when your password database is in the hands of the attackers, it has to defend itself. A database full of unencrypted passwords does not provide any defense of course. What about an encrypted database?

Well, since you have to be able to use the database, you have to decrypt it when you need it. So the system will have the key to the database somewhere. Since the attacker got hands onto the database, there is no reason why the attacker should not get the encryption keys at the same time. So this is definitely not improving the situation.

Secure hashes (as in the name of this blog) are the ultimate answer. The important thing about the hashes is that they do not require a use of a key and they can be easily computed only one way: from the clear piece of information into the hash. They cannot be reversed, one cannot easily compute the original piece of information from the hash. That’s why they are called one-way hashes.

The hashes were invented a long time ago and they were improving over the years. The old hashes are not secure anymore with the increases in the computing power. That’s what they talked about when they referred to recovering the plain text passwords – they computed passwords that will result in the hash that is in the database.

Finding the passwords then given a database of password hashes boils down to taking a password, computing its hash according to the algorithm used, and comparing it to the hashes stored in the database. When a match is found – we have a good password. This is where the cost of computing the hashes comes in. Older hashes are much faster, newer hashes are much slower. With the advent of rental cloud computing services this is becoming a small distinction though. All SHA-1 passwords of up to 6 characters in length could be brute forced in 49 minutes with the help of Amazon EC2 for a cost of $2 two years ago. And it’s getting cheaper and faster. So here is where the speed matters but it has the opposite effect. The hash, to be secure, must be a very, very slow one. Almost too slow to be useful at all would be a good start.

Even if the computer systems weren’t getting blistering fast compared to the blistering fast of five years ago all the time, a workaround was figured a long time ago. If you are prepared to invest in some large storage, you can compute slowly but surely an enormous amount of hashes and keep them somewhere. When the time comes, you just have to go and compare the hashes you computed in advance to the given hashes in the password database. This is called using rainbow tables. And it’s bloody effective.

Ok, ok, it is not all that gloomy. This fight is an old one and we have defenses. A very effective measure against the rainbow tables is to use a cryptographic salt. A salt is an additional piece of data supplied to the hash function together with the password. Since the attacker did not know the salt in advance, precomputed rainbow tables suddenly become useless. Great. Unfortunately, many sites use a fixed salt that is generated once and set in stone. This effectively makes rainbow tables useful again. One just has to compute them once with that salt again for the whole database. So the salt, to be useful, must be generated new for every password and stored together with the password.

So, finally, the answer is simple: a cryptographically strong, contemporary (as in “very, very slow”) one-way hash function with a randomly generated salt for every password. And anything deviating from that is just plain tomfoolery.