Worst languages for software security

I was sent an article about the programming languages that generate the most security bugs in software today. The article seemed to rely on a report by Veracode, a company I know well, to discuss which software security problems show up in applications written in different languages. That is an excellent question and a very interesting subject for discussion. Except that the article failed to discuss anything, making misleading and incoherent statements instead about both old-school languages like C/C++ and the PHP scripting language. I fear we will have to look into this problem ourselves.

So, what languages are the worst when it comes to software security? Are they the old C and C++, as so many proponents of Java would like us to believe? Or are they the new and quickly developing languages with little enforcement of structure, like PHP? Let’s go to Veracode and get their report: “State of Software Security. Focus on Application Development. Supplement to Volume 6”.

The report includes a very informative diagram showing what percentage of applications pass the OWASP policy for a secure application out of the box, grouped by the language of the application. The OWASP policy is defined as “not containing any of the security problems mentioned on the OWASP Top 10 most important vulnerabilities for web application”, and OWASP is the accepted industry authority on web application security. So if they say that something is a serious vulnerability, you can be sure it is. Let’s look at the diagram:

Figure: Veracode, percentage of applications passing the OWASP policy, by language

Fully 60% of applications written in C/C++ come without any of the most severe software security vulnerabilities listed by OWASP. That is a very good result and a notable achievement. Next, with pass rates one and a half to two times lower, come the three mobile platforms. The next actual programming language, .NET, comes out more than two times worse! Java is two and a half times worse than C/C++. The scripting languages are three times worse.

Think about it. Applications written in Java are about two and a half times less likely than those written in C/C++ to come without serious security vulnerabilities. And C/C++ is the only language that gives you a more than 50% chance of not having serious security vulnerabilities in your application.

Why is that?

The reasons are many. For one thing, Java has never delivered on its promises of security, stability and uniformity. People must struggle with issues that have long been resolved in other languages, like the idiotic memory management and garbage collection, and reinvent the wheel for any more or less non-trivial piece of software. The language claims to be “easy” and “fool-proof” while letting people unknowingly compare string object references, instead of string contents, with the equality operator. The discrepancy between fantasy and reality is huge in the Java world and getting worse all the time.

Still, the main reason, I think, is the quality of the developers: both their level of knowledge and expertise, as it were, and the sheer carelessness of Java programmers. Where C/C++ developers are actually masters of software development, Java developers are most of the time just coders. That makes a difference. People learn Java in all sorts of courses or by themselves; companies constantly hire Java developers, so it makes sense to follow the market demand. Except that those people are kids with ad-hoc knowledge of a programming language and absolutely no concept of software engineering. In contrast, most C/C++ people are actually engineers, and they know much better what they are doing, even when they write in a different language. But “coders” are much cheaper than real engineers, so companies developing in Java end up with lots of them, and software quality goes down the drain.

The difference in software quality is readily apparent when you compare the diagrams of the issue types detected, taken mostly from the same report:

Figure: Veracode, problem areas by language

You can see that code quality problems make up only 27% of the issues detected in C/C++ code, while for Java code quality issues represent a whopping 80% of the total.

Think again. Code written in Java is of several times worse quality than code written in C/C++.

It is not surprising that quality problems result in security vulnerabilities. Quality and security go hand in hand, and both require discipline and knowledge on the part of the developer. Where one suffers, the other inevitably does as well.

The conclusion: if you want secure software, you want C/C++. You definitely do not want Java. And even if you are stuck with Java, you still want to have C/C++ developers to write your Java code because they are more likely to write better and more secure software.

Passwords and other secrets in source code

Secrets are bad. Secrets in source code are an order of magnitude worse.

Secrets are difficult to protect. Every attacker goes after the secrets and we must protect our secrets against all of them. The secrets are the valuable part of our software and that’s why they are bad – they represent an area of heightened risk.

What would a developer do when his piece of software needs to access a password-protected server? That’s right, he will write the user name and the password into some constants and compile them into the code. If the developer wants to be clever and has heard the word “security” before, he will base64-encode the password to prevent it from “being read by anyone”.

The bad news is, of course, that whoever goes through the code will be able to follow the algorithm and the data and recover the user name and password. Most software is available to anyone in source form, so it is not a stretch to assume an attacker will have it as well. Moreover, with the current level of binary code scanning tools, attackers need neither the source code nor any manual work. Source and binary scanners pick out user names and passwords easily, even when obscuring algorithms are used.
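To see how little base64 buys you, here is a minimal Python sketch (Python is simply my choice for illustration). The constant names, the password and the regular expression are all hypothetical; the regex is a toy stand-in for real source and binary scanners, which are far more thorough.

```python
import base64
import re

# Hypothetical constants, as a developer might "protect" them in the source.
DB_USER = "service"
DB_PASSWORD_B64 = "czNjcjN0LXBhc3N3MHJk"  # base64 of "s3cr3t-passw0rd"


def recover(encoded: str) -> str:
    """Undo the "protection": base64 is an encoding, not encryption."""
    return base64.b64decode(encoded).decode("utf-8")


# Toy scanner: string literals assigned to names mentioning PASSWORD, SECRET, KEY or TOKEN.
SECRET_ASSIGNMENT = re.compile(
    r'^\s*(\w*(?:PASSWORD|SECRET|KEY|TOKEN)\w*)\s*=\s*"([^"]*)"',
    re.IGNORECASE | re.MULTILINE,
)


def scan(source: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs for suspicious assignments in a source snippet."""
    return SECRET_ASSIGNMENT.findall(source)


SOURCE = '''
DB_USER = "service"
DB_PASSWORD_B64 = "czNjcjN0LXBhc3N3MHJk"
'''

for name, value in scan(SOURCE):
    print(name, "->", recover(value))  # DB_PASSWORD_B64 -> s3cr3t-passw0rd
```

Twenty-odd lines are enough to find and decode the “hidden” password; a determined attacker has much better tools than this.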

So, the password you store in the source code is readily available. It’s really like placing the key to your home under the doormat. It’s that obvious.

Now, you have shipped that same code to every customer. That means the same password works at every one of those sites. Your customers, and whoever else gets the software into their hands, can access all of the sites that have your software installed with the same password. And to top it off, you have no way of changing the password short of releasing a new version with the new password inside.
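The alternative is to give each installation its own credentials and read them at run time. A minimal sketch, assuming the credentials arrive through environment variables with the hypothetical names APP_DB_USER and APP_DB_PASSWORD; a secrets manager or a permission-restricted configuration file would do the same job.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class Credentials:
    user: str
    password: str = field(repr=False)  # keep the password out of logs and tracebacks


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Credentials:
    """Read this installation's credentials at run time instead of compiling them in.

    APP_DB_USER and APP_DB_PASSWORD are hypothetical variable names.
    """
    env = os.environ if env is None else env
    try:
        return Credentials(user=env["APP_DB_USER"], password=env["APP_DB_PASSWORD"])
    except KeyError as missing:
        # Fail closed: there is deliberately no built-in fallback password.
        raise RuntimeError(f"missing required setting {missing}") from None
```

Each site sets its own values, so one leaked password opens one site, not all of them, and changing it becomes a configuration change rather than a new release.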

Interestingly, Facebook had this as one of their main messages to the attendees of the F8 Developers Conference: “Facebook security engineer Ted Reed offered security suggestions of a more technical nature. Reed recommended that conference attendees—particularly managers or executives that oversee software development—tell coders to remove any secret tokens or keys that may be lurking around in your company’s source code.”

Which means the story is far from over. Mainstream applications continue to embed secrets in their source code, defying the attempts to make our software world secure.

The thought of compiling the user names and passwords into the application should never cross your mind. If it does, throw it out. It’s one of those things you just don’t do.

Google bots subversion

There is a lot of truth in saying that every tool can be used for good and for evil. There is no point in blocking the tools themselves, as attackers will turn to new tools and subvert familiar ones in unexpected ways. Now Google’s crawler bots have been turned into such a weapon, executing SQL injection attacks against websites chosen by attackers.

The discussion of whether Google should or should not do anything about that is interesting, but we are not going to go into it. Instead, consider that this is a prime case of a familiar tool that regularly comes back to your website, subverted into doing something evil. You did not expect that to happen, and you cannot just block Google from your website. This is a perfect example of a security attack where your application security is the only way to stop the attacker.

The application must be written in such a way that it does not matter whether it is protected by a firewall – you will not always be able to block the attacks with the firewall. The application must also be written so that it withstands an unanticipated attack, something that you were not able to predict in advance would happen. The application must be prepared to ward off things that are not there yet at the time of writing. Secure design and coding cannot be replaced with firewalls and add-on filtering.

Only such securely designed and implemented applications withstand unexpected attacks.
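For SQL injection in particular, the defense sits in the application code no matter who sends the request. The crawler merely follows a link the attacker planted, so the payload arrives as an ordinary request parameter. A minimal sketch using Python’s standard sqlite3 module; the products table and the payload are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO products (name) VALUES (?)", [("lamp",), ("desk",)])
    return conn


def find_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the request parameter is pasted straight into the SQL text.
    return conn.execute(f"SELECT id, name FROM products WHERE name = '{name}'").fetchall()


def find_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT id, name FROM products WHERE name = ?", (name,)).fetchall()


conn = make_db()
payload = "x' OR '1'='1"  # e.g. the value of ?name=... in a link planted for a crawler
print(find_unsafe(conn, payload))  # every row: the injected OR clause matches all of them
print(find_safe(conn, payload))    # []: no product is literally named that
```

The parameterized version does not care whether the request came from a human, an attacker or a search engine bot, which is exactly the property a firewall cannot give you.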

Security Assurance vs. Quality Assurance

It is often debated how Quality Assurance relates to Security Assurance. I have a slightly unconventional view of the relation between the two.

You see, when we talk about the security assurance in software, I view the whole process in my head end to end. And the process runs roughly like this:

  • The designer has an idea in his head
  • The software design is a translation of that into a document
  • Development translates the design into the code
  • The code is delivered
  • Software is installed, configured and run

Security, in my view, is the process of making sure that whatever the designer was thinking about in his head ends up actually running at the customer site. The software must run exactly the way the designer imagined, that is the task.

Now, the software has to run correctly both under the normal circumstances and under really weird conditions, i.e. under attack. So the Quality Assurance takes the part of verifying that it runs correctly under normal circumstances while Security Assurance takes care of the whole picture.

Thus Quality Assurance becomes an integral part of Security Assurance.

Password recovery mechanisms – Part 3

Passwords remain the main means of authentication on the internet. People often forget their passwords and then they have to recover their access to the website services through some kind of mechanism. We try to make that so-called “password recovery” simple and automated, of course. There are several ways to do it, all of them but one are wrong. Let’s see how it is done.

If you have not read Part 1 – Secret questions and Part 2 – Secondary channel, I recommend you do so before reading on.

Part 3 – Example procedure: put it all together

Security - any lock matters as much as any other.

Let’s assume we are putting together a website where passwords are stored in salted hash form and users are identified by their e-mail address. I will describe what I think a good strategy for password recovery is in that case, and you are welcome to comment and improve upon it.
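For reference, a minimal sketch of what “salted hash form” can look like, assuming PBKDF2-HMAC-SHA256 from Python’s standard hashlib; bcrypt, scrypt or Argon2 are common alternatives. The iteration count is an assumption to be tuned for your hardware.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumption: tune so one hash takes a noticeable fraction of a second


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, hash) to store; a fresh 16-byte random salt is used unless one is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)
```

The per-user salt means two users with the same password get different hashes, and a stolen database cannot be attacked with one precomputed table.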

Since we have the users’ e-mail addresses, that is the natural secondary authentication channel. So if a user needs password recovery, we will use their e-mail to authenticate them. Here is how.

The user comes to the login page and clicks the link for “forgot password” or similar. They then have to provide an e-mail address. The e-mail submission form has to have a means of countering automated exhaustive searches, both to lower the load on the server in case of an attack and to provide some small level of discouragement against such attacks. Two ways come to mind: using a CAPTCHA and slowing down the form submission with a random, large (on the order of seconds) delay. Let’s not go into the holy war on CAPTCHA; you are welcome to use any other means you can think of and, please, suggest them so that others can benefit from your thoughts here. You should also provide an additional hidden field that an automated form-filling robot will fill in, so you can detect that too and discard the request. Anyway, the important part is: slow down the potential attacker. The person going through recovery will not mind if it takes a while.
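The two cheap measures from the paragraph above, the hidden trap field and the random delay, might be sketched like this. The field name website and the 2 to 5 second range are assumptions for illustration; the sleep function is a parameter only so the delay can be tested without waiting.

```python
import random
import time
from typing import Callable, Mapping

HONEYPOT_FIELD = "website"  # hypothetical hidden field: humans never see it, naive bots fill it


def looks_automated(form: Mapping[str, str]) -> bool:
    """True if the hidden honeypot field came back non-empty."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


def throttle(min_s: float = 2.0, max_s: float = 5.0,
             sleep: Callable[[float], None] = time.sleep) -> float:
    """Wait a random delay of a few seconds to slow down exhaustive searches."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

A request for which looks_automated(form) is true is simply discarded. In an asynchronous server you would await a non-blocking sleep instead, so that the delay does not tie up a worker.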

As the next step, we look up the user’s e-mail address in the database, create a random token, mail it out and provide feedback to the user. The feedback should be given in constant time, so that an attacker cannot use your recovery mechanism to collect valid e-mail addresses from your website. The process should thus take the same time whether the user was found or not. This is difficult to get right, and the best solution is to store the request for off-line processing and return immediately. Another way is to use user names instead and look up the e-mail address, but a user is more likely to know their own e-mail address than to remember their user name, so there is a caveat. If you cannot (or would not) do off-line processing of requests, you should at least measure your server and try to make the timing similar using delays. Server timing can be measured fairly precisely, and this is difficult to get right, especially under fluctuating load, but you must give it a try. Still, it’s best to keep the submitted information and trigger off-line processing while telling the user something along the lines of “if your e-mail is correct, you will receive an automated e-mail with instructions within an hour”. The feedback should never say whether the e-mail is correct or not.
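The off-line variant can be as simple as recording the request and returning the same message every time. A minimal sketch, assuming an in-process queue; the background worker that looks up the address and sends the mail is not shown, and in production a durable job queue would take the place of queue.Queue.

```python
import queue

GENERIC_REPLY = (
    "If your e-mail is correct, you will receive an automated e-mail "
    "with instructions within an hour."
)

# Requests are only recorded here; a background worker (not shown) later looks the
# address up and sends the token, so the reply never depends on whether it exists.
pending: "queue.Queue[str]" = queue.Queue()


def request_recovery(email: str) -> str:
    """Record the request and return the same reply for every address."""
    pending.put(email.strip())
    return GENERIC_REPLY
```

Because the handler does no database lookup at all, there is no timing difference between known and unknown addresses for an attacker to measure.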

Now we generate a long, random, cryptographically strong token. It must be cryptographically strong because the user may actually be an attacker, and if he can guess how we generate tokens and do the same, he will be able to reset passwords for arbitrary users. We generate the token, format it so that it can be e-mailed (base64 encoding, hex, whatever) and store it in a database together with a timestamp and its origin (the e-mail address). The same token is then e-mailed to the user’s e-mail address.
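A minimal sketch of the token generation step, assuming Python’s secrets module, which draws from the operating system’s cryptographically secure random source. token_urlsafe already returns URL-safe base64, so the token can go straight into an e-mail or a link; make_record is a hypothetical helper showing what goes into the token table.

```python
import secrets
import time


def new_reset_token(nbytes: int = 32) -> str:
    """Return a URL-safe base64 token made from `nbytes` bytes of OS randomness."""
    # Use `secrets`, never `random`: the latter is predictable from its past outputs.
    return secrets.token_urlsafe(nbytes)


def make_record(email: str) -> dict:
    """Hypothetical row for the token table: token, origin and creation time."""
    return {"token": new_reset_token(), "email": email, "created": time.time()}
```

With 32 random bytes (256 bits) guessing a live token is out of the question; the remaining risks are in how the token is stored, checked and expired.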

The user receives the token, comes to the website and goes to the token verification form. Here he has to enter his e-mail address again, of course, along with the token and the new password. In principle, some measure against automated searches is in order here too, to lower the load on the server in case of an attack. The token is looked up in our database first, and then the e-mail is checked. If we find the token, we remove it from the database regardless; then we check whether the e-mail matches and continue only if it does. This way, tokens are single-use: once we see a token, it is removed from the database and cannot be used again.

Tokens also expire. Our server must have a policy that sets the expiration period; let’s say it is 24 hours. Before we do any lookup in our token database, we run a query that removes all tokens with a creation timestamp older than 24 hours. That way, any expired token is already gone from the database when we start looking.
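The verification rules from the last two paragraphs (purge expired tokens first, consume the token as soon as it is seen, then check the e-mail) can be sketched with an in-memory class standing in for the token table. ResetTokenStore and its methods are hypothetical names; the clock is injectable only so the expiry can be tested.

```python
import secrets
import time
from typing import Callable, Dict, Tuple

TOKEN_TTL_SECONDS = 24 * 60 * 60  # the 24-hour policy from the text


class ResetTokenStore:
    """In-memory stand-in for the token table (a sketch; use your database in practice)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}  # token -> (email, created)

    def issue(self, email: str) -> str:
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (email, self._clock())
        return token

    def _purge_expired(self) -> None:
        # Drop every token created more than TOKEN_TTL_SECONDS ago.
        cutoff = self._clock() - TOKEN_TTL_SECONDS
        self._tokens = {t: rec for t, rec in self._tokens.items() if rec[1] >= cutoff}

    def redeem(self, token: str, email: str) -> bool:
        """True only for a live token issued to `email`; a found token is consumed either way."""
        self._purge_expired()                   # expired tokens are gone before we look
        record = self._tokens.pop(token, None)  # single use: removed as soon as it is seen
        return record is not None and record[0] == email
```

Removing the token before checking the e-mail means an attacker who holds a token gets exactly one try with it, matching or not.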

Well, now, if the token matches and the e-mail is correct, we can look up the user in our password database and update the password hash to the new value. Then we flush the authentication tokens and session identifiers for the user, forcing logout of all pre-existing sessions. Simple.
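The final step, sketched with hypothetical in-memory users and sessions tables and assuming PBKDF2-HMAC-SHA256 from hashlib for the salted hash. It should only run after the token check has succeeded.

```python
import hashlib
import secrets
from typing import Dict, Tuple

# Hypothetical in-memory tables standing in for the real user and session storage.
users: Dict[str, Tuple[bytes, bytes]] = {}  # email -> (salt, password hash)
sessions: Dict[str, str] = {}               # session id -> email


def complete_reset(email: str, new_password: str) -> None:
    """Store a fresh salted hash of the new password, then log the user out everywhere."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", new_password.encode("utf-8"), salt, 600_000)
    users[email] = (salt, digest)
    # Flush every session of this user, so whoever held the old password is kicked out.
    # A real system would also revoke remember-me cookies and API tokens here.
    for sid in [s for s, owner in sessions.items() if owner == email]:
        del sessions[sid]
```

Logging out every existing session matters most when the reset happens because the account was already compromised.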

Password recovery mechanisms – Part 2

Passwords remain the main means of authentication on the internet. People often forget their passwords and then they have to recover their access to the website services through some kind of mechanism. We try to make that so-called “password recovery” simple and automated, of course. There are several ways to do it, all of them but one are wrong. Let’s see how it is done.

If you have not read Part 1 – Secret questions, I recommend you do so before reading on.

Part 2 – Secondary channel

A second way to do recovery is to use a secondary channel for authentication. Once authenticated on this secondary channel, the password for the primary channel can be changed. The secondary channel may be slower and more cumbersome but since it is used rarely it is not a problem.

You could ask the person to call user support. The user support would ask some questions for personal information and compare the answers with what they have on file. That would effectively reduce the system to the “secret questions” described in Part 1. There are better (and cheaper) ways to do it.

Historically, the server usually stores the e-mail address the user provided at registration. That is what becomes the secondary channel. Although it still runs over the Internet, capturing e-mails on their way to the intended recipient is not a trivial task unless you control one of the nodes through which the e-mail is routed.

Originally, passwords were stored in plaintext on the server, and the user could request the password to be e-mailed. Some services still operate like that. The notorious Mailman list server e-mails you your plaintext password once a month in case you forgot it. That is convenient but has a bit of a security problem, of course: should an attacker obtain the password database, all passwords to all accounts are immediately known. On the other hand, it has the advantage that user passwords do not actually change, so if someone requests a password reminder, the original subscriber receives an e-mail and that’s all.

The inventive thought then went to the idea of hashing the passwords for storage, which is a great idea in itself and protects the passwords in case the database gets stolen. It has the side effect that the password is suddenly no longer known to the server. Only the hash is. That is sufficient for authentication but isn’t very helpful if you want to mail out a password reminder. So, someone had the bright idea that the password reminder should become a password reset. What they did is: when a user requests it, the server generates a new password, sends it to the user and changes the hash in the database to the new password’s hash. All secure and … very prone to denial of service attacks. Basically, anyone can now request a password reset for any user at will, and that user’s password will get changed. Very annoying.

So we went further and decided that changing the password right away is not such a good idea. Instead, we keep a separate database of single-use tokens. When a user requests a password change, we generate a unique random token, keep it in the database and send it out to the user. If the user did not request a token, they need not react: the password has not changed, and the token will harmlessly expire some time later. When the user does need a password change, he provides the token back to the service in a password change form (or through a clicked URL), which lets us perform this secondary authentication and then change the primary password. And that’s the way to do it.

There are variations where the secondary channel can be an SMS, an automated telephone call, or even an actual letter from the bank. But the important thing is that those messages only provide a token that verifies your identity on the secondary channel before allowing a security relevant operation on the primary channel.

Next, we will look at an example procedure for a website in Part 3.

Password recovery mechanisms – Part 1

Passwords remain the main means of authentication on the internet. People often forget their passwords and then they have to recover their access to the website services through some kind of mechanism. We try to make that so-called “password recovery” simple and automated, of course. There are several ways to do it, all of them but one are wrong. Let’s see how it is done.

Part 1 – Secret questions

A widespread mechanism is to use so-called “secret questions”. This probably originates with the banks and their telephone service where they ask you several questions to compare your knowledge of personal information with what they have on file. In the times before the internet this was a fair mechanism since coming up with all the personal information was a tough task that often required physically going there and rummaging through the garbage cans to find out things. Still, some determined attackers would do precisely that – dumpster diving – and could gain access to the bank accounts even in those times.

Today this mechanism is, of course, a total fallacy. The internet holds so much information about you… It is hard to imagine that questions about your private life would remain a mystery to an attacker for long. Your birthday, your dog, your school and schoolmates, your spouse and your doctor: they are all there. It is hard to come up with a generic question that would suit everyone and at the same time would not have the answer printed on your favorite social network page.

And even if the answer is not out there, imagine that the secret question is “what’s your dog’s name?” How many dog names are there? Far fewer than there are letter combinations in a password. And the most common dog names are probably only a handful. So it is much easier to brute-force a security question than a password.
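To put rough numbers on it, compare the search spaces in bits. This assumes a generous list of 1,000 plausible dog names chosen uniformly (a made-up figure; real names cluster around a few favorites, which makes guessing even easier) against a password of 8 characters drawn uniformly from the 94 printable ASCII symbols.

```python
import math


def bits(search_space: int) -> float:
    """Bits of entropy if the secret is chosen uniformly from `search_space` options."""
    return math.log2(search_space)


DOG_NAMES = 1_000         # assumption: a generous list of plausible dog names
PASSWORD_SPACE = 94 ** 8  # 8 random characters from the 94 printable ASCII symbols

print(f"dog name: {bits(DOG_NAMES):.1f} bits")       # dog name: 10.0 bits
print(f"password: {bits(PASSWORD_SPACE):.1f} bits")  # password: 52.4 bits
```

Every bit doubles the attacker’s work, so a gap of over 40 bits is a factor of trillions, and that is before accounting for how unevenly dog names are actually distributed.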

This mechanism of secret questions and answers is antiquated and should not be used.

There is a variation where you have to provide your own question and your own answer. This is no better. Most people will tend to pick the obvious questions anyway. The attacker will see the question and can dig for information. The answer will usually be that one word that is easy to brute-force. So, no good.

And, by the way, what should you do when you are presented with this folly on a website you use? Provide a strong password instead of the answer. Store that password the same way you store all your other passwords. All the usual rules for password management apply.
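If you need such a stand-in answer and your password manager does not generate one, a few lines of Python will do. A minimal sketch; the 20-character length and the letters-and-digits alphabet are arbitrary choices.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def random_answer(length: int = 20) -> str:
    """A random "answer" to paste into a secret-question field; keep it in your password manager."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Twenty characters from a 62-symbol alphabet give about 119 bits, far beyond anything a dog’s name offers.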

So much for secret questions. In the next part, we will see how to do password recovery with a secondary channel.

Coverity reports on Open Source

Coverity runs a source code scanning project started by the U.S. Department of Homeland Security in 2006, a Net Security article reports. They recently published their report on quality defects, pointing out some interesting facts.

Coverity focuses mainly on code quality, but they also report security problems. Then again, any quality problem can easily become a security problem under the right (or, rather, wrong) circumstances. So the report is interesting for its security implications.

Open Source projects are notably better at code quality than corporations. Apparently, a corporation can achieve the same level of quality as Open Source by going with Coverity tools. That is an interesting marketing twist; and although the subject of Open Source superiority has been beaten to death, this report deals it another blow.

Another interesting finding is that corporations only get better at code quality once the project size goes beyond 1 million lines of code. This is not so surprising, and it is good to have some data backing up the idea that corporate coders are either not motivated or not professional enough to write good code without some formalization of code production, testing and sign-off.

This is the necessary evil that hinders productivity at first but ensures an acceptable level of quality later.

Exodus from Java

Finally, the news I was subconsciously waiting for: the exodus of companies from Java has started. It does not come as a surprise at all. Java has never fulfilled the promises it made at the beginning. It delivered none of the promised portability, security and ease of programming. I am only surprised it took so long, although, knowing full well that company managers routinely optimize for their own careers and bonuses, that does not come as a shock either.

For those not in the know, the gist of the problem is that Java always promised some sort of “inherent security”. That is, no matter how badly you write the code, it will still be more secure than code written in C or other advanced high-level algorithmic languages. Java claimed the absence of buffer overflows, null pointer dereferences and similar problems, which all turned out not to be true after all. And that had a very important consequence.

The consequence is that anyone writing in Java or learning it is subconsciously aware of that promise. So people tend to relax and allow themselves to be sloppy. So the code written in C ends up being tighter, more organized, and more secure than the code written in Java. And the developers in C tend to be on average better educated in the intricacies of software development and more aware of potential pitfalls. And that makes a huge difference.

So, the punch line is, if you want something done well, forget Java.

SAMATE Reference Dataset

Through the news we can become alerted to many interesting things, and one of the recent useful bits is the SAMATE Reference Dataset built by the NIST Software Assurance Metrics And Tool Evaluation project. Should you need test cases for common vulnerabilities, the database has more than 80,000 of them by now.

From the project website:

The purpose of the SAMATE Reference Dataset (SRD) is to provide users, researchers, and software security assurance tool developers with a set of known security flaws. This will allow end users to evaluate tools and tool developers to test their methods. These test cases are designs, source code, binaries, etc., i.e. from all the phases of the software life cycle. The dataset includes “wild” (production), “synthetic” (written to test or generated), and “academic” (from students) test cases. This database will also contain real software application with known bugs and vulnerabilities. The dataset intends to encompass a wide variety of possible vulnerabilities, languages, platforms, and compilers. The dataset is anticipated to become a large-scale effort, gathering test cases from many contributors

Isn’t it good when you do not need to reinvent the wheel?
