Heartbleed: What can be Learned from Playing the “Blame Game”?
By now, everybody who hasn’t been living under a rock since April 7th this year has heard of Heartbleed. Most know that it is a devastating blow to security which can lead to the loss of a wealth of sensitive information from affected servers and that vulnerable machines were ubiquitous at the time of release. Many know that it affects versions of OpenSSL deployed over the last couple of years, while fewer know that it was a buffer over-read vulnerability extracting information from an area of process memory known as the “heap”.
For me, the most fascinating thing about Heartbleed is that it is so banal. Such a mundane, textbook, well understood and well-worn coding error. A payload (message) sent by the client must be reflected back by the server. By telling the server that the payload is longer than it really is, a client leads the server to read and reflect the message plus whatever was hanging around in nearby memory. What most people don’t seem to appreciate is that this isn’t some extraordinary, deeply hidden vulnerability that is fiendishly difficult to exploit or detect; once any researcher who can use a fuzzer turns their hand to the Heartbeat extension, they should surely spot the vulnerability … and yet it sat in open source software for two years until this finally happened.
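To make the mechanics concrete, here is a minimal sketch in Python rather than C. Python slices are bounds-checked, so process memory is simulated with a single bytes object. The names HEAP, SECRET and vulnerable_heartbeat are hypothetical, and this is an illustration of the class of flaw, not OpenSSL’s code.

```python
# Simulated "heap": the client's 4-byte payload happens to sit right before
# unrelated sensitive data belonging to the same process (made-up values).
PAYLOAD = b"bird"
SECRET = b"user=alice;password=hunter2"
HEAP = PAYLOAD + SECRET


def vulnerable_heartbeat(claimed_length: int) -> bytes:
    """Echo back claimed_length bytes starting at the payload.

    The flaw: claimed_length comes from the client and is never compared
    with the number of payload bytes that actually arrived.
    """
    payload_offset = 0
    return HEAP[payload_offset:payload_offset + claimed_length]


honest = vulnerable_heartbeat(len(PAYLOAD))  # client tells the truth
leaky = vulnerable_heartbeat(len(HEAP))      # client lies about the length

print(honest)  # b'bird'
print(leaky)   # b'bird' followed by the neighbouring secret
```

The honest request gets back only its own four bytes; the lying request gets the payload plus everything the simulated heap held after it.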
So the question is, “how did this happen?” or the synonymous “who is to blame?”. This isn’t about going after people or shaming them. It is important because it highlights a slew of mistakes which can be fixed going forward, in the hope that they can be avoided in the future.
Robin Seggelmann was instrumental both in the protocol design and the implementation in OpenSSL, so many have jumped to blaming him for this whole mess. The fact is that he is a long-standing contributor to OpenSSL who tries to help improve it, which is more than can be said for almost all users of the software, who simply take it for granted. The more one helps, the further their head is above the parapet and the more likely they are to eventually make mistakes. For this reason alone, the community needs to come down hard on anyone with the knee-jerk reaction of blaming the team, as this will only make others think twice about contributing and will exacerbate the current situation.
I believe this bug first began its life in February 2012 with the submission of RFC 6520 by Robin Seggelmann et al., which detailed an extension to the TLS and DTLS protocols (modern SSL). This describes a “keep-alive” feature for these protocols, essentially allowing them to be more efficient in terms of CPU and bandwidth costs in several circumstances. Section 4 details the messages themselves: the three important fields are the payload_length, payload, and padding.
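As a rough illustration of that message layout, the sketch below serialises a heartbeat request in Python. The one-byte type, two-byte big-endian payload_length, and 16-byte minimum padding follow RFC 6520; the function name build_heartbeat and the example payload are hypothetical.

```python
import os
import struct

HEARTBEAT_REQUEST = 1  # message type value for a request in RFC 6520
MIN_PADDING = 16       # RFC 6520 requires at least 16 bytes of padding


def build_heartbeat(payload: bytes, padding_len: int = MIN_PADDING) -> bytes:
    """Serialise type (1 byte), payload_length (2 bytes), payload, padding."""
    header = struct.pack("!BH", HEARTBEAT_REQUEST, len(payload))
    return header + payload + os.urandom(padding_len)  # random padding


msg = build_heartbeat(b"bird")
msg_type, payload_length = struct.unpack("!BH", msg[:3])
print(msg_type, payload_length, len(msg))  # 1 4 23
```

Note that payload_length is simply a number written by the sender; nothing in the encoding itself ties it to the bytes that follow.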
Possible Blame #1
Section four mandates that the padding (filling to make the packet the right size) be random content. The implication of this is that the receiving server cannot automatically discriminate padding from the message, and therefore the payload_length parameter was also required. This is the first seed of the vulnerability, making it logically much easier for the programmer to trust the payload_length, which is controlled by an attacker. Had the requirement been to pad with zeros, and for the payload to contain no zeros, the payload could just be read up to the first null and that would be that: no space for an attacker to manipulate a length parameter, and much harder for a programmer to make the mistake.
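The zero-padding alternative can be sketched as follows. This is a hypothetical scheme, not anything in RFC 6520, and it assumes the rule suggested above: padding is all zeros and the payload contains no zero bytes. The function name read_payload_zero_padded is invented for the example.

```python
def read_payload_zero_padded(body: bytes) -> bytes:
    """Return the payload from a body of the form payload + zero padding.

    Hypothetical scheme: no length field exists, so the receiver simply
    reads up to the first zero byte within the bytes it actually received.
    """
    end = body.find(b"\x00")
    if end == -1:
        # No padding at all: treat the message as malformed.
        raise ValueError("missing zero padding")
    return body[:end]


body = b"bird" + b"\x00" * 16
print(read_payload_zero_padded(body))  # b'bird'
```

Even under this scheme, a C implementation would still have to bound the search by the number of bytes received; the point is only that there is no sender-supplied length left to trust.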
If the pros of random padding really do outweigh those of null padding, then at least the RFC could have said “If the payload_length is longer than payload minus the padding, the HeartbeatMessage MUST be discarded”. Wording this is actually a little more complicated, because padding has a minimum length but no maximum length in the specification. But you get the idea. Essentially the specification itself forgot about this problem, so it is no surprise that so too did a programmer.
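A discard rule of this kind might look like the following sketch. It assumes the record consists of a 3-byte header, the payload, and at least 16 bytes of padding (the minimum from RFC 6520); the function name safe_heartbeat is hypothetical and this is not OpenSSL’s code.

```python
import struct

HEADER_LEN = 3    # 1-byte type + 2-byte payload_length
MIN_PADDING = 16  # minimum padding length in RFC 6520


def safe_heartbeat(record: bytes):
    """Return the payload to echo, or None if the message must be discarded."""
    if len(record) < HEADER_LEN + MIN_PADDING:
        return None  # too short to hold even an empty payload plus padding
    _msg_type, payload_length = struct.unpack("!BH", record[:HEADER_LEN])
    if HEADER_LEN + payload_length + MIN_PADDING > len(record):
        return None  # claimed length does not fit in what was received
    return record[HEADER_LEN:HEADER_LEN + payload_length]


honest = struct.pack("!BH", 1, 4) + b"bird" + bytes(16)
lying = struct.pack("!BH", 1, 0xFFFF) + b"bird" + bytes(16)
print(safe_heartbeat(honest))  # b'bird'
print(safe_heartbeat(lying))   # None
```

Because padding has only a minimum length, the check compares the claimed payload length against the size of the whole received record rather than trying to locate where the padding begins.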
Possible Blame #2
Seggelmann then endeavoured to implement this new protocol extension in OpenSSL, helping to keep this free product at the leading edge of implementations. This patch was one of several which were approved after review by Dr Stephen Henson. Much speculation has been made about the irrelevant fact that these patches were finally approved in the last minute before the new year. The fact is, most of the code and probably the heartbeat implementation was reviewed well before this.
Can we blame Henson for not spotting this? Perhaps, but anyone who has reviewed code knows that mistakes can be made, especially when appreciating the vast undertaking that code review actually is. Added to this, the offending line employs a C macro to grab and copy the data which has been used all over the OpenSSL code base, so would be unlikely to be focused upon. You’d have to be incredibly smart and lucky to spot this from a dead-listing of the code without knowing you’re looking for it, unless you have an awful lot of time on your hands.
So who or what can we blame? I think this is best laid at the feet of the process. Code review is one way to spot issues, but it is not the only way and in this case almost certainly not the best way. Had the process incorporated rigorous unit tests and protocol fuzzing, the buffer over-read would almost certainly have been found automatically. I would hazard that most of the OpenSSL project team already thinks this would be a great idea, if only they had the manpower available. Unfortunately, despite the massive scale on which this software is deployed for commercial gain, almost none of the companies who rely on it bother to provide any time or money towards the project. According to Steve Marquess, the contributor handling business aspects of the OpenSSL Software Foundation, donations are typically just $2000 a year, which takes us on to the next point.
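To give a feel for how little fuzzing it takes, here is a toy sketch in Python. The handler is a deliberately buggy stand-in with simulated adjacent memory, padding is omitted for brevity, and the names handle_heartbeat, fuzz and SECRET are all hypothetical; real protocol fuzzing of a C library would look quite different.

```python
import random
import struct

SECRET = b"-----PRIVATE KEY-----"  # made-up neighbouring memory contents


def handle_heartbeat(record: bytes) -> bytes:
    """Deliberately buggy handler: trusts payload_length from the wire."""
    _msg_type, payload_length = struct.unpack("!BH", record[:3])
    memory = record[3:] + SECRET  # simulated adjacent heap contents
    return memory[:payload_length]


def fuzz(handler, rounds: int = 1000, seed: int = 0):
    """Send random length/payload pairs; return the first failing input."""
    rng = random.Random(seed)
    for _ in range(rounds):
        payload = bytes(rng.randrange(256) for _ in range(rng.randrange(32)))
        claimed = rng.randrange(64)
        record = struct.pack("!BH", 1, claimed) + payload
        response = handler(record)
        # Invariant: the reply may never contain more than was actually sent.
        if len(response) > len(payload):
            return claimed, payload
    return None


print(fuzz(handle_heartbeat) is not None)  # True: the over-read is caught
```

The only insight the test needs is a single invariant: a reply can never be longer than the payload that was sent. Random inputs do the rest.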
Possible Blame #3
Facebook, Instagram, Pinterest, Tumblr, Google, Yahoo, Amazon Web Services, GoDaddy, Flickr, Minecraft, Netflix, YouTube, Dropbox, IFTTT, OKCupid, SpiderOak. What do all these have in common? You guessed it, they were all affected by the Heartbleed bug. Surely this makes them victims rather than someone to blame? Not nearly. The OpenSSL foundation offers support contracts, work-for-hire schemes and other ways to get resources to aid development which are not begging for hand-outs. Unfortunately most companies who rely on this product do not lift a finger to support it, and rather “nag us for free consulting services” (Steve Marquess’ words).
The biggest failing highlighted by the Heartbleed bug is that those who use OpenSSL have not given enough back. This is not even a matter of ethics, something which is lost on most board members, but a clear business case. By employing professionals from the OSF or specialised security consultants who are free to report issues back to the OpenSSL project, risks can be mitigated before a costly scramble to fix them. The contributing company gets PR gold in the technical arena, rather than a loss of confidence from consumers when it has to admit that it doesn’t care enough about the security of its customers’ details to pay to ensure it. That admission is embarrassing and hits the bottom line.
If the OSF were given the kind of financial and manpower support it deserves, it would easily be able to incorporate proper unit testing, multiple reviewers for each patch, regular security testing at the protocol level, and more. The tangible benefits to participating businesses should far outweigh the costs, not to mention the benefit to the wider community and an overall increase in security across the globe.
For more detailed discussions of the failure of commercial stakeholders to support OpenSSL and what can be done about it, a good starting point is the following article by Christina Warren: http://mashable.com/2014/04/14/heartbleed-open-source/