Back in 2012, Juliano Rizzo and Thai Duong announced the CRIME attack, a compression attack against TLS/SSL, and proved that selected parts of HTTPS traffic could be recovered through a side-channel attack. CRIME was mitigated by disabling TLS/SSL-level compression in most browsers. This year at Black Hat, a new attack called BREACH (Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext) was announced, and it commanded the attention of the entire industry. The presentation, titled “SSL, Gone in 30 Seconds,” has not been well understood, and there seems to be some confusion about how to mitigate the problem. This article therefore takes a detailed look at how serious the attack is, how it works, how practical it is, and what needs to be done to mitigate it. So let’s have a look.
Unlike previously known attacks such as BEAST and Lucky Thirteen, BREACH is not an attack against TLS; it is an attack against HTTP-level compression. If you are familiar with the padding oracle attack, BREACH is fairly easy to understand, since it also uses the server as an oracle, this time leaking information through the size of compressed responses. A BREACH attack can extract login tokens, email addresses, and other sensitive information from TLS-encrypted web traffic in as little as 30 seconds, depending on the number of bytes to be extracted. The attacker needs to trick the victim into visiting a malicious link and must also be able to observe the victim's encrypted traffic, for example from the same network.

Before going into the details, let me explain some background. Web pages are generally compressed before the responses are sent out, a technique called HTTP compression, primarily to make better use of the available bandwidth and to speed up transmission. The browser tells the server, through the “Accept-Encoding” header, which compression methods it supports, and the server compresses the content accordingly before sending it. If the browser does not support any compression, the response is sent uncompressed. The most commonly used compression algorithms are gzip and deflate.
Accept-Encoding: gzip, deflate
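As a rough sketch of this negotiation, the hypothetical Python function below picks a coding from the Accept-Encoding header value and compresses the body with the standard-library gzip and zlib modules. It ignores q-values and other details of real header parsing, so treat it as an illustration rather than a server implementation.

```python
import gzip
import zlib


def negotiate_and_compress(accept_encoding: str, body: bytes) -> tuple[str, bytes]:
    """Pick a content coding the client advertised and compress the body with it.

    Hypothetical helper: q-values such as "gzip;q=0.5" are stripped and ignored.
    """
    # Turn "gzip, deflate" into {"gzip", "deflate"}, dropping any ";q=..." suffix.
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in offered:
        return "gzip", gzip.compress(body)
    if "deflate" in offered:
        # HTTP's "deflate" coding is the zlib format (RFC 1950), which zlib.compress produces.
        return "deflate", zlib.compress(body)
    # The client advertised no coding we support: send the body uncompressed.
    return "identity", body
```

For the header above, `negotiate_and_compress("gzip, deflate", page)` would return `"gzip"` together with the gzip-compressed page.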
When the content arrives, the browser decompresses it and processes it. So with SSL-enabled websites, the content is first compressed, then encrypted, and then sent. Encryption, however, does not hide the length: the size of the encrypted record grows with the size of the compressed content, so an eavesdropper can determine the compressed length even when it is wrapped in SSL.
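To see why encryption leaves the compressed length visible, the sketch below computes the size of a single TLS 1.2 record for two common cipher constructions. The overheads used (a 5-byte record header; for AES-GCM an 8-byte explicit nonce and a 16-byte tag; for AES-CBC with HMAC-SHA1 a 16-byte explicit IV, a 20-byte MAC, and padding to a 16-byte block) are the standard TLS 1.2 values, but the functions are illustrative and ignore fragmentation into multiple records.

```python
RECORD_HEADER = 5  # content type (1) + protocol version (2) + length (2)


def gcm_record_len(plaintext_len: int) -> int:
    """TLS 1.2 AES-GCM record: header + 8-byte explicit nonce + ciphertext + 16-byte tag."""
    # GCM is a stream-like mode: the ciphertext is exactly as long as the plaintext.
    return RECORD_HEADER + 8 + plaintext_len + 16


def cbc_record_len(plaintext_len: int, mac_len: int = 20, block: int = 16) -> int:
    """TLS 1.1/1.2 AES-CBC record with HMAC-SHA1: explicit IV + padded (data + MAC)."""
    # At least one padding byte (the padding-length byte) is always added,
    # so (data + MAC) is rounded up to the next block boundary strictly above it.
    padded = ((plaintext_len + mac_len) // block + 1) * block
    return RECORD_HEADER + block + padded
```

With AES-GCM, compressed responses of 400 and 407 bytes become records of 429 and 436 bytes, so the 7-byte difference passes straight through. With CBC, both map to a 453-byte record, so the padding blurs lengths to 16-byte granularity; this adds work for the attacker but does not by itself prevent the attack.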
How Does It Work?
The attack exploits the fact that text containing repeated terms compresses to a smaller size. Here is a small example that shows how deflate takes advantage of repeated terms to reduce the compressed size of the response.
Consider the search page below, which is available after logging in to this site:
Observe that the text highlighted in the red box is the username. Now enter any text (say “random”) and click “Search.”
The search term is reflected in the response, so you can control part of the response through the input parameter in the URL. Now imagine that the search term is “Pentesting” (which is the username in this case).
When the deflate algorithm compresses this response, it finds that the term “Pentesting” appears more than once. So instead of encoding it a second time, the compressor emits a back-reference that says “this text is found 101 characters ago.” This reduces the size of the compressed output. In other words, by controlling the search parameter, you can guess the username: the compressed size is smallest when the search parameter matches the username. This concept is the basis of the BREACH attack.
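The effect can be reproduced with Python's zlib module, which implements deflate. The page template below is a hypothetical stand-in for the search page in the screenshots: the username appears once in the header and the search term is reflected further down. A guess that matches the username should compress to fewer bytes than an unrelated guess of the same length.

```python
import zlib

# Hypothetical page template: the logged-in username appears in the header,
# and the search term from the URL is reflected further down the page.
TEMPLATE = (
    "<html><body><div class='user'>Logged in as: Pentesting</div>"
    "<p>Search results for: {term}</p><p>No results found.</p></body></html>"
)


def compressed_size(term: str) -> int:
    """Length in bytes of the deflate-compressed page for a given search term."""
    page = TEMPLATE.format(term=term)
    return len(zlib.compress(page.encode()))


# Both guesses are 10 characters long, so any size difference comes from compression.
for guess in ["Pentesting", "Qzxjvkwbfy"]:
    print(guess, compressed_size(guess))
```

The exact byte counts depend on the zlib version and compression level, but the matching guess comes out smaller because its second occurrence is encoded as a short (length, distance) back-reference instead of literal characters.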
Now let us see how an attacker would exploit this issue in practice to steal sensitive information. Consider the site below and assume a legitimate user has just signed in.
[Before signing in to the application]
[Search page, which is accessible after logging in]
As shown in the figure above, assume also that the Search page contains some sensitive data, for example, a card number. When the user searches for something (say “test”), the following message is displayed.
Now suppose the attacker's malicious page makes the victim's browser send search requests for every term from 100 to 10000. By observing the encrypted traffic, the attacker can also measure the compressed size of the response to each of these requests. Can you guess why the compressed sizes of these responses differ, and which request would have the smallest compressed size? Below are the requests with the smallest compressed sizes:
Here is why these requests have the smallest compressed sizes. Take the first request; here is the response from the server:
As shown above, when the deflate algorithm processes this response, it finds that the search term also appears in the card number and encodes the repetition as a short back-reference, which yields the smallest compressed size. So by analyzing the compressed size of the response to each request from 100 to 10000, an attacker can deduce the card number. The beauty of this attack is that we did not decrypt any traffic; just by analyzing the size of the responses, we were able to recover the text.
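The guessing loop can be simulated offline. In this hypothetical sketch, `victim_response_size` stands in for the attacker's measurement of the encrypted response length, which tracks the compressed length; the page template, card number, and candidate list are made up. Unlike the example above, which tries numbers in a range, this toy version compares a few full-length guesses so that the winner is unambiguous. Real attacks typically recover a secret a few characters at a time and have to cope with ties and noise.

```python
import zlib

SECRET_CARD = "4929301855760412"  # hypothetical secret, known only to the server


def victim_response_size(search_term: str) -> int:
    """Simulated oracle: compressed size of the search page the victim receives."""
    page = (
        "<html><body><h1>Search</h1>"
        f"<p>Your card number: {SECRET_CARD}</p>"
        f"<p>You searched for: {search_term}</p>"
        "</body></html>"
    )
    return len(zlib.compress(page.encode()))


def best_guess(candidates: list[str]) -> str:
    """Return the candidate whose response compresses to the fewest bytes."""
    return min(candidates, key=victim_response_size)


candidates = ["7163528471926385", "4929301855760412", "8203746159372065"]
print(best_guess(candidates))
```

The attacker never sees `SECRET_CARD` directly; only the sizes returned by the oracle are used to pick the winning candidate, which here is the real card number.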
To summarize, an application is vulnerable to the BREACH attack if it meets all of the following conditions:
The server uses HTTP-level compression.
A request parameter is reflected in the response body. (This input is controlled by the attacker.)
The same response contains sensitive data that is of interest to the attacker.
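A tester might turn these three conditions into a quick per-response check. The function below is a hypothetical sketch, not a feature of any scanner: the caller supplies the response headers and body, a unique probe string that was sent in a request parameter, and a known secret, and the function reports which conditions hold. A response that meets all three deserves a closer look, although the check says nothing about whether an attacker can actually observe the victim's traffic.

```python
def breach_preconditions(
    headers: dict[str, str], body: str, probe: str, secret: str
) -> dict[str, bool]:
    """Report which of the three BREACH preconditions one HTTP response meets.

    Hypothetical audit helper: `probe` is a unique marker the tester placed in a
    request parameter, and `secret` is a value known to be sensitive.
    """
    # Header names are case-insensitive in HTTP, so normalize them first.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    encoding = normalized.get("content-encoding", "")
    return {
        # Condition 1: the body is compressed with one of the codings discussed here.
        "http_compression": "gzip" in encoding or "deflate" in encoding,
        # Condition 2: the attacker-controlled input is reflected in the body.
        "reflects_input": probe in body,
        # Condition 3: the same body also carries the sensitive value.
        "contains_secret": secret in body,
    }


report = breach_preconditions(
    {"Content-Encoding": "gzip"},
    "<p>Card: 4929301855760412</p><p>You searched for: probe-7f3a</p>",
    probe="probe-7f3a",
    secret="4929301855760412",
)
print(report)  # {'http_compression': True, 'reflects_input': True, 'contains_secret': True}
```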
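Turning off HTTP compression would stop the attack, but it is rarely a practical solution, since servers rely on it to use bandwidth efficiently. Here are some other mitigations that can be tried:
Protecting the vulnerable pages with a CSRF token, so that requests forged by the attacker's page are rejected and the attacker can no longer inject chosen input.
Adding a random number of bytes to each response to hide the actual compressed length. (This slows the attack down rather than stopping it, since the attacker can average out the noise over many requests.)
Separating sensitive data from the pages that display user input.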
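The length-hiding mitigation can be sketched as appending a random amount of filler to each response before it is compressed. The helper below is hypothetical: it adds an HTML comment of random length using the standard-library secrets module. As noted above, random padding raises the number of requests an attacker needs rather than eliminating the leak.

```python
import secrets
import string

# Letters and digits only, so the filler can never contain "--" and end the comment early.
FILLER_ALPHABET = string.ascii_letters + string.digits


def pad_response(body: str, max_pad: int = 64) -> str:
    """Append an HTML comment holding 0..max_pad random characters (hypothetical helper)."""
    pad_len = secrets.randbelow(max_pad + 1)
    # Random characters are used instead of a repeated byte, because a run such as
    # "xxxx..." would compress to almost nothing and hide very little.
    filler = "".join(secrets.choice(FILLER_ALPHABET) for _ in range(pad_len))
    return f"{body}<!-- {filler} -->"
```

Because the padding is added before compression, the compressed size of two otherwise identical responses now varies from request to request, which is the noise an attacker has to average away.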