Meltdown and Spectre Patches: a story of delays, lies, and failures
On January 3, 2018, white hat hackers from Google Project Zero disclosed a set of vulnerabilities in Intel chips, called Meltdown (CVE-2017-5754) and Spectre (CVE-2017-5753 and CVE-2017-5715), which could be exploited by attackers to steal sensitive data processed by the CPU.
The vulnerabilities potentially impact all major CPUs, including the ones manufactured by AMD, ARM, and Intel.
Both Meltdown and Spectre attacks rely on issues in the “speculative execution” technique used by most modern CPUs to optimize performance.
“A processor can execute past a branch without knowing whether it will be taken or where its target is, therefore executing instructions before it is known whether they should be executed. If this speculation turns out to have been incorrect, the CPU can discard the resulting state without architectural effects and continue execution on the correct execution path. Instructions do not retire before it is known that they are on the correct execution path,” reads the description of ‘speculative execution’ provided by Google hackers.
The Google researchers discovered that it is possible for this speculative execution to have side effects which are not restored when the CPU state is unwound and can lead to information disclosure.
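The idea can be illustrated with a deliberately simplified model. The Python sketch below is not an exploit and does not touch real hardware: `ToyCPU`, its `cached_lines` set, and the memory layout are hypothetical names invented for illustration. It only shows the principle that architectural state (registers) is rolled back after a mis-speculation while microarchitectural state (the cache) is not. A real attack would infer which line is cached by timing memory accesses rather than by inspecting a set.

```python
class ToyCPU:
    """Toy model of a CPU whose cache survives a speculation rollback."""

    def __init__(self, memory):
        self.memory = memory          # flat list standing in for RAM
        self.registers = {"r0": 0}    # architectural state
        self.cached_lines = set()     # microarchitectural state

    def load(self, address):
        # Every load leaves a trace in the cache, speculative or not.
        self.cached_lines.add(address)
        return self.memory[address]

    def speculate(self, secret_address, probe_base):
        """Run two dependent loads speculatively, then roll back."""
        snapshot = dict(self.registers)
        secret = self.load(secret_address)
        self.registers["r0"] = self.load(probe_base + secret)
        # Mis-speculation detected: registers are restored...
        self.registers = snapshot
        # ...but cached_lines is left untouched.


def recover_secret(cpu, probe_base, probe_size):
    """Find which probe slot was cached (stands in for a timing measurement)."""
    for value in range(probe_size):
        if probe_base + value in cpu.cached_lines:
            return value
    return None


# Hypothetical layout: one secret byte at address 10, probe array at 256..511.
memory = [0] * 512
memory[10] = 42
cpu = ToyCPU(memory)
cpu.speculate(secret_address=10, probe_base=256)
leaked = recover_secret(cpu, probe_base=256, probe_size=256)
print(leaked, cpu.registers)
```

In this model the registers end up exactly as they were before the speculative run, yet the secret value can still be read back from the cache footprint.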
The Meltdown Attack
The Meltdown attack could allow attackers to read the entire physical memory of a target machine, stealing credentials, personal information, and more.
“Meltdown is a related microarchitectural attack which exploits out-of-order execution to leak the target’s physical memory,” reads the paper on the Spectre attack.
“Meltdown exploits a privilege escalation vulnerability specific to Intel processors, due to which speculatively executed instructions can bypass memory protection.”
Meltdown exploits speculative execution to breach the isolation between user applications and the operating system; in this way, any application can access all system memory.
Almost any computer was vulnerable to the Meltdown attack at the time of the disclosure and experts highlighted that it is easy to exploit.
The Spectre Attack
While the Meltdown flaw could be fixed via software, the Spectre attack is hard to mitigate because it requires changes to the processor architecture.
The Spectre attack breaks the isolation between different applications, allowing attackers to leak information from the kernel to user programs, as well as from virtualization hypervisors to guest systems. The Spectre attack works on almost every system, including desktops, laptops, cloud servers, and smartphones.
“KAISER patch, which has been widely applied as a mitigation to the Meltdown attack, does not protect against Spectre.”
Intel excluded any problems with the Meltdown and Spectre patches, but …
To protect systems from the Meltdown attack, it is possible to implement the hardening technique known as kernel page table isolation (KPTI), which is derived from KAISER. The technique isolates kernel-space memory from user-space memory; it does not protect against Spectre, which requires separate mitigations.
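On Linux, kernels that carry these mitigations typically report their status through files under `/sys/devices/system/cpu/vulnerabilities`. The Python sketch below reads those files and maps each status line to a coarse label. The helper names `classify` and `read_status` are illustrative, and the set of files and status strings varies by kernel version, so the prefixes checked here should be treated as assumptions.

```python
from pathlib import Path

SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def classify(status_line):
    """Map a kernel status string to a coarse label."""
    text = status_line.strip()
    if text.startswith("Not affected"):
        return "not affected"
    if text.startswith("Mitigation"):
        return "mitigated"
    if text.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def read_status(directory=SYSFS_DIR, names=("meltdown", "spectre_v1", "spectre_v2")):
    """Return {name: (raw line, label)} for each vulnerability file."""
    report = {}
    for name in names:
        try:
            raw = (Path(directory) / name).read_text().strip()
        except OSError:
            raw = ""  # file missing: older kernel or non-Linux system
        report[name] = (raw, classify(raw))
    return report


for name, (raw, label) in read_status().items():
    print(f"{name}: {label} ({raw or 'no data'})")
```

On a system where the files are absent, every entry is simply reported as unknown rather than raising an error.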
“Intel has developed and is rapidly issuing updates for all types of Intel-based computer systems — including personal computers and servers — that render those systems immune from both exploits (referred to as “Spectre” and “Meltdown”) reported by Google Project Zero,” states the press release published by Intel.
“Intel has already issued updates for the majority of processor products introduced within the past five years. By the end of next week, Intel expects to have issued updates for more than 90 percent of processor products introduced within the past five years.”
While experts speculated that the security patches could have a significant effect on the performance of the affected products, Intel pointed out that average users would not notice any difference.
“Intel continues to believe that the performance impact of these updates is highly workload-dependent and, for the average computer user, should not be significant and will be mitigated over time,” stated Intel.
“While on some discrete workloads the performance impact from the software updates may initially be higher, additional post-deployment identification, testing and improvement of the software updates should mitigate that impact.”
Intel confirmed that extensive testing conducted by tech giants (Apple, Amazon, Google, and Microsoft) to assess any impact on system performance from security updates did not reveal negative effects.
Security experts from Google Project Zero also proposed their mitigation technique called Retpoline.
“In response to the vulnerabilities that were discovered we developed a novel mitigation called “Retpoline” — a binary modification technique that protects against “branch target injection” attacks. We shared Retpoline with our industry partners and have deployed it on Google’s systems, where we have observed negligible impact on performance,” wrote Google.
“In addition, we have deployed Kernel Page Table Isolation (KPTI) — a general purpose technique for better protecting sensitive information in memory from other software running on a machine — to the entire fleet of Google Linux production servers that support all of our products, including Search, Gmail, YouTube, and Google Cloud Platform.”
Users and companies reported severe issues
Microsoft was one of the companies that first released security patches for the Meltdown and Spectre attacks (Security Update for Windows KB4056892), but unfortunately, its users running the popular OS on some AMD Athlon-powered machines reported severe problems.
AMD CPUs are not susceptible to the Meltdown attack; they are vulnerable only to Spectre attacks.
In one of the threads on answers.microsoft.com, many users claimed that the Security Update for Windows KB4056892 was bricking some AMD-powered PCs, leaving them stuck at the Windows startup logo.
“I have older AMD Athlon 64 X2 6000+, Asus MB; after installation of KB4056892 the system doesn’t boot, it only shows the Windows logo without animation and nothing more. After several failed boots it does roll-back then it shows error 0x800f0845. Unfortunately, it seems it’s not easy to disable the automatic updates without gpedit tweaks, so it tries installing and rolling-back the update over and over,” reported an angry user.
Machines based on AMD Athlon CPUs stopped working just after the installation of the patch, and the worst news is that the fix did not create a recovery point.
Some of the affected users tried to re-install Windows 10 without success.
Affected users had to disable Windows Update.
One of the first companies to analyze the impact of the security patches on its own infrastructure was SolarWinds, a vendor of IT management software and monitoring tools. The company measured the performance impact of the Meltdown and Spectre patches on its Amazon Web Services infrastructure.
The results are worrisome: a graph representing the performance of “a Python worker service tier” on paravirtualized (PV) AWS instances shows a significant increase in CPU utilization (roughly 25%) just after the company rebooted its PV instances.
“As you can see from the following chart taken from a Python worker service tier, when we rebooted our PV instances on Dec 20th ahead of the maintenance date, we saw CPU jumps of roughly 25%,” states the analysis published by SolarWinds.
Figure 1 – Average CPU utilization (SolarWinds)
The company also shared data related to the performance of its EC2 instances that shows a degradation while Amazon was rolling out the Meltdown patches.
“AWS was able to live patch HVM instances with the Meltdown mitigation patches without requiring instance reboots. From what we observed, these patches started rolling out on Jan 4th, 00:00 UTC in us-east-1 and completed around 20:00 UTC for EC2 HVM instances in us-east-1,” continues the analysis.
“CPU bumps like this were noticeable across several different service tiers:”
Figure 2 – Average CPU utilization (SolarWinds)
To summarize, the packet rate dropped by up to 40% on the company’s Kafka cluster, while CPU utilization spiked by around 25 percent on Cassandra.
The situation later improved: in an update issued on Jan 12, 2018, the company reported that CPU utilization rates had decreased.
“As of 10:00 UTC this morning we are noticing a step reduction in CPU usage across our instances. It is unclear if there are additional patches being rolled out, but CPU levels appear to be returning to pre-HVM patch levels,” states the firm.
Intel’s certainties collapse
On January 18, Intel published the results of the tests conducted on the Meltdown and Spectre patches and their impact on performance, confirming severe problems.
According to Chipzilla, systems with several types of processors running Meltdown and Spectre patches may experience more frequent reboots.
A few days earlier, Intel had reported that extensive tests conducted on home and business PCs demonstrated a negligible performance impact on these types of systems (from 2% up to 14%).
The tech giant later conducted performance tests on data center workloads, and the results show that the performance impact depends on the system configuration and the workload.
“As expected, our testing results to date show performance impact that ranges depending on specific workloads and configurations. Generally speaking, the workloads that incorporate a larger number of user/kernel privilege changes and spend a significant amount of time in privileged mode will be more adversely impacted,” reads the analysis conducted by Intel.
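A rough way to see why transition-heavy workloads suffer more is a back-of-the-envelope model in which each user/kernel transition pays a fixed extra cost. The Python sketch below is only that: it is not Intel's methodology, the function name is hypothetical, and the 500 ns per-transition cost and the transition rates are made-up figures chosen for illustration.

```python
def extra_cpu_fraction(transitions_per_second, extra_ns_per_transition):
    """Fraction of one CPU-second spent on the added per-transition cost."""
    return transitions_per_second * extra_ns_per_transition / 1e9


# Hypothetical workloads; the 500 ns figure is an assumption, not a measurement.
for label, rate in [("compute-bound", 1_000), ("I/O-heavy", 100_000)]:
    print(f"{label}: {extra_cpu_fraction(rate, 500):.2%} extra CPU time")
```

Under these assumed numbers, the workload with a hundred times more transitions pays a hundred times more overhead, which matches the qualitative trend Intel describes.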
Intel measured impacts ranging from 0 to 2% on industry-standard measures of integer and floating-point throughput, LINPACK, STREAM, server-side Java, and energy efficiency benchmarks. These benchmarks cover typical workloads for enterprise and cloud customers.
Intel also evaluated the impact on online transaction processing (OLTP), estimating it at roughly 4%.
Storage results depended strongly on the benchmark, test setup, and system configuration.
For FlexibleIO, which simulates various I/O workloads, throughput performance decreased by 18% when the CPU was stressed, but there was no impact when CPU usage was low.
The FlexibleIO tests simulated different types of I/O loads; the results depend on many factors, including read/write mix, block size, drives, and CPU utilization.
“For FlexibleIO, a benchmark simulating different types of I/O loads results depend on many factors, including read/write mix, block size, drives and CPU utilization. When we conducted testing to stress the CPU (100% write case), we saw an 18% decrease in throughput performance because there was not CPU utilization headroom,” continues the analysis. “When we used a 70/30 read/write model, we saw a 2% decrease in throughput performance. When CPU utilization was low (100% read case), as is the case with common storage provisioning, we saw an increase in CPU utilization, but no throughput performance impact.”
The most severe performance degradation was observed during Storage Performance Development Kit (SPDK) tests: using iSCSI, the degradation reached 25% when only a single core was used. Fortunately, there was no performance degradation when SPDK vHost was used.
Figure 3 – Testing Meltdown and Spectre patches (Intel)
Intel also reported that the Meltdown and Spectre patches are causing more frequent reboots; this behavior was observed on systems running Broadwell-, Haswell-, Ivy Bridge-, Sandy Bridge-, Skylake-, and Kaby Lake-based platforms.
“We have reproduced these issues internally and are making progress toward identifying the root cause. In parallel, we will be providing beta microcode to vendors for validation by next week,” said Navin Shenoy, executive vice president and general manager of Intel’s Data Center Group.
Only the newest Intel 8th-gen Coffee Lake CPUs seem to be unaffected by the reboots.
Intel recommended halting deployment of the current versions of Spectre/Meltdown patches
On January 23, Intel recommended that OEMs, cloud service providers, system manufacturers, software vendors, and end users stop deploying the current versions of the Spectre/Meltdown patches.
Intel detailed its approach to the issue in a technical note about Spectre mitigation (“Speculative Execution Side Channel Mitigations”). The company addressed the issue with an opt-in flag dubbed the IBRS_ALL bit (IBRS stands for Indirect Branch Restricted Speculation).
Indirect Branch Restricted Speculation, along with Single Thread Indirect Branch Predictors (STIBP) and Indirect Branch Predictor Barrier (IBPB), is meant to prevent abuse of the branch prediction feature and exploitation of the flaw.
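Software discovers these capabilities by reading feature bits reported by the processor. The Python sketch below decodes such bits from integer register values; it does not execute CPUID or read MSRs itself. The bit positions (CPUID leaf 7 EDX bits 26, 27, and 29, and bits 0 and 1 of the IA32_ARCH_CAPABILITIES MSR) are taken from Intel's documentation as understood here and should be verified against the current specification; the dictionary and function names are hypothetical.

```python
# Bit positions in CPUID.(EAX=7,ECX=0):EDX (assumed from Intel's documentation).
CPUID_7_EDX_BITS = {
    "IBRS_IBPB": 26,          # IBRS and IBPB supported
    "STIBP": 27,              # STIBP supported
    "ARCH_CAPABILITIES": 29,  # IA32_ARCH_CAPABILITIES MSR present
}

# Bit positions in the IA32_ARCH_CAPABILITIES MSR (same caveat).
ARCH_CAPABILITIES_BITS = {
    "RDCL_NO": 0,   # not susceptible to rogue data cache load (Meltdown)
    "IBRS_ALL": 1,  # enhanced, always-on IBRS supported
}


def decode_bits(value, layout):
    """Return {feature: True/False} for each named bit in `value`."""
    return {name: bool((value >> bit) & 1) for name, bit in layout.items()}


# Hypothetical register values, chosen only to exercise the decoder.
edx = (1 << 26) | (1 << 27)
print(decode_bits(edx, CPUID_7_EDX_BITS))
print(decode_bits(0b10, ARCH_CAPABILITIES_BITS))
```

On a real system the input values would come from the CPUID instruction and a privileged MSR read, which are outside the scope of this sketch.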
Linux creator Linus Torvalds speculated that Intel’s decision to address the issues in this way is mainly motivated by the intention to avoid legal liability. Recalling two decades of flawed chips would have a catastrophic impact on the tech giant.
Torvalds explained that the impact of using IBRS on existing hardware is so severe that no one will set the hardware capability bits.
Intel announced that it had found the root cause of the reboot issues for Broadwell and Haswell platforms and asked customers to wait for a fix.
The tech giant began rolling out to industry partners a beta update to address the issue.
“As we start the week, I want to provide an update on the reboot issues we reported Jan. 11. We have now identified the root cause for Broadwell and Haswell platforms, and made good progress in developing a solution to address it,” Intel said in a press release published on Monday. “Over the weekend, we began rolling out an early version of the updated solution to industry partners for testing, and we will make a final release available once that testing has been completed.”
The response of the tech companies
While Meltdown and Spectre Variant 1 could theoretically be addressed by patching the OS, Spectre Variant 2 requires a firmware/microcode update.
Red Hat joined the list of companies that observed problems after the installation of the patches; it is releasing updates that revert previous patches for the Spectre vulnerability (Variant 2, aka CVE-2017-5715).
In response to the announcement made by Intel, the company decided to revert the initial security updates because it had received complaints from some customers about boot failures on their systems.
Red Hat is recommending that its customers contact their OEM hardware provider to obtain the latest firmware release to mitigate CVE-2017-5715.
“Red Hat Security is currently recommending that subscribers contact their CPU OEM vendor to download the latest microcode/firmware for their processor.” reads the advisory published by Red Hat.
“The latest microcode_ctl and Linux-firmware packages from Red Hat do not include resolutions to the CVE-2017-5715 (variant 2) exploit. Red Hat is no longer providing microcode to address Spectre, variant 2, due to instabilities introduced that are causing customer systems not to boot. The latest microcode_ctl and Linux-firmware packages are reverting these unstable microprocessor firmware changes to versions that were known to be stable and well tested, released before the Spectre/Meltdown embargo lift date on Jan 3rd. Customers are advised to contact their silicon vendor to get the latest microcode for their particular processor.”
Other distributions based on Red Hat Enterprise Linux like CentOS could suffer similar problems, and it could be necessary to revert Spectre Variant 2 security updates.
The company suggests that customers use the Red Hat Customer Portal Lab App to verify that their systems have the necessary microprocessor firmware to address CVE-2017-5715 (variant 2).
“Our own experience is that system instability can in some circumstances cause data loss or corruption,” states the security advisory published by Microsoft.
“While Intel tests, updates and deploys new microcode, we are making available an out of band update today, KB4078130, that specifically disables only the mitigation against CVE-2017-5715 – ‘Branch target injection vulnerability.’ In our testing, this update has been found to prevent the behavior described.”
Microsoft, like the companies above, observed problems after the installation of the patches for the Spectre vulnerability (Variant 2, aka CVE-2017-5715, a branch target injection vulnerability) and for this reason opted to revert previous patches.
Microsoft claimed that the security updates issued by Intel cause system instability and can in some cases lead to data loss or corruption. For this reason, over the weekend the company distributed Update KB4078130 for Windows 7, Windows 8.1, and Windows 10, which disables the mitigation for CVE-2017-5715.
The company has also provided detailed instructions for manually enabling and disabling Spectre Variant 2 mitigations through registry settings.
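Microsoft's guidance describes this switch in terms of two registry values, `FeatureSettingsOverride` and `FeatureSettingsOverrideMask`. The Python sketch below only builds the corresponding `reg add` command strings and does not modify the registry. The key path and the values (override 1 with mask 3 to disable the Variant 2 mitigation, override 0 with mask 3 to re-enable it) reflect that guidance as understood here and are assumptions to verify against the current Microsoft advisory; the function name is hypothetical.

```python
# Registry key named in Microsoft's guidance (verify before use).
KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
       r"\Session Manager\Memory Management")


def variant2_registry_commands(enable_mitigation):
    """Build `reg add` commands that enable or disable the Variant 2 mitigation."""
    # Assumed values: override 0 = mitigations on, override 1 = Variant 2 off.
    override = 0 if enable_mitigation else 1
    values = {
        "FeatureSettingsOverride": override,
        "FeatureSettingsOverrideMask": 3,
    }
    return [
        f'reg add "{KEY}" /v {name} /t REG_DWORD /d {data} /f'
        for name, data in values.items()
    ]


for command in variant2_registry_commands(enable_mitigation=False):
    print(command)
```

Printing the commands instead of running them keeps the sketch safe to try on any machine; a reboot is normally required for such settings to take effect.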
Microsoft said it is not aware of any attack in the wild that exploited the Spectre variant 2 (CVE 2017-5715).
“As of January 25, there are no known reports to indicate that this Spectre variant 2 (CVE 2017-5715) has been used to attack customers. We recommend Windows customers, when appropriate, reenable the mitigation against CVE-2017-5715 when Intel reports that this unpredictable system behavior has been resolved for your device,” continues the advisory.
The last element of this embarrassing story is the possibility that Intel alerted Chinese companies before the US Government about the Meltdown and Spectre flaws
The news was reported by The Wall Street Journal; according to its report, Intel warned Chinese tech giants before notifying the US government of the flaws.
Citing unnamed people familiar with the matter and some of the companies involved, The WSJ revealed that the list of Chinese companies includes Lenovo and Alibaba.
It is not clear when Intel notified Lenovo of the flaw, but a leaked memo from Intel to computer makers suggests the company reported the issues to an unnamed group on November 29 under a non-disclosure agreement. The same day, Intel CEO Brian Krzanich sold off his shares.
Last week, French tech publication LeMagIT’s Christophe Bardy disclosed the first page of the “Technical Advisory” issued by the Intel Product Security Incident Response Team.
Security experts speculate that the companies might have passed this information to the Chinese Government, but an Alibaba spokesman rejected the accusation.
In this scenario, it is impossible to believe that the Chinese Government was not informed by the Chinese companies alerted by Intel.
We also know that the Meltdown flaw is easy to exploit, which means that threat actors might well have triggered it to extract passwords and other sensitive data from a target machine. The situation is worrisome in cloud-computing environments where many customers share the same servers; in this scenario, an attacker can launch a Meltdown attack to steal information belonging to other clients with applications hosted on the same server.
El Reg reached out to Intel for comment; below is the reply of the chip vendor:
“The Google Project Zero team and impacted vendors, including Intel, followed best practices of responsible and coordinated disclosure. Standard and well-established practice on initial disclosure is to work with industry participants to develop solutions and deploy fixes ahead of publication. In this case, news of the exploit was reported ahead of the industry coalition’s intended public disclosure date at which point Intel immediately engaged the US government and others,” reports El Reg.