Meltdown – How does it work?
I thought some of you would be interested in a high-level summary of the mechanism the Meltdown exploit uses.
Meltdown is a relatively straightforward exploit to implement. It’s also a beautifully elegant attack, and it demonstrates how modern hackers think and work.
To be clear, the CPU issues don’t allow a process to directly read unauthorised memory or cache. They do allow a program to work out what is in memory it has no rights to (on an unpatched system, effectively any memory location on the computer) using the technique below, which is a form of “side channel attack”.
A side channel attack is where you observe a consequence of an action to determine something you can’t find out more directly.
An example would be if you thought a submarine were close to a ship. You pretend the ship is on fire to see if that causes the submarine to break silence and broadcast. You don’t know what the submarine broadcasts – and can’t locate the source – but if it broadcasts you can deduce it’s likely the submarine can see the ship.
The background technologies
Pre-Emptive execution
Modern CPUs include a feature referred to here as pre-emptive execution (it is more commonly known as speculative, or out-of-order, execution). This is where a CPU executes code it expects to need before it knows for certain that the code should run.
One common case for this is access to parts of the system that require authorisation (e.g. system memory). In this case:
• One part of the CPU starts to check whether you have rights to access the memory
• A second part, working in parallel, starts to execute the code that will run if you do have rights
• If the check determines you don’t have rights then the results are discarded (they are not returned to the process, and nothing is committed to memory)
This would all seem to be OK as it saves a lot of time and a process never gets to actually see the results of something it shouldn’t do.
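The idea can be sketched as a toy model in Python. This is only an illustration of the ordering described above, not how a CPU is really built. The names run_with_speculation and read_protected are hypothetical and exist only for this example.

```python
def run_with_speculation(has_rights, work):
    """Toy model: start the work before the rights check has finished."""
    result = work()        # executed early, in parallel with the check
    if not has_rights:
        return None        # check failed: the result is thrown away
    return result          # check passed: the result was ready sooner


calls = []  # records every time the "protected" work actually ran


def read_protected():
    calls.append("ran")
    return 42


print(run_with_speculation(True, read_protected))   # allowed: result returned
print(run_with_speculation(False, read_protected))  # denied: result discarded
print(len(calls))                                   # how many times the work ran
```

In this sketch the work runs both times. Only the handing back of the result depends on the rights check.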
CPU Caches
All modern CPUs contain caches. A cache is ultra-fast memory on the chip that stores chunks of data for faster access. The caches are loaded from main memory when a program asks for a chunk of data.
When a chunk of data is already in the CPU cache accessing it is far faster than if it has to be obtained from main memory.
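A minimal sketch of that behaviour, assuming made-up costs of 4 cycles for a cache hit and 200 cycles for a trip to main memory (real figures vary by CPU). The ToyCache class is hypothetical and only models "has this chunk been loaded before?".

```python
class ToyCache:
    """Toy model of a CPU cache: remembers which chunk numbers were loaded."""

    HIT_CYCLES = 4      # assumed cost when the chunk is already cached
    MISS_CYCLES = 200   # assumed cost when it must come from main memory

    def __init__(self):
        self.cached = set()

    def load(self, chunk):
        """Return the simulated number of cycles needed to load `chunk`."""
        if chunk in self.cached:
            return self.HIT_CYCLES
        self.cached.add(chunk)  # a miss pulls the chunk into the cache
        return self.MISS_CYCLES


cache = ToyCache()
first = cache.load(42)   # not cached yet: slow
second = cache.load(42)  # cached by the first load: fast
print(first, second)
```

The second load of the same chunk is much cheaper than the first, and a program can measure that difference.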
The initial exploit (Setup)
• A process gives the CPU a short program that reads a byte of memory to which the process shouldn’t have access. Based on the value of that byte (0 to 255), the program then reads from one of 256 chunks of memory that the process has set aside and made sure are not yet in the CPU cache. This causes that one chunk to be loaded into the CPU cache.
• The CPU executes the program in parallel with the authorisation check. The check fails because the process doesn’t have permission, so the results are then discarded.
• So far so good. It would appear nothing has been compromised.
Now for the clever bit
The one thing that has changed, though, is that one of the 256 possible chunks of memory is now in the CPU cache.
A second program runs immediately afterwards that requests each of the 256 chunks of memory. It also records the number of clock cycles taken to load each chunk.
The chunk already in the processor cache will load far faster.
Now that we know which chunk was pulled into the processor cache, we can determine the value of the byte we shouldn’t be able to access: if chunk number N was the fast one, the byte held the value N.
This process can be repeated byte by byte to read several megabytes of memory per minute.
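The whole sequence can be walked through with a toy simulation. This is not exploit code and touches no real memory or cache. The ToyCPU class, the cycle costs and the SECRET_MEMORY contents are all assumptions made up for the illustration.

```python
SECRET_MEMORY = b"hunter2"        # hypothetical protected data
HIT_CYCLES, MISS_CYCLES = 4, 200  # assumed timings, as simulated cycles


class ToyCPU:
    """Toy CPU: protected memory plus a cache of chunk numbers."""

    def __init__(self, protected):
        self._protected = protected
        self.cached = set()

    def flush(self):
        """Make sure none of the 256 chunks is in the cache."""
        self.cached.clear()

    def timed_load(self, chunk):
        """Load a chunk and return how many simulated cycles it took."""
        cycles = HIT_CYCLES if chunk in self.cached else MISS_CYCLES
        self.cached.add(chunk)
        return cycles

    def discarded_read(self, address):
        """The setup step: runs ahead of the failed check, result discarded."""
        value = self._protected[address]  # byte the process has no rights to
        self.cached.add(value)            # chunk number `value` gets cached
        return None                       # nothing is returned to the process


def leak_byte(cpu, address):
    cpu.flush()
    cpu.discarded_read(address)
    timings = [cpu.timed_load(chunk) for chunk in range(256)]
    return timings.index(min(timings))    # the fastest chunk reveals the byte


cpu = ToyCPU(SECRET_MEMORY)
recovered = bytes(leak_byte(cpu, i) for i in range(len(SECRET_MEMORY)))
print(recovered)
```

The attacking code never receives the byte directly. It only sees 256 timings, and the position of the fast one is the answer.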
What does this tell us about fighting the exploit?
• Firstly, this allows any process to read memory it has no rights to. The process could be any program, driver, plugin or downloaded piece of code.
• As there is an almost unlimited number of ways to write the code snippet, it is not really practical to detect the code.
• Ironically, as the CPU runs and then discards the code snippet, anti-virus software has no opportunity to detect it. It’s effectively invisible because, from a logical perspective, it was never run.
• “Software patches” are likely to be very costly and ineffective. They involve wrapping untrusted code before committing it to the CPU.
• It is the software that initiates the attack that would need to be patched. Patching a piece of software cannot protect that software from being attacked by something else.
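• Operating system patches will not remove the underlying flaw. They can keep protected kernel memory out of a process’s address map (kernel page-table isolation), which blocks the read described above at some cost in performance, but the CPU behaviour itself remains.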
• This needs a fix in the CPU itself (micro-code or hardware) so that chunks of memory loaded during pre-emptive execution are not retained for future use. This is the only way to remove the problem at its source.
• The exploit can also be optimised for speed. Determining the content of a single bit of memory only requires two possible chunks, so a byte can be read using 16 rather than 256 memory requests. You would have 16 possible chunks (two per bit), of which 8 are retrieved by the program, one for each bit.
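The bit-by-bit variant in the last point can be sketched the same way. This is again a toy simulation with assumed cycle costs. The chunk numbering (chunk 2 × bit for a 0, chunk 2 × bit + 1 for a 1) is one possible layout chosen for the example, not something the attack requires.

```python
HIT_CYCLES, MISS_CYCLES = 4, 200  # assumed timings, as simulated cycles


def discarded_touch(secret_byte):
    """Chunks the discarded code pulls into the cache: one per bit."""
    return {2 * bit + ((secret_byte >> bit) & 1) for bit in range(8)}


def probe(cached, chunk):
    """Simulated cycles needed to load `chunk`."""
    return HIT_CYCLES if chunk in cached else MISS_CYCLES


def recover_byte(cached):
    """Time all 16 chunks; for each bit, the faster of its pair wins."""
    value = 0
    for bit in range(8):
        zero_time = probe(cached, 2 * bit)
        one_time = probe(cached, 2 * bit + 1)
        if one_time < zero_time:
            value |= 1 << bit
    return value


secret = 0xA7                      # hypothetical protected byte
cached = discarded_touch(secret)   # 8 of the 16 chunks end up cached
recovered = recover_byte(cached)
print(hex(recovered), len(cached))
```

Sixteen timed loads replace 256, at the price of the discarded code having to touch eight chunks instead of one.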