In the last post we covered Wasmjit’s mitigations for Spectre Variant 1, also known as Bounds Check Bypass, or BCB. In this post we’ll cover Wasmjit’s mitigations for Spectre Variant 2, also known as Branch Target Injection, or BTI.
Description of the Vulnerability
Like BCB, the primary danger of BTI is unintended information leakage in the CPU’s cache after speculative execution through an incorrect branch prediction. Specifically, BTI is a vulnerability in how the branch predictor handles indirect branches on certain CPUs. An indirect branch is a branch whose destination is determined at run time from a register or memory location rather than encoded in the instruction itself, for example jmp *%rax.
With detailed knowledge of the internals of a CPU’s branch predictor, an attacker is able to exploit weaknesses in the branch predictor to control the destination of an indirect branch during speculative execution. Once the attacker is able to branch to arbitrary locations during mis-speculation, they can use the observed effects on the cache to infer the value of data that would otherwise be inaccessible.
There are multiple ways for an attacker to manipulate the CPU’s indirect branch predictor. One particularly insidious way allows an attacker to influence the indirect branch predictor from another thread. As in the last post, I’ll direct you to Google Project Zero’s post for more details.
Retpoline Mitigation for BTI
Different CPUs may use different methods for predicting indirect branches and, in turn, may require different BTI mitigations. For x86_64 CPUs, the de facto mitigation technique was invented by engineers at Google and is called “retpoline”. The retpoline technique is also recommended by Intel.
In many modern x86_64 CPUs, the method for predicting the destination of the ret instruction is different than the method used to predict the destination of the indirect jmp and call instructions, which are vulnerable to BTI. Taking that into account, retpoline works by using the ret instruction to perform indirect branches instead of jmp or call instructions. To show how that works, first consider an indirect jmp through %rax, i.e. jmp *%rax.
Now consider the equivalent retpoline sequence:
        call go
    loop:
        pause
        lfence
        jmp loop
    go:
        mov %rax, (%rsp)
        ret
Upon execution of the call instruction, the CPU pushes the return address, i.e. the address of loop:, to the top of the stack at (%rsp) and branches to go:. At go:, retpoline overwrites the top of the stack with the desired destination address in %rax. ret then pops the address from the top of the stack and branches there. If the CPU doesn’t know the value of the destination address upon execution of ret, then, due to the way ret branch prediction works (the predictor assumes ret goes back to the address pushed by the most recent call), it will speculatively execute starting at loop: and loop endlessly until the destination address is resolved.
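Since a JIT has to produce this sequence as machine code rather than as assembly text, it may help to see the bytes. What follows is only a minimal sketch in C, not Wasmjit's actual emitter: the names put_bytes and emit_retpoline_jmp_rax are hypothetical, and the encodings are the standard x86_64 ones for each instruction in the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the capture loop: pause (2) + lfence (3) + jmp rel8 (2). */
enum { LOOP_SIZE = 2 + 3 + 2 };

/* Hypothetical helper: append n bytes to buf at offset *len. */
static void put_bytes(uint8_t *buf, size_t cap, size_t *len,
                      const uint8_t *src, size_t n)
{
    assert(*len + n <= cap);
    memcpy(buf + *len, src, n);
    *len += n;
}

/* Sketch: emit a retpoline standing in for `jmp *%rax`.
 * Returns the number of bytes written. */
static size_t emit_retpoline_jmp_rax(uint8_t *buf, size_t cap)
{
    /* call go: E8 rel32 (little-endian), skipping over the loop. */
    static const uint8_t call_go[]  = { 0xE8, LOOP_SIZE, 0x00, 0x00, 0x00 };
    static const uint8_t pause_i[]  = { 0xF3, 0x90 };        /* pause  */
    static const uint8_t lfence_i[] = { 0x0F, 0xAE, 0xE8 };  /* lfence */
    /* jmp loop: EB rel8, relative to the end of this instruction. */
    static const uint8_t jmp_loop[] = { 0xEB, (uint8_t)(0x100 - LOOP_SIZE) };
    static const uint8_t mov_i[]    = { 0x48, 0x89, 0x04, 0x24 }; /* mov %rax, (%rsp) */
    static const uint8_t ret_i[]    = { 0xC3 };                   /* ret */
    size_t len = 0;

    put_bytes(buf, cap, &len, call_go, sizeof call_go);
    /* loop: speculation is trapped here until the real target resolves */
    put_bytes(buf, cap, &len, pause_i, sizeof pause_i);
    put_bytes(buf, cap, &len, lfence_i, sizeof lfence_i);
    put_bytes(buf, cap, &len, jmp_loop, sizeof jmp_loop);
    /* go: replace the pushed return address with the real target */
    put_bytes(buf, cap, &len, mov_i, sizeof mov_i);
    put_bytes(buf, cap, &len, ret_i, sizeof ret_i);
    return len;
}
```

The whole sequence is 17 bytes, compared with 2 bytes (FF E0) for a plain jmp *%rax, which is part of the code-size cost of the mitigation.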
The corresponding retpoline sequence for call *%rax works on a similar principle.
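For the call case, the extra requirement is that a real return address ends up on the stack beneath the one that gets overwritten. The sketch below uses the commonly published layout for this, which is an assumption on my part and not necessarily what Wasmjit emits: an outer call pushes the real return address, then the same thunk as before redirects to %rax. The function name emit_retpoline_call_rax is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: emit a retpoline standing in for `call *%rax`.
 * Returns the number of bytes written. */
static size_t emit_retpoline_call_rax(uint8_t *buf, size_t cap)
{
    static const uint8_t seq[] = {
        0xEB, 0x11,                   /*          jmp do_call (skip 17-byte thunk) */
        0xE8, 0x07, 0x00, 0x00, 0x00, /* thunk:   call go                          */
        0xF3, 0x90,                   /* loop:    pause                            */
        0x0F, 0xAE, 0xE8,             /*          lfence                           */
        0xEB, 0xF9,                   /*          jmp loop                         */
        0x48, 0x89, 0x04, 0x24,       /* go:      mov %rax, (%rsp)                 */
        0xC3,                         /*          ret                              */
        0xE8, 0xEA, 0xFF, 0xFF, 0xFF  /* do_call: call thunk (rel32 = -22)         */
    };

    assert(sizeof seq <= cap);
    memcpy(buf, seq, sizeof seq);
    return sizeof seq;
}
```

When do_call executes, call thunk pushes the address of the instruction following the sequence, so the callee's eventual ret lands right after the emitted code, just as it would after a plain call *%rax.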
Every indirect jump in Wasmjit is vulnerable to BTI. Fortunately, BTI can be automatically mitigated by the compiler. All major compilers provide automatic mitigation.
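To make the compiler side concrete, here is a small illustration in C. The dispatch table and function names are made up for this example. A call through a function pointer like this is typically compiled to an indirect call, and the flags mentioned in the comment (Clang's -mretpoline and GCC's -mindirect-branch=thunk) ask the compiler to route such branches through a retpoline thunk without any source changes.

```c
#include <stddef.h>

typedef int (*binop_fn)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

/* Hypothetical dispatch table: the call target is only known at run time. */
static const binop_fn ops[] = { add, sub };

/* The call through ops[...] is typically compiled to an indirect call
 * such as `call *%rax`. Building with `clang -mretpoline` or
 * `gcc -mindirect-branch=thunk` replaces it with a retpoline thunk. */
int apply(size_t i, int a, int b)
{
    return ops[i % (sizeof ops / sizeof ops[0])](a, b);
}
```

Comparing the disassembly with and without the flag is a quick way to confirm the mitigation is actually applied to a given build.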
Since Wasmjit is a JIT, however, it must also make sure to not emit vulnerable indirect branches at runtime. There are 2 WebAssembly instructions that require an indirect branch: br_table and call_indirect.
Additionally, Wasmjit emits indirect jumps in a few other places for convenience of implementation. It was enough to change the instruction sequences emitted in the affected areas.
The burden incurred on general software developers by BTI is relatively low compared to BCB. It really only affects projects written in assembly code, or projects that emit assembly code, like compilers and JITs. The impact on Wasmjit is minimal but persistent. BTI will need to be considered each time a new JIT backend is added. For now, Wasmjit only supports x86_64, but may need to address BTI again when AArch64 support is added.