Amongst the initial reactions to Wasmjit were concerns about its vulnerability to Meltdown and Spectre. This isn’t surprising since Spectre primarily affects operating system kernels and language runtimes, with Wasmjit being a happy mixture of the two. Wasmjit isn’t vulnerable to Meltdown but it is vulnerable to Spectre Variant 1, Bounds Check Bypass (BCB), and Spectre Variant 2, Branch Target Injection. In this post I’ll cover Wasmjit’s mitigations for Spectre Variant 1. In a following post I’ll cover the mitigations for Spectre Variant 2.

Description of the Vulnerability

Since the initial disclosure on 2018-01-03 a lot has been published explaining how the vulnerabilities work. Google’s Project Zero has a good rundown. I enjoyed Colin Percival’s post as well. I’ll briefly summarize the vulnerability here, but I recommend those posts if you want to dig a little deeper.

These vulnerabilities primarily concern the effect a CPU’s speculative execution mechanism has on externally observable state. Specifically, vulnerable CPUs fill their cache with memory loaded during mis-speculated branches without flushing it after the mis-speculated branch is discarded. One common type of mis-speculated branch is a bounds check before accessing an array using an untrusted array index:

if (untrusted_input < array_len) {
    val = array[untrusted_input];
    /* do something that affects cache with val */
}

In BCB, an attacker provokes the target into attempting a load from an invalid array index and then, if mis-speculation occurs, can observe the effect on the cache to infer the value of data that would otherwise be inaccessible.
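
To make the "do something that affects cache" step concrete, here is a minimal sketch of the victim-side pattern usually used to illustrate BCB. The names probe, CACHE_STRIDE and victim_read are hypothetical and not part of Wasmjit; the stride of 4096 bytes is an assumption chosen so that each possible byte value touches a different cache line.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define CACHE_STRIDE 4096  /* assumed spacing: one distinct cache line per value */

static uint8_t array[16];
static size_t array_len = sizeof(array);
static uint8_t probe[256 * CACHE_STRIDE];

uint8_t victim_read(size_t untrusted_input)
{
    uint8_t sink = 0;

    if (untrusted_input < array_len) {
        /* During mis-speculation this load may run with an
           out-of-bounds index... */
        uint8_t val = array[untrusted_input];
        /* ...and this dependent load pulls a cache line selected by val
           into the cache, where it can remain after the speculated
           work is discarded. */
        sink = probe[(size_t) val * CACHE_STRIDE];
    }
    return sink;
}
```

Architecturally, an out-of-bounds call such as victim_read(1000) never reads array at all. The problem is only the cache footprint that a mis-speculated run can leave behind.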

It’s important to note that Spectre Variant 1 is typically concerned only with mis-speculated loads. There is also Variant 1.1, which is concerned with mis-speculated stores. Wasmjit doesn’t currently implement mitigations for it, because there hasn’t been a strong industry response to that vulnerability, but Wasmjit will likely deal with Spectre Variant 1.1 before it is released.

BCB Mitigation Techniques

Intel recommends two techniques for mitigating BCB: lfence and bounds-clipping. Applying the lfence technique transforms the above example into the following:

if (untrusted_input < array_len) {
    lfence();
    val = array[untrusted_input];
    /* ... */
}

lfence is essentially a serializing operation. It doesn’t execute until all previous instructions have completed. Serializing execution between every bounds check and subsequent load is a fool-proof way to block malicious behavior during mis-speculations but isn’t ideal because it can have a dramatic negative effect on performance.
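
The lfence() call above is not a standard C function. One way it could be defined, as a sketch assuming GCC or Clang on an x86 target with SSE2, is through the _mm_lfence() intrinsic from emmintrin.h. The helper load_with_fence and its fallback parameter are hypothetical names, and the non-x86 branch is only a compiler barrier, not a real speculation barrier.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_lfence() */
static inline void lfence(void) { _mm_lfence(); }
#else
/* Assumption: other targets need their own barrier instruction.
   This only stops the compiler from reordering, not the CPU. */
static inline void lfence(void) { __asm__ __volatile__ ("" ::: "memory"); }
#endif

/* Hypothetical helper: bounds-checked load with a fence before the load. */
int load_with_fence(const int *array, size_t array_len,
                    size_t untrusted_input, int fallback)
{
    if (untrusted_input < array_len) {
        lfence();  /* the load below waits until the bounds check resolves */
        return array[untrusted_input];
    }
    return fallback;
}
```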

Here’s how bounds-clipping works:

if (untrusted_input <= POWER_OF_TWO_MINUS_ONE) {
    untrusted_input &= POWER_OF_TWO_MINUS_ONE;
    val = array[untrusted_input];
    /* ... */
}

Bounds-clipping ensures that even during mis-speculation the index stays within the bounds of the array. This technique is a lot more efficient than lfence but it isn’t ideal because it only works with arrays whose lengths are powers of two. Additionally, Intel won’t guarantee its effectiveness with future processor generations.
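
Here is a small self-contained sketch of bounds-clipping on a 16-entry table. The names table, TABLE_LEN, TABLE_MASK and clipped_load are made up for illustration. The empty __asm__ statement is an addition of mine, assuming GCC or Clang: without it the compiler is free to notice that the mask is redundant inside the branch and delete it.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_LEN  16u               /* must be a power of two */
#define TABLE_MASK (TABLE_LEN - 1u)  /* 0xf */

static const uint8_t table[TABLE_LEN] = {
    0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225
};

/* Returns the table entry, or -1 if the index is out of bounds. */
int clipped_load(size_t untrusted_input)
{
    if (untrusted_input <= TABLE_MASK) {
        /* Hide the known value range from the optimizer so the mask
           below is actually emitted (GCC/Clang syntax). */
        __asm__ ("" : "=r" (untrusted_input) : "0" (untrusted_input));
        untrusted_input &= TABLE_MASK;  /* at most 15, even if mis-speculated */
        return table[untrusted_input];
    }
    return -1;
}
```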

The approach taken by Wasmjit is similar to the approach taken in the Linux kernel and described by Chandler Carruth:

if (untrusted_input < len) {
    untrusted_input = array_index_nospec(untrusted_input, len);
    val = array[untrusted_input];
}

The array_index_nospec() function is responsible for “hardening” the array index. Here’s a simplified version of how that works:

size_t array_index_nospec(size_t idx, size_t len)
{
    __asm__ ("" : "=r" (idx) : "0" (idx));
    size_t mask = idx < len ? ~(size_t) 0 : (size_t) 0;
    return idx & mask;
}

The idea here is that when the CPU mis-speculates, either the mask will be 0 or the computation of mask will stall until the mis-speculation can be rectified. The point of the __asm__ statement is to force the compiler to compute the mask without optimizations based on knowledge of what idx may be.

This method requires that the computation of mask be done without any branches (otherwise the CPU could speculate around the mask) and that the CPU isn’t able to (mis-)predict the value of mask (instead of stalling). On GCC and Clang on x86_64 these preconditions are satisfied. Similar to bounds-clipping, Intel doesn’t guarantee this method will work with future processor generations. The good news is that GCC has implemented the functionality of array_index_nospec() natively as a compiler builtin, so going forward the compiler will be responsible for the implementation details.
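
As a sketch of what relying on the compiler could look like: GCC 9 and later provide __builtin_speculation_safe_value(), which returns its argument during normal execution and a fail value (0 by default) under mis-speculation. The wrapper names harden_index and load_checked are hypothetical, and the fallback path is simply the hand-written mask from above for compilers that don't report the builtin.

```c
#include <stddef.h>

#ifndef __has_builtin
#define __has_builtin(x) 0  /* older compilers: always take the fallback */
#endif

/* Hypothetical wrapper: call it only after idx < len has been checked. */
static inline size_t harden_index(size_t idx, size_t len)
{
#if __has_builtin(__builtin_speculation_safe_value)
    (void) len;
    return __builtin_speculation_safe_value(idx);
#else
    /* Same masking approach as array_index_nospec() above. */
    __asm__ ("" : "=r" (idx) : "0" (idx));
    size_t mask = idx < len ? ~(size_t) 0 : (size_t) 0;
    return idx & mask;
#endif
}

int load_checked(const int *array, size_t len,
                 size_t untrusted_input, int fallback)
{
    if (untrusted_input < len) {
        untrusted_input = harden_index(untrusted_input, len);
        return array[untrusted_input];
    }
    return fallback;
}
```

Either way, the architectural result is the same as a plain bounds-checked load. The difference only shows up in what a mis-speculating CPU is allowed to do.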

Wasmjit Mitigations

At a high level, Wasmjit is vulnerable to BCB in two main places: 1) in the runtime host functions made available to user programs, and 2) in the code generated by the JIT.

Host Runtime Function Mitigations

User code directly interacts with Wasmjit through the host functions it exports. These host functions mimic the de facto interface implemented by Emscripten, which, in turn, roughly mimics the Linux kernel system call interface. This interface is the only way user programs interact with the outside world. From the perspective of the user program, it uses normal C pointers to pass references to data to the host interface. From the perspective of Wasmjit, these pointers are actually indices into the singleton memory instance of that WebAssembly module.

To safely load data from user-provided pointers, Wasmjit first checks that the pointer is a valid index in the singleton memory instance. After that, a custom memcpy() routine is run that properly hardens the array index before performing the load. Here’s an example:

/* Copy size bytes out of wasm memory, hardening the index before each load. */
void custom_memcpy(void *restrict memory_base, size_t memory_size,
                   void *restrict dst, uint32_t wasm_ptr, uint32_t size)
{
    size_t i;
    /* whole 8-byte blocks first; `<` binds tighter than `&`,
       so the rounded-down size needs its own parentheses */
    for (i = 0; i < (size & ~(size_t)0x7); i += 8) {
        size_t hardened = array_index_nospec(wasm_ptr + i, 8, memory_size);
        memcpy((char *) dst + i, (char *) memory_base + hardened, 8);
    }
    /* then the remaining 0-7 bytes, one at a time */
    for (i = size & ~(size_t)0x7; i < size; ++i) {
        size_t hardened = array_index_nospec(wasm_ptr + i, 1, memory_size);
        *((char *) dst + i) = *((char *) memory_base + hardened);
    }
}

We copy in blocks of 8 bytes to minimize the performance impact of hardening the index on every load. To load in blocks of 8 bytes safely, an extra argument needs to be provided to array_index_nospec():

size_t array_index_nospec(size_t idx, size_t extent, size_t len)
{
    __asm__ ("" : "=r" (idx) : "0" (idx));
    size_t mask = (idx + extent) <= len ? ~(size_t) 0 : (size_t) 0;
    return idx & mask;
}

Without the extent argument, array_index_nospec() only hardens based on whether a single access is safe. That is no longer sufficient since we’re copying in blocks of 8. To access a single element of the array, just invoke array_index_nospec() with an extent argument of 1.
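
One thing the idx + extent comparison leaves open is wrap-around when idx is close to the maximum size_t value. That can't happen with 32-bit WebAssembly pointers on a 64-bit host, but for other callers a wrap-safe variant might look like the sketch below. The name array_index_nospec_checked is hypothetical, and whether the compiler really emits branch-free code for it is an assumption that has to be confirmed in the disassembly.

```c
#include <stddef.h>

/* Hypothetical wrap-safe variant of the extent check. */
size_t array_index_nospec_checked(size_t idx, size_t extent, size_t len)
{
    __asm__ ("" : "=r" (idx) : "0" (idx));
    /* Both comparisons are combined with bitwise & rather than && so
       that no short-circuit branch is required. If extent > len the
       subtraction wraps, but the first term is already 0 in that case. */
    size_t ok = (size_t) (extent <= len) & (size_t) (idx <= len - extent);
    size_t mask = (size_t) 0 - ok;  /* all ones if ok == 1, else zero */
    return idx & mask;
}
```

With a 16-byte memory, an 8-byte access at index 8 is the last one that keeps its index. Index 9 and an index of ~(size_t) 0 are both forced to 0.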

Altogether, a typical host function roughly looks like this:

uint32_t foo(uint32_t ptr, uint32_t size, struct wasmjit_ctx *ctx)
{
    uint32_t user_int;
    
    if (size != sizeof(user_int))
        return 0;

    /* check if untrusted memory reference is valid;
       widen to 64 bits so that ptr + size cannot wrap around */
    if ((uint64_t) ptr + size > ctx->memory_size)
        return 0;

    custom_memcpy(ctx->memory_base, ctx->memory_size,
                  &user_int, ptr, sizeof(user_int));

    /* do stuff with user_int... */

    return 1;
}
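
To see the pieces working together outside of Wasmjit, here is a cut-down sketch that reads a single 32-bit value from a fake memory. Everything named demo_* is hypothetical: the real struct wasmjit_ctx isn't shown in this post, so the two-field layout below is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the module's singleton memory instance. */
struct demo_ctx {
    char *memory_base;
    size_t memory_size;
};

static size_t demo_index_nospec(size_t idx, size_t extent, size_t len)
{
    __asm__ ("" : "=r" (idx) : "0" (idx));
    size_t mask = (idx + extent) <= len ? ~(size_t) 0 : (size_t) 0;
    return idx & mask;
}

/* Returns 1 and stores the value in *out on success, 0 otherwise. */
int demo_read_u32(const struct demo_ctx *ctx, uint32_t ptr, uint32_t *out)
{
    /* Architectural check, done in 64 bits so ptr + 4 cannot wrap. */
    if ((uint64_t) ptr + sizeof(*out) > ctx->memory_size)
        return 0;

    /* Speculative hardening: index collapses to 0 if the check is bypassed. */
    size_t hardened = demo_index_nospec(ptr, sizeof(*out), ctx->memory_size);
    memcpy(out, ctx->memory_base + hardened, sizeof(*out));
    return 1;
}
```

A read at offset 8 of a 16-byte memory succeeds, while offsets 13 and 0xffffffff are rejected by the architectural check before any load happens.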

JIT Mitigations

There are 2 WebAssembly instructions that directly involve array indexing using a user-provided index: br_table and call_indirect. In addition, every load instruction is vulnerable: i32.load, i64.load, f32.load, f64.load, i32.load8_s, i32.load8_u, i32.load16_s, i32.load16_u, i64.load8_s, i64.load8_u, i64.load16_s, i64.load16_u, i64.load32_s, i64.load32_u.

Since the JIT generates machine code that executes those instructions, Wasmjit can’t simply use the array_index_nospec() function to harden the array indexes. Instead, we need a machine code sequence that does the equivalent hardening. On x86_64 we can use the sbb instruction after cmp in the following way:

# %rax contains the untrusted index
# %rdx contains the array size
cmp %rdx, %rax

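# jump if CF == 0 (index >= size); a mis-speculating CPU may still
# fall through to the code below even though CF == 0
jae BAD_INDEX

# sbb computes %rcx = (%rcx - %rcx - CF), i.e. all ones if CF == 1
# and zero if CF == 0
sbb %rcx, %rcx

# if CF == 0, then %rcx == 0 and the index in %rax is forced to 0
and %rcx, %rax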

After the preceding instruction sequence, it should be safe to use %rax as an array index.
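
For experimenting with the cmp/sbb pair without a JIT, the same two instructions can be wrapped in GCC-style inline assembly. This is a sketch rather than Wasmjit's code: index_mask_nospec is a hypothetical name, the asm path assumes GCC or Clang on x86_64, and the fallback for other targets is plain C that the compiler may or may not turn into branch-free code.

```c
#include <stddef.h>

/* Returns all ones if index < size, zero otherwise. */
static inline size_t index_mask_nospec(size_t index, size_t size)
{
    size_t mask;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ __volatile__ ("cmp %1, %2\n\t"  /* CF = (index < size) */
                          "sbb %0, %0"      /* mask = 0 - CF */
                          : "=r" (mask)
                          : "r" (size), "r" (index)
                          : "cc");
#else
    /* Assumption: portable stand-in with the same result, no guarantee
       about the instructions the compiler picks. */
    mask = (size_t) 0 - (size_t) (index < size);
#endif
    return mask;
}
```

ANDing the untrusted index with the returned mask leaves an in-bounds index unchanged and turns an out-of-bounds one into 0, matching the and instruction at the end of the sequence above.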

Final Words

While the above technique is effective in mitigating BCB, requiring programmers to manually annotate each conditional array access doesn’t scale and is error-prone. LLVM is working on an automatic technique called “Speculative Load Hardening” (SLH) that works at the compiler level but, at the time of writing, it’s not yet ready for production use. A GCC implementation of SLH would also be necessary for Wasmjit on Linux. Sadly, the existence of “good enough” software mitigations leaves me doubtful we’ll see a robust hardware solution from Intel, AMD, ARM and others. Unlike with the other Spectre/Meltdown vulnerabilities, Intel’s latest 9th generation processors still don’t address BCB. Fortunately, there has been a hardware response from Intel on Spectre Variant 2, which we’ll cover next time.