All the memory safety of C combined with all the blazing speed of JavaScript

Slim (@soylentqueen)

Notion (notion.so)

My WebAssembly knowledge in 2018

  • Fast (successor to asm.js)

  • Many use cases

    • Games
    • Graphics
    • Large Fibonacci numbers
  • Formally specified, verified

  • Might put me out of a job

WebAssembly logo

Some context

I’m an academic first, software engineer second

By day 🌝

  • Building Notion, a practical computing toolkit
  • Previously building Khan Academy
  • Extremely professional JavaScript developer

By night 🌚

  • Researching programming languages as user interfaces
  • Functional programming, rich type systems
  • Skipped operating systems class to study type theory

Two kinds of WebAssembly talks

Some WebAssembly talks are fairly shallow and broad

Two kinds of WebAssembly talks

This WebAssembly talk is narrow and deep

Why visit the abyss?

Have you encountered a nifty-sounding technology and thought to yourself:

This seems nifty!

Too bad I am a JavaScript developer and don’t actually understand this lol ¯\_(ツ)_/¯

“I’m just a frontend developer…”

  • Algebraic data types? I don’t do algebra, data, or types
  • Spectre/Meltdown? I’m having a Meltdown
  • Containers? Does following @jessfraz on Twitter count

I was like this once

Those are all actual quotes from me

Ok so WebAssembly

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine.

Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust.

Why does WebAssembly matter?

  • Need a performant runtime for the Web
  • But as we all know, JavaScript was designed by accident in 10 minutes and originally intended to make WordArt
  • A way to interoperate with native and low-level libraries
  • Prior art: ActiveX, NaCl, asm.js

Design goals

How does it look

  • Compact and fast to decode
  • Easy to validate, compile (single-pass)

How does it run

  • Safe to execute
  • Fast
  • Portable
  • Deterministic
  • Easy to reason about

(Haas et al. 2017)

To our knowledge, WebAssembly is the first industrial-strength language or VM that has been designed with a formal semantics from the start.

This not only demonstrates the “real world” feasibility of such an approach, but also that it leads to a notably clean design. (Haas et al. 2017)

Design choices

  • Types: i32, i64, f32, f64, functions

  • Linear memory: each module has its own sandboxed memory space separate from code space and execution stack

  • Structured control flow: block and loop instead of arbitrary jumps

Compilation target

For “high-level” languages like C, C++, Rust

Compiling functions

int addOne(int x) {
    return x + 1;
}

Compiling functions

C++

int addOne(int x) {
    return x + 1;
}

WebAssembly

(func $func0 (param $var0 i32) (result i32)
  get_local $var0
  i32.const 1
  i32.add
)

Compiling modules

int addOne(int x) {
    return x + 1;
}

int main(int argc) {
    return addOne(argc);
}

Compiling modules

int addOne(int x) {
    return x + 1;
}

int main(int argc) {
    return addOne(argc);
}

WebAssembly

(module
  (type $type0 (func (param i32) (result i32)))
  (table 0 anyfunc)
  (memory 1)
  (export "memory" memory)
  (export "_Z6addOnei" $func0)
  (export "main" $func1)
  (func $func0 (param $var0 i32) (result i32)
    get_local $var0
    i32.const 1
    i32.add
  )
  (func $func1 (param $var0 i32) (result i32)
    get_local $var0
    call $func0
  )
)

Compiling modules

int addOne(int x) {
    return x + 1;
}

int main(int argc) {
    return addOne(argc);
}
(module
  (type $type0 (func (param i32) (result i32)))
  (table 0 anyfunc)
  (memory 1)
  (export "memory" memory)
  (export "_Z6addOnei" $func0)
  (export "main" $func1)
  (func $func0 (param $var0 i32) (result i32)
    get_local $var0
    i32.const 1
    i32.add
  )
  (func $func1 (param $var0 i32) (result i32)
    get_local $var0
    call $func0
  )
)

Stack-based virtual machine

How does it run?

Stack machine semantics

(func $func0 (param $var0 i32) (result i32)
  get_local $var0
  i32.const 1
  i32.add
)
(func $func1 (param $var0 i32) (result i32)
  get_local $var0
  call $func0
)

Referencing local variables

  1. get_local $var0

  2. call $func0

$var0

Calling functions

  1. get_local $var0

  2. call $func0

$var0

Function signatures

(func $func0 (param $var0 i32) (result i32)
  ; ...
)

Pop arguments off the stack

(func $func0 (param $var0 i32) (result i32) ...)
  1. get_local $var0

  2. call $func0

Create a new stack frame

(func $func0 (param $var0 i32) (result i32) ...)
  1. get_local $var0

  2. call $func0

Inside a function: get local variable

  1. get_local $var0

  2. call $func0

    1. get_local $var0

    2. i32.const 1

    3. i32.add

$var0

Constants

  1. get_local $var0

  2. call $func0

    1. get_local $var0

    2. i32.const 1

    3. i32.add

1
$var0

Addition

  1. get_local $var0

  2. call $func0

    1. get_local $var0

    2. i32.const 1

    3. i32.add

1
$var0

Instruction signatures

i32.add takes two i32s and returns an i32

WebAssembly Reference Manual

Addition: pop inputs

  1. get_local $var0

  2. call $func0

    1. get_local $var0

    2. i32.const 1

    3. i32.add

Addition: push outputs

  1. get_local $var0

  2. call $func0

    1. get_local $var0

    2. i32.const 1

    3. i32.add

$var0 + 1

Returning to caller

  1. get_local $var0

  2. call $func0

    1. get_local $var0

    2. i32.const 1

    3. i32.add

$var0 + 1

Slim pls

“There has been no JavaScript in this talk so far”

JavaScript API

Loading a .wasm file in three lines of JavaScript

fetch('demo/calls.wasm')
  .then(response => response.arrayBuffer())
  .then(WebAssembly.instantiate)
  .then(({module, instance}) => {
      // ... do stuff
  })

Inspecting exports

instance.exports
{ _Z6addOnei: function 1()
, main: function 2()
, memory: WebAssembly.Memory
}

Calling functions

instance.exports._Z6addOnei(2019)
2020

wasm-trace

Don’t play with matches, play with binaries!

Background

My roommate, Meg Grasse
  • This is my roommate Meg

  • We like static analysis, systems programming, Rust, compilers

  • Needed a cool Rust project

Nick’s idea

a tool that would take a wasm module and modify its code to inject tracing calls, so that you could get an trace of the wasm’s execution in the console

Nick Fitzgerald’s GitHub avatar

Nick Fitzgerald, Mozilla

Jim Blandy’s GitHub avatar

Jim Blandy, Mozilla

Example

fn do_stuff(x: i32) -> i32 {
    return double(x) + negate(5);
}

do_stuff(4)
call function do_stuff
 |  call function double
 |  return 8 from double
 |  call function negate
 |  return -5 from negate
return 3 from do_stuff

Start with a Rust program

Rust

fn addOne(x: i32) -> i32 {
  x + 1
}
fn main(x: i32) -> i32 {
  addOne(x) + addOne(x)
}

JavaScript

function addOne(x) {
  return x + 1
}
function main(x) {
  return addOne(x) + addOne(x)
}

Allocate global memory

static ref TRACER = Mutex::new(Tracer::new());

fn expose_tracer() -> *const i32 {
    TRACER.lock().unwrap().as_ptr()
}
const TRACER = []

function exposeTracer() {
    return TRACER  // exposes a memory address
}

*little bit of hand-waving here

Compile Rust program to .wasm module

Then transform each function in module.wasm!

Module layout

(module
  (type $type0 (func (param i32) (result i32)))
  (table 0 anyfunc)
  (memory 1)
  (export "memory" memory)
  (export "_Z6addOnei" $func0)
  (export "main" $func1)
  (func $func0 (param $var0 i32) (result i32)
    get_local $var0
    i32.const 1
    i32.add
  )
  (func $func1 (param $var0 i32) (result i32)
    get_local $var0
    call $func0
  )
)

Add prologue instructions

  (func $func0 (param $var0 i32) (result i32)
+   i32.const 0  ; func0
+   call $logFunctionCall
    get_local $var0
    i32.const 1
    i32.add
  )

How to write return value to memory?

get_local $var0
i32.const 1
i32.add
$var0 + 1

Calling logFunctionReturn pops an argument off the stack!

  get_local $var0
  i32.const 1
  i32.add
+ call $logFunctionReturn

THIS IS AN INVALID STATE!!!!!!

$func0 must return a value, but the stack is now empty!

tee_local

“Copies the top of the stack” to a local variable

  get_local $var0
  i32.const 1
  i32.add
+ tee_local $return_value
// stack = [var0 + 1]
const temp = stack.pop()
// stack = []
returnValue = temp
stack.push(temp)
// stack = [var0 + 1]

Now we can get_local

   get_local $var0
   i32.const 1
   i32.add
+  tee_local $return_value
+  get_local $return_value
$var0 + 1
$var0 + 1

…and call $logFunctionReturn

   get_local $var0
   i32.const 1
   i32.add
+  tee_local $return_value
+  get_local $return_value
+  call $logFunctionReturn
$var0 + 1

Final epilogue instructions

  (func $func0 (param $var0 i32) (result i32)
    i32.const 0  ; func0
    call $logFunctionCall
    get_local $var0
    i32.const 1
    i32.add
+   tee_local $return_value
+   get_local $return_value
+   call $logFunctionReturn
  )

Now, the magic

fetch('demo/output.wasm')
  .then(response => response.arrayBuffer())
  .then(WebAssembly.instantiate)
  .then(({module, instance}) => {
    window.module = module
    window.instance = instance
  })

Run a function…

instance.exports.negate(10) // -10

And look at the memory!

const buf = new Int32Array(
  instance.exports.memory.buffer,
  instance.exports.__expose_tracer(),
  instance.exports.__expose_tracer_len(),
)

Soundness

  • Type safety: no invalid calls, no illegal accesses to locals
  • Memory safety: no buffer overflows, no dangling pointers
    • Code and call stack are not accessible to the program

Typing rules for instructions

Typing rules for instructions

Typing rules for modules

Typing rules for modules

This is actually impressive

Instructions Modules

  • Looks scary!

  • But compare to JVM bytecode verification: 150 pages of the current spec!

A formal semantics from the start

Validation ensures that the module is well-defined and that its code cannot exhibit any undefined behavior. In particular, along with some runtime checks, this ensures that no program can access or corrupt memory it does not own.

Further reading

With thanks

Talk to me on the internet