Announcing the first Java library to run WebAssembly: Wasmer JNI

This is a copy of an article I wrote for Wasmer.



WebAssembly is a portable binary format. That means the same file can run anywhere.

To uphold this bold statement, each language, platform and system must be able to run WebAssembly — as fast and safely as possible.

People who are familiar with Wasmer are used to this kind of announcement! Wasmer is written in Rust, and comes with an additional native C API. But you can use it in a lot of other languages. After having announced libraries to use Wasmer, and thus WebAssembly, in:

…we are jazzed to announce that Wasmer has now landed in Java!

Let’s discover the Wasmer JNI library together.

Installation

The Wasmer JNI (Java Native Interface) library is based on the Wasmer runtime, which is written in Rust and compiled to a shared library. For your convenience, we produce one JAR (Java Archive) per architecture and platform. So far, the following are supported, consistently tested, and pre-packaged (available on Bintray and in Github Releases):

  • amd64-darwin for macOS, x86 64 bits,
  • amd64-linux for Linux, x86 64 bits,
  • amd64-windows for Windows, x86 64 bits.

More architectures and more platforms will be added in the near future. If you need a specific one, feel free to ask! In the meantime, it is possible to produce your own JAR for your own platform and architecture.

The JAR files are named as follows: wasmer-jni-$(architecture)-$(os)-$(version).jar. Thus, to include Wasmer JNI as a dependency of your project (assuming you use Gradle), write for instance:

dependencies {
    implementation "org.wasmer:wasmer-jni-amd64-linux:0.2.0"
}

JARs are hosted on the Bintray/JCenter repository under the wasmer-jni project. They are also attached to our Github releases as assets.

Calling a WebAssembly function from Java

As usual, let’s start with a simple Rust program that we will compile to WebAssembly, and then execute from Java.

#[no_mangle]
pub extern fn sum(x: i32, y: i32) -> i32 {
    x + y
}

After compilation to WebAssembly, we get a file like this one, named simple.wasm.
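
If you wonder how to produce such a file, here is a minimal sketch, assuming a Rust toolchain with the wasm32-unknown-unknown target installed (rustup target add wasm32-unknown-unknown):

$ rustc --target wasm32-unknown-unknown -O --crate-type=cdylib simple.rs -o simple.wasm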

The following Java program executes the sum exported function by passing 5 and 37 as arguments:

import org.wasmer.Instance;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class SimpleExample {
    public static void main(String[] args) throws IOException {
        // Read the WebAssembly bytes.
        byte[] bytes = Files.readAllBytes(Paths.get("simple.wasm"));

        // Instantiate the WebAssembly module.
        Instance instance = new Instance(bytes);

        // Get the `sum` exported function, call it by passing 5 and 37, and get the result.
        Integer result = (Integer) instance.exports.getFunction("sum").apply(5, 37)[0];

        assert result == 42;

        instance.close();
    }
}

Great! We have successfully executed a Rust program, compiled to WebAssembly, in Java. As you can see, it is pretty straightforward. The API is very similar to the standard JavaScript API, or the other APIs we have designed for PHP, Python, Go, Ruby, etc.

The assiduous reader might have noticed the [0] in the .apply(5, 37)[0] pattern. A WebAssembly function can return zero or more values, and in this case, we are reading the first one.

Note: Java values passed to WebAssembly exported functions are automatically downcast to WebAssembly values. Types are inferred at runtime, and casting is done automatically. Thus, a WebAssembly function acts as any regular Java function.

Technically, an exported function is a functional interface as defined by the Java Language Specification (i.e. it is a FunctionalInterface). Thus, it is possible to write the following code where sum is an actual function (of kind org.wasmer.exports.Function):

import org.wasmer.Instance;
import org.wasmer.exports.Function;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class SimpleExample {
    public static void main(String[] args) throws IOException {
        // Read the WebAssembly bytes.
        byte[] bytes = Files.readAllBytes(Paths.get("simple.wasm"));

        // Instantiate the WebAssembly module.
        Instance instance = new Instance(bytes);

        // Declare the `sum` function, as a regular Java function.
        Function sum = instance.exports.getFunction("sum");
        
        // Call `sum`.
        Integer result = (Integer) sum.apply(1, 2)[0];

        assert result == 3;

        instance.close();
    }
}

But a WebAssembly module not only exports functions; it also exports memory.

Reading the memory

A WebAssembly instance has one or more linear memories: contiguous, byte-addressable ranges of memory spanning from offset 0 up to a varying memory size, each represented by the org.wasmer.Memory class. Let’s see how to use it. Consider the following Rust program:

#[no_mangle]
pub extern fn return_hello() -> *const u8 {
    b"Hello, World!\0".as_ptr()
}

The return_hello function returns a pointer to the statically allocated string. The string exists in the linear memory of the WebAssembly module. It is then possible to read it in Java:

import org.wasmer.Instance;
import org.wasmer.Memory;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

class MemoryExample {
    public static void main(String[] args) throws IOException {
        // Read the WebAssembly bytes.
        byte[] bytes = Files.readAllBytes(Paths.get("memory.wasm"));

        // Instantiate the WebAssembly module.
        Instance instance = new Instance(bytes);

        // Get a pointer to the statically allocated string returned by `return_hello`.
        Integer pointer = (Integer) instance.exports.getFunction("return_hello").apply()[0];

        // Get the exported memory named `memory`.
        Memory memory = instance.exports.getMemory("memory");

        // Get a direct byte buffer view of the WebAssembly memory.
        ByteBuffer memoryBuffer = memory.buffer();

        // Prepare the byte array that will hold the data.
        byte[] data = new byte[13];

        // Let's position the cursor, and…
        memoryBuffer.position(pointer);
        
        // … read!
        memoryBuffer.get(data);

        // Let's encode back to a Java string.
        String result = new String(data);

        // Hello!
        assert result.equals("Hello, World!");

        instance.close();
    }
}

As we can see, the Memory API provides a buffer method. It returns a direct byte buffer (of kind java.nio.ByteBuffer) view of the memory. It’s a standard API for any Java developer. We think it’s best not to reinvent the wheel and to use standard APIs as much as possible.

The WebAssembly memory is dissociated from the JVM memory, and thus from the garbage collector.

You can read the Greet Example to see a more in-depth usage of the Memory API.

More documentation

The project comes with a Makefile. The make javadoc command will generate a traditional local Javadoc for you, in the build/docs/javadoc/index.html file.

In addition, the project’s README.md file has an “API of the wasmer library” section.

Finally, the project comes with a set of examples. For instance, use make run-example EXAMPLE=Simple to run the SimpleExample.java example.

Performance

WebAssembly aims at being safe, but also fast. Since Wasmer JNI is the first Java library to execute WebAssembly, we can’t compare it to prior work in the Java ecosystem. However, you might know that Wasmer comes with 3 backends: Singlepass, Cranelift and LLVM. We’ve even written an article about it: A WebAssembly Compiler tale. The Wasmer JNI library uses the Cranelift backend for the moment, which offers the best compromise between compilation time and execution time.

Credits

Asami (d0iasm on Twitter) improved this project during her internship at Wasmer under my guidance. She finished the internship before the release of the Wasmer JNI project, but she deserves credit for pushing the project forward! Good work, Asami!

This is an opportunity to remind everyone that we hire anywhere in the world. Asami was working from Japan while I am working from Switzerland, and the rest of the team is from the US, Spain, China, etc. Feel free to contact me (@mnt_io or @syrusakbary on Twitter) if you want to join us on this big adventure!

Conclusion

Wasmer JNI is a library to execute WebAssembly directly in Java. It embeds the WebAssembly runtime Wasmer. The first releases provide the core API with Module, Instance, and Memory. It comes pre-packaged as a JAR, one per architecture and per platform.

The source code is open and hosted on Github at https://github.com/wasmerio/java-ext-wasm. We are constantly improving the project, so if you have feedback, issues, or feature requests please open an issue in the repository, or reach us on Twitter at @wasmerio or @mnt_io.

We look forward to seeing what you build with this!

Announcing the first Postgres extension to run WebAssembly

This is a copy of an article I wrote for Wasmer.


Elephant. Source of the photo.

WebAssembly is a portable binary format. That means the same program can run anywhere.

To uphold this bold statement, each language, platform and system must be able to run WebAssembly — as fast and safely as possible.

Let’s say it again. Wasmer is a WebAssembly runtime. We have successfully embedded the runtime in other languages:

The community has also embedded Wasmer in awesome projects:

It is now time to continue the story and to hang around… Postgres!

We are so happy to announce a new crazy idea: WebAssembly on Postgres. Yes, you read that correctly. On Postgres.

Calling a WebAssembly function from Postgres

As usual, we have to go through the installation process. There is no package manager for Postgres, so it’s a manual step. The Installation section of the documentation explains the details; here is a summary:

$ # Build the shared library.
$ just build

$ # Install the extension in the Postgres tree.
$ just install

$ # Activate the extension.
$ echo 'CREATE EXTENSION wasm;' | \
      psql -h $host -d $database

$ # Initialize the extension.
$ echo "SELECT wasm_init('$(pwd)/target/release/libpg_ext_wasm.dylib');" | \
      psql -h $host -d $database

Once the extension is installed, activated and initialized, we can start having fun!

The current API is rather small; however, basic features are available. The goal is to gather a community and to design a pragmatic API together: to discover the expectations, and how developers would use this new technology inside a database engine.

Let’s see how it works. To instantiate a WebAssembly module, we use the wasm_new_instance function. It takes 2 arguments: the absolute path to the WebAssembly module, and a prefix for the module’s exported functions. Indeed, if a module exports a function named sum, then a Postgres function named prefix_sum calling the sum function will be created dynamically.

Let’s see it in action, starting with a Rust program that compiles to WebAssembly:

#[no_mangle]
pub extern fn sum(x: i32, y: i32) -> i32 {
    x + y
}

Once this file is compiled to simple.wasm, we can instantiate the module, and call the exported sum function:

-- New instance of the `simple.wasm` WebAssembly module.
SELECT wasm_new_instance('/absolute/path/to/simple.wasm', 'ns');

-- Call a WebAssembly exported function!
SELECT ns_sum(1, 2);

--  ns_sum
-- --------
--       3
-- (1 row)

Et voilà ! The ns_sum function calls the Rust sum function through WebAssembly! How fun is that 😄?

Inspect a WebAssembly instance

This section shows how to inspect a WebAssembly instance. At the same time, it quickly explains how the extension works under the hood.

The extension provides two foreign data wrappers, gathered together in the wasm foreign schema:

  • wasm.instances is a table with the id and wasm_file columns, respectively for the unique instance ID, and the path of the WebAssembly module,
  • wasm.exported_functions is a table with the instance_id, name, inputs, and outputs columns, respectively for the instance ID of the exported function, its name, its input types (already formatted for Postgres), and its output types (already formatted for Postgres).

Let’s see:

-- Select all WebAssembly instances.
SELECT * FROM wasm.instances;

--                   id                  |          wasm_file
-- --------------------------------------+-------------------------------
--  426e17af-c32f-5027-ad73-239e5450dd91 | /absolute/path/to/simple.wasm
-- (1 row)

-- Select all exported functions for a specific instance.
SELECT
    name,
    inputs,
    outputs
FROM
    wasm.exported_functions
WHERE
    instance_id = '426e17af-c32f-5027-ad73-239e5450dd91';

--   name  |     inputs      | outputs
-- --------+-----------------+---------
--  ns_sum | integer,integer | integer
-- (1 row)

Based on this information, the wasm Postgres extension is able to generate the SQL functions to call the WebAssembly exported functions.

It sounds simplistic, and… to be honest, it is! The trick is to use foreign data wrappers, an awesome feature of Postgres.

How fast is it, or: Is it an interesting alternative to PL/pgSQL?

As we said, the extension API is rather small for now. The idea is to explore, to experiment, to have fun with WebAssembly inside a database. It is particularly interesting in two cases:

  1. To write extensions or procedures in any language that compiles to WebAssembly, in place of PL/pgSQL,
  2. To remove a potential performance bottleneck where speed is involved.

Thus, we ran a basic benchmark. Like most of the benchmarks out there, it must be taken with a grain of salt.

The goal is to compare the execution time between WebAssembly and PL/pgSQL, and see how both approaches scale.

The Postgres WebAssembly extension uses Wasmer as the runtime, compiled with the Cranelift backend (learn more about the different backends). We ran the benchmark with Postgres 10, on a 2016 MacBook Pro 15″, 2.9 GHz Core i7 with 16 GB of memory.

The methodology is the following:

  • Load both the plpgsql_fibonacci and the wasm_fibonacci functions (a sketch of the PL/pgSQL side follows this list),
  • Run them with a query like SELECT *_fibonacci(n) FROM generate_series(1, 1000) where n has the following values: 50, 500, and 5000, so that we can observe how both approaches scale,
  • Write the timings down,
  • Run this methodology multiple times, and compute the median of the results.
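
To give an idea of the PL/pgSQL side, here is a minimal iterative sketch of what plpgsql_fibonacci can look like. It is illustrative only: the functions actually benchmarked may differ, notably in how they handle integer overflow for large n.

CREATE FUNCTION plpgsql_fibonacci(n integer) RETURNS integer AS $$
DECLARE
    a integer := 0;
    b integer := 1;
    tmp integer;
BEGIN
    -- Iterative Fibonacci, to mirror what the WebAssembly version computes.
    FOR i IN 1..n LOOP
        tmp := a + b;
        a := b;
        b := tmp;
    END LOOP;

    RETURN a;
END;
$$ LANGUAGE plpgsql;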

Here come the results. The lower, the better.

Comparing WebAssembly vs. PL/pgSQL when computing the Fibonacci sequence with n=50, 500 and 5000.

We notice that the Postgres WebAssembly extension is faster at running numeric computations. The WebAssembly approach scales pretty well compared to the PL/pgSQL approach, in this situation.

When to use the WebAssembly extension?

So far, the extension only supports 32- and 64-bit integers. It doesn’t support strings yet. It also doesn’t support records, views or other Postgres types. Keep in mind this is the very first step.

Hence, it is too soon to tell whether WebAssembly can be an alternative to PL/pgSQL. But given the benchmark results above, we are sure they can live side by side, and WebAssembly clearly has a place in the ecosystem! And we want to continue to pursue this exploration.

Conclusion

We are already talking with people who are interested in using WebAssembly inside databases. If you have any particular use cases, please reach out to us at wasmer.io, on Twitter at @wasmerio, or to me directly at @mnt_io.

Everything is open source, as usual! Happy hacking.

Announcing the fastest WebAssembly runtime for Go: wasmer

This is a copy of an article I wrote for Wasmer.


Go loves WebAssembly — image attributions to the original Gopher drawing.

WebAssembly is a portable binary format. That means the same file can run anywhere.

To uphold this bold statement, each language, platform and system must be able to run WebAssembly — as fast and safely as possible.

Wasmer is a WebAssembly runtime written in Rust. It goes without saying that the runtime can be used in any Rust application. We have also successfully embedded the runtime in other languages:

We are super happy to announce github.com/wasmerio/go-ext-wasm/wasmer, a Go library to run WebAssembly binaries, fast.

Calling a WebAssembly function from Go

First, let’s install the wasmer library in your Go environment (with cgo support).

export CGO_ENABLED=1; export CC=gcc; go install github.com/wasmerio/go-ext-wasm/wasmer

Let’s jump immediately into some examples. github.com/wasmerio/go-ext-wasm/wasmer is a regular Go library. The installation is automated with import "github.com/wasmerio/go-ext-wasm/wasmer".

Let’s get our hands dirty. We will write a program that compiles to WebAssembly easily, using Rust for instance:

#[no_mangle]
pub extern fn sum(x: i32, y: i32) -> i32 {
    x + y
}

After compilation to WebAssembly, we get a file like this one, named simple.wasm.

The following Go program executes the sum function by passing 5 and 37 as arguments:

package main

import (
	"fmt"
	wasm "github.com/wasmerio/go-ext-wasm/wasmer"
)

func main() {
	// Reads the WebAssembly module as bytes.
	bytes, _ := wasm.ReadBytes("simple.wasm")
	
	// Instantiates the WebAssembly module.
	instance, _ := wasm.NewInstance(bytes)
	defer instance.Close()

	// Gets the `sum` exported function from the WebAssembly instance.
	sum := instance.Exports["sum"]

	// Calls that exported function with Go standard values. The WebAssembly
	// types are inferred and values are casted automatically.
	result, _ := sum(5, 37)

	fmt.Println(result) // 42!
}

Great! We have successfully executed a WebAssembly file inside Go.

Note: Go values passed to the WebAssembly exported function are automatically cast to WebAssembly values. Types are inferred and casting is done automatically. Thus, a WebAssembly function acts as any regular Go function.

WebAssembly calling Go functions

A WebAssembly module exports some functions, so that they can be called from the outside world. This is the entry point to execute WebAssembly.

Nonetheless, a WebAssembly module can also have imported functions. Let’s consider the following Rust program:

extern {
    fn sum(x: i32, y: i32) -> i32;
}

#[no_mangle]
pub extern fn add1(x: i32, y: i32) -> i32 {
    unsafe { sum(x, y) } + 1
}

The exported function add1 calls the sum function. Its implementation is absent; only its signature is defined. This is an “extern function”, and for WebAssembly, it is an imported function, because its implementation must be imported.

Let’s implement the sum function in Go! To do so, we need to use cgo:

  1. The sum function signature is defined in C (see the comment above import "C"),
  2. The sum implementation is defined in Go. Notice the //export comment, which is how cgo maps Go code to C code,
  3. NewImports is an API used to create WebAssembly imports. In this code, "sum" is the WebAssembly imported function name, sum is the Go function pointer, and C.sum is the cgo function pointer,
  4. Finally, NewInstanceWithImports is the constructor to use to instantiate the WebAssembly module with imports. That’s it.

Let’s see the complete program:

package main

// // 1️⃣ Declare the `sum` function signature (see cgo).
//
// #include <stdlib.h>
//
// extern int32_t sum(void *context, int32_t x, int32_t y);
import "C"

import (
	"fmt"
	wasm "github.com/wasmerio/go-ext-wasm/wasmer"
	"unsafe"
)

// 2️⃣ Write the implementation of the `sum` function, and export it (for cgo).
//export sum
func sum(context unsafe.Pointer, x int32, y int32) int32 {
	return x + y
}

func main() {
	// Reads the WebAssembly module as bytes.
	bytes, _ := wasm.ReadBytes("import.wasm")

	// 3️⃣ Declares the imported functions for WebAssembly.
	imports, _ := wasm.NewImports().Append("sum", sum, C.sum)

	// 4️⃣ Instantiates the WebAssembly module with imports.
	instance, _ := wasm.NewInstanceWithImports(bytes, imports)

	// Close the WebAssembly instance later.
	defer instance.Close()

	// Gets the `add1` exported function from the WebAssembly instance.
	add1 := instance.Exports["add1"]

	// Calls that exported function.
	result, _ := add1(1, 2)

	fmt.Println(result)
	//   add1(1, 2)
	// = sum(1, 2) + 1
	// = 1 + 2 + 1
	// = 4
	// QED
}

Reading the memory

A WebAssembly instance has a linear memory. Let’s see how to read it. Consider the following Rust program:

#[no_mangle]
pub extern fn return_hello() -> *const u8 {
    b"Hello, World!\0".as_ptr()
}

The return_hello function returns a pointer to a string. The string is terminated by a null byte, à la C. Let’s jump to the Go side:

bytes, _ := wasm.ReadBytes("memory.wasm")
instance, _ := wasm.NewInstance(bytes)
defer instance.Close()

// Calls the `return_hello` exported function.
// This function returns a pointer to a string.
result, _ := instance.Exports["return_hello"]()

// Gets the pointer value as an integer.
pointer := result.ToI32()

// Reads the memory.
memory := instance.Memory.Data()

fmt.Println(string(memory[pointer : pointer+13])) // Hello, World!

The return_hello function returns a pointer as an i32 value. We get its value by calling ToI32. Then, we fetch the memory data with instance.Memory.Data().

This function returns a slice over the WebAssembly instance memory. It can be used as any regular Go slice.

Fortunately for us, we already know the length of the string we want to read, so memory[pointer : pointer+13] is enough to read the bytes, which are then cast to a string. Et voilà!
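
If the length were not known in advance, a minimal sketch, reusing the memory and pointer variables from above, could scan for the terminating null byte instead:

// The string is null-terminated, à la C: scan until the null byte.
end := pointer

for memory[end] != 0 {
	end++
}

fmt.Println(string(memory[pointer:end])) // Hello, World!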

You can read the Greet Example to see a more advanced usage of the memory API.

Benchmarks

So far, github.com/wasmerio/go-ext-wasm/wasmer has a nice API, but… is it fast?

Contrary to PHP or Ruby, there are already existing runtimes in the Go world to execute WebAssembly. The main candidates are:

  • Life, from Perlin Network, a WebAssembly interpreter
  • Wagon, from Go Interpreter, a WebAssembly interpreter and toolkit.

In our blog post about the PHP extension, we have used the n-body algorithm to benchmark the performance. Life provides more benchmarks: the Fibonacci algorithm (the recursive version), the Pollard’s rho algorithm, and the Snappy Compress operation. The latter works successfully with github.com/wasmerio/go-ext-wasm/wasmer but not with Life or Wagon. We have removed it from the benchmark suites. Benchmark sources are online.

We used Life 20190521143330–57f3819c2df0 and Wagon 0.4.0, i.e. the latest versions to date.

The benchmark numbers represent the average result of 10 runs each. The computer that ran these benchmarks is a 2016 MacBook Pro 15″, 2.9 GHz Core i7 with 16 GB of memory.

Results are grouped by benchmark algorithm on the X axis. The Y axis represents the time used to run the algorithm, expressed in milliseconds. The lower, the better.

Speed comparison between Wasmer, Wagon and Life. Benchmark suites are the n-body, Fibonacci, and Pollard’s rho algorithms. Speed is expressed in ms. Lower is better.

While both Life and Wagon provide on average the same speed, Wasmer (github.com/wasmerio/go-ext-wasm/wasmer) is on average 72 times faster 🎉.

It is important to know that Wasmer comes with 3 backends: Singlepass, Cranelift, and LLVM. The default backend used by the Go library is Cranelift (learn more about Cranelift). Using LLVM will provide performance close to native, but we decided to start with Cranelift as it offers the best tradeoff between compilation time and execution time (learn more about the different backends, when to use them, their pros and cons, etc.).

Conclusion

github.com/wasmerio/go-ext-wasm/wasmer is a new Go library to execute WebAssembly binaries. It embeds the Wasmer runtime. The first version supports all the required APIs for the most common usages.

The current benchmarks (a mix from our benchmark suites and from Life suites) show that Wasmer is — on average — 72 times faster than Life and Wagon, the two major existing WebAssembly runtimes in the Go world.

If you want to follow the development, take a look at @wasmerio and @mnt_io on Twitter, or @wasmer@webassembly.social on Mastodon. And of course, everything is open source at https://github.com/wasmerio/go-ext-wasm.

Thank you for your time, we can’t wait to see what you build with us!

🐘+🦀+🕸 php-ext-wasm: Migrating from wasmi to Wasmer

This is a copy of an article I wrote for Wasmer.


Elephant in the forest. Source of the photo.

First as a joke, now as a real product, I started to develop php-ext-wasm: a PHP extension that allows executing WebAssembly binaries.

The PHP virtual machine (VM) is Zend Engine. To write an extension, one needs to develop in C or C++. The extension was simple C bindings to a Rust library I also wrote. At that time, this Rust library was using wasmi for the WebAssembly VM. I knew that wasmi wasn’t the fastest WebAssembly VM in the game, but the API is solid, well-tested, it compiles quickly, and is easy to hack. All the requirements to start a project!

After 6 hours of development, I got something working. I was able to run the following PHP program:

<?php

$instance = new Wasm\Instance('simple.wasm');
$result = $instance->sum(1, 2);

var_dump($result); // int(3)

The API is straightforward: create an instance (here of simple.wasm), then call functions on it (here sum with 1 and 2 as arguments). PHP values are transformed into WebAssembly values automatically. For the record, here is the simple.rs Rust program that is compiled to a WebAssembly binary:

#[no_mangle]
pub extern fn sum(x: i32, y: i32) -> i32 {
    x + y
}

It was great! In my opinion, 6 hours is a relatively small amount of time to get that far.

However, I quickly noticed that wasmi is… slow. One of the promises of WebAssembly is:

WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.

And clearly, my extension wasn’t fulfilling this promise. Let’s see a basic comparison with a benchmark.

I chose the n-body algorithm from the Computer Language Benchmarks Game from Debian, mostly because it’s relatively CPU intensive. Also, the algorithm has a simple interface: based on an integer, it returns a floating-point number; this API doesn’t involve any advanced instance memory API, which is perfect to test a proof-of-concept.

As a baseline, I’ve run the n-body algorithm written in Rust, let’s call it rust-baseline. The same algorithm has been written in PHP, let’s call it php. Finally, the algorithm has been compiled from Rust to WebAssembly, and executed with the php-ext-wasm extension, let’s call that case php+wasmi. All results are for nbody(5000000):

  • rust-baseline: 287ms,
  • php: 19,761ms,
  • php+wasmi: 67,622ms.

OK, so… php-ext-wasm with wasmi is 3.4 times slower than PHP itself; it is pointless to use WebAssembly in such conditions!

It confirms my first intuition though: In our case, wasmi is really great to mock something up, but it’s not fast enough for our expectations.

Faster, faster, faster…

I had wanted to use Cranelift since the beginning. It’s a code generator, à la LLVM (excuse the brutal shortcut; the goal isn’t to explain what Cranelift is in detail, but it’s a really awesome project!). To quote the project itself:

Cranelift is a low-level retargetable code generator. It translates a target-independent intermediate representation into executable machine code.

It basically means that the Cranelift API can be used to generate executable code.

It’s perfect! I can replace wasmi with Cranelift, and boom, profit. But… there are other ways to get even faster code execution, at the cost of a longer compilation time though.

For instance, LLVM can provide very fast code execution, almost at native speed. Or we can generate assembly code dynamically. Well, there are multiple ways to achieve that. What if a project could provide a WebAssembly virtual machine with multiple backends?

Enter Wasmer

And it was at that specific time that I was hired by Wasmer. To be totally honest, I had been looking at Wasmer for a few weeks already. It was a surprise and a great opportunity for me. Well, the universe really wanted this rewrite from wasmi to Wasmer, right 😅?

Wasmer is organized as a set of Rust libraries (called crates). There is even a wasmer-runtime-c-api crate, which is a C and C++ API on top of the wasmer-runtime and wasmer-runtime-core crates, i.e. it allows running the WebAssembly virtual machine as you want, with the backend of your choice: Cranelift, LLVM, or Dynasm (at the time of writing). That’s perfect: it removes my Rust library between the PHP extension and wasmi. php-ext-wasm is then reduced to a PHP extension without any Rust code; everything goes through wasmer-runtime-c-api. It’s sad to remove Rust from this project, but the project now relies on even more Rust code!

Counting the time to make some patches to wasmer-runtime-c-api, I was able to migrate php-ext-wasm to Wasmer in 5 days.

By default, php-ext-wasm uses Wasmer with the Cranelift backend, which strikes a great balance between compilation and execution time. It is really good. Let’s run the benchmark again, with the addition of php+wasmer(cranelift):

  • rust-baseline: 287ms,
  • php: 19,761ms,
  • php+wasmi: 67,622ms,
  • php+wasmer(cranelift): 2,365ms 🎉.

Finally, the PHP extension provides a faster execution than PHP itself! php+wasmer(cranelift) is 8.6 times faster than php to be exact. And it is 28.6 times faster than php+wasmi. Can we reach the native speed (represented by rust-baseline here)? It’s very likely with LLVM. That’s for another article. I’m super happy with Cranelift for the moment. (See our previous blog post to learn how we benchmark different backends in Wasmer, and other WebAssembly runtimes).

More Optimizations

Wasmer provides more features, like module caching. Those features are now included in the PHP extension. Booting the nbody.wasm file (19kb) took 4.2ms. By booting, I mean: reading the WebAssembly binary from a file, parsing it, validating it, and compiling it to executable code and a WebAssembly module structure.

The PHP execution model is: start, run, die. Memory is freed after each request. If you want to use php-ext-wasm, you don’t really want to pay that “booting cost” every time.

Fortunately, wasmer-runtime-c-api now provides a module serialization API, which is integrated into the PHP extension itself. It saves the “booting cost”, but it adds a “deserialization cost”. That second cost is smaller, but still, we need to know it exists.

Fortunately again, Zend Engine has an API to keep persistent in-memory data between PHP executions. php-ext-wasm uses that API to keep persistent modules, et voilà.

Now it takes 4.2ms for the first boot of nbody.wasm and 0.005ms for all the next boots. It’s 840 times faster!
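
To make this concrete, here is a hypothetical sketch of what persistent module reuse can look like from PHP. The Wasm\Module class and its methods are illustrative assumptions; the actual extension API may differ:

<?php

// Hypothetical API: compile the module, or fetch it from the persistent
// in-memory cache if a previous request already compiled it.
$module = new Wasm\Module('nbody.wasm');

// Instantiating a cached module is cheap: no parsing, no compilation.
$instance = $module->instantiate();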

Conclusion

Wasmer is a young — but mature — framework to build WebAssembly runtimes on top of. The default backend is Cranelift, and it shows its promise: it brings a good balance between compilation time and execution time.

wasmi has been a good companion to develop a proof of concept. This library has its place in other usages though, like very short-lived WebAssembly binaries (I’m thinking of Ethereum contracts that compile to WebAssembly, for instance, which is one of the actual use cases). It’s important to understand that no runtime is better than another; it depends on the use case.

The next step is to stabilize php-ext-wasm to release a 1.0.0 version.

See you there!

If you want to follow the development, take a look at @wasmerio and @mnt_io on Twitter.

Bye bye Automattic, hello Wasmer

Today is my first day at Wasmer.

Wasmer’s logo. Build Once, Run Anywhere.
Universal Binaries powered by WebAssembly.

It’s with a lot of regret that I leave Automattic. To be clear, I’m not leaving because something negative happened; I’m leaving because I received the same job offer, 3 times in 10 days, from Wasmer, Google and Mozilla: namely, to work with Rust or C++ to build a WebAssembly runtime. This is an offer I could hardly decline. It’s an opportunity and a dream for me. And I was lucky enough to get a choice between 3 excellent companies!

I can only encourage you to work with Automattic. It’s definitely the best company I’ve ever worked with, stealing first place from Mozilla. Automattic is not only about WordPress.com and other services: it’s a way of living. The culture, the spirit, the interactions between people, the mission: everything is exceptional. It has been a super great experience.

I could write 100 pages about my team. They have all been remarkable in many ways. I’m closer to them, even though they live 10,000km away, than to colleagues I used to meet in person every day. Congrats to Matt for this incredible project.

Now it’s time to work on Wasmer. It’s a WebAssembly runtime written in Rust: my two current passions. It’s powerful, modular, well-designed, and it comes with great ambitions. I’m really excited. I work with an extraordinary team: Syrus Akbary (the author of Graphene, a GraphQL framework in Python), Lachlan Sneff (the author of Nebulet, a microkernel that implements a WebAssembly “usermode” that runs in Ring 0), Brandon Fish (a great contributor to TruffleRuby, a high-performance implementation of Ruby with GraalVM), Mackenzie Clark, and soon more.

My job will consist of working on the runtime of course, and also of integrating/embedding the runtime into different languages, such as PHP (like I did with php-ext-wasm; more to come on this blog). More secret projects are coming. Let’s turn them into realities 🎉!

From Rust to beyond: The ASM.js galaxy

This blog post is part of a series explaining how to send Rust beyond earth, into many different galaxies. Rust has visited:


The second galaxy that our Rust parser will explore is the ASM.js galaxy. This post will explain what ASM.js is, how to compile the parser into ASM.js, and how to use the ASM.js module with Javascript in a browser. The goal is to use ASM.js as a fallback to WebAssembly when it is not available. I highly recommend reading the previous episode about WebAssembly, since the two have a lot in common.

What is ASM.js, and why?

The main programming language on the Web is Javascript. Applications that want to exist on the Web had to compile to Javascript, like games, for example. But problems occur: the resulting file is heavy (hence WebAssembly), and Javascript virtual machines have difficulty optimising this particular kind of code, resulting in slow or inefficient execution (still considering the example of games). Also, in this context, Javascript is a compilation target, and as such, some language constructs are useless (like eval).

So what if a “new” language can be a compilation target and still be executed by Javascript virtual machines? This is WebAssembly today, but in 2013, the solution was ASM.js:

asm.js, a strict subset of Javascript that can be used as a low-level, efficient target language for compilers. This sublanguage effectively describes a sandboxed virtual machine for memory-unsafe languages like C or C++. A combination of static and dynamic validation allows Javascript engines to employ an ahead-of-time (AOT) optimizing compilation strategy for valid asm.js code.

So an ASM.js program is a regular Javascript program. It is not a new language but a subset of it. It can be executed by any Javascript virtual machines. However, the specific usage of the magic statement 'use asm'; instructs the virtual machine to optimise the program with an ASM.js “engine”.

ASM.js introduces types by using arithmetical operators as an annotation system. For instance, x | 0 annotates x to be an integer, +x annotates x to be a double, and fround(x) annotates x to be a float. The following example declares the equivalent of a function fn increment(x: u32) -> u32:

function increment(x) {
    x = x | 0;
    return (x + 1) | 0;
}

Another important difference is that ASM.js works with modules, in order to isolate them from regular Javascript. A module is a function that takes 3 arguments (a minimal module skeleton is sketched after this list):

  1. stdlib, an object with references to standard library APIs,
  2. foreign, an object with user-defined functionalities (such as sending something over a WebSocket),
  3. heap, an array buffer representing the memory (because memory is manually managed).
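
Here is a minimal sketch of such a module. It is illustrative only; a real compiler-generated module is far larger:

function MyAsmModule(stdlib, foreign, heap) {
    'use asm';

    // A typed view of the module memory.
    var HEAPU8 = new stdlib.Uint8Array(heap);

    function increment(x) {
        x = x | 0;
        return (x + 1) | 0;
    }

    return { increment: increment };
}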

But it’s still Javascript. So the good news is that if your virtual machine has no specific optimisations for ASM.js, it is executed as any regular Javascript program. And if it does, then you get a pleasant boost.

A graph showing 3 benchmarks running against different Javascript engines: Firefox, Firefox + asm.js, Google, and native.

Remember that ASM.js has been designed to be a compilation target. So normally you don’t have to care about that because it is the role of the compiler. The typical compilation and execution pipeline from C or C++ to the Web looks like this:

Classical ASM.js compilation and execution pipeline from C or C++ to the Web.

Emscripten, as seen in the schema above, is a very important project in this whole evolution of the Web platform. Emscripten is:

a toolchain for compiling to asm.js and WebAssembly, built using LLVM, that lets you run C and C++ on the web at near-native speed without plugins.

You are very likely to see this name one day or another if you work with ASM.js or WebAssembly.

I will not explain deeply what ASM.js is with a lot of examples. I recommend instead reading Asm.js: The Javascript Compile Target by John Resig, or Big Web app? Compile it! by Alon Zakai.

Our process will be different though. We will not compile our Rust code directly to ASM.js, but instead, we will compile it to WebAssembly, which in turn will be compiled into ASM.js.

Rust 🚀 ASM.js

Rust to ASM.js

This episode will be very short, and somehow the easiest one. To compile Rust to ASM.js, you need to first compile it to WebAssembly (see the previous episode), and then compile the WebAssembly binary into ASM.js.

Actually, ASM.js is mostly required when the browser does not support WebAssembly, like Internet Explorer. It is essentially a fallback to run our program on the Web.

The workflow is the following:

  1. Compile your Rust project into WebAssembly,
  2. Compile your WebAssembly binary into an ASM.js module,
  3. Optimise and shrink the ASM.js module.

The wasm2js tool will be your best companion to compile the WebAssembly binary into an ASM.js module. It is part of the Binaryen project. Then, assuming we have the WebAssembly binary of our program, all we have to do is:

$ wasm2js --pedantic --output gutenberg_post_parser.asm.js gutenberg_post_parser.wasm

At this step, the gutenberg_post_parser.asm.js file weighs 212kb. The file contains ECMAScript 6 code. And remember that old browsers like Internet Explorer are targeted, so the code needs to be transformed a little bit. To optimise and shrink the ASM.js module, we will use the uglify-es tool, like this:

$ # Transform code, and embed in a function.
$ sed -i '' '1s/^/function GUTENBERG_POST_PARSER_ASM_MODULE() {/; s/export //' gutenberg_post_parser.asm.js
$ echo 'return { root, alloc, dealloc, memory }; }' >> gutenberg_post_parser.asm.js

$ # Shrink the code.
$ uglifyjs --compress --mangle --output .temp.asm.js gutenberg_post_parser.asm.js
$ mv .temp.asm.js gutenberg_post_parser.asm.js

Just like we did for the WebAssembly binary, we can compress the resulting files with gzip and brotli:

$ # Compress.
$ gzip --best --stdout gutenberg_post_parser.asm.js > gutenberg_post_parser.asm.js.gz
$ brotli --best --stdout --lgwin=24 gutenberg_post_parser.asm.js > gutenberg_post_parser.asm.js.br

We end up with the following file sizes:

  • .asm.js: 54kb,
  • .asm.js.gz: 13kb,
  • .asm.js.br: 11kb.

That’s again pretty small!

When you think about it, this is a lot of transformations: from Rust to WebAssembly to Javascript/ASM.js… The number of tools is rather small compared to the amount of work they do. It shows a well-designed pipeline and a collaboration between many groups of people.


Aside: If you are reading this post, I assume you are a developer. And as such, I’m sure you can spend hours looking at source code as if it were a master painting. Did you ever wonder what a Rust program looks like once compiled to Javascript? See below:

A Rust program compiled as WebAssembly compiled as ASM.js.

I like it probably too much.

ASM.js 🚀 Javascript

The resulting gutenberg_post_parser.asm.js file contains a single function named GUTENBERG_POST_PARSER_ASM_MODULE, which returns an object with 4 private entries:

  1. root, the axiom of our grammar,
  2. alloc, to allocate memory,
  3. dealloc, to deallocate memory, and
  4. memory, the memory buffer.

It sounds familiar if you have read the previous episode about WebAssembly. Don’t expect root to return a full AST: it returns a pointer to the memory, and the data need to be encoded and decoded, written into and read from the memory, the same way. Yes, the same way. The exact same way. So the code of the boundary layer is strictly the same. Do you remember the Module object in our WebAssembly Javascript boundary? This is exactly what the GUTENBERG_POST_PARSER_ASM_MODULE function returns. You can replace Module with the returned object, et voilà!
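
Concretely, the swap can look like this (a sketch; Module mirrors the boundary layer of the previous episode):

// Before: `Module` came from instantiating the WebAssembly binary.
// Now: `Module` comes from calling the ASM.js module function.
var Module = GUTENBERG_POST_PARSER_ASM_MODULE();

// The boundary layer keeps using `Module.root`, `Module.alloc`,
// `Module.dealloc`, and `Module.memory` exactly as before.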

The entire code lands here. It completely reuses the Javascript boundary layer for WebAssembly. It just sets Module differently, and it does not load the WebAssembly binary. Consequently, the ASM.js boundary layer is made of only 34 lines of code 🙃. It compresses to 218 bytes.

Conclusion

We have seen that ASM.js can be a fallback to WebAssembly in environments that only support Javascript (like Internet Explorer), with or without ASM.js optimisations.

The resulting ASM.js file and its boundary layer are quite small. By design, the ASM.js boundary layer reuses almost the entire WebAssembly boundary layer. Therefore there is again a tiny surface of code to review and to maintain, which is helpful.

We have seen in the previous episode that Rust is very fast. We have been able to observe the same for WebAssembly compared to the actual Javascript parser for the Gutenberg project. However, is it still true for the ASM.js module? In this case, ASM.js is a fallback, and like all fallbacks, it is notably slower than the targeted implementation. Let’s run the same benchmark, but use the Rust parser as an ASM.js module:

File                              Javascript parser (ms)   Rust parser as an ASM.js module (ms)   Speedup
demo-post.html                    15.368                    2.718                                  × 6
shortcode-shortcomings.html       31.022                    8.004                                  × 4
redesigning-chrome-desktop.html   106.416                   19.223                                 × 6
web-at-maximum-fps.html           82.92                     27.197                                 × 3
early-adopting-the-future.html    119.880                   38.321                                 × 3
pygmalian-raw-html.html           349.075                   23.656                                 × 15
moby-dick-parsed.html             2,543.75                  361.423                                × 7

The ASM.js module of the Rust parser is on average 6 times faster than the actual Javascript implementation. The median speedup is 6. That’s far from the WebAssembly results, but this is a fallback, and being on average 6 times faster is really great!

So not only is the whole pipeline safer because it starts from Rust, but it also ends up faster than Javascript.

We will see in the next episodes of this series that Rust can reach a lot of galaxies, and the more it travels, the more interesting it gets.

Thanks for reading!

From Rust to beyond: The WebAssembly galaxy

This blog post is part of a series explaining how to send Rust beyond earth, into many different galaxies:


The first galaxy that our Rust parser will explore is the WebAssembly (WASM) galaxy. This post will explain what WebAssembly is, how to compile the parser into WebAssembly, and how to use the WebAssembly binary with Javascript in a browser and with NodeJS.

What is WebAssembly, and why?

If you already know WebAssembly, you can skip this section.

WebAssembly defines itself as:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

Should I say more? Probably, yes…

WebAssembly is a new portable binary format. Languages like C, C++, or Rust already compile to this target. It is the spiritual successor of ASM.js. By spiritual successor, I mean that the same people who are trying to extend the Web platform and make the Web fast are working on both technologies. They share some design concepts too, but that’s not really important right now.

Before WebAssembly, programs had to compile to Javascript in order to run on the Web platform. The resulting files were most of the time large. And because the Web is a network, the files had to be downloaded, and it took time. WebAssembly is designed to be encoded in a size- and load-time efficient binary format.

WebAssembly is also faster than Javascript for many reasons. Despite all the crazy optimisations engineers have put into Javascript virtual machines, Javascript is a weakly and dynamically typed language, which has to be interpreted. WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities. WebAssembly also loads faster than Javascript because parsing and compiling happen while the binary is streamed from the network. So once the binary is entirely fetched, it is ready to run: no need to wait on the parser and the compiler before running the program.

Today, and our blog series is a perfect example of that, it is possible to write a Rust program, and to compile it to run on the Web platform. Why? Because WebAssembly is implemented by all major browsers, and because it has been designed for the Web: To live and run on the Web platform (like a browser). But its portable aspect and its safe and sandboxed memory design make it a good candidate to run outside of the Web platform (see a serverless WASM framework, or an application container built for WASM).

I think it is important to remember that WebAssembly is not here to replace Javascript. It is just another technology which solves many problems we can meet today, like load time, safety, or speed.

Rust 🚀 WebAssembly

Rust to WASM

The Rust WASM team is a group of people leading the effort of pushing Rust into WebAssembly with a set of tools and integrations. There is a book explaining how to write a WebAssembly program with Rust.

With the Gutenberg Rust parser, I didn’t use tools like wasm-bindgen (which is a pure gem) when I started the project a few months ago, because I hit some limitations. Note that some of them have been addressed since then! Anyway, we will do most of the work by hand, and I think this is an excellent way to understand how things work in the background. When you are familiar with WebAssembly interactions, then wasm-bindgen is an excellent tool to have within easy reach, because it abstracts all the interactions and lets you focus on your code logic instead.

I would like to remind the reader that the Gutenberg Rust parser exposes one AST, and one root function (the axiom of the grammar), respectively defined as:

pub enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>
    },
    Phrase(Input<'a>)
}

and

pub fn root(
    input: Input
) -> Result<(Input, Vec<ast::Node>), nom::Err<Input>>;

Knowing that, let’s go!

General design

Here is our general design or workflow:

  1. Javascript (for instance) writes the blog post to parse into the WebAssembly module memory,
  2. Javascript runs the root function by passing a pointer to the memory, and the length of the blog post,
  3. Rust reads the blog post from the memory, runs the Gutenberg parser, compiles the resulting AST into a sequence of bytes, and returns the pointer to this sequence of bytes to Javascript,
  4. Javascript reads the memory from the received pointer, and decodes the sequence of bytes as Javascript objects in order to recreate an AST with a friendly API.

Why a sequence of bytes? Because WebAssembly only supports integers and floats, not strings or vectors, and also because our Rust parser takes a slice of bytes as input, so this is handy.

We use the term boundary layer to refer to this Javascript piece of code responsible for reading from and writing into the WebAssembly module memory, and for exposing a friendly API.

Now, we will focus on the Rust code. It consists of only 4 functions:

  • alloc to allocate memory (exported),
  • dealloc to deallocate memory (exported),
  • root to run the parser (exported),
  • into_bytes to transform the AST into a sequence of bytes.

The entire code lands here. It is approximately 150 lines of code. Let’s explain it.

Memory allocation

Let’s start with the memory allocator. I chose wee_alloc: it is specifically designed for WebAssembly, being very small (less than a kilobyte) and efficient.

The following piece of code describes the memory allocator setup and the “prelude” for our code (enabling some compiler features, like alloc, declaring external crates, some aliases, and declaring required functions like panic, oom, etc.). This can be considered as boilerplate:

#![no_std]
#![feature(
    alloc,
    alloc_error_handler,
    core_intrinsics,
    lang_items
)]

extern crate gutenberg_post_parser;
extern crate wee_alloc;
#[macro_use] extern crate alloc;

use gutenberg_post_parser::ast::Node;
use alloc::vec::Vec;
use core::{mem, slice};

#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::intrinsics::abort(); }
}

#[alloc_error_handler]
fn oom(_: core::alloc::Layout) -> ! {
    unsafe { core::intrinsics::abort(); }
}

// This is the definition of `std::ffi::c_void`, but WASM runs without std in our case.
#[repr(u8)]
#[allow(non_camel_case_types)]
pub enum c_void {
    #[doc(hidden)]
    __variant1,

    #[doc(hidden)]
    __variant2
}

The Rust memory is the WebAssembly memory. Rust will allocate and deallocate memory on its own, but Javascript for instance needs to allocate and deallocate WebAssembly memory in order to communicate/exchange data. So we need to export one function to allocate memory and one function to deallocate memory.

Once again, this is almost boilerplate. The alloc function creates an empty vector of a specific capacity (because it is a linear segment of memory), and returns a pointer to this empty vector:

#[no_mangle]
pub extern "C" fn alloc(capacity: usize) -> *mut c_void {
    let mut buffer = Vec::with_capacity(capacity);
    let pointer = buffer.as_mut_ptr();
    mem::forget(buffer);

    pointer as *mut c_void
}

Note the #[no_mangle] attribute that instructs the Rust compiler to not mangle the function name, i.e. to not rename it. And extern "C" to export the function in the WebAssembly module, so it is “public” from outside the WebAssembly binary.

The code is pretty straightforward and matches what we announced earlier: a Vec is allocated with a specific capacity, and the pointer to this vector is returned. The important part is mem::forget(buffer). It is required so that Rust will not deallocate the vector once it goes out of scope. Indeed, Rust enforces Resource Acquisition Is Initialization (RAII), so whenever an object goes out of scope, its destructor is called and its owned resources are freed. This behavior shields against resource leak bugs, and it is why we never have to manually free memory or worry about memory leaks in Rust (see some RAII examples). In this case, we want to keep the allocation after the function execution, hence the mem::forget call.
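
To illustrate the difference, here is a standalone sketch, not part of the parser code:

// RAII: the vector is dropped, and its memory freed, at the end of the scope.
{
    let buffer: Vec<u8> = Vec::with_capacity(1024);
    // … use `buffer` …
} // `buffer` goes out of scope: its destructor runs here.

// With `mem::forget`: the destructor never runs, the allocation outlives
// the scope, and only the raw pointer can reach it.
let mut buffer: Vec<u8> = Vec::with_capacity(1024);
let pointer = buffer.as_mut_ptr();
mem::forget(buffer); // deliberately “leak”: the caller now owns the memory.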

Let’s jump to the dealloc function. The goal is to recreate a vector from a pointer and a capacity, and to let Rust drop it:

#[no_mangle]
pub extern "C" fn dealloc(pointer: *mut c_void, capacity: usize) {
    unsafe {
        let _ = Vec::from_raw_parts(pointer, 0, capacity);
    }
}

The Vec::from_raw_parts function is marked as unsafe, so we need to delimit it in an unsafe block so that the dealloc function is considered safe.

The variable _ contains our data to deallocate, and it goes out of scope immediately, so Rust drops it.

From input to a flat AST

Now the core of the binding! The root function reads the blog post to parse based on a pointer and a length, then it parses it. If the result is OK, it serializes the AST into a sequence of bytes, i.e. it flattens it; otherwise it returns an empty sequence of bytes.

The logic flow of the parser: The input on the left is parsed into an AST, which is serialized into a flat sequence of bytes on the right.
#[no_mangle]
pub extern "C" fn root(pointer: *mut u8, length: usize) -> *mut u8 {
    let input = unsafe { slice::from_raw_parts(pointer, length) };
    let mut output = vec![];

    if let Ok((_remaining, nodes)) = gutenberg_post_parser::root(input) {
        // Compile the AST (nodes) into a sequence of bytes.
    }

    let pointer = output.as_mut_ptr();
    mem::forget(output);

    pointer
}

The variable input contains the blog post. It is fetched from memory with a pointer and a length. The variable output is the sequence of bytes the function will return. gutenberg_post_parser::root(input) runs the parser. If parsing is OK, then the nodes are compiled into a sequence of bytes (omitted for now). Then the pointer to the sequence of bytes is grabbed, the Rust compiler is instructed to not drop it, and finally the pointer is returned. The logic is again pretty straightforward.

Now, let’s focus on the compilation of the AST into a sequence of bytes (u8). All the data the AST holds are already bytes, which makes the process easier. The goal is only to flatten the AST (a worked example follows the list):

  • The first 4 bytes represent the number of nodes at the first level (4 × u8 represents a u32),
  • Next, if the node is Block:
    • The first byte is the node type: 1u8 for a block,
    • The second byte is the size of the block name,
    • The third to the sixth bytes are the size of the attributes,
    • The seventh byte is the number of node children the block has,
    • Next bytes are the block name,
    • Next bytes are the attributes (&b"null"[..] if none),
    • Next bytes are node children as a sequence of bytes,
  • Next, if the node is Phrase:
    • The first byte is the node type: 2u8 for a phrase,
    • The second to the fifth bytes are the size of the phrase,
    • Next bytes are the phrase.
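
As a worked example, a post containing the single phrase “Hi” flattens to the following 11 bytes, shown in hexadecimal, assuming the big-endian u32_to_u8s encoding given at the end of this section:

00 00 00 01    one node at the first level
02             node type: Phrase
00 00 00 02    length of the phrase
48 69          the phrase itself, “Hi”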

Here is the missing part of the root function:

if let Ok((_remaining, nodes)) = gutenberg_post_parser::root(input) {
    let nodes_length = u32_to_u8s(nodes.len() as u32);

    output.push(nodes_length.0);
    output.push(nodes_length.1);
    output.push(nodes_length.2);
    output.push(nodes_length.3);

    for node in nodes {
        into_bytes(&node, &mut output);
    }
}

And here is the into_bytes function:

fn into_bytes<'a>(node: &Node<'a>, output: &mut Vec<u8>) {
    match *node {
        Node::Block { name, attributes, ref children } => {
            let node_type = 1u8;
            let name_length = name.0.len() + name.1.len() + 1;
            let attributes_length = match attributes {
                Some(attributes) => attributes.len(),
                None => 4
            };
            let attributes_length_as_u8s = u32_to_u8s(attributes_length as u32);

            let number_of_children = children.len();
            output.push(node_type);
            output.push(name_length as u8);
            output.push(attributes_length_as_u8s.0);
            output.push(attributes_length_as_u8s.1);
            output.push(attributes_length_as_u8s.2);
            output.push(attributes_length_as_u8s.3);
            output.push(number_of_children as u8);

            output.extend(name.0);
            output.push(b'/');
            output.extend(name.1);

            if let Some(attributes) = attributes {
                output.extend(attributes);
            } else {
                output.extend(&b"null"[..]);
            }

            for child in children {
                into_bytes(&child, output);
            }
        },

        Node::Phrase(phrase) => {
            let node_type = 2u8;
            let phrase_length = phrase.len();

            output.push(node_type);

            let phrase_length_as_u8s = u32_to_u8s(phrase_length as u32);

            output.push(phrase_length_as_u8s.0);
            output.push(phrase_length_as_u8s.1);
            output.push(phrase_length_as_u8s.2);
            output.push(phrase_length_as_u8s.3);
            output.extend(phrase);
        }
    }
}

What I find interesting about this code is that it reads just like the bullet list above.

For the most curious, here is the u32_to_u8s function:

fn u32_to_u8s(x: u32) -> (u8, u8, u8, u8) {
    (
        ((x >> 24) & 0xff) as u8,
        ((x >> 16) & 0xff) as u8,
        ((x >> 8)  & 0xff) as u8,
        ( x        & 0xff) as u8
    )
}

Here we are. alloc, dealloc, root, and into_bytes. Four functions, and everything is done.

Producing and optimising the WebAssembly binary

To get a WebAssembly binary, the project has to be compiled to the wasm32-unknown-unknown target. For now (and this will change in the near future), the nightly toolchain is needed to compile the project, so make sure you have the latest nightly version of rustc & co. installed with rustup update nightly. Let’s run cargo:

$ RUSTFLAGS='-g' cargo +nightly build --target wasm32-unknown-unknown --release

The WebAssembly binary weighs 22kb. Our goal is to reduce the file size. For that, the following tools will be required:

  • wasm-gc to garbage-collect unused imports, internal functions, types etc.,
  • wasm-snip to mark some functions as unreachable; this is useful when the binary includes unused code that the linker was not able to remove,
  • wasm-opt from the Binaryen project, to optimise the binary,
  • gzip and brotli to compress the binary.

Basically, what we do is the following:

$ # Garbage-collect unused data.
$ wasm-gc gutenberg_post_parser.wasm

$ # Mark fmt and panicking as unreachable.
$ wasm-snip --snip-rust-fmt-code --snip-rust-panicking-code gutenberg_post_parser.wasm -o gutenberg_post_parser_snipped.wasm
$ mv gutenberg_post_parser_snipped.wasm gutenberg_post_parser.wasm

$ # Garbage-collect unreachable data.
$ wasm-gc gutenberg_post_parser.wasm

$ # Optimise for small size.
$ wasm-opt -Oz -o gutenberg_post_parser_opt.wasm gutenberg_post_parser.wasm
$ mv gutenberg_post_parser_opt.wasm gutenberg_post_parser.wasm

$ # Compress.
$ gzip --best --stdout gutenberg_post_parser.wasm > gutenberg_post_parser.wasm.gz
$ brotli --best --stdout --lgwin=24 gutenberg_post_parser.wasm > gutenberg_post_parser.wasm.br 

We end up with the following file sizes:

  • .wasm: 16kb,
  • .wasm.gz: 7.3kb,
  • .wasm.br: 6.2kb.

Neat! Brotli is supported by most browsers, so when the client sends Accept-Encoding: br, the server can respond with the .wasm.br file.

To give you a feeling of what 6.2kb represents, the following image also weighs 6.2kb:

Image: the simplified WordPress logo.

The WebAssembly binary is ready to run!

WebAssembly 🚀 Javascript

In this section, we assume Javascript runs in a browser. Thus, what we need to do is the following:

  1. Load/stream and instantiate the WebAssembly binary,
  2. Write the blog post to parse in the WebAssembly module memory,
  3. Call the root function on the parser,
  4. Read the WebAssembly module memory to load the flat AST (a sequence of bytes) and decode it to build a “Javascript AST” (with our own objects).

The entire code lands here. It is approximately 150 lines of code too. I won’t explain the whole code, since some parts of it are the “friendly API” that is exposed to the user. Instead, I will explain the major pieces.

Loading/streaming and instantiating

The WebAssembly API exposes multiple ways to load a WebAssembly binary. The best one you can use is the WebAssembly.instantiateStreaming function: It streams the binary and compiles it at the same time; nothing is blocking. This API relies on the Fetch API. You might have guessed it: It is asynchronous (it returns a promise). WebAssembly itself is not asynchronous (unless you use threads), but the instantiation step is. It is possible to avoid that, but it is tricky, and Google Chrome enforces a hard limit of 4kb on the binary size for synchronous compilation, which will make you give up quickly.

To be able to stream the WebAssembly binary, the server must send the application/wasm MIME type (with the Content-Type header).
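
As an illustration only, here is a minimal NodeJS server sketch (the file path and the port are arbitrary) that sends the right Content-Type header, and responds with the Brotli-compressed binary when the client accepts it:

const http = require('http');
const fs = require('fs');

http.createServer((request, response) => {
    const headers = { 'Content-Type': 'application/wasm' };
    let file = './gutenberg_post_parser.wasm';

    // Prefer the Brotli-compressed binary when the client supports it.
    if ((request.headers['accept-encoding'] || '').includes('br')) {
        headers['Content-Encoding'] = 'br';
        file += '.br';
    }

    response.writeHead(200, headers);
    fs.createReadStream(file).pipe(response);
}).listen(8080);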

Let’s instantiate our WebAssembly binary:

const url = '/gutenberg_post_parser.wasm';
const wasm =
    WebAssembly.
        instantiateStreaming(fetch(url), {}).
        then(object => object.instance).
        then(instance => { /* step 2 */ });

The WebAssembly binary has been instantiated! Now we can move to the next step.
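
As a side note, if WebAssembly.instantiateStreaming is not available, or if the server cannot send the application/wasm MIME type, a non-streaming fallback is possible (a sketch):

const wasm =
    fetch(url).
        then(response => response.arrayBuffer()).
        then(bytes => WebAssembly.instantiate(bytes, {})).
        then(object => object.instance).
        then(instance => { /* step 2 */ });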

Last polish before running the parser

Remember that the WebAssembly binary exports 3 functions: alloc, dealloc, and root. They can be found on the exports property, along with the memory. Let’s write that:

        then(instance => {
            const Module = {
                alloc: instance.exports.alloc,
                dealloc: instance.exports.dealloc,
                root: instance.exports.root,
                memory: instance.exports.memory
            };

            runParser(Module, '<!-- wp:foo /-->xyz');
        });

Great, everything is ready to write the runParser function!

The parser runner

As a reminder, this function has to: write the input (the blog post to parse) into the WebAssembly module memory (Module.memory), call the root function (Module.root), and read the result back from the WebAssembly module memory. Let’s do that:

function runParser(Module, raw_input) {
    const input = new TextEncoder().encode(raw_input);
    const input_pointer = writeBuffer(Module, input);
    const output_pointer = Module.root(input_pointer, input.length);
    const result = readNodes(Module, output_pointer);

    Module.dealloc(input_pointer, input.length);

    return result;
}

In detail:

  • The raw_input is encoded into a sequence of bytes with the TextEncoder API, into input,
  • The input is written into the WebAssembly module memory with writeBuffer, and its pointer is returned,
  • Then the root function is called with the pointer to the input and the length of the input, as expected, and the pointer to the output is returned,
  • Then the output is decoded with readNodes,
  • And finally, the input is deallocated. The output of the parser will be deallocated in the readNodes function, because its length is unknown at this step.

Great! So we have 2 functions to write right now: writeBuffer and readNodes.

Writing the data in memory

Let’s go with the first one, writeBuffer:

function writeBuffer(Module, buffer) {
    const buffer_length = buffer.length;
    const pointer = Module.alloc(buffer_length);
    const memory = new Uint8Array(Module.memory.buffer);

    for (let i = 0; i < buffer_length; ++i) {
        memory[pointer + i] = buffer[i];
    }

    return pointer;
}

In detail:

  • The length of the buffer is read into buffer_length,
  • A space in memory is allocated to hold the buffer,
  • Then a uint8 view of the module memory is created, which means that the memory will be viewed as a sequence of u8, exactly what Rust expects,
  • Finally the buffer is copied into the memory with a very basic loop (see the note after this list), and the pointer is returned.
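
The promised note: The same copy can also be expressed with the typed array set method, which does exactly what the loop does:

// Copy `buffer` into the module memory at `pointer`, like the loop above.
new Uint8Array(Module.memory.buffer).set(buffer, pointer);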

Note that, unlike C strings, adding a NUL byte at the end is not mandatory. This is just raw data (on the Rust side, we read it with slice::from_raw_parts; a slice is a very simple structure).

Reading the output of the parser

At this step, the input has been written into memory, and the root function has been called, which means the parser has run. It has returned a pointer to the output (the result), and we now have to read and decode it.

Recall that the first 4 bytes encode the number of nodes we have to read. Let’s go!

function readNodes(Module, start_pointer) {
    const buffer = new Uint8Array(Module.memory.buffer.slice(start_pointer));
    const number_of_nodes = u8s_to_u32(buffer[0], buffer[1], buffer[2], buffer[3]);

    if (0 >= number_of_nodes) {
        return null;
    }

    const nodes = [];
    let offset = 4;
    let end_offset;

    for (let i = 0; i < number_of_nodes; ++i) {
        const last_offset = readNode(buffer, offset, nodes);

        offset = end_offset = last_offset;
    }

    // Deallocate the parser output; its length is only known now.
    Module.dealloc(start_pointer, end_offset);

    return nodes;
}

In detail:

  • A uint8 view of the memory is created… more precisely, a view of a slice of the memory starting at start_pointer,
  • The number of nodes is read, then all nodes are read,
  • And finally, the output of the parser is deallocated.

For the record, here is the u8s_to_u32 function, this is the exact opposite of u32_to_u8s:

function u8s_to_u32(o, p, q, r) {
    return (o << 24) | (p << 16) | (q << 8) | r;
}
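
As a quick sanity check, this function is indeed the inverse of u32_to_u8s; for example:

// (0 << 24) | (0 << 16) | (1 << 8) | 44 === 300.
console.assert(u8s_to_u32(0, 0, 1, 44) === 300);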

And I will also share the readNode function, but I won’t explain the details. This is just the decoding part of the output from the parser.

function readNode(buffer, offset, nodes) {
    const node_type = buffer[offset];

    // Block.
    if (1 === node_type) {
        const name_length = buffer[offset + 1];
        const attributes_length = u8s_to_u32(buffer[offset + 2], buffer[offset + 3], buffer[offset + 4], buffer[offset + 5]);
        const number_of_children = buffer[offset + 6];

        let payload_offset = offset + 7;
        let next_payload_offset = payload_offset + name_length;

        const name = new TextDecoder().decode(buffer.slice(payload_offset, next_payload_offset));

        payload_offset = next_payload_offset;
        next_payload_offset += attributes_length;

        const attributes = JSON.parse(new TextDecoder().decode(buffer.slice(payload_offset, next_payload_offset)));

        payload_offset = next_payload_offset;
        let end_offset = payload_offset;

        const children = [];

        for (let i = 0; i < number_of_children; ++i) {
            const last_offset = readNode(buffer, payload_offset, children);

            payload_offset = end_offset = last_offset;
        }

        nodes.push(new Block(name, attributes, children));

        return end_offset;
    }
    // Phrase.
    else if (2 === node_type) {
        const phrase_length = u8s_to_u32(buffer[offset + 1], buffer[offset + 2], buffer[offset + 3], buffer[offset + 4]);
        const phrase_offset = offset + 5;
        const phrase = new TextDecoder().decode(buffer.slice(phrase_offset, phrase_offset + phrase_length));

        nodes.push(new Phrase(phrase));

        return phrase_offset + phrase_length;
    } else {
        console.error('unknown node type', node_type);
    }
}

Note that this code is pretty simple, and easy for the Javascript virtual machine to optimise. It is also important to note that this is not the original code. The original version is a little more optimised here and there, but the two are very close.

And that’s all! We have successfully read and decoded the output of the parser! We just need to write the Block and Phrase classes like this:

class Block {
    constructor(name, attributes, children) {
        this.name = name;
        this.attributes = attributes;
        this.children = children;
    }
}

class Phrase {
    constructor(phrase) {
        this.phrase = phrase;
    }
}

The final output will be an array of those objects. Easy!
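
For instance, with the input from earlier (and still assuming the core namespace default), the parser run would produce something like this:

const nodes = runParser(Module, '<!-- wp:foo /-->xyz');

// Expected shape (a sketch):
//
//     [
//         Block { name: 'core/foo', attributes: null, children: [] },
//         Phrase { phrase: 'xyz' }
//     ]
console.log(nodes);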

WebAssembly 🚀 NodeJS

The differences between the Javascript version and the NodeJS version are few:

  • The Fetch API does not exist in NodeJS, so the WebAssembly binary has to be instantiated from a buffer directly, like this: WebAssembly.instantiate(fs.readFileSync(url), {}) (see the sketch after this list),
  • The TextEncoder and TextDecoder objects do not exist as global objects; they live in util.TextEncoder and util.TextDecoder.
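
Concretely, the instantiation step becomes something like this (a sketch; the binary path is arbitrary):

const fs = require('fs');
const { TextEncoder, TextDecoder } = require('util');

WebAssembly.
    instantiate(fs.readFileSync('./gutenberg_post_parser.wasm'), {}).
    then(object => object.instance).
    then(instance => { /* same as the browser version */ });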

In order to share the code between both environments, it is possible to write the boundary layer (the Javascript code we wrote) in a .mjs file, aka an ECMAScript Module. It allows one to write something like import { Gutenberg_Post_Parser } from './gutenberg_post_parser.mjs' for example (considering the whole code we wrote before is a class). On the browser side, the script must be loaded with <script type="module" src="…"></script>, and on the NodeJS side, node must run with the --experimental-modules flag. I recommend the talk Please wait… loading: a tale of two loaders by Myles Borins, given at JSConf EU 2018, to understand the whole story.

The entire code lands here.

Conclusion

We have seen in detail how to write a real-world parser in Rust, how to compile it into a WebAssembly binary, and how to use it with Javascript and with NodeJS.

The parser can be used in a browser with regular Javascript code, or as a CLI with NodeJS, or on any platforms NodeJS supports.

The Rust part for WebAssembly plus the Javascript part totals 313 lines of code. This is a tiny surface of code to review and to maintain compared to writing a Javascript parser from scratch.

Other arguments are safety and performance. Rust is memory safe, we know that. It is also performant, but does that still hold for the WebAssembly target? The following table shows the benchmark results of the actual Javascript parser for the Gutenberg project (implemented with PEG.js) against this project: the Rust parser as a WebAssembly binary.

file                              Javascript parser (ms)   Rust parser as a WebAssembly binary (ms)   speedup
demo-post.html                                    13.167                                       0.252      × 52
shortcode-shortcomings.html                       26.784                                       0.271      × 98
redesigning-chrome-desktop.html                   75.500                                       0.918      × 82
web-at-maximum-fps.html                           88.118                                       0.901      × 98
early-adopting-the-future.html                   201.011                                       3.329      × 60
pygmalian-raw-html.html                          311.416                                       2.692     × 116
moby-dick-parsed.html                          2,466.533                                      25.14       × 98

The WebAssembly binary is on average 86 times faster than the actual Javascript implementation. The median speedup is × 98. Some edge cases are very interesting, like moby-dick-parsed.html, where it takes 2.5s with the Javascript parser against 25ms with WebAssembly.

So not only is it safer, but it is also faster than Javascript in this case. And it is only about 300 lines of code.

Note that WebAssembly does not support SIMD yet: It is still a proposal. Rust support for it is progressing (see for example PR #549). It will dramatically improve performance!

We will see in the next episodes of this series that Rust can reach a lot of galaxies, and the more it travels, the more it gets interesting.

Thanks for reading!