šŸ˜+šŸ¦€+šŸ•ø php-ext-wasm: Migrating from wasmi to Wasmer

This is a copy of an article I wrote for Wasmer.


First as a joke, and now as a real product, I started developing php-ext-wasm: a PHP extension that allows PHP to execute WebAssembly binaries.

The PHP virtual machine (VM) is the Zend Engine. To write an extension, one needs to develop in C or C++. The extension was a thin layer of C bindings to a Rust library I also wrote. At that time, this Rust library used wasmi as its WebAssembly VM. I knew that wasmi wasn't the fastest WebAssembly VM around, but its API is solid and well-tested, it compiles quickly, and it is easy to hack on. All the requirements to start a project!
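To picture what "C bindings to a Rust library" means in practice, here is a minimal sketch of the usual pattern: the Rust side exports `extern "C"` functions that work on an opaque pointer, and the C extension only ever stores and passes that pointer around. All names here (`Runtime`, `runtime_new`, `runtime_call_sum`, and so on) are hypothetical, not the original library's API, and the "runtime" simply adds two integers instead of running WebAssembly.

```rust
use std::os::raw::c_int;

/// Hypothetical stand-in for the Rust-side VM state (the real library wrapped wasmi).
pub struct Runtime {
    calls: u64,
}

/// Allocates a runtime and hands ownership to C as an opaque pointer.
// `unsafe(...)` around `no_mangle` is accepted since Rust 1.82 and required in the 2024 edition.
#[unsafe(no_mangle)]
pub extern "C" fn runtime_new() -> *mut Runtime {
    Box::into_raw(Box::new(Runtime { calls: 0 }))
}

/// Calls the "sum" export; returns 0 on success, -1 on a null handle or output pointer.
#[unsafe(no_mangle)]
pub extern "C" fn runtime_call_sum(runtime: *mut Runtime, x: i32, y: i32, out: *mut i32) -> c_int {
    if runtime.is_null() || out.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees `runtime` comes from `runtime_new` and is still alive,
    // and that `out` points to writable memory.
    unsafe {
        (*runtime).calls += 1;
        *out = x.wrapping_add(y); // WebAssembly's i32.add wraps on overflow
    }
    0
}

/// Reports how many calls went through this runtime; 0 for a null handle.
#[unsafe(no_mangle)]
pub extern "C" fn runtime_call_count(runtime: *const Runtime) -> u64 {
    if runtime.is_null() {
        return 0;
    }
    // SAFETY: same contract as `runtime_call_sum`.
    unsafe { (*runtime).calls }
}

/// Frees a runtime created by `runtime_new`; passing null is a no-op.
#[unsafe(no_mangle)]
pub extern "C" fn runtime_destroy(runtime: *mut Runtime) {
    if !runtime.is_null() {
        // SAFETY: reclaims the Box leaked by `runtime_new`, exactly once.
        unsafe { drop(Box::from_raw(runtime)) };
    }
}

fn main() {
    let rt = runtime_new();
    let mut result = 0;
    let status = runtime_call_sum(rt, 1, 2, &mut result);
    println!("status={} result={} calls={}", status, result, runtime_call_count(rt));
    runtime_destroy(rt);
}
```

Returning a status code and writing the result through an out-pointer keeps Rust types out of the C side; a real binding would also need a way to report error details.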

After 6 hours of development, I got something working. I was able to run the following PHP program:

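<?php

// Instantiate the WebAssembly module compiled from simple.rs.
$instance = new Wasm\Instance('simple.wasm');

// Exported functions are called like methods; PHP integers are passed as WebAssembly i32 values.
$result = $instance->sum(1, 2);

var_dump($result); // int(3)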

The API is straightforward: create an instance (here of simple.wasm), then call functions on it (here sum with 1 and 2 as arguments). PHP values are transformed into WebAssembly values automatically. For the record, here is the simple.rs Rust program that is compiled to a WebAssembly binary:

// Exported from the compiled .wasm module under the unmangled name `sum`.
#[no_mangle]
pub extern "C" fn sum(x: i32, y: i32) -> i32 {
    x + y
}
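How PHP values become WebAssembly values isn't shown here, so the following is only a sketch of the idea under my own assumptions: a hypothetical `PhpValue` enum stands in for PHP's `zval` (the real extension handles this in C), and each argument is converted according to the parameter types the exported function declares, failing loudly instead of guessing.

```rust
/// Hypothetical stand-in for a PHP value (the real extension handles Zend `zval`s in C).
#[derive(Debug, Clone, PartialEq)]
enum PhpValue {
    Int(i64),
    Float(f64),
    Bool(bool),
    Str(String),
}

/// The four WebAssembly MVP value types...
#[derive(Debug, Clone, Copy, PartialEq)]
enum WasmType { I32, I64, F32, F64 }

/// ...and values of those types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WasmValue { I32(i32), I64(i64), F32(f32), F64(f64) }

/// Converts one PHP value to the WebAssembly type the callee expects.
fn to_wasm(value: &PhpValue, expected: WasmType) -> Result<WasmValue, String> {
    match (value, expected) {
        (PhpValue::Int(n), WasmType::I32) => {
            // PHP integers are 64-bit on 64-bit builds; refuse values that don't fit in an i32.
            if *n >= i32::MIN as i64 && *n <= i32::MAX as i64 {
                Ok(WasmValue::I32(*n as i32))
            } else {
                Err(format!("{} does not fit in an i32", n))
            }
        }
        (PhpValue::Int(n), WasmType::I64) => Ok(WasmValue::I64(*n)),
        (PhpValue::Float(x), WasmType::F32) => Ok(WasmValue::F32(*x as f32)),
        (PhpValue::Float(x), WasmType::F64) => Ok(WasmValue::F64(*x)),
        (PhpValue::Bool(b), WasmType::I32) => Ok(WasmValue::I32(*b as i32)),
        (other, ty) => Err(format!("cannot convert {:?} to {:?}", other, ty)),
    }
}

/// Converts a whole argument list against a function signature, checking the arity first.
fn convert_args(args: &[PhpValue], params: &[WasmType]) -> Result<Vec<WasmValue>, String> {
    if args.len() != params.len() {
        return Err(format!("expected {} arguments, got {}", params.len(), args.len()));
    }
    args.iter().zip(params).map(|(arg, ty)| to_wasm(arg, *ty)).collect()
}

fn main() {
    // `sum` from simple.rs has the signature (i32, i32) -> i32.
    let sum_params = [WasmType::I32, WasmType::I32];
    println!("{:?}", convert_args(&[PhpValue::Int(1), PhpValue::Int(2)], &sum_params));
    println!("{:?}", convert_args(&[PhpValue::Str("1".to_string()), PhpValue::Int(2)], &sum_params));
}
```

Rejecting a string where an `i32` is expected, rather than coercing it, is a design choice of this sketch; a PHP-friendly binding might prefer PHP's own juggling rules.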

It was great! In my opinion, 6 hours is not much time to get that far.

However, I quickly noticed that wasmi is… slow. One of the promises of WebAssembly is:

WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.

And clearly, my extension wasn't fulfilling this promise. Let's look at a basic benchmark comparison.

I chose the n-body algorithm from the Computer Language Benchmarks Game (hosted by Debian), mostly because it is relatively CPU-intensive. The algorithm also has a simple interface: it takes an integer and returns a floating-point number. This API doesn't involve any advanced instance memory API, which makes it perfect for testing a proof-of-concept.
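To make that interface concrete, here is a simplified sketch with the same shape: an exported `nbody` function takes a step count (`i32`) and returns a floating-point result (here, the system's total energy) without touching linear memory. It is *not* the Benchmarks Game program, which simulates the Sun and the four giant planets; to keep it short, I assume just two bodies (a Sun and a Jupiter-mass planet on a roughly circular orbit). The time step mirrors the benchmark's approach: update velocities from pairwise gravity, then move positions, with a step of 0.01.

```rust
use std::f64::consts::PI;

// Units: AU, years, solar masses, with the gravitational constant folded into the masses.
const SOLAR_MASS: f64 = 4.0 * PI * PI;
const DT: f64 = 0.01;

#[derive(Clone, Copy)]
struct Body {
    pos: [f64; 3],
    vel: [f64; 3],
    mass: f64,
}

/// Hypothetical two-body system: a Sun and a planet 1 AU away moving at 2π AU/year.
fn initial_bodies() -> Vec<Body> {
    let planet = Body { pos: [1.0, 0.0, 0.0], vel: [0.0, 2.0 * PI, 0.0], mass: 1e-3 * SOLAR_MASS };
    let mut sun = Body { pos: [0.0; 3], vel: [0.0; 3], mass: SOLAR_MASS };
    // Give the Sun the opposite momentum so the system's total momentum is zero.
    for k in 0..3 {
        sun.vel[k] = -planet.vel[k] * planet.mass / SOLAR_MASS;
    }
    vec![sun, planet]
}

/// Total energy: kinetic energy minus pairwise gravitational potential energy.
fn energy(bodies: &[Body]) -> f64 {
    let mut e = 0.0;
    for (i, a) in bodies.iter().enumerate() {
        let v2: f64 = a.vel.iter().map(|v| v * v).sum();
        e += 0.5 * a.mass * v2;
        for b in &bodies[i + 1..] {
            let d2: f64 = (0..3).map(|k| (a.pos[k] - b.pos[k]).powi(2)).sum();
            e -= a.mass * b.mass / d2.sqrt();
        }
    }
    e
}

/// One time step: pairwise velocity updates first, then positions.
fn advance(bodies: &mut [Body], dt: f64) {
    let n = bodies.len();
    for i in 0..n {
        for j in (i + 1)..n {
            let mut d = [0.0f64; 3];
            for k in 0..3 {
                d[k] = bodies[i].pos[k] - bodies[j].pos[k];
            }
            let d2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2];
            let mag = dt / (d2 * d2.sqrt());
            let (mi, mj) = (bodies[i].mass, bodies[j].mass);
            for k in 0..3 {
                bodies[i].vel[k] -= d[k] * mj * mag;
                bodies[j].vel[k] += d[k] * mi * mag;
            }
        }
    }
    for b in bodies.iter_mut() {
        for k in 0..3 {
            b.pos[k] += dt * b.vel[k];
        }
    }
}

/// Integer in, float out: no memory needs to be shared with the host.
#[unsafe(no_mangle)]
pub extern "C" fn nbody(steps: i32) -> f64 {
    let mut bodies = initial_bodies();
    for _ in 0..steps {
        advance(&mut bodies, DT);
    }
    energy(&bodies)
}

fn main() {
    println!("{:.9}", nbody(1000));
}
```

Since the exported function only takes and returns numbers, a host such as php-ext-wasm can call it through plain value conversion, with no instance memory API involved.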

As a baseline, I ran the n-body algorithm written in Rust; let's call it rust-baseline. The same algorithm written in PHP is called php. Finally, the Rust algorithm compiled to WebAssembly and executed with the php-ext-wasm extension is called php+wasmi. All results are for nbody(5000000):

OK, so… php-ext-wasm with wasmi is 3.4 times slower than PHP itself. It is pointless to use WebAssembly under such conditions!

It confirms my first intuition, though: in our case, wasmi is great for mocking something up, but it isn't fast enough for our needs.
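Part of the reason is architectural: wasmi is an interpreter. It executes a function by looping over its instructions and dispatching on each one, whereas a compiling backend translates the code to machine code once and then runs it directly. The toy stack machine below is my own simplified illustration of such a dispatch loop, not wasmi's actual design; the hypothetical `Instr` enum covers just enough of WebAssembly to run `sum`.

```rust
/// A tiny, hypothetical subset of WebAssembly instructions.
#[derive(Debug, Clone, Copy)]
enum Instr {
    LocalGet(usize),
    I32Const(i32),
    I32Add,
    I32Mul,
}

/// Runs a function body on a value stack, one instruction at a time.
/// Every instruction pays for a `match` dispatch plus stack pushes and pops.
fn interpret(body: &[Instr], locals: &[i32]) -> Option<i32> {
    let mut stack: Vec<i32> = Vec::with_capacity(16);
    for instr in body {
        match *instr {
            Instr::LocalGet(i) => stack.push(*locals.get(i)?),
            Instr::I32Const(c) => stack.push(c),
            Instr::I32Add => {
                // Tuple fields evaluate left to right: `b` is the top of the stack.
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a.wrapping_add(b));
            }
            Instr::I32Mul => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a.wrapping_mul(b));
            }
        }
    }
    // A well-typed body leaves exactly its result on the stack.
    if stack.len() == 1 { stack.pop() } else { None }
}

fn main() {
    // `sum(x, y)` from simple.rs compiles to roughly: local.get 0, local.get 1, i32.add.
    let sum = [Instr::LocalGet(0), Instr::LocalGet(1), Instr::I32Add];
    println!("{:?}", interpret(&sum, &[1, 2]));
}
```

A compiling backend such as Cranelift would instead translate `sum` to native code once, so this per-instruction bookkeeping disappears from every call.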

Faster, faster, faster…

I had wanted to use Cranelift from the beginning. It's a code generator à la LLVM (excuse the brutal shortcut; the goal here isn't to explain Cranelift in detail, but it's a really awesome project!). To quote the project itself:

Cranelift is a low-level retargetable code generator. It translates a target-independent intermediate representation into executable machine code.

It basically means that the Cranelift API can be used to generate executable code.

It's perfect! I can replace wasmi with Cranelift, and boom, profit. But… there are other ways to get even faster code execution, at the cost of longer compilation times.

For instance, LLVM can provide very fast code execution, almost at native speed. Or we can generate assembly code dynamically. Well, there are multiple ways to achieve that. What if a project could provide a WebAssembly virtual machine with multiple backends?
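One VM with several backends boils down to a shared interface: every backend turns the same code into something callable, and they differ in how much work they do up front. Below is a deliberately tiny, hypothetical sketch of that design (the `Backend` trait, the `Op` expression type, and both backends are my own illustrations, not Wasmer's API): an interpreting backend that does no upfront work, and a "compiling" backend that translates the code once into nested Rust closures, a rough stand-in for generating machine code.

```rust
/// A hypothetical function body: an expression tree over i32 parameters.
#[derive(Debug, Clone)]
enum Op {
    Param(usize),
    Const(i32),
    Add(Box<Op>, Box<Op>),
}

/// What every backend must provide: turn a function body into something callable.
trait Backend {
    fn name(&self) -> &'static str;
    fn compile(&self, body: &Op) -> Box<dyn Fn(&[i32]) -> i32>;
}

/// Does no upfront work: walks the tree on every call.
struct Interpreter;

fn eval(op: &Op, params: &[i32]) -> i32 {
    match op {
        Op::Param(i) => params[*i],
        Op::Const(c) => *c,
        Op::Add(a, b) => eval(a, params).wrapping_add(eval(b, params)),
    }
}

impl Backend for Interpreter {
    fn name(&self) -> &'static str {
        "interpreter"
    }
    fn compile(&self, body: &Op) -> Box<dyn Fn(&[i32]) -> i32> {
        let body = body.clone();
        Box::new(move |params: &[i32]| eval(&body, params))
    }
}

/// Translates the tree once into nested closures, so calls no longer match on `Op`.
struct ClosureCompiler;

fn lower(op: &Op) -> Box<dyn Fn(&[i32]) -> i32> {
    match op {
        Op::Param(i) => {
            let i = *i;
            Box::new(move |p: &[i32]| p[i])
        }
        Op::Const(c) => {
            let c = *c;
            Box::new(move |_: &[i32]| c)
        }
        Op::Add(a, b) => {
            let (fa, fb) = (lower(a), lower(b));
            Box::new(move |p: &[i32]| fa(p).wrapping_add(fb(p)))
        }
    }
}

impl Backend for ClosureCompiler {
    fn name(&self) -> &'static str {
        "closure-compiler"
    }
    fn compile(&self, body: &Op) -> Box<dyn Fn(&[i32]) -> i32> {
        lower(body)
    }
}

fn main() {
    // sum(x, y) = x + y
    let sum = Op::Add(Box::new(Op::Param(0)), Box::new(Op::Param(1)));
    let backends: [Box<dyn Backend>; 2] = [Box::new(Interpreter), Box::new(ClosureCompiler)];
    for backend in &backends {
        let f = backend.compile(&sum);
        println!("{}: sum(1, 2) = {}", backend.name(), f(&[1, 2]));
    }
}
```

Both backends compute the same results; only the split between preparation work and per-call work changes, which is exactly the knob a multi-backend VM lets you turn.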

Enter Wasmer

It was at that very moment that I was hired by Wasmer. To be totally honest, I had been looking at Wasmer a few weeks before. It was a surprise and a great opportunity for me. Well, the universe really wanted this rewrite from wasmi to Wasmer, right 😅?

Wasmer is organized as a set of Rust libraries (called crates). There is even a wasmer-runtime-c-api crate, which provides a C and C++ API on top of the wasmer-runtime and wasmer-runtime-core crates. In other words, it lets you run the WebAssembly virtual machine however you want, with the backend of your choice: Cranelift, LLVM, or Dynasm (at the time of writing). That's perfect: it removes the need for my Rust library between the PHP extension and wasmi. php-ext-wasm is then reduced to a PHP extension without any Rust code; everything goes through wasmer-runtime-c-api. It's sad to remove Rust from this project, but the project now relies on even more Rust code!

Including the time spent patching wasmer-runtime-c-api, I was able to migrate php-ext-wasm to Wasmer in 5 days.

By default, php-ext-wasm uses Wasmer with the Cranelift backend, which strikes a great balance between compilation and execution time. Let's run the benchmark again, adding php+wasmer(cranelift):

Finally, the PHP extension executes faster than PHP itself! To be exact, php+wasmer(cranelift) is 8.6 times faster than php, and 28.6 times faster than php+wasmi. Can we reach native speed (represented here by rust-baseline)? It's very likely with LLVM, but that's for another article. I'm super happy with Cranelift for the moment. (See our previous blog post to learn how we benchmark the different backends in Wasmer, as well as other WebAssembly runtimes.)

More Optimizations

Wasmer provides more features, like module caching, and those features are now included in the PHP extension. Booting the nbody.wasm file (19 kB) took 4.2 ms. By booting, I mean reading the WebAssembly binary from a file, parsing it, validating it, and compiling it into executable code and a WebAssembly module structure.

PHP's execution model is: start, run, die. Memory is freed after each request. If you use php-ext-wasm, you don't want to pay that "booting cost" on every request.

Fortunately, wasmer-runtime-c-api now provides a module serialization API, which is integrated into the PHP extension itself. It saves the "booting cost", but adds a "deserialization cost". That second cost is smaller, but we still need to be aware that it exists.
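The trade-off between a "booting cost" and a "deserialization cost" can be sketched as an on-disk cache keyed by the module's bytes: the first load compiles and writes the serialized artifact, and later loads only read it back. This is a hypothetical illustration, not wasmer-runtime-c-api's serialization API: `compile` stands in for the expensive parse/validate/compile step, the "artifact" is plain bytes, and the cache key uses std's `DefaultHasher`, whose output is not guaranteed to stay the same across Rust releases (a real cache would use a proper content hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times the expensive step actually ran.
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for parsing, validating, and compiling a module.
fn compile(wasm: &[u8]) -> Vec<u8> {
    COMPILATIONS.fetch_add(1, Ordering::SeqCst);
    wasm.iter().rev().cloned().collect() // placeholder "machine code"
}

/// Derives the cache file name from the module's bytes.
fn cache_path(cache_dir: &Path, wasm: &[u8]) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    wasm.hash(&mut hasher);
    cache_dir.join(format!("{:016x}.module", hasher.finish()))
}

/// Returns the serialized module, compiling only when no cached artifact exists.
fn load_module(wasm: &[u8], cache_dir: &Path) -> io::Result<Vec<u8>> {
    let path = cache_path(cache_dir, wasm);
    match fs::read(&path) {
        Ok(artifact) => Ok(artifact), // deserialization cost only
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            let artifact = compile(wasm); // full booting cost
            fs::create_dir_all(cache_dir)?;
            fs::write(&path, &artifact)?;
            Ok(artifact)
        }
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // A per-process directory so each run starts from an empty cache.
    let dir = std::env::temp_dir().join(format!("wasm-cache-demo-{}", std::process::id()));
    let wasm: &[u8] = b"\0asm\x01\0\0\0"; // magic number + version 1: an empty module
    let first = load_module(wasm, &dir)?; // compiles and writes the artifact
    let second = load_module(wasm, &dir)?; // reads the artifact back
    println!("same artifact: {}, compilations: {}", first == second, COMPILATIONS.load(Ordering::SeqCst));
    fs::remove_dir_all(&dir)
}
```

Keying on the module's content rather than its file name means an updated .wasm file naturally misses the cache instead of loading a stale artifact.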

Fortunately again, the Zend Engine has an API to keep persistent in-memory data between PHP executions. php-ext-wasm uses that API to keep modules persistent, et voilà.
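Conceptually, persistent modules turn the cache into process-wide memory that outlives individual requests. The sketch below is a Rust analogy, not the Zend Engine API (which is C): `Module` and `compile_module` are hypothetical, and a global map guarded by a `Mutex` plays the role of persistent memory. The first request for a given file pays the booting cost; later requests served by the same process receive a shared handle to the already-compiled module.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical compiled module; in practice this would own executable code.
#[derive(Debug)]
struct Module {
    name: String,
}

/// Counts how many times a module was actually booted.
static BOOTS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the expensive read/parse/validate/compile step ("booting").
fn compile_module(path: &str) -> Module {
    BOOTS.fetch_add(1, Ordering::SeqCst);
    Module { name: path.to_string() }
}

/// Process-wide map that survives across requests served by the same process.
fn registry() -> &'static Mutex<HashMap<String, Arc<Module>>> {
    static REGISTRY: OnceLock<Mutex<HashMap<String, Arc<Module>>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the cached module for `path`, booting it only on first use.
fn persistent_module(path: &str) -> Arc<Module> {
    let mut modules = registry().lock().unwrap();
    modules
        .entry(path.to_string())
        .or_insert_with(|| Arc::new(compile_module(path)))
        .clone()
}

fn main() {
    // Simulate three requests handled by the same worker process.
    for request in 1..=3 {
        let module = persistent_module("nbody.wasm");
        println!("request {}: {} (boots so far: {})", request, module.name, BOOTS.load(Ordering::SeqCst));
    }
}
```

Holding the lock while compiling keeps the sketch simple but serializes first boots, and the cache only helps requests served by the same long-lived process.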

Now it takes 4.2 ms for the first boot of nbody.wasm and 0.005 ms for every subsequent boot. That's 840 times faster!

Conclusion

Wasmer is a young but mature framework for building WebAssembly runtimes. Its default backend, Cranelift, delivers on its promise: it strikes a good balance between compilation time and execution time.

wasmi has been a good companion for developing a proof-of-concept. The library still has its place in other use cases, though, like very short-lived WebAssembly binaries (I'm thinking of Ethereum contracts that compile to WebAssembly, for instance, which is one of its actual use cases). It's important to understand that no runtime is universally better than another; it depends on the use case.
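A back-of-the-envelope model, my own simplification rather than a measurement, makes "it depends on the use case" concrete. Suppose a runtime $r$ has a fixed boot cost $C_r$ (parse, validate, compile) and a per-call execution cost $E_r$, so that $n$ calls cost $T_r(n)$. An interpreter such as wasmi typically has a smaller $C$ and a larger $E$ than a compiling backend such as Cranelift, so the compiler only pays off past a break-even number of calls:

```latex
T_r(n) = C_r + n\,E_r,
\qquad
T_{\text{compiler}}(n) < T_{\text{interpreter}}(n)
\iff
n > \frac{C_{\text{compiler}} - C_{\text{interpreter}}}{E_{\text{interpreter}} - E_{\text{compiler}}}
\quad \text{(assuming } E_{\text{interpreter}} > E_{\text{compiler}}\text{)}.
```

In this model, a short-lived contract that runs once sits below the threshold, while a CPU-heavy workload like nbody(5000000) sits far above it.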

The next step is to stabilize php-ext-wasm and release version 1.0.0.

See you there!

If you want to follow the development, take a look at @wasmerio and @mnt_io on Twitter.