From Rust to beyond: The C galaxy

This blog post is part of a series explaining how to send Rust beyond earth, into many different galaxies. Rust has visited:


The galaxy we will explore today is the C galaxy. This post will briefly explain what C is, how to compile any Rust program to C in theory, and how to do that in practice with our Rust parser, from both the Rust side and the C side. We will also see how to test such a binding.

What is C, and why?

C is probably the most used and known programming language in the world. Quoting Wikipedia:

C […] is a general-purpose, imperative computer programming language, supporting structured programming, lexical variable scope and recursion, while a static type system prevents many unintended operations. By design, C provides constructs that map efficiently to typical machine instructions, and therefore it has found lasting use in applications that had formerly been coded in assembly language, including operating systems, as well as various application software for computers ranging from supercomputers to embedded systems.

Dennis Ritchie, the inventor of the C language.

The impact of C on the programming language world is probably without precedent. Almost everything is written in C, starting with operating systems. Today, it is one of the few common denominators between any programs on any systems on any machines in the world. In other words, being compatible with C opens a large door to everything. Your program will be able to talk directly to any program easily.

Because languages like PHP or Python are written in C, in our particular Gutenberg parser use case, it means that the parser can be embedded and used by PHP or Python directly, with almost no overhead. Neat!

Rust 🚀 C

Rust to C

In order to use Rust from C, one needs two elements:

  1. A static library (.a file),
  2. A header file (.h file).

The theory

To compile a Rust project into a static library, the crate-type property must contain the staticlib value. Let’s edit the Cargo.toml file as follows:

[lib]
name = "gutenberg_post_parser"
crate-type = ["staticlib"]

Once cargo build --release is run, a libgutenberg_post_parser.a file is created in target/release/. Done. cargo and rustc make this step really a doddle.

Now the header file. It can be written manually, but it’s tedious and it easily gets outdated. The goal is to generate it automatically. Enter cbindgen:

cbindgen can be used to generate C bindings for Rust code. It is currently being developed to support creating bindings for WebRender, but has been designed to support any project.

To install cbindgen, edit your Cargo.toml file as follows:

[package]
build = "build.rs"

[build-dependencies]
cbindgen = "^0.6.0"

Actually, cbindgen comes in 2 flavors: CLI executable, or a library. I prefer to use the library approach, which makes installation easier.

Note that Cargo has been instructed to use the build.rs file to build the project. This file is an appropriate place to generate the C header file with cbindgen. Let’s write it!

extern crate cbindgen;

fn main() {
    let crate_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();

    cbindgen::generate(crate_dir)
        .expect("Unable to generate C bindings.")
        .write_to_file("dist/gutenberg_post_parser.h");
}

With this information, cbindgen will scan the source code of the project and will generate C headers automatically in the dist/gutenberg_post_parser.h header file. Scanning will be detailed in a moment, but before that, let’s quickly see how to control the content of the header file. With the code snippet above, cbindgen will look for a cbindgen.toml configuration file in the CARGO_MANIFEST_DIR directory, i.e. the root of your crate. Mine looks like this:

header = """
/*

Gutenberg Post Parser, the C bindings.

Warning, this file is autogenerated by `cbindgen`.
Do not modify this manually.

*/"""
tab_width = 4
language = "C"

It is fairly self-explanatory. The documentation details the configuration very well.

cbindgen will scan the code and will stop on structs or enums that have the decorator #[repr(C)], #[repr(size)] or #[repr(transparent)], or functions that are marked as extern "C" and are public. So when one writes:

#[repr(C)]
pub struct Slice {
    pointer: *const c_char,
    length: usize
}

#[repr(C)]
pub enum Option {
    Some(Slice),
    None
}

#[no_mangle]
pub extern "C" fn parse(pointer: *const c_char) -> c_void { … }

Then cbindgen will generate this:

… header comment …

typedef struct {
    const char *pointer;
    uintptr_t length;
} Slice;

typedef enum {
    Some,
    None,
} Option_Tag;

typedef struct {
    Slice _0;
} Some_Body;

typedef struct {
    Option_Tag tag;
    union {
        Some_Body some;
    };
} Option;

void parse(const char *pointer);

It works; Great!

Note the #[no_mangle] that decorates the Rust parse function. It instructs the compiler to not rename the function, so that the function has the same name from the perspective of C.

OK, that’s all for the theory. Let’s practise now, we have a parser to bind to C!

Practise

We want to bind a function named parse. The function outputs an AST representing the language being analysed. As a reminder, the original AST looks like this:

pub enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>
    },
    Phrase(Input<'a>)
}

This AST is defined in the Rust parser. The Rust binding to C will transform this AST into another set of structs and enums for C. It is mandatory only for types that are directly exposed to C, not internal types that Rust uses. Let’s start by defining Node:

#[repr(C)]
pub enum Node {
    Block {
        namespace: Slice_c_char,
        name: Slice_c_char,
        attributes: Option_c_char,
        children: *const c_void
    },
    Phrase(Slice_c_char)
}

Some immediate thoughts:

  • The structure Slice_c_char emulates Rust slices (see below),
  • The enum Option_c_char emulates Option (see below),
  • The field children has type *const c_void. It should be *const Vector_Node (our definition of Vector), but the definition of Node is based on Vector_Node and vice versa. This cyclical definition case is unsupported by cbindgen so far. So… yes, it is defined as a void pointer, and will be cast later in C,
  • The fields namespace and name are originally a tuple in Rust. Tuples have no equivalent in C with cbindgen, so two fields are used instead.

Let’s define Slice_c_char:

#[repr(C)]
pub struct Slice_c_char {
    pointer: *const c_char,
    length: usize
}

This definition borrows the semantics of Rust’s slices. The major benefit is that there is no copy when binding a Rust slice to this structure.
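
As a minimal sketch of that zero-copy claim, the conversion can be isolated in a small helper. The names to_slice_c_char and as_bytes are hypothetical and are not taken from the parser's source; the struct only repeats the definition above so that the snippet stands alone.

```rust
use std::os::raw::c_char;
use std::slice;

/// Same shape as the article's `Slice_c_char`: a borrowed pointer plus a length.
#[repr(C)]
#[allow(non_camel_case_types)]
pub struct Slice_c_char {
    pointer: *const c_char,
    length: usize,
}

/// Hypothetical helper: view a Rust byte slice as a `Slice_c_char`.
/// No byte is copied; only the address and the length are stored.
fn to_slice_c_char(input: &[u8]) -> Slice_c_char {
    Slice_c_char {
        pointer: input.as_ptr() as *const c_char,
        length: input.len(),
    }
}

/// Hypothetical inverse, handy in tests: rebuild a byte slice from the parts.
///
/// Safety: `slice.pointer` must point to `slice.length` readable bytes that
/// stay alive for as long as the returned reference is used.
unsafe fn as_bytes<'a>(slice: &Slice_c_char) -> &'a [u8] {
    unsafe { slice::from_raw_parts(slice.pointer as *const u8, slice.length) }
}
```

The pointer stored in the structure is the address of the original bytes, which is why the data must outlive every Slice_c_char handed to C.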

Let’s define Option_c_char:

#[repr(C)]
pub enum Option_c_char {
    Some(Slice_c_char),
    None
}
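
A sketch of how a Rust Option could be mapped onto this enum, assuming the optional value is a byte slice as in the original AST. The helper name to_option_c_char is hypothetical; the two types repeat the definitions above so that the snippet stands alone.

```rust
use std::os::raw::c_char;

#[repr(C)]
#[allow(non_camel_case_types)]
pub struct Slice_c_char {
    pointer: *const c_char,
    length: usize,
}

#[repr(C)]
#[allow(non_camel_case_types)]
pub enum Option_c_char {
    Some(Slice_c_char),
    None,
}

/// Hypothetical helper: map a standard `Option<&[u8]>` onto `Option_c_char`.
/// `Some` and `None` in the patterns are the standard library variants.
fn to_option_c_char(input: Option<&[u8]>) -> Option_c_char {
    match input {
        Some(bytes) => Option_c_char::Some(Slice_c_char {
            pointer: bytes.as_ptr() as *const c_char,
            length: bytes.len(),
        }),
        None => Option_c_char::None,
    }
}
```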

Finally, we need to define Vector_Node and our own Result for C. They mimic the Rust semantics closely:

#[repr(C)]
pub struct Vector_Node {
    buffer: *const Node,
    length: usize
}

#[repr(C)]
pub enum Result {
    Ok(Vector_Node),
    Err
}

Alright, all types are declared! It’s time to write the parse function:

#[no_mangle]
pub extern "C" fn parse(pointer: *const c_char) -> Result {
    …
}

The function takes a pointer from C. It means that the data to analyse (i.e. the Gutenberg blog post) is allocated and owned by C: the memory is allocated on the C side, and Rust is only responsible for the parsing. This is where Rust shines: no copy, no clone, no memory mess, only pointers to this data will be returned to C as slices and vectors.

The workflow will be the following:

  • First thing to do when we deal with C: Check that the pointer is not null,
  • Reconstitute an input from the pointer with CStr. This standard API is useful to abstract C strings from the Rust point of view. The difference is that a C string is terminated by a NUL byte and carries no length, while a Rust string carries a length and is not terminated by a NUL byte,
  • Run the parser, then transform the AST into the “C AST”.

Let’s do that!

use std::{ffi::CStr, mem, os::raw::c_char};

#[no_mangle]
pub extern "C" fn parse(pointer: *const c_char) -> Result {
    // A null pointer cannot be parsed.
    if pointer.is_null() {
        return Result::Err;
    }

    // Borrow the C string as bytes; nothing is copied.
    let input = unsafe { CStr::from_ptr(pointer).to_bytes() };

    if let Ok((_remaining, nodes)) = gutenberg_post_parser::root(input) {
        let output: Vec<Node> =
            nodes
                .into_iter()
                .map(|node| into_c(&node))
                .collect();

        let vector_node = Vector_Node {
            buffer: output.as_slice().as_ptr(),
            length: output.len()
        };

        // Do not drop the nodes: C now holds a pointer to them.
        mem::forget(output);

        Result::Ok(vector_node)
    } else {
        Result::Err
    }
}

Only pointers are used in Vector_Node: Pointer to the output, and the length of the output. The conversion is light.

Now let’s see the into_c function. Some parts will not be detailed, not because they are difficult but because they are repetitive. The entire code lands here.

fn into_c<'a>(node: &ast::Node<'a>) -> Node {
    match *node {
        ast::Node::Block { name, attributes, ref children } => {
            Node::Block {
                namespace: …,
                name: …,
                attributes: …,
                children: …
            }
        },

        ast::Node::Phrase(input) => {
            Node::Phrase(…)
        }
    }
}

I want to show namespace for the warm-up (name, attributes and Phrase are very similar), and children because it deals with void.

Let’s convert ast::Node::Block.name.0 into Node::Block.namespace:

ast::Node::Block { name, …, … } => {
    Node::Block {
        namespace: Slice_c_char {
            pointer: name.0.as_ptr() as *const c_char,
            length: name.0.len()
        },

        …

Pretty straightforward so far. namespace is a Slice_c_char. The pointer is the pointer of the name.0 slice, and the length is the length of the same name.0. This is the same process for other Rust slices.

children is different though. It works in three steps:

  1. Collect all children as C AST nodes in a Rust vector,
  2. Transform the Rust vector into a valid Vector_Node,
  3. Transform the Vector_Node into a *const c_void pointer.

ast::Node::Block { …, …, ref children } => {
    Node::Block {
        …

        children: {
            // 1. Collect all children as C AST nodes.
            let output: Vec<Node> =
                children
                    .into_iter()
                    .map(|node| into_c(&node))
                    .collect();

            // 2. Transform the vector into a Vector_Node.
            let vector_node = if output.is_empty() {
                Box::new(
                    Vector_Node {
                        buffer: ptr::null(),
                        length: 0
                    }
                )
            } else {
                Box::new(
                    Vector_Node {
                        buffer: output.as_slice().as_ptr(),
                        length: output.len()
                    }
                )
            };

            // 3. Transform Vector_Node into a *const c_void pointer.
            let vector_node_pointer = Box::into_raw(vector_node) as *const c_void;

            mem::forget(output);

            vector_node_pointer
        }

Step 1 is straightforward.

Step 2 defines the behavior when there is no node. In other words, it defines what an empty Vector_Node is. The buffer must contain a NULL raw pointer, and the length is obviously 0. Without this behavior I got various segmentation faults in my code, even if I checked the length before the buffer. Note that Vector_Node is allocated on the heap with Box::new so that the pointer can be easily shared with C.

Step 3 uses the Box::into_raw function to consume the box and to return the wrapped raw pointer of the data it owns. Rust will not free anything here, it’s our responsibility (or the responsibility of C, to be pedantic). Then the *mut Vector_Node returned by Box::into_raw can be freely cast into *const c_void.

Finally, we instruct the compiler to not drop output when it goes out of scope with mem::forget (at this step of the series, you are very likely to know what it does).

Personally, I spent a few hours understanding why my pointers got random addresses, or were pointing to NULL data. The resulting code is simple and kind of clear to read, but it wasn’t obvious to me what to do beforehand.
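
To see the three steps and the mem::forget call in isolation, here is a reduced sketch. It assumes a hypothetical VectorI32 type that holds i32 values instead of nodes, and the function names into_void_pointer and read_back are made up for the illustration; read_back plays the role of the C side after the cast.

```rust
use std::mem;
use std::os::raw::c_void;

/// Hypothetical stand-in for `Vector_Node`, holding `i32`s instead of `Node`s.
#[repr(C)]
struct VectorI32 {
    buffer: *const i32,
    length: usize,
}

/// Hand a `Vec` over as an opaque pointer, following the steps described above.
fn into_void_pointer(output: Vec<i32>) -> *const c_void {
    // Step 2: describe the vector with a pointer and a length.
    // An empty vector is represented by a null buffer.
    let vector = Box::new(VectorI32 {
        buffer: if output.is_empty() { std::ptr::null() } else { output.as_ptr() },
        length: output.len(),
    });

    // Keep the elements alive: Rust must not free them at the end of the scope.
    mem::forget(output);

    // Step 3: consume the box and erase the type of the raw pointer.
    Box::into_raw(vector) as *const c_void
}

/// What the C side does after the cast, written in Rust for the sketch.
///
/// Safety: `pointer` must come from `into_void_pointer`.
unsafe fn read_back(pointer: *const c_void) -> Vec<i32> {
    let vector = unsafe { &*(pointer as *const VectorI32) };

    if vector.buffer.is_null() {
        return Vec::new();
    }

    let elements = unsafe { std::slice::from_raw_parts(vector.buffer, vector.length) };
    elements.to_vec()
}
```

Nothing is freed in this sketch: both the box and the forgotten vector are intentionally leaked, exactly like in the binding, until some code takes the responsibility of releasing them.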

And that’s all for the Rust part! The next section will present the C code that calls Rust, and how to compile everything all together.

C 🚀 executable

C to executable
“Artist View of a ray of light”… Don’t judge me!

Now that the Rust part is ready, the C part must be written to call it.

Minimal Working Example

Let’s do something very quick to see if it links and compiles:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "gutenberg_post_parser.h"

int main(int argc, char **argv) {
    FILE* file = fopen(argv[1], "rb");
    fseek(file, 0, SEEK_END);
    long file_size = ftell(file);
    rewind(file);

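    // One extra byte for the NUL terminator that CStr expects on the Rust side.
    char* file_content = (char*) malloc((file_size + 1) * sizeof(char));
    fread(file_content, 1, file_size, file);
    file_content[file_size] = '\0';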

    // Let's call Rust!
    Result output = parse(file_content);

    if (output.tag == Err) {
        printf("Error while parsing.\n");

        return 1;
    }

    const Vector_Node nodes = output.ok._0;
    // Do something with nodes.

    free(file_content);
    fclose(file);

    return 0;
}

To keep the code concise, I left all the error handlers out of the example. The entire code lands here if you’re curious.

What happens in this code? The first thing to notice is #include "gutenberg_post_parser.h" which is the header file that is automatically generated by cbindgen.

Then a filename from argv[1] is used to read a blog post to parse. The parse function is from Rust, just like the Result and Vector_Node types.

The Rust enum Result { Ok(Vector_Node), Err } is compiled to C as:

typedef enum {
    Ok,
    Err,
} Result_Tag;

typedef struct {
    Vector_Node _0;
} Ok_Body;

typedef struct {
    Result_Tag tag;
    union {
        Ok_Body ok;
    };
} Result;

Needless to say, the Rust version is easier and more compact to read, but this isn’t the point. To check if Result contains an Ok value or an Err, one has to check the tag field, like we did with output.tag == Err. To get the content of the Ok, we did output.ok._0 (_0 is a field from Ok_Body).

Let’s compile this with clang! We assume that the code above is located in the same directory as the gutenberg_post_parser.h file, i.e. in a dist/ directory. Thus:

$ cd dist
$ # -Wall                      enable all warnings,
$ # -o                         output executable name,
$ # gutenberg_post_parser.c    input source file,
$ # -L                         directory where to find the static library (*.a),
$ # -l gutenberg_post_parser   link with the libgutenberg_post_parser.a static library,
$ # -l System, pthread, c, m   other libraries to link with.
$ clang \
      -Wall \
      -o gutenberg-post-parser \
      gutenberg_post_parser.c \
      -L ../target/release/ \
      -l gutenberg_post_parser \
      -l System \
      -l pthread \
      -l c \
      -l m

And that’s all! We end up with a gutenberg-post-parser executable that runs C and Rust.

More details

In the original source code, a recursive function that prints the entire AST on stdout can be found, namely print (original, isn’t it?). Here are some side-by-side comparisons between Rust syntax and C syntax.

The Vector_Node struct in Rust:

pub struct Vector_Node {
    buffer: *const Node,
    length: usize
}

The Vector_Node struct in C:

typedef struct {
    const Node *buffer;
    uintptr_t length;
} Vector_Node;

So to respectively read the number of nodes (length of the vector) and the nodes in C, assuming nodes is a pointer to a Vector_Node, one has to write:

const uintptr_t number_of_nodes = nodes->length;

for (uintptr_t nth = 0; nth < number_of_nodes; ++nth) {
    const Node node = nodes->buffer[nth];
}

This is almost idiomatic C code!

A Node is defined in C as:

typedef enum {
    Block,
    Phrase,
} Node_Tag;

typedef struct {
    Slice_c_char namespace;
    Slice_c_char name;
    Option_c_char attributes;
    const void* children;
} Block_Body;

typedef struct {
    Slice_c_char _0;
} Phrase_Body;

typedef struct {
    Node_Tag tag;
    union {
        Block_Body block;
        Phrase_Body phrase;
    };
} Node;

So once a node is fetched, one can write the following code to detect its kind:

if (node.tag == Block) {
    // …
} else if (node.tag == Phrase) {
    // …
}

Let’s focus on Block for a second, and let’s print the namespace and the name of the block separated by a slash (/):

const Block_Body block = node.block;

const Slice_c_char namespace = block.namespace;
const Slice_c_char name = block.name;

printf(
    "%.*s/%.*s\n",
    (int) namespace.length, namespace.pointer,
    (int) name.length, name.pointer
);

The special %.*s form in printf prints a string given its length and its pointer.

I think it is interesting to see the cast from void* to Vector_Node* for children. It’s a single line:

const Vector_Node* children = (const Vector_Node*) (block.children);

I think that’s all for the details!

Testing

I reckon it is also interesting to see how to unit test C bindings directly with Rust. To emulate a C binding, first, the inputs must be in “C form”, so strings must be C strings. I prefer to write a macro for that:

macro_rules! str_to_c_char {
    ($input:expr) => (
        {
            ::std::ffi::CString::new($input).unwrap()
        }
    )
}

And second, the opposite: The parse function returns data for C, so they need to be “converted back” to Rust. Again, I prefer to write a macro for that:

macro_rules! slice_c_char_to_str {
    ($input:ident) => (
        unsafe {
            ::std::ffi::CStr::from_bytes_with_nul_unchecked(
                ::std::slice::from_raw_parts(
                    $input.pointer as *const u8,
                    // + 1 to include the trailing NUL byte of the C string.
                    $input.length + 1
                )
            ).to_str().unwrap()
        }
    )
}

All right! The final step is to write a unit test. As an example, a Phrase will be tested; the idea remains the same for Block, but the code is more concise for a Phrase.

#[test]
fn test_root_with_a_phrase() {
    let input = str_to_c_char!("foo");
    let output = parse(input.as_ptr());

    match output {
        Result::Ok(result) => match result {
            Vector_Node { buffer, length } if length == 1 =>
                match unsafe { &*buffer } {
                    Node::Phrase(phrase) => {
                        assert_eq!(slice_c_char_to_str!(phrase), "foo");
                    },

                    _ => assert!(false)
                },

            _ => assert!(false)
        },

        _ => assert!(false)
    }
}

What happens here? The input and output have been prepared. The former is the C string "foo". The latter is the result of parse. Then there is a match to validate the form of the AST. Rust is very expressive, and this test is a good illustration. The Vector_Node branch is activated if and only if the length of the vector is 1, which is expressed with the guard if length == 1. Then the content of the phrase is transformed into a Rust string and compared with a regular assert_eq! macro.

Note that, in this case, buffer is of type *const Node, so it points to the first element of the vector. If we wanted to access the next elements, we would need to use the Vec::from_raw_parts function to get a proper Rust API to manipulate this vector.
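
When the test only needs to read the elements, one possible alternative is std::slice::from_raw_parts, which borrows the buffer instead of taking ownership of it. The sketch below assumes a hypothetical VectorU8 type with u8 elements in place of Vector_Node, and the helper name elements is made up for the illustration.

```rust
use std::slice;

/// Hypothetical stand-in for `Vector_Node`, with `u8` elements.
#[repr(C)]
struct VectorU8 {
    buffer: *const u8,
    length: usize,
}

/// Borrow every element of the vector as a regular Rust slice.
///
/// Safety: `buffer` must be null, or point to `length` initialised elements
/// that stay alive for as long as the returned slice is used.
unsafe fn elements<'a>(vector: &'a VectorU8) -> &'a [u8] {
    if vector.buffer.is_null() {
        // The empty vector is represented by a null buffer.
        &[]
    } else {
        unsafe { slice::from_raw_parts(vector.buffer, vector.length) }
    }
}
```

Once the slice is obtained, the usual iterators and indexing work, and nothing is freed when the slice goes out of scope.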

Conclusion

We have seen that Rust can be embedded in C very easily. In this example, Rust has been compiled to a static library, and a header file; the former is native with Rust tooling, the latter is automatically generated with cbindgen.

The parser written in Rust manipulates a string allocated and owned by C. Rust only returns pointers (as slices) to this string back to C. Then C has no difficulty reading those pointers. The only tricky part is that Rust allocates some data (like vectors of nodes) on the heap that C must free. The “free” part has been omitted from the article though: it does not represent a big challenge, and a C developer is likely to be used to this kind of situation.
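
For the curious, here is one possible shape for that omitted “free” part. It is only a sketch under assumptions, not the code of the actual binding: the VectorI32 type and the vector_new and vector_free names are hypothetical, and the idea is that C hands the vector back to Rust so that the memory is released by the allocator that created it.

```rust
/// Hypothetical stand-in for `Vector_Node`, with `i32` elements.
#[repr(C)]
pub struct VectorI32 {
    buffer: *mut i32,
    length: usize,
}

/// Allocate on the Rust side. Converting to a boxed slice first guarantees
/// that the allocation holds exactly `length` elements, so the length is
/// enough to rebuild and free it later.
pub fn vector_new(values: Vec<i32>) -> VectorI32 {
    let boxed: Box<[i32]> = values.into_boxed_slice();
    let length = boxed.len();
    let buffer = Box::into_raw(boxed) as *mut i32;

    VectorI32 { buffer, length }
}

/// Hypothetical `free` counterpart. A real binding would also export it
/// with an unmangled name so that C can call it.
///
/// Safety: `vector` must come from `vector_new` and must not be used again.
pub unsafe extern "C" fn vector_free(vector: VectorI32) {
    if vector.buffer.is_null() {
        return;
    }

    // Rebuild the boxed slice and let Rust drop it.
    let slice = std::ptr::slice_from_raw_parts_mut(vector.buffer, vector.length);
    drop(unsafe { Box::from_raw(slice) });
}
```

The design choice here is to keep allocation and deallocation on the same side of the boundary; calling the C free function on a pointer allocated by Rust is not guaranteed to work.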

The fact that Rust does not use a garbage collector makes it a perfect candidate for these use cases. The story behind these bindings is actually all about memory: who allocates what, and what the form of the data in memory is. Rust has a #[repr(C)] decorator to instruct the compiler to use a C memory layout, which makes C bindings extremely simple for the developer.

We have also seen that the C bindings can be unit tested within Rust itself, and run with cargo test.

cbindgen is a precious companion in this adventure: by automating the header file generation, it reduces the update and the maintenance of the code to a build.rs script.

In terms of performance, C should have similar results to Rust, i.e. extremely fast. I didn’t run a benchmark to verify this statement; it’s purely theoretical. It can be a subject for a future post!

Now that we have successfully embedded Rust in C, a whole new world opens up to us! The next episode will push Rust in the PHP world as a native extension (written in C). Let’s go!

From Rust to beyond: The ASM.js galaxy

This blog post is part of a series explaining how to send Rust beyond earth, into many different galaxies. Rust has visited:


The second galaxy that our Rust parser will explore is the ASM.js galaxy. This post will explain what ASM.js is, how to compile the parser into ASM.js, and how to use the ASM.js module with Javascript in a browser. The goal is to use ASM.js as a fallback to WebAssembly when it is not available. I highly recommend reading the previous episode about WebAssembly since they have a lot in common.

What is ASM.js, and why?

The main programming language on the Web is Javascript. Applications that want to exist on the Web, like games for example, had to compile to Javascript. But a problem occurs: the resulting file is heavy (hence WebAssembly) and Javascript virtual machines have difficulty optimising this particular code, resulting in slow or inefficient executions (considering the example of games). Also, in this context, Javascript is a compilation target, and as such, some language constructions are useless (like eval).

So what if a “new” language can be a compilation target and still be executed by Javascript virtual machines? This is WebAssembly today, but in 2013, the solution was ASM.js:

asm.js, a strict subset of Javascript that can be used as a low-level, efficient target language for compilers. This sublanguage effectively describes a sandboxed virtual machine for memory-unsafe languages like C or C++. A combination of static and dynamic validation allows Javascript engines to employ an ahead-of-time (AOT) optimizing compilation strategy for valid asm.js code.

So an ASM.js program is a regular Javascript program. It is not a new language but a subset of Javascript. It can be executed by any Javascript virtual machine. However, the specific usage of the magic statement 'use asm'; instructs the virtual machine to optimise the program with an ASM.js “engine”.

ASM.js introduces types by using arithmetical operators as an annotation system. For instance, x | 0 annotates x to be an integer, +x annotates x to be a double, and fround(x) annotates x to be a float. The following example declares a function fn increment(x: i32) -> i32:

function increment(x) {
    x = x | 0;
    return (x + 1) | 0;
}
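
To make the annotation concrete: | 0 coerces its operand to a signed 32-bit integer, so the addition wraps around on overflow instead of producing a larger number. A Rust counterpart of the function above could be sketched as follows; it is only an illustration of the semantics, not the output of any compiler.

```rust
/// Rust counterpart of the ASM.js `increment` function. The `| 0` coercion
/// truncates to a signed 32-bit integer, which matches a wrapping addition.
fn increment(x: i32) -> i32 {
    x.wrapping_add(1)
}
```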

Another important difference is that ASM.js works by module in order to isolate them from Javascript. A module is a function that takes 3 arguments:

  1. stdlib, an object with references to standard library APIs,
  2. foreign, an object with user-defined functionalities (such as sending something over a WebSocket),
  3. heap, an array buffer representing the memory (because memory is manually managed).

But it’s still Javascript. So the good news is that if your virtual machine has no specific optimisations for ASM.js, it is executed as any regular Javascript program. And if it does, then you get a pleasant boost.

A graph showing 3 benchmarks running against different Javascript engines: Firefox, Firefox + asm.js, Google, and native.

Remember that ASM.js has been designed to be a compilation target. So normally you don’t have to care about that because it is the role of the compiler. The typical compilation and execution pipeline from C or C++ to the Web looks like this:

Classical ASM.js compilation and execution pipeline from C or C++ to the Web.

Emscripten, as seen in the schema above, is a very important project in this whole evolution of the Web platform. Emscripten is:

a toolchain for compiling to asm.js and WebAssembly, built using LLVM, that lets you run C and C++ on the web at near-native speed without plugins.

You are very likely to see this name one day or another if you work with ASM.js or WebAssembly.

I will not explain deeply what ASM.js is with a lot of examples. I recommend instead reading Asm.js: The Javascript Compile Target by John Resig, or Big Web app? Compile it! by Alon Zakai.

Our process will be different though. We will not compile our Rust code directly to ASM.js, but instead, we will compile it to WebAssembly, which in turn will be compiled into ASM.js.

Rust 🚀 ASM.js

Rust to ASM.js

This episode will be very short, and arguably the easiest one. To compile Rust to ASM.js, you need to first compile it to WebAssembly (see the previous episode), and then compile the WebAssembly binary into ASM.js.

Actually, ASM.js is mostly required when the browser does not support WebAssembly, like Internet Explorer. It is essentially a fallback to run our program on the Web.

The workflow is the following:

  1. Compile your Rust project into WebAssembly,
  2. Compile your WebAssembly binary into an ASM.js module,
  3. Optimise and shrink the ASM.js module.

The wasm2js tool will be your best companion to compile the WebAssembly binary into an ASM.js module. It is part of the Binaryen project. Then, assuming we have the WebAssembly binary of our program, all we have to do is:

$ wasm2js --pedantic --output gutenberg_post_parser.asm.js gutenberg_post_parser.wasm

At this step, the gutenberg_post_parser.asm.js file weighs 212kb. The file contains ECMAScript 6 code. And remember that old browsers are considered, like Internet Explorer, so the code needs to be transformed a little bit. To optimise and shrink the ASM.js module, we will use the uglify-es tool, like this:

$ # Transform code, and embed in a function.
$ sed -i '' '1s/^/function GUTENBERG_POST_PARSER_ASM_MODULE() {/; s/export //' gutenberg_post_parser.asm.js
$ echo 'return { root, alloc, dealloc, memory }; }' >> gutenberg_post_parser.asm.js

$ # Shrink the code.
$ uglifyjs --compress --mangle --output .temp.asm.js gutenberg_post_parser.asm.js
$ mv .temp.asm.js gutenberg_post_parser.asm.js

Just like we did for the WebAssembly binary, we can compress the resulting files with gzip and brotli:

$ # Compress.
$ gzip --best --stdout gutenberg_post_parser.asm.js > gutenberg_post_parser.asm.js.gz
$ brotli --best --stdout --lgwin=24 gutenberg_post_parser.asm.js > gutenberg_post_parser.asm.js.br

We end up with the following file sizes:

  • .asm.js: 54kb,
  • .asm.js.gz: 13kb,
  • .asm.js.br: 11kb.

That’s again pretty small!

When you think about it, this is a lot of transformations: From Rust to WebAssembly to Javascript/ASM.js… The number of tools is rather small compared to the amount of work. It shows a well-designed pipeline and a collaboration between many groups of people.


Aside: If you are reading this post, I assume you are a developer. And as such, I’m sure you can spend hours looking at source code as if it were a master painting. Did you ever wonder what a Rust program looks like once compiled to Javascript? See below:

A Rust program compiled as WebAssembly compiled as ASM.js.

I like it probably too much.

ASM.js 🚀 Javascript

The resulting gutenberg_post_parser.asm.js file contains a single function named GUTENBERG_POST_PARSER_ASM_MODULE which returns an object pointing to 4 private functions:

  1. root, the axiom of our grammar,
  2. alloc, to allocate memory,
  3. dealloc, to deallocate memory, and
  4. memory, the memory buffer.

It sounds familiar if you have read the previous episode with WebAssembly. Don’t expect root to return a full AST: it will return a pointer to the memory, and the data need to be encoded and decoded, written into and read from the memory, the same way. Yes, the same way. The exact same way. So the code of the boundary layer is strictly the same. Do you remember the Module object in our WebAssembly Javascript boundary? This is exactly what the GUTENBERG_POST_PARSER_ASM_MODULE function returns. You can replace Module by the returned object, et voilà!

The entire code lands here. It completely reuses the Javascript boundary layer for WebAssembly. It just sets the Module differently, and it does not load the WebAssembly binary. Consequently, the ASM.js boundary layer is made of 34 lines of code, only 🙃. It compresses to 218 bytes.

Conclusion

We have seen that ASM.js can be a fallback to WebAssembly in environments that only support Javascript (like Internet Explorer), with or without ASM.js optimisations.

The resulting ASM.js file and its boundary layer are quite small. By design, the ASM.js boundary layer reuses almost the entire WebAssembly boundary layer. Therefore there is again a tiny surface of code to review and to maintain, which is helpful.

We have seen in the previous episode that Rust is very fast. We have been able to observe the same statement for WebAssembly compared to the actual Javascript parser for the Gutenberg project. However, is it still true for the ASM.js module? In this case, ASM.js is a fallback, and like all fallbacks, it is notably slower than the targeted implementation. Let’s run the same benchmark but use the Rust parser as an ASM.js module:

File                             Javascript parser (ms)  Rust parser as an ASM.js module (ms)  Speedup
demo-post.html                   15.368                  2.718                                 × 6
shortcode-shortcomings.html      31.022                  8.004                                 × 4
redesigning-chrome-desktop.html  106.416                 19.223                                × 6
web-at-maximum-fps.html          82.92                   27.197                                × 3
early-adopting-the-future.html   119.880                 38.321                                × 3
pygmalian-raw-html.html          349.075                 23.656                                × 15
moby-dick-parsed.html            2,543.75                361.423                               × 7

The ASM.js module of the Rust parser is on average 6 times faster than the actual Javascript implementation. The median speedup is 6. That’s far from the WebAssembly results, but this is a fallback, and on average, it is 6 times faster, which is really great!
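
The average and the median can be rechecked from the timings of the table. The following sketch recomputes them; the constant and function names are made up for the illustration, and the numbers are copied from the table above.

```rust
/// (Javascript parser in ms, Rust parser as an ASM.js module in ms),
/// one pair per benchmarked file, in the order of the table.
const TIMINGS: [(f64, f64); 7] = [
    (15.368, 2.718),
    (31.022, 8.004),
    (106.416, 19.223),
    (82.92, 27.197),
    (119.880, 38.321),
    (349.075, 23.656),
    (2543.75, 361.423),
];

/// Speedup of each file: Javascript time divided by ASM.js time.
fn speedups() -> Vec<f64> {
    TIMINGS.iter().map(|&(js, asm)| js / asm).collect()
}

/// Arithmetic mean of the values.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Median of the values (middle element once sorted).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mid = sorted.len() / 2;

    if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
}
```

Both mean(&speedups()) and median(&speedups()) round to 6, which matches the figures quoted above.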

So not only is the whole pipeline safer because it starts from Rust, but it also ends up being faster than Javascript.

We will see in the next episodes of this series that Rust can reach a lot of galaxies, and the more it travels, the more it gets interesting.

Thanks for reading!

From Rust to beyond: The WebAssembly galaxy

This blog post is part of a series explaining how to send Rust beyond earth, into many different galaxies:


The first galaxy that our Rust parser will explore is the WebAssembly (WASM) galaxy. This post will explain what WebAssembly is, how to compile the parser into WebAssembly, and how to use the WebAssembly binary with Javascript in a browser and with NodeJS.

What is WebAssembly, and why?

If you already know WebAssembly, you can skip this section.

WebAssembly defines itself as:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

Should I say more? Probably, yes…

WebAssembly is a new portable binary format. Languages like C, C++, or Rust already compile to this target. It is the spiritual successor of ASM.js. By spiritual successor, I mean it is the same people trying to extend the Web platform and to make the Web fast that are working on both technologies. They share some design concepts too, but that’s not really important right now.

Before WebAssembly, programs had to compile to Javascript in order to run on the Web platform. The resulting files were most of the time large. And because the Web is a network, the files had to be downloaded, and it took time. WebAssembly is designed to be encoded in a size- and load-time efficient binary format.

WebAssembly is also faster than Javascript for many reasons. Despite all the crazy optimisations engineers put in the Javascript virtual machines, Javascript is a weakly and dynamically typed language, which needs to be interpreted. WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities. WebAssembly also loads faster than Javascript because parsing and compiling happen while the binary is streamed from the network. So once the binary is entirely fetched, it is ready to run: No need to wait on the parser and the compiler before running the program.

Today, and our blog series is a perfect example of that, it is possible to write a Rust program, and to compile it to run on the Web platform. Why? Because WebAssembly is implemented by all major browsers, and because it has been designed for the Web: To live and run on the Web platform (like a browser). But its portable aspect and its safe and sandboxed memory design make it a good candidate to run outside of the Web platform (see a serverless WASM framework, or an application container built for WASM).

I think it is important to remember that WebAssembly is not here to replace Javascript. It is just another technology which solves many problems we can meet today, like load-time, safety, or speed.

Rust 🚀 WebAssembly

Rust to WASM

The Rust WASM team is a group of people leading the effort of pushing Rust into WebAssembly with a set of tools and integrations. There is a book explaining how to write a WebAssembly program with Rust.

With the Gutenberg Rust parser, I didn’t use tools like wasm-bindgen (which is a pure gem) when I started the project a few months ago because I hit some limitations. Note that some of them have been addressed since then! Anyway, we will do most of the work by hand, and I think this is an excellent way to understand how things work in the background. Once you are familiar with WebAssembly interactions, wasm-bindgen is an excellent tool to have within easy reach, because it abstracts all the interactions and lets you focus on your code logic instead.

I would like to remind the reader that the Gutenberg Rust parser exposes one AST, and one root function (the axiom of the grammar), respectively defined as:

pub enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>
    },
    Phrase(Input<'a>)
}

and

pub fn root(
    input: Input
) -> Result<(Input, Vec<ast::Node>), nom::Err<Input>>;

Knowing that, let’s go!

General design

Here is our general design or workflow:

  1. Javascript (for instance) writes the blog post to parse into the WebAssembly module memory,
  2. Javascript runs the root function by passing a pointer to the memory, and the length of the blog post,
  3. Rust reads the blog post from the memory, runs the Gutenberg parser, compiles the resulting AST into a sequence of bytes, and returns the pointer to this sequence of bytes to Javascript,
  4. Javascript reads the memory from the received pointer, and decodes the sequence of bytes as Javascript objects in order to recreate an AST with a friendly API.

Why a sequence of bytes? Because WebAssembly only supports integers and floats, not strings or vectors, and also because our Rust parser takes a slice of bytes as input, so this is handy.

We use the term boundary layer to refer to the piece of Javascript code responsible for reading from and writing into the WebAssembly module memory, and for exposing a friendly API.

Now, we will focus on the Rust code. It consists of only 4 functions:

  • alloc to allocate memory (exported),
  • dealloc to deallocate memory (exported),
  • root to run the parser (exported),
  • into_bytes to transform the AST into a sequence of bytes.

The entire code lands here. It is approximately 150 lines of code. Let’s walk through it.

Memory allocation

Let’s start with the memory allocator. I chose to use wee_alloc for the memory allocator. It is specifically designed for WebAssembly: It is very small (less than a kilobyte) and efficient.

The following piece of code describes the memory allocator setup and the “prelude” for our code (enabling some compiler features, like alloc, declaring external crates and some aliases, and declaring required functions like panic, oom etc.). This can be considered boilerplate:

#![no_std]
#![feature(
    alloc,
    alloc_error_handler,
    core_intrinsics,
    lang_items
)]

extern crate gutenberg_post_parser;
extern crate wee_alloc;
#[macro_use] extern crate alloc;

use gutenberg_post_parser::ast::Node;
use alloc::vec::Vec;
use core::{mem, slice};

#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::intrinsics::abort(); }
}

#[alloc_error_handler]
fn oom(_: core::alloc::Layout) -> ! {
    unsafe { core::intrinsics::abort(); }
}

// This is the definition of `std::ffi::c_void`, but WASM runs without std in our case.
#[repr(u8)]
#[allow(non_camel_case_types)]
pub enum c_void {
    #[doc(hidden)]
    __variant1,

    #[doc(hidden)]
    __variant2
}

The Rust memory is the WebAssembly memory. Rust will allocate and deallocate memory on its own, but Javascript for instance needs to allocate and deallocate WebAssembly memory in order to communicate/exchange data. So we need to export one function to allocate memory and one function to deallocate memory.

Once again, this is almost boilerplate. The alloc function creates an empty vector of a specific capacity (because it is a linear segment of memory), and returns a pointer to the memory of this empty vector:

#[no_mangle]
pub extern "C" fn alloc(capacity: usize) -> *mut c_void {
    let mut buffer = Vec::with_capacity(capacity);
    let pointer = buffer.as_mut_ptr();
    mem::forget(buffer);

    pointer as *mut c_void
}

Note the #[no_mangle] attribute, which instructs the Rust compiler not to mangle the function name, i.e. not to rename it, and extern "C", which exports the function in the WebAssembly module so that it is “public” from outside the WebAssembly binary.

The code is pretty straightforward and matches what we announced earlier: A Vec is allocated with a specific capacity, and the pointer to this vector is returned. The important part is mem::forget(buffer). It is required so that Rust will not deallocate the vector once it goes out of scope. Indeed, Rust enforces Resource Acquisition Is Initialization (RAII), so whenever an object goes out of scope, its destructor is called and its owned resources are freed. This behavior shields against resource leak bugs, and this is why we almost never have to manually free memory or worry about memory leaks in Rust (see some RAII examples). In this case, we want to allocate and keep the allocation after the function execution, hence the mem::forget call.

Let’s move on to the dealloc function. The goal is to recreate a vector based on a pointer and a capacity, and to let Rust drop it:

#[no_mangle]
pub extern "C" fn dealloc(pointer: *mut c_void, capacity: usize) {
    unsafe {
        let _ = Vec::from_raw_parts(pointer, 0, capacity);
    }
}

The Vec::from_raw_parts function is marked as unsafe, so we need to wrap the call in an unsafe block so that the dealloc function itself is considered safe.

The _ pattern does not bind the recreated vector to any variable, so the vector goes out of scope immediately, and Rust drops it, which frees the memory.
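The allocate, forget, rebuild and drop cycle can be tried outside of WebAssembly too. Here is a minimal sketch with the standard library, assuming plain u8 buffers instead of c_void; the names alloc_bytes, dealloc_bytes and round_trip are hypothetical and only serve the illustration. It also assumes that Vec::with_capacity allocates exactly the requested capacity, which is what happens in practice for bytes, since Vec::from_raw_parts must receive the capacity the buffer was really allocated with.

```rust
use std::{mem, ptr, slice};

/// Allocates `capacity` bytes and leaks them on purpose: the caller owns the
/// memory from now on and must give it back with `dealloc_bytes`.
pub fn alloc_bytes(capacity: usize) -> *mut u8 {
    let mut buffer: Vec<u8> = Vec::with_capacity(capacity);
    let pointer = buffer.as_mut_ptr();
    // Without this call, `buffer` would be dropped (and freed) right here.
    mem::forget(buffer);

    pointer
}

/// Rebuilds an empty vector over the allocation and lets Rust drop it.
///
/// Safety: `pointer` must come from `alloc_bytes(capacity)` with the same
/// `capacity`, and must not have been deallocated already.
pub unsafe fn dealloc_bytes(pointer: *mut u8, capacity: usize) {
    unsafe {
        drop(Vec::from_raw_parts(pointer, 0, capacity));
    }
}

/// Plays the role of the host (hypothetical helper): writes `data` into a
/// fresh allocation, reads it back, then frees the allocation.
pub fn round_trip(data: &[u8]) -> Vec<u8> {
    let pointer = alloc_bytes(data.len());

    let copy = unsafe {
        ptr::copy_nonoverlapping(data.as_ptr(), pointer, data.len());
        slice::from_raw_parts(pointer, data.len()).to_vec()
    };

    unsafe { dealloc_bytes(pointer, data.len()) };

    copy
}
```

Calling round_trip(b"<!-- wp:foo /-->xyz") gives back the same bytes, and no memory is leaked because each alloc_bytes is paired with exactly one dealloc_bytes.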

From input to a flat AST

Now the core of the binding! The root function reads the blog post to parse based on a pointer and a length, then it parses it. If the result is OK, it serializes the AST into a sequence of bytes, i.e. it flattens it, otherwise it returns an empty sequence of bytes.

Flatten AST
The logic flow of the parser: The input on the left is parsed into an AST, which is serialized into a flat sequence of bytes on the right.
#[no_mangle]
pub extern "C" fn root(pointer: *mut u8, length: usize) -> *mut u8 {
    let input = unsafe { slice::from_raw_parts(pointer, length) };
    let mut output = vec![];

    if let Ok((_remaining, nodes)) = gutenberg_post_parser::root(input) {
        // Compile the AST (nodes) into a sequence of bytes.
    }

    let pointer = output.as_mut_ptr();
    mem::forget(output);

    pointer
}

The variable input contains the blog post. It is fetched from memory with a pointer and a length. The variable output is the sequence of bytes the function will return. gutenberg_post_parser::root(input) runs the parser. If parsing is OK, then the nodes are compiled into a sequence of bytes (omitted for now). Then the pointer to the sequence of bytes is grabbed, the Rust compiler is instructed to not drop it, and finally the pointer is returned. The logic is again pretty straightforward.

Now, let’s focus on the compilation of the AST into a sequence of bytes (u8). All the data the AST holds are already bytes, which makes the process easier. The goal is only to flatten the AST:

  • The first 4 bytes represent the number of nodes at the first level (4 × u8 represents u32),
  • Next, if the node is Block:
    • The first byte is the node type: 1u8 for a block,
    • The second byte is the size of the block name,
    • The third to the sixth bytes are the size of the attributes,
    • The seventh byte is the number of node children the block has,
    • Next bytes are the block name,
    • Next bytes are the attributes (&b"null"[..] if none),
    • Next bytes are node children as a sequence of bytes,
  • Next, if the node is Phrase:
    • The first byte is the node type: 2u8 for a phrase,
    • The second to the fifth bytes are the size of the phrase,
    • Next bytes are the phrase.
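To make the layout concrete, here is a minimal sketch that writes the flat encoding by hand for the input <!-- wp:foo /-->xyz. It assumes that the parser yields one block named core/foo, with no attributes and no children, followed by the phrase xyz. The function name flat_example is hypothetical, and the 4-byte sizes are written with the most significant byte first.

```rust
/// Hand-written flat encoding of `<!-- wp:foo /-->xyz`, following the layout
/// described in the list above (hypothetical helper, for illustration only).
pub fn flat_example() -> Vec<u8> {
    let mut bytes: Vec<u8> = Vec::new();

    // Number of nodes at the first level, as a u32 on 4 bytes: 2 nodes.
    bytes.extend_from_slice(&[0, 0, 0, 2]);

    // First node: a block.
    bytes.push(1); // node type: block
    bytes.push(8); // size of the block name `core/foo`
    bytes.extend_from_slice(&[0, 0, 0, 4]); // size of the attributes (`null`)
    bytes.push(0); // number of children
    bytes.extend_from_slice(b"core/foo"); // block name
    bytes.extend_from_slice(b"null"); // attributes, `null` because there are none

    // Second node: a phrase.
    bytes.push(2); // node type: phrase
    bytes.extend_from_slice(&[0, 0, 0, 3]); // size of the phrase
    bytes.extend_from_slice(b"xyz"); // the phrase itself

    bytes
}
```

The whole tree fits in 31 bytes: 4 for the node count, 19 for the block (7 bytes of header, 8 of name, 4 of attributes), and 8 for the phrase (5 bytes of header, 3 of content).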

Here is the missing part of the root function:

if let Ok((_remaining, nodes)) = gutenberg_post_parser::root(input) {
    let nodes_length = u32_to_u8s(nodes.len() as u32);

    output.push(nodes_length.0);
    output.push(nodes_length.1);
    output.push(nodes_length.2);
    output.push(nodes_length.3);

    for node in nodes {
        into_bytes(&node, &mut output);
    }
}

And here is the into_bytes function:

fn into_bytes<'a>(node: &Node<'a>, output: &mut Vec<u8>) {
    match *node {
        Node::Block { name, attributes, ref children } => {
            let node_type = 1u8;
            let name_length = name.0.len() + name.1.len() + 1;
            let attributes_length = match attributes {
                Some(attributes) => attributes.len(),
                None => 4
            };
            let attributes_length_as_u8s = u32_to_u8s(attributes_length as u32);

            let number_of_children = children.len();
            output.push(node_type);
            output.push(name_length as u8);
            output.push(attributes_length_as_u8s.0);
            output.push(attributes_length_as_u8s.1);
            output.push(attributes_length_as_u8s.2);
            output.push(attributes_length_as_u8s.3);
            output.push(number_of_children as u8);

            output.extend(name.0);
            output.push(b'/');
            output.extend(name.1);

            if let Some(attributes) = attributes {
                output.extend(attributes);
            } else {
                output.extend(&b"null"[..]);
            }

            for child in children {
                into_bytes(&child, output);
            }
        },

        Node::Phrase(phrase) => {
            let node_type = 2u8;
            let phrase_length = phrase.len();

            output.push(node_type);

            let phrase_length_as_u8s = u32_to_u8s(phrase_length as u32);

            output.push(phrase_length_as_u8s.0);
            output.push(phrase_length_as_u8s.1);
            output.push(phrase_length_as_u8s.2);
            output.push(phrase_length_as_u8s.3);
            output.extend(phrase);
        }
    }
}

What I find interesting with this code is that it reads just like the bullet list above the code.

For the most curious, here is the u32_to_u8s function:

fn u32_to_u8s(x: u32) -> (u8, u8, u8, u8) {
    (
        ((x >> 24) & 0xff) as u8,
        ((x >> 16) & 0xff) as u8,
        ((x >> 8)  & 0xff) as u8,
        ( x        & 0xff) as u8
    )
}
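The shifts above produce the bytes in big-endian order (most significant byte first). As a sketch, the same conversion can be written with u32::to_be_bytes, and the inverse with u32::from_be_bytes, both available in core; the u8s_to_u32 name on the Rust side is an assumption made for this illustration.

```rust
/// Same result as the shift-based version: the 4 bytes of `x`,
/// most significant byte first.
pub fn u32_to_u8s(x: u32) -> (u8, u8, u8, u8) {
    let [a, b, c, d] = x.to_be_bytes();

    (a, b, c, d)
}

/// Inverse operation (hypothetical helper): rebuilds the `u32`
/// from its 4 big-endian bytes.
pub fn u8s_to_u32(o: u8, p: u8, q: u8, r: u8) -> u32 {
    u32::from_be_bytes([o, p, q, r])
}
```

For instance, u32_to_u8s(0x01020304) gives (1, 2, 3, 4), and u8s_to_u32(1, 2, 3, 4) gives 0x01020304 back.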

Here we are. alloc, dealloc, root, and into_bytes. Four functions, and everything is done.

Producing and optimising the WebAssembly binary

To get a WebAssembly binary, the project has to be compiled to the wasm32-unknown-unknown target. For now (and it will change in the near future), the nightly toolchain is needed to compile the project, so make sure you have the latest nightly version of rustc & co. installed with rustup update nightly. Let’s run cargo:

$ RUSTFLAGS='-g' cargo +nightly build --target wasm32-unknown-unknown --release

The WebAssembly binary weighs 22kb. Our goal is to reduce the file size. For that, the following tools will be required:

  • wasm-gc to garbage-collect unused imports, internal functions, types etc.,
  • wasm-snip to mark some functions as unreachable, which is useful when the binary includes unused code that the linker was not able to remove,
  • wasm-opt from the Binaryen project, to optimise the binary,
  • gzip and brotli to compress the binary.

Basically, what we do is the following:

$ # Garbage-collect unused data.
$ wasm-gc gutenberg_post_parser.wasm

$ # Mark fmt and panicking as unreachable.
$ wasm-snip --snip-rust-fmt-code --snip-rust-panicking-code gutenberg_post_parser.wasm -o gutenberg_post_parser_snipped.wasm
$ mv gutenberg_post_parser_snipped.wasm gutenberg_post_parser.wasm

$ # Garbage-collect unreachable data.
$ wasm-gc gutenberg_post_parser.wasm

$ # Optimise for small size.
$ wasm-opt -Oz -o gutenberg_post_parser_opt.wasm gutenberg_post_parser.wasm
$ mv gutenberg_post_parser_opt.wasm gutenberg_post_parser.wasm

$ # Compress.
$ gzip --best --stdout gutenberg_post_parser.wasm > gutenberg_post_parser.wasm.gz
$ brotli --best --stdout --lgwin=24 gutenberg_post_parser.wasm > gutenberg_post_parser.wasm.br 

We end up with the following file sizes:

  • .wasm: 16kb,
  • .wasm.gz: 7.3kb,
  • .wasm.br: 6.2kb.

Neat! Brotli is implemented by most browsers, so when the client sends Accept-Encoding: br, the server can respond with the .wasm.br file.

To give you a feeling of what 6.2kb represents, the following image also weighs 6.2kb:

1398208027wordpress-logo-simplified-rgb

The WebAssembly binary is ready to run!

WebAssembly 🚀 Javascript

WASM to JS

In this section, we assume Javascript runs in a browser. Thus, what we need to do is the following:

  1. Load/stream and instantiate the WebAssembly binary,
  2. Write the blog post to parse in the WebAssembly module memory,
  3. Call the root function on the parser,
  4. Read the WebAssembly module memory to load the flat AST (a sequence of bytes) and decode it to build a “Javascript AST” (with our own objects).

The entire code lands here. It is approximately 150 lines of code too. I won’t explain the whole code since some parts of it are the “friendly API” that is exposed to the user. So I will rather explain the major pieces.

Loading/streaming and instantiating

The WebAssembly API exposes multiple ways to load a WebAssembly binary. The best one is the WebAssembly.instantiateStreaming function: It streams the binary and compiles it at the same time, so nothing blocks. This API relies on the Fetch API. You might have guessed it: It is asynchronous (it returns a promise). WebAssembly itself is not asynchronous (unless you use threads), but the instantiation step is. It is possible to avoid that, but this is tricky, and Google Chrome has a hard limit of 4kb for the binary size which will make you give up quickly.

To be able to stream the WebAssembly binary, the server must send the application/wasm MIME type (with the Content-Type header).

Let’s instantiate our WebAssembly:

const url = '/gutenberg_post_parser.wasm';
const wasm =
    WebAssembly.
        instantiateStreaming(fetch(url), {}).
        then(object => object.instance).
        then(instance => { /* step 2 */ });

The WebAssembly binary has been instantiated! Now we can move to the next step.

Last polish before running the parser

Remember that the WebAssembly binary exports 3 functions: alloc, dealloc, and root. They can be found on the exports property, along with the memory. Let’s write that:

        then(instance => {
            const Module = {
                alloc: instance.exports.alloc,
                dealloc: instance.exports.dealloc,
                root: instance.exports.root,
                memory: instance.exports.memory
            };

            runParser(Module, '<!-- wp:foo /-->xyz');
        });

Great, everything is ready to write the runParser function!

The parser runner

As a reminder, this function has to write the input (the blog post to parse) into the WebAssembly module memory (Module.memory), call the root function (Module.root), and read the result from the WebAssembly module memory. Let’s do that:

function runParser(Module, raw_input) {
    const input = new TextEncoder().encode(raw_input);
    const input_pointer = writeBuffer(Module, input);
    const output_pointer = Module.root(input_pointer, input.length);
    const result = readNodes(Module, output_pointer);

    Module.dealloc(input_pointer, input.length);

    return result;
}

In detail:

  • The raw_input is encoded into a sequence of bytes with the TextEncoder API, in input,
  • The input is written into the WebAssembly module memory with writeBuffer and its pointer is returned,
  • Then the root function is called with the pointer to the input and the length of the input as expected, and the pointer to the output is returned,
  • Then the output is decoded,
  • And finally, the input is deallocated. The output of the parser will be deallocated in the readNodes function because its length is unknown at this step.

Great! So we have 2 functions to write right now: writeBuffer and readNodes.

Writing the data in memory

Let’s go with the first one, writeBuffer:

function writeBuffer(Module, buffer) {
    const buffer_length = buffer.length;
    const pointer = Module.alloc(buffer_length);
    const memory = new Uint8Array(Module.memory.buffer);

    for (let i = 0; i < buffer_length; ++i) {
        memory[pointer + i] = buffer[i];
    }

    return pointer;
}

In detail:

  • The length of the buffer is read in buffer_length,
  • A space in memory is allocated to write the buffer,
  • Then a uint8 view of the memory is instantiated, which means that the memory will be viewed as a sequence of u8, exactly what Rust expects,
  • Finally the buffer is copied into the memory with a very basic loop, and the pointer is returned.

Note that, unlike C strings, adding a NUL byte at the end is not mandatory. This is just the raw data (on the Rust side, we read it with slice::from_raw_parts, slice is a very simple structure).
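To illustrate why no terminator is needed, here is a small sketch of the Rust side, assuming the standard library is available. The names read_input and demo are hypothetical. The pointer and the length fully describe the data, so a NUL byte in the middle of the input is just another byte.

```rust
use std::slice;

/// Copies `length` bytes starting at `pointer` (hypothetical helper).
///
/// Safety: `pointer` must be valid for reading `length` bytes.
pub unsafe fn read_input(pointer: *const u8, length: usize) -> Vec<u8> {
    // The slice is only a pointer and a length: no terminator is looked for.
    let input = unsafe { slice::from_raw_parts(pointer, length) };

    input.to_vec()
}

/// Reads back an input that contains a NUL byte in the middle.
pub fn demo() -> Vec<u8> {
    let post = b"a\0b".to_vec();

    unsafe { read_input(post.as_ptr(), post.len()) }
}
```

Here demo() returns the 3 bytes a, NUL and b: the read stops because of the length, not because of the content.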

Reading the output of the parser

So at this step, the input has been written in memory, and the root function has been called so it means the parser has run. It has returned a pointer to the output (the result) and we now have to read it and decode it.

Remember that the first 4 bytes encode the number of nodes we have to read. Let’s go!

function readNodes(Module, start_pointer) {
    const buffer = new Uint8Array(Module.memory.buffer.slice(start_pointer));
    const number_of_nodes = u8s_to_u32(buffer[0], buffer[1], buffer[2], buffer[3]);

    if (0 >= number_of_nodes) {
        return null;
    }

    const nodes = [];
    let offset = 4;
    let end_offset;

    for (let i = 0; i < number_of_nodes; ++i) {
        const last_offset = readNode(buffer, offset, nodes);

        offset = end_offset = last_offset;
    }

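    // `dealloc` expects a size in bytes, not an end address: `end_offset` is
    // the number of bytes read from `start_pointer`.
    Module.dealloc(start_pointer, end_offset);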

    return nodes;
}

In detail:

  • A uint8 view of the memory is instantiated… more precisely: A slice of the memory starting at start_pointer,
  • The number of nodes is read, then all nodes are read,
  • And finally, the output of the parser is deallocated.

For the record, here is the u8s_to_u32 function, this is the exact opposite of u32_to_u8s:

function u8s_to_u32(o, p, q, r) {
    return (o << 24) | (p << 16) | (q << 8) | r;
}

And I will also share the readNode function, but I won’t explain the details. This is just the decoding part of the output from the parser.

function readNode(buffer, offset, nodes) {
    const node_type = buffer[offset];

    // Block.
    if (1 === node_type) {
        const name_length = buffer[offset + 1];
        const attributes_length = u8s_to_u32(buffer[offset + 2], buffer[offset + 3], buffer[offset + 4], buffer[offset + 5]);
        const number_of_children = buffer[offset + 6];

        let payload_offset = offset + 7;
        let next_payload_offset = payload_offset + name_length;

        const name = new TextDecoder().decode(buffer.slice(payload_offset, next_payload_offset));

        payload_offset = next_payload_offset;
        next_payload_offset += attributes_length;

        const attributes = JSON.parse(new TextDecoder().decode(buffer.slice(payload_offset, next_payload_offset)));

        payload_offset = next_payload_offset;
        let end_offset = payload_offset;

        const children = [];

        for (let i = 0; i < number_of_children; ++i) {
            const last_offset = readNode(buffer, payload_offset, children);

            payload_offset = end_offset = last_offset;
        }

        nodes.push(new Block(name, attributes, children));

        return end_offset;
    }
    // Phrase.
    else if (2 === node_type) {
        const phrase_length = u8s_to_u32(buffer[offset + 1], buffer[offset + 2], buffer[offset + 3], buffer[offset + 4]);
        const phrase_offset = offset + 5;
        const phrase = new TextDecoder().decode(buffer.slice(phrase_offset, phrase_offset + phrase_length));

        nodes.push(new Phrase(phrase));

        return phrase_offset + phrase_length;
    } else {
        console.error('unknown node type', node_type);
    }
}

Note that this code is pretty simple and easy to optimise by the Javascript virtual machine. It is also important to note that this is not the original code. The original version is a little more optimised here and there, but they are very close.
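The same decoding can also be sketched in Rust, which can be handy to check the flat format from a test on the Rust side. This is not part of the project: the Decoded enum and the read_u32, read_node and read_nodes functions are hypothetical names, and the attributes are kept as raw text instead of being parsed as JSON. Unlike the Javascript version, every read is bounds-checked, so a truncated or unknown input gives None.

```rust
/// Owned, decoded counterpart of the flat AST (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Decoded {
    Block { name: String, attributes: String, children: Vec<Decoded> },
    Phrase(String),
}

/// Reads a big-endian u32 at `offset`, or `None` if the buffer is too short.
fn read_u32(buffer: &[u8], offset: usize) -> Option<u32> {
    let bytes = buffer.get(offset..offset + 4)?;

    Some(u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Decodes one node at `offset`; returns the node and the offset just after it.
fn read_node(buffer: &[u8], offset: usize) -> Option<(Decoded, usize)> {
    match *buffer.get(offset)? {
        // Block.
        1 => {
            let name_length = *buffer.get(offset + 1)? as usize;
            let attributes_length = read_u32(buffer, offset + 2)? as usize;
            let number_of_children = *buffer.get(offset + 6)? as usize;

            let name_start = offset + 7;
            let attributes_start = name_start + name_length;
            let mut next = attributes_start + attributes_length;

            let name = String::from_utf8_lossy(buffer.get(name_start..attributes_start)?).into_owned();
            let attributes = String::from_utf8_lossy(buffer.get(attributes_start..next)?).into_owned();

            let mut children = Vec::with_capacity(number_of_children);

            for _ in 0..number_of_children {
                let (child, end) = read_node(buffer, next)?;
                children.push(child);
                next = end;
            }

            Some((Decoded::Block { name, attributes, children }, next))
        }
        // Phrase.
        2 => {
            let phrase_length = read_u32(buffer, offset + 1)? as usize;
            let start = offset + 5;
            let end = start + phrase_length;
            let phrase = String::from_utf8_lossy(buffer.get(start..end)?).into_owned();

            Some((Decoded::Phrase(phrase), end))
        }
        // Unknown node type.
        _ => None,
    }
}

/// Decodes the whole output: the node count first, then every node.
pub fn read_nodes(buffer: &[u8]) -> Option<Vec<Decoded>> {
    let number_of_nodes = read_u32(buffer, 0)? as usize;
    let mut nodes = Vec::new();
    let mut offset = 4;

    for _ in 0..number_of_nodes {
        let (node, end) = read_node(buffer, offset)?;
        nodes.push(node);
        offset = end;
    }

    Some(nodes)
}
```

Returning the end offset together with the node plays the same role as last_offset in the Javascript version: the caller always knows where the next node starts.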

And that’s all! We have successfully read and decoded the output of the parser! We just need to write the Block and Phrase classes like this:

class Block {
    constructor(name, attributes, children) {
        this.name = name;
        this.attributes = attributes;
        this.children = children;
    }
}

class Phrase {
    constructor(phrase) {
        this.phrase = phrase;
    }
}

The final output will be an array of those objects. Easy!

WebAssembly 🚀 NodeJS

WASM to NodeJS

The differences between the Javascript version and the NodeJS version are few:

  • The Fetch API does not exist in NodeJS, so the WebAssembly binary has to be instantiated with a buffer directly, like this: WebAssembly.instantiate(fs.readFileSync(url), {}),
  • The TextEncoder and TextDecoder objects do not exist as global objects, they are in util.TextEncoder and util.TextDecoder.

In order to share the code between both environments, it is possible to write the boundary layer (the Javascript code we wrote) in a .mjs file, aka ECMAScript Module. It allows writing something like import { Gutenberg_Post_Parser } from './gutenberg_post_parser.mjs' for example (considering the whole code we wrote before is a class). On the browser side, the script must be loaded with <script type="module" src="…" />, and on the NodeJS side, node must run with the --experimental-modules flag. I recommend the talk Please wait… loading: a tale of two loaders by Myles Borins at the JSConf EU 2018 to understand the whole story behind that.

The entire code lands here.

Conclusion

We have seen in detail how to write a real world parser in Rust, how to compile it into a WebAssembly binary, and how to use it with Javascript and with NodeJS.

The parser can be used in a browser with regular Javascript code, or as a CLI with NodeJS, or on any platform NodeJS supports.

The Rust part for WebAssembly plus the Javascript part totals 313 lines of code. This is a tiny surface of code to review and to maintain compared to writing a Javascript parser from scratch.

Another argument is safety and performance. Rust is memory safe, we know that. It is also performant, but is it still true for the WebAssembly target? The following table shows the benchmark results of the current Javascript parser for the Gutenberg project (implemented with PEG.js), against this project: The Rust parser as a WebAssembly binary.

File                              Javascript parser (ms)   Rust parser as a WebAssembly binary (ms)   Speedup
demo-post.html                    13.167                   0.252                                      × 52
shortcode-shortcomings.html       26.784                   0.271                                      × 98
redesigning-chrome-desktop.html   75.500                   0.918                                      × 82
web-at-maximum-fps.html           88.118                   0.901                                      × 98
early-adopting-the-future.html    201.011                  3.329                                      × 60
pygmalian-raw-html.html           311.416                  2.692                                      × 116
moby-dick-parsed.html             2,466.533                25.14                                      × 98

The WebAssembly binary is on average 86 times faster than the current Javascript implementation. The median of the speedup is 98. Some edge cases are very interesting, like moby-dick-parsed.html where it takes 2.5s with the Javascript parser against 25ms with WebAssembly.
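For the record, both figures can be recomputed from the speedup column of the table. Here is a minimal sketch; the mean and median helpers and the SPEEDUPS constant are hypothetical names, and both helpers assume a non-empty slice without NaN values.

```rust
/// Speedups taken from the last column of the table above.
pub const SPEEDUPS: [f64; 7] = [52.0, 98.0, 82.0, 98.0, 60.0, 116.0, 98.0];

/// Arithmetic mean of a non-empty slice.
pub fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Median of a non-empty slice: the middle value once sorted, or the mean of
/// the two middle values when the number of values is even.
pub fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("values must not be NaN"));

    let middle = sorted.len() / 2;

    if sorted.len() % 2 == 0 {
        (sorted[middle - 1] + sorted[middle]) / 2.0
    } else {
        sorted[middle]
    }
}
```

The sum of the 7 speedups is 604, so the mean is 604 / 7 ≈ 86.3, and the middle value of the sorted list (52, 60, 82, 98, 98, 98, 116) is 98.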

So not only is it safer, but it is also faster than Javascript in this case. And it is only about 300 lines of code.

Note that WebAssembly does not support SIMD yet: It is still a proposal. Rust is gradually adding support for it (example with PR #549). It should dramatically improve performance!

We will see in the next episodes of this series that Rust can reach a lot of galaxies, and the more it travels, the more it gets interesting.

Thanks for reading!

From Rust to beyond: Prelude

At my work, I had an opportunity to start an experiment: Writing a single parser implementation in Rust for the new Gutenberg post format, bound to many platforms and environments.

gutenberg_logo
The logo of the Gutenberg post parser project.

This series of posts is about those bindings, and explains how to send Rust beyond earth, into many different galaxies. Rust will land in:

The ship is currently flying into the Java galaxy. This series may continue if the ship does not crash and has enough resources to survive!

The Gutenberg post format

Let’s quickly introduce what Gutenberg is, and why it needs a new post format. If you want an in-depth presentation, I highly recommend reading The Language of Gutenberg. Note that this is not required for the reader to understand the Gutenberg post format.

Gutenberg is the next WordPress editor. It is a little revolution on its own. The features it unlocks are very powerful.

The editor will create a new page- and post-building experience that makes writing rich posts effortless, and has “blocks” to make it easy what today might take shortcodes, custom HTML, or “mystery meat” embed discovery. — Matt Mullenweg

The format of a blog post was HTML. And it continues to be. However, another semantics layer is added through annotations. Annotations are written in comments and borrow the XML syntax, e.g.:

<!-- wp:ns/block-name {"attributes": "as JSON"} -->
    <p>phrase</p>
<!-- /wp:ns/block-name -->

The Gutenberg format provides 2 constructions: Block, and Phrase. The example above contains both: There is a block wrapping a phrase. A phrase is basically anything that is not a block. Let’s describe the example:

  • It starts with an annotation (<!-- … -->),
  • The wp: is mandatory to represent a Gutenberg block,
  • It is followed by a fully qualified block name, which is a pair of an optional namespace (here set to ns, defaulting to core) and a block name (here set to block-name), separated by a slash,
  • A block has optional attributes encoded as a JSON object (see RFC 7159, Section 4, Objects),
  • Finally, a block has optional children, i.e. a heterogeneous collection of blocks or phrases. In the example above, there is one child that is the phrase <p>phrase</p>. The following example shows a block with no children:

<!-- wp:ns/block-name {"attributes": "as JSON"} /-->

The complete grammar can be found in the parser’s documentation.

Finally, the parser is used on the editor side, not on the rendering side. Once rendered, the blog post is a regular HTML file. Some blocks are dynamic though, but this is another topic.

block-logic-flow1
The logic flow of the editor (How Little Blocks Work).

The grammar is relatively small. The challenge, however, is to be as performant and memory efficient as possible on many platforms. Some posts can reach megabytes, and we don’t want the parser to be the bottleneck. Even if it is used only when creating the post state (cf. the schema above), we have measured several seconds to load some posts, during which the user is blocked and waits, or sees an error. In other scenarios, we have hit the memory limit of the language’s virtual machine.

Hence this experimental project! The current parsers are written in JavaScript (with PEG.js) and in PHP (with phpegjs). This Rust project proposes a parser written in Rust, that can run in the JavaScript and in the PHP virtual machines, and on many other platforms. Let’s try to be very performant and memory efficient!

Why Rust?

That’s an excellent question! Thanks for asking. I can summarize my choice with a bullet list:

  • It is fast, and we need speed,
  • It is memory safe, and also memory efficient,
  • No garbage collector, which simplifies memory management across environments,
  • It can expose a C API (with Foreign Function Interface, FFI), which eases the integration into multiple environments,
  • It compiles to many targets,
  • Because I love it.

One of the goals of the experiment is to maintain a single implementation (maybe the future reference implementation) with multiple bindings.

The parser

The parser is written in Rust. It relies on the fabulous nom library.

nom
nom will happily take a byte out of your files 🙂.

The source code is available in the src/ directory in the repository. It is very small and fun to read.

The parser produces an Abstract Syntax Tree (AST) of the grammar, where nodes of the tree are defined as:

pub enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>
    },
    Phrase(Input<'a>)
}

That’s all! We find again the block name, the attributes and the children, and the phrase. Block children are defined as a collection of nodes, so the definition is recursive. Input<'a> is defined as &'a [u8], i.e. a slice of bytes.
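Because the definition is recursive, functions that walk the tree are naturally recursive too. Here is a small self-contained sketch: the enum is copied from above (with the Input alias spelled out), while the count_nodes and block_names helpers are hypothetical and not part of the parser.

```rust
pub type Input<'a> = &'a [u8];

#[derive(Debug, PartialEq)]
pub enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>,
    },
    Phrase(Input<'a>),
}

/// Counts every node of the tree, children included (hypothetical helper).
pub fn count_nodes(nodes: &[Node<'_>]) -> usize {
    nodes
        .iter()
        .map(|node| match node {
            Node::Block { children, .. } => 1 + count_nodes(children),
            Node::Phrase(_) => 1,
        })
        .sum()
}

/// Collects the fully qualified names of all blocks, depth first
/// (hypothetical helper).
pub fn block_names(nodes: &[Node<'_>], names: &mut Vec<String>) {
    for node in nodes {
        if let Node::Block { name, children, .. } = node {
            names.push(format!(
                "{}/{}",
                String::from_utf8_lossy(name.0),
                String::from_utf8_lossy(name.1)
            ));
            block_names(children, names);
        }
    }
}
```

Since the nodes only hold slices of bytes, walking the tree never copies the post content; only block_names allocates, to build the strings it returns.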

The main parser entry is the root function. It represents the axiom of the grammar, and is defined as:

pub fn root(
    input: Input
) -> Result<(Input, Vec<ast::Node>), nom::Err<Input>>;

So the parser returns a collection of nodes in the best case. Here is a simple example:

use gutenberg_post_parser::{root, ast::Node};

let input = &b"<!-- wp:foo {\"bar\": true} /-->"[..];
let output = Ok(
    (
        // The remaining data.
        &b""[..],

        // The Abstract Syntax Tree.
        vec![
            Node::Block {
                name: (&b"core"[..], &b"foo"[..]),
                attributes: Some(&b"{\"bar\": true}"[..]),
                children: vec![]
            }
        ]
    )
);

assert_eq!(root(input), output);

The root function and the AST will be the items we are going to use and manipulate in the bindings. The internal items of the parser will stay private.

Bindings

Rust to

From now on, our goal is to expose the root function and the Node enum in different platforms or environments. Ready?

3… 2… 1… lift-off!

atoum supports TeamCity

atoum is a popular PHP test framework. TeamCity is Continuous Integration and Continuous Delivery software developed by JetBrains. Although atoum supports many industry standards to report test execution verdicts, TeamCity uses its own non-standard report, and thus atoum was not compatible with TeamCity… until now.

icon_TeamCity

The atoum/teamcity-extension provides TeamCity support inside atoum. When executing tests, the reported verdicts are understandable by TeamCity, and activate all its UI features.

Install

If you have Composer, just run:

$ composer require atoum/teamcity-extension '~1.0'

From this point, you need to enable the extension in your .atoum.php configuration file. The following example enables the extension unconditionally, for every test execution:

$extension = new atoum\teamcity\extension($script);
$extension->addToRunner($runner);

The following example enables the extension only within a TeamCity environment:

$extension = new atoum\teamcity\extension($script);
$extension->addToRunnerWithinTeamCityEnvironment($runner);

The latter setup is recommended. That’s it 🙂.

Glance

The default CLI report looks like this:

Default atoum CLI report

The TeamCity report looks like this in your terminal (note the TEAMCITY_VERSION variable as a way to emulate a TeamCity environment):

TeamCity report inside the terminal

This is harder to read. However, once it reaches the TeamCity UI, we get the following result:

TeamCity running atoum
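For readers wondering why the terminal output is harder to read: TeamCity consumes test results as service messages, which are single lines of the form ##teamcity[messageName key='value'], with | used as the escape character inside values. The sketch below builds such a line. It is written in Rust only to stay consistent with the other examples on this blog, and it illustrates the message format, not the actual code of the extension. The escape and test_failed names are hypothetical.

```rust
// Escapes a value for a TeamCity service message.
// `|` is the escape character, so it must be escaped itself.
fn escape(value: &str) -> String {
    let mut out = String::with_capacity(value.len());

    for c in value.chars() {
        match c {
            '|' => out.push_str("||"),
            '\'' => out.push_str("|'"),
            '\n' => out.push_str("|n"),
            '\r' => out.push_str("|r"),
            '[' => out.push_str("|["),
            ']' => out.push_str("|]"),
            other => out.push(other),
        }
    }

    out
}

// Hypothetical helper: formats a `testFailed` service message.
fn test_failed(name: &str, message: &str) -> String {
    format!(
        "##teamcity[testFailed name='{}' message='{}']",
        escape(name),
        escape(message)
    )
}

fn main() {
    println!("{}", test_failed("tests\\units\\foo::testBar", "it's [broken]"));
}
```

Such lines are meant to be parsed by TeamCity rather than read by humans, which explains the difference between the terminal view and the UI.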

We are using it at Automattic. Hope it is useful for someone else!

If you find any bugs, or would like any other features, please use GitHub at the following repository: https://github.com/Hywan/atoum-teamcity-extension/.

Welcome to Chaos

Recently, I joined Automattic. This is a world-wide distributed company. For the first three weeks, you work as a Happiness Engineer. This is part of the Happiness Rotation duty. This article explains why I loved it, and why I reckon you should do it too.

Happiness Engineer, really?

Does it sound mad as a Cheshire cat? Pretentious maybe? Actually, it’s not at all.

As a Happiness Engineer, I had to do support. This is part of the Happiness Rotation: Once a year, almost everyone swaps their position to help our users. I will come back to this later.

My role was to make our users happy. To achieve that, I had to:

  • Meet our users, understand who they are, what they want to achieve,
  • Listen to and understand their issues,
  • Find a way to fix the issues.

Meet the users

I need motivation in my job. Learning who our users are, and what they want to achieve, is a great motivation. After these three weeks, I know what my contributions are for. It gives a meaning to each contribution, to each day I wake up.

Especially for a distributed company on the Internet, our users are world-wide: they speak almost all the languages on Earth, and they are present on all continents. Their needs vary a lot, and they use our software in ways I was not able to foresee.

Listen to, understand, and fix their issues

When you are chatting with a “support guy”, you might not imagine that this is a real engineer. This is not a random person filling in a vague pre-defined form somewhere it is cheap to hire. You will chat with someone very competent. Someone who has no superior. Someone who has all the tools to make you happy.

Personally, when I started, it was the first time I was using WordPress. I was more of a novice than the users I was talking to. So how could I fix their issues on my end? I had to:

  • Ask the right people for help,
  • Therefore, meet Automatticians (people working with Automattic),
  • Discover all the interactions between them,
  • Understand the structure of the company,
  • Learn how to ask for help, how to formulate my questions, how to reformulate the issues of the users…
  • Discover all the internal tools,
  • Therefore, learn how the software works internally and together,
  • Discover the giant internal and public documentation,
  • When needed, create bug reports or feature requests for the appropriate teams,
  • Learn the culture of the company.

This is why it is called Welcome to Chaos. Yes, you have to learn a lot in three weeks, but it is extremely educational. It is like speed training.

Happiness

I can assure you that when a user is grateful after you fixed their issue, the term Happiness Engineer makes a lot of sense. Automattic gives its Happiness Engineers a lot of freedom to make people really happy, both in terms of tooling and finances.

This is the first time I have seen a company that is this generous with its customers.

Thanks buddy

Of course, when embracing the chaos, you are not alone. Everyone is here to help you, and to answer your questions. After all, this is part of Automattic’s creed (story of the creed):

I will never stop learning. I won’t just work on things that are assigned to me. I know there’s no such thing as a status quo. I will build our business sustainably through passionate and loyal customers. I will never pass up an opportunity to help out a colleague, and I’ll remember the days before I knew everything. I am more motivated by impact than money, and I know that Open Source is one of the most powerful ideas of our generation. I will communicate as much as possible, because it’s the oxygen of a distributed company. I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.

In addition to everyone willing to help, a buddy was assigned to me: a person who helps and teaches you all along. This is very helpful. Thank you Hannah!

Happiness Rotation

This experience is great. But after some time, you might forget it. So as a reminder, once a year, you become a Happiness Engineer again. This is part of the Happiness Rotation. As far as I understand, it involves almost everyone in the company.

Note: Obviously, there are permanent Happiness Engineers.

Conclusion

I firmly believe this approach has many advantages. Some of them are listed above. It helps to understand the company, and more importantly the users. The Happiness Rotation stresses the fact that users are central to Automattic, probably as in any company, but not with this much care. Remember the creed: I will build our business sustainably through passionate and loyal customers. To have passionate and loyal users, you need to know them.

For me, it was a great experience. It was chaotic at first, but it is worth it.

Bye bye Liip, hello Automattic

In April 2017, I left Liip to join Automattic.

Bye bye Liip

Liip's logo

After almost 20 months at Liip, I am leaving. Liip was a great experience. It was my first industrial non-remote job. It was also my first job in the country I am currently living in. And I have discovered a new way of working.

First industrial non-remote job

Before working for Liip, I was working for fruux. My situation was the following: A French citizen, living as a foreigner in Switzerland, working for a German company, with employees from Germany, Holland, and Canada. Everything happened on chat, mail, and Skype. When my son was born, I had to change jobs to simplify my life. It was not the only reason, but one of them.

And before fruux, I was working for INRIA, a research institute in France. It was partially a remote job.

Liip has several offices. I was based in Lausanne.

So, yes, Liip was my first industrial non-remote job. And I liked it. Working on the train in the morning, walking in Lausanne, seeing real people, everything in my local language. Because yes, it was my first job in my native language too.

Everything was simpler. And when you have your first baby, anything that makes things simpler saves your life.

Introducing Holacracy

Giant discussions were happening to remove any form of hierarchy in Liip. Then we discovered Holacracy, and we started moving to this system. This is a new governance system. If you are familiar with distributed network topologies in Computer Science, or data structures, it really looks like a Distributed Spanning Tree [DahanPN09]. Note: I am sure that the authors of Holacracy are not aware of DST, but, eh.

So nothing new from a research point of view, but it is cool to see this algorithm coming alive in real life. And it worked: Fewer meetings, more self-organisation, more shared responsibilities, no more “boss” etc. This is not a tool for all companies, but I am sure that if you are reading my blog, then your company should give it a try.

Open source projects

Liip has been very generous with me regarding my open source engagements. I was involved in Hoa, atoum, and Pickle when joining the company. Liip gave me a 5% budget, so roughly 1 hour per day to work on Hoa. Thank you for that!

After that, I started a new big project, called Tagua VM. They gave me an additional 5% budget. So I got 2 hours per day to work on Hoa and Tagua VM. Again, thank you for that!

Finally, I started an in-house open project called The A11y Machine (a11ym for short). I have written a case study for this tool on Liip’s blog: Accessibility: make your website barrier-free with a11ym!

The goal of a11ym is to automate the accessibility testing of any site by crawling and testing each page. A sweet report is generated, showing all errors, warnings, and notices, with all information needed by the developer to fix the issues as fast as possible.

Dashboard of a11ym, showing the evolution of the accessibility of a site in time
A typical a11ym report listing all errors, warnings, and notices for a given URL

This project has received really good feedback from the accessibility community. It has been downloaded 7000 times so far, which is not bad considering the niche it targets.

A new SaaS platform is being built around this software. I enjoyed working on it, and it was really tangible.

Main customer, huge project

Liip is a Web agency, so you have dozens of customers at the same time. However, I was in a special team for an important customer. The site is a luxury watches and jewellery e-commerce platform, located in several countries, in 10 languages, accessible from 16 domains, spread across 2 datacenters. This is not a casual site.

I learned a lot about all the complexity such a site brings: Checkout rules (oh damned…), product catalogs in different formats for different countries with different references, all the business logic inherent to each country, different payment providers, crazy front end compatibilities etc.

I have a hundred crazy anecdotes to tell. This was clearly not a job for me at first glance: I am a researcher, I have an open source culture background, I am not tailored for this kind of project. But at the end of the story, I learned a lot. Really a lot. I have a better overview of the crazy things any customer can ask, or has to deal with, and the infrastructure craziness that can be set up. I learned how to make better things: How to transform really crappy software into something understandable by everyone, how to not break a 10+ year old program with no tests etc. And it requires skills. I learned it the hard way, but I learned it.

Why leave?

Because even if I learned a lot during my time at Liip, the Web agency model was definitely not for me. I am very thankful to every Liiper, I had a great time, I love the Web, but not in an agency.

My son is now 21 months old, and I need fresh air. I can take on new challenges.

Welcome Automattic

Automattic's logo

Automattic is the company behind WordPress.com, WooCommerce, Akismet, Simplenote, Cloudup, Simperium, Gravatar and other giant services.

I came to Automattic by coincidence. I was looking for a sponsor for Tagua VM, and someone pointed me to Automattic. After some research about the company, it appeared that it could be a really great place to work. So I applied.

The hiring process was 4 months long. It was exhausting because it happened at the same time as a big sprint at Liip (remember the SaaS platform for The A11y Machine?). But after 4 months, it appears I succeeded, and I am very glad about that!

I am just starting my job at Automattic. I don’t have anything strong and definitive to say now, apart from the fact that everything is just awesome so far. In a few weeks, I am likely to write about my start at Automattic. Update: I did, see Welcome to Chaos. They have a very interesting way to get you on board.

Time for a new adventure!