From Rust to beyond: Prelude

At work, I had the opportunity to start an experiment: writing a single parser implementation in Rust for the new Gutenberg post format, with bindings for many platforms and environments.

The logo of the Gutenberg post parser project.

This series of posts is about those bindings, and explains how to send Rust beyond earth, into many different galaxies. Rust will land in:

The ship is currently flying into the Java galaxy. This series may continue if the ship does not crash and has enough resources to survive!

The Gutenberg post format

Let’s quickly introduce what Gutenberg is, and why it comes with a new post format. If you want an in-depth presentation, I highly recommend reading The Language of Gutenberg. Note that reading it is not required to understand the Gutenberg post format.

Gutenberg is the next WordPress editor. It is a little revolution on its own. The features it unlocks are very powerful.

The editor will create a new page- and post-building experience that makes writing rich posts effortless, and has “blocks” to make it easy what today might take shortcodes, custom HTML, or “mystery meat” embed discovery. — Matt Mullenweg

The format of a blog post was HTML, and it continues to be. However, another semantic layer is added through annotations. Annotations are written in comments and borrow the XML syntax, e.g.:

<!-- wp:ns/block-name {"attributes": "as JSON"} -->
    <p>phrase</p>
<!-- /wp:ns/block-name -->

The Gutenberg format provides two constructions: Block and Phrase. The example above contains both: there is a block wrapping a phrase. A phrase is basically anything that is not a block. Let’s describe the example:

  • It starts with an annotation (<!-- … -->),
  • The wp: is mandatory to represent a Gutenberg block,
  • It is followed by a fully qualified block name, which is a pair of an optional namespace (here set to ns, defaults to core) and a block name (here set to block-name), separated by a slash,
  • A block has optional attributes encoded as a JSON object (see RFC 7159, Section 4, Objects),
  • Finally, a block has optional children, i.e. a heterogeneous collection of blocks or phrases. In the example above, there is one child, the phrase <p>phrase</p>. The following example shows a block with no children:
<!-- wp:ns/block-name {"attributes": "as JSON"} /-->

The complete grammar can be found in the parser’s documentation.
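
To make the naming rule above concrete, here is a minimal sketch of how a fully qualified block name could be split into its namespace and its name, falling back to core when no namespace is given. The function split_block_name is hypothetical and only illustrates the rule; it is not how the parser itself is written (the parser works on bytes and checks which characters are allowed), and it does no validation at all.

```rust
/// Hypothetical helper, for illustration only: splits a fully qualified
/// block name such as `ns/block-name` into a `(namespace, name)` pair.
/// When there is no slash, the namespace defaults to `core`.
/// No validation of the allowed characters is performed here.
fn split_block_name(qualified: &str) -> (&str, &str) {
    match qualified.split_once('/') {
        // An explicit namespace is present before the slash.
        Some((namespace, name)) => (namespace, name),
        // No namespace: fall back to the default one.
        None => ("core", qualified),
    }
}
```

With this sketch, ns/block-name gives the pair (ns, block-name), and a bare foo gives (core, foo).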

Finally, the parser is used on the editor side, not on the rendering side. Once rendered, the blog post is a regular HTML file. Some blocks are dynamic though, but that is another topic.

The logic flow of the editor (How Little Blocks Work).

The grammar is relatively small. The challenge, however, is to be as fast and memory efficient as possible on many platforms. Some posts can reach megabytes, and we don’t want the parser to be the bottleneck. Even if it is only used when creating the post state (cf. the schema above), we have measured several seconds to load some posts, during which the user is blocked and waits, or sees an error. In other scenarios, we have hit the memory limit of the language’s virtual machine.

Hence this experimental project! The current parsers are written in JavaScript (with PEG.js) and in PHP (with phpegjs). This Rust project proposes a parser written in Rust, that can run in the JavaScript and in the PHP virtual machines, and on many other platforms. Let’s try to be very performant and memory efficient!

Why Rust?

That’s an excellent question! Thanks for asking. I can summarize my choice with a bullet list:

  • It is fast, and we need speed,
  • It is memory safe, and also memory efficient,
  • No garbage collector, which simplifies memory management across environments,
  • It can expose a C API (with Foreign Function Interface, FFI), which eases the integration into multiple environments,
  • It compiles to many targets,
  • Because I love it.
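
To give an idea of what the C API point means in practice, here is a minimal sketch of a Rust function exposed with the C calling convention. Everything in it is an assumption made for illustration: count_block_openers is a hypothetical function, not part of the parser, and it only counts occurrences of the `<!-- wp:` prefix instead of doing any real parsing.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// The prefix that opens a Gutenberg block annotation.
const OPENER: &[u8] = b"<!-- wp:";

/// Hypothetical function, for illustration only: counts how many times
/// the `<!-- wp:` prefix appears in a NUL-terminated C string.
///
/// `extern "C"` selects the C calling convention, and `#[no_mangle]` keeps
/// the symbol name as written so that C code can link against it
/// (the 2024 edition spells this attribute `#[unsafe(no_mangle)]`).
///
/// # Safety
///
/// `input` must be null or point to a valid NUL-terminated string.
#[no_mangle]
pub unsafe extern "C" fn count_block_openers(input: *const c_char) -> usize {
    if input.is_null() {
        return 0;
    }

    // Borrow the bytes of the C string, without the trailing NUL.
    let bytes = unsafe { CStr::from_ptr(input) }.to_bytes();

    // Slide a window of the prefix length over the input and count matches.
    bytes
        .windows(OPENER.len())
        .filter(|window| *window == OPENER)
        .count()
}
```

On the C side, such a function would be declared as taking a const char* and returning a size_t. No garbage collector is involved on either side, which is what makes this kind of boundary simple to reason about.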

One of the goals of the experiment is to maintain a single implementation (maybe the future reference implementation) with multiple bindings.

The parser

The parser is written in Rust. It relies on the fabulous nom library.

nom will happily take a byte out of your files 🙂.

The source code is available in the src/ directory in the repository. It is very small and fun to read.

The parser produces an Abstract Syntax Tree (AST) of the grammar, where nodes of the tree are defined as:

pub enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>
    },
    Phrase(Input<'a>)
}

That’s all! We find the block name, the attributes, and the children again, as well as the phrase. Block children are defined as a collection of nodes, which makes the definition recursive. Input<'a> is defined as &'a [u8], i.e. a slice of bytes.
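
Because the definition is recursive, walking the tree is naturally recursive too. Here is a minimal sketch that counts every block in a tree, nested blocks included. The Node enum and the Input alias are local copies of the definitions above so that the sketch compiles on its own; count_blocks is a hypothetical helper and is not part of the parser.

```rust
/// A slice of bytes borrowed from the original post, as described above.
type Input<'a> = &'a [u8];

/// Local copy of the node definition, so that this sketch is self-contained.
#[derive(Debug, PartialEq)]
enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>,
    },
    Phrase(Input<'a>),
}

/// Hypothetical helper, for illustration only: counts all the blocks in a
/// collection of nodes, including the blocks nested inside other blocks.
fn count_blocks(nodes: &[Node<'_>]) -> usize {
    nodes
        .iter()
        .map(|node| match node {
            // A block counts for one, plus whatever its children contain.
            Node::Block { children, .. } => 1 + count_blocks(children),
            // A phrase is a leaf and is not a block.
            Node::Phrase(_) => 0,
        })
        .sum()
}
```

Note that the nodes only borrow slices of the input: nothing is copied, which matters for the memory efficiency goal.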

The main parser entry is the root function. It represents the axiom of the grammar, and is defined as:

pub fn root(
    input: Input
) -> Result<(Input, Vec<ast::Node>), nom::Err<Input>>;

So the parser returns a collection of nodes in the best case. Here is a simple example:

use gutenberg_post_parser::{root, ast::Node};

let input = &b"<!-- wp:foo {\"bar\": true} /-->"[..];
let output = Ok(
    (
        // The remaining data.
        &b""[..],

        // The Abstract Syntax Tree.
        vec![
            Node::Block {
                name: (&b"core"[..], &b"foo"[..]),
                attributes: Some(&b"{\"bar\": true}"[..]),
                children: vec![]
            }
        ]
    )
);

assert_eq!(root(input), output);

The root function and the AST will be the items we are going to use and manipulate in the bindings. The internal items of the parser will stay private.

Bindings


From now on, our goal is to expose the root function and the Node enum on different platforms and in different environments. Ready?

3… 2… 1… lift-off!

One conference per day, for one year (2017)

My self-assigned challenge for 2017 was to watch at least one conference per day, for one year. This is the first time I have tried this challenge. Let’s dive in for a recap.

267 conferences

In some way, I failed the challenge because I’ve been able to watch only 267 conferences. With an average of 34 minutes per conference, I’ve watched 9078 minutes, or 151 hours of freely available conferences online. Why did I fail to watch 365 of them? Because my first kid was 1.5 years old in January 2017, a new little lady came in December 2017, I got a new job, I travelled for my job, I gave talks, I maintain important open source projects requiring a lot of time, I’m building my own self-sufficient ecological house, the vegetable garden requires many hours, I watch other videos, and because I’m lazy sometimes. Most of the time, I was able to watch 2 or 3 conferences in a row.

Where to find the resources?

All these conferences are freely available online, most of them on YouTube or Vimeo. The channels I mostly watch are the following:

It’s very Computer Science centric as you might have noticed, and it targets Rust, C++, Elm, LLVM, or Web technologies (JS, CSS…), but not only: you can sometimes find Haskell or Clojure too.

My best-of list

In March 2017, more and more people were asking me about it, and asking me to share. I then decided to start a playlist of my “best-of” conferences. I’ve added 78 conferences in 2017, and 3 new conferences have been added since then.

Thumbnails of my “best-of” 2017

Thoughts and conclusion

The challenge was sometimes easy and relaxing, and sometimes it was very hard to understand everything, especially at 2am after a long day (looking at you CppCon). But it has been a very enjoyable way to learn a lot in a very short period of time. Many speakers are talented, and listening to them is a real pleasure. Some others are just… let’s say unprepared, and it’s good to stop and jump to another talk. It’s also a good way to get inspired by technologies you don’t necessarily know (for instance, I’m not a big fan of Clojure, but some projects are really inspiring, like Proto REPL).

Sometimes I tweeted about the talks I watched, and it was quite appreciated too, I reckon because it’s a fun and easy way to learn, especially with the help of video platforms like YouTube.

Am I going to continue this challenge in 2018? Yes! But maybe not at this frequency. It’s now part of my routine to watch conferences many times per week. I like it. I don’t want to stop.

As a closing note, I would like to thank every speaker, and more importantly, every conference organizer. You are doing an amazing job: from the program, to the event, to the final sharing on the Internet with everyone. Most of you are volunteers. I know the work it represents. You are producing extremely valuable resources. Thank you!