From Rust to beyond: The PHP galaxy

This blog post is part of a series explaining how to send Rust beyond earth, into many different galaxies.
The galaxy we will explore today is the PHP galaxy. This post will explain what PHP is, and how to compile any Rust program to C and then to a PHP native extension.

What is PHP, and why?

PHP is a:
popular general-purpose scripting language that is especially suited to Web development. Fast, flexible, and pragmatic, PHP powers everything from your blog to the most popular websites in the world.
PHP has sadly acquired a bad reputation over the years, but recent releases (mostly since PHP 7.0) have introduced neat language features and many cleanups, which its detractors tend to ignore. PHP is also a fast scripting language, and is very flexible. PHP now has declared types, traits, variadic arguments, closures (with explicit scopes!), generators, and strong backward compatibility. The development of PHP is led by RFCs, which is an open and democratic process.

The Gutenberg project is a new editor for WordPress. The latter is written in PHP. So naturally, we want a native extension for PHP to parse the Gutenberg post format.

PHP is a language with a specification. The most popular virtual machine is Zend Engine. Other virtual machines exist, like HHVM (but the PHP support has been dropped recently in favor of their own PHP fork, called Hack), Peachpie, or Tagua VM (under development). In this post, we will create an extension for Zend Engine. This virtual machine is written in C. Great, we have visited the C galaxy in the previous episode!

Rust 🚀 C 🚀 PHP

To port our Rust parser into PHP, we first need to port it to C. This was done in the previous episode. Two files result from this port to C: libgutenberg_post_parser.a and gutenberg_post_parser.h, respectively a static library and the header file.

Bootstrap with a skeleton

PHP comes with a script to create an extension skeleton/template, called ext_skel.php. This script is accessible from the source of the Zend Engine virtual machine (which we will refer to as php-src). One can invoke the script like this:
$ cd php-src/ext/
$ ./ext_skel.php \
      --ext gutenberg_post_parser \
      --author 'Ivan Enderlin' \
      --dir /path/to/extension \
      --onlyunix
$ cd /path/to/extension
$ ls gutenberg_post_parser
tests/
.gitignore
CREDITS
config.m4
gutenberg_post_parser.c
php_gutenberg_post_parser.h
The ext_skel.php script recommends going through the following steps:
  • Rebuild the configuration of the PHP source (run ./buildconf at the root of the php-src directory),
  • Reconfigure the build system to enable the extension, like ./configure --enable-gutenberg_post_parser,
  • Build with make,
  • Done.
But our extension is very likely to live outside the php-src tree, so we will use phpize instead. phpize is an executable that comes with php, php-cgi, phpdbg, php-config etc. It allows us to compile extensions against an already compiled php binary, which is perfect in our case! We will use it like this:
$ cd /path/to/extension/gutenberg_post_parser

$ # Get the bin directory for PHP utilities.
$ PHP_PREFIX_BIN=$(php-config --prefix)/bin

$ # Clean (except if it is the first run).
$ $PHP_PREFIX_BIN/phpize --clean

$ # “phpize” the extension.
$ $PHP_PREFIX_BIN/phpize

$ # Configure the extension for a particular PHP version.
$ ./configure --with-php-config=$PHP_PREFIX_BIN/php-config

$ # Compile.
$ make install
In this post, we will not show all the edits we have made; we will rather focus on the extension binding. All the sources can be found here. In short, here is the config.m4 file:
PHP_ARG_ENABLE(gutenberg_post_parser, whether to enable gutenberg_post_parser support,
[  --enable-gutenberg_post_parser        Include gutenberg_post_parser support], no)

if  test "$PHP_GUTENBERG_POST_PARSER" != "no"; then
  PHP_SUBST(GUTENBERG_POST_PARSER_SHARED_LIBADD)

  PHP_ADD_LIBRARY_WITH_PATH(gutenberg_post_parser, ., GUTENBERG_POST_PARSER_SHARED_LIBADD)

  PHP_NEW_EXTENSION(gutenberg_post_parser, gutenberg_post_parser.c, $ext_shared)
fi
What it does is basically the following:
  • Register the --enable-gutenberg_post_parser option in the build system, and
  • Declare the static library to compile with, and the source of the extension itself.
We must add the libgutenberg_post_parser.a and gutenberg_post_parser.h files in the same directory (a symlink is perfect), to get a structure such as:
$ ls gutenberg_post_parser
tests/                       # from ext_skel
.gitignore                   # from ext_skel
CREDITS                      # from ext_skel
config.m4                    # from ext_skel (edited)
gutenberg_post_parser.c      # from ext_skel (will be edited)
gutenberg_post_parser.h      # from Rust
libgutenberg_post_parser.a   # from Rust
php_gutenberg_post_parser.h  # from ext_skel
The core of the extension is the gutenberg_post_parser.c file. This file is responsible for creating the module, and for binding our Rust code to PHP.

The module, aka the extension

As said, we will work in the gutenberg_post_parser.c file. First, let’s include everything we need:
#include "php.h"
#include "ext/standard/info.h"
#include "php_gutenberg_post_parser.h"
#include "gutenberg_post_parser.h"
The last line includes the gutenberg_post_parser.h file generated by Rust (more precisely, by cbindgen; if you don’t remember, take a look at the previous episode). Then, we have to decide what API we want to expose to PHP. As a reminder, the Rust parser produces an AST defined as:
pub enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>
    },
    Phrase(Input<'a>)
}
The C variant of the AST is very similar (with more structures, but the idea is almost identical). So in PHP, the following structure has been selected:
class Gutenberg_Parser_Block {
    public string $namespace;
    public string $name;
    public string $attributes;
    public array $children;
}

class Gutenberg_Parser_Phrase {
    public string $content;
}

function gutenberg_post_parse(string $gutenberg_post): array;
The gutenberg_post_parse function will output an array of objects of kind Gutenberg_Parser_Block or Gutenberg_Parser_Phrase, i.e. our AST. So, let’s declare those classes!

Declare the classes

Note: The next 4 code blocks are not the core of the post; they are just code that needs to be written. You can skip them if you are not about to write a PHP extension.
zend_class_entry *gutenberg_parser_block_class_entry;
zend_class_entry *gutenberg_parser_phrase_class_entry;
zend_object_handlers gutenberg_parser_node_class_entry_handlers;

typedef struct _gutenberg_parser_node {
    zend_object zobj;
} gutenberg_parser_node;
A class entry represents a specific class type. A handler is associated with a class entry. The logic is somewhat complicated. If you need more details, I recommend reading the PHP Internals Book. Then, let’s create a function to instantiate those objects:
static zend_object *create_parser_node_object(zend_class_entry *class_entry)
{
    gutenberg_parser_node *gutenberg_parser_node_object;

    gutenberg_parser_node_object = ecalloc(1, sizeof(*gutenberg_parser_node_object) + zend_object_properties_size(class_entry));

    zend_object_std_init(&gutenberg_parser_node_object->zobj, class_entry);
    object_properties_init(&gutenberg_parser_node_object->zobj, class_entry);

    gutenberg_parser_node_object->zobj.handlers = &gutenberg_parser_node_class_entry_handlers;

    return &gutenberg_parser_node_object->zobj;
}
Then, let’s create a function to free those objects. It works in two steps: Destruct the object by calling its destructor (in the user-land), then free it for real (in the VM-land):
static void destroy_parser_node_object(zend_object *gutenberg_parser_node_object)
{
    zend_objects_destroy_object(gutenberg_parser_node_object);
}

static void free_parser_node_object(zend_object *gutenberg_parser_node_object)
{
    zend_object_std_dtor(gutenberg_parser_node_object);
}
Then, let’s initialize the “module”, i.e. the extension. During the initialisation, we will create the classes in the user-land, declare their attributes etc.
PHP_MINIT_FUNCTION(gutenberg_post_parser)
{
    zend_class_entry class_entry;

    // Declare Gutenberg_Parser_Block.
    INIT_CLASS_ENTRY(class_entry, "Gutenberg_Parser_Block", NULL);
    gutenberg_parser_block_class_entry = zend_register_internal_class(&class_entry TSRMLS_CC);

    // Declare the create handler.
    gutenberg_parser_block_class_entry->create_object = create_parser_node_object;

    // The class is final.
    gutenberg_parser_block_class_entry->ce_flags |= ZEND_ACC_FINAL;

    // Declare the `namespace` public attribute,
    // with an empty string for the default value.
    zend_declare_property_string(gutenberg_parser_block_class_entry, "namespace", sizeof("namespace") - 1, "", ZEND_ACC_PUBLIC);

    // Declare the `name` public attribute,
    // with an empty string for the default value.
    zend_declare_property_string(gutenberg_parser_block_class_entry, "name", sizeof("name") - 1, "", ZEND_ACC_PUBLIC);

    // Declare the `attributes` public attribute,
    // with `NULL` for the default value.
    zend_declare_property_null(gutenberg_parser_block_class_entry, "attributes", sizeof("attributes") - 1, ZEND_ACC_PUBLIC);

    // Declare the `children` public attribute,
    // with `NULL` for the default value.
    zend_declare_property_null(gutenberg_parser_block_class_entry, "children", sizeof("children") - 1, ZEND_ACC_PUBLIC);

    // Declare Gutenberg_Parser_Phrase.

    … skip …

    // Declare Gutenberg parser node object handlers.

    memcpy(&gutenberg_parser_node_class_entry_handlers, zend_get_std_object_handlers(), sizeof(gutenberg_parser_node_class_entry_handlers));

    gutenberg_parser_node_class_entry_handlers.offset = XtOffsetOf(gutenberg_parser_node, zobj);
    gutenberg_parser_node_class_entry_handlers.dtor_obj = destroy_parser_node_object;
    gutenberg_parser_node_class_entry_handlers.free_obj = free_parser_node_object;

    return SUCCESS;
}
If you are still reading, first: Thank you, and second: Congrats! Then, there are the PHP_RINIT_FUNCTION and PHP_MINFO_FUNCTION functions, which are already generated by the ext_skel.php script. Same for the module entry definition and other module configuration details.

The gutenberg_post_parse function

We will now focus on the gutenberg_post_parse PHP function. This function takes a string as a single argument and returns either false if the parsing failed, or an array of objects of kind Gutenberg_Parser_Block or Gutenberg_Parser_Phrase otherwise. Let’s write it! Notice that it is declared with the PHP_FUNCTION macro.
PHP_FUNCTION(gutenberg_post_parse)
{
    char *input;
    size_t input_len;

    // Read the input as a string.
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &input, &input_len) == FAILURE) {
        return;
    }
At this step, the argument has been declared and typed as a string ("s"). The string value is in input and the string length is in input_len. The next step is to parse the input (the length of the string is not needed). This is where we are going to call our Rust code! Let’s do that:
    // Parse the input.
    Result parser_result = parse(input);

    // If parsing failed, then return false.
    if (parser_result.tag == Err) {
        RETURN_FALSE;
    }

    // Else map the Rust AST into a PHP array.
    const Vector_Node nodes = parser_result.ok._0;
The Result type and the parse function come from Rust. If you don’t remember those types, please read the previous episode about the C galaxy. Zend Engine has a macro called RETURN_FALSE to return… false! Handy, isn’t it? Finally, if everything went well, we get back a collection of nodes as a Vector_Node type. The next step is to map those Rust/C types into PHP types, i.e. an array of the Gutenberg classes. Let’s go:
    // Note: return_value is a “magic” variable that holds the value to be returned.
    //
    // Allocate an array.
    array_init_size(return_value, nodes.length);

    // Map the Rust AST.
    into_php_objects(return_value, &nodes);
}
Done 😁! Oh wait… the into_php_objects function needs to be written!
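The "check the tag, then read the payload" pattern used with Result above can be tried outside of Zend Engine. The following is a minimal, self-contained sketch: every name prefixed with Sketch_ or sketch_ is a hypothetical stand-in made up for this illustration, not the types that cbindgen actually generates, and the toy parser only checks that the input is not empty.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the generated C types. */
typedef struct {
    const void *buffer;  /* would point to the nodes */
    uintptr_t length;    /* number of nodes */
} Sketch_Vector_Node;

typedef enum { Sketch_Ok, Sketch_Err } Sketch_Result_Tag;

typedef struct {
    Sketch_Vector_Node _0;
} Sketch_Ok_Body;

typedef struct {
    Sketch_Result_Tag tag;
    union {
        Sketch_Ok_Body ok;  /* only meaningful when tag == Sketch_Ok */
    };                      /* anonymous union (C11) */
} Sketch_Result;

/* Toy "parser": fails on NULL or empty input, otherwise reports one node. */
Sketch_Result sketch_parse(const char *input)
{
    /* Start from an error; the payload is zero-initialised. */
    Sketch_Result result = { .tag = Sketch_Err };

    if (input == NULL || input[0] == '\0') {
        return result;
    }

    result.tag = Sketch_Ok;
    result.ok._0.buffer = input;
    result.ok._0.length = 1;

    return result;
}

/* Returns the number of nodes, or -1 when parsing failed. */
long sketch_node_count(const char *input)
{
    Sketch_Result result = sketch_parse(input);

    /* Check the tag first: the payload must not be used on error. */
    if (result.tag == Sketch_Err) {
        return -1;
    }

    return (long) result.ok._0.length;
}
```

In the real extension, the error branch is where RETURN_FALSE is used, and the success branch is where the nodes are handed over to into_php_objects.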

The into_php_objects function

This function is not terribly complex: It’s just full of Zend Engine specific API as expected. We are going to explain how to map a Block into a Gutenberg_Parser_Block object, and leave the mapping of Phrase into Gutenberg_Parser_Phrase as an exercise for assiduous readers. And there we go:
void into_php_objects(zval *php_array, const Vector_Node *nodes)
{
    const uintptr_t number_of_nodes = nodes->length;

    if (number_of_nodes == 0) {
        return;
    }

    // Iterate over all nodes.
    for (uintptr_t nth = 0; nth < number_of_nodes; ++nth) {
        const Node node = nodes->buffer[nth];

        if (node.tag == Block) {
            // Map Block into Gutenberg_Parser_Block.
        } else if (node.tag == Phrase) {
            // Map Phrase into Gutenberg_Parser_Phrase.
        }
    }
}
Now let’s map a block. The process is the following:
  1. Allocate PHP strings for the block namespace, and for the block name,
  2. Allocate an object,
  3. Set the block namespace and the block name to their respective object properties,
  4. Allocate a PHP string for the block attributes if any,
  5. Set the block attributes to its respective object property,
  6. If any children, initialise a new array, and call into_php_objects with the child nodes and the new array,
  7. Set the children to its respective object property,
  8. Finally, add the block object inside the array to be returned.
const Block_Body block = node.block;
zval php_block, php_block_namespace, php_block_name;

// 1. Prepare the PHP strings.
ZVAL_STRINGL(&php_block_namespace, block.namespace.pointer, block.namespace.length);
ZVAL_STRINGL(&php_block_name, block.name.pointer, block.name.length);
Do you remember that namespace, name and other similar data are of type Slice_c_char? It’s just a structure with a pointer and a length. The pointer points to the original input string, so that there is no copy (and this is the definition of a slice actually). Well, Zend Engine has a ZVAL_STRINGL macro that creates a string from a pointer and a length, great! Unfortunately for us, Zend Engine does a copy behind the scenes… There is no way to keep only the pointer and the length, but at least the number of copies stays small. I think the copy is there so that Zend Engine takes full ownership of the data, which is required for the garbage collector.
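To make the difference between borrowing a slice and copying it concrete, here is a small standalone sketch. Sketch_Slice is a hypothetical stand-in for Slice_c_char (same idea: a pointer and a length), and sketch_slice_to_owned only illustrates what "taking ownership by copying" means; it is not what Zend Engine does internally.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Slice_c_char: a borrowed view on the input. */
typedef struct {
    const char *pointer;
    uintptr_t length;
} Sketch_Slice;

/* Borrow a part of the input: no allocation, no copy. */
Sketch_Slice sketch_slice(const char *input, uintptr_t offset, uintptr_t length)
{
    Sketch_Slice slice = { input + offset, length };

    return slice;
}

/*
 * Copy the slice into a freshly allocated, NUL-terminated buffer.
 * The caller owns the result and must free() it.
 * Returns NULL if the allocation fails.
 */
char *sketch_slice_to_owned(Sketch_Slice slice)
{
    char *owned = malloc(slice.length + 1);

    if (owned == NULL) {
        return NULL;
    }

    memcpy(owned, slice.pointer, slice.length);
    owned[slice.length] = '\0';

    return owned;
}
```

The slice itself stays valid only as long as the input string is alive, whereas the owned copy can outlive it. That is the trade-off described above.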
// 2. Create the Gutenberg_Parser_Block object.
object_init_ex(&php_block, gutenberg_parser_block_class_entry);
The object has been instantiated with a class represented by the gutenberg_parser_block_class_entry.
// 3. Set the namespace and the name.
add_property_zval(&php_block, "namespace", &php_block_namespace);
add_property_zval(&php_block, "name", &php_block_name);

zval_ptr_dtor(&php_block_namespace);
zval_ptr_dtor(&php_block_name);
The zval_ptr_dtor call removes 1 from the reference counter: add_property_zval has taken its own reference to the value, so we release ours. This is required for the garbage collector.
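The ownership dance around properties can be illustrated with a toy reference counter. This is a deliberately simplified model, assuming a plain integer counter; the Toy_ and toy_ names are invented for this sketch and are not part of the Zend Engine API.

```c
#include <stdlib.h>

/* Toy reference-counted value; NOT the Zend zval API. */
typedef struct {
    unsigned refcount;
    int payload;
} Toy_Value;

/* Toy object with a single property slot. */
typedef struct {
    Toy_Value *property;
} Toy_Object;

/* Create a value owned by the caller (one reference). */
Toy_Value *toy_value_new(int payload)
{
    Toy_Value *value = malloc(sizeof *value);

    if (value != NULL) {
        value->refcount = 1;
        value->payload = payload;
    }

    return value;
}

/* Drop one reference, free the value when none is left.
 * Returns the number of remaining references. */
unsigned toy_value_release(Toy_Value *value)
{
    unsigned remaining = --value->refcount;

    if (remaining == 0) {
        free(value);
    }

    return remaining;
}

/* Storing the value in the object takes one extra reference. */
void toy_object_set(Toy_Object *object, Toy_Value *value)
{
    value->refcount++;
    object->property = value;
}
```

After toy_object_set, the value has two references: the local one and the object's. Releasing the local one leaves the object as the single owner, which mirrors the role of the release calls that follow each property assignment in the extension.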
// 4. Deal with block attributes if some.
if (block.attributes.tag == Some) {
    Slice_c_char attributes = block.attributes.some._0;
    zval php_block_attributes;

    ZVAL_STRINGL(&php_block_attributes, attributes.pointer, attributes.length);

    // 5. Set the attributes.
    add_property_zval(&php_block, "attributes", &php_block_attributes);

    zval_ptr_dtor(&php_block_attributes);
}
It is similar to what has been done for namespace and name. Now let’s continue with children.
// 6. Handle children.
const Vector_Node *children = (const Vector_Node*) (block.children);

if (children->length > 0) {
    zval php_children_array;

    array_init_size(&php_children_array, children->length);

    // Recursion.
    into_php_objects(&php_children_array, children);

    // 7. Set the children.
    add_property_zval(&php_block, "children", &php_children_array);

    Z_DELREF(php_children_array);
}

free((void*) children);
Finally, add the block instance into the array to be returned:
// 8. Insert the object in the collection.
add_next_index_zval(php_array, &php_block);
The entire code lands here.
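The recursion over children is the only non-linear part of the mapping. Here is a toy version of it, detached from both Rust and Zend Engine: Toy_Node is a hypothetical, much simplified AST (no names, no attributes), and toy_count_nodes walks it the same way into_php_objects does, by looping over the nodes and calling itself on the children of each block.

```c
#include <stddef.h>

/* Hypothetical, simplified AST: a node is either a block or a phrase. */
typedef enum { Toy_Block, Toy_Phrase } Toy_Tag;

typedef struct Toy_Node {
    Toy_Tag tag;
    const struct Toy_Node *children;  /* only meaningful for blocks */
    size_t children_length;
} Toy_Node;

/* Count all the nodes of the tree, children included. */
size_t toy_count_nodes(const Toy_Node *nodes, size_t length)
{
    size_t total = 0;

    for (size_t nth = 0; nth < length; ++nth) {
        const Toy_Node *node = &nodes[nth];

        /* The node itself. */
        total += 1;

        /* Recursion: only blocks can have children. */
        if (node->tag == Toy_Block && node->children_length > 0) {
            total += toy_count_nodes(node->children, node->children_length);
        }
    }

    return total;
}
```

For a post made of an empty block, a phrase, and a block containing one phrase, the function visits four nodes.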

PHP extension 🚀 PHP userland

Now that the extension is written, we have to compile it. That’s the repetitive set of commands we have shown above with phpize. Once the extension is compiled, the generated gutenberg_post_parser.so file must be located in the extension directory. This directory can be found with the following command:
$ php-config --extension-dir
For instance, on my computer, the extension directory is /usr/local/Cellar/php/7.2.11/pecl/20170718. Then, to enable the extension for a given execution, you must write:
$ php -d extension=gutenberg_post_parser -m | \
      grep gutenberg_post_parser
Or, to enable the extension for all executions, locate the php.ini file with php --ini and edit it to add:
extension=gutenberg_post_parser
Done! Now, let’s use some reflection to check the extension is correctly loaded and handled by PHP:
$ php --re gutenberg_post_parser
Extension [ <persistent> extension #64 gutenberg_post_parser version 0.1.0 ] {

  - Functions {
    Function [ <internal:gutenberg_post_parser> function gutenberg_post_parse ] {

      - Parameters [1] {
        Parameter #0 [ <required> $gutenberg_post_as_string ]
      }
    }
  }

  - Classes [2] {
    Class [ <internal:gutenberg_post_parser> final class Gutenberg_Parser_Block ] {

      - Constants [0] {
      }

      - Static properties [0] {
      }

      - Static methods [0] {
      }

      - Properties [4] {
        Property [ <default> public $namespace ]
        Property [ <default> public $name ]
        Property [ <default> public $attributes ]
        Property [ <default> public $children ]
      }

      - Methods [0] {
      }
    }

    Class [ <internal:gutenberg_post_parser> final class Gutenberg_Parser_Phrase ] {

      - Constants [0] {
      }

      - Static properties [0] {
      }

      - Static methods [0] {
      }

      - Properties [1] {
        Property [ <default> public $content ]
      }

      - Methods [0] {
      }
    }
  }
}
Everything looks good: There is one function and two classes that are defined as expected. Now, let’s write some PHP code for the first time in this blog post!
<?php

var_dump(
    gutenberg_post_parse(
        '<!-- wp:foo /-->bar<!-- wp:baz -->qux<!-- /wp:baz -->'
    )
);

/**
 * Will output:
 *     array(3) {
 *       [0]=>
 *       object(Gutenberg_Parser_Block)#1 (4) {
 *         ["namespace"]=>
 *         string(4) "core"
 *         ["name"]=>
 *         string(3) "foo"
 *         ["attributes"]=>
 *         NULL
 *         ["children"]=>
 *         NULL
 *       }
 *       [1]=>
 *       object(Gutenberg_Parser_Phrase)#2 (1) {
 *         ["content"]=>
 *         string(3) "bar"
 *       }
 *       [2]=>
 *       object(Gutenberg_Parser_Block)#3 (4) {
 *         ["namespace"]=>
 *         string(4) "core"
 *         ["name"]=>
 *         string(3) "baz"
 *         ["attributes"]=>
 *         NULL
 *         ["children"]=>
 *         array(1) {
 *           [0]=>
 *           object(Gutenberg_Parser_Phrase)#4 (1) {
 *             ["content"]=>
 *             string(3) "qux"
 *           }
 *         }
 *       }
 *     }
 */
It works very well!

Conclusion

The journey is:
  • A string written in PHP,
  • Allocated by the Zend Engine from the Gutenberg extension,
  • Passed to Rust through FFI (static library + header),
  • Back to Zend Engine in the Gutenberg extension,
  • To generate PHP objects,
  • That are read by PHP.
Rust really fits everywhere! We have seen in detail how to write a real-world parser in Rust, how to bind it to C and compile it to a static library in addition to C headers, how to create a PHP extension exposing one function and two objects, how to integrate the C binding into PHP, and how to use this extension in PHP.

As a reminder, the C binding is about 150 lines of code. The PHP extension is about 300 lines of code, but subtracting the “decorations” (the boilerplate to declare and manage the extension) that are automatically generated, the PHP extension reduces to about 200 lines of code. Once again, I find this a small surface of code to review considering the fact that the parser is still written in Rust, and modifying the parser will not impact the bindings (except if the AST is updated obviously)!

PHP is a language with a garbage collector. It explains why all strings are copied, so that they are owned by PHP itself. However, the fact that Rust does not copy any data saves memory allocations and deallocations, which is the biggest cost most of the time.

Rust also provides safety. This property can be questioned considering the number of bindings we are going through: Rust to C to PHP. Does it still hold? From the Rust perspective, yes, but everything that happens inside C or PHP must be considered unsafe. Special care must be taken in the C binding to handle all situations.

Is it still fast? Well, let’s benchmark. I would like to remind you that the first goal of this experiment was to tackle the bad performance of the original PEG.js parser. On the JavaScript ground, WASM and ASM.js have proved to be very much faster (see the WebAssembly galaxy, and the ASM.js galaxy). For PHP, phpegjs is used: It reads the grammar written for PEG.js and compiles it to PHP. Let’s see how they compare:
file | PEG PHP parser (ms) | Rust parser as a PHP extension (ms) | speedup
demo-post.html | 30.409 | 0.0012 | × 25341
shortcode-shortcomings.html | 76.39 | 0.096 | × 796
redesigning-chrome-desktop.html | 225.824 | 0.399 | × 566
web-at-maximum-fps.html | 173.495 | 0.275 | × 631
early-adopting-the-future.html | 280.433 | 0.298 | × 941
pygmalian-raw-html.html | 377.392 | 0.052 | × 7258
moby-dick-parsed.html | 5,437.630 | 5.037 | × 1080
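The speedup column can be summarised with a mean and a median. The following is a minimal sketch of that arithmetic; the values are copied from the table, while the names SPEEDUPS, speedup_mean and speedup_median are made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Speedups from the benchmark table, in the same order. */
const double SPEEDUPS[7] = { 25341, 796, 566, 631, 941, 7258, 1080 };

/* Comparison function for qsort, ascending order. */
static int compare_doubles(const void *left, const void *right)
{
    const double x = *(const double *) left;
    const double y = *(const double *) right;

    return (x > y) - (x < y);
}

/* Arithmetic mean; returns 0 for an empty input. */
double speedup_mean(const double *values, size_t count)
{
    double sum = 0.0;

    if (count == 0) {
        return 0.0;
    }

    for (size_t nth = 0; nth < count; ++nth) {
        sum += values[nth];
    }

    return sum / (double) count;
}

/* Median, computed on a sorted copy so that the input is left untouched.
 * Returns 0 for an empty input or if the allocation fails. */
double speedup_median(const double *values, size_t count)
{
    double *sorted;
    double median;

    if (count == 0) {
        return 0.0;
    }

    sorted = malloc(count * sizeof *sorted);

    if (sorted == NULL) {
        return 0.0;
    }

    memcpy(sorted, values, count * sizeof *sorted);
    qsort(sorted, count, sizeof *sorted, compare_doubles);

    median = (count % 2 == 1)
        ? sorted[count / 2]
        : (sorted[count / 2 - 1] + sorted[count / 2]) / 2.0;

    free(sorted);

    return median;
}
```

The sum of the seven speedups is 36613, so the mean is about 5230.4, and the middle value of the sorted list is 941. The mean is pulled up by the two largest speedups, which is why the median is worth reporting next to it.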
The PHP extension of the Rust parser is on average 5230 times faster than the current PEG PHP implementation. The median of the speedup is 941.

Another huge issue was that the PEG parser was not able to handle many Gutenberg documents because of a memory limit. Of course, it is possible to raise the memory limit, but it is not ideal. With the Rust parser as a PHP extension, memory stays constant and close to the size of the parsed document.

I reckon we can optimise the extension further to generate an iterator instead of an array. This is something I want to explore, to analyse the impact on performance. The PHP Internals Book has a chapter about Iterators.

We will see in the next episodes of this series that Rust can reach a lot of galaxies, and the more it travels, the more interesting it gets. Thanks for reading!

One conference per day, for one year (2017)

My self-assigned challenge for 2017 was to watch at least one conference per day, for one year. That’s the first time I try this challenge. Let’s dive in for a recap.

267 conferences

In some way, I failed the challenge because I’ve been able to watch only 267 conferences. With an average of 34 minutes per conference, I’ve watched 9078 minutes, or 151 hours of freely available conferences online. Why did I fail to watch 365 of them? Because my first kid was 1.5 years old in January 2017, a new little lady came in December 2017, I got a new job, I travelled for my job, I gave talks, I maintain important open source projects requiring a lot of time, I’m building my own self-sufficient ecological house, the vegetable garden requires many hours, I watch other videos, and because I’m lazy sometimes. Most of the time, I was able to watch 2 or 3 conferences in a row.

Where to find the resources?

All these conferences are freely available online, most of them on YouTube or on Vimeo. The channels I watch the most are the following:

It’s very Computer Science centric as you might have noticed, and it targets Rust, C++, Elm, LLVM, or Web technologies (JS, CSS…), but not only: you can find Haskell or Clojure sometimes.

My best-of list

In March 2017, more and more people were asking me about it, and asked me to share. I then decided to start a playlist of my “best-of” conferences. I added 78 conferences in 2017, and 3 new conferences have been added since then.

Thumbnails of my “best-of” 2017

Thoughts and conclusion

The challenge was sometimes easy and relaxing, or it was very hard to understand everything especially at 2am after a long day (looking at you CppCon). But it has been a very enjoyable way to learn a lot in a very short period of time. Many speakers are talented, and listening to them is a real pleasure. Some others are just… let’s say unprepared, and it’s good to stop and jump onto another talk. It’s also a good way to get inspired by technologies you don’t necessarily know (for instance, I’m not a big fan of Clojure, but some projects are really inspiring, like Proto REPL).

Sometimes I tweeted about the talk I watched, and it was quite appreciated too. I reckon it is because it’s a fun and easy way to learn, especially with the help of video platforms like YouTube.

Am I going to continue this challenge in 2018? Yes! But maybe not at this frequency. It’s now part of my routine to watch conferences many times per week. I like it. I don’t want to stop.

As a closing note, I would like to thank every speaker and, more importantly, every conference organizer. You are doing an amazing job: From the program, to the event, to the final sharing on the Internet with everyone. Most of you are volunteers. I know the work it represents. You are producing extremely valuable resources. Thank you!

Random thoughts about `::class` in PHP

The special ::class constant allows for fully qualified class name resolution at compile time, this is useful for namespaced classes.

I’m quoting the PHP manual. But things can be funny sometimes. Let’s go through some examples.

  • use A\B as C;
    
    $_ = C::class;

    resolves to A\B, which is perfect 🙂

  • class C
    {
        public function f()
        {
            $_ = self::class;
        }
    }

    resolves to C, which is perfect 😀

  • class C { }
    
    class D extends C
    {
        public function f()
        {
            $_ = parent::class;
        }
    }

    resolves to C, which is perfect 😄

  • class C
    {
        public static function f()
        {
            $_ = static::class;
        }
    }
    
    class D extends C { }
    
    D::f();

    resolves to D, which is perfect 😍

  • 'foo'::class

    resolves to 'foo', which is… huh? 🤨

  • "foo"::class

    resolves to 'foo', which is… expected somehow 😕

  • $a = 'oo';
    "f{$a}"::class

    generates a parse error 🙃

  • PHP_VERSION::class

    resolves to 'PHP_VERSION', which is… strange: It resolves to the fully qualified name of the constant, not the class 🤐

::class is very useful to get rid of the get_class or the get_called_class functions, or even the get_class($this) trick. This is something truly useful in PHP where entities are referenced as strings, not as symbols. ::class on constants makes sense, but the name is no longer relevant. And finally, ::class on single-quoted strings is absolutely useless; on double-quoted strings it is a source of errors because the value can be dynamic (and remember, ::class is resolved at compile time, not at run time).

atoum supports TeamCity

atoum is a popular PHP test framework. TeamCity is a Continuous Integration and Continuous Delivery software developed by JetBrains. Although atoum supports many industry standards to report test execution verdicts, TeamCity uses its own non-standard report, and thus atoum was not compatible with TeamCity… until now.

The atoum/teamcity-extension provides TeamCity support inside atoum. When executing tests, the reported verdicts are understandable by TeamCity, and activate all its UI features.

Install

If you have Composer, just run:

$ composer require atoum/teamcity-extension '~1.0'

From this point, you need to enable the extension in your .atoum.php configuration file. The following example enables the extension for every test execution:

$extension = new atoum\teamcity\extension($script);
$extension->addToRunner($runner);

The following example enables the extension only within a TeamCity environment:

$extension = new atoum\teamcity\extension($script);
$extension->addToRunnerWithinTeamCityEnvironment($runner);

This latter installation is recommended. That’s it 🙂.

Glance

The default CLI report looks like this:

Default atoum CLI report

The TeamCity report looks like this in your terminal (note the TEAMCITY_VERSION variable as a way to emulate a TeamCity environment):

TeamCity report inside the terminal

Which is harder to read. However, in the TeamCity UI, we get the following result:

TeamCity running atoum

We are using it at Automattic. Hope it is useful for someone else!

If you find any bugs, or would like any other features, please use GitHub at the following repository: https://github.com/Hywan/atoum-teamcity-extension/.

Export functions in PHP à la JavaScript

Warning: This post is totally useless. It is the result of a fun private company thread.

Export functions in JavaScript

In JavaScript, a file can export functions like this:

export function times2(x) {
    return x * 2;
}

And then we can import this function in another file like this:

import {times2} from 'foo';

console.log(times2(21)); // 42

Is it possible with PHP?

Export functions in PHP

Every entity is public in PHP: Constant, function, class, interface, or trait. They can live in a namespace. So exporting functions in PHP is absolutely useless, but just for fun, let’s keep going.

A PHP file can return an integer, a real, an array, an anonymous function, anything. Let’s try this:

<?php

return function (int $x): int {
    return $x * 2;
};

And then in another file:

<?php

$times2 = require 'foo.php';
var_dump($times2(21)); // int(42)

Great, it works.

What if our file returns more than one function? Let’s use an array (which has most hashmap properties):

<?php

return [
    'times2' => function (int $x): int {
        return $x * 2;
    },
    'answer' => function (): int {
        return 42;
    }
];

To choose what to import, let’s use the list intrinsic. It has several forms: With or without key matching, long (list(…)) and short syntax ([…]). Because we are modern, we will use the short syntax with key matching to selectively import functions:

<?php

['times2' => $mul] = require 'foo.php';

var_dump($mul(21)); // int(42)

Notice that times2 has been aliased to $mul. What a feature!

Is it useful? Absolutely not. Is it fun? For me it is.

Finite-State Machine as a Type System illustrated with a store product

Hello fellow coders!

In this article, I would like to talk about how to implement a Finite-State Machine (FSM) with the PHP type system. The example is a store product (in an e-commerce solution for instance), something we are likely to meet at least once in our lifetime. Our goal is simply to avoid impossible states and transitions.

I am in deep love with Type theory; however, I will try to keep the formulas away from this article to focus on the code. Moreover, you might be aware that the PHP runtime type system is very permissive and “poor” (this is not a formal definition); fortunately, some tricks can help us express nice constraints.

The Product FSM

A product in a store might have the following states:

  • Active: Can be purchased,
  • Inactive: Has been cancelled or discontinued (a discontinued product can no longer be purchased),
  • Purchased and renewable,
  • Purchased and not renewable,
  • Purchased and cancellable.

The transitions between these states can be viewed as a Finite-State Machine (FSM).

Product FSM (editable source).

We read this graph as: A product is in the state A. If the purchase action is called, then it transitions to the state B. If the once-off purchase action is called, then it transitions to the state C. From the state B, if the renew action is called, it remains in the same state. If the cancel action is called, it transitions to the D state. Same for the C to D states.

Our goal is to respect this FSM. Invalid actions must be impossible to do.

Finite-State Machine as a Type System

Having an FSM is a good way to define the states and the transitions between them: It is formal and clear. However, it is usually checked at runtime, not at compile-time, i.e. if statements are required to test whether a product can transition into another state, or else throw an exception, and this is decided at runtime. Note that PHP does not really have a compile-time because it is an online compiler (learn more by reading Tagua VM, a safe PHP virtual machine, at slide 29). Our goal is to prevent illegal/invalid states at parse-/compile-time so that the PHP virtual machine, IDE or static analysis tools can prove the state of a product without executing PHP code.
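For contrast, here is a minimal sketch of the runtime approach described above. The `RuntimeProduct` class, its `TRANSITIONS` table and the `purchaseOnce` action name are assumptions made for this illustration; the state letters come from the graph. Every transition is guarded by a check that can only fail when the code is executed:

```php
<?php

final class RuntimeProduct
{
    // Allowed transitions: state => [action => next state].
    private const TRANSITIONS = [
        'A' => ['purchase' => 'B', 'purchaseOnce' => 'C'],
        'B' => ['renew' => 'B', 'cancel' => 'D'],
        'C' => ['cancel' => 'D'],
        'D' => [],
    ];

    private $state = 'A';

    public function apply(string $action): void
    {
        $next = self::TRANSITIONS[$this->state][$action] ?? null;

        if (null === $next) {
            throw new LogicException(
                "Cannot `$action` a product in state {$this->state}."
            );
        }

        $this->state = $next;
    }

    public function getState(): string
    {
        return $this->state;
    }
}

$product = new RuntimeProduct();
$product->apply('purchase');
$product->apply('renew');
$product->apply('cancel');

$rejected = false;

try {
    // Invalid transition, but nothing reports it before execution.
    $product->apply('renew');
} catch (LogicException $e) {
    $rejected = true;
    echo $e->getMessage(), "\n";
}
```

Nothing stops a caller from writing `apply('renew')` on a cancelled product; an IDE or a static analysis tool has no type information to flag it.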

Why is this important? Imagine that we decide to change a product to be once-off purchasable instead of purchasable, then we can no longer renew it. We replace an interface on this product, and boom, the IDE tells us that the code is broken in x places. It detects impossible scenarios ahead of code execution.

No more talking. Here is the code.

The mighty product

/**
 * A product.
 */
interface Product { }

A product is a class implementing the Product interface. It lets us type a generic product, regardless of its state.

Active and inactive

/**
 * A product that is active.
 */
interface Active extends Product
{
    public function getProduct(): self;
}

/**
 * A product that has been cancelled, or not in stock.
 */
interface Inactive extends Product
{
    public function getProduct(): self;
}

The Active and Inactive interfaces are useful to create constraints such as:

  • A product can be purchased only if it is active, and
  • A product is inactive if and only if it has been cancelled,
  • To finally conclude that an inactive product can no longer be purchased, nor renewed, nor cancelled.

Basically, it defines the axiom (initial state) and the final states of our FSM.

The getProduct(): self trick will make sense later. It helps to express the following constraint: “An active product cannot be inactive, and vice-versa”, i.e. both interfaces cannot be implemented by the same value.

Purchase, renew, and cancel

/**
 * A product that can be purchased.
 */
interface Purchasable extends Active
{
    public function purchase(): Renewable;
}

Only an active product can be purchased. The action is purchase and it generates a product that is renewable. purchase transitions from the state A to B (regarding the graph above).

/**
 * A product that can be cancelled.
 */
interface Cancellable extends Active
{
    public function cancel(): Inactive;
}

Only an active product can be cancelled. The action is cancel and it generates an inactive product, so it transitions from the state B to D.

/**
 * A product that can be renewed.
 */
interface Renewable extends Cancellable
{
    public function renew(): self;
}

A renewable product is also cancellable. The action is renew and this is a reflexive transition from the state B to B.

/**
 * A product that can be once-off purchased, i.e. it can be purchased but not
 * renewed.
 */
interface PurchasableOnce extends Active
{
    public function purchase(): Cancellable;
}

Finally, a once-off purchasable product has one action: purchase that produces a Cancellable product, and it transitions from the state A to C.

Take a breath

Figure: Detailed product FSM (editable source).

So far we have defined interfaces, but the FSM is not implemented yet. Interfaces only define constraints in our type system. An interface provides a constraint but also defines type capabilities: What operations can be performed on a value implementing a particular interface.
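To make the notion of capabilities concrete, the sketch below repeats some of the interfaces defined so far and asks PHP which operations each one allows. The `capabilities` helper is a name invented for this illustration; it relies on `get_class_methods`, which also accepts interface names:

```php
<?php

interface Product { }

interface Active extends Product
{
    public function getProduct(): self;
}

interface Inactive extends Product
{
    public function getProduct(): self;
}

interface Cancellable extends Active
{
    public function cancel(): Inactive;
}

interface Renewable extends Cancellable
{
    public function renew(): self;
}

// Hypothetical helper: the sorted list of operations a type allows,
// including the ones inherited from its parent interfaces.
function capabilities(string $interface): array
{
    $methods = get_class_methods($interface);
    sort($methods);

    return $methods;
}

var_dump(capabilities(Renewable::class));
var_dump(capabilities(Inactive::class));
```

A `Renewable` value can be renewed and cancelled, whereas an `Inactive` value offers no action at all.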

SecretProduct

Let’s consider the SecretProduct as a new super secret product that will revolutionise our store:

/**
 * The `SecretProduct` class is:
 *
 *   * A product,
 *   * Active,
 *   * Purchasable.
 *
 * Note that in this implementation, the `SecretProduct` instance is mutable: Every
 * action happens on the same `SecretProduct` instance. It makes sense because
 * having 2 instances of the same product with different states might be error-prone
 * in most scenarios.
 */
class SecretProduct implements Active, Purchasable
{
    public function getProduct(): Active
    {
        return $this;
    }

    /**
     * Purchasing the product returns an active product that is renewable,
     * and also cancellable.
     */
    public function purchase(): Renewable
    {
        return new class ($this->getProduct()) implements Renewable {
            protected $product;

            public function __construct(SecretProduct $product)
            {
                $this->product = $product;
                // Do the purchase.
            }

            public function getProduct(): Active
            {
                return $this->product;
            }

            public function renew(): Renewable
            {
                // Do the renew.
                return $this;
            }

            public function cancel(): Inactive
            {
                return new class ($this->getProduct()) implements Inactive {
                    protected $product;

                    public function __construct(SecretProduct $product)
                    {
                        $this->product = $product;
                        // Do the cancel.
                    }

                    public function getProduct(): Inactive
                    {
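                        // The wrapped `SecretProduct` is not `Inactive`: returning it
                        // would raise a `TypeError`, so return the wrapper itself.
                        return $this;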
                    }
                };
            }
        };
    }
}

The SecretProduct is a product that is active and purchasable. PHP verifies that the Active::getProduct method is implemented, and that the Purchasable::purchase method is implemented too.

When the latter is called, it returns an object implementing the Renewable interface (which is also a cancellable active product). The object in this context is an instance of an anonymous class implementing the Renewable interface. So the Active::getProduct, Renewable::renew, and Cancellable::cancel methods must be implemented.

Having an anonymous class is not required at all; it is just simpler for the example. A named class may even be better from the testing point of view.
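For the curious, here is one possible sketch of the same machine with named classes. The class names (`NamedSecretProduct`, `PurchasedSecretProduct`, `CancelledSecretProduct`) are assumptions made for this illustration, and the interfaces are repeated so that the snippet runs on its own:

```php
<?php

interface Product { }

interface Active extends Product
{
    public function getProduct(): self;
}

interface Inactive extends Product
{
    public function getProduct(): self;
}

interface Cancellable extends Active
{
    public function cancel(): Inactive;
}

interface Renewable extends Cancellable
{
    public function renew(): self;
}

interface Purchasable extends Active
{
    public function purchase(): Renewable;
}

// State A: the product can only be purchased.
final class NamedSecretProduct implements Purchasable
{
    public function getProduct(): Active
    {
        return $this;
    }

    public function purchase(): Renewable
    {
        // Do the purchase.
        return new PurchasedSecretProduct($this);
    }
}

// State B: the product can be renewed or cancelled.
final class PurchasedSecretProduct implements Renewable
{
    private $product;

    public function __construct(NamedSecretProduct $product)
    {
        $this->product = $product;
    }

    public function getProduct(): Active
    {
        return $this->product;
    }

    public function renew(): Renewable
    {
        // Do the renew.
        return $this;
    }

    public function cancel(): Inactive
    {
        // Do the cancel.
        return new CancelledSecretProduct();
    }
}

// State D: the product exposes no action at all.
final class CancelledSecretProduct implements Inactive
{
    public function getProduct(): Inactive
    {
        return $this;
    }
}

$cancelled = (new NamedSecretProduct())->purchase()->renew()->cancel();

var_dump($cancelled instanceof Inactive); // bool(true)
```

Each state is now a class that can be instantiated and tested in isolation.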

Note that:

  • The real purchase action is performed in the constructor of the anonymous class: This is not a hard rule, this is just convenient; it can be done in the method before returning the new instance,
  • The real renew action is performed in the renew method before returning $this,
  • And the real cancel action is performed in… we have to dig a little bit more (the principle is exactly the same though):
    • The Cancellable::cancel method must return an object implementing the Inactive interface.
    • It generates an instance of an anonymous class implementing the Inactive interface, and the real cancel action is done in the constructor.

Assert possible and impossible actions

Let’s try some valid and invalid actions. The following are possible actions:

assert((new SecretProduct())->purchase()                             instanceof Product);
assert((new SecretProduct())->purchase()->renew()                    instanceof Product);
assert((new SecretProduct())->purchase()->cancel()                   instanceof Product);
assert((new SecretProduct())->purchase()->renew()->renew()->cancel() instanceof Product);

It is possible to purchase a product, then renew it zero or many times, and finally to cancel it. It matches the FSM!

The following are impossible actions:

(new SecretProduct())->renew();
(new SecretProduct())->cancel();
(new SecretProduct())->purchase()->cancel()->purchase();
(new SecretProduct())->purchase()->cancel()->renew();
(new SecretProduct())->purchase()->purchase();
(new SecretProduct())->purchase()->cancel()->cancel();

It is impossible:

  • To renew or to cancel a product that has not been purchased,
  • To purchase or renew a product that has been cancelled,
  • To purchase a product more than once,
  • To cancel a product more than once.
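In PHP 7 and later, calling a method that a type does not declare throws an `Error`; left uncaught, it ends the script with a fatal error. A minimal sketch, with `CancelledProduct` as a hypothetical stand-in for a cancelled product:

```php
<?php

// Hypothetical stand-in for a cancelled product: it declares no action.
final class CancelledProduct { }

$message = null;

try {
    (new CancelledProduct())->renew();
} catch (Error $e) {
    // Catching the error here is only meant to show what PHP reports.
    $message = $e->getMessage();
}

echo $message, "\n";
```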

The following are impossible implementations:

class SecretProduct implements Active, Purchasable, PurchasableOnce { }

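A product cannot be purchasable and once-off purchasable at the same time, because Purchasable::purchase is not compatible with PurchasableOnce::purchase. Note that this relies on return types being invariant, which holds for PHP 7.0 to 7.3; since PHP 7.4, return types are covariant, so a class declaring purchase(): Renewable would satisfy both interfaces.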

class SecretProduct implements Inactive, Cancellable { }

An inactive product cannot be purchased nor renewed nor cancelled because Active::getProduct and Inactive::getProduct are not compatible.

Wow, those are great guarantees, aren’t they? PHP will raise fatal errors for impossible actions or impossible states. No warnings or notices: Fatal errors. Most of them are correctly inferred by IDEs, so… follow the red crosses in your IDE.

Restoring a product

One major thing is missing: The state of a product is stored in the database. When loading the product, we must be able to get an instance of a product at its previous state. To avoid repeating code, we will use traits. Rebuilding the state of a product is “just” (it really is) a composition of traits.

Note: In these examples, we are using anonymous classes and traits. It is possible to achieve the same behavior with final named classes. Also we are using a repository, which is convenient for this article, but not necessarily the best solution.

Repository

The following ProductRepository\load function is just here to give you an idea of how it works.

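namespace ProductRepository;

// Import the global interfaces, traits and exception used below. Without
// these imports, the names would resolve inside `ProductRepository`.
use Active, Cancellable, Inactive, Product, Renewable;
use ActiveProduct, CancellableProduct, InactiveProduct, RenewableProduct;
use RuntimeException;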

function load(int $id, string $state): Product
{
    // Load the product from the database with `$id`.
    //
    // The states can be `Renewable`, `Cancellable`, or `Inactive` (check
    // the FSM to double-check). Products that have not been purchased
    // are not in the database.

    // Fake minimal active product.
    $product = new class implements Active {
        public function getProduct(): Active {
            return $this;
        }
    };

    switch ($state) {
        // State B.
        case Renewable::class:
            return new class ($product) implements Renewable {
                use ActiveProduct;
                use RenewableProduct;
                use CancellableProduct;
            };

        // State C.
        case Cancellable::class:
            return new class ($product) implements Cancellable {
                use ActiveProduct;
                use CancellableProduct;
            };

        // State D.
        case Inactive::class:
            return new class ($product) implements Inactive {
                use InactiveProduct;
            };

        // Invalid state.
        default:
            throw new RuntimeException('Invalid product state.');
    }
}
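Going the other way, i.e. deciding which state name to store for a given product, can be sketched as follows. The `stateOf` helper is hypothetical, and the interfaces are stripped of their methods to keep the snippet short:

```php
<?php

// Simplified copies of the interfaces: only the hierarchy matters here.
interface Product { }
interface Active extends Product { }
interface Inactive extends Product { }
interface Cancellable extends Active { }
interface Renewable extends Cancellable { }

// Hypothetical helper: the state name to persist for a given product.
function stateOf(Product $product): string
{
    // `Renewable` extends `Cancellable`, so the most specific interface
    // must be tested first.
    $states = [Renewable::class, Cancellable::class, Inactive::class];

    foreach ($states as $state) {
        if ($product instanceof $state) {
            return $state;
        }
    }

    throw new InvalidArgumentException('This product has no persistable state.');
}

var_dump(stateOf(new class implements Renewable { })); // string(9) "Renewable"
```

The returned string is what a function like ProductRepository\load expects as its second argument.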

Traits

The code must look familiar because this is just a split from the SecretProduct implementation.

trait ActiveProduct
{
    protected $product;

    public function __construct(Product $product)
    {
        $this->product = $product;
    }

    public function getProduct(): Active
    {
        return $this->product;
    }
}

trait RenewableProduct
{
    public function renew(): Renewable
    {
        // Do the renew.
        return $this;
    }
}

trait CancellableProduct
{
    public function cancel(): Inactive
    {
        return new class ($this->getProduct()) implements Inactive {
            protected $product;

            public function __construct(Product $product)
            {
                $this->product = $product;
                // Do the cancel.
            }

            public function getProduct(): Inactive
            {
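                // The wrapped product is not `Inactive`: returning it would
                // raise a `TypeError`, so return the wrapper itself.
                return $this;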
            }
        };
    }
}

trait InactiveProduct
{
    protected $product;

    public function __construct(Product $product)
    {
        $this->product = $product;
    }

    public function getProduct(): Inactive
    {
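        // The wrapped product is not `Inactive`: returning it would raise
        // a `TypeError`, so return the wrapper itself.
        return $this;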
    }
}

Assert possible and impossible actions

The possible actions are:

$product = ProductRepository\load(42, Renewable::class);

assert($product           instanceof Product);
assert($product->renew()  instanceof Product);
assert($product->cancel() instanceof Product);

Product 42 is assumed to be in the state B (Renewable::class), so we can renew and cancel it.

The following are impossible actions:

$product = ProductRepository\load(42, Renewable::class);

$product->purchase();
$product->cancel()->cancel();

It is impossible to purchase the product 42 because it is in state B, so it has already been purchased. It is impossible to cancel a product twice.

The same guarantees apply here!

Conclusion

It is possible to re-implement SecretProduct with the traits we have defined for the ProductRepository, or to use named classes. I leave this as an easy wrap-up exercise for the reader.

The real conclusion is that we have successfully implemented the Finite-State Machine of a product with a Type System. It is impossible to have an invalid implementation that violates the constraints, such as an inactive renewable product. PHP detects it immediately at runtime. Invalid actions are also impossible, such as purchasing a product twice, or renewing a once-off purchased product. It is also detected by PHP.

All violations take the form of PHP fatal errors.

The product repository is an example of how to restore a product at a particular state, with the help of the defined interfaces, and new small and simple traits.

One more thing

It is possible to integrate product categories in this type system (like bundles). It is more complex, but possible.

I highly recommend the following readings:

I would like to particularly emphasize a paragraph from the first article:

So what is a type? The only true definition is this: a type is a label used by a type system to prove some property of the program’s behavior. If the type checker can assign types to the whole program, then it succeeds in its proof; otherwise it fails and points out why it failed.

Seeing types as labels is a very smart way of approaching them.

I would like to thank Marco Pivetta for the reviews!

sabre/katana

sabre/katana's logo
Project’s logo.

What is it?

sabre/katana is a contact, calendar, task list and file server. What does it mean? Nowadays, you probably have multiple devices (PC, phones, tablets, TVs…). If you would like to get your address books, calendars, task lists and files synced between all these devices from everywhere, you need a server. All your devices are then considered as clients.

But there is an issue with the server. Most of the time, you might choose Google or maybe Apple, but one may wonder: Can we trust these servers? Can we give them our private data, like all our contacts, our calendars, all our photos…? What if you are a company or an association and you have sensitive data that are really private or strategic? Can you still trust them? Where are the data stored? Who can look at these data? More and more, there is a huge need for a “personal” server.

Moreover, servers like Google or Apple are often closed: You reach your data with specific clients and they are not available on all platforms. This is for strategic reasons of course. But with sabre/katana, you are not limited. See the above schema: Firefox OS can talk to iOS or Android at the same time.

sabre/katana is this kind of server. You can install it on your machine and manage users in a minute. Each user will have a collection of address books, calendars, task lists and files. This server can talk to a loong list of devices, mainly thanks to a scrupulous respect of industrial standards:

  • Mac OS X:
    • OS X 10.10 (Yosemite),
    • OS X 10.9 (Mavericks),
    • OS X 10.8 (Mountain Lion),
    • OS X 10.7 (Lion),
    • OS X 10.6 (Snow Leopard),
    • OS X 10.5 (Leopard),
    • BusyCal,
    • BusyContacts,
    • Fantastical,
    • Rainlendar,
    • ReminderFox,
    • SoHo Organizer,
    • Spotlife,
    • Thunderbird,
  • Windows:
    • eM Client,
    • Microsoft Outlook 2013,
    • Microsoft Outlook 2010,
    • Microsoft Outlook 2007,
    • Microsoft Outlook with Bynari WebDAV Collaborator,
    • Microsoft Outlook with iCal4OL,
    • Rainlendar,
    • ReminderFox,
    • Thunderbird,
  • Linux:
    • Evolution,
    • Rainlendar,
    • ReminderFox,
    • Thunderbird,
  • Mobile:
    • Android,
    • BlackBerry 10,
    • BlackBerry PlayBook,
    • Firefox OS,
    • iOS 8,
    • iOS 7,
    • iOS 6,
    • iOS 5,
    • iOS 4,
    • iOS 3,
    • Nokia N9,
    • Sailfish.

Did you find your device in this list? Probably yes 😉.

sabre/katana sits in the middle of all your devices and syncs all your data. Of course, it is free and open source. Go check the source!

List of features

Here is a non-exhaustive list of features supported by sabre/katana. Depending on whether you are a user or a developer, the features that might interest you are radically different. I decided to show you a list from the user point of view. If you would like to get a list from the developer point of view, please see this exhaustive list of supported RFCs for more details.

Contacts

All usual fields are supported, like phone numbers, email addresses, URLs, birthday, ringtone, texttone, related names, postal addresses, notes, HD photos etc. Of course, groups of cards are also supported.

My card on Mac OS X
My card inside the native Contact application of Mac OS X.
My card on Firefox OS
My card inside the native Contact application of Firefox OS.

My photo is not in HD, I really have to update it!

Cards can be encoded into several formats. The most usual format is VCF. sabre/katana allows you to download the whole address book of a user as a single VCF file. You can also create, update and delete address books.
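To give an idea of what a VCF file contains, here is a minimal sketch that builds a vCard 4.0 by hand. This is not sabre/katana’s own code: the `escapeVCardText` and `buildVCard` functions, the name and the email address are all invented for this illustration, and line folding of long values is deliberately left out:

```php
<?php

// Escape a text value: backslashes, commas, semicolons and newlines
// have a special meaning in vCard text values.
function escapeVCardText(string $value): string
{
    return strtr($value, [
        '\\' => '\\\\',
        ','  => '\\,',
        ';'  => '\\;',
        "\n" => '\\n',
    ]);
}

// Build a minimal card with a formatted name and one email address.
function buildVCard(string $fullName, string $email): string
{
    $lines = [
        'BEGIN:VCARD',
        'VERSION:4.0',
        'FN:' . escapeVCardText($fullName),
        'EMAIL:' . escapeVCardText($email),
        'END:VCARD',
    ];

    // Content lines are separated by CRLF.
    return implode("\r\n", $lines) . "\r\n";
}

$card = buildVCard('Doe, Jane', 'jane@example.org');

echo $card;
```

An address book downloaded as a single VCF file is a sequence of such cards, one after the other.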

Calendars

A calendar is just a set of events. Each event has several properties, such as a title, a location, a start date, an end date, some notes, URLs, alarms etc. sabre/katana also supports recurring events (“each last Monday of the month, at 11am…”), in addition to scheduling (see below).

My calendars on Mac OS X
My calendars inside the native Calendar application of Mac OS X.
My calendars on Firefox OS
My calendars inside the native Calendar application of Firefox OS.

A few words about calendar scheduling. Let’s say you are organizing an event, like New release (we always enjoy release day!). You would like to invite several people but you don’t know whether they can attend. In your event, all you have to do is to add attendees. How are they going to be notified about this event? Two situations:

  1. Either attendees are registered on your sabre/katana server and they will receive an invite inside their calendar application (we call this iTIP),
  2. Or they are not registered on your server and they will receive an email with the event as an attached file (we call this iMIP). All they have to do is to open this event in their calendar application.
Typical mail to invite an attendee to an event
Invite an attendee by email because she is not registered on your sabre/katana server.

Notice the gorgeous map embedded inside the email!

Once they have received the event, they can accept it, decline it, or answer “don’t know” (meaning they will try to attend).

Receive an invite to an event
Receive an invite to an event. Here: Gordon is inviting Hywan, who has three choices.
Status of all attendees
Hywan has accepted the event. Here is what the event looks like. Hywan can see the response of each attendee.
Notification from attendees
Gordon is even notified that Hywan has accepted the event.

Of course, attendees will be notified too if the event has been moved, canceled, refreshed etc.

Calendars can be encoded into several formats. The most usual format is ICS. sabre/katana allows you to download the whole calendar of a user as a single ICS file. You can also create, update and delete calendars.
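As an illustration of the ICS format, the sketch below builds a calendar with a single recurring event, “each last Monday of the month, at 11am”. This is not sabre/katana’s own code: the `buildRecurringEvent` function, the UID, the PRODID and the dates are invented for this example, and text escaping and line folding are left out:

```php
<?php

function buildRecurringEvent(
    string $uid,
    string $summary,
    DateTimeImmutable $createdAt,
    DateTimeImmutable $start,
    DateTimeImmutable $end
): string {
    $utc    = new DateTimeZone('UTC');
    $format = 'Ymd\THis\Z'; // UTC date-time form, e.g. 20250127T110000Z.

    $lines = [
        'BEGIN:VCALENDAR',
        'VERSION:2.0',
        'PRODID:-//Example//Sketch//EN',
        'BEGIN:VEVENT',
        'UID:' . $uid,
        'DTSTAMP:' . $createdAt->setTimezone($utc)->format($format),
        'DTSTART:' . $start->setTimezone($utc)->format($format),
        'DTEND:' . $end->setTimezone($utc)->format($format),
        'SUMMARY:' . $summary,
        // Repeat every month, on the last Monday.
        'RRULE:FREQ=MONTHLY;BYDAY=-1MO',
        'END:VEVENT',
        'END:VCALENDAR',
    ];

    // Content lines are separated by CRLF.
    return implode("\r\n", $lines) . "\r\n";
}

$utc = new DateTimeZone('UTC');

$calendar = buildRecurringEvent(
    'release-day@example.org',
    'New release',
    new DateTimeImmutable('2025-01-01 09:00:00', $utc),
    new DateTimeImmutable('2025-01-27 11:00:00', $utc), // A Monday.
    new DateTimeImmutable('2025-01-27 12:00:00', $utc)
);

echo $calendar;
```

A task list uses the same envelope, with todo components in place of event components.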

Task lists

A task list is exactly like a calendar (from a programmatic point of view). Instead of containing event objects, it contains todo objects.

sabre/katana supports groups of tasks, reminders, progress etc.

My task lists on Mac OS X
My task lists inside the native Reminder application of Mac OS X.

Just like calendars, task lists can be encoded into several formats, including ICS. sabre/katana allows you to download the whole task list of a user as a single ICS file. You can also create, update and delete task lists.

Files

Finally, sabre/katana creates a home collection per user: A personal directory that can contain files and directories and… synced between all your devices (as usual 😄).

sabre/katana also creates a special directory called public/ which is a public directory. Every file and directory stored inside this directory is accessible to anyone who has the correct link. No listing is exposed, in order to protect your public data.

Just like contact, calendar and task list applications, you need a client application to connect to your home collection on sabre/katana.

Connect to a server in Mac OS X
Connect to a server with the Finder application of Mac OS X.

Then, your public directory on sabre/katana will be a regular directory like any other.

List of my files
List of my files, right here in the Finder application of Mac OS X.

sabre/katana is able to store any kind of file. Yes, any kind. It’s just files. However, it white-lists the kinds of files that can be shown in the browser. Only images, audio, videos, texts, PDFs and some vendor formats (like Microsoft Office) are considered as safe (for the server). This way, associations can share music, videos or images, companies can share PDF or Microsoft Word documents etc. In the future, sabre/katana might white-list more formats. If a format is not white-listed, the file will be forced to download.
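The white-list idea can be sketched in a few lines. This is not how sabre/katana implements it: the `INLINE_TYPES` list and the `contentDisposition` function are assumptions made for this illustration, and the real list of formats may differ. The principle is to answer with an inline disposition for white-listed types, and to force a download for everything else:

```php
<?php

// Hypothetical white-list of media types that may be shown in the browser.
const INLINE_TYPES = [
    'image/png',
    'image/jpeg',
    'audio/mpeg',
    'video/mp4',
    'text/plain',
    'application/pdf',
];

// Compute the `Content-Disposition` header for a stored file.
function contentDisposition(string $mimeType, string $filename): string
{
    $disposition = in_array($mimeType, INLINE_TYPES, true)
        ? 'inline'
        : 'attachment';

    return sprintf(
        'Content-Disposition: %s; filename="%s"',
        $disposition,
        addcslashes($filename, '"\\') // Escape quotes and backslashes.
    );
}

echo contentDisposition('image/png', 'cat.png'), "\n";
echo contentDisposition('application/x-sh', 'install.sh'), "\n";
```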

How is sabre/katana built?

sabre/katana is based on two big and solid projects:

  1. sabre/dav,
  2. Hoa.

sabre/dav is one of the most powerful CardDAV, CalDAV and WebDAV frameworks on the planet. Trusted by the likes of Atmail, Box, fruux and ownCloud, it powers millions of users world-wide! It is written in PHP and is open source.

Hoa is a modular, extensible and structured set of PHP libraries. Fun fact: Also open source, this project is also trusted by ownCloud, in addition to Mozilla, joliCode etc. Recently, this project has recorded more than 600,000 downloads and the community is about to reach 1000 people.

sabre/katana is then a program based on sabre/dav for the DAV part and Hoa for everything else, like the logic code inside sabre/dav’s plugins. The result is a ready-to-use server with a nice interface for the administration.

To ensure code quality, we use atoum, a popular and modern test framework for PHP. So far, sabre/dav has more than 1000 assertions.

Conclusion

sabre/katana is a server for contacts, calendars, task lists and files. Everything is synced, every time and everywhere. It perfectly connects to a lot of devices on the market. Several features we need and use daily have been presented. This is the easiest, and a secure, way to host your own private data.

Go download it!