On PHP Routing Libraries

I have been interested in different URL routing techniques in web frameworks for a few years now. I have looked at the routing code in Symfony 2, Aura, and Slim. At a simplified level, they all work the same way. I suspect most routing libraries written in the PHP 5.3+ era work similarly, and there have not been any new developments for a few years... until now. A few interesting libraries that do things differently have burst onto the scene in the past month or two, one in particular causing some discussion and debate in the community. I want to talk about what makes them different, their advantages and disadvantages, and what I want out of a PHP routing library that has yet to be fulfilled.

The Status Quo

At their root, routing libraries map a URL pattern (usually the rewritten path portion) to a piece of code to be executed. A secondary purpose is to extract key => value parameters from the URL path. A typical routing library lets you define routes like so:

<?php
$router->addRoute('GET', '/post/{id}/comments', ['controller' => 'PostController', 'action' => 'show']);

That last argument might instead be an anonymous function or some other PHP callable, and instead of calling addRoute() with GET you might call a get() method. Those are small implementation details. Under the hood, most routing libraries turn those arguments into a Route object and store a collection of these Route objects. When routing runs, the collection is looped over. For each Route, a regex is used to extract the parameter names from the pattern (in our example, id) and turn that pattern into a real regex pattern. The generated regex is then used to test whether the Route matches the requested URL/path/whatever. If it does match, the same generated regex pulls the parameter values out of the requested URL.
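To make that loop concrete, here is a minimal sketch of the flat-list design, assuming PHP 7.1 or later. The Route and Router classes, the compile() and match() methods, and the `[^/]+` rule for parameter values are illustrative choices of mine, not the actual API of Symfony, Aura, or Slim.

```php
<?php
// Sketch of the common "flat list of Route objects" design.
// Route, Router, compile() and match() are illustrative names only.

class Route
{
    public $method;
    public $pattern;
    public $handler;

    public function __construct(string $method, string $pattern, $handler)
    {
        $this->method = $method;
        $this->pattern = $pattern;
        $this->handler = $handler;
    }

    // Turn '/post/{id}/comments' into '~^/post/(?P<id>[^/]+)/comments$~'.
    public function compile(): string
    {
        // Split on {name} placeholders; the captured names land at odd indexes.
        $parts = preg_split('~\{(\w+)\}~', $this->pattern, -1, PREG_SPLIT_DELIM_CAPTURE);
        $regex = '';
        foreach ($parts as $i => $part) {
            $regex .= $i % 2 === 0
                ? preg_quote($part, '~')          // literal text
                : '(?P<' . $part . '>[^/]+)';     // named parameter
        }
        return '~^' . $regex . '$~';
    }
}

class Router
{
    private $routes = [];

    public function addRoute(string $method, string $pattern, $handler): void
    {
        $this->routes[] = new Route($method, $pattern, $handler);
    }

    // Loop over every route, building its regex on the spot, until one matches.
    public function match(string $method, string $path): ?array
    {
        foreach ($this->routes as $route) {
            if ($route->method === $method && preg_match($route->compile(), $path, $m)) {
                // Keep only the named captures, e.g. ['id' => '45'].
                $params = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
                return [$route->handler, $params];
            }
        }
        return null;
    }
}

$router = new Router();
$router->addRoute('GET', '/post/{id}/comments', ['controller' => 'PostController', 'action' => 'show']);
$result = $router->match('GET', '/post/45/comments');
// $result[1] === ['id' => '45']
```

Every call to match() pays for the pattern-to-regex conversion again, which is the cost the rest of this post keeps coming back to.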

Bullet

Bullet PHP bills itself as a resource-oriented micro PHP framework. The novel thing it does for routing is break the requested path into segments, one for each directory in the path. The routing engine tries matching the first directory segment, and if a callable is registered for that segment, the callable is run. The engine keeps searching for matching routes and running any registered callables until all the segments are exhausted.

OK, I think an example is in order. This one is shamelessly stolen from the Bullet PHP README. Suppose the requested path is /events/45/edit. The routing engine will first look for matches for just /events, and any registered callbacks will be executed. Then the routing engine will look for matches for /events/45, and again any registered callbacks will be executed. Finally, /events/45/edit will be used to search for matches.
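That prefix-by-prefix walk can be sketched in a few lines of plain PHP (7.0 or later). This is not Bullet's implementation: pathPrefixes(), dispatchBySegments(), and the shape of the $handlers array are hypothetical, and handlers here are keyed by exact prefix strings, whereas Bullet can also match a segment by type with param('int').

```php
<?php
// Sketch of Bullet-style segment-by-segment dispatch (NOT Bullet's code).
// pathPrefixes(), dispatchBySegments() and the $handlers shape are hypothetical.

// '/events/45/edit' => ['/events', '/events/45', '/events/45/edit']
function pathPrefixes(string $path): array
{
    $prefixes = [];
    $current = '';
    foreach (explode('/', $path) as $segment) {
        if ($segment === '') {
            continue; // skip the empty piece before the leading slash
        }
        $current .= '/' . $segment;
        $prefixes[] = $current;
    }
    return $prefixes;
}

// Run every callable registered for each prefix, shortest prefix first.
// Handlers are keyed by exact prefix; Bullet can also match by type (param('int')).
function dispatchBySegments(array $handlers, string $path): array
{
    $results = [];
    foreach (pathPrefixes($path) as $prefix) {
        foreach ($handlers[$prefix] ?? [] as $handler) {
            $results[] = $handler($prefix);
        }
    }
    return $results;
}

$handlers = [
    '/events'         => [function ($p) { return "ran $p"; }],
    '/events/45/edit' => [function ($p) { return "ran $p"; }],
];
$results = dispatchBySegments($handlers, '/events/45/edit');
// $results === ['ran /events', 'ran /events/45/edit']
```

Nothing is registered for /events/45 in this toy table, so that step simply runs no callbacks.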

The examples in the documentation show that in the /events callable you could register the routes for /events/{id} and /events/{id}/edit.

<?php
$app = new Bullet\App(array(/* some config here */));

$app->path('events', function($request) use ($app) {

    $app->get(function($request) use ($app) {
        // list events
    });
    $app->post(function($request) use ($app) {
        // create an event
    });
    $app->path('new', function($request) use ($app) {
        // new event form
    });

    $app->param('int', function($request, $id) use ($app) {
        $app->get(function($request) use ($id) {
            // View an event
        });
        $app->put(function($request) use ($id) {
            // Update event
        });
        $app->delete(function($request) use ($id) {
            // Delete event
        });
        $app->path('edit', function($request) use ($id) {
            // edit event form
        });
    });
});

This is a form of creating a tree structure with your routes. A true tree structure is something I feel is missing from most PHP web frameworks. Tree structures can considerably speed up route matching, since you can skip over dozens of routes with one check. For example, you could skip over all of the /admin routes for your application with a single route match check. More routing libraries should try to implement a tree structure rather than just a flat list of routes to loop over.
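Here is a minimal sketch of that skipping, assuming PHP 7.1 or later and that every node in the tree owns a static path prefix. The Node class and its add() and match() methods are hypothetical; the $checks counter exists only to show that the /admin children are never examined when the request is for /events/new.

```php
<?php
// Sketch of a route tree where each node owns a static path prefix.
// Node, add() and match() are hypothetical names.

class Node
{
    public $prefix;
    public $handler;
    public $children = [];

    public function __construct(string $prefix, $handler = null)
    {
        $this->prefix = $prefix;
        $this->handler = $handler;
    }

    public function add(Node $child): Node
    {
        $this->children[] = $child;
        return $child;
    }

    // Returns the handler for $path, or null. $checks counts visited nodes.
    public function match(string $path, int &$checks = 0)
    {
        $checks++;
        // One comparison rejects this node and every node below it.
        if (substr($path, 0, strlen($this->prefix)) !== $this->prefix) {
            return null;
        }
        $rest = (string) substr($path, strlen($this->prefix));
        if ($rest === '' && $this->handler !== null) {
            return $this->handler;
        }
        foreach ($this->children as $child) {
            $found = $child->match($rest, $checks);
            if ($found !== null) {
                return $found;
            }
        }
        return null;
    }
}

$root = new Node('');
$admin = $root->add(new Node('/admin'));
$admin->add(new Node('/users', 'admin.users'));
$admin->add(new Node('/settings', 'admin.settings'));
$admin->add(new Node('/logs', 'admin.logs'));
$events = $root->add(new Node('/events', 'events.index'));
$events->add(new Node('/new', 'events.new'));

$checks = 0;
$handler = $root->match('/events/new', $checks);
// $handler === 'events.new'; $checks === 4, because the three /admin
// children were never compared.
```

A real library would also need parameter segments at each level, but the branch-skipping idea stays the same.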

One downside of Bullet PHP is that it expects the callable to be a Closure; no other callable types are accepted. In the author's defense, he does state that the Closure is not the actual controller; you call the actual controller from the Closure. That leads to my other complaint, though. Although I like the tree structure and the way it promotes RESTful, resource-oriented routes, doing so requires a lot of boilerplate code. I don't want to retype /events/{id} GET, /events/{id} PUT, etc. for every resource. Nor do I want to re-implement the code for instantiating a controller object and calling the action method on each project. I'd like to see more PHP routing libraries with a resources() method that does something similar to the method of the same name in Ruby on Rails. Aura Routing 2.0 added an attachResource() method, but that is the only one I'm aware of. I have added my own implementation on top of Slim.
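As a rough illustration of what such a resources() helper could generate, here is a sketch that expands one resource name into routes for the seven actions Rails' `resources` maps (Rails also routes PATCH to update; that is omitted here). The function name and the [method, pattern, [controller, action]] array shape are assumptions, meant to be fed into whatever addRoute()-style method a router provides.

```php
<?php
// Sketch of a hypothetical resources() helper that expands one resource
// into RESTful routes, mirroring the seven actions Rails' `resources` maps.

function resources(string $name, string $controller): array
{
    $collection = '/' . $name;
    $member = $collection . '/{id}';
    return [
        // '/new' comes before '/{id}' so "new" is never captured as an id.
        ['GET',    $collection,          [$controller, 'index']],
        ['GET',    $collection . '/new', [$controller, 'new']],
        ['POST',   $collection,          [$controller, 'create']],
        ['GET',    $member,              [$controller, 'show']],
        ['GET',    $member . '/edit',    [$controller, 'edit']],
        ['PUT',    $member,              [$controller, 'update']],
        ['DELETE', $member,              [$controller, 'destroy']],
    ];
}

$routes = resources('events', 'EventController');
$summary = array_map(function (array $route) {
    return $route[0] . ' ' . $route[1];
}, $routes);
// $summary[0] === 'GET /events', $summary[3] === 'GET /events/{id}'
```

Because the helper only returns data, the controller-instantiation code the routes point at can live in one place instead of being rewritten per project.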

Pux

Pux is the more controversial of the two libraries presented in this post. Pux makes big performance claims: 48x as fast as Symfony for static routes and 31x for dynamic routes. It achieves this in three ways. 1. It stores the routes as arrays instead of Route objects. I don't know how much this actually saves these days, since there have been a lot of optimizations in the PHP engine around objects and memory performance. 2. It pre-compiles the regex patterns. In the common implementation I described above, the step of converting the URL pattern to a regex pattern happens on every request. Good libraries do this step lazily, converting one route at a time until a match is found and then stopping; bad libraries convert every route regardless. The pre-compiled route arrays are stored on disk or in memory. This is the part that really interests me. 3. It uses a C extension. This part doesn't interest me at all. Of course something written in C is faster than PHP, since it doesn't have the overhead of a virtual machine or of hash table lookups for every variable or array offset. The C extension is also not useful to most of us, because we can't use it on shared hosts or most Software-as-a-Service providers. The only way to use it is to host your own PHP on a VPS.
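The difference between the good and bad libraries in point 2 is easy to see by counting compilations. In this sketch (PHP 7.1 or later), compilePattern(), matchEager(), and matchLazy() are hypothetical helpers, and the counter passed by reference is there only for illustration.

```php
<?php
// Sketch: eager vs. lazy compilation of route patterns.
// compilePattern(), matchEager() and matchLazy() are hypothetical helpers;
// $compiled counts how many patterns were turned into regexes.

function compilePattern(string $pattern, int &$compiled): string
{
    $compiled++;
    $parts = preg_split('~\{(\w+)\}~', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regex = '';
    foreach ($parts as $i => $part) {
        $regex .= $i % 2 === 0 ? preg_quote($part, '~') : '(?P<' . $part . '>[^/]+)';
    }
    return '~^' . $regex . '$~';
}

// "Bad": compile every pattern up front, then search.
function matchEager(array $patterns, string $path, int &$compiled): ?string
{
    $regexes = [];
    foreach ($patterns as $pattern) {
        $regexes[$pattern] = compilePattern($pattern, $compiled);
    }
    foreach ($regexes as $pattern => $regex) {
        if (preg_match($regex, $path)) {
            return $pattern;
        }
    }
    return null;
}

// "Good": compile one pattern at a time and stop at the first match.
function matchLazy(array $patterns, string $path, int &$compiled): ?string
{
    foreach ($patterns as $pattern) {
        if (preg_match(compilePattern($pattern, $compiled), $path)) {
            return $pattern;
        }
    }
    return null;
}

$patterns = ['/', '/post/{id}', '/post/{id}/comments', '/user/{name}', '/about'];
$eager = 0;
$lazy = 0;
$hitEager = matchEager($patterns, '/post/45', $eager);
$hitLazy = matchLazy($patterns, '/post/45', $lazy);
// Both find '/post/{id}', but $eager === 5 while $lazy === 2.
```

Lazy compilation still repeats work on every request; pre-compiling removes even that.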

The fact that Pux is faster because of the C extension is no surprise to anyone. It's the pre-compiling of the regex patterns that is novel to me, and it seems to be glossed over in a lot of the discussion about Pux. One reason I like pre-compiling the regex patterns is that pre-processing of web application code is getting more commonplace. Sure, you don't see it in PHP, but on the client side, concatenating and minifying JavaScript and compiling LESS or SCSS to CSS are basically necessities now. Libraries could compile the regexes on the fly during development, like they do now. Prior to release, you would run a CLI script as part of your release process, and the compiled regexes, along with other routing information, would be saved to a file that is then read in on each request instead of the more verbose routing information.
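A minimal sketch of that release step, assuming PHP 7.1 or later and routes declared as [method, pattern, handler] arrays with string handlers so that var_export() can serialize them. compileRoutes(), dumpRoutes(), matchCompiled(), and the temporary file path are hypothetical, and this is not Pux's actual file format. The dump would run once from a CLI script; each request then only includes a plain PHP array, which an opcode cache such as OPcache can keep compiled.

```php
<?php
// Sketch: compile route patterns once, save them as PHP, include per request.
// compileRoutes(), dumpRoutes(), matchCompiled() and the file path are hypothetical.

function compileRoutes(array $routes): array
{
    $compiled = [];
    foreach ($routes as [$method, $pattern, $handler]) {
        $parts = preg_split('~\{(\w+)\}~', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);
        $regex = '';
        foreach ($parts as $i => $part) {
            $regex .= $i % 2 === 0 ? preg_quote($part, '~') : '(?P<' . $part . '>[^/]+)';
        }
        $compiled[] = [$method, '~^' . $regex . '$~', $handler];
    }
    return $compiled;
}

// Release time (CLI): write the compiled table out as a PHP file.
function dumpRoutes(array $compiled, string $file): void
{
    file_put_contents($file, '<?php return ' . var_export($compiled, true) . ';' . PHP_EOL);
}

// Request time: no pattern parsing, just load the array and match.
function matchCompiled(string $file, string $method, string $path): ?array
{
    $compiled = require $file;
    foreach ($compiled as [$routeMethod, $regex, $handler]) {
        if ($routeMethod === $method && preg_match($regex, $path, $m)) {
            return [$handler, array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY)];
        }
    }
    return null;
}

$file = sys_get_temp_dir() . '/compiled_routes_example.php';
dumpRoutes(compileRoutes([
    ['GET', '/post/{id}/comments', 'PostController::show'],
    ['GET', '/about', 'PageController::about'],
]), $file);
$result = matchCompiled($file, 'GET', '/post/45/comments');
// $result === ['PostController::show', ['id' => '45']]
```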

I can imagine other uses for this type of pre-release preprocessing. A big one is for describing relational database tables, columns and relationships for an Active Record or Data Mapper library.

I should also point out an excellent blog post by Nikita Popov, a regular PHP core contributor, about how to speed up regular expression matching in routing engines. The gist of it is that by combining individual regex patterns into one, you can approach or beat Pux's performance. It takes some special processing to do this, and there is at least one caveat, but it is a very interesting approach to the problem.
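Here is a rough sketch of the combining idea, assuming PHP 7.1 or later. It is not code from the post: it assumes PCRE's `(*MARK:name)` verb to tell which alternative matched (FastRoute, Popov's own routing library, offers a mark-based dispatcher along these lines), plus a branch-reset group `(?|...)` so every route's parameters start at capture group 1. combineRoutes(), dispatchCombined(), and the route fragments are hypothetical, and the fragments are assumed to be already-compiled regexes with unnamed groups.

```php
<?php
// Sketch: merge route regexes into one alternation and use (*MARK:name)
// to learn which route matched. combineRoutes() and dispatchCombined()
// are hypothetical; fragments are unanchored regexes with unnamed groups.

function combineRoutes(array $routes): array
{
    $parts = [];
    $map = [];
    foreach (array_values($routes) as $i => [$fragment, $handler, $varNames]) {
        $mark = 'r' . $i;
        $parts[] = $fragment . '(*MARK:' . $mark . ')';
        $map[$mark] = [$handler, $varNames];
    }
    // (?| ... ) is a branch-reset group: each alternative numbers its
    // capture groups from 1, so parameters always start at $matches[1].
    return ['~^(?|' . implode('|', $parts) . ')$~', $map];
}

function dispatchCombined(array $combined, string $path): ?array
{
    [$regex, $map] = $combined;
    if (!preg_match($regex, $path, $matches)) {
        return null; // one preg_match call instead of one per route
    }
    [$handler, $varNames] = $map[$matches['MARK']];
    $params = [];
    foreach ($varNames as $i => $name) {
        $params[$name] = $matches[$i + 1];
    }
    return [$handler, $params];
}

$combined = combineRoutes([
    ['/posts', 'PostController::index', []],
    ['/post/([^/]+)/comments', 'PostController::show', ['id']],
    ['/user/([^/]+)/([^/]+)', 'UserController::page', ['name', 'page']],
]);
$result = dispatchCombined($combined, '/post/45/comments');
// $result === ['PostController::show', ['id' => '45']]
```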

My Perfect Routing Library

I have made several starts in the past at writing a routing library with some of these features in mind. What I have concentrated on is having a tree structure for the routing and how best to create RESTful route groups. I've started down the road of having Collection objects and Route objects, using simple recursion to iterate over them all (or skip branches). I have thought about making it possible to iterate using one of SPL's built-in recursive iterators, but have not had time to flesh anything out. I've built it so that a RESTful route group is just a Collection extended with a Route object for each URL-method combination. And I've built it so that a RESTful group is a specialized Route object that uses one regex to match all the URLs. My perfect library has to have at least those two things: a tree structure and an easy way to create resource groups.
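For the SPL approach, here is a sketch of how a Collection could plug into RecursiveIteratorIterator, with RecursiveCallbackFilterIterator pruning whole branches. It assumes PHP 7.1 or later. RecursiveIterator, RecursiveCallbackFilterIterator, and RecursiveIteratorIterator are real SPL classes; RouteCollection, SimpleRoute, findRoute(), and the prefix-based pruning rule are hypothetical.

```php
<?php
// Sketch: a route tree walked with SPL recursive iterators.
// RouteCollection, SimpleRoute and findRoute() are hypothetical names.

class SimpleRoute
{
    public $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }
}

// A collection is an iterator over routes and nested collections.
class RouteCollection extends ArrayIterator implements RecursiveIterator
{
    public $prefix;

    public function __construct(string $prefix, array $items)
    {
        parent::__construct($items);
        $this->prefix = $prefix;
    }

    public function hasChildren(): bool
    {
        return $this->current() instanceof RouteCollection;
    }

    public function getChildren(): ?RecursiveIterator
    {
        return $this->current();
    }
}

function findRoute(RouteCollection $tree, string $path, array &$visited): ?SimpleRoute
{
    // Reject a collection unless $path starts with its prefix; a rejected
    // collection is never descended into, so its whole branch is skipped.
    $filter = new RecursiveCallbackFilterIterator($tree, function ($current) use ($path) {
        if ($current instanceof RouteCollection) {
            return substr($path, 0, strlen($current->prefix)) === $current->prefix;
        }
        return true;
    });
    // The default LEAVES_ONLY mode yields only the SimpleRoute leaves.
    foreach (new RecursiveIteratorIterator($filter) as $route) {
        $visited[] = $route->path;
        if ($route->path === $path) {
            return $route;
        }
    }
    return null;
}

$tree = new RouteCollection('', [
    new RouteCollection('/admin', [new SimpleRoute('/admin/users'), new SimpleRoute('/admin/settings')]),
    new RouteCollection('/events', [new SimpleRoute('/events'), new SimpleRoute('/events/new')]),
]);
$visited = [];
$found = findRoute($tree, '/events/new', $visited);
// $visited === ['/events', '/events/new']: no /admin route was looked at.
```

The iteration and pruning logic lives in SPL, so the Collection class itself only has to say which of its items have children.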

And as I said, I am intrigued by the idea of pre-compiling the regex patterns, especially in production, where it does not make sense to repeat this task over and over again for each request. So if I ever get the time to create my perfect routing library, it would have that too. One last point to think about: we have not seen anyone take advantage of PHP 5.5's generators in a routing library. I have not yet thought of a way in which generators could do something novel for routing libraries, but there are many more creative people than me out there.
