One of the primary goals when writing a user-friendly API is to provide a straightforward interface that makes it easy for developers to write clear code. Unfortunately, this becomes much more difficult than it should be when writing APIs that rely on asynchronous operations, as is commonly the case with JavaScript libraries.

One way to improve asynchronous code is to use promises. While promises provide a higher level of abstraction than callbacks, they still aren’t as clear as synchronous code, and complex operations can still require nesting.

In this essay I’ll explain an alternative approach: pseudosynchronous interfaces. This approach builds on promises and standard callbacks to let programmers use the results of asynchronous operations as if they were synchronous. With pseudosynchronous interfaces, the only time an asynchronous callback has to be used is at the end of a call chain, when the programmer wants to directly manipulate a value returned asynchronously.

We’ll use a combination of function wrappers and proxy interfaces to create the illusion of a synchronous API, while still allowing all of our code to execute as soon as the required objects and data have been loaded.

Pseudosynchronous input

Let’s say we have a function, concat, which takes a list of strings, and outputs those strings concatenated with each other.

function concat() {
    var output = "";
    Array.prototype.forEach.call(arguments, function(arg) {
        output += arg;
    });
    return output;
}

concat('abc', 'def', 'ghi'); // 'abcdefghi'

This works fine as long as we’re working with strings, but what if we’re doing asynchronous file reads? Our code can easily become much more complicated.

function concatAsync() {
    var pending = 0;
    var args = [];
    // Our callback is the last argument to our function.
    var callback = Array.prototype.pop.call(arguments);
    Array.prototype.forEach.call(arguments, function(arg, i) {
        pending++;
        asyncFileRead(arg, function(data) {
            // Save the returned data to our arguments list.
            args[i] = data;
            // Reduce the number of things we're waiting on by one.
            pending--;
            // If we're not waiting on anything else, execute the callback.
            if (pending === 0) {
                // Take all of our arguments and join them together.
                callback(args.join(''));
            }
        });
    });
}

concatAsync('one', 'two', 'three', function(d) { console.log(d); });
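Here, asyncFileRead is a stand-in for any node-style read function. To try the code out, a fake version that simply echoes its "filename" back after a delay is enough (the name and behavior are invented for illustration, not a real API):

```javascript
// Hypothetical asyncFileRead(): fakes a file read by calling back
// with made-up contents after a short delay.
function asyncFileRead(path, callback) {
    setTimeout(function() {
        callback("<" + path + ">");
    }, 10);
}
```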

The overhead of writing wrappers for this kind of operation isn’t that annoying when we’re only doing it once, but if it becomes a consistent pattern in our code, we’ll want to abstract the code into something more general purpose.

We can do this by creating a wrapper function: it takes a normal function as its argument, such as our synchronous concat function above, and returns a new function that resolves all of its asynchronous arguments before executing the real one. This means that functions passed through the wrapper can be written in a linear manner.

function pseudosynchronousWrapper(inner) {
    
    // Take a bunch of arguments and wait for them to resolve
    // asynchronously.  Callback with the complete list of 
    // resolved arguments.
    function getDataForArguments(inputs, callback) {
        
        var waiting = 0;
        var args  = [];
        var queue = [];
        var calledBack = false;
        
        function checkStatus() {
            if (waiting > 0) {
                // If we're still waiting on some arguments, don't do anything.
                return;
            }
            if (!calledBack) {
                // If we haven't already called back, then do so now.
                callback(args);
            }
            // Remember that we've run our callbacks so we don't do it a second time.
            calledBack = true;
        }
        
        // Grab all our arguments and wait for them to callback with data.
        inputs.forEach(function(arg, i) {
            if (arg && typeof arg.then === 'function') {
                // The argument is an asynchronous Promise that hasn't resolved yet.
                // We'll add it to our queue so we can attach its callback and wait.
                waiting++;
                queue[i] = arg;
            } else {
                // The argument is a normal synchronous argument.
                // We don't have to wait, so we'll add it to our arguments list.
                args[i]  = arg;
            }
        });
        
        queue.forEach(function(item, i) {
            // Attach our callback to our promise that will place the returned
            // data into our resolved arguments list.
            item.then(function(data) {
                waiting--;
                args[i] = data;
                checkStatus();
            });
        });
        
        // Check to see if we're done.
        checkStatus();
    }

    // Instead of returning our normal function, we return an anonymous wrapper
    // function that resolves its asynchronous arguments and executes a
    // callback when everything's ready.
    return function() {
        var callbacks = [];
        var args = Array.prototype.slice.call(arguments);

        getDataForArguments(args, function(resolvedArgs) {
            // Run our inner function once all the arguments are ready.
            var result = inner.apply(null, resolvedArgs);
            callbacks.forEach(function(callback) {
                callback(result);
            });
        });
        // Return a then-able Promise-like object that attaches callbacks to the
        // callbacks list.
        return {
            then: function(fun) {
                callbacks.push(fun);
            }
        };
    };
}

We can now rewrite our asynchronous concat example to something like this:

var concatAsync = pseudosynchronousWrapper(concat);
concatAsync(read('file1.txt'), read('file2.txt'), read('file3.txt'), 'efgh').then(console.log);
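Here, read stands in for any function that starts an asynchronous file read and immediately returns a then-able. A minimal sketch of such a helper, faking the I/O with setTimeout (the name and fake file contents are assumptions, not a real API):

```javascript
// Hypothetical read() helper: starts a fake asynchronous "file read" and
// immediately returns a minimal then-able that the wrapper can wait on.
function read(path) {
    var callbacks = [];
    setTimeout(function() {
        // Stand-in for real file I/O.
        var data = "<contents of " + path + ">";
        callbacks.forEach(function(callback) {
            callback(data);
        });
    }, 10);
    return {
        then: function(callback) {
            callbacks.push(callback);
        }
    };
}
```

In real code this would wrap something like Node’s fs.readFile; the only thing that matters here is that each argument exposes a then method for the wrapper to hook into.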

Pseudosynchronous API chaining

Now that we have our input squared away, what about our output?

Let’s say we have a database client with an asynchronous API. We want to query the database for an existing user, and create a user if none already exists. This involves initializing several interfaces and using those interfaces to do additional work, such as loading a users collection or querying on a set of parameters. If the user can’t be created, we want to be presented with an error. Otherwise, we want to be able to access the data object of the new user.

For simplicity’s sake, we’ll be using the traditional Node.js (e, data) callback signature for our function. Here’s an example usage of the function we’re trying to create:

createUser('Trogdor', function(e, user) {
    if (e) {
        console.log("Sorry, there was an error: " + e);
    } else {
        console.log("User created!  New user ID is " + user.get('id'));
    }
});

Synchronous

In an ideal world, we’d make our code as close to the business logic as possible. We could write things linearly and wait until each operation completes before continuing on to the next operation.

function createUser(username, callback) {
    var connection = DatabaseClient.connect();

    var users = connection.collection('users');
    var existing = users.find({username: username});

    if (existing) {
        callback("User already exists: " + username);
    } else {
        var newUser = users.create({username: username});
        callback(null, newUser);
    }
    connection.close();
}

Traditional asynchronous code

The world is not ideal and, as we’ve already established, our database interfaces are all asynchronous. We’ll have to create a lot of nested callbacks in our code and ensure that we handle all of our errors properly.

function createUser(username, callback) {
    DatabaseClient.connect(function (e, connection) {
        function closeAndCallback(e,data) {
            connection.close();
            callback(e,data);
        }

        if (e) {
            return callback(e);
        }

        connection.getCollection('users', function(e, users) {
            if (e) {
                return closeAndCallback(e);
            }

            users.find({username: username}, function(e, existing) {
                if (e) {
                    return closeAndCallback(e);
                }

                if (existing) {
                    return closeAndCallback("User " + username + " already exists.");
                } else {
                    users.create({username: username}, function(e, data) {
                        closeAndCallback(e, data);
                    });
                }
            });
        });
    });
}

This structure is known in the JavaScript community as the Pyramid of Doom.

Promises

Promises are one solution to the problem. By using them, we can write much cleaner code by taking advantage of the fact that errors will bubble up to a top level, which means we don’t have to check for errors after every operation.

The code becomes a lot more streamlined than the callback version, especially when it comes to error bubbling and ensuring that the connection is always closed, but it’s still not exactly synchronous. We still have to create a callback for each level of the execution chain.

function createUser(username, callback) {
    return DatabaseClient.connect().then(function(connection) {
        return connection.collection('users').then(function(users) {
            return users.query({username: username}).then(function(existing) {
                if (existing) {
                    callback("User already exists: " + username);
                } else {
                    return users.create({username: username}).then(function (user) {
                        callback(null, user);
                    });
                }
            });
        }).fin(connection.close);
    }).catch(callback);
}

We could, of course, abstract our functions and place them at the same top level so our code feels less Pyramid-of-Doomey, but it’s still not the ideal solution, and forces us to create a lot of very tiny functions that are specific to one particular operation and never used again.
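One way to flatten the chain looks something like the sketch below, written with native Promises (whose .finally plays the role of .fin above); the DatabaseClient stub is invented so the snippet is self-contained and runnable, not part of any real driver:

```javascript
// Stub client so the sketch runs standalone: every operation resolves
// immediately with canned data.
var DatabaseClient = {
    connect: function() {
        return Promise.resolve({
            collection: function(name) {
                return Promise.resolve({
                    query:  function(params) { return Promise.resolve(null); },
                    create: function(params) { return Promise.resolve({username: params.username, id: 1}); }
                });
            },
            close: function() { console.log("Closing connection."); }
        });
    }
};

function createUser(username, callback) {
    var connection;
    return DatabaseClient.connect().then(function(conn) {
        connection = conn;
        return connection.collection('users');
    }).then(function(users) {
        return users.query({username: username}).then(function(existing) {
            if (existing) {
                throw new Error("User already exists: " + username);
            }
            return users.create({username: username});
        });
    }).then(function(user) {
        callback(null, user);
    }, callback).finally(function() {
        if (connection) connection.close();
    });
}

createUser('Trogdor', function(e, user) {
    if (e) { console.log("Sorry, there was an error: " + e.message); }
    else   { console.log("User created! New user ID is " + user.id); }
});
```

The chain is mostly flat, but notice we still need one nested then to keep users in scope for the create call.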

Proxy interfaces for asynchronous calls

Wouldn’t it be cool if we could call the methods of returned APIs even before they’re initialized? This works as long as we don’t need to do operations that are based on the actual results of these calls.

For example, if we had such an API, we could rewrite our user query to the following:

var db = Database.connect();
var users = db.collection('users');
var user  = users.find({username: 'Trogdor'});

Under the hood, Database.connect (still asynchronous) returns a proxy interface which queues calls made to its members. Once the actual Database interface is ready, the queued operations are executed.

If we were to write out exactly what’s going on, it might look something like this:

function DatabaseClient(callback) {

    var self  = this;
    var queue = [];
    var ready = false;

    console.log("Connecting to database.");

    function run(method, args) {
        console.log("Executing operation: DatabaseClient." + method);
        self[method].apply(self, args);
    }

    setTimeout(function() {
        console.log("Connected to database.");
        if (callback) callback(self);
        // Execute every method in our queue.
        queue.forEach(function(call) {
            var method = call[0];
            var args   = call[1];
            run(method, args);
        });
        // Put our proxy into the ready state so we know to immediately
        // execute things rather than add them to our queue.
        ready = true;
        queue = [];
    }, 100);

    return {
        collection: function() {
            if (ready) {
                // If we are ready, call immediately.
                self.collection.apply(self, arguments);
            } else {
                // If we're not ready yet, queue all calls.
                console.log("Queuing operation: DatabaseClient.collection");
                queue.push(['collection', arguments]);
            }
        },

        close: function() {
            if (ready) {
                // If we are ready, call immediately.
                self.close.apply(self, arguments);
            } else {
                // If we're not ready yet, queue all calls.
                console.log("Queuing operation: DatabaseClient.close");
                queue.push(['close', arguments]);
            }
        }
    };
}

With this code in place, we can write our code for the DatabaseClient prototype as we normally would.

DatabaseClient.prototype = {
    collection: function(collectionName, callback) {
        console.log("Loading collection: " + collectionName);
    },
    close: function() {
        console.log("Closing database connection.");
    }
};

Now we can use our instance of DatabaseClient before the data is actually ready, freeing us from having to use callbacks. That means we can write our code like this:

var db = new DatabaseClient();
db.collection('users');
db.close();

The above code would output:

Connecting to database.
Queuing operation: DatabaseClient.collection
Queuing operation: DatabaseClient.close
Connected to database.
Executing operation: DatabaseClient.collection
Loading collection: users
Executing operation: DatabaseClient.close
Closing database connection.

Neat, right? The only problem with this so far (as you may have noticed) is that you can’t actually chain calls to proxy interfaces.

For example, if we tried to continue on in this fashion and write our code like this:

var db = new DatabaseClient();
var users = db.collection('users');
var user  = users.find({username: 'Trogdor'}); 
db.close();

We’d end up with an error:

TypeError: Cannot call method 'find' of undefined

D’oh!

Recursive queues

We can fix this by using subqueues, and by providing dummy interfaces.

First, let’s clean up our code a bit by abstracting our proxy queues, so we can reuse the logic in other classes.

function ProxyQueue(real) {

    var queued = [];
    var ready  = false;

    return {
        queue: function(method, args) {
            console.log("Queuing operation: " + method);
            queued.push([method, args]);
            if (ready) {
                this.trigger();
            }
        },
        trigger: function(bind) {
            real = real || bind;
            var next = queued.shift();
            while(next) {
                var method = next[0];
                var args   = next[1];
                console.log("Executing queued operation: " + method);
                real[method].apply(real, args);
                next = queued.shift();
            }
            ready = true;
        }
    };
}
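To see the queue’s behavior in isolation, here it is driving a plain object that isn’t bound until we trigger it. The definition is repeated (minus the logging) so the snippet runs standalone; backend and greet are made-up names for illustration:

```javascript
// ProxyQueue repeated here, logging omitted, so the demo is self-contained.
function ProxyQueue(real) {
    var queued = [];
    var ready  = false;
    return {
        queue: function(method, args) {
            queued.push([method, args]);
            if (ready) {
                this.trigger();
            }
        },
        trigger: function(bind) {
            real = real || bind;
            var next = queued.shift();
            while (next) {
                real[next[0]].apply(real, next[1]);
                next = queued.shift();
            }
            ready = true;
        }
    };
}

// A toy target object standing in for a slow-loading backend.
var log = [];
var backend = {
    greet: function(name) { log.push("hello " + name); }
};

var proxy = ProxyQueue();        // no real object available yet
proxy.queue('greet', ['alice']); // queued: nothing runs
proxy.trigger(backend);          // bind the real object and flush the queue
proxy.queue('greet', ['bob']);   // we're ready now, so this runs immediately
// log is now ["hello alice", "hello bob"]
```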

Not only does this allow for code reuse, but it makes our constructor code far more straightforward.

function DatabaseClient() {

    console.log("Connecting to database.");

    var self   = this;
    self.connected = false;
    self.proxy = new ProxyQueue(this);

    setTimeout(function() {
        console.log("Connected to database.");
        self.connected = true;
        self.proxy.trigger();
    }, 100);

    return this.getInterface();
}

By pulling our interface out of the constructor method, we make it possible to return pseudosynchronous Database objects from other queued methods.

DatabaseClient.prototype = {
    // [...]
    getInterface: function() {
        var proxy = this.proxy || new ProxyQueue();
        return {
            collection: function(name) {
                var subproxy = Collection.prototype.getInterface();
                proxy.queue('collection', [name, subproxy]);
                return subproxy;
            },
            close: function() {
                proxy.queue('close', arguments);
            },
            trigger: proxy.trigger
        };
    }
};

We can see the benefits of this level of abstraction by implementing a pseudosynchronous Collection API.

function Collection(collectionName, database, subproxy) {

    console.log("Loading " + collectionName + " collection.");

    this.name = collectionName;

    var self   = this;
    self.database = database;
    self.proxy    = subproxy || new ProxyQueue(this);

    setTimeout(function() {
        console.log("Collection loaded.");
        self.proxy.trigger(self);
    }, 100);

    return this.getInterface();
}

Collection.prototype = {
    find: function(params, callback) {
        if (!this.database.connected) {
            return console.log("ERROR: Database is closed.");
        }
        console.log("Searching collection: " + this.name);
        setTimeout(function() {
            callback({username: 'Trogdor', id: 1});
        }, 100);
    },
    getInterface: function() {
        var proxy = this.proxy || new ProxyQueue();
        return {
            find: function() {
                proxy.queue('find', arguments);
            },
            trigger: proxy.trigger
        };
    }
};

All we have to do now is make sure that DatabaseClient.collection returns a Collection interface immediately, and then triggers the queue once the Collection is ready.

We can do this by updating the DatabaseClient.prototype to return a subproxy with the Collection interface when DatabaseClient.collection is called, and then attaching that subproxy to the Collection object that eventually ends up being created.

DatabaseClient.prototype = {
    collection: function(collectionName, subproxy) {
        new Collection(collectionName, this, subproxy);
    },
    // [...]
};

Now, our Pyramid of Doom has been compressed like so:

var db = new DatabaseClient();
var users = db.collection('users');
users.find({username: 'Trogdor'}, function(data) {
    console.log("User ID is " + data.id);
});
db.close();

Except that we have one problem with our output:

Connecting to database.
Queuing operation: collection
Queuing operation: find
Queuing operation: close
Connected to database.
Executing queued operation: collection
Loading users collection.
Executing queued operation: close
Closing database connection.
Collection loaded.
Executing queued operation: find
ERROR: Database is closed.

To block or not to block?

Since our queues are running as soon as their prerequisites are met, it means we have a race condition between users.find and db.close.

This leaves us with a decision to make. We can either stop the execution of some queued calls until previous queued calls have finished, or we can execute everything as quickly as possible. One could argue that the former approach would be very close to “blocking”, and not as efficient as the latter, which simply ensures that all prerequisites are met before continuing a given operation.

This would be a great thing to add as an API configuration option if you were going to build a framework around these concepts. For now, I’m going to go with executing everything as soon as its prerequisites are met.
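For completeness, the blocking variant mentioned above might be sketched like this: queued calls run one at a time, and each must report completion before the next starts. SequentialQueue and its done-callback convention are assumptions of this sketch, not part of the code above:

```javascript
// Sketch of a "blocking" queue: each queued call runs only after the
// previous one reports completion through a done() callback.
function SequentialQueue(real) {
    var queued  = [];
    var running = false;

    function next() {
        var call = queued.shift();
        if (!call) {
            running = false;
            return;
        }
        running = true;
        var args = call[1].slice();
        // Assumption: every method accepts a completion callback as its
        // last argument, which advances the queue.
        args.push(next);
        real[call[0]].apply(real, args);
    }

    return {
        queue: function(method, args) {
            queued.push([method, Array.prototype.slice.call(args)]);
            if (!running) {
                next();
            }
        }
    };
}

// Usage: close is queued while find is in flight, and only runs afterwards.
var order = [];
var backend = {
    find: function(params, done) {
        setTimeout(function() { order.push('find'); done(); }, 20);
    },
    close: function(done) {
        order.push('close');
        done();
    }
};

var q = SequentialQueue(backend);
q.queue('find',  [{username: 'Trogdor'}]);
q.queue('close', []);
```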

If we want to ensure that the database is closed only after we’re done loading our collections, we can place that code within our callback.

var db = new DatabaseClient();
var users = db.collection('users');
users.find({username: 'Trogdor'}, function(data) {
    console.log("User ID is " + data.id);
    db.close();
});

Which should log:

Connecting to database.
Queuing operation: collection
Queuing operation: find
Connected to database.
Executing queued operation: collection
Loading users collection.
Collection loaded.
Executing queued operation: find
Searching collection: users
User ID is 1
Queuing operation: close
Executing queued operation: close
Closing database connection.

Taking this further

The great thing about this technique is that it can compose really well with existing promise frameworks. In fact, by using an existing promise framework, a lot of the code examples in this article can be simplified.
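For instance (a sketch assuming a runtime with native Promises; makeLazyClient and the backend shape are invented for illustration), a proxy method can queue against a not-yet-connected backend while handing the caller a real Promise, so every promise-library feature comes along for free:

```javascript
// Sketch: a proxy whose methods queue work against a backend that hasn't
// connected yet, but return native Promises to the caller.
function makeLazyClient(connect) {
    var queued = [];
    var real   = null;

    connect(function(backend) {
        real = backend;
        queued.forEach(function(run) { run(real); });
        queued = [];
    });

    return {
        find: function(params) {
            return new Promise(function(resolve) {
                function run(backend) { resolve(backend.find(params)); }
                if (real) { run(real); } else { queued.push(run); }
            });
        }
    };
}

// Usage with a fake backend that "connects" after a short delay.
var client = makeLazyClient(function(ready) {
    setTimeout(function() {
        ready({ find: function(params) { return {username: params.username, id: 1}; } });
    }, 10);
});

client.find({username: 'Trogdor'}).then(function(user) {
    console.log("User ID is " + user.id);
});
```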