The language of choice

Propositional logic was discovered by Stoics around 300 B.C., only to be abandoned in later antiquity and rebuilt in the 19th century by George Boole’s successors. One of them, Charles Peirce, saw its significance for what we now call logic circuits, yet that discovery too was forgotten until the 1930s. In the ’50s John McCarthy invented conditional expressions, casting the logic into the form we’ll study here; then in 1986 Randal Bryant repeated one of McCarthy’s constructions with a crucial tweak that made his report “for many years the most cited paper in all of computer science, because it revolutionized the data structures used to represent Boolean functions” (Knuth).¹ Let’s explore and code up some of this heritage of millennia, and bring it to bear on a suitable challenge: playing tic-tac-toe.

Then we’ll tackle a task that’s a little more practical: verifying a carry-lookahead adder circuit. Supposedly logic gets used all the time for all kinds of serious work, but for such you’ll have to consult the serious authors; what I can say myself, from working out the code to follow, is that the subject offers a fun playground plus the most primitive form of the pun between meaning and mechanism.

You’re encouraged to read with this article’s code cloned and ready to run, to try varying the examples, tracing their execution, or messing with the code and seeing how it breaks. Literate code ought to afford a more active style of reading: we’re not limited to the affordances of paper.²

Decision trees and nets

We’re going to make a programming language– but a tiny one with nothing but constants, variables, and if/else expressions. The tininess will help us to analyze and transform its expressions more aggressively than we’re used to doing for code in, for instance, Python.

Code in such a language– let’s call it the choice language– might look like

B if A else 0

but instead we’ll write this same meaning as A(0, B), for reasons: it’s shorter, it generalizes to multiway decisions like A(X, Y, Z) (selecting among three options for A’s value instead of two)³, and it’ll spare us from writing a parser, because A(0, B) is a Python expression that can be made to build an object representing the choice-language expression.⁴

Informally, p(q, r) will mean “Look at p and use its value to choose between q and r: for a value of 0, return q’s value; but for 1, return r’s value instead.” Thus A(0, B) will be 0 when A is 0, or else whatever B is, when A is 1. (Note that Python’s B if A else 0 reverses the order of the arguments: it has the ‘truthy’ case first.)

Building and evaluating expressions

Let’s code this. To be even lazier about not parsing any input, we’ll make the user write const0 for 0:

>>> A, B = Variable('A'), Variable('B')
>>> const0 = Constant(0)
>>> A(const0, B)
A(0, B)

Expressions have meaning in how they respond to evaluate:

>>> A.evaluate({A: 1})
1
>>> const0.evaluate({A: 1})
0

(A, in a context where A is 1, evaluates to 1; but const0 evaluates to 0.)

>>> A(const0, B).evaluate({A: 1, B: 0})
0
>>> A(const0, B).evaluate({A: 1, B: 1})
1

(So the value of A(const0, B), in a context where A is 1, is B’s value.)

class Node(object):
    "A decision-tree node."
    def evaluate(self, env):
        "My value when variables are set according to env, a dictionary."
        abstract
    def __call__(self, if0, if1):
        """Return an expression whose value is if0's or if1's,
        according as self's is 0 or 1."""
        return Choice(self, if0, if1)

class ConstantNode(Node):
    def __init__(self, value): self.value = value
    def __repr__(self):        return repr(self.value)
    def evaluate(self, env):   return self.value

class VariableNode(Node):
    def __init__(self, name):  self.name = name
    def __repr__(self):        return self.name
    def evaluate(self, env):   return env[self]

class ChoiceNode(Node):
    def __init__(self, index, if0, if1):
        self.index, self.if0, self.if1 = index, if0, if1
    def __repr__(self):
        return '%r(%r, %r)' % (self.index, self.if0, self.if1)
    def evaluate(self, env):
        branch = (self.if0, self.if1)[self.index.evaluate(env)]
        return branch.evaluate(env)

We won’t call on the classes directly; we’ll use these names instead:

Variable = VariableNode

Choice = ChoiceNode

from utils import memoize

Constant = memoize(ConstantNode)
const0, const1 = Constant(0), Constant(1)

and later redefine them to introduce optimizations. The first optimization appears here already for Constant: memoizing it, meaning that if you call Constant(0) twice you’ll get the same object both times. This saves us from having to override equality and hashing on ConstantNodes. The memoize function is overkill for this purpose, but we’re going to use it for more later.⁵

def memoize(f):
    """Return a function like f but caching its results. Its arguments
    must be hashable."""
    def memoized(*args):
        try: return memoized._memos[args]
        except KeyError:
            result = memoized._memos[args] = f(*args)
            return result
    memoized._memos = {}
    return memoized

From table to tree

We can re-express a data table in our new language. For instance,

A B   Answer
----+-------
0 0 |   0
0 1 |   0
1 0 |   0
1 1 |   1

can be encoded as

>>> Answer = A(const0, B)

Let’s automate the translation. We might express the table like “the first row’s answer if the variables match the first row else express the rest of the table”– in this case, 0 if A==0 and B==0 else (the remaining three lines). Build the equality test like so:

def match(variables, values):
    """Return an expression that evaluates to 1 just when every variable
    has the corresponding given value; 0 otherwise."""
    do_they_match = const1
    for variable, value in zip(variables, values):
        if   value == 0: do_they_match = variable(do_they_match, const0)
        elif value == 1: do_they_match = variable(const0, do_they_match)
    return do_they_match

But we’d get a much larger expression than the one we wrote by hand. Just to match the first row, it’s:

>>> match((A, B), (0, 0))
B(A(1, 0), 0)

(In words, “if B is 1, then 0; else if A is 1, then 0; else 1 for a match”.)

The whole table would need more than four times the size of this expression. Adding more variables makes it exponentially worse– here’s a three-variable table with eight entries:

>>> majority = {(0,0,0): 0,  (1,1,1): 1,
...             (0,0,1): 0,  (1,1,0): 1,
...             (0,1,0): 0,  (1,0,1): 1,
...             (1,0,0): 0,  (0,1,1): 1}

How can we do better?

One approach is called peephole optimization. Certain simple expressions could be simpler, such as A(0, 1): it has the same value as just A. Likewise, A(B, B) is equivalent to B. We could walk through all the subexpressions of an expression, looking for these patterns; it’s easier, though, to notice them at the time we build the tree:

def Choice(index, if0, if1):
    if if0 == if1:                            # A(B, B) --> B
        return if0
    elif (if0, if1) == (const0, const1):      # A(0, 1) --> A
        return index
    else:
        return ChoiceNode(index, if0, if1)

After this change, our table would translate into the same simple expression we started with– but mostly by luck. majority still becomes monstrous.

Splitting in two: the Boole expansion

In 1847 George Boole introduced divide-and-conquer: turn the table into “if first variable then subtable where that variable is 1 else subtable where it’s 0”.⁶⁷

def express(variables, table):
    """Return an expression e such that, for each key in table,
    e.evaluate(dict(variables, key)) == table[key]."""
    if not table:
        return const0      # Or 1, doesn't matter.
    elif len(table) == 1:
        return Constant(table.values()[0])
    else:
        first_var, rest_vars = variables[0], variables[1:]
        return first_var(express(rest_vars, subst_for_first_var(table, 0)),
                         express(rest_vars, subst_for_first_var(table, 1)))

def subst_for_first_var(table, first_var_value):
    """Return a subtable of table, eliminating the first variable and
    supposing it'll take the given value."""
    return {keys[1:]: output
            for keys, output in table.items() if keys[0] == first_var_value}

Now, by aiming for a balanced tree and perhaps by giving the peephole optimization more to work with, when we ask for majority we get back respectable code:

>>> C = Variable('C')
>>> express((A,B,C), majority)
A(B(0, C), B(C, 1))

To be grand, you could say we’ve started to learn or synthesize programs from sample inputs and outputs. In fact this method is popular in machine learning using fancier alternatives for return const0 and for picking first_var. But we’ll take a different tack to improvement: eliminating common subexpressions.

The parity function– which is 1 just when the input has an odd number of ones, so it really should’ve been named the oddity function– cannot be expressed as a small tree. It’s easy to make a big one:

>>> parity = {(0,0,0,0): 0,  (0,0,0,1): 1,  (1,0,0,1): 0,  (1,0,0,0): 1,
...           (0,0,1,1): 0,  (0,0,1,0): 1,  (1,0,1,0): 0,  (1,0,1,1): 1,
...           (0,1,0,1): 0,  (0,1,0,0): 1,  (1,1,0,0): 0,  (1,1,0,1): 1,
...           (0,1,1,0): 0,  (0,1,1,1): 1,  (1,1,1,1): 0,  (1,1,1,0): 1}
>>> express((A,B,C,D), parity)
A(B(C(D, D(1, 0)), C(D(1, 0), D)), B(C(D(1, 0), D), C(D, D(1, 0))))

Here D(1, 0) appears 4 times. Also appearing twice are C(D, D(1, 0)) and the same with reversed arguments. If we wrote down the parity function for more variables, the table and the tree would both grow exponentially, and so would the opportunities for sharing.

We already saw how to coalesce duplicates when we memoized ConstantNode. Memoizing ChoiceNode turns our tree into a directed acyclic graph (DAG).

>>> ChoiceNode = memoize(ChoiceNode)
>>> len(ChoiceNode._memos)
0
>>> express((A,B,C,D), parity)
# <same giant-ass expression as before, with 11 pairs of parentheses>
>>> len(ChoiceNode._memos)
6

The same expression now is built of only 6 unique Choice nodes. In general, the parity function over n variables needs a number of nodes linear in n, as a DAG, but exponential as a tree. (Claude Shannon proved a generalization of this result in his master’s thesis, planting seeds of the techniques we’ll look at later.)

Graphs like these are called binary decision diagrams (BDDs). The term can mean any graph of choice nodes that branch only on variables, but usually people restrict it to graphs that could have been built by our express function: ones that only test variables in a consistent order (such as A, B, C, then D); have no nodes like A(0, 1) or A(B, B) as eliminated by Choice, above; and have no redundant nodes as would be coalesced by memoize.

A tic-tac-toe machine

Daniel Hillis and Brian Silverman made a couple of different tic-tac-toe players in Tinkertoys. Let’s try and design one too, down to the level of choice gates, if not actual Tinkertoys. (When thinking of the data structure reduced to hardware, we call a node a gate.)

tictactoe.py is my player as a game-tree search function. We’ll apply it to materialize the whole game tree– it’s pretty small!– into a table mapping game boards to each one’s successor (meaning the board you get after the chosen move); then we’ll crunch the table into a network of choice gates.

We need to understand a few exports from tictactoe.py: a tic-tac-toe board is called a grid, a game starts with empty_grid, successors(grid) finds the possible next moves and performs them (generating the resulting next grids), and max_play(grid) picks the ‘best’ successor for the player to move– so max_play is our game AI. A grid is represented as a pair (p,q) of 9-bit integers, p for the player to move and q for their opponent. Each bit tells whether a mark fills a particular square on the board; bit 0 denotes a mark in the lower right corner, bit 1 is for the square to the left of it, and so on:

  8 | 7 | 6       X | O |          1 0 0         0 1 0
 ---+---+---     ---+---+---
  5 | 4 | 3         | O |          0 0 0         0 1 0
 ---+---+---     ---+---+---
  2 | 1 | 0         |   | X        0 0 1         0 0 0

Bit positions      A board         p bits        q bits
                                (X to move)  (O just moved)

Play out every possible game between human as X and max_play as O: when it’s X’s turn, visit every successor, but when it’s O’s, visit only the one picked by max_play, recording it in all_responses.

import logic as E; E.ChoiceNode = E.memoize(E.ChoiceNode)
import tictactoe as T

def exhaust(human_to_move):
    for AI_to_move in T.successors(human_to_move):
        if T.successors(AI_to_move):
            all_responses[AI_to_move] = response = T.max_play(AI_to_move)
            exhaust(response)
        else:
            all_responses[AI_to_move] = tuple(reversed(AI_to_move))

all_responses = {}
exhaust(T.empty_grid)

(If you’re wondering about tuple(reversed(AI_to_move)), that serves as a ‘game over’ state that still shows the game board– it’d be impolite to wipe the display before the human even sees their last move. The reversal maintains the mapping from p and q to X and O.)

>>> len(all_responses)
427

For a machine made out of binary gates, the most natural representation is again 18 bits, 9 each for X’s and O’s. We can represent the player’s move in the 9 X bits, and compute the next 9 O bits from the current 18 X’s and O’s.

 O_tables = [{grid: (Os2>>square)&1
              for grid,(_,Os2) in all_responses.items()}
             for square in range(9)]

(for grid,(_,Os2) in all_responses.items() generates the grid before each tabulated move, along with Os2: the configuration of O’s after the O player moves in response. Then grid: (Os2>>square)&1 maps the input grid to the output 0 or 1 marking an O in a particular tic-tac-toe square.)

These aren’t quite tables of the sort our express function eats, yet: the keys are pairs of 9-bit ints, not the required tuples of bits. We must convert them:

squares = range(9)
def combine(xs, os): return tuple(xs) + tuple(os)

XO_variables = combine((E.Variable('x%d' % k) for k in squares),
                       (E.Variable('o%d' % k) for k in squares))

def XO_values((Os,Xs)):
    return combine(((Xs>>k)&1 for k in squares),
                   ((Os>>k)&1 for k in squares))

O_nodes = [E.express(XO_variables, {XO_values(grid): value
                                    for grid, value in table.items()})
           for table in O_tables]

Remember that memoizing ChoiceNode, above, got us a BDD instead of a tree. Assuming we cleared the memo table first (with ChoiceNode._memos.clear()) we end up with 1236 choice gates (2985 without memoization):

>>> len(ChoiceNode._memos)
1236

This can be improved by a heuristic: choices that ‘matter the most’ should get tested first. In tic-tac-toe that’s the center square most of all, then the corners, then the edges. (This ordering of variables in the BDD has nothing to do with the function computed by it: that is, the moves the AI picks. An AI playing to lose would likely still be more compact if encoded in a BDD that looked at the center square first.) Interleave the X bits with the O bits to treat both center bits first, then all corner bits, and so on:

squares = (4, 0,2,6,8,  7,1,3,5)
def combine(xs, os): return sum(zip(xs, os), ())

These tweaks bring us down to 956 gates. (The odd order 7,1,3,5 puts the first edge next to the last corner. I can’t say why, so far as I’ve seen, other small tweaks worsen this.)

Our express function works best on exhaustive tables, where every possible key has a value. The O_tables are not exhaustive; tweaking express to start to exploit don’t-cares gets us down to 556 gates:

def express(variables, table):
    if not table:
        return const0      # (Can you be cleverer here?)
    elif len(table) == 1:
        return Constant(table.values()[0])
    else:
        first_var, rest_vars = variables[0], variables[1:]
        # The new logic: if first_var takes on only one value
        # in the table...
        domain = set(row[0] for row in table.keys())
        if len(domain) == 1:
            # ...then just assume it'll take that value in the input.
            value = next(iter(domain))
            return express(rest_vars, subst_for_first_var(table, value))
        # Otherwise as before.
        return first_var(express(rest_vars, subst_for_first_var(table, 0)),
                         express(rest_vars, subst_for_first_var(table, 1)))

What if we could use three-way Choice gates? Encode a grid square as 0/1/2 for blank/X/O, and we’d need only half as many variables, and end up reducing the 556 gates to 379. Would 379 ternary gates cost less than 556 binary ones? It seems plausible, for instance as switches for falling-marble tracks. (What, you think a mechanical tic-tac-toe player implausible?)

In a 6th-grade science fair Steve Wozniak did much better: “about 100 transistors and about 100 diodes.” How? Beats me!

Tables, schmables

We’ve seen some ways for choice trees and choice graphs to compress data tables. But when we start from tables, they can only grow so big before exhausting our space and patience. What if we skip the table building and go straight to the choice structures?

When the (non-)table in question is meant to express some logical claim, like “if you’re outside and it’s dark, it’s night”, we can draw on familiar language: NOT, AND, OR, 0/1 for false/true, and so on.

# The Python operators ~, &, |, ^:
Node.__invert__ = lambda self:        self(const1, const0)
Node.__and__    = lambda self, other: self(const0, other)
Node.__or__     = lambda self, other: self(other, const1)
Node.__xor__    = lambda self, other: self(other, ~other)

def Equiv(p, q):   return p(~q, q)
def Implies(p, q): return p(const1, q)

Thus:

>>> inside, dark, night = map(Variable, 'idn')
>>> claim = Implies(~inside & dark, night)
>>> claim.evaluate({dark: 1, inside: 1, night: 0})
1

With these we can express functions in many variables, so many that a tabulation would overflow the visible universe.

Exploring all possible worlds

We just evaluated a claim which turned out to be true in a particular world. We might ask: is it true no matter what? (That is, however the variables are set, provided they’re each 0 or 1.) Or likewise: find me a world where it’s true.

To find it given a table, we’d loop over table.items(). For a choice tree claim, we might loop in the same way over all possible values for its variables:

((values, claim.evaluate(dict(zip(variables, values))))
 for values in itertools.product(*((0,1),) * len(variables)))

(This method is called truth-table enumeration.) But we’re assuming now that this virtual table would get impractically big; to do better we must exploit the structure of the choice tree. Let’s try a goal-directed walk, skipping those possibilities that we can rule out early:

def is_valid(claim):
    # A claim true in all worlds is false in none of them.
    return satisfy(claim, 0) is None

def satisfy(tree, goal):
    """Find a (partial) env that makes tree.evaluate(env) == goal,
    else None."""
    return next(tree.find(goal, {}),
                None)

class Node(object):
    # ...
    def find(self, goal, env):
        """Generate (partial) environments, extending `env`, in which
        I evaluate to `goal`."""
        abstract

class ConstantNode(Node):
    # ...
    def find(self, goal, env):
        if self.value == goal:
            yield env

class VariableNode(Node):
    # ...
    def find(self, goal, env):
        if self not in env:
            yield extend(env, self, goal)
        elif env[self] == goal:
            yield env

class ChoiceNode(Node):
    # ...
    def find(self, goal, env):
        # Succeed where the index evaluates to 0, and if0 to the goal...
        for env1 in self.index.find(0, env):
            for env2 in self.if0.find(goal, env1):
                yield env2
        # ... or likewise for index 1 and if1.
        for env1 in self.index.find(1, env):
            for env2 in self.if1.find(goal, env1):
                yield env2

def extend(env, var, value):
    "A copy of env but mapping var to value."
    result = dict(env)
    result[var] = value
    return result

This search may take time exponential in the depth of the choice expression– which can be worse than enumerating the truth table (exponential in the number of variables), but it can also be much better: the expression may be shallower, or branches of the search may be cut off early.

>>> x, y = map(Variable, 'xy')
>>> satisfy(~x&~x, 1)
{x: 0}
>>> is_valid(Implies(Implies(Implies(x, y), x), x))
True
>>> is_valid(Implies(Implies(Implies(x, y), x), y))
False

A pen-and-paper version of this algorithm, with special cases for AND, OR, and the rest, is taught as the method of analytic tableaux. Since that form of the method leaves you a choice of which subexpression to target next, a smart choice might streamline the search. (For instance, a & b can be treated like either a(0, b) or b(0, a); if a looks ‘less branchy’ than b– it’s a smaller expression, say, or uses variables we’ve already set– then the search is likely to go faster with a first.)⁸

An algebra of choice

We’ve gained a new way to check whether two claims are always equal: is_valid(Equiv(claim1, claim2)). Another way uses algebra, following John McCarthy’s rules:

Selection:

const0(a, b) == a
const1(a, b) == b

Case analysis and irrelevance:

p(const0, const1) == p
          p(a, a) == a

The same p twice collapses into one (and it’s associative):

p(a, p(b, c)) == p(a, c)
p(p(a, b), c) == p(a, c)

Nesting:

q(p, r)(a, b) == q(p(a, b), r(a, b))

Swapping:

q(p(a, b), p(c, d)) == p(q(a, c), q(b, d))

McCarthy’s notation had (p -> r; q) for our p(q, r). An infix version like Haskell’s, q `p` r, is rather pretty, more like familiar algebra: so the associative law becomes

a `p` (b `p` c) = a `p` c = (a `p` b) `p` c

These laws together let you standardize choice trees: take any two trees, rewrite each by the method below, and then the resulting trees will be identical to each other just when the original trees are equal in meaning.

How? Massage the tree into a mirror of the Boole expansion of the corresponding table, like this: First eliminate any nested tests– subexpressions like q(p, r) in q(p, r)(a, b)– using the nesting rule. Then enforce a standard order for the variables, like the order of variables passed to express() in examples above. Aim to move the first-ordered variable to the root of the tree (lifting it using the swapping law). Repeat for each subtree, using the other laws to eliminate repeated variables, and also any constants from the test position of any subtrees.

For example, suppose we’ve decided on alphabetical order for the variables. b(a, c) is partly in standard form, having no nested tests, but the variables are out of order. We can fix that:

b(a, c) = b(a(0, 1), a(c, c))   # By case analysis and irrelevance
        = a(b(0, c), b(1, c))   # By swapping

This produces the same expression as if we’d run express((a,b,c), table) on a tabulation of b(a, c).

For more practice in the algebra, try proving the distributive law:

a `p` (b `q` c) = (a `p` b) `q` (a `p` c)

Systematic succinctness

A BDD (the usual, restricted kind) is McCarthy’s standardized tree, made into a DAG by memoizing. We’ve built some already from tables, but we can do more: given expressions in this form, we can compose them into more complex expressions in the same form, efficiently, by applying McCarthy’s rewrite rules only so far as they’re needed.

Let’s start our code over from scratch. The changes:

Since a choice node will now always be indexed by a variable (since we eliminate nested tests), make ChoiceNode hold the variable itself. We can drop VariableNode from the node types, since one of them can always be represented by one of the new choice nodes.
We’ll need an ordering of the variables, under the user’s control, saying which variables go higher in the DAG than others. The simplest way is to represent variables by numbers: smaller numbers go higher in the DAG. Call this number the ‘rank’ of the variable, and of the ChoiceNode. (If a user wants to name their variables, they could always make up an array of names indexed by rank.)

We set the scene with some repeated code:

from utils import memoize

class Node(object):
    "A binary-decision-diagram node."
    __invert__ = lambda self:        self(const1, const0)
    __and__    = lambda self, other: self(const0, other)
    __or__     = lambda self, other: self(other, const1)
    __xor__    = lambda self, other: self(other, ~other)

def Equiv(p, q):   return p(~q, q)
def Implies(p, q): return p(const1, q)

The changes to the classes reflect the structural changes planned above:

class ConstantNode(Node):
    rank = float('Inf')   # (Greater than any variable's rank.)
    def __init__(self, value):     self.value = value
    def evaluate(self, env):       return self.value
    def __call__(self, *branches): return branches[self.value]

Constant = memoize(ConstantNode)
const0, const1 = Constant(0), Constant(1)

def Variable(rank):
    return build_node(rank, const0, const1)

class ChoiceNode(Node):
    value = None    # (Explained below.)
    def __init__(self, rank, if0, if1):
        assert rank < if0.rank and rank < if1.rank
        self.rank = rank
        self.if0 = if0
        self.if1 = if1
    def evaluate(self, env):
        branch = (self.if0, self.if1)[env[self.rank]]
        return branch.evaluate(env)
    def __call__(self, if0, if1):
        if if0 is if1: return if0
        if (if0, if1) == (const0, const1):
            return self
        # The above cases usually save work, but aren't needed for correctness.
        return build_choice(self, if0, if1)

build_node = memoize(ChoiceNode)

And there’s wholly new logic to keep the variables ordered, lower-rank first:

@memoize
def build_choice(node, if0, if1):
    """Like Choice(node, if0, if1) in logic.py, but McCarthy-standardized,
    presupposing the arguments are all McCarthy-standardized."""
    top = min(node.rank, if0.rank, if1.rank)
    cases = [subst(node, top, value)(subst(if0, top, value),
                                     subst(if1, top, value))
             for value in (0, 1)]
    return make_node(top, *cases)

def make_node(rank, if0, if1):
    if if0 is if1: return if0
    return build_node(rank, if0, if1)

def subst(node, rank, value):
    """Specialize node to the case where variable #rank takes the given value.
    Again, node must be standardized."""
    if   rank <  node.rank: return node
    elif rank == node.rank: return (node.if0, node.if1)[value]
    else:                   return make_node(node.rank,
                                             subst(node.if0, rank, value),
                                             subst(node.if1, rank, value))

In our earlier code build_choice(node, if0, if1) would be just Choice(node, if0, if1), but now we must make sure to branch first on the lowest-ranked variable. (Imagine it as the lightest, floating to the top of the downward-growing DAG.) We find that variable, call it top, and branch on it with make_node(top.rank, *cases). The cases are equivalent to Choice(node, if0, if1) with the top variable set to each of its possible values.

Every ConstantNode has a dummy rank to keep special cases out of this logic. (Similarly each ChoiceNode has a dummy value.)

As before, we memoize the node constructors (ConstantNode and ChoiceNode) to make DAGs instead of trees. If those were all we memoized, the code would be logically correct, but algorithmically awful: it would run as slowly as McCarthy’s tree standardizer, with its exponentially bigger trees. The memoize on build_choice cuts off the repeated work before it starts.

Moving on to queries on these structures: satisfy and is_valid could be adapted to DAGs with little change, but to reap the efficiency gains we’d have to memoize satisfy too (and the generators it returns). We’ll go for efficiency more directly by making this version of satisfy produce just one result.

def is_valid(claim):
    return satisfy(claim, 0) is None

def satisfy(node, goal):
    """Return the lexicographically first env such that
    node.evaluate(env) == goal, if there's any; else None."""
    env = {}
    while isinstance(node, ChoiceNode):
        if   node.if0.value in (None, goal): node, env[node.rank] = node.if0, 0
        elif node.if1.value in (None, goal): node, env[node.rank] = node.if1, 1
        else:                                return None
    return env if node.value == goal else None

As another fun problem, see if you can efficiently count how many ways a claim can be satisfied; for a corollary, generate solutions uniformly at random.

A more ambitious example

So let’s try our new tool on a problem too big for a table: adding large numbers in binary.

def ripple_carry_add(A, B, carry):
    "The simplest adder in logic gates."
    assert len(A) == len(B)
    S = []
    for a, b in zip(A, B):
        sum, carry = add3(a, b, carry)
        S.append(sum)
    return tuple(S), carry

def add3(a, b, c):
    "Compute the least- and most-significant bits of a+b+c."
    s1, c1 = add2(a, b)
    s2, c2 = add2(s1, c)
    return s2, c1 | c2

def add2(a, b):
    "Compute the least- and most-significant bits of a+b."
    return a ^ b, a & b

A and B should be two lists of logic expressions (and carry, a single expression) with the least significant bits first. ripple_carry_add then returns logic expressions for the sum bits and the carry out. They’re built following the most straightforward algorithm for addition, the same one we’re taught in school, but for binary numbers. A circuit following this algorithm may take time linear in the length of the inputs: adding 111…1 to 111…1 makes the carries ripple all the way from the right end to the left.

You can add faster in the worst case by speculating: compute the sum for the left half of the inputs under two assumptions in parallel: assuming the carry out from the right half will be either 0 or 1. By the time that carry is available, so will be the speculative sums, and we can select one or the other immediately. Repeating the speculation recursively results in a bigger but shallower circuit.

def carry_lookahead_add(A, B, carry):
    assert len(A) == len(B)

    @memoize # (to keep the circuit from blowing up in size)
    def lookahead(lo, nbits, assume_cin):
        if nbits == 0:
            return Constant(assume_cin)
        elif nbits == 1:
            s, c = add2_constant(A[lo], B[lo], assume_cin)
            return c
        else:
            m = preceding_power_of_2(nbits)
            return lookahead(lo, m, assume_cin)(lookahead(lo+m, nbits-m, 0),
                                                lookahead(lo+m, nbits-m, 1))

    # The carry at each place.
    C = [carry(lookahead(0, hi, 0),
               lookahead(0, hi, 1))
         for hi in range(len(A)+1)]

    return tuple(a^b^c for a,b,c in zip(A, B, C)), C[-1]

def preceding_power_of_2(n):
    import math
    return 2**int(math.log(n-1, 2))

def add2_constant(a, b, bit):
    if bit == 0: return add2(a, b)
    else:        return Equiv(a, b), a|b

Do these different adders compute the same function? Let’s see if they do.

def test_equivalent(n, adder1, adder2, interleaved=True):
    c_in, A, B = make_alu_inputs(n, interleaved)
    S1, c_out1 = adder1(A, B, c_in)
    S2, c_out2 = adder2(A, B, c_in)
    assert len(S1) == len(S2) == len(A)
    assert Equiv(c_out1, c_out2) == const1
    for s1, s2 in zip(S1, S2):
        assert Equiv(s1, s2) == const1
    return 'passed'

def make_alu_inputs(n, interleaved):
    c_in = Variable(-1)
    inputs = map(Variable, range(2*n))
    if interleaved:
        A = inputs[0::2]
        B = inputs[1::2]
    else:
        A = inputs[:n]
        B = inputs[n:]
    return c_in, A, B

>>> test_equivalent(64, ripple_carry_add, carry_lookahead_add)
'passed'

We just checked whether two different 64-bit adder circuits produce the same outputs for all 2¹²⁹ of the possible values of A, B, and carry. It took only a moment in interpreted Python. Magic!

The magic has limits: a run with interleaved=False would fail on any computer now existing. The BDDs get too big.

Not the end

We saw Bryant’s revolutionary algorithm coded as a pair of memoize annotations on (a natural Python rendition of) a method published a quarter century earlier by another Turing Award winner, McCarthy. Memoization dates to the 1960’s as well. Stories like this one, and the long gap between Peirce and Shannon in linking logic with digital circuits, I take as encouragement to not only study the textbooks, but to try to work out the basics of a subject for yourself: simple ideas are still out there undiscovered or forgotten, waiting to be put together.

References

George Boole (1854), An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories of Logic and Probabilities.

Randal Bryant (1986), “Graph-Based Algorithms for Boolean Function Manipulation”.

Marcello D’Agostino (1992), “Are tableaux an improvement on truth-tables? Cut-free proofs and bivalence”.

A.K. Dewdney (1989), “Computer recreations: A Tinkertoy computer that plays tic-tac-toe”.

Kenneth Laine Ketner (1984), “The early history of computer design: Charles Sanders Peirce and Marquand’s logical machines”, with the assistance of Arthur Franklin Stewart, Princeton University Library Chronicle, v. 45, n. 3, pp. 186–211

Donald Knuth (2011), The Art of Computer Programming, Volume 4A (Combinatorial Algorithms, Part 1). If you’d like to learn more about the topics in this article, this seems the deepest and most fun source to read next. I’ve only dipped into it so far. (He also surveys how BDDs actually were developed, as opposed to my reimagining of how they might have been.) His website (at this writing) offers a fascicle on SAT solving as well.

Benson Mates (1972), Elementary Logic. The last chapter summarizes the history I condensed into a sentence. Mates’s 1953 Stoic Logic covers and defends it.

John McCarthy (1963), “A Basis for a Mathematical Theory of Computation”.

Donald Michie (1967), “Memo functions: a language feature with ‘rote-learning’ properties.” Edinburgh: Department of Machine Intelligence & Perception. 1967. Research Memorandum MIP-R-29. (I could not find this online to read for myself.)

Charles Peirce (1886), Letter, Peirce to A. Marquand, 1886 December 30, published 1993 in Kloesel, C. et al., eds., Writings of Charles S. Peirce: A Chronological Edition, Vol. 5. Indiana Univ. Press, pp. 421–3.

Claude Shannon (1938), A Symbolic Analysis of Relay and Switching Circuits.

Thanks

Thank you to: Kragen Sitaker for comments, encouragement, and diagram-generating code; Brandon Moore for discussions about tic-tac-toe circuit generation; the Recurse Center for distractions from writing; and the editors for saintly forbearance.

See the References section above for this and other sources. I’ve chosen not to clutter the main text with these links. ↩
Only by, as it turned out, my failing to finish the intended JavaScript toys in time. ↩
The code repo includes this generalization; it’s left out of the article to keep the code more concrete. ↩
B if A else 0 would be a Python expression as well, but Python can only evaluate it to A or 0; we can’t override it to mean building a tree representing the choice-language expression we want. ↩
In Python 3, the same functionality is offered by functools.lru_cache. ↩
His bigger idea was to write if x then y else z as x*y + (1-x)*z, solve the resulting equations by ordinary algebra, and interpret the solutions as inferences. His system was not yet the one we call Boolean algebra today. ↩
The case if not table: may seem unnecessary, except we were never promised an exhaustive table. The tic-tac-toe tables we’ll see later will only include boards that can come up in a game, omitting the rows for some other combinations of variables. ↩
The paper by D’Agostino, above, examines why the tableau method seems unpopular for practical work. ↩