The language of choice
Propositional logic was discovered by the Stoics around 300 B.C., only to be abandoned in later antiquity and rebuilt in the 19th century by George Boole’s successors. One of them, Charles Peirce, saw its significance for what we now call logic circuits, yet that discovery too was forgotten until the 1930s. In the ’50s John McCarthy invented conditional expressions, casting the logic into the form we’ll study here; then in 1986 Randal Bryant repeated one of McCarthy’s constructions with a crucial tweak that made his report “for many years the most cited paper in all of computer science, because it revolutionized the data structures used to represent Boolean functions” (Knuth).1 Let’s explore and code up some of this heritage of millennia, and bring it to bear on a suitable challenge: playing tic-tac-toe.
Then we’ll tackle a task that’s a little more practical: verifying a carry-lookahead adder circuit. Supposedly logic gets used all the time for all kinds of serious work, but for such you’ll have to consult the serious authors; what I can say myself, from working out the code to follow, is that the subject offers a fun playground plus the most primitive form of the pun between meaning and mechanism.
You’re encouraged to read with this article’s code cloned and ready to run, to try varying the examples, tracing their execution, or messing with the code and seeing how it breaks. Literate code ought to afford a more active style of reading: we’re not limited to the affordances of paper.2
Decision trees and nets
We’re going to make a programming language– but a tiny one with nothing but constants, variables, and if/else expressions. The tininess will help us to analyze and transform its expressions more aggressively than we’re used to doing for code in, for instance, Python.
Code in such a language– let’s call it the choice language– might look like B if A else 0, but instead we’ll write this same meaning as A(0, B), for reasons: it’s shorter, it generalizes to multiway decisions like A(X, Y, Z) (selecting among three options for A’s value instead of two)3, and it’ll spare us from writing a parser, because A(0, B) is a Python expression that can be made to build an object representing the choice-language expression.4
Informally, p(q, r) will mean “Look at p and use its value to choose between q and r: for a value of 0, return q’s value; but for 1, return r’s value instead.” Thus A(0, B) will be 0 when A is 0, or else whatever B is, when A is 1.

(Note that Python’s B if A else 0 reverses the order of the arguments: it has the ‘truthy’ case first.)
Building and evaluating expressions
Let’s code this. To be even lazier about not parsing any input, we’ll make the user write const0 for 0:
>>> A, B = Variable('A'), Variable('B')
>>> const0 = Constant(0)
>>> A(const0, B)
A(0, B)
Expressions have meaning in how they respond to evaluate:
>>> A.evaluate({A: 1})
1
>>> const0.evaluate({A: 1})
0
(A, in a context where A is 1, evaluates to 1; but const0 evaluates to 0.)
>>> A(const0, B).evaluate({A: 1, B: 0})
0
>>> A(const0, B).evaluate({A: 1, B: 1})
1
(So the value of A(const0, B), in a context where A is 1, is B’s value.)
class Node(object):
    "A decision-tree node."
    def evaluate(self, env):
        "My value when variables are set according to env, a dictionary."
        abstract
    def __call__(self, if0, if1):
        """Return an expression whose value is if0's or if1's,
        according as self's is 0 or 1."""
        return Choice(self, if0, if1)

class ConstantNode(Node):
    def __init__(self, value): self.value = value
    def __repr__(self):        return repr(self.value)
    def evaluate(self, env):   return self.value

class VariableNode(Node):
    def __init__(self, name):  self.name = name
    def __repr__(self):        return self.name
    def evaluate(self, env):   return env[self]

class ChoiceNode(Node):
    def __init__(self, index, if0, if1):
        self.index, self.if0, self.if1 = index, if0, if1
    def __repr__(self):
        return '%r(%r, %r)' % (self.index, self.if0, self.if1)
    def evaluate(self, env):
        branch = (self.if0, self.if1)[self.index.evaluate(env)]
        return branch.evaluate(env)
We won’t call on the classes directly; we’ll use these names instead:
Variable = VariableNode
Choice = ChoiceNode
from utils import memoize
Constant = memoize(ConstantNode)
const0, const1 = Constant(0), Constant(1)
and later redefine them to introduce optimizations. The first optimization appears here already for Constant: memoizing it, meaning that if you call Constant(0) twice you’ll get the same object both times. This saves us from having to override equality and hashing on ConstantNodes. The memoize function is overkill for this purpose, but we’re going to use it for more later.5
def memoize(f):
    """Return a function like f but caching its results. Its arguments
    must be hashable."""
    def memoized(*args):
        try: return memoized._memos[args]
        except KeyError:
            result = memoized._memos[args] = f(*args)
            return result
    memoized._memos = {}
    return memoized
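For a taste of it in action– an aside; this fib is just a demo of mine, not used later:

>>> fib = memoize(lambda n: n if n < 2 else fib(n-1) + fib(n-2))
>>> fib(30)
832040

Without memoization this doubly-recursive fib would make over two million calls; with it, the underlying lambda runs just 31 times.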
From table to tree
We can re-express a data table in our new language. For instance,
 A B | Answer
-----+-------
 0 0 |   0
 0 1 |   0
 1 0 |   0
 1 1 |   1
can be encoded as
>>> Answer = A(const0, B)
Let’s automate the translation. We might express the table like “the first row’s answer if the variables match the first row else express the rest of the table”– in this case, 0 if A==0 and B==0 else (the remaining three lines). Build the equality test like so:
def match(variables, values):
    """Return an expression that evaluates to 1 just when every variable
    has the corresponding given value; 0 otherwise."""
    do_they_match = const1
    for variable, value in zip(variables, values):
        if value == 0:   do_they_match = variable(do_they_match, const0)
        elif value == 1: do_they_match = variable(const0, do_they_match)
    return do_they_match
But we’d get a much larger expression than the one we wrote by hand. Just to match the first row, it’s:
>>> match((A, B), (0, 0))
B(A(1, 0), 0)
(In words, “if B is 1, then 0; else if A is 1, then 0; else 1 for a match”.)
The whole table would need more than four times the size of this expression. Adding more variables makes it exponentially worse– here’s a three-variable table with eight entries:
>>> majority = {(0,0,0): 0, (1,1,1): 1,
... (0,0,1): 0, (1,1,0): 1,
... (0,1,0): 0, (1,0,1): 1,
... (1,0,0): 0, (0,1,1): 1}
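To see the blowup concretely, here’s a sketch of that row-by-row translation (a hypothetical express_rows in the spirit of the quoted recipe; we won’t keep it):

def express_rows(variables, table):
    "Naive translation: a match test per row. (For illustration only.)"
    expr = const0
    for values, answer in table.items():
        # This row's answer if the variables match this row; else the rest.
        expr = match(variables, values)(expr, Constant(answer))
    return expr

Each of majority’s eight rows costs a three-node match plus a choice node– about 32 nodes, against the handful we’ll get below.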
How can we do better?
One approach is called peephole optimization. Certain simple expressions could be simpler, such as A(0, 1): it has the same value as just A. Likewise, A(B, B) is equivalent to B. We could walk through all the subexpressions of an expression, looking for these patterns; it’s easier, though, to notice them at the time we build the tree:
def Choice(index, if0, if1):
    if if0 == if1:                        # A(B, B) --> B
        return if0
    elif (if0, if1) == (const0, const1):  # A(0, 1) --> A
        return index
    else:
        return ChoiceNode(index, if0, if1)
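Now the simplification happens as we build:

>>> A(const0, const1)
A
>>> A(B, B)
B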
After this change, our table would translate into the same simple expression we started with– but mostly by luck. majority still becomes monstrous.
Splitting in two: the Boole expansion
In 1847 George Boole introduced divide-and-conquer: turn the table into “if first variable then subtable where that variable is 1 else subtable where it’s 0”.67
def express(variables, table):
    """Return an expression e such that, for each key in table,
    e.evaluate(dict(zip(variables, key))) == table[key]."""
    if not table:
        return const0   # Or 1, doesn't matter.
    elif len(table) == 1:
        return Constant(table.values()[0])
    else:
        first_var, rest_vars = variables[0], variables[1:]
        return first_var(express(rest_vars, subst_for_first_var(table, 0)),
                         express(rest_vars, subst_for_first_var(table, 1)))

def subst_for_first_var(table, first_var_value):
    """Return a subtable of table, eliminating the first variable and
    supposing it'll take the given value."""
    return {keys[1:]: output
            for keys, output in table.items() if keys[0] == first_var_value}
Now, by aiming for a balanced tree and perhaps by giving the peephole optimization more to work with, when we ask for majority we get back respectable code:
>>> C = Variable('C')
>>> express((A,B,C), majority)
A(B(0, C), B(C, 1))
To be grand, you could say we’ve started to learn or synthesize programs from sample inputs and outputs. In fact this method is popular in machine learning, using fancier alternatives for return const0 and for picking first_var. But we’ll take a different tack to improvement: eliminating common subexpressions.
Sharing structure: the BDD
The parity function– which is 1 just when the input has an odd number of ones, so it really should’ve been named the oddity function– cannot be expressed as a small tree. It’s easy to make a big one:
>>> parity = {(0,0,0,0): 0, (0,0,0,1): 1, (1,0,0,1): 0, (1,0,0,0): 1,
... (0,0,1,1): 0, (0,0,1,0): 1, (1,0,1,0): 0, (1,0,1,1): 1,
... (0,1,0,1): 0, (0,1,0,0): 1, (1,1,0,0): 0, (1,1,0,1): 1,
... (0,1,1,0): 0, (0,1,1,1): 1, (1,1,1,1): 0, (1,1,1,0): 1}
>>> express((A,B,C,D), parity)
A(B(C(D, D(1, 0)), C(D(1, 0), D)), B(C(D(1, 0), D), C(D, D(1, 0))))
Here D(1, 0) appears 4 times. Also appearing twice are C(D, D(1, 0)) and the same with reversed arguments. If we wrote down the parity function for more variables, the table and the tree would both grow exponentially, and so would the opportunities for sharing.
We already saw how to coalesce duplicates when we memoized ConstantNode. Memoizing ChoiceNode turns our tree into a directed acyclic graph (DAG).
>>> ChoiceNode = memoize(ChoiceNode)
>>> len(ChoiceNode._memos)
0
>>> express((A,B,C,D), parity)
# <same giant-ass expression as before, with 11 pairs of parentheses>
>>> len(ChoiceNode._memos)
6
The same expression now is built of only 6 unique Choice nodes. In general, the parity function over n variables needs a number of nodes linear in n, as a DAG, but exponential as a tree. (Claude Shannon proved a generalization of this result in his master’s thesis, planting seeds of the techniques we’ll look at later.)
Graphs like these are called binary decision diagrams (BDDs). The term can mean any graph of choice nodes that branch only on variables, but usually people restrict it to graphs that could have been built by our express function: ones that only test variables in a consistent order (such as A, B, C, then D); have no nodes like A(0, 1) or A(B, B), as eliminated by Choice, above; and have no redundant nodes, as would be coalesced by memoize.
A tic-tac-toe machine
Daniel Hillis and Brian Silverman made a couple of different tic-tac-toe players in Tinkertoys. Let’s try to design one too, down to the level of choice gates, if not actual Tinkertoys. (When thinking of the data structure reduced to hardware, we call a node a gate.)
tictactoe.py is my player as a game-tree search function. We’ll apply it to materialize the whole game tree– it’s pretty small!– into a table mapping game boards to each one’s successor (meaning the board you get after the chosen move); then we’ll crunch the table into a network of choice gates.
We need to understand a few exports from tictactoe.py: a tic-tac-toe board is called a grid, a game starts with empty_grid, successors(grid) finds the possible next moves and performs them (generating the resulting next grids), and max_play(grid) picks the ‘best’ successor for the player to move– so max_play is our game AI. A grid is represented as a pair (p,q) of 9-bit integers, p for the player to move and q for their opponent. Each bit tells whether a mark fills a particular square on the board; bit 0 denotes a mark in the lower right corner, bit 1 is for the square to the left of it, and so on:
 8 | 7 | 6       X | O |        1 0 0        0 1 0
---+---+---     ---+---+---
 5 | 4 | 3         | O |        0 0 0        0 1 0
---+---+---     ---+---+---
 2 | 1 | 0         |   | X      0 0 1        0 0 0

Bit positions    A board        p bits       q bits
                                (X to move)  (O just moved)
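To check the encoding, here’s the pictured board written out by hand (the variable names are mine):

p = 0b100000001   # X's marks: squares 8 and 0 (X is the player to move)
q = 0b010010000   # O's marks: squares 7 and 4
grid = (p, q)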
Play out every possible game between human as X and max_play as O: when it’s X’s turn, visit every successor, but when it’s O’s, visit only the one picked by max_play, recording it in all_responses.
import logic as E; E.ChoiceNode = E.memoize(E.ChoiceNode)
import tictactoe as T

def exhaust(human_to_move):
    for AI_to_move in T.successors(human_to_move):
        if T.successors(AI_to_move):
            all_responses[AI_to_move] = response = T.max_play(AI_to_move)
            exhaust(response)
        else:
            all_responses[AI_to_move] = tuple(reversed(AI_to_move))

all_responses = {}
exhaust(T.empty_grid)
(If you’re wondering about tuple(reversed(AI_to_move)), that serves as a ‘game over’ state that still shows the game board– it’d be impolite to wipe the display before the human even sees their last move. The reversal maintains the mapping from p and q to X and O.)
>>> len(all_responses)
427
For a machine made out of binary gates, the most natural representation is again 18 bits, 9 each for X’s and O’s. We can represent the player’s move in the 9 X bits, and compute the next 9 O bits from the current 18 X’s and O’s.
O_tables = [{grid: (Os2>>square)&1
             for grid, (_, Os2) in all_responses.items()}
            for square in range(9)]
(for grid,(_,Os2) in all_responses.items() generates the grid before each tabulated move, along with Os2: the configuration of O’s after the O player moves in response. Then grid: (Os2>>square)&1 maps the input grid to the output 0 or 1 marking an O in a particular tic-tac-toe square.)
These aren’t quite tables of the sort our express function eats, yet: the keys are pairs of 9-bit ints, not the required tuples of bits. We must convert them:
squares = range(9)

def combine(xs, os): return tuple(xs) + tuple(os)

XO_variables = combine((E.Variable('x%d' % k) for k in squares),
                       (E.Variable('o%d' % k) for k in squares))

def XO_values((Os, Xs)):
    return combine(((Xs>>k)&1 for k in squares),
                   ((Os>>k)&1 for k in squares))

O_nodes = [E.express(XO_variables, {XO_values(grid): value
                                    for grid, value in table.items()})
           for table in O_tables]
Remember that memoizing ChoiceNode, above, got us a BDD instead of a tree. Assuming we cleared the memo table first (with ChoiceNode._memos.clear()) we end up with 1236 choice gates (2985 without memoization):
>>> len(ChoiceNode._memos)
1236
This can be improved by a heuristic: choices that ‘matter the most’ should get tested first. In tic-tac-toe that’s the center square most of all, then the corners, then the edges. (This ordering of variables in the BDD has nothing to do with the function computed by it: that is, the moves the AI picks. An AI playing to lose would likely still be more compact if encoded in a BDD that looked at the center square first.) Interleave the X bits with the O bits to treat both center bits first, then all corner bits, and so on:
squares = (4, 0,2,6,8, 7,1,3,5)
def combine(xs, os): return sum(zip(xs, os), ())
These tweaks bring us down to 956 gates. (The odd order 7,1,3,5 puts the first edge next to the last corner. I can’t say why; so far as I’ve seen, other small tweaks worsen this.)
Our express function works best on exhaustive tables, where every possible key has a value. The O_tables are not exhaustive; tweaking express to start to exploit don’t-cares gets us down to 556 gates:
def express(variables, table):
    if not table:
        return const0   # (Can you be cleverer here?)
    elif len(table) == 1:
        return Constant(table.values()[0])
    else:
        first_var, rest_vars = variables[0], variables[1:]
        # The new logic: if first_var takes on only one value
        # in the table...
        domain = set(row[0] for row in table.keys())
        if len(domain) == 1:
            # ...then just assume it'll take that value in the input.
            value = next(iter(domain))
            return express(rest_vars, subst_for_first_var(table, value))
        # Otherwise as before.
        return first_var(express(rest_vars, subst_for_first_var(table, 0)),
                         express(rest_vars, subst_for_first_var(table, 1)))
What if we could use three-way Choice gates? Encode a grid square as 0/1/2 for blank/X/O, and we’d need only half as many variables, and end up reducing the 556 gates to 379. Would 379 ternary gates cost less than 556 binary ones? It seems plausible, for instance as switches for falling-marble tracks. (What, you think a mechanical tic-tac-toe player implausible?)
In a 6th-grade science fair Steve Wozniak did much better: “about 100 transistors and about 100 diodes.” How? Beats me!
Tables, schmables
We’ve seen some ways for choice trees and choice graphs to compress data tables. But when we start from tables, they can only grow so big before exhausting our space and patience. What if we skip the table building and go straight to the choice structures?
When the (non-)table in question is meant to express some logical claim, like “if you’re outside and it’s dark, it’s night”, we can draw on familiar language: NOT, AND, OR, 0/1 for false/true, and so on.
# The Python operators ~, &, |, ^:
Node.__invert__ = lambda self: self(const1, const0)
Node.__and__ = lambda self, other: self(const0, other)
Node.__or__ = lambda self, other: self(other, const1)
Node.__xor__ = lambda self, other: self(other, ~other)
def Equiv(p, q): return p(~q, q)
def Implies(p, q): return p(const1, q)
Thus:
>>> inside, dark, night = map(Variable, 'idn')
>>> claim = Implies(~inside & dark, night)
>>> claim.evaluate({dark: 1, inside: 1, night: 0})
1
With these we can express functions in many variables, so many that a tabulation would overflow the visible universe.
Exploring all possible worlds
We just evaluated a claim which turned out to be true in a particular world. We might ask: is it true no matter what? (That is, however the variables are set, provided they’re each 0 or 1.) Or likewise: find me a world where it’s true.
To find it given a table, we’d loop over table.items(). For a choice tree claim, we might loop in the same way over all possible values for its variables:
import itertools

((values, claim.evaluate(dict(zip(variables, values))))
 for values in itertools.product((0, 1), repeat=len(variables)))
(This method is called truth-table enumeration.) But we’re assuming now that this virtual table would get impractically big; to do better we must exploit the structure of the choice tree. Let’s try a goal-directed walk, skipping those possibilities that we can rule out early:
def is_valid(claim):
    # A claim true in all worlds is false in none of them.
    return satisfy(claim, 0) is None

def satisfy(tree, goal):
    """Find a (partial) env that makes tree.evaluate(env) == goal,
    else None."""
    return next(tree.find(goal, {}), None)

class Node(object):
    # ...
    def find(self, goal, env):
        """Generate (partial) environments, extending `env`, in which
        I evaluate to `goal`."""
        abstract

class ConstantNode(Node):
    # ...
    def find(self, goal, env):
        if self.value == goal:
            yield env

class VariableNode(Node):
    # ...
    def find(self, goal, env):
        if self not in env:
            yield extend(env, self, goal)
        elif env[self] == goal:
            yield env

class ChoiceNode(Node):
    # ...
    def find(self, goal, env):
        # Succeed where the index evaluates to 0, and if0 to the goal...
        for env1 in self.index.find(0, env):
            for env2 in self.if0.find(goal, env1):
                yield env2
        # ... or likewise for index 1 and if1.
        for env1 in self.index.find(1, env):
            for env2 in self.if1.find(goal, env1):
                yield env2

def extend(env, var, value):
    "A copy of env but mapping var to value."
    result = dict(env)
    result[var] = value
    return result
This search may take time exponential in the depth of the choice expression– which can be worse than enumerating the truth table (exponential in the number of variables), but it can also be much better: the expression may be shallower, or branches of the search may be cut off early.
>>> x, y = map(Variable, 'xy')
>>> satisfy(~x&~x, 1)
{x: 0}
>>> is_valid(Implies(Implies(Implies(x, y), x), x))
True
>>> is_valid(Implies(Implies(Implies(x, y), x), y))
False
A pen-and-paper version of this algorithm, with special cases for AND, OR, and the rest, is taught as the method of analytic tableaux. Since that form of the method leaves you a choice of which subexpression to target next, a smart choice might streamline the search. (For instance, a & b can be treated like either a(0, b) or b(0, a); if a looks ‘less branchy’ than b– it’s a smaller expression, say, or uses variables we’ve already set– then the search is likely to go faster with a first.)8
An algebra of choice
We’ve gained a new way to check whether two claims are always equal: is_valid(Equiv(claim1, claim2)).
Another way uses algebra, following John McCarthy’s rules:
- Selection:

      const0(a, b) == a
      const1(a, b) == b

- Case analysis and irrelevance:

      p(const0, const1) == p
      p(a, a) == a

- The same p twice collapses into one (and it’s associative):

      p(a, p(b, c)) == p(a, c)
      p(p(a, b), c) == p(a, c)

- Nesting:

      q(p, r)(a, b) == q(p(a, b), r(a, b))

- Swapping:

      q(p(a, b), p(c, d)) == p(q(a, c), q(b, d))
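(We can spot-check any of these mechanically with is_valid from the last section– here, the collapsing law:)

>>> p, a, b, c = map(Variable, 'pabc')
>>> is_valid(Equiv(p(a, p(b, c)), p(a, c)))
True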
McCarthy’s notation had (p -> r; q) for our p(q, r). An infix version like Haskell’s, q `p` r, is rather pretty, more like familiar algebra: so the associative law becomes

    a `p` (b `p` c) = a `p` c = (a `p` b) `p` c
These laws together let you standardize choice trees: take any two trees, rewrite each by the method below, and then the resulting trees will be identical to each other just when the original trees are equal in meaning.
How? Massage the tree into a mirror of the Boole expansion of the corresponding table, like this: First eliminate any nested tests– subexpressions like q(p, r) in q(p, r)(a, b)– using the nesting rule. Then enforce a standard order for the variables, like the order of variables passed to express() in examples above. Aim to move the first-ordered variable to the root of the tree (lifting it using the swapping law). Repeat for each subtree, using the other laws to eliminate repeated variables, and also any constants from the test position of any subtrees.
For example, suppose we’ve decided on alphabetical order for the variables. b(a, c) is partly in standard form, having no nested tests, but the variables are out of order. We can fix that:

    b(a, c) = b(a(0, 1), a(c, c))  # By case analysis and irrelevance
            = a(b(0, c), b(1, c))  # By swapping

This produces the same expression as if we’d run express((a,b,c), table) on a tabulation of b(a, c).
For more practice in the algebra, try proving the distributive law:

    a `p` (b `q` c) = (a `p` b) `q` (a `p` c)
Systematic succinctness
A BDD (the usual, restricted kind) is McCarthy’s standardized tree, made into a DAG by memoizing. We’ve built some already from tables, but we can do more: given expressions in this form, we can compose them into more complex expressions in the same form, efficiently, by applying McCarthy’s rewrite rules only so far as they’re needed.
Let’s start our code over from scratch. The changes:
- Since a choice node will now always be indexed by a variable (since we eliminate nested tests), make ChoiceNode hold the variable itself. We can drop VariableNode from the node types, since one of them can always be represented by one of the new choice nodes.

- We’ll need an ordering of the variables, under the user’s control, saying which variables go higher in the DAG than others. The simplest way is to represent variables by numbers: smaller numbers go higher in the DAG. Call this number the ‘rank’ of the variable, and of the ChoiceNode. (If a user wants to name their variables, they could always make up an array of names indexed by rank.)
We set the scene with some repeated code:
from utils import memoize

class Node(object):
    "A binary-decision-diagram node."
    __invert__ = lambda self: self(const1, const0)
    __and__    = lambda self, other: self(const0, other)
    __or__     = lambda self, other: self(other, const1)
    __xor__    = lambda self, other: self(other, ~other)

def Equiv(p, q):   return p(~q, q)
def Implies(p, q): return p(const1, q)
The changes to the classes reflect the structural changes planned above:
class ConstantNode(Node):
    rank = float('Inf')   # (Greater than any variable's rank.)
    def __init__(self, value): self.value = value
    def evaluate(self, env):   return self.value
    def __call__(self, *branches): return branches[self.value]

Constant = memoize(ConstantNode)
const0, const1 = Constant(0), Constant(1)

def Variable(rank):
    return build_node(rank, const0, const1)

class ChoiceNode(Node):
    value = None   # (Explained below.)
    def __init__(self, rank, if0, if1):
        assert rank < if0.rank and rank < if1.rank
        self.rank = rank
        self.if0 = if0
        self.if1 = if1
    def evaluate(self, env):
        branch = (self.if0, self.if1)[env[self.rank]]
        return branch.evaluate(env)
    def __call__(self, if0, if1):
        if if0 is if1: return if0
        if (if0, if1) == (const0, const1): return self
        # The above cases usually save work, but aren't needed for correctness.
        return build_choice(self, if0, if1)

build_node = memoize(ChoiceNode)
And there’s wholly new logic to keep the variables ordered, lower-rank first:
@memoize
def build_choice(node, if0, if1):
    """Like Choice(node, if0, if1) in logic.py, but McCarthy-standardized,
    presupposing the arguments are all McCarthy-standardized."""
    top = min(node.rank, if0.rank, if1.rank)
    cases = [subst(node, top, value)(subst(if0, top, value),
                                     subst(if1, top, value))
             for value in (0, 1)]
    return make_node(top, *cases)

def make_node(rank, if0, if1):
    if if0 is if1: return if0
    return build_node(rank, if0, if1)

def subst(node, rank, value):
    """Specialize node to the case where variable #rank takes the given value.
    Again, node must be standardized."""
    if rank < node.rank:    return node
    elif rank == node.rank: return (node.if0, node.if1)[value]
    else:                   return make_node(node.rank,
                                             subst(node.if0, rank, value),
                                             subst(node.if1, rank, value))
In our earlier code build_choice(node, if0, if1) would be just Choice(node, if0, if1), but now we must make sure to branch first on the lowest-ranked variable. (Imagine it as the lightest, floating to the top of the downward-growing DAG.) We find that variable’s rank, call it top, and branch on it with make_node(top, *cases). The cases are equivalent to Choice(node, if0, if1) with the top variable set to each of its possible values.
Every ConstantNode has a dummy rank to keep special cases out of this logic. (Similarly each ChoiceNode has a dummy value.)
As before, we memoize the node constructors (ConstantNode and ChoiceNode) to make DAGs instead of trees. If those were all we memoized, the code would be logically correct, but algorithmically awful: it would run as slowly as McCarthy’s tree standardizer, with its exponentially bigger trees. The memoize on build_choice cuts off the repeated work before it starts.
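A quick check of my own that construction now standardizes by itself– ranks 0, 1, 2 playing the a, b, c of the algebra section:

>>> a, b, c = Variable(0), Variable(1), Variable(2)
>>> b(a, c) is a(b(const0, c), b(const1, c))
True

Canonical form plus memoizing means equal meanings come back as the identical object– which is what will make the equivalence checks below so cheap.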
Moving on to queries on these structures: satisfy and is_valid could be adapted to DAGs with little change, but to reap the efficiency gains we’d have to memoize satisfy too (and the generators it returns). We’ll go for efficiency more directly by making this version of satisfy produce just one result.
def is_valid(claim):
    return satisfy(claim, 0) is None

def satisfy(node, goal):
    """Return the lexicographically first env such that
    node.evaluate(env) == goal, if there's any; else None."""
    env = {}
    while isinstance(node, ChoiceNode):
        # (N.B. env[node.rank] must be assigned before node is rebound.)
        if   node.if0.value in (None, goal): env[node.rank], node = 0, node.if0
        elif node.if1.value in (None, goal): env[node.rank], node = 1, node.if1
        else: return None
    return env if node.value == goal else None
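For instance:

>>> x, y = Variable(0), Variable(1)
>>> satisfy(x & ~y, 1)
{0: 1, 1: 0}
>>> satisfy(x & ~x, 1) is None
True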
As another fun problem, see if you can efficiently count how many ways a claim can be satisfied; for a corollary, generate solutions uniformly at random.
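Here’s one way the counting might go– a sketch of my own (count_ways, gap, and nvars are my names), assuming the standardized nodes above with every variable ranked below nvars:

def count_ways(claim, nvars):
    """How many of the 2**nvars settings of variables 0..nvars-1
    make claim evaluate to 1? (claim must be standardized.)"""
    def gap(rank, child):
        # Variables skipped between a node and its child are
        # unconstrained: each one doubles the count.
        return min(child.rank, nvars) - rank - 1
    @memoize
    def count(node):
        # Satisfying settings of the variables ranked node.rank and up.
        if node.value is not None:   # A ConstantNode.
            return node.value
        return (count(node.if0) * 2**gap(node.rank, node.if0) +
                count(node.if1) * 2**gap(node.rank, node.if1))
    return count(claim) * 2**min(claim.rank, nvars)

>>> x, y = Variable(0), Variable(1)
>>> count_ways(x | y, 2)
3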
A more ambitious example
So let’s try our new tool on a problem too big for a table: adding large numbers in binary.
def ripple_carry_add(A, B, carry):
    "The simplest adder in logic gates."
    assert len(A) == len(B)
    S = []
    for a, b in zip(A, B):
        sum, carry = add3(a, b, carry)
        S.append(sum)
    return tuple(S), carry

def add3(a, b, c):
    "Compute the least- and most-significant bits of a+b+c."
    s1, c1 = add2(a, b)
    s2, c2 = add2(s1, c)
    return s2, c1 | c2

def add2(a, b):
    "Compute the least- and most-significant bits of a+b."
    return a ^ b, a & b
A and B should be two lists of logic expressions (and carry, a single expression) with the least significant bits first. ripple_carry_add then returns logic expressions for the sum bits and the carry out. They’re built following the most straightforward algorithm for addition, the same one we’re taught in school, but for binary numbers. A circuit following this algorithm may take time linear in the length of the inputs: adding 111…1 to 111…1 makes the carries ripple all the way from the right end to the left.
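As a quick sanity check, constant inputs reduce all the way to constant outputs– here 3 + 1 on two bits, least significant bit first:

>>> S, c_out = ripple_carry_add([const1, const1], [const1, const0], const0)
>>> [s.value for s in S], c_out.value
([0, 0], 1)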
You can add faster in the worst case by speculating: compute the sum for the left half of the inputs under two assumptions in parallel: assuming the carry out from the right half will be either 0 or 1. By the time that carry is available, so will be the speculative sums, and we can select one or the other immediately. Repeating the speculation recursively results in a bigger but shallower circuit.
def carry_lookahead_add(A, B, carry):
    assert len(A) == len(B)

    @memoize   # (to keep the circuit from blowing up in size)
    def lookahead(lo, nbits, assume_cin):
        if nbits == 0:
            return Constant(assume_cin)
        elif nbits == 1:
            s, c = add2_constant(A[lo], B[lo], assume_cin)
            return c
        else:
            m = preceding_power_of_2(nbits)
            return lookahead(lo, m, assume_cin)(lookahead(lo+m, nbits-m, 0),
                                                lookahead(lo+m, nbits-m, 1))

    # The carry at each place.
    C = [carry(lookahead(0, hi, 0),
               lookahead(0, hi, 1))
         for hi in range(len(A)+1)]
    return tuple(a^b^c for a, b, c in zip(A, B, C)), C[-1]

def preceding_power_of_2(n):
    import math
    return 2**int(math.log(n-1, 2))

def add2_constant(a, b, bit):
    if bit == 0: return add2(a, b)
    else:        return Equiv(a, b), a|b
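(preceding_power_of_2 returns the largest power of 2 strictly below n– so, e.g., 9 bits split as 8+1:)

>>> [preceding_power_of_2(n) for n in (2, 3, 5, 8, 9)]
[1, 2, 4, 4, 8]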
Do these different adders compute the same function? Let’s see if they do.
def test_equivalent(n, adder1, adder2, interleaved=True):
    c_in, A, B = make_alu_inputs(n, interleaved)
    S1, c_out1 = adder1(A, B, c_in)
    S2, c_out2 = adder2(A, B, c_in)
    assert len(S1) == len(S2) == len(A)
    assert Equiv(c_out1, c_out2) == const1
    for s1, s2 in zip(S1, S2):
        assert Equiv(s1, s2) == const1
    return 'passed'

def make_alu_inputs(n, interleaved):
    c_in = Variable(-1)
    inputs = map(Variable, range(2*n))
    if interleaved:
        A = inputs[0::2]
        B = inputs[1::2]
    else:
        A = inputs[:n]
        B = inputs[n:]
    return c_in, A, B
>>> test_equivalent(64, ripple_carry_add, carry_lookahead_add)
'passed'
We just checked whether two different 64-bit adder circuits produce the same outputs for all 2^129 of the possible values of A, B, and carry. It took only a moment in interpreted Python. Magic!
The magic has limits: a run with interleaved=False would fail on any computer now existing. The BDDs get too big.
Not the end
We saw Bryant’s revolutionary algorithm coded as a pair of memoize annotations on (a natural Python rendition of) a method published a quarter century earlier by another Turing Award winner, McCarthy. Memoization dates to the 1960s as well. Stories like this one, and the long gap between Peirce and Shannon in linking logic with digital circuits, I take as encouragement to not only study the textbooks, but to try to work out the basics of a subject for yourself: simple ideas are still out there undiscovered or forgotten, waiting to be put together.
References
George Boole (1854), An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories of Logic and Probabilities.
Randal Bryant (1986), “Graph-Based Algorithms for Boolean Function Manipulation”.
Marcello D’Agostino (1992), “Are tableaux an improvement on truth-tables? Cut-free proofs and bivalence”.
A.K. Dewdney (1989), “Computer recreations: A Tinkertoy computer that plays tic-tac-toe”.
Kenneth Laine Ketner (1984), “The early history of computer design: Charles Sanders Peirce and Marquand’s logical machines”, with the assistance of Arthur Franklin Stewart, Princeton University Library Chronicle, v. 45, n. 3, pp. 186–211
Donald Knuth (2011), The Art of Computer Programming, Volume 4A (Combinatorial Algorithms, Part 1). If you’d like to learn more about the topics in this article, this seems the deepest and most fun source to read next. I’ve only dipped into it so far. (He also surveys how BDDs actually were developed, as opposed to my reimagining of how they might have been.) His website (at this writing) offers a fascicle on SAT solving as well.
Benson Mates (1972), Elementary Logic. The last chapter summarizes the history I condensed into a sentence. Mates’s 1953 Stoic Logic covers and defends it.
John McCarthy (1963), “A Basis for a Mathematical Theory of Computation”.
Donald Michie (1967), “Memo functions: a language feature with ‘rote-learning’ properties”. Research Memorandum MIP-R-29, Department of Machine Intelligence & Perception, Edinburgh. (I could not find this online to read for myself.)
Charles Peirce (1886), Letter, Peirce to A. Marquand, 1886 December 30, published 1993 in Kloesel, C. et al., eds., Writings of Charles S. Peirce: A Chronological Edition, Vol. 5. Indiana Univ. Press, pp. 421–3.
Claude Shannon (1938), A Symbolic Analysis of Relay and Switching Circuits.
Thanks
Thank you to: Kragen Sitaker for comments, encouragement, and diagram-generating code; Brandon Moore for discussions about tic-tac-toe circuit generation; the Recurse Center for distractions from writing; and the editors for saintly forbearance.
1. See the References section above for this and other sources. I’ve chosen not to clutter the main text with these links.

2. Only by, as it turned out, my failing to finish the intended JavaScript toys in time.

3. The code repo includes this generalization; it’s left out of the article to keep the code more concrete.

4. B if A else 0 would be a Python expression as well, but Python can only evaluate it to A or 0; we can’t override it to mean building a tree representing the choice-language expression we want.

5. In Python 3, the same functionality is offered by functools.lru_cache.

6. His bigger idea was to write if x then y else z as x*y + (1-x)*z, solve the resulting equations by ordinary algebra, and interpret the solutions as inferences. His system was not yet the one we call Boolean algebra today.

7. The case if not table: may seem unnecessary, except we were never promised an exhaustive table. The tic-tac-toe tables we’ll see later will only include boards that can come up in a game, omitting the rows for some other combinations of variables.

8. The paper by D’Agostino, above, examines why the tableau method seems unpopular for practical work.