Not everything is an expression
“Everything is an expression” is an informal slogan of several programming languages, particularly those descended from the Lisp family. It is meant to contrast with languages that distinguish statements from expressions, such as Python, JavaScript, and C.
Favoring expressions over statements can simplify code. Instead of the bulky:
if increment:
do_something_with(x + 1)
else:
do_something_with(x)
you can write:
do_something_with(x + if increment then 1 else 0)
However, this is not an essay about the benefits of replacing statements with expressions.
Rather, I claim that not everything is an expression! The idea that there are only expressions ignores the importance of other syntax classes. I believe Lisp macros are underpowered because of this ignorance.
What are syntax classes?
I’ll explain what I mean by syntax class in two ways, first with an abstract definition, then with an example. Skip the first if it’s confusing; skip the second if you don’t need it.
Syntax classes, abstractly
By syntax class I mean a distinct type in our language’s abstract syntax tree, or a semantically interesting nonterminal symbol in our language’s grammar. In most grammars, many nonterminals are semantically boring, and exist to specify banal syntactic details – operator precedence, where commas go in a list, the format of numeric literals. But some nonterminals represent things like expressions, statements, declarations, modules, patterns or variables; the “top-level” categories in a broad overview of the language’s syntax. These are syntax classes.1
Syntax classes, by example
def twice_last(seq):
last, *rest = reversed(seq)
return last * 2
This (rather unidiomatic) Python code is colored by syntax class: expressions in red, statements in blue, declarations in purple, and patterns in green.
Different syntax classes represent distinct concepts:
-
Expressions are evaluated to compute their value. Literal values, variables, binary operations, and function calls are examples of expressions.
-
Statements are executed to take some action. Variable assignments, loops, conditionals, and raising exceptions are examples of statements.
-
Declarations alter our environment, often by defining something. Function and type definitions, imports, and variable declarations (in a language that distinguishes them from assignments, such as C or JavaScript) are declarations.
-
Patterns tell us how to bind variables. They get matched against values. For example, in Python:
x,y = (1,2)
This matches the pattern
x,y
against the value(1,2)
. The match succeeds, bindingx
to1
andy
to2
. If instead we wrote:x,y = (1,2,3)
Then the match would fail, since
x,y
expects a 2-element sequence, and(1,2,3)
has 3 elements. This raises aValueError
.Even an ordinary variable assignment like
x = 3
involves a pattern: the variablex
.x
is simply a pattern that always matches, bindingx
to whatever it is matched against.
This is not an exhaustive list. These classes are fuzzy. Different languages treat them differently. Many languages collapse two or more into one category. Lisps don’t distinguish between statements and expressions; Python combines statements and declarations; Haskell lacks statements entirely. Some languages (Prolog, Forth, sed, SQL, BNF) aren’t easy to describe in these terms at all!
The power of patterns
Most mainstream languages only have very simple patterns. But in some languages, particularly those descended from the ML family, patterns are quite powerful and pattern-matching is a core construct.
Some examples, in
fibonacci 0 = 1
fibonacci 1 = 1
fibonacci n = fibonacci (n-1) + fibonacci (n-2)
sum [] = 0 -- list is empty
sum (x:xs) = x + sum xs -- list is `x' followed by `xs'
-- Finds all adjacent pairs:
-- adjacent [1,2,3] == [(1,2), (2,3)]
adjacent (x:y:rest) = (x,y) : adjacent (y:rest)
adjacent _ = []
-- Searching a binary search tree
data Tree = Empty | Node Int Tree Tree
inTree x Empty = False
inTree x (Node v l r) =
case compare x v of -- matches on (compare x v)
EQ -> True -- x = v, EQ for "equals"
LT -> inTree x l -- x < v, LT for "less than"
GT -> inTree x r -- x > v, GT for "greater than"</span>
fun fibonacci 0 = 1
| fibonacci 1 = 1
| fibonacci n = fibonacci (n-1) + fibonacci (n-2)
fun sum [] = 0
| sum (x::xs) = x + sum xs
(* Finds all adjacent pairs:
* adjacent [1,2,3] == [(1,2), (2,3)] *)
fun adjacent (x::y::rest) = (x,y) :: adjacent (y::rest)
| adjacent _ = []
(* Searching a binary search tree *)
datatype tree = Empty | Node of int * tree * tree
fun inTree x Empty = false
| inTree x (Node(v,l,r)) =
case Int.compare (x,v)
of EQUAL => true
| LESS => inTree x l
| GREATER => inTree x r
let rec fibonacci = function
| 0 -> 1
| 1 -> 1
| n -> fibonacci (n-1) + fibonacci (n-2)
let rec sum = function
| [] -> 0
| (x::xs) -> x + sum xs
(* Finds all adjacent pairs:
* adjacent [1;2;3] == [(1,2); (2,3)] *)
let rec adjacent = function
| (x::y::rest) -> (x,y) :: adjacent (y::rest)
| _ -> []
(* Searching a binary search tree *)
type tree = Empty | Node of int * tree * tree
let rec inTree x t = match t with
| Empty -> false
(* OCaml has no equivalent of Haskell's Ordering or SML's order types *)
| Node(v,l,r) -> x == v || inTree x (if x < v then l else r)
fn fib(n: int) -> int {
match n {
0 => 1,
1 => 1,
n => fib(n-1) + fib(n-2),
}
}
fn sum(l: &[int]) -> int {
match l {
[] => 0,
[x, ..xs] => x + sum(xs),
}
}
// Finds all adjacent pairs:
// adjacent(&[1,2,3]) == vec![(2,3), (1,2)]
fn adjacent(l: &[int]) -> Vec<(int,int)> {
match l {
[x, .. xs @ [y, ..]] => adjacent(xs) + &[(x, y)],
_ => vec![]
}
}
// Searching a binary search tree
enum Tree { Empty, Node(int, Box<Tree>, Box<Tree>) }
fn inTree(x: int, t: &Tree) -> bool {
match t {
&Empty => false,
&Node(v, ref l, ref r) => match x.cmp(&v) {
Equal => true,
Less => inTree(x, *l),
Greater => inTree(x, *r),
}
}
}
(define/match (fibonacci n)
[((or 0 1)) 1]
[(n) (+ (fibonacci (- n 1)) (fibonacci (- n 2)))])
(define/match (sum l)
[('()) 0]
[((cons x xs)) (+ 1 (sum xs))])
;; Finds adjacent pairs:
;; (adjacent '(a b c)) == '((a b) (b c))
(define/match (adjacent l)
[((cons x (and xs (cons y _)))) (cons (list x y) (adjacent xs))]
[(_) '()])
;; Searching a binary search tree
(struct node (value left right) #:transparent)
(define/match (in-tree x tree)
[(x (node v l r)) (or (= x v) (in-tree x (if (< x v) l r)))]
[(_ _) #f])
Many language features are simply special-cases of pattern-matching. We’ve
already seen that Python’s tuple unpacking is a form of pattern-matching.
Optional arguments and rest arguments (*args
in Python & Ruby) are
patterns matched against a list of arguments. Keyword arguments are patterns
matched against a dictionary of arguments. if
is just pattern-matching on
booleans:
if CONDITION { THEN-CASE ... }
else { ELSE-CASE ... }
Is the same as:
case CONDITION of
True -> THEN-CASE ...
False -> ELSE-CASE ...
Many useful domain-specific languages are based on pattern-matching. Regular
expressions express patterns of strings; sed
and awk
employ this to great
effect. yacc
uses a more powerful cousin of this idea, extending patterns to
grammars. The OMeta language attempts to push the
power of pattern-matching as far as possible in a general-purpose setting.
Pattern-matching in Lisps
Pattern-matching is not native to Lisp as it is to the ML family. Neither Common Lisp nor Scheme nor Clojure have a general-purpose pattern-matching form built-in, much less integrated into function definition.2
But adding language features is Lisp’s strong point: we can implement
pattern-matching as a macro!3 As a toy
example, here’s an if-match
macro, where (if-match pat expr then else)
runs
then
if pat
matches expr
and otherwise runs else
.
In
(defmacro if-match [pat expr then else]
(cond
;; `pat' is a variable
(symbol? pat) `(let [~pat ~expr] ~then)
;; `pat' is a literal atom
(not (seq? pat)) `(if (= '~pat ~expr) ~then ~else)
;; `pat' is (cons A B) with subpatterns A, B
(= 'cons (first pat))
(let [[_ p-first p-rest] pat
tmp (gensym)]
`(let [~tmp ~expr]
(if (seq? ~tmp)
(if-match ~p-first (first ~tmp)
(if-match ~p-rest (rest ~tmp) ~then ~else) ~else) ~else)))
:else (throw (Exception. "Unrecognized pattern"))))
(defmacro if-match (pat expr then else)
(cond
;; `pat' is a variable
((and pat (symbolp pat)) `(let ((,pat ,expr)) ,then))
;; `pat' is a literal atom
((atom pat) `(if (equalp ',pat ,expr) ,then ,else))
;; `pat' is (cons A B) with subpatterns A, B
((eq 'cons (car pat))
(destructuring-bind (_ p-car p-cdr) pat
(let ((tmp (gensym)))
`(let ((,tmp ,expr))
(if (consp ,tmp)
(if-match ,p-car (car ,tmp)
(if-match ,p-cdr (cdr ,tmp) ,then ,else) ,else) ,else)))))
(t (error "Unrecognized pattern"))))
(define-syntax if-match
(syntax-rules (cons)
;; pattern is (cons A B) with subpatterns A, B
[(_ (cons p-car p-cdr) expr then else)
(let ((x expr))
(if (pair? x)
(if-match p-car (car x)
(if-match p-cdr (cdr x) then else) else) else))]
[(_ pat expr then else)
(symbol?? pat ;; deep magic
;; pattern is a variable
(let ((pat expr)) then)
;; otherwise, pattern is a literal
(if (equal? 'pat expr) then else))]))
(require (for-syntax syntax/parse))
(define-syntax if-match
(syntax-parser #:literals (cons)
;; pattern is a variable
[(_ x:id expr then else) #'(let ((x expr)) then)]
;; pattern is (cons A B) with subpatterns A, B
[(_ (cons p-car p-cdr) expr then else)
#'(let ((x expr))
(if (pair? x)
(if-match p-car (car x)
(if-match p-cdr (cdr x) then else) else) else))]
;; pattern is an atomic literal
[(_ (~and lit (~not (_ ...))) expr then else)
#'(if (equal? 'lit expr) then else)]))
(Of course, in Racket you could just use match, but now we're getting ahead of ourselves.)
We can use it like so:
(if-match 0 0 true false) ; --> true
(if-match 0 1 true false) ; --> false
(if-match (cons 0 x) '(0 hi there) x nil) ; --> (hi there)
(if-match (cons 0 x) '(1 hi there) x nil) ; --> nil
(if-match 0 0 t nil) ; --> T
(if-match 0 1 t nil) ; --> NIL
(if-match (cons 0 x) '(0 hi there) x nil) ; --> (HI THERE)
(if-match (cons 0 x) '(1 hi there) x nil) ; --> NIL
(if-match 0 0 #t #f) ; --> #t
(if-match 0 1 #t #f) ; --> #t
(if-match (cons 0 x) '(0 hi there) x #f) ; --> (hi there)
(if-match (cons 0 x) '(1 hi there) x #f) ; --> #f
So far, so good, right? But…
The Library Problem
Now that I’ve defined my handy-dandy pattern-matching macro if-match
, I’m
stuck with whatever patterns I thought up when I wrote it. If I come up with a
new pattern, I have to rewrite if-match
to support it.
This stinks! Consider: Hilary the Hacker writes a spiffy lazy stream library, while Charlie the Coder writes a fast B-tree library. Naturally, both want to support pattern-matching. They can either:
-
Each implement their own custom pattern-matching macros. This duplicates work, defeats standardization, and makes it impossible to pattern-match on a stream and a tree at the same time.
-
Bug me to include support for Hilary’s streams and Charlie’s trees in my
if-match
library. This introduces irrelevant dependencies, and over time, a big ugly monolithic ball of code will accrete.4
The whole meaning of modularity is avoiding this rock-and-a-hard-place choice between duplicating work and centralizing it.
Generalized macros
The problem with if-match
is that it requires you to think of everything in
advance — exactly the difficulty macros are supposed to solve! Macros
are Lisp’s standard language extension mechanism, and patterns are
just a small, domain-specific language.
So why can’t we just define new patterns with macros?
Here’s how this might work. Suppose we already know how to match the following patterns:
nil | Matches an empty list. |
(cons A B) |
Matches a list whose first element matches A
and whose remainder matches B . |
It would be handy to have a pattern (list
A1 ... An)
, which matches an
n-element list whose 1st element matches A1
, whose 2nd element matches
A2
, … and so on up to An
. For example, (list x 2 y)
would match (1 2 3)
, binding x
to 1
and
y
to 3
.
I’m proposing that this be done using a macro for patterns, with something like the following mock- syntax:
(defpattern list [& element-pats]
(if (empty? element-pats) 'nil
(let [[p & ps] element-pats]
`(cons ~p (list ~@ps)))))
(defpattern list (&rest element-pats)
(if (not element-pats) 'nil
(destructuring-bind (p &rest ps) element-pats
`(cons ,p (list ,@ps)))))
(define-pattern-syntax list
(syntax-rules ()
[(_) nil]
[(_ p ps ...) (cons p (list ps ...))]))
This solves Hilary and Charlie’s library problem: instead of rolling their own
incompatible pattern-matchers, or asking me to rewrite if-match
, they
separately extend if-match
using defpattern
define-pattern-syntax
. These extensions are pulled into scope
when I import their libraries, just as a macro would be.5
This idea is not original to me: The Racket
language’s match
macro is
extensible via define-match-expander
.
The optima library for Common Lisp exposes a
defpattern
macro as well. I don’t know of a Clojure library that uses this
approach per se, but core.match
is
extensible in other
ways.
Ultimately, this is a partial success story for Lisp. ML-style pattern-matching can be assimilated and macro-ified. With some work, it can even be made macro-extensible. This design is not yet idiomatic among Lisps, but hopefully in time it will be. And in any language but a Lisp, would this kind of extension even be thinkable?6
DSLs as syntax classes
Hold on — isn’t this an essay about syntax classes, and how not everything is an expression? What does the design of pattern-matching macros have to do with that? Let’s rephrase our progress so far in terms of syntax classes:
Pattern-matching doesn’t just add a new kind of expression, it adds a new
syntax class: patterns. The problem with if-match
is that it doesn’t allow
modularly extending the syntax of patterns. We already know how to modularly
extend expression syntax: macros! By extending macros to cover patterns as
well, we solve the library problem.
Now that we’re thinking in terms of syntax classes, the problem and solution have become much clearer!
But there’s nothing special about patterns. Many domain-specific languages are
simply new syntax classes on top of an existing language. Microsoft’s
LINQ adds a syntax
class for queries, for example. Expressions embed queries to be performed, and
queries embed expressions representing, for example, predicates to be filtered on or
keys to be looked up. Parser-generators are another example: typically, we write
parsing rules in a special syntax, like BNF-inspired yacc
rules or PEG parsing
expressions. Into these rules we embed semantic actions — expressions
that run when a parse rule matches.
The
canonical uses
of Lisp macros are: (1) changing order of evaluation, (2) creating new binding
or definition forms, and (3) making domain-specific languages. LINQ and parsing
are clear-cut examples of case (3). But the obvious way to implement a DSL as a
macro, as we saw with if-match
, hard-codes the form of the new syntax class,
giving up on extensibility. I think Lisp macros aren’t good enough at (3),
precisely because of this issue.
Consider: Each of the pattern-matching libraries I listed above had to design and implement an extension mechanism from scratch — even though Lisp already has an extension mechanism: macros! And patterns are just one of many useful domain-specific languages we might desire. Introducing a new extensible syntax class shouldn’t require us to reinvent the wheel! Lisp doesn’t have a standard way to define new syntax classes, one which makes them macro-extensible by default. It should.7
-
In Practical Foundations for Programming Languages, these are called “sorts.” I’d love to hear whether other textbooks have a name for this concept; perhaps I’m missing some piece of standard terminology. ↩
-
Common Lisp has
destructuring-bind
and Clojure has structural binding, but those don’t backtrack and can’t match literals (like0
or't
), so you can’t use them for case-analysis. Scheme hassyntax-rules
andsyntax-case
(in R6RS), but they’re only useful for defining macros. ↩ -
Indeed, there are many libraries implementing pattern-matching macros. Here’s a pageful for Common Lisp. Clojure has core.match. Scheme doesn’t really have a library ecosystem, but here’s a file you can use. Many Schemes also provide their own libraries; for example, Racket has a powerful
match
macro in the standard library. ↩ -
Moreover, suppose Hilary’s stream library itself uses pattern-matching. Then we have a circular dependency:
if-match
depends on Hilary’s streams, and Hilary’s streams depend onif-match
! Few languages support this kind of library-level mutual recursion. ↩ -
Of course, the devil is in the details: how are
defpattern
anddefine-pattern-syntax
implemented? How doesif-match
ask what extensions have been defined viadefpattern
? How do we avoid namespace collisions? This article lays out some of the theory behind my approach to answering these questions, but I don’t have all the answers yet. See also footnotes 6 and 7. ↩ -
Actually, yes. Quite a bit of work has been done on non-Lisp languages with extensible syntax, most notably the SugarJ project. Moxy, the language I built at the Recurse Center, is another (as yet undocumented) example. ↩
-
In fact, once upon a time, PLT Scheme (the language which became Racket) did! It was called McMicMac, and I only discovered it after I was nearly done writing this essay. Its notion of extensible syntax class was called a vocabulary, and extensions were micros (by analogy with “macros”). It apparently had problems, however, and was replaced by an ancestor of Racket’s current macro system, which sadly lacks the ability to define vocabularies. ↩