2016-11-01

In this article I will clarify some confusing terminology in the world of parsing, which spans online resources and computer science publications from the 70's and 80's. This is based on my experience writing a parser for shell arithmetic, e.g.

```
$ echo $(( 1 + 2 * 3 ))
7
```

The first pass of the expression parser for oil was based on a patch I made to
toybox implementing the Unix `expr` command.

I used Eli Bendersky's 2012 article on precedence climbing to write it. I cited his article in the patch, as well as Keith Clarke's 1986 paper The Top-Down Parsing of Expressions, which I found in a relevant Wikipedia article.

(For contrast, GNU `expr` uses a plain recursive descent parser, which works
fine, but is more verbose and in theory less efficient.)

The `expr` command only supports binary operators, so we just need a single
recursive function `eval_expr()` to handle all operators, as in Bendersky's
article.

But shell arithmetic is borrowed from C and also has these constructs:

- prefix operators: `-3` and `+x`
- the ternary operator: `x > 2 ? y : 0`
- right-associative operators: `2 ** 3 ** 4` is `2 ** (3 ** 4)`
- array indexing: `a[0]`
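For example, the right associativity of `**` changes the result. Python's `**`
happens to associate the same way as the shell's, so it's easy to check:

```python
# ** is right-associative: 2 ** 3 ** 2 parses as 2 ** (3 ** 2),
# not as (2 ** 3) ** 2.
print(2 ** 3 ** 2)    # 512
print((2 ** 3) ** 2)  # 64
```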

A single function is awkward when you have this more complex set of operators.
I could see how to add them, but I decided first to revisit a technique I had
worked with about eight years ago: **Top Down Operator Precedence**, described
for a modern audience by Douglas Crockford.

So I implemented the richer arithmetic language with the TDOP algorithm, also
known as **Pratt Parsing**. It has successfully parsed real code, along the
lines of previous tests.

In doing so, I realized that **Precedence climbing and Pratt parsing are the
same algorithm**.

Pratt parsing is perhaps a little more general, but the core algorithm is the same:

- You can assign each operator token a precedence, or *binding power* in
Pratt's terminology.
- You have a recursive function that parses expressions, consuming tokens to
the right, **until it reaches an operator** of precedence *less than or equal
to* the previous operator -- or just *less than* if it's a right-associative
operator.
- In Pratt parsing, tokens can be used in the *null* and/or *left* position,
based on whether they take an expression on the left or not (`nud` or `led` in
Pratt's terminology). Examples of *left* operators are infix `+`, postfix
`++`, or the pseudo-infix `a[0]` (with operator `[`). Examples of *null*
operators are unary minus `-`, and grouping parentheses `(`.
- In contrast, precedence climbing is described with unary operators, binary
operators, and parentheses handled "inline". But this is about source code
organization, not an algorithmic issue.

This technique was invented at least twice, and is explained on the web as if it were two different algorithms. But it's really one algorithm -- if you were teaching a class, it wouldn't make sense to teach "both" algorithms.

Let's look at the evidence that they're treated as two different algorithms:

Wikipedia has separate pages for Pratt Parser and Operator-Precedence Parser. The latter mentions precedence climbing as an alternative to Pratt parsing.

Bendersky's site has two different articles: TDOP Parsing and Parsing Expressions by Precedence Climbing.

The origin of Pratt Parsing is Vaughan Pratt's 1973 paper Top-Down Operator Precedence. The term "precedence climbing" apparently comes from this relatively modern article by Theodore Norvell, which cites Keith Clarke's 1986 paper The Top-Down Parsing of Expressions. At the end it says:

*I am grateful to Keith Clarke and Martin Richards for helping me trace the origins of what I've called precedence climbing.*

If you google for it, the term "precedence climbing" is relatively common.

But after implementing slightly different expression languages with each of
these techniques, it's clear that they are the **same algorithm**. Let's go
back to the original papers to see this.

In Pratt's paper, he is talking about problems with the earlier Floyd method and the advantage of his technique:

*One way to resolve the issue is simply to announce the outcome in advance for
each pair A and B, basing the choices on some reasonable heuristics. Floyd
[1963] suggested this approach, called operator precedence. The outcome was
stored in a table. ...*

*We present an approach to language design which simultaneously solves both
these problems, without unduly restricting normal usage, yet allows us to
retain the numeric approach to operator precedence. The idea is to assign data
types to classes and then to totally order the classes.*

Clarke's paper doesn't cite Pratt's paper, but it cites an earlier publication by Richards related to the BCPL compiler, and says how it improves on that technique:

*Richards [1] gives a procedure for parsing expressions which uses recursion
rather than an explicit stack despite being derived using the operator
precedence technique. The procedure is essentially identical to the one given
here (performing the same number of tests for each operator symbol), but
requires the construction of the operator precedence matrix. We show how the
procedure can be derived by grammar- and program-transformation from the
recursive descent method, using only numerical precedences and an indication
of left- or right-associativity.*

My reading of this is as follows:

- Pratt (1973) is improving on the Floyd method (1963).
- Clarke (1986) is improving on the Richards method (1979).
- Pratt is assigning a **total order** to tokens with the **binding power**
mechanism. Later in the paper you will see that this is done with two numbers
-- left binding power `lbp` and right binding power `rbp`.
- Clarke is assigning **numerical precedences and associativity** to
operators.
- They are both avoiding the construction of a partial order on operators,
which can be represented as a matrix or table.
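The two parameterizations convert into each other mechanically. A sketch (the
function name is mine, not from either paper):

```python
def to_binding_powers(prec, assoc):
    """Convert a Clarke-style (precedence, associativity) pair to
    Pratt-style binding powers.  The left binding power controls when the
    parsing loop stops; the right binding power is passed to the recursive
    call.  Right associativity lowers the recursive threshold by one."""
    lbp = prec
    rbp = prec if assoc == 'LEFT' else prec - 1
    return lbp, rbp
```

With `**` at precedence 30, `to_binding_powers(30, 'RIGHT')` gives `(30, 29)`,
which matches the `expression(30 - 1)` call in the exponentiation example
below.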

(Aside: In his recent JuliaCon keynote, Guy Steele mentioned that the Fortress language avoided a total order on operator precedence for usability reasons.)

The binding power and associativity mechanisms are equivalent. This is most easily seen by looking at Bendersky's sample code (which is excellent, and from which I heavily borrowed):

From the precedence climbing article:

```
def compute_expr(tokenizer, min_prec):
    atom_lhs = compute_atom(tokenizer)
    while True:
        cur = tokenizer.cur_token
        if (cur is None or cur.name != 'BINOP'
                or OPINFO_MAP[cur.value].prec < min_prec):  # TERMINATION CONDITION
            break

        # The current token is a binary operator; look up its precedence
        # and associativity
        op = cur.value
        prec, assoc = OPINFO_MAP[op]
        next_min_prec = prec + 1 if assoc == 'LEFT' else prec

        # Consume the current token and prepare the next one for the recursive
        # call
        tokenizer.get_next_token()
        atom_rhs = compute_expr(tokenizer, next_min_prec)  # RECURSIVE CALL
        # ...
```
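The excerpt assumes a precedence table. Bendersky's article builds it roughly
like this; the exact operators and levels here are illustrative:

```python
import collections

OpInfo = collections.namedtuple('OpInfo', 'prec assoc')

# Map each binary operator to its precedence and associativity.
OPINFO_MAP = {
    '+': OpInfo(prec=1, assoc='LEFT'),
    '-': OpInfo(prec=1, assoc='LEFT'),
    '*': OpInfo(prec=2, assoc='LEFT'),
    '/': OpInfo(prec=2, assoc='LEFT'),
    '^': OpInfo(prec=3, assoc='RIGHT'),
}
```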

Note the lines labeled TERMINATION CONDITION and RECURSIVE CALL.

From the TDOP/Pratt article:

```
def expression(rbp=0):
    global token
    t = token
    token = next()
    left = t.nud()
    while rbp < token.lbp:  # TERMINATION CONDITION
        t = token
        token = next()
        left = t.led(left)  # results in recursive call
    return left
```

*How about right-associative operators? ... We can do that by calling
expression in the handler of exponentiation with a rbp lower than the lbp of
exponentiation:*

```
class operator_pow_token(object):
    lbp = 30
    def led(self, left):
        return left ** expression(30 - 1)  # RECURSIVE CALL
```

It might take a little squinting, but the similarities are apparent. They are even more apparent after playing with and debugging both implementations.

There's a while loop making a recursive call, and it has a termination condition on precedence / binding power.

In precedence climbing, right associativity is implemented by making the
recursive call with `next_min_prec = prec`, rather than `prec + 1`. In Pratt
parsing, we set the right binding power `rbp` to `lbp - 1`, rather than `lbp`,
and make the recursive call with it. This is the same thing expressed in two
slightly different ways.

In precedence climbing, grouping parentheses are handled "inline" in the
recursive parsing function. In TDOP, they can be handled as a `nud` function
for the `(` token. This is a superficial difference that relates only to the
structure of the code, not the algorithm.

These same differences in presentation are in Pratt's and Clarke's original papers, not just in Bendersky's articles, but I think the Python code is clearer for a modern audience than the pseudo-code and diagrams used in those papers.

Clarke's paper presents associativity like this:

```
E(if RightAssoc(oper) then oprec else oprec+1)
```

The Norvell article which coined the term precedence climbing presents it like this:

```
const q := case associativity(op) of
    Right: prec( op )
    Left:  1 + prec( op )
```

Wikipedia presents it like this:

*while lookahead is a binary operator whose precedence is greater than op's,
or a right-associative operator whose precedence is equal to op's*

These are all the same thing, and can be expressed with the `rbp` mechanism of
Pratt parsers.
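To check that these phrasings agree, you can compute the recursion threshold
under each one (hypothetical helper functions, transcribed from the quoted
pseudo-code):

```python
def clarke_arg(oprec, right_assoc):
    # Clarke: E(if RightAssoc(oper) then oprec else oprec+1)
    return oprec if right_assoc else oprec + 1

def norvell_q(prec, assoc):
    # Norvell: q := case associativity(op) of Right: prec(op)  Left: 1+prec(op)
    return prec if assoc == 'Right' else 1 + prec

# Both lower the threshold by one for right-associative operators -- the
# same shift as rbp = lbp - 1 in a Pratt parser.
```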

Pratt's paper doesn't seem to mention the right associativity trick, but I
don't otherwise see the point of having two integers `lbp` and `rbp` rather
than a single integer. Crockford's 2007 presentation of Pratt's parsing
technique does use the associativity trick.

In any case, Pratt's algorithm is at least as general as Clarke's.

Again, if you were teaching a computer science class, it wouldn't make sense to teach "both" algorithms — they are the same.

It seems that they were independent discoveries, but Pratt's paper came first. (It's not clear that Clarke's article was meant to be a "paper" in the peer-reviewed sense; maybe he was just distributing an article for use among his colleagues.)

The term *precedence climbing* is in somewhat common use. To avoid confusion,
I suggest dropping it altogether. Let's just call the algorithm *top-down
operator precedence parsing*, or *Pratt parsing* for short.

Tomorrow I will show a snippet of my Pratt implementation for bash arithmetic.

At some point, I should write an article on **recognition-based** vs.
**generative** or grammar-based methods for describing languages. This
dichotomy is taken from Bryan Ford's paper on parsing expression
grammars.

The TDOP algorithm is clearly in the recognition-based category, and Pratt made a similar distinction in his 1973 paper:

*One may wonder why such an "obviously" utopian approach has not been
generally adopted already. I suspect the root cause of this kind of oversight
is our universal preoccupation with BNF grammars and their various offspring
...*

*I am personally enamored of automata theory per se, but I am not impressed
with the extent to which it has so far been successfully applied to the
writing of compilers or interpreters. Nor do I see a particularly promising
future in this direction. Rather, I see automata theory as holding back the
development of ideas valuable to language design that are not visibly in the
domain of automata theory.*

I've researched how "real" languages implement their parsers, and I have to agree with Pratt. Languages are often described by grammars, but in general we use hand-written parsers rather than parsers derived automatically from grammars.

This gap is significant, and I will provide supporting experience and empirical data in future posts.

*11/4 update: I e-mailed Eli Bendersky and Theodore Norvell, and they both
agree with the claim in this article. The later precedence climbing algorithm
is a special case of the earlier Pratt parsing algorithm.*

Related: Pratt Parsing Index and Updates