Pratt Parsing and Precedence Climbing Are the Same Algorithm

2016-11-01

In this article I will clarify some confusing terminology in the world of parsing, which spans online resources and computer science publications from the 70's and 80's. This is based on my experience writing a parser for shell arithmetic, e.g.

$ echo $(( 1 + 2 * 3 ))
7

Table of Contents

Background

Contrary Evidence

Original Papers by Pratt and Others

Python Code

Conclusion

Further Thoughts: Generation vs. Recognition

Background

The first pass of the expression parser for oil was based on a patch I made to toybox implementing the Unix expr command.

I used Eli Bendersky's 2012 article on precedence climbing to write it. I cited his article in the patch, as well as Keith Clarke's 1986 paper The Top-Down Parsing of Expressions, which I found in a relevant Wikipedia article.

(For contrast, GNU expr uses a plain recursive descent parser, which works fine, but is more verbose and in theory less efficient.)

The expr command only supports binary operators, so we just need a single recursive function eval_expr() to handle all operators, as in Bendersky's article.

But shell arithmetic is borrowed from C and also has these constructs:

prefix operators: -3 and +x
the ternary operator: x > 2 ? y : 0
right associative operators: 2 ** 3 ** 4 is 2 ** (3 ** 4)
array indexing: a[0]

A single function is awkward when you have this more complex set of operators. I could see how to add them, but I decided first to revisit a technique I had worked with about eight years ago: Top Down Operator Precedence, described for a modern audience by Douglas Crockford.

So I implemented the richer arithmetic language with the TDOP algorithm, also known as Pratt Parsing. It has successfully parsed real code, along the lines of previous tests.

In doing so, I realized that Precedence climbing and Pratt parsing are the same algorithm.

Pratt parsing is perhaps a little more general, but the core algorithm is the same:

You can assign each operator token a precedence, or binding power in Pratt's terminology.
You have a recursive function that parses expressions, consuming tokens to the right, until it reaches an operator of precedence less than or equal to the previous operator -- or just less than if it's a right-associative operator.
In Pratt parsing, tokens can be used in the null and/or left position, based on whether they take an expression on the left or not (nud or led in Pratt's terminology). Examples of left operators are infix +, postfix ++, or the pseudo-infix a[0] (with operator [). Examples of null operators are unary minus -, and grouping parentheses (.

In contrast, precedence climbing is described with unary operators, binary operators, and parentheses handled "inline". But this is about source code organization, not an algorithmic issue.

This technique was invented at least twice, and is explained on the web as if it were two different algorithms. But it's really one algorithm -- if you were teaching a class, it wouldn't make sense to teach "both" algorithms.

Contrary Evidence

Let's look at the evidence that they're different algorithms:

Wikipedia has separate pages for Pratt Parser and Operator-Precedence Parser. The latter mentions precedence climbing as an alternative to Pratt parsing.
Bendersky's site has two different articles: TDOP Parsing and Parsing Expressions by Precedence Climbing.
The origin of Pratt Parsing is Vaughan Pratt's 1973 paper Top-Down Operator Precedence. The term "precedence climbing" apparently comes from this relatively modern article by Theodore Norvell, which, which cites Keith Clarke's 1986 paper The Top-Down Parsing of Expressions. At the end it says: I am grateful to Keith Clarke and Martin Richards for helping me trace the origins of what I've called precedence climbing.

If you google for it, the term "precedence climbing" is relatively common.

Original Papers by Pratt and Others

But after implementing slightly different expression languages with each of these techniques, it's clear that they are the same algorithm. Let's go back to the original papers to see this.

In Pratt's paper, he is talking about problems with the earlier Floyd method and the advantage of his technique:

One way to resolve the issue is simply to announce the outcome in advance for each pair A and B, basing the choices on some reasonable heuristics. Floyd [1963] suggested this approach, called operator precedence. The outcome was stored in a table.

...

We present an approach to language design which simultaneously solves both these problems, without unduly restricting normal usage, yet allows us to retain the numeric approach to operator precedence.

The idea is to assign data types to classes and then to totally order the classes.

Clarke's paper doesn't cite Vaughan's paper, but it cites an earlier publication by Richards related to the BCPL compiler, and says how it improves on that technique:

Richards[1] gives a procedure for parsing expressions which uses recursion rather than an explicit stack despite being derived using the operator precedence technique. The procedure is essentially identical to the one given here (performing the same number of tests for each operator symbol), but requires the construction of the operator precedence matrix. We show how the procedure can be derived by grammar- and program-transformation from the recursive descent method, using only numerical precedences and an indication of left- or right- associativity.

My reading of this is as follows:

Pratt (1973) is improving on the Floyd method (1963)
Clarke (1986) is improving on the Richards method (1979)
Pratt is assigning a total order to tokens with the binding power mechanism. Later in the paper you will see that this is done with two numbers -- left binding power lbp and right binding power rbp.
Clarke is assigning numerical precedences and associativity to operators.
They are both avoiding the construction of a partial order on operators, which can be represented as a matrix or table.

(Aside: In his recent JuliaCon keynote, Guy Steele mentioned that the Fortress language avoided a total order on operator precedence for usability reasons.)

The binding power and associativity mechanisms are equivalent. This is most easily seen by looking at Bendersky's sample code (which is excellent, and from which I heavily borrowed):

Python Code

From the precedence climbing article:

def compute_expr(tokenizer, min_prec):
    atom_lhs = compute_atom(tokenizer)

    while True:
        # ...
        cur = tokenizer.cur_token
        if (cur is None or cur.name != 'BINOP'
            or OPINFO_MAP[cur.value].prec < min_prec):  # TERMINATION CONDITION
            break

        next_min_prec = prec + 1 if assoc == 'LEFT' else prec

        # Consume the current token and prepare the next one for the recursive
        # call
        tokenizer.get_next_token()
        atom_rhs = compute_expr(tokenizer, next_min_prec)  # RECURSIVE CALL

Note the lines labeled TERMINATION CONDITION and RECURSIVE CALL.

From the TDOP/Pratt article:

def expression(rbp=0):
    global token
    t = token
    token = next()
    left = t.nud()
    while rbp < token.lbp:  # TERMINATION CONDITION
        t = token
        token = next()
        left = t.led(left)  # results in recursive call
    return left

How about right-associative operators? ... We can do that by calling expression in the handler of exponentiation with a rbp lower than the lbp of exponentiation:

class operator_pow_token(object):
    lbp = 30
    def led(self, left):
       return left ** expression(30 - 1)  # RECURSIVE CALL

It might take a little squinting, but the similarities are apparent. They are even more apparent after playing with and debugging both implementations.

There's a while loop making a recursive call, and it has a termination condition on precedence / binding power.
In precedence climbing, right associativity is implemented by making the recursive call with next_min_prec = prec + 1. In Pratt parsing, we set the right binding power rbp to be lbp-1, and make the recursive call with it. This is the same thing expressed in two slightly different ways.
In precedence climbing, grouping parentheses are handled "inline" in the recursive parsing function. In TDOP, they can be handled as a nud function for the ( token. This is a superficial difference that relates only to the structure of code, and not the algorithm.

These same differences in presentation are in Pratt's and Clarke's original papers, not just in Bendersky's articles, but I think the Python code is clearer for a modern audience than the pseudo-code and diagrams used in those papers.

Clarke's paper presents associativity like this:

E(if RightAssoc(oper) then oprec else oprec+1)

The Norvell article which coined the term precedence climbing presents it like this:

const q := case associativity(op)
           of Right: prec( op )
              Left:  1+prec( op )

Wikipedia presents it like this:

while lookahead is a binary operator whose precedence is greater
         than op's, or a right-associative operator
         whose precedence is equal to op's

These are all the same thing, and can be expressed with the rbp mechanism of Pratt parsers.

Pratt's paper doesn't seem to mention the right associativity trick, but I don't otherwise see the point of having two integers lbp and rbp, rather than a single integer. Crockford's 2007 presentation of Pratt's parsing technique does use the associativity trick.

In any case, Pratt's algorithm is at least as general as Clarke's.

Conclusion

Again, if you were teaching a computer science class, it wouldn't make sense to teach "both" algorithms — they are the same.

It seems that they were independent discoveries, but Pratt's paper came first. (It's not clear that Clarke's article was meant to be a "paper" in the peer-reviewed sense; maybe he was just distributing an article for use among his colleagues.)

The term precedence climbing is in somewhat common use. To avoid confusion, I suggest dropping it altogether. Let's just call the algorithm top-down operator precedence parsing, or Pratt parsing for short.

Further Thoughts: Generation vs. Recognition

Tomorrow I will show a snippet of my Pratt implementation for bash arithmetic.

At some point, I should write an article on recognition-based vs. generative or grammar-based methods for describing languages. This dichotomy is taken from Bryan Ford's paper on parsing expression grammars.

The TDOP algorithm is clearly in the recognition-based category, and Pratt made a similar distinction in his 1973 paper:

One may wonder why such an "obviously" utopian approach has not been generally adopted already. I suspect the root cause of this kind of oversight is our universal preoccupation with BNF grammars and their various offspring ...

I am personally enamored of automata theory per se, but I am not impressed with the extent to which it has so far been successfully applied to the writing of compilers or interpreters. Nor do I see a particularly promising future in this direction. Rather, I see automata theory as holding back the development of ideas valuable to language design that are not visibly in the domain of automata theory.

I've researched how "real" languages implement their parsers, and I have to agree with Pratt. Languages are often described by grammars, but in general we use hand-written parsers rather than parsers derived automatically from grammars.

This gap is significant, and I will provide supporting experience and empirical data in future posts.

11/4 update: I e-mailed Eli Bendersky and Theodore Norvell, and they both agree with the claim in this article. The later precedence climbing algorithm is a special case of the earlier Pratt parsing algorithm.