Why Sponsor Oils? | blog | oilshell.org

Precedence Climbing is Widely Used

2017-03-30

While creating an index of my Pratt parsing articles, I found one more thing to say.

The first article made a case that Pratt parsing and precedence climbing are the same algorithm. I suggested retiring the "precedence climbing" name to avoid confusion.

But after doing a survey of real code, I now think it makes sense to retain it. It's a special case of Pratt parsing, but a widely used special case.

As a reminder, both methods are top-down algorithms, in that parent nodes are created before child nodes. This is in contrast to the other popular method, the Shunting Yard Algorithm.

Here's how they are different:

Precedence Climbing Pratt Parsing
Data Structure A table of token ID to integer precedence Two tables of precedence, or two dynamically dispatched methods on token objects
Code Structure Single Recursive Function Mutually recursive functions, all of which mutate the current token. Just like a recursive descent parser.
Handling Parentheses Inline special case ( is a prefix operator with low left binding power
Handling unary/ternary/non-binary operators
(-x,
b?x:y, a[i], f(x,y))
Inline special case User-defined function, i.e. mutually recursive "plugins"

By this definition, there's a lot of production code that uses precedence climbing but not Pratt parsing. I'd like to do a detailed survey later, but for now I'll just state that these projects all use precedence climbing:

I determined this by scanning source code for a single table of integer precedences, and single recursive function with a precedence test.

In contrast, these front ends use neither Pratt parsing nor precedence climbing:

The full Pratt parsing method doesn't seem to be widely used. It's in in Crockford's JSLint (code), the Wren language by Bob Nystrom, and in the old and relatively obscure Maxima computer algebra system. (I learned this from a reader comment.) But I haven't seen it anywhere else.

Pratt Parsing is Nicer

Still, I prefer the Pratt style because it's more general, and because the code is cleaner. I hope that a future code survey post will show this, but no promises since I still have a blog backlog.

Leave a comment if you have a link to a production-quality arithmetic parser. What algorithm does it use?


Related: