Pratt Parsing and Precedence Climbing Are the Same Algorithm | blog | oilshell.org
In this article I will clarify some confusing terminology in the world of parsing, which spans online resources and computer science publications from the '70s and '80s. This is based on my experience writing a parser for shell arithmetic, e.g.
$ echo $(( 1 + 2 * 3 ))
7
The first pass of the expression parser for oil was based on a
patch I made to toybox implementing the Unix expr
command.
I used Eli Bendersky's 2012 article on precedence climbing to write it. I cited his article in the patch, as well as Keith Clarke's 1986 paper The Top-Down Parsing of Expressions, which I found in a relevant Wikipedia article.
(For contrast, GNU expr
uses a plain recursive descent parser, which works
fine, but is more verbose and in theory less efficient.)
The expr
command only supports binary operators, so we just need a single
recursive function eval_expr()
to handle all operators, as in Bendersky's
article.
But shell arithmetic is borrowed from C and also has these constructs:
- unary minus and plus: -3 and +x
- the ternary operator: x > 2 ? y : 0
- exponentiation, which is right associative: 2 ** 3 ** 4 is 2 ** (3 ** 4)
- array indexing: a[0]
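All of these constructs can be checked directly in an interactive shell. A quick sketch in bash (the variables x, y, and a are mine, chosen for illustration):

```shell
# Checking the C-like constructs in bash arithmetic
# (variable names are illustrative).
x=5; y=7
a=(42 43)
echo $(( -3 + +x ))        # unary minus and plus: prints 2
echo $(( x > 2 ? y : 0 ))  # ternary: prints 7
echo $(( 2 ** 3 ** 2 ))    # right associative, i.e. 2 ** (3 ** 2): prints 512
echo $(( a[0] ))           # array indexing: prints 42
```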
A single function is awkward when you have this more complex set of operators. I could see how to add them, but I decided first to revisit a technique I had worked with about eight years ago: Top Down Operator Precedence, described for a modern audience by Douglas Crockford.
So I implemented the richer arithmetic language with the TDOP algorithm, also known as Pratt Parsing. It has successfully parsed real code, along the lines of previous tests.
In doing so, I realized that precedence climbing and Pratt parsing are the same algorithm.
Pratt parsing is perhaps a little more general, but the core algorithm is the same:
- You assign each operator token a precedence, or binding power in Pratt's terminology.
- You have a recursive function that parses expressions, consuming tokens to the right, until it reaches an operator of precedence less than or equal to the previous operator's -- or just less than if it's a right-associative operator.
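That loop can be sketched in a few lines of Python (my own minimal code, not from any of the articles cited here; the PREC and RIGHT tables and the flat token list are illustrative assumptions):

```python
# A minimal precedence-climbing loop (illustrative tables; binary
# operators over integer atoms in a flat token list).
PREC  = {'+': 1, '-': 1, '*': 2, '/': 2, '**': 3}
RIGHT = {'**'}                       # right-associative operators

def parse(tokens, pos=0, min_prec=1):
    node = tokens[pos]               # the atom; assume an integer literal
    pos += 1
    while (pos < len(tokens) and tokens[pos] in PREC
           and PREC[tokens[pos]] >= min_prec):       # termination condition
        op = tokens[pos]
        # "less than or equal" vs. "just less than": only bump the
        # threshold for left-associative operators
        next_min = PREC[op] + (0 if op in RIGHT else 1)
        rhs, pos = parse(tokens, pos + 1, next_min)  # recursive call
        node = (op, node, rhs)
    return node, pos

tree, _ = parse([1, '+', 2, '*', 3])   # tree == ('+', 1, ('*', 2, 3))
```

Note that a right-associative operator like ** recurses with its own precedence, so a following ** of equal precedence is consumed by the recursive call, yielding 2 ** (3 ** 4).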
In Pratt parsing, tokens can be used in the null and/or left position, based on whether they take an expression on the left or not (nud or led in Pratt's terminology). Examples of left operators are infix +, postfix ++, and the pseudo-infix a[0] (with operator [). Examples of null operators are unary minus - and grouping parentheses (.
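Here is a runnable sketch of that dispatch (hypothetical class and function names, not Pratt's or Crockford's code). The same - token acts as a null operator or a left operator depending on its position:

```python
# A sketch of Pratt-style nud/led dispatch over a flat token list
# (all names are illustrative).
LBP = {'+': 10, '-': 10, '*': 20}   # left binding powers; atoms get 0

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def peek_lbp(self):
        if self.pos >= len(self.tokens):
            return 0
        return LBP.get(self.tokens[self.pos], 0)

    def expression(self, rbp=0):
        left = nud(self, self.next())            # token in the null position
        while rbp < self.peek_lbp():             # termination condition
            left = led(self, self.next(), left)  # token in the left position
        return left

def nud(p, tok):
    if tok == '-':              # unary minus: no expression on the left
        return ('neg', p.expression(30))
    if tok == '(':              # grouping parens are also a null operator
        expr = p.expression(0)
        assert p.next() == ')'
        return expr
    return tok                  # an atom (integer literal)

def led(p, tok, left):          # infix operators take an expression on the left
    return (tok, left, p.expression(LBP[tok]))

# The same '-' token is parsed differently in each position:
# Parser(['-', 1, '-', 2]).expression() == ('-', ('neg', 1), 2)
```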
In contrast, precedence climbing is described with unary operators, binary operators, and parentheses handled "inline". But this is about source code organization, not an algorithmic issue.
This technique was invented at least twice, and is explained on the web as if it were two different algorithms. But it's really one algorithm -- if you were teaching a class, it wouldn't make sense to teach "both" algorithms.
Let's look at the evidence that they're different algorithms:
Wikipedia has separate pages for Pratt Parser and Operator-Precedence Parser. The latter mentions precedence climbing as an alternative to Pratt parsing.
Bendersky's site has two different articles: TDOP Parsing and Parsing Expressions by Precedence Climbing.
The origin of Pratt parsing is Vaughan Pratt's 1973 paper Top-Down Operator Precedence. The term "precedence climbing" apparently comes from this relatively modern article by Theodore Norvell, which cites Keith Clarke's 1986 paper The Top-Down Parsing of Expressions. At the end it says: I am grateful to Keith Clarke and Martin Richards for helping me trace the origins of what I've called precedence climbing.
If you google for it, the term "precedence climbing" is relatively common.
But after implementing slightly different expression languages with each of these techniques, it's clear that they are the same algorithm. Let's go back to the original papers to see this.
In Pratt's paper, he is talking about problems with the earlier Floyd method and the advantage of his technique:
One way to resolve the issue is simply to announce the outcome in advance for each pair A and B, basing the choices on some reasonable heuristics. Floyd [1963] suggested this approach, called operator precedence. The outcome was stored in a table.
...
We present an approach to language design which simultaneously solves both these problems, without unduly restricting normal usage, yet allows us to retain the numeric approach to operator precedence.
The idea is to assign data types to classes and then to totally order the classes.
Clarke's paper doesn't cite Pratt's paper, but it cites an earlier publication by Richards related to the BCPL compiler, and says how it improves on that technique:
Richards[1] gives a procedure for parsing expressions which uses recursion rather than an explicit stack despite being derived using the operator precedence technique. The procedure is essentially identical to the one given here (performing the same number of tests for each operator symbol), but requires the construction of the operator precedence matrix. We show how the procedure can be derived by grammar- and program-transformation from the recursive descent method, using only numerical precedences and an indication of left- or right- associativity.
My reading of this is as follows: both papers replace Floyd's operator precedence table with numeric precedences. In Pratt's scheme, each token gets a left binding power lbp and a right binding power rbp.
(Aside: In his recent JuliaCon keynote, Guy Steele mentioned that the Fortress language avoided a total order on operator precedence for usability reasons.)
The binding power and associativity mechanisms are equivalent. This is most easily seen by looking at Bendersky's sample code (which is excellent, and from which I heavily borrowed):
From the precedence climbing article:
    def compute_expr(tokenizer, min_prec):
        atom_lhs = compute_atom(tokenizer)

        while True:
            cur = tokenizer.cur_token
            if (cur is None or cur.name != 'BINOP'
                    or OPINFO_MAP[cur.value].prec < min_prec):  # TERMINATION CONDITION
                break

            op = cur.value
            prec, assoc = OPINFO_MAP[op]
            next_min_prec = prec + 1 if assoc == 'LEFT' else prec

            # Consume the current token and prepare the next one for the
            # recursive call
            tokenizer.get_next_token()
            atom_rhs = compute_expr(tokenizer, next_min_prec)  # RECURSIVE CALL

            atom_lhs = compute_op(op, atom_lhs, atom_rhs)

        return atom_lhs
Note the lines labeled TERMINATION CONDITION and RECURSIVE CALL.
From the TDOP/Pratt article:
    def expression(rbp=0):
        global token
        t = token
        token = next()
        left = t.nud()
        while rbp < token.lbp:    # TERMINATION CONDITION
            t = token
            token = next()
            left = t.led(left)    # results in recursive call
        return left
How about right-associative operators? ... We can do that by calling expression in the handler of exponentiation with a rbp lower than the lbp of exponentiation:
    class operator_pow_token(object):
        lbp = 30
        def led(self, left):
            return left ** expression(30 - 1)  # RECURSIVE CALL
It might take a little squinting, but the similarities are apparent. They are even more apparent after playing with and debugging both implementations.
There's a while loop making a recursive call, and it has a termination condition on precedence / binding power.
In precedence climbing, right associativity is implemented by making the recursive call with next_min_prec = prec rather than prec + 1. In Pratt parsing, we set the right binding power rbp to lbp - 1, and make the recursive call with it.
This is the same thing expressed in two slightly different ways.
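The correspondence can even be checked mechanically (the two helper functions below are mine, for illustration). Precedence climbing loops while prec >= min_prec, and Pratt parsing loops while lbp > rbp, so the min_prec passed down must always equal rbp + 1:

```python
# The two formulations of associativity differ only by one
# (helper functions are illustrative, not from either article).
def next_min_prec(prec, assoc):     # precedence climbing
    return prec + 1 if assoc == 'LEFT' else prec

def pratt_rbp(lbp, assoc):          # Pratt parsing: rbp = lbp - 1 if right-assoc
    return lbp if assoc == 'LEFT' else lbp - 1

for level, assoc in [(10, 'LEFT'), (20, 'LEFT'), (30, 'RIGHT')]:
    # climbing loops while prec >= min_prec; Pratt loops while lbp > rbp,
    # i.e. lbp >= rbp + 1 -- so min_prec must equal rbp + 1
    assert next_min_prec(level, assoc) == pratt_rbp(level, assoc) + 1
```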
In precedence climbing, grouping parentheses are handled "inline" in the recursive parsing function. In TDOP, they can be handled as a nud function for the ( token. This is a superficial difference that relates only to the structure of the code, and not the algorithm.
These same differences in presentation are in Pratt's and Clarke's original papers, not just in Bendersky's articles, but I think the Python code is clearer for a modern audience than the pseudo-code and diagrams used in those papers.
Clarke's paper presents associativity like this:
E(if RightAssoc(oper) then oprec else oprec+1)
The Norvell article which coined the term precedence climbing presents it like this:
    const q := case associativity(op)
               of Right: prec( op )
                  Left:  1 + prec( op )
Wikipedia presents it like this:
while lookahead is a binary operator whose precedence is greater than op's, or a right-associative operator whose precedence is equal to op's
These are all the same thing, and can be expressed with the rbp mechanism of Pratt parsers.
Pratt's paper doesn't seem to mention the right associativity trick, but I don't otherwise see the point of having two integers lbp and rbp rather than a single integer. Crockford's 2007 presentation of Pratt's parsing technique does use the associativity trick.
In any case, Pratt's algorithm is at least as general as Clarke's.
Again, if you were teaching a computer science class, it wouldn't make sense to teach "both" algorithms — they are the same.
It seems that they were independent discoveries, but Pratt's paper came first. (It's not clear that Clarke's article was meant to be a "paper" in the peer-reviewed sense; maybe he was just distributing an article for use among his colleagues.)
The term precedence climbing is in somewhat common use. To avoid confusion, I suggest dropping it altogether. Let's just call the algorithm top-down operator precedence parsing, or Pratt parsing for short.
Tomorrow I will show a snippet of my Pratt implementation for bash arithmetic.
At some point, I should write an article on recognition-based vs. generative or grammar-based methods for describing languages. This dichotomy is taken from Bryan Ford's paper on parsing expression grammars.
The TDOP algorithm is clearly in the recognition-based category, and Pratt made a similar distinction in his 1973 paper:
One may wonder why such an "obviously" utopian approach has not been generally adopted already. I suspect the root cause of this kind of oversight is our universal preoccupation with BNF grammars and their various offspring ...
I am personally enamored of automata theory per se, but I am not impressed with the extent to which it has so far been successfully applied to the writing of compilers or interpreters. Nor do I see a particularly promising future in this direction. Rather, I see automata theory as holding back the development of ideas valuable to language design that are not visibly in the domain of automata theory.
I've researched how "real" languages implement their parsers, and I have to agree with Pratt. Languages are often described by grammars, but in general we use hand-written parsers rather than parsers derived automatically from grammars.
This gap is significant, and I will provide supporting experience and empirical data in future posts.
11/4 update: I e-mailed Eli Bendersky and Theodore Norvell, and they both agree with the claim in this article. The later precedence climbing algorithm is a special case of the earlier Pratt parsing algorithm.
Related: Pratt Parsing Index and Updates