A discussion of an operator precedence parsing technique that seems to only have a folklore existence. Some compiler writers seem to know about it, others not. I suspect it was once well-known but fell out of favour.
Operator precedence parsing is a technique commonly used to parse expressions in a computer programming language. It’s fairly easy to understand, easy to hand-write parsers for typical expression grammars, and is fairly quick. It’s very well known (for example, see [Brown] chapter 4.4). Almost certainly pioneered by Floyd: “Syntactic analysis and operator precedence,” Journal of the Association for Computing Machinery 10 (1963), 316-333 (continuing the theme of moaning about the ACM I should point out that I can’t get a copy).
Operator precedence comes down to a way of resolving the ambiguity in an expression like “A e B f C” . Should e bind more tightly, like this “(A e B) f C”, or should f bind more tightly, like this “A e (B f C)”?
Operator precedence works by assigning a precedence (usually a small integer) to each operator, like + or *. Consider the situation when we have read “A e B” and f is the next token. We have two choices: we can either reduce “A e B” (equivalent to making e bind more tightly than f); or, we can shift f (equivalent to making f bind more tightly). To decide, we compare the precedence of e and f; we shift if f has higher precedence than e.
(This idea of assigning a numerical precedence is a less general form of consulting a table to determine if f has higher precedence than e; I don’t discuss that here.)
Consider what we should do when e and f are the same, a + b + c or a :: b :: c for example. Should “A @ B @ C” be “(A @ B) @ C” or “A @ (B @ C)”? We assign an associativity with each operator which is either left or right. Left-associative operators group like “(A @ B) @ C”, right-associative operators group like “A @ (B @ C)”. A common way of implementing this in a operator precedence scheme is to give each operator two precedences, a left-precedence, lp, and a right-precedence, rp. When parsing “A e B f C” we compare rp(e) with the lp(f). The usual rule is to reduce unless rp(e) < lp(f). An operator is left-associative when its left- and right- precedences are equal (or when rp > lp, but that’s not so common), and is right-associative when the rp < lp. By the way, it’s easy to get all these left and rights confused and most of them are only conventional.
Most operators are left-associative, and it doesn’t really matter for commutative ones like + and *. Common examples of a right-associative operator: Python’s (and Fortran’s) power operator, **, so that x ** y ** z is x ** (y ** z). The alternative, (x ** y) ** z, is boring because that’s mathematically equal to x ** (y * z); ML’s cons operator, ::, is right-associative so that x :: y :: xs yields a normal list (a list of whatever type x is, rather than a list of lists).
Lua implements a pretty typical example of operator precedence parsing; here’s the precedence table (called priority in Lua):
static const struct {
lu_byte left; /* left priority for each binary operator */
lu_byte right; /* right priority */
} priority[] = { /* ORDER OPR */
{6, 6}, {6, 6}, {7, 7}, {7, 7}, {7, 7}, /* `+' `-' `/' `%' */
{10, 9}, {5, 4}, /* power and concat (right associative) */
{3, 3}, {3, 3}, /* equality and inequality */
{3, 3}, {3, 3}, {3, 3}, {3, 3}, /* order */
{2, 2}, {1, 1} /* logical (and/or) */
};
Notice how all the left-associative operators (which is most of them) have equal left- and right-precedence, and the two right-associative operators have a raised left-precedence.
A Standard Folklore Hack
Okay, so now the hack:
Define the left-precedence, lp, of your operators as you would normally; then define the right-precedence, rp, as rp = lp&~1 (that’s lp twiddles 1, clearing the bottom bit). Essentially this encodes the associativity in the bottom bit of a single precedence value. The way I’ve presented it here odd (bit-0 set) means right-associative; even (bit-0 clear) means left-associative.
In the implementation of Lua (above), we could double all the left priorities (subtracting one for the right-associative power and concat) and then not store the right-priorities.
I learned of this technique from a friend who mentioned it in passing in Cockcroft 4 as I was implementing a BASIC interpreter. I assumed at the time it was part of compiler writing wisdom, but since then I’ve failed to find a written reference for it.
Crockford doesn’t mention it in his article on parsing and it would allow him to unify his infix and infixr procedures.
As we see above, the Lua team don’t use it. It would lead to a small space saving (possibly of some use for a language that is often embedded).
I’m pretty sure it’s not in my Dragon, I had a quick look previously, but it’s not to hand right now (still in a packed box I think).
Does anyone know if this technique has a proper name and/or a good reference? It’s probably in one of those boxes full of Floyd stuff in a Stanford basement that Knuth mentions.