A discussion of an operator precedence parsing technique that seems to only have a folklore existence. Some compiler writers seem to know about it, others not. I suspect it was once well-known but fell out of favour.

Operator precedence parsing is a technique commonly used to parse expressions in a computer programming language. It’s fairly easy to understand, easy to hand-write parsers for typical expression grammars, and is fairly quick. It’s very well known (for example, see [Brown] chapter 4.4). Almost certainly pioneered by Floyd: “Syntactic analysis and operator precedence,” Journal of the Association for Computing Machinery 10 (1963), 316-333 (continuing the theme of moaning about the ACM I should point out that I can’t get a copy).

Operator precedence comes down to a way of resolving the ambiguity in an expression like “A e B f C” . Should `e` bind more tightly, like this “(A e B) f C”, or should `f` bind more tightly, like this “A e (B f C)”?

Operator precedence works by assigning a *precedence* (usually a small integer) to each operator, like `+` or `*`. Consider the situation when we have read “A e B” and `f` is the next token. We have two choices: we can either *reduce* “A e B” (equivalent to making `e` bind more tightly than `f`); or, we can *shift* `f` (equivalent to making `f` bind more tightly). To decide, we compare the precedence of `e` and `f`; we *shift* if `f` has higher precedence than `e`.

(This idea of assigning a numerical precedence is a less general form of consulting a table to determine if `f` has higher precedence than `e`; I don’t discuss that here.)

Consider what we should do when `e` and `f` are the same, `a + b + c`

or `a :: b :: c`

for example. Should “A @ B @ C” be “(A @ B) @ C” or “A @ (B @ C)”? We assign an *associativity* with each operator which is either *left* or *right*. Left-associative operators group like “(A @ B) @ C”, right-associative operators group like “A @ (B @ C)”. A common way of implementing this in a operator precedence scheme is to give each operator *two* precedences, a left-precedence, `lp`, and a right-precedence, `rp`. When parsing “A e B f C” we compare `rp(e)`

with the `lp(f)`

. The usual rule is to *reduce* unless `rp(e) < lp(f)`

. An operator is *left-associative* when its left- and right- precedences are equal (or when `rp > lp`

, but that’s not so common), and is *right-associative* when the `rp < lp`

. By the way, it’s easy to get all these left and rights confused and most of them are only conventional.

Most operators are left-associative, and it doesn’t really matter for commutative ones like + and *. Common examples of a right-associative operator: Python’s (and Fortran’s) power operator, `**`, so that `x ** y ** z`

is `x ** (y ** z)`

. The alternative, `(x ** y) ** z`

, is boring because that’s mathematically equal to `x ** (y * z)`

; ML’s cons operator, `::`, is right-associative so that `x :: y :: xs`

yields a normal list (a list of whatever type `x` is, rather than a list of lists).

Lua implements a pretty typical example of operator precedence parsing; here’s the precedence table (called *priority* in Lua):

static const struct { lu_byte left; /* left priority for each binary operator */ lu_byte right; /* right priority */ } priority[] = { /* ORDER OPR */ {6, 6}, {6, 6}, {7, 7}, {7, 7}, {7, 7}, /* `+' `-' `/' `%' */ {10, 9}, {5, 4}, /* power and concat (right associative) */ {3, 3}, {3, 3}, /* equality and inequality */ {3, 3}, {3, 3}, {3, 3}, {3, 3}, /* order */ {2, 2}, {1, 1} /* logical (and/or) */ };

Notice how all the left-associative operators (which is most of them) have equal left- and right-precedence, and the two right-associative operators have a raised left-precedence.

### A Standard Folklore Hack

Okay, so now the hack:

Define the left-precedence, `lp`, of your operators as you would normally; then define the right-precedence, `rp`, as `rp = lp&~1`

(that’s `lp` twiddles 1, clearing the bottom bit). Essentially this encodes the *associativity* in the bottom bit of a single precedence value. The way I’ve presented it here *odd* (bit-0 set) means *right-associative*; *even* (bit-0 clear) means *left-associative*.

In the implementation of Lua (above), we could double all the left priorities (subtracting one for the right-associative power and concat) and then not store the right-priorities.

I learned of this technique from a friend who mentioned it in passing in Cockcroft 4 as I was implementing a BASIC interpreter. I assumed at the time it was part of compiler writing wisdom, but since then I’ve failed to find a written reference for it.

Crockford doesn’t mention it in his article on parsing and it would allow him to unify his *infix* and *infixr* procedures.

As we see above, the Lua team don’t use it. It would lead to a small space saving (possibly of some use for a language that is often embedded).

I’m pretty sure it’s not in my Dragon, I had a quick look previously, but it’s not to hand right now (still in a packed box I think).

Does anyone know if this technique has a proper name and/or a good reference? It’s probably in one of those boxes full of Floyd stuff in a Stanford basement that Knuth mentions.

2007-12-19 at 16:40:36

ACM paper pubs are all online, for those (such as myself) who pay the sub.

I actually don’t think this is unreasonable. The sub isn’t much, and the various things which the ACM does are mostly worthwhile and can’t be done for free.

2007-12-19 at 16:49:10

And they’re mostly already payed for. I prefer a pay-to-publish, free-to-view model. But really that’s just an irrelevant whine.

2007-12-19 at 16:53:31

isn’t much: $99, I think, plus of course the ACM membership itself.And they’re all searchable.

A cursory search doesn’t turn up this hack, though.

2007-12-19 at 23:31:21

Mmmm, it is cute, easy to implememt, and saves space. I’ve never heard of it, but I am young – and when I get paid to write compilers I always use a parser generator :-)

I was going to say that it sounded completely unnecesary, would only make the code harder to understand, and saved only a tiny amount of space – but it has grown on me: I can imagine how you’d implement it in a readable way, and used sensibly would be easier to read than the example of the Lua parser.

It could be used by parser generators – IIRC, yacc operators are declared as either left or right associative, and in order of precedence. Presumably this trick could be used to generate a fairly compact precedence table.

I’m just wondering – can we think of any language with operators where the right-precedence is higher than the left-precedence, or where the two precedences differ by more than one?

I suspect that if there were such cases, the parser would probably be written using a full on shift-reduce table anyway. They don’t need to be too hard to figure out, and have the added benefit that you can incorporate syntax error reporting into the expression parser in a natural way.

2007-12-24 at 11:08:47

Another way to think about this. Start with a single precedence table. If f is left associative operators shift if prec(e)<prec(f) and if f is right associative shift if prec(e)<=prec(f).

So the hack becomes an optimization of (rassoc(f)?prec(e)<=prec(f):prec(e)<prec(f))

2008-07-13 at 09:46:05

oh, come on – precedence tables aren’t THAT big… I can imagine this may mattered in the seventhies, but probably not even then.