A discussion of an operator precedence parsing technique that seems to only have a folklore existence. Some compiler writers seem to know about it, others not. I suspect it was once well-known but fell out of favour.

Operator precedence parsing is a technique commonly used to parse expressions in a computer programming language. It’s fairly easy to understand, easy to hand-write parsers for typical expression grammars, and is fairly quick. It’s very well known (for example, see [Brown] chapter 4.4). Almost certainly pioneered by Floyd: “Syntactic analysis and operator precedence,” Journal of the Association for Computing Machinery 10 (1963), 316-333 (continuing the theme of moaning about the ACM I should point out that I can’t get a copy).

Operator precedence comes down to a way of resolving the ambiguity in an expression like “A e B f C” . Should `e` bind more tightly, like this “(A e B) f C”, or should `f` bind more tightly, like this “A e (B f C)”?

Operator precedence works by assigning a *precedence* (usually a small integer) to each operator, like `+` or `*`. Consider the situation when we have read “A e B” and `f` is the next token. We have two choices: we can either *reduce* “A e B” (equivalent to making `e` bind more tightly than `f`); or, we can *shift* `f` (equivalent to making `f` bind more tightly). To decide, we compare the precedence of `e` and `f`; we *shift* if `f` has higher precedence than `e`.

(This idea of assigning a numerical precedence is a less general form of consulting a table to determine if `f` has higher precedence than `e`; I don’t discuss that here.)

Consider what we should do when `e` and `f` are the same, `a + b + c`

or `a :: b :: c`

for example. Should “A @ B @ C” be “(A @ B) @ C” or “A @ (B @ C)”? We assign an *associativity* with each operator which is either *left* or *right*. Left-associative operators group like “(A @ B) @ C”, right-associative operators group like “A @ (B @ C)”. A common way of implementing this in a operator precedence scheme is to give each operator *two* precedences, a left-precedence, `lp`, and a right-precedence, `rp`. When parsing “A e B f C” we compare `rp(e)`

with the `lp(f)`

. The usual rule is to *reduce* unless `rp(e) < lp(f)`

. An operator is *left-associative* when its left- and right- precedences are equal (or when `rp > lp`

, but that’s not so common), and is *right-associative* when the `rp < lp`

. By the way, it’s easy to get all these left and rights confused and most of them are only conventional.

Most operators are left-associative, and it doesn’t really matter for commutative ones like + and *. Common examples of a right-associative operator: Python’s (and Fortran’s) power operator, `**`, so that `x ** y ** z`

is `x ** (y ** z)`

. The alternative, `(x ** y) ** z`

, is boring because that’s mathematically equal to `x ** (y * z)`

; ML’s cons operator, `::`, is right-associative so that `x :: y :: xs`

yields a normal list (a list of whatever type `x` is, rather than a list of lists).

Lua implements a pretty typical example of operator precedence parsing; here’s the precedence table (called *priority* in Lua):

static const struct {
lu_byte left; /* left priority for each binary operator */
lu_byte right; /* right priority */
} priority[] = { /* ORDER OPR */
{6, 6}, {6, 6}, {7, 7}, {7, 7}, {7, 7}, /* `+' `-' `/' `%' */
{10, 9}, {5, 4}, /* power and concat (right associative) */
{3, 3}, {3, 3}, /* equality and inequality */
{3, 3}, {3, 3}, {3, 3}, {3, 3}, /* order */
{2, 2}, {1, 1} /* logical (and/or) */
};

Notice how all the left-associative operators (which is most of them) have equal left- and right-precedence, and the two right-associative operators have a raised left-precedence.

### A Standard Folklore Hack

Okay, so now the hack:

Define the left-precedence, `lp`, of your operators as you would normally; then define the right-precedence, `rp`, as `rp = lp&~1`

(that’s `lp` twiddles 1, clearing the bottom bit). Essentially this encodes the *associativity* in the bottom bit of a single precedence value. The way I’ve presented it here *odd* (bit-0 set) means *right-associative*; *even* (bit-0 clear) means *left-associative*.

In the implementation of Lua (above), we could double all the left priorities (subtracting one for the right-associative power and concat) and then not store the right-priorities.

I learned of this technique from a friend who mentioned it in passing in Cockcroft 4 as I was implementing a BASIC interpreter. I assumed at the time it was part of compiler writing wisdom, but since then I’ve failed to find a written reference for it.

Crockford doesn’t mention it in his article on parsing and it would allow him to unify his *infix* and *infixr* procedures.

As we see above, the Lua team don’t use it. It would lead to a small space saving (possibly of some use for a language that is often embedded).

I’m pretty sure it’s not in my Dragon, I had a quick look previously, but it’s not to hand right now (still in a packed box I think).

Does anyone know if this technique has a proper name and/or a good reference? It’s probably in one of those boxes full of Floyd stuff in a Stanford basement that Knuth mentions.