The odd/even operator precedence hack

2007-12-19

A discussion of an operator precedence parsing technique that seems to only have a folklore existence. Some compiler writers seem to know about it, others not. I suspect it was once well-known but fell out of favour.

Operator precedence parsing is a technique commonly used to parse expressions in a computer programming language. It’s fairly easy to understand, easy to hand-write parsers for typical expression grammars, and is fairly quick. It’s very well known (for example, see [Brown] chapter 4.4). Almost certainly pioneered by Floyd: “Syntactic analysis and operator precedence,” Journal of the Association for Computing Machinery 10 (1963), 316-333 (continuing the theme of moaning about the ACM I should point out that I can’t get a copy).

Operator precedence comes down to a way of resolving the ambiguity in an expression like “A e B f C” . Should e bind more tightly, like this “(A e B) f C”, or should f bind more tightly, like this “A e (B f C)”?

Operator precedence works by assigning a precedence (usually a small integer) to each operator, like + or *. Consider the situation when we have read “A e B” and f is the next token. We have two choices: we can either reduce “A e B” (equivalent to making e bind more tightly than f); or, we can shift f (equivalent to making f bind more tightly). To decide, we compare the precedence of e and f; we shift if f has higher precedence than e.

(This idea of assigning a numerical precedence is a less general form of consulting a table to determine if f has higher precedence than e; I don’t discuss that here.)

Consider what we should do when e and f are the same, a + b + c or a :: b :: c for example. Should “A @ B @ C” be “(A @ B) @ C” or “A @ (B @ C)”? We assign an associativity with each operator which is either left or right. Left-associative operators group like “(A @ B) @ C”, right-associative operators group like “A @ (B @ C)”. A common way of implementing this in a operator precedence scheme is to give each operator two precedences, a left-precedence, lp, and a right-precedence, rp. When parsing “A e B f C” we compare rp(e) with the lp(f). The usual rule is to reduce unless rp(e) < lp(f). An operator is left-associative when its left- and right- precedences are equal (or when rp > lp, but that’s not so common), and is right-associative when the rp < lp. By the way, it’s easy to get all these left and rights confused and most of them are only conventional.

Most operators are left-associative, and it doesn’t really matter for commutative ones like + and *. Common examples of a right-associative operator: Python’s (and Fortran’s) power operator, **, so that x ** y ** z is x ** (y ** z). The alternative, (x ** y) ** z, is boring because that’s mathematically equal to x ** (y * z); ML’s cons operator, ::, is right-associative so that x :: y :: xs yields a normal list (a list of whatever type x is, rather than a list of lists).

Lua implements a pretty typical example of operator precedence parsing; here’s the precedence table (called priority in Lua):

static const struct {
  lu_byte left;  /* left priority for each binary operator */
  lu_byte right; /* right priority */
} priority[] = {  /* ORDER OPR */
   {6, 6}, {6, 6}, {7, 7}, {7, 7}, {7, 7},  /* `+' `-' `/' `%' */
   {10, 9}, {5, 4},                 /* power and concat (right associative) */
   {3, 3}, {3, 3},                  /* equality and inequality */
   {3, 3}, {3, 3}, {3, 3}, {3, 3},  /* order */
   {2, 2}, {1, 1}                   /* logical (and/or) */
};

Notice how all the left-associative operators (which is most of them) have equal left- and right-precedence, and the two right-associative operators have a raised left-precedence.

A Standard Folklore Hack

Okay, so now the hack:

Define the left-precedence, lp, of your operators as you would normally; then define the right-precedence, rp, as rp = lp&~1 (that’s lp twiddles 1, clearing the bottom bit). Essentially this encodes the associativity in the bottom bit of a single precedence value. The way I’ve presented it here odd (bit-0 set) means right-associative; even (bit-0 clear) means left-associative.

In the implementation of Lua (above), we could double all the left priorities (subtracting one for the right-associative power and concat) and then not store the right-priorities.

I learned of this technique from a friend who mentioned it in passing in Cockcroft 4 as I was implementing a BASIC interpreter. I assumed at the time it was part of compiler writing wisdom, but since then I’ve failed to find a written reference for it.

Crockford doesn’t mention it in his article on parsing and it would allow him to unify his infix and infixr procedures.

As we see above, the Lua team don’t use it. It would lead to a small space saving (possibly of some use for a language that is often embedded).

I’m pretty sure it’s not in my Dragon, I had a quick look previously, but it’s not to hand right now (still in a packed box I think).

Does anyone know if this technique has a proper name and/or a good reference? It’s probably in one of those boxes full of Floyd stuff in a Stanford basement that Knuth mentions.

About these ads

6 Responses to “The odd/even operator precedence hack”

  1. Nick Barnes Says:

    ACM paper pubs are all online, for those (such as myself) who pay the sub.
    I actually don’t think this is unreasonable. The sub isn’t much, and the various things which the ACM does are mostly worthwhile and can’t be done for free.

  2. drj11 Says:

    And they’re mostly already payed for. I prefer a pay-to-publish, free-to-view model. But really that’s just an irrelevant whine.

  3. Nick Barnes Says:

    isn’t much: $99, I think, plus of course the ACM membership itself.
    And they’re all searchable.
    A cursory search doesn’t turn up this hack, though.

  4. Matthew Bentham Says:

    Mmmm, it is cute, easy to implememt, and saves space. I’ve never heard of it, but I am young – and when I get paid to write compilers I always use a parser generator :-)

    I was going to say that it sounded completely unnecesary, would only make the code harder to understand, and saved only a tiny amount of space – but it has grown on me: I can imagine how you’d implement it in a readable way, and used sensibly would be easier to read than the example of the Lua parser.

    It could be used by parser generators – IIRC, yacc operators are declared as either left or right associative, and in order of precedence. Presumably this trick could be used to generate a fairly compact precedence table.

    I’m just wondering – can we think of any language with operators where the right-precedence is higher than the left-precedence, or where the two precedences differ by more than one?

    I suspect that if there were such cases, the parser would probably be written using a full on shift-reduce table anyway. They don’t need to be too hard to figure out, and have the added benefit that you can incorporate syntax error reporting into the expression parser in a natural way.

  5. Friend Says:

    Another way to think about this. Start with a single precedence table. If f is left associative operators shift if prec(e)<prec(f) and if f is right associative shift if prec(e)<=prec(f).

    So the hack becomes an optimization of (rassoc(f)?prec(e)<=prec(f):prec(e)<prec(f))

  6. Mehman Says:

    oh, come on – precedence tables aren’t THAT big… I can imagine this may mattered in the seventhies, but probably not even then.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: