## Archive for December, 2007

### Filling in the CRB form

2007-12-28

The Criminal Records Bureau’s Disclosure Application Form seems to be a fairly typical form. It’s one of those forms where the onus is on the you to fill it in with every detail correct and if you get anything wrong then they Just Say No. There’s no advantage to them (the CRB) to say “yes”, so they may as well just say no at the slightest mistake or deviation. Better be careful then.

The wording in these forms often arouses the pedantic mathematician in me. And not always in a good way.

Field 20, Section C: “Surname at birth (if different)”. Clearly if I had a different surname at birth I am supposed to write my surname at birth in this field. What am I supposed to do if it was the same? Am I at liberty to write whatever I like in the field?

Fields 28 and 29 are the “Town/City” and “County/District” for my “Place of Birth”; “Please enter town/city names and county/district names in full as recorded on your Birth Certificate”, I am advised. Amusingly, it turns out that my Birth Certificate doesn’t record my place of birth. Nor does it record anything called a Town, City, or County.

The form is booby trapped. Section E, Addition information, asks for things like marital status, bank account details, and has one of those classic false dichotomies: “Cross ONE box only: Employed, Self Employed, Part-time Employed, Unemployed, Student, Other”. One could easily be Employed, Self Employed, Part-time Employed and a Student; that being the case, which box is one supposed to cross? This is all irrelevant however, Section E should not be filled in, as advised by “An Applicant’s Guide to Completing the CRB Application Form”. Nor should section F. Obviously. (The front of the form says “Please complete section A-H in BLOCK CAPITALS”).

Good job I’m a human and not a robot.

### The odd/even operator precedence hack

2007-12-19

A discussion of an operator precedence parsing technique that seems to only have a folklore existence. Some compiler writers seem to know about it, others not. I suspect it was once well-known but fell out of favour.

Operator precedence parsing is a technique commonly used to parse expressions in a computer programming language. It’s fairly easy to understand, easy to hand-write parsers for typical expression grammars, and is fairly quick. It’s very well known (for example, see [Brown] chapter 4.4). Almost certainly pioneered by Floyd: “Syntactic analysis and operator precedence,” Journal of the Association for Computing Machinery 10 (1963), 316-333 (continuing the theme of moaning about the ACM I should point out that I can’t get a copy).

Operator precedence comes down to a way of resolving the ambiguity in an expression like “A e B f C” . Should e bind more tightly, like this “(A e B) f C”, or should f bind more tightly, like this “A e (B f C)”?

Operator precedence works by assigning a precedence (usually a small integer) to each operator, like + or *. Consider the situation when we have read “A e B” and f is the next token. We have two choices: we can either reduce “A e B” (equivalent to making e bind more tightly than f); or, we can shift f (equivalent to making f bind more tightly). To decide, we compare the precedence of e and f; we shift if f has higher precedence than e.

(This idea of assigning a numerical precedence is a less general form of consulting a table to determine if f has higher precedence than e; I don’t discuss that here.)

Consider what we should do when e and f are the same, a + b + c or a :: b :: c for example. Should “A @ B @ C” be “(A @ B) @ C” or “A @ (B @ C)”? We assign an associativity with each operator which is either left or right. Left-associative operators group like “(A @ B) @ C”, right-associative operators group like “A @ (B @ C)”. A common way of implementing this in a operator precedence scheme is to give each operator two precedences, a left-precedence, lp, and a right-precedence, rp. When parsing “A e B f C” we compare rp(e) with the lp(f). The usual rule is to reduce unless rp(e) < lp(f). An operator is left-associative when its left- and right- precedences are equal (or when rp > lp, but that’s not so common), and is right-associative when the rp < lp. By the way, it’s easy to get all these left and rights confused and most of them are only conventional.

Most operators are left-associative, and it doesn’t really matter for commutative ones like + and *. Common examples of a right-associative operator: Python’s (and Fortran’s) power operator, **, so that x ** y ** z is x ** (y ** z). The alternative, (x ** y) ** z, is boring because that’s mathematically equal to x ** (y * z); ML’s cons operator, ::, is right-associative so that x :: y :: xs yields a normal list (a list of whatever type x is, rather than a list of lists).

Lua implements a pretty typical example of operator precedence parsing; here’s the precedence table (called priority in Lua):

static const struct {
lu_byte left;  /* left priority for each binary operator */
lu_byte right; /* right priority */
} priority[] = {  /* ORDER OPR */
{6, 6}, {6, 6}, {7, 7}, {7, 7}, {7, 7},  /* +' -' /' %' */
{10, 9}, {5, 4},                 /* power and concat (right associative) */
{3, 3}, {3, 3},                  /* equality and inequality */
{3, 3}, {3, 3}, {3, 3}, {3, 3},  /* order */
{2, 2}, {1, 1}                   /* logical (and/or) */
};


Notice how all the left-associative operators (which is most of them) have equal left- and right-precedence, and the two right-associative operators have a raised left-precedence.

### A Standard Folklore Hack

Okay, so now the hack:

Define the left-precedence, lp, of your operators as you would normally; then define the right-precedence, rp, as rp = lp&~1 (that’s lp twiddles 1, clearing the bottom bit). Essentially this encodes the associativity in the bottom bit of a single precedence value. The way I’ve presented it here odd (bit-0 set) means right-associative; even (bit-0 clear) means left-associative.

In the implementation of Lua (above), we could double all the left priorities (subtracting one for the right-associative power and concat) and then not store the right-priorities.

I learned of this technique from a friend who mentioned it in passing in Cockcroft 4 as I was implementing a BASIC interpreter. I assumed at the time it was part of compiler writing wisdom, but since then I’ve failed to find a written reference for it.

Crockford doesn’t mention it in his article on parsing and it would allow him to unify his infix and infixr procedures.

As we see above, the Lua team don’t use it. It would lead to a small space saving (possibly of some use for a language that is often embedded).

I’m pretty sure it’s not in my Dragon, I had a quick look previously, but it’s not to hand right now (still in a packed box I think).

Does anyone know if this technique has a proper name and/or a good reference? It’s probably in one of those boxes full of Floyd stuff in a Stanford basement that Knuth mentions.

### A little programming puzzle: /=

2007-12-18

/= is Common Lisp’s not-equals function (similar to the != operator from C). In Common Lisp /= is restricted to numbers and it is in fact an error to call it with any other type, but not a lot of people know that. It looks like an innocuous enough function, until one considers that like a lot of Lisp functions it can take any number of arguments:

(/= x1 x2 x3 ... xn)


The semantics of /= are to return True if and only if no two numbers are the same value. Note that (/= 1 2 1) is False; it is insufficient to merely consider adjacent pairs, if any pair have the same value then the function should return False.

Contrast this with the semantics of the function for equality, =, which returns True if and only if all numbers have the same value. This can be implemented by checking adjacent pairs, or comparing each argument against the first, a straightforward O(n) algorithm.

So the puzzle is, how do you implement /=? Given a list of numbers we want to return False if any pair have the same value, returning True otherwise. It’s most interesting if we consider the complexity and therefore the case when we have large numbers of arguments.

Common Lisp implements a wide range of number types: unbounded integers (bignums), unbounded rationals, finite precision floating-point types, and complex versions of all of those. /= compares numbers by value so the integer 0, the floating-point number 0d0, and the complex floating-point number #C(0d0 0d0) are all equal. Note that any real number can be considered as being equal in value to a complex number with a 0 imaginary part; also note that any floating-point number can be converted to a rational having exactly the same value (see my blog article on the subject). That means that for simplification we might want to convert all input numbers to the type (complex rational) before proceeding. Since that’s obviously a bounded time operation it doesn’t affect the complexity of whatever algorithm we come up with. That cuts down on the number of different types of pair-wise comparison operators that we might need.

The first, obviously naïve, approach would be to compare all possible pairs. That obviously works and is obviously O(n2).

If you like a puzzle, try improving upon this naïve algorithm before reading on.

The next approach I came up with is to sort the arguments first, and then do comparisons of all the adjacent pairs. Recall that the input numbers are complex and there’s no natural ordering. Doesn’t matter, we can just pick any old ordering, for example lexicographic ordering, where #C(a b) is less then #C(c d) if a < c, or a = c and b < d. This algorithm is O(n log(n)) for the sort (consult any algorithms book) and O(n) for the final comparison of adjacent pairs, so O(n log(n)) overall.

That seems a lot better. But there is another approach that’s even better, suggested by a good friend:

Create a hash-table and hash each argument into it, checking to see if an entry with that value already exists. If an entry already exists then that’s an equal pair, so return False; if we get to the end of the arguments then there’s no equal pair, so return True. This algorithm is O(n) unless you’re unlucky to run into some pathological hashing collision. So O(n) in all likelihood.

Of course the asymptotic behaviour of /= for large numbers of arguments is unlikely to form part of your decision to use one Lisp system or another, but it makes a nice puzzle. Unsurprisingly, the practice of using more than 2 arguments is very rare. Lisp test suites, that sort of thing. The best example I found is:

(not (/= samples 3 4))

which is true when samples is 3 or 4. This seems obscure to me, but it is more compact than alternatives; maybe it is idiomatic in some circles.

### Wii wait

2007-12-10

Several times I’ve recently heard people talk about the high street-price of the Nintendo Wii, and that this might be induced by an artificial scarcity.

Surely this is crazy? Nintendo have no reason to artificially restrict the supply of Wii. Remember, the factory gate price doesn’t change; Nintendo get exactly the same amount of money for each Wii sold, whether the retailer sells it at USD 300 or USD 600. So it’s clearly in Nintendo’s interest to sell as many Wii as possible. This is not a luxury Rolls-Royce, they do not make only 1000 and then sell each one for USD 10,000.

I can see that over the course of a year Nintendo might want to restrict supply at some times so they can supply demand at others; I’m sure that over the summer Nintendo have been restricting supplies of Wii so that they create a stockpile for the holiday season. But that season is upon us, surely Nintendo are pumping these little puppies out as fast as they can manage?

There are reasons why it’s bad for Nintendo to be unable to supply demand for Wii (whether deliberately or unintentionally): pushing up the street price means that you’ll have less money to spend on software, so less money overall going to Nintendo; less chance of getting that coveted “most popular toy this Christmas” spot.

So… why is it so hard to get hold of a Wii? There’s nothing inside the box that seems particular difficult to manufacturer (no high-density optical drives, nor high-volume solid state memory, nor touch-sensitive colour display); the Wii’s been out for a while now so yields ought to be good and the manufacturing process overall should be well understood and easily replicated. Why aren’t they just booting up factories all over Asia making Wii? It could be that they simply don’t have the purchasing power to buy the factory time. Maybe that funny fruit music toy company is buying it all up.

### Microsoft’s Wireless Keyboard Encryption

2007-12-07

I have read the recent whitepaper regarding the cracking of some of Microsoft’s keyboards. Here is my summary:

• 8-bit key;
• XOR
• Hohoho.