Archive for May, 2007

Slicing Soreen


I think Soreen is easier to slice and butter after it’s been frozen and defrosted. Slightly.

I tend to only buy Soreen when there’s a 2-for-1 offer on, so one unit gets eaten, the other gets frozen.

Iverson’s Convention, or what is the value of x < y?


Iverson’s Convention is, roughly speaking, the idea that a boolean result can be encoded using the first two natural numbers: 0 for false, 1 for true. More precisely that for a relationship R, aRb is 1 when a stands in the relation R to b, 0 otherwise. It’s 1 when {a,b} ∈ R, if you like to think of R a subset of the cartesian product.

Knuth promotes Iverson’s convention in this paper. Really Knuth is promoting a specific notation [aRb] meaning a function that is 1 when aRb is true and 0 otherwise. In the field of computer science I’m not really so concerned about the notation, more about the name. The name Iverson’s Convention doesn’t seem to be particularly well known, but I think it gives a useful name to distinguish different computer languages.

Computer programming languages roughly divide into those languages that use Iverson’s Convention (IC languages) and those that do not. C famously does, whereas Java does not. In C the result of «x == y» is either 0 or 1 and is of type int; in Java the result of «x == y» is either true or false and is of type boolean.

Surveying a random smattering of languages (by consulting my bookcase) with regards to the question of what a typical relational expression, like «x < y», can evaluate to I find 3 rough categories: IC languages; those that have some sort of more or less well defined boolean type; and languages that do something else.

Iverson’s Convention languages: C, awk, J, Python, Sinclair BASIC, Inform 6, perl

Boolean type languages: Java, JavaScript, PostScript, Logo, Dylan

Something else: Lisp, FORTH, Icon, BCPL, /bin/sh.

[In the first version of this article, I placed Python in the non-IC camp. That was wrong and I have learned.]
[On 2007-08-02 I added Javascript to the discussion.]

BCPL surprised me by not being an IC language. BCPL has only one type, all values are an integer of some fixed bit-length. Iverson’s Convention would be a natural match for the language (and was adopted for one of BCPL’s descendents, C), but in BCPL true and false are implementation defined values (see [BCPL1980] page 13). FORTH is semi-IC; according to my [FORTH1983] false is 0 and true is non-zero. I expect there was a fair amount of variation across different FORTH implementations. Sinclair BASIC (on which I cut my teeth) is IC, but I remember there being considerable variation amongst different BASIC dialects.

Icon takes a completely different tack. In Icon all the usual relationship operators are control flow constructs. So «x < y» evaluates to y when the relationship holds, but when the relationship does not hold the expression is said to fail and produces no result. Constructs like if are sensitive to whether the control expression succeeded with any value or failed.

I have to say I’m surprised that their aren’t more languages in the IC camp along with C, J, and Python. Inform is esoteric but I included it because I happened to have [DM4] to hand. There are probably lots of similarly esoteric languages that I don’t know about, lazily copying the IC feature of C. Perhaps C++ is like that.

Python has an interesting take on the issue (which I got wrong in the first version of the article, but discussion below, particular from Paul and Gareth, has set me straight). In Python 2.2 expressions like «x < y» evaluate to 0 or 1 (integers). In Python 2.3 such expressions evaluate to a bool, True or False, but bool is a numeric type containing the integers 0 (spelt False) and 1 (spelt True). False and True are distinct objects from 0 and 1 but have values 0 and 1 when used as numbers. When converted to strings they convert as ‘False’ and ‘True’. It’s as if Python is trying to gradually and carefully rid itself of Iverson’s Convention.

PostScript can be thought of as a modernised FORTH, so perhaps it is an example of a language evolving from IC to non-IC (well, from nearly IC in the case of FORTH). Some languages designed since the rise of C—Java, most notably—have adopted a boolean type rather than IC. Perhaps there is a modern trend away from IC.

/bin/sh is amusing because it implements a sort of inverted Iverson’s Convention, IIC. true is 0 and false is 1, in the sense of the return codes used by programs likely to be used in if and while and friends (programs such as grep and test).

Common Lisp does have a boolean type (with values T and NIL), but generally the functions that return a boolean result return a generalized boolean where NIL is returned for false, and any other value for true. That’s why it’s in the “other” category. Another thing that sets Common Lisp slightly apart from the “boolean type” languages is that Lisp’s false value, NIL, also has another rôle for being the empty list.

Naturally, J is an IC language.

What does if accept?

That’s a snapshot of Iverson’s Convention from a viewpoint of value production. So what about the behaviour of if when its condition is some value other than the canonical true/false values (whether that be 0 or 1 in the case of IC languages, or true, false for most of the others)? Mostly the languages are quite tolerant, following the principle of being conservative in what you produce and liberal in what you accept.

All the IC languages accept any integer in a condition and treat 0 as false and non-zero as true (generalised IC). C accepts any scalar and treats the condition «x» as being the same as «x != 0». Thus in C, the float 0.0f and the null pointer also act like false, and other values act like true. awk extended conditions to strings. In awk the empty string is treated like false and any other string is treated like true (see Appendix A for an awk obscurity). perl extends false privileges to the string ‘0’, the empty list, and the undefined value.

The boolean type languages are variously liberal. They have particular canonical values for true and false. Sometimes that’s encoded in a particular type (boolean in Java, Lua), sometimes not. Java follows a strict model, only expressions that are statically of type boolean are permitted as conditions, and its boolean type contains only true and false; this is true to Java’s static heart but a spurning of its C heritage that still surprises some C to Java converts. The other languages are more dynamic and more liberal. Generally there’s a finite set of value that are treated like false (False, NIL, nil, 0) and the rest are treated like true.

Lua used to not have a boolean type; the boolean type was introduced in version 5.0; prior to that, Lua used nil (note, not the empty list) and any value different from nil as its false and true. So for historical reasons nil still acts like false when used in a condition. The caretakers of Lua think that they “should have introduced booleans from the start”, see [LUA2007] section 7.

Python follows the scripting language tradition (exemplified by awk and perl) of accepting the empty string for false and treating other strings as true; it also follows the Lisp traditional of treating the empty list (and the 0-tuple) as false.

JavaScript also follows the scripting language tradition of accepting the empty string as false, as well as null and undefined. 0 length arrays are not special (unlike Python and perl), they are true just like any other language. JavaScript is one of the few languages to incorporate IEEE 754 floating point numbers as standard; the NaNs in the floating point type are also accepted as false (as well as both zeroes: -0, +0).

So in terms of what is acceptable as a boolean there are more conventions than for production. There’s generalised IC (C, awk, Python, perl, FORTH), boolean type (Lisp, Python, Lua, Java, JavaScript), empty string (awk, Python, perl, JavaScript), empty list (Lisp, Python, perl). Python and Lua both have the convention of using a value not otherwise related to other values as being equivalent to false: None in Python, and nil in Lua (this is also similar to undefined and null in JavaScript, I’m just not sure it’s similar enough). perl has a convention not shared by any other language (that I looked at): it accepts the string ‘0’ for false.

shaurz on reddit points out that in Python objects can pass themselves off as False by implementing __nonzero__. Cool. It seems to be the only common language that makes true/false an extensible protocol.

Java is the only language that accepts exactly the same values as it produces.

There are many major languages left out, particularly given the ridiculous obscurity of some of the ones I have included, perhaps I will add more at some point (I did: on 2007-08-02 I added JavaScript). Scheme and ML deserve a showing because they Do The Right Thing; Prolog deserves a mention because, like Icon, true/false is matter of control flow, not computing values.

Appendix A – when a non-empty string in awk is true

awk is a little bit tricky because a value might have both a string part and a numeric part; when used in a condition the numeric part has priority. Where this gets hairy is for strings that would evaluate to 0 if used in a numeric context.

Consider the following two programs and their output:

$ awk 'BEGIN{a[1]="0";print a[1];if(a[1])print"hello"}'
$ awk 'BEGIN{split("0",a);print a[1];if(a[1])print"hello"}'

The first program prints a 0 followed by “hello”, the string a[1] is not empty so “hello” is printed. The second program prints just a 0. a[1] still has the same string value but this time it also has a numeric value; the numeric value of a[1] is tested by the if, found to false, so “hello” is not printed.


[BCPL1980] “BCPL the language and its compiler”; Martin Richards, Colin Whitby-Strevens; CUP; 1980.
[DM4] “The Inform Designer’s Manual”; Graham Nelson;; 2001.
[FORTH1983] “FORTH on the BBC Microcomputer”; Richard de Grandis–Harrison; Acornsoft; 1983.
[LUA2007] “The Evolution of Lua”; Roberto Ierusalimschy, Luiz Henrique de Figueiredo, Waldemar Celes; HOPL III; 2007

Image processing in J


Some of my rough notes as I discover how I might go about doing some image processing in J.

This article mostly assumes you’ve been following my Learning J series: Part I, Part II, Part III, and Part IV.

Change working directory (cd) with «1!:44 ‘/Users/drj/project/fingerprint’». «!:» is the foreign conjunction.

Read file (as string): «t =. 1!:1 < ‘finger.pgm’» or a subset with «1!:11 ‘finger.pgm’;0 20».

The PGM file format is grayscale image format. It’s elegant simplicitly makes it ideal for prototyping image algorithms. It consists of a text header describing the width, height, and maxval (the value that represents maximum brightness, very commonly 255) followed by a big string of bytes (no compression, just the data in row major order).

Our first goal is to use J’s file mapping facilities which allow us to treat an array of bytes stored in a file as an array. For that we’ll need to find out the width, height, and maxval of the PGM file.

Regular expressions

require 'regex'
'(\S*\s+){3}(\S*\s)' rxmatch t

substrings can be extracted using the match results:

   ('P5\s+(\d*)\s+(\d*)\s+(\d*)\s' rxmatch t) rxfrom t
|P5 3301 3397 255 |3301|3397|255|

Observe that this duplicates t. That makes it candidate for a fork. Observe that the above expression has the form: «(x f y) g y». Recall that a fork «x (f g h) y» is the same as «(x f y) g (x h y)», so we can use a fork «’P5\s+(\d*)\s+(\d*)\s+(\d*)\s’ (rxmatch rxfrom h) t» if we can find a suitable h. We need a dyadic h such that «x h y» is y. Happily, «]» is such an h:

'P5\s+(\d*)\s+(\d*)\s+(\d*)\s' (rxmatch rxfrom ]) t

We can use «}.» to drop the first match (which describes the entire match, which I’m not really interested in):

'P5\s+(\d*)\s+(\d*)\s+(\d*)\s' (([: }. rxmatch) rxfrom ]) t

Notice the use of the cut «[:» and the extra set of parens.

Really we want to make a new monadic verb to extract the width, height, and maxval from a pgm file:

pgmxym =: ???

So we should bind the header regular expression onto rxmatch using the currying conjunction «&»:

pgmheader =: 'P5\s+(\d*)\s+(\d*)\s+(\d*)\s'
pgmxym =: ([: }. pgmheader & rxmatch) rxfrom ]

By placing the inner trident on the right we can eliminate a set of parens. We can do that by using the «~» (twiddles – has a terrible rendering on my browser/CSS) adverb (called passive in J) to swap the operands to rxfrom:

pgmxym =: ] rxfrom~ [: }. pgmheader&rxmatch

and now it’s this way round we can drop the initial «]» to give us a bident on the left. Observe that the monadic trident «[ g h» is the same as the monadic bident «g h». Note that if we wanted a bident anywhere else we’d have to use parentheses. So we now have:

pgmxym =: rxfrom~ [: }. pgmheader&rxmatch

I’m starting to feel like a proper J coder now.

Now we need to convert a boxed list of strings into 3 numbers.

«x “. y» converts y to a number, defaulting to x when that’s not possible. But before I discovered that I went on a merry excursion to implement equivalent functionality myself. Since the excursion does include some useful learning things about J it turned out not to be completely pointless, but is now relegated to an appendix.

In my excursion I did learn that «|.» is reverse, and dyadic «x i. y» is the index of y within x. And I rely on one of those facts shortly.

atoi =: 0&".    NB. much better than my version.

Mapped Files

require 'jmf' NB. J Mapped Files presumably
(JCHAR;width;headersize) map_jmf_ 'arrayname';datafn

I need to supply width, which is the width in bytes of each row, and headersize which is the initial amount of file to skip before the array proper begins. Now I need all the results from rxmatch; I need the total length of the header (which is essentially the first item of the rxmatch results) and I need the strings for the remaining 3 matches to be converted into numbers. I think that’s currently beyond my powers to do in a tacit definition. Explicit it is then:

pgmlxym =: verb define
m =. pgmheader&rxmatch y  NB. get matches
l =. 1{0{m                NB. Length of header
l , atoi @ > }. m rxfrom t

   'l w h m' =: pgmlxym t    NB. sets 4 variables

   (JCHAR;w;l) map_jmf_ 'fingerc';<'finger.pgm'

Cool. Now I have my image file as the array fingerc in J. There are two problems. The first is that it is an array of characters, not numbers; the second is that it’s huge and easy to accidentally print out as a J value (and being essentially binary data not edifying to print anyway). The first problem I can fix with: «finger =: a. i. fingerc» («a.» is alphabet, the list of characters in numerical order).

Simple operations can be performed. White–Black inversion: «f =. 255 – finger».

Now I have to figure out how to write a file out:

   fd =. 1!:21 'foo' NB. open file called foo.
   nf =. 0&":  NB. convert number to string in Natural Format.
   nf 127
   nf h
   $nf h
   ('P5 ', (nf w), ' ', (nf h), ' 255 ') 1!:2 fd  NB. , forms list.
   g=. f { a.  NB. convert from character to number array.
   g 1!:3"0 1 0 fd  NB. append each row to file.
   1!:22 fd  NB. close file.

We can use some J’s provided facilities to produce simple plots of brightness.

nubcount ,/ finger

nubcount returns an N by 2 array where each item is entry, count. Where entry is one of the items of the operand to nubcount and count is the number of times it appeared. The list of unique items in a list is called its nub. The result array is an array of boxes because the counts are always numbers but the entries could be any type.

Do a nubcount and gather the nub into x and the corresponding counts into y:

x =. > 0{"1 count [ v =. > 1{"1 count [ count =. nubcount ,/ finger

Note the use of dyadic «[» which is really just being used to sequence 3 things on one line. The sequencing, of course, runs from right-to-left and this is a source of slight annoyance for me. I can sequence two things on seperate lines but when I want to put them on one line I can’t simply join the lines together with a «[» in between them. I have to swap the two lines round first.

It turns out that x does not have all the numbers from 0 to 255 (in other words its nub is not the same as «i. 256». This is because some values don’t appear in the input picture.

   x i. i. 256

gives us a list in order from 0 to 255 of where that entry appears in x or «#x» when it doesn’t appear. We can now get counts for all values from 0 to 255, by using this list to index v. Extending v with a single 0 means we correctly get a 0 count for values that don’t appear:

   (x i. i. 256) { v,0

and we can display this using plot:

plot (x i. i. 256) { v,0  NB. requires «require'plot'»
pd 'save bmp'  NB. saves to file ~/j601/temp/plot.bmp

Plot of brightness values and their counts.  Suspiciously wiggly.

Now in my case the plot is suspiciously wiggly. For a range of values the plot swings up and down wildly, as if the scanner (the original source) has some sort of quantisation artefact. For example, as it might appear if the bottom two bits were mysteriously returned as 0 1 most of the time.

I can recover some accuracy in the bottom 2 bits by rescaling the image. Since this requires me to actually remember some things from my degree I’ll do that later.

Appendix A – String to Number

Convert string to number:

   atoi =: [:(+10&*)/ [:|.'0123456789'&i.
   atoi '3155'
   1+ atoi '3155'  NB. add one to prove that it's a number

New things: «|.» is reverse, and dyadic «x i. y» is the index of y within x.

atoi uses cap, «[:», twice for the same reason: the trident «([: g h) y» is equivalent to «g h y» (we can’t use a bident because «(g h) y» is not «g h y», it’s «g y h y»). This is simple composition and can also be achieved with the «@» conjunction; «u@v y» is «u v y». Unfortunately conjunctions associate left-to-right (in so far as this is a reasonable way to think about it), so that «u @ v @ w» is «(u@v)@w». That means many more brackets in the form that uses «@»:


The ability of cap to reduce parens is mentioned in its documentation.

So now we can glue everything together.

   pgmxym t

That’s a vector of boxes, each box containing a character array (string). Boxes are one of the atomic types that can appear as the 0-cells of an array. They’re just like pointers. They’re useful in this case because the answers, which are strings, are different lengths. We can’t have an array with rows of different length, so we can’t form an array directly from the strings ‘3301’, ‘3397’, ‘255’, because they’re not all the same length. So they’re boxed.

We can unbox with «>»:

   > pgmxym t

and then convert to a number:

   atoi > pgmxym t
332 335 95 180

Oh dear. We can see by comparing with the previous output that something has gone wrong with the rank of atoi. We could fix the rank with: «atoi”1 > pgmxyd t» but this still gives the wrong answers: 3301 3397 2560. The last result is wrong because atoi wasn’t applied to the string ‘256’ it was applied to the string ‘256 ‘ because when «>» produced its results they were an array so all the rows had to be the same length. Evidently atoi doesn’t work when there are non-digits in the string. The solution I chose was to compose atoi with «>»:

   atoi @ > pgmxyd t
3301 3397 255


Norvig’s literal instance hack for Java


Because not enough Java programmers know about Norvig’s cute trick.

Consider java.awt.Polygon. It’s a nice enough class, but its constructors make it awkward to use directly in a call to java.awt.Graphics.fillPolygon.

Norvig points out that you can create an anonymous class and write an initialiser for it. Like this: «new Polygon() {{addPoint(0,0); addPoint(100,50); addPoint(100,-50);}}». That’s an expression. That means that if we’re drawing on a java.awt.Graphics object g we can go «g.fillPolygon(new Polygon() {{addPoint(0,0); addPoint(100,50); addPoint(100,-50);}});». It saves us from having to declare a very temporary Polygon variable.

Norvig’s example is even more compelling. You can create anonymous literal Vectors just as easily as literal arrays: «new Vector(3) {{add(“one”); add(“two”); add(“three”)}}».

Of course the double curly braces makes it look a bit scary, probably even to experienced Java programmers. But I think I like that.

Don’t even think about doing this in JME. The extra inner class costs at least 400 bytes.

Python: how is sys.stdout.encoding chosen?


Or what to do if your Python programs complain they can’t output a string because of encoding problems.

Like this:

>>> print u'\\N{left-pointing double angle quotation mark}'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\\xab' in position 0: ordinal not in range(128)

It can’t perform the output because sys.stdout.encoding is ‘ascii’ and the ascii encoding can’t encoding that weird unicode character. I think I approve of this strict approach. It encourages explicitness and The Right Thing.

Okay. But in this instance I know what I’m doing and I want stdout to be treated as if it uses the UTF-8 encoding.

It turns out that Python picks the value for sys.stdout.encoding based on the value of the environment variable LC_CTYPE, so the following works:

$ LC_CTYPE=en_GB.utf-8 python
>>> print u'\\N{left-pointing double angle quotation mark}'

It even prints out the right character.


Awesome Keyboard Power: «»


I just discovered that on my MacBook I can get left- and right-pointing double angle quotation mark by pressing Alt-\ and Shift-Alt-\. Yay! Them’s the Unicode names for what my copy of ECMA-94 calls «LEFT ANGLE QUOTATION MARK» and «RIGHT ANGLE QUOTATION MARK».

The only pair of balanced quotation marks in ISO 8859-1. Accept no substitute.

Up till now I’ve been using Character Palette to input them (they’re on my favourite characters) which is tedious. I can’t help feeling that I would’ve discovered the keyboard shortcut earlier if Character Pallete had said “Alt-\” right next to where it pointlessly says «UTF8: C2 AB».

Learning J – Part IV


Please see Learning J Part I, Part II, and Part III.

Strings of verbs that are not parseable with the usual dyadic–monadic machinery are called trains. (%+/) is a train of two verbs; the first verb is % (division), the second is +/ (the verb + modified by the adverb / to produce the sum-list verb).

A train of two verbs f g is called a hook (or bident, but that isn’t used as much). When used monadically (f g) x is the same as x f g x. Notice the repetition of the x operand.

   (+-)7  NB. 7 - 7 = 0
   (*-)2  NB. 2 * -2
   (**:)3 NB. * is times and *: is square

So (%+/) n is the same as n % +/ n which if n is a list then this rescales the list so that the sum of the result is 1:

   i. 5
0 1 2 3 4
0 0.1 0.2 0.3 0.4

Dyadic hooks are boring, x (f g) y is the same as x f g y.

A 3-element train, f g h, is called a fork (or, again less often, trident). The monadic case (f g h) y is the same as (f y) g (h y), and the dyadic case x (f g h) y is the same as (x f y) g (x h y).

This is all tolerably well explained in Appendix F of the J dictionary.

The mnemonic that I use for remembering the difference between monadic and dyadic tridents is that the monadic case is just the same as the dyadic case but with x removed. Dyadic: (x f y) g (x h y); Monadic: (f y) g (h y).

Recall the parsing rules of Appendix E, and observe that larger sequences of verbs get decomposed into hooks and forks. b c d e is b (c d e); that is, the hook of b and the fork c d e. a b c d e is a fork of a fork: a b (c d e).

The classic pedagogical fork computes the average of a list:

   avg =: +/ % #
   avg 6 7 8
   avg 1 2 4 8 16
   avg 2 ^ i. 5

Observe that the repetition of operands is useful and can avoid temporaries. i.11 */ i.11 produces a multiplication table. We can use the fact that a bident duplicates its operand: (*/ g) i. 11 will do if we can find a monadic g that does nothing. Both [ and ] are monadic identities. So we can get the same multiplication table with (*/]) i. 11. As it happens this simple duplication is so useful that there’s an adverb, ~ (twiddles), to do it. u~ y is y u y. So */~ i. 11 also produces the same table.

Suppose we wanted to compute triangular numbers. tri(n) = (n×(n+1))/2. Perhaps we would like to use J to solve Gauss’s little school problem. Note that n×(n+1) can be computed with a bident: (*>:) (>: is the successor function). Now all we have to do is halve the answer. The currying conjunction & comes in handy. The monadic %&2 halves its operand; %&2 y is the same as y%2. We can use @ to compose these two:

   tri =: %&2 @ (*>:)
   tri i.10
0 1 3 6 10 15 21 28 36 45

In fact the brackets aren’t necessary. We can get rid of the brackets around the bident by using the monadic identity, [, in a trident:

tri =: %&2 @ [*>:

It’s a little shorter, but I’m not convinced that it’s any neater.

I tried this form, and it works, but not for the reasons that I thought it did. The spacing I’ve used above suggests I have a trident on the right, a curried divide operator on the left, and I’ve joined them with @. We can use J to see how it’s actually been parsed:

   %&2 @ [*>:
||+-+-+-+|@|[|| |  |
|||%|&|2|| | || |  |
||+-+-+-+| | || |  |
|+-------+-+-+| |  |

The kinda scary looking boxes are just another way to write brackets (in this case). So %&2 @ [*>: turns out to be parsed as ((%&2)@[)*>:. Lesson: be careful. This is actually computing (n/2)×(n+1); of course this is mathematically the same as (n×(n+1))/2 but the latter can be done in integer arithmetic whereas the former requires fractions. Good job halving is an exact operation in IEEE arithmetic.

The reason %&2@[*>: gets parsed as it does is that the trident reduction cannot take place if there is a conjunction to the left of the potential trident. When the stack has the four terms @ [ * >: on it, @ is a conjunction so [ * >: will not be reduced to a trident. The conjunctions get reduced first.

Conjunctions get reduced left to right. %&2@[ is (%&2)@[. Referring to the parsing table again we see that’s because a conjunction won’t be reduced if there is a conjunction immediately to its left.

Since [ is monadic identity, the [ in %&2@[ is not necessary. We can just use %&2 instead: %&2*>:. It also turns out that dividing by 2 is sufficiently useful that there’s a primitive to do it: -:. So we can use tri =: -:*>:, which is just a single trident.

Cap, spelt [:, is a magic verb used in tridents. [: g h is the same as f g h but without the entire f branch. So x ([: g h) y is g (x h y), and ([: g h) y is g h y.

The last form of cap is found in our original triangle formula that used @: tri =: %&2 @ (*>:). We can remove the @ conjunction and replace it with a cap instead:

   tri =. [: %&2 (*>:)

And this time we can replace the bracketed bident on the right with a trident that uses [:

   tri =. [: %&2 [*>:
|  ||%|&|2|||[|*|>:||
|  |+-+-+-+|+-+-+--+|

This does get parsed how we (or at least I) expect, with [ * >: on the right being reduced to a trident.

Using -: and, for amusement, swapping ] for [ we get:

   tri =: [:-:]*>:

To be honest this seems like an exercise in fruitless manipulation, but I’m sure I’ll be finding all sorts of witty things we can do with forks and hooks.

Why electronic counting failed


All I’ve read so far is the BBC article about how the pilot at 5 councils of electronic counting has failed. That and some of the more tedious leglislation on the matter.

So having not found out much yet I predict the following will turn out to be key factors in the failure of the electronic counting pilot:

Secret software.
MS Access.
Administrator Privileges.
Power cycling.
Final hardware not specified (specification of hardware changed throughout project life-cycle).
No full-scale tests. That is, no test using a similar number of votes as would be expected on election night.
No test performed using the final hardware and software at the site where counting was to be done using real counting staff (until the real thing on election night, naturally).
Nobody opened an image-processing textbook (see secret software).
Results of pre-election night tests suppressed.
Requirements for error rates not specified numerically (in quantifiable terms).
At least one class of errors (in categorising votes) not considered.
Requirements on speed of processing not specified numerically.
Requirements on number of votes presented for adjudication (compared to votes adjudicated in a hand count) not specified.
Changing the design of the ballot paper so that electronic voting would be “easier” meant that electronic counting systems could not be tested with the actual voting slips from previous elections.
Inadequate staff training (see power cycling).
Concurrency problems (for example, presenting the same image to two different adjudicators, then screwing up the count database).

Awesome Shell Power: yes ” | head | nl -ba


Often when programming in shell you want a sequence of numbers. bash has quirky for syntax for doing it, but I’m not really interested in that. I want something portable.

«yes '' | head | nl -ba» will generate a sequence of numbers from 1 to 10. 1 to N can be achieved by specifying an argument to head: «yes '' | head -7 | nl -ba», or, as I never tire of telling people, using sed instead of head: «yes '' | sed 7q | nl -ba». The sed version is one character shorter.

As long as we’ve invoked sed we may as well read the manual to get: «yes | sed -n '=;7q'». Note that as sed is not actually processing the input lines, just counting them, it doesn’t matter what their contents are, so we no longer need an argument to yes.

It turns out that yes is not a standard part of the Unix utilities. So much for portability. Whilst we could make an adequate simulation of yes with «while : ; do echo y ; done», it’s much more amusing to use dd:

$ dd < /dev/zero ibs=1 cbs=1 count=7 conv=unblock | sed -n =
7+0 records in
0+1 records out
14 bytes transferred in 0.000060 secs (233945 bytes/sec)

I love dd for its obscurity, its parody of the DD statement in JCL, and the way no-one seems to know what it is for any more. There are some stderr turds left by dd which we can eliminate with a quick 2>&-: «dd < /dev/zero 2>&- ibs=1 cbs=1 count=7 conv=unblock | sed -n =».

This version of the command is improved in many ways: “count=7” is practically self-documenting; we no longer need quotes around the sed program (though the way a bare = appears at the end does seem like a bit of a non sequitur, especially given all the other = that appear in the dd command that have a completely different meaning). However, it’s unsatisfying in many other ways. It uses /dev/zero which isn’t guaranteed to exist (even on conformant Unix OSes). It squirts binary data (a sequence of alternating NUL and LF characters) into sed which isn’t obliged to behave on such input. It’s long and silly.

Much better to just generate the sequence directly using something approaching a real programming language:

awk 'BEGIN{ for(i=1;i<=7;++i) print i}'

Even those who don’t know the mighty awk can probably work out what it does. It’s portable (some now truly ancient, that is SunOS 4.x, systems might need an extra < /dev/null); it’s clear; it works.

Of course on systems with jot installed you can just go jot 7. That includes FreeBSD and OS X. It turns out that jot is named after APL’s ɩ, iota, which is spelt i. in J.

Vote Different


It turns out that there are 12 “electoral modernisation pilots”. Sorry for the crappy image linked to a PDF. Getting this table from NeoOffice to my blog was unreasonably annoying.

Advance is a scheme where voters can vote up to 2 weeks (generally) before the election. In all these schemes a signature will be required before advance voting.

Signature is a scheme where all voters (not just advance voters) will be required to provide a signature before voting.

Scanning means that votes will be counted by electronic scanning machines. Y denotes a scheme using “commerically available” scanning hardware. X presumably denotes a scheme using scanning hardware custom-built by the Mayor’s nephew in his shed.

Internet denotes a scheme of advance voting using the internet.

Telephone denotes a scheme of advance voting using the telephone.

Central denotes the provision of a centrally provided facility (or facilities) at which voters can vote regardless of their ward or parish.

That single table is about the same amount of information in this 5 page PDF from the DCA found on this page. Is the DCA’s performance measured in Kilogrammes or something?