Archive for the 'ecmascript' Category

JavaScript: using numbers as table keys considered harmful

2008-07-11

You may or may not know, but in JavaScript all tables (associative arrays) are indexed by strings, and nothing else. So what happens when you go a[0] = thing, isn’t that indexing the table a with the number 0? Well, yes and no. The number gets converted to a string, so a[0] is equivalent to a['0']. In fact all values, of any type, get converted to a string when used as a table key, but I’m only concerned about numbers here (see ECMA 262 section 11.2.1).

Note that this is only how the language is specified, not implemented. Given the importance of arrays in JavaScript, routinely indexed by integers, I would expect that special representations in the implementation would allow efficient indexing by integers.

So with (small) integer keys there are few possible problems. 113 gets converted to ’113′ without ambiguity, so a[113] and a['113'] are equivalent. What about other number values, like 1.5 or 98.4 (Aside: when I say “98.4″ what I mean is the double precision floating point number that is closest to 98.4, this will be a little bit different from exactly 98.4 (see my article on printing floating point numbers))? It would be embarrassing if a[98.4] were equivalent to a['98.400000000000006']. It would be even more embarrassing if the equivalence was allowed to vary between implementations. Well, it’s not. The JavaScript programmer has every right to expect the body of the if in this code to be executed:

a={'98.4':true}
if(a[98.4]) { ... }

When converting numbers to strings JavaScript requires that the parsimonious printing rule is applied. Roughly speaking, a number gets converted to a decimal string that is closer to the source number than any other (floating point) number, and of all such possible decimal strings the shortest one is chosen. The “shortest one” requirement is what gives us ’98.4′ instead of ’98.400000000000006′.

However, there is a problem. Sometimes there is more than one possible shortest string. An obvious example is the smallest positive double precision float: 5e-324 (see the smallest number). This number can be represented in decimal as 3e-324, 4e-324, …, 7e-324. These are representations of the floating point number in the sense that the nearest floating point to each of these decimals is in each case the same floating point number, the smallest positive double precision float. There are other numbers. nextafter(1, 2) is one such number: 1.0000000000000002. This number can equally well be represented by the decimal 1.0000000000000003.

So the problem is that when there is more than one possible shortest decimal string the implementation is free to choose amongst them (so my “no variance between implementations”, above, was a small lie). In principle a JavaScript implementation could make a different choice each time it performed the conversion. So a[5e-324] could index up to 5 different keys in the table a. It would of course be insanity for an implementation to do that, but it’s still a loophole in the specification, and it should be closed.

In the 3rd edition of ECMA 262 there’s a note about this at the end of section 9.8.1; the note is “not part of the normative requirements”. The note specifies additional requirements on the choice of the decimal string (namely “nearest”, and, if still ambiguous, “even”) that ensure a unique conversion is performed. So the text that specifies the requirements we need is already written, it should be a simple matter to make this note normative in the next revision, and this should be done.

I thought Java got this right, but on close inspection I see that it also permits the same variation when there is more than one shortest decimal string. In fact, for the smallest positive double precision float 2 decimal digits are required by Java (because a decimal point is required), so permitting many more options for 5e-324. In this case I wouldn’t even like to say whether “4.9E-324″ or “5.0E-324″ should (in the moral sense) be returned. The former has the virtue of being closest to the floating point number, the latter has the virtue of not misleading about available precision.

A use for uncompressed PNGs

2007-11-20

A new feature of Curly Logo is the SETTBG command to Set Text BackGround colour. settbg "#850c sets the background to be a nearly fully opaque brown. Note that you can set the alpha (opacity) channel. As I mention in an earlier article the background for this (XHTML) div element is a 1×1 PNG. In order to implement SETTBG I have to create a 1×1 PNG containing an arbitrary 4-channel colour value, in JavaScript.

Gareth Rees wrote a very useful post on making very small PNGs that does most of the work of showing which bits should go where in a PNG file. Unfortunately the IDAT chunk in a PNG file is required to be compressed with zlib/deflate. That means I have to take my 32-bits of data (RGBA), prepend a 0 octet (for PNG’s scanline filter) then compress those 5 octets into zlib format.

I haven’t written a fully general zlib encoder in JavaScript, so what I do is use the feature of the deflate format that allows uncompressed blocks.

Let’s say my pixel has colour value #c5005c6d (that’s in RGBA order). My PNG scanline requires a 00 to be prepended, so I need to zlib encode the 5 octets 00 c5 00 5c 6d. Here’s what the encoded zlib data looks like:

78 9c 01 05 00 fa ff 00 c5 00 5c 6d 04 3e 01 8f

Which breaks down like this:

78 9c
zlib header.
01
the little endian bit string 1 00 00000. 1 indicating the final block, 00 indicating a non-compressed block, and 00000 are 5 bits of padding to align the start of a block to on octet (which is required for non-compressed blocks, and very convenient for me).
05 00 fa ff
The number of octets of data in the uncompressed block (5). Stored as a little-endian 16-bit integer followed by its 1′s complement (!).
00 c5 00 5c 6d
The scanline data stored as plain octets.
04 3e 01 8f
The zlib Adler checksum. Observe that 1 + 0×00+0xc5+0×00+0x5c+0x6d = 0x18f and 5 + 5*0×00 + 4*0xc5 + 3*0×00 + 2*0x5c + 1*0x6d = 0x43e.

In the same style as Gareth’s Python here’s some JavaScript:

  // Create a PNG chunk.  Supply type and data, length and CRC are
  // computed.
  chunk: function(type, data) {
    return be32(data.length) + type + data +
        be32(this.crc(type + data))
  },

  // Create binary data for a 1x1 4-channel PNG.  The colour is specified
  // by the number n: 0xRRGGBBAA
  unit: function(n) {
    var d = '\x00' + be32(n) // PNG scanline data.
    return '\x89PNG\r\n\x1A\n' +
      this.chunk('IHDR', be32(1) + be32(1) + '\x08\x06\x00\x00\x00') +
      this.chunk('IDAT',
          '\x78\x9c\x01\x05\x00\xfa\xff' + d + be32(this.adler32(d))) +
      this.chunk('IEND', '')
  }

These fragments are embedded inside an object, hence the object literal syntax to declare the functions and the use of this to refer to other functions defined alongside. be32 is a function to encode an integer in 4 octet (32-bit) big-endian format.

JavaScript bit twiddling

Getting all this data correctly inside a PNG requires two checksum algorithms: Zlib format uses a simple Adler checksum (at the end, see above); PNG uses a conventional 32-bit polynomial residue checksum for each of its chunks. I had to implement both of those in JavaScript.

JavaScript is actually a pretty nice language for bit-twiddling. Sure the performance is bound to suck but that’s a feature of the implementations not the language itself. Not that I actually measured the performance, but I Just Know. In any case performance is not an issue for encoding this trivial amount of data. JavaScript has all of C’s bit-twiddling operators, and it has a bonus feature that C doesn’t have: 32-bit operation guaranteed. Generally the way that an operator like «x ^ y» works in JavaScript is that both x and y are converted to 32-bit signed integers then the operation is performed as if on 32-bit signed integers and the result is converted to a JavaScript number (which is an IEEE double). In C you might have to worry about those 36-bit or 64-bit architectures that have “funny” widths for their types. Not in JavaScript.

Here’s the CRC code in C from the PNG Spec Annex D:

unsigned long update_crc(unsigned long crc, unsigned char *buf,
                             int len)
{
  unsigned long c = crc;
  int n;

  for (n = 0; n < len; n++) {
    c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
  }
  return c;
}

Here’s my JavaScript version:

  crc32: function(buf, crc) {
    if(crc === undefined) {
      crc = 0xffffffff
    }
    var i
    for(i=0; i<buf.length; ++i) {
      crc = this.crctable[(crc^buf.charCodeAt(i)) & 0xff] ^ (crc >>> 8)
    }
    return crc
  },

The core code inside the “for” loop has hardly changed. Instead of «buf[i]» I have to write «buf.charCodeAt(i)»; JavaScript doesn’t have a character type so conversions from characters to their internal representation are more explicit. Note that in JavaScript the shift is «crc >>> 8». JavaScript doesn’t distinguish signed and unsigned integers (or indeed integers and floats) so it can’t tell which sort of shift is required by inspecting crc. Like Java JavaScript has two right shift operators, «>>» for preserving the sign bit, and «>>>» for clearing the sign bit. Using the wrong one was a bug in an earlier implementation of mine.

The trouble with matrix multiplication

2007-11-19

is that it’s not commutative. That means that when the SVG spec says in the documentation for the multiply method in the SVGMatrix DOM “this matrix is post-multiplied by another matrix” it really does matter which way round you do the multiplication of the two matrices.

In the call «A.multiply(B)», A is this and should be post-multiplied by B. In the usual mathematical convention the result should be AB.

Safari computes BA, Firefox computes AB. Oopsy.

This is a confusing area; when I was first getting the turtle motion in Curly Logo going (on Firefox initially) I remember a bit where the turtle’s motion was wrong, and I realised that I had some matrix multiplication the wrong way round. Happens all the time regardless of how much thinking I do beforehand about which way round matrices ought to be multiplied, so I thought nothing of it and simply swopped my arguments. As I was writing the first draft of this article I thought Firefox had it wrong and Safari had it right, turns out that’s not right, it’s the other way round.

Regardless of who is right, anyone wanting to use the routine has to do something to work around it.

So here’s what I do. I perform a trial matrix multiplication using SVGMatrix.multiply and determine whether the multiplication was performed the correct way round or the wrong way round.

Consider the matrices L, a flip, and R, a projection:

L = \begin{pmatrix}0&1&0\cr 1&0&0\cr 0&0&1\end{pmatrix}, R = \begin{pmatrix}0&0&1\cr 0&0&0\cr 0&0&1\end{pmatrix}

LR, that is «L.multiply(R)» as it should be:

LR = \begin{pmatrix}0&0&0\cr 0&0&1\cr 0&0&1\end{pmatrix}

What happens if the implementation swops its arguments:

RL = \begin{pmatrix}0&0&1\cr 0&0&0\cr 0&0&1\end{pmatrix}

In JavaScript (well, SVG DOM really) the entries of a matrix are assigned to properties like this:

\begin{pmatrix}a&c&e\cr b&d&f\cr 0&0&1\end{pmatrix}

Note that the two result matrices (above) only differ in their e and f entries. Specifically when «L.multiply(R)» is computed correctly (following the specification) the result’s e property is 0; when computed with the matrices swopped (incorrectly) the result’s e property is 1. How convenient.

Here is some JavaScript to detect if SVGMatrix.multiply is wrong:

matmulwrong = function() {
  var g = document.createElementNS('http://www.w3.org/2000/svg', 'g')
  g.setAttribute('transform',
    'matrix(0 1 1 0 0 0) translate(1 0)')
  var l = g.transform.baseVal.getItem(0).matrix
  var r = g.transform.baseVal.getItem(1).matrix
  return l.multiply(r).e
}

There is a lot of faffing around creating a pointless g element just so I can attach two matrices to it. That really was the most convenient way I found to create two matrices.

We can use matmulwrong to select one of two definitions for our own matmul function to multiply two matrices:

// matmul(A, B) computes AB
matmul = matmulwrong() ?
    function(a, b) { return b.multiply(a) } :
    function(a, b) { return a.multiply(b) }

Now we should only use matmul and avoid using SVGMatrix.multiply. Defining it like this means we decide once, when the JavaScript is loaded, and not every time that we call matmul. Notice how this simply tests whether SVGMatrix.multiply is swopped or not and acts accordingly; it doesn’t care whether you’re using Safari or Firefox, it’s more in the spirit of object detection, not browser sniffing. As soon as bug 16062 gets fixed in FirefoxSafari the code will detect it as being fixed. Life is good again and Curly Logo works on both browsers.

PS: Opera agrees with Firefox.
PPS: Guess which WordPress feature I used today that I thought I’d never find a genuine use for.

JavaScript: Bitmap Editing

2007-11-16

Following on from my discovery of RFC 2397 I realised that we would be able to manipulate the “src” property of the “img” element in JavaScript and thereby change the image being displayed. The Base64 encoding is conveniently handled by JavaScript’s (non-standard but ubiquitous in the browser) btoa function (2008-05-10: ubiquitous apart from Opera, that is). I chose OS/2′s BMP image file format as being suitably simple to manipulate.

The result is BME (BitMap Editor). Note that all the work is done by client-side JavaScript, there’s no server side programming or communication at all (that would be madness!).

Bi-level any-colour-you-like-as-long-as-it’s-grey-and-blue at the moment. I suggest you right-click to save.

This use of RFC 2397 shows up some bugs and misfeatures of the OS X user interface. For example, dragging the image onto Mail.app opens a new mail message containing the contents of the URL as text rather than an image.

Javascript: coercing to signed integer

2007-11-13

If you have some JavaScript value in x and you want it as a 32-bit signed integer then you can use the expression:

x|0

or change x using the assignment expression:

x|=0

This leaves 32-bit signed integers untouched, and does plausible things for fractional values, very large numbers, strings that look like numbers, true and false, NaN, infinities, and any other objects. It can’t raise an exception. It’s very short.

I can’t remember what I was trying to do when I discovered this, so I haven’t got a good use case…

JavaScript: Functions are Constructors are Functions

2007-09-13

Some people seem to be a bit confused about the return value of a JavaScript function when used as a constructor. So what does «new Foo(some args)» evaluate to?

If the constructor returns an object then that is used; otherwise the return value is irrelevant and the newly constructed object is used.

The immediate and practical upshot this is that for a constructor it doesn’t matter whether you use «return» or «return this». I would say that a constructor’s normal job is to make objects so if you must return explicitly from a constructor then just use the simple «return» form.

The amusing and alarming upshot is that the behaviour of a function can be radically different according to whether it is invoked as a function or a constructor. Imagine you have a subtraction function:

function Sub(x, y) {
  return x-y
}

Invoke it like «Sub(9,5)» and it returns 4. What does it do when used in «new Sub(9,5)»? Well, technically the function still returns 4, but because 4 is not an object that value is ignored and the newly constructed object is returned instead. A new expression is guaranteed to evaluate to an object (or raise an exception).

Consider (at an interactive JavaScript prompt):

js> function Sub(x, y) {
  result = x-y
  return result
}
js> Sub.prototype.toString = function() { return 'I am a walrus' }
js> Sub(9,5)
4
js> result
4
js> new Sub(100,1)
I am a walrus
js> result
99

When invoked as a constructor we get a new object that prints as «I am a walrus» because of the toString method in its prototype. I have deliberately changed Sub so that it writes to a global variable so that you can convince yourself that it is being called in the constructor case even though its return value is not being used.

Do not do this at home!

I am not recommending the above, I just find it an amusing way to illustrate an obscure feature of JavaScript.

I recommend that you make it crystal clear which of your functions are constructors and never encourage anyone to call them as a normal function. The fact that all functions can be used as constructors (and vice versa, any constructor can be invoked as a function) is an attractive nuisance. Sure, you can avoid it by being careful, but it’s too easy to do accidentally. Douglas Crockford has an article about the attractive nuisance of fallthrough in switch, which is where I got the term.

The usual convention, which seems very sensible to me, is to start constructors with a capital letter. That’s makes them look a bit more like classes too.

To some extent you can protect yourself against accidentially calling a constructor as a function. You can test whether this is the global object (which it should be in a function) or not the global object (which it should be in a constructor). But only if you first remember what the global object is:

js> g=this
[object global]
js> function Foo() {
  if(this === g) {
    throw 'I am a constructor, not a function'
  }
}
js> new Foo()
[object Object]
js> Foo()
uncaught exception: I am a constructor, not a function

Because the first OO language I learnt was Dylan I find that the artificial separation of constructors and ordinary functions (which may or may not return fresh objects) is also attractive nuisance. It’s an attractive nuisance that has worked its way (like fallthrough in switch) into many programming languages: C++, Java, JavaScript. Even Common Lisp has it, though I suspect that unwrapped uses of make-instance are actively discouraged. In the other languages I mention, the idea that you might use something other than a constructor to get a fresh object is so unusual that there’s a name for this idea: factory method pattern!

Appendix: Standard References

All in ECMA 262 3rd edition.

Section 11.2.2 defines the semantics of the new operator. Which when used like this:

new Foo()

The semantics are to grab the Foo object and call its [[Construct]] method. When Foo is a constructor that you defined yourself, that is a JavaScript function, the [[Construct]] method is described in section 13.2.2 and has roughly the behavious I describe above: construct a «new Object()» (and let it be this); call the function; if it yields an object then use it, otherwise use this.

Follow

Get every new post delivered to your Inbox.