Archive for the 'javascript' Category

JavaScript: using numbers as table keys considered harmful

2008-07-11

You may or may not know, but in JavaScript all tables (associative arrays) are indexed by strings, and nothing else. So what happens when you go a[0] = thing, isn’t that indexing the table a with the number 0? Well, yes and no. The number gets converted to a string, so a[0] is equivalent to a['0']. In fact all values, of any type, get converted to a string when used as a table key, but I’m only concerned about numbers here (see ECMA 262 section 11.2.1).

Note that this is only how the language is specified, not implemented. Given the importance of arrays in JavaScript, routinely indexed by integers, I would expect that special representations in the implementation would allow efficient indexing by integers.

So with (small) integer keys there are few possible problems. 113 gets converted to ’113′ without ambiguity, so a[113] and a['113'] are equivalent. What about other number values, like 1.5 or 98.4 (Aside: when I say “98.4″ what I mean is the double precision floating point number that is closest to 98.4, this will be a little bit different from exactly 98.4 (see my article on printing floating point numbers))? It would be embarrassing if a[98.4] were equivalent to a['98.400000000000006']. It would be even more embarrassing if the equivalence was allowed to vary between implementations. Well, it’s not. The JavaScript programmer has every right to expect the body of the if in this code to be executed:

a={'98.4':true}
if(a[98.4]) { ... }

When converting numbers to strings JavaScript requires that the parsimonious printing rule is applied. Roughly speaking, a number gets converted to a decimal string that is closer to the source number than any other (floating point) number, and of all such possible decimal strings the shortest one is chosen. The “shortest one” requirement is what gives us ’98.4′ instead of ’98.400000000000006′.

However, there is a problem. Sometimes there is more than one possible shortest string. An obvious example is the smallest positive double precision float: 5e-324 (see the smallest number). This number can be represented in decimal as 3e-324, 4e-324, …, 7e-324. These are representations of the floating point number in the sense that the nearest floating point to each of these decimals is in each case the same floating point number, the smallest positive double precision float. There are other numbers. nextafter(1, 2) is one such number: 1.0000000000000002. This number can equally well be represented by the decimal 1.0000000000000003.

So the problem is that when there is more than one possible shortest decimal string the implementation is free to choose amongst them (so my “no variance between implementations”, above, was a small lie). In principle a JavaScript implementation could make a different choice each time it performed the conversion. So a[5e-324] could index up to 5 different keys in the table a. It would of course be insanity for an implementation to do that, but it’s still a loophole in the specification, and it should be closed.

In the 3rd edition of ECMA 262 there’s a note about this at the end of section 9.8.1; the note is “not part of the normative requirements”. The note specifies additional requirements on the choice of the decimal string (namely “nearest”, and, if still ambiguous, “even”) that ensure a unique conversion is performed. So the text that specifies the requirements we need is already written, it should be a simple matter to make this note normative in the next revision, and this should be done.

I thought Java got this right, but on close inspection I see that it also permits the same variation when there is more than one shortest decimal string. In fact, for the smallest positive double precision float 2 decimal digits are required by Java (because a decimal point is required), so permitting many more options for 5e-324. In this case I wouldn’t even like to say whether “4.9E-324″ or “5.0E-324″ should (in the moral sense) be returned. The former has the virtue of being closest to the floating point number, the latter has the virtue of not misleading about available precision.

A use for uncompressed PNGs

2007-11-20

A new feature of Curly Logo is the SETTBG command to Set Text BackGround colour. settbg "#850c sets the background to be a nearly fully opaque brown. Note that you can set the alpha (opacity) channel. As I mention in an earlier article the background for this (XHTML) div element is a 1×1 PNG. In order to implement SETTBG I have to create a 1×1 PNG containing an arbitrary 4-channel colour value, in JavaScript.

Gareth Rees wrote a very useful post on making very small PNGs that does most of the work of showing which bits should go where in a PNG file. Unfortunately the IDAT chunk in a PNG file is required to be compressed with zlib/deflate. That means I have to take my 32-bits of data (RGBA), prepend a 0 octet (for PNG’s scanline filter) then compress those 5 octets into zlib format.

I haven’t written a fully general zlib encoder in JavaScript, so what I do is use the feature of the deflate format that allows uncompressed blocks.

Let’s say my pixel has colour value #c5005c6d (that’s in RGBA order). My PNG scanline requires a 00 to be prepended, so I need to zlib encode the 5 octets 00 c5 00 5c 6d. Here’s what the encoded zlib data looks like:

78 9c 01 05 00 fa ff 00 c5 00 5c 6d 04 3e 01 8f

Which breaks down like this:

78 9c
zlib header.
01
the little endian bit string 1 00 00000. 1 indicating the final block, 00 indicating a non-compressed block, and 00000 are 5 bits of padding to align the start of a block to on octet (which is required for non-compressed blocks, and very convenient for me).
05 00 fa ff
The number of octets of data in the uncompressed block (5). Stored as a little-endian 16-bit integer followed by its 1′s complement (!).
00 c5 00 5c 6d
The scanline data stored as plain octets.
04 3e 01 8f
The zlib Adler checksum. Observe that 1 + 0×00+0xc5+0×00+0x5c+0x6d = 0x18f and 5 + 5*0×00 + 4*0xc5 + 3*0×00 + 2*0x5c + 1*0x6d = 0x43e.

In the same style as Gareth’s Python here’s some JavaScript:

  // Create a PNG chunk.  Supply type and data, length and CRC are
  // computed.
  chunk: function(type, data) {
    return be32(data.length) + type + data +
        be32(this.crc(type + data))
  },

  // Create binary data for a 1x1 4-channel PNG.  The colour is specified
  // by the number n: 0xRRGGBBAA
  unit: function(n) {
    var d = '\x00' + be32(n) // PNG scanline data.
    return '\x89PNG\r\n\x1A\n' +
      this.chunk('IHDR', be32(1) + be32(1) + '\x08\x06\x00\x00\x00') +
      this.chunk('IDAT',
          '\x78\x9c\x01\x05\x00\xfa\xff' + d + be32(this.adler32(d))) +
      this.chunk('IEND', '')
  }

These fragments are embedded inside an object, hence the object literal syntax to declare the functions and the use of this to refer to other functions defined alongside. be32 is a function to encode an integer in 4 octet (32-bit) big-endian format.

JavaScript bit twiddling

Getting all this data correctly inside a PNG requires two checksum algorithms: Zlib format uses a simple Adler checksum (at the end, see above); PNG uses a conventional 32-bit polynomial residue checksum for each of its chunks. I had to implement both of those in JavaScript.

JavaScript is actually a pretty nice language for bit-twiddling. Sure the performance is bound to suck but that’s a feature of the implementations not the language itself. Not that I actually measured the performance, but I Just Know. In any case performance is not an issue for encoding this trivial amount of data. JavaScript has all of C’s bit-twiddling operators, and it has a bonus feature that C doesn’t have: 32-bit operation guaranteed. Generally the way that an operator like «x ^ y» works in JavaScript is that both x and y are converted to 32-bit signed integers then the operation is performed as if on 32-bit signed integers and the result is converted to a JavaScript number (which is an IEEE double). In C you might have to worry about those 36-bit or 64-bit architectures that have “funny” widths for their types. Not in JavaScript.

Here’s the CRC code in C from the PNG Spec Annex D:

unsigned long update_crc(unsigned long crc, unsigned char *buf,
                             int len)
{
  unsigned long c = crc;
  int n;

  for (n = 0; n < len; n++) {
    c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
  }
  return c;
}

Here’s my JavaScript version:

  crc32: function(buf, crc) {
    if(crc === undefined) {
      crc = 0xffffffff
    }
    var i
    for(i=0; i<buf.length; ++i) {
      crc = this.crctable[(crc^buf.charCodeAt(i)) & 0xff] ^ (crc >>> 8)
    }
    return crc
  },

The core code inside the “for” loop has hardly changed. Instead of «buf[i]» I have to write «buf.charCodeAt(i)»; JavaScript doesn’t have a character type so conversions from characters to their internal representation are more explicit. Note that in JavaScript the shift is «crc >>> 8». JavaScript doesn’t distinguish signed and unsigned integers (or indeed integers and floats) so it can’t tell which sort of shift is required by inspecting crc. Like Java JavaScript has two right shift operators, «>>» for preserving the sign bit, and «>>>» for clearing the sign bit. Using the wrong one was a bug in an earlier implementation of mine.

The trouble with matrix multiplication

2007-11-19

is that it’s not commutative. That means that when the SVG spec says in the documentation for the multiply method in the SVGMatrix DOM “this matrix is post-multiplied by another matrix” it really does matter which way round you do the multiplication of the two matrices.

In the call «A.multiply(B)», A is this and should be post-multiplied by B. In the usual mathematical convention the result should be AB.

Safari computes BA, Firefox computes AB. Oopsy.

This is a confusing area; when I was first getting the turtle motion in Curly Logo going (on Firefox initially) I remember a bit where the turtle’s motion was wrong, and I realised that I had some matrix multiplication the wrong way round. Happens all the time regardless of how much thinking I do beforehand about which way round matrices ought to be multiplied, so I thought nothing of it and simply swopped my arguments. As I was writing the first draft of this article I thought Firefox had it wrong and Safari had it right, turns out that’s not right, it’s the other way round.

Regardless of who is right, anyone wanting to use the routine has to do something to work around it.

So here’s what I do. I perform a trial matrix multiplication using SVGMatrix.multiply and determine whether the multiplication was performed the correct way round or the wrong way round.

Consider the matrices L, a flip, and R, a projection:

L = \begin{pmatrix}0&1&0\cr 1&0&0\cr 0&0&1\end{pmatrix}, R = \begin{pmatrix}0&0&1\cr 0&0&0\cr 0&0&1\end{pmatrix}

LR, that is «L.multiply(R)» as it should be:

LR = \begin{pmatrix}0&0&0\cr 0&0&1\cr 0&0&1\end{pmatrix}

What happens if the implementation swops its arguments:

RL = \begin{pmatrix}0&0&1\cr 0&0&0\cr 0&0&1\end{pmatrix}

In JavaScript (well, SVG DOM really) the entries of a matrix are assigned to properties like this:

\begin{pmatrix}a&c&e\cr b&d&f\cr 0&0&1\end{pmatrix}

Note that the two result matrices (above) only differ in their e and f entries. Specifically when «L.multiply(R)» is computed correctly (following the specification) the result’s e property is 0; when computed with the matrices swopped (incorrectly) the result’s e property is 1. How convenient.

Here is some JavaScript to detect if SVGMatrix.multiply is wrong:

matmulwrong = function() {
  var g = document.createElementNS('http://www.w3.org/2000/svg', 'g')
  g.setAttribute('transform',
    'matrix(0 1 1 0 0 0) translate(1 0)')
  var l = g.transform.baseVal.getItem(0).matrix
  var r = g.transform.baseVal.getItem(1).matrix
  return l.multiply(r).e
}

There is a lot of faffing around creating a pointless g element just so I can attach two matrices to it. That really was the most convenient way I found to create two matrices.

We can use matmulwrong to select one of two definitions for our own matmul function to multiply two matrices:

// matmul(A, B) computes AB
matmul = matmulwrong() ?
    function(a, b) { return b.multiply(a) } :
    function(a, b) { return a.multiply(b) }

Now we should only use matmul and avoid using SVGMatrix.multiply. Defining it like this means we decide once, when the JavaScript is loaded, and not every time that we call matmul. Notice how this simply tests whether SVGMatrix.multiply is swopped or not and acts accordingly; it doesn’t care whether you’re using Safari or Firefox, it’s more in the spirit of object detection, not browser sniffing. As soon as bug 16062 gets fixed in FirefoxSafari the code will detect it as being fixed. Life is good again and Curly Logo works on both browsers.

PS: Opera agrees with Firefox.
PPS: Guess which WordPress feature I used today that I thought I’d never find a genuine use for.

JavaScript: Bitmap Editing

2007-11-16

Following on from my discovery of RFC 2397 I realised that we would be able to manipulate the “src” property of the “img” element in JavaScript and thereby change the image being displayed. The Base64 encoding is conveniently handled by JavaScript’s (non-standard but ubiquitous in the browser) btoa function (2008-05-10: ubiquitous apart from Opera, that is). I chose OS/2′s BMP image file format as being suitably simple to manipulate.

The result is BME (BitMap Editor). Note that all the work is done by client-side JavaScript, there’s no server side programming or communication at all (that would be madness!).

Bi-level any-colour-you-like-as-long-as-it’s-grey-and-blue at the moment. I suggest you right-click to save.

This use of RFC 2397 shows up some bugs and misfeatures of the OS X user interface. For example, dragging the image onto Mail.app opens a new mail message containing the contents of the URL as text rather than an image.

Javascript: coercing to signed integer

2007-11-13

If you have some JavaScript value in x and you want it as a 32-bit signed integer then you can use the expression:

x|0

or change x using the assignment expression:

x|=0

This leaves 32-bit signed integers untouched, and does plausible things for fractional values, very large numbers, strings that look like numbers, true and false, NaN, infinities, and any other objects. It can’t raise an exception. It’s very short.

I can’t remember what I was trying to do when I discovered this, so I haven’t got a good use case…

Writing JavaScript for JsMin

2007-11-05

http://s.wordpress.com/wp-content/plugins/highlight/shCore.js

apparently contains some JavaScript used on my blog (to do syntax highlighting in one or two articles that discuss code). The JavaScript has been run through Crockford’s JSMin. Examining the output is quite interesting; where we see opportunities for further minimisation these suggest either JavaScript coding practices that could be adopted or improvements to similar minimisation tools.

There are certain JavaScript coding practices that make a minimisation tool’s life harder:

var x=e
var y=f

has the same meaning as:

var x=e,y=f

The second form gets rid of the var token and is four characters shorter. But I think a minimisation tool would be quite bold to perform transformations like this, so programmers are better off doing this themselves. This is probably why Crockford’s style is to use one var and let all the variables share it, cuddled with commas. It seems to be a commonly emulated style.

var doc=null

The «=null» is probably redundant because just a plain «var doc» on its own is going to initialise the doc variable with the undefined value (ironically, there’s nothing undefined about undefined apart from its name). Sure, subsequent code could tell the difference between null and undefined but you’re unlikely to be actually interested in the difference. Because you could in principle write code that changed according to whether doc was null or undefined («if(doc===undefined)» being the most obvious example) then a minimisation tool is very unlikely to be able to correctly make the transformation, again the burden is on the programmer.Observation: «undefined == null» is true so code like «if(doc == null)» will continue to work anyway. Which leads me to…

if(flashcopier==null)

When this sort of code is used for feature detection it probably has the same intent as «if(!flashcopier)» which is 5 characters shorter. Again because these two fragments don’t have quite the same meaning (for example, when flashcopier is 0) it’s up to the programmer to decide if one can be replaced by the other.

if(str==null||str.length==0)

Whilst I fully appreciate the use of the «str==null» guard so that the «str.length» expression can’t throw an exception, it probably isn’t necessary. If str is intended to be a string then the code above is equivalent to «if(!str)» because the empty string is a falsy value in JavaScript (see my article on Iverson’s Convention for a cross-language comparison of true/false interpretation). This equivalence relies on str being a string; sure, the name is a big clue, but in the absence of a type annotation (either code or documentation) I can’t tell. Maybe str could be the empty Array, [], which would then have a different behaviour in the two tests. The programmer will have to decide if the replacement code is acceptable.

if(typeof(a[i])=='string'

Due to JavaScript’s operator precedence the test is equivalent to «typeof a[i]==’string’» which is a whole character smaller. This transformation is one that a minimisation tool should be able to make more easily than the programmer. After all a minimisation tool is in a position to know the operator precedence table in detail and correctly decide when parentheses are redundant, the programmer is not. Of course this would require that the minimisation tool do a full parse and JSMin doesn’t do anything like that.

if((match.index>c.index)&&(match.index<c.index+c.length))

Like the earlier typeof example the minimiser could use its knowledge of operator precedence to remove the parentheses in the condition expression: «if(match.index>c.index&&match.index<c.index+c.length)». I recommend that the programmer does not perform transformations like this merely in order to shrink their code a bit, it’s way too error prone.This example suggests another space saving that might be made. The variable match could be renamed m. That would save 8 characters just in this line, another 4 in its declaration, and 4 each time the variable was used. This kind of transformation is amongst the most thorny however. If the programmer does it then readability may be sacrificed on the altar of bandwidth minimisation (to a certain breed of programmer that grew up using a language where the length of a variable’s name affected its speed of execution, such parsimony may come naturally).

An automated minimisation tool could rename variables to make them shorter, but two features of the JavaScript language get in the way: with and eval. with is bad because it introduces a new set of scoped variables whose names are selected at runtime (in «with(x) {match}» the match variable may reference a property of x, if it exists, or be an ordinary reference to a variable in the function). Similarly eval is bad because the code to be evaluated can reference the local variables of the function making the call. It’s impossible to rename any variables because you can’t tell if code inside a with or an eval intends to reference (or avoid) those variables. Maybe it’s not so bad, because anyone following sensible JavaScript guidelines will already be deprecating those two features.

return false;}

Two things here. The semicolon is redundant and can be removed. [ECMA 262-3] has bizarre and arcanely specified rules for when semicolons are required, but in this case they boil down to “it can be removed”. In fact a semicolon before a closing squiggly bracket, «}», can almost always be removed (bonus credit for knowing when it can’t).The other minimisation possible is that the entire «return false» statement is a candidate for removal, but the programmer has to do this. If instead of «return false» a function “drops off the bottom” then this is equivalent to «return undefined»; in a boolean context, if the function is being used in an if for example, then undefined is just as good as false. Again, like the «var doc=null» and the «flashcopier==null» example, the programmer will have to decide whether the transformation is possible without breaking the program.

(i%2==0)?'alt':''

Could almost be replaced by «(i%2)?”:’alt’» (negating the condition and swapping the two alternatives) except that when «i%2» evaluates to NaN these two expressions have different meanings. An automatic transforming tool could make the transformation if it could infer the type of i to be a finite number or if the programmer could declare it to the tool. There are quite a few transformations like this, where the transformation would be possible given only a little hint about the type. Incidentally if the programmer had written «i&1==0» then that could be automatically translated with no type inference needed.I was just about to wrap up when I saw…

var regex=new RegExp('^\\s*','g')

Which is just sheer Javaism creeping into JavaScript. The correct way to write this is «var regex=/^\s*/g».

Conclusion

There are a few things a programmer can do to make their JavaScript codes smaller for the wire. Most of these also make the JavaScript less readable so are not recommended. Perhaps the only transformation that I can recommend the programmer do is to only use one var statement per function, using commas to separate the variables (this also makes the code align with the semantics).

There is scope for minimisation tools that have a deeper understanding of JavaScript. JsMin does basically nothing more than remove comments and spaces and it gets a surprisingly long way, getting any further will require parsing. (There are other minimisation tools, I just haven’t investigated them, and they don’t seem to be as popular)

A declare syntax, so that the programmer could annotate the program with information that the compiler can either exploit or check (as appropriate), would be really useful. The kinds of thing I am thinking of are type declarations and assertions. In Lisp you can go «(declare (type unsigned-byte a))» to declare that the variable a is constrained to be a non-negative integer. Most Lisp compilers will allow you to run this code in a mode where this declaration is checked (extra safety) or, alternatively, assumed (extra speed). It would be good to have a similar feature in JavaScript.

JavaScript: Functions are Constructors are Functions

2007-09-13

Some people seem to be a bit confused about the return value of a JavaScript function when used as a constructor. So what does «new Foo(some args)» evaluate to?

If the constructor returns an object then that is used; otherwise the return value is irrelevant and the newly constructed object is used.

The immediate and practical upshot this is that for a constructor it doesn’t matter whether you use «return» or «return this». I would say that a constructor’s normal job is to make objects so if you must return explicitly from a constructor then just use the simple «return» form.

The amusing and alarming upshot is that the behaviour of a function can be radically different according to whether it is invoked as a function or a constructor. Imagine you have a subtraction function:

function Sub(x, y) {
  return x-y
}

Invoke it like «Sub(9,5)» and it returns 4. What does it do when used in «new Sub(9,5)»? Well, technically the function still returns 4, but because 4 is not an object that value is ignored and the newly constructed object is returned instead. A new expression is guaranteed to evaluate to an object (or raise an exception).

Consider (at an interactive JavaScript prompt):

js> function Sub(x, y) {
  result = x-y
  return result
}
js> Sub.prototype.toString = function() { return 'I am a walrus' }
js> Sub(9,5)
4
js> result
4
js> new Sub(100,1)
I am a walrus
js> result
99

When invoked as a constructor we get a new object that prints as «I am a walrus» because of the toString method in its prototype. I have deliberately changed Sub so that it writes to a global variable so that you can convince yourself that it is being called in the constructor case even though its return value is not being used.

Do not do this at home!

I am not recommending the above, I just find it an amusing way to illustrate an obscure feature of JavaScript.

I recommend that you make it crystal clear which of your functions are constructors and never encourage anyone to call them as a normal function. The fact that all functions can be used as constructors (and vice versa, any constructor can be invoked as a function) is an attractive nuisance. Sure, you can avoid it by being careful, but it’s too easy to do accidentally. Douglas Crockford has an article about the attractive nuisance of fallthrough in switch, which is where I got the term.

The usual convention, which seems very sensible to me, is to start constructors with a capital letter. That’s makes them look a bit more like classes too.

To some extent you can protect yourself against accidentially calling a constructor as a function. You can test whether this is the global object (which it should be in a function) or not the global object (which it should be in a constructor). But only if you first remember what the global object is:

js> g=this
[object global]
js> function Foo() {
  if(this === g) {
    throw 'I am a constructor, not a function'
  }
}
js> new Foo()
[object Object]
js> Foo()
uncaught exception: I am a constructor, not a function

Because the first OO language I learnt was Dylan I find that the artificial separation of constructors and ordinary functions (which may or may not return fresh objects) is also attractive nuisance. It’s an attractive nuisance that has worked its way (like fallthrough in switch) into many programming languages: C++, Java, JavaScript. Even Common Lisp has it, though I suspect that unwrapped uses of make-instance are actively discouraged. In the other languages I mention, the idea that you might use something other than a constructor to get a fresh object is so unusual that there’s a name for this idea: factory method pattern!

Appendix: Standard References

All in ECMA 262 3rd edition.

Section 11.2.2 defines the semantics of the new operator. Which when used like this:

new Foo()

The semantics are to grab the Foo object and call its [[Construct]] method. When Foo is a constructor that you defined yourself, that is a JavaScript function, the [[Construct]] method is described in section 13.2.2 and has roughly the behavious I describe above: construct a «new Object()» (and let it be this); call the function; if it yields an object then use it, otherwise use this.

Follow

Get every new post delivered to your Inbox.