Writing JavaScript for JsMin

2007-11-05

https://s-ssl.wordpress.com/wp-content/plugins/highlight/shCore.js

apparently contains some JavaScript used on my blog (to do syntax highlighting in one or two articles that discuss code). The JavaScript has been run through Crockford’s JSMin. Examining the output is quite interesting: the places where we see opportunities for further minimisation suggest either JavaScript coding practices that could be adopted or improvements that could be made to JSMin and similar minimisation tools.

There are certain JavaScript coding practices that make a minimisation tool’s life harder:

var x=e
var y=f

has the same meaning as:

var x=e,y=f

The second form gets rid of the var token and is four characters shorter. But I think a minimisation tool would be quite bold to perform transformations like this, so programmers are better off doing this themselves. This is probably why Crockford’s style is to use one var and let all the variables share it, cuddled with commas. It seems to be a commonly emulated style.
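A quick check of the arithmetic, as a sketch: the two source strings are the fragments above with e and f replaced by the literals 1 and 2 (an assumption made only so the code runs). Both forms leave the same bindings behind.

```javascript
// The two fragments from the text as source strings (e and f become literals here).
const separate = "var x=1\nvar y=2"; // two var statements, newline-separated
const combined = "var x=1,y=2";      // one var statement, comma-separated

// "\nvar " (5 characters) is replaced by "," (1 character): 15 - 11 = 4.
const saving = separate.length - combined.length;

// Run each fragment as a function body and report the values it declared.
function run(src) {
  return new Function(src + "\nreturn [x, y];")();
}

console.log(saving);                        // 4
console.log(JSON.stringify(run(separate))); // [1,2]
console.log(JSON.stringify(run(combined))); // [1,2]
```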

var doc=null

The «=null» is probably redundant, because a plain «var doc» on its own initialises the doc variable with the undefined value (ironically, there’s nothing undefined about undefined apart from its name). Sure, subsequent code could tell the difference between null and undefined, but you’re unlikely to actually be interested in the difference. You could in principle write code that behaved differently according to whether doc was null or undefined («if(doc===undefined)» being the most obvious example). For that reason a minimisation tool is very unlikely to be able to make the transformation correctly, and again the burden is on the programmer.

Observation: «undefined == null» is true, so code like «if(doc == null)» will continue to work anyway. Which leads me to…
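The sketch below shows the distinction in practice. The function describe is hypothetical; it stands for any code that branches on null versus undefined, which is exactly the kind of code that stops a minimiser from dropping «=null».

```javascript
// A plain declaration with no initialiser leaves the variable undefined.
var doc;

console.log(doc === undefined); // true
console.log(doc === null);      // false: strict equality tells them apart
console.log(doc == null);       // true: loose equality treats null and undefined alike

// Hypothetical function whose behaviour depends on the difference.
function describe(d) {
  if (d === undefined) return "never set";
  if (d === null) return "explicitly empty";
  return "has a value";
}

console.log(describe(doc));  // never set
console.log(describe(null)); // explicitly empty
```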

if(flashcopier==null)

When this sort of code is used for feature detection, it probably has the same intent as «if(!flashcopier)», which is 5 characters shorter. Again, because these two fragments don’t have quite the same meaning (they differ when flashcopier is 0, the empty string, NaN or false), it’s up to the programmer to decide whether one can be replaced by the other.
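Here is a small sketch that compares the two tests over some sample values. The sample list is chosen for illustration. For feature detection the probed value is usually either undefined or an object, and the two tests agree on both. They disagree only on the other falsy values.

```javascript
// The two tests from the text, as functions of the value being probed.
function isNullish(v) { return v == null; } // true only for null and undefined
function isFalsy(v) { return !v; }          // true for every falsy value

// Sample values, paired with printable labels.
const samples = [
  ["undefined", undefined], ["null", null], ["0", 0], ['""', ""],
  ["NaN", NaN], ["false", false], ["{}", {}], ["a function", function () {}],
];

// Keep the labels of values on which the two tests disagree.
const differing = samples
  .filter(([, v]) => isNullish(v) !== isFalsy(v))
  .map(([label]) => label);

console.log(differing.join(", ")); // 0, "", NaN, false
```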

if(str==null||str.length==0)

Whilst I fully appreciate the use of the «str==null» guard so that the «str.length» expression can’t throw an exception, it probably isn’t necessary. If str is intended to be a string (or null or undefined), then the code above is equivalent to «if(!str)», because the empty string is a falsy value in JavaScript (see my article on Iverson’s Convention for a cross-language comparison of true/false interpretation). This equivalence relies on str being a string. Sure, the name is a big clue, but in the absence of a type annotation (either in the code or in documentation) I can’t tell. Maybe str could be the empty Array, [], which has length 0 but is truthy, so the two tests would disagree. The programmer will have to decide whether the replacement code is acceptable.
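A minimal sketch of the two guards side by side. The function names are made up for illustration. For null, undefined and strings the guards agree, but an empty Array separates them.

```javascript
// The original guard and the proposed replacement.
function emptyLong(str) { return str == null || str.length == 0; }
function emptyShort(str) { return !str; }

// For null, undefined and strings the two agree...
for (const s of [null, undefined, "", "a", "0", " "]) {
  if (emptyLong(s) !== emptyShort(s)) {
    throw new Error("mismatch for " + JSON.stringify(s));
  }
}

// ...but not for an empty Array, which has length 0 yet is truthy.
console.log(emptyLong([]));  // true
console.log(emptyShort([])); // false
```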

if(typeof(a[i])=='string'

Due to JavaScript’s operator precedence, the test is equivalent to «typeof a[i]=='string'», which is a whole character smaller. This transformation is one that a minimisation tool should be able to make more easily than the programmer. After all, a minimisation tool is in a position to know the operator precedence table in detail and to decide correctly when parentheses are redundant; the programmer is not. Of course, this would require the minimisation tool to do a full parse, and JSMin doesn’t do anything like that.
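The sketch below illustrates why a minimiser needs the full precedence table. Around a member access the parentheses are redundant. Around an addition they are not, because typeof binds more tightly than +. The values of a, i, x and y are arbitrary.

```javascript
const a = ["text", 42];
const i = 0;

// Member access binds more tightly than typeof, and typeof more tightly than ==,
// so the parentheses around a[i] change nothing.
const withParens = typeof(a[i]) == 'string';
const withoutParens = typeof a[i] == 'string';
console.log(withParens === withoutParens); // true

// One character saved: two parentheses go, one space comes in.
console.log("typeof(a[i])".length - "typeof a[i]".length); // 1

// Here the parentheses group a + operation, which binds less tightly than typeof.
const x = 1, y = "2";
console.log(typeof (x + y)); // string  (typeof "12")
console.log(typeof x + y);   // number2 ((typeof x) + y)
```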

if((match.index>c.index)&&(match.index<c.index+c.length))

Like the earlier «typeof» example, the minimiser could use its knowledge of operator precedence to remove the parentheses in the condition expression: «if(match.index>c.index&&match.index<c.index+c.length)». I recommend that the programmer does not perform transformations like this merely to shrink their code a bit; it’s way too error prone.

This example suggests another space saving that might be made. The variable match could be renamed m. That would save 8 characters just in this line, another 4 in its declaration, and 4 each time the variable was used. This kind of transformation is amongst the thorniest, however. If the programmer does it, then readability may be sacrificed on the altar of bandwidth minimisation (to a certain breed of programmer that grew up using a language where the length of a variable’s name affected its speed of execution, such parsimony may come naturally).
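As a sketch, the code below checks both savings under stated assumptions. The object c is a hypothetical stand-in for the one in the original code. The renaming is done textually on a source string purely to count characters. A real minimiser would have to rename by scope, not by regular expression.

```javascript
// Hypothetical stand-in for the object in the original code.
const c = { index: 10, length: 5 };

function withParens(match) {
  return (match.index > c.index) && (match.index < c.index + c.length);
}
function withoutParens(match) {
  // >, < and + all bind more tightly than &&, so the grouping is unchanged.
  return match.index > c.index && match.index < c.index + c.length;
}

for (let k = 0; k <= 20; k++) {
  if (withParens({ index: k }) !== withoutParens({ index: k })) {
    throw new Error("forms differ at " + k);
  }
}

// Renaming match to m saves 4 characters per occurrence; this line has two.
// (Textual replacement is only for counting here, not a safe renaming method.)
const before = "if(match.index>c.index&&match.index<c.index+c.length)";
const after = before.replace(/\bmatch\b/g, "m");
console.log(before.length - after.length); // 8
```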

An automated minimisation tool could rename variables to make them shorter, but two features of the JavaScript language get in the way: «with» and «eval». «with» is bad because it introduces a new set of scoped variables whose names are selected at runtime (in «with(x) {match}» the match variable may reference a property of x, if it exists, or be an ordinary reference to a variable in the function). Similarly, «eval» is bad because the code to be evaluated can reference the local variables of the function making the call. It’s impossible to rename variables that are visible to a «with» or an «eval», because you can’t tell whether the code inside intends to reference (or avoid) them. Maybe it’s not so bad, because anyone following sensible JavaScript guidelines will already be avoiding those two features.
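A minimal sketch of how «eval» defeats renaming. Both functions are hypothetical. The second one is what a naive renamer would produce: the local variable is renamed, but the name inside the string is not, and in general the string might be built at runtime anyway. «with» is left out because it is a syntax error in strict-mode code.

```javascript
// Hypothetical original: the string passed to eval names the local variable.
function original() {
  var match = "found";
  return eval("match"); // direct eval sees the function's local variables
}

// After naive renaming: the local is now m, but the string still says match.
function renamed() {
  var m = "found";
  return eval("match"); // refers to a variable that no longer exists
}

console.log(original()); // found
try {
  renamed();
} catch (e) {
  console.log(e instanceof ReferenceError); // true
}
```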

return false;}

Two things here. First, the semicolon is redundant and can be removed. [ECMA 262-3] has bizarre and arcanely specified rules for when semicolons are required, but in this case they boil down to “it can be removed”. In fact, a semicolon before a closing squiggly bracket, «}», can almost always be removed (bonus credit for knowing when it can’t).

The other possible minimisation is that the entire «return false» statement is a candidate for removal, but the programmer has to do this. If, instead of executing «return false», a function “drops off the bottom”, this is equivalent to «return undefined». In a boolean context (if the function is being used in an if, for example), undefined is just as good as false. Again, as with the «var doc=null» and «flashcopier==null» examples, the programmer will have to decide whether the transformation is possible without breaking the program.
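A sketch of both points, using hypothetical predicates. The first part shows that a body with and without the semicolon before «}» behaves identically. The second part shows that falling off the end gives undefined, which is interchangeable with false only in a boolean context.

```javascript
// The semicolon before a closing brace is optional: both bodies parse and agree.
const f1 = new Function("if (true) {return false;}");
const f2 = new Function("if (true) {return false}");
console.log(f1() === f2()); // true

// Hypothetical predicate written both ways.
function hasItemsExplicit(list) {
  if (list.length > 0) return true;
  return false;
}
function hasItemsImplicit(list) {
  if (list.length > 0) return true;
  // falls off the end: the call evaluates to undefined
}

// In a boolean context the two are interchangeable...
console.log(!hasItemsExplicit([]) && !hasItemsImplicit([])); // true

// ...but a caller that compares strictly against false can tell them apart.
console.log(hasItemsExplicit([]) === false); // true
console.log(hasItemsImplicit([]) === false); // false
```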

(i%2==0)?'alt':''

Could almost be replaced by «(i%2)?'':'alt'» (negating the condition and swapping the two alternatives), except that when «i%2» evaluates to NaN these two expressions have different meanings. An automatic transforming tool could make the transformation if it could infer the type of i to be a finite number, or if the programmer could declare it to the tool. There are quite a few transformations like this, where the transformation would be possible given only a little hint about the type. Incidentally, if the programmer had written «(i&1)==0», then that could be automatically translated with no type inference needed, because «i&1» is always 0 or 1. (The parentheses matter: == binds more tightly than &, so «i&1==0» would parse as «i&(1==0)», which is always 0.)

I was just about to wrap up when I saw…
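A sketch comparing the forms. The function names are made up for illustration. The "negate and swap" rewrite agrees with the original for finite numbers but not for NaN. The bitwise version can be swapped safely for any value, because the operands of & are converted to 32-bit integers and NaN becomes 0. Note that «i&1» and «i%2» themselves differ for non-integers such as 2.5, so switching between those two remains a programmer’s decision.

```javascript
// The original expression and the "negate and swap" rewrite.
function original(i) { return (i % 2 == 0) ? 'alt' : ''; }
function swapped(i)  { return (i % 2) ? '' : 'alt'; }

// They agree for finite numbers...
for (const i of [0, 1, 2, 3, -1, -2, 10]) {
  if (original(i) !== swapped(i)) throw new Error("differ at " + i);
}

// ...but not when i % 2 is NaN (e.g. i is NaN, Infinity, or a non-numeric string).
console.log(JSON.stringify(original(NaN))); // ""
console.log(JSON.stringify(swapped(NaN)));  // "alt"

// i & 1 is always 0 or 1, so this rewrite is safe for every value of i.
function bitwise(i)        { return ((i & 1) == 0) ? 'alt' : ''; }
function bitwiseSwapped(i) { return (i & 1) ? '' : 'alt'; }
for (const i of [0, 1, 2, -3, NaN, Infinity, "x", 2.5]) {
  if (bitwise(i) !== bitwiseSwapped(i)) throw new Error("differ at " + String(i));
}
```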

var regex=new RegExp('^\\s*','g')

Which is just sheer Javaism creeping into JavaScript. The correct way to write this is «var regex=/^\s*/g».
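A short sketch confirming that the literal and the constructor produce the same regular expression. The constructor needs a doubled backslash because its pattern passes through a string literal first. String.raw is used only so that the character counts match the source text as written.

```javascript
// Constructor form: the pattern is a string, so the backslash must be doubled.
var viaConstructor = new RegExp('^\\s*', 'g');
// Literal form: the pattern is written directly, with a single backslash.
var viaLiteral = /^\s*/g;

console.log(viaConstructor.source === viaLiteral.source); // true
console.log(viaConstructor.flags === viaLiteral.flags);   // true

// Source characters saved by the literal (23 - 7).
console.log(String.raw`new RegExp('^\\s*','g')`.length - String.raw`/^\s*/g`.length); // 16
```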

Conclusion

There are a few things a programmer can do to make their JavaScript code smaller for the wire. Most of these also make the JavaScript less readable, and so are not recommended. Perhaps the only transformation that I can recommend the programmer make is to use only one var statement per function, using commas to separate the variables. This also makes the code reflect the semantics: a var declaration is scoped to the whole function, not to the block it appears in.
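A minimal sketch of the scoping point, using hypothetical functions. A var written inside a block is still visible throughout the function. Declaring it once at the top, in the single-var style, puts the declaration where its scope actually begins.

```javascript
// var declarations are scoped to the whole function, not to the block they
// appear in, and are hoisted to the top of the function.
function scattered(flag) {
  if (flag) {
    var message = "set inside the block";
  }
  return message; // still in scope here; undefined when flag is false
}

// The single-var style declares it where its scope really begins.
function grouped(flag) {
  var message;
  if (flag) {
    message = "set inside the block";
  }
  return message;
}

console.log(scattered(true));  // set inside the block
console.log(scattered(false)); // undefined
console.log(grouped(true) === scattered(true)); // true
```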

There is scope for minimisation tools that have a deeper understanding of JavaScript. JSMin does basically nothing more than remove comments and whitespace, and it gets a surprisingly long way; getting any further will require parsing. (There are other minimisation tools; I just haven’t investigated them, and they don’t seem to be as popular.)

A declare syntax, so that the programmer could annotate the program with information that the compiler can either exploit or check (as appropriate), would be really useful. The kinds of thing I am thinking of are type declarations and assertions. In Lisp you can write «(declare (type unsigned-byte a))» to declare that the variable a is constrained to be a non-negative integer. Most Lisp compilers will allow you to run this code in a mode where this declaration is checked (extra safety) or, alternatively, assumed (extra speed). It would be good to have a similar feature in JavaScript.
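JavaScript has no such syntax, so the nearest a programmer can get is a runtime check. The sketch below is hypothetical throughout: the helper declareUnsignedInteger and the switch CHECK_DECLARATIONS are invented names, loosely modelled on the Lisp declaration in its "checked" mode. A tool that understood such a call could, in principle, use it as the type hint that the «i%2» example needed.

```javascript
// Hypothetical switch: true means declarations are checked at runtime,
// false means they are trusted (the call then does nothing).
const CHECK_DECLARATIONS = true;

// Hypothetical helper, loosely modelled on (declare (type unsigned-byte a)).
function declareUnsignedInteger(name, value) {
  if (CHECK_DECLARATIONS && !(Number.isInteger(value) && value >= 0)) {
    throw new TypeError(name + " must be a non-negative integer, got " + value);
  }
  return value;
}

function altRow(i) {
  declareUnsignedInteger("i", i); // i is now known to be a finite number
  return (i % 2 == 0) ? 'alt' : '';
}

console.log(altRow(4)); // alt
```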

4 Responses to “Writing JavaScript for JsMin”

  1. Pauan Says:

    I am rather curious about the situation where you can’t leave off the semicolon. I tried a variety of returned values but couldn’t figure it out.

  2. drj11 Says:

    You’re talking about this teaser in the article:

    «In fact a semicolon before a closing squiggly bracket, «}», can almost always be removed (bonus credit for knowing when it can’t).»

    ?

    Hmm. I wish I’d left myself a hint, because it seems a bit tricky now…

    Aha!

    {for(a in []);}

    I never said the corner case was in sensible code. ;)

    • Pauan Says:

      Curious. In fact it occurs anytime a block expects a statement. For instance:

      {if(true)}
      {while(true)}

      etc.

      The reason for this is that both blocks are expecting a statement, but there is none, causing a syntax error. Thus:

      {if(true);}

      Is correct, because the semicolon counts as an empty statement. Another nice vulgarity that I believe was taken from C.
