Archive for the 'programming' Category

Dictionary ordering and floating point


This is an expanded version of an earlier tweet of mine.

Consider the following Python program:

    d = {k:k for k in 'bfg'}

The output of this program changes from run to run:

    drj$ ./ 
    {'g': 'g', 'f': 'f', 'b': 'b'}
    drj$ ./ 
    {'b': 'b', 'g': 'g', 'f': 'f'}

Dictionaries in Python are implemented using hash tables, so when iterating over a dictionary, the order of the keys is not generally predictable.

You might have known that it wasn’t predictable, but you might not have known that it can change from one Python process to the next. This is a good thing. It avoids a denial-of-service attack that is cross language and cross framework. But don’t worry about that, it’s been fixed since Python 3.3, released over 2 years ago now.

Recently we (at ScraperWiki) came across this in the context of floating point addition. What if the order of a dictionary determines the order of a summation? This program adds up the values of a dictionary:

    d = dict(b=0.2, f=0.6, g=0.7)

It doesn’t always produce the same answer:

    drj$ ./ 
    drj$ ./ 

This is obviously a very smaller difference, but as it happened we were dynamically generating a web page that showed rounded results. 1.4999999999999998 rounds to 1, 1.5 rounds to 2. You could reload the web page and the value shown would sometimes flip from 1 to 2, and vice versa (well, these aren’t the real numbers, but you get the idea).

There are a few things worth pointing out here. As a convenience instead of writing “3602879701896397 / 18014398509481984″ I’ve written “0.2”, and similarly “5404319552844595 / 9007199254740992″ is written “0.6”, and “3152519739159347 / 4503599627370496″ is written “0.7”.

The second program is just calling sum(), so in this case I don’t get to pick the order of adding up for two reasons: 1) The order of the list that is input to sum() is not chosen by me; 2) The order in which sum() sums things might be different anyway (this turns out to be a surprisingly subtle subject, but I don’t want to get into math.fsum).

Tweets like this: “I don’t care if float addition is associative or not” miss the point I was making, which is that floating point addition is not associative (meaning that (0.2 ⟡ 0.6) ⟡ 0.7) != 0.2 ⟡ (0.6 ⟡ 0.7) where ⟡ means floating point addition). Floating point arithmetic is an abstraction and sometimes you need to understand the abstraction and what lies beneath in order to debug your webpage.

In this case we fixed the problem by avoiding floating point until the very last step. The numbers we were adding were all of the form p/20 (where p is an integer). So doing the adding up in integers and then doing a single division at the end means that we get the exact answer (or the nearest representable floating point number when the exact answer is not representable).

shell booleans are commands!


How should we represent boolean flags in shell? A common approach, possibly inspired by C, is to set the variable to either 0 or 1.

Then you see code like this:

if [ $debug = 1 ]; then

or this example from zgrep:

if test $have_pat -eq 0; then

there is nothing special about 0 and 1, they are just two strings for repreresenting “the flag is set” and “the flag is unset”.

Test for strings is surprisingly awkward in shell. In Python you can go if debug: .... It would be nice if we could do something similar in shell:

if $debug ; then

Well we can. In a shell if statement, if thing, the thing is just a command. If we arrange that debug is either true or false, then if $debug will run either the command true or the command false.

debug=true # sets flag
debug=false # unsets flag

I wish I could remember who I learnt this trick off because I think it’s super cool, and not enough shell programmers know about it. true and false are pretty much self explanatory as boolean values, and no extra code is needed because they already exist as shell commands.

You can also use this with &&:

$debug && stuff

Sometimes shell scripts have a convention where a variable is either unset (to mean false) or set to anything (to mean true). You can convert from this convention to the true/false convention with 2 lines of code:

# if foo is set to anything, set it to "true"
# if foo is the empty string, set it to "false"

bash functions: it is mad!


bash can export functions to the environment. A consequence of this is that bash can import functions from the environment. This leaves us #shellshocked. #shellshock aside, why would anyone want to do this? As I Jackson says: “it is mad”.

Exporting bash functions allows a bash script, or me typing away at a terminal in bash, to affect the behaviour of basically any other bash program I run. Or a bash program that is run by any other program.

For example, let’s say I write a program in C that prints out the help for the zgrep program:

#include <stdlib.h>

int main(void)
    return system("zgrep --help");

This is obviously just a toy example, but it’s not unusual for Unix programs to call other programs to do something. Here it is in action:

drj$ ./zgrephelp
Usage: /bin/zgrep [OPTION]... [-e] PATTERN [FILE]...
Look for instances of PATTERN in the input FILEs, using their
uncompressed contents if they are compressed.

OPTIONs are the same as for 'grep'.

Report bugs to <>.

Now, let’s say I define a function called test in my interactive bash session:

test () { bob ; }

This is unwise (test is the name of a well know Unix utility), but so far only harmful to myself. If I try and use test in my interactive session, things go a bit weird:

drj$ test -e /etc/passwd
The program 'bob' is currently not installed. You can install it by typing:
sudo apt-get install python-sponge

but at least I can use bash in other processes and it works fine:

drj$ bash -c 'test -e /etc/passwd' ; echo $?

What happens if I export the function test to the environment?

drj$ export -f test
drj$ ./zgrephelp
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
gzip: /bin/zgrep: bob: command not found
--help.gz: No such file or directory
/bin/zgrep: bob: command not found
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found

zgrephelp stops working. Remember, zgrephelp is written in C! Of course, zgrephelp runs the program zgrep which is written in… bash! (on my Ubuntu system).

Exporting a function can affect the behaviour of any bash script that you run, including bash scripts that are run on your behalf by other programs, even if you never knew about them, and never knew they were bash scripts. Did you know /bin/zcat is a bash script? (on Ubuntu)

How is this ever useful? Can you ever safely export a function? No, not really. Let’s say you export a function called X. Package Y might install a binary called X and a bash script Z that calls X. Now you you’ve broken Z. So you can’t export a function if it has the same name as a binary installed by any package that you ever might install (including packages that are you never use directly but are installed merely to compile some package that you do want to use).

Let’s flip this around, and consider the import side.

When a bash script starts, before it’s read a single line of your script it will import functions from the environment. These are just environment variables of the form BASH_FUNC_something()=() { function defintion here ; }. You don’t have to create those by exporting a function, you can just create an environment variable of the right form:

drj$ env 'BASH_FUNC_foo()=() { baabaa ; }' bash -c foo
bash: baabaa: command not found

Imagine you are writing a bash script and you are a conscientious programmer (this requires a large amount of my imagination). bash will import potentially arbitrary functions, with arbitrary names, from the environment. Can you prevent these functions being defined?

It would appear not.

Any bash script should carefully unset -f prog for each prog that it might call (including builtins like cd; yes you can define a function called cd).

Except of course, that you can’t do this if unset has been defined as a function.

Why is exporting functions ever useful?

Why big is bad


If you know me, you know that I don’t like using /bin/bash for scripting. It’s not that hard to write scripts that are portable, and my earlier “10 tips” article might help.

Why don’t I like /bin/bash? There are many reasons, but it’s mostly about size.

drj$ ls -lL $(which sh bash)
-rwxr-xr-x 1 root root 959120 Sep 22 21:39 /bin/bash
-rwxr-xr-x 1 root root 109768 Mar 29 2012 /bin/sh

/bin/bash is nearly 10 times the size of /bin/sh (which in this case, is dash). It’s bigger because it’s loaded with features that you probably don’t need. An interactive editor (two in fact). That’s great for interactive use, but it’s just a burden for non-interactive scripts. Arrays. Arrays are really super useful and fundamental to many algorithms. In a real programming language. If you need arrays, it’s time for your script to grow up and become a program, in Python, Lua, Go, or somesuch.

Ditto job control.
Ditto Extended Regular Expression matching.
Ditto mapfile.
Ditto a random number generator.
Ditto a TCP/IP stack.

You might think that these things can’t harm you if you don’t use them. That’s not true. We have a little bit of harm just by being bigger. When one thing is 10 times bigger than it needs to be, no one will notice. When everything is 10 times bigger than it needs to be then it’s wasteful, and extremely difficult to fix. These features take up namespace. Got a shell script called source or complete? Can’t use it, those are builtins in bash. They slow things down. Normally I wouldn’t mention speed, but 8 years ago Ubuntu switched from bash to dash for the standard /bin/sh and the speed increase was enough to affect boot time. Probably part of the reason that bash is slower is simply that it’s bigger. There are more things it has to do or check even though you’re not making use of those features.

If you’re unlucky a feature you’ve never heard of and don’t use will interact with another feature or a part of your system and surprise you. If you’re really unlucky it will be a remote code exploit so easy to use you can tweet exploits, which is what ShellShock is. Did you know you can have functions in bash? Did you know you can export them to the environment? Did you know that the export feature works by executing the definition of the function? Did you know that it’s buggy and can execute more than bash expected? Did you know that with CGI you can set environment variables to arbitrary strings?

There are lots of little pieces to reason about when considering the ShellShock bug because bash is big. And that’s after we know about the bug. What about all those features of you don’t use and don’t even know about? Have you read and understood the bash man page? Well, those features you’ve never heard of are probably about as secure as the feature that exports functions to the environment, a feature that few people know about, and fewer people use (and in my opinion, no one should use).

The most important thing about security is attitude. It’s okay to have the attitude that a shell should have lots of useful interactive features; it’s arguable that a shell should have a rich programming environment that includes arrays and hash tables.

It’s not okay to argue that this piece of bloatware should be installed as the standard system shell.

Coding is like Cooking


Coding is like cooking.

Well, not really. But a bit. This is not an article about how recipes are like programs, it’s about the role that cooking has in our personal lives and in society.

I can cook. A bit. Well enough that I can cook for my household, and friends that might drop by. I don’t always eat frozen pizza. Day to day cooking I can mostly do without a written recipe (spag bol, salmon and broccoli, that kind of thing), but when we entertain I’ll generally use a recipe; we own a few too many cookbooks and I can find recipes online. Perhaps one or two dishes I can make well enough that they’re actually good. So I can cook, but not well enough that anyone would pay me to do it. And as for being a chef there are probably other skills that professional cooks have that are part of the job that are simply not on my radar. Planning a menu, choosing suppliers, managing a kitchen.

I’m not suggesting that because I can cook a bit that I’ll be a cook. But conversely just because I can’t be a chef, that doesn’t mean that cooking is pointless. I don’t cook because it’s useful to the economy or because it helps me get a better job. I don’t do it as a hobby. I cook because it’s useful to me personally. It’s a sort of basic life skill.

I imagine most people are like me in this regard. They can cook, something. The amount of cooking that people do might vary. Some people will do it as a hobby, cooking things for their friends every week. Some people will do it professionally and cook for hundreds or thousands of people.

I would like programming to be like this.

I think most people can program. A bit. Not to a professional level, not to a standard where they would be comfortable getting paid to do it. Some people might like programming a bit more and do it as a hobby. Again, doesn’t mean they would be paid to do it.

You don’t have to be a programmer to program. Just like you don’t have to be a cook to cook.


I have friends who I think of as cooks. Some of them do it as a hobby, some professionally. One thing I’ve noticed is that they enjoy cooking, and they like to tinker. They’ll try cooking something just for fun on a Saturday morning. The professionals will try some new technique or ingredient or idea because it will expand their power. I don’t do that. Not with cooking anyway. But I do with programming. I’ll write a browser-based Logo implementation to learn a bit about SVG and JavaScript. Or I’ll learn a new language because it seems fun and might stretch me intellectually.

Are there problems with this analogy? Yes there are:

  • Cooking is useful in itself. The result is usually a meal; you can eat it. This is not so clearly true of programming. There are useful programs to be written, but by and large the kinds of programs that non-professional coders write are not all that useful (yet; see below).
  • Terminology. Lots of people (professionals and hobbyists alike) seem to think that when you say “teach people to program” you mean “train people to be professional programmers”. We don’t. Or at least, I don’t. No more than “teaching people to cook” means “teaching people to be cooks”.
  • Access

    The access route to cooking is pretty straightforward. Most homes and (all?) schools have a hob or a cooker or a microwave. You can cook something with that. I don’t have to buy a stainless steel counter and professional range to cook. You shouldn’t need “pro-level” tools to program. You shouldn’t have to buy a specialist computer and learn how to use an editor like vi, emacs, or TextMate.

    So I think we have to make the access route to programming simpler.

    Tools like ScraperWiki’s Code in your Browser tool help (full disclosure: I work for the company that made this), as may things like CoffeePlay (note I said things like CoffeePlay, which is a whimsical tool created over a weekend, not a serious product). You can start programming using just your browser. There is a browser based version of MIT’s Scratch programming environment. The Raspberry Pi helps.

    At the moment, I think these things show possible futures. They are hints at how we can make coding more accessible. But I think it is possible to make the route to programming simpler. A child can take the first steps in cooking by mixing flour and butter, and putting scones in the oven. We need to make programming just as easy and accessible.

    We need to make it easier to do more things with code. What I mean here is more hackable things in the spirit of the maker community, hack spaces, and so on. It’s kind of neat that I can log into my Kobo ereader and modify the software, it’s a shame I can’t do that with my Kindle so easily. I really love my label printer, but I wish I could hack the firmware on it.

    I don’t think everyone should be a cook, but I think everyone should cook.

    Traditional approach to ATtiny programming


    I recently bought an Adafruit Gemma. It’s a little programming board that is slightly bigger than a 10p coin and it costs about GBP 7.

    It uses an ATtiny85 micro and is Arduino compatible, so the way you’re encouraged to program is to use the Arduino GUI tools and all that good stuff.

    By “traditional approach” I mean grumpy old man approach. I don’t like GUIs much. I can’t use vi and the rendering of the font in the editor is terrible. And the syntax highlighting burns my eyes.

    So you can just use the command line tools, right? Right. On Ubuntu you can apt-get install gcc-avr and then use the avr-gcc compiler to compile your C code. You’ll need avr-objcopy (from the binutils-avr package) to convert your .elf file into an Intel .hex file, and you’ll need avrdude (from the avrdude package) to flash the device. The gory details are captured in this Makefile, and I got those gory details by switching the Arduino GUI into its verbose mode and watching it compile my project.

    My first demo project is also an exercise in avoiding the Arduino libraries. Mostly because when I was working out how to use the command line tools, I didn’t want the hassle of dealing with multiple files and libraries and things.

    So this is also an example of how to program the ATtiny85 (and more or less any AVR type micro) without using heavyweight libraries. The Gemma has a built-in red LED on PB1. This was definitely one of the things that attracted me to the Gemma. I can program it do something without needing to plug any extra hardware in. Specifically, I can flash the LED.

    Flashing an LED is a matter of using a GPIO pin and driving it high (on) and low (off). The assembler programmer would do that with the SBI (Set BIt) and CBI (Clear BIt) instructions. So I’m thinking “Can we have reasonable looking C code generate The Right Instruction?”.

    The C code to set a bit is generally of the form *p |= b where p is a pointer to some memory location and b is a number with a single bit set (a power of 2 in other words). Similarly the C code to clear a bit is *p &= ~b. As long as we give the compiler enough information, it should be able to compile the code *p |= b into an SBI instruction.

    And so it can. Through some fairly tedious but also fairly ordinary C macros, I can write BIT_SET(PORTB, 1) to set pin 1 or PORTB (PB1, the pin with the LED attached), and it gets converted into roughly: *(volatile uint8_t *)0x38 |= 2; which is basically saying modify memory location 0x38 by setting its bit 1. In a little oddity of the AVR architecture the SBI and CBI instructions operate on IO addresses which are at memory locations 0x20 onwards. The upshot of this is that memory location 0x38 is modified with a SBI 0x18 instruction (this mystified me for about 2 hours last night, and I realised what was wrong just as I was drifting off to sleep).

    Because in the code *(volatile uint8_t *)0x38 |= 2; both the location, 0x38, and the value, 2, are constant, the compiler has everything it needs to generate the right SBI instruction. And it does!

    drj$ avr-objdump -d main.elf 2>&1 | sed -n '/<main>:/,$p' | sed 9q
    00000040 <main>:
      40:	1f 93       	push	r17
      42:	b9 9a       	sbi	0x17, 1	; 23
      44:	c1 9a       	sbi	0x18, 1	; 24
      46:	88 ec       	ldi	r24, 0xC8	; 200
      48:	90 e0       	ldi	r25, 0x00	; 0
      4a:	f2 df       	rcall	.-28     	; 0x30 <delay>
      4c:	c1 98       	cbi	0x18, 1	; 24
      4e:	88 ec       	ldi	r24, 0xC8	; 200

    You can see at the beginning of the disassembly of main the SBI 0x17, 1 instruction which is as a result of the macro BIT_SET(DDRB, 1) (setting the pin to be a digital output). And you can see SBI 0x18, 1 to drive the pin high and light the LED and CBI 0x18, 1 to drive the pin low. The compiler has even subtracted 0x20 from the addresses.

    avr-gcc -Os FTW!

    classy enumerations


    An enumeration is a term that usually refers to a type consisting of a finite number of distinct members. The members themselves can be tested for equality, but usually their particular value is not important.

    Maybe you’re modelling a Sphex wasp and you have a state variable with values NOTNEARHOME, JUSTLEFTHOME. You could represent that with an enumeration.

    In C the enum keyword assigns (usually small) integers to constant identifiers. It is problematic, chiefly because the members of the enumeration are really just integers. After enum { Foo; }, then code like Foo / 5 is valid (note: valid, not sensible).

    In Python you could do essentially the same thing:

    if self.state == NOTNEARHOME:
        if 'spider' in self.inventory:
            # head towards home
            # look for juicy spiders

    You do see this style (ast.PyCF_ONLY_AST Note 1), but it has the same problems as enum in C. The values are just integers (so, for example, print self.state will print 0, or 1).

    You could use strings (like decimal.ROUND_HALF_EVEN):

    NOTNEARHOME = 'notnearhome'
    # and so on...

    That’s better because now I might have a clue if a print out a value and it’s 'notnearhome', but it’s only a little bit better, because you still might accidentally use the value innappropriately (opt = decimal.ROUND_HALF_EVEN; opt += 'foo').

    I have a proposal:

    Use a class!

    class NOTNEARHOME: pass         # Note 2
    class JUSTLEFTHOME: pass

    Let’s call this classy enumerations.

    Classy enumerations have the advantage that we don’t need to manually assign numbers or strings. Values like Mood.PUZZLED and Mood.CONFUSED (which are actually classes) will be unique, so can be tested using == or is correctly.

    With classy enumerations we get an error if we accidentally use them in an expression:

    >>> PUZZLED+1
    Traceback (most recent call last):
      File "", line 1, in 
    TypeError: unsupported operand type(s) for +: 'classobj' and 'int'

    And to wrap up:

    class True: pass
    class False: pass

    This article was inspired by looking at some of Peter Waller‘s code who seems to have invented the idea of using classes for enumerations.

    Note 1

    Yes this value matches a value in the C header file. Maybe that has some merit, but it doesn’t make for a very Pythonic interface.

    Note 2

    The body of a class definition is just a block of code. When that body is just a simple statement, it can share the line with the class declaration. Hence, class NOTNEARHOME: pass is a compact complete class definition. If you’re in a mood for documentation, replace “pass” with a docstring.

    Explaining p += q in Python


    If you’re a Python programmer you should know about the augmented assignment statements:

    i += 1

    This adds one to i. There is a whole host of these augmented operators (-=, *=, /=, %= etc).

    Aside: I used to call these assignment operators which is the C terminology, but in Python assignment is a statement, not an expression (yay!): you can’t go while i -= 1 (and this is a Good Thing).

    An augmented assignment like i += 1 is often described as being the same as i = i + 1, except that i can be a complicated expression and it will not be evaluated twice.

    As Julian Todd pointed out to me (a couple of years ago now), this is not quite right when it comes to lists.

    Recall that when p and q are lists, p + q is a fresh list that is neither p nor q:

    >>> p = [1]
    >>> q = [2]
    >>> r = p + q
    >>> r
    [1, 2]
    >>> r is p
    >>> r is q

    So p += q should be the same as p = p + q, which creates a fresh list and assigns a reference to it to p, right?

    No. It’s a little bit tricky to convince yourself of this fact; you have to keep a second reference to the original p (called op below):

    >>> p = [1]
    >>> op = p
    >>> p += [2]
    >>> p
    [1, 2]
    >>> op
    [1, 2]
    >>> p is op

    Here it is in pictures:

    Because of this, it’s slightly harder to explain how the += assignment statement behaves. For numbers we can explain it by breaking it down into a + operator and an assignment, but for lists this explanation fails because it doesn’t explain how p (in our p += q example) retains its identity (the curious will have already found out that+= is implemented by calling the __iadd__ method of p).

    What about tuples?

    When p and q are tuples the behaviour of += is more like numbers than lists. A fresh tuple is created. It has to be, since you can’t mutate a tuple.

    This kind of issue, the difference between creating a fresh object and mutating an existing one, lies at the heart of understanding the P languages (perl, Python, PHP, Ruby, JavaScript).

    The keen may wish to fill in this table:

    p q p + q p += q
    list list fresh p mutated
    tuple tuple fresh fresh
    list tuple ? ?
    tuple list ? ?

    On compiling 34 year old C code


    The 34 year old C code is the C code for ed and sed from Unix Version 7. I’ve been getting it to compile on a modern POSIXish system.

    Some changes had to be made. But not very many.

    The union hack

    sed uses a union to save space in a struct. Specifically, the reptr union has two sub structs that differ only in that one of them has a char *re1 field where the other has a union reptr *lb1. In the old days it was possible to access members of structs inside unions without having to name the intermediate struct. For example the code in the sed implementation uses rep->ad1 instead of rep->reptr1.ad1. That’s no longer possible (I’m pretty sure this shortcut was already out of fashion by the time K&R was published in 1978, but I don’t have a copy to hand).

    I first changed the union to a struct that had a union inside it only for the two members that differed:

    struct	reptr {
    		char	*ad1;
    		char	*ad2;
    	union {
    		char	*re1;
    		struct reptr	*lb1;
            } u;
    		char	*rhs;
    		FILE	*fcode;
    		char	command;
    		char	gfl;
    		char	pfl;
    		char	inar;
    		char	negfl;
    } ptrspace[PTRSIZE], *rep;

    The meant changing a few “union reptr” to “struct reptr”, but most of the member accesses would be unchanged. ->re1 had to be changed to ->u.re1, but that’s a simple search and replace.

    It wasn’t until I was explaining this ghastly use of union to Paul a day later that I realised the union is a completely unnecessary space-saving micro-optimisation. We can just have a plain struct where only one of the two fields re1 and lb1 were ever used. That’s much nicer, and so is the code.

    The rise of headers

    In K&R C if the function you were calling returned int then you didn’t need to declare it before calling it. Many functions that in modern times return void, used to return int (which is to say, they didn’t declare what they returned, so it defaulted to int, and if the function used plain return; then that was okay as long as the caller didn’t use the return value). exit() is such a function. sed calls it without declaring it first, and that generates a warning:

    sed0.c:48:3: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default]

    I could declare exit() explicitly, but it seemed simpler to just add #include <stdlib.h>. And it is.

    ed declares some of the library functions it uses explicitly. Like malloc():

    char	*malloc();

    That’s a problem because the declaration of malloc() has changed. Now it’s void *malloc(size_t). This is a fatal violation of the C standard that the compiler is not obliged to warn me about, but thankfully it does.

    The modern fix is again to add header files. Amusingly, version 7 Unix didn’t even have a header file that declared malloc().

    When it comes to mktemp() (which is also declared explicitly rather than via a header file), ed has a problem:

    tfname = mktemp("/tmp/eXXXXX");

    2 problems in fact. One is that modern mktemp() expects its argument to have 6 “X”s, not 5. But the more serious problem is that the storage pointed to by the argument will be modified, and the string literal that is passed is not guaranteed to be placed in a modifiable data region. I’m surprised this worked in Version 7 Unix. These days not only is it not guaranteed to work, it doesn’t actually work. SEGFAULT because mktemp() tries to write into a read-only page.

    And the 3rd problem is of course that mktemp() is a problematic interface so the better mkstemp() interface made it into the POSIX standard and mktemp() did not.

    Which brings me to…

    Unfashionable interfaces

    ed uses gtty() and stty() to get and set the terminal flags (to switch off ECHO while the encryption key is read from the terminal). Not only is gtty() unfashionable in modern Unixes (I replaced it with tcgettattr()), it was unfashionable in Version 7 Unix. It’s not documented in the man pages; instead, tty(4) documents the use of the TIOCGETP command to ioctl().

    ed is already using a legacy interface in 1979.

    CoffeeScript is 4 times shorter than C


    Compare FreeBSD’s sed.c (implemented in C) with drj’s sed.js (implemented in CoffeeScript).

    A size comparison

                  lines bytes
    C             2471  62k
    CoffeeScript  533   14k

    That right there is CoffeeScript’s chief advantage over C. The CoffeeScript is 4 times smaller, meaning CoffeeScript programs are likely to be faster to write, easier to maintain, and contain fewer bugs.

    Is it an apples to apples comparison? The FreeBSD sed implements a couple more options than my mostly POSIX compliant sed.js but they don’t pad it out much. Neither implementation includes code to handle Regular Expressions. sed.c uses the Unix library and uses JavaScript’s built-in RegExp class. So I think it’s a pretty reasonable comparison.

    What makes the C code bigger? There is a lot of memory management, and an awful lot of string copying and substring extraction. There is an implementation of a hash table (for labels).

    Obviously we all knew that high level languages resulted in smaller programs, but it’s rare to get the opportunity to do such a direct comparison.


    Get every new post delivered to your Inbox.

    Join 28 other followers