sed, POSIX, and Node.js

2013-07-11

I’ve been implementing sed: a POSIX-compatible sed, in Node.js.

It just seemed to me that one day soon the world will need a suite of Unix utilities written in Node.js. And I shall be The One.

The experience has made me a bit sad about the POSIX spec. There are problems. For example, it’s not very good at documenting the actual or desired behaviour of classic Unix utilities:

sed has a D command.

This command deletes the initial portion of the pattern space up to the first newline (which may be the entire pattern space if a newline has not been introduced with an editing command or with N); then D begins a new cycle. At the start of this new cycle, the next line of input is loaded into the pattern space, but ONLY IF THE PATTERN SPACE IS EMPTY.

This last bit is missing from the 2004 edition of the POSIX spec. It’s fixed and documented correctly in the 2013 edition of the POSIX spec.
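To make the D behaviour concrete, here is a rough sketch in JavaScript. It is a toy model of a single script, «$!N;P;D», and not the implementation this post is about; the function name runNPD and the skipRead flag are made up for illustration. One assumption to flag: the sketch decides whether to skip reading input by checking whether D found a newline in the pattern space, which is how I read the 2013 wording.

```javascript
// Toy model of one sed script, `$!N;P;D`, run over an array of lines.
// Not a general sed: only N, P and D are modelled, and only for this script.
function runNPD(lines) {
  const input = lines.slice(); // input lines not yet read
  const output = [];
  let pattern = "";
  let skipRead = false; // set by D when text remains after the first newline

  while (true) {
    if (!skipRead) {
      if (input.length === 0) break; // no more input: stop
      pattern = input.shift();       // normal start of a cycle
    }
    skipRead = false;

    // $!N : unless this is the last line, append the next line after a newline.
    if (input.length > 0) {
      pattern += "\n" + input.shift();
    }

    // P : print the pattern space up to the first newline.
    const nl = pattern.indexOf("\n");
    output.push(nl === -1 ? pattern : pattern.slice(0, nl));

    // D : delete through the first newline, then begin a new cycle.
    if (nl === -1) {
      pattern = "";     // nothing left: the next cycle reads input as usual
    } else {
      pattern = pattern.slice(nl + 1);
      skipRead = true;  // something left: the next cycle must not read input
    }
  }
  return output;
}

console.log(runNPD(["a", "b", "c"])); // each input line is printed exactly once
```

The skipRead flag is the part that the 2004 text leaves out. Without it, the text left behind by D would never get a cycle of its own.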

The behaviour of sed hasn’t changed since Version 7 in 1979. The D command has always skipped appending input (if there was anything left in the pattern space). Probably no sed ever had its D command behave in the way documented in the 2004 POSIX spec. Maybe one would, if someone were to try building a version of sed from scratch using only the 2004 POSIX spec and without reference to any other sed implementation. But who would be mad enough to do that?

At some point someone drafting the POSIX spec didn’t notice the actual behaviour of sed, made a mistake in documenting the behaviour of its D command, and no one noticed until 2013 (well, a few years before, presumably). Which brings me to…

The pace of change is glacial.

Another thing about the POSIX spec which saddened me a little was the way all sorts of bizarre, obscure, and not very useful behaviours get documented and locked in. You knew that sed has a ! modifier that negates an address. So «sed -n /barf/!p» prints all the lines that do NOT match /barf/. Did you know you can have as many ! as you like? «sed -n /barf/!!!!p» has the same behaviour as the previous program. At least according to the spec, and the Version 7 sed that I tried. There’s no point to this. No real program relies on this behaviour, yet there it is in the spec, so you have to implement it (if you want to comply with the spec). GNU sed (popular on Linux) gives an error instead. Which brings me to…

You can’t really rely on what you read in the spec being implemented.

or…

GNU feel free to depart from the spec whenever they see fit to do so.
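For what it’s worth, the repeated-! rule is cheap to implement. Here is a minimal sketch in JavaScript, assuming a script of exactly the form /regex/p or /regex/!…!p run as if under «sed -n». The helper names parseAddressedPrint and sedN are hypothetical, and the address is compiled as a JavaScript regular expression rather than a POSIX Basic Regular Expression, so this only illustrates the negation logic.

```javascript
// Hypothetical helper: accepts only scripts shaped like /regex/p or /regex/!...!p.
function parseAddressedPrint(script) {
  const m = /^\/((?:[^\/\\]|\\.)*)\/(!*)p$/.exec(script);
  if (m === null) throw new Error("unsupported script: " + script);
  return {
    regex: new RegExp(m[1]),  // NOTE: JavaScript regex syntax, not a POSIX BRE
    negate: m[2].length > 0,  // one or more "!" means "negate", not "toggle"
  };
}

// Behaves like `sed -n SCRIPT` for the tiny script shape above.
function sedN(script, lines) {
  const { regex, negate } = parseAddressedPrint(script);
  return lines.filter((line) => regex.test(line) !== negate);
}

const sample = ["foo", "barf", "baz"];
console.log(sedN("/barf/!p", sample));
console.log(sedN("/barf/!!!!p", sample)); // same result as with a single "!"
```

Note that the number of ! characters is not counted for parity: four of them negate the address just as one does, which matches the behaviour described above.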

sed is a bit weak. For example, its regular expressions (POSIX Basic Regular Expressions) don’t even support «|» for alternation. POSIX has Extended Regular Expressions. Wouldn’t it be sensible to move towards adding Extended Regular Expressions to all the tools that only had Basic Regular Expressions? Well, maybe yes, but there seems to be no taste for doing that in a POSIX committee. And remember…

The pace of change is glacial.
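To show what the lack of «|» means in practice, here is a small JavaScript sketch that translates a subset of Basic Regular Expression syntax into a JavaScript regular expression source string. The function breToJs is hypothetical and deliberately incomplete: it assumes no bracket expressions, and it ignores the special cases for a leading «*». The point is only that an unescaped «|» in a BRE is an ordinary character, while \( \) and \{ \} are the operators.

```javascript
// Hypothetical helper: translate a small subset of POSIX BRE syntax into a
// JavaScript RegExp source string. Bracket expressions and the special
// treatment of a leading "*" are NOT handled.
function breToJs(bre) {
  let out = "";
  for (let i = 0; i < bre.length; i++) {
    const c = bre[i];
    if (c === "\\" && i + 1 < bre.length) {
      const next = bre[++i];
      // In a BRE, \( \) \{ \} are the grouping and interval operators.
      out += "(){}".includes(next) ? next : "\\" + next;
    } else if ("|+?(){}".includes(c)) {
      // Unescaped, these are ordinary characters in a BRE.
      out += "\\" + c;
    } else {
      out += c;
    }
  }
  return out;
}

const alternation = new RegExp(breToJs("foo|bar"));
console.log(alternation.test("foo"));     // false: no alternation in a BRE
console.log(alternation.test("foo|bar")); // true: "|" matched literally
```

With only BREs available, matching either of two words takes two addresses (or two sed commands) instead of one pattern.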

6 Responses to “sed, POSIX, and Node.js”

  1. mjb67 Says:

    I wrote a long rant about how useless POSIX is but I really don’t think the Internet would be improved by sharing it. Suffice to say that I agree with you.

  2. Paolo Bonzini Says:

    Actually I never knew about the ‘!’ tidbit… It’s not because GNU feels free to depart from the standard, it’s probably because no one ever noticed. In fact I think it was me who reported the bug in the description of the D command (http://austingroupbugs.net/view.php?id=282) and it was noticed that no spec of sed ever had a correct description of the D command.

  3. neal Says:

    things aren’t always what they seem when you look deeply into them. POSIX is committee driven, and in my experience nothing good comes from committees.

    the sed /blah/!p command seems like a “grep -v” equivalent. I can see a use of the “sed !p” command, when you’re stuck with a broken system in single user mode where an editor like vi is unavailable. example:

    sed -n /^myword/\!p some_file > some_file_without_line
    cp some_file some_file.save
    mv some_file_without_line some_file

  4. Jilles Tjoelker Says:

    sed -E has been accepted for the next POSIX issue: http://austingroupbugs.net/view.php?id=528

    POSIX is certainly slow in accepting new features, but the process often finds problems in new features, since a quite formal specification is required.

