Archive for September, 2013

On compiling 34 year old C code

2013-09-01

The 34 year old C code is the C code for ed and sed from Unix Version 7. I’ve been getting it to compile on a modern POSIXish system.

Some changes had to be made. But not very many.

The union hack

sed uses a union to save space in a struct. Specifically, the reptr union has two sub structs that differ only in that one of them has a char *re1 field where the other has a union reptr *lb1. In the old days it was possible to access members of structs inside unions without having to name the intermediate struct. For example the code in the sed implementation uses rep->ad1 instead of rep->reptr1.ad1. That’s no longer possible (I’m pretty sure this shortcut was already out of fashion by the time K&R was published in 1978, but I don’t have a copy to hand).

I first changed the union to a struct that had a union inside it only for the two members that differed:

struct	reptr {
		char	*ad1;
		char	*ad2;
	union {
		char	*re1;
		struct reptr	*lb1;
        } u;
		char	*rhs;
		FILE	*fcode;
		char	command;
		char	gfl;
		char	pfl;
		char	inar;
		char	negfl;
} ptrspace[PTRSIZE], *rep;

The meant changing a few “union reptr” to “struct reptr”, but most of the member accesses would be unchanged. ->re1 had to be changed to ->u.re1, but that’s a simple search and replace.

It wasn’t until I was explaining this ghastly use of union to Paul a day later that I realised the union is a completely unnecessary space-saving micro-optimisation. We can just have a plain struct where only one of the two fields re1 and lb1 were ever used. That’s much nicer, and so is the code.

The rise of headers

In K&R C if the function you were calling returned int then you didn’t need to declare it before calling it. Many functions that in modern times return void, used to return int (which is to say, they didn’t declare what they returned, so it defaulted to int, and if the function used plain return; then that was okay as long as the caller didn’t use the return value). exit() is such a function. sed calls it without declaring it first, and that generates a warning:

sed0.c:48:3: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default]

I could declare exit() explicitly, but it seemed simpler to just add #include <stdlib.h>. And it is.

ed declares some of the library functions it uses explicitly. Like malloc():

char	*malloc();

That’s a problem because the declaration of malloc() has changed. Now it’s void *malloc(size_t). This is a fatal violation of the C standard that the compiler is not obliged to warn me about, but thankfully it does.

The modern fix is again to add header files. Amusingly, version 7 Unix didn’t even have a header file that declared malloc().

When it comes to mktemp() (which is also declared explicitly rather than via a header file), ed has a problem:

tfname = mktemp("/tmp/eXXXXX");

2 problems in fact. One is that modern mktemp() expects its argument to have 6 “X”s, not 5. But the more serious problem is that the storage pointed to by the argument will be modified, and the string literal that is passed is not guaranteed to be placed in a modifiable data region. I’m surprised this worked in Version 7 Unix. These days not only is it not guaranteed to work, it doesn’t actually work. SEGFAULT because mktemp() tries to write into a read-only page.

And the 3rd problem is of course that mktemp() is a problematic interface so the better mkstemp() interface made it into the POSIX standard and mktemp() did not.

Which brings me to…

Unfashionable interfaces

ed uses gtty() and stty() to get and set the terminal flags (to switch off ECHO while the encryption key is read from the terminal). Not only is gtty() unfashionable in modern Unixes (I replaced it with tcgettattr()), it was unfashionable in Version 7 Unix. It’s not documented in the man pages; instead, tty(4) documents the use of the TIOCGETP command to ioctl().

ed is already using a legacy interface in 1979.