Archive for June, 2008

Wii love Endless Ocean


So far (one evening of play) it seems to be a really good game: it's instantly relaxing, and manages to be fairly compulsive and interesting. The water is done really well in several different ways. Underwater in the lagoon the water dims just the right amount to make it feel like a private pool. The droplets running off your visor and evaporating when you surface are just spot-on.

It makes me want to play Wave Race 64, which still has most excellent water. Available on the Wii Virtual Console, apparently.

I’m just left wondering:

1) Why even a pacific diving simulation game has an annoying bint of a side-kick; and,
2) When it’s all going to be revealed that the fish are actually a new military genetically modified breeding program for aqua-metroids, and there’s a lost pirate research vessel buried on the ocean floor. In other words: When do I get my plasma cannon?

Unredeemed by Cate’s boots


Cate Blanchett in jack boots, with a sword. Think on that for a moment or two. It has the potential to fulfil fantasies you never knew you had. With such promise how did it manage to go so wrong?

And that crystal skull was clearly a cheap plastic knock-off.

calloc when multiply overflows


Recall that calloc allocates contiguous space for n objects of size s:

void *calloc(size_t n, size_t s);

Observe that this is the same as allocating a block of size n×s. So why not just go malloc(n*s) instead? Well, for one thing calloc initialises the memory to all-bits-zero, but I’m not concerned with that right now. The problem with malloc(n*s) is that the product n×s can overflow:

malloc(65537 * 65537);

On most 32-bit platforms this multiplication will overflow 32-bit arithmetic and the call will be silently equivalent to malloc(131073). That’s a problem.

Because calloc is passed the factors of the product it can, in principle, test the factors to see if the product would overflow, before using the product. In the case where n×s would overflow it can simply fail the allocation (returning NULL) instead of allocating a block of completely the wrong size.

I don’t think the standard gives implementations much wiggle room in this area:

The calloc function allocates space for an array of nmemb objects, each of whose size
is size. The space is initialized to all bits zero.


The calloc function returns either a null pointer or a pointer to the allocated space.

“Returns either a null pointer or a pointer to the allocated space”: nothing about the behaviour being undefined if the implied required size exceeds what size_t can hold, or anything like that. However, I don’t think any serious C programmer would assume that calloc is required to catch the overflow case (certainly I didn’t expect it), though I think doing so would be a good thing.

So, what if you wanted to implement calloc so that it catches the overflow case and returns NULL? This boils down to:

Given n and s, each a size_t, how do you detect when the product n*s overflows?

Obviously I’m more interested in portable solutions than not.

After a brief bit of thinking about multi-word arithmetic and other such horribleness the following occurred to me:

if((size_t)-1 / n < s) { return NULL; }

(Note that (size_t)-1 is guaranteed to be the largest value that will fit in a size_t.)
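Wrapped up as a function this might look like the following sketch. checked_calloc is my own name for it, not a standard function, and I’ve added an explicit n == 0 guard so the division is never attempted with a zero divisor:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A calloc that fails cleanly when n * s would overflow size_t.
 * checked_calloc is a hypothetical name, not part of any library. */
void *checked_calloc(size_t n, size_t s)
{
    void *p;

    if (n != 0 && (size_t)-1 / n < s)
        return NULL;          /* n * s would overflow size_t */
    p = malloc(n * s);        /* the product is now known not to wrap */
    if (p != NULL)
        memset(p, 0, n * s);  /* match calloc's all-bits-zero guarantee */
    return p;
}
```

With this, checked_calloc(65537, 65537) on a 32-bit platform returns NULL instead of a 131073-byte block.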

I like this code, it’s portable, it’s clear, it does the right thing. The only eensy-teensy leetle worry I have is that the division might be a bit slow. Since the allocation of memory can take as few as 20 instruction cycles (see [GRUNWALD 1993]), an additional 40 or so cycles for a division might be too much overhead sometimes. We could have a fast conservative test for the okay case. For example if both n and s are smaller than 65536 then we’re guaranteed to not overflow:

if((n > 65535 || s > 65535) &&
   (size_t)-1 / n < s) { return NULL; }

This avoids doing the division when both n and s are suitably small. Of course there is still a range of cases where the product n*s does not overflow but we end up doing the division test anyway. That’s not so bad, because in those cases we’re allocating at least 65536 bytes anyway.

The 65536 test is a bit weak. It’s the largest value we can portably test against because size_t’s maximum value might be as low as 65536^2−1 (that is, 2^32−1). Can we portably do any better? If we could compute square roots at compile time we could do something like this:

if((n > sqrt((size_t)-1) || s > sqrt((size_t)-1)) &&
   (size_t)-1 / n < s) { return NULL; }

but I can’t think of a way to do that. What if we merely accommodate the next most common size for size_t, namely 64 bits? Observe that 2^64−1 is 641×6700417×4294967295 (the last factor is 2^32−1; the first two factors are the prime factors of 2^32+1).

Here is a portable test for “size_t has at least 64 bits” that is a compile time constant:

((size_t)-1)/641u/6700417u >= 4294967295u

This is portable because all the literals are guaranteed to fit in unsigned long.

So we can use this test to select a lower bound for maximum size_t value’s square root:

#define LBMAXSZSQRT ((((size_t)-1)/641u/6700417u >= 4294967295u) ? \
  4294967295u : 65535u)

if((n > LBMAXSZSQRT || s > LBMAXSZSQRT) &&
   (size_t)-1 / n < s) { return NULL; }

We could extend this approach to support this sort of early-pass test for any finite number of size_t widths, but 32 and 64 seem to be the most important. In any case we still get some sort of early-pass test that avoids the divide whatever width size_t is.
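Putting the pieces together, the early-pass test might be packaged like this. mul_would_overflow is my own name for the helper, and I’ve added an n != 0 guard before the division:

```c
#include <stddef.h>

/* Lower bound on the square root of the maximum size_t value,
 * selected at compile time: 2^32 - 1 when size_t has at least
 * 64 bits, 65535 otherwise. */
#define LBMAXSZSQRT \
    ((((size_t)-1)/641u/6700417u >= 4294967295u) ? 4294967295u : 65535u)

/* Return nonzero iff n * s would overflow size_t.  The division
 * is only attempted when at least one factor exceeds LBMAXSZSQRT,
 * so small allocations skip it entirely. */
static int mul_would_overflow(size_t n, size_t s)
{
    return (n > LBMAXSZSQRT || s > LBMAXSZSQRT)
        && n != 0
        && (size_t)-1 / n < s;
}
```

A calloc built on this would call mul_would_overflow(n, s) and fail the allocation when it returns nonzero.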

So that’s how to make a nice robust calloc. We’ve all heard the blurb about how we should rely on the C library wherever possible because its routines will be highly tested, robust, and carefully optimised. So naturally every existing calloc out there performs this overflow check and doesn’t just multiply its arguments together? Oh no. Even I was surprised.

On OS X (10.4.11) calloc(65537, 65537) returns a non-NULL pointer. This is a bug.

PHK’s code in NetBSD is typical: it just multiplies the two arguments and uses the result, reckless of overflow.

Hurray for FreeBSD: they do almost exactly what I suggest, but with a fast test that is just the tiniest bit unportable.

[2008-10-31: Thanks to APS for spotting I had the comparison the wrong way round in «(size_t)-1 / n < s», throughout. Now corrected.]


[GRUNWALD 1993] “CustoMalloc: Efficient Synthesized Memory Allocators”; Software: Practice and Experience, Vol. 23(8), 851–869; August 1993.

double puzzle: the next floor


As I was looking at ms I came across some C code of the form:

floor(1 + x) - 1

where x is a double precision floating point value. (I’m too polite to show you the actual code, see streec.c line 443 if you must).

Question: when is this different from floor(x)?

It’s quite a fun little puzzle. I give my answers below.

For the purposes of this discussion I’ll assume that double is an IEEE double precision floating point type. So that means double exactly represents numbers of the form

m × 2^e

where m and e are both integers, respecting the following ranges:

-2^53 < m < 2^53
-1074 <= e <= 971

2^53 is the first value that occurred to me. This is the smallest integer representable as a double whose successor is not exactly representable. So if x is 2^53 then x+1 is also 2^53 (IEEE round-to-even), floor(2^53) is 2^53 (it’s an integer), and 2^53 − 1 is exactly representable (the successor of 2^53 is not exactly representable, but its predecessor is).

A similar thing happens for some of the numbers bigger than 2^53 and less than 2^54. In this range the step between adjacent doubles is 2 (in the representation scheme above, e is 1). These doubles are all integers, so floor isn’t going to change anything: floor(1+x)−1 is (x+1)−1, and floor(x) is x. When x is a double represented by an m that is odd, x+1 and x will be different because of the round-to-even rule. For exactly the same reason (x+1)−1 will be the same as x+1. An example is x = 2^53+2. This is represented as 4503599627370497×2^1, but x+1−1 is 4503599627370498×2^1. In general a similar thing happens for a number of the form 2i where i is an odd positive integer and 2^52 < i < 2^53.

The previous problem was that we had a number x where adding 1 made the result round up to the next higher integer. A similar thing happens for very small negative values. If x is negative and very close to 0 then x+1 might be sufficiently close to 1 to be rounded to 1. This happens when -2^-54 <= x < 0 (numbers just a bit less than -2^-54 get rounded to 1−2^-53 when 1 is added to them).

-0. This is a sort of degenerate case of the above. -0 can be thought of as being just the tiniest bit less than 0, and the difference cancels completely when 1 is added.

In the above case, when x is small and negative, what happens is that x is just smaller than an integer, and the difference gets cancelled entirely when 1 is added. So whilst x is not an integer, x+1 is. A similar thing happens at every number that’s just a bit less than an integer power of 2 (a reasonably small positive power of 2). For example 7.999999999999999 (which is 8−2^-50) rounds to 9 when 1 is added. So floor(7.999999999999999+1)−1 is 8, but floor(7.999999999999999) is 7. This happens for every number of the form 2^i − 2^(i-53) (this is nextafter(2^i, 0)) where i is a non-negative integer and i < 52.

Possibly I’ve missed some cases. Especially negative ones.

In the case where I originally found the code I’m pretty sure the difference is irrelevant.

Tabs: Stupid, Wrong, Intolerable


Here’s why tabs are wrong:

if( (segsites > 0 ) || ( > 0.0 ) ) {
if( ( > 0 ) && ( > 0.0 ))
fprintf(pf,"prob: %g\n", probss ) ;
fprintf(pf,"segsites: %d\n",segsites);
if( segsites > 0 ) fprintf(pf,"positions: ");
for( i=0; i 0 )
for(i=0;i

That’s the code as displayed in vi, with tabstops set to the default of 8. It is of course much improved once I guess that the correct tabstop should be 4:

if( (segsites > 0 ) || ( > 0.0 ) ) {
if( ( > 0 ) && ( > 0.0 ))
fprintf(pf,"prob: %g\n", probss ) ;
fprintf(pf,"segsites: %d\n",segsites);
if( segsites > 0 ) fprintf(pf,"positions: ");
for( i=0; i 0 )
for(i=0;i 0 ) {

pk = (double *)malloc((unsigned)(nsegs*sizeof(double)));
ss = (int *)malloc((unsigned)(nsegs*sizeof(int)));
if( (pk==NULL) || (ss==NULL) ) perror("malloc error. gensam.2");

tt = 0.0 ;
for( seg=0, k=0; k 1 ) ndes_setup( seglst[seg].ptree, nsam );
end = ( k

The body of the else is to the left of the else and on an earlier line; that confuses the JavaScript syntax coloriser code (which isn’t mine) and causes it to crop the else token incorrectly. Select the “view plain” link to see all the code.

The code is from ms, a genetic sample generation program used by scientists.

It seems to me that tabs are an attractive nuisance. Something that seems like a good idea, but which can lead to trouble. Sure, you and I can use them safely, but in the hands of the peons they just lead to madness.