Recall that calloc allocates contiguous space for `n` objects of size `s`:

`void *calloc(size_t n, size_t s);`

Observe that this is the same as allocating a block of size `n`×`s`. So why not just go `malloc(n*s)`

instead? Well, for one thing `calloc`

initialises the memory to all-bits-zero, but I’m not concerned with that right now. The problem with `malloc(n*s)`

is that the product `n`×`s` can overflow:

`malloc(65537*65537)`

On most 32-bit platforms this will overflow 32-bit arithmetic and be silently equivalent to `malloc(131073)`

. That’s a problem.

Because `calloc`

is passed the factors of the product it can, in principle, test the factors to see if the product would overflow, before using the product. In the case where `n`×`s` would overflow it can simply fail the allocation (returning `NULL`

) instead of allocating a block of completely the wrong size.

I don’t think the standard gives implementations much wiggle room in this area:

The calloc function allocates space for an array of nmemb objects, each of whose size

is size. The space is initialized to all bits zero.

Returns

The calloc function returns either a null pointer or a pointer to the allocated space.

“Returns either a null pointer or a pointer to the allocated space”, nothing about the behaviour being undefined if the implied required size exceeds `size_t`

or anything like that. However, I don’t think any serious C programmer would think that `calloc`

is *required* to catch the overflow case, certainly I didn’t expect it, and I think it would be a good thing.

So, what if you wanted to implement `calloc`

so that it catches the overflow case and returns `NULL`

. This boils down to:

Given `n` and `s`, each a `size_t`

, how do you detect when the product `n*s`

overflows?

Obviously I’m more interested in portable solutions than not.

After a brief bit of thinking about multi-word arithmetic and other such horribleness the following occurred to me:

if((size_t)-1 / n < s) {
return NULL;
}

(note that `(size_t)-1`

is guaranteed to be the largest value that will fit in a `size_t`

)

I like this code, it’s portable, it’s clear, it does the right thing. The only eensy-teensy leetle worry I have is that the division might be a bit slow. Since the allocation of memory can take as few as 20 instruction cycles (see [GRUNWALD 1993]), an additional 40 or so cycles for a division might be too much overhead sometimes. We could have a fast conservative test for the okay case. For example if both `n` and `s` are smaller than 65536 then we’re guaranteed to not overflow:

if((n > 65535 || s > 65535) &&
(size_t)-1 / n < s)
{
return NULL;
}

This avoids doing the division when both `n` and `s` are suitably small. Of course there are still a range of cases where the product `n*s`

does not overflow, but we end up doing the division test anyway. That’s not so bad because in those cases we’re allocating at least 65536 bytes anyway.

The 65536 test is a bit weak. It’s the largest value we can portably test against because `size_t`

‘s maximum value might be as low as 65536^{2}-1. Can we portably do any better? If we could compute square roots at compile time we could do something like this:

if((n > sqrt((size_t)-1) || s > sqrt((size_t)-1) &&
(size_t)-1 / n < s)
{
return NULL;
}

but I can’t think of way to do that.

What if we merely accommodate the next most common size for `size_t`

, namely 64 bits? Observe that 2^{64}-1 is 641×6700417×4294967295 (the latter factor is 2^{32}-1, the first two factors are the prime factors of 2^{32}+1).

Here is a portable test for “`size_t`

has at least 64 bits” that is a compile time constant:

`((size_t)-1)/641u/6700417u >= 4294967295u`

This is portable because all the literals are guaranteed to fit in `unsigned long`

.

So we can use this test to select a lower bound for maximum `size_t`

value’s square root:

#define LBMAXSZSQRT ((((size_t)-1)/641u/6700417u >= 4294967295u) ? \
65535 : 4294967295)
if((n > LBMAXSZQRT || s > LBMAXSZSQRT) &&
(size_t)-1 / n < s)
{
return NULL;
}

We could extend this approach to support this sort of early-pass test for any finite number of `size_t`

widths, but 32 and 64 seem to be the most important. In any case we still get *some* sort of early-pass test that avoids the divide whatever width `size_t`

is.

So that’s how to make a nice robust `calloc`

. We’ve all heard the blurb about how we should rely on the C library wherever possible because its routines will be highly tested, robust, and carefully optimised. So naturally every existing `calloc`

out there performs this overflow check and doesn’t just multiply its arguments together? Oh no. Even I was surprised.

On OS X (10.4.11) `calloc(65537, 65537)`

returns a non-NULL pointer. This is a bug.

PHK’s code in NetBSD is typical, it just multiples the two arguments and uses the result, reckless to overflow.

Hurray for FreeBSD: they do almost exactly what I suggest, but with a fast test that is just the tiniest bit unportable.

[2008-10-31: Thanks to APS for spotting I had the comparison the wrong way round in «(size_t)-1 / n < s», throughout. Now corrected.]

REFERENCES

[GRUNWALD 1993] “CustoMalloc: Efficient Synthesized Memory Allocators”; SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 23(8), 851–869 (AUGUST 1993); 1993-08