calloc when multiply overflows


Recall that calloc allocates contiguous space for n objects of size s:

void *calloc(size_t n, size_t s);

Observe that this is the same as allocating a block of size n×s. So why not just go malloc(n*s) instead? Well, for one thing calloc initialises the memory to all-bits-zero, but I’m not concerned with that right now. The problem with malloc(n*s) is that the product n×s can overflow:


On most 32-bit platforms this will overflow 32-bit arithmetic and be silently equivalent to malloc(131073). That’s a problem.

Because calloc is passed the factors of the product it can, in principle, test the factors to see if the product would overflow, before using the product. In the case where n×s would overflow it can simply fail the allocation (returning NULL) instead of allocating a block of completely the wrong size.

I don’t think the standard gives implementations much wiggle room in this area:

The calloc function allocates space for an array of nmemb objects, each of whose size
is size. The space is initialized to all bits zero.


The calloc function returns either a null pointer or a pointer to the allocated space.

“Returns either a null pointer or a pointer to the allocated space”, nothing about the behaviour being undefined if the implied required size exceeds size_t or anything like that. However, I don’t think any serious C programmer would think that calloc is required to catch the overflow case, certainly I didn’t expect it, and I think it would be a good thing.

So, what if you wanted to implement calloc so that it catches the overflow case and returns NULL. This boils down to:

Given n and s, each a size_t, how do you detect when the product n*s overflows?

Obviously I’m more interested in portable solutions than not.

After a brief bit of thinking about multi-word arithmetic and other such horribleness the following occurred to me:

if((size_t)-1 / n < s) { return NULL; } [/sourcecode] (note that (size_t)-1 is guaranteed to be the largest value that will fit in a size_t)

I like this code, it’s portable, it’s clear, it does the right thing. The only eensy-teensy leetle worry I have is that the division might be a bit slow. Since the allocation of memory can take as few as 20 instruction cycles (see [GRUNWALD 1993]), an additional 40 or so cycles for a division might be too much overhead sometimes. We could have a fast conservative test for the okay case. For example if both n and s are smaller than 65536 then we’re guaranteed to not overflow:

if((n > 65535 || s > 65535) &&
(size_t)-1 / n < s) { return NULL; } [/sourcecode] This avoids doing the division when both n and s are suitably small. Of course there are still a range of cases where the product n*s does not overflow, but we end up doing the division test anyway. That’s not so bad because in those cases we’re allocating at least 65536 bytes anyway.

The 65536 test is a bit weak. It’s the largest value we can portably test against because size_t‘s maximum value might be as low as 655362-1. Can we portably do any better? If we could compute square roots at compile time we could do something like this:

if((n > sqrt((size_t)-1) || s > sqrt((size_t)-1) &&
(size_t)-1 / n < s) { return NULL; } [/sourcecode] but I can't think of way to do that. What if we merely accommodate the next most common size for size_t, namely 64 bits? Observe that 264-1 is 641×6700417×4294967295 (the latter factor is 232-1, the first two factors are the prime factors of 232+1).

Here is a portable test for “size_t has at least 64 bits” that is a compile time constant:

((size_t)-1)/641u/6700417u >= 4294967295u

This is portable because all the literals are guaranteed to fit in unsigned long.

So we can use this test to select a lower bound for maximum size_t value’s square root:

#define LBMAXSZSQRT ((((size_t)-1)/641u/6700417u >= 4294967295u) ? \
65535 : 4294967295)

(size_t)-1 / n < s) { return NULL; } [/sourcecode] We could extend this approach to support this sort of early-pass test for any finite number of size_t widths, but 32 and 64 seem to be the most important. In any case we still get some sort of early-pass test that avoids the divide whatever width size_t is.

So that’s how to make a nice robust calloc. We’ve all heard the blurb about how we should rely on the C library wherever possible because its routines will be highly tested, robust, and carefully optimised. So naturally every existing calloc out there performs this overflow check and doesn’t just multiply its arguments together? Oh no. Even I was surprised.

On OS X (10.4.11) calloc(65537, 65537) returns a non-NULL pointer. This is a bug.

PHK’s code in NetBSD is typical, it just multiples the two arguments and uses the result, reckless to overflow.

Hurray for FreeBSD: they do almost exactly what I suggest, but with a fast test that is just the tiniest bit unportable.

[2008-10-31: Thanks to APS for spotting I had the comparison the wrong way round in «(size_t)-1 / n < s», throughout. Now corrected.]


[GRUNWALD 1993] “CustoMalloc: Efficient Synthesized Memory Allocators”; SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 23(8), 851–869 (AUGUST 1993); 1993-08

16 Responses to “calloc when multiply overflows”

  1. Richard Says:

    Very often the size argument will be a constant, making the division free.

    OS X 10.5.2 appears to have a properly working calloc(), at least for the test I did; it looks from the source like you don’t necessarily always get the same allocator and the multiply is done on the far side of the indirection (why???)

  2. Clive Says:

    Are you certain it’s an error for OS X to give you 0x100020001 bytes when you ask for them? On a 64-bit processor with more than 4GB of available memory it’s a satisfiable request.

  3. Richard Says:

    OS X builds 32-bit programs by default and only does 64-bit if you ask for it specifically.

  4. drj11 Says:

    Indeedy, I should’ve emphasised that I meant OS X running on a 32-bit processor / in a 32-bit mode.

    Dividing by the size (when it’s a constant) is free, assuming a suitable amount of inlining (not necessarily recommended)… wish I’d spotted that. :)

  5. Richard Says:

    Mm, yes, it might be excessive inlining in the “implement calloc” case, though you do appeal to cycle-counting even there. I was thinking more of things like, for example, a macro-generated dynamic array types, which grows its buffer by realloc() and therefore must do the multiply and the check itself rather than relying on calloc().

  6. drj11 Says:

    Ah yes, and that reminds me of another thing I meant to mention: how annoying it is that there’s no recalloc, a realloc with a size and elements style interface.

  7. lorg Says:

    Really cool post.
    Just a little bit of nitpicking:
    It is actually possible to compute sqrt in compile time, by using a C++ compiler and nasty template tricks. However, once you use C++ you probably don’t use calloc, but new.
    Nevertheless, I wanted to challenge myself, and after tweaking the code from here is an example computation:

    struct Sqrt
    enum { value = (Sqrt::value + P/Sqrt::value)>>1 };

    struct Sqrt
    enum { value = 1 };

    int main(void)
    return 0;

  8. lorg Says:

    It seems that the all my template code (and indentation) was eaten by html :(
    Since I can’t preview my post, I’ll try that again just once, to avoid littering:

    template <int N, int P>
    struct Sqrt
    	enum { value = (Sqrt<N - 1,P>::value + P/Sqrt<N-1,P>::value)>>1 };
    template <int P>
    struct Sqrt<0, P> 
        enum { value = 1 };
    int main(void) 
    	return 0;

  9. Ah the joys of signed integer overflow.
    I was caught out with this lately when writing a new `truncate` utility. My findings and the utility source are here:

  10. josh Says:

    Use the logarithm: (1 << (CHAR_BIT * sizeof(size_t) / 2)) – 1

  11. drj11 Says:

    @josh: The shifty logarithm thing is what FreeBSD uses and it’s ever so slightly unportable because of padding bits (see ISO 9899:1999 section An unsigned int type with 32 bits of value (that is it can represent 2**32 different numerical values) might be stored in 40 bits, that is with 8 bits of padding. Those padding bits still get counted in the sizeof: (CHAR_BIT*sizeof(size_t)/2) will be 20, and you’ll be shifting too much.

    I hope you can see why I said “just the tiniest bit unportable”.

  12. drj11 Says:

    @Pádraig: Those notes are quite interesting, particularly on integer overflow. Note that mostly in this article the overflow is unsigned integer overflow which is well defined in C. As you point out signed integer overflow is undefined behaviour.

  13. mjb67 Says:

    Can’t you do:

    size_t amountToAllocate = n*s;
    if (amountToAllocate < n || amountToAllocate < s) {
    return NULL;

    <> — C99 H.2.2.1

  14. drj11 Says:

    No. On 32-bit platforms, it fails for the n = s = 65537 example given in the article.

    Hmm. I suspect wordpress borked some of your comment. Sorry about that.

  15. mjb67 Says:

    Ah, no I see why, of course an overflow due to multiplication isn’t guaranteed to end up lower than either of its operands.

  16. drj11 Says:

    Ooh. Pixer’s post about improving Quake’s reciprocal square root approximation makes me wonder if I could (portably) _approximate_ the square root as a compile time constant. The obvious method uses a non-portable pun from float to int.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: