Dictionary ordering and floating point

2015-03-30

This is an expanded version of an earlier tweet of mine.

Consider the following Python program:

    d = {k:k for k in 'bfg'}
    print(d)

The output of this program changes from run to run:

    drj$ ./d.py 
    {'g': 'g', 'f': 'f', 'b': 'b'}
    drj$ ./d.py 
    {'b': 'b', 'g': 'g', 'f': 'f'}

Dictionaries in Python are implemented using hash tables, so when iterating over a dictionary, the order of the keys is not generally predictable.

You might have known that it wasn’t predictable, but you might not have known that it can change from one Python process to the next. This is a good thing. It avoids a denial-of-service attack that is cross language and cross framework. But don’t worry about that, it’s been fixed since Python 3.3, released over 2 years ago now.

Recently we (at ScraperWiki) came across this in the context of floating point addition. What if the order of a dictionary determines the order of a summation? This program adds up the values of a dictionary:

    d = dict(b=0.2, f=0.6, g=0.7)
    print(sum(d.values()))

It doesn’t always produce the same answer:

    drj$ ./e.py 
    1.4999999999999998
    drj$ ./e.py 
    1.5

This is obviously a very smaller difference, but as it happened we were dynamically generating a web page that showed rounded results. 1.4999999999999998 rounds to 1, 1.5 rounds to 2. You could reload the web page and the value shown would sometimes flip from 1 to 2, and vice versa (well, these aren’t the real numbers, but you get the idea).

There are a few things worth pointing out here. As a convenience instead of writing “3602879701896397 / 18014398509481984” I’ve written “0.2”, and similarly “5404319552844595 / 9007199254740992” is written “0.6”, and “3152519739159347 / 4503599627370496” is written “0.7”.

The second program is just calling sum(), so in this case I don’t get to pick the order of adding up for two reasons: 1) The order of the list that is input to sum() is not chosen by me; 2) The order in which sum() sums things might be different anyway (this turns out to be a surprisingly subtle subject, but I don’t want to get into math.fsum).

Tweets like this: “I don’t care if float addition is associative or not” miss the point I was making, which is that floating point addition is not associative (meaning that (0.2 ⟡ 0.6) ⟡ 0.7) != 0.2 ⟡ (0.6 ⟡ 0.7) where ⟡ means floating point addition). Floating point arithmetic is an abstraction and sometimes you need to understand the abstraction and what lies beneath in order to debug your webpage.

In this case we fixed the problem by avoiding floating point until the very last step. The numbers we were adding were all of the form p/20 (where p is an integer). So doing the adding up in integers and then doing a single division at the end means that we get the exact answer (or the nearest representable floating point number when the exact answer is not representable).

2 Responses to “Dictionary ordering and floating point”

  1. Gareth Rees Says:

    Is that “Curtis Ovid Poe” account some kind of parody of pompous brogrammer blowhards?


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: