Posts Tagged ‘associativity’

Dictionary ordering and floating point


This is an expanded version of an earlier tweet of mine.

Consider the following Python program:

    d = {k:k for k in 'bfg'}

The output of this program changes from run to run:

    drj$ ./ 
    {'g': 'g', 'f': 'f', 'b': 'b'}
    drj$ ./ 
    {'b': 'b', 'g': 'g', 'f': 'f'}

Dictionaries in Python are implemented using hash tables, so when iterating over a dictionary, the order of the keys is not generally predictable.

You might have known that it wasn’t predictable, but you might not have known that it can change from one Python process to the next. This is a good thing. It avoids a denial-of-service attack that is cross language and cross framework. But don’t worry about that, it’s been fixed since Python 3.3, released over 2 years ago now.

Recently we (at ScraperWiki) came across this in the context of floating point addition. What if the order of a dictionary determines the order of a summation? This program adds up the values of a dictionary:

    d = dict(b=0.2, f=0.6, g=0.7)

It doesn’t always produce the same answer:

    drj$ ./ 
    drj$ ./ 

This is obviously a very smaller difference, but as it happened we were dynamically generating a web page that showed rounded results. 1.4999999999999998 rounds to 1, 1.5 rounds to 2. You could reload the web page and the value shown would sometimes flip from 1 to 2, and vice versa (well, these aren’t the real numbers, but you get the idea).

There are a few things worth pointing out here. As a convenience instead of writing “3602879701896397 / 18014398509481984” I’ve written “0.2”, and similarly “5404319552844595 / 9007199254740992” is written “0.6”, and “3152519739159347 / 4503599627370496” is written “0.7”.

The second program is just calling sum(), so in this case I don’t get to pick the order of adding up for two reasons: 1) The order of the list that is input to sum() is not chosen by me; 2) The order in which sum() sums things might be different anyway (this turns out to be a surprisingly subtle subject, but I don’t want to get into math.fsum).

Tweets like this: “I don’t care if float addition is associative or not” miss the point I was making, which is that floating point addition is not associative (meaning that (0.2 ⟡ 0.6) ⟡ 0.7) != 0.2 ⟡ (0.6 ⟡ 0.7) where ⟡ means floating point addition). Floating point arithmetic is an abstraction and sometimes you need to understand the abstraction and what lies beneath in order to debug your webpage.

In this case we fixed the problem by avoiding floating point until the very last step. The numbers we were adding were all of the form p/20 (where p is an integer). So doing the adding up in integers and then doing a single division at the end means that we get the exact answer (or the nearest representable floating point number when the exact answer is not representable).