Archive for March, 2015

Dictionary ordering and floating point

2015-03-30

This is an expanded version of an earlier tweet of mine.

Consider the following Python program:

    d = {k:k for k in 'bfg'}
    print(d)

The output of this program changes from run to run:

    drj$ ./d.py 
    {'g': 'g', 'f': 'f', 'b': 'b'}
    drj$ ./d.py 
    {'b': 'b', 'g': 'g', 'f': 'f'}

Dictionaries in Python are implemented using hash tables, so when iterating over a dictionary, the order of the keys is not generally predictable.

You might have known that it wasn’t predictable, but you might not have known that it can change from one Python process to the next. This is a good thing. It avoids a denial-of-service attack that is cross language and cross framework. But don’t worry about that, it’s been fixed since Python 3.3, released over 2 years ago now.

Recently we (at ScraperWiki) came across this in the context of floating point addition. What if the order of a dictionary determines the order of a summation? This program adds up the values of a dictionary:

    d = dict(b=0.2, f=0.6, g=0.7)
    print(sum(d.values()))

It doesn’t always produce the same answer:

    drj$ ./e.py 
    1.4999999999999998
    drj$ ./e.py 
    1.5

This is obviously a very smaller difference, but as it happened we were dynamically generating a web page that showed rounded results. 1.4999999999999998 rounds to 1, 1.5 rounds to 2. You could reload the web page and the value shown would sometimes flip from 1 to 2, and vice versa (well, these aren’t the real numbers, but you get the idea).

There are a few things worth pointing out here. As a convenience instead of writing “3602879701896397 / 18014398509481984” I’ve written “0.2”, and similarly “5404319552844595 / 9007199254740992” is written “0.6”, and “3152519739159347 / 4503599627370496” is written “0.7”.

The second program is just calling sum(), so in this case I don’t get to pick the order of adding up for two reasons: 1) The order of the list that is input to sum() is not chosen by me; 2) The order in which sum() sums things might be different anyway (this turns out to be a surprisingly subtle subject, but I don’t want to get into math.fsum).

Tweets like this: “I don’t care if float addition is associative or not” miss the point I was making, which is that floating point addition is not associative (meaning that (0.2 ⟡ 0.6) ⟡ 0.7) != 0.2 ⟡ (0.6 ⟡ 0.7) where ⟡ means floating point addition). Floating point arithmetic is an abstraction and sometimes you need to understand the abstraction and what lies beneath in order to debug your webpage.

In this case we fixed the problem by avoiding floating point until the very last step. The numbers we were adding were all of the form p/20 (where p is an integer). So doing the adding up in integers and then doing a single division at the end means that we get the exact answer (or the nearest representable floating point number when the exact answer is not representable).

Advertisements

What you need to be a programmer

2015-03-09

You need, above all others:

1) to have dogged persistence in the face of failure.

2) to celebrate the utterly ordinary.

On 1. There is a reason why FAIL Blog is a thing. Failure represents 99% of what programmers do. Programmers do not sit down and write flawless code that works first time. We make mistakes. The compiler complains. The tests fail. It stops when something unexpected happens. Each failure requires the programmer to work out why it failed, fix it, and carry on. I’m talking about failures at every level from the “missing comma” and “expected string not int” type errors, through intermediate errors like using feet instead of metres, or using 100GB of RAM instead of 1GB, to higher level errors like designing a mobile phone app instead of a website.

Some people just cannot do this. They will type in their program, and it will fail to compile or run because of a missing comma or something like that, and they will just stop, and go and make a bacon butty instead. This is perfectly understandable.

Personally, as a programmer, it sometimes feels like it’s not that I want the program to run correctly, it’s that I have a curse where I cannot prevent myself from investigating every last failure and fixing it.

When programming, failure is entirely normal. It takes a certain personality type to be able to cope with this, several times a day.

On 2. The flip side is that when programmers succeed, most of the time we succeed in something entirely ordinary. We have calculated the amount of VAT correctly. We have displayed the user’s email address on the page correctly. Things that are entirely trivial and obviously should Just Work take sufficient effort, and involve overcoming several failures, that just achieving the entirely ordinary seems like it deserves a celebration.

Clearly it is not normal for someone to celebrate the fact that as a result of typing stuff into the CSS file, the dotted border between the header of a table and the next row now displays correctly. So remote are programmers from the rich celebrations that life has to offer that we have to make our own.

There is a horrific irony to all this. Computers only do what they are programmed to do. But it is incredibly difficult to program them (correctly). What they are programmed to do is only an impression of what we intend them to do. In this case it is like trying to take an impression of a fossil with only wet tissue. So, on the one hand everything a computer does it has been programmed to do, on the other it is a monumentally difficult task to get the computer to do anything at all that is useful and correct.

Lovelace grasps this subtle point: “It can do whatever we know how to order it to perform”. It’s the “know how” that’s the tricky bit.