## Python: poor printing of floating point

2007-07-03

See how Python prints a simple decimal number:

>>> 94.8
94.799999999999997
>>>

[edit on 2010-01-26: allegedly this is all fixed in Python 3.1. Please skip article if that's okay with you]

In an attempt to dispell comments on the lines of “floating point doesn’t produce exact answers”, I should point out that I know:

• Floating point is tricky; and,
• 94.8 is not exactly representable as a floating point number.

Without delving into the arcane details of floating point let’s lay out some basics. Each floating point value is some exact real number, in fact a rational number. I’m ignoring NaNs and infinities because they’re not important for this discussion. There are only finitely many floating point values.

With regards to input and output of floating point numbers what is reasonable?

### Reasonable input requirement

Every decimal number has some floating point number that is nearest (possibly two if it lies midway between two adjacent floating point numbers). It seems reasonable to me that when converting decimal numbers to floating point numbers (as Python does when it reads 94.8 in the program) it should pick the nearest floating point number to represent it internally (and if there are two such nearest, then pick one). (For very large magnitude numbers you might want to return infinity or throw an exception; doesn’t matter for this article.)

### Reasonable output requirement

When converting a number from internal floating point format to decimal (as Python does when printing) it seems reasonable to print one of the decimal numbers whose nearest floating point number is the one we are printing. In other words we would like to print a number that when read according to the reasonable input requirement (above) we get the same internal floating point number. Note that there are infinitely many decimal numbers with a given nearest floating point number (because there are only finitely many floating point numbers); once you’ve printed enough decimal digits to zoom in on the desired floating point number you can keep on printing whatever digits you like, generating lots of different decimal numbers all of which share the same nearest floating point number.

Python certainly meets our reasonable output requirement. 94.799999999999997 and 94.8 share the same nearest floating point number.

### Desirable output requirement

Consider the two number 94.799999999999997 and 94.8. By our reasonable output requirement above, they’re both good numbers to print out, but the shorter one is easier to read and think about for humans. It also satisfies an elegant principle: Don’t print out any more digits than you need to.

So a desirable requirement for printing out numbers is that we print out a number that meets the reasonable requirement, and is the shortest possible of all such numbers.

If you need a bit of persuading that shorter is better, then this argument might help you: Consider π (pi). An approximation to 18 decimal places is 3.141592653589793238. Now imagine you have written a Python program to compute π as accurately as it can. It prints 3.1415926535897931. That last digit is wrong. It’s not so much wrong as unnecessary, 3.141592653589793 represents the same floating point number:

>>> 3.141592653589793 == 3.1415926535897931
True

By printing that extra garbage digit Python is misleading you as to how many digits of precision it has.

### Can we be desirable?

So how easy is it to generate reasonable output and desirable output? Well it turns out to be a bit trickier than you might think on first blush, and it requires bignums. On the other hand, this stuff was all sorted out about 30 years ago, and every reasonable machine is large enough to do bignum arithmetic without busting a gut. Unfortunately the gurus Steele and White didn’t publish the algorithm until 1990 (see [SW1990]). To quote them: “Why wasn’t it published earlier? It just didn’t seem like that big a deal; but we were wrong.” Boy were they wrong. Even 17 years after publication, modern implementations of modern languages running on modern hardware still get it wrong.

[Political rant: I'm inclined to think that one the contributory factors is that the article was published by the ("we love dead trees") ACM. It just doesn't seem helpful to lock all this knowledge away and charge for it. I happen to have a photocopied version of this article on paper (I guess that makes me lucky, and able to write this blog article), but in 2007 I can hardly believe that I have to resort to paper copies to do research.]

Bottom line: 94.8 should print out as 94.8.

Python’s excuse will no doubt be that it relies on the platform’s C library. C has an excuse. C targets tiny platforms where it would be ridiculous to suggest that the printing routines use 320 bytes of RAM to store intermediate computations (on a platform I recently had the pleasure to program, this would’ve been about 10% of the entire RAM). On the other hand any platform where I’m actually likely to use double I almost certainly have enough space for both for code and the bignum arithmetic it requires to get the right answer. Certainly, on my MacBook the C implementation has no excuse.

It’s no coincidence that Steele had a hand in writing the 1990 paper and also a hand in designing two of the languages that actually bother to say something sensible about converting internal numbers to external format. Scheme says that reading what you print will yield the same number, and this shall be done using the minimum number of digits. Java says “as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double”. Curiously, Common Lisp doesn’t seem to actually state anything like that, but possibly I just can’t find it. Common Lisp implementations, of course, get it right, because they’re invested with the spirit of Do The Right Thing. I had a look to see what Haskell said on the matter, but apart from noting with amusement that a literal number did not denote an internal number object I couldn’t find anything. ECMAScript also specifies the Right Thing, but then, Wikipedia reckons that Steele edited the first edition.

Python has automatic memory management, bignums, strings. It feels like a big language, abstracted and distant from the silicon. It’s time for Python to grow up and specify that repr for float returns a decimal string the mathematical value of which is closer to the argument than any other float, and it returns a string with the fewest number of digits.

The trickiest thing about Steele and White’s algorithm is the bignums. Python has bignums. It turns out to be very easy indeed to implement Steele and White’s algorithm in Python. We can then use this algorithm to create conversion routines in the style of ECMAScript or Java. And I have done so. The implementation isn’t very large, but it is too large to contain in the margins of this article. If you want the code then go to, yet another dead before it began sourceforge project, flopri.

## Appendix A – Other Stuff I Found On The Way

### Comparison with other languages

Compare Python with similar codes in shell, awk, C, and Lisp:

\$ printf ‘%.20f\n’ 94.8
94.80000000000000000278
\$ awk ‘BEGIN{printf”%.20f\n”,94.8}’
94.79999999999999715783
\$ echo ‘#include <stdio.h>
int main(void){printf(“%.20f\n”,94.8);}’ > t.c
\$ cc t.c; ./a.out
94.79999999999999715783
\$ sbcl –noinform –eval ‘(progn(format t “~,20f~%” 94.8d0)(sb-ext:quit))’
94.80000000000000000000

Bit of a mixed bag isn’t it? At least Lisp gets it right.

### Blind Alleys

Python has a module called fpformat which sounds like it might be the sort of place to find optimal floating point formatting routines, but its useless. “unneeded” as its own documentation avers. pprint is no good either; pretty, but no smarts.

### Python’s Tutorial

The Python Tutorial, good in many ways, has what I think is a very weak appendix on this matter: Appendix B – Floating Point Arithmetic: Issues and Limitations (warning: fragile looking URL). It reads more like an apology than justification and varies between patronising, annoying, wrong, and helpful. It contains mistakes like:

Stop at any finite number of bits, and you get an approximation. This is why you see things like:
>>> 0.1
0.10000000000000001

As discussed above, 0.1 is a perfectly good approximation to the floating point value that 0.1 becomes on input, so approximation itself is not the reason that Python prints so many digits (don’t believe me, then try evaluating «0.1 == 0.10000000000000001»). The appendix goes on to recommend `str` if you want prettier output. I think this is the wrong fix for the problem; if Python used a different printing routine for `repr` then people wouldn’t be driven to `str`.

“You’ll see the same kind of thing in all languages that support your hardware’s floating-point arithmetic”. No I won’t. Java, ECMAScript, Lisp all get it right.

## REFERENCES

[SW1990] “How to Print Floating Point Numbers Accurately”; Guy L. Steele Jr., Jon L White; ACM SIGPLAN’90 Conference on Programming Language Design and Implementation; 1990-06

### 48 Responses to “Python: poor printing of floating point”

1. Zach Beane Says:

http://funcall.blogspot.com/2007/06/how-floating-point-works.html

2. augustss Says:

If you look at the code that is the specification of the Prelude in Haskell you can see that Haskell does the right thing too. It took a year or two after Haskell was first defined to get it right.

3. Grant Says:

Hate to get all pedantic, but python prints just fine, it just doesn’t ‘repr’ the way you want. Your pi example is also a little wierd, because python (or really the processor’s floating-point representation) can’t create the value 3.141592653589793

Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win
32
>>> print 94.8
94.8
>>>
>>> 3.141592653589793
3.1415926535897931
>>> 3.1415926535897931
3.1415926535897931

4. Paul Says:

Good work! You should submit your code as a patch for repr.

5. Paul Says:

Looking at the python sources str(f) is constructed using “%.12g” and repr(f) using “%.17g”. I think both should use something like your code.

6. glorkspangle Says:

Grant: Python can’t create the value 3.1415926535897931 either. The actual IEEE754 doubles in that neighbourhood are as follows, with pi and two other numbers shown for reference.

f1:3.141592653589792671908753618481568992137908935546875
f2:3.141592653589793115997963468544185161590576171875
f3:3.141592653589793560087173318606801331043243408203125
pi:3.14159265358979323846264338327950288419716939937510
d1:3.141592653589793
d2:3.1415926535897931

Clearly f2 is the best representation of pi that we can get with IEEE 754 doubles. Equally clearly, both d1 and d2 should converted to f2 on reading (as should an infinity of other decimal numbers; everything between (f1+f2)/2 and (f2+f3)/2).

David is arguing that f2 should be written as d1, since d1′s representation is shorter than d2′s (or that of any of the other values which reads as f2). It doesn’t give any digits which are irrelevant to the underlying value. Such additional digits are misleading and unhelpful. repr(f2) could output the 50+ digit exact expansion which I give, but that would not be helpful. In just the same way, repr(94.8) should produce ’94.8′.

7. drj11 Says:

@Grant: Thanks. I appreciate seeing the output on Windows (naturally I mess around on OS X).

Would you care to explain the extra ’1′ digit at the end of math.pi? I know that 3.141592653589793 can’t be exactly represented as a floating point double, but then neither can 3.1415926535897931. In fact, these two numbers share the same floating point approximation:

>>> 3.141592653589793 == 3.1415926535897931
True

So why prefer the longer one over the shorter one?

[Oh: Whilst I was writing glorkspangle snuck in there with a response in the same vein but much more detail.]

8. glorkspangle Says:

Three points, David:
1. you don’t mention the other (adjacent) Steele&White paper. I infer from what you write that it has been successfully absorbed into the folk knowledge and the C library.
2. Both Steele and White papers, on paper, are lurking somewhere in a folder at the Cambridge office, adjacent to a paper copy of IEEE 754.
3. Harshing on the ACM: they’ll catch up, and for those of us with the right access they already have (I have an ACM digital library membership, all of everything forever as PDFs; RHSK and I have used it in the past for memory management papers). Also I think that one or both of the Steele&White papers is reproduced on a SIGPLAN commemorative CD-ROM (25 years of SIGPLAN? something like that) which I have floating about somewhere here in Manchester.

9. drj11 Says:

@Paul: Thanks for checking the Python source.

I was going to say, but never got round to, that I suspected that Python did something like “%.17g”. 17 decimal digits is enough to specify any double precision float (math.ceil(1+53/math.log(10,2)) == 17), so what Python does is give the C implementation chance to do the right thing. And evidently the C implementation on both Windows and OS X doesn’t quite do the right thing.

What it looks like the C implementation on OS X is doing is a rounding of the infinitely long answer. This produces a number that is reasonable, but contains misleading digits. glorkspangle’s first comment above illustrates this idea best.

10. bosullvn Says:

For the algorithm used in Haskell and Scheme implementations, see http://citeseer.ist.psu.edu/28233.html

11. drj11 Says:

@Zach: What a splendid companion article, thanks for pointing it out.

There’s obviously some “spirit of floating point” in the cosmos at the moment that causes me and jrm to write these articles so close together.

12. drj11 Says:

@bosullvn: Thanks. Of course I was aware of this paper, but it seemed that everything you might want to know was in the Steele and White paper, so I’ve always been curious as to what’s in the Burger and Dybvig paper. I guess I should read it some day.

@glorkspangle: The companion paper on the input paper is by Clinger. I guess Steele has to have a day off sometimes. And no, I didn’t mention it. And yes I do assume it’s solved, I don’t know whether it is, but I was blithely assuming that Python solved the input problem, and I never noticed it be wrong. [edit: Turns out Python hasn't solved the input problem, see Paul's discovery below]

13. glorkspangle Says:

Clinger, yes of course.

14. glorkspangle Says:

By the way, what the heck is the shell implementation doing? Awk is obviously using both the same algorithms (same libraries, probably). Why isn’t sh?

15. MikeC Says:

Good article, but bad conclusion. 94.80 would be better, as it would indicate that the conversion was inexact, which many programmers forget …

Or better still: 94.8−

Either of these would have saved many programmers many hours of frustration.

mfc

16. MikeC Says:

(in the last post, the minus sign was supposed to be raised to be a superscript)

[drj: I tried to edit the comment to make it a superscript but wordpress ate my HTML. WordPress sucks sometimes.]

17. drj11 Says:

@MikeC: Thanks for that suggestion. Intriguing. I have to say I like 94.8⁻ (that’s u”94.8\N{SUPERSCRIPT MINUS}” in Python). Shame it has no chance of being accepted as input read.

By the way, must get round to reading about all this new fangled decimal stuff that you’re putting into IEEE 754.

18. drj11 Says:

Can any Haskellers that are still reading point me at chapter and verse for the bit of the prelude that “specifies” how floating point numbers get output? augustss or bosullvn perhaps?

My Haskell is not up to scratch (er, it’s non-existent) so I couldn’t make head nor tail of it. Haskell is so terse that I could easily have missed something.

19. augustss Says:

The conversion code can be found at http://haskell.org/onlinereport/numeric.html
It “specifies” the conversion in the same way as other Prelude functions do, namely that an implementation of a Prelude function must extensionally equal to the reference implementation in the report (i.e., must behave the same, but can be implemented differently).

20. glorkspangle Says:

@MikeC: I like your superscript minus, but of course it presents nightmarish problems with ASCII I/O.
BTW, loved DPD http://www2.hursley.ibm.com/decimal/DPDecimal.html

21. drj11 Says:

@augustss: Thanks for the Haskell pointer.

22. glorkspangle Says:

23. drj11 Says:

Glorkspangle: How did you produce those lovely long numbers?

24. glorkspangle Says:

Yes, aren’t they great? I made them with Python, of course.
Uh, let me see if I can reconstruct it. Something like this, but with more trial and error:
>>> import math
>>> import struct
>>> q = struct.unpack(‘Q’,struct.pack(‘d’, math.pi))[0]
>>> mantissa = (1 > 52) >> mantissa * (5 ** 51)
3141592653589793115997963468544185161590576171875000L
>>> (mantissa+1) * (5 ** 51)
3141592653589793560087173318606801331043243408203125L
>>> (mantissa-1) * (5 ** 51)
3141592653589792671908753618481568992137908935546875L
>>>

25. glorkspangle Says:

Argh, lots of less than symbols got eaten by wordpress.
I’ll try again.
>>> import math
>>> import struct
>>> q = struct.unpack(‘Q’,struct.pack(‘d’, math.pi))[0]
>>> mantissa = (((1 << 52) -1) & q) + (1 << 52)
>>> mantissa * (5 ** 51)
3141592653589793115997963468544185161590576171875000L
>>> (mantissa+1) * (5 ** 51)
3141592653589793560087173318606801331043243408203125L
>>> (mantissa-1) * (5 ** 51)
3141592653589792671908753618481568992137908935546875L
>>>

26. glorkspangle Says:

Trying to write a little function to do this. I got as far as this:
>>> float_digits(math.pi)
’3.141592653589793115997963468544185161590576171875000′
>>>
But it doesn’t work too well for exponents far from zero, and I’ve exhausted the budget.

27. Paul Says:

Here’s my implementation of float to string. Is there anything wrong with it?

def toString(f):
for i in range(17):
f_repr = (‘%%.%ig’ % i) % f
if float(f_repr) == f:
return f_repr
assert 0

28. Paul Says:

(You’ll have to imagine the indendation)

29. drj11 Says:

@Paul: Amusingly cunning. The only thing I can really see that’s wrong about it is that it relies on the C implementation. Having decided to Do The Right Thing I figure you may as well get rid of that dependence. However, I think that’s a minor criticism. Your code is short, pragmatic, and given a decent enough C implementation it does the job. I like it. I’m pretty sure the C implementation on OS X is good enough.

Nitpick: small integers come out looking like int not float:

>>> toString(7.0)
’7′

(so when you read them back in the type will be different). Depending on your Python philosophy, this is either tweakable or fine.

Trvial bug: some numbers require 17 digits:

>>> toString(1.6666666666666666e-100)
Traceback (most recent call last):
File “<stdin>”, line 1, in <module>
File “<stdin>”, line 6, in toString
AssertionError

so simply change the range(17) to range(18).

30. drj11 Says:

As many of these comments show, wordpress sucks for talking about code. *sigh*

31. drj11 Says:

Shame you can’t just write to «float.__repr__».

32. glorkspangle Says:

import struct

# extract a bit field from a number.
def field(q,shift,width):
return (q >> shift) & ((1 << width) -1)

# Returns the sign (1 or -1), the exponent, and the mantissa of a
# floating point number.
# f = sign * mantissa * 2 ^ exponent

def float_parts(f):
q = struct.unpack(‘Q’,struct.pack(‘d’, f))[0]
return ((-1) ** field(q,63,1), field(q,52,11)-1023-52, field(q,0,52)+(1<<52))

def float_digits(f):
(s,e,m) = float_parts(f)
if e >= 0:
return (s, 0, m * (1 << e))
m = m * (5 ** -e)
while ((e < 0) and (m % 10 == 0)):
m = m / 10
e = e+1
return (s, -e, m)

def float_string(f):
if (f == 0.0):
return ’0.0′
if (f == -0.0):
return ‘-0.0′
(s, e, m) = float_digits(f)
if s < 0:
sign = ‘-’
else:
sign = ”
digits = str(m)
if e == 0:
return ‘%s%s’ % (sign, digits)
if e < len(digits):
return ‘%s%s.%s’ % (sign, digits[:-e], digits[-e:])
return ‘%s0.%s%s’ % (sign, ’0′*(e-len(digits)), digits)

33. glorkspangle Says:

import struct

# extract a bit field from a number.
def field(q,shift,width):
return (q >> shift) & ((1 << width) -1)

# Returns the sign (1 or -1), the exponent, and the mantissa of a
# floating point number.
# f = sign * mantissa * 2 ^ exponent

def float_parts(f):
q = struct.unpack(‘Q’,struct.pack(‘d’, f))[0]
return ((-1) ** field(q,63,1), field(q,52,11)-1023-52, field(q,0,52)+(1<<52))

def float_digits(f):
(s,e,m) = float_parts(f)
if e >= 0:
return (s, 0, m * (1 << e))
m = m * (5 ** -e)
while ((e < 0) and (m % 10 == 0)):
m = m / 10
e = e+1
return (s, -e, m)

def float_string(f):
if (f == 0.0):
return ’0.0′
if (f == -0.0):
return ‘-0.0′
(s, e, m) = float_digits(f)
if s < 0:
sign = ‘-’
else:
sign = ”
digits = str(m)
if e == 0:
return ‘%s%s’ % (sign, digits)
if e < len(digits):
return ‘%s%s.%s’ % (sign, digits[:-e], digits[-e:])
return ‘%s0.%s%s’ % (sign, ’0′*(e-len(digits)), digits)

34. glorkspangle Says:

OMG what has it done to the quote characters. Piece of garbage. Anyway, you get the idea. It’s not perfect, but it does produce some quite pretty strings.

35. Paul Says:

Are you sure repr is ‘reasonable’?

>>> import struct
>>> f = struct.unpack(‘d’, struct.pack(‘Q’, 0xfdc835c90e3a7))[0]
>>> repr(f)
’2.2057962940299013e-308′

>>> float(repr(f)) == f
False

(My toString function fails on this float too, even with the range fixed).

36. drj11 Says:

Interesting number Paul, how did you come across that?

Actually, no, I’m not sure that repr is reasonable, only that it’s reasonable for 94.8.

If you trust my flopri code then you can see that Python’s mistake is an input error not an output error:

>>> f=struct.unpack(‘d’,struct.pack(‘Q’, 0xfdc835c90e3a7))[0]
>>> f
2.2057962940299013e-308
>>> flopri.toString(f)
’2.2057962940299013e-308′
>>> hex(struct.unpack(‘Q’,struct.pack(‘d’, 2.2057962940299013e-308))[0])
’0xfdc835c90e3a6L’

Or, if we don’t trust flopri (and I’m not entirely sure that I do yet), we can use glorkspangle’s method above. We can see from f’s hex representation that f is 0xfdc835c90e3a7 × 2**-1074 (f is subnormal). Multiplying top and bottom by 5**1074 we get: f = (0xfdc835c90e3a7 × 5**1074) × 10**-1074. Doing the bignum arithmetic in Python:

>>> 0xfdc835c90e3a7*5**1074
22057962940299013219088444871698297180810646915076162814380027737133313462012753565268800133009754346982455773378649228542386089511905099064089161667128699185146652167002789769193707565995142837144858718070314868138161737017822025336229706850366930244559868446727407823544798473535637682124637114102680115962752266977932411044559967505389494856446591374915713609940677960104144642521710489393289235139773278304100095030513646631732523261274722896156425977370824164775413364001379949880608936893181313270831101194668936192963487591684776852893050766076702727207290032754272259506545649435783230681856372183115782562936168735707134276000850516970710821711477314016806760479824302551133042038608917031056563907704895485242116231171249918219245955697260797023773193359375L

So assuming the exponent is right, «2.2057962940299013e-308» is certainly a reasonable thing to print out. (The exponent is right, a little more checking can convince you of that).

By the way, that’s probably the largest integer I’ve legitimately computed. Testing a language’s bignum arithmetic by computing factorial(1000)/factorial(999) doesn’t count as legitimate.

37. drj11 Says:

766 digit answer to 0xfdc835c90e3a7*5**1074 is truncated by wordpress. *sigh*. Happily the important bit is the initial part of the answer.

38. Paul Says:

I found it by constructing random doubles to test my toString.

39. glorkspangle Says:

In this example, the shortest decimals which read as the right number end 99011 99012 99013 99014 or 99015

40. MikeC Says:

glorkspangle … glad you liked DPD :-)

Others: yes, toString for decimal numbers is interesting.

mfc

41. glorkspangle Says:

@Paul: what Python is that? In mine, f == float(repr(f)) for that f.
\$ python
Python 2.4.2 (#2, Oct 16 2005, 04:26:26)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
>>> import flt
>>> import struct
>>> f = struct.unpack(“d”, struct.pack(“Q”, 0xfdc835c90e3a7))[0]
>>> flt.float_parts(f)
(1L, -1075L, 8968181029856167L)
>>> f2 = float(repr(f))
>>> flt.float_parts(f)
(1L, -1075L, 8968181029856167L)
>>> f == f2
True
>>>

42. glorkspangle Says:

@David: my Python also disagrees with yours:

\$ python
Python 2.4.2 (#2, Oct 16 2005, 04:26:26)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
>>> import struct
>>> hex(struct.unpack(“Q”,struct.pack(“d”,2.2057962940299013e-308))[0]\
)
’0xFDC835C90E3A7L’

Note that …99010 should read as …e3a6, so I wonder whether somebody’s C library is truncating at some number of places.

43. glorkspangle Says:

(in short, we can’t say “Python does …”, in this arena. we have to be much more precise).

44. Paul Says:

@glorkspangle:
Python 2.5 on intel + Mac OS X.

Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin

45. drj11 Says:

@glorkspangle: minor bug in your float_string code. You use «f == 0.0» and «f == -0.0», but both of these tests are True for both zeroes, 0.0 and -0.0.

I couldn’t think of a way to portably distinguish between -0.0 and 0.0 in Python so in flopri I ended up using the struct.unpack hack to extract the sign bit.

The struct.unpack thing still gives me simple hackish pleasure whenever I use it.

46. glorkspangle Says:

Ta; fixed by postponing the test until after I have ‘sign’ in my hand.

47. Merwok Says:

Know the decimal module and its Decimal class? It does the right thing regarding printing, but is not compatible with float. Eeek.

48. drj11 Says:

Well, I do know of the decimal module, but I don’t really use it. I did spend some time following the decimal discussions on the ES4 list.