Python: integer, int, float

2009-02-27

Python is a duck-typed language, right? That means you don’t care what type a value is, you only care what it quacks like. On the whole Python does pretty well at this. Functions that take lists can usually also take any sort of sequence or iterable; functions that take files can usually also take anything that looks sufficiently file-like (has a read method).

Where it disappoints me is the number of times a builtin requires an int instead of an integer. First, a little terminology. When I say “int” I mean the language type int; when I say “integer” I mean any value whose mathematical value is an integer. So 7 is an int (in Python); it’s also an integer. 7.0 is an integer, but it’s not an int.

So, I have a suggestion: anything in Python that currently requires an int should accept an integer.

So what sort of things am I talking about?

range. I should be able to go «range(1e6)» to get very long lists:

>>> len(range(1e6))
__main__:1: DeprecationWarning: integer argument expected, got float
1000000

I find this just annoying. 1e6 is an integer. Yes it happens to be stored in a float, but you, Python, ought not to care about that. Hmm, or perhaps you would rather I wrote «1e6» as «1*10**6»?

Another way of constructing very long lists is thwarted:

>>> [0]*1e6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'float'

At least here the error message correctly refers to int instead of integer. But I would say that since 1e6 is an integer then the code has a perfectly clear and unambiguous meaning. So it should work.

Linguistically, what do I mean by an integer? Well, here’s a neat definition:

def isinteger(x): return int(x) == x

In Python we can define an integer to be any value that converts to int and compares equal to the result. Observe that this works for int, long, float, Decimal, and hopefully any other vaguely numerical type that might be defined in the future:

>>> import decimal
>>> map(isinteger, [7, 7L, 7.0, 7.1, decimal.Decimal(7)])
[True, True, True, False, True]

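If you were actually going to use this in real code, you would need to catch exceptions in isinteger:

def isinteger(x):
  try:
    return int(x) == x
  except (TypeError, ValueError, OverflowError):
    # int(list()) raises TypeError, int(float('nan')) ValueError,
    # and int(float('inf')) OverflowError
    return False

otherwise isinteger would barf on things like «list()».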
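On Python 3 (no 7L literal, and map returns an iterator) the same idea can be checked with a list comprehension. This is a minimal sketch, assuming the three exception types caught below are the ones worth catching; an exotic numeric type could raise something else. It also tries fractions.Fraction, which isn't in the list above:

```python
from decimal import Decimal
from fractions import Fraction

def isinteger(x):
    """True if x converts to int and the result compares equal to x."""
    try:
        return int(x) == x
    except (TypeError, ValueError, OverflowError):
        # int([]) -> TypeError, int(nan) -> ValueError, int(inf) -> OverflowError
        return False

values = [7, 7.0, 7.1, Decimal(7), Fraction(14, 2), [], float('nan'), float('inf')]
print([isinteger(v) for v in values])
# [True, True, False, True, True, False, False, False]
```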

It’s not just large integers like 1e6, which happen to be more conveniently written in floating point format, that cause a problem. It can happen, quite reasonably, with smaller numbers, especially when they are the results of computations.

Let’s say I’m creating a PNG file with bit depth of 2, and I have a sequence of pixel values for a single row. Each pixel is a value «in range(4)»; at some point I’m going to have to pack the pixels into bytes, 4 pixels per byte. It can be more convenient to do this if I round my row length up to a multiple of 4; that way my packer only has to deal with rows that fit into an exact number of bytes. So let’s say the row-length is l and I want it rounded up:

u = math.ceil(l/4.0)*4.0

(in real code, the “4.0” would probably be a parameter, and I’d probably have to use «float(pixels_per_byte)» to get the right sort of division (or use Python 3.0))

The amount of padding I need to add is therefore «u-l»:

padding = [0]*(u-l)

Alas, this barfs, because u is a float. So I end up having to use what feels to me like a gratuitous int:

padding = [0]*int(u-l)

Note that u came out of «math.ceil» so it’s already an integer; so «u-l» is an integer too.
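Two ways around the extra int, sketched for Python 3 and assuming l and the hypothetical parameter pixels_per_byte are ints. The first swaps in integer ceiling division, so no float ever appears; the second relies on math.ceil returning an int on Python 3 (note the multiplier 4 rather than 4.0, since an int times a float is still a float):

```python
import math

def padded_length(l, pixels_per_byte=4):
    # Ceiling division with ints only: -(-l // n) is l/n rounded up.
    return -(-l // pixels_per_byte) * pixels_per_byte

l = 233
u = padded_length(l)
padding = [0] * (u - l)
print(u, len(padding))   # 236 3

# On Python 3, math.ceil returns an int, so no int() is needed here.
u3 = math.ceil(l / 4) * 4
print(type(u3).__name__, u3)   # int 236
```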

If you were to take this “accept integers, not just ints” philosophy on board, you might find the following function useful:

def preferint(x):
  if isinteger(x):
    return int(x)
  return x

preferint will change any integer to int, leaving other values unchanged. You can stick «foo = preferint(foo)» at the front of all your functions. Or use those outré decorators.
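For the decorator route, here is a minimal sketch for Python 3. The names prefer_int_args and zero_row are hypothetical, and isinteger and preferint are repeated so the block runs on its own. Because it converts every argument, it would also turn True into 1:

```python
import functools

def isinteger(x):
    try:
        return int(x) == x
    except (TypeError, ValueError, OverflowError):
        return False

def preferint(x):
    if isinteger(x):
        return int(x)
    return x

def prefer_int_args(func):
    """Hypothetical decorator: apply preferint to every argument of func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = [preferint(a) for a in args]
        kwargs = {k: preferint(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)
    return wrapper

@prefer_int_args
def zero_row(n):
    return [0] * n

print(len(zero_row(1e6)))   # 1000000
print(zero_row(n=3.0))      # [0, 0, 0]
```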

We can write a prefer that works for any type:

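def prefer(value, target_type):
  # target_type is any type callable, such as int, float or Decimal
  try:
    if target_type(value) == value:
      return target_type(value)
  except (TypeError, ValueError, OverflowError):
    pass  # value can't be converted; fall through
  return value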
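A quick sketch of how prefer behaves on Python 3, with the definition repeated so the block is self-contained. The parameter is named target_type only to avoid shadowing the builtin type:

```python
from decimal import Decimal

def prefer(value, target_type):
    try:
        if target_type(value) == value:
            return target_type(value)
    except (TypeError, ValueError, OverflowError):
        pass  # value can't be converted; fall through
    return value

results = [
    prefer(7.0, int),             # integer-valued float becomes an int
    prefer(7.5, int),             # non-integer left alone
    prefer(Decimal('7.0'), int),  # works for Decimal too
    prefer(7, float),             # prefer works for any target type
    prefer('7', int),             # int('7') == '7' is False, so unchanged
]
print(results)   # [7, 7.5, 7, 7.0, '7']
```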

I suppose there are probably other parts of Python that this applies to, and I could go looking for them, but the two I’ve mentioned are the ones that bug me on a regular basis. List and string indexing would be one, but I can’t remember it annoying me:

>>> 'foo'[2.0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
>>> range(3)[2.0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers

But yes, if I were campaigning for change then that should be changed too. Scanning through the builtins: chr already works (but «chr(7.1)» also works, which is a bug), and hex needs changing.
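Python already has a hook for this: since Python 2.5 (PEP 357), an object that defines __index__ is accepted where the interpreter needs an int for indexing, and on Python 3 range, hex and sequence repetition go through the same hook. A minimal sketch, assuming a hypothetical float subclass IntegralFloat whose __index__ succeeds only for integer values. It only helps values you wrap yourself (a plain 1e6 is still rejected), so it illustrates the proposed semantics rather than fixing the builtins:

```python
class IntegralFloat(float):
    """Hypothetical float that may be used wherever Python wants an int."""

    def __index__(self):
        # Exact integers convert; 2.5 (or inf, nan) is still rejected.
        if self.is_integer():
            return int(self)
        raise TypeError(f"{float(self)!r} is not an integer")

print(len(range(IntegralFloat(1e6))))   # 1000000
print([0] * IntegralFloat(3.0))         # [0, 0, 0]
print('foo'[IntegralFloat(2.0)])        # o
print(hex(IntegralFloat(255.0)))        # 0xff

try:
    'foo'[IntegralFloat(2.5)]
except TypeError as exc:
    print('rejected:', exc)
```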

PS: (rant for another article) isinteger does not work for «complex(7)». Should it?


31 Responses to “Python: integer, int, float”


  1. A couple of puzzles.

    First, surely finally is always executed? Your second definition of isinteger(x) would appear to always return False.

    Second, isinteger(1e6) does not do what you want because int(1e6) is not equal to 1e6 which is because they are actually different numbers. 1e6 represents a floating point number which is not equal to 1000000.

    I am therefore not entirely sure I know what you mean by an integer 8-).

  2. Nick Barnes Says:

    Yes, I think that everything that wants an int should take an integer. However, there is a reasonable argument against it, to do with code maintainability. Putting in a call to int() shows human readers that the programmer knows what they are doing, and also makes the code more resilient in the face of future changes to the code which computes the value in question.

    “Can’t convert complex to int” is pathetic. “Can’t be bothered to convert complex to int” would be more like it.

    Francis, the Python float 1e6 is in fact equal to 1000000. Python floats are IEEE 754 doubles; they have 52+1 bits of mantissa. David, please correct me if I’m wrong.

    And of course any float of sufficiently large magnitude is an integer. Are there any integer values which a proper converter will convert to a non-integral float? I don’t think so.

  3. drj11 Says:

    @Francis: NickB is quite correct for float. 1000000 == 1e6. And _you_ are quite correct that my finallys should be except. (Thomas Guest spotted that too).

    ta.

  4. James Block Says:

    First, a blatant appeal to authority:

    > >>>import this
    >
    > The Zen of Python, by Tim Peters
    >
    > Beautiful is better than ugly.
    > Explicit is better than implicit.

    And so on.

    As someone with mathematical background (including numerical analysis), your proposals strike me as violating the aforementioned Zen at best, and actively dangerous at worst. There’s a *lot* of difference between an integer and a floating-point (real) number: the integers have properties that real numbers simply do not have, and implicitly coercing a float into an integer runs the risk of introducing bugs that simply cannot exist when conversion to integer is explicit.

    Your padding example is especially bad on this front (maybe you made a mistake writing it up?): “u = math.ceil(l/4.0)*4.0 … Not that u came out of «math.ceil» so it’s already an integer; so «u-l» is an integer too.” Nope, u there is a float, whether you like it or not. That definition of u should be explicitly wrapped in an int() call to get the behavior you expect… anything less feels like magic behavior to me.

    The more fundamental issue, of course, is that floating point numbers have an uncertainty fundamentally built in to them. Your isinteger() function should really be written using an explicit tolerance epsilon or something similar; that leaves the question of deciding what the tolerance should be. Sure, you can say that only integers exactly representable by the floating-point format currently in use are allowed, but if you do that you’re just going to cause grief for someone else later who makes a similar argument that “7.000000000000001” ought to be equivalent to the integer “7”… and so on. It’s an ugly situation, and explicit int() avoids it completely.

    This is made fully explicit to me with your use of “1e6”. As a scientist, when I see “1e6” I think “a number between 500,000 and 1,500,000”, since you’ve only expressed it with one significant figure. The compiler sees “1e6”, parses and stores it as a float, and only then moves on to do the multiply… the final floating-point number may in fact be the exact representation for some integer, but your initial representation of the number simply does *not* contain enough precision for me to be comfortable with declaring the two equivalent. The expression “10**6”, on the other hand, is pure integer arithmetic and thus unambiguously an integer (not to mention its computation should be easily optimized away).

    Explicit use of int() just avoids the whole issue, and makes it easier to reason about the whole process.
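James’s tolerance-based alternative might look like the sketch below (Python 3). The name near_integer is hypothetical, and the default abs_tol=1e-9 is an arbitrary choice, which is rather his point about having to pick one:

```python
import math

def near_integer(x, abs_tol=1e-9):
    """Hypothetical tolerant test: is x within abs_tol of the nearest integer?"""
    if not math.isfinite(x):   # round() would fail on inf and nan
        return False
    return math.isclose(x, round(x), rel_tol=0.0, abs_tol=abs_tol)

print(near_integer(7.000000000000001))   # True
print(near_integer(7.1))                 # False
print(near_integer(1e6))                 # True
```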

  5. drj11 Says:

    @James: I think you should get more comfy with the fact that every int is exactly representable as a float (note, not every integer, but every int).

    Consider «u = math.ceil(l/4.0)*4.0». I say u is an integer. You say “nope, u is a float”. Well, u is a float, that happens to be an integer. Of course it’s an integer, I just got it from math.ceil, it would be a gross violation of contract for math.ceil to return a non-integer float.

    As for what you “see” when I write “1e6”, that’s irrelevant. What’s important is what’s denoted. The compiler converts 1e6 into the float that exactly represents the number 1000000.

    You say “floating point numbers have an uncertainty fundamentally built in to them”. This is just wrong. Floating point numbers exactly represent a finite selection of reals (in fact, rationals). I understand your desire for representing “precision” and “uncertainty” but the currently prevalent floating point model is exact. Go get your numeric tower from Scheme if you want inexact arithmetic (Scheme properly models whether the result of a computation is exact or inexact).

    • Mohammad Alhashash Says:

      “… every int is exactly representable as a float …”

      Not on my 64bit CPython or any platform with 64bit int and IEEE 754 doubles:

      >>> sys.maxint
      9223372036854775807
      >>> int(float(sys.maxint))
      9223372036854775808L

      But, I still agree with your idea.

      • David Jones Says:

        Sure. And of course I knew that on 64-bit systems ‘int’ has 64-bits of precision and therefore doesn’t fit into a float. It would have been too annoying to point all that out. :)


  6. Nick:

    >>> 100000==1e6
    False

    Is what I meant. They aren’t equal in the python sense.

    What David (I think) means is there exist a cast function “int” so that:

    >>> int(1e6)==1000000
    True

    Fair enough. But it’s not *quite* so logically straightforward as that. Floating point numbers do have a different semantics from integers (as James Block points out) so throwing around implicit casting might not be what you always want to do.

    In particular you may miss yet another check when apples are being used when oranges are meant.

    For the record I trip over this often and don’t like it. I don’t work with floats overmuch so I don’t want them obtruding into what I do.

    But type checking can be a useful static error check. I am not sure python programming necessarily gains by its completely cavalier attitude to typing.

    An illustration of a useful and *opposite* trend is Andrew Kennedy’s work on dimension typing – two integers might be different things and you don’t want to treat them as the same even though they are both integers. One is length and one is mass. Adding them could be fatal.

    Really the problem is most often met by iterations and so on (as David points out) in which numbers are almost being used as ordinals and then some kind of implicit from float casting might be OK.

    As for complex – the problem here is just an inherited problem from float, isn’t it? If there were Gaussian integers you ought to be able to use the relevant subring in the obvious way.

  7. drj11 Says:

    A further note on using int:

    Consider «int(math.ceil(x))»

    For this code to be correct the author is already assuming that math.ceil returns exactly an integer. How so? Consider if math.ceil returned “roughly an integer”. int(3.999999) is grossly different from int(4.000001).

    So anyone that passes the result of math.ceil to int is already implicitly comfortable with the fact that math.ceil returns exact integers. And it would be completely unusable if it did anything else.

    Now, consider «int(x)».

    Am I converting a floating point value that I already know to be an integer to an int (no change in value)? Or, am I discarding the fractional part of a floating point value by truncating it to an integer? You can’t tell. I have to tell you. With comments, if necessary.

    Basically the current int does two jobs, type conversion and truncation, and they should be separated.

    Now if int (or an alternative builtin) raised an exception for non-integer inputs, then I think that might be a good thing. Similarly if there were versions of math.ceil, math.floor, and round that returned int (or long, as appropriate), then that would be a good thing too. They could even get the roadmap from Common Lisp and have floor and ceiling take two arguments and do the division for you.
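The separation proposed here, conversion versus truncation, can be sketched in Python 3. exact_int is a hypothetical name, not a builtin. On Python 3, math.floor and math.ceil already return int, and divmod gives something like the Common Lisp two-argument floor, returning the floored quotient and the remainder:

```python
import math

def exact_int(x):
    """Hypothetical strict conversion: an int, or an error if x isn't an integer."""
    n = int(x)   # may itself raise TypeError, ValueError or OverflowError
    if n != x:
        raise ValueError(f"{x!r} is not an integer")
    return n

print(exact_int(4.0))         # 4: type conversion, no change in value
print(math.trunc(3.999999))   # 3: explicit truncation, fraction discarded
print(math.ceil(3.2))         # 4, already an int on Python 3
print(divmod(-7, 2))          # (-4, 1): floored quotient and remainder
```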

  8. drj11 Says:

    @Francis: You’re being slightly careless:

    >>> 1000000==1e6
    True

    This is one of the reasons I prefer floating point format for large round integers. I don’t have to count the zeros and get it wrong. I was being humorous in my 1*10**6 suggestion, but only half so.

    Of course, any real language, *cough* CPL, let you use proper exponential notation. What do you mean you can’t type that in? Meh.

  9. James Block Says:

    “I think you should get more comfy with the fact that every int is exactly representable as a float (note, not every integer, but every int).”

    The distinction between “int” and “integer” is 1. arbitrary and 2. fluid. “Arbitrary” as in “true for 32-bit ints and floats with 52-bit mantissas but false for 64-bit ints and floats with 52-bit mantissas”, and “fluid” as in “gone in Python 3” (fixing a major source of annoyance for me personally). So let’s set the int/integer bit aside, recognizing that most integers of interest will fit well within a 52-bit float mantissa.

    Back to “u = math.ceil(l/4.0)*4.0”, then. It’s absolutely true that ceil gives an integer, but then you multiply by… a float! How exactly is the compiler/interpreter supposed to recognize that, in this case, you wanted u to end up as an integer? All it sees is an int-float multiply. Yes, the float happens to be equivalent to an int, but the compiler doesn’t know that you care in this case. An explicit int() call would inform it that you do, and solve the dilemma nicely. u is not an integer value and should *never* have integer type, even if it is equivalent to one. The argument that u should be allowed as an index, due to being equivalent to an integer, has some merit, but the final result of that computation of u *must* be a float in any sane language.

    “As for what you “see” when I write “1e6”, that’s irrelevant. What’s important is what’s denoted.” I think we’ll have to agree to disagree on this one. I’m of the opinion that reading the source code and coming across a constant in floating-point notation should indicate to the reader that floating-point arithmetic is in use; to silently coerce to integer arithmetic strikes me as a serious breach of the reader’s expectation.

    As to your final point, it is true that every floating-point number is in fact equivalent to a true real number. But this is not how floating-point is *used* or how people *think about floating point*. Almost always, a particular floating point number is chosen not because it represents the real number we are interested in, but because it happens to be the closest approximation to the real number of interest in our floating-point format. There are very few algorithms indeed that rely on the precise correspondence of floating-point numbers with certain real numbers, and to introduce a “feature” that implicitly relies on this fact is just asking for trouble. Anyone who sees a float and thinks of the specific real number it is associated with is in a state of denial about the reality of floating-point number systems and their uses. For nearly every use, they *do* have an uncertainty (even if we usually like to forget about that); if that’s not what you want, use a Decimal.

  10. drj11 Says:

    @James: Now you’re being careless too. Considering “math.ceil(l/4.0)*4.0” you say “All it sees is an int-float multiply”. Not true, the compiler “sees” a float–float multiply (or rather, the runtime does, since the standard compiler is not JIT enough to know that math.ceil only returns float).

    I’m not proposing that the arithmetic rules are changed.

    You say “the final result of that computation of u *must* be a float in any sane language”. Agreed, and I never proposed otherwise.

    Perhaps you’d like the languages Lua and JavaScript. There are no integer types. All the world’s a float. People do “integer only” computations all the time. You can even bit-twiddle in JavaScript.


  11. @David: Ah yes, silly me. I do have difficulty with that kind of thing.

    I like your point (which I think makes things clear) that int() is doing two jobs: a type conversion and a truncation.

    I guess I worry less about this because I use Python for screen-scraping, so I worry about the re library a lot but much less about floats.

  12. drj11 Says:

    @Francis: On complex. Common Lisp has the (complex integer) type. More and more I feel that Python should just “get with the program” and implement the bits that are missing from Common Lisp. Multi-dimensional reshapable arrays are my other gripe at the moment (not that Common Lisp’s support here is great, but it exists).

    I’m not sure what you mean by “As for complex – the problem here is just an inherited problem from float isn’t it?”. «float(complex(7))» doesn’t work either and that’s also an annoyance.


  13. What I meant was in some languages Complex is a superclass of Float and so if you aren’t allowed to use a Float then you aren’t allowed to use a Complex.

    But as you say python is very ornery about these atomic types while being truly cavalier about everything else 8-).

  14. Nick Barnes Says:

    @James: You wrote: “As to your final point, it is true that every floating-point number is in fact equivalent to a true real number. But this is not how floating-point is *used* or how people *think about floating point*. Almost always, a particular floating point number is chosen not because it represents the real number we are interested in, but because it happens to be the closest approximation to the real number of interest in our floating-point format.”

    Your values of “people”, “used”, and “almost always” are different from mine.

    I suspect that almost all the floating point constants in my code over the last 25 years have been 10.0 (with a few instances of, for instance, 1.0, 2.0, -1.0, etc).

    Entertaining side-anecdote: A memorable bug was encountered when bootstrapping the MLWorks compiler; at some point it acquired an incorrect representation of 10.0. This representation persisted through several subsequent phases of the bootstrap because reading floats – such as the explicit 10.0 in the float-reading code in the lexer – involved multiplying or dividing by the compiler’s current value of 10.

    In any case, the real numbers of interest to me, in my code, have generally been integers. I have been known to get very cross indeed with sloppy writers of compilers and runtimes who think “ah, they’re floats, they obviously don’t care about precision”. Such as in previous discussions on David’s blog, in fact.

  15. ewx Says:

    “Floats aren’t exact” is a pretty widespread idea and is presumably something people get told to dissuade them from putting too much trust in the results of FP computations.

    I think it’s a rather wrong-headed way of going about it. You might equally say that “ints aren’t exact” on the basis that 7/2 (e.g. in C) doesn’t give you exactly 3.5 for instance!

    • SomeGuy Says:

      Division is the only integer operation that isn’t exact. Addition, subtraction, multiplication, and bit operations are always precise. “Floats aren’t exact” because rounding errors can occur on ANY operation, not just division.

      (a+b)-b != a is in fact rather common with floats. This can’t happen with integers.

      >>> (10**23) == int(1e23)
      False

      There was a very nasty security exploit some years ago because of an implicit conversion to float and back for an int containing permission bits. The code in question contained no explicit floating point at all.

      • wx Says:

        If you do
        print(int(1e23))
        print(10**23)
        you’ll see why it’s false.
        Why would a person use ‘1e23’ instead of ‘10**23’ or even ‘1*10**23’? It’s computer programming and one wouldn’t exactly die over 2 extra keystrokes. I mean, it doesn’t say anywhere that 1e23 will use an integer of value 10 as base, so why risk it? ‘Explicit is better than implicit.’ – Zen of Python

        [ed: happily i can't reply as you've nested too deep. :) ]
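Both halves of this exchange are easy to check in Python 3. The first pair of prints supports SomeGuy’s point that float addition can round where int addition cannot; the last supports the post’s point that a round value such as 1e6 is stored exactly:

```python
big = 2.0 ** 53               # above this, doubles can't represent every integer
print((big + 1.0) - big)      # 0.0: big + 1.0 rounds back to big
print((2**53 + 1) - 2**53)    # 1: int arithmetic is exact

# 1e23 is not exactly representable, so int(1e23) is a nearby integer.
print(10**23 == int(1e23))    # False
print(int(1e23))              # the exact integer value of the float 1e23

print(1000000 == 1e6)         # True: 1e6 is stored exactly
```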

  16. mathew Says:

    % irb
    >> (0..1e6).max
    => 1000000
    >> [0]*1e6
    => [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …and so on

    Ruby wins again.

    >> l = 233
    => 233
    >> u = (l/4.0).ceil * 4
    => 236
    >> padding = [0]*(u-l)
    => [0, 0, 0]
    >>

    Come over to the light side.


  17. It seems to me that floor and ceil should just return ints. The fact that you’re using them at all is a strong hint that you want an integer, no? And if you really wanted a float it’ll be converted back to one by whatever you do next.

    This doesn’t solve all your problems, of course. I’m not sure about your solution ‘cos I can see that an automatic type conversion that doesn’t always work (unlike int->float) feels more prone to future bugs (at least as a flavour in the brain), but on the other hand I agree it would be nice if your examples worked.

    So maybe 1e6 should be an int; if I’d wanted a float I’d have asked for 1.0e6, wouldn’t I?

  18. Nick Barnes Says:

    For the record, what do builtins such as array-indexing do, in Lua and JavaScript, if I give them a non-integral number?

  19. drj11 Says:

    @Nick: On Lua and JavaScript… Well, they work. :) In slightly different ways.

    In Lua: an array is just a table that is usually accessed with integer keys. Semantically they are indistinguishable. Tables can be indexed with any value (except nil, obviously; not nil, no). So a[1] is the same as a[1.0] is the same as a[math.pow(10,0)] because these are all the same floating point value. Just because I’m about to mention it for JavaScript, the string '1' indexes a different slot in the table. a['1'] is not the same as a[1].

    In JavaScript: all tables (any Object, really) take only strings as keys. An Array is just a table with a bit of magic for its length property. Numbers used as indexes get converted to strings. So a[1] is the same as a['1']. Numbers go through a very carefully defined conversion to string which essentially means that typically whole numbers appear as integers without a decimal point; other floats map to their shortest representation, so a[0.1] is the same as a['0.1'], guaranteed.

    There’s a small mistake in the JavaScript specification that allows implementations some pointless wiggle room for numbers like 3e-324.

  20. drj11 Says:

    @Writinghawk: Perhaps ceil, floor, and so on should return ints (int or long, really). I suspect that some of the numbers would require a slightly embarrassing amount of memory when stored as longs. For example, I suspect that int(1e300) uses more than 31 words of storage.

    Probably the reason they don’t is that they’re intended to expose the functions from C, not paper over them.
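The storage claim is easy to check with int.bit_length (Python 3). The float nearest 1e300 lies between 2**996 and 2**997, so its exact integer value needs 997 bits, more than 31 32-bit words; the figure sys.getsizeof reports depends on the platform and on CPython’s internal digit size:

```python
import sys

n = int(1e300)            # exact integer value of the float 1e300
bits = n.bit_length()
print(bits)               # 997
print(-(-bits // 32))     # 32: number of 32-bit words those bits fill
print(sys.getsizeof(n))   # bytes used by the int object; platform-dependent
```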

  21. Jess Says:

    “Perhaps ceil, floor, and so on should return ints”

    In python 3, they do.

  22. drj11 Says:

    @Jess: That’s nice to know. I still wonder whether I’ll be able to stomach a language without reduce.

  23. Jess Says:

    Don’t worry about that one. reduce() was just moved from __builtins__ to functools. (actually in 2.6 it’s in both places)

  24. drj11 Says:

    I know, I know.

  25. wx Says:

    Stopped reading at the part where you said ‘7.0 is an integer, but it’s not an int.’ 7.0 is neither an int nor an integer (which is the same thing). I had a 1.2 GB list that had been populated with int/integer values, then I did some maths on every value (and I knew the results of the operations would always be integers) and populated the list again with the results. I gained 200 MB of memory overhead, because in order to do the operations all integers were converted to floats and I stored the float results. I explicitly converted the results to int before populating the list; problem solved.

    • drj11 Says:

      When I say “integer” I mean the mathematical concept of integer. A number with no fractional part (or with fractional part 0, if you like). When I say “int” I mean the Python type in. 7.0 is blatantly an integer.

      Some real numbers are also integers, yes?


      • Quite. 7.0 is clearly an integer. It’s a decimal representation of the integer “7”. Mathematically all integers are real numbers. As David says, some expressions which represent integers aren’t ints in Python.

