Python: perils of «x is 1»

2007-06-11

This arises from very useful discussions with Gareth and Paul in my article on Iverson’s convention.

Python code such as «if x is 1:», which uses is to test equality with numbers, is subtly wrong and should not be tolerated. Python’s is operator tests for object identity (whether two expressions evaluate to the same object). Numbers in Python are objects just like everything else (and very sensible this is too).

The problem is that there may be more than one object with the same value. And since you as a programmer are almost certainly only interested in the value of a number it’s probably a mistake to use is for numbers. Two numbers with the same value aren’t necessarily the same object. Example using float:

>>> x = 1.0
>>> x is 1.0
False
>>> x == 1.0
True

Each floating point number is actually a new object, so the «x is 1.0» test fails; the literal 1.0 creates a new float object that is not the same as the one that was created for the assignment «x = 1.0».
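The same distinction can be seen in a script. This is a minimal sketch in present-day Python 3 syntax (an assumption; the sessions above predate it). It builds each float at run time with float(), so no sharing of literals is involved. The result of the identity test is implementation-dependent, so the code just reports it:

```python
# Build two floats at run time so the compiler cannot share a literal.
a = float("1.0")
b = float("1.0")

same_value = (a == b)    # value comparison: True, both are 1.0
same_object = (a is b)   # identity comparison: depends on the implementation

print("a == b:", same_value)
print("a is b:", same_object)  # not guaranteed either way by the language
```

Only the == result is something a program can rely on.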

The same is true of long:

>>> x = 1L
>>> x is 1L
False
>>> x == 1L
True

Compiler memoisation

As Gareth points out there is an additional subtlety. The Python compiler feels at liberty to merge literal number objects having the same value into one number object; at least within a compilation unit:

>>> x = 1L
>>> y = 1L; print x is 1L, y is 1L
False True

«y is 1L» is True in this case because Python uses the same 1L object for all the 1L literals that appear on the same line passed to the interpreter.
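The compilation-unit effect can be probed with compile() and exec(), which let a script choose what gets compiled together. This is a sketch in Python 3 syntax (an assumption; long and the 1L literal no longer exist there, so a float literal stands in). Whether the literals end up shared is an implementation detail, so the code only reports what it observes:

```python
# Two assignments compiled together as a single unit.
together_ns = {}
exec(compile("a = 1000.5\nb = 1000.5\n", "<one-unit>", "exec"), together_ns)
together = together_ns["a"] is together_ns["b"]

# The same two assignments compiled as separate units.
apart_ns = {}
exec(compile("a = 1000.5", "<unit-1>", "exec"), apart_ns)
exec(compile("b = 1000.5", "<unit-2>", "exec"), apart_ns)
apart = apart_ns["a"] is apart_ns["b"]

print("same object when compiled together: ", together)
print("same object when compiled separately:", apart)
print("equal by value in both cases:",
      together_ns["a"] == together_ns["b"] and apart_ns["a"] == apart_ns["b"])
```

Whatever the first two lines print on a given interpreter, the third is the only one that is guaranteed.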

On int

For int (plain integer) the situation is a bit more misleading. Certainly an int is an object and you can have two int objects with the same value, creating the same issues as for float and long:

>>> x = 2007
>>> x is 2007
False

The issue is that using is seems to work for smaller numbers:

>>> x = 160
>>> x is 160
True

What is happening is that Python maintains a global store of pre-made int objects for some of the more commonly found values. Whenever it needs an int object it checks to see if it is one of its favourites that it made earlier and if so then reuses that object instead of creating a new one. Compared to allocating a new object every time an int is required, this is a big win.

There’s no mention of this in the Python language reference, and it’s just the sort of thing you might stumble upon by accident. You might conclude from experiment that using is on numbers is safe, because you only played with small ones, and then get bitten by some sort of horrible «x is 2007» bug.

The Python implementation (that is, CPython, the C implementation that everyone uses) doesn’t define what range of int objects gets memoised in this way. This is deliberate. Apparently it used to be 100 numbers and might now be 262. Maybe other implementations don’t use this technique at all. If you think «x is 1» is safe then what range are you going to rely on?
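If you are curious what your own interpreter does, the range can be probed by experiment. This is a rough sketch in Python 3 syntax (an assumption); the function name cached_ints and the search bounds are made up for illustration. It builds each value twice at run time and checks whether the two results are the same object. The answer is whatever your interpreter happens to do, and nothing more:

```python
def cached_ints(lo=-1000, hi=1000):
    """Return the ints in [lo, hi] for which two independently built
    objects turn out to be the same object in this interpreter."""
    found = []
    for n in range(lo, hi + 1):
        text = str(n)
        # int(text) builds the value at run time, twice. Both results are
        # alive during the comparison, so they can only be identical if the
        # interpreter handed back a shared object.
        if int(text) is int(text):
            found.append(n)
    return found

shared = cached_ints()
if shared:
    print("%d shared values, between %d and %d" % (len(shared), shared[0], shared[-1]))
else:
    print("no shared ints observed")
```

That the answer has to be discovered this way, rather than looked up, is rather the point.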

The message is clear: It is not sensible to rely on using is for numbers. Not even 1.

Bonus section: bool

Though bool is an integer type (False == 0, True == 1), code like «x is False» is fine. That’s because the language specification says that there are only two bool objects, True and False. You can’t create any more, so expressions that evaluate to True or False, like «x > 7», always evaluate to one of the canonical bool objects. You can’t even use bool.__new__ to reach behind the curtain and generate new bool objects:

>>> x = bool.__new__(bool, 0)
>>> x
False
>>> x is False
True

(despite what the documentation for bool.__new__ says)
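The same check can be written out for several ways of producing a false bool. This is a small sketch in Python 3 syntax (an assumption); the list of candidate expressions is illustrative, not exhaustive:

```python
# Different routes to a false bool value.
candidates = [
    3 > 7,                  # result of a comparison
    bool(0),                # constructor call
    bool([]),               # truth value of an empty list
    not 1,                  # logical negation
    bool.__new__(bool, 0),  # going behind the constructor
]

# Every one of them is the canonical False object.
all_canonical = all(c is False for c in candidates)
print(all_canonical)  # True

# bool cannot be subclassed either, which closes off another route
# to additional bool-like objects.
try:
    class MyBool(bool):
        pass
    subclassable = True
except TypeError:
    subclassable = False
print(subclassable)  # False
```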

Obligatory Lisp Comparison

Python’s is is Common Lisp’s EQ. Like Python, Common Lisp has all the same issues about whether «(EQ 3 3)» is true or not (it’s not guaranteed, either way). But Common Lisp has EQL which implements conceptual sameness. Python doesn’t have anything similar, and I think that’s a mistake.
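To make the idea concrete, here is a rough sketch of how an EQL-like test might be approximated in Python 3 syntax. The name eql and the choice of which types count as numbers are my assumptions, not anything Python provides, and the treatment of floats (distinguishing 0.0 from -0.0) is one reading of what “same representation” should mean:

```python
import math

# Types treated as "numbers" for this sketch (an assumption).
_NUMERIC_TYPES = (int, float, complex)

def eql(a, b):
    """Hypothetical EQL-like test: identical objects, or numbers of
    exactly the same type with the same value."""
    if a is b:
        return True
    if type(a) is not type(b) or type(a) not in _NUMERIC_TYPES:
        return False
    if type(a) is float:
        # Same value and same sign, so 0.0 and -0.0 are told apart.
        # NaN compares unequal to itself, so two distinct NaN objects are not eql here.
        return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
    # int and complex: plain value comparison (signed zeros inside a
    # complex number are not distinguished in this sketch).
    return a == b

print(eql(int("2007"), int("2007")))  # True: same type, same value
print(eql(1, 1.0))                    # False: different types
print(eql(0.0, -0.0))                 # False: different sign
print(eql([1], [1]))                  # False: distinct non-numeric objects
```

Unlike ==, this never calls a user-defined __eq__ method, and unlike is, it does not depend on which number objects the implementation chooses to share.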

18 Responses to “Python: perils of «x is 1»”

  1. Gareth Rees Says:

    Can you say more about why it’s a mistake not to have an EQL-like comparison? It seems to me that EQL in Lisp is a bit of a hack. For robust numeric comparisons, you want =, since EQL doesn’t compare numbers of different types, and may compare as different two equal numbers with different representations (such as 0.0 and −0.0 in some floating-point systems).

    I can’t find a rationale for the existence of EQL, but I guess that it’s intended to be a compromise between speed and correctness for searching sequences: in typical implementations EQL can just compare pointers, except in the case of characters and numbers, and in the latter case it only has to unbox at most once and then compare representations. This makes EQL a suitable default for the :test argument to searching functions like member, find-if, position and so on.

    But Python takes a very different view about searching containers: it’s up to the container to decide how it’s going to be searched (by providing suitable methods for count, index, __getitem__ and __contains__). There are no functions or methods in the standard library that let you pass in your own equality test. So I don’t see any role for an EQL-like comparison.

  2. Paul Says:

    Of course there are counterexamples, but ‘x is 1’ would usually be wrong even if every int 1 were the same object: because the code wouldn’t work for other types than int. Even without considering a user-defined type that behaves like int, ‘1L is 1’ doesn’t hold. Your code will fail when your x has been produced by sums that have overflowed and produced a long.

    Usually ‘x is True’ is better written ‘x’ and ‘x is False’ as ‘not x’. If you need to distinguish between bool values and other values of other types then you can’t do this, but it’s rare to (correctly) want that.

  3. drj11 Says:

    @Paul:

    Python has int and long as distinct types and I think that’s stupid and wrong and should not be tolerated. Lisp just has integer and that’s the right way to go about it. Python is clearly heading in that direction: arithmetic on int used to give different numerical values from the same arithmetic carried out using long, but that’s no longer the case.

    User-defined types that behave like int… hmm.

    Would you say «(EQL X 1)» was usually wrong in Lisp? (I’m just curious as to whether user-defined types and the int/long mess are the only issues)

    You’re quite right about «x is True». When I said “fine”, I meant you won’t suffer object/value confusion, not that I would generally recommend the practice.

    We’re moving towards an explanation of what value means in Python. Briefly, I think there are situations when it’s useful to think of 1 and 1.0 as having different values, but I’ll say more in either another comment, or, perhaps more likely at this rate, another article.

  4. Gareth Rees Says:

    (EQL X 1) just feels dodgy to me. Why did you not use (= X 1)? “Because it’s faster” sounds like premature optimization. “Because I want 1.0 to test different from 1” seems rather obscure; a TYPECASE or similar test would be a clearer way to accomplish this. “Because using EQL is a convention that indicates to the reader that I know that X is an integer” also seems bogus; surely DECLARE would be clearer.

    Nonetheless Google code search suggests that EQL is preferred to = for comparing with small integers.

  5. Gareth Rees Says:

    I have a more plausible theory. Historically, (some implementations of) Lisp had only EQ, EQL and EQUAL. It became conventional to avoid EQUAL unless you really wanted recursive comparison (for performance reasons). And of course you couldn’t use EQ for numbers. So people got into the habit of using EQL by default. By the time Common Lisp standardized =, the habit had stuck.

  6. Joost M. Says:

    What’s the difference between conceptual sameness (as in LISP’s ‘EQL’) and equality (Python’s ‘==’)?

  7. drj11 Says:

    @Gareth: Well, (EQL X 1) is defined for all X, whereas (= X 1) is defined only when X is a number (in Common Lisp). This surprises even some Lisp programmers.

  8. drj11 Says:

    @Joost: It mostly hinges on whether you think the int 1 and the float 1.0 are the same concept or not. (EQL 1 1.0) is False in Lisp; different types, so they can’t be EQL. 1 == 1.0 is True in Python because == compares all numbers by their mathematical value.

    For lists Python’s == is more like Lisp’s EQUAL.

    At the core of my gripe, in so far as I have one, is that Python doesn’t have enough equality relationships.

  9. drj11 Says:

    @Gareth: Also, the fact that EQL is guaranteed to terminate (whereas EQUAL is not) probably influences how likely Lispers are to use it.

    There’s a good argument for saying that Lisp doesn’t have enough equality relationships either. EQUAL and = are unsafe in general. EQ provides a pointless implementation dependent optimisation that is better expressed using DECLARE.

    But at least in Lisp’s case it seems easier to build equality functions out of EQL than building them from Python’s ==.

  10. drj11 Says:

    @Gareth: I’m not sure I get the point you’re trying to make with your Google codesearches. Maybe the comment form mangled your exact searches somehow?

    I see “about 700” uses of EQL which seem to be mostly testing for equality of symbols (which is curious because lots of people would idiomatically use the equally correct EQ for that case). I see “about 500” uses of = most of which seem to be when both quantities are known to be numbers (hmm, or maybe just intended to be). I certainly don’t see many uses of EQL for small numbers.

  11. Gareth Rees Says:

    This search is more informative, sorry. I’m just indicating that there are lots of people writing (EQL HOUR 12) or (EQL CHAR #). I suspect this is habit rather than any well-thought-out choice of equality predicate. (As suggested by many occurrences of (EQL VAR 'SYMBOL) which is pointless.)

    (= VAR 0) is undefined when VAR is not a number, but in practice it signals TYPE-ERROR or whatever, which I think is usually more useful than evaluating to NIL. (If VAR is not a number, that’s a bug in my code and I’d prefer to find out sooner rather than later.)

    I do see a role for EQL in Lisp as the default value for the :TEST argument to search functions. But there’s no corresponding role in Python.

  12. Gareth Rees Says:

    That should have read (EQL CHAR #\). Bah to WordPress.

  13. Gareth Rees Says:

    Double bah. How about (EQL CHAR #\0)?

  14. Paul Prescod Says:

    It isn’t actually (or at least it isn’t ONLY) the compiler that merges number objects. “0+1 is 2-1” evaluates to true. Python caches and reuses a list of small integer objects for performance.

  15. Tim Head Says:

    for added fun consider

    >>> "test" is "test"
    True
    >>> t = "test"
    >>> "test" is t
    True
    >>> x = "test"
    >>> x is t
    True

  16. drj11 Says:

    (EQL VAR 'SYMBOL) isn’t pointless. It means exactly the same thing as (EQ VAR 'SYMBOL) and would be preferred by those that think EQ should be avoided. Note a compiler can easily generate identical code for (EQL VAR 'SYMBOL) and (EQ VAR 'SYMBOL).

    As for =, I think _relying_ on it to signal in production code would be bad. I would rather use an ASSERT or a carefully controlled DECLARE (probably generated by a macro). But I’m not sure.

  17. drj11 Says:

    @Paul Prescod: Yes, the article addresses that, the section “Compiler memoisation” uses long which are not pre-interned. The section “On int” discusses the global pre-interning of small ints. The point is, you shouldn’t be relying on it.

  18. drj11 Says:

    @Tim: Yes, literal strings are interned globally, just like Java (well, kind of), but computed strings are not. So it’s wrong to use is for strings as well:

    >>> x='baa'
    >>> y='baabaa'
    >>> x*=2
    >>> x is y
    False
    >>> x == y
    True

    Similar story for tuples.

