A little programming puzzle: /=

2007-12-18

/= is Common Lisp’s not-equals function (similar to the != operator from C). In Common Lisp /= is restricted to numbers and it is in fact an error to call it with any other type, but not a lot of people know that. It looks like an innocuous enough function, until one considers that like a lot of Lisp functions it can take any number of arguments:

(/= x1 x2 x3 ... xn)

The semantics of /= are to return True if and only if no two numbers are the same value. Note that (/= 1 2 1) is False; it is insufficient to merely consider adjacent pairs, if any pair have the same value then the function should return False.

Contrast this with the semantics of the function for equality, =, which returns True if and only if all numbers have the same value. This can be implemented by checking adjacent pairs, or comparing each argument against the first, a straightforward O(n) algorithm.

So the puzzle is, how do you implement /=? Given a list of numbers we want to return False if any pair have the same value, returning True otherwise. It’s most interesting if we consider the complexity and therefore the case when we have large numbers of arguments.

Common Lisp implements a wide range of number types: unbounded integers (bignums), unbounded rationals, finite precision floating-point types, and complex versions of all of those. /= compares numbers by value so the integer 0, the floating-point number 0d0, and the complex floating-point number #C(0d0 0d0) are all equal. Note that any real number can be considered as being equal in value to a complex number with a 0 imaginary part; also note that any floating-point number can be converted to a rational having exactly the same value (see my blog article on the subject). That means that for simplification we might want to convert all input numbers to the type (complex rational) before proceeding. Since that’s obviously a bounded time operation it doesn’t affect the complexity of whatever algorithm we come up with. That cuts down on the number of different types of pair-wise comparison operators that we might need.

The first, obviously naïve, approach would be to compare all possible pairs. That obviously works and is obviously O(n2).

If you like a puzzle, try improving upon this naïve algorithm before reading on.

The next approach I came up with is to sort the arguments first, and then do comparisons of all the adjacent pairs. Recall that the input numbers are complex and there’s no natural ordering. Doesn’t matter, we can just pick any old ordering, for example lexicographic ordering, where #C(a b) is less then #C(c d) if a < c, or a = c and b < d. This algorithm is O(n log(n)) for the sort (consult any algorithms book) and O(n) for the final comparison of adjacent pairs, so O(n log(n)) overall.

That seems a lot better. But there is another approach that’s even better, suggested by a good friend:

Create a hash-table and hash each argument into it, checking to see if an entry with that value already exists. If an entry already exists then that’s an equal pair, so return False; if we get to the end of the arguments then there’s no equal pair, so return True. This algorithm is O(n) unless you’re unlucky to run into some pathological hashing collision. So O(n) in all likelihood.

Of course the asymptotic behaviour of /= for large numbers of arguments is unlikely to form part of your decision to use one Lisp system or another, but it makes a nice puzzle. Unsurprisingly, the practice of using more than 2 arguments is very rare. Lisp test suites, that sort of thing. The best example I found is:

(not (/= samples 3 4))

which is true when samples is 3 or 4. This seems obscure to me, but it is more compact than alternatives; maybe it is idiomatic in some circles.

13 Responses to “A little programming puzzle: /=”

  1. Clive Says:

    The killer is that you’re using bugnums. If you were using numbers with a fixed-size representation you could just binsort and get O(n) worst case. (-8

  2. Nick Barnes Says:

    The best thing about the hashing approach is that it extends to non-numeric types.

  3. Gareth Rees Says:

    Surely

    (member samples '(3 4))

    would be the clearest way to write that kind of test? If you insist on comparison using = rather than equal (though superfluous in this particular example), you can use

    (member samples '(3 4) &test #'=)

    Or, given that you’re going to take some kind of action based on the result, maybe

    (case samples ((3 4) found-clause) (otherwise else-clause))

    I note that the following witty example appears on the first page of results from Google code search:

    ;; Assert the constraints concerning uniqueness of values(assert (/= s e n d m o r y))

  4. Gareth Rees Says:

    Obviously, when I wrote that it said

    ;; Assert the constraints concerning uniqueness of values
    (assert (/= s e n d m o r y))

    but wordpress screwed it up.

  5. drj11 Says:

    @Gareth: indeed, to all of that. WordPress sucks for talking about code. Doubly so in comments.

  6. Pavel Says:

    The hashtable approach is kinda cheating: For reasonably large argument lists making a sufficiently large hash to be reasonably sure of not getting a hashing collision is prohibetively expensive memory-wise. Also, you can’t be sure that your hash function is good enough without analyzing the data which is too expensive as well. With all those assumptions you can assume the data is sorted and simply compare neighbours. Or even just return true(or false).

  7. Gareth Rees Says:

    Hash collisions don’t make the algorithm go wrong, they just make it less efficient. For example, if you use an open hash table, then you can sort each bucket and compare neighbours; as you get more collisions, the performance decays smoothly from O(n) to O(n log n).

  8. g Says:

    Gosh, comment spam [ed: now deleted, way, way, way too late]. A sign of popularity, I guess. (It even looks like comment spam written by someone who has some inkling of what the words mean.)

    For what it’s worth, SBCL appears to do the obvious n2 thing up to 27 args at least. (At that point it fell over with a memory shortage. I was running with a reduced dynamic space to avoid the VirtualAlloc problem you sometimes get running SBCL on Windows.) This was tested by the oh-so-scientific approach of counting instances of “GENERIC-=” in the output of DISASSEMBLE when compiling a trivial function. I was too lazy to actually check the source.

    I’d guess #args would need to get much bigger than real programmers ever use before there’d be the least point in doing anything cleverer. (So much for Common Lisp’s famous reputation for doing the Right Thing at all costs, as per “Good news, bad news, how to win big”.) But if I were implementing a Lisp and had no more urgent calls on my time than gold-plating /=, I’d use a hash table when the number of arguments is large. Definitely better than sorting (bin-sorting probably included, even if no bigna), for the reasons already given.

  9. drj11 Says:

    bigna!?

    harh harh.


  10. Surely people write (apply /= ) all the time as the normal idiom for checking whether the list elements are distinct. (Sorry for the syntax, I’m a Schemer, not a Lisper.) So /= really _must_ be “gold plated” from the start. Surely anything that can take arbitrary numbers of inputs should be implemented with a half-decent scalable algorithm.

    I wish.


  11. WordPress screwed me. I said

    (apply /= <whacking great list>)

  12. lorg Says:

    My usual Python idiom for doing this is
    len(set(seq)) == len(seq), (or use !=.) This is equivalent to the hash-map solution.

  13. drj11 Says:

    Python wins again!


Leave a comment