As I drifted off to sleep my mind felt as if floating free amongst mist; the mists cleared and the following code was revealed:
def g(l, n):
    return zip(*[iter(l)]*n)
(in my dream of course, l and n were free also. I bound them with a def for this article)
Python being such a naturally clear language, it’s obvious what this code does, right?
>>> g(range(10), 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
>>> g('hello world!', 2)
[('h', 'e'), ('l', 'l'), ('o', ' '), ('w', 'o'), ('r', 'l'), ('d', '!')]
Yeah, course. g groups a list l into a list of n-tuples, by taking each group of n elements from the list and making them into a tuple. How useful.
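One thing the range(10) example shows in passing: the 9 never appears, because elements left over at the end that don't fill a whole group are silently dropped. Here is a minimal sketch of that behaviour, assuming Python 3 (where zip is lazy, hence the extra list call). The name g_padded and the use of itertools.zip_longest are illustrative additions, not part of the original g:

```python
from itertools import zip_longest

def g(l, n):
    # list() is only needed on Python 3, where zip returns a lazy iterator
    return list(zip(*[iter(l)] * n))

def g_padded(l, n, fill=None):
    # hypothetical variant: pad the final group instead of dropping it
    return list(zip_longest(*[iter(l)] * n, fillvalue=fill))

print(g(range(10), 3))         # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(g_padded(range(10), 3))  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
```

Whether dropping or padding is right depends on the data; for a hex dump with an odd number of digits, neither is obviously correct.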
What good is it? Well, this sort of thing crops up all the time. Decoding hex dumps for example:
# g decomposes hex string into hex-pairs:
>>> g('1c47ff47', 2)
[('1', 'c'), ('4', '7'), ('f', 'f'), ('4', '7')]
# then we can use int to decode each pair to an integer
>>> map(lambda x:int(''.join(x), 16), g('1c47ff47', 2))
[28, 71, 255, 71]
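Sorry the previous example is a bit clumsy. g hands back tuples of characters rather than strings, and a Python sequence of characters isn’t quite the same as a string, so each pair has to be glued back together with ''.join before int can read it. Sadly.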
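For what it's worth, the hex decoding can be tidied a little with a list comprehension. This is only a sketch, assuming Python 3; decode_hex is a made-up helper name, and the bytes.fromhex cross-check is a Python 3 built-in rather than anything g provides:

```python
def g(l, n):
    # list() is only needed on Python 3, where zip returns a lazy iterator
    return list(zip(*[iter(l)] * n))

def decode_hex(s):
    # hypothetical helper: join each pair of characters back into a
    # string, then parse that string as a base-16 integer
    return [int(''.join(pair), 16) for pair in g(s, 2)]

decoded = decode_hex('1c47ff47')
print(decoded)  # [28, 71, 255, 71]

# cross-check against the built-in hex parser (Python 3 only)
print(list(bytes.fromhex('1c47ff47')))  # [28, 71, 255, 71]
```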
Now the implementation of g is a bit subtle, and relies on some features of Python that might appear a bit unsound. But I’m sure I heard Raymond Hettinger say that if i is some iterator then zip(i, i, i) turns out to make 3-tuples of successive elements of i. This used to be so by accident of implementation (basically zip just happened to construct its tuples by taking one element from each argument in turn), but the idiom was judged so useful that zip is now defined to behave like that. In other words, zip isn’t allowed to make internal copies of its arguments, or peek ahead in any of its iterable arguments.
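The zip(i, i, i) behaviour is easy to check for yourself. A small sketch, assuming Python 3 (hence the list calls); the variable names are illustrative. Note that it only works when i is a single iterator object, because passing the same list three times gives each argument its own fresh iteration:

```python
data = [0, 1, 2, 3, 4, 5]

# One shared iterator: each tuple slot pulls the next element from it
i = iter(data)
shared = list(zip(i, i, i))
print(shared)    # [(0, 1, 2), (3, 4, 5)]

# The list itself: zip makes three independent iterators over it
separate = list(zip(data, data, data))
print(separate)  # [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5)]
```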
zip(*[iter(l)]*n) is the same thing, but for variable n. Of note is that [i]*n makes a list of length n, all of whose elements refer to the same i (this was the cause of a bug of mine last year when I wrote [[]]*16). That single iterator i is then passed n times (as n arguments) to the zip function. Neat.
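The aliasing done by [i]*n is the whole trick, and also the whole trap. A short sketch showing both sides, assuming Python 3; the variable names are made up for illustration:

```python
# The trap: three references to ONE list, not three lists
rows = [[]] * 3
rows[0].append('x')
print(rows)         # [['x'], ['x'], ['x']]

# The usual fix: build a fresh list for each slot
independent = [[] for _ in range(3)]
independent[0].append('x')
print(independent)  # [['x'], [], []]

# The trick: two references to ONE iterator, which is what g relies on
its = [iter('abcdef')] * 2
print(its[0] is its[1])  # True
pairs = list(zip(*its))
print(pairs)        # [('a', 'b'), ('c', 'd'), ('e', 'f')]
```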
It still scares me a bit.
This code is obviously ridiculous. I can’t help feeling I’ve missed a more Pythonic way of doing it.