2009-01-28

As I drifted off to sleep my mind felt as if floating free amongst mist; the mists cleared and the following code was revealed:

```
def g(l, n):
    return zip(*[iter(l)]*n)
```

(In my dream, of course, l and n were free also. I bound them with a `def` for this article.)

Python being such a naturally clear language, it’s obvious what this code does, right?

```
>>> g(range(10), 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
>>> g('hello world!', 2)
[('h', 'e'), ('l', 'l'), ('o', ' '), ('w', 'o'), ('r', 'l'), ('d', '!')]
```

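Yeah, course. g groups a list l into a list of n-tuples, by taking each successive group of n elements from the list and making them into a tuple. Any leftover elements that don’t fill a final tuple are dropped (the 9 in the first example). How useful.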
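A minimal sketch of the same function for Python 3, where `zip` returns a lazy iterator and so needs wrapping in `list` to reproduce the output above. The name `group` is mine, used only for illustration, and the Python 3 setting is an assumption (the article's own examples are Python 2).

```python
def group(l, n):
    # Python 3: zip is lazy, so materialise the result with list().
    return list(zip(*[iter(l)] * n))

# range(10) has ten elements; the trailing 9 cannot fill a 3-tuple,
# so zip stops and the 9 is silently dropped.
threes = group(range(10), 3)
pairs = group('hello world!', 2)
print(threes)
print(pairs)
```

If losing the remainder matters, the function needs a padding step or a different approach; this sketch keeps the original behaviour.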

What good is it? Well, this sort of thing crops up all the time. Decoding hex dumps for example:

```
# g decomposes hex string into hex-pairs:
>>> g('1c47ff47', 2)
[('1', 'c'), ('4', '7'), ('f', 'f'), ('4', '7')]
# then we can use int to decode to decimal
>>> map(lambda x:int(''.join(x), 16), g('1c47ff47', 2))
[28, 71, 255, 71]
```

Sorry the previous example is a bit clumsy; a Python sequence of characters isn’t quite the same as a string, so each pair has to be joined back together before `int` will take it. Sadly.
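One way to smooth over the clumsiness, sketched for Python 3 (an assumption; the examples above are Python 2): unpack each pair and concatenate the two characters before handing them to `int`. The helper names `group` and `decode_hex` are hypothetical. The built-in `bytes.fromhex` does the whole job in one call and is shown for comparison.

```python
def group(l, n):
    # Same idiom as g, wrapped in list() because Python 3's zip is lazy.
    return list(zip(*[iter(l)] * n))

def decode_hex(s):
    # Each pair is a tuple of two characters; a + b rebuilds the
    # two-digit hex string that int(..., 16) expects.
    return [int(a + b, 16) for a, b in group(s, 2)]

decoded = decode_hex('1c47ff47')
builtin = list(bytes.fromhex('1c47ff47'))
print(decoded)
print(builtin)
```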

Now the implementation of g is a bit subtle, and relies on some features of Python that might appear a bit unsound. But I’m sure I heard Raymond Hettinger say that if i is some iterator then `zip(i, i, i)` turns out to make 3-tuples of successive elements of i. (It has to be an iterator, not just any iterable: pass a list three times and each tuple simply repeats one element.) This used to be so by accident of implementation (basically zip just happened to construct its tuples by taking one element from each argument in turn), but the idiom was judged so useful that zip is now defined to behave like that. In other words zip isn’t allowed to make internal copies of its arguments, or peek ahead in any of its iterable arguments.
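A small sketch of that behaviour, assuming Python 3 (hence the `list` calls). It contrasts passing a list three times with passing one shared iterator three times; the variable names are illustrative only.

```python
data = [0, 1, 2, 3, 4, 5]

# Passing the list itself: zip creates a separate iterator for each
# argument, so every tuple repeats a single element.
repeated = list(zip(data, data, data))

# Passing one shared iterator: zip pulls from it left to right, so each
# tuple consumes three successive elements.
i = iter(data)
grouped = list(zip(i, i, i))

print(repeated)
print(grouped)
```

The grouping only works because all three arguments are the same iterator object and zip takes one element from each argument in turn.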

`zip(*[iter(l)]*n)` is the same thing, but for variable n. Of note is that `[i]*n` makes a list of length n all of whose entries refer to the same i (this was the cause of a bug of mine last year when I wrote `[[]]*16`). That iterator i is then passed n times (as n arguments) to the zip function. Neat.
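The aliasing can be seen directly. This is an illustrative sketch with made-up variable names, showing both the `[[]]*n` pitfall and the fact that `[iter(l)]*n` holds one iterator n times.

```python
# List repetition copies references, not objects.
rows = [[]] * 3
rows[0].append('x')          # mutates the one list all three slots share
print(rows)

# A comprehension builds a fresh list for each slot instead.
independent = [[] for _ in range(3)]
independent[0].append('x')
print(independent)

# The grouping idiom depends on exactly this sharing.
its = [iter('abcdef')] * 2
same_object = its[0] is its[1]
print(same_object)
print(list(zip(*its)))
```

So the behaviour that bites with `[[]]*16` is the same one that makes the grouping idiom work.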

It still scares me a bit.

This code is obviously ridiculous. I can’t help feeling I’ve missed a more Pythonic way of doing it.

10 Responses to “My Python Dream About Groups”

1. Thomas Guest Says:

I’ve never been brave enough to use this grouping construct without introducing a local function call, called something like `n_at_a_time()` rather than `g()`.
It is, though, Pythonic. The zip documentation says:

> The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n).

2. drj11 Says:

Yes, well, I wouldn’t call it g in serious code either.

Ah yes, the 2.6 documentation almost blesses this use. I’m kind of disappointed that it meets with such official approval actually. I hadn’t spotted it in the 2.6 doc because I generally only read the documentation for the earliest version I intend to deploy for, to try and avoid accidental dependencies on newer features.

Now that it’s in the 2.6 docs, one could just justify the odd one-off use of this idiom with:
`# See http://www.python.org/doc/2.6/library/functions.html#zip`
Referencing URLs that expand on useful but obscure and only occasionally used bits of code is something I do all the time.

3. Gareth Rees Says:

Decoding hex dumps

I know it’s not really the point of this post, but I use the hex codec for this:

>>> import codecs
>>> map(ord, codecs.decode('1c47ff47', 'hex'))
[28, 71, 255, 71]

4. drj11 Says:

@Gareth: since I was actually decoding hex at the time I remembered this dream, that’s actually pretty useful. Thanks.

5. Francis Davey Says:

You are right that it is clear what it does – throws an error if l is an integer:

>>> def g(l, n):
...     return zip(*[iter(l)]*n)

>>> g(10,3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in g
TypeError: 'int' object is not iterable

Maybe:

bash-3.2$ python --version
Python 2.5.1

has something to do with it. I’ve not used iter() in that fashion.

6. Francis Davey Says:

Ah, sorry, I am still in morning daze – I missed out the range() function in your example.

Moral: never ask me to execute code 8-).

7. Francis Davey Says:

I’ve had a think and your construct is about the most natural that I can think of. I had a go with list comprehensions, but you really need some kind of power operator which is what you are doing with your *[iter(l)] construction. I’d be interested if anyone else comes up with anything similar.

8. TNO Says:

Interesting construct. I attempted something similar in JavaScript 1.8 and came up with this:

function g(l,n)
(function(k) [[l[k++] for(j in n)] for(i in l.length/n)])(0);

oh well…

9. rhettinger Says:

The best way to deal with an odd-feeling code idiom is to factor it out into a single, self-documenting function.

In this case, add the grouper() recipe from http://www.python.org/doc/2.6/library/itertools.html#recipes to your utils module:

```
from itertools import izip_longest  # Python 2.6; named zip_longest in Python 3

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
```

FWIW, I originally resisted guaranteeing the left-to-right evaluation order on the principle that zipping is a functional construct and that a principle of functional programming is referential transparency.

However, others convinced me that generator inputs are already stateful by their very nature, that no other order makes sense, and that there is a practical use case for guaranteed order.

Also, I realized that if I didn’t guarantee the order, people who needed a grouper() function would roll their own and end up with code substantially identical to what we already had with zip/izip. That code duplication would be an unnecessary waste.

• drj11 Says:

I do. More or less. When I don’t use the function (and it depends on how distracting it feels), I reference that recipe using that URL in a comment.

and thanks!