Python: slicing with zip

2009-11-26

Wherein I feel compelled to write some more on Python code that I find more amusing than clear.

The more I use zip the more I love it. I’m thinking about writing a tutorial on how to (ab-) use zip, but for now just this recent discovery.

Say you have two iterators that each yield a stream of objects, iland and iocean (they could be gridded temperature values, say), and you want to get the first 100 values from each iterator to do some processing, whilst not consuming any more than 100 values. You can’t go list(iland)[:100] because that will consume the entire iland iterator and you’ll never be able to get those values past the 100th again.
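To see the problem concretely, here is a small Python 3 sketch. The `iland` iterator is a hypothetical stand-in built from `range`. Any iterator would behave the same way.

```python
# Hypothetical stand-in for a stream of land values.
iland = iter(range(1000))

# Slicing a list built from the iterator does give the first 100 values...
land100 = list(iland)[:100]

# ...but list() drained the whole iterator first, so nothing is left for later.
leftover = list(iland)
```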

You can use itertools (probably my second favourite Python module):

import itertools
land100 = list(itertools.islice(iland, 100))
ocean100 = list(itertools.islice(iocean, 100))
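A runnable Python 3 sketch of the islice approach follows. It uses hypothetical `range`-based stand-ins for the two streams and checks that the 101st values are still waiting afterwards.

```python
import itertools

# Hypothetical stand-ins for the land and ocean streams.
iland = iter(range(1000))
iocean = iter(range(5000, 6000))

# islice pulls exactly 100 items from each iterator and no more.
land100 = list(itertools.islice(iland, 100))
ocean100 = list(itertools.islice(iocean, 100))

# The 101st values have not been consumed, so they are still available.
next_land = next(iland)
next_ocean = next(iocean)
```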

It seems a shame to mention islice and 100 twice. One could use map with a quick pack and unpack, but this is not clear:

land100,ocean100 = map(lambda i: list(itertools.islice(i, 100)), (iland,iocean))

(a simple form of this, which I do sometimes use, is x,y = map(int, (x,y)))
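If the repetition really grates, another option is to hide the map-style expression behind a small named helper. `take_each` is a hypothetical name invented for this sketch and is not part of any library. It is loosely modelled on the `take` recipe in the itertools documentation.

```python
import itertools

def take_each(n, *iterators):
    """Return a list of the first n items from each iterator (hypothetical helper)."""
    # Each iterator is advanced by at most n items; nothing beyond that is read.
    return [list(itertools.islice(it, n)) for it in iterators]

# Hypothetical stand-ins for the land and ocean streams.
iland = iter(range(1000))
iocean = iter(range(5000, 6000))

land100, ocean100 = take_each(100, iland, iocean)
```

Whether a helper like this is clearer than two explicit islice lines is a matter of taste.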

What about giving some love to zip? It turns out that zip will stop consuming as soon as any argument is exhausted. So

zip(range(100), iland, iocean)

returns a list of 100 triples (provided iland and iocean each yield at least 100 values). Each triple holds an index (an integer from 0 to 99, taken from the list produced by range(100)), a value from the iland iterator, and a value from the iocean iterator. As soon as the list produced by range(100) is exhausted, zip stops consuming from iland and iocean, so their subsequent values can be consumed by other parts of the program.
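The post predates Python 3. There, zip returns a lazy iterator rather than a list, so a Python 3 sketch of the same idea wraps it in list(). The streams are again hypothetical `range`-based stand-ins.

```python
# Hypothetical stand-ins for the land and ocean streams.
iland = iter(range(1000))
iocean = iter(range(5000, 6000))

# In Python 3, zip is lazy, so list() is needed to get the triples.
triples = list(zip(range(100), iland, iocean))

# range(100) is the first argument, so when it runs out zip stops
# without pulling a 101st item from iland or iocean.
next_land = next(iland)
next_ocean = next(iocean)
```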

And yes, this seems to rely on a rather implementation-specific feature of zip that I'm not sure should be set in stone.

That zip form above is all very well if one wants to write for n, land, ocean in ..., but what if we want the 100 land values and 100 ocean values each in their own sequence (like the code at the beginning of the article)? We can use zip again!

_,land100,ocean100 = zip(*zip(range(100), iland, iocean))

zip(*thing) transposes a list of triples into a list of three tuples, which is then destructured into the 3 variables _ (a classic dummy), land100, and ocean100. Note that land100 and ocean100 come out as tuples, not lists.
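A Python 3 sketch of the double-zip trick, using hypothetical `range`-based streams, shows that the pieces really are tuples and that the iterators are left at their 101st values.

```python
# Hypothetical stand-ins for the land and ocean streams.
iland = iter(range(1000))
iocean = iter(range(5000, 6000))

# Inner zip: 100 (index, land, ocean) triples.
# Outer zip(*...): transposes them into three 100-element tuples.
_, land100, ocean100 = zip(*zip(range(100), iland, iocean))

# The pieces come back as tuples; call list() on them if lists are needed.
land_list = list(land100)
```

One caveat of this form: if the inner zip produces no triples at all (for example, if iland is already empty), zip(*...) yields nothing and the three-way unpacking raises ValueError.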

Don’t worry, the actual code uses the islice form from the first box because I think it’s the clearest.


6 Responses to “Python: slicing with zip”

  1. Thomas Guest Says:

    Yet another variant:

    from itertools import islice, izip

    land100, ocean100 = zip(*islice(izip(iland, iocean), 100))

    • drj11 Says:

      I did actually think about that, but never wrote it down because I didn’t expect it to be that tidy.

      I’m still not sure about “from blah import thing”, but a module like itertools gives the best argument in its favour.

      • Thomas Guest Says:

        Re: “from blah import thing”

        I know what you mean, and I don’t use this form much in production code. But when fitting a comment into a narrow column, it’s a way to avoid line wrapping.

        land100, ocean100 = zip(*itertools.islice(itertools.izip(iland, iocean), 100))
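itertools.izip no longer exists in Python 3, where the built-in zip is already lazy. A sketch of Thomas Guest's variant for Python 3 therefore just drops the i. The streams here are hypothetical `range`-based stand-ins.

```python
from itertools import islice

# Hypothetical stand-ins for the land and ocean streams.
iland = iter(range(1000))
iocean = iter(range(5000, 6000))

# islice asks the lazy zip for exactly 100 pairs, so each iterator
# is advanced exactly 100 times; zip(*...) then transposes the pairs.
land100, ocean100 = zip(*islice(zip(iland, iocean), 100))
```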

  2. Stephan Says:

    I don’t expect that this behaviour of zip will change: itertools explicitly provides zip_longest (http://docs.python.org/3.1/library/itertools.html#itertools.zip_longest). If they wanted to reverse this (let zip by default consume everything and have a zip_shortest in itertools) then Py3k would have been the right moment.

    • drj11 Says:

      I wasn’t clear enough in my article, sorry. I agree, zip will always stop when any argument is exhausted and itertools.zip_longest will do the other thing.

      The particular implementation feature I was referring to is that zip, in producing the next tuple, consumes each argument in order, and it never “reads ahead” in an iterator stream. Imagine an alternate implementation of zip(A, B, C) that read the next item from all three of A, B, and C before deciding whether to stop. That implementation might sometimes read an extra item from B and C before deciding it should stop because A is exhausted, which would be a disaster for the technique I suggest in the article.

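      We can see the effect of this by putting the range argument last. When the first call discovers that range(2) is exhausted, it has already pulled 'C' and 'c' from A and B, and those items are silently lost: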

      >>> A=iter('ABCDEFGHIJ')
      >>> B=iter('abcdefghij')
      >>> zip(A, B, range(2))
      [('A', 'a', 0), ('B', 'b', 1)]
      >>> zip(A, B, range(2))
      [('D', 'd', 0), ('E', 'e', 1)]
      

      There is some discussion of this in the Python 2.4 documentation for izip. It has been removed from the 2.6 documentation, but I can’t tell whether that’s because they wish to bless this particular behaviour or not.
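The read-ahead hazard described above can be sketched in Python 3 by writing a deliberately greedy zip and comparing it with the built-in. `greedy_zip` is a hypothetical function written only for illustration. It is not how the built-in zip is implemented.

```python
def greedy_zip(*iterables):
    """Hypothetical zip that reads one item from EVERY iterator before checking for exhaustion."""
    iterators = [iter(it) for it in iterables]
    sentinel = object()
    while True:
        # Read ahead from all iterators first...
        items = [next(it, sentinel) for it in iterators]
        # ...and only then decide whether to stop.
        if any(item is sentinel for item in items):
            return
        yield tuple(items)


# Built-in zip with range(2) first: it stops before touching a1 or b1 again.
a1 = iter('ABCDEFGHIJ')
b1 = iter('abcdefghij')
first = list(zip(range(2), a1, b1))
second = list(zip(range(2), a1, b1))

# Greedy version: the third round reads 'C' and 'c' before noticing that
# range(2) is exhausted, so those two items are lost.
a2 = iter('ABCDEFGHIJ')
b2 = iter('abcdefghij')
greedy_first = list(greedy_zip(range(2), a2, b2))
greedy_second = list(greedy_zip(range(2), a2, b2))
```

With the built-in zip, `second` starts at 'C'. With the greedy version, `greedy_second` starts at 'D'.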

  3. Thomas Guest Says:

    I think the documentation is clear enough, certainly in the case of itertools.izip, where it gives a pure Python equivalent. Builtin zip notes: “the left-to-right evaluation order of the iterables is guaranteed”, which I think is a pretty strong hint.

    Is it good form to rely on this behaviour in a program? In this case, maybe, yes.

    I asked myself a similar question about itertools.takewhile last year. Is it OK to use takewhile(pred, xs) as a way of cueing the iterable, xs, so that xs is ready to yield the element following the first one for which the predicate fails?

    Probably no, I decided.
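A small Python 3 sketch shows the takewhile behaviour in question, with made-up sample data. To discover where to stop, takewhile has to pull the first failing item out of the underlying iterator, and it does not give that item back.

```python
from itertools import takewhile

# Made-up sample data.
xs = iter([1, 2, 3, 10, 4, 5])

# takewhile(predicate, iterable) yields items while the predicate holds.
small = list(takewhile(lambda x: x < 5, xs))

# 10 was consumed to find out that the predicate failed, so xs resumes after it.
rest = list(xs)
```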

    I often relate these iterator operations back to Unix commands. The closest I can find to “zip” is “paste”, which is actually itertools.izip_longest, and the question no longer arises.

    Oh, I just looked at the Python 3 zip documentation:

    zip() should only be used with unequal length inputs when you don’t care about trailing, unmatched values from the longer iterables

    Hmmm. But you do care about the trailing iland, iocean streams.

