Wherein I feel compelled to write some more on Python code that I find more amusing than clear.
The more I use zip the more I love it. I’m thinking about writing a tutorial on how to (ab-) use zip, but for now just this recent discovery.
Say you have two iterators that each yield a stream of objects, iland and iocean (they could be gridded temperature values, say), and you want to get the first 100 values from each iterator to do some processing, whilst not consuming any more than 100 values. You can’t go list(iland)[:100] because that will consume the entire iland iterator and you’ll never be able to get those values past the 100th again.
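A minimal sketch of that problem, assuming a hypothetical stand-in for iland built from a plain range (the real iterator could be anything):

```python
# Hypothetical stand-in for iland: an iterator over 1000 fake readings.
iland = iter(range(1000))

# list() drains the whole iterator before the slice is taken...
first100 = list(iland)[:100]

# ...so nothing is left for any later consumer.
leftover = list(iland)
```

Here first100 holds the 100 values we wanted, but leftover comes back empty: the other 900 values are gone.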
You can use itertools (probably my second favourite Python module):
land100 = list(itertools.islice(iland, 100))
ocean100 = list(itertools.islice(iocean, 100))
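To see that islice takes only what it is asked for, here is a small self-contained sketch. The two range-based iterators are made-up stand-ins for iland and iocean:

```python
import itertools

# Made-up stand-ins for the two data streams.
iland = iter(range(1000))
iocean = iter(range(5000, 6000))

land100 = list(itertools.islice(iland, 100))
ocean100 = list(itertools.islice(iocean, 100))

# The iterators pick up exactly where islice left off.
next_land = next(iland)
next_ocean = next(iocean)
```

With these stand-ins, next_land is the 101st land value and next_ocean the 101st ocean value, so nothing past the first 100 has been lost.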
It seems a shame to mention islice and 100 twice. One could use map with a quick pack and unpack, but this is not clear:
land100,ocean100 = map(lambda i: list(itertools.islice(i, 100)), (iland,iocean))
(a simple form of this, which I do sometimes use, is x,y = map(int, (x,y)))
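If the lambda is what hurts readability, one possible alternative is to give it a name. The helper take below is hypothetical (the itertools documentation lists a similar recipe), and the range-based iterators are again stand-ins:

```python
import itertools

def take(n, iterable):
    """Return the first n items of iterable as a list."""
    return list(itertools.islice(iterable, n))

# Stand-ins for the real iterators.
iland = iter(range(1000))
iocean = iter(range(5000, 6000))

# islice and 100 each appear once; the generator is unpacked into two names.
land100, ocean100 = (take(100, it) for it in (iland, iocean))
```

Whether this is clearer than simply writing the islice line twice is a matter of taste.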
What about giving some love to zip? It turns out that zip will stop consuming as soon as any argument is exhausted. So
zip(range(100), iland, iocean)
returns a list of 100 triples (under Python 3, an iterator that yields them), each triple having an index (the integer from 0 to 99 from range()), a value from the iland iterator, and a value from the iocean iterator. And as soon as range(100) is exhausted it stops consuming from iland and iocean, so their subsequent values can be consumed by other parts of the program.
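And yes, this seems to work only because zip asks its arguments for values in left-to-right order and gives up at the first one that is exhausted, which is why range(100) has to come first: put it last and zip would swallow a 101st value from each of iland and iocean before noticing that range had run out. That strikes me as a rather implementation-specific feature of zip that I’m not sure should be set in stone.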
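The dependence on argument order can be checked with a short experiment. This is a sketch under Python 3 (where zip is lazy, hence the list() calls), and the helper fresh_streams is a made-up convenience for building stand-in iterators:

```python
def fresh_streams():
    """Return two new stand-in iterators for iland and iocean."""
    return iter(range(1000)), iter(range(5000, 6000))

# range(100) first: zip stops before touching iland and iocean again.
iland, iocean = fresh_streams()
triples = list(zip(range(100), iland, iocean))
after_range_first = (next(iland), next(iocean))

# range(100) last: zip pulls one more value from iland and iocean
# before discovering that range(100) is exhausted.
iland, iocean = fresh_streams()
triples_last = list(zip(iland, iocean, range(100)))
after_range_last = (next(iland), next(iocean))
```

Both calls produce 100 triples, but after_range_first is (100, 5100) while after_range_last is (101, 5101): in the second case one value from each stream has silently been dropped.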
That zip form above is all very good if one wants to go for n,land,ocean in ..., but what if we want the 100 land values and 100 ocean values each in their own list (like the code at the beginning of the article)? We can use zip again!
_,land100,ocean100 = zip(*zip(range(100), iland, iocean))
zip(*thing) turns a sequence of triples into a triple of sequences (strictly speaking tuples rather than lists), which is then destructured into the 3 variables _ (a classic dummy), land100, and ocean100.
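The transposing effect of zip(*thing) is easiest to see on a tiny hand-written example. The data here is invented purely for illustration:

```python
# Three made-up triples: (index, land value, ocean value).
triples = [(0, 'a', 'x'), (1, 'b', 'y'), (2, 'c', 'z')]

# Unpacking the triples as separate arguments makes zip pair up
# the first elements, then the second elements, then the third.
indexes, lands, oceans = zip(*triples)
```

Each result is a tuple holding one column: indexes is (0, 1, 2), lands is ('a', 'b', 'c') and oceans is ('x', 'y', 'z'). Wrap them in list() if real lists are needed.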
Don’t worry, the actual code uses the islice form from the first box because I think it’s the clearest.