Things in Python that I mostly like, but make me feel a bit naughty when I use them. I air my smelly code as a sort of cleansing ritual and perhaps to make you feel a little bit better about your own dirty habits.
zip(*[iter(s)]*n)
I now tend to call this zip
/iter
thing group. It came to me in a dream.
itertools.chain(*s)
The “inverse” of the above. I only “discovered” this recently. A good couple of months after the above discovery.
Bound methods of simple values. Example: In order to convert a sequence of image samples from perceptual space to linear space we need to raise each one to the power 2.2:
map((2.2).__rpow__, mylist)
«(2.2).__rpow__
» is equivalent to the following function:
def f(x): return x**2.2
(as a lambda: «lambda x: x**2.2
»)
__rpow__
is the swopped version of __pow__
. Using «(2.2).__pow__
» would be equivalent to «2.2**x
», which isn’t what I want at all.
Literal string concatenation. To be honest, I’ve no idea why this feels a bit naughty to me, but it does. C has it, so why not in Python? Perhaps it’s because I often end up using it in cases where there are other problems with the code (this example from PyPNG):
raise FormatError("Illegal combination of bit depth (%d)"
" and colour type (%d)."
" See http://www.w3.org/TR/2003/REC-PNG-20031110/#table111 ."
% (self.bitdepth, self.color_type))
Sometimes I get a bit OCD about this and start removing extraneous string concatenation «+» operators from other people’s code.
The split thing that I mentioned in an earlier article. Here’s an example from Clear Climate Code:
noaa = """
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.temperature.inv
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/hcn_doe_mean_data.Z
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/station.inventory.Z
""".split()
It just seems so much nicer to be able to add a URL to the list without having to add any more syntax (and in this case, I think I originally created the list by pasting it in from a text file, so using split
meant less reformatting).
Doing non-trivial things in a module’s top-level. Another example from PyPNG:
_signature = struct.pack('8B', 137, 80, 78, 71, 13, 10, 26, 10)
This is just a complicated way of writing an 8-byte literal string. It’s written this way so that the crucial 8 bytes are visible as decimal values, and can therefore be checked more easily against the relevant part of the PNG standard which also uses a decimal notation.
Passing a function with side-effects into map. pipcolour (an undocumented utility in PyPNG) counts the number of colours used in an image. In the code below, pixels
is an iterator that yields each row of a PNG image; the pixels from each row are added to a pre-existing set of colours. Like this:
col = set()
for row in pixels:
# Ewgh, side effects on col
map(col.add, png.group(row, planes))
(in this code png.group
is the grouping function which is the first item in this article; for an RGB image, planes is 3)
Now, it strikes me that I could’ve gone:
set(group(itertools.chain(*pixels), planes))
but that has radically different memory usage. The first form only loads one row at a time into its working set; the second form loads the entire decompressed PNG image into the working set (it’s all the fault of «*pixels
»). If I was daring enough to require Python 2.6 then I could’ve used itertools.chain.from_iterable
, but that feels clumsy. It one of those cases where I feel that Python is unnecessarily forcing me to choose between clarity and efficiency.
I think the above also illustrates a weakness with the sort of delayed evaluation programming that you get when using iterators: Apparently simple transformations to the code can result in huge changes to the space/time complexity. This is not a weakness that is unique to Python, any style of delayed evaluation has this risk (Haskell’s lazy evaluation, for example). When I was learning Prolog my chief criticism of it (and I think it’s still valid) was that it seemed to be fairly easy to be sure that your program would compute the correct answer, but very difficult to be sure that the answer would appear before the heat death of the universe.
Violating PEP 8 so I can write one-line functions:
def test_gradient_horizontal_lr(x, y): return x
def test_gradient_horizontal_rl(x, y): return 1-x
def test_gradient_vertical_tb(x, y): return y
def test_gradient_vertical_bt(x, y): return 1-y
Did you spot I did one of those earlier? How naughty. Of course PEP 8 kind of approves of this, because “a foolish consistency is the hobgoblin of little minds” which you probably know from PEP 8 but is of course from Ralph Waldo Emerson. In a curious coincidence I just happen to have a copy of his “Self-Reliance” on my desk; and I checked, he really did say that. In fact, the fuller quote is much more poetic than I was expecting: “A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines.”.
Indexing with a boolean expression. Yet another example comes from PyPNG and illustrates how well written it is. Say you want to create an array
of pixel values or you want to pack them into a string (using struct
). If the number of bits per value (the bitdepth) is <= 8 then you want to use «array('B')», otherwise you want «array('H')». Select the format like this:
fmt = 'BH'[self.bitdepth > 8]
Iverson’s Convention for the win! (and it’s better than PEP 308 because it works on Python 2.4)
Now, this is naughty: Creating a naughty bit of Python code merely for purposes of this blog article. Well, that’s not quite true, I got distracted by some Python code and that code and a stupid idea I had a few days ago clicked together. The code is witty, but should never be used. PyPNG allows the caller to specify the zlib compression level as an optional argument when creating a png.Writer
object. This is the argument to the zlib.compressobj
constructor. The default (for PyPNG) is to use zlib’s default. Now it’s clearly not the business of PyPNG to know what zlib’s default is, so I don’t want that appearing in the code. So the code looks like this:
if self.compression is not None:
compressor = zlib.compressobj(self.compression)
else:
compressor = zlib.compressobj()
Now, what I really want is a way to say «pass in x as an argument, but only if it’s non None
». If I had some expression, E, that evaluated to a list that was either the singleton, «[x]
», or the empty list «[]
» then I could get rid of the if
and go:
compressor = zlib.compressobj(*E)
«*E
» would expand to either the empty argument list, or an argument list with just one argument.
I discovered an E: [x for x in [self.compression] if x is not None]
. So now the code is:
compressor = zlib.compressobj(*[x for x in [self.compression] if x is not None])
Well, the code isn’t really that, I’m not that much of a monster to check that into source control.