Python: Some Naughty Features

2009-06-03

Things in Python that I mostly like, but make me feel a bit naughty when I use them. I air my smelly code as a sort of cleansing ritual and perhaps to make you feel a little bit better about your own dirty habits.

zip(*[iter(s)]*n)

I now tend to call this zip/iter thing group. It came to me in a dream.
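In case the incantation is opaque: the same iterator appears n times in zip's argument list, so zip peels off successive runs of n items. A minimal sketch (wrapped in list() so it also prints something under Python 3, where zip returns an iterator; the name group is just my name for it):

```python
def group(s, n):
    # The same iterator repeated n times: zip pulls one item from
    # each copy in turn, yielding successive chunks of n items.
    # Any trailing partial group is silently dropped.
    return zip(*[iter(s)] * n)

pixels = [255, 0, 0, 0, 255, 0, 0, 0, 255]
print(list(group(pixels, 3)))
# [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
```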

itertools.chain(*s)

The “inverse” of the above. I only “discovered” this recently, a good couple of months after the discovery above.
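A sketch of the flattening, assuming the usual Python 3 spellings (list() because chain returns an iterator):

```python
from itertools import chain

# chain(*rows) splices the rows end to end: one level of flattening.
rows = [(255, 0, 0), (0, 255, 0)]
flat = list(chain(*rows))
print(flat)
# [255, 0, 0, 0, 255, 0]
```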

Bound methods of simple values. Example: In order to convert a sequence of image samples from perceptual space to linear space we need to raise each one to the power 2.2:

map((2.2).__rpow__, mylist)

«(2.2).__rpow__» is equivalent to the following function:

def f(x): return x**2.2

(as a lambda: «lambda x: x**2.2»)

__rpow__ is the swapped version of __pow__. Using «(2.2).__pow__» would be equivalent to «2.2**x», which isn’t what I want at all.
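To check I’m not lying, a quick sketch showing the bound method and the lambda agree (the list() is for Python 3, where map is lazy):

```python
samples = [0.0, 0.5, 1.0]

# __rpow__ puts the argument on the left: (2.2).__rpow__(x) is x ** 2.2.
via_rpow = list(map((2.2).__rpow__, samples))
via_lambda = list(map(lambda x: x ** 2.2, samples))
assert via_rpow == via_lambda

# And the trap: __pow__ puts 2.2 on the left instead.
assert (2.2).__pow__(3) == 2.2 ** 3
```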

Literal string concatenation. To be honest, I’ve no idea why this feels a bit naughty to me, but it does. C has it, so why not in Python? Perhaps it’s because I often end up using it in cases where there are other problems with the code (this example from PyPNG):

    raise FormatError("Illegal combination of bit depth (%d)"
      " and colour type (%d)."
      " See http://www.w3.org/TR/2003/REC-PNG-20031110/#table111 ."
      % (self.bitdepth, self.color_type))

Sometimes I get a bit OCD about this and start removing extraneous string concatenation «+» operators from other people’s code.

The split thing that I mentioned in an earlier article. Here’s an example from Clear Climate Code:

noaa = """
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.temperature.inv
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/hcn_doe_mean_data.Z
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/station.inventory.Z
""".split()

It just seems so much nicer to be able to add a URL to the list without having to add any more syntax (and in this case, I think I originally created the list by pasting it in from a text file, so using split meant less reformatting).

Doing non-trivial things in a module’s top-level. Another example from PyPNG:

_signature = struct.pack('8B', 137, 80, 78, 71, 13, 10, 26, 10)

This is just a complicated way of writing an 8-byte literal string. It’s written this way so that the crucial 8 bytes are visible as decimal values, and can therefore be checked more easily against the relevant part of the PNG standard which also uses a decimal notation.

Passing a function with side-effects into map. pipcolour (an undocumented utility in PyPNG) counts the number of colours used in an image. In the code below, pixels is an iterator that yields each row of a PNG image; the pixels from each row are added to a pre-existing set of colours. Like this:

    col = set()
    for row in pixels:
        # Ewgh, side effects on col
        map(col.add, png.group(row, planes))

(in this code png.group is the grouping function which is the first item in this article; for an RGB image, planes is 3)

Now, it strikes me that I could’ve gone:

set(group(itertools.chain(*pixels), planes))

but that has radically different memory usage. The first form only loads one row at a time into its working set; the second form loads the entire decompressed PNG image into the working set (it’s all the fault of «*pixels»). If I was daring enough to require Python 2.6 then I could’ve used itertools.chain.from_iterable, but that feels clumsy. It’s one of those cases where I feel that Python is unnecessarily forcing me to choose between clarity and efficiency.
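For the record, here’s roughly what that 2.6-ish version would look like. The names are mine, not PyPNG’s, and I’m assuming Python 3 spellings throughout (where zip is lazy; on Python 2 you’d want itertools.izip for the grouping, or the whole thing goes eager again):

```python
from itertools import chain

def count_colours(pixels, planes):
    # chain.from_iterable takes the iterable of rows itself, so rows
    # are consumed one at a time rather than all expanded up front.
    flat = chain.from_iterable(pixels)
    groups = zip(*[iter(flat)] * planes)  # the group idiom again
    return len(set(groups))

rows = iter([(255, 0, 0, 255, 0, 0), (0, 0, 255)])
print(count_colours(rows, 3))  # 2 distinct colours
```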

I think the above also illustrates a weakness with the sort of delayed evaluation programming that you get when using iterators: apparently simple transformations to the code can result in huge changes to the space/time complexity. This is not a weakness that is unique to Python; any style of delayed evaluation has this risk (Haskell’s lazy evaluation, for example). When I was learning Prolog my chief criticism of it (and I think it’s still valid) was that it seemed to be fairly easy to be sure that your program would compute the correct answer, but very difficult to be sure that the answer would appear before the heat death of the universe.

Violating PEP 8 so I can write one-line functions:

    def test_gradient_horizontal_lr(x, y): return x
    def test_gradient_horizontal_rl(x, y): return 1-x
    def test_gradient_vertical_tb(x, y): return y
    def test_gradient_vertical_bt(x, y): return 1-y

Did you spot I did one of those earlier? How naughty. Of course PEP 8 kind of approves of this, because “a foolish consistency is the hobgoblin of little minds”, which you probably know from PEP 8 but is of course from Ralph Waldo Emerson. In a curious coincidence I just happen to have a copy of his “Self-Reliance” on my desk; and I checked, he really did say that. In fact, the fuller quote is much more poetic than I was expecting: “A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines.”

Indexing with a boolean expression. Yet another example comes from PyPNG and illustrates how well written it is. Say you want to create an array of pixel values or you want to pack them into a string (using struct). If the number of bits per value (the bitdepth) is <= 8 then you want to use «array('B')», otherwise you want «array('H')». Select the format like this:

fmt = 'BH'[self.bitdepth > 8]

Iverson’s Convention for the win! (and it’s better than PEP 308 because it works on Python 2.4)
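Spelled out, since False and True index as 0 and 1 (wrapped in a function of my own naming, for illustration):

```python
from array import array

def array_for(bitdepth):
    # False indexes as 0 ('B', unsigned byte); True as 1 ('H',
    # unsigned short). Iverson's Convention in one character.
    fmt = 'BH'[bitdepth > 8]
    return array(fmt)

assert array_for(8).typecode == 'B'
assert array_for(16).typecode == 'H'
```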

Now, this is naughty: Creating a naughty bit of Python code merely for purposes of this blog article. Well, that’s not quite true, I got distracted by some Python code and that code and a stupid idea I had a few days ago clicked together. The code is witty, but should never be used. PyPNG allows the caller to specify the zlib compression level as an optional argument when creating a png.Writer object. This is the argument to the zlib.compressobj constructor. The default (for PyPNG) is to use zlib’s default. Now it’s clearly not the business of PyPNG to know what zlib’s default is, so I don’t want that appearing in the code. So the code looks like this:

if self.compression is not None:
    compressor = zlib.compressobj(self.compression)
else:
    compressor = zlib.compressobj()

Now, what I really want is a way to say «pass in x as an argument, but only if it’s not None». If I had some expression, E, that evaluated to a list that was either the singleton, «[x]», or the empty list «[]» then I could get rid of the if and go:

compressor = zlib.compressobj(*E)

«*E» would expand to either the empty argument list, or an argument list with just one argument.

I discovered an E: «[x for x in [self.compression] if x is not None]». So now the code is:

compressor = zlib.compressobj(*[x for x in [self.compression] if x is not None])

Well, the code isn’t really that, I’m not that much of a monster to check that into source control.
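But for completeness, the whole trick as a runnable sketch (the function name is mine, not PyPNG’s):

```python
import zlib

def make_compressor(compression=None):
    # E is [compression] when it's not None, else []; *E then passes
    # either one argument or none, leaving zlib's default alone.
    E = [x for x in [compression] if x is not None]
    return zlib.compressobj(*E)

# Explicit level 9, and a round trip to show it actually compresses.
c = make_compressor(9)
data = c.compress(b'hello hello') + c.flush()
assert zlib.decompress(data) == b'hello hello'
```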

27 Responses to “Python: Some Naughty Features”

  1. Brian Says:

    An even dirtier solution to your “x or nothing” is to use both boolean integer conversion and list multiplication together: “zlib.compressobj(*[x]*(x is not None))”.

  2. Thomas Guest Says:

    In Python 3.0 I have been known to write:

    print(*words)
    

    when I should of course write:

    print(' '.join(words))
    

    Full confession

  3. jmmcd Says:

    Nice article :)

    The “group” example looks a lot like “grouper” in http://www.python.org/doc/current/library/itertools.html

    “Example: In order to convert a sequence of image samples from perceptual space to linear space we need to raise each one to the power 2.2”

    What does this mean exactly? Do you mean raise the greyscale value of each pixel to that power, or something?

    • drj11 Says:

      Yeah it is grouper, basically. Those itertools examples are silly, why not just put them in the module so we can use them without having to type them in?

      And yes, we raise each greyscale value (as a number between 0.0 and 1.0) to a small-ish power. Sorry, tend to get my head into something and then assume everyone else already knows what I’m talking about.

      • jmmcd Says:

        “we raise each greyscale value (as a number between 0.0 and 1.0) to a small-ish power.”

        Could you give a reference in connection with the perceptual mapping if you have one please? It sounds right up my street.

        I’m doing some human-computer interaction experiments and using greyscale interpolations (with different mappings, eg exponential and noisy-linear) to test the usability of interpolating controllers, like sliders.

    • drj11 Says:

      For this perceptual stuff, Wikipedia’s Gamma Correction article isn’t bad, and Charles Poynton’s Gamma FAQ is always good for a laugh.

      See also “CIE 1931 color space”, and so on.

  4. Nick Barnes Says:

    The __rpow__ thing is hideous and obscure; the lambda equivalent is much better.

    chain.from_iterable is only clumsy because of its horrible name.

    Surprised you didn’t know the Emerson; it’s been a favourite quote of mine for many years, alongside this from Whitman:
    “Do I contradict myself?
    Very well then I contradict myself,
    (I am large, I contain multitudes.)”

  5. drj11 Says:

    I did know the Emerson (and I knew it was from Emerson), but not the fuller quote. And after “all that is needed for evil to triumph…” you can’t blame me for checking a good source, right?

    • Nick Barnes Says:

      I mean, “surprised you thought we might know the Emerson from some Python document”.

      • drj11 Says:

        @NickB: Oh right. Well, I suppose it’s my little joke. But I bet there are people who know the quote from PEP 8 but didn’t know it was Emerson.

  6. Anon Says:

    I am not a fan of side-effects in map. That’s what for-loops are for. It is helpful to us readers if you avoid side-effects in lines of code that follow a functional style. If you need a side-effect, your penalty is extra typing.

    You’re right about the dangers of lazy evaluation. That’s why I prefer languages like Python, D, and Ocaml, where lazy is available but not the default.

  7. 300baud Says:

    If mapping through bound methods and chain(*(…for…)) are naughty, gosh, I’m some sort of super freaky Python pervert.

  8. brad dunbar Says:

    yes…these things are ‘naughty’, but they’re also some of my *favorite* things. I especially enjoy using bound methods of single values. very fun…..

  9. mathew Says:

    I don’t think I’ve written anything comparable since I stopped using Perl…

  10. Mark Says:

    Little by little, you’re getting closer to Perl 6.

    You know you want to.

    Feels *good*, doesn’t it?

  11. lorg Says:

    1. What you call groups, I call blocks. Maybe I should rename it to blockify. I think group is a problematic name, easily confused with itertools.groupby.

    2. Regarding (2.2).__rpow__, I’d suggest taking a look at erezsh’s X, at http://pysnippets.wikidot.com/snippet:x

    • drj11 Says:

      Yeah, I like “blocks”. Reminds me of my MVS days. I agree about the confusion with “groupby” (which is a good name, by association with SQL).

      Looked at X very briefly. Amused.

  12. mjb67 Says:

    Yay I use the .split() thing a lot in scons files, so I can do:

    #ls -1 *.c

    then copy and paste the output from the terminal into the scons file, surround it with triple quotes and .split() and I’ve got an easily maintainable list of all my source files.

    Using a proper language to write build systems rocks.

    • drj11 Says:

      It does indeed rock (or at least, I think it would, if I did it).

      Of course you could always use «glob.glob(‘*.c’)» from within Python.

