Archive for June, 2009

Convert Plan 9 images to PNG

2009-06-16

Should anyone be in the slightly unusual position of wanting to convert a Plan 9 image to PNG format, they can now use the plan9topng utility in PyPNG:

curl http://plan9.bell-labs.com/sources/plan9/sys/games/lib/sokoban/images/right.bit |
./plan9topng.py > glenda.png

Glenda says “hi!”: glenda

Python: Some Naughty Features

2009-06-03

Things in Python that I mostly like, but make me feel a bit naughty when I use them. I air my smelly code as a sort of cleansing ritual and perhaps to make you feel a little bit better about your own dirty habits.

zip(*[iter(s)]*n)

I now tend to call this zip/iter thing group. It came to me in a dream.

itertools.chain(*s)

The “inverse” of the above. I only “discovered” this recently. A good couple of months after the above discovery.

Bound methods of simple values. Example: In order to convert a sequence of image samples from perceptual space to linear space we need to raise each one to the power 2.2:

map((2.2).__rpow__, mylist)

«(2.2).__rpow__» is equivalent to the following function:

def f(x): return x**2.2

(as a lambda: «lambda x: x**2.2»)

__rpow__ is the swopped version of __pow__. Using «(2.2).__pow__» would be equivalent to «2.2**x», which isn’t what I want at all.

Literal string concatenation. To be honest, I’ve no idea why this feels a bit naughty to me, but it does. C has it, so why not in Python? Perhaps it’s because I often end up using it in cases where there are other problems with the code (this example from PyPNG):

    raise FormatError("Illegal combination of bit depth (%d)"
      " and colour type (%d)."
      " See http://www.w3.org/TR/2003/REC-PNG-20031110/#table111 ."
      % (self.bitdepth, self.color_type))

Sometimes I get a bit OCD about this and start removing extraneous string concatenation «+» operators from other people’s code.

The split thing that I mentioned in an earlier article. Here’s an example from Clear Climate Code:

noaa = """
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.temperature.inv
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/hcn_doe_mean_data.Z
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/station.inventory.Z
""".split()

It just seems so much nicer to be able to add a URL to the list without having to add any more syntax (and in this case, I think I originally created the list by pasting it in from a text file, so using split meant less reformatting).

Doing non-trivial things in a module’s top-level. Another example from PyPNG:

_signature = struct.pack('8B', 137, 80, 78, 71, 13, 10, 26, 10)

This is just a complicated way of writing an 8-byte literal string. It’s written this way so that the crucial 8 bytes are visible as decimal values, and can therefore be checked more easily against the relevant part of the PNG standard which also uses a decimal notation.

Passing a function with side-effects into map. pipcolour (an undocumented utility in PyPNG) counts the number of colours used in an image. In the code below, pixels is an iterator that yields each row of a PNG image; the pixels from each row are added to a pre-existing set of colours. Like this:

    col = set()
    for row in pixels:
        # Ewgh, side effects on col
        map(col.add, png.group(row, planes))

(in this code png.group is the grouping function which is the first item in this article; for an RGB image, planes is 3)

Now, it strikes me that I could’ve gone:

set(group(itertools.chain(*pixels), planes))

but that has radically different memory usage. The first form only loads one row at a time into its working set; the second form loads the entire decompressed PNG image into the working set (it’s all the fault of «*pixels»). If I was daring enough to require Python 2.6 then I could’ve used itertools.chain.from_iterable, but that feels clumsy. It one of those cases where I feel that Python is unnecessarily forcing me to choose between clarity and efficiency.

I think the above also illustrates a weakness with the sort of delayed evaluation programming that you get when using iterators: Apparently simple transformations to the code can result in huge changes to the space/time complexity. This is not a weakness that is unique to Python, any style of delayed evaluation has this risk (Haskell’s lazy evaluation, for example). When I was learning Prolog my chief criticism of it (and I think it’s still valid) was that it seemed to be fairly easy to be sure that your program would compute the correct answer, but very difficult to be sure that the answer would appear before the heat death of the universe.

Violating PEP 8 so I can write one-line functions:

    def test_gradient_horizontal_lr(x, y): return x
    def test_gradient_horizontal_rl(x, y): return 1-x
    def test_gradient_vertical_tb(x, y): return y
    def test_gradient_vertical_bt(x, y): return 1-y

Did you spot I did one of those earlier? How naughty. Of course PEP 8 kind of approves of this, because “a foolish consistency is the hobgoblin of little minds” which you probably know from PEP 8 but is of course from Ralph Waldo Emerson. In a curious coincidence I just happen to have a copy of his “Self-Reliance” on my desk; and I checked, he really did say that. In fact, the fuller quote is much more poetic than I was expecting: “A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines.”.

Indexing with a boolean expression. Yet another example comes from PyPNG and illustrates how well written it is. Say you want to create an array of pixel values or you want to pack them into a string (using struct). If the number of bits per value (the bitdepth) is <= 8 then you want to use «array('B')», otherwise you want «array('H')». Select the format like this:

fmt = 'BH'[self.bitdepth > 8]

Iverson’s Convention for the win! (and it’s better than PEP 308 because it works on Python 2.4)

Now, this is naughty: Creating a naughty bit of Python code merely for purposes of this blog article. Well, that’s not quite true, I got distracted by some Python code and that code and a stupid idea I had a few days ago clicked together. The code is witty, but should never be used. PyPNG allows the caller to specify the zlib compression level as an optional argument when creating a png.Writer object. This is the argument to the zlib.compressobj constructor. The default (for PyPNG) is to use zlib’s default. Now it’s clearly not the business of PyPNG to know what zlib’s default is, so I don’t want that appearing in the code. So the code looks like this:

if self.compression is not None:
    compressor = zlib.compressobj(self.compression)
else:
    compressor = zlib.compressobj()

Now, what I really want is a way to say «pass in x as an argument, but only if it’s non None». If I had some expression, E, that evaluated to a list that was either the singleton, «[x]», or the empty list «[]» then I could get rid of the if and go:

compressor = zlib.compressobj(*E)

«*E» would expand to either the empty argument list, or an argument list with just one argument.

I discovered an E: [x for x in [self.compression] if x is not None]. So now the code is:

compressor = zlib.compressobj(*[x for x in [self.compression] if x is not None])

Well, the code isn’t really that, I’m not that much of a monster to check that into source control.

Multiplication, Addition, Counting, and Python

2009-06-01

Jason Dyer’s teasing post got me thinking. About how Python could be used to give some insight into the meta-cognitive aspects of whole number multiplication. Natch.

When children solve a multiplication problem by correspondence, the objects in the multiplier set are mapped over for each object in the multiplicand set (hmm, or is it the other way around?). A typical procedure for multiplying 4 cakes by a price of £2 per cake might be to point to a cake with the left hand and then count up the 2 pounds using the right hand, then move the left hand to the next cake and repeat the count with the right hand, with the oral count continuing up; this is repeated until the left hand has exhausted all cakes.

We can model this in Python with the following program:

def mul(multiplicand, multiplier):
    count = 0
    for i in range(multiplicand):
        for j in range(multiplier):
            count += 1
    return count

Whereas children using repeated addition do something more like this:

def mul(multiplicand, multiplier):
    count = 0
    for i in range(multiplier):
        count = add(count, multiplicand)
    return count

In this case, add is a subroutine. You could define one easily enough in Python or else you could go «from operator import add».

Clearly the second program is more efficient than the first, both as a Python program and as a manual procedure performed by children; it’s more the latter that I’m interested in, I’m using Python to describe the procedures. However, the second procedure requires that add is already adopted as an internal procedure. Of particular note is that, apart from count, the second procedure uses only one state variable, i; the first procedure uses two.

At the stage where multiplication is introduced, many children will not yet be performing addition accurately without counting. Effectively add is not yet available to them as an internal procedure. This is likely to be a problem because this type of learner is unlikely to be able to accurately add the multiplicand to the count without getting confused about where in the multiplication procedure they were.

As an example of what might go wrong, imagine that a learner starts by using the left hand to maintain the i variable in the second procedure (above); this hand will count from 1 to multiplier (hmm, there’s a small sleight of hand going on here, the Python counts from 0 to n-1 whereas most learners will prefer count from 1 to n). The count will be maintained orally (in other words by speaking out the successive multiples of the multiplicand). Begin by raising one finger of the left hand and uttering the multiplicand (the initial count). Now we need to add the multiplicand to the oral count. Maybe the learner can do that without using their fingers, maybe they can’t; in any case depending on the parameters chosen, at some point some learners may need to use their hands to perform the addition. So then the procedure for addition kicks in as the learner adds the multiplicand to the current count. Many learners will have personal procedure for addition that requires both hands. The addition may be performed accurately, but the i state variable will be lost. We lose track of where we were in the multiply procedure.

If the learner can accurately add the multiplicand to the count mentally, then they stand a much better chance of performing the second procedure. This is what I mean by having add available as an internal procedure.

The first procedure can be thought of as a way of simultaneously arranging to perform the multiplication and the additions required by the multiplication without having any state variables overlap. Thereby minimising the chance that confusion will result. Most learners will be capable of keeping track of the required state variables to perform the multiplication, but if left to their own devices may choose methods where the state variables overlap (in other words, they run out of hands). Thus, they can benefit by being guided towards a procedure which they can manage.

Another way to think about this is that at the sort of age where children begin to learn to multiply, their procedure for addition is leaky. It is not fully abstracted; performing addition may require the use of hands (for state variables), making the addition leak over any other use of the same hands to maintain state.

It seems to me that only when a learner can perform a typical count + multiplicand addition accurately in their head are they ready to perform multiplication as repeated addition.

Oh yeah, the original research on the whole “multiplication is/ain’t repeated addition” debate. It sucks. They test children at times t0 and t1 with a randomly chosen math related intervention in between. It seems to me that a more carefully designed study should have also included a non math-related intervention such as giving the subjects fish-oil pills or teaching them to conga. After all, if I was being tested on one day and then told that I was going to sit a similar test tomorrow, I would bone up on the material before the second test, regardless of what I was being taught. Wouldn’t you? They make no attempt to account for this effect.

Appendix for Python hackers

The first definition of mul that I give is of course completely worthless as a practical implementation in Python. However, note the following: «10*10L» is 100L but «mul(10,10L)» is 100; in other words mul returns an int if it possibly can.

Follow

Get every new post delivered to your Inbox.