Archive for February, 2009

Python: integer, int, float

2009-02-27

Python is a duck typed language, right? That means you don’t care what value something is, you only care what it quacks like. On the whole Python does pretty well at this. Functions that take lists can usually also take any sort of sequence or iterable; functions that take files can usually also take anything that looks sufficiently file-like (has a read method).

Where it disappoints me is the number of times a builtin requires an int instead of an integer. First a little bit of terminology. When I say “int” I mean the language type int; when I say “integer” I mean any value that has a mathematical value that is an integer. So 7 is an int (in Python); it’s also an integer. 7.0 is an integer, but it’s not an int.

So, I have a suggestion: anything in Python that currently requires an int should accept an integer.

So what sort of things am I talking about?

range. I should be able to go «range(1e6)» to get very long lists:

>>> len(range(1e6))
__main__:1: DeprecationWarning: integer argument expected, got float
1000000

I find this just annoying. 1e6 is an integer. Yes it happens to be stored in a float, but you, Python, ought not to care about that. Hmm, or perhaps you would rather I wrote «1e6» as «1*10**6»?

Another way of constructing very long lists is thwarted:

>>> [0]*1e6
Traceback (most recent call last):
  File "", line 1, in 
TypeError: can't multiply sequence by non-int of type 'float'

At least here the error message correctly refers to int instead of integer. But I would say that since 1e6 is an integer then the code has a perfectly clear and unambiguous meaning. So it should work.

Linguistically, what do I mean by an integer? Well, here’s a neat definition:

def isinteger(x): return int(x) == x

In Python we can define an integer to be any value that converts to int and is equal. Observe that this works for int, long, float, decimal, and hopefully any other vaguely numerical type that might be defined in the future:

>>> map(isinteger, [7, 7L, 7.0, 7.1, decimal.Decimal(7)])
[True, True, True, False, True]

If you were actually going to use this in real code, you need to catch exceptions in isinteger:

def isinteger(x):
  try:
    return int(x) == x
  except:
    return False

otherwise isinteger would barf on things like «list()».

It’s not just large integers, like 1e6, which happen to be more conveniently written in floating point format, that cause a problem. It can happen, quite reasonably, with smaller numbers. Especially when they are the results of computations.

Let’s say I’m creating a PNG file with bit depth of 2, and I have a sequence of pixel values for a single row. Each pixel is a value «in range(4)»; at some point I’m going to have to pack the pixels into bytes, 4 pixels per byte. It can be more convenient to do this if I round my row length up to a multiple of 4; that way my packer only has to deal with rows that fit into an exact number of bytes. So let’s say the row-length is l and I want it rounded up:

u = math.ceil(l/4.0)*4.0

(in real code, the “4.0” would probably be a parameter, and I’d probably have to use «float(pixels_per_byte)» to get the right sort of division (or use Python 3.0))

The amount of padding I need to add is therefore «u – l»:

padding = [0]*(u-l)

Alas, this barfs, because u is a float. So I end up having to use what feels to me like a gratuitious int:

padding = [0]*int(u-l)

Note that u came out of «math.ceil» so it’s already an integer; so «u-l» is an integer too.

If you were to take this “accept integers, not just ints” philosophy on board, you might find the following function useful:

def preferint(x):
  if isinteger(x):
    return int(x)
  return x

preferint will change any integer to int, leaving other values unchanged. You can stick «foo = preferint(foo)» at the front of all your functions. Or use those outré decorators.

We can write a prefer that works for any type:

def prefer(value, type):
  try:
    if type(value) == value:
      return type(value)
    except:
      return value

I suppose there are probably other parts of Python that this applies to, and I could go looking for them, but the two I’ve mentioned are the ones that bug me on a regular basis. List and string indexing would be one, but I can’t remember it annoying me:

>>> 'foo'[2.0]
Traceback (most recent call last):
  File "", line 1, in 
TypeError: string indices must be integers
>>> range(3)[2.0]
Traceback (most recent call last):
  File "", line 1, in 
TypeError: list indices must be integers

But yes, if I was campaigning for change then that should be changed too. Scanning through the builtins: chr already works (but «chr(7.1)» also works which is a bug), hex needs changing.

PS: (rant for another article) isinteger does not work for «complex(7)». Should it?

PNG filtering for small bit depths

2009-02-24

Willem van Schaik’s PngSuite is a most welcome set of PNG test images.

Mysteriously, it has no test images that have a non-zero scanline filter and a bit depth of < 8.

I give you f99n0g04.png:

f99n0g04

It uses a different scanline filter on successive lines, cycling round them all.

This is a useful test-case to have because the scanline filtering works at the byte level. Normally, when filtering, a byte from one pixel is compared to a byte from the previous pixel, but when the bit depth is < 8 there are several pixels per byte. So instead, the previous byte is used (which will comprise several of the earlier pixels, not necessarily even the immediately preceding one). So you can imagine getting some of the previous-byte/previous-pixel logic wrong. Basically it’s a special case, and the PngSuite doesn’t cover it.

Of course, the only PNG decoder I’ve found that barfs on this image, is PyPNG (a pure Python PNG decoder). And it only broke whilst I was in the middle of implementing small bit depth decoding.

[2009-02-24: Willem added this image to PngSuite]

I crush OptiPNG!

2009-02-20

(A further article in a mini-series about small PNGs. GDR’s earlier article; my earlier article. Neither is required reading, but you might need to get comfy with the PNG spec)

Consider the following PNG file:
29netpbm

It’s 29 (by 1) pixels. It’s the output of «echo P2 29 1 15 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 | pnmtopng». It’s 81 bytes long.

Here it is scaled to be a bit bigger:

29scaled

OptiPNG is a great tool. It messes with all those PNG compression options (zlib compression options, scanline filtering, and so on) to search for better compression. Given the above 81 byte image it produces the following 69 byte image:
29optipng

(this is of course an identical looking image, I’m just showing it so you can see the evidence)

I produce the following 68 byte image:
29drj

So how did I do one byte better than OptiPNG? Well I cheated; what did you expect?

I discovered a redundancy in the image format that I suspected that OptiPNG would not search over; then I created an image where hand selecting the redundant parts leads to better compression.

When using a bit depth less than 8, a scanline can have unused bits in its final (right-most) byte (see PNG spec section 7.2). A scanline is filtered before the data is passed to zlib. The unused bits are arbitrary, but they still influence the filtering because the filtering happens over the bytes. Therefore careful choice of what value to use for the unused bits can lead to a filtered scanline that compresses better.

When the 4-bit pixels of the image have been packed into bytes, the scanline of the image looks like this (in hex):

0102030405060708090a0b0c0d0e00

Uncompressing (with Python) the IDAT chunk from the file produced by OptiPNG shows that it correctly chooses filter type 1 to produce the filtered line:

>>> print zlib.decompress('081d636444019f0001890102'.decode('hex')).encode('hex')
010101010101010101010101010101f2

(the first 01 specifies the filter type; this sequence is one byte longer than the scanline)

Return to the scanline data, that final ‘0’ nybble is the redundant part. I can pick any value for that nybble that I want; that part of the byte isn’t used for pixel data because the width is odd (and we’re using 4-bits per pixel).

So I choose ‘f’ as the final nybble value, which gives me the following filtered bytes (I show you the bytes decompressed from the IDAT chunk of the 68 byte file):

>>> print zlib.decompress('789c636444050000980011'.decode('hex')).encode('hex')
01010101010101010101010101010101

It’s clear to see why that filtered line will compress better with zlib than the one OptiPNG used. And indeed, zlib compresses that to 11 bytes, as opposed to 12 bytes for OptiPNG’s filtered line.

So should OptiPNG and similar tools try different values for the redundant bits in a scanline? It is not a technique mentioned in “A guide to PNG optimization”. Rightly so, people have better things to do with their lives.

Caveat

I got my OptiPNG precompiled as a Mac OS X binary from here. It’s ancient. The binary itself a PowerPC binary, and it’s version 0.4.9 of OptiPNG. That was simpler than compiling it myself, and I don’t care very much. If you use OptiPNG or something similar, perhaps you can see what it does with the 81 byte version of the PNG file.

I have a sneaking suspicion that GDR will come along and reduce my PNG by 1 byte.

2 Player Settlers of Catan

2009-02-03

Recently we have revived a habit of playing lots of 2-player Settlers of Catan. The official rules don’t have a 2-player version, but it turns out that just using the ordinary rules as is gives a tolerably good game.

The most significant problems with the game played like this are: essentially no trading between players; possible for one player to be plagued by the robber for long periods of time. I do not propose a solution to the trading problem.

There are lots of 2-player rule variants available on the internet which try and help the robber problem and other problems. Of the ones we’ve tried (and invented), none seemed simple enough.

Then we had the following idea, which we have tried a few times now and seems to work well:

The robber can only rob for 4 rolls of the dice. In other words if the robber is on a hex (other than the desert) and hasn’t moved for 4 rolls, then after the 4th roll it is returned to the desert hex.

The other value of 4 that we have tried is 3, but I think 4 seems to work slightly better.

We use a coin to maintain the count (you’d think you’d be able to count to 4, but it’s surprisingly tricky). The coin is given, heads-up, to the player who has roll number 2 (that is, the player whose roll is not next). After you roll if you have the coin then you either flip it over, if it was showing heads to begin with; or, return it and the robber to the desert, if it was showing tails.

All the other problems with the 2 player game seem minor in comparison and can be compensated for by playing more games in a row. The 2 player game is quite short.

[Much later: 2009-11-22: Making it not quite so simple, we have added a further rule (to the 2-player game): Longest Road and Largest Army are worth 1 point each; they're roughly twice as easy to get and keep compared to the 4-player game. The "4 turns" rule for the robber continues to work quite well.]

Follow

Get every new post delivered to your Inbox.