Archive for the 'programming' Category

The 64-bit revolution

2012-02-03

Basically, wtf happened here? It’s 2012 and I’m chastised (a bit) on Twitter for installing 64-bit Ubuntu on my 64-bit laptop.

It’s 1993 and as a graduate student in the Computer Lab I get an account on the shared Unix server. A DEC Alpha, running OSF/1. The Alpha was a true 64-bit chip, no 32-bit heritage in its history, no 32-bit mode, and basically no 32-bit arithmetic instructions. Everything was 64-bit, no hostages. As a C programmer this is quite funny. int was still 32-bit (with the usual compiler flags). It was clear that the Alpha with its elegant, very RISC, instruction set was an example a new breed of processor architecture.

In 1994 I started my first job and in the next few years I use and program a variety of 64-bit architectures. The Alpha, UltraSPARC, 64-bit MIPS; I read up on various others (HP PA-RISC 2.0, POWER/PowerPC), because it was clear that the future was one where everyone had a 64-bit RISC workstation on their desk. Unix was clearly the only operating system with a chance of running on 64-bit architectures; sure, I saw, and used briefly, Windows NT running on Alpha, but it was a sorry effort. A 32-bit port of NT onto a 64-bit architecture. Microsoft/Intel were barely making it on 32-bit architectures, never mind 64-bit.

When Intel’s IA-64 architecture came out, I read the manuals. It was clear that: a) Intel hadn’t been paying attention; and, b) no-one would be able to write a decent C compiler for this architecture.

What happens next? I really don’t know. Sun got sucked into Java and took their eye off the ball, every other manufacturer canned their 64-bit line or sold to someone who didn’t care, Intel’s efforts to actually implement their IA-64 architecture (Itanium) were indeed a massive failure, and Intel had to piss away the R&D for 2 or 3 entire generations of CPU design. The amazing thing is that even though Intel basically failed to innovate for like a decade (between Pentium Pro and Core) everyone worshipped them and gave Intel enough time (aka money) to steal their new 64-bit strategy from their competitor AMD. They had the DEC designers, and the fabs, and they eventually produced Core, which was actually pretty good from a hardware perspective. But still based on the awful Intel software architecture.

Which is where we are now.

Does anyone have any idea why all the very sensible and nice 64-bit architectures (SPARC, Alpa, MIPS, PowerPC, not PA-RISC) all failed basically together?

My Favourite Bugs, Part 2

2011-05-12

Another one.

Two of us were working at the client’s site on the software for what was essentially a compact static robot that was responsible for moving small items from a hopper to a delivery point. There was approximately one instance of the hardware that we were writing software for, and it sat on a bench between our two workstations. Most of the mechanical bits were in a more or less finished state. I was particularly impressed with the piston that generated partial vacuum so that items could be picked with a moving arm with suction cups on. Just one or two gears were made of prototyping plastic; and because of a gearing problem the belt didn’t move at the speed that it said it should in the spec. But you know, typical prototype hardware. The electronics were a mixture of off-the-shelf dev kits for 8-bit embedded micros, mini custom circuit boards for novel sensors, and lovingly hand soldered discrete parts. Add to that the fact that as a software guy I didn’t really understand the importance of grounding, and La Machine wasn’t always completely reliable (my colleague had just lent me Tracy Kidder’s Soul of a New Machine).

Various optical/IR sensors kept track of the items as they moved inside the internals of the machine, various other sensors kept track of motor positions and/or speeds. There was a slightly hairy state machine (documented using OmniGraffle) to keep track of it all. The target pick rate was 5 items per second, and as it took more than 200ms for an item to go from the hopper to the point where it left the machine, there could be several items “in-flight” at any one time (and of course, picking an item was never completely reliable, so the sensors were used to track the items, and determine if a retry was required).

This day it seemed to be working fine, except for some reason the software was reporting that items were failing to be delivered, when in fact I could plainly see that items were popping out of the top of the machine. This was causing the machine to prematurely stop, as it would, sensibly, stop picking items if it thought that a picked item was stuck inside the machine somewhere. Up to this point it had basically been working fine; it had been working the same morning. I was sat thinking about this and investigating somewhat. I’d even checked the last thing I’d changed. So I called my colleague over (just at the next desk) and he came over to look while I demonstrated the problem. It worked. There was no problem. Flakies (there’s a memorable part of Kidder’s book where one of the engineers in helping a more junior engineer sort out a problem with a wire-wrap memory board, grabs the whole frame and shakes it, claiming that it’s probably just flakies; of course it works after that).

So that’s okay then. But when I try it again, it’s not working. Colleague comes over. Works. I work at it on my own. Doesn’t work.

It turned out that it was quite a sunny day. The sun reached the window in the afternoon. Sunlight was falling on machine near the optical sensor and presumably bouncing around enough to prevent the sensor from registering the occlusion as the item went past. When my colleague came over, he was standing over the machine and his shadow blocked the sunlight. This wouldn’t be a problem in the production hardware, as it was all in a box (and hence, dark inside).

I constructed an optical shield and installed it on La Machine (a post-it note stuck to the side).

This was not a problem that could be fixed using print. We did have print via the serial port, but printing more than one character was hazardous because it meant that time taken to transmit characters on the serial line would interfere with the realtime operation of the rest of the software (when items are being picked every 200ms, every millisecond counts).

My Favourite Bugs

2011-05-11

I’ve been reading Seibel‘s Coders at Work, a series of interviews with various venerable hacker types. It’s fun!

Seibel often steers the conversation to a few clearly prepared questions. This seems to be a nice strategy, it realigns the interview and keeps it going when someone is banging on about his pet project, and it means we often get several opinions on a topic. One of the stock questions is, “what’s the worst bug you’ve had to deal with?”. Here’s one of mine.

I was part of a small team writing the garbage collector for a new implementation of the Dylan language. Our garbage collector was soft-realtime (pauses for GC for interactive programs were comparable to pauses due to page faults), generational copying, with a read-barrier, implemented on stock hardware and on multiprocessor machines. 32-bit Intel, 64-bit Alpha, 32- or 64-bit MIPS, SPARC (and probably more, I forget).

Our client were the compiler group. They were implementing the compiler for Dylan, and they had to integrate our garbage collector with their runtime. Lots of meetings about object layouts, header formats, hash tables, multithreaded stack allocation, that sort of thing. The compiler development was (finally) going through a bootstrapping phase. The compiler was being used to compile itself (previously the compiler was compiled using a version of Dylan that was “hosted” on a Common Lisp implementation). This was a troublesome phase. Compiling the compiler was typically done overnight, it took several hours. There was a change in the semantics of multimethod dispatch (I think), and it was important to get the first self-hosted implementation of the compiler with the new semantics out to the people using the compiler, so they didn’t accidentally write code to the old semantics.

The overnight compile was usually done on the dual processor SPARC. Unfortunately the overnight compile had stopped. At an assertion in the garbage collector. After running for several hours. It wasn’t really feasible to restart the compilation, I would be on the bus home before it hit the assertion again. And maybe it was due to some thread scheduling that only became a problem on a multiprocessor machine. The assert looked fine. It was checking that some value only increased monotonically. After some time thinking about in our group, we realised that the assert was fine, unless we were on a multiprocessor machine. I think the assertion was outside of a monitored block or something like that. The rest of the code (we reasoned) was fine. So the code was right, but the assertion was wrong.

We could edit the assertion out and ship a new release over to the compiler group. They would integrate the new release and it would be fine the next day. What about all those people in the US that wanted to use the new compiler when they woke later today? They couldn’t wait that long.

We patched the live binary. The overnight compile had stopped at this assertion and was showing it in the debugger. We looked up the NOP instruction in the architecture manuals and poked it in at the location where the assertion branched (or did it trap into the debugger? I forget). So now the assertion never fired. We let the debugger continue the process, and the compile ran fine with no garbage collection problems and finished a couple of hours later. Everyone could use the new compiler today!

This was not a problem that could have been fixed by using print.

Python dictionary: iterating and adding

2010-03-05

On one of my earlier articles about Python dictionaries Jay asks (I paraphrase): What if you want to add items to a dictionary whilst iterating? And what if you want those items to appear in the iteration? Python’s standard dictionary iterators won’t help you.

Here’s a plan:

  1. keep a set, visited of keys that you have already visited;
  2. get the set of keys from the dictionary and subtract visited, leaving you with a set of keys not yet visited, todo;
  3. visit each of the keys in todo;
  4. now we have visited each of the keys in todo, add those keys to visited;
  5. during the iteration the set of keys in the dictionary may have changed, so repeat from (2), stopping when todo is empty.
def iterall(adict):
    """Iterate over all the items in a dictionary,
    including ones which are added during the iteration.
    """
    visited = set()
    while True:
        todo = set(adict.keys()) - visited
        if not todo:
            break
        for k in todo:
            if k in adict:
                yield k, adict[k]
        visited |= todo

Here it is in use:

>>> d=dict(a=1,b=2)
>>> for k,v in iterall(d):
...     print k,v
...     d['c'+k[0]] = 'new'
... 
a 1
b 2
cb new
ca new
cc new

Python’s set datatype makes this easy to program and it has reasonable performance.

Obligatory Python whinge: The function implemented above, iterall, is effectively a new method that I’ve implemented for the dict class. But I can’t add a new method to a class that someone else implemented. Unlike, say, Common Lisp, Dylan, Smalltalk, and Objective-C, where I can.

Java Implementation of Lua Language

2009-12-11

Just a quick note for those that don’t follow me on twitter:

I have recently announced the open sourcing of Jill: Java Implementation of Lua Language. It’s a little thing I (mostly) wrote 3 years ago, and it’s taken far too long to get it open sourced, but it’s open now.

No doubt it will prompt me to write one or two blog posts on JME type things.

Python: slicing with zip

2009-11-26

Wherein I feel compelled to write some more on Python code that I find more amusing than clear.

The more I use zip the more I love it. I’m thinking about writing a tutorial on how to (ab-) use zip, but for now just this recent discovery.

Say you have two iterators that each yield a stream of objects, iland and iocean (they could be gridded temperature values, say), and you want to get the first 100 values from each iterator to do some processing, whilst not consuming any more than 100 values. You can’t go list(iland)[:100] because that will consume the entire iland iterator and you’ll never be able to get those values past the 100th again.

You can use itertools (probably my second favourite Python module):

land100 = list(itertools.islice(iland, 100))
ocean100 = list(itertools.islice(iocean, 100))

It seems a shame to mention islice and 100 twice. One could use map with a quick pack and unpack, but this is not clear:

land100,ocean100 = map(lambda i: list(itertools.islice(i, 100)), (iland,iocean))

(a simple form of this, which I do sometimes use, is x,y = map(int, (x,y)))

What about giving some love to zip? It turns out that zip will stop consuming as soon as any argument is exhausted. So

zip(range(100), iland, iocean)

returns a list of 100 triples, each triple having an index (the integer from 0 to 99 from the range() list), a value from the iland iterator, and a value from the iocean iterator. And as soon as the list produced by range(100) is exhausted it stops consuming from iland and iocean, so their subsequent values can be consumed by other parts of the program.

And yes, this seems to work by relying on a rather implementation specific feature of zip that I’m not sure should be set in stone.

That zip form above is all very good if one wants to go for n,land,ocean in ..., but what if we want the 100 land values and 100 ocean values each in their own list (like the code at the beginning of the article)? We can use zip again!

_,land100,ocean100 = zip(*zip(range(100), iland, iocean))

zip(*thing) turns a list of triples into a triple of lists, which is then destructured into the 3 variables _ (a classic dummy), land100, and ocean100.

Don’t worry, the actual code use the islice form from the first box because I think it’s the clearest.

Python: Some Naughty Features

2009-06-03

Things in Python that I mostly like, but make me feel a bit naughty when I use them. I air my smelly code as a sort of cleansing ritual and perhaps to make you feel a little bit better about your own dirty habits.

zip(*[iter(s)]*n)

I now tend to call this zip/iter thing group. It came to me in a dream.

itertools.chain(*s)

The “inverse” of the above. I only “discovered” this recently. A good couple of months after the above discovery.

Bound methods of simple values. Example: In order to convert a sequence of image samples from perceptual space to linear space we need to raise each one to the power 2.2:

map((2.2).__rpow__, mylist)

«(2.2).__rpow__» is equivalent to the following function:

def f(x): return x**2.2

(as a lambda: «lambda x: x**2.2»)

__rpow__ is the swopped version of __pow__. Using «(2.2).__pow__» would be equivalent to «2.2**x», which isn’t what I want at all.

Literal string concatenation. To be honest, I’ve no idea why this feels a bit naughty to me, but it does. C has it, so why not in Python? Perhaps it’s because I often end up using it in cases where there are other problems with the code (this example from PyPNG):

    raise FormatError("Illegal combination of bit depth (%d)"
      " and colour type (%d)."
      " See http://www.w3.org/TR/2003/REC-PNG-20031110/#table111 ."
      % (self.bitdepth, self.color_type))

Sometimes I get a bit OCD about this and start removing extraneous string concatenation «+» operators from other people’s code.

The split thing that I mentioned in an earlier article. Here’s an example from Clear Climate Code:

noaa = """
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.temperature.inv
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/hcn_doe_mean_data.Z
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/station.inventory.Z
""".split()

It just seems so much nicer to be able to add a URL to the list without having to add any more syntax (and in this case, I think I originally created the list by pasting it in from a text file, so using split meant less reformatting).

Doing non-trivial things in a module’s top-level. Another example from PyPNG:

_signature = struct.pack('8B', 137, 80, 78, 71, 13, 10, 26, 10)

This is just a complicated way of writing an 8-byte literal string. It’s written this way so that the crucial 8 bytes are visible as decimal values, and can therefore be checked more easily against the relevant part of the PNG standard which also uses a decimal notation.

Passing a function with side-effects into map. pipcolour (an undocumented utility in PyPNG) counts the number of colours used in an image. In the code below, pixels is an iterator that yields each row of a PNG image; the pixels from each row are added to a pre-existing set of colours. Like this:

    col = set()
    for row in pixels:
        # Ewgh, side effects on col
        map(col.add, png.group(row, planes))

(in this code png.group is the grouping function which is the first item in this article; for an RGB image, planes is 3)

Now, it strikes me that I could’ve gone:

set(group(itertools.chain(*pixels), planes))

but that has radically different memory usage. The first form only loads one row at a time into its working set; the second form loads the entire decompressed PNG image into the working set (it’s all the fault of «*pixels»). If I was daring enough to require Python 2.6 then I could’ve used itertools.chain.from_iterable, but that feels clumsy. It one of those cases where I feel that Python is unnecessarily forcing me to choose between clarity and efficiency.

I think the above also illustrates a weakness with the sort of delayed evaluation programming that you get when using iterators: Apparently simple transformations to the code can result in huge changes to the space/time complexity. This is not a weakness that is unique to Python, any style of delayed evaluation has this risk (Haskell’s lazy evaluation, for example). When I was learning Prolog my chief criticism of it (and I think it’s still valid) was that it seemed to be fairly easy to be sure that your program would compute the correct answer, but very difficult to be sure that the answer would appear before the heat death of the universe.

Violating PEP 8 so I can write one-line functions:

    def test_gradient_horizontal_lr(x, y): return x
    def test_gradient_horizontal_rl(x, y): return 1-x
    def test_gradient_vertical_tb(x, y): return y
    def test_gradient_vertical_bt(x, y): return 1-y

Did you spot I did one of those earlier? How naughty. Of course PEP 8 kind of approves of this, because “a foolish consistency is the hobgoblin of little minds” which you probably know from PEP 8 but is of course from Ralph Waldo Emerson. In a curious coincidence I just happen to have a copy of his “Self-Reliance” on my desk; and I checked, he really did say that. In fact, the fuller quote is much more poetic than I was expecting: “A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines.”.

Indexing with a boolean expression. Yet another example comes from PyPNG and illustrates how well written it is. Say you want to create an array of pixel values or you want to pack them into a string (using struct). If the number of bits per value (the bitdepth) is <= 8 then you want to use «array('B')», otherwise you want «array('H')». Select the format like this:

fmt = 'BH'[self.bitdepth > 8]

Iverson’s Convention for the win! (and it’s better than PEP 308 because it works on Python 2.4)

Now, this is naughty: Creating a naughty bit of Python code merely for purposes of this blog article. Well, that’s not quite true, I got distracted by some Python code and that code and a stupid idea I had a few days ago clicked together. The code is witty, but should never be used. PyPNG allows the caller to specify the zlib compression level as an optional argument when creating a png.Writer object. This is the argument to the zlib.compressobj constructor. The default (for PyPNG) is to use zlib’s default. Now it’s clearly not the business of PyPNG to know what zlib’s default is, so I don’t want that appearing in the code. So the code looks like this:

if self.compression is not None:
    compressor = zlib.compressobj(self.compression)
else:
    compressor = zlib.compressobj()

Now, what I really want is a way to say «pass in x as an argument, but only if it’s non None». If I had some expression, E, that evaluated to a list that was either the singleton, «[x]», or the empty list «[]» then I could get rid of the if and go:

compressor = zlib.compressobj(*E)

«*E» would expand to either the empty argument list, or an argument list with just one argument.

I discovered an E: [x for x in [self.compression] if x is not None]. So now the code is:

compressor = zlib.compressobj(*[x for x in [self.compression] if x is not None])

Well, the code isn’t really that, I’m not that much of a monster to check that into source control.

What Good is PyPNG? (and some MDIS pictures of Earth)

2009-05-06

I originally wrote PyPNG so that I could manipulate PNG images using Python. My original itch was a tiling application. Since then I’ve been using to develop all sort of little image utilities and investigate PNGs. pipdither is written in Python and the initial version took about an afternoon of fairly lazy coding.

Python is a great language, and now that I’ve got a module for creating PNGs, it’s easy to create other image tools in Python.

Here is a link to an image of Earth taken from the Messenger spacecraft (a 2.1e6 octet file that you won’t be able to read). Messenger was on an Earth flyby on its way to map mercury when it took the image. The image is archived by NASA’s PDS and stored in its IMG format. Which is basically a whole boatload of metadata followed by the image data in row major order at 16-bits per pixel.

Because it’s basically just a bunch of numbers after the metadata, this is really easy to grok using Python. Here it is converted to PNG using PyPNG (this is a small 8-bit version, follow the link to get the full 12-bit version):

ew0031477633c-black00-320

It looks rubbish. Someone has turned the brightness up too high on the telly. It looks rubbish for two reasons: a) I embedded a gAMA chunk to specify a gamma of 1.0 (linear); and, b) the camera’s black-level does not correspond to a code of 0. Here’s a histogram:
ew0031477633c-histogram

Sorry the histogram is a bit rubbish (the image is 12-bit, so the histogram is 4096 pixels wide). But you should be able to see that the first chunk of codes (on the left) are unused. That’s because even though there’s no light falling on the sensor, there’s thermal noise or some residual current drain, or something. So picture black is represented by a code of about 7%. And because I specified a gamma of 1.0 that 7% value in the PNG file actually looks not very dark at all.

Actually I had to fiddle with wordpress quite a lot to get it looking this bad. If left to its own devices wordpress will create its own scaled down version of the image and it THROWS AWAY the gAMA chunk. If your computer ignores the PNG’s gamma chunk then maybe the image doesn’t look to bad to you (a grey level of about 7% in some sort of perceptual space looks fairly black, not actually black, but good enough that you probably won’t tell unless you open a black image next to it).

So clearly to get a good image I should choose some black-level, 7% say, subtract 7% from all the pixel values and rescale to fit the full range. Yeah I could. But that would destroy the original data and impose my own processing on it. What if you wanted to do your own black-level compensation?

I had an idea. I could create an ICC profile that clamps the bottom 7% codes to black, then embed the profile in the PNG. The profile looks like this (in Apple’s ColorSync utility):
icc-screenshot

Apologies (yet again) for the fact that the screenshot doesn’t fit. WordPress would’ve scaled it, but apparently it falls over when trying to scale a 4-channel PNG. The screenshot has an alpha channel because it’s a shot of a Window and the window has chamfered corners which require alpha. How amusing.

Anyway. Input is on the x-axis, output is on the y-axis. You can see that this TRC curve maps the bottom 7% into 0, and the rest goes linearly.

This being the first time I’ve deliberately embedded a colour profile into a PNG image, I’m slightly gobsmacked that it actually works:

ew0031477633c-black07-320

You get the best of both worlds. A plausible looking picture for viewing, and all the original image data if you want to dink around with it yourself.

I’m pretty sure that a gamma of 1 is correct, by the way, because the values in the IMG file are raw 12-bit sensor values and they generally respond linearly to light. Actually they’re not quite raw, the onboard computer has compressed the image before it was transmitted to Earth, you can see compression artefacts in the “noise” in the dark regions of the picture (if you look at the version that doesn’t have a colour profile).

PS If these images look the same, your computer isn’t doing colour collection. It sucks. It works on a Mac [ed: if you use Safari]. Try my later article to see a simulation of what it looks like when it works.

2-bit dither

2009-04-30

An earlier article suggests we should pay attention to gamma, at least when the output is 1-bit deep. How should we dither when we want the output to be more than 1-bit deep. Say, dithering 8-bit input down to 2-bit output?

We need to decide:

- What output codes shall we use?
- How do we choose the each output code?

A general purpose dithering routine can use an arbitrary set of output codes. The number of output codes is determined by the desired bit-depth of the output. If we have a 2-bit output file, then that’s 4 output codes.

The easiest output codes to use are linear ones, because we’re doing the dithering in linear space (we are, right? Well, part of the reason for this blog article is to explore when that is worth doing). For example, a 2-bit output code in linear space can be selected with «int(round(v*3.0))». There’s a problem: this sucks. It sucks for all the same reasons that we use perceptual codes in the first place: we’re pissing away (output) precision on areas of the colour space where humans can’t perceive differences anyway; not enough buck for each bit.

This 8-bit ramp is dithered down to 2-bits and the gamma (of the 2-bit output image) is 1.0:

A dithered ramp.  Output codes linear.

Black pixels extend from the left almost to the middle, so just 1 of the 3 adjacent pairs of output colours (0 n 1, 1 n 2, 2 n 3) are dithered over 50% of the image area. That doesn’t seem like a good thing. Also, the 2 lightest colours are very close to each other (perceptually), giving the effect that the right-hand end of the ramp looks “blown out” with not enough shading.

So, the output codes ought to be in some sort of perceptual space (mostly); it makes sense to allow the user to specify the output codes. Not just how many output codes (the bit depth), but also the distribution of the output codes, which amounts to selecting the output gamma. Yeah, I guess in general you want to specify some calibrated colour space; gamma is a simplification.

Two curves show the difference between picking output codes in linear space (gamma = 1.0) and in a perceptual space (gamma = 1.8). The (relative) intensity values appear on the right:


Error diffusion dithering (which is what pipdither does) is a general term for a number of algorithms all of which boil down to:
- pick an order to consider the pixels of an image;
- for each pixel:
– from the pixel’s value v, select an output code with value t;
– the error, v-t, is distributed to pixels not yet considered by adding some fraction of the error onto each of a number of pixels;

The differences between most error diffusion dithering algorithms amount to what pixels the error is distributed to and what fraction each pixel receives. I won’t be considering the error diffusion step in detail in this article.

So, how do we choose the output code? Since we are doing the dithering in linear space, we have presumably converted the input image into linear values (greyscale intensity). The easy thing then is to arrange the output codes in the same linear space (in other words, convert all the output codes into linear space), and pick the nearest output code corresponding to the value of each pixel. Actually pipdither allows you to favour higher codes or lower codes by placing the cutoff point between each pair of adjacent output codes some fraction of the way between them; the fraction can be specified (with the -c option in the unlikely event that anyone actually wants to use pipdither), and defaults to 0.75, favouring lower output codes. I may explain why in a later article.

The one problem with this algorithm: the images it produces look almost the same as the naïve algorithm that just ignores all considerations of gamma (pretending that the input and output are linear when in fact both are coded in perceptual space). So our new improved algorithm is just pissing away CPU performance for almost no gain in image quality. Still, at least we can be smug in the knowledge that it Does The Right Thing.

Can we actually tell any difference between my Right Thing gamma-aware algorithm and the naïve algorithm which knows nothing about gamma (and hence just pretends everything is linear)? The middle image (below) is a greyscale gradient with 256 grey values (8-bit). The other two images are dithers down to 2-bit. Can you tell which image is which algorithm?

ramp-dither-gamma

ramp1

ramp-dither-lin

Here’s a couple of spacemen. I think the difference between the two algorithms is more pronounced:
as12-49-7281-b2-gamma

as12-49-7281-b2-lin

And I’d just like to point out that because of Apple’s 2-bit PNG bug, all of these “2-bit” PNGs are in fact 8-bit PNGs that just happen to only use 4 codes (I use pipdither -b 8 to increase a PNG image’s bit depth; no dithering takes places in that case).

Dither and Gamma

2009-04-03

Where every pixel counts.

Dithering (in images) turns this:
swatch

Into this:
swatch-dither

As you can do doubt see, the upper image has shades of grey, the lower image has only black and white. The intermediate shades have been approximated by a sort of pseudo random dotty pattern. The lower image is a dithered version of the upper image.

So, one question is: “For a given grey area, how many white dots should we use, on average?” Well, opinions vary (and rightly so), but here’s one way of thinking about the problem: When viewed from a modest distance the dithered image should emit the same number of photons as the greyscale image would have (at least approximately). And that should be true for any reasonably sized patch of the image too. I’m aware that this is just one method of determining what an ideal dithering should do, but it’s the one I’m going to adopt for this article.

So a “50% grey” patch should, when dithered, end up as roughly 50% black pixels and 50% white pixels. There’s a problem. Gamma.

If your greyscale image file has the value 128 (out of 255) for a pixel, that doesn’t mean “produce a grey pixel that emits 50% as many photons as a white pixel”. Oh no. What it probably means in reality (regardless of what the file format says it should), is “write the value 0×80 into the framebuffer”, and what that actually corresponds to is “produce a pixel that emits 18% as many photons as a white pixel” (0×80 corresponds to a fraction of about 0.5; framebuffer values often correspond linearly to voltages on the CRT gun, and the CRT responds like x2.5. And 0.52.5 is about 0.18). The details are complicated and vary a lot, but the bottom line is: When dithering, a pixel value of 128/255 does not correspond to 50% intensity.

Gamma is the exponent used to convert from linear intensity space to gamma encoded space (when < 1), and at the other end to convert from gamma encoded space to photons (when > 1).

So when I said “50% grey”, above, implicitly meaning “a grey that emits 50% of the photons that white emits”, in a gamma encoded image that corresponds to a much larger pixel value. If the image is encoded with a gamma of 0.45 (a typical value used in “don’t care” PC image processing), then the encoded value of a grey that is 50% as intense as white is 0.50.45 = 0.732. Such a grey actually looks quite light.

So when you dither an image you had better convert the pixel values into a linear light space. If you don’t, then an area that has a colour of #808080 will come out with 50% white pixels, instead of 18%. Does it matter? Well, it certainly makes a difference:
as12-49-7281-g045-320

A crop from Nasa’s AS12-49-7281 showing Charles Conrad Jr. reflected in Alan L. Bean’s visor.

Dithered assuming a gamma of roughly 0.45:
g045

Dithered assuming image represents linear light intensity (in other words a gamma of 1, what you do if you just applied the dithering algorithm with no adjustment):
glin

Well, I hope you can see that the two dithered images are different. In the lower image the obvious differences are that the bloom-effect on the left hand side of the image is much more pronounced than it should be, and the visor looks more washed out than it really should be. The lens of Bean’s camera is also the wrong shade in the lower image, but that actually picks out a detail that we cannot see in the upper image, so it’s not all “bad”.

The author of Netpbm must have had a similar realisation. In 2004 pgmtopbm, which did dithering but without any gamma correction, was superseded by pamditherbw, which gamma corrects and dithers in one.

Why did I dither assuming a gamma of 0.45? Well, because the source image was a JPEG that appeared to contain no information about how it had been gamma encoded. If I assume that the picture is displayed on typical crappy (PC) hardware and has been adjusted until that looks okay, then that corresponds to an encoding gamma of about 0.45 (1/2.2). One way to think about this is that PCs implicitly assume that images they are showing have been encoded with a gamma of 0.45; if they’re made by people on PCs then they probably have effectively been encoded with a gamma 0.45 whether the person making the image knows it or not. Incidentally OS X, if left to its own devices, has an implicit encoding assumption of 0.56 (1/1.8).

The images on this page are PNG images with their gamma encoded expressly specified by gAMA chunk. On my machine, Mac OS X actually takes notice of a PNG’s gamma encoding and adjusts the displayed image accordingly (the implicit encoding of 0.56 mentioned above comes from the fact that a PNG image with no gamma encoding specified and a PNG image with a gamma of 0.56 will display identically (assuming they have the same pixels values!)). So if you’re viewing on a Mac, there’s a good chance that you’ll see what I see. That’s important for the greyscale version of the astronaut picture because the gamma encoding actually makes it display a little bit darker than it otherwise would.

The dithering is actually done by a Python program that uses PyPNG. The first dithered image is the result from «pipdither < in.png» where in.png is the greyscale PNG including a gAMA chunk (which pipdither respects by default); the second, lower, dithered image is the result from «pipdither -l» (the -l option causes it to assume linear input). At the moment pipdither (and friends) live, undocumented and unloved, in the code/ subdirectory of PyPNG’s source.

By the way, these dithered images are a great way to expose the fact that Exposé on a Mac point samples the windows when it resizes them. Try it, and learn to love the aliases. Do graphics cards have a convolution engine in them yet? And no, I wasn’t really counting bilinear filtering (though that would be a lot better than what Exposé does right now).

Dithering isn’t just used to convert an image to bilevel black-and-white. It can be used to convert an image to use any particular fixed set of colours (for example, the “web-safe” colours). Reducing the bit depth of an image, for example converting from 8-bit to 6-bit values, can be seen a special case of this (in that the fixed set is the complete lattice of 6-bit values). pipdither can’t (yet) do arbitrary conversion, it can currently only be used to reduce to a greyscale image’s bit depth. It currently has a bug whereby it effectively produces a PNG file in linear intensity space, but doesn’t generate a gAMA to tell you that.

So when you dither you ought to gamma correct the samples. But what value of gamma to use? This is problematic: Many images do not specify their encoding gamma, even when the file format allows it; even when it is specified, the gamma might include a factor for rendering intent (both PNG spec section 12, and Poynton’s Rehabilitation of Gamma suggest that it is reasonable for the gamma to be tweaked to allow for probable viewing conditions); you might have your own reasons (artistic intent, let’s say) for choosing a particular gamma.

I think the practical upshot of this is that any dithering tool ought to let you specify what gamma conversion to use (probably on both input and output). Does yours?

Follow

Get every new post delivered to your Inbox.