Archive for the 'graphics' Category

Earth, again


So unless you’re using OS X (or, apparently, Firefox 3.5b4 on Vista), my earlier article and MDIS and embedded colour profiles and what-not must’ve seemed a little bit mysterious. The two pictures of Earth will have looked the same (on computers that ignore the ICC profile embedded in the PNG).

It turns out that Apple’s ColorSync Utility will let you change an image’s ICC profile: either editing the image to fit the new profile, or just slapping a different profile on (I discovered this by reading the Help, making it the second useful fact I’ve learnt from OS X Help). Strangely, ColorSync can also resize an image. And it’s a magnification tool. Anyway, remapping the images into sRGB allows me to show you a simulated version of what I see. Your computer will ignore the embedded sRGB profile, but that’ll be okay, because it’ll look roughly right. That’s what sRGB is designed to do.




The top image is a simulation of the raw MDIS camera values in a linear space. The bottom image is the corrected version using the ICC profile I showed in the earlier article.

The middle image is the “just blat the pixels onto the graphics card” version that ignores all gamma and ICC profiles. As you can see, in terms of black level, it’s hard to distinguish its background from your display’s true black unless you put a true black right next to it.

PS This is supposed to be Earth. Does anyone recognise what we’re looking at? Technically there’s enough metadata in the original IMG files for me to answer that question, since the exact time of the exposure is recorded as well as exactly where the camera was pointing, but like that’s going to happen.

What Good is PyPNG? (and some MDIS pictures of Earth)


I originally wrote PyPNG so that I could manipulate PNG images using Python. My original itch was a tiling application. Since then I’ve been using to develop all sort of little image utilities and investigate PNGs. pipdither is written in Python and the initial version took about an afternoon of fairly lazy coding.

Python is a great language, and now that I’ve got a module for creating PNGs, it’s easy to create other image tools in Python.

Here is a link to an image of Earth taken from the Messenger spacecraft (a 2.1e6 octet file that you won’t be able to read). Messenger was on an Earth flyby on its way to map mercury when it took the image. The image is archived by NASA’s PDS and stored in its IMG format. Which is basically a whole boatload of metadata followed by the image data in row major order at 16-bits per pixel.

Because it’s basically just a bunch of numbers after the metadata, this is really easy to grok using Python. Here it is converted to PNG using PyPNG (this is a small 8-bit version, follow the link to get the full 12-bit version):


It looks rubbish. Someone has turned the brightness up too high on the telly. It looks rubbish for two reasons: a) I embedded a gAMA chunk to specify a gamma of 1.0 (linear); and, b) the camera’s black-level does not correspond to a code of 0. Here’s a histogram:

Sorry the histogram is a bit rubbish (the image is 12-bit, so the histogram is 4096 pixels wide). But you should be able to see that the first chunk of codes (on the left) are unused. That’s because even though there’s no light falling on the sensor, there’s thermal noise or some residual current drain, or something. So picture black is represented by a code of about 7%. And because I specified a gamma of 1.0 that 7% value in the PNG file actually looks not very dark at all.

Actually I had to fiddle with wordpress quite a lot to get it looking this bad. If left to its own devices wordpress will create its own scaled down version of the image and it THROWS AWAY the gAMA chunk. If your computer ignores the PNG’s gamma chunk then maybe the image doesn’t look to bad to you (a grey level of about 7% in some sort of perceptual space looks fairly black, not actually black, but good enough that you probably won’t tell unless you open a black image next to it).

So clearly to get a good image I should choose some black-level, 7% say, subtract 7% from all the pixel values and rescale to fit the full range. Yeah I could. But that would destroy the original data and impose my own processing on it. What if you wanted to do your own black-level compensation?

I had an idea. I could create an ICC profile that clamps the bottom 7% codes to black, then embed the profile in the PNG. The profile looks like this (in Apple’s ColorSync utility):

Apologies (yet again) for the fact that the screenshot doesn’t fit. WordPress would’ve scaled it, but apparently it falls over when trying to scale a 4-channel PNG. The screenshot has an alpha channel because it’s a shot of a Window and the window has chamfered corners which require alpha. How amusing.

Anyway. Input is on the x-axis, output is on the y-axis. You can see that this TRC curve maps the bottom 7% into 0, and the rest goes linearly.

This being the first time I’ve deliberately embedded a colour profile into a PNG image, I’m slightly gobsmacked that it actually works:


You get the best of both worlds. A plausible looking picture for viewing, and all the original image data if you want to dink around with it yourself.

I’m pretty sure that a gamma of 1 is correct, by the way, because the values in the IMG file are raw 12-bit sensor values and they generally respond linearly to light. Actually they’re not quite raw, the onboard computer has compressed the image before it was transmitted to Earth, you can see compression artefacts in the “noise” in the dark regions of the picture (if you look at the version that doesn’t have a colour profile).

PS If these images look the same, your computer isn’t doing colour collection. It sucks. It works on a Mac [ed: if you use Safari]. Try my later article to see a simulation of what it looks like when it works.

2-bit dither


An earlier article suggests we should pay attention to gamma, at least when the output is 1-bit deep. How should we dither when we want the output to be more than 1-bit deep. Say, dithering 8-bit input down to 2-bit output?

We need to decide:

– What output codes shall we use?
– How do we choose the each output code?

A general purpose dithering routine can use an arbitrary set of output codes. The number of output codes is determined by the desired bit-depth of the output. If we have a 2-bit output file, then that’s 4 output codes.

The easiest output codes to use are linear ones, because we’re doing the dithering in linear space (we are, right? Well, part of the reason for this blog article is to explore when that is worth doing). For example, a 2-bit output code in linear space can be selected with «int(round(v*3.0))». There’s a problem: this sucks. It sucks for all the same reasons that we use perceptual codes in the first place: we’re pissing away (output) precision on areas of the colour space where humans can’t perceive differences anyway; not enough buck for each bit.

This 8-bit ramp is dithered down to 2-bits and the gamma (of the 2-bit output image) is 1.0:

A dithered ramp.  Output codes linear.

Black pixels extend from the left almost to the middle, so just 1 of the 3 adjacent pairs of output colours (0 n 1, 1 n 2, 2 n 3) are dithered over 50% of the image area. That doesn’t seem like a good thing. Also, the 2 lightest colours are very close to each other (perceptually), giving the effect that the right-hand end of the ramp looks “blown out” with not enough shading.

So, the output codes ought to be in some sort of perceptual space (mostly); it makes sense to allow the user to specify the output codes. Not just how many output codes (the bit depth), but also the distribution of the output codes, which amounts to selecting the output gamma. Yeah, I guess in general you want to specify some calibrated colour space; gamma is a simplification.

Two curves show the difference between picking output codes in linear space (gamma = 1.0) and in a perceptual space (gamma = 1.8). The (relative) intensity values appear on the right:

Error diffusion dithering (which is what pipdither does) is a general term for a number of algorithms all of which boil down to:
– pick an order to consider the pixels of an image;
– for each pixel:
– from the pixel’s value v, select an output code with value t;
– the error, v-t, is distributed to pixels not yet considered by adding some fraction of the error onto each of a number of pixels;

The differences between most error diffusion dithering algorithms amount to what pixels the error is distributed to and what fraction each pixel receives. I won’t be considering the error diffusion step in detail in this article.

So, how do we choose the output code? Since we are doing the dithering in linear space, we have presumably converted the input image into linear values (greyscale intensity). The easy thing then is to arrange the output codes in the same linear space (in other words, convert all the output codes into linear space), and pick the nearest output code corresponding to the value of each pixel. Actually pipdither allows you to favour higher codes or lower codes by placing the cutoff point between each pair of adjacent output codes some fraction of the way between them; the fraction can be specified (with the -c option in the unlikely event that anyone actually wants to use pipdither), and defaults to 0.75, favouring lower output codes. I may explain why in a later article.

The one problem with this algorithm: the images it produces look almost the same as the naïve algorithm that just ignores all considerations of gamma (pretending that the input and output are linear when in fact both are coded in perceptual space). So our new improved algorithm is just pissing away CPU performance for almost no gain in image quality. Still, at least we can be smug in the knowledge that it Does The Right Thing.

Can we actually tell any difference between my Right Thing gamma-aware algorithm and the naïve algorithm which knows nothing about gamma (and hence just pretends everything is linear)? The middle image (below) is a greyscale gradient with 256 grey values (8-bit). The other two images are dithers down to 2-bit. Can you tell which image is which algorithm?




Here’s a couple of spacemen. I think the difference between the two algorithms is more pronounced:


And I’d just like to point out that because of Apple’s 2-bit PNG bug, all of these “2-bit” PNGs are in fact 8-bit PNGs that just happen to only use 4 codes (I use pipdither -b 8 to increase a PNG image’s bit depth; no dithering takes places in that case).

Dither and Gamma


Where every pixel counts.

Dithering (in images) turns this:

Into this:

As you can do doubt see, the upper image has shades of grey, the lower image has only black and white. The intermediate shades have been approximated by a sort of pseudo random dotty pattern. The lower image is a dithered version of the upper image.

So, one question is: “For a given grey area, how many white dots should we use, on average?” Well, opinions vary (and rightly so), but here’s one way of thinking about the problem: When viewed from a modest distance the dithered image should emit the same number of photons as the greyscale image would have (at least approximately). And that should be true for any reasonably sized patch of the image too. I’m aware that this is just one method of determining what an ideal dithering should do, but it’s the one I’m going to adopt for this article.

So a “50% grey” patch should, when dithered, end up as roughly 50% black pixels and 50% white pixels. There’s a problem. Gamma.

If your greyscale image file has the value 128 (out of 255) for a pixel, that doesn’t mean “produce a grey pixel that emits 50% as many photons as a white pixel”. Oh no. What it probably means in reality (regardless of what the file format says it should), is “write the value 0x80 into the framebuffer”, and what that actually corresponds to is “produce a pixel that emits 18% as many photons as a white pixel” (0x80 corresponds to a fraction of about 0.5; framebuffer values often correspond linearly to voltages on the CRT gun, and the CRT responds like x2.5. And 0.52.5 is about 0.18). The details are complicated and vary a lot, but the bottom line is: When dithering, a pixel value of 128/255 does not correspond to 50% intensity.

Gamma is the exponent used to convert from linear intensity space to gamma encoded space (when < 1), and at the other end to convert from gamma encoded space to photons (when > 1).

So when I said “50% grey”, above, implicitly meaning “a grey that emits 50% of the photons that white emits”, in a gamma encoded image that corresponds to a much larger pixel value. If the image is encoded with a gamma of 0.45 (a typical value used in “don’t care” PC image processing), then the encoded value of a grey that is 50% as intense as white is 0.50.45 = 0.732. Such a grey actually looks quite light.

So when you dither an image you had better convert the pixel values into a linear light space. If you don’t, then an area that has a colour of #808080 will come out with 50% white pixels, instead of 18%. Does it matter? Well, it certainly makes a difference:

A crop from Nasa’s AS12-49-7281 showing Charles Conrad Jr. reflected in Alan L. Bean’s visor.

Dithered assuming a gamma of roughly 0.45:

Dithered assuming image represents linear light intensity (in other words a gamma of 1, what you do if you just applied the dithering algorithm with no adjustment):

Well, I hope you can see that the two dithered images are different. In the lower image the obvious differences are that the bloom-effect on the left hand side of the image is much more pronounced than it should be, and the visor looks more washed out than it really should be. The lens of Bean’s camera is also the wrong shade in the lower image, but that actually picks out a detail that we cannot see in the upper image, so it’s not all “bad”.

The author of Netpbm must have had a similar realisation. In 2004 pgmtopbm, which did dithering but without any gamma correction, was superseded by pamditherbw, which gamma corrects and dithers in one.

Why did I dither assuming a gamma of 0.45? Well, because the source image was a JPEG that appeared to contain no information about how it had been gamma encoded. If I assume that the picture is displayed on typical crappy (PC) hardware and has been adjusted until that looks okay, then that corresponds to an encoding gamma of about 0.45 (1/2.2). One way to think about this is that PCs implicitly assume that images they are showing have been encoded with a gamma of 0.45; if they’re made by people on PCs then they probably have effectively been encoded with a gamma 0.45 whether the person making the image knows it or not. Incidentally OS X, if left to its own devices, has an implicit encoding assumption of 0.56 (1/1.8).

The images on this page are PNG images with their gamma encoded expressly specified by gAMA chunk. On my machine, Mac OS X actually takes notice of a PNG’s gamma encoding and adjusts the displayed image accordingly (the implicit encoding of 0.56 mentioned above comes from the fact that a PNG image with no gamma encoding specified and a PNG image with a gamma of 0.56 will display identically (assuming they have the same pixels values!)). So if you’re viewing on a Mac, there’s a good chance that you’ll see what I see. That’s important for the greyscale version of the astronaut picture because the gamma encoding actually makes it display a little bit darker than it otherwise would.

The dithering is actually done by a Python program that uses PyPNG. The first dithered image is the result from «pipdither < in.png» where in.png is the greyscale PNG including a gAMA chunk (which pipdither respects by default); the second, lower, dithered image is the result from «pipdither -l» (the -l option causes it to assume linear input). At the moment pipdither (and friends) live, undocumented and unloved, in the code/ subdirectory of PyPNG’s source.

By the way, these dithered images are a great way to expose the fact that Exposé on a Mac point samples the windows when it resizes them. Try it, and learn to love the aliases. Do graphics cards have a convolution engine in them yet? And no, I wasn’t really counting bilinear filtering (though that would be a lot better than what Exposé does right now).

Dithering isn’t just used to convert an image to bilevel black-and-white. It can be used to convert an image to use any particular fixed set of colours (for example, the “web-safe” colours). Reducing the bit depth of an image, for example converting from 8-bit to 6-bit values, can be seen a special case of this (in that the fixed set is the complete lattice of 6-bit values). pipdither can’t (yet) do arbitrary conversion, it can currently only be used to reduce to a greyscale image’s bit depth. It currently has a bug whereby it effectively produces a PNG file in linear intensity space, but doesn’t generate a gAMA to tell you that.

So when you dither you ought to gamma correct the samples. But what value of gamma to use? This is problematic: Many images do not specify their encoding gamma, even when the file format allows it; even when it is specified, the gamma might include a factor for rendering intent (both PNG spec section 12, and Poynton’s Rehabilitation of Gamma suggest that it is reasonable for the gamma to be tweaked to allow for probable viewing conditions); you might have your own reasons (artistic intent, let’s say) for choosing a particular gamma.

I think the practical upshot of this is that any dithering tool ought to let you specify what gamma conversion to use (probably on both input and output). Does yours?

PNG filtering for small bit depths


Willem van Schaik’s PngSuite is a most welcome set of PNG test images.

Mysteriously, it has no test images that have a non-zero scanline filter and a bit depth of < 8.

I give you f99n0g04.png:


It uses a different scanline filter on successive lines, cycling round them all.

This is a useful test-case to have because the scanline filtering works at the byte level. Normally, when filtering, a byte from one pixel is compared to a byte from the previous pixel, but when the bit depth is < 8 there are several pixels per byte. So instead, the previous byte is used (which will comprise several of the earlier pixels, not necessarily even the immediately preceding one). So you can imagine getting some of the previous-byte/previous-pixel logic wrong. Basically it’s a special case, and the PngSuite doesn’t cover it.

Of course, the only PNG decoder I’ve found that barfs on this image, is PyPNG (a pure Python PNG decoder). And it only broke whilst I was in the middle of implementing small bit depth decoding.

[2009-02-24: Willem added this image to PngSuite]

I crush OptiPNG!


(A further article in a mini-series about small PNGs. GDR’s earlier article; my earlier article. Neither is required reading, but you might need to get comfy with the PNG spec)

Consider the following PNG file:

It’s 29 (by 1) pixels. It’s the output of «echo P2 29 1 15 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 | pnmtopng». It’s 81 bytes long.

Here it is scaled to be a bit bigger:


OptiPNG is a great tool. It messes with all those PNG compression options (zlib compression options, scanline filtering, and so on) to search for better compression. Given the above 81 byte image it produces the following 69 byte image:

(this is of course an identical looking image, I’m just showing it so you can see the evidence)

I produce the following 68 byte image:

So how did I do one byte better than OptiPNG? Well I cheated; what did you expect?

I discovered a redundancy in the image format that I suspected that OptiPNG would not search over; then I created an image where hand selecting the redundant parts leads to better compression.

When using a bit depth less than 8, a scanline can have unused bits in its final (right-most) byte (see PNG spec section 7.2). A scanline is filtered before the data is passed to zlib. The unused bits are arbitrary, but they still influence the filtering because the filtering happens over the bytes. Therefore careful choice of what value to use for the unused bits can lead to a filtered scanline that compresses better.

When the 4-bit pixels of the image have been packed into bytes, the scanline of the image looks like this (in hex):


Uncompressing (with Python) the IDAT chunk from the file produced by OptiPNG shows that it correctly chooses filter type 1 to produce the filtered line:

>>> print zlib.decompress('081d636444019f0001890102'.decode('hex')).encode('hex')

(the first 01 specifies the filter type; this sequence is one byte longer than the scanline)

Return to the scanline data, that final ‘0’ nybble is the redundant part. I can pick any value for that nybble that I want; that part of the byte isn’t used for pixel data because the width is odd (and we’re using 4-bits per pixel).

So I choose ‘f’ as the final nybble value, which gives me the following filtered bytes (I show you the bytes decompressed from the IDAT chunk of the 68 byte file):

>>> print zlib.decompress('789c636444050000980011'.decode('hex')).encode('hex')

It’s clear to see why that filtered line will compress better with zlib than the one OptiPNG used. And indeed, zlib compresses that to 11 bytes, as opposed to 12 bytes for OptiPNG’s filtered line.

So should OptiPNG and similar tools try different values for the redundant bits in a scanline? It is not a technique mentioned in “A guide to PNG optimization”. Rightly so, people have better things to do with their lives.


I got my OptiPNG precompiled as a Mac OS X binary from here. It’s ancient. The binary itself a PowerPC binary, and it’s version 0.4.9 of OptiPNG. That was simpler than compiling it myself, and I don’t care very much. If you use OptiPNG or something similar, perhaps you can see what it does with the 81 byte version of the PNG file.

I have a sneaking suspicion that GDR will come along and reduce my PNG by 1 byte.

A use for uncompressed PNGs


A new feature of Curly Logo is the SETTBG command to Set Text BackGround colour. settbg "#850c sets the background to be a nearly fully opaque brown. Note that you can set the alpha (opacity) channel. As I mention in an earlier article the background for this (XHTML) div element is a 1×1 PNG. In order to implement SETTBG I have to create a 1×1 PNG containing an arbitrary 4-channel colour value, in JavaScript.

Gareth Rees wrote a very useful post on making very small PNGs that does most of the work of showing which bits should go where in a PNG file. Unfortunately the IDAT chunk in a PNG file is required to be compressed with zlib/deflate. That means I have to take my 32-bits of data (RGBA), prepend a 0 octet (for PNG’s scanline filter) then compress those 5 octets into zlib format.

I haven’t written a fully general zlib encoder in JavaScript, so what I do is use the feature of the deflate format that allows uncompressed blocks.

Let’s say my pixel has colour value #c5005c6d (that’s in RGBA order). My PNG scanline requires a 00 to be prepended, so I need to zlib encode the 5 octets 00 c5 00 5c 6d. Here’s what the encoded zlib data looks like:

78 9c 01 05 00 fa ff 00 c5 00 5c 6d 04 3e 01 8f

Which breaks down like this:

78 9c
zlib header.
the little endian bit string 1 00 00000. 1 indicating the final block, 00 indicating a non-compressed block, and 00000 are 5 bits of padding to align the start of a block to on octet (which is required for non-compressed blocks, and very convenient for me).
05 00 fa ff
The number of octets of data in the uncompressed block (5). Stored as a little-endian 16-bit integer followed by its 1’s complement (!).
00 c5 00 5c 6d
The scanline data stored as plain octets.
04 3e 01 8f
The zlib Adler checksum. Observe that 1 + 0x00+0xc5+0x00+0x5c+0x6d = 0x18f and 5 + 5*0x00 + 4*0xc5 + 3*0x00 + 2*0x5c + 1*0x6d = 0x43e.

In the same style as Gareth’s Python here’s some JavaScript:

  // Create a PNG chunk.  Supply type and data, length and CRC are
  // computed.
  chunk: function(type, data) {
    return be32(data.length) + type + data +
        be32(this.crc(type + data))

  // Create binary data for a 1x1 4-channel PNG.  The colour is specified
  // by the number n: 0xRRGGBBAA
  unit: function(n) {
    var d = '\x00' + be32(n) // PNG scanline data.
    return '\x89PNG\r\n\x1A\n' +
      this.chunk('IHDR', be32(1) + be32(1) + '\x08\x06\x00\x00\x00') +
          '\x78\x9c\x01\x05\x00\xfa\xff' + d + be32(this.adler32(d))) +
      this.chunk('IEND', '')

These fragments are embedded inside an object, hence the object literal syntax to declare the functions and the use of this to refer to other functions defined alongside. be32 is a function to encode an integer in 4 octet (32-bit) big-endian format.

JavaScript bit twiddling

Getting all this data correctly inside a PNG requires two checksum algorithms: Zlib format uses a simple Adler checksum (at the end, see above); PNG uses a conventional 32-bit polynomial residue checksum for each of its chunks. I had to implement both of those in JavaScript.

JavaScript is actually a pretty nice language for bit-twiddling. Sure the performance is bound to suck but that’s a feature of the implementations not the language itself. Not that I actually measured the performance, but I Just Know. In any case performance is not an issue for encoding this trivial amount of data. JavaScript has all of C’s bit-twiddling operators, and it has a bonus feature that C doesn’t have: 32-bit operation guaranteed. Generally the way that an operator like «x ^ y» works in JavaScript is that both x and y are converted to 32-bit signed integers then the operation is performed as if on 32-bit signed integers and the result is converted to a JavaScript number (which is an IEEE double). In C you might have to worry about those 36-bit or 64-bit architectures that have “funny” widths for their types. Not in JavaScript.

Here’s the CRC code in C from the PNG Spec Annex D:

unsigned long update_crc(unsigned long crc, unsigned char *buf,
                             int len)
  unsigned long c = crc;
  int n;

  for (n = 0; n < len; n++) {
    c = crc_table&#91;(c ^ buf&#91;n&#93;) & 0xff&#93; ^ (c >> 8);
  return c;

Here’s my JavaScript version:

  crc32: function(buf, crc) {
    if(crc === undefined) {
      crc = 0xffffffff
    var i
    for(i=0; i<buf.length; ++i) {
      crc = this.crctable&#91;(crc^buf.charCodeAt(i)) & 0xff&#93; ^ (crc >>> 8)
    return crc

The core code inside the “for” loop has hardly changed. Instead of «buf[i]» I have to write «buf.charCodeAt(i)»; JavaScript doesn’t have a character type so conversions from characters to their internal representation are more explicit. Note that in JavaScript the shift is «crc >>> 8». JavaScript doesn’t distinguish signed and unsigned integers (or indeed integers and floats) so it can’t tell which sort of shift is required by inspecting crc. Like Java JavaScript has two right shift operators, «>>» for preserving the sign bit, and «>>>» for clearing the sign bit. Using the wrong one was a bug in an earlier implementation of mine.

JavaScript: Bitmap Editing


Following on from my discovery of RFC 2397 I realised that we would be able to manipulate the “src” property of the “img” element in JavaScript and thereby change the image being displayed. The Base64 encoding is conveniently handled by JavaScript’s (non-standard but ubiquitous in the browser) btoa function (2008-05-10: ubiquitous apart from Opera, that is). I chose OS/2’s BMP image file format as being suitably simple to manipulate.

The result is BME (BitMap Editor). Note that all the work is done by client-side JavaScript, there’s no server side programming or communication at all (that would be madness!).

Bi-level any-colour-you-like-as-long-as-it’s-grey-and-blue at the moment. I suggest you right-click to save.

This use of RFC 2397 shows up some bugs and misfeatures of the OS X user interface. For example, dragging the image onto opens a new mail message containing the contents of the URL as text rather than an image.