Another one.
Two of us were working at the client’s site on the software for what was essentially a compact static robot that was responsible for moving small items from a hopper to a delivery point. There was approximately one instance of the hardware that we were writing software for, and it sat on a bench between our two workstations. Most of the mechanical bits were in a more or less finished state. I was particularly impressed with the piston that generated a partial vacuum so that items could be picked by a moving arm fitted with suction cups. Just one or two gears were made of prototyping plastic, and because of a gearing problem the belt didn’t move at the speed the spec said it should. But you know, typical prototype hardware. The electronics were a mixture of off-the-shelf dev kits for 8-bit embedded micros, mini custom circuit boards for novel sensors, and lovingly hand-soldered discrete parts. Add to that the fact that as a software guy I didn’t really understand the importance of grounding, and La Machine wasn’t always completely reliable (my colleague had just lent me Tracy Kidder’s The Soul of a New Machine).
Various optical/IR sensors kept track of the items as they moved through the internals of the machine; various other sensors kept track of motor positions and/or speeds. There was a slightly hairy state machine (documented using OmniGraffle) to keep track of it all. The target pick rate was 5 items per second, and as it took more than 200ms for an item to go from the hopper to the point where it left the machine, there could be several items “in-flight” at any one time (and of course, picking an item was never completely reliable, so the sensors were used to track the items and determine if a retry was required).
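For anyone who wants the in-flight bookkeeping made concrete, here is a minimal sketch. It is not the real state machine (that was much hairier and ran on an 8-bit micro); it is Python for readability, every name in it is made up for illustration, and the 600ms timeout and the assumption that items leave in the order they were picked are mine, not figures from the project.

```python
from collections import deque

PICK_INTERVAL_MS = 200     # 5 items per second, as in the text
TRANSIT_TIMEOUT_MS = 600   # assumed: how long an item may be in flight before it counts as stuck


class InFlightTracker:
    """Hypothetical tracker: remembers picked items until the exit sensor sees them."""

    def __init__(self, timeout_ms=TRANSIT_TIMEOUT_MS):
        self.timeout_ms = timeout_ms
        self.in_flight = deque()  # pick timestamps in ms, oldest first
        self.delivered = 0

    def on_pick(self, now_ms):
        # Called when the arm picks an item from the hopper.
        self.in_flight.append(now_ms)

    def on_exit_sensor(self, now_ms):
        # Called when the exit sensor registers an occlusion.
        # Assumes items leave in pick order, so the oldest one is retired.
        if self.in_flight:
            self.in_flight.popleft()
            self.delivered += 1

    def stuck(self, now_ms):
        # True if the oldest in-flight item has been inside longer than the timeout.
        return bool(self.in_flight) and now_ms - self.in_flight[0] > self.timeout_ms


if __name__ == "__main__":
    tracker = InFlightTracker()
    for t in range(0, 1000, PICK_INTERVAL_MS):
        tracker.on_pick(t)  # the exit sensor never fires in this run
    print("stuck:", tracker.stuck(1000), "in flight:", len(tracker.in_flight))
```

The point of the sketch is that an exit sensor which never fires looks exactly like an item stuck inside: the oldest pick ages past the timeout and the only sensible reaction is to stop picking.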
This day it seemed to be working fine, except for some reason the software was reporting that items were failing to be delivered, when in fact I could plainly see that items were popping out of the top of the machine. This was causing the machine to prematurely stop, as it would, sensibly, stop picking items if it thought that a picked item was stuck inside the machine somewhere. Up to this point it had basically been working fine; it had been working the same morning. I was sat thinking about this and investigating somewhat. I’d even checked the last thing I’d changed. So I called my colleague over (just at the next desk) and he came over to look while I demonstrated the problem. It worked. There was no problem. Flakies (there’s a memorable part of Kidder’s book where one of the engineers, while helping a more junior engineer sort out a problem with a wire-wrap memory board, grabs the whole frame and shakes it, claiming that it’s probably just flakies; of course it works after that).
So that’s okay then. But when I try it again, it’s not working. Colleague comes over. Works. I work at it on my own. Doesn’t work.
It turned out that it was quite a sunny day. The sun reached the window in the afternoon. Sunlight was falling on the machine near the optical sensor and presumably bouncing around enough to prevent the sensor from registering the occlusion as the item went past. When my colleague came over, he was standing over the machine and his shadow blocked the sunlight. This wouldn’t be a problem in the production hardware, as it was all in a box (and hence, dark inside).
I constructed an optical shield and installed it on La Machine (a post-it note stuck to the side).
This was not a problem that could be fixed using print. We did have print via the serial port, but printing more than one character was hazardous because the time taken to transmit characters on the serial line would interfere with the realtime operation of the rest of the software (when items are being picked every 200ms, every millisecond counts).
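To put rough numbers on that, here is a back-of-the-envelope calculation. The baud rate is an assumption (I am using 9600 with 8N1 framing, so 10 bits on the wire per character, purely as a common example), and it assumes the transmit blocks the rest of the software while it runs. The function name is made up for illustration.

```python
PICK_INTERVAL_MS = 200  # one pick every 200ms, as in the text


def serial_tx_ms(n_chars, baud=9600, bits_per_char=10):
    """Milliseconds needed to send n_chars; 10 bits per char assumes 8N1 framing."""
    return n_chars * bits_per_char * 1000 / baud


if __name__ == "__main__":
    for n in (1, 20, 80):
        ms = serial_tx_ms(n)
        share = 100 * ms / PICK_INTERVAL_MS
        print(f"{n:3d} chars: {ms:6.2f} ms ({share:.1f}% of a pick interval)")
```

Under those assumptions a single character costs about a millisecond, which is tolerable, while a modest line of debug text eats a sizeable fraction of the time between picks.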