The use of blah (_), aka underscore

2008-04-18

I’m part of a tradition of C programmers that pronounce «_» as “blah” (other people might call it “underscore”). I got the habit from Mr Moore and Dr Owen. Given how much the blah character appears, it’s useful to have a one-syllable pronunciation for it. That we chose “blah” is a bit unfortunate as it conflicts with “blah” in ordinary speech meaning “insert extra random stuff here”. Unfortunate, but also I suspect partly deliberate.

In C and friends this character has different connotations. Some are blessed by the standard, some have grown up in the communities that use them.

Blah is used to denote reserved identifiers in C. All identifiers starting with __, two blahs, are reserved for use by the implementation. Identifiers of this form are often used to introduce non-standard extensions without fear of breaking any existing strictly conforming programs. For example, GCC allows assembler to be written in a C function by using the __asm__ keyword. Because this keyword starts with __, GCC can guarantee that no correct portable C program can contain this keyword so GCC won’t break any existing code. If GCC had chosen asm to introduce inline assembler then this would have broken any existing C code that legitimately used asm as an ordinary identifier (for example, the name of a function).

Identifiers beginning with _ and followed by a capital letter are also reserved for the use of the implementation. Clearly no portable C code should use _ followed by a capital letter.

C itself has used this to extend the language in a safe manner that doesn’t upset any pre-existing portable code. In C99 a proper Boolean type was added. Obviously this type couldn’t be called bool as that would upset any code that perfectly reasonably used that name for its own purposes. So in C99 the Boolean type is called _Bool. No portable code can have been using this name already (since it was reserved by the C standard), so it was safe for the new version of the C standard to use.
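Here is a small sketch of how that looks in practice, assuming a C99 (or later) compiler. _Bool is always there; including <stdbool.h> is how a program opts in to the friendlier spelling bool. The function names is_even and is_odd are made up for illustration.

```c
#include <stdbool.h>  /* C99: provides bool, true and false on top of _Bool */

/* _Bool is the real type name; it needs no header at all. */
_Bool is_even(int n)
{
    return n % 2 == 0;
}

/* With <stdbool.h> included, bool is just another spelling of _Bool. */
bool is_odd(int n)
{
    return !is_even(n);
}
```

Old code that defined its own bool keeps compiling as long as it doesn’t include <stdbool.h>, which is the whole point of the reserved spelling.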

Another place you often see _ is at the beginning of struct tags. Recall that a struct tag is the foo in «struct foo {int a; void *b;}». Some people code like this:

struct _node
{
    struct _node *parent;
    struct _node *prev;
   ...

As far as I can tell, this is voodoo (maybe it has something to do with C++, how would I know?). In particular there’s nothing wrong with this code:

typedef struct node *node;

(or «typedef struct node node;» if you prefer)

In C it’s totally fine to use node for a struct tag as well as for a typedef. That’s because they’re separate namespaces (see [ISOC1999] section 6.2.3).
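To make that concrete, here is a minimal sketch in which node is both the struct tag and the typedef name, with no blah in sight. The fields and the list_length function are my own invention for illustration.

```c
#include <stddef.h>

/* The typedef name "node" and the struct tag "node" live in
   different namespaces, so these declarations do not clash. */
typedef struct node node;

struct node {
    node *next;   /* refers to the typedef name */
    int value;
};

/* Count the elements of a singly linked list. */
size_t list_length(const struct node *head)   /* refers to the tag */
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```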

Lots of people use an initial capital letter for their types. If the blah convention for struct tags is used as well then this combines disastrously with the ISO C reserved namespace. This Amaya code is full of it:

typedef struct _Match
  {
     struct _SymbDesc   *MatchSymb;	/* pattern symbol */

Sorry guys, that’s straight-out use of a reserved identifier, and I claim my angry warthog. In all fairness to Amaya, they’re not the only ones doing it.

More boring uses of blah include:

Using it to separate components of a name. The C standard uses this for some of the macros it defines:

CHAR_BIT, DBL_DIG, EXIT_SUCCESS

This is of course a very widely adopted convention. A specialised use of this is where the components have different meanings, the blah is used as a sort of punctuation character. In «size_t», the «size» is really the name of the type, and the «t» suffix denotes that the name is a type (not a function or a variable, for example); there’s no enforcement of this, it’s just a convention. Similarly in «tm_year», «year» is the name of the field, and «tm» is a hint that the field is a member of the «struct tm» type (this is a hangover from pre-ANSI days when all structure member names shared the same namespace).

The standard is hilariously inconsistent in this regard. «size_t» is the simple case, but we have «ptrdiff_t», «sig_atomic_t» (note, blah between «sig» and «atomic», but not between «ptr» and «diff»), and «va_list» (it’s a type, but there’s no “blah t” suffix).

The C standard doesn’t use blah in ordinary identifiers (functions, basically). But loads of other people do. Once you get a sufficiently large C program where you have to think about modularity, it’s a good way to separate names into different (conventional) namespaces. xpidl_parse_iid, ssh_hmac_init, that sort of thing.

A sort of extension of this is using underscore to separate identifier metadata from its name. In «m_var», the m might indicate a member variable.

There are rarer, but possibly more interesting uses of underscore.

One is when you really really want to use a keyword as a name. Actually, for me, this crops up in Python more than C. I find I constantly want to call an input file «in», and quite often want to call a variable that holds a class, «class».

Another is when you have a local variable in a function that’s really just the same as one of the function parameters, but with a different type. In an object oriented language that might happen because of upcasting. In C it commonly happens because a generic pointer is passed as a void * but the called code uses it with some more specific type.

qsort is a good example. Say you want to sort an array of ints in C. You need a function that compares two ints. qsort expects a function that takes two const void * arguments. So this sort of thing is common:

int compare_int(const void *l_, const void *r_)
{
  const int *l = l_;
  const int *r = r_;
  ...
}

(Personally I would tend to call these «lArg» and «rArg», but I’m not going to blink at this use of blah).
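For completeness, here is one way the whole thing might look when filled in. It’s a sketch rather than anybody’s production code: sort_ints is a hypothetical wrapper, and the parameters are const void * because that is what qsort’s prototype in <stdlib.h> asks for.

```c
#include <stdlib.h>

/* Comparison callback in the shape qsort requires.  The trailing blah
   marks the untyped parameters; the typed locals drop it. */
static int compare_int(const void *l_, const void *r_)
{
    const int *l = l_;
    const int *r = r_;
    /* Yields -1, 0 or 1, and avoids the overflow that
       "return *l - *r" can hit for large values. */
    return (*l > *r) - (*l < *r);
}

/* Hypothetical helper: sort n ints into ascending order. */
void sort_ints(int *a, size_t n)
{
    qsort(a, n, sizeof a[0], compare_int);
}
```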

You can see OpenSSL doing something similar with the «data_» argument to md4_block_data_order. The function takes the argument «const void *data_», but the body of the function actually wants to use data as a char *: «const unsigned char *data=data_;».

Another use of blah is to mark deliberately unused parameters. This can commonly happen in callback schemes. Like the previous qsort example, where the comparison function is constrained to have a particular type, often a callback is constrained to have a particular type, taking particular parameters. Sometimes not all of these parameters are needed. Example: zlib’s zalloc interface. zalloc’s type, when you peer behind the typedefs, is:

void *(*)(void *, unsigned, unsigned);

It’s a function pointer. If you want to use this interface, which you would do when you want zlib to use a custom allocator rather than malloc, then you need to implement a memory allocation function that takes 3 arguments:

void *superalloc(void *opaque, unsigned n, unsigned size) { … }

opaque is an opaque pointer that zlib doesn’t care about. It simply passes it from the struct z_stream_s opaque member to your function. What if you don’t need it? Then stick a blah after the parameter name to indicate that you intend to not use it:

void *superalloc(void *opaque_, unsigned n, unsigned size) { … }
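A filled-in version might look like the sketch below. It doesn’t include zlib’s headers, so the types are the plain ones from the expanded function pointer type above. Using calloc for the body is just my choice for the example, and the (void) cast is a common extra hint to the compiler, not part of the blah convention itself.

```c
#include <stdlib.h>

/* Same shape as zlib's zalloc callback once the typedefs are expanded.
   opaque_ is deliberately unused: the blah tells the reader, and the
   cast to void quietens unused-parameter warnings on many compilers. */
void *superalloc(void *opaque_, unsigned n, unsigned size)
{
    (void)opaque_;
    /* calloc checks n * size for overflow; it also zeroes the memory. */
    return calloc(n, size);
}
```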

Anyone know any other uses of blah?


Introduction to Functional Programming (UKUUG type)

2008-04-03

On Wednesday at the UKUUG Spring Conference I gave a talk: «Introduction to Functional Programming in Python». This sounds suspiciously similar to my PyCon UK talk I gave last year. I had intended to only tweak the talk a bit, but in the end quite a lot of the material changed, and there’s not actually all that much overlap.

Slides (769e3 octet PDF) and notes (52e3 octet PDF).

Thanks to those that attended.


Embedding Lua in 5 Minutes

2008-04-03

So at the UKUUG Spring Conference I kind of decided that there weren’t enough different dynamic languages being talked about; in fact it was pretty much divided into Python land and Perl land (at least as far as dynamic languages were concerned). So I decided to give a 5 minute lightning talk at the end of the conference, on embedding Lua into an application in 5 minutes.

This was my first lightning talk and it was a bit scary and a lot of fun. I highly recommend the experience.

I decided that instead of talking about embedding Lua, I would actually do it, standing in front of the conference live. Including downloading the Lua sources and compiling them (yay for working conference wifi). I thought this was hilarious, I have no idea what anyone else thought. I surprised myself by being able to type code in vi and talk at the same time, though I’m not sure I made much sense.

Here’s an example using the new command line option I added to yes:

$ ./a.out -l 'x=x or 1; x=x*2; return x' | head
2
4
8
16
32
64
128
256
512
1024

For the sake of completeness here is the modified version of yes.c that I ended up with:


#include <sys/cdefs.h>

#include <stdio.h>
#include <string.h>
#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"

int main __P((int, char **));

int
main(argc, argv)
        int argc;
        char **argv;
{
  if(argc >= 3 && strcmp(argv[1], "-l") == 0) {
    lua_State *l = luaL_newstate();
    luaL_openlibs(l);

    while(1) {
      luaL_dostring(l, argv[2]);
      puts(lua_tostring(l, -1));
      lua_settop(l, 0);
    }
  }

        if (argc > 1)
                for(;;)
                        (void)puts(argv[1]);
        else for (;;)
                (void)puts("y");
}

/*
 * Copyright (c) 1987, 1993
 *      The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *      This product includes software developed by the University of
 *      California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

Background Checks For Our Corporate Citizens

2008-02-29

Before people are employed it is routine to do some sort of background check on them. Most employers would check a candidate’s references before going ahead and employing. Some jobs (or perhaps employers) have more involved background checks than others. Applicants for local government positions, for example, are required to disclose any previous convictions. Jobs that involve working with children require a CRB check.

I think we should extend this background checking principle to corporations that we contract with. For example, when a school contracts with a caterer it should check that the caterer has not been convicted of pushing alcohol at kids; when government buys software perhaps it could check and see whether the vendor has been convicted of running an illegal monopoly in multiple countries.

I’m looking at you Microsoft.


The GNU GPL is not an EULA!

2008-02-25

MPlayer’s OS X pkg displays the GNU GPL in the license section of the installer. The installer then requires that I click a button labelled “Agree” in order to continue. This is all fine and normal practice for the EULAs that are attached to software. MPlayer is not the only one that does this; quite a lot of open source software packaged for the Mac does it.

But the GPL is not an EULA!

The GPL is not a license to use the software. I can use the software without agreeing to the GPL. It says so, right there, clause 0: “The act of running the Program is not restricted”. The GPL is a license to distribute the software, if I don’t do any distribution I don’t need the license.

This is an important point about the GPL that is not understood by enough people. The GPL is not like (most) other software licenses, because it does not restrict my use. Unlike a traditional EULA which attempts to prevent me from doing something which I might otherwise be able to do, the GPL only licenses me to do something that I otherwise wouldn’t be able to do, namely distribute it. If I don’t want to distribute the software (and I’m certainly not obliged to), then I don’t need to agree to the GPL.

I think the GPL is very cunning in this regard.

So to summarise: The license section of the OS X packager is for EULAs, and I never want to see the GPL in that section again. Okay?


Abuses of Lambda, by Design

2008-02-11

Again and again I see the Greek capital letter lambda, Λ, standing in for the English (Latin) letter capital A. You know, in trendy logos for film studios, web consultants and the like.

This is a sin against typography and it must stop!

When I’m reading, it just trips me up to see a capital lambda in the middle of English text. Yeah yeah, it looks cute and it introduces all sorts of amusing design possibilities, but it’s just bad writing.

I feel a tiny bit guilty about this rant because the most recent example I observed was Transitive:

who just happen to be one of the sponsors for the UKUUG Spring Conference at Birmingham where I am giving a talk. On, guess what? Lambda.

Ooh I just found another one (I knew there was a good reason to delay publishing this article):

Navarre logo

and they commit the additional sin of using a Greek capital letter xi. Is there no end to this madness!

[A couple of months later Dyalog sent me an e-mail inviting me to their corporate headquarters]

Dyalog logo


The perils of going to Canada

2008-02-08

I went to Canada and Jeremy Beadle died! Oh My Gosh! I only just found out, why did no-one tell me?

Obligatory XKCD cartoon.


Canadian foetal gender

2008-02-06

According to the in-flight magazine, in Canada it is illegal for a doctor to disclose the gender of a foetus (to the woman bearing it, or anyone else) until the foetus is 24 weeks into term. Naturally there are walk-in clinics in California and Washington that are prepared to do the ultrasound gender determination for a reasonable fee.

Will they make it illegal to travel to another country in order to access medical facilities that are unavailable indigenously? And illegal to own and operate an OB ultrasound machine (Tom Cruise did it, but then there was a bill before senate to make it illegal; I got bored of following the trail)?

The ultrasound operator knows the gender. How is it ethical to withhold this information from the patient?

The other thing I learned on the plane is that people used to terminate pregnancies using slippery elm bark. Slippery elm bark cannot be sold in the UK.

I find myself wholly unequipped (quite possibly in a physical as well as a mental sense) to think about these issues.


Stupid Password, Stupid Sign-in

2008-01-18

I was going to write a longer article about stupid password requirements and other sign-in annoyances, but Jared Spool beat me to it.

Instead here’s a contribution from Her Majesty’s Government:

govpass.png

I had to shrink the picture to fit. So in case you can’t read it, it says my password must be memorable and:

  • be between 8 and 12 characters
  • contain a combination of letters and numbers
  • contain two or more numbers which are separated by one or more letters
  • not contain spaces or the word ‘password’
  • not contain three adjacent letters or numbers the same (eg ‘aaa’ or ‘999’)

They commit mistake number 10. Too many requirements on the form of the password.

So, let’s pick a password. Naturally my first choice is “bob”. Typically I try and use the same password on all these stupid websites where I have to create an account; that way I have a hope of remembering what it is.

bob is too short, how about bobandbob? Oh no, must have numbers in as well.

bob777bob? Oh hold on, there’s a little logic exercise to solve: 2 or more numbers (check) which are separated by one or more letters. Oh no, 7 and 7 are separated by 7. Hmm. This is tricky. Maybe they should just suggest an example password and I’ll use that. The wording of this requirement is precise but confusing (it’s almost as if they translated the Java code into English). “Must have numbers with letters in between the numbers” would have been a clearer way to say it.

Aha, what about 7boooob7? Oh, damnit! 7bobbob7 it is then. Good job I wrote this blog post so I can refer to it when I want my password again.
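For the curious, here is roughly what I imagine the check looks like, written as a C sketch. It is my guess at the rules as worded on the form, not the site’s actual code. The assumptions are that “numbers” means decimal digits, that the ‘password’ test ignores case, and that two digits count as separated when at least one letter appears somewhere between them. The name password_ok is made up.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if pw satisfies all five rules listed on the form. */
bool password_ok(const char *pw)
{
    size_t len = strlen(pw);
    if (len < 8 || len > 12)
        return false;

    char lower[13];              /* lower-cased copy; len <= 12 checked above */
    bool has_letter = false, has_digit = false;
    bool seen_digit = false;     /* at least one digit so far */
    bool letter_since = false;   /* a letter has appeared after some digit */
    bool separated = false;      /* digit, then letter(s), then another digit */

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)pw[i];
        lower[i] = (char)tolower(c);

        if (c == ' ')
            return false;        /* no spaces */
        if (i >= 2 && pw[i] == pw[i - 1] && pw[i] == pw[i - 2])
            return false;        /* three adjacent identical characters */

        if (isdigit(c)) {
            has_digit = true;
            if (seen_digit && letter_since)
                separated = true;
            seen_digit = true;
        } else if (isalpha(c)) {
            has_letter = true;
            if (seen_digit)
                letter_since = true;
        }
    }
    lower[len] = '\0';

    if (strstr(lower, "password") != NULL)
        return false;            /* must not contain the word 'password' */
    return has_letter && has_digit && separated;
}
```

With this sketch, password_ok("7bobbob7") is true, while "bob", "bobandbob", "bob777bob" and "7boooob7" are all rejected.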

At least they didn’t commit mistake number 9 and hide all these complex requirements. Unlike Livejournal, which only reveals that your password must contain a number when you reset it via e-mail. If they told me my password had to contain a number when I got it wrong, that would give me a clue as to what it is.

The sin committed by a lot of these websites that require an account is pride. They think they’re important enough for me to care about their website. So that I might actually forgive the annoying user interface and arbitrary requirements. Whereas the reality is that it’s just another tedious annoying hoop to be jumped through just to get on with whatever it was I was trying to do (get a new driving licence because I have moved house, in my case).


Microsoft don’t use VSS?

2008-01-11

Whaddya know. IE8’s source code is managed using Perforce (click on my first link and scroll down past the smiley face to the change log). Yay for agile tools! I wonder why they don’t use VSS? No. I really don’t. Will perforce save them from the tar pit of doom? Bwahahahaha!