Archive for April, 2008

The use of blah (_), aka underscore

2008-04-18

I’m part of a tradition of C programmers that pronounce «_» as “blah” (other people might call it “underscore”). I got the habit from Mr Moore and Dr Owen. Given how much the blah character appears, it’s useful to have a one-syllable pronunciation for it. That we chose “blah” is a bit unfortunate as it conflicts with “blah” in ordinary speech meaning “insert extra random stuff here”. Unfortunate, but also I suspect partly deliberate.

In C and friends this character has different connotations. Some are blessed by the standard, some have grown up in the communities that use them.

Blah is used to denote reserved identifiers in C. All identifiers starting with __, two blahs, are reserved for use by the implementation. Identifiers of this form are often used to introduce non-standard extensions without fear of breaking any existing strictly conforming programs. For example, GCC allows assembler to be written in a C function by using the __asm__ keyword. Because this keyword starts with __, GCC can guarantee that no correct portable C program can contain this keyword, so GCC won’t break any existing code. If GCC had chosen asm to introduce inline assembler then this would have broken any existing C code that legitimately used asm as an ordinary identifier (for example, the name of a function).
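Here is a minimal sketch of what that looks like in practice. The helper names «compiler_barrier» and «publish» are made up for illustration, and the empty assembler string with a "memory" clobber is just one small use of __asm__ (it asks the compiler not to reorder memory accesses across it). On a compiler that does not define __GNUC__ the extension is simply left out.

```c
/* Hypothetical helper: a compiler-level barrier built on the GCC extension. */
static void compiler_barrier(void)
{
#ifdef __GNUC__
    /* __asm__ lives in the reserved namespace, so this spelling is
       available even when the compiler is in strict ISO mode. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical example: store a value, then read it back after the barrier. */
int publish(int *slot, int value)
{
    *slot = value;
    compiler_barrier();
    return *slot;
}
```

Calling «publish(&slot, 7)» stores 7 and returns 7. The point is only that the double-blah spelling cannot collide with any name a portable program was allowed to use.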

Identifiers beginning with _ and followed by a capital letter are also reserved for the use of the implementation. Clearly no portable C code should use _ followed by a capital letter.

C itself has used this to extend the language in a safe manner that doesn’t upset any pre-existing portable code. In C99 a proper Boolean type was added. Obviously this type couldn’t be called bool as that would upset any code that perfectly reasonably used that name for its own purposes. So in C99 the Boolean type is called _Bool. No portable code can have been using this name already (since it was reserved by the C standard), so it was safe for the new version of the C standard to use.
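A small sketch of how this looks to a C99 programmer. The function names «truth_value» and «in_range» are invented for the example. The friendlier spelling «bool» only appears once you opt in by including <stdbool.h>, so old code that never includes that header keeps its own «bool».

```c
#include <stdbool.h>   /* C99: opts in to bool, true and false */

/* Converting to _Bool gives 0 or 1, whatever non-zero value went in. */
int truth_value(int v)
{
    _Bool b = v;
    return b;
}

/* With the header included, bool is just another spelling of _Bool. */
bool in_range(int lo, int v, int hi)
{
    return lo <= v && v <= hi;
}
```

So «truth_value(42)» is 1, not 42, which is the behaviour a home-made «typedef int bool;» never gave you.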

Another place you often see _ is at the beginning of struct tags. Recall that a struct tag is the foo in «struct foo {int a; void *b;}». Some people code like this:

struct _node
{
    struct _node *parent;
    struct _node *prev;
   ...

As far as I can tell, this is voodoo (maybe it has something to do with C++, how would I know?). In particular there’s nothing wrong with this code:

typedef struct node *node;

(or «typedef struct node node;» if you prefer)

In C it’s totally fine to use node for a struct tag as well as for a typedef. That’s because they’re separate namespaces (see [ISOC1999] section 6.2.3).
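To make that concrete, here is a minimal sketch using the «typedef struct node node;» form. The members and the «sum» function are invented for illustration. The same name serves as both tag and typedef, with no leading blah anywhere.

```c
#include <stddef.h>

typedef struct node node;   /* "node" as an ordinary identifier */

struct node {               /* "node" again, this time as a struct tag */
    node *next;
    int value;
};

/* Add up the values in a NULL-terminated list. */
int sum(const node *n)
{
    int total = 0;
    for (; n != NULL; n = n->next)
        total += n->value;
    return total;
}
```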

Lots of people use an initial capital letter for their types. If the blah convention for struct tags is used as well then this combines disastrously with the ISO C reserved namespace. This Amaya code is full of it:

typedef struct _Match
  {
     struct _SymbDesc   *MatchSymb;	/* pattern symbol */

Sorry guys, that’s straight-out use of a reserved identifier, and I claim my angry warthog. In all fairness to Amaya, they’re not the only ones doing it.

More boring uses of blah include:

Using it to separate components of a name. The C standard uses this for some of the macros it defines:

CHAR_BIT, DBL_DIG, EXIT_SUCCESS

This is of course a very widely adopted convention. A specialised use of this is where the components have different meanings and the blah is used as a sort of punctuation character. In «size_t», the «size» is really the name of the type, and the «t» suffix denotes that the name is a type (not a function or a variable, for example); there’s no enforcement of this, it’s just a convention. Similarly in «tm_year», «year» is the name of the field, and «tm» is a hint that the field is a member of the «struct tm» type (this is a hangover from pre-ANSI days when all structure member names shared the same namespace).

The standard is hilariously inconsistent in this regard. «size_t» is the simple case, but we have «ptrdiff_t», «sig_atomic_t» (note, blah between «sig» and «atomic», but not between «ptr» and «diff»), and «va_list» (it’s a type, but there’s no “blah t” suffix).
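A short sketch that uses both conventions at once. The function «count_in_year» is made up for the example; «struct tm», «tm_year» and «size_t» are the standard names discussed above.

```c
#include <stddef.h>   /* size_t */
#include <time.h>     /* struct tm */

/* Count the entries that fall in a given calendar year. */
size_t count_in_year(const struct tm *times, size_t n, int year)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (times[i].tm_year + 1900 == year)   /* tm_year is years since 1900 */
            hits++;
    return hits;
}
```

Nothing stops you writing «times[i].year» in your own struct these days; the «tm_» prefix is pure convention now that each struct has its own member namespace.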

The C standard doesn’t use blah in ordinary identifiers (functions, basically). But loads of other people do. Once you get a sufficiently large C program where you have to think about modularity, it’s a good way to separate names into different (conventional) namespaces. xpidl_parse_iid, ssh_hmac_init, that sort of thing.
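As a sketch of the module-prefix idea, here is a tiny fixed-size stack whose public names all start with «stack_». Everything in it (the type, the limit, the function names) is invented for illustration rather than taken from any real library.

```c
#include <stddef.h>

enum { STACK_MAX = 8 };

typedef struct stack {
    int items[STACK_MAX];
    size_t count;
} stack;

void stack_init(stack *s)
{
    s->count = 0;
}

/* Returns 0 on success, -1 if the stack is full. */
int stack_push(stack *s, int value)
{
    if (s->count == STACK_MAX)
        return -1;
    s->items[s->count++] = value;
    return 0;
}

/* Returns 0 and stores the top item in *out, or -1 if the stack is empty. */
int stack_pop(stack *s, int *out)
{
    if (s->count == 0)
        return -1;
    *out = s->items[--s->count];
    return 0;
}
```

A second module can then have its own «queue_init» and «queue_push» without the two ever clashing at link time.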

A sort of extension of this is using underscore to separate identifier metadata from its name. In «m_var», the m might indicate a member variable.

There are rarer, but possibly more interesting uses of underscore.

One is when you really really want to use a keyword as a name. Actually, for me, this crops up in Python more than C. I find I constantly want to call an input file «in», and quite often want to call a variable that holds a class, «class». Sticking a blah on the end gives «in_» and «class_», which are perfectly legal names.
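The same trick works in C. A sketch, assuming you want a parameter called «default», which C has already taken as a keyword; the function «pick» is made up for the example.

```c
#include <stddef.h>

/* Return value if there is one, otherwise fall back to default_. */
const char *pick(const char *value, const char *default_)
{
    return value != NULL ? value : default_;
}
```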

Another is when you have a local variable in a function that’s really just the same as one of the function parameters, but with a different type. In an object-oriented language that might happen because a parameter arrives as a base type and has to be downcast. In C it commonly happens because a generic pointer is passed as a void * but the called code uses it with some more specific type.

qsort is a good example. Say you want to sort an array of ints in C. You need a function that compares two ints. qsort expects a function that takes two const void * arguments. So this sort of thing is common:

int compare_int(const void *l_, const void *r_)
{
  const int *l = l_;
  const int *r = r_;
  ...
}

(Personally I would tend to call these «lArg» and «rArg», but I’m not going to blink at this use of blah).
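For completeness, a runnable sketch of the whole pattern. The body of the comparison and the wrapper «sort_ints» are my own filling-in, not anything qsort requires; the comparison uses the subtract-two-booleans idiom so that it can’t overflow the way «*l - *r» can.

```c
#include <stdlib.h>

/* l_ and r_ are the generic pointers qsort hands us; l and r are the
   same pointers with the type we actually want. */
static int compare_int(const void *l_, const void *r_)
{
    const int *l = l_;
    const int *r = r_;
    return (*l > *r) - (*l < *r);   /* -1, 0 or 1 */
}

/* Hypothetical wrapper: sort n ints in ascending order. */
void sort_ints(int *a, size_t n)
{
    qsort(a, n, sizeof a[0], compare_int);
}
```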

You can see OpenSSL doing something similar with the «data_» argument to md4_block_data_order. The function takes the argument «const void *data_», but the body of the function actually wants to use data as a char *: «const unsigned char *data=data_;».

Another use of blah is to mark deliberately unused parameters. This can commonly happen in callback schemes. Like the previous qsort example, where the comparison function is constrained to have a particular type, often a callback is constrained to have a particular type, taking particular parameters. Sometimes not all of these parameters are needed. Example: zlib’s zalloc interface. zalloc’s type, when you peer behind the typedefs, is:

void *(*)(void *, unsigned, unsigned);

It’s a function pointer. If you want to use this interface, which you would do when you want zlib to use a custom allocator rather than malloc, then you need to implement a memory allocation function that takes 3 arguments:

void *superalloc(void *opaque, unsigned n, unsigned size) { … }

opaque is an opaque pointer that zlib doesn’t care about. It simply passes it from the struct z_stream_s opaque member to your function. What if you don’t need it? Then stick a blah after the parameter name to indicate that you intend to not use it:

void *superalloc(void *opaque_, unsigned n, unsigned size) { … }
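A sketch of what the body might look like. It deliberately doesn’t include zlib’s headers; «alloc_fn» is a made-up name for the function-pointer type quoted above, and using calloc is my assumption about what a simple allocator would do. The «(void)opaque_;» line is the usual way to keep compilers quiet about the unused parameter.

```c
#include <stdlib.h>

/* Hypothetical name for the type void *(*)(void *, unsigned, unsigned). */
typedef void *(*alloc_fn)(void *, unsigned, unsigned);

void *superalloc(void *opaque_, unsigned n, unsigned size)
{
    (void)opaque_;            /* deliberately unused */
    return calloc(n, size);   /* NULL if the memory can't be had */
}

/* This initialisation only compiles if superalloc has the right type. */
alloc_fn chosen_allocator = superalloc;
```

The trailing blah is for human readers; the cast to void is for the compiler.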

Anyone know any other uses of blah?

Introduction to Functional Programming (UKUUG type)

2008-04-03

On Wednesday at the UKUUG Spring Conference I gave a talk: «Introduction to Functional Programming in Python». This sounds suspiciously similar to the PyCon UK talk I gave last year. I had intended to only tweak the talk a bit, but in the end quite a lot of the material changed, and there’s not actually all that much overlap.

Slides (769e3 octet PDF) and notes (52e3 octet PDF).

Thanks to those that attended.

Embedding Lua in 5 Minutes

2008-04-03

So at the UKUUG Spring Conference I kind of decided that there weren’t enough different dynamic languages being talked about; in fact it was pretty much divided into Python land and Perl land (at least as far as dynamic languages were concerned). So I decided to give a 5 minute lightning talk at the end of the conference, on embedding Lua into an application in 5 minutes.

This was my first lightning talk and it was a bit scary and a lot of fun. I highly recommend the experience.

I decided that instead of talking about embedding Lua, I would actually do it, standing in front of the conference live. Including downloading the Lua sources and compiling them (yay for working conference wifi). I thought this was hilarious, I have no idea what anyone else thought. I surprised myself by being able to type code in vi and talk at the same time, though I’m not sure I made much sense.

Here’s an example using the new command line option I added to yes:

$ ./a.out -l 'x=x or 1; x=x*2; return x' | head
2
4
8
16
32
64
128
256
512
1024

For the sake of completeness here is the modified version of yes.c that I ended up with:

#include <sys/cdefs.h>

#include <stdio.h>
#include <string.h>
#include "lua.h"
#include "lauxlib.h"

int main __P((int, char **));

int
main(argc, argv)
        int argc;
        char **argv;
{
  /* -l is argv[1] and the Lua chunk is argv[2]; strcmp returns 0 on a match */
  if(argc >= 3 && strcmp(argv[1], "-l") == 0) {
    lua_State *l = luaL_newstate();
    luaL_openlibs(l);

    while(1) {
      luaL_dostring(l, argv[2]);
      puts(lua_tostring(l, -1));
      lua_settop(l, 0);
    }
  }

        if (argc > 1)
                for(;;)
                        (void)puts(argv[1]);
        else for (;;)
                (void)puts("y");
}

/*
 * Copyright (c) 1987, 1993
 *      The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *      This product includes software developed by the University of
 *      California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */