I’m part of a tradition of C programmers who pronounce «_» as “blah” (other people might call it “underscore”). I got the habit from Mr Moore and Dr Owen. Given how much the blah character appears, it’s useful to have a one-syllable pronunciation for it. That we chose “blah” is a bit unfortunate as it conflicts with “blah” in ordinary speech meaning “insert extra random stuff here”. Unfortunate, but also I suspect partly deliberate.
In C and friends this character has different connotations. Some are blessed by the standard; others have grown up in the communities that use them.
Blah is used to denote reserved identifiers in C. All identifiers starting with __, two blahs, are reserved for use by the implementation. Identifiers of this form are often used to introduce non-standard extensions without fear of breaking any existing strictly conforming programs. For example, GCC allows assembler to be written in a C function by using the «__asm__» keyword. Because this keyword starts with __, GCC can guarantee that no correct portable C program can contain it, so GCC won’t break any existing code. If GCC had chosen «asm» to introduce inline assembler, then it would have broken any existing C code that legitimately used «asm» as an ordinary identifier (for example, the name of a function).
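To make this concrete, here is a minimal sketch, assuming GCC or Clang (both accept «__asm__»; other compilers may not). The names «flag», «set_flag» and «get_flag» are made up for illustration. The empty assembler template with a «"memory"» clobber is a common idiom for a compiler-level barrier.

```c
/* Assumes GCC or Clang: __asm__ is a GCC extension, and its reserved
   spelling means it cannot collide with any name in portable code. */
static int flag;

/* Hypothetical helper, for illustration only. */
void set_flag(void)
{
    flag = 1;
    /* Empty template, so no instructions are emitted. The "memory"
       clobber tells the compiler that memory may have changed, so it
       must not keep values cached in registers or move memory accesses
       across this point. */
    __asm__ __volatile__("" ::: "memory");
}

int get_flag(void)
{
    return flag;
}
```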
Identifiers beginning with _ and followed by a capital letter are also reserved for the use of the implementation. Clearly no portable C code should use identifiers that begin with _ followed by a capital letter.
C itself has used this to extend the language in a safe manner that doesn’t upset any pre-existing portable code. In C99 a proper Boolean type was added. Obviously this type couldn’t be called «bool», as that would upset any code that perfectly reasonably used that name for its own purposes. So in C99 the Boolean type is called «_Bool». No portable code can have been using this name already (since it was reserved by the C standard), so it was safe for the new version of the C standard to use.
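A minimal sketch of how the two spellings relate. Up to C17, «<stdbool.h>» defines «bool», «true» and «false» as macros (with «bool» expanding to «_Bool»), so only code that opts in by including the header gets the short name; C23 later made them keywords. The function names here are hypothetical.

```c
#include <stdbool.h>   /* supplies bool, true and false */

/* Hypothetical helpers, for illustration only. */

/* Spelled with the reserved keyword directly; needs no header. */
_Bool is_positive(int x)
{
    return x > 0;
}

/* Spelled with the header's friendlier name, which is the same type. */
bool any_nonzero(const int *a, int n)
{
    for (int i = 0; i < n; i++)
        if (a[i] != 0)
            return true;
    return false;
}
```

Converting any nonzero scalar to «_Bool» yields 1, so «_Bool b = 5;» leaves «b» equal to 1.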
Another place you often see _ is at the beginning of struct tags. Recall that a struct tag is the foo in «struct foo {int a; void *b;}». Some people code like this:
struct _node { struct _node *parent; struct _node *prev; ...
As far as I can tell, this is voodoo (maybe it has something to do with C++, how would I know?). In particular there’s nothing wrong with this code:
typedef struct node *node;
(or «typedef struct node node;» if you prefer)
In C it’s totally fine to use node for a struct tag as well as for a typedef. That’s because they’re separate namespaces (see [ISOC1999] section 6.2.3).
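Here is a minimal, compilable sketch of that style (compiled as C). The «value» member and the «depth» function are made up for illustration; the point is that «node» names a pointer typedef and «struct node» names the structure, with no blah required.

```c
#include <stddef.h>   /* NULL */

/* The typedef name "node" and the struct tag "node" live in different
   name spaces (ordinary identifiers vs tags), so they do not clash. */
typedef struct node *node;

struct node {
    node parent;   /* here "node" is the typedef, i.e. struct node * */
    int value;     /* hypothetical payload */
};

/* Hypothetical helper: count the links from n up to the root. */
int depth(node n)
{
    int d = 0;
    while (n->parent != NULL) {
        n = n->parent;
        d++;
    }
    return d;
}
```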
Lots of people use an initial capital letter for their types. If the blah convention for struct tags is used as well then this combines disastrously with the ISO C reserved namespace. This Amaya code is full of it:
typedef struct _Match { struct _SymbDesc *MatchSymb; /* pattern symbol */
Sorry guys, that’s straight-out use of a reserved identifier, and I claim my angry warthog. In all fairness to Amaya, they’re not the only ones doing it.
More boring uses of blah include:
Using it to separate components of a name. The C standard uses this for some of the macros it defines:
CHAR_BIT, DBL_DIG, EXIT_SUCCESS
This is of course a very widely adopted convention. A specialised use of this is where the components have different meanings and the blah is used as a sort of punctuation character. In «size_t», the «size» is really the name of the type, and the «t» suffix denotes that the name is a type (not a function or a variable, for example); there’s no enforcement of this; it’s just a convention. Similarly in «tm_year», «year» is the name of the field, and «tm» is a hint that the field is a member of the «struct tm» type (this is a hangover from pre-ANSI days when all structure member names shared the same namespace).
The standard is hilariously inconsistent in this regard. «size_t» is the simple case, but we have «ptrdiff_t», «sig_atomic_t» (note, blah between «sig» and «atomic», but not between «ptr» and «diff»), and «va_list» (it’s a type, but there’s no “blah t” suffix).
The C standard doesn’t use blah in ordinary identifiers (functions, basically). But loads of other people do. Once you get a sufficiently large C program where you have to think about modularity, it’s a good way to separate names into different (conventional) namespaces. «xpidl_parse_iid», «ssh_hmac_init», that sort of thing.
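A minimal sketch of the convention, using a hypothetical «tally» module invented for illustration: every function the module exports starts with «tally_», which acts as a conventional namespace since C has no real one for functions.

```c
/* Hypothetical "tally" module: the tally_ prefix marks every function as
   belonging to it, so its names are unlikely to clash with another
   module's init or add functions. */
struct tally {
    long total;   /* sum of all values added */
    long count;   /* number of values added */
};

void tally_init(struct tally *t)
{
    t->total = 0;
    t->count = 0;
}

void tally_add(struct tally *t, long x)
{
    t->total += x;
    t->count++;
}

long tally_total(const struct tally *t)
{
    return t->total;
}

long tally_count(const struct tally *t)
{
    return t->count;
}
```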
A sort of extension of this is using underscore to separate identifier metadata from its name. In «m_var», the m might indicate a member variable.
There are rarer but possibly more interesting uses of underscore.
One is when you really, really want to use a keyword as a name. Actually, for me, this crops up in Python more than C. I find I constantly want to call an input file «in», and quite often want to call a variable that holds a class «class». Both are Python keywords, so a trailing blah («in_», «class_») is the conventional way round it.
Another is when you have a local variable in a function that’s really just the same as one of the function parameters, but with a different type. In an object oriented language that might happen because of upcasting. In C it commonly happens because a generic pointer is passed as a «void *» but the called code uses it with some more specific type.
«qsort» is a good example. Say you want to sort an array of «int»s in C. You need a function that compares two «int»s. «qsort» expects a function that takes two «const void *» arguments. So this sort of thing is common:
int compare_int(const void *l_, const void *r_) { const int *l = l_; const int *r = r_; ... }
(Personally I would tend to call these «lArg» and «rArg», but I’m not going to blink at this use of blah).
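A complete version, as a sketch: the «sort_ints» wrapper is a hypothetical helper, and the body of «compare_int» is one reasonable way to fill in the «...», not the only one.

```c
#include <stdlib.h>   /* qsort, size_t */

/* Comparison function for qsort. The parameters must be const void * to
   match qsort's prototype; the trailing blah marks the generic versions,
   and l and r are the typed views the body actually uses. */
static int compare_int(const void *l_, const void *r_)
{
    const int *l = l_;
    const int *r = r_;
    /* (a > b) - (a < b) yields -1, 0 or 1 without the overflow risk
       of computing *l - *r. */
    return (*l > *r) - (*l < *r);
}

/* Hypothetical helper: sort n ints in place in ascending order. */
void sort_ints(int *a, size_t n)
{
    qsort(a, n, sizeof *a, compare_int);
}
```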
You can see OpenSSL doing something similar with the «data_» argument to «md4_block_data_order». The function takes the argument «const void *data_», but the body of the function actually wants to use «data» as a «const unsigned char *»: «const unsigned char *data=data_;».
Another use of blah is to mark deliberately unused parameters. This commonly happens in callback schemes. Just as the comparison function in the «qsort» example is constrained to have a particular type, a callback is often constrained to have a particular type, taking particular parameters. Sometimes not all of these parameters are needed. Example: zlib’s zalloc interface. «zalloc»’s type, when you peer behind the typedefs, is:
void *(*)(void *, unsigned, unsigned);
It’s a function pointer. If you want to use this interface, which you would do when you want zlib to use a custom allocator rather than «malloc», then you need to implement a memory allocation function that takes 3 arguments:
void *superalloc(void *opaque, unsigned n, unsigned size) { … }
«opaque» is an opaque pointer that zlib doesn’t care about. It simply passes it from the «opaque» member of «struct z_stream_s» to your function. What if you don’t need it? Then stick a blah after the parameter name to indicate that you intend not to use it:
void *superalloc(void *opaque_, unsigned n, unsigned size) { … }
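A self-contained sketch of such an allocator, without zlib itself: the body (here «calloc») is an assumption for illustration, not what zlib does. The trailing blah is purely a convention for human readers; compilers don’t treat the name specially, so the «(void)opaque_;» cast is what actually quietens an unused-parameter warning.

```c
#include <stdlib.h>   /* calloc, free */

/* Same shape as zlib's allocation callback once the typedefs are peeled
   away: void *(*)(void *, unsigned, unsigned). The body is hypothetical. */
void *superalloc(void *opaque_, unsigned n, unsigned size)
{
    (void)opaque_;            /* deliberately unused; cast silences warnings */
    return calloc(n, size);   /* n items of size bytes each, zero-filled */
}
```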
Anyone know any other uses of blah?