The use of blah (_), aka underscore

2008-04-18

I’m part of a tradition of C programmers that pronounce «_» as “blah” (other people might call it “underscore”). I got the habit from Mr Moore and Dr Owen. Given how much the blah character appears, it’s useful to have a one-syllable pronunciation for it. That we chose “blah” is a bit unfortunate as it conflicts with “blah” in ordinary speech meaning “insert extra random stuff here”. Unfortunate, but also I suspect partly deliberate.

In C and friends this character has different connotations. Some are blessed by the standard, some have grown up in the communities that use them.

Blah is used to denote reserved identifiers in C. All identifiers starting with __, two blahs, are reserved for use by the implementation. Identifiers of this form are often used to introduce non-standard extensions without fear of breaking any existing strictly conforming programs. For example, GCC allows assembler to be written in a C function by using the __asm__ keyword. Because this keyword starts with __, GCC can guarantee that no correct portable C program uses it as an identifier, so GCC won't break any existing code. If GCC had chosen plain asm to introduce inline assembler, this would have broken any existing C code that legitimately used asm as an ordinary identifier (for example, the name of a function). (GCC does in fact accept plain asm too, but only in its GNU dialects; in strict ISO modes such as -std=c99 only __asm__ is recognised, for exactly this reason.)
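As a small illustration, here is a sketch of the kind of GCC extension the reserved namespace makes room for. The function names set_flag_with_barrier and get_flag are made up for this example. The empty __asm__ statement with a "memory" clobber is a common GCC/Clang compiler-barrier idiom, and it is guarded by __GNUC__ (itself a predefined macro in the reserved namespace) on the assumption that other compilers may not understand it.

```c
/* Hypothetical helpers showing a GCC extension spelled in the reserved
   __ namespace. __asm__ and __volatile__ are GCC (and Clang) keywords;
   an empty asm with a "memory" clobber emits no instructions but stops
   the compiler from moving memory accesses across it. */
static int flag;

void set_flag_with_barrier(int value)
{
    flag = value;
#if defined(__GNUC__)
    /* Only compilers that define __GNUC__ are assumed to know __asm__. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

int get_flag(void)
{
    return flag;
}
```

Because every spelling involved starts with __, this compiles without disturbing any identifier a strictly conforming program is allowed to declare.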

Identifiers beginning with _ followed by a capital letter are also reserved for use by the implementation. Clearly no portable C code should use identifiers of that form.

C itself has used this to extend the language in a safe manner that doesn't upset any pre-existing portable code. In C99 a proper Boolean type was added. Obviously this type couldn't be called bool, as that would upset any code that perfectly reasonably used that name for its own purposes. So in C99 the Boolean type is called _Bool. No portable code can have been using this name already (since it was reserved by the C standard), so it was safe for the new version of the C standard to use.
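A minimal sketch of how the two spellings relate, assuming a C99 (or later) compiler: _Bool works with no header at all, while <stdbool.h> is what opts a translation unit in to the names bool, true and false. The function names are invented for the example.

```c
#include <stdbool.h>

/* _Bool is the C99 keyword itself; it needs no header. */
_Bool is_even_raw(int n)
{
    return n % 2 == 0;
}

/* <stdbool.h> supplies the friendlier names bool, true and false
   (as macros in C99, where bool expands to _Bool). */
bool is_even(int n)
{
    return n % 2 == 0;
}

/* Converting any nonzero scalar to _Bool yields exactly 1. */
int as_bool_int(int x)
{
    _Bool b = x;
    return b;
}
```

Code that defines its own bool keeps compiling, as long as it doesn't also include <stdbool.h>.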

Another place you often see _ is at the beginning of struct tags. Recall that a struct tag is the foo in «struct foo {int a; void *b;}». Some people code like this:

struct _node
{
    struct _node *parent;
    struct _node *prev;
    /* ... */
};

As far as I can tell, this is voodoo (maybe it has something to do with C++; how would I know?). In particular, there's nothing wrong with this code:

typedef struct node *node;

(or «typedef struct node node;» if you prefer)

In C it's totally fine to use node for a struct tag as well as for a typedef. That's because they're separate namespaces (see [ISOC1999] section 6.2.3).
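A short sketch of the non-pointer variant, with the tag and the typedef both spelled node and no blah anywhere. The value member and the node_depth function are hypothetical additions for the example.

```c
#include <stddef.h>

/* Tag "node" lives in the tag name space; typedef "node" is an ordinary
   identifier. They do not clash, so neither needs a leading blah. */
typedef struct node node;

struct node
{
    node *parent;
    node *prev;
    int value;
};

/* Count the steps from n up to the root of its tree. */
int node_depth(const node *n)
{
    int depth = 0;

    while (n->parent != NULL) {
        n = n->parent;
        depth++;
    }
    return depth;
}
```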

Lots of people use an initial capital letter for their types. If the blah convention for struct tags is used as well then this combines disastrously with the ISO C reserved namespace. This Amaya code is full of it:

typedef struct _Match
  {
     struct _SymbDesc   *MatchSymb;	/* pattern symbol */

Sorry guys, that's straight-out use of a reserved identifier, and I claim my angry warthog. In all fairness to Amaya, they're not the only ones doing it.

More boring uses of blah include:

Using it to separate components of a name. The C standard uses this for some of the macros it defines:

CHAR_BIT, DBL_DIG, EXIT_SUCCESS

This is of course a very widely adopted convention. A specialised use of it is where the components have different meanings and the blah acts as a sort of punctuation character. In «size_t», «size» is really the name of the type, and the «t» suffix denotes that the name is a type (not a function or a variable, for example); nothing enforces this, it's just a convention. Similarly in «tm_year», «year» is the name of the field and «tm» is a hint that the field is a member of the «struct tm» type (a hangover from pre-ANSI days when all structure member names shared the same namespace).
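Both halves of the convention show up in a couple of lines of standard C. This is only a sketch: calendar_year and name_length are hypothetical helpers, and the one standard fact relied on is that tm_year counts years since 1900.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* tm_year: field "year" of struct "tm". It counts years since 1900. */
int calendar_year(const struct tm *t)
{
    return t->tm_year + 1900;
}

/* size_t: the "t" suffix marks "size" as the name of a type. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```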

The standard is hilariously inconsistent in this regard. «size_t» is the simple case, but we have «ptrdiff_t», «sig_atomic_t» (note, blah between «sig» and «atomic», but not between «ptr» and «diff»), and «va_list» (it's a type, but there's no "blah t" suffix).
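A sketch that uses the three inconsistently named types side by side. The helper names are hypothetical; the only standard behaviour relied on is that ptrdiff_t holds the difference of two pointers into the same array, that a signal handler may assign to a volatile sig_atomic_t, and that a va_list walks variable arguments with va_start, va_arg and va_end.

```c
#include <signal.h>
#include <stdarg.h>
#include <stddef.h>

/* ptrdiff_t: no blah between "ptr" and "diff". */
ptrdiff_t distance(const int *from, const int *to)
{
    return to - from;
}

/* sig_atomic_t: a blah between "sig" and "atomic". A signal handler
   may safely assign to a volatile sig_atomic_t object. */
static volatile sig_atomic_t got_signal = 0;

void on_signal(int sig)
{
    (void)sig; /* which signal arrived doesn't matter here */
    got_signal = 1;
}

/* va_list: a type, but with no _t suffix. Sums count int arguments. */
int sum_ints(size_t count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (size_t i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```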

The C standard doesn't use blah in ordinary identifiers (functions, basically). But loads of other people do. Once you get a sufficiently large C program where you have to think about modularity, it's a good way to separate names into different (conventional) namespaces. xpidl_parse_iid, ssh_hmac_init, that sort of thing.
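A minimal sketch of prefix-as-namespace in a hypothetical stack module (nothing here comes from xpidl or SSH code). Every external name starts with stack_, and the one macro follows the standard's CAPS_WITH_BLAH style, in the manner of CHAR_BIT.

```c
#include <stddef.h>

/* Hypothetical module: every external name carries the stack_ prefix
   (STACK_ for macros). The compiler does not enforce this namespace;
   it is purely a convention. */
#define STACK_MAX 16

struct stack
{
    int items[STACK_MAX];
    size_t count;
};

void stack_init(struct stack *s)
{
    s->count = 0;
}

/* Returns 1 on success, 0 if the stack is full. */
int stack_push(struct stack *s, int value)
{
    if (s->count == STACK_MAX)
        return 0;
    s->items[s->count++] = value;
    return 1;
}

/* Returns 1 and stores the top item in *out, or 0 if the stack is empty. */
int stack_pop(struct stack *s, int *out)
{
    if (s->count == 0)
        return 0;
    *out = s->items[--s->count];
    return 1;
}
```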

A sort of extension of this is using underscore to separate identifier metadata from its name. In «m_var», the m might indicate a member variable.

There are rarer but possibly more interesting uses of underscore.

One is when you really really want to use a keyword as a name. Actually, for me, this crops up in Python more than C. I find I constantly want to call an input file «in», and quite often want to call a variable that holds a class, «class».
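In C the same trick comes up with C keywords rather than Python ones. Here is a sketch, assuming a hypothetical option record whose default value you would naturally want to call default, which is a C keyword:

```c
/* Hypothetical option record. "default" is a C keyword, so the field
   holding the default value is spelled default_. */
struct option_spec
{
    const char *name;
    int default_;
    int value;
};

/* Put an option back to its default value. */
void option_reset(struct option_spec *opt)
{
    opt->value = opt->default_;
}
```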

Another is when you have a local variable in a function that's really just the same as one of the function parameters, but with a different type. In an object oriented language that might happen because of upcasting. In C it commonly happens because a generic pointer is passed as a void * but the called code uses it with some more specific type.

qsort is a good example. Say you want to sort an array of ints in C. You need a function that compares two ints, but qsort expects a function that takes two const void * arguments. So this sort of thing is common:

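int compare_int(const void *l_, const void *r_)
{
  const int *l = l_;
  const int *r = r_;
  /* Compare rather than subtract, since *l - *r could overflow. */
  return (*l > *r) - (*l < *r);
}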

(Personally I would tend to call these «lArg» and «rArg», but I'm not going to blink at this use of blah).
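The same pattern works for a slightly less trivial element type, sketched here as a hypothetical helper for sorting an array of C strings. Each qsort argument points to a const char *, so the blah-suffixed const void * parameters are converted to const char *const *.

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements as const void *. The
   elements here are const char *, so each argument really points to a
   const char *; the locals l and r give those pointers their real type. */
int compare_str(const void *l_, const void *r_)
{
    const char *const *l = l_;
    const char *const *r = r_;
    return strcmp(*l, *r);
}

/* Sort an array of n C strings into strcmp order. */
void sort_strings(const char **words, size_t n)
{
    qsort(words, n, sizeof words[0], compare_str);
}
```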

You can see OpenSSL doing something similar with the «data_» argument to md4_block_data_order. The function takes the argument «const void *data_», but the body of the function actually wants to use data as a const unsigned char *: «const unsigned char *data=data_;».
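Here is a hypothetical function written in the same style (this is not OpenSSL's code). It accepts any object as const void *data_ and then reads it through a local data of type const unsigned char *. Summing the bytes is just a stand-in for real work such as hashing.

```c
#include <stddef.h>

/* The caller may pass a pointer to any object, so the parameter is
   const void *data_; the body works on a local data that views the
   same memory as unsigned bytes. */
unsigned long sum_bytes(const void *data_, size_t len)
{
    const unsigned char *data = data_;
    unsigned long total = 0;

    for (size_t i = 0; i < len; i++)
        total += data[i];
    return total;
}
```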

Another use of blah is to mark deliberately unused parameters. This commonly happens in callback schemes. Just as the qsort comparison function is constrained to have a particular type, a callback is often constrained to have a particular type, taking particular parameters, and sometimes not all of those parameters are needed. Example: zlib's zalloc interface. zalloc's type, when you peer behind the typedefs, is:

void *(*)(void *, unsigned, unsigned);

It's a function pointer. If you want to use this interface, which you would do when you want zlib to use a custom allocator rather than malloc, then you need to implement a memory allocation function that takes 3 arguments:

void *superalloc(void *opaque, unsigned n, unsigned size) { ... }

opaque is an opaque pointer that zlib doesn't care about. It simply passes it from the struct z_stream_s opaque member to your function. What if you don't need it? Then stick a blah after the parameter name to indicate that you intend to not use it:

void *superalloc(void *opaque_, unsigned n, unsigned size) { ... }
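Here is a fuller sketch of superalloc, under a couple of assumptions: alloc_fn is a local typedef with the same shape as the zalloc type above (not zlib's own name for it), and calloc stands in for whatever custom allocator you actually have. The (void) cast is a common companion to the trailing blah; it keeps compilers that warn about unused parameters quiet.

```c
#include <stdlib.h>

/* alloc_fn is a local typedef with the same shape as the zalloc
   callback: void *(*)(void *, unsigned, unsigned). */
typedef void *(*alloc_fn)(void *, unsigned, unsigned);

/* Allocate n items of size bytes each. opaque_ is deliberately unused:
   the trailing blah documents that, and the (void) cast stops compilers
   warning about an unused parameter. */
void *superalloc(void *opaque_, unsigned n, unsigned size)
{
    (void)opaque_;
    return calloc(n, size);
}

/* Compile-time check that superalloc fits the callback type. */
alloc_fn allocator = superalloc;
```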

Anyone know any other uses of blah?


14 Responses to “The use of blah (_), aka underscore”

  1. Matt Brubeck Says:

    gettext and similar systems define a _() function or macro (that is, the name of the function is “_” alone) for internationalizing strings. For example, _("An error occurred.") would return a localized version of that string, if available.

    In many functional languages, blah alone is used as a dummy argument in function signatures. This can be used for an argument whose value will be ignored; it silences compiler warnings about unused variables, and in languages with pattern matching it will match anything. For example, this Erlang function searches for an item within a list:

    member(X, [X|_]) -> true;
    member(X, [_|Tail]) -> member(X, Tail);
    member(_, []) -> false.

  2. Peter Says:

    I think you might see _ used as a don’t-care variable in unpacking syntax in Python. It’s used for this in the pattern matching syntax of various functional languages.

    This isn’t necessarily recommended style, but it does work in my Python interp.

    >>> _, x, _, y = [1,2,3,4]
    >>> x,_,y
    (2, 3, 4)

    It doesn’t really matter in what order the different instances of _ are bound, since it is not supposed to be used. It ends up as 3 here.

    This syntax isn’t available in C, so you won’t be seeing this in any C code of course. You did mention the related use of marking unused parameters.

    Thanks for the article.

  3. Minh Says:

    You mentioned using a leading _ in struct tags but that’s pretty much
    illegal, even if the tag is lowercase, because of (n1256) § 7.1.3,
    such identifiers fall under the second group:

    > All identifiers that begin with an underscore are always reserved
    for use as identifiers with file scope in both the ordinary and /tag
    name spaces/.

    (relevant part emphasized)

    As for other uses, I can think of two legitimate uses of _ as
    a leading character in identifiers:

    - as a struct or union member name; and
    - as a parameter name in function prototypes.

    Identifiers starting with a _ and something other than a capital
    letter or another _ (so as not to fall into group 1 which would make
    them reserved for any use) can be used in prototype and member scopes,
    because they are in the second group mentioned above, so the code is
    legal.

    Now, both usages have the same purpose: to avoid name clashes with
    macro names in header files. This is because the standard says (from
    the same section):

    > If the program declares or defines an identifier in a context in
    which it is reserved (other than as allowed by 7.1.4), or defines
    a reserved identifier as a macro name, the behavior is undefined.

    Thus, a conforming program cannot define any identifier in group 2 as
    a macro (or so I believe), besides, those are only reserved for use as
    file scope and tag identifiers, not as macros, so the implementation
    cannot define those as macros either.

    Anyway, I’ve never heard of that “blah” stuff; guess I’m just too
    young.

  4. drj11 Says:

    @Minh: Quite right, good work. Except… I couldn’t help notice you said “member scope”. C has four scopes: “function, file, block, and function prototype” (ISO C 9899:1999 section 6.2.1). “member scope” simply doesn’t exist in C. A member declaration in a struct declaration introduces an identifier with file scope. So you cannot have a member name beginning with a blah.

  5. drj11 Says:

    @Peter: Yes, and in fact I had originally meant to tie in the unused parameters thing to ML pattern matching. Didn’t know it was used in Python like that, but it’s very similar and I’m not surprised. Moral: Do not press “Publish” just because the train is pulling into the station.

  6. Gareth Rees Says:

    In the Python interactive environment, the global variable _ holds the value of the last expression evaluated at top level.

    >>> math.log(100)
    4.6051701859880918
    >>> math.exp(_)
    100.00000000000004

  7. Gareth Rees Says:

    Identifiers in a Python class definition starting with two underscores (but not ending with two underscores) are subject to “private name mangling”.

  8. Gareth Rees Says:

    In Python, names beginning with an underscore are not imported by a from module import * statement.

  9. David-Sarah Hopwood Says:

    In Javascript, there is no enforced access control on object properties, but a leading blah is often used to indicate properties that are intended to be private.
    (Trailing blah is also used by the Caja subset of Javascript for the same purpose.)

    In the Waterken server, trailing blah (in Java and Javascript) indicates a variable holding a promise, as in this example.

  10. Fred Smith Says:

    Blah is used in the Boost Metaprogramming Library (MPL) as a placeholder:

    A placeholder in a form _n is simply a synonym for the corresponding arg specialization. The unnamed placeholder _ (underscore) carries special meaning in bind and lambda expressions, and does not have defined semantics outside of these contexts.

    Placeholder names can be made available in the user namespace through using namespace mpl::placeholders; directive.

    etc…

    http://www.boost.org/doc/libs/1_35_0/libs/mpl/doc/refmanual/placeholders.html

  11. Merwok Says:

    In Python, an attribute (i.e. method or member in a class definition) beginning with one underscore is internal. That is, it is a normal attribute, but outside code must not use it. Unlike names beginning with two underscores, there is no name mangling (which is ugly and useless); it is merely a social convention. Social conventions are a Good Thing.

    By the way, the Python Style Guide (PEP 8) recommends using cls to name a class object, and (as I understand it) class_ if it is a class in some other sense than Python’s (i.e. you name a Python class object cls, and an HTML class class_).

  12. drj11 Says:

    @Peter, Gareth, Merwok: I’m now no longer sure whether I meant C or other languages when I asked for other uses. Clearly Python has an equally rich tradition of using blah; I wonder why I didn’t write about it? Perhaps a sister article.

  13. doh Says:

    Nobody is going to mention its use as free bounding in prolog unification?

