Archive for the 'c++' Category

The use of blah (_), aka underscore

2008-04-18

I’m part of a tradition of C programmers that pronounce «_» as “blah” (other people might call it “underscore”). I got the habit from Mr Moore and Dr Owen. Given how much the blah character appears, it’s useful to have a one-syllable pronunciation for it. That we chose “blah” is a bit unfortunate as it conflicts with “blah” in ordinary speech meaning “insert extra random stuff here”. Unfortunate, but also I suspect partly deliberate.

In C and friends this character has different connotations. Some are blessed by the standard, some have grown up in the communities that use them.

Blah is used to denote reserved identifiers in C. All identifiers starting with __, two blahs, are reserved for use by the implementation. Identifiers of this form are often used to introduce non-standard extensions without fear of breaking any existing strictly conforming programs. For example, GCC allows assembler to be written in a C function by using the __asm__ keyword. Because this keyword starts with __, GCC can guarantee that no correct portable C program can contain this keyword so GCC won’t break any existing code. If GCC had chosen asm to introduce inline assembler then this would have broken any existing C code that legitimately used asm as an ordinary identifier (for example, the name of a function).

Identifiers beginning with _ and followed by a capital letter are also reserved for the use of the implementation. Clearly no portable C code should use _ followed by a capital letter.

C itself has used this to extend the language in a safe manner that doesn’t upset any pre-existing portable code. In C99 a proper Boolean type was added. Obviously this type couldn’t be called bool as that would upset any codes that perfectly reasonably used that name for their own purposes. So in C99 the Boolean type is called _Bool. No portable code can have been using this name already (since it was reserved by the C standard), so it was safe for the new version of the C standard to use.

Another place you often see _ is at the beginning of struct tags. Recall that a struct tag is the foo in «struct foo {int a; void *b;}». Some people code like this:

struct _node
{
    struct _node *parent;
    struct _node *prev;
   ...

As far as I can tell, this is voodoo (maybe it has something to do with C++, how would I know?). In particular there’s nothing wrong with this code:

typedef struct node *node;

(or «typedef struct node node;» if you prefer)

In C it’s totally fine to use node for a struct tag as well as for a typedef. That’s because they’re separate namespaces (see [ISOC1999] section 6.2.3).

Lots of people use an initial capital letter for their types. If the blah convention for struct tags is used as well then this combines disastrously with the ISO C reserved namespace. This Amaya code is full of it:

typedef struct _Match
  {
     struct _SymbDesc   *MatchSymb;	/* pattern symbol */

Sorry guys, that’s straight-out use of a reserved identifier, and I claim my angry warthog. In all fairness to Amaya, they’re not the only ones doing it.

More boring uses of blah include:

Using it to separate components of a name. The C standard uses this for some of the macros it defines:

CHAR_BIT, DBL_DIG, EXIT_SUCCESS

This is of course a very widely adopted convention. A specialised use of this is where the components have different meanings, the blah is used as a sort of punctuation character. In «size_t», the «size» is really the name of the type, and the «t» suffix denotes that the name is a type (not a function or a variable, for example); there’s no enforcement of this, it’s just a convention. Similarly in «tm_year», «year» is the name of the field, and «tm» is a hint that the field is a member of the «struct tm» type (this is a hangover from pre-ANSI days when all structure member names shared the same namespace).

The standard is hilariously inconsistent in this regard. «size_t» is the simple case, but we have «ptrdiff_t», «sig_atomic_t» (note, blah between «sig» and «atomic», but not between «ptr» and «diff»), and «va_list» (it’s a type, but there’s no “blah t” suffix).

The C standard doesn’t use blah in ordinary identifiers (functions, basically). But loads of other people do. Once you get a sufficiently large C program where you have to think about modularity, it’s a good way to separate names into different (conventional) namespaces. xpidl_parse_iid, ssh_hmac_init, that sort of thing.

A sort of extension of this is using underscore to separate identifier metadata from its name. In «m_var», the m might indicate a member variable.

There are rarer, but possibly more interesting uses of underscore.

One is when you really really want to use a keyword as a name. Actually, for me, this crops up in Python more than C. I find I constantly want to call an input file «in», and quite often want to call a variable that holds a class, «class».

Another is when you have a local variable in a function that’s really just the same as one of the function parameters, but with a different type. In an object oriented language that might happen because of upcasting. In C it commonly happens because a generic pointer is passed as a void * but the called code uses it with some more specific type.

qsort is a good example. Say you want to sort an array of ints in C. You need a function that compares two ints. qsort expects a function that takes two const void * arguments. So this sort of thing is common:

int compare_int(const void *l_, const void *r_)
{
  const int *l = l_;
  const int *r = r_;
  ...
}

(Personally I would tend to call these «lArg» and «rArg», but I’m not going to blink at this use of blah).

You can see OpenSSL doing something similar with the «data_» argument to md4_block_data_order. The function takes the argument «const void *data_», but the body of the function actually wants to use data as a char *: «const unsigned char *data=data_;».

Another use of blah is to mark deliberately unused parameters. This can commonly happen in callback schemes. Like the previous qsort example, where the comparison function is constrained to have a particular type, often a callback is constrained to have a particular type, taking particular parameters. Sometimes not all of these parameters are needed. Example: zlib’s zalloc interface. zalloc’s type, when you peer behind the typedefs, is:

void *(*)(void *, unsigned, unsigned);

It’s a function pointer. If you want to use this interface, which you would do when you want zlib to use a custom allocator rather than malloc, then you need to implement a memory allocation function that takes 3 arguments:

void *superalloc(void *opaque, unsigned n, unsigned size) { … }

opaque is an opaque pointer that zlib doesn’t care about. It simply passes it from the struct z_stream_s opaque member to your function. What if you don’t need it? Then stick a blah after the parameter name to indicate that you intend to not use it:

void *superalloc(void *opaque_, unsigned n, unsigned size) { … }

Anyone know any other uses of blah?


Embedding Lua in 5 Minutes

2008-04-03

So at the UKUUG Spring Conference I kind of decided that there weren’t enough different dynamic languages being talked about; in fact it was pretty much divided into Python land and Perl land (at least as far as dynamic languages were concerned). So I decided to give a 5 minute lightning talk at the end of the conference, on embedding Lua into an application in 5 minutes.

This was my first lightning talk and it was a bit scary and a lot of fun. I highly recommend the experience.

I decided that instead of talking about embedding Lua, I would actually do it, standing in front of the conference live. Including downloading the Lua sources and compiling them (yay for working conference wifi). I thought this was hilarious, I have no idea what anyone else thought. I surprised myself by being able to type code in vi and talk at the same time, though I’m not sure I made much sense.

Here’s an example using the new command line option I added to yes:

$ ./a.out -l 'x=x or 1; x=x*2; return x' | head
2
4
8
16
32
64
128
256
512
1024

For the sake of completeness here is the modified version of yes.c that I ended up with:

#include <sys/cdefs.h>

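#include <stdio.h>
#include <string.h>     /* strcmp */
#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"     /* luaL_openlibs */

int main __P((int, char **));

int
main(argc, argv)
        int argc;
        char **argv;
{
  /* -l CHUNK: run the Lua chunk forever, printing what it returns.
   * The option is argv[1]; the chunk is argv[2]. */
  if(argc >= 3 && strcmp(argv[1], "-l") == 0) {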
    lua_State *l = luaL_newstate();
    luaL_openlibs(l);

    while(1) {
      luaL_dostring(l, argv[2]);
      puts(lua_tostring(l, -1));
      lua_settop(l, 0);
    }
  }

        if (argc > 1)
                for(;;)
                        (void)puts(argv[1]);
        else for (;;)
                (void)puts("y");
}

/*
 * Copyright (c) 1987, 1993
 *      The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *      This product includes software developed by the University of
 *      California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

Fritz asks about C structure types

2007-10-24

Fritz asks:

Hello,

I have a question regarding structures in C.

What’s the correct way to use it since I saw different styles using typedef struct, and naming struct before ‘{’ and after ‘}’.

Is this correct?

struct _foo {
  int a, b;};
struct _foo *foo;

Thanks!

(short answer yes, but don’t use “_foo”)

Thanks for the question Fritz, which normally I would expect to have seen on the Coming Soon page.

It’s good every once in a while to review even the basic parts of a language with which one is intimately familiar, so I will.

I will be discussing C not C++ (pretty much all I know about C++ is that it’s a bit different in this regard).

The code Fritz presents is almost fine. The error is that it uses the struct tag «_foo» which renders it undefined (a point I discuss below). The same code using «struct foo» would be just fine. In the discussion below I’ll use «struct foo».

struct is a keyword in C and it has two related uses:

  • To declare a new structure type and possibly a tag (name) for that type; and,
  • To refer to a previously declared type by its tag.

    Fritz’s short example shows both of these. The first use of struct:

    struct foo {
      int a, b;
    };

    declares a new structure type and gives that type the name «struct foo». The «foo» in «struct foo» is called a tag and it is used to specify which structure type you mean when you want to refer to the structure type later (or, as we see below, earlier).

    The second use:

    struct foo *foo;

    declares a new variable whose type is «struct foo *» (pointer to struct foo), referring to the type we declared earlier by its name. It so happens that the tag used to identify the struct, the «foo» in «struct foo», and the name of the variable are the same. That’s okay, it works because structure tags and variable names live in separate name spaces so they can never conflict (see [ISOC] section 6.2.3).

    These declarations can be combined; we can simultaneously declare a new structure type and a variable that uses that type:

    struct foo {
      int a,b;
    } *foo;

    This isn’t particularly good style, but it is possible. It is downright dangerous in argument lists. Avoid it unless you know better.

    Incomplete types

    More usefully the declaration of the existence of a structure type can be separated from the definition of a structure type. The key idea is that you don’t need to know about the internal details of a structure type in order to have a pointer to it. So the declaration of a pointer type doesn’t have to have a full definition of the structure type.

    This is what makes linked list types possible in C:

    struct node {
      struct node *next;
      int v;
    };

    Or structure types that refer to each other:

    struct bar {
      struct zon *z;
    };
    
    struct zon {
      struct bar *b;
    };

    Note that the definition of «struct bar» doesn’t have to see the definition of «struct zon» before being allowed to use «struct zon *» to declare a pointer type.

    The same technique allows us to have abstract datatypes in C. Simply define an interface that only uses a pointer type, «struct foo *» say, and never put the definition of «struct foo» in the header, reserve that for the implementation file. Clients of the interface are restricted from inspecting the contents of the structure and the compiler gives a lot of help in enforcing that.

    typedef

    It’s common to use a typedef in conjunction with structure types. A typedef allows you to create an alias, or another name, for a type. When used with structure types there are two common patterns of use. The first is to create a typedef for the structure type itself; the second is to create a typedef for a pointer to the structure type:

    typedef struct foo {
      int x;
    } foo_s;
    
    typedef struct foo *foo_p;

    The latter approach, making a typedef of the pointer type, is useful when you want the structure to be opaque so that clients can’t access it. This is the approach taken by the PAM client interface from Linux where all the functions take a pamc_handle_t which is defined as the opaque type «typedef struct pamc_handle_s *pamc_handle_t;». The client never gets to see the definition of the “inside” of the structure type so it can never (legally) access it except via the published interface. This is a good thing.

    The perils of _foo

    You can’t use a structure tag name that begins with an underscore at the top level of a file. At least, that’s my reading of this extract from [ISOC] section 7.1.3, Reserved Identifiers:

    All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.

    Using a reserved identifier in your program leads to undefined behaviour (paragraph 2 of the same section, 7.1.3). So don’t do that then.

    C#: Not a careful standard

    2007-07-03

    I don’t know why, but I did. I guess I was just curious. I looked at the C# standard, ECMA 334 (now an ISO standard).

    A few seconds in I notice the following:

    Section 11.1.6 describes the set of values in the double type:

    “The finite set of non-zero values of the form s × m × 2^e, where s is 1 or −1, and for double, 0 < m < 2^53 and −1075 ≤ e ≤ 970.”

    It also says that double is represented using the 64-bit double precision IEC 60559. Now of course I can’t actually get a copy of that standard, but it’s well known that the exponent for double precision is 11 bits wide with a bias of 1023. That means that the smallest (positive) normalised number is 2^-1022; the smallest denormal is 2^-52 times that value: 2^-1074. Still, the C# boys nearly got it right. They’re off-by-one on the top end too, the largest finite double is (2^53 - 1) × 2^971.

    Like a boy with a new toy (I discovered Python’s struct module a few days ago), I can check the math with Python:

    >>> import struct
    >>> struct.unpack('d', struct.pack('Q', 1))[0]
    4.9406564584124654e-324
    >>> 2**-1074
    4.9406564584124654e-324

    Further on in the same section the standard says: “The double type can represent values ranging from approximately 5.0 × 10^−324 to 1.7 × 10^308 with a precision of 15–16 digits.” At least they get the range right in decimal. I still find this slightly misleading because down in the denormal range you don’t get 15-16 digits of precision (consider, 3e-324 == 7e-324). It would be more accurate to say “The double type can represent values ranging from approximately 5.0 × 10^−324 to 1.7 × 10^308, with a precision of 15–16 digits when the magnitude is greater than approximately 2.2 × 10^−308.”

    These are tedious details; some might think them a little bit dull, but they’re details that a standard should get right. After all, if the guy writing the standard isn’t going to get them right, who is? Your vendor? (mwahaha) It’s sloppy and makes me wonder what else might they have got wrong. Am I encouraged to continue reading? No.

    Sorry, wrong category because wordpress thinks C, C++, and C# are all the same. Meh.

    C: prefer f(…) over (*f)(…)

    2007-04-30

    In C all function calls take place via a function pointer. It says so in section 6.5.2.2: “Constraints: The expression that denotes the called function shall have type pointer to function returning void or returning an object type other than an array type.”.

    So what happens when you write p = malloc(sizeof *p);? malloc isn’t a function pointer, it’s a function. What happens is that when a function is used in an expression it is automatically converted to a pointer to the function instead (section 6.3.2.1); except for sizeof where it is illegal, and & (address-of).

    That means that if callback is already a function pointer you don’t need to go (*callback)() to call it, you can just go callback(). Most of the time I prefer that. It’s neater because you don’t need the * so you don’t need the parentheses either. It also shows confidence; it shows that you know what you’re doing.

    If you do go (*callback)() then dereferencing the pointer yields a function, which is automatically converted back into the function pointer. It all seems a bit pointless. An amusing consequence, which not that many C programmers seem to realise, is that you can have as many *s as you like:

    (**********callback)();
    

    sizeof(char) is 1

    2007-04-08

    I found this piece of code, apparently in Mozilla, using Google’s codesearch facility. The code in question is:

    group->text = (char *) malloc(sizeof(char) * strlen(params->text));
    

    If you’re not careful you see code like this all the time. sizeof(char) is 1. The standard says so. There is no point in writing sizeof(char) in code like this. It adds clutter and it suggests that you don’t know enough C. In particular, you don’t know that sizeof(char) is 1, guaranteed. If you don’t know that, what else don’t you know about C? Maybe you don’t know that malloc returns void * and that this type can be correctly converted to any other (object) pointer type without an explicit cast.

    Aside: There are more serious problems with this code. The use of strlen inside a malloc without a +1 should always fire off alarm bells. And sure enough if we look at the surrounding code:

    group->text = (char *) malloc(sizeof(char) * strlen(params->text));
    if (group->text == NULL) {
    	res = MP_MEM;
    }
    strcpy(group->text, params->text);
    

    We see that the code is copying a string, but it doesn’t allocate enough memory for it. Recall that in C you need strlen(s)+1 bytes of storage to copy the string s. C invites you to make this error. Daily.

    Some people might think it’s reasonable to use sizeof(char) because char might be #defined to something else (such as wchar_t). You can’t. The standard says it’s illegal to do so.

    In C a byte is the amount of storage used to represent a char. When you’re talking about storage space the two are essentially synonymous. What about those weird architectures like the TI C54x DSP range? This comment in the FreeType sources says that it has 16-bit char. That’s right, sizeof(char) is still 1, but it has more than 8 bits. It’s still a byte; not all bytes are 8-bits (if you really do mean 8-bit byte then say octet). Does your bit-twiddling code work when CHAR_BIT is not 8?

    Zlib gives an example of how to code for the wrong assumptions. The code is:

    #define Buf_size (8 * 2*sizeof(char))
    /* Number of bits used within bi_buf. (bi_buf might be implemented on
     * more than 16 bits on some systems.)
    

    See how they understand that a 2 char buffer might be bigger than 16-bits? Unfortunately they assume that sizeof always returns its answers in 8-bit chunks (octets) and that it is sizeof(char) that varies. Wrong.

    sizeof and malloc

    sizeof and malloc go together. sizeof yields results in bytes and malloc requires bytes for its argument.

    A classic use is in code like this (a more or less randomly found example):

    file = (URL_FILE *)malloc(sizeof(URL_FILE));
    

    This is correct use of sizeof and malloc, but I don’t recommend it. There are better ways to write this code.

    The first issue I have with the code is that it assumes that the file object is of type URL_FILE in order to calculate the right size. As it happens it is, but it’s not necessary to have that assumption. sizeof can operate on an expression and should do so in this case:

    file = (URL_FILE *)malloc(sizeof *file);
    

    The other issue I have is that there is simply no need to explicitly cast the result of malloc to the type URL_FILE *. malloc returns void *, is defined to do so by the standard and that should be relied upon. void * will correctly convert, without an explicit cast, to any object pointer type. If the header files supplied by your vendor do not declare malloc correctly then your C implementation is not remotely standards compliant and you should complain to your vendor and probably fix the headers (yes, by editing the system supplied header files). If your implementation of malloc really does return int so you must supply an explicit cast then you have Special Needs and you should probably think very carefully before writing any C code at all.

    [Added on 2007-06-26: Apparently K&R’s CPL contains the advice to explicitly coerce the return value of malloc, perhaps that’s why lots of programmers do it. In the list of errata for that book they say (in the erratum for page 142) that, in the light of the Standard’s conversion rules, the explicit coercion is “not necessary […], and possibly harmful if malloc, or a proxy for it, fails to be declared as returning void *”.]

    So now we have:

    file = malloc(sizeof *file);
    

    Much nicer. If you need persuading that this code is better consider what happens if the type of file is changed. In my suggested code, the call to malloc does not need changing. In the example code as first presented then both the cast and the sizeof will need changing. Will your programmer change both? If they forget to change the sizeof then the wrong size block of memory will get allocated. Oopsie. Another reason why the code without the explicit cast is better is if you forget to #include <stdlib.h> to declare malloc. With the explicit cast many compilers will simply assume that you know what you’re doing, namely that you intended to call an undeclared function which implicitly returns int and convert the result to a pointer type, and they won’t warn you. Without the explicit cast most compilers will generate a warning even on their most feeble settings.

    By the way the syntax for sizeof is either sizeof(type) or sizeof expression. You only need parentheses around the operand of sizeof if it is a type. If the expression needs protecting from adjacent expressions then put the sizeof itself inside parentheses: malloc((sizeof *p) + (sizeof *q)). If you don’t need the parentheses then don’t use them, it shows that you know the syntax of sizeof and generates confidence that you know what you’re doing. p = malloc(sizeof *p) should be idiomatic.

    For more examples of cruelly finding somebody else’s bugs using codesearch, see The perils of multiple-value-prog1. Maybe you’ll learn some Lisp too.

    Stupid const, stupid mutable, stupid C++

    2007-03-21

    Every now and then someone drags me into a C++ discussion. I have friends who code in C++. Some of them do it for money. I’m sorry about that.

    Recently I heard about “const methods”. “What’s that?” I ask; “Methods that are contractually obliged not to change their object.”. Oh, I think. That seems really useful, but a feature like that would never make it into C++ (I reason, based on what I know of C). Still, not knowing much about C++ I shut up and go and learn some more C++ instead. I consult my friends who actually know more C++ than I do. The replies are not consistent, not confident, and sometimes go all slippery and vague when I press for details.

    Then, it dawns on me. A “const method” is simply a method whose this pointer is const qualified. It’s an inevitable feature of the language once you’ve added both const and this; otherwise how would you invoke a method on a const instance, and what would the type of this be? You have to have methods with const this otherwise you end up throwing away constness and I’m sure we can all agree on what a bad idea that is. (Of course when I say things like const this I mean that this has type const foo * (pointer to const thingy)).

    Here’s the most important thing to know about const methods:

    They cannot help the compiler; they cannot help you.

    (Actually they can help you, if certain conventions are obeyed. The key thing is that the conventions are just that, so const methods cannot be relied upon to help you; there is no contractual obligation). There’s a lot of confusion about const on the web, particularly about what it means for a method to be const. You see stuff like this (from this random webpage):

    void inspect() const;   // This member promises NOT to change *this

    This is trying to claim that declaring the method to be const means that the method promises not to change the object. This is rot. Aside from naughty const-removing casts that would allow the inspect method to modify the object, the inspect method might simply call a function that happens to use a global pointer (that is not const) to modify the object in question. To illustrate (if my C++ is not idiomatic, that’s because I don’t get paid to program in it; if it’s not correct, I want to know):

    #include <iostream>
    
    class counter {
      public:
      int i;
    
      counter();
    
      int inspect() const;
      void increment();
    };
    
    counter sigma_inspect;
    
    counter::counter()
    {
      i = 0;
    }
    
    int counter::inspect() const
    {
      sigma_inspect.increment();
      return i;
    }
    
    void counter::increment()
    {
      ++ i;
      return;
    }
    
    int main(void)
    {
      counter a;
    
      std::cout << a.inspect() << "\n";
    
      std::cout << sigma_inspect.inspect() << "\n";
      std::cout << sigma_inspect.inspect() << "\n";
      return 0;
    }
    

    inspect is a const method and increment is a non-const method. increment increments the value of a counter, and inspect returns the value of that counter. The inspect method also increments a global counter that records the total number of times that inspect has been called. Consider the two calls to sigma_inspect.inspect in the main function. They will print out as 2 and 3. The value of sigma_inspect.inspect() has changed even though inspect is a const method. This is of course because the object on which we are invoking the inspect method is sigma_inspect, the same object that inspect is using to keep track of the number of times it has been invoked. The reason that this works is that the inspect method is not using its this pointer to modify the sigma_inspect object, it is using the sigma_inspect global directly. It would be an error for the inspect method to try and modify *this because that’s a const lvalue and can’t be used to modify an object. Essentially the sigma_inspect object has been aliased.

    The best that can be said for const methods is that they’re a useful convention. It is useful to denote in an interface which methods do not modify an object and which do, and const is a way of doing that. But note that there are no obligations nor guarantees. A const method is not obliged to refrain from modifying the object (as the above example shows); the caller of a const method has no guarantee that the object will not change. const is not magic.

    I’ve heard some people tout const in C++ as an advantage it has over other languages such as Java or Common Lisp. The claim is that const methods add contractual guarantees to the language that are useful. Well, there are no guarantees. As with many “my language is better than yours” debates both sides would probably do well to simply spend more time programming in different languages (not necessarily the ones that are being argued about).

    Objective-C doesn’t have const methods, and yet Objective-C programmers using Apple’s Cocoa interfaces enjoy many of the same benefits, in particular a clear separation of mutating and non-mutating methods enforced by the compiler. How is it achieved? Through the class hierarchy. Observe that in C++ if I have a foo instance then I can call const methods as well as non-const methods, whereas if I have a const foo instance I can only call const methods. A foo instance has all the methods of a const foo instance, and some more on top. This is just like inheritance in an ordinary class hierarchy. If A has all the methods of B and some more methods then we can usually implement this by making A a subclass of B. So it is in Cocoa. The Collections classes illustrate this idea best. Cocoa has an Array class (called NSArray for historical reasons) and a Mutable Array class. The NSArray class models read-only arrays. They can be created (and returned from other methods and functions and so on), but not modified. The NSMutableArray class is a subclass of NSArray and it supports methods that can modify the array (such as adding, removing, and so on). A method like count, which counts the number of items in the array, is specified by the NSArray class so is invokable on instances of NSArray and instances of NSMutableArray. You have an NSMutableArray instance in your hand and you want to pass it to a function which is expecting the read-only version NSArray? No problem, casting up the hierarchy is legal and problem free, just like a cast that adds const in C++.

    Some parts of Java’s JSE class hierarchy do a similar thing, see java.awt.image.Raster and its subclass java.awt.image.WritableRaster for example. And of course, Apple’s Cocoa classes (NSArray and friends) also exist in Java as well as Objective-C.

    Using the class hierarchy like this might scare you, especially if you have preconceived ideas about what to use a class hierarchy for. To be honest the only people I know to be scared are C++ programmers that haven’t been exposed to enough other object oriented languages.

    So, what about mutable? Well, perhaps by now you’ve guessed that I think it’s stupid and wrong and should not be tolerated. It doesn’t add anything that you can’t do with casts or pointers. Perhaps the simplest illustration is putting a pointer to itself in every object:

    #include <iostream>
    
    struct silly {
      silly();
      void foo() const;
      silly *const mutate;
      int i;
    };
    
    silly::silly(): mutate(this)
    {
      i=0;
    }
    
    void silly::foo() const
    {
      std::cout << i << "\n";
      ++ mutate->i;
      std::cout << i << "\n";
      return;
    }
    
    int main(void)
    {
      silly s;
      s.foo();
      return 0;
    }

    Every silly object has a mutate variable that points to itself. The const foo method can use the mutate variable to change the object, in this case incrementing i. Calling it mutate even makes it kind of self-documenting. If it’s that easy to get round the const restriction when you want, why bother with an entire new keyword, mutable? The only explanation I have is that the people that have influence over the evolution of C++ enjoy making it more complex in ways that don’t give you any more real programming power.

    Of course, as a C++ programmer you need to know about const, and you probably need to know about const methods, but that’s mostly because if you don’t you’ll annoy people that try to create const instances (or refs to such) of your classes.