Anonymous things in Python

2009-05-22

This post by pydanny prompted me to flesh this article out (previously it had just been floating around in my mind). pydanny’s article reminded me of when I first learned about lambda and how scary it was. That was in about 1988 when I was trying to get to grips with XLISP on the Atari ST (hah! a Lisp dialect so old it still spells its name with capital letters. sweet). Before XLISP I had been programming in (Sinclair) BASIC, Z80 machine code, and 68000 assembler. Oh, and FORTH. Maybe you can imagine how scary and alien a concept like lambda was. I can just barely remember how confusing it was. I don’t think the confusion really passed until I started dabbling with ML, in 1991. By the time I started dabbling with the lambda calculus in 1993 I was actually pretty comfortable with it. So, only 5 years of exposure for lambda to get comfy.

Of course now, I know that:

A lambda is simply a function with no name.

I emphasise the simply because I feel that that’s where the problem of understanding lay (for me). In BASIC and FORTH it was impossible to have a function without a name. In fact, the very thought was literally inconceivable. At least for a few years. Even in machine code where functions don’t have names they do have concrete locations, and they have names in the listing.
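To make that concrete, here is a minimal sketch, written for Python 3. The names square_named and square_anon are made up for illustration. It shows that def and lambda produce the same kind of object, and that the only thing the lambda lacks is a name of its own.

```python
# A named function: the def statement creates a function object
# and binds it to the name square_named.
def square_named(x):
    return x * x

# An anonymous function: the lambda expression creates a function object
# and binds it to nothing. It is stored in a variable here only so that
# the two can be compared.
square_anon = lambda x: x * x

# Both are ordinary function objects and behave identically when called.
assert type(square_named) is type(square_anon)
assert square_named(7) == square_anon(7) == 49

# The def'd function knows its name; the lambda only has a placeholder.
print(square_named.__name__)  # square_named
print(square_anon.__name__)   # <lambda>

# A lambda can be called on the spot without ever being given a name.
print((lambda x: x * x)(7))   # 49
```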

It wasn’t until many years later, in fact perhaps relatively recently, that I realised that there was a sort of hierarchy of things that must be named, and things that need not be named. For example, an expression, like «a**2 + b**2 + c**2» is an anonymous value, a value that does not need a name. More importantly, so are all the sub-expressions inside that expression. What would a language look like if it didn’t have anonymous values? Well, something like this:

a2 = a**2
b2 = b**2
c2 = c**2
s = a2+b2
s = s + c2

Do you see? Each expression must be given a name (by assigning it to a named variable), and no expression can contain a sub-expression, because that sub-expression would be anonymous.

Programming in assembler is a bit like this. The names are the registers and every expression has to go in a register. If you run out, you have to rename everything by spilling to the stack. It’s partly why it can be so tedious.

So almost everything worthy of being called a language has anonymous values. At least simple values, like numbers. In the BASIC era it was common for strings to be named only. The kinds of restrictions you had to deal with were things like not being able to create a string from part of an existing string. You had to first create a named variable that was the target string, then copy part of the source string into the target string. Maybe you had some string expressions but there were restrictions on where you could use them. For example maybe you could only pass named strings to functions, so you could go «T$=MID$(A$,12,16):PROCFOO(T$)» but not pass the MID$ expression directly: «PROCFOO(MID$(A$,12,16))» (bad bad bad). Life was hard, and we licked coal for breakfast.

Arrays were another thing where you had to name them and couldn’t pass them to functions (for example). AWK is still like this (arrays cannot be returned from a function, for example).

Speaking of arrays, an element of an array is a bit like an anonymous variable:

a[i] = x

Above, a[i] specifies the place where the computed value, x in this case, goes. I say “a bit like” because you could equally well think of the element being named by the array name and a small integer as a pair. A place where we can store a value that changes is called a variable. You knew that right? It’s just that most people think of named variables when you say variable. The generalised term for “anything you can put on the left of an assignment” is lvalue. A term you mostly see in the C programming language but often elsewhere too. Common Lisp calls such things places. How sensible.
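A small sketch of the idea in Python 3 (the names a, d, grid and Point are invented for this example): each line below has a different kind of place on the left of the assignment, and only the first is a plain named variable.

```python
class Point:
    """A tiny class, used only to have an attribute to assign to."""
    pass

x = 42                # a named variable: the place is called x

a = [0, 0, 0]
i = 1
a[i] = x              # an element of a list: the place is a[i]

d = {}
d["answer"] = x       # an entry in a dict: the place is d["answer"]

p = Point()
p.coord = x           # an attribute of an object: the place is p.coord

# Places nest, so the inner container never needs a name of its own.
grid = [[0, 0], [0, 0]]
grid[1][0] = x        # the inner list grid[1] is never bound to a name

print(a, d, p.coord, grid)
```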

So the hierarchy of things you need not name starts something like:

  • simple numeric expressions
  • strings
  • arrays (including hash-tables, and so on)
  • objects (including structs, records, pairs, and so on)
  • functions
  • places (actually, not sure where this should go really)
    As you go down the list, you see fewer and fewer languages that give you anonymous versions of the thing in question. Although these days we probably all agree that any decent language will have anonymous versions of all of the above. Python does, and that’s a Good Thing.

    One reason for arranging things this way is that we can use it to both explain and motivate things like lambdas. A lambda is an anonymous function, and you remember how useful it was when you realised you could have string expressions and you didn’t have to store intermediate strings in variables? You could just manipulate strings without having to name them? Well, the same is true of functions.
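Here is that analogy as a rough sketch in Python 3. The word list and the helper name by_length are invented for the example. The first half names every intermediate string and then does without the names; the second half does the same for a function.

```python
words = ["lambda", "ML", "XLISP", "FORTH"]

# The string version: naming every intermediate value...
prefix = words[0][:3]
shouted = prefix.upper()
named_result = shouted + "!"

# ...versus just writing the expression.
anonymous_result = words[0][:3].upper() + "!"
assert named_result == anonymous_result == "LAM!"

# The function version: naming a one-off helper...
def by_length(word):
    return len(word)

named_sort = sorted(words, key=by_length)

# ...versus passing the function as an expression.
anonymous_sort = sorted(words, key=lambda word: len(word))
assert named_sort == anonymous_sort

print(anonymous_sort)  # ['ML', 'XLISP', 'FORTH', 'lambda']
```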

    So what else could and should be anonymous? Well we can look to other languages for inspiration. ML has anonymous types. Java has anonymous classes. So perhaps the list can continue:

  • classes
  • types
    (Note that a class is a sort of type, so this order has a natural feel to it.)

    So everything I said about anonymous functions applies to anonymous classes and anonymous types. They’re great, and every language needs them. Basically my philosophy here is that if there’s a thing in a language and you can name it, then you should also be able to have anonymous versions of those things. That will make the language better.
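Python can already manage a version of this. As a hedged sketch in Python 3: the three-argument form of the built-in type builds a class from an expression, with no class statement binding a name. The greet method and the empty class name are arbitrary choices for this example.

```python
# type(name, bases, namespace) creates a new class object. Passing an empty
# string as the name gives a class with no useful name of its own, and the
# trailing () instantiates it straight away.
greeter = type("", (), {"greet": lambda self: "hello"})()

print(greeter.greet())               # hello
print(repr(type(greeter).__name__))  # ''

# For comparison, the class statement does the same work but also binds
# the name Greeter in the current namespace.
class Greeter:
    def greet(self):
        return "hello"

assert Greeter().greet() == greeter.greet()
```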

    So what other sorts of things does Python have that we haven’t removed the names from yet? Modules. For a long while I wished that Python had anonymous modules, a way of getting hold of a module without it polluting your namespace. Mostly I just wished for this because it would make Python more orthogonal, or better as I like to say. Of course modules in Python are first class citizens. Once you’ve imported the module struct, then struct refers to the module as a first class value, and it can be passed around and so on. That’s how help(struct) works. Recently I found both a way to get at modules anonymously (without importing them into a namespace), and a reason why you might want to.
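The first class part is easy to check. A minimal sketch in Python 3, where public_names is a made-up helper rather than anything from the standard library:

```python
import struct

def public_names(module):
    """Return the names in a module that do not start with an underscore."""
    return sorted(name for name in dir(module) if not name.startswith("_"))

# The module object is passed as an argument, just like any other value.
names = public_names(struct)
assert "pack" in names and "unpack" in names

# It can also be bound to another name and used through that name.
alias = struct
assert alias.calcsize("<i") == 4
print(alias.__name__)  # struct
```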

    I’m a tidy sort of guy so when I program in Python I put my imports only in the functions that need them (a habit I picked up from Command Line Warriors but buggered if I can find the original inspiration). Well, sometimes I am.

    I often find myself debugging my Python by the time-honoured tradition of inserting print statements. Because of my tidy habit, I quite often find that the sys module is not in scope (in fact, often the only function that has «import sys» is main). So this fails:

    print >>sys.stderr, "length:", l, "data:", x

    And I’ll be damned if I’m going to add an import sys and forget to remove it later. But this works:

    print >>__import__('sys').stderr, "length:", l, "data:", x

    Not a very nice syntax though, is it?
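For anyone on Python 3, where print is a function rather than a statement, the same trick can be sketched like this. The values given to l and x are placeholders standing in for whatever is being debugged. importlib.import_module is the standard library's documented wrapper, shown only as an alternative spelling.

```python
import importlib

l = 3        # placeholder values standing in for real debug data
x = b"abc"

# The module is fetched, used, and discarded in one expression;
# this line binds no name such as sys in the current namespace.
print("length:", l, "data:", x, file=__import__("sys").stderr)

# importlib.import_module does the same job with a friendlier name, but it
# has to be imported itself, which rather defeats the purpose here.
print("length:", l, "data:", x, file=importlib.import_module("sys").stderr)

# Both spellings hand back the very same module object.
assert __import__("sys") is importlib.import_module("sys")
```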

    19 Responses to “Anonymous things in Python”

    1. mathew Says:

      Why did you use XLISP, when Cambridge LISP was available? Just the money issue?

    2. drj11 Says:

      Cambridge LISP was available for the ST? I had no idea. So that was one thing. Almost certainly money would have been an issue (see “licking coal”). In any case I was doing A levels, and not in computer science, so I didn’t feel particularly inclined to spend money on dev tools.

    3. Gareth Rees Says:

      The Common Lisp standard doesn’t seem to have a way to make anonymous classes—the only portable way to make a class is to use the defclass macro, which gives the class a name. (But I’m no expert, so I could be missing something.)

      In an implementation of Common Lisp that supports the MetaObject Protocol, you can make an anonymous class by calling (make-instance 'standard-class ...). But this seems a bit hacky to me.


    4. Sadly Python’s lambda is neutered by the restriction that they can only be expressions l-(

    5. phil Says:

      I’ve had reentrancy issues with imports in functions before — module authors sometimes put code in their modules that shouldn’t run concurrently. Python devs may have looked into this issue — the project I was working on was using 2.4 at the time, IIRC — but I wouldn’t be surprised if there were still tricky edge cases in that code.

      You can get 99% of anonymous modules with anonymous classes or objects, but truly anonymous modules would still be rather nice, since Python modules map so nicely to files and directories. You can delete the module from sys.modules, but that’s kludgy.

    6. mjb67 Says:

      Can you have module lvalues with that syntax, like:

      sysfoo = __import__('sys')

      print >> sysfoo.stderr, "length:", l, "data:", x

      That would be cool.

    7. mjb67 Says:

      @me ah cool, the documentation says that you can do this. It also amusingly says “__import__ … This is an advanced function that is not needed in everyday Python programming.” The only documented built-in function to have such a note!

      • drj11 Says:

        Ah yes, I meant to mention that
        «import foo»
        is a lot like
        «foo = __import__('foo')»,
        and
        «import foo as bar»
        is like
        «bar = __import__('foo')»,
        and
        «from foo import baz»
        is like «baz = __import__('foo').baz».

        Which is all very neat.

        I wonder if I should do «module = __import__»?

    8. Nick Barnes Says:

      The programming concept which took me an embarrassingly long time to understand was lexical closure. After that, everything was obvious.

    9. Nick Barnes Says:

      The documentation for __getattr__ made me think that I wouldn’t need to define a class at all, but could just make an object and give it a __getattr__ method. But that didn’t work.

    10. Ronan Lamy Says:

      Great post and comment thread! It’d be a shame to let it end just short of perfect anonymousness. So:

      >>> module = type('', (), {'__getattr__': lambda self, name: __import__(name)})()
      >>> module.math.cos(1)
      0.54030230586813977
      
