Archive for the '/bin/sh' Category

Piping into shell may be harmful



curl https://thing | sh

It has become fashionable to see this kind of shell statement as a quick way of installing various bits of software (nvm, docker, salt).

This is a bad idea.

Imagine that due to bad luck the remote server crashes halfway through sending the text of the shell script, and a line that reads

rm -fr /usr/local/go

gets truncated and now reads:

rm -fr /

You’re hosed.

curl may write a partial file to the pipe and sh has no way of knowing.

Can we defend against this?

Initially I was pessimistic. How can we consider all possible truncations of a shell program? But then I realised, after a couple of false turns, that it’s possible to construct a shell script that is syntactically valid only when the entire script has been transmitted; truncating this special shell script at any point will result in a syntactically invalid script which shell will refuse to execute.

Moreover this useful property of being syntactically valid only when complete is not just a property of a small specially selected set of shell scripts, it’s entirely general. It turns out to be possible to transform any syntactically valid shell script into one that has the useful property:

{
any old script here
}

We can bracket the entire script with the grouping keyword { and }. Now if the script gets cut off somewhere in the middle, the { at the beginning will be missing its matching } and be syntactically invalid. Shell won’t execute the partial script.

As long as the script in the middle is syntactically valid, then the bracketed script will be syntactically valid too.

Let’s call these curly brackets that protect the script from truncation, Jones truncation armour.

Clearly Jones truncation armour should be applied to all scripts that may be piped directly into shell.

Can it be applied as an aftermarket add-on? Yes it can!

{ echo { && curl https://thing && echo } ; } | sh

Maybe this is even better. It means that the consumer of the script doesn’t have to rely on the provider to add the Jones truncation armour. But it also doesn’t matter if the script is already armoured. It still works.
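A quick way to see the armour in action (a sketch; printf stands in for a download that may or may not be cut short):

```shell
# A truncated armoured script: the opening { never finds its matching },
# so sh rejects the whole thing as a syntax error and runs none of it.
truncated_out=$(printf '{\necho running\n' | sh 2>/dev/null)

# The complete armoured script parses and runs normally.
complete_out=$(printf '{\necho running\n}\n' | sh)

echo "truncated: '$truncated_out'"   # truncated: ''
echo "complete: '$complete_out'"     # complete: 'running'
```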

Making change with shell


I was flicking through Wikström’s «Functional Programming Using Standard ML», when I noticed he describes the problem of making up change for an amount of money m using coins of certain denominations (18.1.2, page 233). He says we “want a function change that given an amount finds the smallest number of coins that adds to that amount”, and “Obviously, you first select the biggest possible coin”. Here’s his solution in ML:

exception change;
fun change 0 cs = nil
  | change m nil = raise change
  | change m (ccs as c::cs) = if m >= c
      then c::change (m-c) ccs
      else change m cs;

It’s pretty neat. The recursion proceeds by either reducing the magnitude of the first argument (the amount we are giving change for), or reducing the size of the list that is the second argument (the denominations of the coins we can use); so we can tell that the recursion must terminate. Yay.

It’s not right though. Well, it gives correct change, but it doesn’t necessarily find the solution with fewest number of coins. Actually, it depends on the denominations of coins in our currency; probably for real currencies the “biggest coin first” algorithm does in fact give the fewest number of coins, but consider the currency used on the island of san side-effect, the lambda. lambdas come in coins of Λ1, Λ10, and Λ25 (that’s not a wedge, it’s a capital lambda. It’s definitely not a fake A).

How do we give change of Λ30? { Λ10, Λ10, Λ10 } (3 tens); what does Wikström’s algorithm give? 1 twenty-five and 5 ones. Oops.

I didn’t work out a witty solution to the fewest number of coins change, but I did create, in shell, a function that lists all the possible ways of making change. Apart from trivial syntactic changes, it’s not so different from the ML:

_change () {
    # $1 is the amount to be changed;
    # $2 is the largest coin;
    # $3 is a comma separated list of the remaining coins

    if [ "$1" -eq 0 ] ; then
        echo ; return
    fi
    if [ -z "$2" ] ; then
        return
    fi
    _change $1 ${3%%,*} ${3#*,}
    if [ "$1" -lt "$2" ] ; then
        return
    fi
    _change $(($1-$2)) $2 $3 |
        while read a ; do
            echo $2 $a
        done
}

change () {
    _change $1 ${2%%,*} ${2#*,},
}

Each solution is output as a single line with the coins used in a space separated list. change is a wrapper around _change which does the actual work. The two base cases are basically identical: «"$1" -eq 0» is when we have zero change to give, and we output an empty line (just a bare echo) which is our representation for the empty list; «-z "$2"» is when the second argument (the first element of the list of coins) is empty, and, instead of raising an exception, we simply return without outputting any list at all.

The algorithm to generate all possible combinations of change is only very slightly different from Wikström’s: if we can use the largest coin, then we generate change both without using the largest coin (first recursive call to _change, on line 12) and using the largest coin (second recursive call to _change, on line 16). See how we use a while loop to prepend (cons, if you will) the largest coin value to each list returned by the second recursive call. Of course, when the largest coin is too large then we proceed without it, and we only have the first recursive call.

The list of coins is managed as two function arguments. $2 is the largest coin, $3 is a comma separated list of the remaining coins (including a trailing comma, added by the wrapping change function). See how, in the first recursive call to _change $3 is decomposed into a head and tail with ${3%%,*} and ${3#*,}. As hinted at in the previous article, «%%» is a greedy match and removes the largest suffix that matches the pattern «,*» which is everything from the first comma to the end of the string, and so it leaves the first number in the comma separated list. «#» is a non-greedy match and removes the smallest prefix that matches the pattern «*,», so it removes the first number and its comma from the list. Note how I am assuming that all the arguments do not contain spaces, so I am being very cavalier with double quotes around my $1 and $2 and so on.
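The head/tail split can be checked at the prompt; here is a sketch using the same comma-separated form (with the trailing comma that the change wrapper adds):

```shell
coins=25,10,1,
echo "${coins%%,*}"   # head of the list: 25
echo "${coins#*,}"    # tail of the list: 10,1,
```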

It even works:

$ change 30 25,10,1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
10 10 1 1 1 1 1 1 1 1 1 1
10 10 10
25 1 1 1 1 1

Taking the bash out of Mark


Mark Dominus, in his pretty amusing article about exact rational arithmetic in shell gives us this little (and commented!) shell function:

        # given an input number which might be a decimal, convert it to
        # a rational number; set n and d to its numerator and
        # denominator.  For example, 3.3 becomes n=33 and d=10;
        # 17 becomes n=17 and d=1.
to_rational() {
  # Crapulent bash can't handle decimal numbers, so we will convert
  # the input number to a rational
  if [[ $1 =~ (.*)\.(.*) ]] ; then
      i_part=${BASH_REMATCH[1]} f_part=${BASH_REMATCH[2]}
      n=$i_part$f_part
      d=$(( 10 ** ${#f_part} ))
  else
      n=$1 d=1
  fi
}

Since I’m on a Korn overdrive, what would this look like without the bashisms? Dominus uses BASH_REMATCH to split a decimal fraction at the decimal point, thus splitting ‘fff.iii’ into ‘fff’ and ‘iii’. That can be done using portable shell syntax (that is, blessed by the Single Unix Specification) using the ‘%’ and ‘#’ features of parameter expansion. Example:

$ f=3.142
$ echo ${f%.*}
3
$ echo ${f#*.}
142

In shell, «${f}» is the value of the variable (parameter) f; you probably knew that. «${f%pattern}» removes any final part of f that matches pattern (which is a shell pattern, not a regular expression). «${f#pattern}» removes any initial part of f that matches pattern (full technical details: they remove the shortest match; use %% and ## for greedy versions).

Thus, between them «${f%.*}» and «${f#*.}» are the integer part and fractional part (respectively) of the decimal fraction. The only problem is when the number has no decimal point. Well, Dominus special cased that too. Of course the “=~” operator is a bashism (did perl inspire bash, or the other way around?), so portable shell programmers have to use ‘case’ (which traditionally was always preferred even when ‘[‘ could be used because ‘case’ didn’t fork another process). At least this version features a secret owl hidden away (on line 3):

to_rational () {
  case $1 in
    (*.*) i_part=${1%.*} f_part=${1#*.}
      n=${i_part}${f_part} d=$(( 10 ** ${#f_part} )) ;;
    (*) n=$1 d=1 ;;
  esac
}

The ‘**’ in the arithmetic expression raised a doubt in my mind and, *sigh*, it turns out that it’s not portable either (it does work in ‘ksh’, but it’s not in the Single Unix Specification). Purists have to use a while loop to add a ‘0’ digit for every digit removed from f_part:

to_rational () {
  case $1 in
    (*.*) i_part=${1%.*} f_part=${1#*.}
      n=${i_part}${f_part} d=1
      while [ -n "${f_part}" ] ; do
        d=${d}0 f_part=${f_part%?}
      done ;;
    (*) n=$1 d=1 ;;
  esac
}

Traditional shell didn’t support this «${f%.*}» stuff, but it’s been in Single Unix Specification for ages. It’s been difficult to find a Unix with a shell that didn’t support this syntax since about the year 2000. It’s time to start to be okay about using it.
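Filling in the loop body of the purist version (a sketch: append a ‘0’ digit to d for each character stripped from f_part), it can be checked against Dominus’ own examples, 3.3 giving n=33 d=10 and 17 giving n=17 d=1:

```shell
to_rational () {
  case $1 in
    (*.*) i_part=${1%.*} f_part=${1#*.}
      n=${i_part}${f_part} d=1
      while [ -n "${f_part}" ] ; do
        d=${d}0 f_part=${f_part%?}   # one '0' per fractional digit
      done ;;
    (*) n=$1 d=1 ;;
  esac
}

to_rational 3.3 && echo "$n/$d"   # 33/10
to_rational 17  && echo "$n/$d"   # 17/1
```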

Interactive Shells and their prompts


What can we say about what the “-i” option to shell does? It varies according to the shell. Below, I start with PS1 set to “demo$ “, which is not my default. (you can probably work it out from the transcript, but) it might help to know that my default shell is bash (for now). There’s nothing special about ‘pwd’ in the examples below, it’s just a command with a short name that outputs something.

demo$ echo pwd | ksh
/home/drj/hackdexy/content
demo$ echo pwd | bash
/home/drj/hackdexy/content
demo$ echo pwd | ksh -i
$ /home/drj/hackdexy/content
demo$ echo pwd | bash -i
drj$ pwd
/home/drj/hackdexy/content
drj$ exit
demo$ echo pwd | bash --norc -i
bash-4.2$ pwd
/home/drj/hackdexy/content
bash-4.2$ exit
demo$ echo pwd | sh -i
$ /home/drj/hackdexy/content
sh: Cannot set tty process group (No such process)
demo$ . ../bin/activate
(hackdexy)demo$ echo pwd | ksh -i
(hackdexy)demo$ /home/drj/hackdexy/content
(hackdexy)demo$ deactivate

What have we learnt? (when stdin is not a terminal) shells do not issue prompts unless the “-i” option is used. That’s basically what “-i” does. bash, but not ksh, will echo the command. That has the effect of making bash’s output seem more like an interactive terminal session.

Both ‘bash’ and ‘ksh’ changed the prompt. ‘bash’ changed the prompt because it sources my ‘.bashrc’ and that sets PS1; ‘ksh’ changed my prompt, apparently because PS1 is not an exported shell variable, and so ‘ksh’ does not inherit PS1 from its parent process, and so sets it to the default of “$ “. If we stop ‘bash’ from sourcing ‘.bashrc’ by using the ‘--norc’ option (‘--posix’ will do as well) then it too will use its default prompt: the narcissistic “bash-4.2$ “.

‘sh’ (dash on my Ubuntu laptop, apparently), is like ‘ksh’ in that it does not echo the commands. But it does output a narked warning message.

The last command, after using ‘activate’ from virtualenv, demonstrates that ‘activate’ must export PS1. I think that is a bug, but I would welcome comment on this matter.

I guess most people use ‘bash’ and mostly set PS1 in their ‘.bashrc’, so wouldn’t notice any of this subtlety (for example, they will not notice that ‘activate’ exports PS1 because the ‘.bashrc’ will set it to something different).

I note that the example login and ENV scripts given in The KornShell Command and Programming Language (p241 1989 edition) set PS1 in the login script only, and do not export it. Meaning that subshells will have the default prompt. I quite like that (it also means that subshells from an ‘activate’d shell will have the ‘activate’ prompt). Why doesn’t anyone do it like that any more?

Unix command line pebbles


I always find that “sitting next to someone new” is an interesting and revealing experience. I learn new things, they learn new things. Over the years I’ve accumulated a bunch of tricks about using the command line, using vi, and even using emacs. The things I’m talking about can’t really be learned in a formal context; they are too small and there are too many of them. Yet collectively they form part of the fabric of what it means to be a vi user or a shell programmer or whatever. Picking them up and passing them on by sitting next to someone is the only way I know to spread them. Well, maybe this blog post is a start.

Here’s a few I’ve recently picked up or passed on:

cd -

It changes directory to the last directory that you changed from (via $OLDPWD, which POSIX specifies these days, though I still think of it as an interactive convenience and would personally avoid writing it in scripts). I think someone alleged that this was undocumented, but I recently checked and it is in fact documented in the builtins section of the /bin/bash man page, which is huge.

It’s useful when you want to cd to some directory to run a command, but then cd back.

Line 25 of the runlocal script in ScraperWiki used to do just that in order to start a service from within a different current directory:

cd ../services/scriptmgr
node ./scriptmgr.js &
cd -

Because I don’t like writing bash-specific scripts, I changed this to use a subshell, with round brackets:

    (
    cd ../services/scriptmgr
    node ./scriptmgr.js &
    )

A subshell is used to run the code between the round brackets, and it is just a forked copy of the shell. It has exactly the same environment (environment variables, open files, and so on), but is in another process. Since the Current Working Directory is specific to the process, changing it in the subshell has no effect on the outer shell, which carries on unaffected after the subshell has returned. So that’s something I’ve passed on recently.
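A minimal demonstration of that isolation, runnable anywhere:

```shell
before=$(pwd)
( cd / && pwd > /dev/null )   # the cd happens only inside the subshell
after=$(pwd)
[ "$before" = "$after" ] && echo "directory unchanged"
```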

cp foo.cnf{-inhg,}

This is the same as cp foo.cnf-inhg foo.cnf. It copies a file by removing the -inhg extension from its name. ScraperWiki stores some config files in Mercurial (the -inhg versions), and they are copied so they can be edited locally (the versions without the -inhg suffix). I’m sure this has all sorts of evil uses.
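Brace expansion is another bashism (zsh and ksh93 have it too; plain /bin/sh does not), and echo is a harmless way to preview what the shell will actually run:

```shell
# Invoking bash explicitly, since brace expansion is not POSIX sh:
bash -c 'echo cp foo.cnf{-inhg,}'   # prints: cp foo.cnf-inhg foo.cnf
```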

sudo !!

Of the things I’ve picked up recently, this is my favourite. I like how some of these tips are a combination of things that I already knew individually but hadn’t thought to combine. Of course I know that sudo thing runs thing as root, and I did once know that !! is the /bin/csh mechanism (stolen by /bin/bash) for executing the previously typed command. In combination, sudo !! runs the previously typed command as root. Which is perfect for when you typed pip install thinginator instead of sudo pip install thinginator.


The favourite thing I seem to have spread is Ctrl-R. (in /bin/bash) you press Ctrl-R, start typing, and bash will search for what you type in your command history; press Return to execute the command, or Ctrl-R again to find the next older match. Press Ctrl-C if you didn’t really want to do that.


Credits: one of the @pysheff crowd for sudo !! (can’t remember whether it was @davbo or @dchetwynd); Ross Jones for cd -; Bitbucket user mammadori for “{-ext,}”.

If you enjoyed this post, you may also like Mark Dominus’ semi-rant on the craptastic way to do calculation in shell scripts. Huh, now I feel compelled to point out that Mark gets it wrong about GNU Shell; arithmetic expansion is not a GNU innovation, it comes from POSIX, having been slightly mutated from an earlier Korn Shell feature. And if he had comments I would’ve said that on his “blag”.

Using the RFC 2397 “data” URL scheme to micro-optimise small images


Curly Logo has a text area with a transparent background (maybe you haven’t noticed, but you can move the turtle “underneath” the purple text area and it is still visible, try «bk 333»). Support for colours with alpha channels (a CSS3 feature) was limited when I tried, so I ended up implementing this using a transparent 1×1 PNG which is repeated across the background.

That PNG file is 95 octets big. That’s no big deal, but the HTTP 1.1 headers that are transmitted before the file are about 400 octets. I’m paying to transmit the headers to you, and it takes time. Receiving the header takes 4 times as long as transmitting the file. Time is Money. [edit: 2007-11-16 I do pay to transmit headers, I was wrong when I said I didn't.]

If I could somehow bundle the PNG file inside the only file that uses it (it’s used in a CSS background-image property in an XHTML file) then you could avoid downloading the extra header. Win! This would be a win even if the bundled PNG was slightly larger. Even if the overall transmitted octet count was a bit higher it would probably be a win in elapsed time because we avoid having to do another HTTP round trip for the extra file (and on some browsers that might mean another TCP/IP connexion, so we save all that too). It turns out we can bundle the PNG file inside the CSS.

We use the apparently obscure “data” URL scheme from RFC 2397. It works by having URLs like this:


data:image/png;base64,…

(this example is actually a graphic from my earlier article on anti-aliasing, paste it into the location field of a browser to try it out)

“image/png” is optional but defaults to “text/plain” so you probably need to specify it for almost any practical application.

“;base64” is also optional but if you don’t use it then you need to use the standard %xx URL encoding for non-ASCII octets. For binary data it’s probably saner to use “;base64”. Conceivably there might be binary files for which it was shorter to not use “;base64”.

The comma, “,”, is not optional.

So my CSS changes from:

background-image: url(ts.png);

to:

background-image: url(data:image/png;base64,…);
Once I’ve gzip’d everything (which I used to not do, but is a big win for XML and JavaScript) I end up with an extra 19 octets. Which I pay for to store and transmit. So I’m 19 octets worse off, but you guys lose an entire header so you’re well over 300 octets better off plus an entire round-trip. How good is that?

Naturally RFC 2397 is implemented in Safari (3.0.3), Firefox, and Opera.

Now looking at the Base64 encoded version of the 1×1 PNG I can see that the PNG file is mostly overhead. Maybe I can get rid of some of those obviously unused header fields or chunks? Maybe there is some other image file format that would have less overhead for very tiny images (must be able to store at least 1 pixel to 8-bit precision for each of 4 channels). It’s 1-pixel GIFs all over again. Sorry.

Appendix – The Script

Happily uuencode turns out to support Base64 (on OS X and Single Unix Specification).

(includes bugfix!)

# $Id: //depot/prj/logoscript/master/code/dataurl#1 $
# Convert anything to a data URL.
# See RFC 2397.
# Base64 is always used.
# dataurl [filename [mimetype]]

m="$2"

if test "$1" != "" && test "$2" = ""
then
  case "x$1" in
  x*.png) m=image/png;;
  x*.gif) m=image/gif;;
  esac
fi

if test "$1" = ""
then
  uuencode -m foo
else
  uuencode -m "$1" foo
fi |
   { echo data:"${m};base64," ; sed '1d;$d;' ; } | tr -d '\n'

Shell programming: The fu example


Many years ago, back when I thought spending time configuring my Unix environment was a good idea, I wrote a script that I call fu. It helps manage long pathnames and is particularly useful with our tendency to have deep directory structures in perforce:

$ pwd
$ fu master
$ cd `fu master`
$ pwd

As you can see, fu searches up the directory hierarchy (towards root) until it finds a match for its argument.

I think its code provides some interesting points of shell programming that are not always stressed:

# $Header: //depot/home/drj/master/prj/uxutils/fu#3 $
# fu - find up
# finds a file whose name is <d>/<x> where <x> is specified on the
# command line, and <d> is some prefix (dirname) of the current
# working directory.



x="$1"
d=`pwd`

while :
do
  case "X$d" in
    */) d=`echo "$d"|sed '$s/.$//'`;;
  esac
  if test -r "$d/$x"
  then
    echo "$d/$x"
    exit 0
  fi
  n=`dirname "$d"`
  if test "X$d" = "X$n"
  then
    exit 4
  fi
  d="$n"
done
exit 2

The first things to point out is all the double quotes around variable expansions, like this «”$d”». That’s done so that the variable expands into a single word even when it contains a space. This is almost always what you want. Consider a simple example like «ls $d», if d is the string «foo bar» (which contains a space) then this will ask ls to list the two files «foo» and «bar», which is probably not what you wanted. Of course people that put spaces in their filenames, then pass those names as arguments to Unix programs, then expect it to work, are insane (are you reading this Apple? «Library/Application Support/»). But we do what we can. So unless there are very good reasons why not, every variable expansion gets double quotes around it.
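A two-line illustration of the difference, counting output lines with wc:

```shell
d='foo bar'
printf '%s\n' $d | wc -l     # 2 lines: unquoted $d splits into two words
printf '%s\n' "$d" | wc -l   # 1 line: quoted "$d" stays a single word
```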

The line that invokes sed, «d=`echo “$d”|sed ‘$s/.$//’`», is pretty typical of string manipulation in shell: tedious, ugly, and sometimes obscure. All it’s doing is stripping a trailing «/» from d, but look how awkward it is. It invokes an external program and probably forks another shell; it could easily be a million times slower than the equivalent code in more traditional compiled language like C or Lisp. You don’t have to do very much of this sort of fiddly manipulation before it becomes very sensible to use a proper language like Python.
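For this particular job, POSIX parameter expansion has since made the echo-into-sed round trip unnecessary; «${d%/}» strips one trailing slash in the shell itself, with no fork at all:

```shell
d=/some/path/
d=${d%/}      # removes a trailing / without invoking sed
echo "$d"     # /some/path
```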

«test -r» is evidence of how old the script is (I think). -r tests whether a file is readable, whereas all I care about is existence; clearly I should be using «test -e», but I suspect that at the time I originally wrote the script -e was either not standard or not implemented widely enough. These days I should probably change it.

Those initial Xs in «if test “X$d” = “X$n”» are kind of curious. They’re there to prevent the kind of nonsense that happens if d happens to start with a hyphen and therefore confuse test. If d happened to be «-x» for example, then we would have test -x = something which might confuse test into thinking that it should be executing the -x test (test for an executable file). The version of test on OS X doesn’t suffer this problem, but I’m pretty sure that earlier ones did. I’m pretty sure that the X in «case “X$d”» is there for the same reason, but that’s a bogus reason because case doesn’t suffer this problem.

Hmm. Well, perhaps it would have made for some interesting points of shell programming if two of my circumlocutions weren’t made obsolete by progress in Unix utility implementations.

Awesome Shell Power: stupid z trick


Say you want to move all the files in the current directory down into a new directory called “awesome”.

Instead of:

mkdir awesome
mv * awesome

which is likely to complain: “mv: rename awesome to awesome/awesome: Invalid argument” (though it does in fact work), you might try:

mkdir z
mv *
mv z awesome

This “works” when the newly created directory, called “z”, comes at the end of the list that “*” expands to. You weren’t in the habit of creating files beginning with z were you?

Awesome Shell Power: yes ” | head | nl -ba


Often when programming in shell you want a sequence of numbers. bash has a quirky for syntax for doing it, but I’m not really interested in that. I want something portable.

«yes '' | head | nl -ba» will generate a sequence of numbers from 1 to 10. 1 to N can be achieved by specifying an argument to head: «yes '' | head -7 | nl -ba», or, as I never tire of telling people, using sed instead of head: «yes '' | sed 7q | nl -ba». The sed version is one character shorter.

As long as we’ve invoked sed we may as well read the manual to get: «yes | sed -n '=;7q'». Note that as sed is not actually processing the input lines, just counting them, it doesn’t matter what their contents are, so we no longer need an argument to yes.

It turns out that yes is not a standard part of the Unix utilities. So much for portability. Whilst we could make an adequate simulation of yes with «while : ; do echo y ; done», it’s much more amusing to use dd:

$ dd < /dev/zero ibs=1 cbs=1 count=7 conv=unblock | sed -n =
1
2
3
4
5
6
7
7+0 records in
0+1 records out
14 bytes transferred in 0.000060 secs (233945 bytes/sec)

I love dd for its obscurity, its parody of the DD statement in JCL, and the way no-one seems to know what it is for any more. There are some stderr turds left by dd which we can eliminate with a quick 2>&-: «dd < /dev/zero 2>&- ibs=1 cbs=1 count=7 conv=unblock | sed -n =».

This version of the command is improved in many ways: “count=7” is practically self-documenting; we no longer need quotes around the sed program (though the way a bare = appears at the end does seem like a bit of a non sequitur, especially given all the other = that appear in the dd command that have a completely different meaning). However, it’s unsatisfying in many other ways. It uses /dev/zero which isn’t guaranteed to exist (even on conformant Unix OSes). It squirts binary data (a sequence of alternating NUL and LF characters) into sed which isn’t obliged to behave on such input. It’s long and silly.

Much better to just generate the sequence directly using something approaching a real programming language:

awk 'BEGIN{ for(i=1;i<=7;++i) print i}'

Even those who don’t know the mighty awk can probably work out what it does. It’s portable (some now truly ancient, that is SunOS 4.x, systems might need an extra < /dev/null); it’s clear; it works.

Of course on systems with jot installed you can just go jot 7. That includes FreeBSD and OS X. It turns out that jot is named after APL’s ɩ, iota, which is spelt i. in J.

Awesome Shell Power: 2>&1 | tee …


Collect the stdout and stderr of a command into a single file and see the output interactively:

some command 2>&1 | tee err

I use this with make so often that I have a script called mk:

make "$@" 2>&1 | tee err

(and don’t forget, we always use "$@", never $*)

The output file is named err but would probably be better named ofile (suggested by Nick B in comments). I use err in deference to 15 years of my personal Unix tradition.

The Order of Redirections

Now let’s take pipe out of the equation for a moment and consider redirections alone. On my Mac OS X laptop the command ls -d /eta /etc produces the following output:

ls: /eta: No such file or directory

The “No such file or directory” message appears on stderr and “/etc” appears on stdout.

You can prove that to yourself by closing one or the other file descriptor. Play around with ls -d /eta /etc >&-, and ls -d /eta /etc 2>&- and convince yourself that this is so.

Shell redirections are processed from left to right; we can get the stderr of a command to appear on its stdout and get rid of its stdout entirely:

$ ls -d /eta /etc 2>&1 1>&-
ls: /eta: No such file or directory

You might like to think about why this works. fd 2 is redirected to fd 1, meaning output to fd 2 goes to the same place as fd 1 at the time that the redirection was made. Then fd 1 is closed; this doesn’t affect the previous redirection of fd 2.

Doing it the other way round, we get this:

$ ls -d /eta /etc 1>&- 2>&1
-bash: 1: Bad file descriptor

That’s because by the time that it comes to redirect fd 2 to fd 1, fd 1 has been closed and can’t be used for a redirect.

How can we tell that 2>&1 1>&- sends what normally appears on stderr to stdout? Does this convince you:

$ ( ls -d /eta /etc 2>&1 1>&- ) | sed 's/^/*** /'
*** ls: /eta: No such file or directory

By borrowing an extra fd you can even swap stderr and stdout over:

$ ls -d /eta /etc 3>&2 2>&1 1>&3 3>&- | sed 's/^/*** /'
/etc
*** ls: /eta: No such file or directory

Notice that the error message from ls has gone through the pipe and been transformed by sed, so it must have been sent to stdout. The normal output of ls, “/etc”, has not gone through the pipe, so must’ve been sent to stderr.
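The swap can also be checked without ls. In this sketch a command substitution captures stdout, and after the swap it is the stderr text that lands there (the outer 2>/dev/null just keeps the borrowed descriptor quiet):

```shell
# echo out goes to stdout, echo err goes to stderr; after the
# 3>&2 2>&1 1>&3 3>&- swap, the captured stdout holds "err".
captured=$( { { echo out; echo err >&2; } 3>&2 2>&1 1>&3 3>&-; } 2>/dev/null )
echo "$captured"   # err (the old stderr text, now on stdout)
```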

