Archive for the '/bin/sh' Category

bash functions: it is mad!

2014-10-02

bash can export functions to the environment. A consequence of this is that bash can import functions from the environment. This leaves us #shellshocked. #shellshock aside, why would anyone want to do this? As I Jackson says: “it is mad”.

Exporting bash functions allows a bash script, or me typing away at a terminal in bash, to affect the behaviour of basically any other bash program I run. Or a bash program that is run by any other program.

For example, let’s say I write a program in C that prints out the help for the zgrep program:

#include <stdlib.h>

int main(void)
{
    return system("zgrep --help");
}

This is obviously just a toy example, but it’s not unusual for Unix programs to call other programs to do something. Here it is in action:


drj$ ./zgrephelp
Usage: /bin/zgrep [OPTION]... [-e] PATTERN [FILE]...
Look for instances of PATTERN in the input FILEs, using their
uncompressed contents if they are compressed.

OPTIONs are the same as for 'grep'.

Report bugs to <bug-gzip@gnu.org>.

Now, let’s say I define a function called test in my interactive bash session:


test () { bob ; }

This is unwise (test is the name of a well-known Unix utility), but so far only harmful to myself. If I try and use test in my interactive session, things go a bit weird:


drj$ test -e /etc/passwd
The program 'bob' is currently not installed. You can install it by typing:
sudo apt-get install python-sponge

but at least I can use bash in other processes and it works fine:


drj$ bash -c 'test -e /etc/passwd' ; echo $?
0

What happens if I export the function test to the environment?


drj$ export -f test
drj$ ./zgrephelp
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
gzip: /bin/zgrep: bob: command not found
--help.gz: No such file or directory
/bin/zgrep: bob: command not found
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found
/bin/zgrep: bob: command not found

zgrephelp stops working. Remember, zgrephelp is written in C! Of course, zgrephelp runs the program zgrep which is written in… bash! (on my Ubuntu system).

Exporting a function can affect the behaviour of any bash script that you run, including bash scripts that are run on your behalf by other programs, even if you never knew about them, and never knew they were bash scripts. Did you know /bin/zcat is a bash script? (on Ubuntu)
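
You can check this sort of thing for yourself; a quick sketch (paths and results vary by distribution):

head -n 1 /bin/zgrep /bin/zcat   # on the Ubuntu described here, both begin with a bash shebang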

How is this ever useful? Can you ever safely export a function? No, not really. Let’s say you export a function called X. Package Y might install a binary called X and a bash script Z that calls X. Now you’ve broken Z. So you can’t export a function if it has the same name as a binary installed by any package that you might ever install (including packages that you never use directly but are installed merely to compile some package that you do want to use).

Let’s flip this around, and consider the import side.

When a bash script starts, before it’s read a single line of your script it will import functions from the environment. These are just environment variables of the form BASH_FUNC_something()=() { function definition here ; }. You don’t have to create those by exporting a function, you can just create an environment variable of the right form:


drj$ env 'BASH_FUNC_foo()=() { baabaa ; }' bash -c foo
bash: baabaa: command not found

Imagine you are writing a bash script and you are a conscientious programmer (this requires a large amount of my imagination). bash will import potentially arbitrary functions, with arbitrary names, from the environment. Can you prevent these functions being defined?

It would appear not.

Any bash script should carefully unset -f prog for each prog that it might call (including builtins like cd; yes you can define a function called cd).
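
In practice that means a prologue along these lines (a sketch; the list of names is illustrative and necessarily incomplete):

# attempt to neutralise any functions imported from the environment
unset -f cd test echo grep sed awk    # ... and so on, for everything the script calls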

Except of course, that you can’t do this if unset has been defined as a function.

Why is exporting functions ever useful?

How to patch bash

2014-09-26

3rd in what is now becoming a series on ShellShock. 1st one: 10 tips for turning bash scripts into portable POSIX scripts; 2nd one: why big is bad.

ShellShock is a remote code exploit in /bin/bash (when used in conjunction with other system components). It relies on the way bash exports functions to its environment. If you run the command «env - bash -c 'foo () { hello ; } ; export -f foo ; env'» you can see how this works ordinarily:


PWD=/home/drj/bash-4.2/debian/patches
SHLVL=1
foo=() { hello
}
_=/usr/bin/env

The function foo when exported to the environment turns into an environment variable called foo that starts with «() {». When a new bash process starts it scans its environment to see if there are any functions to import and it is that sequence of characters, «() {», that triggers the import. It imports a function by executing its definition which causes the function to be defined.

The ShellShock bug is that there is a problem in the way bash parses the function out of the environment, which causes any code after the function definition to be executed as well. That’s bad.
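
The widely circulated one-line test for the bug relies on exactly this; on an unpatched bash it prints “vulnerable” from code smuggled in after the (empty) function definition:

env x='() { :;}; echo vulnerable' bash -c 'echo this is a test'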

So the problem is with parsing the function definition out of the environment.

Fast forward 22 years after this code was written. That’s present day. S Chazelas discovers that this behaviour can lead to a serious exploit. A patch is issued.

Is this patch a 77 line monster that adds new functionality to the parser?

Why, yes. It is.

Clearly function parsing in bash is already a delicate area, that’s why there’s a bug in it. That should be a warning. The fix is not to go poking about inside the parser.

The fix, as suggested by I Jackson, is to “Disable exporting shell functions because they are mad”. It’s a 4 line patch (well, morally 4 lines) that just entirely removes the notion of importing functions from the environment:

--- a/variables.c
+++ b/variables.c
@@ -347,6 +347,7 @@ initialize_shell_variables (env, privmode)
 
       temp_var = (SHELL_VAR *)NULL;
 
+#if 0 /* Disable exporting shell functions because they are mad. */
       /* If exported function, define it now.  Don't import functions from
         the environment in privileged mode. */
       if (privmode == 0 && read_but_dont_execute == 0 && STREQN ("() {", string, 4))
@@ -380,6 +381,9 @@ initialize_shell_variables (env, privmode)
              report_error (_("error importing function definition for `%s'"), name);
            }
        }
+#else
+      if (0) ; /* needed for syntax */
+#endif
 #if defined (ARRAY_VARS)
 #  if ARRAY_EXPORT
       /* Array variables may not yet be exported. */

In a situation where you want to safely patch a critical piece of infrastructure you do the simplest thing that will work. Later on you can relax with a martini and recraft that parser. That’s why Jackson’s patch is the right one.

Shortly after the embargo was lifted on the ShellShock vulnerability and the world got to know about it and see the patch, someone discovered a bug in the patch and so we have another CVE issued and another patch that pokes about with the parser.

That simply wouldn’t have happened if we’d cut off the head of the monster and burned it with fire. /bin/bash is too big. It’s not possible to reliably patch it. (and just in case you think making patches is easy, I had to update this article to point to Jackson’s corrected patch)

Do you really think this is the last patch?

Why big is bad

2014-09-26

If you know me, you know that I don’t like using /bin/bash for scripting. It’s not that hard to write scripts that are portable, and my earlier “10 tips” article might help.

Why don’t I like /bin/bash? There are many reasons, but it’s mostly about size.


drj$ ls -lL $(which sh bash)
-rwxr-xr-x 1 root root 959120 Sep 22 21:39 /bin/bash
-rwxr-xr-x 1 root root 109768 Mar 29 2012 /bin/sh

/bin/bash is nearly 10 times the size of /bin/sh (which in this case, is dash). It’s bigger because it’s loaded with features that you probably don’t need. An interactive editor (two in fact). That’s great for interactive use, but it’s just a burden for non-interactive scripts. Arrays. Arrays are really super useful and fundamental to many algorithms. In a real programming language. If you need arrays, it’s time for your script to grow up and become a program, in Python, Lua, Go, or somesuch.

Ditto job control.
Ditto Extended Regular Expression matching.
Ditto mapfile.
Ditto a random number generator.
Ditto a TCP/IP stack.

You might think that these things can’t harm you if you don’t use them. That’s not true. We have a little bit of harm just by being bigger. When one thing is 10 times bigger than it needs to be, no one will notice. When everything is 10 times bigger than it needs to be then it’s wasteful, and extremely difficult to fix. These features take up namespace. Got a shell script called source or complete? Can’t use it, those are builtins in bash. They slow things down. Normally I wouldn’t mention speed, but 8 years ago Ubuntu switched from bash to dash for the standard /bin/sh and the speed increase was enough to affect boot time. Probably part of the reason that bash is slower is simply that it’s bigger. There are more things it has to do or check even though you’re not making use of those features.
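
The namespace point is easy to demonstrate; a sketch (run in an interactive bash, with an illustrative script name):

printf '#!/bin/sh\necho ran my script\n' > complete
chmod +x complete
./complete         # an explicit path still reaches the script
PATH=$PWD:$PATH
complete           # bash runs its builtin instead; the script is shadowed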

If you’re unlucky a feature you’ve never heard of and don’t use will interact with another feature or a part of your system and surprise you. If you’re really unlucky it will be a remote code exploit so easy to use you can tweet exploits, which is what ShellShock is. Did you know you can have functions in bash? Did you know you can export them to the environment? Did you know that the export feature works by executing the definition of the function? Did you know that it’s buggy and can execute more than bash expected? Did you know that with CGI you can set environment variables to arbitrary strings?

There are lots of little pieces to reason about when considering the ShellShock bug because bash is big. And that’s after we know about the bug. What about all those features you don’t use and don’t even know about? Have you read and understood the bash man page? Well, those features you’ve never heard of are probably about as secure as the feature that exports functions to the environment, a feature that few people know about, and fewer people use (and in my opinion, no one should use).

The most important thing about security is attitude. It’s okay to have the attitude that a shell should have lots of useful interactive features; it’s arguable that a shell should have a rich programming environment that includes arrays and hash tables.

It’s not okay to argue that this piece of bloatware should be installed as the standard system shell.

10 tips for turning bash scripts into portable POSIX scripts

2014-09-26

In the light of ShellShock you might be wondering whether you really need bash at all. A lot of things that are bash-specific have portable alternatives that are generally only a little bit less convenient. Here’s a few tips:

1. Avoid «[[»

bash-specific:

if [[ $1 = yes ]]

Portable:

if [ "$1" = "yes" ]

Due to problematic shell tokenisation rules, Korn introduced the «[[» syntax into ksh in 1988 and bash copied it. But it’s never made it into the POSIX specs, so you should stick with the traditional single square bracket. As long as you double quote all the things, you’ll be fine.


2. Avoid «==» for testing for equality

bash-specific:

if [ "$1" == "yes" ]

Portable:

if [ "$1" = "yes" ]

The double equals operator, «==», is a bit too easy to use accidentally for old-school C programmers. It’s not in the POSIX spec, and the portable operator is single equals, «=», which works in all shells.

Technically when using «==» the thing on the right is a pattern. If you see something like this: «[[ $- == *i* ]]» then see tip 8 below.

3. Avoid «cd -»

bash-specific:

cd -

ksh and bash:

cd ~-

You tend to only see «cd -» used interactively or in weird things like install scripts. It means cd back to the previous place.

Often you can use a subshell instead:

... do some stuff in the original working directory
( # subshell
cd newplace
... do some stuff in the newplace
) # popping out of the subshell
... do some more stuff in the original working directory

But if you can’t use a subshell then you can always store the current directory in a variable:

old=$(pwd)

then do «cd "$old"» when you need to go there. If you must cling to the «cd -» style then at least consider replacing it with «cd ~-», which works in ksh as well as bash and is blessed as an allowed extension by POSIX.

4. Avoid «&>»

bash-specific:

ls &> /dev/null

Portable:

ls > /dev/null 2>&1

You can afford to take the time to do the extra typing. Is there some reason why you have to type this script in as quickly as possible?

5. Avoid «|&»

bash-specific:

ls xt yt |& tee log

Portable:

ls xt yt 2>&1 | tee log

This is a variant on using «&>» for redirection. It’s a pipeline that pipes both stdout and stderr through the pipe. The portable version is only a little bit more typing.

6. Avoid «function»

bash-specific:

function foo { ... }

Portable:

foo () { ... }

Don’t forget, you can’t export these to the environment. *snigger*

7. Avoid «((»

bash-specific:

((x = x + 1))

Portable:

x=$((x + 1))

The «((» syntax was another thing introduced by Korn and copied into bash.

8. Avoid using «==» for pattern matching

bash-specific:

if [[ $- == *i* ]]; then ... ; fi

Portable:

case $- in (*i*) ... ;; esac

9. Avoid «$’something’»

bash-specific:

nl=$'\n'

Portable:

nl='
'

You may not know that you can just include newlines in strings. Yes, it looks ugly, but it’s totally portable.

If you’re trying to get even more bizarre characters like ISO 646 ESC into a string you may need to investigate printf:

Esc=$(printf '\33')

or you can just type the ESC character right into the middle of your script (you might find Ctrl-V helpful in this case). A word of caution if using printf: octal escapes, such as \377, are portable POSIX syntax; hex escapes are not.
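
For example (a sketch; the hex form happens to work in bash’s printf but is not guaranteed by POSIX):

del=$(printf '\177')      # octal escape: portable
# del=$(printf '\x7f')    # hex escape: a common extension, not POSIX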

10. «$PWD» is okay

A previous version of this article said to avoid «$PWD» because I had been avoiding it since the dark ages (there was a time when some shells didn’t implement it and some did).

Conclusion

Most of these are fairly simple replacements. The simpler tokenisation that Korn introduced for «[[» and «((» is welcome, but it comes at the price of portability. I suspect that most of the bash-specific features are introduced into scripts unwittingly. If more people knew about portable shell programming, we might see more portable shell scripts.

I’m sure there are more, and I welcome suggestions in the comments or on Twitter.

Thanks to Gareth Rees who suggested minor changes to #3 and #7 and eliminated #10.

Piping into shell may be harmful

2014-03-19

Consider

curl https://thing | sh

It has become fashionable to see this kind of shell statement as a quick way of installing various bits of software (nvm, docker, salt).

This is a bad idea.

Imagine that due to bad luck the remote server crashes halfway through sending the text of the shell script, and a line that reads

rm -fr /usr/local/go

gets truncated and now reads:

rm -fr /

You’re hosed.

curl may write a partial file to the pipe and sh has no way of knowing.

Can we defend against this?

Initially I was pessimistic. How can we consider all possible truncations of a shell program? But then I realised, after a couple of false turns, that it’s possible to construct a shell script that is syntactically valid only when the entire script has been transmitted; truncating this special shell script at any point will result in a syntactically invalid script which shell will refuse to execute.

Moreover this useful property of being syntactically valid only when complete is not just a property of a small specially selected set of shell scripts, it’s entirely general. It turns out to be possible to transform any syntactically valid shell script into one that has the useful property:

{
...
any old script here
...
}

We can bracket the entire script with the grouping keyword { and }. Now if the script gets cut off somewhere in the middle, the { at the beginning will be missing its matching } and the script will be syntactically invalid. Shell won’t execute the partial script.

As long as the script in the middle is syntactically valid, then the bracketed script will be syntactically valid too.

Let’s call these curly brackets that protect the script from truncation, Jones truncation armour.

Clearly Jones truncation armour should be applied to all scripts that may be piped directly into shell.

Can it be applied as an aftermarket add-on? Yes it can!

{ echo { && curl https://thing && echo } ; } | sh

Maybe this is even better. It means that the consumer of the script doesn’t have to rely on the provider to add the Jones truncation armour. But it also doesn’t matter if the script is already armoured. It still works.
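
You can convince yourself at the terminal (a sketch; the exact wording of the error varies from shell to shell):

printf '{ echo start\n' | sh        # truncated: a syntax error, nothing runs
printf '{ echo start\n}\n' | sh     # complete: prints "start"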

Making change with shell

2012-03-13

I was flicking through Wikström’s «Functional Programming Using Standard ML», when I noticed he describes the problem of making up change for an amount of money m using coins of certain denominations (18.1.2, page 233). He says we “want a function change that given an amount finds the smallest number of coins that adds to that amount”, and “Obviously, you first select the biggest possible coin”. Here’s his solution in ML:

exception change;
fun change 0 cs = nil
  | change m nil = raise change
  | change m (ccs as c::cs) = if m >= c
      then c::change (m-c) ccs
      else change m cs;

It’s pretty neat. The recursion proceeds by either reducing the magnitude of the first argument (the amount we are giving change for), or reducing the size of the list that is the second argument (the denominations of the coins we can use); so we can tell that the recursion must terminate. Yay.

It’s not right though. Well, it gives correct change, but it doesn’t necessarily find the solution with fewest number of coins. Actually, it depends on the denominations of coins in our currency; probably for real currencies the “biggest coin first” algorithm does in fact give the fewest number of coins, but consider the currency used on the island of san side-effect, the lambda. lambdas come in coins of Λ1, Λ10, and Λ25 (that’s not a wedge, it’s a capital lambda. It’s definitely not a fake A).

How do we give change of Λ30? { Λ10, Λ10, Λ10 } (3 tens); what does Wikström’s algorithm give? 1 twenty-five and 5 ones. Oops.

I didn’t work out a witty solution to the fewest-number-of-coins problem, but I did create, in shell, a function that lists all the possible ways of making change. Apart from trivial syntactic changes, it’s not so different from the ML:

_change () {
    # $1 is the amount to be changed;
    # $2 is the largest coin;
    # $3 is a comma separated list of the remaining coins

    if [ "$1" -eq 0 ] ; then
        echo ; return
    fi
    if [ -z "$2" ] ; then
        return
    fi
    _change $1 ${3%%,*} ${3#*,}
    if [ "$1" -lt "$2" ] ; then
        return
    fi
    _change $(($1-$2)) $2 $3 |
        while read a ; do
            echo $2 $a
        done
}

change () {
    _change $1 ${2%%,*} ${2#*,},
}

Each solution is output as a single line with the coins used in a space separated list. change is a wrapper around _change which does the actual work. The two base cases are basically identical: «"$1" -eq 0» is when we have zero change to give, and we output an empty line (just a bare echo) which is our representation for the empty list; «-z "$2"» is when the second argument (the first element of the list of coins) is empty, and, instead of raising an exception, we simply return without outputting any list at all.

The algorithm to generate all possible combinations of change is only very slightly different from Wikström’s: if we can use the largest coin, then we generate change both without using the largest coin (first recursive call to _change, on line 12) and using the largest coin (second recursive call to _change, on line 16). See how we use a while loop to prepend (cons, if you will) the largest coin value to each list returned by the second recursive call. Of course, when the largest coin is too large then we proceed without it, and we only have the first recursive call.

The list of coins is managed as two function arguments. $2 is the largest coin, $3 is a comma separated list of the remaining coins (including a trailing comma, added by the wrapping change function). See how, in the first recursive call to _change $3 is decomposed into a head and tail with ${3%%,*} and ${3#*,}. As hinted at in the previous article, «%%» is a greedy match and removes the largest suffix that matches the pattern «,*» which is everything from the first comma to the end of the string, and so it leaves the first number in the comma separated list. «#» is a non-greedy match and removes the smallest prefix that matches the pattern «*,», so it removes the first number and its comma from the list. Note how I am assuming that all the arguments do not contain spaces, so I am being very cavalier with double quotes around my $1 and $2 and so on.
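
To see that head/tail split in isolation, here’s a quick sketch:

s=25,10,1,
echo ${s%%,*}    # 25 - the head: greedy %% removes everything from the first comma on
echo ${s#*,}     # 10,1, - the tail: # removes up to and including the first comma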

It even works:

$ change 30 25,10,1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
10 10 1 1 1 1 1 1 1 1 1 1
10 10 10
25 1 1 1 1 1

Taking the bash out of Mark

2012-03-05

Mark Dominus, in his pretty amusing article about exact rational arithmetic in shell, gives us this little (and commented!) shell function:

        # given an input number which might be a decimal, convert it to
        # a rational number; set n and d to its numerator and
        # denominator.  For example, 3.3 becomes n=33 and d=10;
        # 17 becomes n=17 and d=1.
        to_rational() {
          # Crapulent bash can't handle decimal numbers, so we will convert
          # the input number to a rational
          if [[ $1 =~ (.*)\.(.*) ]] ; then
              i_part=${BASH_REMATCH[1]}
              f_part=${BASH_REMATCH[2]}
              n="$i_part$f_part";
              d=$(( 10 ** ${#f_part} ))
          else
              n=$1
              d=1
          fi
        }

Since I’m on a Korn overdrive, what would this look like without the bashisms? Dominus uses BASH_REMATCH to split a decimal fraction at the decimal point, thus splitting ‘iii.fff’ into ‘iii’ and ‘fff’. That can be done using portable shell syntax (that is, blessed by the Single Unix Specification) using the ‘%’ and ‘#’ features of parameter expansion. Example:

$ f=3.142
$ echo ${f%.*}
3
$ echo ${f#*.}
142

In shell, «${f}» is the value of the variable (parameter) f; you probably knew that. «${f%pattern}» removes any final part of f that matches pattern (which is a shell pattern, not a regular expression). «${f#pattern}» removes any initial part of f that matches pattern (full technical details: they remove the shortest match; use %% and ## for greedy versions).

Thus, between them «${f%.*}» and «${f#*.}» are the integer part and fractional part (respectively) of the decimal fraction. The only problem is when the number has no decimal point. Well, Dominus special cased that too. Of course the “=~” operator is a bashism (did perl inspire bash, or the other way around?), so portable shell programmers have to use ‘case’ (which traditionally was always preferred even when ‘[‘ could be used because ‘case’ didn’t fork another process). At least this version features a secret owl hidden away (on line 3):

to_rational () {
  case $1 in
    (*.*) i_part=${1%.*} f_part=${1#*.}
      n="$i_part$f_part"
      d=$(( 10 ** ${#f_part} )) ;;
    (*) n=$1 d=1 ;;
  esac
}

The ‘**’ in the arithmetic expression raised a doubt in my mind and, *sigh*, it turns out that it’s not portable either (it does work in ‘ksh’, but it’s not in the Single Unix Specification). Purists have to use a while loop to add a ‘0’ digit for every digit removed from f_part:

to_rational () {
  case $1 in
    (*.*) i_part=${1%.*} f_part=${1#*.}
      n="$i_part$f_part"
      d=1;
      while [ -n "${f_part}" ] ; do
          d=${d}0
          f_part=${f_part%?}
      done ;;
    (*) n=$1 d=1 ;;
  esac
}

Traditional shell didn’t support this «${f%.*}» stuff, but it’s been in Single Unix Specification for ages. It’s been difficult to find a Unix with a shell that didn’t support this syntax since about the year 2000. It’s time to start to be okay about using it.

Interactive Shells and their prompts

2012-03-04

What can we say about what the “-i” option to shell does? It varies according to the shell. Below, I start with PS1 set to “demo$ “, which is not my default. (you can probably work it out from the transcript, but) it might help to know that my default shell is bash (for now). There’s nothing special about ‘pwd’ in the examples below, it’s just a command with a short name that outputs something.

demo$ echo pwd | ksh
/home/drj/hackdexy/content
demo$ echo pwd | bash
/home/drj/hackdexy/content
demo$ echo pwd | ksh -i
$ /home/drj/hackdexy/content
$ 
demo$ echo pwd | bash -i
drj$ pwd
/home/drj/hackdexy/content
drj$ exit
demo$ echo pwd | bash --norc -i
bash-4.2$ pwd
/home/drj/hackdexy/content
bash-4.2$ exit
demo$ echo pwd | sh -i
$ /home/drj/hackdexy/content
$ 
sh: Cannot set tty process group (No such process)
demo$ . ../bin/activate
(hackdexy)demo$ echo pwd | ksh -i
(hackdexy)demo$ /home/drj/hackdexy/content
(hackdexy)demo$ 
(hackdexy)demo$ deactivate

What have we learnt? (when stdin is not a terminal) shells do not issue prompts unless the “-i” option is used. That’s basically what “-i” does. bash, but not ksh, will echo the command. That has the effect of making bash’s output seem more like an interactive terminal session.

Both ‘bash’ and ‘ksh’ changed the prompt. ‘bash’ changed the prompt because it sources my ‘.bashrc’ and that sets PS1; ‘ksh’ changed my prompt, apparently because PS1 is not an exported shell variable, and so ‘ksh’ does not inherit PS1 from its parent process, and so sets it to the default of “$ ”. If we stop ‘bash’ from sourcing ‘.bashrc’ by using the ‘--norc’ option (‘--posix’ will do as well) then it too will use its default prompt: the narcissistic “bash-4.2$ ”.

‘sh’ (dash on my Ubuntu laptop, apparently) is like ‘ksh’ in that it does not echo the commands. But it does output a narked warning message.

The last command, after using ‘activate’ from virtualenv, demonstrates that ‘activate’ must export PS1. I think that is a bug, but I would welcome comment on this matter.
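
It’s easy to check directly (a quick sketch, run in the activated shell):

env | grep '^PS1='    # any output here means PS1 has been exported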

I guess most people use ‘bash’ and mostly set PS1 in their ‘.bashrc’, so wouldn’t notice any of this subtlety (for example, they will not notice that ‘activate’ exports PS1 because the ‘.bashrc’ will set it to something different).

I note that the example login and ENV scripts given in The KornShell Command and Programming Language (p241, 1989 edition) set PS1 in the login script only, and do not export it. Meaning that subshells will have the default prompt. I quite like that (it also means that subshells from an ‘activate’d shell will have the ‘activate’ prompt). Why doesn’t anyone do it like that any more?

Unix command line pebbles

2012-01-13

I always find that “sitting next to someone new” is an interesting and revealing experience. I learn new things, they learn new things. Over the years I’ve accumulated a bunch of tricks about using the command line, using vi, and even using emacs. The things I’m talking about can’t really be learned in a formal context; they are too small and there are too many of them. Yet collectively they form part of the fabric of what it means to be a vi user or a shell programmer or whatever. Picking them up and passing them on by sitting next to someone is the only way I know to spread them. Well, maybe this blog post is a start.

Here’s a few I’ve recently picked up or passed on:

cd -

It changes directory to the last directory that you changed from. It’s /bin/bash not /bin/sh (so I personally would avoid writing it in scripts). I think someone alleged that this was undocumented, but I recently checked and it is in fact documented in the builtins section of the /bin/bash man page, which is huge.

It’s useful when you want to cd to some directory to run a command, but then cd back.

Line 25 of the runlocal script in ScraperWiki used to do just that in order to start a service from within a different current directory:

cd ../services/scriptmgr
node ./scriptmgr.js &
cd -

Because I don’t like writing bash-specific scripts, I changed this to use a subshell, with round brackets:

(
    cd ../services/scriptmgr
    node ./scriptmgr.js &
)

A subshell is used to run the code between the round brackets, and it is just a forked copy of the shell. It has exactly the same environment (environment variables, open files, and so on), but is in another process. Since the Current Working Directory is specific to the process, changing it in the subshell has no effect on the outer shell, which carries on unaffected after the subshell has returned. So that’s something I’ve passed on recently.

cp foo.cnf{-inhg,}

This is the same as cp foo.cnf-inhg foo.cnf. It copies a file by removing the -inhg extension from its name. ScraperWiki stores some config files in Mercurial (the -inhg versions), and they are copied so they can be edited locally (the versions without the -inhg suffix). I’m sure this has all sorts of evil uses.
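
The trick is ordinary bash brace expansion, so it generalises; a couple more sketches (filenames illustrative):

mv draft.txt{,.bak}     # expands to: mv draft.txt draft.txt.bak
diff my.cnf{-inhg,}     # expands to: diff my.cnf-inhg my.cnf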

sudo !!

Of the things I’ve picked up recently, this is my favourite. I like how some of these tips are a combination of things that I already knew individually but hadn’t thought to combine. Of course I know that sudo thing runs thing as root, and I did once know that !! is the /bin/csh mechanism (stolen by /bin/bash) for executing the previously typed command. In combination, sudo !! runs the previously typed command as root. Which is perfect for when you typed pip install thinginator instead of sudo pip install thinginator.

Ctrl-R

The favourite thing I seem to have spread is Ctrl-R. (in /bin/bash) you press Ctrl-R, start typing, and bash will search for what you type in your command history; press Return to execute the command or Ctrl-R to find the next older match. Press Ctrl-C if you didn’t really want to do that.

Credits

One of the @pysheff crowd for sudo !!; can’t remember whether it was @davbo or @dchetwynd.

Ross Jones for cd -.

Bitbucket user mammadori for “{-ext,}”.

If you enjoyed this post, you may also like Mark Dominus’ semi-rant on the craptastic way to do calculation in shell scripts. Huh, now I feel compelled to say that Mark gets it wrong about GNU Shell; arithmetic expansion is not a GNU innovation, it comes from POSIX, having been slightly mutated from an earlier Korn Shell feature. And if he had comments I would’ve said that on his “blag”.

Using the RFC 2397 “data” URL scheme to micro-optimise small images

2007-11-14

Curly Logo has a text area with a transparent background (maybe you haven’t noticed, but you can move the turtle “underneath” the purple text area and it is still visible, try «bk 333»). Support for colours with alpha channels (a CSS3 feature) was limited when I tried, so I ended up implementing this using a transparent 1×1 PNG which is repeated across the background.

That PNG file is 95 octets big. That’s no big deal, but the HTTP 1.1 headers that are transmitted before the file are about 400 octets. I’m paying to transmit the headers to you, and it takes time. Receiving the header takes 4 times as long as transmitting the file. Time is Money. [edit: 2007-11-16 I do pay to transmit headers, I was wrong when I said I didn't.]

If I could somehow bundle the PNG file inside the only file that uses it (it’s used in a CSS background-image property in an XHTML file) then you could avoid downloading the extra header. Win! This would be a win even if the bundled PNG was slightly larger. Even if the overall transmitted octet count was a bit higher it would probably be a win in elapsed time because we avoid having to do another HTTP round trip for the extra file (and on some browsers that might mean another TCP/IP connexion, so we save all that too). It turns out we can bundle the PNG file inside the CSS.

We use the apparently obscure “data” URL scheme from RFC 2397. It works by having URLs like this:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAGAAAABgAQMAAADYVuV7AAAABlBMVEUAAAD///+l2Z/dAAAAIUlEQVQ4jWP4DwMMQDDKIZHDAAGjnFEO7TkMcDDKIZYDAAVlPd9Ahj+EAAAAAElFTkSuQmCC

(this example is actually a graphic from my earlier article on anti-aliasing, paste it into the location field of a browser to try it out)

“image/png” is optional but defaults to “text/plain” so you probably need to specify it for almost any practical application.

“;base64” is also optional but if you don’t use it then you need to use the standard %xx URL encoding for non-ASCII octets. For binary data it’s probably saner to use “;base64”. Conceivably there might be binary files for which it was shorter to not use “;base64”.

The comma, “,”, is not optional.
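
A minimal example that exercises neither option, so the payload is plain %xx-encoded text (paste it into a browser’s location field):

data:,Hello%2C%20World!

which displays “Hello, World!”.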

So my CSS changes from:

background-image: url(ts.png);

to

background-image: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEXFAFxSw3SuAAAAAXRSTlNu6uyUOQAAAApJREFUCJljYAAAAAIAAfRxZKYAAAAASUVORK5CYII=);

Once I’ve gzip’d everything (which I used to not do, but is a big win for XML and JavaScript) I end up with an extra 19 octets. Which I pay for to store and transmit. So I’m 19 octets worse off, but you guys lose an entire header so you’re well over 300 octets better off plus an entire round-trip. How good is that?

Naturally RFC 2397 is implemented in Safari (3.0.3), Firefox, and Opera.

Now looking at the Base64 encoded version of the 1×1 PNG I can see that the PNG file is mostly overhead. Maybe I can get rid of some of those obviously unused header fields or chunks? Maybe there is some other image file format that would have less overhead for very tiny images (must be able to store at least 1 pixel to 8-bit precision for each of 4 channels). It’s 1-pixel GIFs all over again. Sorry.

Appendix – The Script

Happily uuencode turns out to support Base64 (on OS X and Single Unix Specification).

(includes bugfix!)

#!/bin/sh
# $Id: //depot/prj/logoscript/master/code/dataurl#1 $
# Convert anything to a data URL.
# See http://www.ietf.org/rfc/rfc2397.txt
# Base64 is always used.
# dataurl [filename [mimetype]]

m="$2"
if test "$1" != "" && test "$2" == ""
then
  case "x$1" in
  *.png) m=image/png;;
  *.gif) m=image/gif;;
  esac
fi

if test "$1" = ""
then
  uuencode -m foo
else
  uuencode -m "$1" foo
fi |
   { echo data:"${m};base64," ; sed '1d;$d;' ; } | tr -d '
'
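
Typical usage of the script might look like this (a sketch; the long data: URL output is elided):

chmod +x dataurl
./dataurl ts.png              # media type guessed from the .png extension
./dataurl ts.png image/png    # or given explicitly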