Taking the bash out of Mark

2012-03-05

Mark Dominus, in his pretty amusing article about exact rational arithmetic in shell, gives us this little (and commented!) shell function:

        # given an input number which might be a decimal, convert it to
        # a rational number; set n and d to its numerator and
        # denominator.  For example, 3.3 becomes n=33 and d=10;
        # 17 becomes n=17 and d=1.
        to_rational() {
          # Crapulent bash can't handle decimal numbers, so we will convert
          # the input number to a rational
          if [[ $1 =~ (.*)\.(.*) ]] ; then
              i_part=${BASH_REMATCH[1]}
              f_part=${BASH_REMATCH[2]}
              n="$i_part$f_part";
              d=$(( 10 ** ${#f_part} ))
          else
              n=$1
              d=1
          fi
        }

Since I’m on a Korn overdrive, what would this look like without the bashisms? Dominus uses BASH_REMATCH to split a decimal fraction at the decimal point, thus splitting ‘iii.fff’ into ‘iii’ and ‘fff’. That can be done in portable shell syntax (that is, syntax blessed by the Single Unix Specification) with the ‘%’ and ‘#’ features of parameter expansion. Example:

$ f=3.142
$ echo ${f%.*}
3
$ echo ${f#*.}
142

In shell, «${f}» is the value of the variable (parameter) f; you probably knew that. «${f%pattern}» removes any final part of f that matches pattern (which is a shell pattern, not a regular expression). «${f#pattern}» removes any initial part of f that matches pattern (full technical details: they remove the shortest match; use %% and ## for greedy versions).
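To make the shortest-versus-greedy distinction concrete, here is a small sketch. The value 10.20.30 and the variable names are just made up for illustration; two dots are needed before the two flavours give different answers.

```shell
# A value with two dots, so shortest and longest matches differ.
f=10.20.30

short_tail=${f%.*}    # drop the shortest trailing match of '.*'
long_tail=${f%%.*}    # drop the longest trailing match of '.*'
short_head=${f#*.}    # drop the shortest leading match of '*.'
long_head=${f##*.}    # drop the longest leading match of '*.'

echo "$short_tail $long_tail $short_head $long_head"
# prints: 10.20 10 20.30 30
```

With only one dot in the value (as in 3.142) the shortest and greedy forms give the same result, which is why the plain ‘%’ and ‘#’ are enough for splitting a decimal.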

Thus, between them, «${f%.*}» and «${f#*.}» are the integer part and fractional part (respectively) of the decimal fraction. The only problem is when the number has no decimal point. Well, Dominus special-cased that too. Of course the “=~” operator is a bashism (did perl inspire bash, or the other way around?), so portable shell programmers have to use ‘case’ (which traditionally was always preferred even when ‘[’ could be used, because ‘case’ didn’t fork another process). At least this version features a secret owl hidden away (on line 3):

to_rational () {
  case $1 in
    (*.*) i_part=${1%.*} f_part=${1#*.}
      n="$i_part$f_part"
      d=$(( 10 ** ${#f_part} )) ;;
    (*) n=$1 d=1 ;;
  esac
}

The ‘**’ in the arithmetic expression raised a doubt in my mind and, *sigh*, it turns out that it’s not portable either (it does work in ‘ksh’, but it’s not in the Single Unix Specification). Purists have to use a while loop to add a ‘0’ digit for every digit removed from f_part:

to_rational () {
  case $1 in
    (*.*) i_part=${1%.*} f_part=${1#*.}
      n="$i_part$f_part"
      d=1
      while [ -n "${f_part}" ] ; do
          d=${d}0
          f_part=${f_part%?}
      done ;;
    (*) n=$1 d=1 ;;
  esac
}
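As a quick check, the purist version can be exercised like this. The function is repeated so that the snippet runs on its own; the variables r1, r2 and r3 are only there for the demonstration and are not part of Dominus’s script.

```shell
# Portable to_rational, repeated from above so this snippet is self-contained.
to_rational () {
  case $1 in
    (*.*) i_part=${1%.*} f_part=${1#*.}
      n="$i_part$f_part"
      d=1
      # Append one '0' to d for each digit chopped off f_part.
      while [ -n "${f_part}" ] ; do
          d=${d}0
          f_part=${f_part%?}
      done ;;
    (*) n=$1 d=1 ;;
  esac
}

# r1, r2, r3 are hypothetical names used only to collect the results.
to_rational 3.3   ; r1="$n/$d"
to_rational 17    ; r2="$n/$d"
to_rational 3.142 ; r3="$n/$d"

echo "$r1 $r2 $r3"
# prints: 33/10 17/1 3142/1000
```

One difference from the ‘**’ version worth knowing about: the loop eats f_part, so it is empty once the function returns. That is harmless as long as callers only look at n and d.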

Traditional shell didn’t support this «${f%.*}» stuff, but it’s been in the Single Unix Specification for ages. It’s been difficult to find a Unix with a shell that didn’t support this syntax since about the year 2000. It’s time to start to be okay about using it.

8 Responses to “Taking the bash out of Mark”

  1. Clive Says:

    Wow! Mark Dominus! That would presumably be the same Mark Jason Dominus I knew from various disreputable corners of the Internet two decades ago. (-8

  2. neilbowers Says:

    Did perl inspire bash or vice-versa? Since it’s me following up, you already know the answer. But to be clear, version 3 of bash introduced =~, and came out in 2004, looong after Perl introduced =~.

  3. David Jones Says:

    Aside: note how the syntax highlighter thinks that the ‘#’ in ${f#*.} introduces a comment. This is my number 1 gripe with syntax highlighters, they never ever never get the exact syntax of the language exactly right.

  4. Mark Dominus Says:

    Thanks for pointing this out! I was unhappy with BASH_REMATCH and I am glad to learn a better way to do it. I will add a note to the original article.

    • David Jones Says:

      yay! I’m glad you want to remove the bashisms. :)

      I’m kinda making it my mission to remove bashisms from shell scripts where i find them, or at least see how hard it is.

  5. Jonathan Coxhead Says:

    I don’t have the Single Unix spec available, but would d=1${f_part//[1-9]/0} be an alternative to d=$(( 10 ** ${#f_part} ))?

