
Translating Shell to Oil

2017-02-05

In Success with ASDL, I mentioned that a top priority is to automatically translate shell programs to the oil language. The ability to express real programs is a test of the language's design, especially when they're written by others.

I've done perhaps 25% of the work, but the translations are starting to look accurate, because language features are used in a Pareto or "long tail" distribution.

In this post, I'll show examples of conversions that are essentially 100% true to intent, and discuss the features of the oil language that they reveal.

Why a New Language?

Before looking at code, let's remind ourselves of the motivation. At first glance, this project seems similar to CoffeeScript. We want a better syntax for shell, in order to reveal its powerful semantics, e.g. Bernstein chaining and pipelines.

I believe this is important because syntax matters.

But more important is that the existing syntax leaves no room for new features. New features will necessarily have a tortured syntax, such as the ^, ^^, "," and ",," operators that change the case of a string (as in ${var^^} and ${var,,}). I plan to justify this further in a post called Declaring Syntax Bankruptcy on Shell.

So the bigger motivation for the oil language is to add features to shell, in particular borrowing some from awk and make, as well as providing a dialect for config files. I'm excited about these goals, but they require getting past some tedious work.

Selected Translations

I plan to show two files from Aboriginal Linux and two files from the /etc/init.d directory on my Ubuntu machine. (Earlier blog posts: Aboriginal, init.d.)

I chose these files because they're short, use a variety of language features, and can now be translated automatically. We'll see the first file today, and the remaining three tomorrow.

Open sources/toys/make-hdb.sh in a new window to see:

  1. original shell source,
  2. an oil translation, and
  3. the pretty-printed AST.

Make sure to widen the window so that the two code panes appear side-by-side.

Notice that whitespace and comments are intentionally preserved. That is, if your style is to put then on its own line, the opening { in oil will remain on its own line. I'll describe the algorithm for style-preserving translation in a future post.

To repeat, the original code is:

make_hdb()
{
  # Some distros don't put /sbin:/usr/sbin in the $PATH for non-root users.
  if [ -z "$(which  mke2fs)" ] || [ -z "$(which tune2fs)" ]
  then
    export PATH=/sbin:/usr/sbin:$PATH
  fi

  truncate -s ${HDBMEGS}m "$HDB" &&
  mke2fs -q -b 1024 -F "$HDB" -i 4096 &&
  tune2fs -j -c 0 -i 0 "$HDB"

  [ $? -ne 0 ] && exit 1
}

And here is the oil code, slightly reformatted by hand:

proc make_hdb {
  # Some distros don't put /sbin:/usr/sbin in the $PATH for non-root users.
  if test -z $[which mke2fs] || test -z $[which tune2fs] {
    export PATH = "/sbin:/usr/sbin:$PATH"
  }

  truncate -s $(HDBMEGS)m $HDB      &&
  mke2fs -q -b 1024 -F $HDB -i 4096 &&
  tune2fs -j -c 0 -i 0 $HDB

  test $Status -ne 0 && exit 1
}

They look similar from a distance, which is good. But notice the following changes:

(1) The proc keyword. Oil will have both "procs" and functions, denoted with keywords proc and func.

Procs are what we call shell "functions": they accept an argv array of strings, return an integer status, and have file descriptors. They're simultaneously similar to a process and a procedure.

Functions are like those in Python or JavaScript. They have typed arguments and return values.

So it makes sense to have proper functions, but procs are important too because they're isomorphic to an external process. I'll explain how procs and funcs work together in a future post.
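The process-like nature of a proc can already be seen in plain shell. Here is a minimal sketch in today's shell syntax, not oil. The name count_lines is made up for illustration: it takes a string argument, reads standard input, writes to standard output and standard error, and returns an integer status, so it can sit in a pipeline just like an external command.

```shell
# Hypothetical helper, for illustration only.
# argv:   $1 is a label
# stdin:  arbitrary text
# stdout: "<label>: <n> lines"
# status: 0 if any input was seen, 1 otherwise
count_lines() {
  label=$1
  n=0
  while IFS= read -r _line; do
    n=$((n + 1))
  done
  echo "$label: $n lines"
  if [ "$n" -eq 0 ]; then
    echo "$label: empty input" >&2   # diagnostics go to stderr
    return 1
  fi
  return 0
}

printf 'a\nb\n' | count_lines demo   # prints: demo: 2 lines
```

Swapping count_lines for a standalone script with the same body would not change how the pipeline is written, which is the sense in which the two are interchangeable.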

(2) if uses curly braces as block delimiters instead of then and fi. Reasons for this:

Note that { is an operator in oil, but confusingly it isn't in shell. Being an operator means that, in oil, it doesn't need spaces to separate it from other tokens.

(3) The conversion uses test instead of [. Oil will have C-style infix boolean expressions, but legacy code may use test.

Not only is the [ command an ugly syntactic pun, but the [ character is an operator in oil, and would require quoting in a command name.

I believe this is a misfeature of shell:

$ echo 'echo hi from script with funny name' > ]{
$ chmod +x ]{
$ ./]{
hi from script with funny name

The fact that [ and { aren't operators prevents the shell language from evolving. In oil, you would just add single quotes like this: ']{'.
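To see that [ really is an ordinary command name in today's shells, here is a small sketch in plain shell. The variable name empty is just for illustration. The output of command -v varies between shells (a builtin name or a path), so it isn't shown.

```shell
# `[` is looked up like any other command name. `test` is the same
# command without the closing `]` argument.
command -v [
command -v test

empty=""
if [ -z "$empty" ]; then echo "bracket: empty"; fi
if test -z "$empty"; then echo "test: empty"; fi
```

Both if statements take the same branch, which is why translating [ to test doesn't change the meaning of the program.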

(4) Special variables look like $Status rather than $?. In oil code, we prefer readable names. A completion system that's configured well by default will make them easy to type.


The remaining observations require some background. Recall that shell is composed of four mutually recursive sublanguages:

  1. the command language: for, if, functions, ...
  2. the word language: ${}, $(), $(()), ...
  3. the arithmetic language: a**2 + b**2
  4. the boolean language: [[ a =~ b ]]
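As an illustration, all four sublanguages can show up within a few lines. This is a made-up snippet, not taken from the translated scripts, and it assumes bash, since [[ ]] and ** are not in POSIX sh.

```shell
# Requires bash: [[ ]] and ** are not available in POSIX sh.
a=3 b=4
for name in alpha beta; do              # command language: for, if, ...
  word="${name}_$(echo suffix)"         # word language: ${} and $()
  echo "$word"
done
sum=$(( a**2 + b**2 ))                  # arithmetic language
if [[ $sum =~ ^[0-9]+$ ]]; then         # boolean language
  echo "sum=$sum"                       # prints: sum=25
fi
```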

Roughly speaking, shell has a separate expression language for each type: strings, integers, and booleans. Oil does away with this complexity and has a single expression language with multiple types.

Thus it has just two sublanguages: commands and expressions.

The characters [] are used for arrays, and the characters () are used for grouping expressions, as in most languages. So it makes sense for $[] to be command substitution and $() to be expression substitution. Commands are simply arrays of strings.

Notice the following:

(5) $(HDBMEGS) is a delimited variable substitution, in contrast to ${HDBMEGS}.

(6) $[which mke2fs] is command substitution, in contrast to $(which mke2fs).

Arithmetic substitution will be $(x + 1) instead of $((x + 1)).

(7) Substitutions aren't quoted. Oil doesn't split words because it's a misfeature designed to simulate arrays. (Most shell implementations additionally have arrays, but they're not in POSIX.)

Splitting can be done explicitly with @split(HDB) or @[which mke2fs]. The @ character is associated with arrays, i.e. for splitting and splicing.
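For contrast, this is the implicit splitting that shell performs today, and that quoting suppresses. It's a small sketch in plain shell, assuming the default IFS. The helper count_args and the file name are invented for the example.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

HDB='my disk.img'     # a path containing a space

count_args $HDB       # unquoted: split into two words, prints 2
count_args "$HDB"     # quoted: stays one word, prints 1
```

In oil, the unquoted form is meant to behave like the quoted one here, and splitting has to be requested explicitly.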

(8) In contrast, strings on the right-hand side of assignments must be quoted. This is because everything to the right of = is parsed in expression mode rather than command mode.

Examples:

echo foo bar  # command mode: command and two literal words
foo = bar     # expression mode: bar is a variable, as in C or Python
foo = 'bar'   # bar is a string
x = 1 + 2 * 3           # an integer expression
s = myStr or 'default'  # a string expression
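For comparison, here is roughly how the same assignments are spelled in shell today. This is a sketch, and the mapping of the or expression to ${myStr:-default} is my assumption about the closest shell equivalent, not a statement about how the translator works.

```shell
foo=bar                 # shell: bar is always a literal string here
x=$((1 + 2 * 3))        # arithmetic needs its own $(( )) sublanguage
myStr=''
s=${myStr:-default}     # fallback value via a word-language operator
echo "$foo $x $s"       # prints: bar 7 default
```

Each line uses a different sublanguage to compute a value, which is the complexity a single expression language is meant to remove.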

Next

The 7-line make-hdb.sh script showed us many features of Oil. It has "procs", blocks with curly braces {}, arrays with brackets [], more descriptive variable names, and "command" and "expression" sublanguages. Word splitting is explicit rather than automatic.

We'll continue tomorrow with three scripts that show more oil language features.

