Success with Aboriginal, Alpine, and Debian


Three months ago, in Roadmap #5, I wrote that OSH will be a better shell for building Linux distributions. It will run existing code, including bash scripts, but it's stricter and easier to debug.

In the last month, I've made significant progress toward this goal. I fixed dozens of bugs, implemented new features, and simplified the codebase.

OSH can now run thousands of lines of shell scripts that build three distros: Aboriginal Linux, Alpine Linux, and Debian. This post describes what I did, and the technical work that was involved.

Recap of Recent Progress

I haven't written about Linux distributions in a while. What happened?

OSH was able to run abuild -h back in October, but its parsing speed made debugging sessions unpleasant. On a fast machine, it took more than 1600 milliseconds to parse abuild!

So I pushed two tasks onto the stack, for a total of three:

  1. Run abuild from Alpine Linux.
  2. Optimize the parser so running abuild isn't painful.
  3. Fix bugs in the parser before optimizing it.

The two releases since October popped #3 and #2 off the stack:

  1. In OSH 0.2, I fixed the bugs revealed by torturing the parser with a million lines of shell. I also introduced parser benchmarks.
  2. OSH 0.3 sped up the parser by 6-7x. I introduced more benchmarks, including ones that measure execution speed.

Now OSH can parse abuild in about 250 milliseconds. That's still too slow, but it's not blocking progress.

I plan to release OSH 0.4 at the end of this month. It will be able to run not just abuild, but also shell scripts from Aboriginal Linux and Debian.

After that, the stack will be empty again. I had to shave some yaks, but I didn't lose sight of the goal!

What Is a Linux Distro?

I didn't understand how Linux distros worked until pretty recently. It's useful to think of them as having (at least) these four components:

  1. A set of source tarballs from "upstream" sources, e.g. GNU, Linux, Apache, or LLVM.
  2. A "meta" build system that turns source tarballs into binary packages. This build system invariably uses shell scripts. Sometimes GNU make is used; sometimes Python is used; but there are always shell scripts.
  3. A script to create the root file system — i.e. to "bootstrap" the system so that it can build its own packages. We'll see this below.
  4. A package manager that allows end users to install binary packages. For Debian-derived systems (Ubuntu, etc.), this is apt; for Red Hat-based systems (CentOS, Fedora, etc.), it's yum.

What's the Difference Between Distros?

I like the diversity of the three distros I worked with, because it gives me confidence that OSH is correct:

  1. Debian: arguably the most popular distro, and one of the oldest. It has many derivatives like Ubuntu. The debootstrap script normally runs under dash, which is one of the most incompatible shells.
  2. Alpine Linux: a "modern" distro for embedded systems and containers. It runs busybox ash.
  3. Aboriginal Linux: an educational project, also with an embedded slant. It runs under bash, and uses many of its quirks.

So not only am I testing shell scripts by different authors, I'm also testing OSH for compatibility with scripts written for different shell dialects.

More background and detail on what I did:

(1) Debian

(2) Alpine Linux

(3) Aboriginal Linux

In summary, I tested OSH on a diverse set of shell scripts found in the wild, and did whatever was necessary to make them run.

I started this process after the last release, and I honestly didn't know how long it would take. There were more problems than I expected, but I was also able to fix them more quickly than expected!

Features Added

This section describes the holes I filled in to make these scripts work.

Tracing Support

Some errors I ran into had obvious causes. For example, OSH would throw NotImplementedError when it encountered ${s:1:2} (string slicing). Implementing slicing and getting past the error was easy.

However, other errors required debugging thousands of lines of other people's shell scripts. This motivated me to learn more about bash and debugging. In particular, this tip on making xtrace useful by setting $PS4 helped me figure out where scripts were going wrong.

I implemented these debugging features in OSH:

Shell Options for Strict Behavior

A recurring theme was relaxing OSH's strict behavior in order to accommodate common shell usage. However, I kept the strict behavior available as an opt-in, via set -o strict-control-flow, strict-array, and strict-errexit. I'll address this topic in another blog post.

Overhaul of Word Splitting and Evaluation

POSIX has quirky rules for the $IFS variable, which determines both how words are split and how the read builtin splits fields.

I replaced the crappy regex-based version of IFS splitting with an explicit state machine. This is an interesting piece of code which I may explain in another blog post. It's in core/. It turned a lot of red tests green.
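To give a flavor of what an explicit state machine for IFS splitting can look like, here is a minimal Python sketch. It is not the code in OSH: the function name ifs_split and the three states are made up for illustration, and it ignores quoting and backslash escapes. It only models the two classes of IFS characters: IFS whitespace, where runs collapse and leading or trailing runs are dropped, and other IFS characters, where each one ends a field.

```python
# Hypothetical sketch of IFS field splitting as a state machine.
# Not OSH's actual code: quoting and backslash escapes are ignored.

START, IN_FIELD, AFTER_WS = range(3)


def ifs_split(s, ifs=" \t\n"):
    """Split s into fields, roughly following the POSIX IFS rules."""
    ifs_ws = {c for c in ifs if c in " \t\n"}  # IFS whitespace: runs collapse
    ifs_other = set(ifs) - ifs_ws              # other IFS chars: each one delimits

    fields = []
    cur = []
    state = START
    for ch in s:
        is_ws = ch in ifs_ws
        is_delim = ch in ifs_other

        if state == START:
            # At the beginning, or just after a non-whitespace delimiter.
            if is_ws:
                pass                      # skip leading / trailing whitespace
            elif is_delim:
                fields.append("")         # two delimiters in a row: empty field
            else:
                cur.append(ch)
                state = IN_FIELD
        elif state == IN_FIELD:
            if is_ws or is_delim:
                fields.append("".join(cur))
                cur = []
                state = AFTER_WS if is_ws else START
            else:
                cur.append(ch)
        else:  # AFTER_WS: a field was just ended by whitespace
            if is_ws:
                pass
            elif is_delim:
                state = START             # "a : b" has one separator, not two
            else:
                cur.append(ch)
                state = IN_FIELD

    if state == IN_FIELD:
        fields.append("".join(cur))
    return fields


print(ifs_split(" a  b "))        # ['a', 'b']
print(ifs_split("a : b", " :"))   # ['a', 'b']
print(ifs_split("a::b", ":"))     # ['a', '', 'b']
```

Writing the states out by hand makes each quirk a visible branch, which is harder to achieve when the rules are packed into a single regex.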

Two Kinds of C-Escaped Strings

echo -e '1\n2' and echo $'1\n2' both print the lines 1 and 2. Their relationship is the same as the relationship between [ and [[: the former is dynamically parsed, and the latter is statically parsed. For example, dynamic parsing allows this: char=n; echo -e "1\\${char}2", but static parsing doesn't.

I implemented these with similar but not identical lexers. Metaprogramming let me avoid duplication.
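The static versus dynamic distinction can be illustrated with a small Python sketch. The unescape function and its three-entry escape table are hypothetical and far smaller than what real shells accept. Unlike OSH, which uses two similar but not identical lexers, this sketch reuses one function for both cases. The only difference shown is when the function runs and what text it sees.

```python
# Hypothetical sketch: a tiny subset of backslash escapes, for illustration only.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def unescape(s):
    """Replace the backslash escapes listed in SIMPLE_ESCAPES; leave the rest alone."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[s[i + 1]])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


# Static, like $'1\n2': the escape is visible in the source text, so it can be
# processed once, at parse time.
static_value = unescape(r"1\n2")

# Dynamic, like char=n; echo -e "1\\${char}2": the backslash and the letter are
# only joined at run time, so the escape cannot be seen by the parser.
char = "n"
arg = "1\\" + char + "2"       # the argument echo receives: 1, backslash, n, 2
dynamic_value = unescape(arg)  # echo -e interprets it when it runs

print(static_value == dynamic_value == "1\n2")  # True
```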

Prefix/Suffix Strip Operations Use the Conventional Algorithm

This is another feature that touches on some computer science. Originally, I translated globs to Python regexes, in order to take advantage of Python's non-greedy matching. For example, the expression ${x%%*suffix} could be implemented with the regex .*?(suffix).

However, abuild uses character classes in globs, e.g. ${i%%[<>=]*}, which isn't easy to translate reliably.

So instead I had to implement these operators using a linear number of calls to fnmatch(), which makes the overall algorithm quadratic. If fnmatch() isn't linear in the worst case, which it often isn't, then the algorithm could be cubic.

However, this issue doesn't appear to arise in practice, as all shells use this slow algorithm. Strings are generally short.
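The linear scan over candidate suffixes described above can be sketched in a few lines of Python. This is an illustration, not OSH's code: strip_suffix is a made-up name, and Python's fnmatch.fnmatchcase() stands in for libc's fnmatch(). Each candidate suffix costs one match call, and there are len(s) + 1 candidates, which is where the quadratic behavior comes from.

```python
import fnmatch


def strip_suffix(s, pat, longest=False):
    """Sketch of ${s%pat} (shortest match) and ${s%%pat} (longest match).

    Tries every suffix of s against the glob, one fnmatch call per suffix.
    """
    n = len(s)
    if longest:
        starts = range(0, n + 1)       # longest suffix first
    else:
        starts = range(n, -1, -1)      # shortest suffix first
    for i in starts:
        if fnmatch.fnmatchcase(s[i:], pat):
            return s[:i]
    return s                           # no suffix matched: leave s unchanged


# The pattern from abuild, applied to a made-up dependency string.
print(strip_suffix("foo>=1.0", "[<>=]*", longest=True))  # foo
print(strip_suffix("foo>=1.0", "[<>=]*"))                # foo>
```

Because the glob is handed to the matcher unchanged, character classes like [<>=] need no translation step.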

Minor Features

There were several other minor features to implement. In most cases, I had already done the hard part: representing code with the lossless syntax tree. The implementation often "falls out" after choosing a good representation.

Shell WTFs

Reimplementing these shell quirks was both fun and depressing. As penance, I've been maintaining a wiki page of Shell WTFs (which is not well-organized).

I could blog every day about one of these and not be done for months. But I remind myself that my main goal is to improve shell with the Oil language, not dwell on the past. Legacy behavior is only useful as far as it gives people an upgrade path to Oil.

Bugs Fixed

File Descriptor Usage

As far as I know, a shell must handle file descriptors differently than any other Unix program. It can't open any files in the descriptor range 3-9, because shell scripts may use them directly.
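One common way for a shell to stay out of the 3-9 range is to move any descriptor it opens for its own use to a number of 10 or higher. The sketch below shows that idea in Python with fcntl's F_DUPFD. It is an assumption about the general technique, not a description of what OSH does, and move_fd_high is a hypothetical helper name.

```python
import fcntl
import os


def move_fd_high(fd, min_fd=10):
    """Duplicate fd onto the lowest free descriptor >= min_fd, then close the original."""
    new_fd = fcntl.fcntl(fd, fcntl.F_DUPFD, min_fd)
    os.close(fd)
    return new_fd


# Usage: a pipe's read end normally lands on a low number such as 3.
r, w = os.pipe()
r = move_fd_high(r)
os.write(w, b"x")
print(r >= 10, os.read(r, 1))  # True b'x'
os.close(r)
os.close(w)
```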

I used the /proc/$$/fd/ mechanism I mentioned in OSH Runs Real Shell Programs to debug these problems. It's a very useful way of showing the file descriptor state of a process.

Bugs Related to CPython's Buffering

In The Riskiest Part of the Project, I mentioned several difficulties with using CPython to write a Unix shell.

I encountered another problem: Python does its own buffering of file I/O. I believe this is on top of libc's buffering, although I haven't looked into it deeply.

To work around this, I have to read a byte at a time from file descriptor 0. This seems inefficient, but I noticed that dash, mksh, and zsh also do this (in C).
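A byte-at-a-time read can be sketched in Python with os.read(), which bypasses Python's buffered file objects. The function name read_line_unbuffered is made up for this example, and this is not OSH's code. One reason to read this way, as far as I understand it, is that the shell shares file descriptor 0 with the processes it starts, so it should not consume input beyond the line it needs. The usage below demonstrates this with a pipe standing in for stdin.

```python
import os


def read_line_unbuffered(fd=0):
    """Read one line from fd, one byte per read, so nothing past the newline is consumed."""
    chunks = []
    while True:
        b = os.read(fd, 1)
        if not b:            # EOF
            break
        chunks.append(b)
        if b == b"\n":
            break
    return b"".join(chunks)


# Usage: a pipe stands in for stdin.
r, w = os.pipe()
os.write(w, b"one\ntwo\n")
os.close(w)
print(read_line_unbuffered(r))  # b'one\n'
print(os.read(r, 100))          # b'two\n' is still there for the next reader
os.close(r)
```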

Other Bugs

What Was Not Done

I punted on a few things which weren't strictly necessary to build system images, or which had easy workarounds:

Also note that these OSH builds are in a sense "shallow". I changed the shebang lines of the top-level scripts, which total thousands of lines, but they often invoke more shell scripts with a #!/bin/bash or #!/bin/sh shebang line.

For example, building any Linux distro will require running dozens of configure scripts. Fortunately, OSH can already run those.

What's Next?

As mentioned, the upcoming OSH 0.4 release will include all this work.

I also have several writing tasks on my TODO list:

It would also be nice to get oil-dev@ going again. If you're interested in contributing, e-mail me or leave a comment.