Success with Aboriginal, Alpine, and Debian

2018-01-15

Three months ago, in Roadmap #5, I wrote that OSH will be a better shell for building Linux distributions. It will run existing code, including bash scripts, but it's stricter and easier to debug.

In the last month, I've made significant progress toward this goal. I fixed dozens of bugs, implemented new features, and simplified the codebase.

OSH can now run thousands of lines of shell scripts that build three distros: Aboriginal Linux, Alpine Linux, and Debian. This post describes what I did, and the technical work that was involved.

Table of Contents
Recap of Recent Progress
What Is a Linux Distro?
What's the Difference Between Distros?
Debian
Alpine Linux
Aboriginal Linux
Features Added
Tracing Support
Shell Options for Strict Behavior
Overhaul of Word Splitting and Evaluation
Two Kinds of C-Escaped Strings
Stripping Glob Prefixes and Suffixes With POSIX APIs
Minor Features
Shell WTFs
Bugs Fixed
File Descriptor Usage
Bugs Related to CPython's Buffering
Other Bugs
What Was Not Done
What's Next?

Recap of Recent Progress

I haven't written about Linux distributions in a while. What happened?

OSH was able to run abuild -h back in October, but its parsing speed made debugging sessions unpleasant. On a fast machine, it took more than 1600 milliseconds to parse abuild!

So I pushed two tasks onto the stack, for a total of three:

  1. Run abuild from Alpine Linux.
  2. Optimize the parser so running abuild isn't painful.
  3. Fix bugs in the parser before optimizing it.

The two releases since October popped #3 and #2 off the stack:

  1. In OSH 0.2, I fixed the bugs revealed by torturing the parser with a million lines of shell. I also introduced parser benchmarks.
  2. OSH 0.3 sped up the parser by 6-7x. I introduced more benchmarks, including ones that measure execution speed.

Now OSH can parse abuild in about 250 milliseconds. That's still too slow, but it's not blocking progress.

I plan to release OSH 0.4 at the end of this month. It will be able to run not just abuild, but also shell scripts from Aboriginal Linux and Debian.

After that, the stack will be empty again. I had to shave some yaks, but I didn't lose sight of the goal!

What Is a Linux Distro?

I didn't understand how Linux distros worked until pretty recently. It's useful to think of them as having (at least) these four components:

  1. A set of source tarballs from "upstream" sources: e.g. GNU, Linux, Apache, or LLVM.
  2. A "meta" build system that turns source tarballs into binary packages. This build system invariably uses shell scripts. Sometimes GNU make is used; sometimes Python is used; but there are always shell scripts.
  3. A script to create the root file system — i.e. to "bootstrap" the system so that it can build its own packages.
  4. A package manager that allows end users to install binary packages. For Debian-derived systems, this is apt; for Red Hat-derived systems (CentOS, Fedora, etc.), it's yum.

What's the Difference Between Distros?

I'm pleased by the diversity of the three distros I worked with because it gives me confidence that OSH is working:

  1. Debian: arguably the most popular distro, and one of the oldest. It has many derivatives like Ubuntu. The debootstrap script I ran normally runs under dash, which is one of the most incompatible shells.
  2. Alpine Linux: a "modern" distro for embedded systems and containers. It runs busybox ash.
  3. Aboriginal Linux: an educational project with a minimalist/embedded slant. However, it runs under bash, and uses many "bash-isms".

So not only am I testing shell scripts by different authors, I'm also testing OSH for compatibility with scripts written for different shell dialects.

Here is some more background on these projects and detail on what I did:

Debian

debootstrap assembles the Debian root file system from .deb packages. Roughly speaking, .debs are tarballs of binaries, scripts, and metadata. I parsed debootstrap with OSH back in October 2016.

It's ~2600 lines of shell (excerpt). I worked with this script a few years ago, and I remember it looking scary. There were weird incantations that I didn't understand. Now it's easy to read, which I think means I've spent too much time with shell :-)

What Now Works: I used OSH to build an Ubuntu Xenial image, chroot into it, and run commands. The sections below describe the fixes required to make this work.

Alpine Linux

Alpine Linux started out as a distro for embedded systems like routers, but it's also now used for containers in the cloud. Docker, Inc. sponsors it, and postmarketOS is based on it.

What Now Works:

  1. I ran alpine-chroot-install with OSH, and successfully built a system image.
  2. I entered the image using chroot, and built OSH in this environment. This OSH build is linked against musl libc.
  3. I built three .apk packages with abuild running under OSH-musl.
  4. I ran abuild verify to check that the packages looked reasonable.

Aboriginal Linux

Aboriginal Linux isn't a distro, per se. It's an educational project that looks like a distro. It answers the question: What is the smallest number of packages that will create a Linux system that can rebuild itself?

The project is now defunct. But the code still works, and I still find it interesting, e.g. from a security point-of-view.

It's ~3700 lines of bash (excerpt). It was the first project I parsed with OSH.

What Now Works:

  1. I built the i686 target using OSH. This builds a complete system image from source code. In contrast, debootstrap assembles an image from binary packages.
  2. I booted the resulting image in QEMU and got a shell prompt!

In summary, I tested OSH on a diverse set of shell scripts found in the wild, and fixed what was necessary to make them run.

I started this process after the last release, and I honestly didn't know how long it would take. There were more problems than I expected, but I was also able to fix them more quickly than expected.

Features Added

What features were missing?

Tracing Support

Some errors I ran into had obvious causes. For example, OSH would throw NotImplementedError when a program used ${s:1:2} (string slicing). Getting past this error by implementing slicing was simple.
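For readers who haven't used it, here is a minimal sketch of what string slicing does in bash. The variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Substring expansion: ${var:offset:length}, with a zero-based offset.
s='abcdef'
slice="${s:1:2}"      # 2 characters starting at index 1
tail_part="${s:3}"    # everything from index 3 onward
echo "$slice $tail_part"
```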

Other errors required debugging thousands of lines of other people's shell scripts. So I needed to learn more about bash and debugging. This tip on making xtrace useful helped me. In bash, you can set the $PS4 variable so that traces include the filename and line number.
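A minimal sketch of that $PS4 technique, assuming bash (the greet function is hypothetical, and this particular $PS4 string is one common choice rather than the one from the linked tip):

```shell
#!/usr/bin/env bash
# Single quotes delay expansion, so bash evaluates ${BASH_SOURCE} and ${LINENO}
# each time it prints a trace line, not when PS4 is assigned.
PS4='+${BASH_SOURCE}:${LINENO}: '

# Hypothetical function, only here to produce some trace output.
greet() {
  echo "hello $1"
}

set -x         # start tracing; trace lines go to stderr
greet world
set +x         # stop tracing
```

Each traced command is then prefixed with the file and line it came from, which makes it much easier to find the failing spot in a script that is thousands of lines long.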

So I mimicked these debugging features in OSH:

Note that bash actually has a debugger called bashdb! Describing the way it works would be another post. In short, it uses hooks specified with the trap builtin, as well as several $BASH_* variables.

Shell Options for Strict Behavior

A recurring theme was relaxing OSH's strict behavior in order to accommodate common shell usage. However, I added the ability to opt in to the strict behavior, with set -o strict-control-flow, strict-array, and strict-errexit.

I'll address this topic in another blog post, but feel free to leave comments if you're curious.

Overhaul of Word Splitting and Evaluation

POSIX has quirky rules for the $IFS variable, which determines:

  1. How unquoted words are split, and
  2. How the read builtin splits fields.

I replaced the buggy regex-based IFS splitting with an explicit state machine. This is an interesting piece of code that I may explain in another blog post. It's in core/legacy.py. It turned a lot of red tests green.
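To give a flavor of the quirks involved, here is a small bash sketch of how $IFS affects splitting. It shows the behavior a splitter has to reproduce, not the state machine itself. The helpers count_fields and split_count are hypothetical names for this demo.

```shell
#!/usr/bin/env bash
# count_fields reports how many arguments it received.
count_fields() { echo "$#"; }

# split_count IFS_VALUE STRING: split STRING with the given IFS and count fields.
split_count() {
  local IFS="$1"      # scope the IFS change to this function
  count_fields $2     # deliberately unquoted so the shell splits it
}

n_space=$(split_count ' '  'a  b')    # runs of IFS whitespace collapse
n_colon=$(split_count ':'  'a::b')    # adjacent non-whitespace delimiters leave an empty field
n_mixed=$(split_count ' :' 'a : b')   # whitespace around ':' merges into one delimiter
echo "$n_space $n_colon $n_mixed"

# The read builtin splits on IFS too; the last variable receives the remainder.
IFS=':' read -r user pw uid rest <<< 'root:x:0:0:root:/root:/bin/bash'
echo "$user $uid $rest"
```

Whitespace and non-whitespace characters in $IFS follow different rules, which is the kind of case analysis that is awkward to express as a single regex.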

Two Kinds of C-Escaped Strings

echo -e 'foo\n' and $'foo\n' are both ways to write C-escaped strings. Their relationship is the same as the relationship between [ and [[ — the former is dynamically parsed, and the latter is statically parsed.

(For example, dynamic parsing allows this: char=n; echo -e "1\\${char}2", but static parsing doesn't.)
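The difference can be seen in a short bash sketch (variable names are made up, and bash's builtin echo is assumed):

```shell
#!/usr/bin/env bash
# Static: the escape sequence is part of the shell's syntax.
static=$'1\n2'

# Dynamic: the escape sequence is assembled at runtime, then echo -e decodes it.
char=n
dynamic=$(echo -e "1\\${char}2")

# No variable expansion happens inside $'...', so this stays literal text.
literal=$'${char}'

[ "$static" = "$dynamic" ] && echo 'same string'
echo "$literal"
```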

I implemented these with similar, but not identical, lexers, using the style described in my posts on lexing. I again found that metaprogramming is useful for avoiding code duplication.

Stripping Glob Prefixes and Suffixes With POSIX APIs

This is another feature that touches some computer science. I discovered that semantics that originate with ksh can't be efficiently expressed with POSIX APIs:

In theory, Python's API should be able to efficiently express the semantics of ${s%suffix} vs. ${s%%suffix}, so OSH used the strategy of translating globs to Python regexes. For example, the expression ${s%%*suffix} could be implemented with the regex .*?(suffix).

However, abuild uses character classes in globs, e.g. ${i%%[<>=]*}, which aren't straightforward to translate.

So I reimplemented these operators using the conventional, inefficient algorithm: in the worst case, a linear number of calls to fnmatch(), one for each position in the string!

If each fnmatch() call takes linear time, this makes the overall algorithm quadratic. If fnmatch() isn't linear, which it often isn't, then stripping glob prefixes and suffixes will be even slower than quadratic.

However, this issue doesn't appear to arise in practice, as all shells use the slow algorithm. Of course, Oil will provide string manipulation functions that aren't slow in theory. I want the language to be safe to use in adversarial contexts.
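The one-match-per-position idea can be sketched in bash itself. This is an illustration only, not OSH's implementation: strip_suffix is a hypothetical helper, and bash's [[ == ]] pattern match stands in for fnmatch().

```shell
#!/usr/bin/env bash
# strip_suffix MODE STRING PATTERN
# MODE is 'shortest' (like ${s%pat}) or 'longest' (like ${s%%pat}).
# It tries one candidate suffix per position, so it makes up to len+1 matches.
strip_suffix() {
  local mode=$1 s=$2 pat=$3 i
  local n=${#s}
  if [[ $mode == longest ]]; then
    # Longest suffix first: start the candidate at position 0.
    for (( i = 0; i <= n; i++ )); do
      # An unquoted $pat on the right-hand side is matched as a glob.
      if [[ ${s:i} == $pat ]]; then printf '%s\n' "${s:0:i}"; return; fi
    done
  else
    # Shortest suffix first: start with the empty suffix at the end.
    for (( i = n; i >= 0; i-- )); do
      if [[ ${s:i} == $pat ]]; then printf '%s\n' "${s:0:i}"; return; fi
    done
  fi
  printf '%s\n' "$s"    # no suffix matched: return the string unchanged
}

s='a.tar.gz'
echo "${s%.*} $(strip_suffix shortest "$s" '.*')"
echo "${s%%.*} $(strip_suffix longest "$s" '.*')"

# The character-class pattern from abuild, on a made-up dependency string.
i='foo>=1.2'
echo "${i%%[<>=]*} $(strip_suffix longest "$i" '[<>=]*')"
```

Each pair of results should agree with the builtin operator. The loop makes the cost visible: every iteration is a full glob match against a suffix of the string.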

Minor Features

Running the distro scripts required several other shell features. In most cases, I had already done the hard part: representing code with the lossless syntax tree. The implementation often "falls out" after choosing a good representation.

Shell WTFs

Reimplementing these shell quirks was both fun and depressing. As penance, I've been maintaining a wiki page of Shell WTFs (which is not well-organized).

I could blog every day about one of these and not be done for months. But I remind myself that my goal is to improve shell with the Oil language, not dwell on the past. Legacy behavior is only useful as far as it gives users an upgrade path to Oil.

Bugs Fixed

In addition to implementing features, I also found and fixed bugs in OSH.

File Descriptor Usage

As far as I know, a shell must handle file descriptors differently than any other Unix program. It can't open any files in the descriptor range 3-9, because shell scripts may use them directly.

To debug these issues, I used the /proc/$$/fd/ mechanism mentioned in OSH Runs Real Shell Programs. It's a nice way of showing the file descriptor state of a process.
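Here is a minimal sketch of both points, assuming bash on Linux (the /proc listing is Linux-specific, and the temporary file is only for the demo). The script claims descriptor 3 for itself, which is why the shell's own internals have to stay out of that range.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
exec 3> "$tmp"                   # the script opens descriptor 3 directly
echo 'logged via fd 3' >&3

# List the descriptors the current shell has open, if /proc is available.
listing=''
if [ -d "/proc/$$/fd" ]; then
  listing=$(ls -l "/proc/$$/fd")
  echo "$listing"
fi

exec 3>&-                        # close descriptor 3 again
contents=$(cat "$tmp")
rm -f "$tmp"
echo "$contents"
```

The listing should show descriptor 3 pointing at the temporary file, alongside the standard descriptors 0, 1, and 2.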

Bugs Related to CPython's Buffering

In The Riskiest Part of the Project, I mentioned several difficulties with using CPython to write a Unix shell.

I encountered another problem: Python does its own buffering of file I/O. I believe this is on top of libc's buffering, although I haven't looked into it deeply.

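This means the read builtin can't go through Python's buffered file objects, which may consume more input than the one line being read. Instead, I have to read a byte at a time from file descriptor 0. This seems inefficient, but I noticed that dash, mksh, and zsh all do the same thing (in C). For example, try: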

$ strace zsh -c 'read x <<< "hello world"'
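A small bash sketch suggests why a shell's read has to be this careful. The reasoning is general shell behavior rather than something specific to OSH: the rest of the input must stay available to whatever reads the same descriptor next.

```shell
#!/usr/bin/env bash
first_and_rest=$(
  printf 'line1\nline2\n' | {
    read -r first     # must consume exactly one line from the pipe
    rest=$(cat)       # a child process reads whatever is left on stdin
    echo "$first|$rest"
  }
)
echo "$first_and_rest"
```

If read pulled a whole buffer from the pipe, cat would find nothing left. A pipe can't be rewound, so the shell can't simply put the extra bytes back.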

Other Bugs

What Was Not Done

I punted on a few things that weren't strictly necessary to build the distros, or which had easy workarounds:

Also note that these OSH builds are in a sense "shallow". I changed the shebang lines of the top-level scripts, which are thousands of lines long, but they often invoke more shell scripts with a #!/bin/bash or #!/bin/sh shebang line.

For example, building any Linux distro will require running dozens of configure scripts. Fortunately, OSH can already run those.

What's Next?

As mentioned, the upcoming OSH 0.4 release will include all this work.

After concentrating so much on the code, I now have several writing tasks backed up:

It would also be nice to get oil-dev@ going again. If you're interested in contributing, e-mail me or leave a comment.