How I Use Tests

2017-06-22 (Last updated 2019-03-23)

In the last post, I said that I would write about my work on the OSH runtime. Before doing that, I'll give you a sense for how I'm working.

In short, I'm using test-driven development, with bespoke test harnesses written in shell and Python. (I wrote about bespoke code generators last December, in a diversion on searching for code that matches a spec.)

This approach isn't unusual. The authors of the now-defunct pdksh wrote their own test framework to clone AT&T's ksh. Two active forks of pdksh, the OpenBSD shell and mksh, use derivatives of these tests:

regress/bin/ksh in OpenBSD (GitHub). See th, the test harness written in Perl.
mksh files (Launchpad). See check.t and check.pl.

(I learned about these test cases after starting OSH, and I'd like to eventually use them.)

Table of Contents

Four Types of Test

The Flow

Examples of Gold Tests

Four Types of Test

OSH has four test harnesses for four types of test. They are conveniently named test/{wild,unit,spec,gold}.sh in the oil repo.

Wild tests run the parser on shell scripts found in the wild, and produce a pretty-printed ASDL representation. I check that the parse doesn't fail, but I make no assertion on the output. I reported results from this kind of test in posts like Four More Projects Parsed.
Unit tests are written in Python, using the built-in unittest module. They're useful for exhaustively testing tricky code.
Spec tests use the sh_spec.py script to run shell snippets against many shells. It has a little language for making assertions on stdout, stderr, and the exit code. I've shown spec test results as HTML in several recent blog posts. Clicking through lets you see the code for a test case.
Gold tests run a shell script under both bash and OSH, and compare the output. Thus, the assertions are implicit and you don't have to write them by hand.

The test/gold.sh framework looks like this:

_compare() {
  "$@" >_tmp/left.txt  # run with shell in shebang line
  local left_status=$?

  bin/osh "$@" >_tmp/right.txt  # run with OSH
  local right_status=$?

  # ... compare output and status
}

# Test cases: run the command under two shells
_compare ./configure
_compare build/actions.sh gen-module-init

One reason I'm writing a Unix shell is that I've found tiny scripts like this to be pleasant and productive. I want my software to work well, and shell helps me achieve that.
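The comparison step elided in the snippet above could be filled in roughly as follows. This is a minimal sketch, not the actual test/gold.sh: the OSH_BIN and TMP_DIR variables and the PASS/FAIL messages are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: OSH_BIN, TMP_DIR, and the PASS/FAIL messages are hypothetical.

OSH_BIN=${OSH_BIN:-bin/osh}
TMP_DIR=${TMP_DIR:-_tmp}

_compare() {
  mkdir -p "$TMP_DIR"

  "$@" >"$TMP_DIR/left.txt"              # run with shell in shebang line
  local left_status=$?

  "$OSH_BIN" "$@" >"$TMP_DIR/right.txt"  # run with the shell under test
  local right_status=$?

  # The implicit assertions: same stdout and same exit status.
  if ! diff -u "$TMP_DIR/left.txt" "$TMP_DIR/right.txt"; then
    echo "FAIL: stdout differs for: $*"
    return 1
  fi
  if [ "$left_status" -ne "$right_status" ]; then
    echo "FAIL: status $left_status != $right_status for: $*"
    return 1
  fi
  echo "PASS: $*"
}
```

Pointing OSH_BIN at some other program is a cheap way to check that the harness itself reports differences.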

The Flow

I pick shell scripts to run as gold tests, which uncovers unimplemented features. The implicit assertions are a rough check for correctness. Then I nail down the exact behavior with explicit assertions using spec tests.

For example, I use set -o errexit in all my scripts, so the gold tests quickly revealed that I needed to implement it. Then I wrote more than a dozen spec test cases for it:

Cases 9 to 24 in sh-options.test.sh

Scanning across rows reveals differences between shells:

dash doesn't implement the (( )) arithmetic construct, so that case is marked N-I for not implemented.
You can see that bash is the only shell that ignores a failure within a command sub, e.g. $(echo one; false; echo two).
All shells ignore a failure within a local assignment (but not within a global assignment), because local is a builtin with its own exit code.

In keeping with its philosophy of being more strict, OSH fixes the latter two issues.
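The bash side of these errexit cases can be reproduced with a few lines. This is an illustrative sketch rather than the contents of sh-options.test.sh: the function names are hypothetical, and each case runs in a child bash (assumed to be in its default, non-POSIX mode) so that errexit cannot abort the calling script.

```shell
#!/usr/bin/env bash
# Illustrative sketch: the function names are hypothetical.

command_sub_failure() {
  bash -c '
    set -o errexit
    # The failure of `false` inside the command sub does not stop bash.
    echo $(echo one; false; echo two)
  '
}

local_masks_failure() {
  bash -c '
    set -o errexit
    f() {
      # The exit status of the `local` builtin hides the failed command sub.
      local x=$(false)
      echo "after local"
    }
    f
  '
}

global_assignment_fails() {
  bash -c '
    set -o errexit
    # A plain assignment takes the status of the command sub, so errexit fires.
    x=$(false)
    echo "after global"
  '
}
```

Under bash, the first two functions run to completion, while the third exits with a nonzero status before reaching its echo.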

If the spec tests are too coarse or become too numerous, then I switch to unit tests. (This happened today when implementing flag parsing for shell builtins.)

Examples of Gold Tests

Over the last few weeks, these cases prioritized what shell features I implemented:

Running the OSH ./configure script under bash and OSH. For example, this caught the fact that negation of the exit code, e.g. if ! cc ..., wasn't implemented.
Running parts of the OVM build like gen-module-init. (This shell function generates C code to initialize statically-linked Python extension modules.)
scripts/count.sh — My line count script, shown in Project Metrics. It uses brace expansion, which motivated me to implement that feature in OSH a few months ago.
benchmarks/startup.sh — A script to test the startup time of OSH vs. other shells and interpreters. This exposed a bug with redirects in pipelines, e.g. strace python 2>&1 | wc -l.
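For reference, here are tiny standalone instances of three features mentioned in this list. The function names are hypothetical and the commands are stand-ins: false plays the role of a failing cc, and a brace group plays the role of strace python.

```shell
#!/usr/bin/env bash
# Illustrative sketch: function names and stand-in commands are hypothetical.

negated_exit_code() {
  # `!` inverts the exit status, so the branch runs when the command fails.
  if ! false; then
    echo "command failed"
  fi
}

brace_expansion() {
  # Brace expansion is not POSIX sh, so run it under bash explicitly.
  bash -c 'echo test/{wild,unit,spec,gold}.sh'
}

stderr_into_pipeline() {
  # 2>&1 applies to the left side of the pipe, so both lines reach wc.
  { echo "to stdout"; echo "to stderr" >&2; } 2>&1 | wc -l
}
```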

Confusingly, because the test frameworks are shell scripts themselves, we can use them as gold tests:

test/wild.sh. Parse shell scripts, and produce a pretty-printed LST and a bash-to-Oil translation.
test/unit.sh. Find and run all the Python unit tests.
test/spec.sh. Run individual spec tests.
test/spec-runner.sh. Run spec tests in parallel and produce an HTML report.

In other words, the OSH test frameworks run under OSH.

The next post was going to be a log of what I did in the last few weeks, titled The Long Slog Through a Shell.

But in writing this post, I realized I have more thoughts about tests, which are higher level and forward looking. So the next post will be How I Would Like to Use Tests.
