Measuring Progress with Tests


Last month, I mentioned my plan to focus on coding more than blogging. Now I have something to show for the silence.

But I haven't been completely silent: I invited contributions on the oil-dev mailing list, and I started writing more informative commit messages and posting links to them on the list.

This post has a table of interesting commits and e-mails, and quantifies the progress made on the spec test results. In summary, we now have 375 out of 573 test cases passing, up from 249 out of 478 about 3 weeks ago.

After that I'll write about plans for future work.
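As a quick check on the summary numbers above, the pass rate went from about 52% to about 65%. The number of passing cases also grew by more than the number of new test cases. This throwaway Python snippet does the arithmetic using only the two pairs of numbers quoted above:

```python
# (passing, total) spec test cases, about 3 weeks apart, from the summary above.
before = (249, 478)
after = (375, 573)


def pass_rate(passing: int, total: int) -> float:
    """Percentage of test cases that pass."""
    return 100.0 * passing / total


print(f"before: {pass_rate(*before):.1f}%")  # before: 52.1%
print(f"after:  {pass_rate(*after):.1f}%")   # after:  65.4%
print(f"new test cases: {after[1] - before[1]}, new passes: {after[0] - before[0]}")
```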

Spec Test Numbers Over Time

The table below tracks progress toward the goal of filling out the shell runtime in Python. Remember that the immediate goal is to write a bash-compatible shell called OSH.

Each row is a substantive code change:

  1. The description column links to my oil-dev e-mails about the change.
  2. The commits column links to readable commit descriptions on Github.
  3. The results table column shows the state of spec tests as of that change. I copied the stats at the bottom of each results table to the test stats columns.

Date  Total cases  OSH pass  OSH fail  Not run  Results table  Commits            Description
3/2   478          249       168       61       link           -                  Welcome to oil-dev! This is after I first published spec tests as HTML.
3/9   485          254       167       64       link           -                  Oil-dev is Alive!
3/13  517          295       168       54       link           43581dec 700c2238  Brace Detection / Brace Expansion
3/16  528          305       165       58       link           4da19bf6           Properly implement break/continue/return
3/17  532          311       163       58       -              8af0906d 0d48e07e  Small example commits: implement $0 and fix $HOME bug
3/20  558          340       160       58       link           c5af6970           Rewrite of the entire word evaluation pipeline: word substitution, splitting, reframing, joining, globbing. Plus 4 follow-up commits fixing bugs exposed by the new data structures.
3/22  573          375       136       62       link           5502d076           More ops like ${#a}, ${a%suffix}, and ${a:-default}. Array fixes.
delta +95          +126      -32       +1
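The 3/22 row mentions string operations like ${#a}, ${a%suffix}, and ${a:-default}. For readers less familiar with them, here is a rough Python approximation of what each one computes. The function names and the dict standing in for the shell's variables are assumptions for illustration, not OSH code, and Python's fnmatch only approximates shell glob patterns (for example, it has no backslash escapes).

```python
import fnmatch
from typing import Dict

# Hypothetical helpers: a plain dict stands in for the shell's variable store.
Env = Dict[str, str]


def op_length(env: Env, name: str) -> int:
    """${#a}: number of characters in the value (0 if a is unset)."""
    return len(env.get(name, ""))


def op_remove_shortest_suffix(env: Env, name: str, pattern: str) -> str:
    """${a%pattern}: remove the shortest suffix matching the glob pattern."""
    value = env.get(name, "")
    # Try suffixes from shortest ("") to longest (the whole value).
    for i in range(len(value), -1, -1):
        if fnmatch.fnmatchcase(value[i:], pattern):
            return value[:i]
    return value  # no suffix matches: the value is unchanged


def op_default(env: Env, name: str, default: str) -> str:
    """${a:-default}: use default if a is unset OR empty."""
    value = env.get(name, "")
    return value if value else default
```

With env = {"a": "file.tar.gz", "empty": ""}, op_length(env, "a") is 11, op_remove_shortest_suffix(env, "a", ".*") is "file.tar", and op_default(env, "empty", "x") is "x", matching what bash prints for the corresponding expansions.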

I'm happy with this progress. At the very least, it feels like an OSH interpreter can be finished. Writing a bash-compatible shell is a big project, but tools like the spec tests and ASDL have made it tractable.

The Oil language is a separate chunk of work, which isn't addressed by this post.


The new word evaluation pipeline as of 3/20 is particularly notable: I extended the use of ASDL data types to the shell runtime. Prior to this commit, ASDL types were confined to the LST structure created at parse time.

I'd go as far as to say that OSH is now written not in Python, but in Python+ASDL. I want to write about that, but it's at the bottom of a big pile. Leave a comment if you're curious.
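To give a feel for what typed runtime values buy you, here is a minimal sketch of the splitting and joining steps of word evaluation, operating on typed "part values" rather than raw strings. StringPart, ArrayPart, and eval_word are hypothetical stand-ins for ASDL sum type variants, not OSH's actual schema. The sketch also assumes IFS contains only whitespace and skips globbing.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical part-value types, modeled loosely on an ASDL sum type.
@dataclass
class StringPart:
    s: str
    do_split: bool  # True for unquoted substitutions like $x


@dataclass
class ArrayPart:
    strs: List[str]  # e.g. the expansion of "$@"


PartValue = Union[StringPart, ArrayPart]


def eval_word(parts: List[PartValue], ifs: str = " \t\n") -> List[str]:
    """Split and join the evaluated parts of ONE shell word into argv strings."""
    words: List[str] = []
    current: List[str] = []  # fragments of the word being built
    have_word = False        # a quoted "" still produces an (empty) word

    def flush() -> None:
        nonlocal current, have_word
        if have_word:
            words.append("".join(current))
        current, have_word = [], False

    for part in parts:
        if isinstance(part, StringPart) and part.do_split:
            for ch in part.s:
                if ch in ifs:
                    flush()  # IFS whitespace ends the current word
                else:
                    current.append(ch)
                    have_word = True
        elif isinstance(part, StringPart):
            current.append(part.s)  # quoted text is never split
            have_word = True
        else:
            for i, s in enumerate(part.strs):
                if i > 0:
                    flush()  # each array element after the first starts a new word
                current.append(s)
                have_word = True
    flush()
    return words
```

For example, with x=" b c ", the word a$x evaluates to the parts [StringPart("a", False), StringPart(" b c ", True)] and produces ["a", "b", "c"], while "x$@y" with arguments 1 and 2 produces ["x1", "2y"]. Because each part carries its own type and flags, the splitting and joining logic never has to guess whether a string came from quoted or unquoted text.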


I was happy to get help running the spec tests on different machines, but otherwise there hasn't been much activity on oil-dev.

I'll chalk it up to the project being new and not having any releases yet. I've gotten a lot of positive feedback, but there's a chicken-and-egg problem: people aren't motivated to write code for something they're not using yet.

I've also been churning the code a lot, which is good but probably disorienting. I hope it will settle down soon.

If you're interested in contributing, please leave a comment. Let me know if there's something I can do to make contributions easier.

What Features are Left?

(This section has a lot of detail, mostly for potential contributors. Casual readers can skip to the end.)

(1) Many of the 136 failures are actually assertions on bad input that will be turned into proper user-facing errors. So once I come up with a consistent error handling scheme, the number of failures should quickly dip below 100.

(2) You can see in the latest spec results table that some files aren't being run against OSH (the uncolored white rows):

(3) These are covered by tests and need to be implemented:

(4) These are covered by tests, but I probably won't implement them until there's demand:

(5) What about shell features that aren't covered by the spec tests?

Note that shell scripts in the wild use features in a Pareto or "long tail" distribution. So even without some advanced features, I suspect OSH will run many real programs.
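Item (1) above describes turning assertions on bad input into user-facing errors. One possible shape for such a scheme, sketched here with hypothetical names (FatalShellError, check_var_name, run) that are not from OSH, is to raise a dedicated exception where the code used to assert, and to catch it once at the top level, where it becomes a message and an exit status. The check below covers only ordinary variable names, not positional parameters like ${1}.

```python
import re
import sys
from typing import List

# Hypothetical names throughout; this is one possible scheme, not OSH's.
_VAR_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class FatalShellError(Exception):
    """An error caused by bad user input, as opposed to an interpreter bug."""

    def __init__(self, msg: str, status: int = 1) -> None:
        super().__init__(msg)
        self.status = status


def check_var_name(name: str) -> str:
    # Instead of `assert _VAR_NAME.fullmatch(name)`, which would crash with a
    # Python traceback, raise an error the shell can report to the user.
    if not _VAR_NAME.fullmatch(name):
        raise FatalShellError(f"${{{name}}}: bad substitution")
    return name


def run(names: List[str]) -> int:
    """Top level: turn FatalShellError into a message on stderr and an exit status."""
    try:
        for name in names:
            check_var_name(name)
    except FatalShellError as e:
        print(f"osh: {e}", file=sys.stderr)
        return e.status
    return 0
```

The design choice here is to keep Python assertions for genuine interpreter bugs, so that a failing assertion in a spec test still signals something to fix in OSH rather than in the test input.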

What Programs to Run?

The above commits allow OSH to run some of its own scripts. These didn't work three weeks ago, but now they do:

$ bin/osh ./spec.sh version-text  # part of the HTML table
$ bin/osh ./count.sh all  # count lines of code
$ bin/osh ./unit.sh all  # run unit tests

As for shell scripts I didn't write, I still have the corpus of programs I used to test the parser, like:

Leave a comment if you have suggestions. I'd like to use scripts that can be run and tested quickly. Aboriginal Linux and debootstrap have the nice property that you can test the resulting system image, but they take a while to run.


I showed quantified progress on the spec tests. The e-mails in the table above describe some interesting code changes, including a big architectural change: rewriting word evaluation with ASDL data types. The brace detection and expansion algorithms are also fun.
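For a taste of the brace expansion half, here is a minimal string-based sketch. It treats a {...} group as an expansion only if the group is balanced and has a comma at its top level, so {a} stays literal, and it takes the cross product of the prefix, each recursively expanded alternative, and the expanded suffix. It ignores quoting and {1..3} ranges and works on plain strings rather than OSH's parsed words; brace_expand and _find_alternatives are hypothetical names, not OSH functions.

```python
from typing import List, Optional, Tuple


def _find_alternatives(s: str, start: int) -> Optional[Tuple[List[str], int]]:
    """Assumes s[start] == '{'. Returns (alternatives, index after '}'),
    or None if the group is unbalanced or has no top-level comma."""
    depth = 0
    alts: List[str] = []
    seg_start = start + 1
    for i in range(start, len(s)):
        ch = s[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                alts.append(s[seg_start:i])
                # "{a}" has no comma, so it is literal text, not an expansion
                return (alts, i + 1) if len(alts) >= 2 else None
        elif ch == "," and depth == 1:
            alts.append(s[seg_start:i])
            seg_start = i + 1
    return None  # no matching '}': treat the '{' literally


def brace_expand(s: str) -> List[str]:
    """Expand the leftmost valid brace group, then recurse on the pieces."""
    for i, ch in enumerate(s):
        if ch != "{":
            continue
        found = _find_alternatives(s, i)
        if found is None:
            continue  # not an expansion; keep scanning
        alts, end = found
        prefix = s[:i]
        suffixes = brace_expand(s[end:])
        return [prefix + a + b for alt in alts for a in brace_expand(alt) for b in suffixes]
    return [s]
```

For example, brace_expand("a{b,c}d") returns ["abd", "acd"], brace_expand("{x,y}{1,2}") returns ["x1", "x2", "y1", "y2"], and brace_expand("{a{b,c}}") returns ["{ab}", "{ac}"], since the outer braces have no top-level comma.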

I talked about what features are left to implement, and what real programs I will try to run.

After OSH runs a few significant programs I didn't write, a 0.1 release will be appropriate. It feels like that will come soon, but no promises.