
Measuring Progress with Tests

2017-03-23

I mentioned last month that I planned to focus on coding more than blogging, and I now have something to show for the silence.

But I haven't been completely silent — I invited contribution to the oil-dev mailing list. And I started writing more informative commit messages and sending links to the list.

This post has an index of interesting commits and e-mails in a nice format, and quantifies the progress made on the spec test results. In summary, we now have 375 out of 573 test cases passing, up from 249 out of 478 about 3 weeks ago.

Spec Test Numbers Over Time

The table below tracks progress toward the goal of filling out the shell runtime in Python. Remember that our immediate goal is to write a bash-compatible shell called OSH.

Each row is a substantive code change:

  1. The description column links to my oil-dev e-mails.
  2. The commits column links to readable change descriptions on GitHub.
  3. The results table shows the state of spec tests. I copied the stats at the bottom of each table to the test stats columns.

Date | Test stats (total test cases | osh pass | osh fail | not run | results table) | Commits | Description
3/2 | 478 | 249 | 168 | 61 | link | - | Welcome to oil-dev! This is after I first published spec tests as HTML.
3/9 | 485 | 254 | 167 | 64 | link | - | Oil-dev is Alive!
3/13 | 517 | 295 | 168 | 54 | link | 43581dec 700c2238 | Brace Detection / Brace Expansion
3/16 | 528 | 305 | 165 | 58 | link | 4da19bf6 | Properly implement break/continue/return
3/17 | 532 | 311 | 163 | 58 | - | 8af0906d 0d48e07e | Small example commits: Implement $0 and fix $HOME Bug
3/20 | 558 | 340 | 160 | 58 | link | c5af6970 | Rewrite of the entire word evaluation pipeline: word substitution, splitting, reframing, joining, globbing. Then 4 follow-up commits to fix bugs that were exposed by the new data structures.
3/22 | 573 | 375 | 136 | 62 | link | 5502d076 | More ops like ${#a}, ${a%suffix}, and ${a:-default}. Array fixes.
delta | +95 | +126 | -32 | +1 | | |
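
To make the arithmetic behind the table easy to check, here is a small Python sketch that recomputes the pass rate for each row and the delta between the first and last rows. It is not part of the OSH repository; the names ROWS, pass_rate, and deltas are made up for this illustration, and the numbers are copied from the table above.

```python
# Rows copied from the table above: (date, total, osh_pass, osh_fail, not_run)
ROWS = [
    ("3/2", 478, 249, 168, 61),
    ("3/9", 485, 254, 167, 64),
    ("3/13", 517, 295, 168, 54),
    ("3/16", 528, 305, 165, 58),
    ("3/17", 532, 311, 163, 58),
    ("3/20", 558, 340, 160, 58),
    ("3/22", 573, 375, 136, 62),
]


def pass_rate(total, passed):
    """Percentage of test cases that pass, rounded to one decimal place."""
    return round(100.0 * passed / total, 1)


def deltas(first, last):
    """Column-wise difference between two rows, skipping the date."""
    return tuple(b - a for a, b in zip(first[1:], last[1:]))


for date, total, passed, failed, not_run in ROWS:
    # Each row should be internally consistent: the three buckets sum to the total.
    assert passed + failed + not_run == total
    print(f"{date:>5}  {passed}/{total}  {pass_rate(total, passed)}%")

print("delta", deltas(ROWS[0], ROWS[-1]))
```

By this calculation the pass rate went from about 52.1% on 3/2 to about 65.4% on 3/22, even though the total number of test cases grew by 95 over the same period.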

I'm happy with this progress. At the very least, it feels like an OSH interpreter can be finished. Writing a bash-compatible shell is a big project, but tools like the spec tests and ASDL have made it tractable.

The Oil language is another huge chunk of work, but I'll talk about that later.

ASDL

The new word evaluation pipeline is notable because I extended my use of ASDL data types to the runtime. Prior to the 3/20 commit, they were confined to the LST structure created at parse time. I'd go as far as to say that OSH is now written not in Python, but in Python+ASDL.
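
For readers who haven't seen ASDL, the rough idea is that a schema declares sum types (a value is exactly one of several named variants), and code is written against those variants. The sketch below only illustrates that style in plain modern Python. The schema in the comment, the Str and StrArray classes, and join_value are all hypothetical; they are not taken from the OSH source, and OSH generates its classes from ASDL schemas rather than writing them by hand like this.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical ASDL-style schema, shown only as a comment:
#
#   value = Str(string s) | StrArray(string* items)


@dataclass(frozen=True)
class Str:
    """Variant holding a single string."""
    s: str


@dataclass(frozen=True)
class StrArray:
    """Variant holding a sequence of strings."""
    items: List[str]


# The sum type: a Value is exactly one of the variants above.
Value = Union[Str, StrArray]


def join_value(val: Value, sep: str = " ") -> str:
    """Flatten either variant into a single string."""
    if isinstance(val, Str):
        return val.s
    if isinstance(val, StrArray):
        return sep.join(val.items)
    # An unknown variant is a programming error, not a user error.
    raise TypeError(f"unexpected variant: {val!r}")
```

The appeal of this style is that each stage of a pipeline can state which variants it accepts and produces, so a mismatch shows up as an explicit unhandled case rather than as a confusing string or list bug further along.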

I want to write about that, but it's at the bottom of a big pile. Leave a comment if you're curious about this.

oil-dev

I was happy to get help running the spec tests on different machines, but otherwise there hasn't been much activity on oil-dev.

I will chalk it up to the project being new, and having no releases yet. I've gotten a lot of positive feedback, but there's a chicken and egg problem where people are not motivated to write for something they're not using yet.

If you're interested in contributing, please leave a comment. Let me know if there is something you think I can do to encourage contributions.

What Features are Left?

(This section has a lot of detail, mostly for potential contributors.)

Many of the 136 failures are actually assertions that can be turned into proper user-facing errors. Once I make the error handling consistent, I suspect the number of failures will quickly dip below 100.
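
As a minimal sketch of what turning an assertion into a user-facing error can look like in Python: the assert is replaced by an explicit check that raises a dedicated exception, and one top-level handler prints a message and produces an exit status. The names ShellError, eval_index, and run, as well as the "osh:" message prefix, are assumptions made for this example and are not the actual OSH code.

```python
import sys


class ShellError(Exception):
    """Hypothetical user-facing error; not a class from the OSH source."""

    def __init__(self, msg, status=1):
        super().__init__(msg)
        self.status = status


def eval_index(items, index):
    # Before: `assert 0 <= index < len(items)` would surface bad user input
    # as a Python traceback (or be skipped entirely under `python -O`).
    if not 0 <= index < len(items):
        raise ShellError(f"index {index} out of range (length {len(items)})")
    return items[index]


def run(func, *args):
    """Call func; report a ShellError on stderr. Returns (result, status)."""
    try:
        return func(*args), 0
    except ShellError as e:
        print(f"osh: {e}", file=sys.stderr)
        return None, e.status
```

With this shape, a test that expects a nonzero exit status and an error message can pass, whereas an AssertionError traceback would count as a failure.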

You can see in the latest spec results table that some files aren't being run against OSH (the uncolored white rows):

But what about shell features that aren't covered by the 586 spec tests?

Note that shell programs use features in a Pareto or "long tail" distribution. So even without these features, I suspect OSH can run many real programs.

What Programs Will I Try to Run?

To start, the features implemented in the above commits allow OSH to run some of its own scripts:

$ bin/osh ./spec.sh version-text  # part of the HTML table
$ bin/osh ./count.sh all  # count lines of code
$ bin/osh ./unit.sh all  # Run unit tests

None of those worked three weeks ago.

As for shell scripts I didn't write, I still have the corpus of programs I used to test the parser, like:

Leave a comment if you have suggestions. Ideally I'd like to have scripts that can be tested quickly. Aboriginal Linux and debootstrap have the nice property that you can test the resulting system image, but they take a while to run.

Recap

I showed quantified progress on the spec tests. The e-mails in the table above describe some interesting code changes and a big architecture change — rewriting word evaluation with ASDL data types.

I talked about what features are left to implement, and what real programs I will try to run.

After it runs programs I didn't write, I believe an OSH 0.1 release is appropriate.

