Measuring Progress with Tests

2017-03-23

Last month, I mentioned my plan to focus on coding more than blogging. Now I have something to show for the silence.

But I haven't been completely silent: I invited contributions on the oil-dev mailing list. I also started writing more informative commit messages and sending links to them to the list.

This post has a table of interesting commits and e-mails, and quantifies the progress made on the spec test results. In summary, we now have 375 out of 573 test cases passing, up from 249 out of 478 about 3 weeks ago.

After that I'll write about plans for future work.

Spec Test Numbers Over Time

The table below tracks progress toward the goal of filling out the shell runtime in Python. Remember that the immediate goal is to write a bash-compatible shell called OSH.

Each row is a substantive code change:

The description column links to my oil-dev e-mails about the change.
The commits column links to readable commit descriptions on GitHub.
The results table column shows the state of spec tests as of that change. I copied the stats at the bottom of each results table to the test stats columns.

Date	total test cases	osh pass	osh fail	not run	results table	Commits	Description
3/2	478	249	168	61	link	-	Welcome to oil-dev!. This is after I first published spec tests as HTML.
3/9	485	254	167	64	link	-	Oil-dev is Alive!.
3/13	517	295	168	54	link	43581dec 700c2238	Brace Detection / Brace Expansion
3/16	528	305	165	58	link	4da19bf6	Properly implement break/continue/return
3/17	532	311	163	58	-	8af0906d 0d48e07e	Small example commits: Implement $0 and fix $HOME Bug
3/20	558	340	160	58	link	c5af6970	Rewrite of the entire word evaluation pipeline: word substitution, splitting, reframing, joining, globbing. Then 4 follow-up commits to fix bugs that were exposed by the new data structures.
3/22	573	375	136	62	link	5502d076	More ops like `${#a}`, `${a%suffix}`, and `${a:-default}`. Array fixes.
delta	+95	+126	-32	+1
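
The delta row can be reproduced from the first and last rows, and each row can be sanity-checked, since pass + fail + not run should equal the total. The snippet below is only an illustrative check on the numbers copied from the table; the names `ROWS`, `check_rows`, and `delta` are made up for this example and are not part of the OSH code base.

```python
# Rows copied from the table above: (date, total, osh_pass, osh_fail, not_run)
ROWS = [
    ("3/2", 478, 249, 168, 61),
    ("3/9", 485, 254, 167, 64),
    ("3/13", 517, 295, 168, 54),
    ("3/16", 528, 305, 165, 58),
    ("3/17", 532, 311, 163, 58),
    ("3/20", 558, 340, 160, 58),
    ("3/22", 573, 375, 136, 62),
]


def check_rows(rows):
    """Return the dates of rows where pass + fail + not run != total."""
    return [
        date
        for date, total, passed, failed, not_run in rows
        if passed + failed + not_run != total
    ]


def delta(rows):
    """Column-wise difference between the last row and the first row."""
    first, last = rows[0], rows[-1]
    return tuple(b - a for a, b in zip(first[1:], last[1:]))


print(check_rows(ROWS))  # [] means every row is internally consistent
print(delta(ROWS))       # (95, 126, -32, 1), matching the delta row
```

In relative terms, the pass rate went from 249/478 (about 52%) to 375/573 (about 65%), even though the denominator grew as new test cases were added.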

I'm happy with this progress. At the very least, it feels like an OSH interpreter can be finished. Writing a bash-compatible shell is a big project, but tools like the spec tests and ASDL have made it tractable.

The Oil language is a separate chunk of work, which isn't addressed by this post.

ASDL

The new word evaluation pipeline as of 3/20 is particularly notable: I extended the use of ASDL data types to the shell runtime. Prior to this commit, ASDL types were confined to the LST structure created at parse time.
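
For readers who haven't seen ASDL: it describes algebraic data types, i.e. a type that is one of several named variants, each with its own fields. The sketch below imitates that idea with modern Python dataclasses. It is an assumption-laden illustration rather than the actual OSH schema: `StringPart`, `ArrayPart`, and `flatten` are hypothetical names invented here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StringPart:
    """Hypothetical variant: one string produced by evaluating part of a word."""
    s: str
    quoted: bool  # assumption: quoted parts would be exempt from splitting


@dataclass
class ArrayPart:
    """Hypothetical variant: several strings, e.g. from an array substitution."""
    strs: List[str]


# The "sum type": a part value is exactly one of the variants above.
PartValue = Union[StringPart, ArrayPart]


def flatten(parts: List[PartValue]) -> List[str]:
    """Dispatch on the variant, as a runtime stage over such types might."""
    out: List[str] = []
    for part in parts:
        if isinstance(part, StringPart):
            out.append(part.s)
        elif isinstance(part, ArrayPart):
            out.extend(part.strs)
        else:
            raise TypeError("unexpected part: %r" % (part,))
    return out


print(flatten([StringPart("a", quoted=False), ArrayPart(["b", "c"])]))
```

The point of the style is that each pipeline stage consumes and produces explicitly typed values instead of ad hoc tuples and strings.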

I'd go so far as to say that OSH is now written not in Python, but in Python+ASDL. I want to write about that, but it's at the bottom of a big pile. Leave a comment if you're curious.

oil-dev

I was happy to get help running the spec tests on different machines, but otherwise there hasn't been much activity on oil-dev.

I will chalk it up to the project being new and having no releases yet. I've gotten a lot of positive feedback, but there's a chicken and egg problem where people aren't motivated to write code for something they're not using yet.

I've also been churning the code a lot, which is good but probably disorienting. I hope it will settle down soon.

If you're interested in contributing, please leave a comment. Let me know if there's something I can do to make contributions easier.

What Features are Left?

(This section has a lot of detail, mostly for potential contributors. Casual readers can skip to the end.)

(1) Many of the 136 failures are actually assertions on bad input that will be turned into proper user-facing errors. So once I come up with a consistent error handling scheme, the number of failures should quickly dip below 100.
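
To make the distinction concrete, here is a minimal sketch of turning an assertion on bad input into a user-facing error. It shows one possible scheme, not the one OSH will adopt: `ShellError`, `to_int`, and `run` are hypothetical names, and the `osh:` message prefix is an assumption.

```python
class ShellError(Exception):
    """Hypothetical: an error in the user's script, not an interpreter bug."""

    def __init__(self, msg, status=1):
        super().__init__(msg)
        self.status = status


def to_int(s):
    """Convert a shell string to an integer.

    An `assert s.isdigit()` here would crash with a Python traceback on bad
    input; raising ShellError lets the caller report it like any shell error.
    """
    try:
        return int(s)
    except ValueError:
        raise ShellError("invalid integer: %r" % s) from None


def run(func, *args):
    """Run one operation, mapping ShellError to an exit status and a message."""
    try:
        return 0, func(*args)
    except ShellError as e:
        return e.status, "osh: %s" % e


print(run(to_int, "42"))
print(run(to_int, "x"))
```

A test that currently dies on an assertion would then see a nonzero status and a message, which is what the spec tests can compare against other shells.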

(2) You can see in the latest spec results table that some files aren't being run against OSH (the uncolored white rows):

append.test.sh: Bash extension to append to a string, like s+='suffix'.
assoc*.test.sh: Associative arrays. These are highly non-standard and have quirky behavior in bash. I plan to implement some version of associative arrays, but the details are fuzzy.
let.test.sh: arithmetic parsing. This is rarely used and redundant with (( 1 + 2 )). It can be implemented on demand.
regex.test.sh: for [[ foo =~ ^[a-z]+$ ]]. Oops, these tests actually pass, and should be run! That means we have 586 total tests instead of 573. But I'm preserving the numbers above for a fair comparison over time.
var-ref.test.sh: for ${!var_name}. This feature is advanced, unintuitive, and can probably be replaced by associative arrays in most scripts, but I need to do more research.

(3) These are covered by tests and need to be implemented:

Pattern replacement ${var/pat/replace}
Slicing like ${str:1:3} and ${array[@]:5:7}
C-escaped strings like $'\t\n'
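
As a rough illustration of the slicing item, the sketch below maps ${str:1:3} and ${array[@]:5:7} onto Python slices. It is deliberately incomplete and is not OSH's implementation: `slice_value` is a hypothetical helper, it only handles non-negative offsets and lengths, and it assumes a dense array (bash arrays can be sparse, which changes how the offset is interpreted).

```python
def slice_value(value, offset, length=None):
    """Sketch of ${value:offset:length} for a str or a list of str.

    Assumptions: offset and length are non-negative, and a list stands in
    for a dense shell array. Omitting length means "to the end".
    """
    if offset < 0 or (length is not None and length < 0):
        raise ValueError("negative offset/length is not handled in this sketch")
    end = None if length is None else offset + length
    return value[offset:end]


# ${str:1:3}: start at offset 1, take 3 characters
print(slice_value("abcdef", 1, 3))

# ${array[@]:5:7}: start at index 5, take 7 elements
array = [str(i) for i in range(20)]
print(slice_value(array, 5, 7))
```

Note that the second number is a length, not an end index, so the Python slice end is `offset + length`.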

(4) These are covered by tests, but I probably won't implement them until there's demand:

Piping stderr with |&
Process substitution like <(echo hi) and >(tee out.txt)
Case fallthrough with ;;&

(5) What about shell features that aren't covered by the spec tests?

Shell builtins are probably the biggest category of missing features. We have a basic architecture, but many of them aren't implemented, and none of them implement flags. There's no shift, cd, echo -n, read -d, etc.
Unparsed language features:
- extended glob: shopt -s extglob
- the time keyword, which takes a pipeline.
- coprocesses
There's also little in the way of an interactive shell. I'm focusing first on the shell as a programming language, so that the interactive parts have a solid foundation.

Note that shell scripts in the wild use features in a Pareto or "long tail" distribution. So even without some advanced features, I suspect OSH will run many real programs.

What Programs to Run?

The above commits allow OSH to run some of its own scripts. These didn't work three weeks ago, but now they do:

$ bin/osh ./spec.sh version-text  # part of the HTML table
$ bin/osh ./count.sh all  # count lines of code
$ bin/osh ./unit.sh all  # Run unit tests

As for shell scripts I didn't write, I still have the corpus of programs I used to test the parser, like:

Aboriginal Linux
debootstrap
Git
Four More Projects
And perhaps 15 more projects in wild.sh that I never mentioned. I've been collecting both popular scripts (like those related to "the cloud") and weird scripts (like Lisp interpreters and web frameworks).

Leave a comment if you have suggestions. I'd like to use scripts that can be run and tested quickly. Aboriginal Linux and debootstrap have the nice property that you can test the resulting system image, but they take a while to run.

Recap

I showed quantified progress on the spec tests. The e-mails in the table above describe some interesting code changes and a big architecture change — rewriting word evaluation with ASDL data types. The brace detection and expansion algorithms are also fun.

I talked about what features are left to implement, and what real programs I will try to run.

After OSH runs a few significant programs I didn't write, a 0.1 release will be appropriate. It feels like that will come soon, but no promises.