Oil 0.8.pre5 - Progress in C++

2020-05-25

This is the latest version of Oil, a Unix shell:

Oil version 0.8.pre5 - Source tarballs and documentation.

To build and run it, follow the instructions in INSTALL.txt. If you're new to the project, see Why Create a New Shell? and the 2019 FAQ.

Table of Contents
Highlights
Closed Issues
Semi-Automatic Translation to C++
Two Analogies: Go Compiler and TeX
Recap
Details
TODO on Translation
DSLs and Code Generation
Wrapping Shell Dependencies
Open Problems
Plan for 2020
What's Next?
Appendix: Selected Metrics

Highlights

I'd still like more bug reports! See How To Test OSH.

(+) Test harness bug that will be fixed: 1539 should be 1560.

Closed Issues

#758 Incorrect fnmatch due to extended glob syntax
#754 Implement test -u and test -g
#753 ${var+foo} shouldn't cause error when 'set -o nounset'
#727 1 ? (a=42) : b shouldn't require parentheses

Semi-Automatic Translation to C++

Two Analogies: Go Compiler and TeX

What's all this about C++? Here are two analogies to help explain what's going on.

  1. GopherCon 2014: Go from C to Go by Russ Cox (YouTube, 31 minutes). "It's time for the Go compilers to be written in Go, not in C. I'll talk about the unusual process the Go team has adopted to make that happen: mechanical conversion of the existing C compilers into idiomatic Go code." (c2go is the one-off tool that helped with translation, analogous to mycpp.)

    The flavor of the work is similar to what I'm doing with Oil, but there's a key difference: Oil's source will remain in statically typed Python and DSLs like Zephyr ASDL for the foreseeable future. We won't be writing C++ by hand.

    Static types play an important role in both translations.

  2. How to compile the source code of TeX. Knuth wrote TeX in a dialect of Pascal, but it's not compiled with a Pascal compiler. Instead, it's translated to C and compiled with a C compiler.

The common thread is that we want to preserve the correctness of an existing codebase. Oil runs thousands of lines of existing bash scripts, including some of the biggest shell programs in the world.

Rewriting by hand would introduce a lot of bugs, so instead we write a custom translator and apply it to the codebase. In Oil's case, there are more code generators to remove dynamic typing and reflection, discussed below.
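
To make the role of static types concrete, here is a minimal hand-written sketch. It is not actual mycpp output, and the function name CountNonEmpty and its Python source are hypothetical. It only illustrates how type annotations give a translator enough information to pick C++ types mechanically.

```cpp
#include <string>
#include <vector>

// Hypothetical statically typed Python source:
//
//   def CountNonEmpty(words):
//     # type: (List[str]) -> int
//     n = 0
//     for w in words:
//       if len(w) > 0:
//         n += 1
//     return n
//
// The annotation says `words` is a list of strings and the result is an
// int, so a translator can choose the C++ types below without guessing.
// (Illustrative only: a real translator may use its own runtime types.)
int CountNonEmpty(const std::vector<std::string>& words) {
  int n = 0;
  for (const std::string& w : words) {
    if (w.size() > 0) {
      n += 1;
    }
  }
  return n;
}
```

Without the annotation, the translator would have to infer or guess the element type of words, which is the kind of dynamic typing that has to be removed before the code can be compiled.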

Recap

In addition to the new spec test metrics, these line counts give a feel for recent progress:

For comparison, the slow OSH interpreter consists of about 30K lines of Python code. This doesn't include the Oil language, which I haven't started translating.

The translation isn't going as quickly as I'd like it to, but it's working, and I'm solving interesting technical problems along the way.

As far as I can tell, this unusual process is the shortest path to a fast shell. (As mentioned in January, I encourage parallel efforts. Feel free to ask me about this.)

Details

I keep a log of the translation process on Zulip.

More background: the March recap had a similar section with Zulip threads: mycpp: The Good, the Bad, and the Ugly.

TODO on Translation

Even though about two-thirds of OSH translates to C++ and compiles, and much of it runs correctly, there's still a lot of work left.

Oil is simply a big project: recall that bash consists of over 140K lines of code. I estimate that OSH implements 80% of bash, with significant fixes. And Oil is a new language with many features on top.

DSLs and Code Generation

Oil's source code will remain in high-level languages for the foreseeable future, so we need to enhance the code generators to produce correct and fast C++.

Wrapping Shell Dependencies

In the January blog roadmap, I mentioned that there are two technical problems with translation.

One of them was wrapping native C code, which I no longer see as a risk. It's just work. The shell has three main dependencies:

  1. libc. I've wrapped pure functions like fnmatch() in C++, and this is straightforward.
  2. The Unix kernel. Wrapping functions like execve() is similar to wrapping libc, but errno handling is an issue I want to revisit. (These Unix comics are relevant.)
  3. GNU readline for interactive features. To be honest, I'd rather punt interactive features to Oil code, analogous to ble.sh. But Oil should have basic readline support.
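
The first two kinds of wrapper can be sketched as follows. This is a minimal illustration, not Oil's actual code: the names GlobMatch and ExecOrErrno are hypothetical, and returning errno as a plain int is just one possible convention for the error-handling question mentioned above.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

#include <fnmatch.h>
#include <unistd.h>

// Hypothetical wrapper for a pure libc function.
// fnmatch() returns 0 on a match and FNM_NOMATCH otherwise, so the
// wrapper reduces to a boolean with no error state to propagate.
bool GlobMatch(const std::string& pattern, const std::string& s) {
  return ::fnmatch(pattern.c_str(), s.c_str(), 0) == 0;
}

// Hypothetical wrapper for a kernel call.
// execve() only returns on failure, so this function returns the errno
// value that explains why the exec did not happen. The caller decides
// how to report it.
int ExecOrErrno(const std::string& path, std::vector<std::string> args) {
  assert(!args.empty());  // argv[0] is expected to be present

  // Build a NULL-terminated argv array that points into `args`.
  std::vector<char*> argv;
  for (std::string& a : args) {
    argv.push_back(&a[0]);
  }
  argv.push_back(nullptr);

  char* const envp[] = {nullptr};  // empty environment for the sketch
  ::execve(path.c_str(), argv.data(), envp);
  return errno;  // reached only if execve() failed
}
```

The contrast is the point: the libc wrapper has nothing to report besides its result, while the syscall wrapper has to read errno immediately after the call and carry it back to the caller in some form.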

Open Problems

Plan for 2020

As mentioned in January, the bare minimum for "success" is when OSH can replace bash for my own use.

After reviewing all this work, I still feel like OSH can be "finished" in 2020. I won't be extremely surprised if it isn't, but it seems reasonable.

On the other hand, it seems clear that the Oil language will remain a prototype for the remainder of 2020. I haven't gotten much feedback on it, probably because there isn't much documentation.

This is disappointing, but I don't have a solution to this problem.

In short, the project's focus has necessarily narrowed. The only two goals on my radar are:

  1. The OSH language should be translated to C++, tested, and optimized.
  2. The Oil language should be divorced from the Python runtime and similarly translated. This will almost certainly bleed into 2021.

I should write a longer blog post about this, but almost everything else is cut. Oil will be more like a library than a shell. (As mentioned, I'll need basic GNU readline support for my own use.)

The docs are another sore point. I've mostly been writing them "on demand" (whenever anyone asks). It seems like that pattern will continue, given all the other work that needs to be done.

What's Next?

Feel free to ask questions in the comments or on Zulip!

Appendix: Selected Metrics

Let's compare this release with the previous one, version 0.8.pre4.

Native Code Metrics

We have nearly 70K lines of C++ code, including over 20K translated by mycpp.

The size of the osh_eval.opt.stripped executable differs between GCC and Clang, and I don't yet know why. In any case, the increase is consistent with translating and compiling more lines of code.

Test Results

OSH spec tests:

There was no work on the Oil language! I'm a bit concerned by that, which is one reason for the scope reduction mentioned above.

Line Counts

We have ~300 new significant lines of code in OSH:

And ~500 new physical lines of code:

Benchmarks

The parsing benchmark didn't change much:

Nor did the runtime benchmark: