Why Sponsor Oils? | blog | oilshell.org
This is the latest version of Oil, a Unix shell:
Oil version 0.8.pre5 - Source tarballs and documentation.
To build and run it, follow the instructions in INSTALL.txt. If you're new to the project, see Why Create a New Shell? and the 2019 FAQ.
shopt -s extglob is now respected.
I'd still like more bug reports! See How To Test OSH.
(+) Test harness bug that will be fixed: 1539 should be 1560.
#758 | Incorrect fnmatch due to extended glob syntax |
#754 | Implement test -u and test -g |
#753 | ${var+foo} shouldn't cause error when 'set -o nounset' |
#727 | 1 ? (a=42) : b shouldn't require parentheses |
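
Issue #753 above is about how ${var+foo} interacts with set -o nounset. As a rough model in Python (not OSH's actual code; the function names, the exception class, and the dict-based environment are all invented for illustration), a plain $var reference fails on an unset variable under nounset, while ${var+foo} only tests whether the variable is set and so should never fail:

```python
class UnsetVariable(Exception):
    """Hypothetical error for reading an unset variable under nounset."""


def expand_plain(env, name, nounset=False):
    # Models $name: an unset variable is an error only when nounset is on.
    if name not in env:
        if nounset:
            raise UnsetVariable(name)
        return ""
    return env[name]


def expand_plus(env, name, alternate, nounset=False):
    # Models ${name+alternate}: the variable is only tested, never read,
    # so nounset has nothing to complain about.
    return alternate if name in env else ""
```

Note that a variable set to the empty string still counts as set for this form, so expand_plus({"var": ""}, "var", "foo") yields "foo".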
What's all this about C++? Here are two analogies to help explain what's going on.
GopherCon 2014: Go from C to Go by Russ Cox (YouTube, 31 minutes). It's time for the Go compilers to be written in Go, not in C. I'll talk about the unusual process the Go team has adopted to make that happen: mechanical conversion of the existing C compilers into idiomatic Go code. (c2go is the one-off tool that helped with translation, analogous to mycpp.)
The flavor of the work is similar to what I'm doing with Oil, but there's a key difference: Oil's source will remain in statically typed Python and DSLs like Zephyr ASDL for the foreseeable future. We won't be writing C++ by hand.
Static types play an important role in both translations.
How to compile the source code of TeX. Knuth wrote TeX in a dialect of Pascal, but it's not compiled with a Pascal compiler. Instead, it's translated to C and compiled with a C compiler.
The common thread is that we want to preserve the correctness of an existing codebase. Oil runs thousands of lines of existing bash scripts, including some of the biggest shell programs in the world.
Rewriting by hand would introduce a lot of bugs, so instead we write a custom translator and apply it to the codebase. In Oil's case, there are more code generators to remove dynamic typing and reflection, discussed below.
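
To make the role of static types concrete, here is a toy sketch of type-driven translation. It is not mycpp: the type mapping and the cpp_signature function are invented for illustration, and it only handles a function's signature, not its body.

```python
import ast

# Illustrative mapping only; a real translator handles far more types.
CPP_TYPES = {"int": "int", "bool": "bool", "float": "double", "None": "void"}


def _type_name(node):
    """Return the name of a simple annotation such as `int` or `None`."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and node.value is None:
        return "None"
    raise ValueError("unsupported annotation")


def cpp_signature(source):
    """Emit a C++ declaration for the first function defined in `source`."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a function definition")
    params = []
    for arg in func.args.args:
        # Without an annotation there is no C++ type to emit.
        if arg.annotation is None:
            raise ValueError("untyped parameter: " + arg.arg)
        params.append(CPP_TYPES[_type_name(arg.annotation)] + " " + arg.arg)
    if func.returns is None:
        raise ValueError("missing return annotation")
    ret = CPP_TYPES[_type_name(func.returns)]
    return "%s %s(%s);" % (ret, func.name, ", ".join(params))
```

Given the source text of a function add(a: int, b: int) -> int, this produces the declaration int add(int a, int b); and it refuses untyped parameters outright. That refusal is the point: the annotations are what make a mechanical translation possible.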
In addition to the new spec test metrics, these line counts give a feel for recent progress:
osh_parse.cc has 9,867 lines of code (raw data). I showed that the OSH parser can be gradually refactored and translated to C++. Notably, the result is as fast as hand-written C code.
osh_eval.cc has 16,491 lines of code. In addition to the parser, we translate the word and arithmetic evaluators.
osh_eval.cc has 20,875 lines of code. We translate the command evaluator, including assignments. So the resulting C++ interpreter can run code like readonly x=y; echo $x. Details below.
For comparison, the slow OSH interpreter consists of about 30K lines of Python code. This doesn't include the Oil language, which I haven't started translating.
The translation isn't going as quickly as I'd like it to, but it's working, and I'm solving interesting technical problems along the way.
As far as I can tell, this unusual process is the shortest path to a fast shell. (As mentioned in January, I encourage parallel efforts. Feel free to ask me about this.)
I keep a log of the translation process on Zulip.
declare -g foo=bar now works, so we have a path to translate more shell builtins to C++.
map[string, int].
osh_eval.cc doesn't even run ls, because it's an external process! But it understands the hairy details of word evaluation ${}, arithmetic evaluation $(( )), brace expansion {a,b}, and more.
More background: the March recap had a similar section with Zulip threads: mycpp: The Good, the Bad, and the Ugly.
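
As a taste of what word evaluation involves, here is a minimal Python sketch of brace expansion for the {a,b} form mentioned above. It is an illustration, not OSH's implementation: the brace_expand name is made up, and it assumes brace groups are not nested and ignores quoting and ranges like {1..3}.

```python
import re

# A brace group with at least one comma and no braces inside it.
_BRACE = re.compile(r"\{([^{}]*,[^{}]*)\}")


def brace_expand(word):
    """Expand non-nested {a,b} groups, leftmost group first."""
    m = _BRACE.search(word)
    if m is None:
        return [word]
    results = []
    for alternative in m.group(1).split(","):
        # Substitute one alternative, then expand any remaining groups.
        expanded = word[:m.start()] + alternative + word[m.end():]
        results.extend(brace_expand(expanded))
    return results
```

Expanding the leftmost group first keeps the output in the order a shell produces, so {a,b}{1,2} yields a1, a2, b1, b2. Even this small case has rules to get right, which is why translating the existing evaluator is preferable to rewriting it.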
Even though about two-thirds of OSH translates to C++ and compiles, and much of it runs correctly, there's still a lot of work left.
Oil is simply a big project: recall that bash consists of over 140K lines of code. I estimate that OSH implements 80% of bash, with significant fixes. And Oil is a new language with many features on top.
Oil's source code will remain in high-level languages for the foreseeable future, so we need to enhance the code generators to produce correct and fast C++.
try / finally for scoped destruction, but C++ doesn't have finally. We should probably use Python's context managers, and have mycpp translate such blocks into constructors and destructors.
#ifdef. Exceptions are more like structs than classes, so they could be naturally expressed in ASDL.
In the January blog roadmap, I mentioned that there are two technical problems with translation.
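
To illustrate the try / finally point above: the two Python functions below do the same scoped cleanup, but the second uses a context manager, whose enter and exit steps line up with a C++ constructor and destructor. The OptionStack and PushOption classes and the option name are hypothetical, and this is only a sketch of the idea, not what mycpp does today.

```python
class OptionStack:
    """Hypothetical stand-in for interpreter state that must be restored."""

    def __init__(self):
        self.stack = []


class PushOption:
    """Context manager: __enter__ pushes, __exit__ pops."""

    def __init__(self, opts, name):
        self.opts = opts
        self.name = name

    def __enter__(self):
        self.opts.stack.append(self.name)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.opts.stack.pop()
        return False  # do not swallow exceptions


def run_with_finally(opts, body):
    # Current style: cleanup is written out by hand in a finally block.
    opts.stack.append("errexit")
    try:
        return body()
    finally:
        opts.stack.pop()


def run_with_context(opts, body):
    # Proposed style: the with block delimits the scope.
    with PushOption(opts, "errexit"):
        return body()
```

In C++, a class like PushOption would push in its constructor and pop in its destructor, so the cleanup runs whenever the object goes out of scope, including while an exception unwinds the stack.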
One of them was wrapping native C code, which I no longer see as a risk. It's just work. The shell has three main dependencies:
fnmatch() in C++, and this is straightforward.
execve() is similar to wrapping libc, but errno handling is an issue I want to revisit. (These Unix comics are relevant.)
yield, which I can't (or don't want to) use in C++. I might rewrite it with fork() and write() to a pipe.
yield). A few weeks ago, I played with the shell and C code in his 2014 explanation of the coroutine prime number sieve (PDF).
As mentioned in January, the bare minimum for "success" is when OSH can replace bash for my own use.
After reviewing all this work, I still feel like OSH can be "finished" in 2020. I won't be extremely surprised if it isn't, but it seems reasonable.
On the other hand, it seems clear that the Oil language will remain a prototype for the remainder of 2020. I haven't gotten much feedback on it, probably because there isn't much documentation.
This is disappointing, but I don't have a solution to this problem.
In short, the project's focus has necessarily narrowed. The only two goals on my radar are:
I should write a longer blog post about this, but almost everything else is cut. Oil will be more like a library than a shell. (As mentioned, I'll need basic GNU readline support for my own use.)
The docs are another sore point. I've mostly been writing them "on demand" (whenever anyone asks). It seems like that pattern will continue, given all the other work that needs to be done.
errexit (issue 709). I'd also like to resume work on Running ble.sh With Oil.
Feel free to ask questions in the comments or on Zulip!
Let's compare this release with the previous one, version 0.8.pre4.
We have nearly 70K lines of C++ code, including over 20K translated by mycpp.
The size of the osh_eval.opt.stripped executable differs between GCC and Clang, and I don't yet know why. In any case, the increase is consistent with translating and compiling more lines of code.
OSH spec tests:
There was no work on the Oil language! I'm a bit concerned by that, which is one reason for the scope reduction mentioned above.
We have ~300 new significant lines of code in OSH:
And ~500 new physical lines of code:
The parsing benchmark didn't change much:
Nor did the runtime benchmark: