blog | oilshell.org
The term "middle out" makes me laugh, but let's save humor for the end of this post.
First, I want to explain an issue that has confused many readers: the strategy of implementing a shell in Python, then semi-automatically translating it to C++.
For example, Nix contributors asked questions about this, and December's Language FAQs answered some of them. I also wrote Why Is it Written In Python? in 2019. Summary:
But I haven't written much about the C++ translation, other than a few posts mentioning our #mycpp tool (e.g. Progress in C++).
So this post uses the term "middle out" to explain it. I also repeat that I need help with this part of the project, so I'm collecting donations to pay a compiler engineer (not me). I show evidence that it absolutely can be done, including a plot of progress over the years.
I was surprised to find this term in a computer science paper*:
Language-Oriented Programming by M.P. Ward (1994, PDF)
It describes a method for "organizing the development of a large software system":
- Start by developing a formally specified, domain-oriented, very high-level language which is well-suited to developing "this kind of program"
- Implement the system using this "middle level" language
- Implement a compiler or translator or interpreter for the language
The abstract continues:
The “middle out” development style is compared and contrasted with the more usual “top down”, “bottom up” and “outside in” development methods.
* I can't find many citations to it, but it's still interesting.
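To make Ward's three steps concrete, here is a toy sketch in Python. It is not how Oil or mycpp work. The tiny expression language, the `evaluate` interpreter, and the `to_cpp` translator are all hypothetical. The point is that one program written in the "middle" language can be run directly and can also be translated to C++ source.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Step 1 (hypothetical): a tiny "middle language" of integer arithmetic with variables.
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, Add, Mul]

# Step 2: the "system" is written in the middle language, here as an AST value.
PROGRAM: Expr = Add(Mul(Var("w"), Var("h")), Num(1))


# Step 3a: an interpreter runs middle-language programs directly.
def evaluate(e: Expr, env: Dict[str, int]) -> int:
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    if isinstance(e, Mul):
        return evaluate(e.left, env) * evaluate(e.right, env)
    raise TypeError(f"unknown node: {e!r}")


# Step 3b: a translator emits equivalent C++ source text for the same program.
def to_cpp(e: Expr) -> str:
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Add):
        return f"({to_cpp(e.left)} + {to_cpp(e.right)})"
    if isinstance(e, Mul):
        return f"({to_cpp(e.left)} * {to_cpp(e.right)})"
    raise TypeError(f"unknown node: {e!r}")


def to_cpp_function(name: str, params: List[str], body: Expr) -> str:
    """Wrap a translated expression in a C++ function definition."""
    args = ", ".join(f"int {p}" for p in params)
    return f"int {name}({args}) {{ return {to_cpp(body)}; }}"
```

For example, `evaluate(PROGRAM, {"w": 3, "h": 4})` returns 13, and `to_cpp_function("area", ["w", "h"], PROGRAM)` produces `int area(int w, int h) { return ((w * h) + 1); }`. Both paths start from the same program, which is what lets a feature written once work in both places.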
"Middle out" is how Oil is being developed! Our approach is more iterative, but clearly similar:
The idea behind this approach is exhaustive recognition and representation of shell's elaborate syntax and semantics.
It's been a consistent advantage over 5 years. For example, I described November's Oil 0.9.5 release as The Triumph of Lexer Modes because the technique helped me with tricky syntax issues like typed arguments to Oil procs, and extended globs in OSH.
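Lexer modes can be sketched roughly like this, assuming a simplified shell with only two modes: unquoted words and double-quoted strings. The mode names, token kinds, and regexes are made up for illustration and are far simpler than Oil's real lexer. The key idea is that the parser, not the lexer, decides which set of rules applies next, so the same character (here, a space) can produce different tokens depending on context.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical modes: each has its own ordered list of (token_kind, regex) rules.
RULES: Dict[str, List[Tuple[str, str]]] = {
    "OUTER": [
        ("DQ_BEGIN", r'"'),
        ("DOLLAR_NAME", r"\$[A-Za-z_][A-Za-z0-9_]*"),
        ("SPACE", r"[ \t]+"),         # a space separates words outside quotes
        ("LIT", r'[^"$ \t]+'),
    ],
    "DQ": [
        ("DQ_END", r'"'),
        ("DOLLAR_NAME", r"\$[A-Za-z_][A-Za-z0-9_]*"),
        ("LIT", r'[^"$]+'),           # inside quotes, a space is just literal text
    ],
}

COMPILED = {
    mode: [(kind, re.compile(pat)) for kind, pat in rules]
    for mode, rules in RULES.items()
}


class Lexer:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def read(self, mode: str) -> Tuple[str, str]:
        """Return the next (kind, value) token using the rules of the given mode."""
        if self.pos >= len(self.text):
            return ("EOF", "")
        for kind, pattern in COMPILED[mode]:
            m = pattern.match(self.text, self.pos)
            if m:
                self.pos = m.end()
                return (kind, m.group())
        raise SyntaxError(f"no rule in mode {mode} at position {self.pos}")


def tokenize(text: str) -> List[Tuple[str, str]]:
    """A toy 'parser' loop that tells the lexer which mode to use next."""
    lexer = Lexer(text)
    mode = "OUTER"
    tokens: List[Tuple[str, str]] = []
    while True:
        kind, value = lexer.read(mode)
        if kind == "EOF":
            return tokens
        tokens.append((kind, value))
        if kind == "DQ_BEGIN":
            mode = "DQ"
        elif kind == "DQ_END":
            mode = "OUTER"
```

With this sketch, `tokenize('echo "hi $x there"')` yields a `SPACE` token after `echo`, but the spaces inside the quotes stay part of the `LIT` tokens `"hi "` and `" there"`.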
Is this a roundabout strategy? It's not obvious, but it actually makes the system smaller, as noted in the 2019 FAQ. You get leverage if the "middle language" matches the problem domain well.
The updated line counts for translation show that:
The slogan I use is that bash is implemented with at least 142K lines of "groveling through backslashes and braces one at a time" in C. Oil is implemented in a completely different style, and future posts could show examples.
So despite the addition of the new Oil language, Oil is many times smaller than bash in terms of its source code.
For more color, the appendix mentions that The Implementation Language of Shell Has Always Been a Problem, with historical evidence.
This plot shows development progress through 80 releases of Oil, representing nearly 6 years of work!
printf, ...) with Python-based DSLs.
In other words, the blue line goes from the "middle languages" up to shell, and the red line goes down to C++.
We're implementing Oil middle out!
Does it look like Oil is almost done? The red line is converging on the blue line.
Not quite. Here's a summary of caveats I've mentioned in the past:
oil-native. That is, the red line tests the dumb_alloc variant, an interpreter that doesn't reclaim any memory. This obviously needs to be fixed.
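A common way to build an allocator that never reclaims memory is a bump allocator: it hands out consecutive chunks of a fixed arena and only ever moves an offset forward. The sketch below models that idea in Python with a `bytearray`. `BumpArena` is a hypothetical name, and the real `dumb_alloc` code is C++ whose details may differ.

```python
class BumpArena:
    """Toy bump allocator over a fixed-size arena. It never frees anything."""

    def __init__(self, capacity: int, alignment: int = 8) -> None:
        self.buf = bytearray(capacity)
        self.alignment = alignment
        self.offset = 0  # everything before this offset has been handed out

    def alloc(self, size: int) -> int:
        """Return the start offset of a new block of `size` bytes."""
        # Round the current offset up to the next alignment boundary.
        start = (self.offset + self.alignment - 1) // self.alignment * self.alignment
        end = start + size
        if end > len(self.buf):
            # No reclamation: once the arena is full, allocation simply fails.
            raise MemoryError("arena exhausted")
        self.offset = end
        return start

    def bytes_used(self) -> int:
        return self.offset
```

Allocation is just an addition and a comparison, which is why this design is handy for getting translated code running early, but a long-running shell would eventually exhaust the arena. That is the gap a real garbage collector has to close.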
However, we don't slide backwards. I like implementing a feature in Python and having it work "for free" in fast C++. The "middle out" strategy is pleasant in this respect!
The gray rectangle highlights that the red line barely went up for a year — from summer 2020 to summer 2021.
What was I doing then? Working on the other 6 parts of the project, including the Oil language and the garbage collector. A sample of work:
So clearly the project is bottlenecked and should be parallelized. We need someone to work on the bottom half of the project while we work on the top half. These are largely separate tasks, requiring different skills.
(We've had several new contributors recently, who I'll acknowledge in release notes.)
After another fallow period, I resumed work on translation last week. This generally involves a mix of the following:
In less than 4 days, we went from 1137 passing tests in Oil 0.9.8 to 1487 — the big leap in the red line. This is huge progress toward the blue line, which sits at 1774. I then released Oil 0.9.9.
Making these tests pass involved only 6 or 7 fixes, almost all in C++ code.
One example is simply removing assert(0) stubs I had put in for library calls like posix::strerror(). The next post could show more.
So recent progress gives me additional confidence that this approach will work.
I just updated this page:
Here's a summary of skills sought, in order of importance:
I'm mentioning this job "softly" because it's not fully funded yet. We were recently approved for GitHub Sponsors, and I'm waiting to hear back on an NLnet grant application (after a short round of questions).
My goal is to find the best person (or people) for the job, regardless of their compensation requirements. I've set the GitHub Sponsors goal at $25K / month, but I need your feedback on whether that can be achieved, and how it should be allocated.
One possibility is to spend $5K or so on "bounties" to make the red line go up. Perhaps the best candidate will emerge out of that process.
I also want to hire a technical writer, but let's do one thing at a time.
I don't enjoy administrative work like fundraising and hiring. I prefer improving Oil (#project-updates) and writing technical blog posts (e.g. #parsing).
So honestly I would like to "offload" some of this work to users and readers. If you want a new shell, this could be one of the best uses of your time!
(1) Tell Your Friends and Coworkers About the Project
This blog has several thousand technical readers, but there are millions of people who use shell.
(2) Help Raise Funds
(3) Donate to Help Pay the Compiler Engineer
Are you employed in the software industry? Do you spend more than an hour a month with shell? Then it's possible that donating $25/month or $100/month will pay for itself in the future by saving you time.
You can get a sponsor badge on your GitHub profile for just $10/month.
Remember, Oil is our upgrade path from bash to a better language and runtime. It's the only such project out of dozens of alternative shells.
I also set up $200/month and $2000/month tiers if you'd like your logo to appear on this website.
(4) Apply For the Job
If you have the time and the skills, you should apply!
Please contact me at email@example.com or on Zulip. It may take a while to find the right person, but introducing yourself never hurts.
If you can make one new spec test pass (with my help), then you'll jump to the front of the queue. This is a very mechanical process. Start with the README.md and Contributing.
(5) Contribute to Oil's Code
I hope that you will keep contributing to the top half of Oil's code even if we meet our $25K/month goal.
Readers have offered to send me donations in the past, but I've declined. The reason is that I want to keep all contributors on equal footing. I don't want there to be "two tiers".
For the same reason, even with this fundraising project, I'm not personally accepting any money. Think of me as a "coordinator".
I'm accepting the reality that we need a specialized skill set that's not likely to show up in volunteer contributions. The garbage collector and type system work requires "global" and concentrated design knowledge.
In other words, I want to wall off the paid part of the project:
The last section described 5 ways you can help. At the risk of being negative, I will insert a rude slogan here: Uninformed back-seat driving doesn't help get the shell done! Let me elaborate on this :-)
This is the first time I've called our strategy "middle out", but I've covered similar topics in the past, including during an early "wrong turn". (Summary: OPy won't be fast enough, and we won't reuse any part of the Python interpreter.)
But here's what often happens when I write about Oil's implementation: commenters derail the thread by proposing solutions that they haven't tried.
For example, when I set up the oil-dev@ mailing list in 2017 (now Zulip), one of the first messages was "Ewww C++" (yes, really).
This annoyed me, but now I see it more positively. What I hear now is I'm interested in contributing to Oil but I don't know C++. So it's a form of interest that's low effort, but flattering.
Another comment I often get is: Why not PyPy or Cython?
I've answered these questions in past threads, and will try to write detailed posts on them in the future. Some relevant issues are:
fork(), libc, signals
Notes for the curious: Why Isn't Oil Written in Rust, D, Nim, etc.? (Zulip)
More things to keep in mind:
You could say that the job of the compiler engineer is to work on the unsafe core of the project. An implementation of a garbage collector is inherently unsafe because its job is largely to manipulate raw memory.
But the interpreter that uses it must be 100% statically typed (with MyPy) before it can be translated to C++. And the generated C++ is memory safe by virtue of garbage collection. It's impossible to express bugs like use-after-free or double-free in the source language!
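Here's a sketch of what fully annotated source code looks like, assuming a translator that maps `str` to a garbage-collected string type and `List[str]` to a garbage-collected list. The C++ signatures in the comments are approximations for illustration, not actual mycpp output. Note that the source language has no `free()` or `delete`, so there is no way to write a double-free; reclaiming objects is entirely the collector's job.

```python
from typing import List, Optional


def split_words(line: str) -> List[str]:
    # Roughly corresponds to: List<Str*>* split_words(Str* line)
    words: List[str] = []
    for w in line.split(" "):
        if w:  # skip empty strings produced by repeated spaces
            words.append(w)
    return words


def first_word(line: str) -> Optional[str]:
    # Roughly corresponds to: Str* first_word(Str* line), with nullptr for None
    words = split_words(line)
    if len(words) == 0:
        return None
    return words[0]
```

Because every variable and return value has a declared type, a checker like MyPy can reject ill-typed code before translation, and a translator never has to guess what C++ type to emit.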
So please refrain from low effort comments until I write those posts. I will simply reply with a link to this section.
On the other hand:
Does this post make sense? I expect it will help some readers understand the project, but others will still question the motivation. I agree it's somewhat unusual, but as I wrote recently: if you want a different result, you have to do something different.
Please ask questions in the comments, and feel free to make suggestions about fundraising, which I know little about. I'll use my answers to inform future blog posts.
"Middle Out" is also used to describe the compression algorithm by fictional Silicon Valley startup Pied Piper (2014).
This is where I first heard the term. I thought it was invented by writers — a play on the common computer science terms "top-down" and "bottom-up". But then I found it in Ward's 1994 paper on Language-Oriented Programming!
You can think of it either way :-)
As mentioned, I want to write a FAQ about implementation alternatives:
More on the implementation: