Why Sponsor Oils? | blog | oilshell.org

Oil Is Being Implemented "Middle Out"

2022-03-31

The term "middle out" makes me laugh, but let's save humor for the end of this post.

First, I want to explain an issue that has confused many readers: the strategy of implementing a shell in Python, then semi-automatically translating it to C++.

For example, Nix contributors asked questions about this, and December's Language FAQs answered some of them. I also wrote Why Is it Written In Python? in 2019. Summary:

But I haven't written much about the C++ translation, other than a few posts mentioning our #mycpp tool (e.g. Progress in C++).

So this post uses the term "middle out" to explain it. I also repeat that I need help with this part of the project, so I'm collecting donations to pay a compiler engineer (not me). I show evidence that it absolutely can be done, including a plot of progress over the years.

Table of Contents
Language-Oriented Programming is "Middle Out"
Oil Is Expressed With DSLs
Size of Oil's Two Halves < Size of Bash
Plot: Progress in Two Directions
Caveats
Fallow Period
Oil 0.9.9 Made a Big Leap
Compiler Engineer Wanted
Funding In Progress
Five Ways You Can Help
What Doesn't Help
Conclusion
Appendices
HBO's Silicon Valley
More Writing on "Middle Out" and Translation

Language-Oriented Programming is "Middle Out"

I was surprised to find this term in a computer science paper*:

Language-Oriented Programming by M.P. Ward (1994, PDF)

It describes a method for "organizing the development of a large software system":

  1. Start by developing a formally specified, domain-oriented, very high-level language which is well-suited to developing "this kind of program"
  2. Implement the system using this "middle level" language
  3. Implement a compiler or translator or interpreter for the language

The abstract continues:

The “middle out” development style is compared and contrasted with the more usual “top down”, “bottom up” and “outside in” development methods.

* I can't find many citations to it, but it's still interesting.

Oil Is Expressed With DSLs

"Middle out" is how Oil is being developed! Our approach is more iterative, but clearly similar:

  1. Through experimentation, I found that Python, Zephyr ASDL, and regular languages can compactly represent a shell interpreter.
  2. We've been implementing the shell in this set of metalanguages.
  3. Now we have the task of translating this code to C++.
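
To make the "middle language" idea concrete, here is a minimal sketch of how a schema in the style of Zephyr ASDL can be mirrored as statically typed Python. The schema in the comment, the class names (`Literal`, `VarSub`, `Word`), and `eval_word` are all hypothetical illustrations, not Oil's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical ASDL-style schema (illustration only, not Oil's real schema):
#
#   word_part = Literal(string s) | VarSub(string name)
#   word      = (word_part* parts)


@dataclass
class Literal:
    s: str


@dataclass
class VarSub:
    name: str


# A sum type: a word part is exactly one of these variants.
WordPart = Union[Literal, VarSub]


@dataclass
class Word:
    parts: List[WordPart]


def eval_word(w: Word, env: Dict[str, str]) -> str:
    """Evaluate a word by concatenating its evaluated parts."""
    out: List[str] = []
    for part in w.parts:
        if isinstance(part, Literal):
            out.append(part.s)
        elif isinstance(part, VarSub):
            # In this sketch, an unset variable evaluates to the empty string.
            out.append(env.get(part.name, ''))
        else:
            raise AssertionError(part)
    return ''.join(out)


# Represents the shell word  hello-$USER
example = Word([Literal('hello-'), VarSub('USER')])
print(eval_word(example, {'USER': 'andy'}))
```

The point of the sketch is that the syntax tree is data with an explicit shape, so the evaluator can dispatch on variants instead of scanning characters.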

The idea behind this approach is exhaustive recognition and representation of shell's elaborate syntax and semantics.

It's been a consistent advantage over 5 years. For example, I described November's Oil 0.9.5 release as The Triumph of Lexer Modes because the technique helped me with tricky syntax issues like typed arguments to Oil procs, and extended globs in OSH.
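
As a rough illustration of lexer modes, the sketch below gives each mode its own list of regular expressions and lets the caller choose the mode for every token. The mode names, token names, and the `tokenize` driver are assumptions made for this example. It is a toy, not Oil's lexer.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical modes: the same character can produce different tokens
# depending on which mode the lexer is asked to use.
MODES: Dict[str, List[Tuple[str, str]]] = {
    'Unquoted': [
        ('DQuote', r'"'),
        ('Space', r'[ \t]+'),
        ('VarSub', r'\$[a-zA-Z_][a-zA-Z0-9_]*'),
        ('Lit', r'[^"$ \t]+|\$'),
    ],
    'DoubleQuoted': [
        ('DQuote', r'"'),
        ('VarSub', r'\$[a-zA-Z_][a-zA-Z0-9_]*'),
        ('Lit', r'[^"$]+|\$'),  # spaces are literal inside double quotes
    ],
}

COMPILED = {
    mode: [(name, re.compile(pat)) for name, pat in rules]
    for mode, rules in MODES.items()
}


def read_token(line: str, pos: int, mode: str) -> Tuple[str, str, int]:
    """Return (token_name, text, new_pos) using only the rules for `mode`."""
    for name, regex in COMPILED[mode]:
        m = regex.match(line, pos)
        if m:
            return name, m.group(0), m.end()
    raise ValueError('no token at position %d' % pos)


def tokenize(line: str) -> List[Tuple[str, str]]:
    """Toy driver that switches modes whenever it sees a double quote."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    mode = 'Unquoted'
    while pos < len(line):
        name, text, pos = read_token(line, pos, mode)
        tokens.append((name, text))
        if name == 'DQuote':
            mode = 'DoubleQuoted' if mode == 'Unquoted' else 'Unquoted'
    return tokens


for tok in tokenize('echo "a b $x"'):
    print(tok)
```

Note that the space between `a` and `b` ends up inside a `Lit` token, while the space after `echo` is its own `Space` token. The driver decides the mode, and the lexer stays a simple table of regular languages.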

Size of Oil's Two Halves < Size of Bash

Is this a roundabout strategy? It's not obvious, but it actually makes the system smaller, as noted in the 2019 FAQ. You get leverage if the "middle language" matches the problem domain well.

The updated line counts for translation show that:

The slogan I use is that bash is implemented with at least 142K lines of "groveling through backslashes and braces one at a time" in C. Oil is implemented in a completely different style, and future posts could show examples.

So despite the addition of the new Oil language, Oil is many times smaller than bash in terms of its source code.

For more color, the appendix mentions that The Implementation Language of Shell Has Always Been a Problem, with historical evidence.

Plot: Progress in Two Directions

This plot shows development progress through 80 releases of Oil, representing nearly 6 years of work!

Plot of Spec Test Progress
  1. The blue line is step 2 above. We're implementing dozens of shell features (aliases, process substitution, printf, ...) with Python-based DSLs.
  2. The red line is step 3 above. We're translating this high-level interpreter to fast C++.

In other words, the blue line goes from the "middle languages" up to shell, and the red line goes down to C++.

We're implementing Oil middle out!

Caveats

Does it look like Oil is almost done? The red line is converging on the blue line.

Not quite. Here's a summary of caveats I've mentioned in the past:

  1. The garbage collector works on small examples, but it's not yet hooked up to oil-native. That is, the red line tests the dumb_alloc variant, an interpreter that doesn't reclaim any memory. This obviously needs to be fixed.
  2. These spec tests measure OSH only. None of the tests for the Oil language pass under oil-native.
  3. Finishing the translation will involve a long tail of fixes. I expect the progress to slow down as the red line converges on the blue line.

However, we don't slide backwards. I like implementing a feature in Python and having it work "for free" in fast C++. The "middle out" strategy is pleasant in this respect!

Fallow Period

The gray rectangle highlights that the red line barely went up for a year — from summer 2020 to summer 2021.

What was I doing then? Working on the other 6 parts of the project, including the Oil language and the garbage collector. A sample of work:

So clearly the project is bottlenecked and should be parallelized. We need someone to work on the bottom half of the project while we work on the top half. These are largely separate tasks, requiring different skills.

(We've had several new contributors recently, who I'll acknowledge in release notes.)

Oil 0.9.9 Made a Big Leap

After another fallow period, I resumed work on translation last week. This generally involves a mix of the following:

In less than 4 days, we went from 1137 passing tests in Oil 0.9.8 to 1487 — the big leap in the red line. This is huge progress toward the blue line, which sits at 1774. I then released Oil 0.9.9.
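
For a sense of scale, the numbers above can be worked through directly. This small sketch assumes the three counts refer to the same spec test suite, so they can be compared as fractions of the blue line. The variable names are only for illustration.

```python
# Counts taken from the paragraph above.
passing_098 = 1137   # red line at Oil 0.9.8
passing_099 = 1487   # red line at Oil 0.9.9
blue_line = 1774     # tests passing under the Python implementation

gained = passing_099 - passing_098
remaining = blue_line - passing_099
pct_before = 100.0 * passing_098 / blue_line
pct_after = 100.0 * passing_099 / blue_line

print('tests gained:    %d' % gained)
print('tests remaining: %d' % remaining)
print('red/blue before: %.1f%%' % pct_before)
print('red/blue after:  %.1f%%' % pct_after)
```

So the red line moved from roughly 64% to roughly 84% of the blue line, with 287 tests left in the gap.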

Making these tests pass involved only 6 or 7 fixes, almost all in C++ code. One example is simply removing assert(0) stubs I had put in for library calls like posix::write() and posix::strerror(). The next post could show more details.

So recent progress gives me additional confidence that this approach will work.

Compiler Engineer Wanted

I just updated this page:

Here's a summary of skills sought, in order of importance:

  1. Hard-won C++ experience, including knowledge of portability.
  2. Understanding of garbage collectors.
  3. Comfort with a test-driven and terminal-based workflow. We need to make the red line go up!
  4. Understanding of static type systems.
  5. Python. (This can likely be learned on the job.)

General attributes:

  1. You should be a finisher. This is a project that needs a solid engineer. It's not a research project!
  2. Good communication skills. We should be able to talk and write about technical designs and coding strategies.
  3. You should be generally interested in the goals of the Oil project. If you want to use Oil in the future, this work should be fun!

Funding In Progress

I'm mentioning this job "softly" because it's not fully funded yet. We were recently approved for Github Sponsors, and I'm waiting to hear back on an NLnet grant application (after a short round of questions).

My goal is to find the best person (or people) for the job, regardless of their compensation requirements. I've set the Github Sponsors goal at $25K / month, but I need your feedback on whether that can be achieved, and how it should be allocated.

One possibility is to spend $5K or so on "bounties" to make the red line go up. Perhaps the best candidate will emerge out of that process.

I also want to hire a technical writer, but let's do one thing at a time.

Five Ways You Can Help

I don't enjoy administrative work like fundraising and hiring. I prefer improving Oil (#project-updates) and writing technical blog posts (e.g. #parsing).

So honestly I would like to "offload" some of this work to users and readers. If you want a new shell, this could be one of the best uses of your time!

(1) Tell Your Friends and Coworkers About the Project.

This blog has several thousand technical readers, but there are millions of people who use shell.

(2) Help Raise Funds

(3) Donate to Help Pay the Compiler Engineer

Are you employed in the software industry? Do you spend more than an hour a month with shell? Then it's possible that donating $25/month or $100/month will pay for itself in the future by saving you time.

You can get a sponsor badge on your Github profile for just $10/month.

Remember, Oil is our upgrade path from bash to a better language and runtime. It's the only such project out of dozens of alternative shells.

I also set up $200/month and $2000/month tiers if you'd like your logo to appear on this website.

(4) Apply For the Job

If you have the time and the skills, you should apply!

Please contact me at andy@oilshell.org or on Zulip. It may take a while to find the right person, but introducing yourself never hurts.

If you can make one new spec test pass (with my help), then you'll jump to the front of the queue. This is a very mechanical process. Start with the README.md and Contributing.

(5) Contribute to Oil's Code

I hope that you will keep contributing to the top half of Oil's code even if we meet our $25K/month goal.

Readers have offered to send me donations in the past, but I've declined. The reason is that I want to keep all contributors on equal footing. I don't want there to be "two tiers".

For the same reason, even with this fundraising project, I'm not personally accepting any money. Think of me as a "coordinator".

I'm accepting the reality that we need a specialized skill set that's not likely to show up in volunteer contributions. The garbage collector and type system work requires "global" and concentrated design knowledge.

In other words, I want to wall off the paid part of the project:

What Doesn't Help

The last section described 5 ways you can help. At the risk of being negative, I will insert a rude slogan here: Uninformed back-seat driving doesn't help get the shell done! Let me elaborate on this :-)

This is the first time I've called our strategy "middle out", but I've covered similar topics in the past, including during an early "wrong turn". (Summary: OPy won't be fast enough, and we won't reuse any part of the Python interpreter.)

But here's what often happens when I write about Oil's implementation: commenters derail the thread by proposing solutions that they haven't tried.

For example, when I set up the oil-dev@ mailing list in 2017 (now Zulip), one of the first messages was "Ewww C++" (yes really).

This annoyed me, but now I see it more positively. What I hear now is "I'm interested in contributing to Oil, but I don't know C++." So it's a form of interest that's low effort, but flattering.

Another comment I often get is: Why not PyPy or Cython?

I've answered these questions in past threads, and will try to write detailed posts on them in the future. Some relevant issues are:

Notes for the curious: Why Isn't Oil Written in Rust, D, Nim, etc.? (Zulip)

More things to keep in mind:

You could say that the job of the compiler engineer is to work on the unsafe core of the project. An implementation of a garbage collector is inherently unsafe because its job is largely to manipulate raw memory.

But the interpreter that uses it must be 100% statically typed (with MyPy) before it can be translated to C++. And the generated C++ is memory safe by virtue of garbage collection. It's impossible to express bugs like use-after-free or double-free in the source language!

So please refrain from low effort comments until I write those posts. I will simply reply with a link to this section.

On the other hand:

Conclusion

To summarize:

Does this post make sense? I expect it will help some readers understand the project, but others will still question the motivation. I agree it's somewhat unusual, but as I wrote recently: if you want a different result, you have to do something different.

Please ask questions in the comments, and feel free to make suggestions about fundraising, which I know little about. I'll use my answers to inform future blog posts.

Appendices

HBO's Silicon Valley

"Middle Out" is also used to describe the compression algorithm by fictional Silicon Valley startup Pied Piper (2014).

This is where I first heard the term. I thought it was invented by writers — a play on the common computer science terms "top-down" and "bottom-up". But then I found it in Ward's 1994 paper on Language-Oriented Programming!

You can think of it either way :-)

More Writing on "Middle Out" and Translation

As mentioned, I want to write a FAQ about implementation alternatives:

More on the implementation:

And: