Technical Issues And Risks

2020-08-17

The last post described A Plan for Oil 0.8 and 0.9. Roughly speaking, we'll first have a pure interpreter, and then one with I/O. It should run large shell scripts, but not interactive programs.

Given recent progress, this feels doable. But even if we make those two releases, we'll still fall short of my goals for 2020.

So this is a good time to discuss the remaining technical issues with Oil, which I mentioned in the January Blog Roadmap. The main risk is memory management, which you could call the "deallocation problem", or just garbage collection. After that, there are a couple of other issues which are "just work", rather than risks.

I'm knee-deep in garbage collection right now, so this post will be brief. There are more details on Zulip, and I welcome help. I've already learned a lot from discussions with Max Bernstein and ridiculousfish.

Table of Contents

pgen2 Parser Generator

The Oil Expression Language Shouldn't be Metacircular Python

Deferred Issues: What the Interactive Shell Depends On

Idea: The Interactive Shell In "User Space"

Recap

Let's recap briefly, to give context to the technical issues.

Blog Posts

I'm still following the Summer Blog Roadmap:

A Plan for Oil 0.8 and 0.9
- Features Cut in 2020. The interactive shell is cut for 2020, and realistically speaking, it might be cut for all of 2021 as well.
How Can Oil Be Useful Before It's Complete? The plan proposed that I focus on the Oil language itself. I cut a lot out of the language in January, but I'm excited by the focus of what's left.
Why Use Oil? I drafted this doc and put it on the home page. Feedback is welcome. Recall that I also drafted the Oil Language Idioms page.

Releases

Each of the 0.x releases has focused on a major area.

The 0.6.0 release in July 2019 was about the interactive shell (in Python). (Announcement)
The 0.7.0 release in January 2020 was about the Oil language (in Python). (Announcement)

Upcoming:

Oil 0.8.0 is about oil-native, and should be out shortly. There are some minor things left, as described in the last post.
My original thought was that Oil 0.9.0 should have shell I/O.

However, the new compute benchmarks show that memory usage is a problem. You can run oil-native in a loop, and it will use hundreds of megabytes of memory without deallocating everything!

So it might make sense to change directions. I like to work on the riskiest parts of the project first. Getting oil-native to compile and run was a risk, but now that it works, the deallocation problem has been exposed.

Immediate Issues

Garbage Collection

I discussed this in the last post, and it's been on my mind even more since then. Roughly speaking, there are three possible solutions:

Reference counting. CPython and QuickJS use this strategy (plus a cycle collector).
Mark and Sweep, a tracing GC algorithm. Lua and most JavaScript VMs use this strategy.
Copying collection, another tracing algorithm. The femtolisp interpreter (used to bootstrap Julia) and the es shell (dormant) use this strategy.

(Generational garbage collectors use a mix of strategies, but let's leave that aside for now.)
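
To make the third option concrete, here is a minimal sketch of a copying collector in the style of Cheney's algorithm, written in Python over a toy heap. It is an illustration only and not Oil's design: `Ref`, `Obj`, and `collect` are hypothetical names, and a real collector would move raw memory rather than Python objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """A toy 'pointer': the index of an object in the current space."""
    addr: int


Value = Union[int, str, Ref]


@dataclass
class Obj:
    fields: List[Value]
    forward: Optional[int] = None  # new address, set once the object is copied


def collect(from_space: List[Obj], roots: List[Value]) -> Tuple[List[Obj], List[Value]]:
    """Copy everything reachable from roots into a fresh to-space.

    Returns (to_space, new_roots). Unreachable objects are never visited.
    """
    to_space: List[Obj] = []

    def copy(v: Value) -> Value:
        if not isinstance(v, Ref):
            return v  # atoms are stored inline; nothing to move
        old = from_space[v.addr]
        if old.forward is None:
            # First visit: move the object and leave a forwarding address.
            old.forward = len(to_space)
            to_space.append(Obj(list(old.fields)))
        return Ref(old.forward)

    new_roots = [copy(r) for r in roots]

    # Scan pointer: objects before it already have their fields rewritten
    # to to-space addresses; objects after it still point into from-space.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj.fields = [copy(f) for f in obj.fields]
        scan += 1

    return to_space, new_roots


# Objects 0 and 1 form a cycle; 2 and 3 are garbage (3 points at itself).
heap = [Obj(["a", Ref(1)]), Obj(["b", Ref(0)]), Obj(["garbage"]), Obj(["c", Ref(3)])]
live, live_roots = collect(heap, [Ref(1)])
```

Reachable objects end up compacted at the front of the new space, and garbage costs nothing to free because it is never visited. The price is that every pointer to a moved object, including the roots, has to be found and rewritten.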

Since the last post, I've started a discussion on Reddit, had discussions via e-mail, and written benchmarks to explore the problem space.

I'm not sure what will happen, but I'm leaning towards copying GC, for reasons I may write about later. If you want all the gory details, ask me on Zulip.

(This is issue 785.)

pgen2 Parser Generator

I mentioned this task in the March Recap, and I'm still looking for help. It's a very self-contained project that's essential to:

the Oil expression language (see The Simplest Explanation of Oil)
Egg Expressions
proc signatures, so arguments can be named.

Contributing is as easy as it's ever been. The code and build process are in better shape, thanks to feedback from Josh Nelson. I welcome more feedback / complaints.

(This is issue 594.)

The Oil Expression Language Shouldn't be Metacircular Python

This can be done in parallel with the pgen2 task. It removes the last dependency on the Python interpreter, so Oil can be a pure C++ program.

I had cut the Oil expression language for 2020, and indeed it may not make it. But I don't think the Oil language is complete without it.

This is issue 636. It's required to make the expression language "production quality".

Deferred Issues: What the Interactive Shell Depends On

I've mentioned these issues before, but let's recap what's blocking the interactive shell:

Bindings to GNU readline
Replacing our use of Python's yield, e.g. in core/completion.py
Signal Handling. The "batch" interpreter needs some signal handling, but interactive features are more demanding.

I'm deferring these issues indefinitely.

That said, here are some possible solutions to the yield problem:

append() to a list instead of yielding. I think this gives up the ability to Ctrl-C out of long completions.
Manually write state machines. This is possible, although it goes against Oil's philosophy of high level code.
- Some kind of compiler for synchronous computing, maybe like Blech. This is probably too ambitious, and user-written completion plugins that take arbitrarily long don't fit within the model.
Fork a process that calls write(), and read() from the pipe. I think this will work, though I haven't started implementing it yet. It relates naturally to the coprocess protocol, which I cut from Oil.
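
The options above can be sketched side by side in Python. This is a rough illustration under stated assumptions, not the code in core/completion.py: every function and class name here is hypothetical, the "completer" is just a prefix match over a word list, and the pipe protocol assumes candidates never contain newlines.

```python
import os
from typing import Iterator, List, Optional


def complete_gen(prefix: str, words: List[str]) -> Iterator[str]:
    """Generator style: candidates are produced lazily, one at a time."""
    for w in words:
        if w.startswith(prefix):
            yield w


def complete_list(prefix: str, words: List[str]) -> List[str]:
    """append() style: the caller sees nothing until the whole loop ends."""
    out: List[str] = []
    for w in words:
        if w.startswith(prefix):
            out.append(w)
    return out


class CompleteMachine:
    """Hand-written state machine: the loop index becomes explicit state."""

    def __init__(self, prefix: str, words: List[str]) -> None:
        self.prefix = prefix
        self.words = words
        self.i = 0

    def next(self) -> Optional[str]:
        """Return the next candidate, or None when exhausted."""
        while self.i < len(self.words):
            w = self.words[self.i]
            self.i += 1
            if w.startswith(self.prefix):
                return w
        return None


def complete_forked(prefix: str, words: List[str]) -> Iterator[str]:
    """Fork style: a child write()s candidates to a pipe; the parent read()s."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: emit one candidate per line. The finally clause makes sure
        # the child never falls through into the parent's code.
        try:
            os.close(r)
            for word in words:
                if word.startswith(prefix):
                    os.write(w, word.encode() + b"\n")
        finally:
            os._exit(0)
    os.close(w)  # the parent must close its write end to see EOF
    try:
        with os.fdopen(r) as f:
            for line in f:
                yield line.rstrip("\n")
    finally:
        os.waitpid(pid, 0)  # reap the child even if the caller stops early
```

All four produce the same candidates for the same input. The forked version still uses `yield` in this Python sketch, but only for convenience in the parent; the laziness comes from the pipe, so a translation without generators could call read() in a plain loop.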

Idea: The Interactive Shell In "User Space"

One attractive possibility is to punt these issues up to the user :-) Then we would have:

yield in the Oil language. This should also be useful for problems other than completion.
More control over signal handling in user space, e.g. trap enhancements. This relates to various discussions about ble.sh.
C bindings to GNU readline. Both bash and Python have shared library extensions, so this feature probably makes sense for Oil.
Subinterpreters, to avoid confusing the user's shell state with the UI state.

Version Numbers

We'll probably have a version 0.10.0. I don't think the release after 0.9 will be called 1.0.

Also, the Oil language will likely not be stabilized for 1.0. The OSH language should be stable for Oil 1.0, but the Oil language might wait until Oil 2.0. (I do realize the naming could be confusing.)

Summary

There are a bunch of technical issues, but the main risk is garbage collection, which I'm working on right now. When that feels like less of a risk, I might make a more concrete plan for 1.0.

I'll feel good if we can implement something correct and reasonably fast by October, but that's ambitious. There are a few prerequisite tasks, like figuring out if mylib::Str should be a slice or a value. And fixing the mylib::Dict implementation to actually be a hash table!

I'm deferring all the issues related to the interactive shell. They'll be handled after Oil 1.0, if ever.

If we can finish garbage collection and finish shell I/O, we'll have something very solid. I think that can be done for 2020, but there's no guarantee.

Then we need to polish the Oil language, which I feel very excited about. Unfortunately, I don't think that can be finished within 2020. Again, I can use help, and comments are welcome.

Appendices

Issue Labels

I wanted to go through these issue labels and make comments, but let's do that another time.

#affects-architecture
#essential. Fixing errexit is a major one.
#high-priority
#open-problem
#carrot. I need a few of these to make Oil compelling. Or perhaps the Oil language is enough.

Blog Topics

Here are five posts I drafted since writing the Summer Blog Roadmap, and which aren't even a part of it!

You Can Build and Modify Oil in One Minute (screencast). I fixed #devtools issues based on feedback from Josh Nelson and Max Bernstein. As mentioned above, more feedback is welcome.
Oil Has Achieved the "Impossible". Based on this Hacker News thread. Oil is compatible with POSIX and bash, but it also has a new language. This relates to the "Perl 5" post I mentioned in the Summer Blog Roadmap.
Shell Is a Toolchain for Doing Science on Software. I continue to use shell to measure properties of software and make reports. For example, I'm using it right now to design the garbage collector in a data-driven way. This Reddit comment elaborates on that. I want to write a post tagged #shell-the-good-parts based on it.
- Related comment: Shell is for Polyglot Programmers. I also want to make a point about using shell for metaprogramming/reflection in two senses: at build time to generate code, and at runtime to analyze data generated by software (benchmarking and monitoring).
The QTSV Interchange Format in Less Than 100 Words. I want others to implement tools that emit and consume structured data, so I'm describing QSN and QTSV as simply as possible.
- The philosophy for structured data in Oil is to use interchange formats over pipes.
Abandoned Subprojects. Part of a retrospective. OHeap and OPy won't be used in oil-native. However, in the distant future, something like OHeap may make sense to reduce GC pressure.