blog | oilshell.org

Technical Issues And Risks

2020-08-17

The last post described A Plan for Oil 0.8 and 0.9. Roughly speaking, we'll first have a pure interpreter, and then one with I/O. It should run large shell scripts, but not interactive programs.

Given recent progress, this feels doable. But even if we make those two releases, it still falls short of my goals for 2020.

So this is a good time to discuss the remaining technical issues with Oil, which I mentioned in the January Blog Roadmap. The main risk is memory management, which you could call the "deallocation problem", or just garbage collection. After that, there are a couple of other issues which are "just work", rather than risks.

I'm knee-deep in garbage collection right now, so this post will be brief. There are more details on Zulip, and I welcome help. I've already learned a lot from discussions with Max Bernstein and ridiculousfish.

Table of Contents
Recap
Blog Posts
Releases
Immediate Issues
Garbage Collection
pgen2 Parser Generator
The Oil Expression Language Shouldn't be Metacircular Python
Deferred Issues: What the Interactive Shell Depends On
Idea: The Interactive Shell In "User Space"
Version Numbers
Summary
Appendices
Issue Labels
Blog Topics

Recap

Let's recap briefly, to give context to the technical issues.

Blog Posts

I'm still following the Summer Blog Roadmap:

Releases

Each of the 0.x releases has focused on a major area.

Upcoming:

However, the new compute benchmarks show that memory usage is a problem. You can run oil-native in a loop, and it will use hundreds of megabytes of memory without deallocating anything!

So it might make sense to change directions. I like to work on the riskiest parts of the project first. Getting oil-native to compile and run was a risk, but now that it works, the deallocation problem has been exposed.

Immediate Issues

Garbage Collection

I discussed this in the last post, and it's been on my mind even more since then. Roughly speaking, there are three possible solutions:

  1. Reference counting. CPython and QuickJS use this strategy (plus a cycle collector).
  2. Mark and Sweep, a tracing GC algorithm. Lua and most JavaScript VMs use this strategy.
  3. Copying collection, another tracing algorithm. The femtolisp interpreter (used to bootstrap Julia) and the es shell (dormant) use this strategy.

(Generational garbage collectors use a mix of strategies, but let's leave that aside for now.)
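To make the difference between strategies 1 and 2 concrete, here's a toy model in Python. It's not Oil's code, and `Obj`, `add_ref`, and `reachable` are hypothetical names. It shows why reference counting needs a separate cycle collector: two objects that point only at each other keep nonzero counts forever, while the mark phase of a tracing collector, starting from the roots, finds that they're unreachable.

```python
class Obj:
    """A toy heap object with an explicit reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.children = []  # outgoing pointers


def add_ref(parent, child):
    # Storing a pointer to child bumps child's reference count.
    parent.children.append(child)
    child.refcount += 1


def reachable(roots):
    """Mark phase of a tracing GC: names of all objects reachable from roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(obj.children)
    return {obj.name for obj in marked}


# a and b point only at each other; the program's only root is c.
a, b, c = Obj("a"), Obj("b"), Obj("c")
add_ref(a, b)
add_ref(b, a)
roots = [c]

# Reference counting never frees a or b: each is kept at 1 by the other.
print(a.refcount, b.refcount)   # 1 1
# Tracing from the roots shows that only c is live; a sweep would free a and b.
print(sorted(reachable(roots)))  # ['c']
```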

Since the last post, I've started a discussion on Reddit, had discussions via e-mail, and written benchmarks to explore the problem space.

I'm not sure what will happen, but I'm leaning towards copying GC, for reasons I may write about later. If you want all the gory details, ask me on Zulip.

(This is issue 785.)
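For readers who haven't seen copying collection, here's a generic illustration in the style of Cheney's algorithm, written in Python against a toy heap. It's not how Oil will necessarily implement it, and `Ptr`, `Cell`, and `collect` are hypothetical names. Live objects are copied breadth-first from the roots into a fresh semispace, each copied object leaves a forwarding address so shared and cyclic pointers are fixed up correctly, and everything that wasn't reached is reclaimed at once by dropping the old space.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Ptr:
    """A pointer is an index into the current semispace."""
    index: int


@dataclass
class Cell:
    """A heap object: a list of fields, each an int or a Ptr."""
    fields: List[Union[int, Ptr]]
    forward: Optional[int] = None  # to-space index, set once the cell is copied


def collect(from_space: List[Cell], roots: List[Ptr]) -> Tuple[List[Cell], List[Ptr]]:
    """Cheney-style copying collection into a fresh to-space.

    Returns (to_space, new_roots). Cells that aren't reached are never copied,
    so they're reclaimed when from_space is dropped.
    """
    to_space: List[Cell] = []

    def evacuate(p: Ptr) -> Ptr:
        cell = from_space[p.index]
        if cell.forward is None:
            # First visit: copy the cell and leave a forwarding address behind.
            cell.forward = len(to_space)
            to_space.append(Cell(list(cell.fields)))
        return Ptr(cell.forward)

    new_roots = [evacuate(r) for r in roots]
    # to_space doubles as the breadth-first queue: cells at index >= scan
    # have been copied, but their fields still point into from_space.
    scan = 0
    while scan < len(to_space):
        cell = to_space[scan]
        cell.fields = [evacuate(f) if isinstance(f, Ptr) else f for f in cell.fields]
        scan += 1
    return to_space, new_roots


heap = [
    Cell([1, Ptr(2)]),  # 0: root, points to cell 2
    Cell([99]),         # 1: garbage
    Cell([2, Ptr(0)]),  # 2: live, points back to 0 (cycles are fine)
    Cell([Ptr(1)]),     # 3: garbage that points to garbage
]
new_heap, new_roots = collect(heap, [Ptr(0)])
print(len(new_heap))                            # 2
print(new_heap[0].fields, new_heap[1].fields)  # [1, Ptr(index=1)] [2, Ptr(index=0)]
```

Two general properties of the technique: the work is proportional to the live objects rather than the whole heap, and the survivors end up compacted. The price is that the collector must be able to find and rewrite every pointer, including ones held in local variables. These are textbook tradeoffs of copying collection, not necessarily the ones that matter most for Oil.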

pgen2 Parser Generator

I mentioned this task in the March Recap, and I'm still looking for help. It's a very self-contained project that's essential to:

  1. the Oil expression language (see The Simplest Explanation of Oil)
  2. Egg Expressions
  3. proc signatures, so arguments can be named.

Contributing is as easy as it's ever been. The code and build process are in better shape, thanks to feedback from Josh Nelson. I welcome more feedback / complaints.

(This is issue 594.)
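Some background on what the task involves: pgen2 comes from lib2to3 in CPython and generates LL(1) parsers. A central step in any LL(1) generator is computing each nonterminal's FIRST set (the tokens that can begin it) and checking that no token starts two alternatives of the same rule. Here's a simplified Python sketch of that step. It isn't pgen2's code: real pgen2 works on EBNF rules compiled to DFAs, while this handles plain BNF with no empty productions, and the grammar and function names are hypothetical.

```python
def first_of(symbol, first, grammar):
    # A token's FIRST set is just the token itself.
    return first[symbol] if symbol in grammar else {symbol}


def first_sets(grammar):
    """FIRST sets for a BNF grammar with no empty productions.

    grammar maps each nonterminal to a list of alternatives, and each
    alternative is a list of symbols. Symbols not in grammar are tokens.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                # Without empty productions, only the first symbol matters.
                new = first_of(alt[0], first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def ll1_conflicts(grammar, first):
    """(nonterminal, token) pairs where one token starts two alternatives."""
    conflicts = []
    for nt, alts in grammar.items():
        seen = set()
        for alt in alts:
            f = first_of(alt[0], first, grammar)
            conflicts.extend((nt, tok) for tok in sorted(f & seen))
            seen |= f
    return conflicts


grammar = {
    "stmt": [["if", "expr", "block"], ["while", "expr", "block"], ["expr"]],
    "expr": [["NAME"], ["NUMBER"], ["(", "expr", ")"]],
    "block": [["{", "stmt", "}"]],
}
first = first_sets(grammar)
print(sorted(first["stmt"]))          # ['(', 'NAME', 'NUMBER', 'if', 'while']
print(ll1_conflicts(grammar, first))  # []
```

An empty conflict list means one token of lookahead is enough to pick an alternative. A rule like `e: t '+' e | t` would be reported, because both alternatives start with whatever `t` starts with.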

The Oil Expression Language Shouldn't be Metacircular Python

This can be done in parallel with the pgen2 task. It removes the last dependency on the Python interpreter, so Oil can be a pure C++ program.

I had cut the Oil expression language for 2020, and indeed it may not make it. But I don't think the Oil language is complete without it.

This is issue 636. It's required to make the expression language "production quality".

Deferred Issues: What the Interactive Shell Depends On

I've mentioned these issues before, but let's recap what's blocking the interactive shell:

  1. Bindings to GNU readline
  2. Replacing our use of Python's yield, e.g. in core/completion.py
  3. Signal Handling. The "batch" interpreter needs some signal handling, but interactive features are more demanding.
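To see why yield is a problem: a generator keeps its state in a suspended frame, which doesn't map onto ordinary C++ functions. One generic workaround, not necessarily one Oil will adopt, is to rewrite the generator as an object whose resume point is an explicit field, which translates to a plain C++ class. This Python sketch uses hypothetical names (`complete_words`, `WordCompleter`); it isn't taken from core/completion.py.

```python
def complete_words(words, prefix):
    """Generator version: the loop position is saved implicitly by yield."""
    for w in words:
        if w.startswith(prefix):
            yield w


class WordCompleter:
    """Same results, but the resume point is an explicit field."""

    def __init__(self, words, prefix):
        self.words = words
        self.prefix = prefix
        self.i = 0  # what the generator's suspended frame used to remember

    def next(self):
        """Return the next match, or None once the words are exhausted."""
        while self.i < len(self.words):
            w = self.words[self.i]
            self.i += 1
            if w.startswith(self.prefix):
                return w
        return None


words = ["echo", "exit", "export", "ls"]
print(list(complete_words(words, "ex")))  # ['exit', 'export']

completer = WordCompleter(words, "ex")
matches = []
while True:
    m = completer.next()
    if m is None:
        break
    matches.append(m)
print(matches)  # ['exit', 'export']
```

The rewrite is mechanical for a simple loop like this one, but it gets tedious for generators with several yield points or nested loops, since each one needs its own saved state.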

I'm deferring these issues indefinitely.

That said, here are some possible solutions to the yield problem:

Idea: The Interactive Shell In "User Space"

One attractive possibility is to punt these issues up to the user :-) Then we would have:

Version Numbers

We'll probably have a version 0.10.0. I don't think the release after 0.9 will be called 1.0.

Also, the Oil language will likely not be stabilized for 1.0. The OSH language should be stable for Oil 1.0, but the Oil language might wait until Oil 2.0. (I do realize the naming could be confusing.)

Summary

There are a bunch of technical issues, but the main risk is garbage collection, which I'm working on right now. When that feels like less of a risk, I might make a more concrete plan for 1.0.

I'll feel good if we can implement something correct and reasonably fast by October, but that's ambitious. There are a few prerequisite tasks, like figuring out whether mylib::Str should be a slice or a value. And fixing the mylib::Dict implementation to actually be a hash table!
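For readers unfamiliar with what "actually a hash table" involves, here's a minimal sketch of one common design, open addressing with linear probing, written in Python for brevity. `HashDict` is a hypothetical name, and this isn't mylib's design. A lookup hashes the key to a slot and probes forward from there, taking expected O(1) time instead of the O(n) a linear search over all entries would cost.

```python
class HashDict:
    """Open-addressing hash table with linear probing.

    Simplifications: no deletion, and None can't be used as a key
    because it marks an empty slot.
    """

    def __init__(self, capacity=8):
        assert capacity > 0 and capacity & (capacity - 1) == 0, "power of two"
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.size = 0

    def _slot(self, key):
        # Probe forward from the key's hash until we hit the key or an empty
        # slot. The load-factor cap in set() guarantees an empty slot exists.
        mask = len(self.keys) - 1
        i = hash(key) & mask
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) & mask
        return i

    def get(self, key, default=None):
        i = self._slot(key)
        return self.values[i] if self.keys[i] is not None else default

    def set(self, key, value):
        i = self._slot(key)
        if self.keys[i] is None:
            self.keys[i] = key
            self.size += 1
        self.values[i] = value
        if self.size * 4 > len(self.keys) * 3:  # keep the load factor <= 3/4
            self._grow()

    def _grow(self):
        # Double the capacity and reinsert every entry at its new slot.
        old = [(k, v) for k, v in zip(self.keys, self.values) if k is not None]
        self.keys = [None] * (2 * len(self.keys))
        self.values = [None] * len(self.keys)
        self.size = 0
        for k, v in old:
            self.set(k, v)


d = HashDict()
for n in range(100):
    d.set("key%d" % n, n)
d.set("key5", -5)  # overwriting doesn't change the size
print(d.size, d.get("key42"), d.get("key5"), d.get("missing"))  # 100 42 -5 None
```

Keeping keys and values in flat arrays like this is cache-friendly; separate chaining (a list per bucket) is the other common choice.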

I'm deferring all the issues related to the interactive shell. They'll be handled after Oil 1.0, if ever.

If we can finish garbage collection and finish shell I/O, we'll have something very solid. I think that can be done for 2020, but there's no guarantee.

Then we need to polish the Oil language, which I feel very excited about. Unfortunately, I don't think that can be finished within 2020. Again, I can use help, and comments are welcome.

Appendices

Issue Labels

I wanted to go through these issue labels and make comments, but let's do that another time.

Blog Topics

Here are five posts I've drafted since writing the Summer Blog Roadmap, which aren't even part of it!