The Riskiest Part of the Project

2017-04-08

Table of Contents

Status Update

What Remains

Why Shouldn't Oil run on Top of CPython?

Status Update

The table in Measuring Progress with Tests shows these stats:

573 total tests, 375 passing, 136 failing (as of March 22nd)

After writing that post, I spent a few days grinding away at the spec tests. I made a fresh run, and the bottom row is:

601 total tests, 406 passing, and 146 failing

Implementing features in a test-driven manner was such a smooth process that I felt I could work on something else. I like to "de-risk" projects by tackling the hard parts early.

In particular, it's important to make global architectural changes before there's too much code. For example, translating shell to Oil led me to use the Lossless Syntax Tree representation. If I hadn't made that change two months ago, making it now would have broken the features I just implemented.

What Remains

So what is now the riskiest part of the project?

I've mostly designed the Oil language, but I haven't implemented it, so that could be a risk. But I have implemented and tested OSH, so I view it as a matter of work and not risk. The Oil language design has already been influential in the Ion shell, part of Redox OS, which is some amount of validation.

I think the biggest risk is breaking the dependency on the Python interpreter. It affects all the code, and I haven't had a clear idea of how to do it.

Why Shouldn't Oil run on Top of CPython?

A shell should be simpler and smaller than Python. The biggest shell is bash, at 150K lines, while CPython's core alone is 148K lines. With the Modules/ directory, Python is 444K lines.
Shells are required on machines where Python isn't, like Android phones and other embedded devices. Shell resides at a lower level of the OS stack than Python.
The Python interpreter starts slowly. If you import many modules, it may take hundreds of milliseconds to get to main(). A full second isn't unheard of. Shell programming is about lightweight processes, and that includes the shell itself.
Python 3's handling of Unicode is awkward for a shell. The shell often deals with filenames, and Unix file systems don't have a standard encoding. The shell can treat filenames as opaque byte strings in most cases.
The Python interpreter does some things with signals that we don't want, like turning them into exceptions.
I want the Oil interpreter to be a library and have an API in the style of Lua. I don't want to expose the Python/C API to users.
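
To make the point about filenames concrete, here is a minimal sketch, assuming a POSIX system and Python 3. The names `RAW_NAME`, `is_valid_utf8`, and `roundtrip_filename` are made up for illustration and are not part of OSH. It shows a filename that is not valid UTF-8, and how Python 3 has to smuggle its bytes through `str` with the `surrogateescape` error handler, whereas a shell can simply keep the bytes as they are.

```python
import os

# Hypothetical filename bytes: Latin-1 "café.txt", which is NOT valid UTF-8.
RAW_NAME = b"caf\xe9.txt"


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def roundtrip_filename(name: bytes) -> bytes:
    """Convert bytes -> str -> bytes the way Python 3 does for OS APIs.

    On POSIX, os.fsdecode() uses the 'surrogateescape' error handler, so
    undecodable bytes survive the trip through str and come back unchanged.
    """
    as_text = os.fsdecode(name)
    return os.fsencode(as_text)


if __name__ == "__main__":
    # The name is a perfectly legal Unix filename, but not legal UTF-8.
    print(is_valid_utf8(RAW_NAME))
    # Python can round-trip it, at the cost of an extra encoding layer.
    print(roundtrip_filename(RAW_NAME) == RAW_NAME)
```

The round trip works, but every boundary between the program and the OS needs this extra decode and encode step. Treating filenames as opaque byte strings avoids the step entirely.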

In retrospect, I see that the OSH code has markers of two half-formed plans to break the Python dependency.

There are C++-isms in the parser, like out parameters, and return codes rather than exceptions. I thought this would make possible an automated translation from Python to C++, in the style of bootstrapping Go by translating C to Go.
Another plan involved using the OHeap serialization format to bridge Python and C++. Last month, I stated a goal of using it to write a vertical slice of the shell runtime in C++.

Essentially, I was putting off the parser port and concentrating on the runtime port, which left more unknowns in the future.
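
As an illustration of the "C++-isms" mentioned above, here is a minimal sketch of what out parameters and return codes can look like in Python. The function `parse_digits` is hypothetical and is not taken from the OSH parser; it only demonstrates the style.

```python
from typing import List

DIGITS = "0123456789"


def parse_digits(text: str, pos: int, out: List[int]) -> int:
    """Parse a run of decimal digits starting at text[pos].

    C++-style conventions, written in Python:
      - the parsed value is appended to `out` (an "out parameter")
      - the return value is a status code, not the parsed value:
        the new position on success, or -1 on failure
      - no exception is raised on bad input
    """
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == pos:
        return -1  # failure is signalled by the return code
    out.append(int(text[pos:end]))
    return end


if __name__ == "__main__":
    result: List[int] = []
    new_pos = parse_digits("42 apples", 0, result)
    if new_pos == -1:
        print("parse error")
    else:
        print(result[0], new_pos)
```

Idiomatic Python would return the value and raise an exception on failure. Code written in the style above avoids both, which is what might make a mechanical translation to C++ more plausible.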

Although both plans were vague, either could have worked with enough effort. But over the last two weeks, I experimented with a more general solution. I believe it will not only work, but cost less to implement and even benefit end users.

Tomorrow I will describe what this solution is.