Update on Tests, and An Unexpected Turn


The table in Measuring Progress with Tests shows these stats:

After writing that post, I spent a few more days grinding away at the spec tests. I just made a fresh run, and the bottom row is:

Implementing features in a test-driven manner was such a smooth process that I felt I could work on something else. I like to "de-risk" my projects by tackling the hard parts early.

In particular, it's important to make global architectural changes before there's too much code. For example, translating shell to Oil led me to use the Lossless Syntax Tree representation. If I hadn't worked that out a couple of months ago, switching to it now would have broken the spec tests and features I just worked on.

A Project Risk

So what's the riskiest part of the project now? Breaking the dependency on Python. It affects all the code, and I haven't had a clear idea of how to do it.

Why shouldn't Oil be a Python program?

  1. A shell should be simpler and smaller than Python. The biggest shell is bash, at 150K lines, but the Python core alone is 185K lines. With the Modules/ directory, Python is 550K lines.
  2. Shells are required on machines where Python isn't, like Android phones and other embedded devices. Shell resides at a lower level of the OS stack than Python.
  3. The Python interpreter starts slowly. If you import many modules, it may take hundreds of milliseconds to get to main(), and a full second isn't unheard of. Shell programming is about lightweight processes, and that includes the shell itself.
  4. The Python interpreter does some things with signals that we don't want, like turning them into exceptions.
  5. I want the Oil interpreter to be a library and have an API in the style of Lua. I don't want to expose the Python-C API to users.
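To make point 3 concrete, here is a minimal sketch for measuring startup cost on your own machine. The helper name `startup_seconds` and the choice of modules are assumptions for illustration, and the numbers include the cost of spawning and exiting the process, so treat them as rough.

```python
import subprocess
import sys
import time


def startup_seconds(imports=(), runs=5):
    """Best-of-N wall time to spawn the interpreter, import modules, and exit.

    Hypothetical helper: the measurement includes process creation and
    teardown, not only the time to reach main().
    """
    code = "".join("import %s\n" % name for name in imports) or "pass"
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    bare = startup_seconds()
    loaded = startup_seconds(["json", "argparse", "subprocess"])
    print("bare interpreter:      %.1f ms" % (bare * 1000))
    print("with a few imports:    %.1f ms" % (loaded * 1000))
```

A shell script may start dozens of processes, so even a few tens of milliseconds per start adds up quickly.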

In retrospect, I see that the code has markers of two half-formed plans to break the Python dependency.

  1. There are some C++-isms in the parser, like out parameters and return codes rather than exceptions. I thought this would make an automated translation from Python to C++ possible, somewhat in the style of bootstrapping Go by translating C to Go.

  2. Another plan involved using the OHeap serialization format to bridge Python and C++. I mentioned it in November and completed it in January. Just last month, in Roadmap #4, I stated a goal of using OHeap to write a vertical slice of the shell runtime in C++.

    Instead of translating the Python parser to C++, I had a vague idea of compiling it to OVM — OVM being a hypothetical shell runtime that has enough functionality to run a recursive descent parser.

    I was putting off the parser port and concentrating on the runtime port. But I now think it's better to do everything at once. Having this language split may adversely affect the architecture.
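To illustrate the "C++-isms" from the first plan, here is a minimal sketch of a recursive descent parser written with out parameters and return codes instead of exceptions. It is not code from the Oil parser; the `Box` class, the function names, and the toy grammar are all assumptions made for the example.

```python
class Box:
    """Stand-in for a C++ out parameter: the callee fills in .value."""

    def __init__(self):
        self.value = None


def parse_int(tokens, pos, out):
    """Parse one integer token.

    Returns the new position on success, or -1 on failure. No exception
    is raised, which mirrors a C++ function returning an error code.
    """
    if pos < len(tokens) and tokens[pos].isdigit():
        out.value = int(tokens[pos])
        return pos + 1
    return -1


def parse_sum(tokens, pos, out):
    """Parse INT ('+' INT)* using the same convention."""
    term = Box()
    pos = parse_int(tokens, pos, term)
    if pos == -1:
        return -1
    total = term.value
    while pos < len(tokens) and tokens[pos] == "+":
        pos = parse_int(tokens, pos + 1, term)
        if pos == -1:
            return -1  # propagate the error code by hand
        total += term.value
    out.value = total
    return pos


if __name__ == "__main__":
    result = Box()
    end = parse_sum(["1", "+", "2", "+", "39"], 0, result)
    print(end, result.value)
```

Code in this style maps almost line for line onto C++ signatures such as `int ParseSum(..., int* out)`, which is what would make a mechanical translation plausible.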

An Unexpected Turn

Either of these plans could have worked with enough effort. But over the last two weeks, I experimented with a more general solution that I believe will be less effort. It may also benefit the end user rather than being an implementation detail.

After experimenting with some third-party code, I believe I can write a small bootstrapped Python interpreter to run OSH and Oil.

By bootstrapped, I mean that most of this interpreter will be written in Python. (Some parts of CPython are written in Python, like the import mechanism, but the lexer, parser, and bytecode compiler are all written in C.)
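The split between the front end and the virtual machine is visible from Python itself. The sketch below is only an illustration using CPython's built-in `compile()` and `exec()`, not OPy code: the front end turns source text into a code object, and the VM then runs that object. A front end written in Python would need to produce the same kind of object. The function names here are assumptions.

```python
def front_end(source):
    """Source text -> code object.

    In CPython this stage (lexer, parser, bytecode compiler) is written in C.
    """
    return compile(source, "<example>", "exec")


def run(code):
    """Code object -> executed module namespace, via the bytecode VM."""
    namespace = {}
    exec(code, namespace)
    return namespace


if __name__ == "__main__":
    code = front_end("def double(x):\n    return x * 2\n")
    print(type(code).__name__, len(code.co_code), "bytes of bytecode")
    print(run(code)["double"](21))
```

Because the two stages only share the code object, the first stage can in principle be swapped out without touching the second.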

I've done enough experiments that I believe it will work. The key is adapting two pieces of existing code:

And studying a third piece of code:

This plan allows me to avoid writing native code as much as possible. I believe that we can have OSH, Oil, and Python running on just 3,000 to 5,000 lines of well-tested C++.

The working name for this interpreter is OPy. I may change the name when it begins to diverge from Python. Initially, it will be a faithful subset of Python which can run OSH.

Blog Roadmap

I hope this plan doesn't sound crazy. I will explain it in more detail, but I want to write two backward-looking posts first:

Future Posts:

I'm not done implementing the OPy interpreter, and I still have to bootstrap it to break the CPython dependency. But I believe it will work. Then I'll be able to write about an OSH that's written in Python, but doesn't depend on CPython.
