The table in Measuring Progress with Tests shows these stats:
After writing that post, I spent a few more days grinding away at the spec tests. I just made a fresh run, and the bottom row is:
Implementing features in a test-driven manner was such a smooth process that I felt I could work on something else. I like to "de-risk" my projects by tackling the hard parts early.
In particular, it's important to make global architectural changes before there's too much code. For example, translating shell to Oil led me to use the Lossless Syntax Tree representation. If I hadn't worked that out a couple months ago, it would have broken the spec tests and features I just worked on.
So what's the riskiest part of the project now? Breaking the dependency on Python. It affects all the code, and I haven't had a clear idea of how to do it.
Why shouldn't Oil be a Python program?
main(); and a full second isn't unheard of. Shell programming is about lightweight processes, and that includes the shell itself.
In retrospect, I see that the code has markers of two half-formed plans to break the Python dependency.
There are some C++-isms in the parser, like out parameters and return codes rather than exceptions. I thought this would make an automated translation from Python to C++ possible, somewhat in the style of bootstrapping Go by translating C to Go.
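To make the contrast concrete, here is a minimal sketch of the two styles. The names `parse_word_exc`, `parse_word_out`, and `ParseError` are hypothetical, invented for illustration; they are not functions from the OSH parser.

```python
from typing import List, Optional


class ParseError(Exception):
    """Raised by the exception-style parser (hypothetical)."""


def parse_word_exc(tokens: List[str], pos: int) -> str:
    # Idiomatic Python: failure unwinds the stack via an exception.
    if pos >= len(tokens) or not tokens[pos].isalnum():
        raise ParseError("expected a word at position %d" % pos)
    return tokens[pos]


def parse_word_out(tokens: List[str], pos: int, out: List[Optional[str]]) -> bool:
    # C++-flavored Python: the result is written to an out parameter (a
    # one-element list standing in for a pointer), and success or failure
    # is reported through the return value. A direct C++ counterpart:
    #   bool ParseWord(const std::vector<std::string>& tokens, int pos,
    #                  std::string* out);
    if pos >= len(tokens) or not tokens[pos].isalnum():
        return False
    out[0] = tokens[pos]
    return True


result: List[Optional[str]] = [None]
if parse_word_out(["echo", "|"], 0, result):
    print(result[0])  # prints "echo"
```

The out-parameter version maps almost line for line onto C++, which is what makes a mechanical translation plausible. The cost is that every caller has to check the return value by hand.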
Another plan involved using the OHeap serialization format to bridge Python and C++. I mentioned it in November and completed it in January. Just last month, in Roadmap #4, I stated a goal of using OHeap to write a vertical slice of the shell runtime in C++.
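OHeap's actual encoding isn't described here, so the following is only a generic illustration of the underlying idea: Python writes data in a fixed binary layout that C++ can read without running any Python. The layout, `NODE_FORMAT`, `encode_node`, and `decode_node` are all hypothetical; this is not how OHeap encodes anything.

```python
import struct

# Hypothetical fixed layout (NOT OHeap's real encoding): a 1-byte tag,
# 3 padding bytes, then two little-endian unsigned 32-bit fields.
NODE_FORMAT = "<B3xII"
NODE_SIZE = struct.calcsize(NODE_FORMAT)  # 12 bytes

# On a little-endian machine, a C++ reader could memcpy the same bytes into:
#   struct Node { uint8_t tag; uint8_t pad[3]; uint32_t a; uint32_t b; };


def encode_node(tag: int, a: int, b: int) -> bytes:
    """Serialize one node into the fixed layout."""
    return struct.pack(NODE_FORMAT, tag, a, b)


def decode_node(buf: bytes, offset: int = 0) -> tuple:
    """Read one node back as (tag, a, b) starting at `offset`."""
    return struct.unpack_from(NODE_FORMAT, buf, offset)


# Python writes a buffer of nodes; any language that knows the layout can read it.
buf = encode_node(1, 10, 20) + encode_node(2, 30, 40)
print(decode_node(buf, NODE_SIZE))  # (2, 30, 40)
```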
Instead of translating the Python parser to C++, I had a vague idea of compiling it to OVM, a hypothetical shell runtime with just enough functionality to run a recursive descent parser.
I was putting off the parser port and concentrating on the runtime port. But I now think it's better to do everything at once. Splitting the code across two languages may adversely affect the architecture.
Either of these plans could have worked with enough effort. But over the last two weeks, I experimented with a more general solution that I believe will be less effort. It may also benefit the end user rather than being an implementation detail.
After experimenting with some third-party code, I believe I can write a small bootstrapped Python interpreter to run OSH and Oil.
By bootstrapped, I mean that most of this interpreter will be written in Python. (Some parts of CPython are written in Python, like the import mechanism, but the lexer, parser, and bytecode compiler are all written in C.)
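The division of labor in such a design can be illustrated with a toy: a front end written in Python compiles source text to bytecode, and a short dispatch loop executes it. This is not OPy's design. The hand-written recursive descent parser, the `Compiler` class, the `run` function, and the instruction set (whose names loosely echo CPython's opcodes) are all assumptions made for illustration. In a real bootstrapped interpreter, the front end would itself be compiled to bytecode and run by the native loop; this sketch only shows where the boundary falls.

```python
import re


class Compiler:
    """Toy recursive descent parser that emits stack bytecode while parsing.

    Grammar:
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '(' expr ')'
    """

    def __init__(self, src):
        # Integers become one token; every other non-space character is a token.
        self.tokens = re.findall(r"\d+|\S", src)
        self.pos = 0
        self.code = []  # list of (opcode, argument) pairs

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def compile(self):
        self.expr()
        if self.peek() is not None:
            raise SyntaxError("unexpected token %r" % self.peek())
        return self.code

    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            self.term()
            self.code.append(("BINARY_ADD" if op == "+" else "BINARY_SUB", None))

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            self.factor()
            self.code.append(("BINARY_MUL" if op == "*" else "BINARY_FLOORDIV", None))

    def factor(self):
        tok = self.advance()
        if tok is not None and tok.isdigit():
            self.code.append(("LOAD_CONST", int(tok)))
        elif tok == "(":
            self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        else:
            raise SyntaxError("unexpected token %r" % tok)


def run(code):
    """Stack-machine dispatch loop: in a bootstrapped design, the native part."""
    stack = []
    for opcode, arg in code:
        if opcode == "LOAD_CONST":
            stack.append(arg)
            continue
        right = stack.pop()
        left = stack.pop()
        if opcode == "BINARY_ADD":
            stack.append(left + right)
        elif opcode == "BINARY_SUB":
            stack.append(left - right)
        elif opcode == "BINARY_MUL":
            stack.append(left * right)
        elif opcode == "BINARY_FLOORDIV":
            stack.append(left // right)
        else:
            raise ValueError("unknown opcode %r" % opcode)
    return stack.pop()


code = Compiler("2 * (3 + 4) - 5").compile()
print(run(code))  # 9
```

Everything above `run` is ordinary Python that could grow freely; only the loop in `run` has to be ported to native code, which is why keeping it small matters.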
I've done enough experiments that I believe it will work. The key is adapting two pieces of existing code:
pgen2 from lib2to3. It does the same thing as Parser/pgen.c, but it's written in Python.
Python/compile.c, but it's written in Python.
And studying a third piece of code:
This plan lets me avoid writing native code wherever possible. I believe that we can have OSH, Oil, and Python running on just 3,000 to 5,000 lines of well-tested C++.
The working name for this interpreter is OPy. I may change the name when it begins to diverge from Python. Initially, it will be a faithful subset of Python that can run OSH.
I hope this plan doesn't sound crazy. I will explain it in more detail, but I want to write two backward-looking posts first:
I'm not done implementing the OPy interpreter, and I still have to bootstrap it to break the CPython dependency. But I believe it will work. Then I'll be able to write about an OSH that's written in Python, but doesn't depend on CPython.