blog | oilshell.org
Yesterday I listed six reasons that OSH shouldn't run on top of CPython. I had two ideas to break this dependency without rewriting thousands of lines of code:
These are both possible, but they're vague and only address the OSH front end. The third solution I've experimented with is more general:
Does that sound crazy? I mentioned yesterday that Python's core alone is 185K lines of code.
In this post, I'll explain why it's not. In fact, it appears to be easier than any alternative I've thought of.
For background, let's unpack this odd assertion. Bootstrapping Python means writing Python in Python. Although parts of CPython are written in Python, the core is written in C, including all of the following:
But, entirely separately from CPython, Python has been rewritten in Python 1.57 times.
The first 1.0 comes from PyPy, which is a very complete implementation of Python written in Python (including novel and sophisticated JIT technology). We won't be working with PyPy, so let's leave it aside for now. There may be more to say about it later; leave a comment if you're curious.
What accounts for the remaining 0.57? I'm referring to four Python reimplementations of the seven components above:
Parser/tokenizer.c in CPython, but it's written in pure Python.
pgen2 parser generator from lib2to3 (~1800 lines). It does the same thing as Parser/pgen.c, but it's written in pure Python.
Python/compile.c, but it's written in Python. The Design of CPython's Compiler describes this process, and it's largely accurate for compiler2 as well.
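To give a feel for what the front-end pieces do, here is a minimal sketch of tokenizing a line of source from Python itself. It assumes the tokenizer in the list above is the standard library's `tokenize` module; the helper name `token_names` is made up for illustration, and the module's internals may differ in newer CPython releases.

```python
import io
import tokenize


def token_names(source):
    """Return (token type name, token text) pairs for a source string."""
    # generate_tokens() wants a readline-style callable that returns str.
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


pairs = token_names("x = 1 + 2\n")
for name, text in pairs:
    print(name, repr(text))
```

The parser generator and compiler sit downstream of this: they consume the token stream and eventually produce bytecode.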
These components account for a large fraction of the Python interpreter in less than 10K lines of pure Python code! You could say that Python is more compiled than interpreted: there is a lot of C code to transform your source code into bytecode, but less code to actually run the bytecode.
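The compile-then-run split is easy to see from a stock Python 3 interpreter. This is only a sketch using the built-in `compile()` and the `dis` module, not the pure Python components discussed here; the source string and the `<example>` filename are arbitrary.

```python
import dis

source = "total = sum(n * n for n in range(4))\n"

# Front end: source text -> code object.
# Tokenizing, parsing, and bytecode generation all happen inside compile().
code = compile(source, "<example>", "exec")

# The result is a flat sequence of instructions.
# Opcode names vary between Python versions.
instructions = list(dis.get_instructions(code))
for ins in instructions:
    print(ins.opname, ins.argrepr)

# Back end: the interpreter loop executes the code object.
namespace = {}
exec(code, namespace)
print(namespace["total"])  # 14
```

Everything up to the `exec()` call is "compiler" work; only the last step is the interpreter proper.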
I believe I can glue together these four components, write around 5K lines of native code for the remaining pieces, and end up with a Python interpreter that will run OSH and Oil. This isn't trivial, because the components were written at wildly different times and don't work together, but it's possible.
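To illustrate why the "run the bytecode" half can be comparatively small, here is a toy stack-machine loop. It is a sketch under heavy assumptions: the opcode names are borrowed loosely from CPython, the `(opcode, argument)` tuple format and the `run` function are invented for this example, and a real interpreter also needs frames, jumps, function calls, and exceptions.

```python
def run(program, env=None):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    env = {} if env is None else env
    stack = []
    for op, arg in program:
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "LOAD_NAME":
            stack.append(env[arg])
        elif op == "STORE_NAME":
            env[arg] = stack.pop()
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "BINARY_MULTIPLY":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return env


# Hand-assembled equivalent of:  x = 2; y = (x + 3) * 4
program = [
    ("LOAD_CONST", 2), ("STORE_NAME", "x"),
    ("LOAD_NAME", "x"), ("LOAD_CONST", 3), ("BINARY_ADD", None),
    ("LOAD_CONST", 4), ("BINARY_MULTIPLY", None),
    ("STORE_NAME", "y"),
]
env = run(program)
print(env["y"])  # 20
```

The dispatch loop is the part that would have to be written in native code for speed; the stages that produce the instruction list can stay in Python.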
For comparison, tinypy has only 1,801 lines of C code and 2,185 lines of Python code. I've used and modified tinypy, and it's fantastic. However, the style is artificially dense, and it has less functionality than I want. (I've excluded it from the 1.57 times because it's more like a dialect of Python.)
In my head, I'm calling this collection of code OPy. It will initially be a hybrid of Python 2 and 3, but I expect it to quickly diverge. For example, after making good use of ASDL in the word evaluation pipeline, I said that OSH is no longer written in Python. It's written in Python+ASDL.
OPy will reflect this evolution. It could even end up a specialized language for writing languages — a meta-language — rather than a general-purpose language like Python.
This post outlined a feasible solution for the riskiest part of the project.
It's not done, but my experiments give me confidence that it will work. For example, I've run all the OSH unit tests with various combinations of pgen2, compiler2, and byterun.
Tomorrow I will go into detail on the benefits of this solution, and what remains to be done.