Home

OVM will be a Slice of the CPython VM

2017-04-24

The last post was a status update on the OPy front end. Today's post addresses the runtime, which I call OVM. It's the virtual machine that OSH and Oil will run on.

Recap

The OSH interpreter is currently ~12K lines of Python. But I listed six reasons why it shouldn't be a plain Python program.

I wrote about seven components in Cobbling Together a Python Interpreter, describing a plan to reuse them economically.

Essentially, I would get a self-hosted lexer, parser, and compiler "for free" by reusing around 8K - 9K lines of Python code. I wrote about these components yesterday.

The other four components would be rewritten from scratch. I was inspired by tinypy and believed I could write these components in 3K - 5K lines of C++. To support this, I reasoned that:

  1. The interpreter loop can take inspiration from byterun, which is only ~1300 lines of Python code.
  2. A shell requires very little from a standard library. The system calls in Shells Use Here Docs to Implement Temp Files account for perhaps half of them.
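For context, the "interpreter loop" in point 1 can be sketched in a few lines: a stack machine dispatching on opcodes, which is what byterun does at larger scale. This is a toy, not byterun's actual code; the opcode names just mirror CPython's.

```python
def run(code):
    """Toy byterun-style dispatch loop over (opcode, argument) pairs."""
    stack = []
    for op, arg in code:
        if op == 'LOAD_CONST':
            stack.append(arg)           # push a constant
        elif op == 'BINARY_ADD':
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)  # pop two, push the sum
        elif op == 'RETURN_VALUE':
            return stack.pop()

# "Bytecode" for the expression 1 + 2:
result = run([('LOAD_CONST', 1), ('LOAD_CONST', 2),
              ('BINARY_ADD', None), ('RETURN_VALUE', None)])
```

The real loop has to handle jumps, frames, and exceptions, which is where most of byterun's ~1300 lines go.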

This plan is probably achievable, but I've decided upon a different direction that is less effort and will let me release OSH sooner. This post will describe:

  1. Using the CPython VM as the basis for OVM.
  2. The experiments I've done to validate this approach.
  3. How it addresses the six problems with the Python interpreter.

(Aside: A few people on Lobsters doubted the original OPy plan, and I concede that they had a point. But they didn't suggest a better solution. Neither rewriting all the code in C/C++ nor shipping OSH as a Python program are good solutions. Leave a comment if you're unclear about this.)

Code

The experiments I've done are available at oil/cpython-slice on Github. Right now, it's a bunch of shell scripts that build a stripped-down CPython VM.

(Shell is good for hacking and prototyping!)

The Six Problems

Before going into detail on the experiments, let's review the problems with using the Python interpreter for a shell:

(1) A shell should be simpler and smaller than Python.

OSH should also be smaller than bash, which is ~150K lines.

Forking the CPython VM allows me to strip out code over time. I've already stripped out the parser, which will be replaced by the OPy front end, similar to how Lua can be shipped without a parser. Rewriting the VM from scratch would also achieve this goal, but it's obviously much more effort.

(2) Shells are required on machines where Python isn't.

This is solved by shipping a stripped-down interpreter as an implementation detail.

(3) The Python interpreter starts slowly.

Python's complicated import mechanism is the main reason for this. Everything related to site.py and sys.path will die in a fire. I'm excited to remove this code, and not just because Oil has no need for it.
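To get a rough sense of how much site initialization costs, you can time the stock interpreter's -S flag (which skips importing site.py) against the default startup. A small sketch using only the standard library; timings will of course vary by machine and cache state:

```python
import subprocess
import sys
import time

def startup_time(extra_args):
    """Time one interpreter start that does nothing and exits."""
    start = time.time()
    subprocess.check_call([sys.executable] + extra_args + ['-c', 'pass'])
    return time.time() - start

default = startup_time([])      # normal startup, runs site.py
no_site = startup_time(['-S'])  # -S skips site.py initialization

print('default: %.4fs  with -S: %.4fs' % (default, no_site))
```

A stripped-down OVM should do even better than -S, since the import machinery itself gets simpler, not just skipped.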

(4) Python 3's handling of unicode is awkward for a shell.

I plan to adopt Go's UTF-8-centric strategy in Oil. If you know of a use case where this doesn't work, leave a comment.
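To make the Go-style strategy concrete, here's a hypothetical sketch (Python 3, operating on bytes; the function name is mine, not Oil's): strings stay as raw UTF-8 bytes, and code points are decoded on demand, only at the few places a shell actually needs them.

```python
def decode_rune(s, i):
    """Decode one code point from UTF-8 bytes s at offset i.

    Returns (code_point, num_bytes). A sketch only; real code would
    also validate continuation bytes and reject overlong encodings.
    """
    b0 = s[i]
    if b0 < 0x80:                    # 1 byte: ASCII
        return b0, 1
    elif b0 >> 5 == 0b110:           # 2-byte sequence
        return ((b0 & 0x1F) << 6) | (s[i+1] & 0x3F), 2
    elif b0 >> 4 == 0b1110:          # 3-byte sequence
        return (((b0 & 0x0F) << 12) | ((s[i+1] & 0x3F) << 6)
                | (s[i+2] & 0x3F)), 3
    elif b0 >> 3 == 0b11110:         # 4-byte sequence
        return (((b0 & 0x07) << 18) | ((s[i+1] & 0x3F) << 12)
                | ((s[i+2] & 0x3F) << 6) | (s[i+3] & 0x3F)), 4
    else:
        raise ValueError('invalid UTF-8 start byte 0x%x' % b0)
```

Most shell operations (splitting on whitespace, matching literal characters) never need to decode at all, which is what makes this strategy cheap.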

(5) Python doesn't handle signals the way a shell needs to.

I just noticed that Py_InitializeEx() has a signals parameter, so this should be straightforward to customize.

(6) The Oil interpreter should be a library.

I won't be exposing the Python-C API to users, so I'm free to eventually restructure the interpreter into a proper library.

Experiments Done

Deferred Topics

I could write a lot more about OPy, but it's best at this stage to concentrate on the concrete experiments I've done.

There's a reason I called this the riskiest part of the project: I keep changing my mind about it!

Here are some potential topics:

Next

Essentially, my strategy is to ship the prototype, which is a good thing. I mentioned in the [first post][] that I wrote ~3K lines of C++ to start OSH, but realized I would never finish at that rate. Bash is ~150K lines of C written over three decades.

Python provides necessary leverage. It has downsides, but by forking the CPython interpreter, I can address all of them.

I believe there are now no remaining obstacles to an OSH 0.1 release — there's just work. I don't know when it will be done, but the next post will lay out the criteria for a release.