The OPy Front End is Working


This is a short status update to get the blog caught up with the code.

In Cobbling Together a Python Interpreter, I talked about integrating three pieces of code:

  1. pgen2
  2. compiler2
  3. byterun

Even though pgen2 and compiler2 were both part of the Python standard library, they were written at different times, by different authors, and never integrated.

compiler2 generates slightly different bytecode than Python's default compiler (Python/compile.c). So even though compiler2 and byterun use the same Python 2.7 bytecode format in theory, they can be incompatible in practice. And in fact I found an incompatibility related to Python issue 19611.
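To make "slightly different bytecode" concrete, here is a minimal sketch of how two code objects can be compared instruction by instruction. It is an illustration only, not code from this project: it assumes a modern CPython 3 and uses the built-in compile() for both inputs, since compiler2 only exists for Python 2. The helper names opcode_names and first_difference are made up for this example.

```python
import dis


def opcode_names(code):
    """Return the instruction names of a code object, in order."""
    return [ins.opname for ins in dis.get_instructions(code)]


def first_difference(code_a, code_b):
    """Return the index of the first differing instruction, or None."""
    names_a = opcode_names(code_a)
    names_b = opcode_names(code_b)
    for i, (a, b) in enumerate(zip(names_a, names_b)):
        if a != b:
            return i
    if len(names_a) != len(names_b):
        # One sequence is a strict prefix of the other.
        return min(len(names_a), len(names_b))
    return None


# Stand-ins for "the same source compiled by two different compilers".
same_1 = compile("x = y", "<a>", "exec")
same_2 = compile("x = y", "<b>", "exec")
other = compile("x = [y]", "<c>", "exec")

print(first_difference(same_1, same_2))
print(first_difference(same_1, other))
```

Comparing only opcode names is deliberately coarse. A real comparison would also look at arguments, constants, and names, which is where subtle incompatibilities tend to hide.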

It took me 2-3 weeks to make the 3 components work together. There were multiple attempts and multiple tricky bugs. But this is a good thing because I gained familiarity with ~8,000 lines of code, rather than just blindly copying it.

I've noticed that this roughly recapitulates the work done by PyPy. I may write more about that later.

Implementation Details

Here is an edited commit log with technical notes. If you're not familiar with Python internals, you may want to skip this part.

First Attempt

(1) ad004f56 — Pristine copy of pgen2 from lib2to3.

(2) 7274be5b — Pristine copy of compiler2 from Python 2.7.

(3) 31c63a72 — This pure Python compiler can now compile print("Hello World")!

Lots of details in this commit description, including Python 3 issues. I ported Oil to Python 3 for type checking, but the compiler2 package is written in Python 2. Python's bytecode is an implementation detail that changes with each version.
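One way to see that bytecode is tied to the interpreter version: every CPython build stamps its .pyc files with a magic number. The sketch below assumes CPython 3.4 or later, where importlib.util.MAGIC_NUMBER is available; the function name bytecode_tag is hypothetical.

```python
import importlib.util
import sys


def bytecode_tag():
    """Return (python_version, magic_value) for the running interpreter."""
    magic = importlib.util.MAGIC_NUMBER  # 4 bytes written at the start of .pyc files
    # The first two bytes are a little-endian integer; the last two are b"\r\n".
    value = int.from_bytes(magic[:2], "little")
    return sys.version_info[:2], value


version, magic_value = bytecode_tag()
print("Python %d.%d uses bytecode magic %d" % (version[0], version[1], magic_value))
```

Running this under two different Python versions should print two different magic values, which is why a .pyc file produced for one version cannot be assumed to run on another.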

(4) 92c1039a — Everything in oil/{osh/,core/,asdl/,}*.py compiles with OPy now. Haven't run the code yet.

I was able to parse all my Python source and produce .pyc files. However, I had a lot of problems running the resulting code. The biggest problem was strings vs. bytes, which in retrospect was predictable because of the mix of Python 2 and Python 3 code.

I also spent a long time debugging a problem with non-deterministic bytecode generation. I suspect it's a bug in Python 2.7, but the details are too exhausting to enumerate here.
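For readers who want to check for this kind of problem themselves, here is a rough sketch of a determinism test: compile the same source several times and compare a fingerprint of the result. It is not the method used for this project and says nothing about the actual cause of the bug. It assumes CPython 3, and the names fingerprint and compiles_deterministically are invented for the example.

```python
import types


def fingerprint(code):
    """Reduce a code object to a hashable value that ignores memory addresses."""
    consts = tuple(
        fingerprint(c) if isinstance(c, types.CodeType) else repr(c)
        for c in code.co_consts
    )
    return (code.co_code, code.co_names, code.co_varnames, consts)


def compiles_deterministically(source, runs=5):
    """Return True if repeated compilation yields the same fingerprint."""
    prints = {
        fingerprint(compile(source, "<sketch>", "exec")) for _ in range(runs)
    }
    return len(prints) == 1


SOURCE = "def f(a, b):\n    return [a + i for i in range(b)]\n"
print(compiles_deterministically(SOURCE))
```

The fingerprint recurses into nested code objects because their default repr includes a memory address, which would make every run look different even when the bytecode is identical.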

Second Attempt

I decided to take a more conservative approach. Because the Python 3 type checking experiment failed, and Python 3's approach to unicode is awkward for a shell, I ported Oil back to Python 2. This was easy because I wasn't using anything that Python 3 offered.

(1) 6da29820 — Start again with another copy of compiler2.

(2) aa082c2b — Isolated and fixed a buggy interaction between compiler2 and byterun. This took a while to figure out.

(3) 2b331d92 — Able to compile OSH with compiler2 and run it under byterun.

(4) 9e3bd704 — Full OPyPy chain is now working! In other words, I reapplied the glue between pgen2 and compiler2, but this time under Python 2 rather than Python 3.

(5) 62cd4928 — Fix a bug where CPython would execute code instead of byterun. Oops, I knew that OSH under OPy was running way too fast! byterun was falling back to CPython in the majority of cases.

Bugs Filed


After all this, I can compile and run OSH code under what I call OPy. This includes unit tests, spec tests, and the interactive shell.

Sometimes I called it OPyPy, because it was being doubly-interpreted by byterun running on the CPython VM.

But running OSH under byterun was only an experiment. It helped me understand the Python VM. Tomorrow I will talk about the actual OPy VM.

SPOILER: I'm not writing my own VM, at least for now.