
Building Oil with the OPy Bytecode Compiler


In my first blog post, I explained why Oil is written in Python: so I have a chance of getting it done! I want to implement not just the bash-compatible OSH dialect, but also the Oil language, and that's a lot of work.

Bash alone is ~160K lines of C code, while OSH is ~16K lines of Python as of the last release, which is nearly feature-complete.

Of course, there's a problem: Python is slower than C, and I wrote benchmarks to show that it matters. For example, the OSH parser is 40-50 times slower than the bash parser, even after some optimization.

So I'm now working on making it even faster and smaller. My plan involves OPy, a Python bytecode compiler written in Python.

This post shows what I've done with OPy, recaps what I wrote about it last year, and maps out future work. If you've implemented a VM, and especially if you've modified CPython, I'd love your feedback in the comments.

Table of Contents
Release 0.5.alpha2
Benchmarks and Metrics
How Big Is OPy?
Recap of Last Year's Work
April 2017
May 2017
June 2017
Why Do It This Way?
Recent Progress
Future Work
Appendix: FAQs About Python
Why Python 2?
Why not use PyPy?
Why not use Cython?

Release 0.5.alpha2

I've released Oil 0.5.alpha2, which you can download here:

It has the same features as OSH 0.4, but its bytecode is built with OPy.

Benchmarks and Metrics

OPy generates slightly different bytecode than CPython, but it appears that OSH is unaffected. For example, these benchmark results are roughly the same, at 6-7 lines/sec on a slow machine and 13-14 lines/sec on a fast machine:

(The 0.5.alpha1 release is built with the CPython bytecode compiler, like all prior releases.)

However, the bytecode is larger:

I'm not sure why this is, but I'll look into it as I optimize for both size and speed.

How Big Is OPy?

oil/opy$ ./ all

  [ ... snip ... ]
  579 pgen2/
 2574 total

  [ ... snip ... ]
   410 compiler2/
   764 compiler2/
  1547 compiler2/
  1578 compiler2/
  4909 total

OPy is around 8,000 lines of Python code, which I consider small and malleable. This is why I think it's feasible to fork Python and optimize Oil.

Note that ~16K lines of Oil code and ~8K lines of OPy code is still a lot less than the ~160K lines of C code in bash.

Recap of Last Year's Work

Before explaining how I made this work, let's review what I wrote about OPy last year.

April 2017

(A) The Riskiest Part of the Project. I listed six reasons why a shell shouldn't be a Python program:

  1. The size and complexity of the interpreter.
  2. The extra dependency, which is especially undesirable on embedded systems.
  3. Startup time.
  4. Unicode (in Python 3).
  5. Issues with signal handling.
  6. Using Oil as a library from C programs.

Two more reasons:

  1. I/O buffering issues as mentioned here.
  2. Significantly slower parsing and execution of shell.

In addition to the fact that Python programs inherently allocate often, Python's garbage collector isn't "fork-friendly". Objects that are read-only at the Python level are mutated at the C level, in order to update their reference counts. This inhibits virtual memory page sharing. Ruby addressed this issue in 2012. It might not matter for some Python programs, but it matters for a shell.
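You can see this refcount mutation from Python itself. This is a sketch: `sys.getrefcount` reports the C-level counter (plus one for its own argument), and merely referencing an object changes it:

```python
import sys

x = object()
before = sys.getrefcount(x)

# Merely storing x in a list bumps its C-level reference count,
# mutating the memory that holds the otherwise "read-only" object.
holder = [x]
after = sys.getrefcount(x)

assert after == before + 1
```

After a fork(), touching these counters dirties the pages that hold the objects, so copy-on-write sharing is lost even for data the child process only reads.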

(B) Cobbling Together a Python Interpreter. I describe the components of a Python front end in Python:

  1. tokenize, a regex-based lexer from the standard library.
  2. Guido's pgen2 parser generator, written circa 2006 for the 2to3 conversion tool.
  3. compiler2, a bytecode compiler that was removed from the standard library as of Python 3.
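For a taste of the first component, here's the stdlib lexer in action. (A sketch using the Python 3 spelling of the API; OPy uses the Python 2 version.)

```python
import io
import tokenize

src = u'x = 1 + 2\n'

# generate_tokens() is the pure-Python, regex-based lexer from the
# standard library.  It yields tokens with a type and a string.
tokens = [(tokenize.tok_name[tok.type], tok.string)
          for tok in tokenize.generate_tokens(io.StringIO(src).readline)]

assert tokens[0] == ('NAME', 'x')
assert ('NUMBER', '2') in tokens
```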

(C) The OPy Front End is Working. I describe a couple of attempts to make these components work together. I abandoned Python 3 and ported Oil back to Python 2.

(D) OVM will be a Slice of the CPython VM. Rather than writing a small C or C++ VM to complement this front end, I decide to hack off a chunk of the Python interpreter and call it "OVM". This shortcut let me make the first release back in July.

May 2017

(E) Rewriting Python's Build System From Scratch. Oil release binaries have two parts:

  1. Native code: ~135K lines of the CPython VM, and Oil's own C code.
  2. Architecture-independent bytecode. I now create this from .py source code with the OPy bytecode compiler, rather than CPython's built-in compiler.
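The second part is essentially what CPython's built-in `compile()` produces, serialized with `marshal`. This sketch shows the pipeline; OPy replaces the first step:

```python
import marshal

# Compile source text to a code object ahead of time...
code = compile('answer = 40 + 2\n', '<example>', 'exec')

# ...serialize it (marshal is the format used inside .pyc files)...
data = marshal.dumps(code)

# ...and execute the architecture-independent bytecode later.
ns = {}
exec(marshal.loads(data), ns)
assert ns['answer'] == 42
```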

June 2017

(F) How I Use Tests: Transforming OSH. In summary, the idea is to:

Also, it technically doesn't matter how fast the OPy compiler runs. I compile bytecode ahead of time rather than on-demand. This opens up more space for optimization.

(For those curious about details, the two appendices in this post may be interesting.)

Why Do It This Way?

Admittedly, this strategy is odd. I don't know of any other programs that were almost unusably slow in their original implementation, and only sped up by writing a new compiler.

I was recently asked how I consistently get things done, and my answer may shed some light on this. Part of it was:

  1. Use Python. Python lets me explore new problems quickly. If there were a C++ compiler in my edit-run cycle, many corners of the shell language would remain unexplored.

    Being able to mold the language with metaprogramming was another unexpected benefit. I learned OCaml specifically to write compilers and interpreters, but I decided not to use it for Oil. In retrospect, I suspect this was a good decision. (We'll know more once I get further into OPy!)

  2. Don't get stuck. I've made continuous progress for nearly two years, and this strategy of incrementally optimizing Oil also reduces the likelihood of getting stuck.

    I'll also add: don't go backward. With tests, I have confidence making big changes, like completely changing the bytecode compiler. I know that the OPy compiler works because the spec tests for 0.5.alpha2 did not regress. The bottom of the page records the version:

$ _tmp/oil-tar-test/oil-0.5.alpha2/_bin/osh --version
Oil version 0.5.alpha2
Release Date: 2018-03-02 02:13:34+00:00

So that's the reasoning. I'll also admit that I'd like to prove a point about high level languages vs. gobs of C++.

That said, I was honestly surprised by how slow the initial version turned out to be. Python is not a good language for writing efficient parsers, but perhaps OPy will be.

Recent Progress

I had already done most of the work last year, and the main things I did in the last few weeks were:

I noted some differences between OPy and Python in the OPY

Future Work

I have several dozen ideas for OPy. They fall roughly into these categories:

These changes will lead to changes to OVM. For example, ASDL data structures can be represented more efficiently in memory. Unlike Python data types, ASDL types are statically declared.
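Here's a sketch of why static declarations help, using `__slots__` as a stand-in. The token classes are hypothetical, not Oil's actual ASDL output:

```python
# A normal Python instance carries a per-object __dict__,
# because fields can be added at any time.
class DynamicToken(object):
    def __init__(self, id, val):
        self.id = id
        self.val = val

# An ASDL-style record has a fixed, statically declared field list,
# which maps naturally onto __slots__ (and, eventually, a C struct).
class StaticToken(object):
    __slots__ = ('id', 'val')

    def __init__(self, id, val):
        self.id = id
        self.val = val

s = StaticToken(1, 'x')
assert not hasattr(s, '__dict__')  # no dict, so less memory per node
```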


I released a version of Oil built with OPy, and showed benchmarks and metrics. Then I recapped what I wrote about OPy last year, and described recent progress.

It might take a long time to optimize Oil, but I have no doubt I'll learn a lot in the process. And I won't wait until it's fully optimized to release "carrots".

Appendix: FAQs About Python

I've been asked these questions when I've written about OPy in the past.

Why Python 2?

Because I'm taking ownership of the code, Python 2 vs. Python 3 isn't a meaningful question from the user's point of view.

For those curious about the development process, Oil started off in Python 2, was ported to Python 3, then back to Python 2. (It was easy both times.)

Python 3 emphasizes Unicode strings, but in a shell, you almost never know what the encoding of a string is. File system paths, argv, getenv(), stdin, etc. are all bytes in Unix.

The bytes can of course be UTF-8-encoded. UTF-8 was designed to work with many existing C functions like strstr(), rather than separate Unicode versions.
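For example, a UTF-8 byte string can be searched with plain byte operations, because no character's encoding appears as a substring of another character's encoding. (A sketch; the path is made up.)

```python
# 'café' encodes to 5 bytes in UTF-8: c a f 0xC3 0xA9
path = u'caf\u00e9/menu.txt'.encode('utf-8')
needle = u'caf\u00e9'.encode('utf-8')

# Byte-oriented search, like C's strstr(), finds the right offsets
# without decoding.
assert path.find(needle) == 0
assert path.find(b'/') == 5
```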

This blog post discusses the issue of internal string encoding. It notes that Perl, Ruby, Go, and Rust use UTF-8 internally. Oil will follow that example, rather than the example of Python and bash, which use fixed-width character representations internally.

This comment explains why manipulating UTF-8 text in memory is awkward with Python 3.

The other issues with Python 2 were:

Why not use PyPy?

I wasn't excited about PyPy, but I tried it anyway. OSH under PyPy is slower than OSH under CPython, not faster.

JIT speedups depend on the workload. My understanding is that string-heavy workloads are dominated by allocation, which the JIT doesn't touch. Even when it's faster, PyPy uses more memory than CPython, which is not a good tradeoff for a shell. My goal is for OPy to use less memory than CPython.

In summary, PyPy optimizes unmodified Python programs, which is very hard. In contrast, OPy is optimizing just the subset of the language that Oil (and OPy itself) use. I'm also free to change the semantics of the language, e.g. make it more static.

Implementation trivia: OPy started from the same place that PyPy did. PyPy is also based on tokenize, pgen2, and compiler2. Writing a Python front end is a lot of work, so it's best to reuse existing code.

Why not use Cython?

I didn't try Cython, but I don't see any evidence that it speeds up string-based workloads. I believe it also has the tradeoff of bloating the executable (which likely increases memory usage).