blog | oilshell.org
In the very first post on Oil, I explained why Oil is written in Python: to have a chance of getting it done! I want to implement not just the bash-compatible OSH dialect, but also the Oil language, and that's a lot of work.
Bash alone is ~160K lines of C code, while OSH is ~16K lines of Python as of the last release. (When all's said and done, it might turn out to be a 5-7x ratio rather than 10x, but that's still huge.)
Of course, there's a problem: Python is slower than C, and I wrote benchmarks to show that it matters. For example, the OSH parser is 40-50 times slower than the bash parser, even after some optimization.
So I'm now working on making it even faster and smaller. My plan involves OPy, a Python bytecode compiler written in Python.
This post shows what I've done with OPy, recaps what I wrote about it last year, and maps out future work. If you've implemented a VM, and especially if you've modified CPython, I'd love your feedback in the comments.
I've released Oil 0.5.alpha2, which you can download here:
It has the same features as OSH 0.4, but its bytecode is built with OPy rather than CPython.
OPy generates slightly different bytecode, but it appears that OSH is unaffected. The unit tests and spec tests pass, and these benchmark results are roughly the same:
(That is, 6-7 lines/ms on a slow machine and 13-14 lines/ms on a fast machine.)
However, the bytecode is larger:
I'm not sure why this is, but I'll look into it as I optimize for both size and speed.
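One rough way to compare bytecode size is sketched below. It uses CPython's built-in `compile()` and the standard `marshal` module on Python 3, not OPy, and the helper `bytecode_size` is a hypothetical name made up for this illustration. It counts the instruction bytes of a code object plus everything nested inside it, and compares that with the serialized size.

```python
import marshal


def bytecode_size(code):
    """Sum the instruction bytes of a code object and all code nested in it.

    Hypothetical helper for illustration; not part of OPy or Oil.
    """
    total = len(code.co_code)
    for const in code.co_consts:
        # Nested functions and classes appear as code objects in co_consts.
        if hasattr(const, "co_code"):
            total += bytecode_size(const)
    return total


source = "def f(x):\n    return x + 1\n"
module_code = compile(source, "<example>", "exec")

instruction_bytes = bytecode_size(module_code)
# The serialized form also carries names, constants, and line tables.
serialized_bytes = len(marshal.dumps(module_code))

print(instruction_bytes, serialized_bytes)
```

Running the same measurement over the output of two compilers would show whether the extra size comes from the instructions themselves or from the metadata around them.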
oil/opy$ ./count.sh all
LEXER, PARSER GENERATOR, AND GRAMMR
... snip ...
  579 pgen2/tokenize.py
  827 pytree.py
 2574 total
COMPILER2
... snip ...
  410 compiler2/symbols.py
  764 compiler2/pyassem.py
 1547 compiler2/pycodegen.py
 1578 compiler2/transformer.py
 4909 total
It's around 8,000 lines of Python code, which I consider small and malleable. This is why I believe it's feasible to optimize Oil by forking the Python language.
Note that ~16K lines of Oil and ~8K lines of OPy is still a lot less than the ~160K lines of C code in bash.
Before explaining how I made this work, let's review what I wrote about OPy last year.
(A) The Riskiest Part of the Project. I listed six reasons why a shell shouldn't be a Python program:
Two more reasons:
In addition to the fact that Python programs allocate memory frequently, Python's garbage collector isn't "fork-friendly". Objects that are read-only at the Python level are mutated at the C level, in order to update their reference counts. This inhibits virtual memory page sharing. Ruby addressed this issue in 2012.
It might not matter for some Python programs, but it matters for a shell.
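The reference count issue can be seen from Python itself. This is a minimal sketch assuming CPython on Python 3; the variable names are made up for the example. Merely taking another reference to an object that is never modified at the Python level still changes a counter stored inside the object, which is a write to its memory page.

```python
import sys

# Build the tuple at runtime so it is an ordinary heap object.
data = tuple(str(i) for i in range(3))

before = sys.getrefcount(data)

# No Python-level mutation of `data` happens here, but storing another
# reference makes CPython increment the count kept inside the object.
holders = []
holders.append(data)

after = sys.getrefcount(data)
print(before, after)
```

After a `fork()`, a write like this forces the kernel to copy the page that holds the object, even though the program only "read" it.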
(B) Cobbling Together a Python Interpreter. I describe the components of a Python front end in Python:
(C) The OPy Front End is Working. I describe my attempts to make these components work together. I abandoned Python 3 and ported Oil back to Python 2.
(D) OVM will be a Slice of the CPython VM. Rather than writing a small C or C++ VM to complement this front end, I decide to hack off a chunk of the Python interpreter and call it "OVM". This shortcut let me make the first release back in July.
(E) Rewriting Python's Build System From Scratch. Oil release binaries have two parts:
(F) How I Use Tests: Transforming OSH. In summary, the idea is to:
Also, it doesn't really matter how fast the OPy compiler runs, since I compile bytecode ahead of time rather than on-demand. This gives more room for optimization.
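To make "ahead of time" concrete, here is a minimal sketch using CPython's built-in `compile()` and the standard `marshal` module on Python 3. It is not OPy's build process, and the source string and names are made up for the example. The compile step runs once, and only the serialized bytes need to be shipped and loaded later.

```python
import marshal

# Build time: compile source to a code object and serialize it.
source = "def add(a, b):\n    return a + b\n"
code = compile(source, "<example>", "exec")
blob = marshal.dumps(code)  # these bytes could be stored in a release archive

# Run time: no parser or compiler needed, just deserialize and execute.
loaded = marshal.loads(blob)
namespace = {}
exec(loaded, namespace)

result = namespace["add"](2, 3)
print(result)
```

Because the compiler only runs at build time, it can afford slow optimization passes without affecting startup.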
(For those curious about details, the two appendices in this post may be interesting.)
Admittedly, this strategy is odd. I don't know of any other programs that were almost unusably slow in their original implementation, then sped up by writing a new compiler.
I was recently asked how I consistently get things done, and my answer may shed some light on this. Part of it was:
Use Python. Python lets me explore new problems quickly. If there were a C++ compiler in my edit-run cycle, many corners of the shell language would remain unexplored.
Being able to mold the language with metaprogramming was another unexpected benefit. I learned OCaml specifically to write compilers and interpreters, but I decided not to use it for Oil. In retrospect, I suspect this was a good decision. (We'll know more once I get further into OPy!)
Don't get stuck. I've made continuous progress for nearly two years, and this strategy of incrementally optimizing Oil also reduces the likelihood of getting stuck.
I'll also add: don't go backward. With tests, I have confidence making big changes, like completely changing the bytecode compiler. I know that the OPy compiler works because the spec tests for 0.5.alpha2 did not regress. The bottom of the page records the version I ran the tests with:
$ _tmp/oil-tar-test/oil-0.5.alpha2/_bin/osh --version
Oil version 0.5.alpha2
Release Date: 2018-03-02 02:13:34+00:00
...
Bytecode: bytecode-opy.zip
I'll also admit that I'd like to prove a point about high-level languages vs. gobs of C and C++, though I was honestly surprised by how slow the initial version turned out to be. Python is not a good language for writing efficient parsers, but perhaps OPy will be.
I had already done most of the work last year, so all I had to do in the last few weeks was:
2to3 --fix print on a few files.
The OPy README.md records some minor differences between OPy and Python.
I have many ideas for OPy, which fall in these categories:
These changes will lead to changes to OVM. For example, ASDL data structures can be represented more efficiently in memory. Unlike Python data types, ASDL types are statically declared.
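As a rough analogy in plain Python 3 (not how OVM or ASDL works today), `__slots__` shows what static declaration buys. The class and field names below are hypothetical. Once the fields are declared up front, instances no longer need a per-object dictionary.

```python
class DynamicToken:
    """Ordinary class: each instance carries a __dict__ for its attributes."""

    def __init__(self, id, val):
        self.id = id
        self.val = val


class StaticToken:
    """Fields declared up front, so instances have fixed storage and no __dict__."""

    __slots__ = ("id", "val")

    def __init__(self, id, val):
        self.id = id
        self.val = val


dyn = DynamicToken(1, "echo")
static = StaticToken(1, "echo")

has_dict_dynamic = hasattr(dyn, "__dict__")
has_dict_static = hasattr(static, "__dict__")

# Undeclared fields are rejected, which is what makes the layout fixed.
try:
    static.extra = 1
    rejected = False
except AttributeError:
    rejected = True

print(has_dict_dynamic, has_dict_static, rejected)
```

A VM that knows every field in advance could go further than this, for example by packing fields without boxing them.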
I released a version of Oil built with OPy, showed benchmarks and metrics, recapped previous posts on OPy, and described recent progress.
It might take a long time to optimize Oil, but I have no doubt I'll learn a lot in the process.
Oil also doesn't need to be fully optimized before adding useful features. I called this release 0.5.alpha2 instead of 0.5, because I hope that 0.5 will be the first release with a feature that bash doesn't have.
I've been asked these questions when I've written about OPy in the past.
Because I'm taking ownership of the code, Python 2 vs. Python 3 isn't a meaningful question from the user's point of view. It's an implementation detail.
For the curious, Oil started off in Python 2, was ported to Python 3, then back to Python 2. (Both ports were easy.)
Python 3 emphasizes Unicode strings, but in a shell, you almost never know what the encoding of a string is. File system paths, etc. are all bytes in Unix.
The bytes can of course be UTF-8-encoded. UTF-8 was designed to work with existing C functions like strstr(), rather than requiring Unicode variants of them.
This blog post discusses the issue of internal string encoding. It notes that Perl, Ruby, Go, and Rust use UTF-8 internally. Oil will follow that example, rather than the example of Python and bash, which used fixed-width multibyte characters.
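The strstr() property can be demonstrated with Python 3 `bytes`, whose `find()` method is a plain byte search. This is a small sketch with made-up example strings. Searching UTF-8 bytes for a UTF-8 needle finds the right place without decoding anything. Only the unit of the offset differs, bytes rather than code points.

```python
haystack = "naïve café".encode("utf-8")
needle = "café".encode("utf-8")

# Byte-level substring search, analogous to strstr() in C.
byte_offset = haystack.find(needle)

# Decode the prefix only if a code point offset is actually needed.
char_offset = len(haystack[:byte_offset].decode("utf-8"))

print(byte_offset, char_offset)  # prints "7 6": "ï" takes two bytes in UTF-8
```

This works because UTF-8 is self-synchronizing: a valid encoded needle can only match a valid haystack at a character boundary.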
This comment explains why manipulating UTF-8 text in memory is awkward with Python 3.
The other issues with Python 2 were:
I wasn't excited about PyPy, but I tried it anyway. OSH under PyPy is slower than OSH under CPython, not faster.
JIT speedups depend on the workload. My understanding is that string-heavy workloads are dominated by allocation, and the JIT can't do much about that. Even when it's faster, PyPy uses more memory than CPython, which is not a good tradeoff for a shell. A shell should use less memory than CPython or PyPy.
PyPy optimizes unmodified Python programs, which is very hard. In contrast, OPy is optimizing just the subset of the language that Oil and OPy itself use. I'm also free to change the semantics of the language, e.g. make it more static.
Implementation trivia: OPy started from the same place that PyPy did. PyPy is also based on tokenize, pgen2, and compiler2. Takeaway: writing a Python front end is a lot of work, so it's best to reuse existing code.
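For a feel of the first stage of such a front end, here is a small sketch using the `tokenize` module from the Python 3 standard library. OPy carries its own Python 2-era copy, so treat this only as an illustration of what a tokenizer hands to the parser. The input string is made up.

```python
import io
import tokenize

source = "x = 1 + 2\n"

# generate_tokens() takes a readline callable and yields TokenInfo tuples.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for name, text in tokens:
    print(name, repr(text))
```

A parser generator like pgen2 consumes this token stream and builds a parse tree, which the compiler then turns into bytecode.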
November 2018 Update: A more detailed answer on lobste.rs.
I answered this on lobste.rs. I mentioned MicroPython in this June 2017 post.
I didn't try Cython, but I also don't see any evidence that it speeds up string-based workloads. I believe it also has the tradeoff of bloating the executable (which likely increases memory usage).