OPy, Part Two


What Works Now


pgen2 was apparently written for Python

I've done enough experiments that I believe it will work. The key is adapting two pieces of existing code:

And studying a third piece of code:

This plan allows me to write as little native code as possible. I believe that we can have OSH, Oil, and Python running on just 3,000 - 5,000 lines of well-tested C++.

The working name for this interpreter is OPy. I may change the name when it begins to diverge from Python. Initially, it will be a faithful subset of Python which can run OSH.

What works?

What does not work:


Compiled languages like Java, ML, Scala, and Haskell often have bootstrapped front ends. That is, the front ends for Java compilers are written in Java, OCaml is written in OCaml, etc.

But the front ends for languages like Python, Ruby, JavaScript, and R are typically written in C. Perl 6 and tinypy are exceptions. I used some tinypy code for my [pratt-parsing-demo][], but I omitted it from the 1.571 count because it's more like a dialect of Python, and it's no longer active.

What Doesn't Work

Interestingly, these four Python components were all written separately, and have to be glued together:

Remaining Risk

Will I want to expose this Python dialect to users or not? Or should only Oil be exposed to users?

I think the biggest issue is classes. Should Oil have classes? In some ways it feels wrong for a shell to have classes.

Blog Roadmap

I hope this plan doesn't sound crazy. I will explain it in more detail, but I want to write two backward-looking posts first:

Future Posts:

I'm not done implementing the OPy interpreter, and I still have to bootstrap it to break the [CPython][cpython] dependency. But I believe it will work. Then I'll be able to write about an OSH that's written in Python, but doesn't depend on CPython.

Addendum: Code Size

Python was essential for getting this project off the ground, but I don't want to depend on it forever.

However, I've gone back and forth in my mind about different ways to achieve this. I want to avoid manually rewriting OSH in a different language.

OSH is currently around 12K lines of Python. It's stayed surprisingly constant in size while gaining new features.

  1. Code Size. How do I avoid writing 10K or 100K lines of C?

    For comparison, bash, zsh, mksh, and dash weigh in at 150K, 140K, 30K, and 20K lines respectively. I've worked with people who can write 30K lines of C++ code in a single stretch, by themselves, but I'm not one of those people.

    Even if I were, OSH is less than half the project. I still have to write a front end for Oil and add [awk and make][awk-make] functionality. It also makes sense to implement tools with little languages like find and sed. GNU make and awk clock in at about 40K lines and 80K lines, pushing the GNU total to around 270K lines.

    Writing Oil in Oil would result in less code, since Oil is a higher level language than C.

Why do it like this?

Code Size. 12K lines for OSH, 10K lines of OPy for OPy itself, and 3K-5K lines of C++ underneath OPy. That's roughly 10x less code.
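The arithmetic behind that estimate can be checked with a rough tally. This is only a sketch: the GNU figures are the ones quoted earlier in this post, and it assumes the last entry is the native C++ code, taken at the upper end of the 3K-5K range. The dictionary names and the `total_kloc` helper are made up for illustration.

```python
# Line counts in thousands of lines (K), as quoted in this post.
GNU_KLOC = {"bash": 150, "make": 40, "awk": 80}

# Assumption: the third entry is the native C++ code, and we take
# the upper end (5K) of the 3K-5K estimate.
OPY_PLAN_KLOC = {"OSH": 12, "OPy": 10, "native C++": 5}


def total_kloc(components):
    """Sum a mapping of component name -> thousands of lines."""
    return sum(components.values())


gnu_total = total_kloc(GNU_KLOC)
plan_total = total_kloc(OPY_PLAN_KLOC)
ratio = gnu_total / plan_total

print(f"GNU total: {gnu_total}K lines")
print(f"OPy plan:  {plan_total}K lines")
print(f"Ratio:     {ratio:.1f}x")
```

Under those assumptions the totals are 270K versus 27K lines, which is where a figure of about 10x would come from. Using the lower 3K estimate for the C++ code nudges the ratio slightly higher.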

Same reason as Oil: add features quickly to the shell.

- I want to add features to Python, and remove some:
  - complex numbers
  - Python 3 unicode handling is not appropriate