pgen2 was apparently written for Python
I've done enough experiments that I believe it will work. The key is adapting two pieces of existing code:
And studying a third piece of code:
This plan lets me write as little native code as possible. I believe that we can have OSH, Oil, and Python running on just 3,000 - 5,000 lines of well-tested C++.
The working name for this interpreter is OPy. I may change the name when it begins to diverge from Python. Initially, it will be a faithful subset of Python which can run OSH.
What works?
What does not work:
Compiled languages like Java, ML, Scala, and Haskell often have bootstrapped front ends. That is, the front ends for Java compilers are written in Java, OCaml is written in OCaml, etc.
But the front ends for languages like Python, Ruby, JavaScript, and R are typically written in C. Perl 6 and tinypy are exceptions. I used some tinypy code for my [pratt-parsing-demo][], but I omitted it from the 1.571 count because it's more like a dialect of Python, and it's no longer active.
Interestingly, these four Python components were all written separately, and have to be glued together:
pytree.py),
which is not what [compiler2][] expects (wrapped C data structures from the
parser module).

Will I want to expose this Python dialect to users or not? Or should only Oil be exposed to users?
I think the biggest issue is classes. Should Oil have classes? In some ways it feels wrong for a shell to have classes.
I hope this plan doesn't sound crazy. I will explain it in more detail, but I want to write two backward-looking posts first:
Future Posts:
I'm not done implementing the OPy interpreter, and I still have to bootstrap it to break the [CPython][cpython] dependency. But I believe it will work. Then I'll be able to write about an OSH that's written in Python, but doesn't depend on CPython.
Python was essential for getting this project off the ground, but I don't want to depend on it forever.
However, I've gone back and forth in my mind about different ways to achieve this. I want to avoid manually rewriting OSH in a different language.
OSH is currently around 12K lines of Python. It's stayed surprisingly constant in size while gaining new features.
Code Size. How do I avoid writing 10K or 100K lines of C?
For comparison, bash, zsh, mksh, and dash weigh in at 150K, 140K, 30K, and 20K lines respectively. I've worked with people who can write 30K lines of C++ code in a single stretch, by themselves, but I'm not one of those people.
Even if I were, OSH is less than half the project. I still
have to write a front end for Oil and add [awk and
make][awk-make] functionality. It also makes sense to implement tools with
little languages like find and sed. GNU make and awk clock in at about
40K lines and 80K lines, pushing the GNU total to around 270K lines.
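The arithmetic behind that total can be checked with a short sketch. It assumes the "GNU total" means bash plus GNU make and awk, which is my reading of the figures above; the variable names are only illustrative.

```python
# Approximate line counts in thousands, taken from the figures quoted above.
shell_kloc = {"bash": 150, "zsh": 140, "mksh": 30, "dash": 20}
gnu_tool_kloc = {"make": 40, "awk": 80}

# Assumption: the "GNU total" is bash plus GNU make and awk.
gnu_total = shell_kloc["bash"] + sum(gnu_tool_kloc.values())

osh_kloc = 12  # OSH today, written in Python

print(f"GNU bash + make + awk: ~{gnu_total}K lines of C")
print(f"OSH so far: ~{osh_kloc}K lines of Python")
```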
Writing Oil in Oil would result in less code, since Oil is a higher-level language than C.
Why do it like this?
Code Size. 12K lines for OSH, 10K lines of OPy for OPy, and 3K-5K lines of C++ for OPy. 10x less code.
Same reason as Oil: add features quickly to the shell.

- I want to add features to Python, and remove some:
  - complex numbers
  - Python 3 unicode handling is not appropriate
Global understanding. I still want to be able to make aggressive, global changes to the code. I'm not sure that is possible with CPython.
Keeping the OSH code in a high level language