Instead of outlining future blog posts as promised last time, I
worked on implementing the two top priorities: printing the AST
oil, and executing the AST in Python, aided by spec test
I also decided to start the third use case for the AST: serializing it to a
binary format for execution in C++. In other words, I will build bridge to
the C++ code I wrote as the first pass of the
These new use cases for the AST will change it drastically. Prior to this project, I had a cartoonish view of programming language implementation: isn't there a single, "obvious" AST representation for each language?
There is not; the representation depends heavily on what you need to do with it. In fact, this was the entire motivation for Clang: to develop an AST representation of C and C++ to support source code tools and IDEs, because the representation that GCC was only suited for code generation.
I've been watching a bunch of videos about Clang and its refactoring tools online:
One of the first e-mails I got about oil was from an LLVM developer that made this connection: you could say that oil/osh is to bash as Clang is to GCC. That's not the purpose of the project per se, but there's some truth there.
Turing Award-winner Fred Brooks wrote:
Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.
A more modern version is by Linus Torvalds:
I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
In that light, the AST is the most important part of an interpreter. It's the bridge between parsing and execution.
I'm about halfway through an editing pass over the AST. I reused the same
expression nodes for the arithmetic sublanguage (
$((a + b)) ) and the
boolean sublanguage (
[[ $a -eq $b ]] ), so it's looking more like a
homogeneous AST rather than a heterogeneous AST. There are some other
decisions to make, and I'll write more about them when it's settled down
— that is, when it's closer to supporting the three primary use cases.
However, I can show some code that I expect to be relatively stable.
If the AST backbone of the intepreter, the file core/id_kind.py is the backbone of the AST.
_AddBoolKinds() dynamically generate values
on the enums
Id is for identifying tokens, words, and
Kind is for making coarse-grained parsing decisions in three
recursive descent parsers for three of the sublanguages.
(Pratt Parsing doesn't need the
The takeaway is that you can find every construct that oil implements named
in these ~200 lines. There are 219
Ids organized into 21 values of
This gives a rough idea of how large the shell language is.
As mentioned, I had intended to outline a list of future blog topics, but I think blogging about what I'm actually implementing will give better results.
So I may be in heads-down mode with the AST and its three use cases for the next week or two. If I need a break, I'll surface the list of blog posts, and perhaps talk about two more topics: