
Yeah it worked REALLY well. To the point where it's going to be part of Oil itself.

Idea: meta-meta language. ML is not its own meta-language.

And Oil is going to turn into a compiler, I think. I still have to do ovm.asdl.

High Level Lessons

Lessons:

External Visitor -- functional vs. object oriented

Homogeneous vs Heterogeneous

Look at the TypeScript compiler. This is scary. I'm not saying it's wrong, because clearly people like TypeScript, and Anders Hejlsberg knows what he's doing. It's just t

parser.ts is hand-written! 7K lines?

They also have huge lines.

There is probably something I don't understand here, but I

Homogeneous is less source code, less binary code too.

Clang AST is huge

ASTs are hard?

types.ts is 3863 lines

No ALGORITHMS, just type definitions!

Zephyr ASDL


POINT: ASDL shows that type checking can benefit from metaprogramming!!!!! In Python, it's implemented with metaprogramming.

I'm demanding of the languages I use.

These concepts seem abstruse, but they all serve a practical goal: making it possible to develop a correct shell relatively quickly. Programming is mostly about organizing code: controlling complexity, ensuring correctness, and fitting it all in your head.

They form a braid:

I want oil to support both type checking and metaprogramming, but I think it will lean on the metaprogramming side.

Use cases for metaprogramming:

METAPROGRAMMING IS MORE GENERAL THAN TYPE CHECKING. A type checker is a metaprogram! It's a "predicate" on program: it takes another program as input, and returns "true or false". Is it well-formed according to the rules of the type system?
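To make the "type checker is a predicate" idea concrete, here is a toy metaprogram in Python. It uses the standard `ast` module to parse another program and returns True or False. The single rule it enforces (no `+` between a string literal and a non-string literal) is an assumption chosen for illustration; a real type system has far more rules.

```python
import ast


def is_well_typed(source: str) -> bool:
    """Toy 'type checker': a metaprogram that takes another program
    (Python source text) as input and returns True or False.

    Its only rule, chosen for illustration: a '+' whose operands are
    both literals must not mix a string with a non-string.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):  # visit every node in the program's AST
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            left, right = node.left, node.right
            if isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
                # Ill-typed if exactly one side is a string literal.
                if isinstance(left.value, str) != isinstance(right.value, str):
                    return False
    return True


print(is_well_typed("x = 1 + 2"))      # True
print(is_well_typed("x = 'a' + 'b'"))  # True
print(is_well_typed("x = 'a' + 1"))    # False
```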

Misc ASDL

This work took a while, around 6 weeks, but it unblocks the three top priorities:

  1. Converting shell scripts to oil.

  2. Testing the semantics by executing the shell in Python. The main reason for this is that I don't like iterating in C++. I am doing a lot of work to avoid writing C++ code :)

  3. Writing a production-quality version in C++.

Using ASDL will help with all three.

  1. We still needed source location info. That's now DONE, and ASDL helped with it.

  2. ASDL found a lot of bugs. It really is the core of the interpreter. Dynamic type checking with exhaustive tests is as good as static type checking.

In oil, I want optional typing like Dart.

One important realization that led me to use ASDL:

Originally I thought that I could have a very simple architecture. A "pure front-end replacement".

But for a number of reasons this won't work. oil is going to be a SUPERSET of the semantics of osh.

important point: All ERRORS are handled in the first stage.

ovm is a smaller language. It's a "lowering". It's a VM, but for now it will have a tree structure. It might be a little similar to NQP, although I don't know much about it.
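A minimal sketch of what a "lowering" pass can look like, assuming a hypothetical front-end tree with both `while` and `until` loops, and a smaller target language that only has `While` and `Not`. The node names are illustrative, not ovm's actual schema. Because errors are handled in the first stage, the pass can assume its input is well-formed.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical front-end nodes (not Oil's or ovm's actual schema).
@dataclass
class Command:
    argv: List[str]


@dataclass
class While:
    cond: "Node"
    body: List["Node"]


@dataclass
class Until:  # sugar: 'until c; do b; done' behaves like 'while ! c; do b; done'
    cond: "Node"
    body: List["Node"]


@dataclass
class Not:
    child: "Node"


Node = Union[Command, While, Until, Not]


def lower(node: Node) -> Node:
    """Rewrite a front-end tree into a smaller language that has no Until."""
    if isinstance(node, Until):
        return While(Not(lower(node.cond)), [lower(n) for n in node.body])
    if isinstance(node, While):
        return While(lower(node.cond), [lower(n) for n in node.body])
    if isinstance(node, Not):
        return Not(lower(node.child))
    return node  # Command already exists in the smaller language


tree = Until(Command(["test", "-f", "done.txt"]), [Command(["sleep", "1"])])
print(lower(tree))
```

Every later stage (an executor, a printer) then only has to handle the smaller set of node types.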

ovm can also be used for other tools. I am not writing an awk interpreter or parser, but if someone wanted to write one, ovm might be a good target. Instead I am folding it

A Peek of ASDL

Heterogeneous means: represent everything faithfully, more types.

Homogeneous: fewer types. Most shells seem to do this. In C this is the natural thing because there is no subtyping.
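Here is the same shell command, `ls && echo ok`, in both styles. The classes are hypothetical examples, not Oil's schema. The homogeneous version gets by with a single class and a string tag; the heterogeneous one needs a class per construct but gives each field a name and a type.

```python
from dataclasses import dataclass, field
from typing import List


# Homogeneous: a single node type; the 'kind' tag says what it is.
@dataclass
class HNode:
    kind: str                       # e.g. "andor", "command", "word"
    text: str = ""
    children: List["HNode"] = field(default_factory=list)


# Heterogeneous: one type per construct, with named fields.
@dataclass
class SimpleCommand:
    words: List[str]


@dataclass
class AndOr:
    op: str                         # "&&" or "||"
    left: SimpleCommand
    right: SimpleCommand


# 'ls && echo ok' in both styles
homo = HNode("andor", "&&", [
    HNode("command", children=[HNode("word", "ls")]),
    HNode("command", children=[HNode("word", "echo"), HNode("word", "ok")]),
])
hetero = AndOr("&&", SimpleCommand(["ls"]), SimpleCommand(["echo", "ok"]))

print(hetero.right.words[0])               # echo
print(homo.children[1].children[0].text)   # echo
```

The tradeoff shows up in client code: `hetero.right.words[0]` says what it means, while `homo.children[1].children[0].text` relies on conventions about child order that nothing enforces.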

I will just say what I have now, and describe how it's used once code is actually committed:

  1. Parse the .asdl file, creating its own tree. (And yes, ASDL can be described with ASDL, another form of bootstrapping.)

  2. Use Python metaprogramming to generate classes that the Python parser can use.

  3. Able to create and DYNAMICALLY type check instances of those generated types. Example:

  4. Able to serialize them to binary.

  5. Able to GENERATE C++ code that turns. The API looks somewhat like protocol buffers,
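A rough sketch of steps 2 and 3 in Python, assuming the schema has already been parsed (step 1) into a plain dict. The schema format, the `generate()` helper, and the type names are hypothetical, not the actual Oil/ASDL code. It uses `type()` to create a base class per ASDL type and a subclass per constructor, and each generated `__init__` checks its arguments at runtime. Serialization and C++ generation (steps 4 and 5) are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical parsed schema, roughly corresponding to:
#   word = Word(string s)
#   command = SimpleCommand(word* words) | Sentence(command child, string terminator)
SCHEMA: Dict[str, Tuple[str, List[Tuple[str, str]]]] = {
    # constructor:   (ASDL type it belongs to, [(field name, field type), ...])
    "Word":          ("word",    [("s", "string")]),
    "SimpleCommand": ("command", [("words", "word*")]),
    "Sentence":      ("command", [("child", "command"), ("terminator", "string")]),
}

PRIMITIVES = {"string": str, "int": int}
BASES: Dict[str, type] = {}  # ASDL type name -> generated base class


def _check(value, type_str: str, where: str) -> None:
    """Dynamically check one field value against its declared type."""
    if type_str.endswith("*"):  # sequence type: check every element
        if not isinstance(value, list):
            raise TypeError(f"{where}: expected a list, got {type(value).__name__}")
        for i, item in enumerate(value):
            _check(item, type_str[:-1], f"{where}[{i}]")
        return
    expected = PRIMITIVES.get(type_str) or BASES[type_str]
    if not isinstance(value, expected):
        raise TypeError(f"{where}: expected {type_str}, got {type(value).__name__}")


def _make_init(name, fields):
    """Build a type-checking __init__ for one constructor."""
    def __init__(self, *args):
        if len(args) != len(fields):
            raise TypeError(f"{name} takes {len(fields)} arguments, got {len(args)}")
        for (fname, ftype), value in zip(fields, args):
            _check(value, ftype, f"{name}.{fname}")
            setattr(self, fname, value)
    return __init__


def generate(schema):
    """Create one base class per ASDL type and one subclass per constructor."""
    classes = {}
    for ctor, (base_name, fields) in schema.items():
        if base_name not in BASES:
            BASES[base_name] = type(base_name, (object,), {})
        classes[ctor] = type(ctor, (BASES[base_name],), {
            "__init__": _make_init(ctor, fields),
            "_fields": [f for f, _ in fields],
        })
    return classes


C = generate(SCHEMA)
ls = C["SimpleCommand"]([C["Word"]("ls")])
s = C["Sentence"](ls, ";")      # OK: a SimpleCommand is a command
try:
    C["Sentence"]("ls", ";")    # a plain str is not a command
except TypeError as e:
    print(e)                    # Sentence.child: expected command, got str
```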

I'm also using the "external visitor" pattern, i.e. this thing that Terence Parr gave me permission to do. (TODO: maybe write a blog post on that?)
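A sketch of the external visitor pattern in Python, using hypothetical node classes. The traversal logic lives in a separate `Printer` class rather than in methods on the nodes, so adding a new pass doesn't touch the AST definitions. The dispatch-by-class-name trick is the same one Python's standard `ast.NodeVisitor` uses.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical node classes; note they carry no behavior.
@dataclass
class Word:
    s: str


@dataclass
class SimpleCommand:
    words: List[Word]


@dataclass
class AndOr:
    op: str
    left: object
    right: object


class Printer:
    """External visitor: all printing logic lives here, not in the nodes."""

    def visit(self, node) -> str:
        # Dispatch on the node's class name, e.g. visit_AndOr.
        method = getattr(self, "visit_" + type(node).__name__)
        return method(node)

    def visit_Word(self, node: Word) -> str:
        return node.s

    def visit_SimpleCommand(self, node: SimpleCommand) -> str:
        return " ".join(self.visit(w) for w in node.words)

    def visit_AndOr(self, node: AndOr) -> str:
        return f"{self.visit(node.left)} {node.op} {self.visit(node.right)}"


tree = AndOr("&&", SimpleCommand([Word("ls")]),
             SimpleCommand([Word("echo"), Word("ok")]))
print(Printer().visit(tree))  # ls && echo ok
```

The object-oriented alternative would put a `print()` method on every node class. Keeping each pass in one place suits classes that are generated from a schema and shouldn't be edited by hand.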

What does the AST Look like?

Clang AST. IPR for C++.

I'm appreciating how hard it is to write a programming language, or rather how many choices there are. LLVM is WAY cleaner than GCC, yet it seems that they are stuck with some decisions.

Recap

Fun fact: literally nobody is arguing that shell is a good language. I thought that programming is big, the internet is big, and someone would come out of the woodwork and claim we don't need to fix shell. I haven't gotten that feedback yet, which is sort of surprising.

Crucially, I think people understand the point of auto-converting osh to oil. The effort is quite large, but it's worth it, and it has actually improved the design of the oil language in several cases, which I will write about in a future post.

I still have a draft of the blog post listing posts which I will release.

I want a BALANCE in the AST format. AST formats are non-trivial: Clang talk which you can't print. I receive this question a lot. They are doing source edits. The Clang AST is fuggin' enormous.

TODO: implement stack traces.

Two clients of the AST:

  1. A shared osh/oil executor, written in C++.

  2. An oil printer, written in Python.

They have different requirements. Fleshing out a third one.

The compiler class is kind of a toy. I would like to see different representations. The nanopass compiler paper kind of talks about this.

Alternate roadmap:

Skipping over some posts. Skipping over the re2c post, waiting until I publish the code. The current code doesn't use it, but the original C++ implementation did. Python is missing abstractions for describing data.

Protocol Buffers and Gazelle -- I mentioned protobufs but not gazelle