In the last 2 days, I landed a few weeks' worth of big changes on the master branch. After writing my ASDL implementation, I replaced the backbone of the interpreter with dynamically-generated ASDL classes.
I'm happy with ASDL, so I plan to write a few posts about it. In this post, I'll review the progress on oil so far, and then show an example ASDL schema and data structure.
In subsequent posts, I'll go into detail on oil's ASDL implementation, describe how it concretely moves the project forward, and share some more abstract thoughts.
Shortly after I released the code in November, I listed two top priorities, gave a detailed roadmap, and mentioned a third use case for the AST.
Looking back, we're pretty much on track. Replacing the backbone of the interpreter with ASDL took a while, requiring changes to essentially every source file, but it was necessary for all three priorities:
Why do the same work in both Python and C++? The first reason is that discovering the semantics of shell is the hard part, and we want to do that in an agile language. Once that's done, writing the code is easy.
The second reason is that the C++ executor will operate on a lower-level representation of code. I'll explain this in a future post.
At 129 lines, osh.asdl compactly describes the osh language (which is nearly identical to bash). An excerpt:
token = (id id, string val, line_span? loc)

word = TokenWord(token token)
     | CompoundWord(word_part* parts)
This ASDL schema syntax should be readable to programmers who know Haskell or ML. For others, it only involves a few concepts and can be read like this:
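token is a product type, with two required fields, id and val, and an optional field, loc.
word is a sum type with two alternatives: TokenWord, which holds a single token, and CompoundWord, which holds a sequence of word_part values (word_part is itself a sum type with nine alternatives; not shown).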
For those with a C background, it's helpful to remember that a product type can be represented by a struct, and a sum type can be represented by a tagged union. However, structs and unions fall short of algebraic data types because of the static type system.
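To make the reading of the excerpt concrete, here is a minimal Python sketch using dataclasses. It is not oil's generated code: the class names, the `Id` enum, the `describe` helper, and the decision to model only one `word_part` alternative are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union


class Id(Enum):
    # Hypothetical stand-in for the schema's `id` type.
    Lit_Chars = 1
    Redir_DGreat = 2


@dataclass
class LineSpan:
    # `line_span` in the schema; fields as they appear in this post's
    # pretty-printed example.
    pool_index: int
    col: int
    length: int


@dataclass
class Token:
    # Product type: every instance carries all of its fields.
    id: Id
    val: str
    loc: Optional[LineSpan] = None  # `line_span? loc` is optional


@dataclass
class LiteralPart:
    # One word_part alternative; the other eight are omitted here.
    token: Token


@dataclass
class TokenWord:
    token: Token


@dataclass
class CompoundWord:
    parts: List[LiteralPart]  # `word_part* parts`, simplified


# Sum type: a word is exactly one of the alternatives.
Word = Union[TokenWord, CompoundWord]


def describe(w: Word) -> str:
    # The class of the instance plays the role of a tagged union's tag.
    if isinstance(w, TokenWord):
        return "TokenWord(%s)" % w.token.val
    if isinstance(w, CompoundWord):
        return "CompoundWord with %d part(s)" % len(w.parts)
    raise TypeError("not a word: %r" % (w,))


ls = CompoundWord([LiteralPart(Token(Id.Lit_Chars, "ls", LineSpan(0, 0, 2)))])
print(describe(ls))                                       # CompoundWord with 1 part(s)
print(describe(TokenWord(Token(Id.Redir_DGreat, ">>"))))  # TokenWord(>>)
```

The `isinstance` chain is the dynamic-language counterpart of switching on a union's tag in C.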
Consider this statement:
ls >> ~/git/$repo/listing.txt
It consists of three words:
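A CompoundWord with a single literal part, ls
A TokenWord with the token >>
A CompoundWord with 4 parts:
    TildeSubPart: to substitute the home directory
    LiteralPart: /git/
    VarSubPart(repo): to substitute the value of the variable repo
    LiteralPart: /listing.txt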
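As a rough illustration of what "substitute" means for the parts of the last word, the sketch below joins them into the final path. It is a simplification, not oil's evaluator: `eval_parts`, the simplified part classes (a `LiteralPart` holding a plain string rather than a token), the home directory, and the value of `repo` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class LiteralPart:
    val: str  # simplified: the real node holds a token


@dataclass
class TildeSubPart:
    prefix: str  # "" stands for the current user


@dataclass
class VarSubPart:
    name: str


WordPart = Union[LiteralPart, TildeSubPart, VarSubPart]


def eval_parts(parts: List[WordPart], home: str, variables: Dict[str, str]) -> str:
    """Concatenate the evaluated parts of one word (hypothetical helper)."""
    out = []
    for p in parts:
        if isinstance(p, LiteralPart):
            out.append(p.val)
        elif isinstance(p, TildeSubPart):
            if p.prefix:
                # ~user lookups are left out of this sketch.
                raise NotImplementedError("only a bare ~ is handled")
            out.append(home)
        elif isinstance(p, VarSubPart):
            # Assumption: an unset variable expands to the empty string.
            out.append(variables.get(p.name, ""))
        else:
            raise TypeError("unknown word part: %r" % (p,))
    return "".join(out)


parts = [
    TildeSubPart(""),
    LiteralPart("/git/"),
    VarSubPart("repo"),
    LiteralPart("/listing.txt"),
]
print(eval_parts(parts, home="/home/user", variables={"repo": "oil"}))
# /home/user/git/oil/listing.txt
```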
These three words are further parsed into a command node, which our ASDL implementation pretty-prints like this:
(SimpleCommand
  words: [
    (CompoundWord
      parts: [
        (LiteralPart
          token: (token id:Lit_Chars val:ls loc:(line_span pool_index:0 col:0 length:2))
        )
      ]
    )
  ]
  redirects: [
    (Redirect
      op_id: Redir_DGreat
      arg_word: (CompoundWord
        parts: [
          (TildeSubPart prefix:"")
          (LiteralPart token:(token id:Lit_Chars val:/git/))
          (VarSubPart name:repo)
          (LiteralPart
            token: (token
              id: Lit_Chars
              val: /listing.txt
              loc: (line_span pool_index:0 col:17 length:12)
            )
          )
        ]
      )
      fd: 1
    )
  ]
)
This may look like a Lisp S-expression, but two features give it more structure:
[ ... ] above
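The textual format shown earlier, with typed nodes, named fields, and bracketed arrays, can be approximated in a few lines of Python. This is only a sketch under assumptions: `pretty` is a hypothetical function that walks dataclass instances and prints everything on one line, whereas oil's printer works on its own ASDL classes and wraps lines.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List


@dataclass
class TildeSubPart:
    prefix: str


@dataclass
class VarSubPart:
    name: str


@dataclass
class CompoundWord:
    parts: List[object]


def pretty(node) -> str:
    """Render a node as (TypeName field:value ...), with arrays in [ ]."""
    if is_dataclass(node):
        body = " ".join(
            "%s:%s" % (f.name, pretty(getattr(node, f.name)))
            for f in fields(node)
        )
        return "(%s %s)" % (type(node).__name__, body)
    if isinstance(node, list):
        return "[" + " ".join(pretty(item) for item in node) + "]"
    if isinstance(node, str):
        # Show the empty string explicitly so it stays visible.
        return node if node else '""'
    return str(node)


w = CompoundWord([TildeSubPart(""), VarSubPart("repo")])
print(pretty(w))
# (CompoundWord parts:[(TildeSubPart prefix:"") (VarSubPart name:repo)])
```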
In addition to this textual representation, which is useful for debugging, there's also a binary representation. I will describe that in tomorrow's post.