Why Sponsor Oils? | blog | oilshell.org
In the last 2 days, I landed a few weeks' worth of big changes on the master branch. After writing my ASDL implementation, I replaced the backbone of the interpreter with dynamically-generated ASDL classes.
Recall that ASDL is a way of using ML's data model of algebraic data types inside another language, roughly analogous to how JSON lets you use JavaScript's data model in another language.
I'm happy with ASDL, so I plan to write a few posts about it. In this post, I'll review the progress on oil so far, and then show an example ASDL schema and data structure.
In subsequent posts, I'll go into detail on oil's ASDL implementation, describe how it concretely moves the project forward, and share some more abstract thoughts.
Shortly after I released the code in November, I listed two top priorities, gave a detailed roadmap, and mentioned a third use case for the AST.
Looking back, we're pretty much on track. Replacing the backbone of the interpreter with ASDL took awhile, requiring changes to essentially every source file, but it was necessary for all three priorities:
Why do the same work in both Python and C++? The first reason is that discovering the semantics of shell is the hard part, and we want to do that in an agile language. Once that's done, writing the code is easy.
The second reason is that the C++ executor will operate on a lower-level representation of code. I'll explain this in a future post.
At 129 lines, osh.asdl compactly describes the osh language (which is nearly identical to bash). An excerpt:
token = (id id, string val, line_span? loc)
word =
TokenWord(token token)
| CompoundWord(word_part* parts)
This ASDL schema syntax should be readable to programmers who know Haskell or ML. For others, it only involves a few concepts and can be read like this:
token is a product type, with two required fields id and val, and
an optional field loc.word is a sum type with two alternatives:word_part (which is
itself a sum type with nine alternatives; not shown.)For those with C background, it's helpful to remember that a product type can be represented by a struct, and a sum type can be represented by a tagged union. However, structs and unions fall short of algebraic data types because of the static type system.
Consider this statement:
ls >> ~/git/$repo/listing.txt
It consists of three words:
CompoundWord representing lsTokenWord with the token (Id.Redir_DGreat, ">>")CompoundWord with 4 parts:TildeSubPart: to substitute $HOMELiteralPart("/git/")VarSubPart(repo): to substitute the repo variableLiteralPart("/listing.txt")These three words are further parsed into a command node, which our ASDL
implementation pretty-prints like this:
(SimpleCommand
words: [
(CompoundWord
parts: [
(LiteralPart
token: (token id:Lit_Chars val:ls loc:(line_span pool_index:0 col:0 length:2))
)
]
)
]
redirects: [
(Redirect
op_id: Redir_DGreat
arg_word:
(CompoundWord
parts: [
(TildeSubPart prefix:"")
(LiteralPart token:(token id:Lit_Chars val:/git/))
(VarSubPart name:repo)
(LiteralPart
token:
(token
id: Lit_Chars
val: /listing.txt
loc: (line_span pool_index:0 col:17 length:12)
)
)
]
)
fd: 1
)
]
)
This may look like a Lisp S-expression, but two features give it more structure:
[ ... ] aboveIn addition to this textual representation, which is useful for debugging, there's also a binary representation. I will describe that in tomorrow's post.