(We'll see what I mean by "ordered" in the next post.)
In this section, I'll explain this messy diff of osh.asdl. Recall that this schema was originally designed for execution, but now it's become a Lossless Syntax Tree for translation to Oil.
Differences:
(1) As mentioned, I introduced the concept of a Sentence, and removed the
Fork node. A sentence can end with ; or &. The former will be compiled
to synchronous execution, and the latter asynchronous.
(2) I preserved the difference between $var and ${var}, which are
translated to $var and $(var) in Oil. They of course mean the same thing.
(3) The Assignment node now stores only a keyword: either declare,
export, local, or readonly. Prior to this I was trying to represent the
semantics of each statement, but now we only represent the syntax.
(4) The body of a loop like do; echo hi; done and a block like { echo hi; }
are now represented differently. We translate them to a uniform block syntax.
(5) And if statement now has an optional else_action. Prior to this, we
would represent the else statement like elif true; echo else action, but this
isn't good enough for translation.
(6) Case statements now have a sequence of case_arm. This preserves the
"ordering" property of the Lossless Syntax Tree -- we should be able to
traverse the children of a node in syntactic order. We shouldn't have to
jump back and forth.
I made note of a couple differences that aren't represented:
foo() { echo; } vs. the ksh-style function foo { echo; }'string without escapes' vs. $'C-escaped string'These haven't proved necessary for translation so I cheated a bit. If we really needed to recover the original source rather than just translate to Oil, I would fully represent these differences.
This change also means that Oil is starting to like a compiler, because I
need to "lower" osh.asdl to a simpler form for execution (called ovm.asdl
for now).
Translating the Go compiler from C to Go. In other words, Bootstrapping. It preserves comments. I will need to do the same thing to bootstrap Oil!
Python 2to3.
More components needed:
Switching gears:
I need to parallelize all this work.
Next post:
Annotated Lossless Syntax Tree.
Algorithm for
I started the oil project by writing a tree-walking interpreter for osh, a dialect of bash. All shells that I've looked are tree-walking interpreters, although they interleave parsing an execution, so they cannot have a single thin waist.
In this post, I made note of three use cases for "the" AST:
I made
. They must have multiple rephave multiple
rather than having a single
at use this pat Like all shells, Oil started as a tree-walking 0interpreter.
I had an AST for oil
"Transpilers" like CoffeeScript don't need to do this, because the target JavaScript code won't be edited.
A Source Code Representation: the Annotated Lossless Syntax Tree
In the last two posts, I showed examples of the osh to oil translation. The translation algorithm must preserve comments and formatting, i.e. indentation and line breaks.
I expect the upgrade process from bash to be:
The algorithms and data structures to accomplish this style-preserving translation took some thinking.
In this post, I'll describe the data structure I think of as the "Lossless Syntax Tree". It's different than both an AST (abstract syntax tree) and a parse tree/concrete syntax tree.
using the lossless syntax tree. In the next post, I'll describe the algorithm for
What can you do with this data structure?
https://clang.llvm.org/docs/LibFormat.html
FIXME: Write up design. Interface
The core routine of LibFormat is reformat():
tooling::Replacements reformat(const FormatStyle &Style, Lexer &Lex,
SourceManager &SourceMgr,
std::vector
I think that this is compatible with my representation, but I don't know.
Python DOES USE a Concrete syntax tree. Python.asdl is the abstract syntax tree.
But Grammar/Grammar creates a pure cnocrete syntax tree. There are NO SEMANTIC ACTIONS. But that's why Python needs .pyc files. That is shown in Haberman's post.
I want a FAST PARSER. And although I haven't measured it, creating 3 extra nodes for every command is bound to make it slower!
ANTLR 3 vs ANTLR 4. Always creates CST. I believe this is the wrong choice, or at least it makes me want to use ANTLR even less.
ANTLR is dumb; shouldn't use.
Wiki page: Source to Source Translation
CoffeeScript Go