blog | oilshell.org
If you haven't used Google's protocol buffer serialization technology, this analogy may be helpful:
A similar analogy explains Zephyr ASDL, which I explained from a few other angles in the last post:
C data model : Protocol Buffers :: ML data model : ASDL
ML is the language that introduced algebraic data types or ADTs. ADTs are a characteristic feature of strongly-typed functional languages like Standard ML, OCaml, and Haskell.
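For readers who haven't worked in an ML-family language, here is a rough sketch of the ML data model approximated in Python. In Standard ML the type below would be written roughly as `datatype expr = Const of int | Add of expr * expr | Mul of expr * expr`. The names `Const`, `Add`, `Mul`, `Expr`, and `evaluate` are made up for illustration and don't come from oil or ASDL.

```python
from dataclasses import dataclass
from typing import Union


# Product types: records with named, typed fields.
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


# Sum type: an Expr is exactly one of these alternatives.
Expr = Union[Const, Add, Mul]


def evaluate(e: Expr) -> int:
    """Evaluate by case analysis on the variant, as ML does with pattern matching."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, Mul):
        return evaluate(e.left) * evaluate(e.right)
    raise TypeError("not an Expr: %r" % (e,))


tree = Add(Const(1), Mul(Const(2), Const(3)))
result = evaluate(tree)
```

Python doesn't check that the case analysis is exhaustive the way an ML compiler would, which is part of why a schema language that states the alternatives explicitly is useful.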
ASDL, like protocol buffers, is a domain-specific language that describes a language-independent serialization format for a particular data model -- in this case, the ML data model. It has the following constructs:
Oil will use a custom serialization format I developed, but Python doesn't serialize the data structures it represents with ASDL. Instead, it uses ASDL to share the AST between languages, bridging the parser written in C and the AST module in Python.
Taking that into account, this analogy is also valid:
ASDL : Python :: WebIDL : Web Browser
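The Python side of this bridge can be poked at from the standard library's `ast` module, whose node classes follow the fields declared in Python's ASDL schema. This is only a small illustration using the stock `ast` API; it says nothing about how oil's own implementation works.

```python
import ast

# Parse a tiny expression; the C parser produces the tree, and the
# `ast` module exposes it as Python objects.
tree = ast.parse("1 + 2", mode="eval")
node = tree.body  # the node for `1 + 2`

# Each node class lists the fields declared for that constructor.
binop_fields = ast.BinOp._fields
node_kind = type(node).__name__
op_kind = type(node.op).__name__
```

Here `BinOp` is a product of three fields, and the `op` field holds one alternative of a sum type (`Add`, `Sub`, and so on).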
Yesterday, I committed the first pass of oil's ASDL implementation. The schema parser is taken from Python, but these three features are new:
oheap format, which I'll describe later.
Fortunately, not much code is required to implement these features:
    ~/git/oil$ asdl/run.sh count
       417 asdl/asdl.py
       249 asdl/py_meta.py
       462 asdl/gen_cpp.py
       268 asdl/encode.py
      1396 total
The py_meta.py file uses metaprogramming all over: Python metaclasses, but also things like dynamic
I believe that type checking oil with mypy is now hopeless. It was thwarted by very simple metaprogramming, and this addition won't help. However, I believe that ASDL is more valuable than mypy for ensuring the structural integrity of the program.
Another thing to ponder: you could say this means I value Lisp over ML, though paradoxically the purpose of the metaprogramming is to use ML's data model in C++ and Python.
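To make the metaprogramming idea concrete, here is a minimal sketch of building node classes at runtime from a schema, with constructors that check field names and types. It uses the three-argument `type()` call rather than a full metaclass, and it is not the code in py_meta.py. The `SCHEMA` dictionary, `make_node_class`, and the node names are all hypothetical.

```python
def make_node_class(name, fields):
    """Build a class whose constructor checks field names and types.

    `fields` is a list of (field_name, python_type) pairs.
    """
    def __init__(self, **kwargs):
        expected = dict(fields)
        if set(kwargs) != set(expected):
            raise TypeError("%s expects fields %s, got %s" % (
                name, sorted(expected), sorted(kwargs)))
        for field_name, value in kwargs.items():
            if not isinstance(value, expected[field_name]):
                raise TypeError("%s.%s should be %s, got %r" % (
                    name, field_name, expected[field_name].__name__, value))
            setattr(self, field_name, value)

    # type(name, bases, namespace) creates a new class at runtime.
    return type(name, (object,), {"__init__": __init__, "FIELDS": fields})


# Hypothetical schema: node name -> list of typed fields.
SCHEMA = {
    "Token": [("id", int), ("val", str)],
    "Assign": [("name", str), ("value", str)],
}

classes = {name: make_node_class(name, fields) for name, fields in SCHEMA.items()}
Token = classes["Token"]

tok = Token(id=1, val="echo")

try:
    Token(id="oops", val="echo")  # wrong type for `id`
    rejected = False
except TypeError:
    rejected = True
```

This is the kind of structural check a static checker can't easily see through, since the classes don't exist until the program runs.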
I haven't used ASDL in oil yet -- that's the next step. Since I'm obsessed with the line count, let me snapshot the tree now:
    $ ./count.sh parser
    Lexer/Parser
        77 osh/parse_lib.py
       196 osh/arith_parse.py
       291 osh/bool_parse.py
       334 osh/lex.py
      1144 osh/word_parse.py
      1455 osh/cmd_parse.py
      3497 total

    AST and IDs
        80 core/tokens.py
        99 core/expr_node.py
       441 core/id_kind.py
       491 core/cmd_node.py
       777 core/word_node.py
      1888 total

    Common Algorithms
       228 core/lexer.py
       338 core/tdop.py
       566 total
Using ASDL will affect the middle section the most, but I'm not sure if it will
get bigger or smaller. On the one hand, ASDL provides impressive code
compression. I mentioned in the last post that 123 lines of ASDL turn into
~8100 lines of C code in Python. (However, the
format needs just 907 lines of C++ generated from 107 lines of ASDL, an order
of magnitude less code. More on that later.)
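The figures above work out to an expansion of roughly 66x (8100 / 123) in Python's case and roughly 8.5x (907 / 107) for the C++. As a toy illustration of where that expansion comes from, here is a sketch of a generator that turns a small schema into C++ source text. It is not gen_cpp.py; the `SCHEMA` contents, the function names, and the shape of the generated code are all assumptions made for the example.

```python
# Hypothetical schema: record name -> list of (C++ type, field name).
SCHEMA = {
    "token": [("int", "id"), ("std::string", "val")],
    "span": [("int", "line"), ("int", "col"), ("int", "length")],
}


def gen_cpp_struct(name, fields):
    """Return C++ source text for one struct with a getter per field."""
    lines = ["struct %s {" % name]
    for cpp_type, field in fields:
        lines.append("  %s %s_;" % (cpp_type, field))
    for cpp_type, field in fields:
        lines.append("  const %s& %s() const { return %s_; }"
                     % (cpp_type, field, field))
    lines.append("};")
    return "\n".join(lines)


def gen_cpp(schema):
    """Return one C++ translation unit covering every record in the schema."""
    structs = [gen_cpp_struct(n, f) for n, f in sorted(schema.items())]
    return "#include <string>\n\n" + "\n\n".join(structs)


cpp_source = gen_cpp(SCHEMA)
schema_lines = sum(len(f) for f in SCHEMA.values())  # one schema line per field
generated_lines = len(cpp_source.splitlines())
```

Every field declared once in the schema shows up several times in the output (storage, accessor, and in a real generator also constructors and serialization), so the generated code grows much faster than the schema does.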
On the other hand, the WordPart classes have nontrivial methods, which I need
to attach to the classes generated from osh.asdl. Also, the tree will be more
heterogeneous, because I'm representing osh very faithfully and then
"lowering" it into what I'm calling ovm in my head. ovm is more homogeneous.
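On the question of attaching hand-written methods to generated classes, two simple options in Python are inheritance and assigning functions onto the class after the fact. The sketch below shows both. `LiteralPart`, its `text` field, and the method names are hypothetical stand-ins, not the actual classes generated from osh.asdl.

```python
# Stand-in for a class that a code generator would produce from a schema.
# (Hypothetical: the real generated classes are not shown in this post.)
GeneratedLiteralPart = type("LiteralPart", (object,), {
    "__init__": lambda self, text: setattr(self, "text", text),
})


class LiteralPartMethods(object):
    """Hand-written behavior, kept separate from the generated code."""

    def is_empty(self):
        return self.text == ""


# Option 1: combine generated data and hand-written methods by inheritance.
class LiteralPart(GeneratedLiteralPart, LiteralPartMethods):
    pass


# Option 2: attach a plain function to the generated class directly.
def length(self):
    return len(self.text)


GeneratedLiteralPart.length = length

part = LiteralPart("echo")
```

Either way the schema stays the single source of truth for the fields, and the behavior lives in ordinary hand-written code.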
But whether it gets bigger or smaller, the new AST representation brings us
closer to the top priorities. It forms the
backbone of both the interpreter and the tools to convert
/ bash to
This conversion is, of course, the main reason I expect anyone to actually use