Two days before I went on vacation, I described how I transformed the OSH AST into what I call the Lossless Syntax Tree. This was motivated by the requirement to translate shell to Oil (part one, part two).
The post generated quality discussion on Hacker News, Lobsters, and Reddit, which is what I was hoping for. I wanted to "crowdsource" my research into how different language platforms represent code losslessly.
I made a wiki page called Lossless Syntax Tree Pattern to distill the responses, planning to turn it into a blog post. I also drafted a post that showed more examples of the AST versus the LST.
Then I went on vacation. When I got back on Wednesday, full of renewed energy for the project, I directed it at coding instead of blog posts.
That was the right thing to do, but unfortunately it means that the blog is backlogged. Drafts are being neglected and TODOs are piling up.
In this post I'll summarize what I had planned to write about, without making a promise to do so any time soon. Tomorrow I'll talk about the coding tasks that have higher priority.
Leave a comment if you want to see more on any of these topics.
In the Blog TODO Stack, I grouped future blog posts into four themes:
I managed to knock off two posts: Pretty Printing ASTs with ASDL and The Thinner Waist of the Interpreter, but there are still many loose ends.
It should take three or four posts to wrap up the first two themes. I don't feel as much urgency with the third and fourth themes, since they'll benefit from future experience in implementing Oil.
There are at least three more themes in play. Here's a list of possible posts:
(a) Lossless Syntax Tree, Part Two. As mentioned, this draft goes into more detail on the AST vs. Lossless Syntax Tree for OSH.
(b) An Algorithm for Style-Preserving Source Code Translation. The algorithm I used in translating shell to Oil is worth describing.
(c) Lossless Syntax Tree Survey. The docs on the wiki page have a number of important points worth calling out.
One of the best documents is the design doc for Microsoft's Roslyn platform for C# and Visual Basic. Clang is also powerful and mature, but its documentation isn't as good.
(d) Lossless Syntax Tree Conclusions. Make the following arguments:
There have been several posts about parsing problems in shell:
There are an equal number of problems related to execution. A few that come to mind:
Shell has Dynamic Scope. Dynamic scope means that the callee can see
all of the caller's variables, not just the arguments it passed. In
other words, "local" variable lookup traverses the call stack!
Most people are not familiar with this discredited idea in programming languages.
Run dynamic-scope/run.sh and see what happens.
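This isn't the dynamic-scope/run.sh script itself, just a minimal sketch of the behavior described above, assuming bash. The function names `caller` and `callee` and the variable `secret` are made up for illustration.

```bash
#!/usr/bin/env bash
# Minimal sketch of dynamic scope. All names here are hypothetical.

callee() {
  # 'secret' is neither defined here nor passed as an argument, but the
  # lookup walks up the call stack and finds the caller's local variable.
  echo "callee sees: $secret"
  # Assigning to it modifies the caller's local, not a global.
  secret='changed by callee'
}

caller() {
  local secret='set in caller'
  callee
  echo "caller now has: $secret"
}

caller
# The local disappeared when caller returned, so nothing leaked globally.
echo "global scope: ${secret:-<unset>}"

# prints:
#   callee sees: set in caller
#   caller now has: changed by callee
#   global scope: <unset>
```

In a lexically scoped language, `callee` would only see its own variables and globals, so the first line would print an empty value.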
Bash Evaluates Code in Strings Without Eval. This is issue 3, an undocumented feature which is bizarre even for experienced shell users. It also relates to an infinite variable name evaluation rule that you won't see in any other language.
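One behavior in this family, which may or may not be exactly what the issue covers, is that bash arithmetic treats the string value of a variable as an expression and evaluates it. Here's a small sketch, assuming bash; the variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: arithmetic contexts evaluate the *contents* of variables as code.

# The value is a string containing an expression, not a number.
expr_in_string='1 + 2 * 3'
echo $(( expr_in_string ))    # prints 7

# The rule applies repeatedly: a value can name another variable,
# whose value can name another variable, and so on.
x=y
y=z
z=42
echo $(( x ))                 # prints 42
```

No `eval` appears anywhere, yet the strings `'1 + 2 * 3'` and `y` were parsed and evaluated as code.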
Bash has Separate Expression Languages for Strings, Ints, and Booleans. This design has bad consequences:
[[ a = b ]] tests for equality of strings, while
(( a = b )) does assignment of variables.
[[ $x == $y ]],
[[ $x -eq $y ]], and
(( $x == $y )) are three more ways to test for equality.
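The constructs listed above can be tried side by side. This is a sketch assuming bash, with hypothetical variable names; the values `05` and `5` are chosen so that the string and integer comparisons disagree.

```bash
#!/usr/bin/env bash
a=1
b=2

# String language: compares the literal words "a" and "b", not the variables.
if [[ a = b ]]; then string_result=equal; else string_result=different; fi

# Arithmetic language: the same characters now assign b's value to a.
(( a = b ))

echo "[[ a = b ]] -> $string_result; after (( a = b )), a=$a"
# prints: [[ a = b ]] -> different; after (( a = b )), a=2

# Three spellings of "equal" that do not always agree.
x=05
y=5
if [[ $x == $y ]]; then str_eq=yes; else str_eq=no; fi       # string comparison
if [[ $x -eq $y ]]; then int_eq=yes; else int_eq=no; fi      # integer comparison
if (( $x == $y )); then arith_eq=yes; else arith_eq=no; fi   # arithmetic comparison

echo "string: $str_eq, -eq: $int_eq, arithmetic: $arith_eq"
# prints: string: no, -eq: yes, arithmetic: yes
```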
Shell is so confusing that experts are wrong about it:
Quoting the Right Hand Side of Assignments Isn't Necessary. Word splitting and globbing only happen within commands, not assignments. Authoritative shell advice doesn't mention this:
NOTE: Besides inhibiting word splitting and globbing, quoting also inhibits tilde expansion. If you know of other reasons to quote the RHS of an assignment, leave a comment.
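A quick way to check both claims is to compare quoted and unquoted assignments directly. This is a sketch assuming bash; `count_args` is a hypothetical helper that reports how many arguments it received.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received.
count_args() { echo $#; }

src='a   b *'

# Assignment: no word splitting or globbing, quoted or not.
unquoted=$src
quoted="$src"
if [ "$unquoted" = "$quoted" ]; then same=yes; else same=no; fi
echo "same value: $same"                                # prints: same value: yes

# Command arguments: here the quotes do matter.
words='a   b'
echo "unquoted argument: $(count_args $words) words"    # prints: unquoted argument: 2 words
echo "quoted argument: $(count_args "$words") word"     # prints: quoted argument: 1 word

# Tilde expansion is the exception on the RHS: quoting inhibits it.
tilde_bare=~        # expands to the home directory
tilde_quoted="~"    # stays a literal tilde
echo "bare: $tilde_bare, quoted: $tilde_quoted"
```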
Word Elision Leads to Command Elision. Word elision is when an empty,
unquoted word is omitted from an
argv array. It works in tandem with
word splitting (which is a poor substitute for
Command elision is when word elision leads to an empty
This came up in the thread Evaluations of backticks in if statements on the
help-bash mailing list. More than one bash
expert was confused by this. It boils down to
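As an illustration of the two definitions (not necessarily the snippet from the mailing-list thread), here is a minimal sketch assuming bash. `count_args` is a hypothetical helper, and the variable names are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received.
count_args() { echo $#; }

empty=''

# Word elision: the empty, unquoted word vanishes from the argument list.
elided_count=$(count_args a $empty b)
kept_count=$(count_args a "$empty" b)
echo "unquoted: $elided_count args, quoted: $kept_count args"
# prints: unquoted: 2 args, quoted: 3 args

# Command elision: every word is elided, so no command runs at all.
# The exit status is then that of the last command substitution.
if $(true); then true_branch=then; else true_branch=else; fi
if $(false); then false_branch=then; else false_branch=else; fi
echo "\$(true) -> $true_branch, \$(false) -> $false_branch"
# prints: $(true) -> then, $(false) -> else

# With no command substitution involved, the elided command succeeds.
$empty
echo "status of elided command: $?"
# prints: status of elided command: 0
```

The surprising part is that `if $(true)` looks like it runs the output of `true` as a command, when in fact nothing is run and only the substitution's exit status is tested.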
This is the most important theme. I'm writing about the good and bad parts of shell to motivate the design of a new shell language.
It deserves a separate roadmap, but here's what I'm thinking right now:
More Shell Features. Translating Shell to Oil talked about funcs, procs, subshells, if, case, etc. There are more features to talk about: multiline strings in place of here docs, arrays, globs, brace expansion, regular expressions. Command vs. Expression Mode.
Syntactic Puns. A correspondence between syntax and semantics makes
languages more usable. A pun is a syntax with multiple meanings.
is a pun in C;
[ a = b ] is a sort of "inter-language" pun in shell.
Method calls in Java/Python have multiple meanings.
Influence from Python. Python is my favorite language, and I plan to steal many of its features for Oil. More notable are the places where we'll diverge from Python:
const, tuples vs. lists.
Influence from R. The Oil language will have tables. The slogan is
that the output
ps are both tables. R is a language designed
around tables: it's the only language without the "ORM problem".
I've written short blurbs for more than a dozen possible blog posts in three themes. The most important theme is #7: the Oil language design.
If you're interested in anything in particular, leave a comment.
In the next post, I'll describe what coding tasks I'm prioritizing over blog posts. The main goal is to attract contributors. If that works, I may have more time for blogging!