Two days before I went on vacation, I described how I changed the OSH AST into what I call the Lossless Syntax Tree. This was motivated by the requirement to translate shell to Oil (part two).
The post generated quality discussion on Hacker News, Lobsters, and Reddit, which is exactly what I was hoping for. Rather than digging through code and design documents myself, I wanted to "crowdsource" my research into how language implementations represent code losslessly.
I made a wiki page called Lossless Syntax Tree Pattern to distill the excellent responses, planning to turn it into a blog post, and then went on vacation. I had also drafted a second part to the post, going into more detail on the difference between an AST and an LST.
I made no progress on vacation, which is probably a good thing. When I got back on Wednesday, full of renewed energy for the project, I directed it at coding instead of blog posts.
This is the right thing because Oil is meant to be a piece of useful software, but unfortunately that means the blog is backlogged. Drafts are being neglected and TODOs are piling up.
In this post I'll summarize what I had planned to write about, without making a promise to do so any time soon. Tomorrow I'll talk about the coding tasks that have higher priority. If you're interested in any of the topics mentioned here, please leave a comment.
In the Blog TODO Stack, I grouped future blog posts into four themes:
I'd like to wrap up the first two themes, which should take three or four posts, but it's more important to move the project forward by writing code.
I don't feel as much urgency with the third and fourth themes, since they'll benefit from future experience in implementing Oil.
I've started posts on two more themes:
(a) Lossless Syntax Tree, Part Two. This draft contains details on the difference between the AST and Lossless Syntax Tree for OSH.
(b) An Algorithm for Style-Preserving Source Code Translation. The algorithm I used in translating shell to Oil.
(c) Lossless Syntax Tree Survey. I want to call out important points from resources on the wiki page. One of the best documents there is the design doc for Microsoft's Roslyn platform for C#.
Related to "the difficulty of parsing": language ecosystems that use a Lossless Syntax Tree often have two separate parsers:
(d) Lossless Syntax Tree Conclusions. Make the following arguments:
There have been several posts in this theme, although they mostly relate to parsing problems. Shell has an equal number of problems that occur at runtime.
help-bash -- word elision. I think this makes an interesting post to complement arrays, which was about word splitting. Word elision leads to command elision. Special runtime case.
"Experts can be wrong"
quoting variable substitutions is never necessary in assignments. "Experts can be wrong"
= and == confusion -- reddit post
This theme is more essential.
glob and brace substitution?
redirect-dup -- blog code
I think I will attack these topics when I need a break from coding.
Lurking Smalltalk within Unix
Building a Distro around Oil
Need to punt on this:
Unix philosophy / everything is a file
Design beyond Human Abilities
Postmodern Programming -- entire systems in containers
Using 10 lines of shell, awk, make to generate this Hacker News graph
Using awk for the test runner -- good examples and feedback
3 papers: rediscovering 90's technology...
Review of Nystrom's syntax tree chapter. This touched on a lot of things.
Thorsten's Book
In the next post, I'll describe the test suite enhancements I've just committed to the master branch. The purpose of this work is to attract contributors. There are now enough separate tasks in this project that it makes sense to parallelize them.
The test suite enhancements deserve some blog space of their own, which is why I may skip over some posts.