Winter Blog Backlog: Recent Progress

2021-11-30

This blog is again falling behind the code, so I'm writing "backlog" posts to catch up. For example, I wrote two popular posts in this style during the summer.

The purpose of these posts is to maintain continuity and skip the detail. They mainly have bullet points and links, including #zulip-links and #comments.

If you want to read more about a topic, let me know in the comments!

Table of Contents

Release 0.9.5 - Oil Syntax Is "Complete"

Commands Accept Typed Arguments in Parens

Block Arguments Are Now A Special Case

Oil as a Foundation for DSLs

Other Changes

Release 0.9.3 - Extended Globs and Nix

How OSH is Designed / Why OSH Isn't Bash

How OSH is Implemented: Process, Tools, and Techniques

Summary

Appendix: What to Work On Next

#oil-dev > TODO Next (Zulip)

#oil-discuss > Brainstorming: Expanding the Project (Zulip)

This first backlog post sketches recent progress and releases, and is more detailed. For context, the last release was Oil 0.9.4 - User Feedback, less than 2 weeks ago.

Release 0.9.5 - Oil Syntax Is "Complete"

I made a release on Sunday that resolved a major design question in the syntax of the Oil language.

This change was motivated by a wart in A Tour of the Oil Language, and by the desire to filter tables with expressions like ls --qtt | where (size > 10).

Commands Accept Typed Arguments in Parens

For example:

$ const obj = {name: 'bob', age: 42}

# pass typed arg 'obj' to 'json' builtin
$ json write (obj)  
{
  "name": "bob",
  "age": 42
}

Everything in parens is parsed in expression mode, so you can also write it inline:

$ json write ({name: 'bob', age: 42})

Perhaps surprisingly, this syntax is backward compatible with shell, so it's available in both bin/osh and bin/oil.

It was tricky to parse this without conflicting with shell functions, but I found a clean way to do it: by adding another lexer mode for lookahead.

And I now consider Oil's syntax "complete"! It's tricky to design a clean language that's also an upgrade from shell, but our parsing model has made it possible. The subtitle for this release should be The Triumph of Lexer Modes!
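To make "lexer modes" concrete, here is a minimal Python sketch of the general idea: the same characters are tokenized differently depending on the current mode, and a paren switches between command mode and expression mode. This is not Oil's lexer and it does not model the lookahead mode described above. The token names, the two-mode table, and the `tokenize` function are all assumptions made for illustration, and nested parens are not handled.

```python
import re

# Hypothetical token rules for two modes. Oil's real lexer is generated
# with re2c and has many more modes and tokens.
MODES = {
    "command": [
        ("LPAREN", re.compile(r"\(")),
        ("SPACE", re.compile(r"\s+")),
        ("WORD", re.compile(r"[^\s()]+")),
    ],
    "expr": [
        ("RPAREN", re.compile(r"\)")),
        ("SPACE", re.compile(r"\s+")),
        ("NUMBER", re.compile(r"\d+")),
        ("NAME", re.compile(r"[A-Za-z_]\w*")),
        ("OP", re.compile(r"[<>=]+|[-+*/,]")),
    ],
}


def tokenize(line):
    """Return (mode, kind, text) triples, switching modes at parens."""
    mode = "command"
    pos = 0
    tokens = []
    while pos < len(line):
        for kind, pattern in MODES[mode]:
            match = pattern.match(line, pos)
            if match:
                break
        else:
            raise ValueError(f"unexpected {line[pos]!r} in {mode} mode")
        if kind != "SPACE":
            tokens.append((mode, kind, match.group()))
        # The same character stream is lexed differently after '('.
        if kind == "LPAREN":
            mode = "expr"
        elif kind == "RPAREN":
            mode = "command"
        pos = match.end()
    return tokens


tokens = tokenize("where (size > 10)")
```

In this sketch, `>` is an ordinary word character in command mode but an operator token inside the parens, which is the property that lets shell words and expressions coexist on one line.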

(While Oil's syntax is complete, we still need to rewrite the evaluator to avoid "metacircularity", so it can be translated to C++ without the Python interpreter.)

Block Arguments Are Now A Special Case

Recall that Oil also enhances commands to take blocks:

cd /tmp {
  echo $PWD
}

Commands can now take both typed args and blocks:

when (size > 10) {
  echo $name
}  # these three lines are a single command!

That is, this hypothetical when command takes 2 typed arguments:

A lazily evaluated expression size > 10
A lazily evaluated block echo $name

But the block syntax is now just syntactic sugar for a trailing typed argument. It could also be written (awkwardly) like this:

const myexpr = ^[size > 10]  # unevaluated expression
const myblock = ^(echo $name)  # unevaluated block
when (myexpr, myblock)

I like this rule because it simplifies the language semantics, while allowing a rich and familiar syntax.
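The desugaring rule can be modeled in a few lines of Python. This is only a sketch of the semantics, not how Oil represents commands: the `Command` class, `make_command`, and `run_when` are hypothetical names, and Python callables stand in for the unevaluated expression and block.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Env = Dict[str, Any]


@dataclass
class Command:
    name: str
    typed_args: List[Callable[[Env], Any]] = field(default_factory=list)


def make_command(name, typed_args, block=None):
    """Desugar `name (args) { block }` into `name (args, block)`."""
    args = list(typed_args)
    if block is not None:
        args.append(block)  # the block is just the last typed argument
    return Command(name, args)


def run_when(cmd, env):
    """Hypothetical `when`: run the block only if the expression holds."""
    cond, block = cmd.typed_args
    return block(env) if cond(env) else None


# Python callables stand in for the unevaluated expression and block.
myexpr = lambda env: env["size"] > 10
myblock = lambda env: env["name"]

sugared = make_command("when", [myexpr], block=myblock)
explicit = make_command("when", [myexpr, myblock])
```

Both spellings produce the same `Command`, so the command itself only ever sees a list of typed arguments and decides when, or whether, to evaluate each one.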

There's still more work to do in this direction, like implementing parameter binding for typed arguments and writing documentation.

Oil as a Foundation for DSLs

I expect this syntax to be a flexible but readable foundation for many kinds of DSLs. The when example is inspired by Awk.

Shell, Awk, and Make Should Be Combined (2016)

It also should enable dplyr-like functionality:

ls --qtt | where (size < 10)

The plan for a TSV upgrade called QTT (quoted, typed tables) was mentioned in the Summer Blog Backlog.
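As a rough model of what a `where` filter does with a lazily evaluated expression, the Python sketch below parses the expression once and then evaluates it against each row, with column names resolved from the row. It says nothing about the QTT format itself: rows are plain dicts here, the `where` function is a hypothetical stand-in, and Python's `eval` is used only for brevity (it is not safe for untrusted input).

```python
def where(rows, expr_source):
    """Yield the rows for which the expression evaluates to true."""
    code = compile(expr_source, "<where>", "eval")  # parse once
    for row in rows:
        # Column names resolve against the row; builtins are disabled.
        if eval(code, {"__builtins__": {}}, row):
            yield row


# Made-up rows standing in for a table of files.
rows = [
    {"name": "a.txt", "size": 4},
    {"name": "b.bin", "size": 4096},
    {"name": "c.md", "size": 9},
]

small = list(where(rows, "size < 10"))
```

The point of the sketch is the evaluation order: the expression is passed unevaluated, and the command decides to evaluate it once per row.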

What Is a Data Frame? (In Python, R, and SQL) (2018)

I also brainstormed more use cases for typed arguments on the issue tracker:

eval (myblock)  # extension to eval
assert (status === 2)  # test framework

Here's a related Zulip thread: Shell as a Language for DSLs. For context, I last mentioned it in July's release of Oil 0.8.12.

Other Changes

I removed Go-like flag syntax for Oil builtins because it was inconsistent and incompatible. All builtins now use the same GNU-style flag parser, where long flags must be written --long.
I fixed a test/spec-cpp regression, so we now have 1114 tests passing in oil-native. This is an all-time high and well over half the OSH tests. (On the other hand, no Oil tests pass in C++ because the pgen2 parser doesn't have a runtime.)

Release 0.9.3 - Extended Globs and Nix

Back in October, I implemented extended globs because Raphael Megzari noticed their use in Nix's shell scripts.

I drafted a release announcement, but it grew two more posts intended for the Nix audience. I didn't finish them, and instead skipped to the Oil 0.9.4 announcement. Here's a sketch of these posts.

How OSH is Designed / Why OSH Isn't Bash

I used the implementation of extended globs to illustrate three design principles of OSH, the compatible shell language. Each principle has a subtitle / slogan.

OSH should run real shell scripts, like the ones in Nix.
- Caveat: The Common Subset Principle. Sometimes you have to make small changes, like adding a space or quotes. Sometimes you need bigger changes, especially if you rely on dynamic parsing or associative arrays.
Don't confuse shell code and user data.
- In other words, prefer statically parsed syntax.
Consider interactions between language features.
- Slogan: We should be able to explain the language with a straight face.

The canonical example was:

echo hi > @(*.cc|*.h)

Even though the word on the right looks like an extended glob, bash and mksh treat it as a string, writing a file with literally that name. OSH disallows this, which brings all 3 principles above into play.

OSH wants you to write

echo hi > '@(*.cc|*.h)'

if that's what you mean, and this obviously falls in the common subset of OSH and bash.

Related: Oil Doesn't Confuse Flags and Files (Code and Data)

I finished this post and got feedback from #oil-dev on Zulip. But I decided it wasn't a great post.

While those 3 principles are absolutely true and useful to explain, there are other ones that explain less obvious design decisions.

So I updated the Language Design Principles wiki page, adding these:

Minimize the combined OSH + Oil language size to the degree possible.
- This explains the minimal design for string literals and modules. For the same reason, it's likely that Oil will avoid an overhaul of redirects.
OSH (not just Oil) should be familiar to Python and JavaScript users.

Together, these two principles explain why OSH tags values with types, not locations like bash does. That is, declare -i and declare -A don't work like they do in bash. (But as usual, there is a large common subset.)

The bash model means that the assignment statement will occasionally fail, printing errors to stderr, but letting your program continue! I consider this unacceptable.
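The difference between the two models can be sketched in Python. This is a loose analogy, not an implementation of either shell: `IntLocation` and `add_one` are hypothetical, and `int()` stands in for the arithmetic evaluation that bash performs when assigning to a `declare -i` variable.

```python
class IntLocation:
    """Bash-like model: the variable (location) carries the integer type."""

    def __init__(self):
        self.value = 0

    def assign(self, text):
        # Every assignment converts a string, so every assignment can fail.
        # int() is a stand-in; bash evaluates an arithmetic expression here.
        self.value = int(text)


def add_one(value):
    """Value-typed model: the type travels with the value itself."""
    if not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    return value + 1


cell = IntLocation()
cell.assign("42")  # succeeds

try:
    cell.assign("4 2")  # the failure surfaces at assignment time
    assignment_failed = False
except ValueError:
    assignment_failed = True
```

In the location-typed model, a plain assignment is a conversion that can fail at runtime. In the value-typed model, assignment always succeeds and type errors show up where the value is used.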

Related: #oil-discuss > Why declare -i isn't supported

How OSH is Implemented: Process, Tools, and Techniques

Implementing extended globs wasn't easy. I had put it off for years, and there was at least one failed attempt when I resumed the work in October.

But I came up with some creative and compatible solutions that illustrate how OSH is implemented, and how the implementation affects the design.

As the Zen of Python says:

If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.

So this post would describe the development process I use:

Start with spec tests.
- Write cases based on real usage like Nix.
- Add cases that explore rare corner cases and feature interactions.
I observe the results, and think hard about how to "solve" the language design problem within our constraints.
Design data structures before algorithms. The core of the shell is the lossless syntax tree and the runtime interpreter state in core/state.py.

The second section describes the tools and techniques, which are largely metalanguages:

We use algebraic data types via Zephyr ASDL to help us consider all cases.
A subset of typed Python (mycpp, formerly OPy) keeps the code short.
We use Lexer Modes and re2c to consider all cases when recognizing syntax.
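To illustrate how algebraic data types help with "considering all cases", here is a small Python sketch using dataclasses as variants of a sum type. The schema is made up for this example and is not Oil's ASDL schema: `Literal`, `Star`, `ExtGlob`, and `to_regex` are hypothetical, only the `@`, `?`, `*`, and `+` operators are covered, and `*` is translated to `.*` without shell rules about slashes or leading dots.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Literal:
    text: str


@dataclass
class Star:
    pass


@dataclass
class ExtGlob:
    op: str  # one of '@', '?', '*', '+'
    alternatives: List[List["Part"]]


# The sum type: a glob part is exactly one of these variants.
Part = Union[Literal, Star, ExtGlob]


def to_regex(parts: List[Part]) -> str:
    """Translate glob parts to a regex, one branch per variant."""
    out = []
    for part in parts:
        if isinstance(part, Literal):
            out.append(re.escape(part.text))
        elif isinstance(part, Star):
            out.append(".*")
        elif isinstance(part, ExtGlob):
            inner = "|".join(to_regex(alt) for alt in part.alternatives)
            suffix = {"@": "", "?": "?", "*": "*", "+": "+"}[part.op]
            out.append(f"(?:{inner}){suffix}")
        else:
            # A new variant without a branch fails loudly here.
            raise AssertionError(f"unhandled variant: {part!r}")
    return "".join(out)


# Models the pattern @(*.cc|*.h)
pattern = [ExtGlob("@", [[Star(), Literal(".cc")], [Star(), Literal(".h")]])]
regex = to_regex(pattern)
```

With a closed set of variants, every function over the tree has an obvious checklist of cases, and a type checker such as MyPy can help flag the ones that were forgotten.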

The point of this post is that this implementation strategy has been stable for five years, and we continue to feel the benefits!

For example, I doubt I could have implemented extended globs or typed arguments without these "guard rails". Those features were both tricky late additions, and the metalanguages helped me make sense of them.

Summary

Oil 0.9.3 implemented extended globs for Nix. This feature is a nice example of how OSH is designed and implemented.
Oil 0.9.4 - User Feedback was released about 2 weeks ago. (Hacker News thread)
Oil 0.9.5 implemented fundamental syntax for the Oil language. The language should be able to express a variety of DSLs, e.g. for filtering streaming data and tables.
- We also need a test framework and flag parser. As usual, let me know if you want to help!

Appendix: What to Work On Next

These thoughts aren't as well-formed as the ones above.

#oil-dev > TODO Next (Zulip)

There are always too many things to do. In this thread, I dumped all the tasks that came to mind and tried to prioritize them.

Summary:

Rewrite the build system with containers and probably Ninja. I want to do something along the lines of what I mentioned in Comments About Build Systems and CI Services.
- Why? The continuous build and the dev build have slightly diverged. They're not incremental, parallel, and reproducible enough. This is a weakness of shell, and Make doesn't adequately address it either.
- Related: Issues labeled #containers
Get back to work on C++ translation and the garbage collected runtime. This is essential for Nix.
- Recall that I left off in the March release of Oil 0.8.8: the garbage collector works on a variety of examples.
Do research on shell test frameworks, and probably revive the JSON crash dump feature. Motivated by work on Nix.

#oil-discuss > Brainstorming: Expanding the Project (Zulip)

I'm always thinking about ways to get the project done faster. The next backlog post will address this topic, but this link is a dump of my thoughts. Summary:

Hire a "Compiler Engineer" to rewrite the hacky mycpp Python-to-C++ translator.
- This includes related work on the garbage collected runtime and OS bindings.
- It would be nice to use Python 3.10 with the new match statement operating on a typed AST. I'm eagerly awaiting MyPy support for the match statement!
- Translating OSH is a very doable task -- even a short one! There's more work to translate Oil, but it's also doable.
Hire a "Technical Writer" to finish the documentation. I fantasize about dumping my brain during some phone calls and having good documentation show up.
Rabble rouse with some blog posts to raise money. I'd like full-time contributors who do professional-quality work.

I like this idea in theory. But I also just enjoy working on Oil without "administrative duties". Usually after I write a few blog posts, I want to get back to coding.

Let me know if you know of any candidates for these jobs, at any price!