Winter Blog Backlog: Recent Progress

2021-11-30

This blog is again falling behind the code, so I'm writing "backlog" posts to catch up. For example, I wrote two popular posts in this style during the summer:

  1. Summer Blog Backlog: Understanding and Using Shell
  2. Summer Blog Backlog: Distributed Systems (discussed as Kubernetes Is Our Generation's Multics)

The purpose of these posts is to maintain continuity and skip the detail. They mainly have bullet points and links, including #zulip-links and #comments.

If you want to read more about a topic, let me know in the comments!

Table of Contents
Release 0.9.5 - Oil Syntax Is "Complete"
Commands Accept Typed Arguments in Parens
Block Arguments Are Now A Special Case
Oil as a Foundation for DSLs
Other Changes
Release 0.9.3 - Extended Globs and Nix
How OSH is Designed / Why OSH Isn't Bash
How OSH is Implemented: Process, Tools, and Techniques
Summary
Appendix: What to Work On Next
#oil-dev > TODO Next (Zulip)
#oil-discuss > Brainstorming: Expanding the Project (Zulip)

This first backlog post sketches recent progress and releases, and is more detailed. For context, the last release was Oil 0.9.4 - User Feedback, less than 2 weeks ago.

Release 0.9.5 - Oil Syntax Is "Complete"

I made a release on Sunday that resolved a major design question in the syntax of the Oil language.

This change was motivated by a wart in A Tour of the Oil Language, and by the desire to filter tables with expressions like ls --qtt | where (size > 10).

Commands Accept Typed Arguments in Parens

For example:

$ const obj = {name: 'bob', age: 42}

# pass typed arg 'obj' to 'json' builtin
$ json write (obj)  
{
  "name": "bob",
  "age": 42
}

Everything in parens is parsed in expression mode, so you can also write it inline:

$ json write ({name: 'bob', age: 42})

Perhaps surprisingly, this syntax is backward compatible with shell, so it's available in both bin/osh and bin/oil.

It was tricky to parse this without conflicting with shell functions, but I found a clean way to do it: by adding another lexer mode for lookahead.

And I now consider Oil's syntax "complete"! It's tricky to design a clean language that's also an upgrade from shell, but our parsing model has made it possible. The subtitle for this release should be The Triumph of Lexer Modes!

(While Oil's syntax is complete, we still need to rewrite the evaluator to avoid "metacircularity", so it can be translated to C++ without the Python interpreter.)

Block Arguments Are Now A Special Case

Recall that Oil also enhances commands to take blocks:

cd /tmp {
  echo $PWD
}

Commands can now take both typed args and blocks:

when (size > 10) {
  echo $name
}  # these three lines are a single command!

That is, this hypothetical when command takes 2 typed arguments:

  1. A lazily evaluated expression size > 10
  2. A lazily evaluated block echo $name

But the block syntax is now just syntactic sugar for a trailing typed argument. It could also be written (awkwardly) like this:

const myexpr = ^[size > 10]  # unevaluated expression
const myblock = ^(echo $name)  # unevaluated block
when (myexpr, myblock)

I like this rule because it simplifies the language semantics, while allowing a rich and familiar syntax.

There's still more work to do in this direction, like implementing parameter binding for typed arguments, and writing documentation.

Oil as a Foundation for DSLs

I expect this syntax to be a flexible but readable foundation for many kinds of DSLs. The when example is inspired by Awk.

Shell, Awk, and Make Should Be Combined (2016)

It also should enable dplyr-like functionality:

ls --qtt | where (size < 10)

The plan for a TSV upgrade called QTT (quoted, typed tables) was mentioned in the Summer Blog Backlog.

What Is a Data Frame? (In Python, R, and SQL) (2018)

I also brainstormed more use cases for typed arguments on the issue tracker:

eval (myblock)  # extension to eval
assert (status === 2)  # test framework    

Other Changes

Release 0.9.3 - Extended Globs and Nix

Back in October, I implemented extended globs because Raphael Megzari noticed their use in Nix's shell scripts.

I drafted a release announcement, but it grew into two more posts intended for the Nix audience. I didn't finish them, and instead skipped to the Oil 0.9.4 announcement. Here's a sketch of these posts.

How OSH is Designed / Why OSH Isn't Bash

I used the implementation of extended globs to illustrate three design principles of OSH, the compatible shell language. Each principle has a subtitle / slogan.

  1. OSH should run real shell scripts, like the ones in Nix.
  2. Don't confuse shell code and user data.
  3. Consider interactions between language features.

The canonical example was:

echo hi > @(*.cc|*.h)

Even though the word on the right looks like an extended glob, bash and mksh treat it as a string, writing a file with literally that name. OSH disallows this, which brings all 3 principles above into play.

OSH wants you to write

echo hi > '@(*.cc|*.h)'

if that's what you mean, and this obviously falls in the common subset of OSH and bash.
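
To make the quoted form concrete, here is a minimal sketch that sticks to plain POSIX-style shell, so it should behave the same way under OSH and bash. The scratch directory and the `dir` variable are assumptions for illustration, not part of the original example.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir"

# Quoted: the redirect target is plain data, never a pattern.
echo hi > '@(*.cc|*.h)'

# The file name is literally the quoted string.
cat './@(*.cc|*.h)'    # prints hi
```

Because of the quotes, the shell never has to decide whether the word is code (a glob) or data (a file name).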


I finished this post and got feedback from #oil-dev on Zulip. But I decided it wasn't a great post.

While those 3 principles are absolutely true and useful to explain, there are other ones that explain less obvious design decisions.

So I updated the Language Design Principles wiki page, adding these:

  1. Minimize the combined OSH + Oil language size to the degree possible.
  2. OSH (not just Oil) should be familiar to Python and JavaScript users.

Together, these two principles explain why OSH tags values with types, not locations like bash does. That is, declare -i and declare -A don't work like they do in bash. (But as usual, there is a large common subset.)

The bash model means that the assignment statement will occasionally fail, printing errors to stderr, but letting your program continue! I consider this unacceptable.
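
As a rough illustration of the bash side of this (not OSH, whose behavior differs as noted above), the sketch below shows the integer attribute sitting on the variable rather than on the value. The variable names are made up for the example.

```shell
# bash only: the -i attribute is attached to the variable (the "location").
declare -i n
n='1 + 2'      # the string is evaluated as arithmetic on assignment
s='1 + 2'      # no attribute, so the same string is stored as-is

echo "$n"      # prints 3
echo "$s"      # prints 1 + 2
```

With the attribute set, a malformed right-hand side such as `'1 +'` is only reported when the assignment runs, which appears to be the kind of failure described above.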

How OSH is Implemented: Process, Tools, and Techniques

Implementing extended globs wasn't easy. I had put it off for years, and there was at least one failed attempt when I resumed the work in October.

But I came up with some creative and compatible solutions that illustrate how OSH is implemented, and how the implementation affects the design.

As the Zen of Python says:

If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.

So this post would describe the development process I use:

  1. Start with spec tests.
  2. Observe the results, and think hard about how to "solve" the language design problem within our constraints.
  3. Design data structures before algorithms. The core of the shell is the lossless syntax tree and the runtime interpreter state in core/state.py.

The second section would describe the tools and techniques, which are largely metalanguages:

  1. We use algebraic data types via Zephyr ASDL to help us consider all cases.
  2. A subset of typed Python (mycpp, formerly OPy) keeps the code short.
  3. We use Lexer Modes and re2c to consider all cases when recognizing syntax.

The point of this post is that this implementation strategy has been stable for five years, and we continue to feel the benefits!

For example, I doubt I could have implemented extended globs or typed arguments without these "guard rails". Those features were both tricky late additions, and the metalanguages helped me make sense of them.

Summary

Appendix: What to Work On Next

These thoughts aren't as well-formed as the ones above.

#oil-dev > TODO Next (Zulip)

There are always too many things to do. In this thread, I dumped all the tasks that came to mind and tried to prioritize them.

Summary:

  1. Rewrite the build system with containers and probably Ninja. I want to do something along the lines of what I mentioned in Comments About Build Systems and CI Services.
  2. Get back to work on C++ translation and the garbage collected runtime. This is essential for Nix.
  3. Do research on shell test frameworks, and probably revive the JSON crash dump feature. Motivated by work on Nix.

#oil-discuss > Brainstorming: Expanding the Project (Zulip)

I'm always thinking about ways to get the project done faster. The next backlog post will address this topic, but this link is a dump of my thoughts. Summary:

I like this idea in theory. But I also just enjoy working on Oil without "administrative duties". Usually after I write a few blog posts, I want to get back to coding.

Let me know if you know of any candidates for these jobs, at any price!