blog | oilshell.org

January Release Notes and Themes

2022-01-30

This post describes the last two Oil releases, and then elaborates on emerging project themes.

The most important one is thinking of the #shell-runtime as a state machine that receives asynchronous messages. This work is in progress, but it's worth describing the motivations, and what we've done so far.

Oil version 0.9.7 - Source tarballs and documentation.

Table of Contents
Background
Oil 0.9.6 on December 30th
User-Facing Changes
Under the Hood
Oil 0.9.7 on January 28th
Under the Hood
Closed Issues
The Shell Runtime As A State Machine
Whac-A-Mole With Signal Handling Bugs
Credits
A Model of the Runtime
Other Shells Have This Problem
Exhaustive Test Matrix
Recap
More Themes
What's Next?
Appendix: Metrics for the 0.9.7 Release

Background

I wrote about the 0.9.5 release in November, in Winter Blog Backlog: Recent Progress.

The work in the 0.9.6 and 0.9.7 releases has a variety of motivations:

  1. Project-based. Language features for Nix, and a bug fix found by running ble.sh tests.
  2. Bug reports. A clear pattern is that Oil doesn't handle signals correctly! This led me down a rabbit hole related to SIGWINCH, the signal for terminal window size change. The fixes and testing made me realize that we should "recast" the #shell-runtime as a state machine.
  3. Bug backlog. I went through almost every bug in the issue tracker, and fixed a couple of old ones. (Zulip: Issue Triage). Even though Oil is a long-winded project, it feels like it can "converge" with enough help.
  4. Infrastructure. Preparing for a "compiler engineer" to join the project. Our continuous build is in good shape.

The next two sections have details and credits. If you're casually following the project, you may want to skip to the last section.

 

Oil 0.9.6 on December 30th

User-Facing Changes

Full changelog: https://www.oilshell.org/release/0.9.6/changelog.html

Under the Hood

 

Oil 0.9.7 on January 28th

I released Oil 0.9.7 two days ago. Let's start with the infrastructure changes. Then we'll look at user-facing changes, which leads into the larger state machine theme.

Under the Hood

Now let's look at closed issues, which leads into the state machine theme.

Closed Issues

#1077 Add interactive tests that match other shells (e.g. Ctrl-C is exit code 130)
#1072 Fix vm-baseline benchmark after optimization
#1067 Terminal resize causes wait to exit
#1064 Tab completion does not suggest aliases or functions
#743 $PATH is empty when it's not in the parent's environment, unlike other shells
#467 Ctrl-C in command substitution exits parent shell
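
For context on #1077: shells conventionally report a process killed by signal N as exit status 128 + N, so SIGINT (signal 2) shows up as 130. Here's a minimal Python sketch of that convention. The names shell_status and run_and_interrupt are made up for illustration, and this isn't code from Oil or its test harness.

```python
import signal
import subprocess
import sys


def shell_status(returncode):
    """Map a subprocess returncode to the status a shell would report.

    Hypothetical helper. The subprocess module reports death by signal N
    as -N, while shells conventionally report it as 128 + N.
    """
    if returncode < 0:
        return 128 + (-returncode)
    return returncode


# The child resets SIGINT to its default action and then signals itself,
# which approximates hitting Ctrl-C on a foreground process.
CHILD = (
    'import os, signal; '
    'signal.signal(signal.SIGINT, signal.SIG_DFL); '
    'os.kill(os.getpid(), signal.SIGINT)'
)


def run_and_interrupt():
    """Run the self-interrupting child and return its shell-style status."""
    proc = subprocess.run([sys.executable, '-c', CHILD])
    return shell_status(proc.returncode)
```

On a system where SIGINT is signal 2, run_and_interrupt() should give 130, the status the interactive tests compare across shells.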

Full changelog: https://www.oilshell.org/release/0.9.7/changelog.html

 

The Shell Runtime As A State Machine

Whac-A-Mole With Signal Handling Bugs

Let's focus on two of these issues:

They don't seem related, or even that interesting. But they led me to reconceptualize the shell runtime as a state machine.

They both relate to signals: Ctrl-C causes SIGINT, and resizing the terminal causes SIGWINCH. And they reminded me of past bug fixes, like running trap handlers when the read builtin is interrupted.

I realized I've been playing Whac-A-Mole with this class of bug, which is bad. So I started to work on the pexpect tests that Brandon added for the fg bug, expanding the harness and planning a test matrix.

I fixed these bugs, and found more bugs to fix (e.g. in the wait -n variant). Revelation: We were missing an important way of testing the shell! Now that we have test/interactive.py, we can make monotonic progress on a multi-dimensional test matrix. More on this below.

Credits

I also tagged older bugs #signal-handling so I can make another pass over them. They should fall in specific cells of the test matrix, and "disappear" once those cells are filled in.

A Model of the Runtime

Based on this experience, I sketched an idea for a blog post:

The idea is that the shell interpreter walks the syntax tree and:

  1. Makes syscalls like
  2. Receives two types of asynchronous messages:
  3. Updates its state based on these messages
  4. And then there are five ways that the shell waits for state to converge:

One part of this is the singleton Waiter abstraction, which has existed for years. What's new is integrating syscalls and signals into a single state machine model.

It reminds me of DJB's self-pipe trick. How do you wait on a child process and an async read() concurrently? By writing a byte to a "self pipe" in the signal handler, so it reduces to select().

We're not using the same mechanism, but we also have to unify disparate concurrency styles. (Linux has signalfd, but Oil should be portable to all Unixes.)
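
If you haven't seen the self-pipe trick, here's a minimal sketch of it in Python. To be clear, this is not Oil's mechanism. The helper names make_self_pipe and wait_for_event are hypothetical, and I'm using SIGUSR1 only to keep the example easy to run.

```python
import os
import select
import signal


def make_self_pipe(signum):
    """Install a handler for signum that writes one byte to a pipe.

    Returns (read_fd, write_fd). Hypothetical helper, not Oil's code.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # never block inside a handler

    def handler(unused_signum, unused_frame):
        try:
            os.write(write_fd, b'\x00')
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    signal.signal(signum, handler)
    return read_fd, write_fd


def wait_for_event(self_pipe_fd, data_fd, timeout=None):
    """Block until a signal arrived or data_fd became readable.

    Returns a list containing 'signal' and/or 'data'. It is empty if the
    timeout expired first.
    """
    readable, _, _ = select.select([self_pipe_fd, data_fd], [], [], timeout)
    events = []
    if self_pipe_fd in readable:
        os.read(self_pipe_fd, 4096)  # drain the wakeup bytes
        events.append('signal')
    if data_fd in readable:
        events.append('data')
    return events
```

The point is that the signal becomes just another readable file descriptor, so one select() call covers both kinds of event. In a real shell the signal would more likely be SIGCHLD, and the handler would be installed once at startup.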

Other Shells Have This Problem

I felt dumb for putting this issue off while bugs piled up, even though I hinted at it in the last sentence of this 2019 post.

But then I ran grep SIGWINCH on bash's changelog and found that it has a history of similar issues. They're also playing Whac-A-Mole with signal bugs.

From reading the source of other shells and using strace on them, I don't believe they have a clean runtime model. For example:

  1. bash doesn't even wait() for process subs like diff <(sort left) <(sort nonexistent)
  2. I think shells make too many syscalls.

Exhaustive Test Matrix

I wrote a comment in test/interactive.py that describes this five-dimensional test matrix. It should let us explore a large portion of the state space and prevent regressions.

  1. What is the main loop doing?
  2. What message is received?
  3. If it's a signal, is it trapped by the user, or untrapped?
  4. Is the shell interactive or batch?
  5. As we do in spec tests, the interactive state machine tests should compare OSH with shells like bash, dash, mksh, zsh, ...

Again, the idea is to make monotonic progress rather than playing Whac-A-Mole. This will also help the code translate cleanly and automatically to C++.
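
To make "monotonic progress on a matrix" concrete, here's a rough sketch of enumerating the cells. The values in each dimension are assumptions pulled from examples in this post, not the actual lists in test/interactive.py. In particular, the message dimension also includes non-signal messages, which I've left out.

```python
import itertools

# Hypothetical values for each dimension, taken from examples in this post.
MAIN_LOOP = ['wait', 'wait -n', 'read', 'command sub']
MESSAGE = ['SIGINT', 'SIGWINCH']
TRAP = ['trapped', 'untrapped']
MODE = ['interactive', 'batch']
SHELL = ['osh', 'bash', 'dash', 'mksh', 'zsh']


def all_cells():
    """Every combination of the five dimensions, in a stable order."""
    return list(itertools.product(MAIN_LOOP, MESSAGE, TRAP, MODE, SHELL))


def remaining(covered):
    """Cells that don't have a test yet."""
    return [cell for cell in all_cells() if cell not in covered]


# One cell filled in, e.g. a test for "terminal resize causes wait to exit".
covered = {('wait', 'SIGWINCH', 'untrapped', 'interactive', 'osh')}

print(len(all_cells()))         # 160
print(len(remaining(covered)))  # 159
```

Each new test removes cells from the remaining list, and an old bug report should map to one or more specific cells.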

Recap

I wrote about related problems in Technical Issues and Risks (August 2020) > Deferred Issues: What the Interactive Shell Depends On.

Despite recently declaring that the interactive shell is "punted", I'm thinking about all these issues again, and I have a plan for each one:

One way to clarify this: I'm limiting the scope of the interactive shell to my own usage. I want to punt customizations outside the project -- to the headless shell. In contrast, I very much care about making the Oil language useful for others. That is the core of the project.

I expect that the state machine model will improve Oil, and that future blog posts will make reference to it. I've written a lot about principled and exhaustive parsing (#parsing-shell and #ASDL), but not much about the #shell-runtime. That's because we were missing something!

 

More Themes

This post is now too long, so I moved these themes to the next post:

  1. Fundraising / Hiring for a Compiler Engineer
  2. Zulip: Ideas for an Oil Logo. Some good connotations:

 

What's Next?

I want to write these posts:

As far as coding, these issues are on my mind:

 

Appendix: Metrics for the 0.9.7 Release

As usual, these metrics help me keep track of the project. I hope they'll also give the compiler engineer color on what needs to be done.

Let's compare this release with Oil 0.9.4 - User Feedback.

OSH spec tests:

Translation progress:

I forgot to mention that I fixed a mycpp bug related to field inheritance, which led to the big jump in tests passing in C++.

#oil-dev > Translation Bugs That Survived the Compiler

The source code is getting more correct, but not much bigger:

Ditto for the binary: