
Oil 0.10.0 - Can Unix Shell Error Handling Be Fixed Once and For All?

2022-05-05

This is the latest version of Oil, a Unix shell that's our upgrade path from bash:

Oil version 0.10.0 - Source tarballs and documentation.

To build and run it, follow the instructions in INSTALL.txt. The wiki has tips on How To Test OSH. If you're new to the project, see Why Create a New Shell? and posts tagged #FAQ.


Here are some comics for context. The first one describes set -e -u -o pipefail, which is sometimes called "bash strict mode".

Bash Errors Comic

I use it in all my shell scripts, but it's not enough. Strict mode has holes and pitfalls.
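Here is a minimal bash sketch of one such hole, the one most people hit first: errexit is silently ignored inside a function that is called as an `if` condition. The function name `deploy` is hypothetical and only there for illustration.

```bash
#!/usr/bin/env bash
set -e -u -o pipefail

# Hypothetical function for illustration.
deploy() {
  false             # fails, and would normally abort the script under errexit
  echo 'deployed'   # still runs when deploy is called as an `if` condition
}

if deploy; then
  echo 'success'    # reached, even though a command inside deploy failed
fi
```

Run under bash, this prints both `deployed` and `success`. Strict mode is on, but the failure of `false` is never reported.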

Bash Functions Comic

This release announcement describes what Oil does about it!

Table of Contents
Error Handling Overhaul: try and _status
Basic Idea
Docs
Language Design Notes
Backward Compatibility: The Eternal Puzzle
Four Ways to Use OSH / Oil
Good News: 50K Euros From NLnet
Please Sponsor Oil
Groundhog Day
Never-Ending Arguments About Shell
Prediction
Production Incidents
Acknowledgements
What's Next?
Appendices
Why Change try?
More Design Notes
Closed Issues
Metrics for the 0.10.0 Release

Error Handling Overhaul: try and _status

So, can shell's error handling be fixed once and for all? I believe Oil 0.10.0 has done this. It's the first shell with reliable error handling in 50 years :-)

Basic Idea

Recall that Oil is designed to be familiar to Python and JavaScript users. So a program should stop by default on any failure, like:

cp: cannot create regular file '/nonexistent': Permission denied
  cp myfile /nonexistent
  ^~
hello.sh:1: errexit PID 29556: Command failed with status 1
# shell exits with status 1

Additionally, Oil fixes the holes in shell and bash, and steers you away from the pitfalls.

You can also handle those failures in a custom way with the new try builtin:

try {
  cp myfile /nonexistent  # exit status may be non-zero
  var item = a[i]         # index may be out of range
}                         # try sets _status, not $?
if (_status !== 0) {
  echo 'error'
}
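For comparison, here is a rough sketch of how the "capture the status, then test it" half of that pattern is commonly written in plain bash. This is not Oil code, and the names `risky` and `status` are hypothetical.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for a command that fails with status 3.
risky() {
  return 3
}

status=0
risky || status=$?   # `||` keeps errexit from aborting, and saves the status

if [ "$status" -ne 0 ]; then
  echo "error: status $status"
fi
```

The catch with this bash idiom is that errexit is also ignored inside `risky` while it runs on the left of `||`, so failures in the middle of the function can go unnoticed.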

Docs

This work is now documented!

Oil vs. Shell Idioms > Error Handling. This comprehensive list of examples is the first stop for users.

Oil Fixes Shell's Error Handling (errexit). I spent over a week writing and revising this design doc and reference.

I've also updated A Tour of the Oil Language, which explains the language from scratch.

Language Design Notes

Let's compare Oil with 2 popular languages:

  1. Unlike Go, errors in Oil are fatal by default.
  2. Oil code is as short as or shorter than Python, while still being explicit.

For example, ignoring a failure is often one line:

try ls /bad

rather than four lines:

try:
  myls('/bad')
except Exception:
  pass

Error handling code with 3 branches (true/false/error) is also shorter.
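To make the three-branch case concrete, here is a sketch in plain bash (not Oil) using `grep`, whose exit status distinguishes a match (0), no match (1), and an error (2 or more). The function name `check_pattern` and the file contents are assumptions made for illustration.

```bash
#!/usr/bin/env bash

# Classify grep's result: 0 = match (true), 1 = no match (false), >1 = error.
check_pattern() {
  local pattern=$1 file=$2
  local status=0
  grep -q -- "$pattern" "$file" 2>/dev/null || status=$?
  case $status in
    0) echo 'found' ;;
    1) echo 'not found' ;;
    *) echo 'error' ;;
  esac
}

tmp=$(mktemp)
echo 'hello oil' > "$tmp"
check_pattern 'oil' "$tmp"            # match
check_pattern 'bash' "$tmp"           # no match
check_pattern 'oil' "$tmp.missing"    # grep cannot open the file
rm -f "$tmp"
```

A plain `if grep ...; then ... else ... fi` would fold the error case into the "false" branch, which is why the status has to be saved and inspected separately.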

So even though the design is constrained by compatibility with both shell and bash, I'm very happy with how it turned out. This has been a consistent theme: there have been surprisingly few compromises in the Oil language!

Backward Compatibility: The Eternal Puzzle

What do I mean by "Oil fixes shell's error handling"? I mean something pretty strong:

  1. OSH runs existing shell scripts as is, whether you use errexit or not.
  2. Oil has the correct defaults. It's a clean slate language that should be familiar to Python and JavaScript users.
  3. There's an upgrade path from OSH to Oil.

Four Ways to Use OSH / Oil

What is the upgrade path? I recently figured out a concise way to explain how global options like shopt --set oil:basic work. There are four use cases for OSH and Oil:

  1. Run old scripts as is (no options)
  2. Improve old scripts while keeping compatibility with sh or bash (strict:all)
  3. Upgrade old scripts, dropping compatibility (oil:basic)
  4. Write new scripts (oil:all, or use bin/oil)
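As a sketch of use case 2, a script can opt in to the stricter checks when run under OSH and still run unchanged under bash. This assumes a guard of the following shape; the point is only that bash doesn't recognize the option name, so the resulting error is silenced and ignored.

```bash
#!/usr/bin/env bash
set -e -u -o pipefail

# Under OSH this is meant to turn on the strict:all option group.
# bash rejects the unknown option name, so we discard the error and move on.
shopt -s strict:all 2>/dev/null || true

echo 'script continues in either shell'
```

The `|| true` matters because errexit is on: without it, the failed `shopt` would abort the script under bash.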

This explanation is on the OSH versus Oil wiki page, and I should write a longer post with examples.

Good News: 50K Euros From NLnet

I applied for an NLnet grant in February, and we got it in April!

So now we have 50K euros to help pay a compiler engineer to translate Oil to C++. See the blog post last month:

Oil Is Being Implemented "Middle Out". I show evidence that Oil can be fast, and list five ways you can help.

Please Sponsor Oil

I'm glad we have this grant to kick things off. But it's likely not enough to pay someone to completely "own" the task of translating Oil to C++.

If you like this error handling work, or Oil in general, you should sponsor it.

The NLnet grant comes with the constraint that we hire someone in the European Union. I would like to also pay people in other parts of the world. If you understand compilers, C++, and Python, that person could be you!


I hear the feedback loud and clear that the docs are incomplete. However, they take a long time to write and revise, and I need help.

For example, I spent 4-5 days implementing the language changes this release, but documenting everything took even longer! But I think we've finally unraveled and documented the decades-old mystery of shell error handling.

So this project can be finished, but we need help. The easiest way to help is to donate.

Groundhog Day

This work goes back to the original motivation for the project: Shell made me a productive programmer, but I can't recommend it to my friends with a straight face! It has too many holes and pitfalls.

I noted in 2019 that David Korn, Tom Duff, and Richard Stallman complained about these same problems with shell in 1991 and 1994. Those complaints were closer to the creation of Unix in 1970 than we are to 1994!

So I view shell as a "groundhog day" problem -- we keep having the same conversations over and over again without making progress.

Have we lost our memory, and our collective will to build and fix things? Are we able to pass on knowledge to new generations of programmers?

Here are a couple of recent examples.

Never-Ending Arguments About Shell

A recent "troll" blog post generated dozens of comments across multiple sites:

This conversation is largely a waste of time, because people said the same things 10 and 20 years ago, and the situation hasn't changed. Quite the contrary: shell is more popular than ever.

(I should write a blog post about why shell is more common these days, and why it will be more common in the future. Short answer: scale, the increasing heterogeneity of interconnected systems, and the ratio between apps and operating systems.)

On the other hand, the comments re-affirm that many people use shell to solve problems you care about! Even if you don't use shell directly, you do want a better shell.

Comments:

I used to give the same advice, but I completely changed my opinion over the past 10 years or so. I eventually put in the time and learned shell scripting.

And:

Y'all writing bash scripts without set -u and error checking?

Here's an example from 2008, which was largely before the rise of a cloud full of virtual machines and containers:

Unfortunately I don't think there's a really good Unix programming language to replace the Bourne shell, which is one of the reasons that writing programs in the Bourne shell remains so tempting

Prediction

In the comment threads for this release announcement, some people will react negatively because Oil is a shell. They won't understand that Oil fixes exactly the problems that make shell frustrating! We're on the same side.

They may also say that they "switch" to Python after 100 lines of shell. Given that shell is in such poor shape, this is reasonable! But I still want to write about The Shell XOR Python Fallacy (my related comment in the thread above).

Production Incidents

This recent blog post noted that a missing set -o pipefail caused a production incident at Cloudflare.

Note that pipefail is in Oil's option groups oil:basic and oil:all, so using Oil would help here.
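For readers who haven't run into this, here is a small bash sketch of what a missing pipefail hides. The function names are hypothetical, and each one runs in a subshell so the option change doesn't leak into the rest of the script.

```bash
#!/usr/bin/env bash

# Each function body is a subshell, so `set` only affects that function.
without_pipefail() (
  set +o pipefail
  false | cat
  echo "status=$?"
)

with_pipefail() (
  set -o pipefail
  false | cat
  echo "status=$?"
)

without_pipefail   # status=0: the failure of `false` is hidden by `cat`
with_pipefail      # status=1: the pipeline reports the failure
```

Without pipefail, the status of a pipeline is the status of its last command, so errexit never sees a failure earlier in the pipeline.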


It reminded me of this similar post from 2017:

This is the problem that Oil's command_sub_errexit fixes. Moreover, Oil is the only shell with this option.
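One well-known bash case where a failing command substitution goes unnoticed is sketched below. I'm assuming this is representative of the kind of failure meant here; the function name `read_config` and the path are hypothetical.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical function for illustration.
read_config() {
  # The command substitution fails, but `local` itself succeeds,
  # so errexit never sees the failure.
  local config=$(cat /nonexistent/config 2>/dev/null)
  echo "config=[$config]"
}

read_config
echo 'still running'
```

Under bash this prints `config=[]` and then `still running`: the script carries on with an empty value instead of stopping.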


There are many other posts in this genre -- feel free to reference them in the comments. Again, Oil patches all the holes in shell. If you disagree, please file a bug.

Acknowledgements

3d489ab8 Nathan Sketch re2c patterns should never match NUL bytes (#1095)
adaddb2c glyh Implement jobs -p (#1098)

What's Next?

I look forward to feedback on Oil's error handling features. I expect that the next few months will be filled with recruiting and fundraising.

This may help find the people with the right skill sets and interest. Again, check out Oil Is Being Implemented "Middle Out" as well as Compiler Engineer Job on the wiki!

Appendices

Why Change try?

Here's some background on the language changes in this release.

In October 2020, I said that one of the Four Features That Justify a Unix Shell is reliable error handling.

At that point, I had implemented most of what these new docs describe, like the command_sub_errexit and strict_errexit options. However:

As a result, few users tried it. So I've learned my lesson with respect to documentation-driven development! A feature isn't done until it's documented and users give feedback.

So it's important that you download Oil 0.10.0, read the docs, and give feedback on it. Again, I claim that this is the first shell in 50 years with reliable error handling. Prove me wrong!


Also, ca2013 reported a bad error message with strict_errexit, which led me to overhaul it. It's now stricter, which makes idiomatic Oil code more straightforward and consistent. I avoided the "meta-pitfall".

More Design Notes

Here are a couple of notes for language designers, and readers following the #software-architecture concepts on this blog.

Should a Shell Have Exceptions?

There is no notion of Python-like exceptions in Oil. This avoids what I call a Perlis-Thompson problem. When you add new features to a language, they have to compose with the old ones.

I also like this design because Oil remains a thin layer over the kernel. The exit status of processes is a kernel concept, and it's naturally extended to shell builtins and functions.

The Meta Language Influences the Language

In hindsight, the design of try is obvious and has been missing from shell for decades!

For example, in Oil's own test harnesses, I frequently need to handle errors from shell functions while errexit is on. This is difficult to do correctly in shell, but try makes it easy.
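To show why this is awkward in plain shell, here is a sketch of one common bash workaround: run the function in a subshell with errexit on, while temporarily turning errexit off in the parent. This is not how Oil implements try, and the function name `build` is hypothetical.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical function for illustration.
build() {
  false               # the step that fails
  echo 'built'        # must not run after the failure
}

# Calling `build || status=$?` would disable errexit inside build, so run it
# in a subshell with errexit temporarily turned off in the parent instead.
set +e
( set -e; build )
status=$?
set -e

echo "status=$status"
```

This prints `status=1` without printing `built`, but it takes four lines of careful option juggling, and it's easy to get the order wrong.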

So why did no other shell come up with this solution? I believe a major reason is that they're tree interpreters written in C. Error recovery is hard in C, so shells have avoided language features that require it!

In contrast, Python exceptions are easy to use, and we use them to implement Oil's try. (But again there's no notion of exceptions in Oil itself.)

So this is highly related to the point about metalanguages I've been making, particularly in the last post. In the appendix, I mention the draft of Shell's Implementation Language Has Always Been a Problem, where I further justify this point, citing comments on longjmp() by Stephen Bourne and a comment on Lisp exceptions in the bash source code.


On the other hand, why didn't I come up with this solution the first time? I think the original design was actually too minimal, focusing on slight adjustments to if. Also, language design is just hard :-)

Closed Issues

This release closed 6 issues. You can also view the full changelog.

#1113 strict_errexit can occur in a child process, which doesn't abort the whole script
#1111 error handling changes: rename try -> bool, new try that takes a block
#1107 strict_errexit error message points to the wrong line
#1106 Unexpected block args in builtins should be errors
#942 test and document var x = $(false)
#937 shopt parse_equals should only be on in config mode

Metrics for the 0.10.0 Release

These metrics help me keep track of the project. Let's compare this release with version 0.9.7, which I discussed in January Release Notes and Themes.

Spec Tests

The spec test suites for both OSH and Oil continue to expand and turn green.


Source Code Size

Despite the new features, the code is still compact. Significant lines:

Physical lines:

Benchmarks

The oil-native Parser performance hasn't changed:

Runtime performance for the Python / "OVM" build:

These measurements are noisy, but this looks like a regression. There's a change on both machines, even though the size is different. It could be related to some of the runtime checks for error handling.

I will keep an eye on it, but what matters is the speed of oil-native, not the speed of this reference implementation. We already know this is too slow!

Native Code Metrics

I didn't work on the translation during this release, but we're not regressing:

The following deltas are proportional. Generated source lines:

Binary size: