
Oil Fixes Shell's Error Handling (`errexit`)

Oil is unlike other shells:

It never silently ignores an error, and it never loses an exit code.
There's no reason to write an Oil script without errexit, which is on by default.

This document explains how Oil makes these guarantees. We first review shell error handling, and discuss its fundamental problems. Then we show idiomatic Oil code, and look under the hood at the underlying mechanisms.

Review of Shell Error Handling Mechanisms

POSIX shell has fundamental problems with error handling. With set -e aka errexit, you're damned if you do and damned if you don't.

GNU bash fixes some of the problems, but adds its own, e.g. with respect to process subs, command subs, and assignment builtins.

Oil fixes all the problems by adding new builtin commands, special variables, and global options. But you see a simple interface with try and _status.

Let's review a few concepts before discussing Oil.

POSIX Shell

The special variable $? is the exit status of the "last command". It's a number between 0 and 255.
If errexit is enabled, the shell will abort if $? is nonzero.
- This is subject to the Disabled errexit Quirk, which I describe below.
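
Here's a minimal sketch of those two mechanisms. It runs a tiny script in a child `sh` so that the abort doesn't end the calling script. The variable names `output` and `status` are only for illustration.

```shell
# Run a tiny script in a child shell with errexit enabled.
# 'false' sets $? to 1, so the child aborts before the second echo.
output=$(sh -c 'set -e; echo before; false; echo after') && status=0 || status=$?

echo "output=$output status=$status"
# prints: output=before status=1
```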

These mechanisms are fundamentally incomplete.

Bash

Bash improves error handling for pipelines like ls /bad | wc.

${PIPESTATUS[@]} stores the exit codes of all processes in a pipeline.
When set -o pipefail is enabled, $? takes into account every process in a pipeline.
- Without this setting, the failure of ls would be ignored.
shopt -s inherit_errexit was introduced in bash 4.4 to re-introduce error handling in command sub child processes. This fixes a bash-specific bug.
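
To make the pipeline behavior concrete, here's a small sketch that assumes `bash` is on your PATH. It uses `false | true` as a stand-in for `ls /bad | wc`, and the variable names are hypothetical.

```shell
# Without pipefail, $? reflects only the last process in the pipeline.
status_default=$(bash -c 'false | true; echo $?')

# With pipefail, a failure anywhere in the pipeline makes $? non-zero.
status_pipefail=$(bash -c 'set -o pipefail; false | true; echo $?')

# PIPESTATUS records every process, with or without pipefail.
all_statuses=$(bash -c 'false | true; echo "${PIPESTATUS[@]}"')

echo "default=$status_default pipefail=$status_pipefail all=$all_statuses"
# prints: default=0 pipefail=1 all=1 0
```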

But there are still places where bash will lose an exit code.

Fundamental Problems

Let's look at four fundamental issues with shell error handling. They underlie the nine shell pitfalls enumerated in the appendix.

When Is `$?` Set?

Each external process and shell builtin has one exit status. But the definition of $? is obscure: it's tied to the command rule in the POSIX shell grammar, which does not correspond to a single process or builtin.

We saw that pipefail fixes one case:

ls /nonexistent | wc   # 2 processes, 2 exit codes, but just one $?

But there are others:

local x=$(false)                 # 2 exit codes, but just one $?
diff <(sort left) <(sort right)  # 3 exit codes, but just one $?

This issue means that shell scripts fundamentally lose errors. The language is unreliable.
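
You can watch bash lose these exit codes with a short sketch like the one below. It assumes `bash`, `sort`, and `diff` are installed. The paths under `/nonexistent` are placeholders chosen so that `sort` fails.

```shell
# 'local' is itself a command, so its status (0) replaces the status
# of the failed command sub.
local_status=$(bash -c 'f() { local x=$(false); echo $?; }; f')

# Both 'sort' processes fail, yet 'diff' compares two empty streams
# and exits 0. The failures never reach $?.
diff_status=$(bash -c '
  diff <(sort /nonexistent/left 2>/dev/null) \
       <(sort /nonexistent/right 2>/dev/null)
  echo $?
')

echo "local=$local_status diff=$diff_status"
# prints: local=0 diff=0
```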

What Does `$?` Mean?

Each process or builtin decides the meaning of its exit status independently. Here are two common choices:

The Failure Paradigm
- 0 for success, or non-zero for an error.
- Examples: most shell builtins, ls, cp, ...
The Boolean Paradigm
- 0 for true, 1 for false, or a different number like 2 for an error.
- Examples: the test builtin, grep, diff, ...

Oil's new error handling constructs deal with this fundamental inconsistency.

The Meaning of `if`

Shell's if statement tests whether a command exits zero or non-zero:

if grep class *.py; then
  echo 'found class'
else
  echo 'not found'  # is this true?
fi

So while you'd expect if to work in the boolean paradigm, it's closer to the failure paradigm. This means that using if with certain commands can cause the Error or False Pitfall:

if grep 'class\(' *.py; then  # grep syntax error, status 2
  echo 'found class('
else
  echo 'not found is a lie'
fi
# => grep: Unmatched ( or \(
# => not found is a lie

That is, the else clause conflates grep's error status 2 and false status 1.
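
The sketch below records the three statuses that `if` collapses into two branches. It assumes a grep that rejects the pattern `class\(` (as GNU grep does). The temp file and variable names are made up for the example.

```shell
# Create a small file to search. mktemp picks the name.
sample=$(mktemp)
echo 'class Foo:' > "$sample"

# Record grep's status for a match, a non-match, and a bad pattern.
# The '|| var=$?' form keeps this safe even if errexit is on.
match_status=0;   grep -q 'class'   "$sample" || match_status=$?
nomatch_status=0; grep -q 'struct'  "$sample" || nomatch_status=$?
error_status=0;   grep -q 'class\(' "$sample" 2>/dev/null || error_status=$?

rm -f "$sample"
echo "match=$match_status nomatch=$nomatch_status error=$error_status"
```

An `if` statement sends both the second and third cases to the else clause, even though only the second one means "not found".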

Strangely enough, I encountered this pitfall while trying to disallow shell's error handling pitfalls in Oil! I describe this in another appendix as the "meta pitfall".

Design Mistake: The Disabled `errexit` Quirk

There's more bad news about the design of shell's if statement. It's subject to the Disabled errexit Quirk, which means when you use a shell function in a conditional context, errors are unexpectedly ignored.

That is, while if ls /tmp is useful, if my-ls-function /tmp should be avoided. It yields surprising results.

I call this the if myfunc Pitfall, and show an example in the appendix.

We can't fix this decades-old bug in shell. Instead we disallow dangerous code with strict_errexit, and add new error handling mechanisms.

Oil Error Handling: The Big Picture

We've reviewed how POSIX shell and bash work, and showed fundamental problems with the shell language.

But when you're using Oil, you don't have to worry about any of this!

Oil Fails On Every Error

This means you don't have to explicitly check for errors. Examples:

shopt --set oil:upgrade     # Enable good error handling in bin/osh
                            # It's the default in bin/oil.
shopt --set strict_errexit  # Disallow bad shell error handling.
                            # Also the default in bin/oil.

local date=$(date X)        # 'date' failure is fatal
# => date: invalid date 'X' 

echo $(date X)              # ditto

echo $(date X) $(ls > F)    # 'ls' isn't executed; 'date' fails first

ls /bad | wc                # 'ls' failure is fatal

diff <(sort A) <(sort B)    # 'sort' failure is fatal

On the other hand, you won't experience this problem caused by pipefail:

yes | head                 # doesn't fail due to SIGPIPE

The details are explained below.

`try` Handles Command and Expression Errors

You may want to handle failure instead of aborting the shell. In this case, use the try builtin and inspect the _status variable it sets.

try {                 # try takes a block of commands
  ls /etc
  ls /BAD             # it stops at the first failure
  ls /lib
}                     # After try, $? is always 0
if (_status !== 0) {  # Now check _status
  echo 'failed'
}

Note that:

The _status variable is different from $?.
- The leading _ is a PHP-like convention for special variables / "registers" in Oil.
Idiomatic Oil programs don't look at $?.

You can omit { } when invoking a single command. Here's how to invoke a function without the if myfunc Pitfall:

try myfunc            # Unlike 'myfunc', doesn't abort on error
if (_status !== 0) {
  echo 'failed'
}

You also have fine-grained control over every process in a pipeline:

try {
  ls /bad | wc
}
write -- @_pipeline_status  # every exit status

And each process substitution:

try {
  diff <(sort left.txt) <(sort right.txt)
}
write -- @_process_sub_status  # every exit status

See Oil vs. Shell Idioms > Error Handling for more examples.

Certain expressions produce fatal errors, like:

var x = 42 / 0  # divide by zero will abort shell

The try builtin also handles them:

try {
   var x = 42 / 0
}
if (_status !== 0) {
  echo 'divide by zero'
}

More examples:

Index out of bounds a[i]
Nonexistent key d->foo or d['foo'].

Such expression evaluation errors result in status 3, which is an arbitrary non-zero status that's not used by other shells. Status 2 is generally for syntax errors and status 1 is for most runtime failures.

`boolstatus` Enforces 0 or 1 Status

The boolstatus builtin addresses the Error or False Pitfall:

if boolstatus grep 'class' *.py {  # may abort the program
  echo 'found'      # status 0 means 'found'
} else {
  echo 'not found'  # status 1 means 'not found'
}

Rather than confusing error with false, boolstatus will abort the program if grep doesn't return 0 or 1.

You can think of this as a shortcut for

try grep 'class' *.py
case $_status {
  (0) echo 'found'
      ;;
  (1) echo 'not found'
      ;;
  (*) echo 'fatal'
      exit $_status
      ;;
}

FAQ on Language Design

Why is there try but no catch?

First, it offers more flexibility:

The handler usually inspects _status, but it may also inspect _pipeline_status or _process_sub_status.
The handler may use case instead of if, e.g. to distinguish true / false / error.

Second, it makes the language smaller:

try / catch would require specially parsed keywords. But our try is a shell builtin that takes a block, like cd or shopt.
The builtin also lets us write either try ls or try { ls }, which is hard with a keyword.

Another way to remember this is that there are three parts to handling an error, each of which has independent choices:

Does try take a simple command or a block? For example, try ls versus try { ls; var x = 42 / n }
Which status do you want to inspect?
Inspect it with if or case? As mentioned, boolstatus is a special case of try / case.

Why is _status different from $?

This avoids special cases in the interpreter for try, which is again a builtin that takes a block.

The exit status of try is always 0. If it returned a non-zero status, the errexit rule would trigger, and you wouldn't be able to handle the error!

Generally, errors occur inside blocks, not outside.

Again, idiomatic Oil scripts never look at $?, which is only used to trigger shell's errexit rule. Instead they invoke try and inspect _status when they want to handle errors.

Why boolstatus? Can't you just change what if means in Oil?

I've learned the hard way that when there's a shell semantics change, there must be a syntax change. In general, you should be able to read code on its own, without context.

Readers shouldn't have to constantly look up whether oil:upgrade is on. There are some cases where this is necessary, but it should be minimized.

Also, both if foo and if boolstatus foo are useful in idiomatic Oil code.

Most users can skip to the summary. You don't need to know all the details to use Oil.

Reference: Global Options

Under the hood, we implement the errexit option from POSIX, bash options like pipefail and inherit_errexit, and add more options of our own. They're all hidden behind option groups like strict:all and oil:upgrade.

The following sections explain Oil's new options.

`command_sub_errexit` Adds More Errors

In all Bourne shells, the status of command subs is lost, so errors are ignored (details in the appendix). For example:

echo $(date X) $(date Y)  # 2 failures, both ignored
echo                      # program continues

The command_sub_errexit option makes both date invocations an error. The status $? of the parent echo command will be 1, so if errexit is on, the shell will abort.

(Other shells should implement command_sub_errexit!)
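
For comparison, here's a sketch of what bash does today, assuming `bash` is on your PATH. I use `false` in place of `date X` so the example doesn't depend on the date utility.

```shell
# Both command subs fail, but 'echo' succeeds, so errexit never fires
# and the child script keeps going.
output=$(bash -c '
  set -e
  echo $(false) $(false) >/dev/null
  echo "status=$? continued"
')

echo "$output"
# prints: status=0 continued
```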

`process_sub_fail` Is Analogous to `pipefail`

Similarly, in this example, sort will fail if the file doesn't exist.

diff <(sort left.txt) <(sort right.txt)  # any failures are ignored

But there's no way to see this error in bash. Oil adds process_sub_fail, which folds the failure into $? so errexit can do its job.

You can also inspect the special _process_sub_status array variable to implement custom error logic.

`strict_errexit` Flags Two Problems

Like other strict_* options, Oil's strict_errexit improves your shell programs, even if you run them under another shell like bash! It's like a linter at runtime, so it can catch things that ShellCheck can't.

strict_errexit disallows code that exhibits these problems:

The if myfunc Pitfall
The local x=$(false) Pitfall

See the appendix for examples of each.

Rules to Prevent the `if myfunc` Pitfall

In any conditional context, strict_errexit disallows:

All commands except ((, [[, and some simple commands (e.g. echo foo).
- Detail: ! ls is considered a pipeline in the shell grammar. We have to allow it, while disallowing ls | grep foo.
Function/proc invocations (which are a special case of simple commands.)
Command sub and process sub (shopt --unset allow_csub_psub)

This means that you should check the exit status of functions and pipelines differently. See Does a Function Succeed?, Does a Pipeline Succeed?, and other Oil vs. Shell Idioms.

Rule to Prevent the `local x=$(false)` Pitfall

Command subs and process subs are disallowed in assignment builtins: local, declare aka typeset, readonly, and export.

No:

local x=$(false)

Yes:

var x = $(false)   # Oil style

local x            # Shell style
x=$(false)

`sigpipe_status_ok` Ignores an Issue With `pipefail`

When you turn on pipefail, you may inadvertently run into this behavior:

yes | head
# => y
# ...

echo ${PIPESTATUS[@]}
# => 141 0

That is, head closes the pipe after 10 lines, causing the yes command to fail with SIGPIPE status 141.

This error shouldn't be fatal, so OSH has a sigpipe_status_ok option, which is on by default in Oil.
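
The following sketch reproduces the bash behavior without printing anything to the terminal. It assumes `bash`, `yes`, and `head` are available, and the variable names are hypothetical. The exact status of `yes` depends on how SIGPIPE is handled in your environment, so the code only relies on it being non-zero.

```shell
# Capture every status in the pipeline. 'head' stops after one line.
statuses=$(bash -c 'yes | head -n 1 >/dev/null; echo "${PIPESTATUS[@]}"')
yes_status=${statuses%% *}    # first entry: 'yes'
head_status=${statuses##* }   # last entry: 'head'

# With pipefail, the failure of 'yes' becomes the pipeline's $?.
pipefail_status=$(bash -c 'set -o pipefail; yes | head -n 1 >/dev/null; echo $?')

echo "yes=$yes_status head=$head_status pipefail=$pipefail_status"
```

So with both errexit and pipefail on, bash would abort on a pipeline that did exactly what you wanted.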

`verbose_errexit`

When verbose_errexit is on, the shell prints errors to stderr when the errexit rule is triggered.

FAQ on Options

Why is there no _command_sub_status? And why is command_sub_errexit named differently than process_sub_fail and pipefail?

Command subs are executed serially, while process subs and pipeline parts run in parallel.

So a command sub can "abort" its parent command, setting $? immediately. The parallel constructs must wait until all parts are done and save statuses in an array. Afterward, they determine $? based on the value of pipefail and process_sub_fail.

Why are strict_errexit and command_sub_errexit different options?

Because shopt --set strict:all can be used to improve scripts that are run under other shells like bash. It's like a runtime linter that disallows dangerous constructs.

On the other hand, if you write code with command_sub_errexit on, it's impossible to get the same failures under bash. So command_sub_errexit is not a strict_* option, and it's meant for code that runs only under Oil.

What's the difference between bash's inherit_errexit and Oil's command_sub_errexit? Don't they both relate to command subs?

inherit_errexit enables failure in the child process running the command sub.
command_sub_errexit enables failure in the parent process, after the command sub has finished.

Summary

Oil uses three mechanisms to fix error handling once and for all.

It has two new builtins that relate to errors:

try lets you explicitly handle errors when errexit is on.
boolstatus enforces a true/false meaning. (This builtin is less common.)

It has three special variables:

The _status integer, which is set by try.
- Remember that it's distinct from $?, and that idiomatic Oil programs don't use $?.
The _pipeline_status array (another name for bash's PIPESTATUS)
The _process_sub_status array for process substitutions.

Finally, it supports all of these global options:

From POSIX shell:
- errexit
From bash:
- pipefail
- inherit_errexit aborts the child process of a command sub.
New:
- command_sub_errexit aborts the parent process immediately after a failed command sub.
- process_sub_fail is analogous to pipefail.
- strict_errexit flags two common problems.
- sigpipe_status_ok ignores a spurious "broken pipe" failure.
- verbose_errexit controls whether error messages are printed.

When using bin/osh, set all options at once with shopt --set oil:upgrade strict:all. Or use bin/oil, where they're set by default.

Related Docs

Oil vs. Shell Idioms shows more examples of try and boolstatus.
Shell Idioms has a section on fixing strict_errexit problems in Bourne shell.

Good articles on errexit:

Bash FAQ: Why doesn't set -e do what I expected?
Bash: Error Handling from fvue.nl

Spec Test Suites:

These docs aren't about error handling, but they're also painstaking backward-compatible overhauls of shell!

For reference, this work on error handling was described in Four Features That Justify a New Unix Shell (October 2020). Since then, we changed try and _status to be more powerful and general.

Appendices

List Of Pitfalls

We mentioned some of these pitfalls:

The if myfunc Pitfall, caused by the Disabled errexit Quirk (strict_errexit)
The local x=$(false) Pitfall (strict_errexit)
The Error or False Pitfall (boolstatus, try / case)
- Special case: When the child process is another instance of the shell, the Meta Pitfall is possible.
The Process Sub Pitfall (process_sub_fail and _process_sub_status)
The yes | head Pitfall (sigpipe_status_ok)

There are two pitfalls related to command subs:

The echo $(false) Pitfall (command_sub_errexit)
Bash's inherit_errexit pitfall.
- As mentioned, this bash 4.4 option fixed a bug in earlier versions of bash. Oil reimplements it and turns it on by default.

Here are two more pitfalls that don't require changes to Oil:

The Trailing && Pitfall
- When test -d /bin && echo found is at the end of a function, the exit code is surprising.
- Solution: always use if rather than &&.
- More reasons: the if is easier to read, and && isn't useful when errexit is on.
The surprising return value of (( i++ )), let, expr, etc.
- Solution: Use i=$((i + 1)), which is valid POSIX shell.
- In Oil, use setvar i += 1.
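
Here's a sketch of both pitfalls and their fixes, assuming `bash` is on your PATH. Each case runs in a child shell with errexit on. The function `f`, the path `/nonexistent/dir`, and the variable names are invented for the example.

```shell
# Trailing && Pitfall: f returns 1 simply because the test was false,
# and errexit then aborts the caller before 'echo after'.
and_out=$(bash -c 'set -e
  f() { test -d /nonexistent/dir && echo found; }
  f; echo after') || true

# The 'if' form returns 0 when the test is false, so the script continues.
if_out=$(bash -c 'set -e
  f() { if test -d /nonexistent/dir; then echo found; fi; }
  f; echo after')

# (( i++ )) evaluates to the old value 0, which counts as failure.
incr_out=$(bash -c 'set -e; i=0; (( i++ )); echo "i=$i"') || true

# The POSIX form is a plain assignment, so it succeeds.
posix_out=$(bash -c 'set -e; i=0; i=$((i + 1)); echo "i=$i"')

echo "and=[$and_out] if=[$if_out] incr=[$incr_out] posix=[$posix_out]"
# prints: and=[] if=[after] incr=[] posix=[i=1]
```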

Example of `inherit_errexit` Pitfall

In bash, errexit is disabled in command sub child processes:

set -e
shopt -s inherit_errexit  # needed to avoid 'touch two'
echo $(touch one; false; touch two)

Without the option, bash touches both files, even though false fails after the first touch.
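
If you'd rather not create files, this variant of the example uses echo instead of touch. It's a sketch that assumes bash 4.4 or later, since that's where inherit_errexit first appeared.

```shell
# Without inherit_errexit, the command sub ignores 'false' and keeps going.
without=$(bash -c 'set -e; echo $(echo one; false; echo two)')

# With it, the command sub's child process aborts at 'false'.
with=$(bash -c 'set -e; shopt -s inherit_errexit; echo $(echo one; false; echo two)')

echo "without=[$without] with=[$with]"
# prints: without=[one two] with=[one]
```

Note that the parent echo still succeeds in both cases. Making the parent fail is the job of command_sub_errexit.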

Bash has a grammatical quirk with `shopt -s failglob`

This isn't a pitfall, but a quirk that also relates to errors and shell's grammar. Recall that the definition of $? is tied to the grammar.

Consider this program:

shopt -s failglob
echo *.ZZ        # no files match
echo status=$?   # show failure
# => status=1

This is the same program with a newline replaced by a semicolon:

shopt -s failglob

# Surprisingly, bash doesn't execute what's after ; 
echo *.ZZ; echo status=$?
# => (no output)

But it behaves differently. This is because newlines and semicolons are handled in different productions of the grammar, and produce distinct syntax trees.

(A related quirk is that this same difference can affect the number of processes that shells start!)

Disabled `errexit` Quirk / `if myfunc` Pitfall

This quirk is a bad interaction between the if statement, shell functions, and errexit. It's a mistake in the design of the shell language. Example:

set -o errexit     # don't ignore errors

myfunc() {
  ls /bad          # fails with a non-zero status
  echo 'should not get here'
}

myfunc  # Good: script aborts before echo
# => ls: '/bad': no such file or directory

if myfunc; then  # Surprise!  It behaves differently in a condition.
  echo OK
fi
# => ls: '/bad': no such file or directory
# => should not get here

We see "should not get here" because the shell silently disables errexit while executing the condition of if. This relates to the fundamental problems above:

Does the function use the failure paradigm or the boolean paradigm?
if tests a single exit status, but every command in a function has an exit status. Which one should we consider?

This quirk occurs in all conditional contexts:

The condition of the if, while, and until constructs
A command/pipeline prefixed by ! (negation)
Every clause in || and && except the last.
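
The sketch below checks several of these contexts in bash. It assumes `bash` is on your PATH, and it uses a hypothetical function `f` whose first command fails. If errexit were respected, 'reached' would never be printed.

```shell
# A function whose first command fails, defined with errexit on.
snippet='set -e; f() { false; echo reached; }'

plain=$(bash -c "$snippet; f") || true          # errexit aborts inside f
in_if=$(bash -c "$snippet; if f; then :; fi")   # errexit is disabled in f
in_or=$(bash -c "$snippet; f || true")          # same for a non-final || clause
in_not=$(bash -c "$snippet; ! f") || true       # same; '! f' itself exits 1

echo "plain=[$plain] if=[$in_if] or=[$in_or] not=[$in_not]"
# prints: plain=[] if=[reached] or=[reached] not=[reached]
```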

The Meta Pitfall

I encountered the Error or False Pitfall while trying to disallow other error handling pitfalls! The meta pitfall arises from a combination of the issues discussed:

The if statement tests for zero or non-zero status.
The condition of an if may start child processes. For example, in if myfunc | grep foo, the myfunc invocation must be run in a subshell.
You may want an external process to use the boolean paradigm, and that includes the shell itself. When any of the strict_ options encounters bad code, it aborts the shell with error status 1, not boolean false 1.

The result of this fundamental issue is that strict_errexit is quite strict. On the other hand, the resulting style is straightforward and explicit. Earlier attempts allowed code that is too subtle.

Quirky Behavior of `$?`

This is a different way of summarizing the information above.

Simple commands have an obvious behavior:

echo hi           # $? is 0
false             # $? is 1

But the parent process loses errors from failed command subs:

echo $(false)     # $? is 0
                  # Oil makes it fail with command_sub_errexit

Surprisingly, bare assignments take on the value of any command subs:

x=$(false)        # $? is 1 -- we did NOT lose the exit code

But assignment builtins have the problem again:

local x=$(false)  # $? is 0 -- exit code is clobbered
                  # disallowed by Oil's strict_errexit

So shell is confusing and inconsistent, but Oil fixes all these problems. You never lose the exit code of false.

Acknowledgments

Thank you to ca2013 for extensive review and proofreading of this doc.

Generated on Sun Sep 4 20:54:38 EDT 2022
