Four Features That Justify a New Unix Shell

2020-10-22

On a lobste.rs thread about the rationale for the Fennel language, I posted this summary of why Oil exists:

I think these features alone would justify a new shell:

Getting rid of "quoting hell"

Getting rid of ad hoc parsing and splitting

Fixing errexit

But Oil has a lot more than that, including unifying separate ad hoc expression languages ...

This post elaborates on these points. I've condensed the rationale into four critical features for the OSH language.

I give examples of each feature, link to docs (in progress), and comment on the future of the project.

Table of Contents

The OSH Language

Reliable Error Handling

Safe Processing of User-Supplied Data (like filenames)

Eliminate Quoting Hell (the !qefs problem)

Static Parsing Enables Better Error Messages and Tools

What's Next For the Project?

A Very Important Claim

Reviewing The Biggest Cut (January 2020)

Oil Language Updates

Summary

Dev Tools Improvements

Credits

Appendix: More OSH Features

The OSH Language

Recall that OSH is designed to run existing shell scripts, and has done that since early 2018.

It also fixes warts in the shell language with opt-in features. These are the four most important ones.

Reliable Error Handling

I just finished an overhaul of shell's flaky set -e / errexit mechanism. I'm excited by this, because I started it last year, but put it on the back burner after being stumped!

I believe I've figured out every problem now, and would like your feedback. The simple invariant is that OSH never loses an exit code, which is not true of POSIX shell or bash. Here's a summary of the enhancements:

strict_errexit - A shell option to detect cases where you would lose errors in shell, like if myfunc. This improves your shell scripts, even if you run them under another shell! In other words, OSH can be used as a dev tool.
inherit_errexit - OSH implements this bash 4.4 option, which is a partial fix for the "command sub errexit" problem.
command_sub_errexit - A shell option to check for failure at the end of every command sub, so you don't lose errors.
- This fixes the problem shown in the last panel of a recent comic by Julia Evans: "bash is weird sometimes"
process_sub_fail - Like pipefail, but for process substitutions. It allows errexit to "see" the failure caused by process subs, like the sort invocation in cat foo.txt <(sort /oops/error).
- @_process_sub_status: A variable that's analogous to ${PIPESTATUS[@]}. You may want to inspect the exit status of all processes.
The run builtin turns errexit back on, so if run myfunc is safe. It also provides fine-grained control over exit codes.
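
To make the problem concrete, here is a minimal sketch in plain bash (not OSH) of two of the cases the options above are meant to catch. The names myfunc, demo_if, and demo_command_sub are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Sketch (plain bash, not OSH): two places where errexit loses a failure.
set -o errexit

# 1. errexit is ignored for everything that runs as an 'if' condition,
#    including the whole body of a function called there.
myfunc() {
  false                        # fails, but the function keeps going
  echo 'myfunc kept running'
}

demo_if() {
  if myfunc; then
    echo 'myfunc looked successful'
  fi
}

# 2. 'local' itself returns 0, which hides the failed command sub.
demo_command_sub() {
  local d=$(false)
  echo "status after local: $?"
}

demo_if
demo_command_sub
```

Both functions run to completion even though errexit is on and a command failed in each. These are the kinds of lost errors that strict_errexit and command_sub_errexit are described as addressing.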

Yes, there are many solutions, because shell has many problems! But you don't have to remember all these names. Add shopt --set oil:basic to the top of your program to turn on all of these options. The strict_errexit failures will remind you to use the run wrapper.
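
For readers who haven't used the bash features that process_sub_fail and @_process_sub_status are compared to, here is a short sketch of how pipefail and PIPESTATUS behave in plain bash, and of the gap that remains for process subs. The variable names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch (plain bash): pipefail and PIPESTATUS, and what they don't cover.

# By default, a pipeline's status is that of its last command.
false | true
default_status=$?

# PIPESTATUS records every command's status; copy it right away,
# because the next command overwrites it.
false | true
statuses=("${PIPESTATUS[@]}")

# With pipefail, the pipeline fails if any command in it failed.
set -o pipefail
false | true
pipefail_status=$?

# A process sub is not part of a pipeline, so its failure stays invisible.
cat /dev/null <(false)
process_sub_status=$?

echo "default=$default_status pipestatus=${statuses[*]}"
echo "pipefail=$pipefail_status process_sub=$process_sub_status"
```

The last status is 0 even though the process sub failed. That is the case process_sub_fail is meant to surface.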

(Aside: I was able to fix all these problems cleanly in the interpreter. I spent a lot of time on Oil's architecture 4 years ago precisely so I could fix such subtle problems. When the code has a good structure, the "right place" for a fix reveals itself to you. Oil is still improving!)

Safe Processing of User-Supplied Data (like filenames)

QSN is the foundation for Structured Data in Oil. It removes the need to invent ad hoc (and often broken) formats every time you need to deal with user-supplied data in shell. In other words, Oil scripts have an alternative to messy parsing and splitting.

I just implemented a QSN decoder, after implementing an encoder earlier this year.

Here are some short examples. The write builtin prints its args to stdout, and it accepts a --qsn flag:

# Print filenames ONE PER LINE.  If a name contains a
# newline or other special char, it's QSN-encoded like
# 'multi-line \n name with NUL \0 byte'

write --qsn -- *.txt

The read builtin provides the inverse:

cat list.txt | while read --line --qsn {
  # _line is implicitly set by 'read'
  rm -- $_line
}

I also implemented read -0 as a synonym for bash's obscure read -r -d ''. This allows you to consume find -print0 output in shell, like xargs -0 does. This format is distinct from QSN, but it's now easy to convert back and forth between them.
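
As a rough illustration of why a delimiter that can't appear in filenames matters, here is a plain bash sketch using the read -r -d '' idiom mentioned above. It isn't OSH code, and the temp directory and filenames are invented for the example.

```shell
#!/usr/bin/env bash
# Sketch (plain bash): newline-delimited vs. NUL-delimited filename lists.

tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/two words.txt" "$tmp/multi
line.txt"

# Counting lines over-counts, because one name spans two lines.
line_count=$(find "$tmp" -type f -print | wc -l | tr -d ' ')

# Reading NUL-terminated records sees exactly one record per file.
nul_count=0
while IFS= read -r -d '' path; do
  nul_count=$((nul_count + 1))
done < <(find "$tmp" -type f -print0)

echo "lines: $line_count, files: $nul_count"   # lines: 4, files: 3

rm -r -- "$tmp"
```

The loop body is where you would process each path. The point is only that the NUL-delimited reader gets three records for three files, while the line-based count is wrong.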

This is the first cut of QSN support. I expect it to evolve based on your feedback!

Related: Posts tagged #escaping-quoting, particularly Git Log in HTML (2017).

Eliminate Quoting Hell (the `!qefs` problem)

This was done in summer 2019. I described it in Simple Word Evaluation earlier this year, and you can see examples in Oil Language Idioms.

Briefly, Oil allows this:

ls @myflags $filename

instead of

ls "${myflags[@]}" "$filename"

Notice the @ splice operator, and lack of quotes.
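
For comparison, this plain bash sketch shows what goes wrong when the quotes in the second form are forgotten. count_args is a hypothetical helper that just reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Sketch (plain bash): what the quotes in "${myflags[@]}" "$filename" prevent.

count_args() {
  echo "$#"
}

myflags=(-l --color=never)
filename='my file.txt'

# Unquoted: the filename is split on the space into two words.
count_args ${myflags[@]} $filename         # prints 4

# Quoted: each array element and the filename stay intact.
count_args "${myflags[@]}" "$filename"     # prints 3
```

In bash the short, unquoted spelling is the wrong one, which is why new shell programmers keep getting admonished about it.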

Static Parsing Enables Better Error Messages and Tools

This blog began in 2016 with an explanation of static parsing. I didn't mention it in the comment quoted in the intro, but it's still a crucial part of the project.

I was reminded how important this is when I noticed that the authors of both Perl 5 and the rc shell complained about shell's dynamic parsing, going back 20-30 years!

This foundation is still paying dividends. I recently used the static parser to create detailed error messages for command subs:

$ shopt --set errexit command_sub_errexit

$ d=$(date %x)
date: invalid date ‘%x’
  d=$(date %x)
    ^~
[ interactive ]:13: fatal: Command sub exited with status 1 ...

and process subs:

$ shopt --set process_sub_fail

$ cat /dev/null <(sort oops)
sort: cannot read: oops: No such file or directory
  cat /dev/null <(sort oops)
                ^~
[ interactive ]:27: fatal: Exiting with status 2 ...

We point to the location of the failing construct. No other shell does this!

In addition, Travis Everett has worked on a shell dependency bundler which relies on static parsing.

Related: The new Syntactic Concepts doc lists static parsing as one of 5 important concepts.

What's Next For the Project?

It was indeed useful to explicitly write out the rationale for the language. I've done that many times with posts tagged #why-a-new-shell, but explaining it again helps, even after 4 years. The project is evolving and getting crisper.

A Very Important Claim

With the overhaul of errexit and the QSN decoder, I believe we now have all the bases for the OSH language covered! These features will be out with the next release.

The claim is that these four features alone justify a new Unix shell. If we finish the C++ translation, and end the project here, it would be worthwhile.

To repeat, they are:

Reliable error handling. I can't recommend shell to my friends without these fixes.
Safe processing of user-supplied data, i.e. an alternative to ad hoc parsing and splitting.
Elimination of "quoting hell". Let's fix it once and for all, rather than admonishing every new shell programmer about it for the next 30 years, as has been done for the past 30!
Static Parsing for better error messages and tools. It also removes a security issue.

If you disagree, let me know! I would like to hear what other warts in the shell language need to be fixed or otherwise addressed.

(I'm leaving out the interactive shell here, as I believe the first priority is a better shell for programming and automation. A "cloud shell", if you will.)

Reviewing The Biggest Cut (January 2020)

Back in January, I was already concerned about the scope of the project. I wrote that the biggest cut to the project would be that Oil would be based on strings, rather than Python-like data types.

Let me update that statement based on these crisp definitions:

The OSH language is a compatible shell based on strings (and arrays of strings). Assignments look like local x=mystr.
The Oil language has Python-like types and expressions. Assignments look like var x = 42 + a[i] + f(x, y). It has a garbage-collected heap of recursive data structures.

So what I'm saying now is that the priority going forward is to polish the OSH language, and put off the Oil language until the hazy future.

That means finishing the translation to C++, hooking up the garbage collector, and writing documentation. It may mean preparing the code to be embedded in another application, like the fish shell. (I've discussed this with the maintainer, and there's some interest. But it's a lot of work, which shouldn't be taken for granted, and there are unsolved problems.)

Achieving this OSH language milestone feels very doable, since everything already works in Python, and something like 915 out of 1685 spec tests pass in C++ (with the translation yielding a 30x - 50x speedup).

Oil Language Updates

But I'm not giving up on the Oil language! I just need help. It exists in prototype form, and your feedback will motivate me to work on it.

Here are some blog posts I want to write, to get the word out:

Four Features of the Oil Language. This post narrowed down OSH to four major features, and Oil also has four:

Python-like expressions, along with eggexes
Ruby-like blocks, which enable DSLs and declarative configuration
procs (shell functions with signatures, which compose in unique ways)
Serialization formats like JSON and QTSV (proposal). The latter is a format for typed tables, built on top of QSN.

We have working prototypes for every feature except QTSV. You can try them now!

Big Changes to the Oil Language. A list of recent changes I've made, which should give potential contributors a feel for the language.

What Distinguishes Python, JS, and Ruby from Perl and PHP. The former languages have a clean data model / memory model: a garbage collected heap with reference semantics.

The latter languages have warts in their model. Oil adds the clean model to shell.

Comments on Comics. I can use these recent comics as a way to explain the OSH language. (See other posts tagged #comic.)

Summary

I described 4 essential features of an improved shell language. Let me know what you think is missing.

If you haven't read it already, see Why Create a Unix Shell? It's the most popular page on this site, though I still need to update it for 2021.

I then proposed a focus on making the OSH language "production ready". I'm still going to work on the Oil language, but I need help finishing it.

Dev Tools Improvements

Speaking of which, several people have pointed out that the dev process for Oil is difficult. I've addressed this recently by removing spew from the build logs, and adding a lint check for more portable shebangs with /usr/bin/env. (This just triggered and prevented a regression!)

I still would like to make a screencast to show how easy Oil is to work on. After cloning, a 10 or 20 second build process should get you a working bin/osh:

~/git/oilshell/oil$ bin/osh -c 'echo hi'
hi

This is a pure Python program, which is very nice for prototyping!

So I'm trying to make Oil more friendly to work on. Reach out if you want to help, and if you run into problems.

Credits

Thanks to Till Schröder for great feedback on the Oil language. For example, he noticed that I had awkwardly named the catch builtin (now run).
Thanks to Diego Calleja for many updates to the Oil docs. This is a huge effort by itself!

Appendix: More OSH Features

Those 4 features aren't the only ones in OSH, but I claim they are sufficient!

I fixed other problems with shell, and described them with posts tagged #real-problems. I've applied that tag to this post, since command_sub_errexit fixes the problem that Julia Evans was perplexed by.

I believe Oil will be faster than bash along 3 dimensions: parsing speed, runtime compute speed, and runtime I/O speed. However, this may require optimization after finishing the garbage collector, and we're short on hands to do that.

I'd like Oil to have better dev tools: tracing, debugging, and crash dumps. I prototyped a crash dump a couple years ago, but I didn't receive much feedback on it. I think OSH has to be more mature before that's compelling.

Subinterpreters (issue 704) have multiple use cases. There have been many failed attempts to add this feature to CPython, whereas embedded languages like Tcl and Lua make good use of them.

I even think there are good use cases for embedding WebAssembly in Oil, e.g. as mentioned in the July post on regular languages. Another use case is to package and distribute portable dev dependencies, like the CommonMark renderer. This problem has contributed to the dev process friction mentioned above.
