Oils 0.15.0 - Big Contributions and More Concessions

2023-05-13

This is the latest version of Oils! We're polishing and optimizing OSH, and translating YSH to fast, native code.

(Since the last release, we've started renaming the project slightly.)

Oils version 0.15.0 - Source tarballs and documentation.

To build and run it, follow the instructions in INSTALL.txt. The wiki has tips on How To Test OSH.

If you're new to the project, see Why Create a New Shell? and posts tagged #FAQ. I recently wrote the Oils 2023 FAQ.

Contributions

I'm happy to say that I got a lot of help with this release. The project has grown large, and wouldn't be possible without help. Thanks again to NLnet for funding us.

In roughly chronological order:

Melvin Walls implemented job control.
- This is the shell feature that lets you cancel processes and pipelines with Ctrl-C, and suspend them with Ctrl-Z.
- It requires obscure syscalls like setpgid() and tcsetpgrp(), which Advanced Programming in the Unix Environment (APUE) helped us with.
Chris Watkins implemented a pool allocator that's integrated with the garbage collector, and improved our dev tools.
- We now manage allocations 32 bytes and smaller ourselves. This both sped up the interpreter and reduced memory usage. Some measurements are in the appendix.
- I mentioned the pool allocator in the 2023 roadmap, which has related ideas. More optimizations to come.
CoffeeTableEspresso helped with the "big parser refactoring", which includes removing the "span ID" concept from the codebase.
Aidan Olsen did a large part of the parser refactoring, implemented history -a and -r, and improved our dev tools.
Samuel Hierholzer implemented an oshrc.d directory, and the --norc flag.
- These Debian-style directories allow installers to customize your environment without stomping on your oshrc. We view installers that stomp on rc files as a harmful pattern in software distribution.
- Note: the next release will move it to ~/.config/oils/oshrc.d because we still have to finish the Big Renaming.
Melvin Walls added a C++ backend to the pgen2 parser generator, which we borrowed from CPython. This means that we now have spec test runs for YSH (formerly Oil)! See the appendix.

The parser refactoring has 3 or 4 parts, and several motivations, which are all on #oil-dev on Zulip. But one quick way to understand it is to note that finishing the garbage collector "unlocked" many design decisions. We now know how our data structures can be laid out, and how they perform!

(I'd like to write Zephyr ASDL After 6 Years and The Lossless Syntax Tree After 6 Years.)

More OSH Concessions

The March release of Oils 0.14.2 said that polishing OSH includes a process of "conceding to reality".

That is, we're making OSH even more compatible, and making strict behavior opt-in.

A recent report by Simon Michael perfectly demonstrates both the costs and the benefits of strict behavior. OSH complained about this line in his script:

  if [[ $HELP = 1 || ${#ARGS} -eq 0 ]]; then usage; exit; fi
                     ^~
inv:69: fatal: Array 'ARGS' can't be referred to as a scalar (without @ or *)

This is actually a bug that OSH flagged! In bash,

${ARGS} is bizarrely equivalent to ${ARGS[0]}. (Arrays were hacked onto bash late in its life, and I suspect this behavior was easier to implement.)
Likewise, ${#ARGS} is equivalent to ${#ARGS[0]}. It's the length of the first string in the array, not the length of the array.

If you want to get the length of an array, you have to remember that it's ${#ARGS[@]}.
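For example, here's a small bash script showing all three expansions. The array contents are made up. The last two checks show how ${#ARGS} gives the wrong answer when the first element happens to be an empty string:

```bash
#!/usr/bin/env bash
# How bash expands an array name used as a scalar (values are illustrative).
ARGS=(alpha beta gamma)

echo "${ARGS}"        # alpha  (same as ${ARGS[0]})
echo "${#ARGS}"       # 5      (length of "alpha", i.e. ${#ARGS[0]})
echo "${#ARGS[@]}"    # 3      (number of elements)

# Why the check in the script above is fragile: if the first argument is an
# empty string, ${#ARGS} is 0 even though the array has two elements.
ARGS=('' --verbose)
if [[ ${#ARGS} -eq 0 ]]; then echo 'buggy check: no args'; fi
if [[ ${#ARGS[@]} -eq 0 ]]; then echo 'correct check: no args'; fi
```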

I Wrote the Same Bug, and Chris Hit It

Even though I've burned the awkward "${array[@]}" into my fingers, and wrote the post Thirteen Incorrect Ways and Two Awkward Ways to Use Arrays, I just wrote the same bug.

That is, I used "${PY3_BUILD_DEPS}" instead of "${PY3_BUILD_DEPS[@]}" in one of our own bash scripts. It caused a build problem for Chris, which he fixed in PR 1576.
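The same mistake in miniature. The package names are placeholders, and count_args is a hypothetical function standing in for whatever command receives the list:

```bash
#!/usr/bin/env bash
# Quoting the bare array name passes only its first element.
# Package names and count_args are made up for illustration.
PY3_BUILD_DEPS=(libssl-dev libffi-dev zlib1g-dev)

count_args() {
  echo $#   # print how many arguments this function received
}

count_args "${PY3_BUILD_DEPS}"      # 1: only "libssl-dev"
count_args "${PY3_BUILD_DEPS[@]}"   # 3: every element, each as its own word
```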

I swear this wasn't a "setup". (See other posts tagged #real-problems).

Use `shopt --set strict:all`

Even though this error was useful, I moved it under shopt --set strict_array, so OSH behaves like bash by default. Why?

We have many strict_* options, so this is a more consistent design. (Show all strict options with shopt -p | grep strict)
These errors shouldn't turn up at the "wrong" time. When you try OSH, you probably just want your script to run at first, not fix a bunch of old code.

Later, you can opt in with the strict:all option group, which includes strict_array. Or use this snippet to run with both bash and OSH:

shopt --set strict:all 2>/dev/null || true

Arch Linux Concession?

Samuel tested Arch Linux's makepkg.sh with OSH, and it appears to mix strings and arrays under the same variable name. OSH doesn't like this right now, but we could also move this check under shopt --set strict_array.

  pkgbase=${pkgbase:-${pkgname[0]}}
                       ^~~~~~~
'/usr/sbin/makepkg':1232: fatal: Can't index string 'pkgname' with integer
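For reference, here's what bash does with this idiom. It treats a string as a one-element array, and a bare array name as element 0, so the makepkg line works whether pkgname is a string or an array. The package names are placeholders:

```bash
#!/usr/bin/env bash
# bash lets one name be used as a string or as an array.
pkgname=foo                  # plain string
echo "${pkgname[0]}"         # foo: a string acts like a one-element array

pkgname=(foo foo-docs)       # now an array
echo "${pkgname}"            # foo: the bare name means element 0

unset pkgbase
pkgbase=${pkgbase:-${pkgname[0]}}   # the makepkg line from the error above
echo "$pkgbase"              # foo
```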

We can still use help with changes like this, which are admittedly hairy and obscure. I posted an ad on lobste.rs for a Python/Shell Language engineer last month, and got great responses. Aidan started immediately, and already did a bunch of great work.

I noted in the ad that you don't necessarily need to know C++, although in retrospect, knowing C++ seems to be helpful.

Whittling Down C++ Spec Tests

Good news: our C++ spec test delta is down to 10! That is, out of ~1800 tests, the translated C++ shell passes 10 fewer tests than the Python shell.

We've now accounted for all the differences, though we haven't fixed them yet. One tricky bug relates to the way we translate Python context managers to C++ constructors and destructors.

You can throw an exception from __exit__, but you can't throw from a destructor. So we have cases where the C++ runtime aborts the process, instead of throwing and catching an exception. I think there's a pretty simple solution with "out params".

If that last paragraph made sense to you, you should help us with the code! You can even be paid to work on Oils.

C++ Portability

I fixed build bugs reported by users on these platforms:

Alpine Linux (which uses musl libc)
OS X
OpenBSD

Please try the new oils-for-unix-0.15.0.tar.gz tarball and let me know what happens!

Closed Issues

Here's an auto-generated list of 20 issues fixed in this release. I don't have time to write about everything, and this isn't a complete list. But the point is that OSH is getting a lot better :-)

#1557	Update known differences doc
#1555	Parallelize end user C++ build
#1552	`time` builtin `user`/`sys` time always zero
#1551	Relax ${array} check by removing shopt -s compat_array and putting it in strict_array
#1550	Add ysh symlink to Python tarball
#1549	Fix bad $? of -1 on Ctrl-Z
#1548	Add oshrc.d and yshrc.d directories; discourage mutating bashrc pattern
#1547	Implement ERR hook (run code when errexit happens)
#1546	Implement bash-compatible DEBUG hook
#1536	Implement history -r
#1525	oils-for-unix 0.14.2 build fails on ArchLinux (`typedef` should have been `decltype`?)
#1523	getopts behaves incorrectly with multiple -abc -def args
#1522	oils-for-unix build failures on OpenBSD due to stdin macro conflict
#1468	`cpp/core.h:116:29: error: unknown type name 'sighandler_t'` while building oils-for-unix on macOS
#1378	Traps (hooks and signal handlers) should be cleared upon fork()
#1375	Make spec/stateful run against oil-native
#916	Ctrl-C shouldn't cancel background jobs (setpgid not called, e.g. on pipelines)
#594	Generate parse tables for pgen2-native and hook it up to oil-native
#562	Implement history -a
#360	Implement the rest of job control

Updated Docs

The docs really need an overhaul, which is coming, but I've kept them up to date for this release.

For example, I updated Known Differences Between OSH and Other Shells with notes about job control.

In particular, OSH runs the last part of a pipeline in the shell process where possible, like zsh does. This is shopt -s lastpipe in bash:

echo hi | read x  # is read run in the shell, or in another process?
echo x=$x         # does $x contain 'hi', or is it empty?

But this conflicts with job control, so such pipelines can't be suspended, which is also true in zsh. In contrast, bash simply ignores shopt -s lastpipe in interactive shells. We chose the zsh behavior because you should be able to test OSH and YSH interactively, with confidence.
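Here's a bash-only sketch of both behaviors. Bash honors lastpipe only when job control is off, which is the default in non-interactive scripts:

```bash
#!/usr/bin/env bash
# Where does `read` run in the last part of a pipeline under bash?

x=''
echo hi | read x        # read runs in a subshell; the assignment is lost
echo "default:  x=$x"   # default:  x=

shopt -s lastpipe       # only takes effect while job control is off
x=''
echo hi | read x        # read runs in the shell process itself
echo "lastpipe: x=$x"   # lastpipe: x=hi
```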

I also updated these wiki pages to give contributors a sense of the project:

If those pages don't scare you off, you're a good person to work on Oils! And you can even be paid.

Under the Hood / Dev Friction

Soil CI

I've mentioned our CI under the tags #soil and #toil (the first name I used). It's a big "distributed shell script", and I'd like to write in detail about it. But here are some quick updates:

(1) It has a more friendly UI:

(2) It uses 7 or 8 Docker/OCI containers, which build slowly. I started packaging our dev dependencies as "wedges", which are intended to work with OCI layers and to compose better than they do. This work sped up the build, but it unfortunately caused bugs in the setup process, which Aidan hit.

(3) Soil continues to grow test and benchmark suites which help us design the shell, translate it to C++, and optimize the mycpp runtime.

benchmarks2/uftrace uses uftrace to count allocations and sizes.
benchmarks2/gc-cachegrind uses Cachegrind to separately measure the time taken by the shell itself, the allocator, free(), GC rooting, and marking and sweeping. All of these things are expensive!
interactive/process-table tests the PGID and controlling terminal of child processes across shells.

Soil does the equivalent of a release on every commit, including making tarballs. In the future, it should literally be how we make releases.

Ninja-Based Build

The garbage collector also needs support from the build system. We now have a three-level structure for our build variants:

(compiler binary, compiler config, optional app config)

with the syntax ninja _bin/$CC-$CONFIG+$APP/osh.

Examples:

_bin/cxx-opt+bumproot/osh helped us measure the speedup of the pool allocator.
- Chris noticed that rooting was missing in our perf profiles, because it's done in every function! It's a cost that's "all over", in both time and space.
_bin/cxx-asan+gcalways/osh is a binary that stress tests the GC.
_bin/clang-coverage/osh uses Clang's code coverage to give us a nice report.
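To make the naming scheme concrete, here's a hypothetical helper (not something in the Oils repo) that builds a target path from the three parts, leaving off the +$APP suffix when there's no app config:

```bash
#!/usr/bin/env bash
# Hypothetical helper spelling out _bin/$CC-$CONFIG+$APP/osh,
# where the app config is optional.
variant_target() {
  local cc=$1 config=$2 app=${3:-}
  local dir="$cc-$config"
  if [[ -n $app ]]; then
    dir+="+$app"        # append the optional app config
  fi
  echo "_bin/$dir/osh"
}

variant_target cxx opt bumproot    # _bin/cxx-opt+bumproot/osh
variant_target cxx asan gcalways   # _bin/cxx-asan+gcalways/osh
variant_target clang coverage      # _bin/clang-coverage/osh
# e.g. ninja "$(variant_target cxx opt bumproot)"
```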

Having all these tools makes it easier to contribute! When reviewing a PR, I look at the tests and benchmarks first. Everything else is generally easy.

Known, Possibly Hard Bugs

Despite all this progress, we're not done. There are more issues after we fix the 10 spec tests. I've been keeping track of hard bugs on Zulip:

#oil-dev > Known, Possibly Hard Bugs

(These could be on Github, but I like Zulip for summarizing, linking, and commenting.)

The most important issue is getting, say, CPython's configure to run as quickly as it does under bash and dash, and correctly too. There are some unexplained differences in the logs. This is hard because autotools-generated code is hard to read and debug.
Another problem is that I've had to play "Whac-A-Mole" with both non-deterministic bugs and flaky tests.

A shell is a stateful process that's concurrent with the stateful kernel, and job control exposed that even further. It's inherently hard to test.

I think of the shell as an event loop which receives input from signals and waitpid(-1), which I wrote about in January 2022: The Shell Runtime As a State Machine.

So it would be nice to test it as an explicit state machine, including the error paths. I'm not sure how far we'll go down this path right now, but I'd like the project to continue raising the bar on software quality.

For example, we use the exhaustive reasoning of regular languages and algebraic data types via ASDL. Explicit state machines are in the same vein.
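As a rough illustration of what "explicit" could mean, here's a toy transition table for a job's lifecycle. The states, events, and the next_state function are assumptions made up for this sketch, not OSH's design. The point is that enumerating every (state, event) pair forces a decision on the error paths too:

```bash
#!/usr/bin/env bash
# Illustrative only: a simplified job lifecycle as an explicit transition table.
# Assumed events: stop = SIGTSTP/SIGTTIN/SIGTTOU, cont = SIGCONT,
#                 exit = waitpid() reports that the job exited or was killed.
next_state() {
  case "$1:$2" in
    running:stop) echo stopped ;;
    running:cont) echo running ;;   # SIGCONT on a running job changes nothing
    running:exit) echo exited ;;
    stopped:stop) echo stopped ;;
    stopped:cont) echo running ;;
    stopped:exit) echo exited ;;    # e.g. SIGKILL while stopped
    *)            echo error ;;     # explicit error path: nothing is valid after exit
  esac
}

# Enumerate every (state, event) pair so no transition is left unexamined.
for state in running stopped exited; do
  for event in stop cont exit; do
    printf '%-8s %-5s -> %s\n' "$state" "$event" "$(next_state "$state" "$event")"
  done
done
```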

Our Tests Routinely Find Bugs in Other Shells

Even though some of our tests are flaky, they find bugs not just in OSH, but in other shells!

For example, I finally understand the symptom in issue 330 from 2019. Our spec tests would randomly stop like this, especially when run in parallel:

test/spec-runner.sh run-cases prompt 
test/spec-runner.sh run-cases quote 

[1]+  Stopped                 test/spec.sh all

And sometimes the parent shell would even disappear, closing the terminal! For years, this happened rarely, but our initial job control implementation made it happen 100% of the time.

Now I understand that:

A job control shell makes syscalls like setpgid() and tcsetpgrp() to tell the kernel which processes are in the foreground, and which are in the background. This determines which processes receive signals when you hit Ctrl-C and Ctrl-Z.
If a background process tries to read or write, the kernel sends it SIGTTIN / SIGTTOU. The default action for these signals is to stop the process.

So if job control isn't done exactly right, processes can stop seemingly at random, especially if many are run in parallel.
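Here's a Linux-only bash sketch of the process-group part. It doesn't touch the terminal, so tcsetpgrp() and SIGTTIN / SIGTTOU don't come into play. With job control on (set -m), bash calls setpgid() so each background job leads its own process group; without it, the job stays in the shell's group. Reading /proc is an assumption of this sketch, and other systems would need ps instead:

```bash
#!/usr/bin/env bash
# Linux-only: show which process group a background job lands in.
# Field 5 of /proc/PID/stat is the process group ID. This parsing assumes the
# command name in field 2 has no spaces, which holds for `sleep`.
pgid_of() {
  awk '{ print $5 }' "/proc/$1/stat"
}

# Job control off (the default in scripts): the job shares the shell's group.
sleep 5 &
echo "off: shell pgid=$(pgid_of $$), job pgid=$(pgid_of $!)"
kill $!

# Job control on: the job gets its own group, whose ID is the job's PID.
set -m
sleep 5 &
echo "on:  job pid=$!, job pgid=$(pgid_of $!)"
kill $!
```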

It was pretty easy to isolate a couple bugs in OSH. One bug was that we didn't always give the terminal back to the parent's PGID before exiting the shell, e.g. when your oshrc calls exit (admittedly rare, but tested!).

After fixing it, the confusing stoppage no longer happens with OSH. But it still happens with bash, or at least the old version we're testing with.

Help Us Test Concurrent Programs

But we still have a different, mysterious problem: sometimes the interactive suite hangs in the CI forever. It goes on for 30 minutes or more.

I worked around it by making the test suite run serially, not in parallel.

But I'd like to hear from people interested in testing concurrent systems! I don't want to play "Whac-A-Mole" anymore. Help us figure out ways to exhaustively test a shell. Some ideas in the comments here:

For example, we could run the shell with User Mode Linux, which is a real kernel in user space. But what assertions would we make?

What's Next?

In summary, we:

Fixed many bugs based on user feedback
Added features like job control and history
Optimized the interpreter
Translated more of the shell to native code
Improved the dev tools

So what's next?

The Big Renaming and Docs

The Big Renaming requires many mechanical changes, which I mentioned on the 2023 Roadmap.

I'd like to do all the breaking changes in one release, rather than spreading them out. For example, renaming ~/.config/oil → ~/.config/oils, and internal features like OIL_GC_ON_EXIT=1 → OILS_GC_ON_EXIT=1.

I want to re-organize the docs and rewrite the help builtin, which will pave the way for smooth user contributions.

YSH and Performance

I've been keeping track of YSH design issues and performance ideas on Zulip. I moved important threads to these new streams:

#language-design includes some topics that I didn't mention on the Oils 2023 Roadmap:
- Semantics of creating booleans from strings (fixed by Melvin)
- Catching typos statically (the biggest and I think most impactful design issue)
- Syntax changes like %symbol → :symbol, which is more conventional
- An eggex usability issue
#performance has threads related to:
- Optimizing GC rooting. It's not just marking and sweeping that's expensive, but maintaining the root set too!
- Optimizing our containers: small strings, lists, and dicts should all have special cases. We should probably intern large strings.

I also had an idea for a new "Squeeze and Freeze" primitive to reduce both GC pressure and memory usage. It has the benefits of an arena, but it's integrated with the GC, and thus memory safe.

Again, the GC has "unlocked" many design decisions, so we can start thinking of fun stuff like this. But we should do basic optimizations first.

GUIs and the Headless Shell

I also want to advertise our #shell-gui Zulip channel, which has had more activity lately. Subhav Ramachandran and I started this work back in 2021, but it's been dormant for 2 years, since both of us had other things to do.

Remember that the project's scope was too big, and I cut out the entire interactive shell. But now we have help, so it's reasonable to think about this again.

The basic idea is the same as these wonderful Arcan FE demos:

But a crucial difference is that it's compatible with ls --color and a million other tools. We invented the FANOS protocol to solve this problem: File descriptors And Netstrings Over Sockets.

That is, a GUI and Oils can communicate over a Unix domain socket, which includes file descriptors pointing to a terminal.
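The netstring part of FANOS is simple framing: a decimal length, a colon, the payload, and a trailing comma. Here's a minimal bash sketch of encoding and decoding one message. The function names are made up, and it leaves out file descriptor passing entirely, since plain shell can't send descriptors over a socket:

```bash
#!/usr/bin/env bash
# Netstring framing: "<decimal byte length>:<payload>,".
# Only the framing is shown; passing file descriptors is out of scope here.
netstring_encode() {
  local LC_ALL=C                      # make ${#1} count bytes, not characters
  printf '%d:%s,' "${#1}" "$1"
}

netstring_decode() {
  local LC_ALL=C
  local msg=$1
  local len=${msg%%:*}                # text before the first colon
  local rest=${msg#*:}                # text after it
  if [[ ! $len =~ ^[0-9]+$ || ${rest:len:1} != ',' ]]; then
    echo 'malformed netstring' >&2
    return 1
  fi
  printf '%s' "${rest:0:len}"         # exactly len bytes of payload
}

netstring_encode 'echo hi'; echo      # 7:echo hi,
netstring_decode '7:echo hi,'; echo   # echo hi
```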

This idea really needs diagrams. Maybe you can help us on #shell-gui :-)

Thanks again to all contributors! Let me know if I neglected to mention something, including your contribution. And thank you to everyone who reported bugs -- I've been getting great feedback.

Appendix: Metrics for the 0.15.0 Release

These metrics help me keep track of the project. Let's compare this release with version 0.14.2 from March.

Spec Tests

We implemented more features in Python:

OSH spec tests for 0.14.2: 2042 tests, 1814 passing, 89 failing
OSH spec tests for 0.15.0: 2071 tests, 1840 passing, 86 failing

More spec tests passed in C++ because of features implemented in Python, like Aidan's history -a -r. And we're whittling down the remaining translation bugs:

C++ spec tests for 0.14.2 - 1801 of 1817 passing
C++ spec tests for 0.15.0 - 1833 of 1843 passing

There was some work on YSH behavior:

YSH/Oil spec tests for 0.14.2: 506 tests, 466 passing, 40 failing
YSH/Oil spec tests for 0.15.0: 511 tests, 471 passing, 40 failing

More significantly, we have our first run of YSH in C++:

YSH/Oil C++ spec tests for 0.15.0: 181 of 469 passing

Benchmarks

The pool allocator made the parser faster:

Parser Performance for 0.14.2: 26.0 thousand irefs per line
Parser Performance for 0.15.0: 22.1 thousand irefs per line

and it also reduced memory usage (max RSS):

benchmarks/gc for 0.14.2: parse.configure-coreutils 1.95 M objects comprising 69.6 MB, max RSS 91.8 MB
benchmarks/gc for 0.15.0: parse.configure-coreutils 1.97 M objects comprising 73.4 MB, max RSS 81.1 MB

So we're using less memory, but asking for slightly more due to the "big parser refactoring". This is temporary, because the end result of the refactoring will allocate less. It should also shrink Token objects from 40 to 32 bytes, allowing them to fit inside the pool. These are some of the most common objects in the shell.

This large I/O bound benchmark is also slightly faster, though we still have work to do:

Runtime Performance for 0.14.2: 35.2 and 22.5 seconds running CPython's configure
Runtime Performance for 0.15.0: 33.8 and 20.0 seconds running CPython's configure
bash: 26.9 and 14.6 seconds running CPython's configure
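To put these numbers in relative terms, here's a small script that recomputes the changes from the figures above. The helper names are made up, and the values are copied from this appendix:

```bash
#!/usr/bin/env bash
# Relative changes between 0.14.2 and 0.15.0, using numbers from the text.
pct_change() {
  # Percent change from $1 (old) to $2 (new), one decimal place.
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

ratio() {
  # How many times as long $1 takes as $2.
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2fx\n", a / b }'
}

echo "parser irefs/line: $(pct_change 26.0 22.1)"   # -15.0%
echo "max RSS:           $(pct_change 91.8 81.1)"   # -11.7%
echo "allocated MB:      $(pct_change 69.6 73.4)"   # +5.5%
echo "OSH vs bash:       $(ratio 33.8 26.9) and $(ratio 20.0 14.6)"   # 1.26x and 1.37x
```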

I also added stable performance metrics for the GC:

benchmarks/gc-cachegrind for 0.15.0

I created this benchmark after we had trouble reconciling different measurements of the pool allocator.

Code Size

Significant lines:

cloc for 0.14.2: 19,491 lines of Python and C, 363 lines of ASDL
cloc for 0.15.0: 19,854 lines of Python and C, 694 lines of ASDL

I just noticed a bug in the ASDL line counts! I changed comments to be # like Python and shell, and now we're not excluding comment lines.

We have more code in the oils-for-unix C++ tarball:

oil-cpp for 0.14.2 - 90,682 lines
oil-cpp for 0.15.0 - 96,530 lines

And a larger binary:

ovm-build for 0.14.2: 1.23 MB of native code (under GCC)
ovm-build for 0.15.0: 1.42 MB of native code (under GCC)

The increase is due to:

Translating more of YSH. I think the grammar tables could be made smaller, but we can optimize once we have more tests passing.
The pool allocator makes Alloc<T> longer, and it's specialized for every type. It may be useful to specialize on sizeof(T), not just T.