Oils 0.16.0 - Breaking Renames and YSH

The big renaming to Oils, OSH, and YSH
- The main change was moving startup files from ~/.config/oil to ~/.config/oils/{oshrc,yshrc}. I updated the Getting Started doc, as well as OSH help.
Changing YSH based on what we've learned.
- For example, mydict->key was changed to mydict.key.
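If you're upgrading, here's a minimal bash sketch that copies an old OSH startup file to the new location. It assumes the old file was named `oshrc` under ~/.config/oil. Check the Getting Started doc for the exact old names, including the one YSH used, before relying on it. The function name `migrate_oshrc` is hypothetical and not part of Oils.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy an old OSH startup file to the 0.16.0 location.
# ASSUMPTION: the old file lived at ~/.config/oil/oshrc.
migrate_oshrc() {
  local old_rc="$HOME/.config/oil/oshrc"
  local new_dir="$HOME/.config/oils"
  local new_rc="$new_dir/oshrc"

  if [[ ! -f $old_rc ]]; then
    echo "no old rc file at $old_rc" >&2
    return 1
  fi
  if [[ -e $new_rc ]]; then
    echo "refusing to overwrite $new_rc" >&2
    return 1
  fi
  mkdir -p "$new_dir"
  cp -- "$old_rc" "$new_rc"   # copy rather than move, so the old file stays as a backup
  echo "copied $old_rc -> $new_rc"
}

# Usage: migrate_oshrc
```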

What else changed?

Many bug fixes based on user feedback.
Performance improvements, e.g. due to the "big parser refactoring".

Some Writing on YSH for Everyone

The issues below list YSH breakages in detail, but these higher-level posts will probably be more interesting. I started writing one post before this announcement, but it ended up as five!

Reviewing YSH - History and an overview of 7 parts of the language.
Sketches of YSH Features - Concrete descriptions and proposals.

The third post was abstract and hard to write, so I "forked" these two posts:
How to Create a UTF-16 Surrogate Pair by Hand, with Python - Relates to our design for strings.
Narrow Waists Can Be Interior or Exterior: PyObject vs. Unix Files - Useful terminology.

And finally:
Oils Is Exterior-First (Code, Text, and Structured Data) - Fundamental ideas behind the language.

This series, tagged #ysh, is roughly our "design roadmap". I'm looking for feedback from contributors, and also casual readers.

Docs Updated

A Feel for YSH Syntax. I kept this up to date with the changes, including the table of "sigil pairs" at the end.
Tour of YSH has also been updated, although I need to change all the "Oil language" references.

Contributions

Thanks to everyone who contributed code, tested Oils, and sent feedback! The project has grown larger, and wouldn't be possible without help.

Melvin Walls
- Rewrote more of the YSH expression evaluator, so it can be translated to C++. I mentioned this essential work in the 2023 Roadmap.
Aidan Olsen
- Implemented the parser for the YSH case statement, mentioned in Sketches of YSH Features.
- Optimized the C++ code generated by Zephyr ASDL. It no longer allocates GC objects for "simple" sum type variants, i.e. ones without members.
- More help removing "span ID" from the codebase.

Overall, the "big parser refactoring" has gone very well. We want a stable lossless syntax tree, which will help packaging tools like resholve, shell GUIs, and more. GUIs can use Oils via the headless mode interface.

Great testing and feedback:

Koichi Murase, author of ble.sh, thoroughly tested oils-for-unix, which led to several bug fixes, e.g. in test -c.
- We added more ble.sh tests to our continuous build. This suite has been very useful over the years.
- We uncovered performance issues with large programs, e.g. ones with many variables and functions. We can use more help on the C++ runtime, e.g. with writing a proper hash table with string interning.
John Soo root-caused a trap bug encountered when running direnv. It's now fixed.
- From experience, this work is useful and difficult! We need more people to try running real scripts — to fill out Shell Programs That Run Under OSH.
Simon Michael reported a rare crash at startup, which led to a fix (and code improvement).
Azat Akhmetov
- Sent feedback about echo, which led to a YSH change.
- Reported a bug with the `and` and `or` operators, which led to a fix.
bar-g
- Reported a build failure, which led to a fix.
- Good feedback on YSH design.
citriqa found a good crash bug in oils-for-unix, due to string formatting.
Alad Wenter found a great bug in the try builtin, now fixed.

I almost certainly missed someone here, so please leave a comment and I'll update this post. Thanks again!

Closed Issues

Here are the details:

#1649	Unquoted array literal syntax from %( foo .py ) to :| foo .py |
#1648	change proc syntax to match upcoming func syntax
#1640	bug: abort on special characters following '!'
#1639	Eggex should disallow $x ${x} and allow @x
#1636	Change mydict->key to mydict.key
#1634	YSH echo $x should always be correct, disallow -e -n
#1629	`osh -c 'read -d :'` fails in the C++ osh (not in the Python osh)
#1628	Parsing options like 'shopt -u expand_aliases' shouldn't be restricted upon 'source'
#1627	All types should have bool(x) , 'foo' or 'bar' should work
#1625	Tilde expansion: word vs. expression mode
#1624	Respect YSH_HISTFILE for bin/ysh
#1623	move default history from ~/.config/oils to ~/.local/share/oils
#1622	breaking change: rename rc files ~/.config/oils/oshrc and yshrc
#1621	breaking change: rename env vars OSH_* and OIL_* -> OILS_*
#1618	[[ -c /dev/null ]] fails in osh-cpp
#1615	oils native build: ld returned 1 exit status
#1608	try builtin shouldn't disallow command subs, i.e. with strict_errexit
#1607	`trap - SIGINT` behaves differently than `bash`
#1605	YSH echo should allow multiple args
#1578	crash when pyos.GetHomeDir() returns None
#1274	Maybe remove inline function calls @split(x) and $join(y), use expression sub
#983	Idea for enhanced case statement
#812	Fix leak of lines/spans in Arena (new Token/Line/Source representation)

Reminders

YSH Discourages `eval` Misuse - `acme.sh` Vulnerability

The last post mentioned this vulnerability in a big shell script:

Specifically, a malicious server could exploit the acme.sh client for updating SSL certificates, making it execute arbitrary shell code of the server's choosing.

What I didn't mention is that YSH discourages the bug! If you look at the commit that removed the remote code execution:

https://github.com/acmesh-official/acme.sh/commit/327e2fb0a4bdbe4b75339e1cad6d20bda29318d6

They used

eval "$@"  # wrong, extra layer of evaluation of arguments

instead of

"$@"  # correct

I understand why this is confusing — the "$@" feels like it's "dangling". Doesn't it need a "verb"? It also looks like a string substitution "$x", but it's really an "array splice" operation.
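To make the difference concrete, here's a small bash demo with two hypothetical logging wrappers. They're in the spirit of the acme.sh code, but they are not the actual functions from that commit. The `untrusted` string stands in for data that came from a server.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers, NOT the real acme.sh functions.

run_with_eval() {
  echo "running: $*" >&2
  eval "$@"    # WRONG: arguments are joined with spaces and parsed as shell code again
}

run_direct() {
  echo "running: $*" >&2
  "$@"         # RIGHT: each argument is passed through as exactly one word
}

# Imagine this string came from an untrusted server response.
untrusted='$(echo INJECTED)'

run_direct echo "$untrusted"     # prints the literal text: $(echo INJECTED)
run_with_eval echo "$untrusted"  # runs the command substitution: prints INJECTED

# "$@" is an array splice: each element becomes one word, spaces and all.
set -- 'a b' c
printf '<%s>\n' "$@"             # prints <a b> then <c>
```

The `eval` version parses its arguments a second time, so a command substitution hidden in the data gets executed. The `"$@"` version never re-parses anything.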

When you use YSH, shopt --set simple_eval_builtin restricts eval to one argument:

ysh$ set -- 1 2
ysh$ eval "$@"
  eval "$@"
  ^~~~
[ interactive ]:2: 'eval' requires exactly 1 argument

This isn't a perfect mitigation, but it's a strong signal that you're using eval incorrectly. In this case, the vulnerable logging wrappers would have failed with the error above instead of silently running the server's code.
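Plain bash has no equivalent option, but a script can get a similar check by calling a wrapper instead of eval. This is a hypothetical sketch (`strict_eval` is not a real builtin), not how YSH implements simple_eval_builtin:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper approximating YSH's simple_eval_builtin in bash:
# refuse to evaluate anything but a single string argument.
strict_eval() {
  if [ "$#" -ne 1 ]; then
    echo "strict_eval: requires exactly 1 argument, got $#" >&2
    return 2
  fi
  eval "$1"
}

set -- 1 2
strict_eval 'x=42; echo "x is $x"'   # one argument: prints "x is 42"
strict_eval "$@" || echo "rejected"  # two arguments: error on stderr, then "rejected"
```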

See other posts tagged #real-problems for things YSH protects you from.

Language Design Note

To avoid the "dangling" array, I also think YSH should have a run wrapper to "pass through" an array:

"$@"          # correct
@ARGV         # YSH style

run -- "$@"   # same thing
run -- @ARGV  # ditto

run could also take flags that limit the lookup to certain kinds of commands, behaving a bit like the command and builtin builtins:

run --extern ls
run --builtin echo
run --proc myproc
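Since run is only a proposal, here is a rough bash emulation of the behavior described above. The function name `run`, its flags, and the mapping of --proc to bash functions are all assumptions for illustration. It relies on bash's `type -P`, `type -t`, and `builtin`:

```bash
#!/usr/bin/env bash
# Hypothetical bash emulation of the proposed YSH `run` builtin.
#   run -- "$@"          pass an array of words through as a command
#   run --extern NAME    only run an external program found in $PATH
#   run --builtin NAME   only run a shell builtin
#   run --proc NAME      only run a shell function (bash's closest thing to a proc)
run() {
  local mode=any
  # Parse leading flags; stop at `--` or at the first non-flag word.
  while [ $# -gt 0 ]; do
    case $1 in
      --extern)  mode=extern ;;
      --builtin) mode=builtin ;;
      --proc)    mode=proc ;;
      --)        shift; break ;;
      *)         break ;;
    esac
    shift
  done

  if [ $# -eq 0 ]; then
    echo "run: expected a command" >&2
    return 2
  fi

  case $mode in
    any)
      "$@" ;;
    extern)
      local path
      # `type -P` forces a PATH search, skipping functions and builtins.
      path=$(type -P "$1") || { echo "run: no external command '$1'" >&2; return 127; }
      shift
      "$path" "$@" ;;
    builtin)
      builtin "$@" ;;
    proc)
      if [ "$(type -t "$1")" != function ]; then
        echo "run: no function named '$1'" >&2
        return 127
      fi
      "$@" ;;
  esac
}

greet() { echo "hello $1"; }

set -- printf '<%s>\n' 'a b' c
run -- "$@"             # same as "$@": prints <a b> then <c>
run --proc greet world  # prints: hello world
run --extern echo hi    # runs the echo found in $PATH, e.g. /bin/echo
```

Using `type -P` for --extern forces a PATH lookup, so a function or builtin with the same name is skipped. Plain `command` wouldn't be enough, since it still finds builtins.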

Headless Shell Screenshots

If you're interested in creating a GUI for a shell, please join #shell-gui on Zulip.

I mentioned this in the last release, and there have been a few updates. I tested it with the oils-for-unix C++ tarball, and added tests to the CI so it doesn't regress.

I tested out Subhav's web_shell demo, which has a Go client for the headless shell:

https://github.com/subhav/web_shell

Screenshot: [image: Web shell demo]

What's happening here?

We have a Go server that receives input from an HTML form.
The server passes input to its child process osh --headless over a Unix domain socket.
osh executes the shell string with file descriptors provided by the Go server.
The Go server reads from those file descriptors, and constructs an HTTP response.

This is the FANOS protocol, somewhat documented at Oils Headless Mode: For Alternative UIs.
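As the name suggests (File descriptors And Netstrings Over Sockets), FANOS frames each message as a netstring: the payload's byte length in decimal, a colon, the payload, and a trailing comma. Here's a minimal bash sketch of just that framing, with hypothetical function names. The file descriptor passing (SCM_RIGHTS ancillary data on the Unix domain socket) can't be done from plain shell, and the actual message contents are described in the headless mode doc, not here:

```bash
#!/usr/bash/env bash
# Hypothetical helpers for netstring framing: <byte length>:<payload>,

netstring_encode() {
  local LC_ALL=C                  # so ${#1} counts bytes rather than characters
  printf '%d:%s,' "${#1}" "$1"
}

netstring_decode() {
  local LC_ALL=C
  local s=$1
  local len=${s%%:*}              # digits before the first colon
  local rest=${s#*:}              # everything after the first colon
  case $len in
    ''|*[!0-9]*) echo "netstring: bad length" >&2; return 1 ;;
  esac
  if [ "${rest:len:1}" != , ]; then
    echo "netstring: missing trailing comma" >&2
    return 1
  fi
  printf '%s' "${rest:0:len}"     # the payload may itself contain ':' and ','
}

msg=$(netstring_encode 'echo hi')
echo "$msg"                       # prints: 7:echo hi,
netstring_decode "$msg"; echo     # prints: echo hi
```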

Question: Which GUI toolkit should we write a client-side demo with? I think some sample code with PyQt would be nice.

What's Next?

I want to get heads-down on implementing YSH, with the help of contributors. I think we've made all major design decisions.

But there's more writing to do. I call it the "month of docs", and it looks like it will take more than a month!

"Month of Docs"

Rewrite the help builtin.
- We probably need a mechanism to compile text data into the executable. For practical reasons, I want to avoid external files.
Reorganize reference docs. I have a new doc/ref scheme in mind, which should be a solid structure to update as we implement YSH.
Apply for another grant. We're now more than halfway through the second grant.
A blog post to show how much works in the interactive shell, including the headless shell.

After that, I hope we'll be "home free" to work on YSH! Though there are always things that pop up, and more good ideas on the Oils 2023 Roadmap.

Appendix: Metrics for the 0.16.0 Release

These metrics help me keep track of the project. Let's compare this release with the previous one, version 0.15.0 from May.

Spec Tests

We implemented more features in Python:

OSH spec tests for 0.15.0: 2071 tests, 1840 passing, 86 failing
OSH spec tests for 0.16.0: 2084 tests, 1856 passing, 86 failing

There was a slight regression in the C++ numbers:

C++ spec tests for 0.15.0 - 1833 of 1843 passing - delta 10
C++ spec tests for 0.16.0 - 1847 of 1859 passing - delta 12
- file builtin-history: something related to the prompt in C++.
- file TODO-deprecate: not important, so the delta is more like 11.

Progress on the YSH design:

YSH/Oil spec tests for 0.15.0: 511 tests, 471 passing, 40 failing
YSH/Oil spec tests for 0.16.0: 525 tests, 479 passing, 46 failing

Some C++ tests passed automatically, and some didn't:

YSH/Oil C++ spec tests for 0.15.0: 181 of 469 passing
YSH/Oil C++ spec tests for 0.16.0: 185 of 479 passing

When we get further on the YSH translation, more work should come "for free".

Benchmarks

The "big parser refactoring" worked! Thanks again to Aidan for doing a big chunk of this.

Parser Performance for 0.15.0: 22.1 thousand irefs per line
Parser Performance for 0.16.0: 18.2 thousand irefs per line

This was due to removing the big Arena.tokens list, which was a memory leak, as well as removing a dead StrFormat() call in an inner loop.

The leak was an old bug: issue #812 mentioned above. (Interestingly, I noticed that Crafting Interpreters also keeps a big array of tokens. Once you reach problems of our size, it isn't good for performance!)

The parser refactoring also reduced memory usage (max RSS):

benchmarks/gc for 0.15.0: parse.configure-coreutils 1.97 M objects comprising 73.4 MB, max RSS 81.1 MB
benchmarks/gc for 0.16.0: parse.configure-coreutils 1.83 M objects comprising 62.1 MB, max RSS 69.1 MB

The stable benchmarks reflect the same improvement:

benchmarks/gc-cachegrind for 0.15.0 - 92.9 and 84.3 million irefs, mut+alloc+free+gc
benchmarks/gc-cachegrind for 0.16.0 - 66.3 and 83.7 million irefs, mut+alloc+free+gc

We still have a gap with bash to close when running a hard workload:

Runtime Performance for 0.15.0: 33.8 and 20.0 seconds running CPython's configure
Runtime Performance for 0.16.0: 32.1 and 19.3 seconds running CPython's configure
bash: 26.9 and 15.6 seconds running CPython's configure

Code Size

We reformatted the whole codebase with yapf, making it use 4-space indents! The import statements at the top of each file became longer.

cloc for 0.15.0: 19,854 lines of Python and C (ASDL count omitted due to a metric bug)
cloc for 0.16.0: 20,732 lines of Python and C, 396 lines of ASDL

Source code we ship in the tarball:

oil-cpp for 0.15.0 - 96,530 lines
oil-cpp for 0.16.0 - 97,233 lines

The compiled code didn't get larger:

ovm-build for 0.15.0: 1.42 MB of native code (under GCC)
ovm-build for 0.16.0: 1.42 MB of native code (under GCC)