
Oils 0.23.0 - Writing YSH Code, User Feedback, and Bug Bounty

2024-11-13

This is a delayed announcement of the August release of:

Oils version 0.23.0 - Source tarballs and documentation.

(The most recent release was this weekend.)


Why was the announcement delayed? After writing four blog posts in September, I ran out of steam! That series ended with:

Instead, I returned to working on YSH. I had fun working on some deep design issues, driven by Zulip discussions.

But I took a break to write this, because it's important to credit contributors, and because this release contains 3 months of excellent work!

It's the biggest release ever, and this announcement is long. Every area of Oils has improved: docs, the interactive shell, YSH, OSH, the standard library, and the shell runtime.

Table of Contents
Intro
Great Feedback on YSH
More Bounties?
Contributors
Breaking Changes
Docs Updated
Interactive Shell (screenshots)
Overhaul of pp builtin
New assert builtin
YSH
Method Call Syntax
Obj Type - Basic Polymorphism
io Object - for Pure Functions
Builtin Types
Buffered and Unbuffered I/O
JSON
Floats
Integers
Standard Library
OSH stdlib - small and bash-compatible
Task Files
OSH test framework - no-quotes
YSH test framework - yblocks
Ambitious Design - Awk, R, xargs
Shell Runtime - Hard Stuff!
Multiple trap bugs fixed
Optimizations, like noforklast
User Feedback
Julian - YSH
Samuel - OSH
Koichi - OSH
BashArray -> SparseArray prototype
More OSH Compatibility / Slicing
YSH Slicing is Better
Soil CI - Under the Hood
What's Next?
Please Try Oils and Send Feedback
Appendix
Closed Issues
Metrics for the 0.23.0 Release

Intro

Great Feedback on YSH

In the last few months, readers have been writing YSH code, and sending feedback! Here are some people who have helped:


Also, Ellen Potter was awarded a bounty for finding a bug in our JSON parser:

Our JSON decoder silently ignored input after a NUL byte 0x00, but correctly flagged input after, say, a 0x01 byte.

This is a nice bug — hard to find, but easy to fix. Now I have more confidence in our JSON implementation.

More Bounties?

I posted that bug bounty on lobste.rs, and it ended up improving the codebase!

If you're interested in working on Oils, and possibly being paid, let me know. We can always use more eyes on it.

There's a concrete BashArray -> SparseArray task at the end of this post, which would be a good intro for a skilled Python programmer.

I wrote a pitch for contributing to Oils last week too!

Contributors

Thank you for the contributions to our codebase:

Breaking Changes

I like to highlight breaking changes up front:

I also want to highlight this deprecation:

Now let's go through the improvements in this release: docs, interactive shell, YSH, OSH, standard library, and "under the hood".

This list is not necessarily complete, but the full changelog is.

Docs Updated

We're writing the Oils Reference, with a focus on the YSH Table of Contents.

Most topics have a first draft written. Let us know if you see mistakes. Feedback and questions will improve the docs!

I've also updated A Tour of YSH.


If you'd like to write YSH code, please join us on Zulip:

Interactive Shell (screenshots)

In June's release of Oils 0.22.0, Justin Pombrio added a pretty printer using Wadler's algorithm. You can type = myexpr to see the value of any expression.

In this release, we also use it in the pp builtin, which can print the corresponding source code as well as the value. It's a bit like the Rust dbg!() macro:

hello

Overhaul of pp builtin

Here are common usages:

pp (x + 5)               # show source code and value
pp value (x + 5)         # no source, same output as '= x + 5'
pp value (x + 5) | less  # can be piped, unlike '= x + 5'

Changes to pretty printing:

ysh$ = "price is \$3.99"
(Str)   'price is $3.99'

ysh$ = "backslash \\"
(Str)   b'backslash \\'

ysh$ = "isn't cool"
(Str)   b'isn\'t cool'

pp (x) and pp value (x) are stable. So I changed the unstable commands to end in an underscore:

pp test_ (x)  # formerly 'pp line'
pp asdl_ (x)  # implementation-level representation
pp cell_ x    # prints a cell/location, not a value

pp proc       # table of procs, may change

New assert builtin

The new assert builtin uses the same pretty printer:

Here's a long assert:

It accepts these forms:

assert (false)
assert [false]  # evaluates it for you

assert (42 === f())  # eagerly evaluated, not special

assert [42 === f()]  # evaluates it for you, prints error message

YSH

Method Call Syntax

Based on writing YSH, I sorted out a design issue with the ., ->, and => operators.

  1. Non-mutating method calls are mystr.upper(), not mystr => upper()
  2. Mutating method calls are mylist->append(42), often used with the call keyword
  3. Function chaining uses =>, like mylist => join()

Design notes:

Obj Type - Basic Polymorphism

YSH now has objects! I added a minimal JavaScript-like mechanism, for polymorphism. There is no notion of "class".

It turns out that this change unlocked several important features in the subsequent release, Oils 0.24.0.

For now, I'll outsource the explanation to Zulip:

io Object - for Pure Functions

YSH has both proc and func, and I want functions to be pure. That is, they are explicit about I/O vs. computation.

So I expanded the io object with:

Oils Reference: chap-type-method.html#IO

Thanks to Chris Waldon for feedback on renderPrompt(io).

Builtin Types

Buffered and Unbuffered I/O

Shell now has buffered I/O:

for line in (io.stdin) {
  echo $line
}

I also simplified unbuffered I/O:

read --raw-line < myfile
echo $_reply

This flag replaces the POSIX shell idiom IFS= read -r line. Telling the shell not to mangle your input takes two non-obvious options!

I added this to YSH vs. Shell Idioms, and mentioned it in A Tour of YSH.
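For readers who haven't run into this, here is a small sketch of what the two options in the POSIX idiom guard against. It uses only plain read, and the variable names are made up for illustration:

```shell
# A line with leading/trailing spaces and a literal backslash.
input='  a\tb  '

# Plain read: strips surrounding whitespace and treats the backslash as an escape.
read mangled <<EOF
$input
EOF

# IFS= disables whitespace trimming; -r disables backslash processing.
IFS= read -r preserved <<EOF
$input
EOF

printf '[%s]\n' "$mangled"     # [atb]
printf '[%s]\n' "$preserved"   # [  a\tb  ]
```

Each option fixes a different kind of mangling, which is why the idiom needs both.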

JSON

Floats

Integers

I still want to eliminate integer overflow from Oils:

Standard Library

The standard library now lives under $LIB_OSH and $LIB_YSH, so you use it like this:

source $LIB_YSH/yblocks.ysh
use $LIB_YSH/yblocks.ysh  # next release: create a "namespace" object

Under the hood, these variables expand to ///stdlib/osh and ///stdlib/ysh. The /// refers to a path embedded in the binary.

Because $LIB_OSH is a variable, we can override it, and run the OSH standard library under bash! We want to test the same code under 2 shells.
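One way to picture the override, as a sketch only: a script can give $LIB_OSH a fallback value when the shell hasn't set it. The temp directory and the stub file below are hypothetical stand-ins for a real copy of stdlib/osh on disk, and two_hello is an invented function, not part of the actual library.

```shell
# Hypothetical stand-in for a directory containing the OSH stdlib files.
demo_dir=$(mktemp -d)
printf 'two_hello() { echo "hello from stub"; }\n' > "$demo_dir/two.sh"

# Use the shell's own value if it set one; otherwise fall back to the directory above.
: "${LIB_OSH:=$demo_dir}"

. "$LIB_OSH/two.sh"
two_hello

rm -r "$demo_dir"
```

The same script can then run under either shell, with only the value of the variable differing.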


I started a new chapter in the reference: Oils Reference > Standard Library

OSH stdlib - small and bash-compatible

The OSH standard library is small, and based on the minimal style of bash I use:

$ wc -l stdlib/osh/*.sh
   8 stdlib/osh/bash-strict.sh
  76 stdlib/osh/byo-server.sh
  93 stdlib/osh/no-quotes.sh
  91 stdlib/osh/task-five.sh
  23 stdlib/osh/two.sh
  35 stdlib/osh/two-test.sh
 326 total

Typically you'll source $LIB_OSH/task-five.sh. It's for task files, a "notebook" or "dev" pattern that's made me more productive — every day, for years. Here are details on the other files:

Most of this is stable, except for BYO. I don't expect the standard library to grow beyond a few hundred lines.

Task Files

I unfortunately haven't written much about task files. Here's a collection of links:

I updated it with this 2022 post:

Counterpoint: to be intellectually honest, the Oils repo has perhaps gotten too full of task files!

There are tens of thousands of lines of one-off experiments. They helped me learn a lot, but they should be better organized, for contributors to use and learn from.

On the other hand, I have many git repos filled with a few dozen lines of task files, and they're invaluable. I can juggle multiple projects in parallel, because I can pick up right where I left off.

We should do more work on collaboration. I believe this makes sense because Shell Scripts Are Executable Documentation (2021).
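For readers new to the idea, here is a minimal plain-shell sketch of the general pattern, assuming the common convention where each function is a task and "$@" dispatches to it. It is not the contents of task-five.sh, and the task names are invented:

```shell
# Hypothetical task file, e.g. saved as run.sh
# Each function is one task you can run from the command line.

build() {
  echo "building $1"
}

unit_tests() {
  echo "running unit tests"
}

# Dispatch: './run.sh build foo' calls 'build foo'.
# With no arguments this line does nothing.
"$@"
```

Running ./run.sh build foo would print "building foo". New one-off experiments become new functions in the same file, which is what makes it feel like a notebook.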

OSH test framework - no-quotes

Under the hood, the OSH test framework uses declare -n "out params". This avoids eval and quoting issues.

Example of testing echo hi:

source $LIB_OSH/no-quotes.sh  # named in comparison to git's "sharness"

test-foo() {
  local status stdout  # declare vars

  nq-capture status stdout \
    echo hi

  # make assertions
  nq-assert 0 = "$status"
  nq-assert 'hi' = "$stdout"
}
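To show what "out params" means here, the following is a sketch of the technique with a hypothetical helper named my_capture. It is not the real nq-capture, and it assumes bash 4.3 or later, where declare -n and local -n are available:

```shell
# Hypothetical helper: run a command, store its status and stdout
# in the caller's variables, whose NAMES are passed as $1 and $2.
my_capture() {
  local -n out_status=$1   # nameref: assignments go to the caller's variable
  local -n out_stdout=$2
  shift 2

  local rc=0
  out_stdout=$("$@") || rc=$?   # no eval, so the command's quoting is preserved
  out_status=$rc
}

demo() {
  local status stdout
  my_capture status stdout echo hi
  echo "status=$status stdout=$stdout"   # status=0 stdout=hi
}
demo
```

Because the command is passed as an argv array and the results come back through namerefs, nothing is re-parsed as shell code. Note that command substitution strips the trailing newline in this sketch.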

YSH test framework - yblocks

Here's how you test echo hi in YSH:

source $LIB_YSH/yblocks.ysh  # because you use ysh blocks

proc test-foo {
  yb-capture (&r) {  # capture result into a "Place"
    echo hi
  }

  # assertion failures give pretty output - screenshots above
  assert [0 === r.status]
  assert [u'hi\n' === r.stdout]  # don't lose the trailing newline
}

Other YSH changes:

Ambitious Design - Awk, R, xargs

Samuel and Aidan are interested in Awk-like idioms in YSH, and we've made progress on how to do it. We still believe that Shell, Awk, and Make Should Be Combined (2016) :-)

The new io.stdin object is important, as well as controlling the evaluation of $0 $1 $2.


I generalized this design question even more, with the slogan

Streams, Tables, and Processes - Awk, R, and xargs

We're using this goal to motivate the YSH language design, and to motivate reflection on the language. I think we'll have nicer reflection than languages like Python, JavaScript, Ruby, and Lua. Error messages are an issue though.

Zulip threads:

Let me know if you're interested in helping!

Shell Runtime - Hard Stuff!

The recent retrospective on Oils mentioned that the shell runtime is hard!

Multiple trap bugs fixed

Optimizations, like noforklast

I noticed a related bug when fixing trap.

Another interaction:

Then I improved these noforklast optimizations, and measured them.

This work could use its own blog post:

How many processes does a Unix shell start?

Raw results from https://www.oilshell.org/release/0.23.0/more-tests.wwz/syscall/-wwz-index:

command.Redirect

I optimized the representation of redirects, which made the interpreter a bit faster. This was motivated by the CPython configure workload.

User Feedback

As mentioned in the intro, we've gotten great feedback on both OSH and YSH. It's easier for me to organize some of it by person :-)

Julian - YSH

Samuel - OSH

Samuel did a lot of great testing, like #projects-with-oils > Swapping GNU coreutils for uutils coreutils on Gentoo Linux

Koichi - OSH

Koichi did another round of OSH testing on ble.sh.

BashArray -> SparseArray prototype

Fun fact: Bash arrays are not arrays! They don't offer O(1) random access:

echo ${myarray[i]}  # may traverse the entire array

I believe they are linked lists, with some caching optimizations, although it may depend on the bash version.

The linked list representation means that they can be sparse. And ble.sh makes use of such non-contiguous array indices, like 500,000 or 2,000,000. For this usage pattern, our List[str] representation is big and slow.

So I proposed that we change the representation to Dict[BigInt, str]. I prototyped this, and wrote benchmarks. Koichi also validated that it's faster for his workloads.

We call this value.SparseArray, and it still needs to be "turned on". If you're interested in helping, possibly for a grant award, please let me know!
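The sparseness described above is easy to see in bash itself. This sketch uses an invented array name and the two index values mentioned earlier:

```shell
# Assign two elements at large, non-contiguous indices.
declare -a sparse=()
sparse[500000]=a
sparse[2000000]=b

echo "count=${#sparse[@]}"     # count=2
echo "indices=${!sparse[*]}"   # indices=500000 2000000
```

Only two elements exist, even though the highest index is 2,000,000. A dense list would need a slot for every index in between, which is the cost a dictionary keyed by index avoids.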

More OSH Compatibility / Slicing

$ printf '%d\n' '"a'  # a leading quote yields the numeric value of the next character
97

$ printf '%d\n' $'"\u03bc'  # this works in OSH and bash
956

In my opinion, array slicing in bash is "broken". The trailing : completely changes the meaning:

$ bash -c 'a=(1 2 3); echo ${a[@]:0}'
1 2 3
$ bash -c 'a=(1 2 3); echo ${a[@]:0:}'  # why doesn't it print 1 2 3?

This behavior is also inconsistent:

$ bash -c 'a=(1 2 3); echo ${a[@]::}'  # prints nothing

$ bash -c 'a=(1 2 3); echo ${a[@]:}'   # error
bash: line 1: ${a[@]:}: bad substitution

In any case, I made OSH more compatible with bash, because it came up in both Nix and ble.sh.

But I also added shopt -s strict_parse_slice, so that you can require explicit code, rather than relying on these quirks.

Koichi explained bash like this:

This is a combination of two separate facts.

  1. When there is only one colon, it means that only "offset" is specified. When there are two colons, it means that "offset" and "length" are specified.
  2. The arithmetic expression can be an empty string, which means 0 (except in some special contexts such as for ((;;))).

${arr[@]:0} means offset '0' and length unset. ${a[@]:0:} means offset '0' and length ''. And ${a[@]::} means offset '0' and length ''.

I can accept that this is how the bash implementation happens to work! But I'm not sure it's documented.
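The two facts in that explanation can be checked directly in bash. This is a small sketch with made-up variable names; the results are captured in assignments so the empty ones are visible:

```shell
a=(1 2 3)

no_length="${a[@]:0}"      # offset 0, length unset: everything from the offset
empty_length="${a[@]:0:}"  # offset 0, length '': the empty expression evaluates to 0
zero_length="${a[@]:0:0}"  # the explicit spelling of the same slice

echo "no length:    [$no_length]"      # [1 2 3]
echo "empty length: [$empty_length]"   # []
echo "zero length:  [$zero_length]"    # []

echo "empty arithmetic: $(( ))"        # 0
```

So the trailing colon doesn't mean "to the end"; it introduces a length, and an empty length is zero.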

YSH Slicing is Better

I'll take this opportunity to show that YSH has a simple and familiar design, stolen from Python:

$ var a = ['zero', 'one', 'two']
$ = a[1:3]  # one two
$ = a[1:]   # one two
$ = a[:2]   # zero one
$ = a[:]    # zero one two

The rules are:

Soil CI - Under the Hood

On every commit, we run thousands of tests, and dozens of benchmarks. We also test the setup for our custom tools on different Linux distros. Details:

Other changes:

What's Next?

This was a huge release! And remember that there was another release last weekend, which includes:

These are huge features! YSH is making great progress.


To give you a sense of what's going on, here are some Zulip threads:

More subprojects:

Please Try Oils and Send Feedback

Thank you for all the great feedback! Please continue using Oils, testing it, and reporting issues.

Design and dev discussions happen on https://oilshell.zulipchat.com/, and you're welcome to join!

Appendix

Closed Issues

Some of these issues weren't mentioned above:

#2053 trap INT doesn't run on Ctrl-C
#2037 segfault on MacOS - maybe related to case statement
#2026 `json read` unexpectedly parses `123\x00`
#2003 crash in parsing return
#1992 Add pp [x + 42] to print an expression and its value - like Rust dbg!()
#1986 intermittent crash running amd-test script -- reproducible in dbg, opt
#1985 Abort with += on missing dict key
#1984 Missing "Str=>lower()"
#1853 traps in osh -c don't run when the final command is not a shell builtin
#1833 try builtin only sets _error sometimes, which is hard to remember and document
#1830 _error value persists after successful try
#1654 ERR trap executed when errexit is ignored
#1144 Floating Point Support
#484 implement set -C / set -o noclobber

Metrics for the 0.23.0 Release

These metrics help me keep track of the project. Let's compare this release with the previous one, version 0.22.0.

Docs

We're tracking the progress of the Oils Reference with these metrics:

Wild tests

I don't usually track this test suite, but the improvement due to the case ;& ;;& feature is visible:

(Now I realize that I didn't mention support for ;& and ;;&, which are an obscure syntax for control flow in shell. Sometimes these release notes are not complete!)
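For anyone who hasn't seen these terminators, here is a short bash sketch (bash 4.0 or later). The function name and patterns are invented for illustration:

```shell
classify() {
  case $1 in
    a) echo "matched a" ;&        # ;& falls through into the next body unconditionally
    b) echo "ran b's body" ;;&    # ;;& keeps testing the remaining patterns
    c) echo "matched c" ;;        # ;; stops, as usual
    *) echo "matched star" ;;
  esac
}

classify a
# matched a
# ran b's body
# matched star

classify c
# matched c
```

With the argument a, the first body runs, falls through into the second, and then ;;& resumes pattern matching, which skips c and lands on the catch-all.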

Spec Tests

There are 75 new tests passing on OSH:

It all works in fast C++, even though we write typed Python:


We have more YSH features, and the corresponding test coverage:

Likewise, everything still works in C++:

Benchmarks

Warning: I used a new machine mercer rather than lenny, so some comparisons to version 0.22.0 are not valid. Nevertheless, let's take a look, with this discrepancy in mind.


Cachegrind isn't stable across machines, so this isn't a real speedup:


These numbers are comparable; we use a bit less memory:

Again, cachegrind metrics aren't comparable:

Let's look at our "problem workload":

Surprisingly, OSH is sometimes faster than bash on this workload!

I've wanted to improve our measurement methodology for a while.

Code Size

Oils is still a small program in terms of source code:

And generated C++:

And compiled binary size:

Remember that After 8 Years, Oils Is Still Small and Flexible!