Why Sponsor Oils? | blog | oilshell.org
This is a delayed announcement of the August release of:
Oils version 0.23.0 - Source tarballs and documentation.
(The most recent release was this weekend.)
Why was the announcement delayed? After writing four blog posts in September, I ran out of steam! That series ended with:
Instead, I returned to working on YSH. I had fun working on some deep design issues, driven by Zulip discussions.
But I took a break to write this, because it's important to credit contributors, and because this release contains 3 months of excellent work!
It's the biggest release ever, and this announcement is long. Every area of Oils has improved: docs, the interactive shell, YSH, OSH, the standard library, and the shell runtime.
In the last few months, readers have been writing YSH code, and sending feedback! Here are some people who have helped:
- bar-g - feedback on try _error, which led to a design change
- heftig, deviant on github - trap bug reports, which led to an overhaul
Also, Ellen Potter was awarded a bounty for finding a bug in our JSON parser:
Our JSON decoder silently ignored input after a NUL byte 0x00, but correctly flagged input after, say, a 0x01 byte.
This is a nice bug — hard to find, but easy to fix. Now I have more confidence in our JSON implementation.
I posted that bug bounty on lobste.rs, and it ended up improving the codebase!
My comment on Regular JSON (neilmadden.blog via lobste.rs)
31 points, 18 comments on 2024-06-18
If you're interested in working on Oils, and possibly being paid, let me know. We can always use more eyes on it.
There's a concrete BashArray -> SparseArray task at the end of this post, which would be a good intro for a skilled Python programmer.
I wrote a pitch for contributing to Oils last week too!
Thank you for the contributions to our codebase:
- Str.split()
- eval with var bindings, for APIs like the flag parser to use
- List::pop()
- set -E aka set -o errtrace, which affects trap ERR
- set -o noclobber
- shopt -o
- Dict.erase() and Dict.get()
I like to highlight breaking changes up front:
- . operator, not the => operator. Details below.
- source $LIB_YSH/args.ysh, rather than source --builtin args.ysh
- module builtin to source-guard
- read -0 myvar no longer respects $IFS
  - read -0 is an Oils extension to read until a NUL byte, so word splitting was a mistake. I noticed this problem after Sylvia mentioned the IFS= read -r idiom.
- Str, rather than Int
- \u{3bc}, and they don't have to appear inside quotes like u'\u{3bc}'.
- #'a' syntax
- BashArray, BashAssoc
I also want to highlight this deprecation:
- _status integer is deprecated in favor of _error.code.
  - _error dict everywhere, and we can attach arbitrary properties to it in a backward compatible way.
  - _status, but I didn't want to break it in this release.
Now let's go through the improvements in this release: docs, interactive shell, YSH, OSH, standard library, and "under the hood".
This list is not necessarily complete, but the full changelog is.
We're writing the Oils Reference, with a focus on the YSH Table of Contents.
Most topics have a first draft written. Let us know if you see mistakes. Feedback and questions will improve the docs!
I've also updated A Tour of YSH.
If you'd like to write YSH code, please join us on Zulip:
In June's release of Oils 0.22.0, Justin Pombrio added a pretty printer using Wadler's algorithm. You can type = myexpr to see the value of any expression.
In this release, we also use it in the pp builtin, which can print the corresponding source code as well as the value. It's a bit like the Rust dbg!() macro:
pp builtin
Here are common usages:
pp (x + 5) # show source code and value
pp value (x + 5) # no source, same output as '= x + 5'
pp value (x + 5) | less # can be piped, unlike '= x + 5'
Changes to pretty printing:
- value.SparseArray, to support an upcoming migration
- r'\' and u'\n', but currently we only print the '' and b'' styles.
  ysh$ = "price is \$3.99"
  (Str) 'price is $3.99'
  ysh$ = "backslash \\"
  (Str) b'backslash \\'
  ysh$ = "isn't cool"
  (Str) b'isn\'t cool'
- pp (x) and pp value (x) are stable. So I changed the unstable commands to end in an underscore:
pp test_ (x) # formerly 'pp line'
pp asdl_ (x) # implementation-level representation
pp cell_ x # prints a cell/location, not a value
pp proc # table of procs, may change
assert builtin
The new assert builtin uses the same pretty printer:
Here's a long assert:
It accepts these forms:
assert (false)
assert [false] # evaluates it for you
assert (42 === f()) # eagerly evaluated, not special
assert [42 === f()] # evaluates it for you, prints error message
Based on writing YSH, I sorted out a design issue with the ., ->, and => operators.
- mystr.upper(), not mystr => upper()
- mylist->append(42), often used with the call keyword
  - -> is now enforced, a breaking change. Under the hood, it's implemented with an M/ prefix on the method name.
- =>, like mylist => join()
Design notes:
- mydict.key like JavaScript, and unlike Python.
- . and : operators, for binding self.
YSH now has objects! I added a minimal JavaScript-like mechanism, for polymorphism. There is no notion of "class".
It turns out that this change unlocked several important features in the subsequent release, Oils 0.24.0.
For now, I'll outsource the explanation to Zulip:
io Object - for Pure Functions
YSH has both proc and func, and I want functions to be pure. That is, they are explicit about I/O vs. computation.
So I expanded the io object with:
- io.stdin - an attribute
- io.captureStdout(myblock) - a non-mutating method
- call io->eval(myblock) - a mutating method
  - Why ->? One reason is that the block could have a setglobal statement, which mutates the interpreter.
Oils Reference: chap-type-method.html#IO
Thanks to Chris Waldon for feedback on renderPrompt(io).
- Dict.get() and Dict.erase()
- Str.split()
Shell now has buffered I/O:
for line in (io.stdin) {
echo $line
}
I also simplified unbuffered I/O:
read --raw-line < myfile
echo $_reply
This flag replaces the POSIX shell idiom IFS= read -r line. Telling the shell not to mangle your input takes two non-obvious options!
I added this to YSH vs. Shell Idioms, and mentioned it in A Tour of YSH.
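To make those two options concrete, here is a small bash sketch. The variable names input, plain, and raw are only for illustration. It feeds the same line to a plain read and to IFS= read -r:

```shell
#!/usr/bin/env bash
# A line with leading/trailing spaces and a literal backslash
input='  a\tb  '

# Plain read: trims leading and trailing whitespace (because of $IFS),
# and treats the backslash as an escape character, so it disappears.
read plain <<< "$input"

# IFS= turns off the trimming; -r turns off backslash processing.
IFS= read -r raw <<< "$input"

printf 'plain=[%s]\n' "$plain"   # plain=[atb]
printf 'raw=[%s]\n' "$raw"       # raw=[  a\tb  ]
```

Only the second form hands back the bytes that were on the line.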
mops::BigInt
- NAN, INFINITY
  - null, like JavaScript does. Python has a surprising bug where it prints invalid JSON NaN, or you can opt into failure with allow_nan=False.
- NAN and INFINITY constants, the C language spelling for these values
- === so neither arg can be a Float
  - floatsEqual() builtin
- %.16g
  - %.17g that Bruce Dawson suggested, but %.17g causes problems in practice
- %.16g float precision, rather than double
- mops::BigInt to fix two integer truncation bugs - reported by Koichi Murase
I still want to eliminate integer overflow from Oils:
I moved the standard library to $LIB_OSH or $LIB_YSH, so you use them like this:
source $LIB_YSH/yblocks.ysh
use $LIB_YSH/yblocks.ysh # next release: create a "namespace" object
Under the hood, these variables expand to ///stdlib/osh and ///stdlib/ysh. The /// refers to a path embedded in the binary.
Because $LIB_OSH is a variable, we can override it, and run the OSH standard library under bash! We want to test the same code under 2 shells.
I started a new chapter in the reference: Oils Reference > Standard Library
The OSH standard library is small, and based on the minimal style of bash I use:
$ wc -l stdlib/osh/*.sh
8 stdlib/osh/bash-strict.sh
76 stdlib/osh/byo-server.sh
93 stdlib/osh/no-quotes.sh
91 stdlib/osh/task-five.sh
23 stdlib/osh/two.sh
35 stdlib/osh/two-test.sh
326 total
Typically you'll source $LIB_OSH/task-five.sh. It's for task files, a "notebook" or "dev" pattern that's made me more productive, every day, for years. Here are details on the other files:
- two is for the log and die functions I always use
- bash-strict is for the unofficial "bash strict mode", which has been updated with shopt -s inherit_errexit
- no-quotes is for testing - details below
- byo-server is for "polyglot" test discovery and execution, similar to the Test Anything Protocol
Most of this is stable, except for BYO. I don't expect the standard library to grow beyond a few hundred lines.
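For readers who haven't run into inherit_errexit: in bash, a command substitution doesn't inherit errexit unless this option is set. The following is a minimal sketch, assuming bash 4.4 or later is installed as bash. The variable names are illustrative, and this is not the contents of bash-strict.sh:

```shell
#!/usr/bin/env bash
# Without inherit_errexit: the failing 'false' inside $( ) is ignored,
# so the command sub keeps going and prints 'still-running'.
without=$(bash -c 'set -e; x=$(false; echo still-running); echo "[$x]"')

# With inherit_errexit: the command sub stops at 'false', the assignment
# fails, and errexit aborts the inner script before the final echo.
with=$(bash -c 'set -e; shopt -s inherit_errexit
                x=$(false; echo still-running); echo "[$x]"' || true)

echo "without=$without"   # without=[still-running]
echo "with=$with"         # with=
```

The first case is the kind of silently ignored failure that strict mode is meant to catch.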
I unfortunately haven't written much about task files. Here's a collection of links:
I updated it with this 2022 post:
Replacing make with a Shell Script for Running Your Project's Tasks (nickjanetakis.com via lobste.rs)
5 points, 24 comments on 2022-03-06
Counterpoint: to be intellectually honest, the Oils repo has perhaps gotten too full of task files!
There are tens of thousands of lines of one-off experiments. They helped me learn a lot, but they should be better organized, for contributors to use and learn from.
On the other hand, I have many git repos filled with a few dozen lines of task files, and they're invaluable. I can juggle multiple projects in parallel, because I can pick up right where I left off.
We should do more work on collaboration. I believe this makes sense because Shell Scripts Are Executable Documentation (2021).
no-quotes
Under the hood, the OSH test framework uses declare -n "out params". This avoids eval and quoting issues.
Example of testing echo hi:
source $LIB_OSH/no-quotes.sh  # named in comparison to git's "sharness"
test-foo() {
  local status stdout  # declare vars
  nq-capture status stdout \
    echo hi
  # make assertions
  nq-assert 0 = "$status"
  nq-assert 'hi' = "$stdout"
}
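If declare -n "out params" are unfamiliar, this is roughly how the pattern works in plain bash (4.3 or later). The helper my_capture and the function demo are hypothetical names made up for this sketch. This is not the real nq-capture implementation, only the same idea in miniature:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, then write its exit status and stdout
# into the caller's variables, whose NAMES are passed as the first two args.
my_capture() {
  declare -n out_status=$1   # nameref: assigning to it sets the caller's var
  declare -n out_stdout=$2
  shift 2

  local __status=0
  out_stdout=$("$@") || __status=$?   # "$@" runs the command with no eval
  out_status=$__status
}

demo() {
  local st text
  my_capture st text echo hi
  echo "status=$st stdout=$text"
}

demo   # status=0 stdout=hi
```

The caller passes variable names rather than strings of code, so nothing has to be re-parsed or re-quoted.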
yblocks
Here's how you test echo hi in YSH:
source $LIB_YSH/yblocks.ysh  # because you use ysh blocks
proc test-foo {
  yb-capture (&r) {  # capture result into a "Place"
    echo hi
  }
  # assertion failures give pretty output - screenshots above
  assert [0 === r.status]
  assert [u'hi\n' === r.stdout]  # don't lose the trailing newline
}
Other YSH changes:
- repeat() function, motivated by generating testdata.
  - 'str' * 3 or ['my', 'list'] * 3 in Python.
Samuel and Aidan are interested in Awk-like idioms in YSH, and we've made progress on how to do it. We still believe that Shell, Awk, and Make Should Be Combined (2016) :-)
The new io.stdin object is important, as well as controlling the evaluation of $0 $1 $2.
I generalized this design question even more, with the slogan
Streams, Tables, and Processes - Awk, R, and xargs
We're using this goal to motivate the YSH language design, and to motivate reflection on the language. I think we'll have nicer reflection than languages like Python, JavaScript, Ruby, and Lua. Error messages are an issue though.
Zulip threads:
Let me know if you're interested in helping!
The recent retrospective on Oils mentioned that the shell runtime is hard!
trap bugs fixed
- SIGINT / KeyboardInterrupt. C++ and Python are now more similar.
- heftig
  - trap ERR is only supposed to run when there is a command error.
  - $LINENO bug.
- deviant
  - noforklast optimization when the process has traps.
noforklast
I noticed a related bug when fixing trap.
- set -o pipefail is also incompatible with noforklast optimization, because the option changes the exit status. Detect and disable this combination.
Another interaction:
- ysh - Turn off top-level noforklast optimizations, to respect shopt --set verbose_errexit
  - bin/ysh -c '/bin/false'
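As a reminder of how set -o pipefail changes the exit status, here is a short bash sketch. The variable names are illustrative, and it only shows the shell-level behavior, not how OSH detects the combination:

```shell
#!/usr/bin/env bash
# Default: the pipeline's status is the status of its LAST command.
false | true
default_status=$?

# With pipefail: a failure anywhere in the pipeline makes the pipeline fail.
set -o pipefail
pipefail_status=0
false | true || pipefail_status=$?
set +o pipefail

echo "default=$default_status pipefail=$pipefail_status"   # default=0 pipefail=1
```

With pipefail on, the status depends on every process in the pipeline, not only the last one.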
Then I improved these noforklast optimizations, and measured them.
- command builtin no longer defeats optimizations
  - command date | wc -l now starts fewer processes
- fork() from subshells
- test/syscall - Compare against bash 5, not just bash 4.
This work could use its own blog post:
How many processes does a Unix shell start?
Raw results from https://www.oilshell.org/release/0.23.0/more-tests.wwz/syscall/-wwz-index:
yash
I optimized the representation of redirects, which made the interpreter a bit faster. This was motivated by the CPython configure workload.
As mentioned in the intro, we've gotten great feedback on both OSH and YSH. It's easier for me to organize some of it by person :-)
- Str.replace(), now fixed
- setglobal g.missing += 1, now fixed
- Str.lower() - still needs Unicode support
- value.Expr like ^[1 < 2] (unevaluated expressions)
Samuel did a lot of great testing, like #projects-with-oils > Swapping GNU coreutils for uutils coreutils on Gentoo Linux
- unalias -a
- (( )) and $(( ))
- test -v name[index] - used by Nix
Koichi did another round of OSH testing on ble.sh.
- $(( x[0] )) for value.Undef and value.Str
  - shopt -s strict_arith flags these cases, but we evaluate them by default.
- builtin declare s=foo
  - declare, called with the builtin builtin
- declare -i is no longer ignored by default
  - shopt --set ignore_flags_not_impl
Fun fact: Bash arrays are not arrays! They don't offer O(1) random access:
echo ${myarray[i]} # may traverse the entire array
I believe they are linked lists, with some caching optimizations, although it may depend on the bash version.
The linked list representation means that they can be sparse. And ble.sh makes use of such non-contiguous array indices, like 500,000 or 2,000,000. For this usage pattern, our List[str] representation is big and slow.
So I proposed that we change the representation to Dict[BigInt, str]. I prototyped this, and wrote benchmarks. Koichi also validated that it's faster for his workloads.
We call this value.SparseArray, and it still needs to be "turned on". If you're interested in helping, possibly for a grant award, please let me know!
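Here is what sparse usage looks like from the bash side. This is a small sketch with made-up variable names, using the index sizes mentioned above:

```shell
#!/usr/bin/env bash
declare -a sparse=()
sparse[5]=five
sparse[500000]=big
sparse[2000000]=bigger

count=${#sparse[@]}        # number of elements that are SET, not max index + 1
indices="${!sparse[*]}"    # the indices that exist, in ascending order

echo "count=$count"        # count=3
echo "indices=$indices"    # indices=5 500000 2000000
```

A dense list needs about two million slots to hold these three strings. A mapping keyed by index needs three entries.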
The printf builtin has a very special "c syntax for "character" arguments. It now supports Unicode, as bash does.
$ printf '%d\n' '"a'
97
$ printf '%d\n' $'"\u03bc' # this works in OSH and bash
956
In my opinion, array slicing in bash is "broken". The trailing : completely changes the meaning:
$ bash -c 'a=(1 2 3); echo ${a[@]:0}'
1 2 3
$ bash -c 'a=(1 2 3); echo ${a[@]:0:}' # why doesn't it print 1 2 3?
This behavior is also inconsistent:
$ bash -c 'a=(1 2 3); echo ${a[@]::}' # prints nothing
$ bash -c 'a=(1 2 3); echo ${a[@]:}' # error
bash: line 1: ${a[@]:}: bad substitution
In any case, I made OSH more compatible with bash, because it came up in both Nix and ble.sh.
But I also added shopt -s strict_parse_slice, so that you can require explicit code, rather than relying on these quirks.
Koichi explained bash like this:
This is a combination of two separate facts.
- When there is only one colon, it means that only "offset" is specified. When there are two colons, it means that "offset" and "length" are specified.
- The arithmetic expression can be an empty string, which means 0 (except in some special contexts such as for ((;;))).
${arr[@]:0} means offset '0' and length unset. ${a[@]:0:} means offset '0' and length ''. And ${a[@]::} means offset '0' and length ''.
I can accept that this is how the bash implementation happens to work! But I'm not sure it's documented.
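Both facts can be checked directly in bash. This is a minimal sketch with illustrative variable names. It uses ${a[*]} inside double quotes so that each slice becomes a single string:

```shell
#!/usr/bin/env bash
a=(1 2 3)

empty_arith=$(( ))            # an empty arithmetic expression evaluates to 0
one_colon="${a[*]:0}"         # offset 0, length unset: everything
two_colons="${a[*]:0:}"       # offset 0, length '' which evaluates to 0
explicit_zero="${a[*]:0:0}"   # same thing with the 0 spelled out

echo "empty_arith=$empty_arith"        # empty_arith=0
echo "one_colon=[$one_colon]"          # one_colon=[1 2 3]
echo "two_colons=[$two_colons]"        # two_colons=[]
echo "explicit_zero=[$explicit_zero]"  # explicit_zero=[]
```

The trailing colon silently asks for a slice of length zero.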
I'll take this opportunity to show that YSH has a simple and familiar design, stolen from Python:
$ var a = ['zero', 'one', 'two']
$ = a[1:3] # one two
$ = a[1:] # one two
$ = a[:2] # zero one
$ = a[:] # zero one two
The rules are:
- An omitted start index defaults to 0.
- An omitted end index defaults to len(a).
On every commit, we run thousands of tests, and dozens of benchmarks. We also test the setup for our custom tools on different Linux distros. Details:
Dockerfile.wedge-bootstrap-debian-{10,12}
Other changes:
- uftrace, R-libs
This was a huge release! And remember that there was another release last weekend, which will be:
These are huge features! YSH is making great progress.
To give you a sense of what's going on, here are some Zulip threads:
- oils.pub domain
More subprojects:
Thank you for all the great feedback! Please continue using Oils, testing it, and reporting issues.
Design and dev discussions happen on https://oilshell.zulipchat.com/, and you're welcome to join!
Some of these issues weren't mentioned above:
#2053 | trap INT doesn't run on Ctrl-C |
#2037 | segfault on MacOS - maybe related to case statement |
#2026 | `json read` unexpectedly parses `123\x00` |
#2003 | crash in parsing return |
#1992 | Add pp [x + 42] to print an expression and its value - like Rust dbg!() |
#1986 | intermittent crash running amd-test script -- reproducible in dbg, opt |
#1985 | Abort with += on missing dict key |
#1984 | Missing "Str=>lower()" |
#1853 | traps in osh -c don't run when the final command is not a shell builtin |
#1833 | try builtin only sets _error sometimes, which is hard to remember and document |
#1830 | _error value persists after successful try |
#1654 | ERR trap executed when errexit is ignored |
#1144 | Floating Point Support |
#484 | implement set -C / set -o noclobber |
These metrics help me keep track of the project. Let's compare this release with the previous one, version 0.22.0.
We're tracking the progress of the Oils Reference with these metrics:
I don't usually track this test suite, but the improvement due to the case ;& ;;& feature is visible:
(Now I realize that I didn't mention support for ;& and ;;&, which are an obscure syntax for control flow in shell. Sometimes these release notes are not complete!)
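For anyone who hasn't seen these two terminators, here is a small sketch, assuming bash 4 or later. The function classify and its patterns are made up for illustration:

```shell
#!/usr/bin/env bash
classify() {
  local out=''
  case $1 in
    a*) out+='[a*]' ;;&   # ;;& keeps testing the patterns that follow
    *z) out+='[*z]' ;;    # ;;  stops, as usual
    b)  out+='[b]' ;&     # ;&  falls through into the next body, untested
    c)  out+='[c]' ;;
  esac
  echo "$out"
}

classify az   # [a*][*z]
classify b    # [b][c]
classify a    # [a*]
```

So ;& behaves like a C switch without break, while ;;& lets one word match several patterns.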
There are 75 new tests passing on OSH:
It all works in fast C++, even though we write typed Python:
- vars-special, which seems to be an artifact of the test harness. Fixed in the next release.
We have more YSH features, and the corresponding test coverage:
Likewise, everything still works in C++:
Warning: I used a new machine mercer rather than lenny, so some comparisons to version 0.22.0 are not valid. Nevertheless, let's take a look, with this discrepancy in mind.
Cachegrind isn't stable across machines, so this isn't a real speedup:
These numbers are comparable; we use a bit less memory:
- parse.configure-coreutils 1.66 M objects comprising 45.8 MB, max RSS 51.4 MB
- parse.configure-coreutils 1.65 M objects comprising 41.1 MB, max RSS 46.6 MB
Again, cachegrind metrics aren't comparable:
- fib takes 33.0 million irefs, mut+alloc+free+gc
- fib takes 27.6 million irefs, mut+alloc+free+gc
Let's look at our "problem workload":
configure
configure
configure
Surprisingly, OSH is sometimes faster than bash on this workload!
configure.cpython
configure.util-linux
I've wanted to improve our measurement methodology for a while.
Oils is still a small program in terms of source code:
And generated C++:
And compiled binary size:
Remember that After 8 Years, Oils Is Still Small and Flexible!