This is the latest version of Oils, a Unix shell. It's our upgrade path from bash to a better language and runtime.
Oils version 0.22.0 - Source tarballs and documentation.
To build and run it, follow the instructions in INSTALL.txt, which have been updated with this release. The wiki has tips on How To Test OSH.
If you're new to the project, see Why Create a New Shell? and posts tagged #FAQ.
Reminder: As of the last release, Oils is a pure native binary! No more Python. It passes the same spec tests (2700+ cases), and it's 2x to 50x faster.
I still need to write a retrospective on this, which I may do after some performance work.
This post is long because it's been 3 months since the last release! What's changed?
I left these areas out of the title:
I describe all these changes, but I also inserted a few design interludes, so you can see the big picture too:
Let's start with the changes that are easy to see.
I re-organized and prettified the Oils Reference:
Many things are still undocumented, but we now have metrics to track this. (See the appendix.)
Justin Pombrio implemented a new pretty printer! It uses Wadler's algorithm, as described in our design doc.
Here's what it looks like with some realistic Github issue data:
Remember that the = keyword takes an expression on the right, similar to var x = myexpr (similar to Lua).
This example shows off the line wrapping algorithm:
It also works with OSH data structures:
A few things we should polish:
BashArray
osh -n), in favor of this nice new one.
= myexpr | less -r, because the | is parsed in expression mode.
pp (myexpr) | less -r, which is more consistent with the rest of YSH.
I should write a more detailed blog post: Unix Shell Now Has JSON and Pretty Printing
Before describing changes in detail, let's credit contributors.
compgen -e -k
oguz-ismail - Tilde expansion bug report - Issue #1862
lishaduck - typo fix in the docs
I want to repeat that failing spec tests are valuable contributions! Figuring out what bash and other shells do is often more than half the work.
I also improved our contributor setup a couple weeks ago. So I need to write blog posts about that.
: in places like read :myvar and mapfile :myvar
json write (x, space=0) instead of a --pretty=F flag
@() now decodes "J8 Lines"
ARGV is now a normal variable, not a special one
shvarGet() returns null for undefined variable
Some of these changes are explained in detail below. I like to highlight breaking changes early in the announcement.
Now let's go through the changes in each category. You can also view the full changelog.
oils-for-unix tarball
doc/ref metrics: https://www.oilshell.org/release/0.22.0/doc/metrics.txt
As mentioned, the Oils Reference has been overhauled and expanded
@(seq 5) behave?
_build/oils.sh and install are now POSIX shell, running with #!/bin/sh, not bash.
ln -v and install -v
As mentioned, I removed the colon "pseudo-sigil" in read :myvar and mapfile :myvar. It was intended to make variable names distinct, but we now have & for that (known as value.Place). Summary:
read myvar # OSH style
read --all # YSH style with implicit _reply
read --all (&myvar) # YSH style with explicit var
ulimit --all is an alias for ulimit -a.
echo hi > /dev/full
compgen -e (env vars) and -k (keywords), based on spec tests by Matthew Davidson
read -n behavior, reported by Yihang Liu
Unlike other shells, we look inside ${} for syntax errors. This is a consequence of our static parsing philosophy.
But this meant that we couldn't run "polyglot" scripts with zsh code, like git-completion.bash:
if [[ -n ${ZSH_VERSION-} ]]; then
# zsh-only syntax in this condition
unset ${(M)${(k)parameters[@]}:#__gitcomp_builtin_*} 2>/dev/null
...
So we now recognize the zsh-only syntax ${(x)myvar}. We parse it, but don't execute it at runtime.
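To make the "polyglot" idea concrete, here is a minimal sketch of such a guard, modeled on the git-completion.bash excerpt above. The which_branch function, the __demo_ prefix, and the echo messages are invented for illustration. The point is only that a bash-compatible shell has to read past the zsh-only branch without ever evaluating it.

```shell
#!/usr/bin/env bash
# Hypothetical polyglot snippet: one file meant to be sourced by bash or zsh.
# The function name and the __demo_ prefix are made up for this example.

which_branch() {
  if [[ -n ${ZSH_VERSION-} ]]; then
    # zsh-only parameter expansion flags. A bash-compatible shell must be
    # able to read past this line, but it never evaluates it.
    unset ${(M)${(k)parameters[@]}:#__demo_*} 2>/dev/null
    echo "zsh branch"
  else
    echo "bash-compatible branch"
  fi
}

which_branch
```

Under bash, the condition is false because ZSH_VERSION is unset, so only the else branch runs.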
Samuel did some excellent testing with Nix. It led to fixes that improved OSH for everyone, not just Nix.
Historically, Nix has been the hardest test of bash compatibility. It uses more bash features than any distro I've seen, e.g.: https://github.com/oilshell/oil/issues/26
${a[@]+foo} and ${a[@]:+foo} now match bash (commit)
shopt -s strict_array. Without this option:
${mystr[@]} (Nix code relies on this.)
case now respects ;& and ;;& terminators
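For readers who don't use these corners of bash, here is a small sketch of the behaviors the items above refer to, as they work in bash itself. The variable names and the classify function are made up for illustration; this shows the bash behavior being matched, not how OSH implements it.

```shell
#!/usr/bin/env bash
# Bash behaviors referenced above; all names here are illustrative.

# 1. ${a[@]+word}: an empty array counts as unset, so 'word' is not used.
empty=()
full=(x y)
echo "empty: [${empty[@]+set}]"   # empty: []
echo "full: [${full[@]+set}]"     # full: [set]

# 2. [@] applied to a plain string is accepted and yields the string.
mystr=hello
echo "string: ${mystr[@]}"        # string: hello

# 3. case terminators:
#    ;&   run the next clause's body without testing its pattern
#    ;;&  keep testing the remaining patterns
classify() {
  case $1 in
    a) echo first ;&
    b) echo second ;;&
    c) echo third ;;
    *) echo default ;;
  esac
}

classify a   # first, second, default (one per line)
classify c   # third
```

Note that classify a never prints third: after ;;& the c pattern is tested and does not match, while the * pattern does.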
I also overhauled bash-style regex parsing, i.e. lex_mode_e.BashRegex. For example, we now implement the very special rule of allowing spaces inside () inside a regex pattern, like
[[ x =~ a(b c)d ]]
Here's a comment related to this design flaw in bash: https://news.ycombinator.com/item?id=38414011
It is sometimes difficult to specify a regular expression properly without using quotes, or to keep track of the quoting used by regular expressions while paying attention to shell quoting and the shell’s quote removal. Storing the regular expression in a shell variable is often a useful way to avoid problems with quoting characters that are special to the shell. For example, the following is equivalent to the pattern used above:
I recommend doing what the manual says:
pat='a(b c)d'
[[ x =~ $pat ]]
But we also "conceded to reality", for Nix. That is, you no longer have to do this refactoring, because you may not control the code in the first place.
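As a minimal sketch of the manual's advice in bash (the input string and the variable names are made up for illustration), storing the pattern in a variable lets the space inside the group survive shell parsing, and BASH_REMATCH exposes what was captured:

```shell
#!/usr/bin/env bash
# Sketch of the "store the regex in a variable" advice; names are illustrative.

pat='a(b c)d'        # the space is part of the regex, protected by single quotes
input='xab cdy'

# An unquoted $pat on the right of =~ is treated as a regex.
if [[ $input =~ $pat ]]; then
  whole=${BASH_REMATCH[0]}    # text matched by the whole pattern
  group=${BASH_REMATCH[1]}    # text matched by the first ( ) group
  echo "match='$whole' group='$group'"   # match='ab cd' group='b c'
fi

# Quoting the variable makes bash match it as a literal string instead,
# so this test fails: the input does not contain the text "a(b c)d".
if [[ $input =~ "$pat" ]]; then
  echo "unexpected literal match"
else
  echo "no literal match"
fi
```

The quoted form is a common source of confusion, which is one more reason to keep the pattern in a variable and leave it unquoted.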
So we are delivering on our goal: OSH is the most bash-compatible shell, by a mile.
BASH_REMATCH (commit)
Join us on the #nix channel on Zulip to help with Nix compatibility!
Samuel started this repo:
Very briefly:
2021. Nix user Raphael Megzari tested out Oils and liked it. He inspired some documentation, as well as a Nix RFC to use Oils (then called "Oil").
To be honest, it was a bit premature, because only the slow Python implementation was usable. And we needed more help and testing from Nix users.
But Raphael also told me about NLnet and the grants they offer.
2022. I applied for a grant, and we got the first one in April!
2024. Oils is now pure native code, as mentioned at the top of this post. You can also see metrics in the appendix.
This deserves a full retrospective, including crediting contributors. But I hope this summary is useful for now.
Now let's look at what changed in YSH. This is the new shell with Python-like data types.
getVar() function, based on feedback from Julian Brown
evalExpr() should accept a string, but getVar() should suffice in many cases.
shvarGet() (dynamic scope) also returns null for an undefined variable. Prior to this change, you couldn't handle the error.
read --num-bytes (&x) (motivated by code from Yihang Liu)
--long-flag style is how you know you're using a nice new API, as opposed to read -n -N, which have quirks
setglobal scope issue, a big oversight reported by bar-g (commit)
Block args and typed args are no longer confused. We now have a third argument group, after a semicolon:
cd /tmp (; ; myblock) # myblock is of type value.Command
This is equivalent to using a block literal, which is what you'll see 99% of the time:
cd /tmp {
echo hi
}
In contrast, we will still have eval (block), not eval (; ; block). This is a subtle distinction: eval takes a positional value.Command arg, not a block arg.
I tightened up the parsing of command.Simple, and allowed redirects after a block arg (issue #1850):
json write (x) >out.txt
cd /tmp { echo hi } > out.txt
ARGV is a normal value.List var, not an alias for "$@", which is the "argv stack" (commit).
So now we have two different "worlds":
"$@"
@ARGV
This distinction fixes a bug, simplifies the YSH language, and opens up more optimization for a pure YSH runtime.
ysh:upgrade, but not ysh:all (based on feedback from Samuel)
ysh:upgrade
[[ in YSH (Samuel also ran into this)
Last June, I published a "design roadmap" for YSH, which included the concept of interior vs. exterior:
This principle continues to play a big role in our design decisions. I want to write a post based on this thread:
For example, procs that take typed arguments can now be declared with a typed keyword:
typed proc p (; x, y) { # new 'typed' keyword
echo "sum is $[x + y]"
}
This is so we have a clean distinction: plain procs are exterior, but typed procs are interior. This keyword is now optional, but will become required.
A related issue is that we don't do any auto-serialization, like Python's multiprocessing module does with pickle. Serialization in YSH is short, but not invisible.
Now let's review changes to data languages. Recall that J8 Notation is a compatible upgrade of JSON, and is built on UTF-8.
Prior to this release, we used the "Bjoern DFA" to decode UTF-8.
But there was a problem: it has a binary yes/no error model, which isn't sufficient for JSON. Valid JSON can represent invalid UTF-8, i.e. surrogate halves:
So Aidan wrote a brand new decoder, with precise error handling. It's very clean, and better than what I had in mind, which was more of an "inverted" state machine!
So we can now round-trip JSON. And we can also show precise decoding errors to users, though we haven't hooked that up yet.
Aidan also wrote a decoder in JavaScript, which you can try here!
aolsen.ca)
I agree that UTF-8 is not well explained. Here's a checklist of UTF-8 decoding errors I keep in mind, which helped me fix a few bugs below:
042 or 0042 rather than 42
@(spliced command sub)
We have a new format "J8 Lines":
99% of the time, it behaves like lines of text:
/etc/os-release
/etc/passwd
http://www.example.com/
But you can also use quoted J8 strings:
"multiple \n lines \n"
b'binary data \y00\y01\yff'
It's now hooked up to the @(spliced command sub) construct, which is like the "array" version of $(command sub):
ls @(cat j8-lines.txt) # list all of the directories
for x in @(cat other.txt) { # iterate over decoded lines
echo $x
}
Invariant: any argv array can be represented with J8 Lines. This is not true with text split by $IFS. That style leads to data-dependent bugs.
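Here is a small bash sketch of the kind of data-dependent bug that $IFS splitting causes. The file names and the count_args helper are invented for illustration. It uses only plain bash, not J8 Lines, so it shows the problem rather than the Oils solution.

```shell
#!/usr/bin/env bash
# Data-dependent bug from $IFS splitting; names and data are illustrative.

count_args() { echo "$#"; }

# Two file names, one of which happens to contain a space.
list=$(printf '%s\n' 'my file.txt' 'other.txt')

# Unquoted expansion splits on spaces, tabs, and newlines (the default $IFS),
# so the result depends on what characters the data contains.
count_args $list                 # 3

# Reading one item per line keeps each name intact.
mapfile -t names <<< "$list"
count_args "${names[@]}"         # 2
```

Line-based reading still breaks on a name that contains a newline, which is the case the quoted J8 strings described above are meant to cover.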
Double quoted strings unfortunately have two different meanings in Oils:
"hi $x" respects $ substitution, just like POSIX shell.
$ in "Price is $3.99" isn't special.
To distinguish these cases, we now allow optional sigils before the left quote.
In YSH, you can add a leading $:
var x = $"hi $x" # identical to "hi $x"
In JSON8, you can add a leading j:
j"$3.99" # identical to "$3.99"
You won't use these sigils in the vast majority of cases. But I want to write a blog post to emphasize that our syntax is simpler and more powerful than bash + JSON.
And using explicit sigils shows off the simplicity. We have just four styles:
r'raw without \ escapes'
b'j8 style bytes' u'unicode'
$"shell double quotes"
j"JSON double quotes"
For each of the code strings, there's a multi-line version with triple quotes:
That's it!
These sigils were motivated by our pretty-printing work. We were thinking about printing strings in an unambiguous way, regardless of the surrounding context. Without context, it may not be obvious if you're looking at OSH or YSH or JSON.
As mentioned in the list of breaking changes, the way to control indentation is now:
json write (x) # default is 2 spaces
json write (x, space=0) # no indentation
json write (x, space=4) # 4 spaces
See chap-builtin-cmd.html#json.
We now consistently check for code points greater than the max, and in the surrogate range. These checks happen in:
u'unicode' b'bytes'
u'unicode' b'bytes', which are identical by design!
But not in OSH, basically because bash and other shells don't. For example:
$''
echo and printf
I want to write a blog post about this analogy in Oils:
Shell : YSH :: JSON : J8 Notation
The surrogate pair work shows this. We faithfully implement the warts in JSON, but we upgrade it to something where you can avoid warts.
I think we're done implementing JSON in Oils. And I noticed this "trichotomy" while writing this post:
So this is interesting: JSON is implemented differently in Python, JavaScript, and Oils, precisely because of the interior representation of strings! (Encoding takes you from interior to exterior, and decoding from exterior to interior.)
This is also an interesting exception to our Language Design Principles. In terms of strings:
I reduced the number of HereDocWriter processes, a performance bug I mentioned in the last release:
OSH now starts 5% - 10% fewer processes than bash or dash on the Python configure workload!
But surprisingly, that doesn't make us faster overall.
Both Melvin and I got kinda worked up about this, and landed many more optimizations, which I describe below.
We made great progress, but it appears we need to back up a bit to really improve performance. For example, Melvin is working on adding a control flow graph representation to mycpp to make it smarter.
We're also improving benchmark workloads and measurements. Surprisingly, OSH is slower relative to bash on real hardware, compared to the virtual machines that our CI runs on.
This work will take a while, but I have no doubt that Oils will get faster over time. It's very workload-dependent, but roughly speaking, I'd say we're at 50% to 120% the speed of bash, despite being written in typed Python! And it feels like 80% - 200% is feasible, though I don't know how long that will take.
Melvin did a ton of deep debugging and analysis, which led to several fixes:
malloc()s in our execve() binding
virtual heuristic in mycpp, which was confused by name conflicts
Some of the optimizations I landed:
switch on strings - with str_switch(s)
Str? val field of a Token.
span_id concept. We use normal pointers, and let the GC do its work.
length into 16 bits, so a Token is 32 bytes. I would like them to be 24 bytes, but that actually caused more GC and allocation pressure due to the common word_part.Literal.
I think we've now settled on the code representation. I did this refactoring not just for performance, but also because we want to write a pretty printer for YSH (and maybe OSH). I think this style is simple and general, and I'd like to write an update on it:
These issues are a subset of the work above. Again, you can view the full changelog.
#1974 | command -v "$emptyvar" returns zero |
#1968 | "Float" in J8 should probably be "Decimal" |
#1943 | OpenBSD `ln` and `install` do not have `-v` flag |
#1937 | Bug: read -n strips leading and trailing whitespace |
#1924 | cd { pwd } should be an error - dir name required when block is passed |
#1906 | [[ foo =~ pat ]] parsing doesn't match bash and zsh |
#1902 | [BUG] Json read won't work with negative numbers |
#1900 | _build/oils.sh requires bash, but should only require /bin/sh (build/common.sh ) |
#1898 | Assoc error key should be strings error is confusing with `unset` |
#1895 | eggex 'a'{N *} crashes, needs a proper error |
#1884 | "${array[@]+foo}" should behave like bash (for Nix) |
#1864 | ysh exits after `ctx push (&a) { true }` |
#1862 | osh doesn't expand tilde in assignment |
#1850 | Parsing bug with comma after typed arg |
#1849 | Typed args and block arg can get confused |
#1841 | [YSH] setglobal d.key mutates local instead of global |
#1130 | Reorganize into new doc/ref scheme |
#1103 | echo and printf don't check write() failure |
#280 | Implement `ulimit` builtin |
To summarize:
ulimit), based on your feedback
This announcement was long, but it didn't cover all parts of the project! These threads have color on other things I've been working on:
But I really want to get back to YSH. In particular:
_status with _error (breaking change)
args.ysh, and Justin's work on testing.ysh
ENV overhaul and the extern builtin
Our "north star" is still a minimal YSH that's pretty stable. YSH has many features, but it's paradoxically small (metrics below).
Let me know what you think in the comments!
These metrics help me keep track of the project. Let's compare this release with the previous one, version 0.21.0.
We'll track this new metric from now on:
I don't usually track this suite, but the case ;& ;;& change is visible:
Big progress on OSH, e.g. for Nix compatibility:
Everything works in fast C++, even though we write typed Python:
(The negative delta is due to NUL bytes and integer semantics.)
Good progress on YSH:
Likewise, everything still works in C++:
The parser is faster, probably due to the Token representation:
Warning: this may regress in the next release. We're measuring both the parser/mutator and the GC, and using less memory by freeing objects has made things slower! We could also change the definition of this benchmark, or make a new one.
Big reduction in memory usage, due to the parser refactoring:
parse.configure-coreutils 1.87 M objects comprising 65.7 MB, max RSS 70.5 MB
parse.configure-coreutils 1.66 M objects comprising 45.8 MB, max RSS 51.4 MB
Slight increase in time taken for Fibonacci:
fib takes 32.1 million irefs, mut+alloc+free+gc
fib takes 33.0 million irefs, mut+alloc+free+gc
We did better on our "problem" workload, measured on real hardware. As mentioned, we'll improve the way we measure performance.
configure
configure
configure
To summarize OSH running time vs. bash:
configure.cpython
configure.util-linux (new workload)
We really want to close this gap!
Oils is still a small program in terms of source code:
And generated C++:
And compiled binary size:
GC rooting still takes up a lot of code size. I also want "mycpp modules" to speed up the build.
I haven't been blogging as much, so I think Oils is now "underexplained"! I mentioned these shorter posts above:
Unix Shell Now Has JSON and Pretty Printing
YSH has 4 Kinds of String Literal, while Bash and JSON Have 8
${x %.2f} to deprecate printf
shopt -s utf8_source affects string literals too
Implications of the Exterior-First Philosophy
#oils-dev
The Lossless Syntax Tree After 7 years
How to Parse Shell Like a Programming Language - 2024 edition
If you like algebraic data types, you should like regular languages
If you got this far, check out yesterday's post! Comments about Scripting, CGI, and FastCGI