
Oils 0.19.0 - Dicts, Procs, Funcs, and Places

2024-01-16

This is the latest version of Oils, a Unix shell. It's our upgrade path from bash to a better language and runtime:

Oils version 0.19.0 - Source tarballs and documentation.

We're moving toward the fast C++ implementation, so there are two tarballs:

If you're new to the project, see the Oils 2023 FAQ and posts tagged #FAQ.

Table of Contents
Intro
Contributions
Ideas for Contributions and Feedback
OSH
YSH
Procs and Funcs
Background
&myvar is a value.Place
Rich proc call sites
Fat arrow =>
Future Work
New Prompt API - func and value.IO
More YSH Improvements and Breakages
Builtins
Initializing and Setting Variables
Expressions
Misc Fixes
Docs - New and Updated
Designs That Took More Than One Try
Performance / C++ / Under the Hood
Summary
Appendix: Closed Issues
Appendix: Metrics for the 0.19.0 Release
Spec Tests
Benchmarks
Code Size

Intro

This announcement should have happened weeks ago!

Version 0.19.0 was released on November 30th. And it contains almost 3 months of work — everything since version 0.18.0 in September.

What's happened lately? These blog posts may answer some of your questions:

In short, we've been deep in the nuts and bolts of YSH. It's been enhanced in fundamental ways, and this includes breaking changes, based on experience with the language.

This announcement is long, with many details and code samples. I hope it will make YSH less mysterious!


We also got a third grant from NLnet, and are looking for contributors. If the details in this post interest you, then you might be a good person to work on Oils.

Contributions

These contributions might give you a feel for the work we're doing, and where you can jump in. The codebase is more stable — taking its "final" form — though I still want to make the dev setup more portable.

Aidan Olsen:

Melvin Walls:

Ellen Potter:

The steady trickle of feedback continues to be useful:

More acknowledgments:


You can also view the full changelog for Oils 0.19.0.

Ideas for Contributions and Feedback

Last year, a few people wanted to help implement the "standard library" for YSH.

Unfortunately, the code wasn't yet ready for that! We had to get rid of the "metacircular hack", and figure out a good style to implement builtin functions.

Now that this is done, please take a look at this list of Python-like Str, Int, Float, List, Dict methods, as well as free functions:

We need help with the ones with the red X! As usual, the first step is to write spec tests.

Tests alone are a big contribution, because they force design decisions. Usually we look at what Python and JavaScript do, e.g. with [].index() and Array.indexOf().
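As one illustration of the design decisions that spec tests force, here is a small Python sketch of the two conventions for a missing item. Python's list.index() raises ValueError, while JavaScript's Array.indexOf() returns -1. The helper name index_of is made up for this example and is not a YSH API; which convention YSH should follow is the kind of question a spec test would pin down.

```python
def index_of(items, target):
    """Mimic the JavaScript convention: return -1 when the item is missing."""
    try:
        return items.index(target)  # Python convention: raises ValueError
    except ValueError:
        return -1

names = ['a', 'b', 'c']
found = index_of(names, 'b')
missing = index_of(names, 'z')

# The raw Python method signals a missing item with an exception instead
try:
    names.index('z')
    raised = False
except ValueError:
    raised = True
```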


We can also use feedback on all the changes below. In particular, the thin arrow -> vs. fat arrow => distinction is pretty unfamiliar, but I think it's justified. At least one person on Zulip likes it, but we can use more feedback.

OSH

We're mostly working on YSH, but OSH still gets attention. Repeating some of the above, we implemented these bash features:

Please test OSH on your shell scripts, and let us know what's missing or broken.

YSH

Now let's discuss the core of this release: big and breaking changes to YSH. If you want to refresh your memory about the language, these docs may help:

I've updated them for this release.

Procs and Funcs

The biggest feature is an overhaul of procs and funcs. We have a new doc, mentioned in the Winter Status Update:

Guide to Procs and Funcs

There's a big table of comparisons:

And there's some practical advice: start with neither procs nor funcs. Then refactor to procs. Add funcs later if you need them.

Background

Why the big update to procs and funcs? Here's some background.

Until this year, YSH was called Oil, and it had a weak form of proc. The idea was to make a modest language that fixes the "warts" in shell. But


In the summer and fall, Aidan and Melvin implemented func, and tested it by writing new functions in the standard library.

With this release, procs and funcs have become more powerful, and more consistent with each other, along all these dimensions:

  1. Evaluation of actual args at the call site
  2. Evaluation of default args at the definition
  3. Binding args to params - for builtin procs and funcs
  4. Binding args to params - for user-defined procs and funcs
  5. Up to 4 kinds of args and params
    1. Words that evaluate to strings
    2. Positional-typed
    3. Named-typed
    4. A value.Command block

So the language is now very rich! Procs and funcs match our GC data structures and data languages.

The design is largely motivated by the 16 use cases in Sketches of YSH Features (from June).

&myvar is a value.Place

A nice result of procs having typed params is that I got rid of 2 ugly special-case features.

Shell scripts can use dynamic scope to "return" values by mutating the locals of their caller. Bash goes further with declare -n "nameref" variables. The more minimal "Oil" tried to clean this up with:

These are now gone in favor of value.Place, which is just another typed value. To create one, use an expression like &myline:

var myline           # optional declaration
my-read (&myline)    # call proc, passing it a Place
echo result=$myline  # => result=foo

The &myline should look familiar to C programmers, and possibly Rust programmers. To set a place, you use the setValue() method on the place:

proc my-read (; out_place) {
  call out_place->setValue('foo')
}

There could be a keyword like setplace, but I decided to keep the language simple for now.

You'll see more of value.Place in the section on read and json read. A motivating feature was to allow YSH users to write something like Bourne shell's read myvar.


In summary, value.Place generalizes these shell mechanisms:

  1. Builtins like read and mapfile, which set "magic" variables.
  2. Dynamic scope
  3. declare -n aka nameref variables.
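To make the idea concrete, here is a conceptual model in Python. It is not how Oils implements value.Place: the Place class, its set_value() method, and the use of a plain dict as the caller's frame are all assumptions made for illustration. The point is only that a place is an ordinary value that names a variable in some frame, so a callee can fill it in without relying on dynamic scope.

```python
class Place:
    """Hypothetical model: a reference to a named variable in a frame."""

    def __init__(self, frame, name):
        self.frame = frame  # the caller's local variables (a dict here)
        self.name = name    # which variable to set

    def set_value(self, val):
        self.frame[self.name] = val


def my_read(out_place):
    # Analogous to: call out_place->setValue('foo')
    out_place.set_value('foo')


caller_locals = {'myline': None}          # analogous to: var myline
my_read(Place(caller_locals, 'myline'))   # analogous to: my-read (&myline)
result = caller_locals['myline']
```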

Rich proc call sites

The doc on procs and funcs shows that "simple commands" are now very rich. All of these are YSH commands:

cd /tmp

cd /tmp {
  echo $PWD
}

cd /tmp (myblock)

other-command ([42, 43], named=true)

other-command ([42, 43], named=false) {
  echo 'block arg'
}

This section describes related changes.

Breaking: _ is now call

YSH has both commands and expressions, and _ was the expression evaluation "command":

var mylist = []
_ mylist->append('foo')  # method call, which is an expression

my-command append        # compare: shell-like command

I've changed it to a keyword call, which I think is more readable:

call mylist->append('foo')

(A discarded alternative was two colons, like :: mylist->append('foo') )

Procs have Lazy Arg Lists

We now have square brackets (shopt --set parse_bracket) to pass unevaluated expressions to procs:

ls8 /tmp | where [size > 10]  # if 'where' were a proc

The above is equivalent to passing a value.Expr quotation:

var cond = ^[size > 10]
ls8 /tmp | where (cond)  # one typed arg

This builds on top of Aidan's work implementing value.Expr, mentioned above:

var size = 42
var cond = ^[size > 10]
var result = evalExpr(cond)  # => true

Lazy arg lists aren't used much now, but I expect them to be common. In addition to filters on streams, they should allow assert [42 === x] to provide good error messages.

This subtle parsing took a couple tries, but I'm happy with the result!
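For readers who think in Python, the quoted-expression idea can be roughly modeled with compile() and eval(). This is only an analogy under stated assumptions: compile() stands in for ^[size > 10], eval() stands in for evalExpr(), and the where() function and the row dicts are hypothetical names, not part of YSH.

```python
# Parse the expression now, but don't evaluate it yet (like ^[size > 10])
cond = compile('size > 10', '<cond>', 'eval')

# Evaluate later, once 'size' has a value (like evalExpr(cond))
result = eval(cond, {}, {'size': 42})


def where(rows, expr):
    """Hypothetical filter: evaluate the quoted expression once per row."""
    return [row for row in rows if eval(expr, {}, row)]


rows = [{'name': 'a', 'size': 5}, {'name': 'b', 'size': 42}]
big = where(rows, cond)
```

Because the callee receives the expression itself rather than its value, it can evaluate it against each row, which is what a stream filter needs.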

Unified block arg parsing

YSH commands that take a block literal can also take a value.Command object. These are now two syntaxes for the same thing:

cd /tmp {
  echo hi
}

var b = ^(echo hi)
cd /tmp (b)

So we have:

  1. value.Command quotations ^(echo hi) - looks like shell's $(echo hi)
  2. value.Expr quotations ^[size > 10] - looks like YSH $[size > 10]

The ^ forms won't be common in real YSH code, but they're useful for testing and metaprogramming. Usually, you'll pass literal expressions and blocks.
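A rough Python analogy for "two syntaxes for the same thing": a block is a piece of code passed to a command, whether it's written inline or stored in a variable first. The cd() function below is a hypothetical stand-in, and it assumes that the block runs in the new directory and that the previous directory is restored afterward.

```python
import os
import tempfile


def cd(path, block):
    """Run block() inside path, then restore the previous directory."""
    saved = os.getcwd()
    os.chdir(path)
    try:
        return block()
    finally:
        os.chdir(saved)


tmp = tempfile.gettempdir()
start = os.getcwd()

# Literal style: the block is written inline at the call site
inline_result = cd(tmp, lambda: os.getcwd())

# Value style: the block is stored in a variable first, like ^(echo hi)
b = lambda: os.getcwd()
stored_result = cd(tmp, b)
```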

Fat arrow =>

Pure vs. Mutating Methods

In the summer, we settled on the thin arrow -> for method calls:

var last = mylist->pop()  # use the return value
call mylist->pop()        # throw away the return value

We now also accept => for methods, and I want to use it to distinguish pure methods that "transform" and methods that mutate.

This gotcha has always bugged me in Python:

mylist.sort()          # sort in place
mystr.strip()          # BAD: it throws away the result!
                       # Strings are immutable.

mystr = mystr.strip()  # probably what you meant

In other words, the same syntax is used for wildly different semantics. When I explain it to new programmers, I cringe a bit.

So I propose that in YSH, we have:

call mylist->sort()           # sort in place
var mystr = mystr => strip()  # transform

Right now -> and => are interchangeable, but I think we should enforce the distinction (and Samuel agreed). Feedback is welcome.

Free Function Chaining

Another thing that fell out pretty easily is using => to chain free functions.

Here's an excerpt from the commit that implemented this:

The expression obj => f attempts to create a value.BoundFunc, which you then call with obj => f().

So this behavior makes free functions chain like methods. An example from spec/ysh-methods.test.sh shows the benefit. If dictfunc() returns a dict with keys K1 and K2, then you could have written this code:

$ echo $[list(dictfunc()) => join('/') => upper()]
K1/K2

The new way is nicer and more consistent:

$ echo $[dictfunc() => list() => join('/') => upper()]
K1/K2

Because => can be used for both methods and free functions, it's like "uniform function call" syntax, which I've wanted for many years.
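Here's a minimal Python sketch of how such a lookup could work. It is a model, not the Oils implementation: the fat_arrow() helper, the free_funcs table, and the "try a method first, then bind a free function" order are assumptions for illustration, with functools.partial playing the role of value.BoundFunc.

```python
from functools import partial


def fat_arrow(obj, name, free_funcs):
    """Model `obj => name`: use a method if obj has one, else bind a free function."""
    method = getattr(obj, name, None)
    if method is not None:
        return method
    return partial(free_funcs[name], obj)  # obj becomes the first argument


def join(items, sep):
    return sep.join(items)


free_funcs = {'list': list, 'join': join}


def dictfunc():
    return {'k1': 1, 'k2': 2}


# Analogous to: dictfunc() => list() => join('/') => upper()
keys = fat_arrow(dictfunc(), 'list', free_funcs)()   # free function
joined = fat_arrow(keys, 'join', free_funcs)('/')    # free function
result = fat_arrow(joined, 'upper', free_funcs)()    # str method
```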

Return Type Annotation

We also parse => in function return types, but these values aren't used yet:

func f(x Int) => List[Int] {
  return ([x, x + 1])  # parens required around expressions
}

Future Work

(1) We should probably enforce that funcs are really pure

(2) Clean up implementation of "closed" vs "open" procs

proc p () {  # closed, no params to bind
  echo
}
proc p {  # open, args are automatically bound
  echo
}

The difference can now be expressed with a rest param ...ARGV.

(3) Unify the runtime representation of value.LiteralBlock and value.Command.

(4) ARGV should be a regular variable, rather than using shell's separate "$@" stack.

New Prompt API - func and value.IO

The interactive shell and the YSH language are converging!

Now that we have functions, we can express a nicer prompt API than bash's $PS1, which has very "exciting" quoting rules:

$ PS1='\w\$ '          # custom PS1 language
$ PS1='$(echo \w)\$ '  # same thing, note single quotes
                       # and delayed $() evaluation

In contrast, YSH now uses a func that takes a value.IO instance. You can build up a plain old string, using methods like io->promptVal():

func renderPrompt(io) {
  var parts = []
  call parts->append(io->promptVal('w'))  # pass 'w' for \w
  call parts->append(io->promptVal('$'))  # pass '$' for \$
  call parts->append(' ')
  return (join(parts))
}

This is "normal code", and it should be better for complex prompts. But YSH still respects $PS1, so you can copy and paste from existing sources, or use that style if you prefer.

Help Topics:

More YSH Improvements and Breakages

Builtins

Several YSH builtins have been changed to use the new style of typed args to procs. These are all breaking changes.

read takes value.Place, with default var _reply

The read builtin has been simplified by optionally accepting a value.Place. There are now 2 ways to invoke it:

echo hi | read --line       # fill in _reply by default
echo reply=$_reply          # => reply=hi

echo hi | read --line (&x)  # fill in this Place, var x
echo x=$x                   # => x=hi

Likewise with the --all flag, which reads all of stdin:

echo hi | read --all
echo hi | read --all (&x)

(The --long-flag style lets you know that you're using YSH features.)

json read is consistent with read

The json builtin now follows the same convention:

echo {} | json read         # fill in _reply
echo {} | json read (&x)    # fill in this Place, var x

append builtin

The append builtin no longer takes an arg like :mylist. Instead, it simply takes a typed arg:

append README.md *.py (mylist)   # append strings to mylist

This is equivalent to calling methods on the value.List:

call mylist->append('README.md')
call mylist->append(glob('*.py'))

# Make it a nested list -- not possible with the command-style
call mylist->append(['typed', 'arg', 42])

error builtin

The syntax has been tweaked to reflect the new separation between word args and typed args. Old style:

error ("Couldn't find $filename", status=99)

The new style has a word arg, and an optional named arg:

error "Couldn't find $filename"
error "Couldn't find $filename" (status=99)

Method Name Changes

We're still tweaking the API names for consistency. There's a new YSH Style Guide as well.

I think this set of APIs:

trim()
trimLeft()       trimRight()
trimPrefix()     trimSuffix()

could be nicer than Python's:

strip()
lstrip()         rstrip()
removeprefix()   removesuffix()

Initializing and Setting Variables

var destructuring

You can now initialize multiple variables at once:

var flag, i = parseArgs(spec, ARGV)

I had disabled that feature because I thought it would be confusing, since the same syntax means something different in JavaScript:

var x, y = 1, 2    # YSH
var x = 1, y = 2;  # JavaScript

But I think we can simply avoid that usage, writing this instead:

var x = 1
var y = 2

Implicit null initialization

Sometimes you want to initialize a variable after declaring it with var. Rather than

var x = null
echo hi | read --line (&x)

You can now leave off the right-hand side:

var x  # implicit null
echo hi | read --line (&x)

const must be at the top level

The YSH const keyword inherited its behavior from POSIX shell's readonly. This is a dynamic check, which works poorly in loops:

$ for x in 1 2; do readonly y=x; done
-bash: y: readonly variable

I decided that dynamic const is "weak sauce", and if anything, we should have a static const.


For now, we're de-emphasizing const, so it's illegal inside proc and func. You can only use var.

const can still be at the top level, since the dynamic check is still useful there: it can prevent source from clobbering variables. (We'll probably introduce namespaces / modules in the future, so that source doesn't have this pitfall.)

Thanks to Aidan for feedback on this.

The rest of augmented assignment

Previously we only had:

setvar x += 3

Now we have all of:

setvar x /= 2
setvar a[i] *= 3
setvar d.key -= 4

The augmented assignment operators are listed in the YSH Table of Contents under Assign Ops. (And now I notice a broken link to fix.)

Optional colon for type annotations

This is now valid syntax:

var x: Int = f()  # colon looks better

But again we don't do anything with the Int annotation. We may omit colons in signatures, because they clash with the Julia-like semicolons:

proc p (word; x Int, y Int; z Int) {  # no colons, a bit like Go
  echo hi
}

Compared with having both:

proc p (word; x: Int, y: Int; z: Int) {  # : and ; noisy?
  echo hi
}

Expressions

Eggex capture syntax is more explicit

This change came from using Egg expressions myself. It adds Python-like keywords, which I think makes capturing more readable.

Old syntax:

var pat = / <d+> /                   # positional capture

var pat = / <d+ : month> /           # named capture

New Syntax:

var pat = / <capture d+> /           # positional capture

var pat = / <capture d+ as month> /  # named capture

I also reserved syntax for type conversion functions, which are fully implemented in version 0.20.0 (the next release):

var pat = / <capture d+ as month: Int> /

This makes Eggex a bit like C's scanf()!

Range syntax is now 0 .. n, not 0:n

Originally I thought slices a[0:n] were like ranges 0:n, but they're different.

Floats can't end with .

42. is no longer a valid float; an explicit 42.0 is required. This prevents ambiguity with ranges like 1..5.

Misc Fixes

Docs - New and Updated

I pointed out several docs in Oils Winter Status Update > Please Review These Docs.

Designs That Took More Than One Try

While writing these notes, I noticed that we need iteration to get some features right.

This is a major reason YSH hasn't been fully documented: we need to try it first!

Here's a little retrospective:

What other design issues are there?

Related Zulip threads:

Performance / C++ / Under the Hood

Here are some details on the contributions in the first section.

Real Hash Table

As mentioned, Melvin implemented a real hash table, inspired by CPython's "Hettinger dict". Compared with the earlier Python dict, it's more compact in memory and preserves insertion order.
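For readers unfamiliar with the compact dict design, here is a toy Python sketch of the general idea: a sparse index table points into a dense array of entries, and the dense array is what preserves insertion order. This is not the mycpp code. The class name CompactDict, the linear probing, the 2/3 load factor, and the lack of deletion are all simplifying assumptions (CPython uses a different probing scheme).

```python
class CompactDict:
    """Toy insertion-ordered hash table: sparse index + dense entries."""

    EMPTY = -1

    def __init__(self, capacity=8):
        self.index = [self.EMPTY] * capacity  # slot -> position in entries
        self.entries = []                     # (hash, key, value), in order

    def _find_slot(self, key):
        h = hash(key)
        slot = h % len(self.index)
        # Linear probing; terminates because the table is never full
        while True:
            pos = self.index[slot]
            if pos == self.EMPTY or self.entries[pos][1] == key:
                return slot, h
            slot = (slot + 1) % len(self.index)

    def _resize(self):
        # Only the small index table is rebuilt; entries stay in place
        self.index = [self.EMPTY] * (2 * len(self.index))
        for pos, (h, _key, _val) in enumerate(self.entries):
            slot = h % len(self.index)
            while self.index[slot] != self.EMPTY:
                slot = (slot + 1) % len(self.index)
            self.index[slot] = pos

    def __setitem__(self, key, value):
        slot, h = self._find_slot(key)
        pos = self.index[slot]
        if pos != self.EMPTY:
            self.entries[pos] = (h, key, value)  # overwrite keeps position
            return
        if (len(self.entries) + 1) * 3 > len(self.index) * 2:
            self._resize()
            slot, h = self._find_slot(key)
        self.index[slot] = len(self.entries)
        self.entries.append((h, key, value))

    def __getitem__(self, key):
        slot, _h = self._find_slot(key)
        pos = self.index[slot]
        if pos == self.EMPTY:
            raise KeyError(key)
        return self.entries[pos][2]

    def keys(self):
        return [key for _h, key, _val in self.entries]


d = CompactDict()
d['z'] = 99
d['y'] = 42
d['x'] = [3, 2, 1]
d['y'] = 43            # updating a key does not move it
order = d.keys()
```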

A primary motivation for YSH was to be able to round-trip JSON messages without shuffling the keys:

{"z": 99, "y": 42, "x": [3, 2, 1]}
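For comparison, this is what the round trip looks like in Python 3.7 or later, where the built-in dict also preserves insertion order. It's shown only to illustrate the property we want, using the standard json module; it says nothing about how YSH's json builtin is implemented.

```python
import json

text = '{"z": 99, "y": 42, "x": [3, 2, 1]}'

obj = json.loads(text)            # decode into an insertion-ordered dict
key_order = list(obj)             # keys come back in the order they appeared
round_tripped = json.dumps(obj)   # encode again, keys not shuffled
```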

Some references we used:

As a result of these optimizations, we're now beating bash on a couple cases of benchmarks/compute! I think this is pretty impressive, because our source language is typed and garbage-collected Python, while bash is written in C.

So I have more confidence we can be as fast as bash. It's not clear how much effort it will take, but it should be fun nonetheless :-)

GC Rooting details

More Progress

Code Cleanup - Removed Tea Experiment

A few years ago, I mentioned a "Tea" experiment for bootstrapping. I implemented a parser for Tea, reusing some of the "Oil" parser.

But this made the code more complex, and the parser now seems like the wrong place to start.

So I've deleted it, and started a #yaks experiment in a separate repo. Yaks is more about reusing the mycpp runtime in a "bottom-up" fashion, with an IR, rather than starting from a parser.

In any case, we no longer have this distraction in the code.

Summary

This was a huge release, with changes from September, October, and November!

I showed many code samples, and tried to justify each change. YSH is rapidly improving, but it's not done yet.

What's next? Oils 0.20.0 is well underway, with

Let me know what you think in the comments!

Appendix: Closed Issues

#1759 Str* raw_input(Str*): Assertion `0' failed
#1758 Implement command -V (POSIX compatibility)
#1732 Crash When Comparing Functions (and Other Values)
#1731 Oils 0.18.0 tarball gives errors when extracting with bsdtar
#1727 Error building 0.18.0 on MacOS: std::fmod not found
#1702 [breaking] Change _ prefix to 'call' keyword
#1289 append builtin can take typed args
#1112 Design for Python-like functions in Oil
#1024 Implement binding of typed params to procs
#957 Implement setvar x -= 1
#770 Support read -N, etc.
#498 Provide a prompt hook in bin/ysh
#259 type builtin doesn't handle -p/P/a

Appendix: Metrics for the 0.19.0 Release

These metrics help me keep track of the project. Let's compare this release with the previous one, version 0.18.0.

Spec Tests

OSH passes more tests due to the features mentioned above.

It also fails more tests, because at least one of them is unimplemented. But remember that adding failing spec tests is half the battle!

You can write Python, and everything "just works" in C++:

New YSH behavior is reflected in the spec tests:

Some of the new behavior doesn't work in C++, largely due to JSON. This has already been fixed in Oils 0.20.0!

Benchmarks

The parser is more efficient, I think due to the growth policy:

Small reduction in memory usage:

Huge speedup on Fibonacci due to Melvin's work on Dict<K, V> and GC rooting:

I/O bound workloads remain the same speed. But we still have to figure out the delta with bash here:

Code Size

I improved the accounting of lines between OSH and YSH, which means that OSH went down in size:

There's a bit more code in the oils-for-unix C++ tarball, much of which is generated:

The compiled binary got much bigger due to inlining GC rooting. This is the tradeoff for the speed increases above:

As mentioned, I have an idea for a "hybrid rooting scheme" to make the code both smaller and faster.