
Oils 0.17.0 - YSH Is Becoming Real

2023-08-02

This is the latest version of Oils, a Unix shell that's our upgrade path from bash to a better language and runtime:

Oils version 0.17.0 - Source tarballs and documentation.

We're moving toward the fast C++ implementation, so there are two tarballs:

If you're new to the project, see the Oils 2023 FAQ and posts tagged #FAQ.

Table of Contents
Release Highlights
Clarity in the Design
A Stable Core, with External Growth
OSH and YSH Have More Distinct Data Structures
Risks / Open Questions
Past Risks
Key Questions
Closed Issues
What's Next?
Blog Backlog
Appendix: Metrics for the 0.17.0 Release
Spec Tests
Benchmarks
Code Size

Quick reminder about naming:

Release Highlights

What's new? The previous release was Breaking Renames and YSH, and it prepared the codebase to implement YSH. So this release is a checkpoint along the way.

Clarity in the Design

What's happened lately?

June was supposed to be "the month of docs". I had planned to rewrite the help builtin and re-organize our documentation. We need a place to record all the changes we're making!

That didn't happen, but I did write five blog posts about the design of YSH. They clarified what exactly we should work on, out of the seven features in YSH:

(Stupid slogan I thought of for Oils: Imagine if bash, Python, and JSON kissed.)

A Stable Core, with External Growth

After writing those posts, and seeing Melvin's work on YSH, I realized we need to write more code in YSH, as opposed to typed Python. It's less verbose, and it's a good test of the language.

That is, I hope there will be a small part of YSH that's stable, and a larger part can grow for years. Examples:

This layering applies to builtin proc as well as func: we should be able to write both a test framework (describe) and a flag parser (argparse) in YSH itself. You can see examples in Sketches of YSH Features.

OSH and YSH Have More Distinct Data Structures

Another point of clarity is the runtime relationship between OSH and YSH: their data types are now more distinct.

This is mostly so we can continue to increase bash compatibility ("conceding to reality"), without messing up the semantics of YSH.

(Aside: We've long understood the relationship at parse time: YSH is largely a mutually recursive expression sublanguage woven into shell. Aidan found some good bugs here while implementing the YSH case statement, though, so it may change a bit.)


Here's some technical detail on the core data types. Building on Melvin's work, I statically typed and translated the YSH expression evaluator for this release. It had previously relied on PyObject*, i.e. the "metacircular hack".

It now uses a central value_t type, expressed with algebraic data types in Zephyr ASDL. For example, this is what POSIX shell looks like:

value = Undef       # for ${x:-default} etc.
      | Str(str s)  # Everything is a string
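
To make the Undef case concrete, here is a small sketch in plain shell. It only uses standard parameter expansion, so it should behave the same way in POSIX sh, bash, and OSH. The helper name `describe` is hypothetical, made up for this illustration. It shows that a variable is either unset (Undef) or a string, and that `${x:-default}` treats an empty string like Undef while `${x-default}` does not.

```shell
# describe is a hypothetical helper, not part of Oils.
# It prints how the two "default" operators see the variable x.
describe() {
  echo "colon=${x:-default} nocolon=${x-default}"
}

unset x
describe   # x is Undef        → colon=default nocolon=default

x=''
describe   # x is an empty Str → colon=default nocolon=

x='foo'
describe   # x is a Str        → colon=foo nocolon=foo
```

In other words, the colon form tests "unset or empty", while the form without the colon only tests "unset".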

This is what bash looks like:

        ...
      | Str(str s)
      | BashArray(List[str] strs)
        # quirk: a bash array is more like Dict[int, str] !
      | BashAssoc(Dict[str, str] d)
      ...
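
The "Dict[int, str]" quirk mentioned in the comment can be seen directly in bash. The following is a minimal sketch that assumes bash itself (arrays are not in POSIX sh); the helper name `show_keys` is hypothetical. It shows that unsetting an element leaves a hole rather than shifting the later elements, and that indices can be sparse.

```shell
# Requires bash: indexed arrays are not part of POSIX sh.
# show_keys is a hypothetical helper that prints the element count
# and the indices that are currently set in the global array arr.
show_keys() {
  echo "count=${#arr[@]} keys=${!arr[*]}"
}

arr=(a b c)
show_keys        # → count=3 keys=0 1 2

unset 'arr[1]'
show_keys        # → count=2 keys=0 2   (index 2 did not shift down)

arr[10]=z
show_keys        # → count=3 keys=0 2 10  (indices are sparse)
```

A true list would renumber its elements after a deletion. A bash array keeps the original integer keys, which is why it behaves more like a map from integers to strings.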

This is what YSH looks like:

        ...
      | Null  # e.g. for JSON
      | Int(int i)
      | Float(float f)
      | List(List[value] items)
      | Dict(Dict[str, value] d)
      ...
      # omitted: Eggex, Func, Proc, etc.

The main change is that I thought we would unify sequences and maps:

But again, I don't want future OSH quirks to affect YSH. Practically speaking, what this means is that OSH and YSH have mostly separate types and operations. You use YSH operations with YSH types:

$ var mylist = ['README', 'foo.py']  # value.List

$ echo @mylist  # YSH splice works
README foo.py

$ echo "${mylist[@]}"  # bash splice doesn't apply
  echo "${mylist[@]}"
        ^~
[ interactive ]:15: fatal: Invalid type value.List: ...
... Can't substitute into word

You also use OSH operations with OSH types. This shouldn't be a big deal because the most common Str type is shared and thus seamlessly interoperable.

$ declare -a array=(README foo.py)  # value.BashArray

$ echo "${array[@]}"  # bash splice works
README foo.py

$ var item = array[0]  # YSH array indexing doesn't apply
  var item = array[0]
  ^~~
[ interactive ]:21: fatal: Invalid type value.BashArray: ...
... subscript expected Str, List, or Dict

Also, features like param passing will "just work". You can copy from bash arrays to YSH lists, and vice versa.

The exact set of valid operations on each type can be tweaked based on usage, but we're no longer aiming to "complete the matrix". The interactions are more controlled.


Note that you can write these two styles of syntax in the same file. It's not recommended for new programs, but it may be useful when upgrading from OSH to YSH.

Risks / Open Questions

So we're deep in the middle of implementing YSH, and it's taking a nice shape. What are the remaining risks?

Past Risks

Let's look back 3 years to Technical Issues and Risks (2020). We're past the issues I enumerated:

So it's clear that our two NLnet grants (April 2022 and February 2023) have been critical. The project really needs concentrated attention. I welcome casual contribution, and I want to increase it, but we also need sustained contribution.

(As always, you're welcome to join https://oilshell.zulipchat.com/ and ask questions!)

So the main risks are that we won't have enough help, or that our funding runs out. There's a lifetime limit of 4 grants from NLnet, which definitely seems like enough to get the project off the ground, but we shouldn't take it for granted.


A related issue is that I've been "heads down" for a couple of months, deep in the design of YSH. And I expect to be deep in documentation for the next month. But I also want to spend time finding more people to work on the project.

I'm thinking of writing a blog post titled "How are programming languages funded?" I've noticed a common misconception that Python was Guido van Rossum's hobby project. This isn't true: it's had a small amount of funding for most of its life, including from the US government early on.

So the "administrative" parts of a project definitely matter. A little funding goes a long way.

Key Questions

Another question that's on my mind:

Can YSH be a bounded design?

That is, can there be a stable core that supports "infinite" growth? This is essentially the idea behind the narrow waist blog posts.

It seems like it, but the only way to find out is to implement YSH. Luckily, this seems very feasible. I'm happy that the ysh/expr_eval.py file is only 1435 lines after static typing! Now that the evaluator is statically typed and translated, it no longer depends on the Python interpreter, so that interpreter's weight "doesn't count".

This brings to mind another risk:

Is the language too big?

Does it make sense to stuff all this functionality from shell, Python, JSON, and TSV together? Is it too big to document?

To be honest, it certainly feels big, because it's a lot of work.

But the whole program is still small! I would say it's really small for the amount of work it does.

This is pretty surprising! We'll have a shell with much more power and functionality than bash, at less than half the weight. I'll publish updated line counts when the interpreter is fully translated to C++.

Programmers adopt platforms, not languages.

I want the project to be self-sustaining, and language projects rarely are. What we really care about is operating systems and platforms (Unix, the web, the cloud, etc.).

Shell is interesting because it's arguably the language that's closest to the Unix operating system.

I may write a separate post about this. I guess the bottom line is that we still need to do things with YSH. We are overflowing with ideas, but again short on people.

Again, feel free to join us on Zulip. Most people find it "dense", but asking the right questions is a great way to spread knowledge. The codebase is taking its "final" shape as well, so it should be easier to change.

Closed Issues

These closed issues represent some of the work we did:

#1658 --gc-sections not supported by ld on macos
#1657 READLINE_DIR is not used in build/ninja-rules-cpp.sh
#1656 HOST_NAME_MAX doesn't appear to be defined in macos
#1643 case NEWLINE crashes because newline accepted as pattern
#1092 Crash in ${a[0]} array evaluation
#954 ${x:-default} when x is an integer fails with NotImplementedError
#840 Bug in integer / string conversion
#741 Fully nested data structures
#636 Oil expression evaluator shouldn't be "metacircular"

What's Next?

As mentioned, I want to overhaul the help builtin and documentation. It's great to have contributors working on YSH while that happens. A few years ago, progress on the code would grind to a halt whenever I wrote docs or a blog post!

And I need to do more on the "administrative" side of the project, which is easy to neglect. Please sponsor us if you appreciate this work. We use the money to onboard contributors before they're added to the grant:

Blog Backlog

Lower priority:

Appendix: Metrics for the 0.17.0 Release

These metrics help me keep track of the project. Let's compare this release with the previous one, version 0.16.0 from June.

Spec Tests

Not much work on OSH:

But I did fix a slight regression in translation:

Lots of work on YSH:

Especially making it work in C++, mentioned above:

Benchmarks

Parsing speed remained the same, despite some changes for YSH:

Also no change:

Far fewer allocations on a real workload:

Which appeared as a big speedup on the ex.compute-fib benchmark:

Wall times:

Code Size

I need to update these metrics to include YSH as well as OSH:

We translated more of YSH, resulting in more C++ code in the tarball:

And more executable code: