blog | oilshell.org

Oils 0.21.0 - Flags, Integers, Starship Bug, and Speed

2024-03-21

This is the latest version of Oils, a Unix shell. It's our upgrade path from bash to a better language and runtime:

Oils version 0.21.0 - Source tarballs and documentation.

If you're new to the project, see the Oils 2023 FAQ and posts tagged #FAQ.


Big news! As of this release, the C++ tarball passes all spec tests. It's 2x to 50x faster than the Python tarball.

I will have a lot to say about this, but the takeaway is that I now recommend using and testing the C++ tarball. See README-native.txt in oils-for-unix-0.21.0.tar.gz.

The release page also emphasizes that the project is called Oils or "Oils for Unix". It's no longer "Oil". I haven't done a good job of renaming everything (including this site), so I understand the confusion!

Table of Contents
Intro
Contributions
Breaking Changes
Flag Parsing API
Built on top of the ctx builtin
Review: Three language features we needed
Integers - Don't do whatever Python or C++ does
Division
64-bit Integers
Starship Prompt Bug - Pipelines crash after using it
Related zed/zsh bug
Three Features Interact
Performance Mystery - Why can't we celebrate?
Using Every Performance Tool
Here Docs
Summary and Roadmap
More Work Under the Hood
Closed Issues
Conclusion
What's Next?
Appendix: Related Articles
Appendix: Metrics for the 0.21.0 Release
Spec Tests
Benchmarks
Code Size

Intro

Shouldn't this announcement be all about how we finished translating Oil to C++? It's a big milestone!

I planned to write something like Oils 0.21.0 - Two Megabytes of Pure, Native Code.

But I'm not "declaring victory" yet. Despite being 2x to 50x faster, the C++ translation is 5-10% slower than bash and dash on a realistic configure workload.

I'll talk about the mystery at the end of this long post, which goes like this:

  1. A new flag parser, built on a new ctx builtin.
  2. Integers
  3. The "Starship Bug"
  4. Performance Mystery

Also:

  1. Compatibility Fixes, for OSH
  2. Breaking Changes, for YSH
  3. New Docs
  4. Pleasant improvements under the hood

Contributions

Before diving into Flags, Integers, Starship Bug, and Speed, let's credit contributors.

Also:

Breaking Changes

I try to call out breaking changes in each release. OSH is stable, but YSH is still under development.

(1) Removed read --line.

This is because the buffered version is incorrect, and the unbuffered version is slow. You can use shell's read -r line for now, which is unbuffered.

Also see

(2) Rename toJ8() / fromJ8() to toJson8() / fromJson8()

This is more accurate because J8 Notation is a set of data languages, which includes JSON8.


(3) Tilde substitution is respected in combination with brace expansion, like ~{alice,bob}.

This is technically a breaking change, but it improves compatibility:

The "Closed Issues" section at the end of this post links to more fixes, as does the raw change log.


Now let's talk about Flags, Integers, Starship Bug, and Speed.

Flag Parsing API

You should be able to easily parse command line flags in a shell! POSIX shell has a weak getopts builtin, and Python has a nice but verbose argparse API.

Here's what Python looks like:

import argparse

parser = argparse.ArgumentParser()

parser.add_argument('filename')           # positional argument
parser.add_argument('-c', '--count')      # option that takes a value
parser.add_argument('-v', '--verbose',
                    action='store_true')  # on/off flag

args = parser.parse_args()
print(args.filename, args.count, args.verbose)

We can emulate this "OOP builder" style in YSH, but I find it too clunky:

var parser = ArgumentParser()

call parser->addArgument('filename')  # call keyword for expressions
call parser->addArgument('-c', '--count')
call parser->addArgument('-v', '--verbose',
                         action='store_true')

var args = parser=>parse_args()
echo "$[args.filename] $[args.count] $[args.verbose]"

Here's our new API, which is still in progress:

parser (&spec) {                # describe the CLI syntax
  arg filename                  # positional argument
  flag -c --count (type=Int)    # flag (TODO: builtin type Int)
  flag -v --verbose
}

var args = parseArgs(spec, ARGV)
echo "$[args.filename] $[args.count] $[args.verbose]"

What do you think of this style? It's a bit like Hay Ain't YAML: we're reusing the syntax of shell for declarative data. This means you can also mix code in:

parser (&spec) {
  flag --first
  if (debug) {
    flag --other
  }
}

Built on top of the ctx builtin

What I really like about the flag parser:

  1. It's implemented mostly in YSH, except for a low-level ctx builtin.
  2. It was implemented based on a #language-design discussion

Why is this good? First, there's a lot of variation in CLI syntax, so we want you to be able to write your own flag parsers in YSH. We provide something that should handle 90% of cases, but YSH is powerful enough to handle the remaining 10%.

Second, having other people involved makes YSH feel more real! I'm also glad that my Zulip ramblings were actually implementable, and there wasn't a hidden flaw.

Review: Three language features we needed

I've been saying that we're going for the "whole enchilada" with YSH — a language that's powerful enough to write libraries and internal DSLs.

So what did it take to implement declarative flag parsing in YSH?

  1. value.Place - from Oils 0.19.0 - Dicts, Procs, Funcs, and Places
  2. Procs with block args - mentioned in the same announcement
  3. The ctx builtin

Integers - Don't do whatever Python or C++ does

Division

I re-implemented the / and % operators so that they don't simply delegate to / and % in Python and C++. In particular, we don't inherit the behavior for negative operands, since Python and C++ differ!

Specifically:

(1) Integer division rounds toward zero (not negative infinity)

(2) The % operator disallows a negative divisor, because it's confusing, and arguably not useful.

ysh$ = -5 % 2
(Int)   -1

ysh$ = -5 % -2
  = -5 % -2
       ^
[ interactive ]:3: fatal: Divisor can't be negative

This means that YSH is different than Python, which chooses to round toward negative infinity:

Related story:

We could provide a separate divrem() with Python-like semantics, but I don't think these operations are common in shell. I think divrem() is a better name than Python's divmod(), because:

64-bit Integers

Like the / and % operators, the size of integers also depended on the host language prior to this release:

That was obviously bad, so we now use a mylib::BigInt type for shell operations, which is distinct from C int. It's currently 64 bits, even on 32-bit platforms, but I'd like to make it arbitrary precision.


Nevertheless, Oils already behaves better than other shells. I wrote a little demo here:

In each language, we generate a program to print the literal 11112222333344445555666677778888999, which is larger than 64 bits.

Look at all this exciting behavior!

$ demo/survey-arith.sh bigint

python3
11112222333344445555666677778888999

node.js
1.1112222333344446e+34

Lua
1.1112222333344e+34

Perl
1.11122223333444e+34

awk
11112222333344446376450007752507392

dash
9223372036854775807

bash
8402588706881157415

mksh
-77842137

zsh
zsh:1: number truncated after 20 digits: 11112222333344445555666677778888999
-7334521740365106061

_bin/cxx-dbg/osh
  echo $(( 11112222333344445555666677778888999 ))
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ -c flag ]:1: fatal: Invalid integer constant '11112222333344445555666677778888999'

Summary:

Starship Prompt Bug - Pipelines crash after using it

This bug was reported by several people. It was most common after initializing the Starship prompt, but it also happened without it. The symptom was either these job control error messages:

$ echo foo | head
osh: parent failed to set process group for PID 42996 ...
osh: child failed to set process group for PID 42996 ...
foo

Or a crash like this:

$ echo hi|head
oils I/O error (main): No such process

If you don't care about a single hairy bug that's already fixed, you can skip the whole section below, and move on to the performance mystery.

But I'm writing it up because it touches on #shell-runtime topics that I haven't explained on this blog.

Related zed/zsh bug

It's hard to understand the Starship bug without that background, but luckily Thorsten Ball just explained a similar bug on his blog.

This other bug is: when the Zed editor shells out to zsh -i, why does Ctrl-C stop working in Zed? This is surprising because you wouldn't expect a child process to be able to mess up its parent.

Just like the Starship bug, the zed/zsh bug touches on POSIX job control, and the syscalls you need to implement it.

Thorsten dug deep to the root cause: a "missing" job control syscall after execve(). But the question is why shells behave like that.


I wrote three long comments explaining why. I won't repeat them here, but the outline is:

(If the lobste.rs comments aren't interesting, again feel free to skip to the next section on performance.)

Three Features Interact

The Starship bug is different, but it's another interaction between features in the #shell-runtime. It's actually an interaction between three features, one of them being job control:

  1. OSH pipelines behave like zsh pipelines: perhaps running the last process in the shell itself (shopt -s lastpipe)
  2. Job control
  3. OSH implements the bash-like DEBUG trap that runs before commands, and Starship uses it.

The interaction was that user-provided code in the DEBUG trap could start and wait() for other processes in between the first and last part of the pipeline. This caused a stray getpgid() call to return ESRCH errno, causing the shell to "crash".

There were a few fixes:

There are some things that are not entirely satisfying about this. But if you read the lobste.rs threads above, you'll see why there don't appear to be clean solutions. Shells, and even kernels, are a bit under-specified.

I would like to explore more exhaustive testing/reasoning strategies in the shell runtime, just like we have exhaustive reasoning with Zephyr ASDL and re2c in the front end.


Nonetheless, it now works:

starship works

OK, so that was a big hairy bug that took several days to diagnose and fix. Now we have a hairy performance problem that also took several days to diagnose! Writing a shell is hard, at least if you want it to be good.

Performance Mystery - Why can't we celebrate?

I was really happy after fixing the last C++ translation bug. What a long journey!

But I was dismayed that there was a lingering performance issue: Running Python's configure may take 11.3 or 11.4 seconds with dash or bash, but 12.9 seconds with OSH.

This is a 12-14% slowdown, which is actually worse than the 5-6% we see in the CI (for unknown reasons).

You could argue: who cares about 1.6 seconds? But there's a mystery here, and I want to get to the bottom of it.


By the way, I haven't explained all the C++ translation bugs yet. I'll summarize these threads when we "declare victory":

Using Every Performance Tool

I sometimes tell people that one reason I started this project was so I could use every code quality tool available, including performance tools.

It's honestly fascinating to learn what you need to speed "naive" Python code up by more than 50x, by translating it to predictable C++.

So this section is going to show the tools I used, since I think that will be interesting to readers. But I won't explain everything, to save space. Original thread:


First, I ran the perf sampling profiler, which is able to profile across processes:

#
# Overhead       Samples  Command     Shared Object            Symbol
# ........  ............  ..........  .......................

     2.44%          3241  ld          libbfd-2.40-system.so    [.] bfd_link_hash_traverse
     1.45%          1908  cc1         [kernel.kallsyms]        [k] 0xffffffffb34bf8f0
     1.42%          1871  ld          libbfd-2.40-system.so    [.] bfd_elf_link_add_symbols
     1.26%          1740  cc1         [kernel.kallsyms]        [k] 0xffffffffb347a135
     1.22%          1805  osh         [kernel.kallsyms]        [k] 0xffffffffb34bf8f0
     1.19%          1614  cc1         cc1                      [.] _cpp_lex_direct
     1.14%          1658  osh         [kernel.kallsyms]        [k] 0xffffffffb34dded0
     1.09%          1484  cc1         cc1                      [.] ht_lookup_with_hash
     1.08%          1415  ld          libc.so.6                [.] __memmove_evex_unaligned_erms
     1.00%          1341  cc1         cc1                      [.] ira_init_register_move_cost
     0.98%          1326  cc1         cc1                      [.] ira_init
     0.97%          1289  ld          libbfd-2.40-system.so    [.] bfd_hash_lookup
     0.96%          1385  osh         oils-for-unix            [.] MarkSweepHeap::MaybeMarkAndPush

(This is benchmarks/perf.sh profile-cpython-configure - source)

The way I read this is:

I zoomed into the OSH process:

    17.25%          1118  osh      [.] MarkSweepHeap::MaybeMarkAndPush
    11.48%           737  osh      [.] MarkSweepHeap::TraceChildren
     8.22%           543  osh      [.] state::Mem::GetExported
     7.08%           511  osh      [.] MarkSweepHeap::Allocate
     5.75%           355  osh      [.] MarkSweepHeap::Sweep
     1.93%           132  osh      [.] posix::execve

We've seen this problem a lot: allocation and garbage collection are expensive!

It's not the bottleneck in this problem, since osh itself is taking so little time. That is, 40% of 5% is only 2%. Less than 2% of the configure time is spent in Oils memory management.


Don't ask me why, but I then got side-tracked fixing allocation issues, despite what's clear in the profile.

To profile allocations, I've found callgrind very useful. Rather than sampling what the CPU is doing like perf, it traces every function call. Then you can load the output into the kcachegrind GUI and click around:

callgrind one-byte string allocs

(This is benchmarks/autoconf.sh measure-callgrind - source)

We also use other tools for precise tracing: uftrace for measurement, and the R language for analysis. It's easier to put in the CI than the GUI, and the reports are custom:

                      task      ex.compute-fib  ex.bashcomp-excerpt         parse.abuild
              total_bytes_           2,206,788            4,025,766            4,173,385
               num_allocs_              70,159              117,374              102,837
      sum_typed_strs_slabs              70,159              117,649              102,837
         num_reserve_calls              32,497               64,900               61,229

       percent_list_allocs               33.1%                21.0%                29.4%
       percent_slab_allocs               23.7%                24.0%                22.6%
     percent_string_allocs                6.3%                24.8%                 8.4%
percent_other_typed_allocs               36.9%                30.4%                39.6%

(This is benchmarks/uftrace.sh)

Still, clicking around in the GUI is nice when you're in the middle of debugging something unknown. (I do wish I could use one tool instead of two.)


I found 3 hotspots with more than 400K allocations of one-byte strings, out of 2.3M total. I used a mycpp trick to turn them into integers, eliminating all those GC objects. (We're approaching the point where we are not "hilariously unoptimized"!)

This was a nice win on some workloads, but it was not the bottleneck in the configure workload!

If it's not garbage collection, why are we slow? I had a hunch to add a call to the times builtin at the end of the configure script.

==> _tmp/elapsed/bash.times.txt <==
0m0.213s 0m0.477s
0m8.233s 0m2.931s

==> _tmp/elapsed/dash.times.txt <==
0m0.217s 0m0.463s
0m8.281s 0m2.922s

==> _tmp/elapsed/osh.times.txt <==
0m0.360s 0m0.720s
0m8.790s 0m2.960s

The times builtin has a terrible format, which goes like this:

[shell process - user space time]   [shell process - system/kernel time]
[child processes - user space time] [child processes - system/kernel time]

(Contribution idea: add times --tsv8 to Oils)

So the configure workload only spends 360 ms in OSH user space, which is everything our interpreter does, including garbage collection.

But we spend 8.79 seconds in child processes, versus ~8.2 for bash and dash. This 500-600 ms difference is closer to the 1.6 second difference in elapsed time.


Why would child processes take longer under OSH? I had some random theories, like dynamic linking against libstdc++, which shells written in C don't do.

Here Docs

But the problem was clarified even more by our trusty tool strace. I wrote benchmarks/autoconf.sh measure-syscalls, which runs strace -ff, spewing thousands of files:

Total processes:

~/git/oilshell/oil/_tmp/strace$ ls bash.txt* | wc -l
6022
~/git/oilshell/oil/_tmp/strace$ ls dash.txt* | wc -l
6063
~/git/oilshell/oil/_tmp/strace$ ls osh.txt* | wc -l
6336

OSH is starting >300 more processes? Huh?

I actually optimized that, as explained in the lobste.rs thread above. In our test/syscall.sh harness, OSH starts fewer processes than bash, dash, zsh, and every other shell except for yash (from Japan).

I further narrowed it down to processes that don't call execve():

~/git/oilshell/oil/_tmp/strace/all$ ls bash*|wc -l
1723
~/git/oilshell/oil/_tmp/strace/all$ ls dash*|wc -l
1760
~/git/oilshell/oil/_tmp/strace/all$ ls osh*|wc -l
2039

So those 300 processes are not starting executables like cc1 or ld? What's going on?


Well, I know exactly when we fork processes, and HereDocWriter is one of them! I wrote about how shells implement here docs many years ago, when we were younger, the air was cleaner, and the sun brighter:

[bash, zsh, and mksh use temp files], which surprised me.

dash does something more expected and elegant, which is to start cat with one end of a pipe() as stdin, rather than a temp file. Strings longer than PIPE_SIZE will cause write() to block, but I think that just requires a little extra care in the implementation.

So OSH follows dash, forking a process for each here doc, rather than writing it to a temp file and reading it back in. The HereDocWriter and the main shell communicate through a pipe().

And the configure script has a bunch of here docs (C code snippets), so I think this must be the root cause of the slowdown. I had thought that temp files would be slower, but now I can see that there's probably forking and context switch overhead.

Why isn't dash slower too? Not sure, but it could relate to the PIPE_SIZE optimization.


Here's a little intuition. Forking zero processes:

$ time for i in {1..1000}; do true; done

real    0m0.003s
user    0m0.003s
sys     0m0.000s

Forking a thousand processes, without execve(). The ( true ) line is a subshell:

$ time for i in {1..1000}; do ( true ); done

real    0m0.702s
user    0m0.465s
sys     0m0.270s

So each fork() takes ~0.7 milliseconds in bash.

Summary and Roadmap

I want to fix this bug soon, and evaluate performance again. But the bottom line is that I'm fairly sure that we'll meet the performance of bash and dash, with a codebase in typed Python. That's pretty cool, because the codebase is small.

And we may even exceed their speed. A promising sign is that OSH makes fewer syscalls than bash or dash on the configure workload, not just the synthetic workloads. This is despite the here doc problem.

But Oils in C++ is already much faster than Oils in Python, so I recommend using it as of this release. For example, Oils in Python takes 27.7 seconds running CPython configure, and we got all the way down to 12.9 by translating the interpreter to C++!

You can see this from the OSH Runtime benchmarks, which I review in the appendix, as always.


Near- and medium-term work:

More Work Under the Hood

We're almost done with this announcement! Let me mention these things, which will come up in the future:

These files used to be crappy shell scripts, and now they're pretty decent. They find bugs, and they'll help improve our error catalog.

$ wc -l test/*-errors.sh
  868 test/parse-errors.sh
 1131 test/runtime-errors.sh
 1406 test/ysh-parse-errors.sh
  899 test/ysh-runtime-errors.sh
 4304 total

Their output is published on the release quality page.

Closed Issues

A subset of the work we did:

#1854 Brace expansion combined with tilde expansion should behave like bash and zsh, not mksh
#1846 `try eval` is inconsistent
#1826 ysh-0.20.0$ echo b'byte \xff' #=> Traceback
#1825 single backslash \ at end of line could be disallowed within $'...' (and maybe others)
#1822 Simple pipe crashes after initialising starship
#1785 try { } should be the only form for simplicity
#1784 'const c' without null on RHS can be re-declared
#1762 Disallow proc / func keywords in OSH to avoid misleading syntax errors
#1342 / and % operators behave like Python and not bash

Conclusion

This post had something for everybody!

  1. Language and API design - Flag parsing
  2. Language semantics - Integers
  3. Two hairy bugs, which give a peek into the #shell-runtime
  4. Demos of performance tools

What's Next?

Done! Let me know what you think in the comments!

Appendix: Related Articles

These great conversations inspired the pretty printing design doc:

I also think we need to explain UTF-8, since comments over many months show great confusion:

Appendix: Metrics for the 0.21.0 Release

Let's compare this release with the previous one, version 0.20.0.

Spec Tests

A little progress on OSH:

We finished the C++ translation, and blew past the mark! :-)

The difference is that the C++ version correctly allows NUL bytes in strings, while the Python version inherits its behavior from CPython.


New YSH features are reflected here:

Again, we blew past the mark with the C++ translation:

The difference is again inheriting integer behavior from CPython. We may smooth over this difference at some point.

Benchmarks

I did some frontend refactoring, which may have caused a slight slowdown:

However, I already optimized the parser some more after this release.

The Fibonacci benchmark is stable:

After resolving the big mystery above, we are going to close this gap:

Code Size

Oils is still a small program in terms of source code:

And generated C++:

And compiled binary size: