Fixes and Updates to Oil Benchmarks

2020-11-23

A few weeks ago, I published Metrics for Oil 0.8.4. It establishes a rough performance baseline before enabling garbage collection.

This is a small update to that baseline, released with Oil 0.8.5. I noticed some problems with the benchmarks after partially integrating the garbage collector (which now works on small examples!)

This post doesn't invalidate anything I've said in the past. It just adds some detail!

Table of Contents

Fix: Don't use ASAN When Benchmarking

Fix: A Python Process Can't Measure A Shell Child Process

Why I Wrote A Timing Tool in C

Update: The Parser is Slower

What's Next?

builds.sr.ht and Toil

Fix: Don't use ASAN When Benchmarking

The mycpp-examples benchmark shows how much we can speed up small pieces of code by translating them from statically-typed Python to C++.

Silly bug: I was building the C++ code with ASAN! When ASAN is on, the compiler generates code that uses "shadow memory" to detect memory unsafety at runtime. This increases the size of every allocation, and makes code slower. Examples:

classes went from taking 728 ms to 1.9 ms
length went from taking 734 ms to 167 ms
cartesian went from 1033 MB of heap usage to 611 MB
parse went from 972 MB of heap usage to 137 MB
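To put those figures side by side, here is a small sketch that computes the before/after ratio for each example. It is only arithmetic on the numbers quoted above; the dictionary and function names are made up for illustration and are not part of the benchmark code.

```python
# (with ASAN, without ASAN), copied from the list above.
# 'classes' and 'length' are times in ms; 'cartesian' and 'parse' are heap usage in MB.
ASAN_RESULTS = {
    'classes': (728, 1.9),
    'length': (734, 167),
    'cartesian': (1033, 611),
    'parse': (972, 137),
}


def asan_overhead(results):
    """Return {name: with_asan / without_asan} for each example."""
    return {name: before / after for name, (before, after) in results.items()}


if __name__ == '__main__':
    for name, ratio in asan_overhead(ASAN_RESULTS).items():
        print('%-10s %.1fx' % (name, ratio))
```

The ratios vary a lot between examples, which suggests the ASAN overhead depends heavily on how allocation-heavy each example is.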

So the Python-to-C++ speedups look even more impressive now. (But remember that mycpp is not a general-purpose tool.)

Fix: A Python Process Can't Measure A Shell Child Process

The compute benchmark measures the Oil interpreter vs. bash and Python on small code examples.

I noticed that bash and Python both used a minimum of 6 MB of resident memory (Max RSS). However, this turned out to be a benchmark bug. We were using benchmarks/time_.py, a tool written in Python, to measure the memory of a bash process!

Specifically, we used subprocess.call() and then resource.getrusage(). This doesn't work because Python first forks its larger address space, and then calls exec() to start bash.

That is, we measured the memory usage of Python, not bash. This person ran into the same issue:

Python getrusage with RUSAGE_CHILDREN behaves strangely?
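For reference, the measurement pattern described above looks roughly like the sketch below. This is not the code from benchmarks/time_.py; the function name and the `sh -c true` command are assumptions chosen to keep the example small.

```python
import resource
import subprocess


def child_max_rss(argv):
    """Run argv to completion, then read the children's resource usage.

    Returns (exit_status, ru_maxrss). RUSAGE_CHILDREN covers every child this
    process has waited for, so ru_maxrss is a high-water mark across all of
    them, not a per-command number. The unit is kilobytes on Linux and bytes
    on macOS.
    """
    status = subprocess.call(argv)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return status, usage.ru_maxrss


if __name__ == '__main__':
    status, max_rss = child_max_rss(['sh', '-c', 'true'])
    print('status=%d max_rss=%d' % (status, max_rss))
```

If the number reported for a trivial command like this is closer to the size of the Python interpreter than to the size of a small shell, that is the symptom described in this section.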

Why I Wrote A Timing Tool in C

To fix this, I first changed time_.py to shell out to /usr/bin/time, a GNU utility written in C which has a small address space. Two problems:

It only has precision in ticks or hundredths of a second, which isn't good enough for our benchmarks. The ratios shifted significantly because of this inaccuracy.
It also tends to print error messages to its output, which is bad for automation. We want clean TSV output.
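As an illustration of that intermediate approach, here is a minimal sketch of shelling out to GNU time from Python. The `-a`, `-o`, and `-f` options and the `%x`, `%e`, and `%M` format fields are documented GNU time features; the function names, the column choice, and the default path are assumptions, not what time_.py actually did.

```python
import subprocess

# %x = exit status, %e = elapsed wall-clock seconds, %M = max RSS in kilobytes.
# The fields are joined with real tab characters so each run is one TSV row.
TIME_FORMAT = '\t'.join(['%x', '%e', '%M'])


def gnu_time_argv(tsv_path, cmd_argv, time_bin='/usr/bin/time'):
    """Build an argv that runs cmd_argv under GNU time.

    -o sends the report to tsv_path instead of stderr, -a appends rather than
    overwrites, and -f selects the fields to print.
    """
    return [time_bin, '-a', '-o', tsv_path, '-f', TIME_FORMAT] + list(cmd_argv)


def run_timed(tsv_path, cmd_argv):
    """Run the command under GNU time and return time's own exit status."""
    return subprocess.call(gnu_time_argv(tsv_path, cmd_argv))
```

A call such as `run_timed('times.tsv', ['bash', '-c', 'echo hi'])` would append one row per run. The `%e` column carries only two decimal places, and any extra message GNU time writes lands in the same file as the rows, which are the two problems listed above.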

What about bash's time keyword (which Oil implements)?

It doesn't have a way to get the exit code or memory usage.
It's hard to make it append to a TSV file.

So I wrote my own benchmarks/time-helper.c. It's surprising to find these basic deficiencies in common tools! I guess I need to build something better into Oil, but that's more work on top of a big pile.

Update: The Parser is Slower

The parsing benchmarks compare $sh -n across different shells on 10 files.

The 0.8.5 release is the first one where oil-native is slower than bash!

oil-native: 222 lines/ms and 593 lines/ms
bash: 247 lines/ms and 645 lines/ms

I believe this is due to the partially-integrated garbage collector. Every C++ function now has a StackRoots invocation to register pointers.

This operation should be very cheap, but I would guess that it also inhibits some compiler optimizations. We're passing pointers to locals to be stored in a global (or thread local) data structure.

I mentioned this possibility in the Caveats to January's performance post:

I expect performance to go up and down in future releases, but in the long term it should be faster

I probably won't have time to optimize the mycpp translation of the parser for many months, but it should be possible with enough effort. Remember that Oil is "hilariously unoptimized". (As always, I can use help!)

What's Next?

After fixing these benchmarks, I had a nice experience with builds.sr.ht, the Sourcehut build service. I was driven there by the increasing flakiness of Travis CI.

I want to write a blog post about it, but I should really get back to work on the garbage collector.

builds.sr.ht and Toil

Here's a brief outline instead:

It took me a matter of minutes to get a test build running. The service is snappy and the docs are good.
I ported Oil's continuous build in a day or so. We have a complex build with many tasks because we use a lot of metaprogramming.
- I described the "Toil" continuous build back in March: Oil 0.8.pre3 - A Line Editor and a Continuous Build
This leads to an interesting milestone: services/toil is a shell script and web interface that runs on multiple CI services.
- A concrete benefit of this is that we could use sourcehut's FreeBSD support and Travis CI's OS X support in parallel.
I want to write a bit about the style of Toil. I would call it "distributed shell programming with concretions" (rather than abstractions).
- It uses TSV, JSON, and a wwz archive of logs. That is, we don't serialize and deserialize "objects". We just work with well-formed data. The shell can help with this.
- It spans Dreamhost, sourcehut, Travis CI, and my own development machine. (I'd also like to try it on GitHub Actions.) It's a "heterogeneous" distributed system. It uses ssh for auth.
- What do these platforms have in common? They are all Unix-like systems, which can run a shell!

(Zulip notes on builds.sr.ht)

Let me know if you have questions!