
Fixes and Updates to Oil Benchmarks


A few weeks ago, I published Metrics for Oil 0.8.4. It establishes a rough performance baseline before enabling garbage collection.

This is a small update to that baseline, released with Oil 0.8.5. I noticed some problems with the benchmarks after partially integrating the garbage collector (which now works on small examples!)

This post doesn't invalidate anything I've said in the past. It just adds some detail!

Table of Contents
Fix: Don't use ASAN When Benchmarking
Fix: A Python Process Can't Measure A Shell Child Process
Why I Wrote A Timing Tool in C
Update: The Parser is Slower
What's Next? and Toil

Fix: Don't use ASAN When Benchmarking

The mycpp-examples benchmark shows how much we can speed up small pieces of code by translating them from statically-typed Python to C++.

Silly bug: I was building the C++ code with ASAN! When ASAN is on, the compiler generates code that uses "shadow memory" to detect memory unsafety at runtime. This increases the size of every allocation, and makes code slower. Examples:

So the Python-to-C++ speedups look even more impressive now. (But remember that mycpp is not a general purpose tool.)

Fix: A Python Process Can't Measure A Shell Child Process

The compute benchmark measures the Oil interpreter vs. bash and Python on small code examples.

I noticed that bash and Python both used a minimum of 6 MB of memory (Max RSS). However, this turned out to be a benchmark bug. We were using benchmarks/, a tool written in Python, to measure the memory of a bash process!

Specifically, we started the child process from Python and then called resource.getrusage(). This doesn't work because Python first forks its larger address space, and then calls exec() to start bash.
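Here's a minimal sketch of the measurement pattern described above. The launcher call (`subprocess.run`) and the function name `child_max_rss` are assumptions for illustration; this is not the code from the benchmark tool. Whether the inflated number actually shows up depends on the kernel and on how your Python version spawns children, so treat the result as something to inspect rather than a guaranteed reproduction.

```python
import resource
import subprocess
import sys


def child_max_rss(argv):
    """Run argv to completion, then return ru_maxrss for waited-for children.

    Units are platform-dependent: kilobytes on Linux, bytes on macOS.
    RUSAGE_CHILDREN reports the largest waited-for child of this process,
    so earlier children in the same process can also affect the value.
    """
    # Assumed launcher: the original call is not shown in the post.
    subprocess.run(argv, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_maxrss


if __name__ == "__main__":
    # Hypothetical tiny child: an interpreter that exits immediately.
    print(child_max_rss([sys.executable, "-c", "pass"]))
```

If the number printed for a trivial child looks suspiciously close to the parent's own footprint, that is the symptom described here.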

That is, we measured the memory usage of Python, not bash. This person ran into the same issue:

Why I Wrote A Timing Tool in C

To fix this, I first changed the tool to shell out to /usr/bin/time, a GNU utility written in C, which has a small address space. Two problems:

What about bash's time keyword (which Oil implements)?

So I wrote my own benchmarks/time-helper.c. It's surprising to find these basic deficiencies in common tools! I guess I need to build something better into Oil, but that's more work on top of a big pile.

Update: The Parser is Slower

The parsing benchmarks compare $sh -n across different shells on 10 files:
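As a rough illustration of what such a comparison involves, here's a sketch that times `$sh -n` for a few shells on one file. The function name `time_parse`, the shell list, and the sample script are all hypothetical, and it measures wall-clock time only (not memory). The real benchmark harness is more careful than this.

```python
import os
import shutil
import subprocess
import tempfile
import time


def time_parse(shells, script_path):
    """Return {shell: seconds} for running `shell -n script_path` once each."""
    results = {}
    for shell in shells:
        if shutil.which(shell) is None:
            continue  # skip shells that aren't installed
        start = time.perf_counter()
        # -n asks the shell to read and parse commands without executing them.
        subprocess.run([shell, "-n", script_path], check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        results[shell] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    # Hypothetical sample input; a real run would use larger scripts.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write("echo hello\nfor i in 1 2 3; do echo $i; done\n")
    try:
        for shell, seconds in time_parse(["bash", "dash", "sh"], f.name).items():
            print(f"{shell}\t{seconds:.4f}")
    finally:
        os.unlink(f.name)
```

A single run on a tiny file is dominated by process startup, so a meaningful comparison needs large inputs and repeated runs.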

The 0.8.5 release is the first one where oil-native is slower than bash!

I believe this is due to the partially-integrated garbage collector. Every C++ function now has a StackRoots invocation to register pointers.

This operation should be very cheap, but I would guess that it also inhibits some compiler optimizations. We're passing pointers to locals to be stored in a global (or thread-local) data structure.

I mentioned this possibility in the Caveats to January's performance post:

I expect performance to go up and down in future releases, but in the long term it should be faster

I probably won't have time to optimize the mycpp translation of the parser for many months, but it should be possible with enough effort. Remember that Oil is "hilariously unoptimized". (As always, I can use help!)

What's Next?

After fixing these benchmarks, I had a nice experience with the Sourcehut build service. I was driven there by the increasing flakiness of Travis CI.

I want to write a blog post about it, but I should really get back to work on the garbage collector.

Here's a brief outline instead:

(Zulip notes on

Let me know if you have questions!