Metrics for Oil 0.8.4

2020-11-07

I just released Oil 0.8.4, another huge one filled with OSH and Oil language changes! I kept track of the changes on Zulip, and you can view the changelog, but I'll summarize them in the next post.

This post, by contrast, is a short note reviewing release #metrics.

In particular, I want to keep track of the binary size and build speed issues mentioned in the June announcement of the 0.8.pre6 release.

I won't circulate this post widely — it's mainly for those close to the project. But it's important to keep track of our progress along multiple dimensions.

Table of Contents
Background
Oil is "Hilariously Unoptimized"
Correctness and Runtime Speed
Test Results
Benchmarks
Size and Build Speed
Lines of Source
Lines of Native Code (mostly generated)
Binary Size ("C++ Bloat")
Build Speed ("C++ Bloat")
Conclusion
Appendix: Blog Roadmap

Background

Previous posts with #metrics:

That brings to mind another motivation for this post: preparing for garbage collection. In September, I wrote a small copying collector with unit tests, and partially integrated it into oil-native. I even showed it to a few people, and wasn't embarrassed. But it's not fully running yet.

Garbage collection is the top priority for the coming months (in addition to translating shell I/O). As mentioned, it will necessarily slow the code down.

It's not clear how much, but this post marks a rough baseline for performance.

Oil is "Hilariously Unoptimized"

I should note that essentially all of my effort is spent implementing features, fixing bugs, and getting Oil to translate and compile as C++. I've spent essentially zero time on optimization since December 2019.

I say that because we are experiencing classic "C++ bloat" problems due to templates and exceptions. However, just like bash is no longer "too big and too slow", neither is C++! Modern software has made these old technologies look efficient by comparison.

For example, you'll see below that the Oil binary is about 20-30% bigger than bash right now, e.g. 1.3 MB vs. 1.0 MB. It will get bigger, but it won't reach 5, 10, or 15 MB like similar programs written in Go or Rust.

Correctness and Runtime Speed

These are the things we care most about, and they're looking good.

Test Results

Almost 300 new tests pass in oil-native:

I also reviewed these spec-cpp metrics in August, in A Plan for Oil 0.8 and 0.9. Again, the goal is for the 917 osh_eval.cc number to reach the 1672 osh (in Python) number.

OSH spec tests indicate many new features:

And so do Oil spec tests:

I described some of these in the previous post, and I'll talk more about them in the next post. Both the OSH and Oil languages are taking a nice shape!

Benchmarks

The parsing benchmarks are still noisy. I'm not sure if this change is significant, but oil-native is still faster than bash at parsing. I should probably switch to something more stable, like instruction counts.

The runtime benchmark measures the old Python build, not oil-native, so we don't care about this. What's critically important is to simply run configure scripts under oil-native!

This is the reason we're spending so much effort translating Oil to C++! I haven't made a big deal about it on the blog, but it's an obvious problem.

New Benchmarks

I wrote some synthetic benchmarks to test shell "computation":

And here are some rough measurements of mycpp's translation:

Summary: We get a huge speedup on most code, but there are still performance bugs where the translated code is slower than Python! At least one of these is a computational complexity bug.

Size and Build Speed

Again, we compare this release with June's 0.8.pre6 release.

Lines of Source

These are the lines we edit, not those generated. It's still pretty small!

Significant lines:

Physical lines:

Let's add in Oil language (also counted in src.txt):

Note that OSH and Oil share a lot of common libraries, which are counted under OSH.

Nevertheless, I'm surprised by the small increase, and that's a good thing! I think it's because most of the recent changes happened in the grammar, which is small.

I also included the new Tea language! Many thanks to Batuhan Taskaya for recent help on that. I hope to write more about it soon.

Lines of Native Code (mostly generated)

Almost all lines in the oil-native tarball are generated, and we continue to count them. I've also counted osh_eval.cc, the translation of the core interpreter, by itself.

This is expected progress, which reflects three things:

  1. More new code written in Python, e.g. for the Four Features that Justify a New Unix Shell.
  2. More existing code type checked and translated with mycpp.
  3. Changes to the translation process, like garbage collection. The amount of C++ code generated for a given Python construct can change.

Binary Size ("C++ Bloat")

The binary is getting bigger along with the lines of translated code. Refer to the June announcement of 0.8.pre6 for the reasons behind this.

(And I still have to figure out why the size of osh_eval.opt.stripped differs so much between GCC and Clang. Guesses: templates, exception tables, or both.)

Build Speed ("C++ Bloat")

For osh_eval.opt.stripped:

This is bad! Compile time basically doubled.

I believe this is due to template bloat. We replaced new T(...) with gc_heap::Alloc<T>(...), a template function that uses std::forward().

This shows up in the report from Bloaty:

7   gc_heap::Alloc<>()::__PRETTY_FUNCTION__  48821   107976
8                       _GLOBAL__sub_I_str0  61208    61252
9                   [section .debug_abbrev]      0    54882
10                       gc_heap::Alloc<>()  23494    41924

I plan to look into this further, but again, I think we'll have to live with it for a while. I'm focused on making Oil usable and featureful.

(It's also interesting that Clang was faster on the old code, but is now slower. This pattern held up in 0.8.3 too, so it's not benchmark noise.)
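
To make the template bloat hypothesis concrete, here is a minimal sketch of a forwarding allocation function. It is not the oil-native code: the sketch namespace, the bump arena, and the Point and Pair types are all hypothetical stand-ins. The only parts taken from the text above are the Alloc<T>(...) call shape and the use of std::forward().

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

namespace sketch {

// Hypothetical stand-in for a GC-managed heap: a fixed-size bump arena.
alignas(std::max_align_t) unsigned char g_arena[4096];
std::size_t g_used = 0;

inline void* BumpAlloc(std::size_t num_bytes) {
  // Round every request up so the next object stays suitably aligned.
  const std::size_t align = alignof(std::max_align_t);
  const std::size_t rounded = (num_bytes + align - 1) / align * align;
  assert(g_used + rounded <= sizeof(g_arena));
  void* p = g_arena + g_used;
  g_used += rounded;
  return p;
}

// Replaces `new T(...)`: allocate from the arena, then construct in place,
// forwarding the constructor arguments unchanged.
template <typename T, typename... Args>
T* Alloc(Args&&... args) {
  void* place = BumpAlloc(sizeof(T));
  return new (place) T(std::forward<Args>(args)...);
}

}  // namespace sketch

struct Point {
  Point(int x, int y) : x(x), y(y) {}
  int x;
  int y;
};

struct Pair {
  Pair(Point* first, Point* second) : first(first), second(second) {}
  Point* first;
  Point* second;
};

// Each call below is a distinct instantiation that the compiler has to
// generate and optimize separately.
inline Pair* MakePair(int a, int b) {
  Point* p = sketch::Alloc<Point>(1, 2);  // Alloc<Point, int, int>
  Point* q = sketch::Alloc<Point>(a, b);  // Alloc<Point, int&, int&>
  return sketch::Alloc<Pair>(p, q);       // Alloc<Pair, Point*&, Point*&>
}
```

Note that the two Point allocations produce different instantiations even though they build the same type, because rvalue and lvalue arguments deduce different Args. A plain new T(...) expression has no such per-call-site function, which is one plausible reason the switch costs compile time.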

Conclusion

Overall, the build speed is the thing I'm most annoyed by. I expect it to get worse once we fully integrate the garbage collector. For example, I need to generate field masks for every type in the program, and that involves some compile-time computation, e.g. with offsetof().
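
As a rough illustration of what "field masks with offsetof()" could look like, here is a sketch under stated assumptions. The Token type, the 16-bit mask width, and the one-bit-per-pointer-sized-word encoding are hypothetical choices made for this example, not necessarily what oil-native does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Str;  // opaque heap type, only used through pointers here

// Hypothetical heap object mixing plain integers and pointers.
struct Token {
  int id;
  Str* val;
  int span_id;
  Token* next;
};

// Assumed encoding: bit i is set when the i-th pointer-sized word of the
// object holds a pointer that the collector must trace.
constexpr std::uint16_t FieldBit(std::size_t offset) {
  return static_cast<std::uint16_t>(1u << (offset / sizeof(void*)));
}

// The mask must be able to describe every word of the object.
static_assert(sizeof(Token) / sizeof(void*) <= 16,
              "Token is too large for a 16-bit field mask");

// Computed entirely at compile time from the struct layout.
constexpr std::uint16_t kTokenMask =
    FieldBit(offsetof(Token, val)) | FieldBit(offsetof(Token, next));

constexpr bool IsPointerWord(std::uint16_t mask, std::size_t word_index) {
  return ((mask >> word_index) & 1u) != 0;
}

// Number of pointer fields a collector would visit for this mask.
inline int CountPointerWords(std::uint16_t mask) {
  int n = 0;
  for (; mask != 0; mask >>= 1) {
    n += mask & 1u;
  }
  return n;
}
```

A collector could then walk an object word by word and follow only the words whose bit is set. One constant like kTokenMask per type is cheap on its own, but generating one for every type in the program adds constant expressions for the compiler to evaluate, which is the build-time cost mentioned above.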

If you're experienced with these issues, I'd love some help! Let me know in the comments.

Again, I think that if you compare Oil to a Go or Rust binary, none of this is a big deal. But I want there to be "no reason to use bash rather than Oil", and these issues matter for embedded systems, which occasionally use bash.

But it's much more important to solidify the OSH language and the Oil language. The next post will talk about that work, which includes:

Appendix: Blog Roadmap