Metrics for Oil 0.8.4

2020-11-07

I just released Oil 0.8.4, another huge one filled with OSH and Oil language changes! I kept track of the changes on Zulip, and you can view the changelog, but I'll summarize them in the next post.

This post, by contrast, is a short note that reviews release #metrics.

In particular, I want to keep track of the binary size and build speed issues mentioned in the June announcement of the 0.8.pre6 release.

I won't circulate this post widely — it's mainly for those close to the project. But it's important to keep track of our progress along multiple dimensions.

Table of Contents

Background

Oil is "Hilariously Unoptimized"

Correctness and Runtime Speed

Lines of Native Code (mostly generated)

Binary Size ("C++ Bloat")

Build Speed ("C++ Bloat")

Conclusion

Appendix: Blog Roadmap

Background

Previous posts with #metrics:

The March metrics post about the 0.8.pre2 release.
The June announcement of the 0.8.pre6 release.
The September release of Oil 0.8.0, which I didn't announce. It made significant progress in integrating the garbage collector.

That brings to mind another motivation for this post: preparing for garbage collection. In September, I wrote a small copying collector with unit tests, and partially integrated it into oil-native. I even showed it to a few people, and wasn't embarrassed. But it's not fully running yet.

Garbage collection is the top priority for the coming months (in addition to translating shell I/O). As mentioned, it will necessarily slow the code down.

It's not clear how much, but this post marks a rough baseline for performance.

Oil is "Hilariously Unoptimized"

I should note that nearly all of my effort goes into implementing features, fixing bugs, and getting Oil to translate and compile as C++. I've spent essentially zero time on optimization since December 2019.

I say that because we are experiencing classic "C++ bloat" problems due to templates and exceptions. However, just like bash is no longer "too big and too slow", neither is C++! Modern software has made these old technologies look efficient by comparison.

For example, you'll see below that the Oil binary is about 20-30% bigger than bash right now, e.g. 1.3 MB vs. 1.0 MB. It will get bigger, but it won't reach 5, 10, or 15 MB like similar programs written in Go or Rust.

Correctness and Runtime Speed

These are the things we care most about, and they're looking good.

Test Results

Almost 300 new tests pass in oil-native:

spec-cpp for 0.8.pre6: 1589 osh, 1043 osh_eval.py, 633 osh_eval.cc
spec-cpp for 0.8.4: 1672 osh, 1158 osh_eval.py, 917 osh_eval.cc

I also reviewed these spec-cpp metrics in August, in A Plan for Oil 0.8 and 0.9. Again, the goal is for the 917 osh_eval.cc number to reach the 1672 osh (in Python) number.

OSH spec tests indicate many new features:

OSH spec tests for 0.8.pre6: 1786 tests, 1587 passing, 83 failing
OSH spec tests for 0.8.4: 1883 tests, 1671 passing, 86 failing

And so do Oil spec tests:

Oil spec tests for 0.8.pre6: 254 tests, 232 passing, 22 failing
Oil spec tests for 0.8.4: 297 tests, 274 passing, 23 failing

I described some of these in the previous post, and I'll talk more about them in the next post. Both the OSH and Oil languages are taking a nice shape!

Benchmarks

The parsing benchmarks are still noisy. I'm not sure if this change is significant, but oil-native is still faster than bash at parsing. I should probably switch to something more stable, like instruction counts.

Parser Performance for 0.8.pre6: 775 lines/ms and 207 lines/ms.
Parser Performance for 0.8.4: 635 lines/ms and 234 lines/ms.

The runtime benchmark measures the old Python build, not oil-native, so we don't care about this. What's critically important is to simply run configure scripts under oil-native!

Runtime Performance for 0.8.pre6: 6.0x slower and 7.3x slower than bash.
Runtime Performance for 0.8.4: 7.8x slower and 8.1x slower than bash.

This is the reason we're spending so much effort translating Oil to C++! I haven't made a big deal about it on the blog, but it's an obvious problem.

New Benchmarks

I wrote some synthetic benchmarks to test shell "computation":

compute for 0.8.pre9 (August). Oil is already a bit faster than bash, even though the runtime is "hilariously unoptimized".
compute for 0.8.4. This looks roughly the same, but again the measurements are noisy.

And here are some rough measurements of mycpp's translation:

mycpp-examples for 0.8.0 (September)
mycpp-examples for 0.8.4

Summary: We get a huge speedup on most code, but there are still performance bugs where the translated code is slower than Python! At least one of these is a computational complexity bug.

Size and Build Speed

Again, we compare this release with June's 0.8.pre6 release.

Lines of Source

These are the lines we edit, not those generated. It's still pretty small!

Significant lines:

cloc for 0.8.pre6: 16,792 lines of Python and C, 332 lines of ASDL
cloc for 0.8.4: 17,796 lines of Python and C, 329 lines of ASDL

Physical lines:

src for 0.8.pre6: 31,231 lines of Python
src for 0.8.4: 33,467 lines of Python

Let's add in the Oil language (also counted in src.txt):

0.8.pre6: 4,247 lines of Python
0.8.4: 4,684 lines of Python

Note that OSH and Oil share a lot of common libraries, which are counted under OSH.

Nevertheless, I'm surprised by the small increase, and that's a good thing! I think it's because most of the recent changes happened in the grammar, which is small.

I also included the new Tea language! Many thanks to Batuhan Taskaya for recent help on that. I hope to write more about it soon.

Lines of Native Code (mostly generated)

Almost all lines in the oil-native tarball are generated, and we continue to count them. I've also counted osh_eval.cc, the translation of the core interpreter, by itself.

oil-cpp for 0.8.pre6: 77,236 lines, 24,340 in osh_eval.cc
oil-cpp for 0.8.4: 85,916 lines, 28,584 in osh_eval.cc

This is expected progress, which reflects three things:

More new code written in Python, e.g. for the Four Features that Justify a New Unix Shell.
More existing code type checked and translated with mycpp.
Changes to the translation process, like garbage collection. The amount of C++ code generated for a given Python construct can change.

Binary Size ("C++ Bloat")

The binary is getting bigger along with the lines of translated code. Refer to the June announcement of 0.8.pre6 for the reasons behind this.

ovm-build for 0.8.pre6: 860 KB under GCC. 1,011 KB Clang.
ovm-build for 0.8.0: 921 KB under GCC. 1,085 KB Clang. (September)
ovm-build for 0.8.4: 1,131 KB under GCC. 1,295 KB Clang.

(And I still have to figure out why the size of osh_eval.opt.stripped differs so much between GCC and Clang. Guesses: templates, exception tables, or both.)

Build Speed ("C++ Bloat")

For osh_eval.opt.stripped:

0.8.pre6: 57.5 / 21.2 seconds under GCC. 51.6 / 16.9 seconds under Clang.
0.8.0: 87.2 / 27.5 seconds under GCC. 82.3 / 23.8 seconds under Clang. (September)
0.8.4: 110.5 / 33.8 seconds under GCC. 126.1 / 36.1 seconds under Clang.

This is bad! Compile time basically doubled.
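To put numbers on "doubled", here is a small sketch that divides the 0.8.4 figures by the 0.8.pre6 figures from the list above. The helper names (Growth, PrintGrowth, PrintBuildTimeGrowth) and the sketch namespace are made up for illustration, and the code treats the two figures per compiler simply as "first" and "second" without assuming what each one measures.

```cpp
#include <cassert>
#include <cstdio>

namespace sketch {  // hypothetical helper names, for illustration only

// Ratio of a newer measurement to an older one (2.0 means "doubled").
inline double Growth(double before, double after) {
  assert(before > 0.0);
  return after / before;
}

// Print the growth of both figures reported for one compiler.
inline void PrintGrowth(const char* compiler, double old1, double old2,
                        double new1, double new2) {
  std::printf("%-6s %.2fx / %.2fx\n", compiler, Growth(old1, new1),
              Growth(old2, new2));
}

// Compare 0.8.pre6 with 0.8.4 using the numbers listed above.
inline void PrintBuildTimeGrowth() {
  PrintGrowth("GCC", 57.5, 21.2, 110.5, 33.8);
  PrintGrowth("Clang", 51.6, 16.9, 126.1, 36.1);
}

}  // namespace sketch
```

Calling sketch::PrintBuildTimeGrowth() reports ratios of roughly 1.92x and 1.59x for GCC, and 2.44x and 2.14x for Clang. So "doubled" is a fair summary for GCC, and the Clang build regressed even more.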

I believe this is due to template bloat. We replaced new T(...) with gc_heap::Alloc<T>(...), a function template that uses std::forward() to pass its arguments to the constructor.
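To show why this pattern can multiply generated code, here is a minimal sketch of a forwarding allocator. It is not Oil's actual gc_heap implementation: the sketch namespace, Allocate(), Point, and Token are hypothetical, and malloc stands in for the garbage-collected heap.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

namespace sketch {  // hypothetical namespace, not Oil's gc_heap

// Stand-in for a GC heap: plain malloc, never freed in this sketch.
inline void* Allocate(std::size_t num_bytes) {
  void* p = std::malloc(num_bytes);
  assert(p != nullptr);
  return p;
}

// Variadic factory: constructs a T in place, perfectly forwarding the
// constructor arguments. Every distinct (T, Args...) combination is a
// separate instantiation that the compiler must generate and optimize.
template <typename T, typename... Args>
T* Alloc(Args&&... args) {
  void* place = Allocate(sizeof(T));
  return new (place) T(std::forward<Args>(args)...);
}

struct Point {
  int x;
  int y;
  Point(int x_in, int y_in) : x(x_in), y(y_in) {}
};

// Holds a std::string; its destructor never runs in this sketch.
struct Token {
  std::string text;
  explicit Token(std::string s) : text(std::move(s)) {}
};

}  // namespace sketch
```

For example, sketch::Alloc<sketch::Point>(3, 4) instantiates Alloc<Point, int, int>, while sketch::Alloc<sketch::Token>(std::string("echo")) instantiates Alloc<Token, std::string>. With plain new T(...) there is no such wrapper function per call signature. Anything stored per function, such as a __PRETTY_FUNCTION__ string (a GCC and Clang extension), is also duplicated for each instantiation.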

This shows up in the report from Bloaty:

7   gc_heap::Alloc<>()::__PRETTY_FUNCTION__  48821   107976
8                       _GLOBAL__sub_I_str0  61208    61252
9                   [section .debug_abbrev]      0    54882
10                       gc_heap::Alloc<>()  23494    41924

I plan to look into this further, but again, I think we'll have to live with it for a while. I'm focused on making Oil usable and featureful.

(It's also interesting that Clang was faster on the old code, but is now slower. This pattern held up in 0.8.3 too, so it's not benchmark noise.)

Conclusion

Overall, the build speed is the thing I'm most annoyed by. I expect it to get worse once we fully integrate the garbage collector. For example, I need to generate field masks for every type in the program, and that involves some compile-time computation, e.g. with offsetof().
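As a rough idea of what a field mask computed with offsetof() could look like, here is a sketch under stated assumptions. The Pair type, the 16-bit mask width, and the names MaskBit, kPairMask, and IsPointerWord are all hypothetical. The real generated code may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace sketch {  // hypothetical names, not Oil's generated code

struct Str;  // opaque heap object

// A heap object with two pointer fields and one plain integer.
struct Pair {
  Str* key;
  int count;
  Str* value;
};

// offsetof is only well-defined for standard-layout types.
static_assert(std::is_standard_layout<Pair>::value, "need standard layout");

// Bit i is set when the pointer-sized word at byte offset
// i * sizeof(void*) holds a pointer the collector must trace.
constexpr std::uint16_t MaskBit(std::size_t byte_offset) {
  return static_cast<std::uint16_t>(1u << (byte_offset / sizeof(void*)));
}

// Evaluated entirely at compile time.
constexpr std::uint16_t kPairMask =
    MaskBit(offsetof(Pair, key)) | MaskBit(offsetof(Pair, value));

// Would a collector follow the word at this index?
inline bool IsPointerWord(std::uint16_t mask, unsigned word_index) {
  assert(word_index < 16);
  return ((mask >> word_index) & 1u) != 0;
}

}  // namespace sketch
```

On typical 32-bit and 64-bit ABIs, key lands in word 0, count in word 1, and value in word 2, so kPairMask comes out as binary 101. Each mask is cheap on its own, but one constant like this for every type in the program is extra work for the compiler.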

If you're experienced with these issues, I'd love some help! Let me know in the comments.

Again, I think that if you compare Oil to a Go or Rust binary, none of this is a big deal. But I want there to be "no reason to use bash rather than Oil", and these issues matter for embedded systems, which occasionally use bash.

But it's much more important to solidify the OSH language and the Oil language. The next post will talk about that work, which includes:

An overhaul of shell options that affect parsing and runtime. Options are the mechanism we use to evolve shell into a better language. Remember that "The Unix Shell Should Evolve Like Perl 5".
An overhaul of variable scope. Oil is now a lot cleaner and safer in this respect!
New features like doc comments with ###, and the pp builtin.

Appendix: Blog Roadmap

More Changes to OSH and Oil
The Shell Programmer's Guide to errexit / set -e. Fixing errexit is one of the Four Features that Justify a New Unix Shell. But we'll also help users of existing shells!
Responses to comments on that post:
- The Biggest Misconception About Shell is that you would replace all your Python with shell. Shell is for gluing programs in different languages together! I write my own tools in Python to invoke in shell, and have a concrete list of them. Factoring into processes is a design skill. Tools are more reusable than "libraries".
- More posts in #shell-the-good-parts