blog |

OSH 0.3 - Optimization, Benchmarks, and Bug Fixes


I've released version 0.3 of OSH, a bash-compatible shell:

To build and run it, follow the instructions in INSTALL.txt. Please try it on your shell scripts and report bugs!

This release contains:

  1. Optimizations. Compared to the OSH 0.2 release, the parser is 6 to 7 times faster and uses 60% to 32% of the memory.
  2. New benchmarks and tests published on the /release/$VERSION/ page.
  3. Fixes to several bugs uncovered by running real shell scripts.

Despite the optimizations, the same caveat still applies: it's too big and too slow. Read on for details.

Table of Contents
Code Changes
Lexer Optimization
ASDL Optimization
An Aside on Metaprogramming and Code Generation
New Benchmarks
Other Changes
Under the Hood
What's Next?
Appendix: Other Project Metrics


Although OSH has many automated tests, I'm still looking for more shell users to run their programs with it and report bugs. Shell is a big language, and each person uses a different subset of it.

Fixing the bugs above only took a few minutes each. I prioritize bug fixes because I'm optimizing the code, and it's not good to optimize incorrect code!

Code Changes

The section is a summary of the full changelog.

Lexer Optimization

Translating the lexer to native code via re2c roughly doubled the overall speed of the parser. I'm currently writing a series of blog posts on this topic. So far, I've published an introduction and an explanation of lexer modes.

ASDL Optimization

Recall that OSH represents shell programs with a lossless syntax tree, and the LST is described with a domain-specific language called ASDL.

Prior to this release, OSH parsed the ASDL schema at startup, and used Python's metaclass mechanism to generate classes at runtime.

In this release, I instead generate Python code as text, at build time. This roughly triples the speed of the parser (on top of the lexer optimization).

The generated code uses Python's __slots__ mechanism, which accounts for the large decrease in memory usage. A big file takes 32% of the memory to represent, compared to OSH 0.2.

Related, short explanation: Saving 9 GB of RAM with Python's __slots__

An Aside on Metaprogramming and Code Generation

It's interesting that I sped up the parser without actually touching its code!

Instead, I optimized its "book ends": the lexer and the LST. The parser pulls tokens from the lexer, and pushes nodes onto the LST.

Furthermore, I didn't touch the logic of these two components. Instead, I generated low-level code from high level descriptions:

  1. For the lexer, a list of regex → token ID mappings becomes a series of switch and goto statements in C.
  2. For the LST, a schema that uses the model of algebraic data types becomes a sequence of Python class definitions. I've also generated C++ types from this schema for oheap, although this code isn't currently used.

New Benchmarks

With the last release, I began measuring the time to parse 10 large shell programs. As of this release, I also measure memory usage.

For example, consider the first row of the second table on the parser benchmarks. It shows that after OSH parses a ~1700 line shell program, the VmPeak of the process is 24.7 MB.

In addition, I created three new benchmarks:

  1. OSH Runtime. On a fast machine, Python's configure script takes 13.3 seconds to run under OSH, but 10.6 seconds to run under bash. This isn't a huge difference, because the script spends most of its time in external processes.

    However, there are shell scripts that are almost 50x slower under OSH! I will fix this.

    This benchmark also suggests that OSH mostly uses memory to parse shell programs, as opposed to running them.

  2. Virtual Memory Baseline. I compare the memory usage of various idle shells. Because OSH is based on the Python interpreter, it uses more memory than any other shell.

  3. OHeap Encoding Size. The oheap encoding of the LST isn't used right now, but in the future it may reduce the memory footprint of OSH.

These benchmarks will be published with every release on the /release/$VERSION/ page, and I'll use them to guide further optimization. I also plan to add more benchmarks, e.g. to track the compiled binary size.

Other Changes

  1. A few small optimizations to the lexer and parser that contributed to the 6-7x speedup.
  2. Fixes/enhancements to the runtime, like implementing the obscure >| redirect, and fixing the execution of case clauses with multiple patterns.

Under the Hood

  1. The binary got smaller after removing unecessary usages of collections.deque() and the json module. (The build process for the OVM app bundle only includes modules actually imported.)
  2. oheap encoding of the LST now works. I'm testing it on big, complex shell scripts.
  3. ASDL reflection now has a more uniform API. Serializing the LST to oheap and pretty-printing it as text both use reflection.

What's Next?

Profiling and optimizing the OSH front end has given me many more ideas for further optimization.

However, the parsing speed is no longer a blocking issue for many applications. The 30% penalty for running Python's configure script under OSH isn't a dealbreaker.

For the next release, I plan to continue to improve correctness by running OSH on more shell scripts.

It already runs:

But now is a good time to go further in this direction. I plan to resume work on building Linux distros like Alpine Linux and Debian with OSH.

Appendix: Other Project Metrics

Of course, I'm still publishing test results and line counts. In addition to getting faster, OSH is also more correct:

Most of the code volume in this release has gone toward code generators and benchmarking scripts, as opposed to OSH proper:

Notable changes: