Oil 0.8.pre6 - Pure Bash and C++

2020-06-18

This is the latest version of Oil, a Unix shell that's our upgrade path from bash:

Oil version 0.8.pre6 - Source tarballs and documentation.

To build and run it, follow the instructions in INSTALL.txt. The wiki has tips on How To Test OSH.

Table of Contents

Highlights

Patch to run the mal Lisp

Closed Issues

What's Next?

Appendix: Selected Metrics

Highlights

(1) Fixes so OSH can run two interpreters written in bash:

The kanaka/mal Lisp. I implemented negative slice indices for strings like ${s : -5 : -2}, and submitted a patch upstream. The next section explains the patch.
Crestwave's Brainfuck interpreter. I fixed a corner case with literal hyphens in globs like ${x//['+-']}, and fixed the semantics of unset a[-1].
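The negative slice rules from the first item are easy to state in code. Below is a minimal Python sketch. It assumes the semantics described in the bash manual: a negative offset counts back from the end of the string, and a negative length is an end position counted from the end, not a character count. The function name bash_substring is hypothetical, and this is not OSH's implementation. (In bash source code, a negative offset must be separated from the colon by a space, so it isn't parsed as the :- operator.)

```python
def bash_substring(s, offset, length=None):
    """Hypothetical model of ${s:offset:length} on a plain string.

    Assumption: follows the rules in the bash manual; not OSH's actual code.
    """
    n = len(s)
    # A negative offset counts back from the end of the string.
    start = n + offset if offset < 0 else offset
    if start < 0 or start > n:
        # An out-of-range offset expands to the empty string.
        return ""
    if length is None:
        end = n
    elif length < 0:
        # A negative length is an end position counted from the end.
        end = n + length
        if end < start:
            raise ValueError("substring expression < 0")
    else:
        end = min(start + length, n)
    return s[start:end]


if __name__ == "__main__":
    s = "0123456789"
    print(bash_substring(s, -5, -2))  # offset -5, length -2 -> 567
```

For example, with a 10-character string, offset -5 starts at index 5 and length -2 stops at index 8, so the result is the three characters in between.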

These are pure programs in the sense that they do little I/O. For example, they don't start external processes like typical shell scripts do. Pure programs are useful for testing the speed of the OSH interpreter itself. They're also useful because translating I/O code to C++ is separate work.
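The hyphen fix for the Brainfuck interpreter concerns bracket expressions in patterns. Inside [...], an unquoted hyphen between two characters denotes a range, but quoted characters match literally, so ${x//['+-']} should delete every + and - in x. Here's a minimal Python sketch that translates a bracket expression into a regex character class under that rule. The (text, quoted) representation and the helper names are hypothetical, and the sketch ignores negation and classes like [:alpha:].

```python
import re


def bracket_to_regex(parts):
    """Translate a glob bracket body into a regex character class.

    parts: list of (text, quoted) pairs, e.g. [("+-", True)] for ['+-'].
    Hypothetical model: negation ([!...]) and [:class:] are not handled.
    """
    chars = []
    for text, quoted in parts:
        for c in text:
            if c == "-" and not quoted:
                # An unquoted hyphen keeps its range meaning, as in a-z.
                chars.append("-")
            else:
                # Everything else, including a quoted hyphen, is literal.
                chars.append(re.escape(c))
    return "[" + "".join(chars) + "]"


def patsub_all(value, parts, replacement=""):
    """Model ${value//[...]/replacement}: replace every matching character."""
    pattern = bracket_to_regex(parts)
    # A function replacement keeps backslashes from being interpreted.
    return re.sub(pattern, lambda m: replacement, value)


if __name__ == "__main__":
    print(patsub_all("1+2-3", [("+-", True)]))     # quoted: deletes + and -
    print(patsub_all("abcxyz", [("a-c", False)]))  # unquoted: range a-c
```

Translating globs to regexes is one common approach; escaping each literal character is what keeps a quoted hyphen from turning into a range operator.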

(2) Fixes to run the ShellSpec test framework. For example, shopt -u verbose_errexit makes OSH silent on errexit, like other shells. Feedback on the shell's verbosity is still welcome.

Keep the bug reports coming! The fixes were minor, which is evidence that OSH is maturing.

(3) More progress translating Oil to C++, i.e. oil-native. The last release announcement described this work.

The main bundle osh_eval.cc now has ~24K lines of code, up from ~21K.
The native interpreter passes 633 spec tests, up from 420. More details below.

Note that these two strands of work have yet to converge. That is, the Python version of OSH can run Lisp, Brainfuck, and a JSON parser, but oil-native can't yet.

Achieving that may be a good milestone for version 0.8.0!

Patch to run the `mal` Lisp

Pull Request 518 on kanaka/mal was merged a couple days ago, and it shows the changes necessary to run mal under OSH. To justify the changes, I linked to sections in the Known Differences doc:

Arrays and strings are distinct in OSH, but not in bash.
Associative array keys must be quoted, because Parsing Bash is Undecidable. That is, OSH avoids dynamic parsing based on the runtime type of the variable.
I moved a constant regex into a variable, as the bash manual suggests. This is needed because OSH doesn't have a special lexing mode for the pattern in [[ x =~ $pat ]].

In my mind, each of these changes improved the program. So this is evidence that OSH is delivering on its claims to be a stricter, saner language, while still running real bash programs.

Closed Issues

Thanks to Crestwave and Koichi Nakashima for testing and reporting bugs!

#774	eval 'break' doesn't break, 'source return.sh' doesn't return, etc.
#772	Support negative indices in string slices
#769	read -n1 </dev/null succeeds
#768	Unsetting the last element of an array and appending an element behaves differently than bash
#765	Incorrect handling of literal hyphen in patsub
#763	Suppress some errexit messages

In addition to those issues, see the full git log.
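Issue #768 involves the fact that bash's indexed arrays are sparse. According to the bash manual, a negative subscript is interpreted relative to one greater than the array's maximum index, and a+=(...) appends starting at one greater than the maximum index. So in a dense array like (x y z), an append after unset 'a[-1]' reuses index 2. Here's a minimal Python sketch of those rules, modeling the array as a dict from index to value. The class and method names are hypothetical; this isn't OSH's implementation.

```python
class SparseArray:
    """Hypothetical model of a bash indexed array: a sparse index -> value map."""

    def __init__(self, *values):
        self.items = dict(enumerate(values))

    def _max_index(self):
        # -1 for an empty array, so appending starts at index 0.
        return max(self.items, default=-1)

    def _resolve(self, index):
        # Negative subscripts are relative to one greater than the max index.
        if index < 0:
            index += self._max_index() + 1
            if index < 0:
                raise IndexError("bad array subscript")
        return index

    def unset(self, index):
        # Unsetting an index that isn't set does nothing.
        self.items.pop(self._resolve(index), None)

    def append(self, *values):
        # Like a+=(...): start at one greater than the current max index.
        start = self._max_index() + 1
        for offset, v in enumerate(values):
            self.items[start + offset] = v


if __name__ == "__main__":
    a = SparseArray("x", "y", "z")  # a=(x y z)
    a.unset(-1)                     # unset 'a[-1]'
    a.append("w")                   # a+=(w)
    print(a.items)
```

The key design point is that the maximum index is recomputed from the elements that are still set, rather than remembered from before the unset.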

What's Next?

I'm very focused on translation to C++. Running these pure bash programs under oil-native will be a great milestone, and it opens up the possibility of using the existing benchmarks in the kanaka/mal repository. How fast does a Lisp run under OSH?

Such benchmarks aren't representative of all shell programs, but they are representative of programs that do a lot of string processing, like autocompletion plugins.

I'd still like more help with testing, bug fixes, and evaluating the Oil language. As I mentioned in the last post, Oil is looking more like a language/library for building shells than a shell itself.

If you want it to be more than that, please get involved! Send us a message on GitHub or Zulip.

Appendix: Selected Metrics

Let's compare this release with last month's 0.8.pre5 release.

Lines of Native Code

Most commits in this release were related to C++ translation. Here's evidence of that:

oil-cpp for 0.8.pre5: 69,840 lines, 20,875 in osh_eval.cc
oil-cpp for 0.8.pre6: 77,236 lines, 24,340 in osh_eval.cc

Binary Size

I still have to figure out why the size of osh_eval.opt.stripped differs so much between GCC and Clang.

ovm-build for 0.8.pre5: 759,072 bytes under GCC, 894,128 under Clang.
ovm-build for 0.8.pre6: 859,656 bytes under GCC, 1,010,872 under Clang.
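One quick way to compare these numbers is to compute the growth between releases. This Python snippet just does that arithmetic on the line counts and binary sizes reported in this appendix; it's a calculator, not part of Oil's benchmark tooling.

```python
# Release-over-release growth, using the 0.8.pre5 -> 0.8.pre6 figures above.
metrics = {
    "oil-cpp lines": (69_840, 77_236),
    "osh_eval.cc lines": (20_875, 24_340),
    "binary bytes, GCC": (759_072, 859_656),
    "binary bytes, Clang": (894_128, 1_010_872),
}


def growth(old, new):
    """Return the absolute and percent change from old to new."""
    delta = new - old
    return delta, 100.0 * delta / old


if __name__ == "__main__":
    for name, (old, new) in metrics.items():
        delta, pct = growth(old, new)
        print(f"{name}: +{delta:,} ({pct:.1f}%)")
```

By this arithmetic, the stripped binary grew roughly 13% under both compilers, while the total oil-cpp line count grew roughly 11%.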

I also think that the binary is too big, considering the size of the source code. By comparison, bash has over 140K lines of code, and its binary is also about 1 MB.

I spent some time with bloaty and it looks like some of the problems are:

Exception tables. Oil uses exceptions but bash doesn't.
Extra pretty-printing methods generated by ASDL. These can be removed with some effort.
Global strings that are dynamically allocated. This is a result of the rough mycpp translation, which still needs improvement in many areas.
mycpp compiles every Python % format string into a C++ function, which increases the code size. I think Rust programs often run into an analogous issue.
Constant data. Oil has about twice as much as bash, maybe due to more detailed error messages.

However, to put this in context, common shells written in Rust or Go are often 10x bigger than Oil and bash: more like 10 MB than 1 MB.

But my goal is to sneak the Oil language in "for free", so to speak. Making the binary smaller is future work.

Build Speed

For osh_eval.opt.stripped:

0.8.pre5: 58.2 / 18.2 seconds under GCC, 44.3 / 14.3 under Clang
0.8.pre6: 57.5 / 21.2 seconds under GCC, 51.6 / 16.9 under Clang

Compared to oil.ovm:

0.8.pre5: 50.2 / 18.9 seconds under GCC, 50.9 / 14.6 under Clang
0.8.pre6: 53.0 / 19.6 seconds under GCC, 51.9 / 15.1 under Clang (redundant)

Compared to bash make:

0.8.pre5: 69.1 / 25.5 seconds under GCC
0.8.pre6: 69.4 / 25.7 seconds under GCC (redundant)

So oil-native compiles about as fast as our slice of CPython, and a bit faster than bash.

That said, it has fewer lines of code than bash, so I think it should build even faster. Getting rid of small translation units (.cc files) may improve build times. This is also future work.

Test Results

More builtins run in C++, leading to over 200 new tests passing:

spec-cpp for 0.8.pre5: 1560 osh, 534 osh_eval.py, 420 osh_eval.cc
spec-cpp for 0.8.pre6: 1589 osh, 1043 osh_eval.py, 633 osh_eval.cc

(Minor harness bugs: I fixed the inconsistency between 1539 and 1560 that showed up last time, but there's still an inconsistency between 1589 above and 1587 below.)

OSH spec tests:

OSH spec tests for 0.8.pre5: 1762 tests, 1560 passing, 84 failing
OSH spec tests for 0.8.pre6: 1786 tests, 1587 passing, 83 failing

Oil spec tests:

Oil spec tests for 0.8.pre5: 253 tests, 231 passing, 22 failing
Oil spec tests for 0.8.pre6: 254 tests, 232 passing, 22 failing

Significant lines of code:

cloc for 0.8.pre5: 16,521 lines of Python and C, 312 lines of ASDL
cloc for 0.8.pre6: 16,792 lines of Python and C, 332 lines of ASDL

Physical lines:

src for 0.8.pre5: 30,703 lines of Python
src for 0.8.pre6: 31,231 lines of Python

Benchmarks

These benchmarks are still noisy, but roughly unchanged. The parser benchmark measures C++ code, while the runtime benchmark measures Python code for now.