I've released version 0.2 of OSH, a bash-compatible shell:
To build and run it, follow the instructions in INSTALL.txt.
Please try it on your shell scripts and report bugs!
Caveat: it's still too big and too slow. However, this release takes the first step toward rectifying that. Read on for details.
This is also the first release where someone besides me has changed either the OSH runtime or the test suite! (Several others have sent build patches in the past, but I view these changes as a milestone.)
Looking at the changelog:
-osh, which is how login shells are invoked.
timetoplatypuson lobste.rs contributed an Arch Linux package for OSH. I started a Wiki page listing Oil deployments.
A note on contributing:
OSH is well-tested, and test cases are easy to add. In most cases, writing a good test case solves 50% of the problem. So, if you don't have time to dive into the code, I still appreciate test-only patches.
In the last status update, I mentioned that OSH should run the shell scripts that build Alpine Linux. However, I immediately hit two problems:
The good news is that this release largely fixes the second problem. I rewrote the wild test harness and fixed all unhandled exceptions uncovered by this "fuzzing". Here are the results:
The page above shows OSH parsing over a million lines of shell scripts found in the wild. This includes every APKBUILD file in Alpine Linux, which is 5,106 shell scripts totalling 254,565 lines of code.
It also includes:
Given the volume and diversity of the scripts that OSH now handles, I would say that the OSH language is almost completely defined.
(The architecture of the parser is strict and principled, but the details must be determined empirically. That is, we parse many scripts and see what fails.)
The summary at the top of the wild test results says that there are 229 scripts that OSH can't parse. However, most of these aren't problems with OSH. Here is a sampling of the underlying reasons:
.shextension that aren't shell scripts (e.g. ld-version.sh in the Linux kernel is an Awk script.)
$(( x > 3.5 ))is a syntax error because we don't support floating point numbers, but
There are also constructs that I've intentionally defined out of OSH. I started documenting these in a nascent OSH manual.
For example, the problem I wrote about with $(( appears in perf-with-kcore.sh in the Linux kernel.
There are also some OSH bugs, like incorrect parsing of bash's regex literals, but this is a small number of cases. Subsequent releases will fix these problems.
Important: This is a good time to suggest other complex shell scripts that OSH should parse. I'll be more likely to include an odd/rare shell construct in the OSH language definition now, as opposed to a year from now.
Leave a comment or e-mail me!
In the last status update, I also mentioned that I uncovered three performance bottlenecks by profiling OSH.
I addressed the first bottleneck in this release by optimizing the parsing of here documents. Surprisingly, I also found correctness problems, which led me to change the algorithm. In the next post, I'll issue a correction to How To Parse Here Documents.
OSH now has benchmarks that show this change. They will be published with every release.
While OSH is still more than a hundred times slower than other shells, it's
improved significantly. Parsing a
configure script went from 14.8
seconds to 8.5 seconds on a fast machine, and from 32.0 to 23.0
seconds on a slow machine.
My goal for OSH is to parse scripts as fast as other shells, while statically parsing the whole program and producing a more detailed representation of it.
Other notable changes from the changelog:
stripbinaries, but this breaks the OSH app bundle. Thanks to
timetoplatypusfor bringing this to my attention.
readonly x=1is an assignment, while
readonly -pis a command that prints the shell's state.
In this post, I summarized contributions from three people. Then I wrote about parser correctness and performance. There is new infrastructure for wild tests and new parser benchmarks.
For the next release, I plan to keep my eye on the ball and continue optimizing the parser:
The lexer and the LST are like the "bookends" of the parser. Although the parser will still be written in thousands of lines of Python, I hope that optimizing its input and output will make it at least five times faster.
Leave a comment if you have feedback on the direction of the project!
In this section, I compare this release vs. the previous one. I put
significant work into the
oilshell.org/release/$VERSION/ trees so that this
analysis is possible.
What improves the spec test metrics is implementing new features, like the
getopts builtin. In this release, I did more work on the parser, so the
numbers haven't moved as much.
Most of the work in this release went toward the supporting infrastructure, which I summarize here (OSH 0.2 only):
# Lines Category 2,380 Build Automation 5,242 Test Automation 970 Benchmarks 974 Web (used by tests and benchmarks) ----- ----- 9,566 TOTAL 7,116 Spec Tests 4,253 Oil Unit Tests 543 Other Unit Tests ----- ----- 11,912 TOTAL
The supporting code adds up to 21,478 lines of code, dwarfing OSH itself! I view this is as a good thing. It keeps the project on track, but adds no weight to what we ship.
This gives me confidence that OVM can be made smaller and lighter, which will make OSH smaller and lighter.