Why Sponsor Oils? | blog | oilshell.org
Let's do something hard, and go all the way back to the first post on the project:
Let's see how much code we've added, and let's see if the ideas made sense. There's no better test than reading and evaluating what you wrote years ago :-)
In 2016, I showed this summary of the code:
PYTHON SKETCH
...
1044 sketch/word_parse.py
1299 sketch/cmd_parse.py
10315 total
SHELL TESTS
...
This was 6 months into the project, and we had 10 K lines of Python code, and many tests.
That report evolved into the ones I publish on the release quality page:
wc -l
oils-for-unix
tarball.
Let's arrange these numbers in columns:
Component | Physical Lines, 2016 | Physical Lines, 2024 | Significant Lines, 2024 | Notes |
OSH | 10 K | 44 K | 23 K | Compare with ~142 K lines of bash |
YSH | - | 9 K | 5 K | |
Data Notation | - | 2 K | 1 K | |
Garbage Collected Runtime | - | 5 K | 4 K | Hand-written C++ |
OS Bindings | - | 3 K | 2 K | Hand-written C++ |
Total Hand-Written Source | 10 K | 64 K | 35 K | |
Total Generated Code | - | 122 K | | |
mycpp Translator | - | 7 K | | Not shipped at runtime |
Spec Tests | 3 K | 54 K | | |
I like this! We have 64 K physical lines / 35 K significant lines in the major components of the project: OSH, YSH, J8 Notation, and the C++ runtime.
All of Oils — including YSH and J8 Notation — has less source code than bash (~142 K lines).
This is despite the fact that YSH has "real" data structures, garbage collection, and more. (The next post will emphasize this.)
And it's not just our source code that's smaller than bash. Our generated code is too: 122 K lines versus ~142 K. This matters because we read, debug, and profile it.
So the last post showed that the Oils project is big, but now we see that its source code is small.
The appendix links to selected source files, which may give you a feeling for why this is.
(Caveat: I'm counting only Python and C++ code, which is ~7 out of the 13 parts. I'd like to join and fully automate the 3 line count reports, to account for all 13.)
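To make the two columns in the table concrete, here is a minimal sketch of the difference between physical and significant lines. It assumes that "significant" means non-blank and non-comment, which is a common definition but not necessarily the exact rule behind the published reports. The function name `count_lines` and the sample text are made up for illustration.

```python
def count_lines(text, comment_prefix="#"):
    """Return (physical, significant) line counts for a source string.

    Assumption: a line is "significant" if it is neither blank nor a
    whole-line comment. Real line counters may use different rules.
    """
    physical = 0
    significant = 0
    for line in text.splitlines():
        physical += 1
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            significant += 1
    return physical, significant


# Hypothetical sample: 5 physical lines, of which 2 are significant.
SAMPLE = """\
# A comment

def f(x):
    # another comment
    return x + 1
"""

if __name__ == "__main__":
    physical, significant = count_lines(SAMPLE)
    print("physical=%d significant=%d" % (physical, significant))
```

Physical lines are what `wc -l` reports. The significant count is always the same or smaller, which is why the 2024 columns differ by almost a factor of two.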
The table of line counts suggests how the project has changed.
YSH now exists.
J8 Notation now exists.
We have 8 years of features and functionality, but not 8 years of code.
Why is the code short? Because mycpp now exists.
It's funny to me that the first post can be read as an apology for showing Python code, not C++:
I actually started writing it in C++. But after getting to 3K lines of code in the spring, it began to feel onerous.
I also hinted at what was to come:
Or even better than porting is to use Python as a metaprogramming language for C++.
After some diversions and missteps, this largely came true. We now have a nice situation:
This is what I call the middle-out style. But it certainly took a long time to get here.
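As a rough illustration of "Python as a metaprogramming language for C++", here is a toy sketch that generates a C++ enum from a Python list. It is not mycpp, and it is not how the Oils build works. The function `gen_cpp_enum` and the member names are hypothetical, chosen only to show how a few lines of source can expand into more lines of generated code.

```python
def gen_cpp_enum(name, members):
    """Return C++ source text for an `enum class` with the given members.

    Hypothetical helper for illustration only; each member gets an
    explicit integer value in declaration order.
    """
    lines = ["enum class %s {" % name]
    for i, member in enumerate(members):
        lines.append("  %s = %d," % (member, i))
    lines.append("};")
    return "\n".join(lines) + "\n"


# Made-up member names; the real definitions live in the project's own files.
TOKEN_KINDS = ["Word", "Op", "Redir", "Eof"]

if __name__ == "__main__":
    print(gen_cpp_enum("Kind", TOKEN_KINDS), end="")
```

The short Python description is the part people read and edit, while the C++ text is an output of the build. That is one reason the table lists hand-written source and generated code separately.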
I think so, but it's hard to argue that in a short space. For now, I'll abbreviate the argument with some slogans:
Benefits of the Middle-Out Style:
A slight surprise: the command.Redirect refactoring in Oils 0.23.0 made the interpreter faster.
Oils is a big project, with 8 years of functionality, but it's a small codebase. And that was always the goal!
What's next? I extracted two posts from this one:
This was the original plan for the series:
Why is our code short? I publish selected source files with every release, and they may give you a feel for this:
_gen/frontend/id_kind.asdl_c.h - generated
_gen/_tmp/match.re2c-input.h - 1277 lines, generated
frontend/syntax.asdl - 653 lines
ysh/grammar.pgen2 - 538 lines
core/value.asdl - 174 lines
Let me know if you need help reading these files! Together, they form a concise description of the many interleaved languages in Oils. They're a big part of what I think of as the executable spec.