What Oil Looks Like, and a Plan for This Blog


When I tell people I'm writing a new Unix shell, I tend to get blank stares. In this blog, I plan to make a case that the shell is an interesting problem.

But for this first post, I'll just give a glimpse of my progress. Here's what the code looks like right now:

    11 sketch/setup.py
    28 sketch/fake_libc.py
    48 sketch/arith_eval.py
    68 sketch/parse_lib.py
    80 sketch/base.py
    81 sketch/ui.py
    98 sketch/reader.py
   106 sketch/util.py
   116 sketch/value.py
   182 sketch/pool.py
   213 sketch/arith_parse.py
   221 sketch/bool_eval.py
   275 sketch/libc.c
   323 sketch/pysh.py
   337 sketch/bool_parse.py
   366 sketch/builtin.py
   462 sketch/tokens.py
   503 sketch/lexer.py
   538 sketch/process.py
   620 sketch/node.py
   776 sketch/completion.py
   796 sketch/word_eval.py
   808 sketch/cmd_exec.py
   916 sketch/word.py
  1044 sketch/word_parse.py
  1299 sketch/cmd_parse.py
 10315 total

    5 tests/bugs.test.sh
    7 tests/process-sub.test.sh
   11 tests/test-builtin.test.sh
   24 tests/builtins.test.sh
   24 tests/extended-glob.test.sh
   25 tests/tilde.test.sh
   38 tests/case.test.sh
   42 tests/var-ref.test.sh
   54 tests/for-let.test.sh
   54 tests/zsh-assoc.test.sh
   61 tests/command-sub.test.sh
   63 tests/append.test.sh
   71 tests/assign.test.sh
   82 tests/explore-parsing.test.sh
   85 tests/brace-expansion.test.sh
   87 tests/smoke.test.sh
   88 tests/loop.test.sh
   89 tests/assoc.test.sh
   92 tests/arith-context.test.sh
   97 tests/func.test.sh
  108 tests/word-split.test.sh
  109 tests/redirect.test.sh
  123 tests/glob.test.sh
  126 tests/arith.test.sh
  139 tests/quote.test.sh
  144 tests/posix.test.sh
  146 tests/var-sub-quote.test.sh
  206 tests/shell-grammar.test.sh
  214 tests/dbracket.test.sh
  260 tests/here-doc.test.sh
  260 tests/var-sub.test.sh
  285 tests/array.test.sh
 3219 total

   602 test_sh.py

This is an interpreter for the shell language in about 10K lines of Python. It's not complete, but after writing this code, I feel like it can be completed. Writing a shell is a big job!

I actually started writing it in C++. But after getting to 3K lines of code in the spring, it began to feel onerous.

The challenge is really understanding all the nooks and crannies of the shell language. If I misunderstood a syntactic feature, which happened constantly, I would have to tweak a class definition. Redundant header files and long build times make that an annoyance.

So I would often write little sketches in Python first, to test whether my parsing algorithm matched reality. Over time I decided to just implement the whole thing in Python, and port it to C++ later.

Python is malleable and good at text manipulation, so it was pretty nice for writing the lexer and multiple parsers. But it also has built-in bindings to raw system calls like fork(), exec(), and dup2(), so I wrote the runtime portion as well. (Though perhaps it isn't "production quality" due to issues with signals and so forth.)
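To make the system call point concrete, here is a minimal sketch of how a redirect like `echo hello > out.txt` can be built from those bindings. It assumes Python 3 on a POSIX system. The function name run_with_stdout_to is made up for illustration and is not taken from the sketch/ code.

```python
import os


def run_with_stdout_to(argv, path):
    """Run argv in a child process with its stdout redirected to path.

    Returns the child's exit status. (Hypothetical helper, for illustration.)
    """
    pid = os.fork()
    if pid == 0:
        # Child: make file descriptor 1 (stdout) refer to the file, then exec.
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)
                os.close(fd)
            os.execvp(argv[0], argv)  # Does not return on success.
        finally:
            # Only reached if open() or exec() failed.
            os._exit(127)

    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)
```

Calling run_with_stdout_to(['echo', 'hello'], 'out.txt') should leave "hello" in out.txt and return 0. Note that this sketch ignores signals entirely, which is exactly the kind of gap mentioned above.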

So my strategy was roughly:

1) Read over the POSIX shell spec, and port the official grammar to ANTLR. I didn't use ANTLR to generate code; I used it to machine-check the grammar, so I was more confident in my understanding of the language. The rules in the grammar form the skeleton of a recursive descent parser hand-written in Python.

2) Write a test framework which can run code snippets through any shell and make assertions on the output.

3) Write detailed test cases to explore each shell feature, and run them against bash, dash, mksh, and zsh. Examine the output and decide upon the "correct" behavior for my shell.

It's not hard to find bugs this way, but I've also learned that shells are, in general, highly POSIX-compliant. (They are also highly ksh-compliant!)

4) Write my own interpreter, sketch/pysh.py, using these proven tests as a guide. Refactor mercilessly.
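To illustrate what step 1 means by grammar rules forming the skeleton of a recursive descent parser, here is a toy sketch with one method per rule. The rule names and_or, pipeline, and simple_command come from the POSIX grammar, but everything else is a simplification I'm assuming for the example: tokens are just whitespace-separated strings, there is no quoting, and the class and function names are hypothetical rather than copied from sketch/cmd_parse.py.

```python
# Simplified grammar (an assumption, much smaller than the POSIX one):
#
#   and_or         : pipeline (('&&' | '||') pipeline)*
#   pipeline       : simple_command ('|' simple_command)*
#   simple_command : WORD+

OPERATORS = ('|', '&&', '||')


class ParseError(Exception):
    pass


class Parser(object):
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the current token, or None at end of input."""
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return None

    def advance(self):
        """Consume and return the current token."""
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_simple_command(self):
        words = []
        while self.peek() is not None and self.peek() not in OPERATORS:
            words.append(self.advance())
        if not words:
            raise ParseError('expected a word, got %r' % self.peek())
        return ('simple_command', words)

    def parse_pipeline(self):
        commands = [self.parse_simple_command()]
        while self.peek() == '|':
            self.advance()
            commands.append(self.parse_simple_command())
        if len(commands) == 1:
            return commands[0]
        return ('pipeline', commands)

    def parse_and_or(self):
        node = self.parse_pipeline()
        while self.peek() in ('&&', '||'):
            op = self.advance()
            # Left-associative: the tree so far becomes the left operand.
            node = ('and_or', op, node, self.parse_pipeline())
        return node


def parse(line):
    """Parse one line into a nested tuple tree."""
    return Parser(line.split()).parse_and_or()
```

With this sketch, parse('ls -l | wc -l && echo ok') returns an and_or node whose left child is a pipeline of two simple commands and whose right child is the simple command echo ok. When a grammar rule turns out to be wrong, the fix is usually confined to the one method named after it.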

The result is that I know the sh language inside and out, and I've converged on a clean software architecture.
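The cross-shell comparison in steps 2 and 3 can be sketched roughly as follows. This is not the real test_sh.py, just a small stand-in under some assumptions: Python 3, each shell accepts `-c`, and shells that aren't installed are silently skipped. The function names are hypothetical.

```python
import shutil
import subprocess


def run_snippet(shell, code):
    """Run code with `shell -c` and return (stdout, exit status)."""
    proc = subprocess.run(
        [shell, '-c', code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.stdout, proc.returncode


def compare_shells(shells, code):
    """Map each installed shell to the (stdout, status) it produced."""
    results = {}
    for shell in shells:
        if shutil.which(shell) is None:
            continue  # Not installed on this machine; skip it.
        results[shell] = run_snippet(shell, code)
    return results


def group_by_behavior(results):
    """Invert the results: map each (stdout, status) to the shells that gave it."""
    groups = {}
    for shell, outcome in results.items():
        groups.setdefault(outcome, []).append(shell)
    return {outcome: sorted(names) for outcome, names in groups.items()}
```

For example, group_by_behavior(compare_shells(['bash', 'dash', 'mksh', 'zsh'], 'echo ${x:-default}')) shows at a glance whether the installed shells agree. When there is more than one group, that's a case where I have to decide what the "correct" behavior is.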

Now I'm a lot more confident that I can write a high-quality shell in C++. Since shell is a fixed target, I believe that prototyping it in Python and then porting it to C++ will be faster than slogging through it from beginning to end in C or C++.

(Even better than porting would be to use Python as a metaprogramming language for C++. More on that later.)

The first line of the sketch was written 6 months ago:

commit 382dbe29a7bce242cd62771489816ee69a9a17fa
Author: Andy C <...>
Date:   Wed Apr 20 01:54:26 2016 -0700

    figuring it out in Python

M       NOTES.txt
A       demo/repl_parse.py

Given that my shell is closer to the bash language than the POSIX shell subset, and that I had some diversions into Make and Awk (e.g. bwk), I think that's a reasonable amount of time.

A Plan for the Blog

I have a lot of incomplete documentation about oil. Nobody has seen it yet, and nobody has seen the code either.

So I'm going to take an incremental approach and write something in this blog every day. Each entry might be very short, but that's probably a good thing.

Tomorrow I will show that oil does something marginally useful: it can parse the Aboriginal Linux shell scripts.