Home

Surprisingly, nobody has complained that almost all the code is in Python.

programmers question its usage.

In fact the opposite is true: I've been asked why it should be ported to C or C++ at all.

A few reasons:

  1. Most embedded Unix systems don't have Python, and I want Oil to be usable there. For example, Android uses [mksh][] and doesn't have Python.
  2. Python does some non-trivial things with signals, like turning some of them exceptions, which may cause bugs. It also might do some nontrivial stuff in os.fork().
  3. The Python interpreter is slow to start. This is apparent when you run the [spec tests][spec-test], which start OSH many times in a serial fashion.

Next step: instead of porting the lexer and parser to C++, I'm going to start with the executor.

This forms the semantics of the oil language. I want to bootstrap more of the language.

Maybe the REPL for oil can be written in oil. And the builtins can be written in oil.

Sneak peak: it will have procs and funcs.

I implemented function call syntax in oil already. IN TDOP.

You already have

$(( a[1] ))

$(( f(1, 2) ))

Bootstrapping

What does bootstrapping mean? If a language is written in itself, it means the language you first wrote it in.

Rust compiler is written in Rust. But it was bootstrapped wtih OCaml.

Rust: link to OCaml repo

Go compiler is now written in Go. Bootstrapped with C.

tinypy parser was bootstrapped with CPyhton.

The oil parser will be written in oil. But it was bootstrapped in Python.

Note that this allows me to break the dependency WITHOUT ever writing a parser in C++! I could just write a Python to oil converter in Python, using the Python AST module.

Goal: avoid writing TWO parsers by hand in C++. I don't mind writing two parsers by hand in Python, or writing one parser by hand in Python, and one iwth a meta-language.

However no meta-languages are suitable. (Tomorrow blog post)

Steps:

  1. Write a little protocol compiler. Like Zephyr AST. This is easier than Writing stuff by hand in C++.

    1. Define the AST using class annotations.
    2. Python metaprogramming (runtime code generation) to write the AST to a binary format.
    3. Generate C++ code from Python to READ that binary format. This is what the protocol compiler does: it takes a schema, and generates C++ code that will read a specific binary format. You can think of this as type specialization.
    4. Also generate C++ code that will REPRESENT the data. Homogeneous AST rather than heterogeneous -- cuts down on code size.

    5. Now write an executor for the serialized tree.

Problem: "source" statement and "eval" statement are DYNAMIC. You can't execute them statically, because you would have stuff like this:

source $(simulate-turing-machine $input)/my-lib.sh

I mentioned this in the post on static parsing -- like Python, the problem of determining what files comprise a shell program is also formally undecidable.

But I think I can get around this. I have to write a binding back into the Python parser, or something. Or provide an alternate semantics.