Oil Cross Reference

A list of topics and anchors that the blog and other docs link to.


ALGOL Family of Languages — C-like imperative languages with functions, loops, conditionals, etc.

Project Components

Oil Language — A new shell language, to which bash programs can be automatically translated. It has a superset of bash functionality, with a syntax designed all at once instead of evolved. It will also incorporate elements of Awk and Make.

OSH Language — A statically-parseable language based on the common use of shell, in particular bash. In almost all cases, it's indistinguishable from bash.

OPy — A variant of Python used to write OSH and Oil. See Cobbling Together a Python Interpreter.

Boil — The working name for the part of Oil that subsumes GNU Make. No code for this exists yet.

OVM — The virtual machine that OSH and Oil will run on. As an implementation detail, it's a fork of the CPython VM.

readline — A line-editing library derived from bash. It has emacs and vi modes.

pylibc — An extension module that exposes libc functions to Python. Python implements its own glob() and fnmatch(), which behave differently from the ones in libc. We may also need libc's locale-aware string functions.
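
To make the fnmatch() difference concrete, here is a small sketch that exercises only Python's standard fnmatch module. It does not call libc; the libc behavior described in the comments is what POSIX specifies for fnmatch(3), and the variable names are just for illustration.

```python
import fnmatch

# Python's fnmatch treats a backslash as an ordinary character, so the
# pattern \* means "a backslash followed by anything".
# POSIX fnmatch(3) instead reads \* as an escaped, literal star
# (unless FNM_NOESCAPE is set).
star_escaped = fnmatch.fnmatchcase("*", r"\*")
backslash_literal = fnmatch.fnmatchcase(r"\foo", r"\*")

# Python's fnmatch has no POSIX character classes such as [[:alpha:]].
# POSIX fnmatch(3) would match "a" against this pattern.
posix_class = fnmatch.fnmatchcase("a", "[[:alpha:]]")

print(star_escaped, backslash_literal, posix_class)  # False True False
```

A shell that wants POSIX pattern semantics therefore can't lean on the Python module alone.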

Oil Terms

Nice Translation — A shell-to-Oil translation that only uses recommended concepts in Oil.

Compatible Translation — A shell-to-Oil translation that uses Oil features that exist only for the sake of bash compatibility.

Naive Style — TODO

Pedantic Style — TODO

Language Tools

ANTLR — A tool to generate top-down parsers (LL(k), LL(*)). I ported the POSIX shell grammar to ANTLR to machine check it, but it's not used to generate code.

yacc — A tool to generate bottom-up parsers. Bash uses yacc, which is a mistake discussed in this AOSA Book chapter on Bash.

re2c — A tool for generating lexers from regular expressions. The best part of it is that it's a library and not a framework.

mypy — A static type checker for Python.

Zephyr ASDL — A domain-specific language using the model of algebraic data types, which are convenient for representing programs. See What is Zephyr ASDL? and posts tagged ASDL.

This article describes its use in Python. This SourceForge project contains the code.

Clang — A modular front end for C and C++ that supports IDEs and other tools (as well as the code-generating compiler). Oil has some similarities because we have multiple use cases for the parser: execution, interactive completion, a tool to convert the osh language to the oil language, and more.

Protocol Buffers — A schema language, serialization format, and set of APIs created and open-sourced by Google.

sh_spec.py — A test framework written for osh that runs shell snippets against many shells.

Wild Tests — A test framework that tortures the OSH parser with real-world shell scripts.

Gold Tests — A type of test that compares the output of OSH and bash (or another existing shell). The assertions are implicit so you don't have to write them.

Shell Tools

coreutils — The GNU implementation of ls, cp, mv, etc. It also has versions of test, time, and kill, which are typically shadowed by similar-but-different shell builtins.

find — A classic Unix tool that walks a directory tree, filters its entries, and performs actions. GNU findutils implements it. It doesn't look much like Awk, but it has similar semantics: it applies predicates and actions on a stream.

xargs — A tool that builds and executes command lines from stdin. A very useful GNU extension is xargs -P, which starts processes in parallel.

expr — An external tool that implements mathematical expressions for shell. It has been mostly subsumed by the POSIX $((1+2)) construct, and the [[ $mystr =~ $myregex ]] construct. GNU autotools still generates code that uses it.

Code Improvement Tools

Themes: Correctness, security, performance.

AddressSanitizer — A compiler tool for detecting memory errors at runtime. That is, it's a kind of dynamic analysis. It solves roughly the same problem as Valgrind, but it's faster. Also known as ASAN.

American Fuzzy Lop — A fuzzer that uses compiler technology to efficiently explore code paths. In the last few years, it's been used to surface hundreds of bugs in ubiquitous and already well-tested pieces of open-source software. Its Wikipedia page is also helpful.


Aboriginal Linux — Shell scripts that implement the minimal Linux system that can rebuild itself (discontinued as of April 2017).

abuild — A 2500-line shell script that builds Alpine Linux packages.

Alpine Linux — A minimal Linux distribution based on musl libc and busybox.

bwk — Some software archaeology I did on Kernighan's Awk, to research how Awk relates to the shell. (One interesting thing: neither implements first-class compound data structures, and thus both lack garbage collection.)

GNU autotools — A meta-build system that generates configure shell scripts and Makefiles from m4 macros.

BusyBox — A reimplementation of standard Unix command line utilities, commonly used on embedded Linux systems.

debian — One of the oldest and most popular Linux distributions. It uses the apt package manager, which wraps dpkg. Ubuntu is based on Debian.

debootstrap — Debian uses this large shell program to construct its base image from binary packages.

Nix — A purely-functional package manager and Linux distribution. As with nearly all distributions, bash plays a fundamental role in building binary packages.

PyPy — A Python interpreter written in Python (including a restricted subset called RPython). It has novel JIT technology and a focus on speed.

tinypy — An interpreter for a subset of Python written in just ~2K lines of C and ~2K lines of Python (using a very dense style). I used some tinypy code for my pratt-parsing-demo, and it inspired the plan for Oil to have a Python interpreter.

Toybox — A reimplementation of standard Unix command line utilities, by the former maintainer of busybox.

Ninja — A "low-level" build system focused on incremental build speed. High-level languages like CMake generate Ninja build files.

Unix Concepts

chroot — A system call that gives a process a view of its own "virtual" file system. Linux container technology like Docker or LXC can be thought of as a "chroot on steroids".


Python tokenize module — A reimplementation of Parser/tokenizer.c in pure Python. Part of the Python standard library.
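
As a quick illustration of what the module produces, this sketch tokenizes one line of Python with the standard library's tokenize.generate_tokens(). The sample source string and the variable names are made up for the example.

```python
import io
import tokenize

source = "x = 1 + 2\n"

# generate_tokens() takes a readline callable and yields TokenInfo tuples.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Pair each token's type name with its text.
names = [(tokenize.tok_name[t.type], t.string) for t in tokens]

for name, text in names:
    print(name, repr(text))
```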

pgen2 — A reimplementation of Parser/pgen.c in Python, done for lib2to3.

compiler2 — compiler2 is my name for the deprecated Python 2.7 compiler module. It does the same thing as Python/compile.c, but in Python.

byterun — A Python bytecode interpreter written in Python, described in the AOSA Book.

Shell Documents

POSIX Shell Spec: POSIX specification for the shell (sh). It seems that ksh was the dominant shell at the time of standardization, so bash implemented POSIX + a lot of ksh.

POSIX Shell Grammar: Subsection of the spec which has a BNF-style grammar.

Google Shell Style Guide -- Unofficial shell style guide at Google, which points out some deficiencies in the shell language. (Not all shell scripts at Google attempt to conform to this style.)

Chapter on Bash in the Architecture of Open Source Applications — An excellent article by bash maintainer Chet Ramey on bash's internal structure.

Algorithms and Data Structures

Context-Free Grammar -- A formalism for expressing the syntax of programming languages. Shell can only be partially specified using a CFG; the POSIX grammar is incomplete.

Parsing Expression Grammar -- An alternative formalism to context-free grammars, which may be better-suited to expressing shell syntax.

Lexical State -- A simple technique for parsing languages with "subdialects".
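
A minimal sketch of the idea, assuming a toy shell-like language with just two states: OUTER (unquoted text) and DQ (inside double quotes). The state names, token names, and rules are hypothetical, not the ones OSH uses. The point is that the lexer switches rule tables when it sees a quote, so whitespace separates words in one state and is literal text in the other.

```python
import re

# One rule table per lexical state: (pattern, token kind, next state or None).
RULES = {
    "OUTER": [
        (re.compile(r"[ \t]+"), "Space", None),
        (re.compile(r'"'), "LeftDQ", "DQ"),
        (re.compile(r"\$[A-Za-z_]\w*"), "VarSub", None),
        (re.compile(r'[^ \t"$]+'), "Lit", None),
    ],
    "DQ": [
        (re.compile(r'"'), "RightDQ", "OUTER"),
        (re.compile(r"\$[A-Za-z_]\w*"), "VarSub", None),
        (re.compile(r'[^"$]+'), "Lit", None),  # spaces are literal here
    ],
}


def lex(text):
    """Return (kind, text) pairs, switching rule tables as states change."""
    state, pos, out = "OUTER", 0, []
    while pos < len(text):
        for pattern, kind, next_state in RULES[state]:
            m = pattern.match(text, pos)
            if m:
                out.append((kind, m.group()))
                pos = m.end()
                if next_state:
                    state = next_state
                break
        else:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
    return out


tokens = lex('echo "hi  $name" $x')
for tok in tokens:
    print(tok)
```

Inside the quotes, the two spaces after hi stay part of a single Lit token, while the space before $x is a separate Space token.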

Precedence Climbing -- A simple algorithm for top-down parsing of expressions. It's a special case of top-down operator precedence parsing.

Top-Down Operator Precedence Parsing -- Also called Pratt parsing, this is a general algorithm for parsing expressions with multiple levels of precedence.
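
Here is a small sketch of precedence climbing for integer arithmetic, to show how little code the algorithm needs. The operator table, the tuple-based tree, and the function names are assumptions made for this example; it handles only binary operators and parentheses, with minimal error handling.

```python
import re

# Binary operators: precedence and associativity ("L" or "R").
BINARY = {
    "+": (1, "L"),
    "-": (1, "L"),
    "*": (2, "L"),
    "/": (2, "L"),
    "**": (3, "R"),
}


def tokenize(text):
    """Split into integers, operators, and parentheses."""
    return re.findall(r"\d+|\*\*|[-+*/()]", text)


def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        node = parse(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return node
    return int(tok)


def parse(tokens, min_prec=1):
    """Parse from the front of the token list into nested (op, left, right) tuples."""
    left = parse_atom(tokens)
    # Keep consuming operators that bind at least as tightly as min_prec.
    while tokens and tokens[0] in BINARY and BINARY[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = BINARY[op]
        # Left-associative operators require a strictly tighter right side.
        next_min = prec + 1 if assoc == "L" else prec
        right = parse(tokens, next_min)
        left = (op, left, right)
    return left


print(parse(tokenize("1 + 2 * 3")))    # ('+', 1, ('*', 2, 3))
print(parse(tokenize("8 - 3 - 2")))    # ('-', ('-', 8, 3), 2)
print(parse(tokenize("2 ** 3 ** 2")))  # ('**', 2, ('**', 3, 2))
```

The only difference between left and right associativity is the minimum precedence passed to the recursive call.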

Recursive Descent Parsing -- A kind of hand-written top-down parser.

Top-Down Parsing -- Parsing algorithms can be categorized as either top-down or bottom-up. ANTLR uses top-down algorithms, while yacc uses bottom-up algorithms. Pratt parsing is a top-down algorithm and recursive descent is a top-down technique. See LL and LR Parsing Demystified.

Abstract Syntax Tree — The exact definition is debatable, but in my usage, an AST has some simplifications or annotations over a parse tree, depending on what you need to do with it: source-to-source translation, interpretation, code generation, etc. In contrast, a parse tree is derived only from the rules of the grammar for a language, so you don't need to annotate your parser with nontrivial "semantic actions" to build one.

Lossless Syntax Tree — An "abstract" syntax tree with enough detail to reproduce the original source code.

Algebraic Data Types — A data model of sum and product types. This model is particularly convenient for representing the structure of programming languages.
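
As a rough sketch of the model in Python: each dataclass below is a product type (a record of fields), and the Union of them is a sum type (one of several alternatives). The Const, Var, and Binary names and the evaluate() function are invented for this example and are not taken from Oil's ASDL schema.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Const:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


# The sum type: an expression is exactly one of these alternatives.
Expr = Union[Const, Var, Binary]


def evaluate(node: Expr, env: dict) -> int:
    """Walk the tree, dispatching on which alternative the node is."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Binary):
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        if node.op == "+":
            return left + right
        if node.op == "*":
            return left * right
        raise ValueError(f"unknown operator {node.op!r}")
    raise TypeError(f"not an expression: {node!r}")


# 1 + x * 3
tree = Binary("+", Const(1), Binary("*", Var("x"), Const(3)))
print(evaluate(tree, {"x": 2}))  # 7
```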

Shell Language

Trivia about the Unix shell language, including the common ksh/bash extensions.

Here Document — A construct in shell for writing lines of text to be fed to stdin of a process. Perl, Ruby, and PHP borrowed here docs from shell.
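
For comparison, this is roughly what the same idea looks like outside of shell: a block of text is handed to a child process on its stdin. The sketch uses Python's subprocess.run() with the input argument; the child is a Python one-liner standing in for any filter command, and the variable names are made up for the example.

```python
import subprocess
import sys

# The text that a here document would carry between its delimiters.
text = "line one\nline two\n"

# Stand-in for an external filter: upper-case everything read from stdin.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# input= feeds the text to the child's stdin.
result = subprocess.run(child, input=text, capture_output=True, text=True)
print(result.stdout, end="")
```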

Shell Builtin — A shell builtin is just like an external command, e.g. /bin/ls, except it's linked into the sh binary. It takes an argv array, returns an exit code, and uses stdin, stdout, and stderr.
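
A minimal sketch of that interface, assuming a hypothetical dispatch table: each builtin is a function that takes an argv list and returns an exit code, and anything not in the table is run as an external command. This is an illustration of the concept, not how OSH dispatches commands.

```python
import subprocess
import sys


def builtin_echo(argv):
    # argv[0] is the command name, as with an external program.
    sys.stdout.write(" ".join(argv[1:]) + "\n")
    return 0


def builtin_true(argv):
    return 0


def builtin_false(argv):
    return 1


# Hypothetical table of commands "linked into" this toy shell.
BUILTINS = {
    "echo": builtin_echo,
    "true": builtin_true,
    "false": builtin_false,
}


def run(argv):
    """Run a builtin in-process if one exists, else start an external command."""
    builtin = BUILTINS.get(argv[0])
    if builtin is not None:
        return builtin(argv)
    return subprocess.call(argv)


status = run(["echo", "hello", "world"])
print("exit code:", status)
```

Because the builtin table is checked first, a builtin shadows an external command of the same name.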


Flame Graph — A visualization that's commonly used for profiling the CPU usage of a program. More generally, it can visualize quantities associated with each node in a tree. Sets of stack traces form a tree when combined.
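
To show how stack traces combine into a tree, here is a small sketch that merges sampled stacks (outermost frame first) into nested dictionaries with a sample count per node. The data layout, function name, and sample frames are assumptions for illustration; a real flame graph tool would then draw each node with a width proportional to its count.

```python
def merge_stacks(stacks):
    """Merge stack traces into a tree; each node counts the samples passing through it."""
    root = {"count": 0, "children": {}}
    for stack in stacks:
        node = root
        node["count"] += 1
        for frame in stack:
            # Shared prefixes land on the same child node.
            node = node["children"].setdefault(
                frame, {"count": 0, "children": {}}
            )
            node["count"] += 1
    return root


samples = [
    ["main", "parse", "lex"],
    ["main", "parse", "lex"],
    ["main", "parse"],
    ["main", "eval"],
]

tree = merge_stacks(samples)
main = tree["children"]["main"]
print(main["count"])                                          # 4
print(main["children"]["parse"]["count"])                     # 3
print(main["children"]["parse"]["children"]["lex"]["count"])  # 2
```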


Domain Specific Languages by Martin Fowler -- A book of patterns for implementing DSLs. Discusses lexical state.

Shell Implementations

GNU Bash — The most popular shell implementation.

Debian Almquist Shell — A fork of the Almquist Shell that Debian and Ubuntu use for shell scripts, but not as the default login shell. If you look at the busybox ash source code, it is apparent that they are similar. The things I notice most about it are that kebab-case function names aren't allowed, and it has a bug related to readonly and tilde expansion.

MirBSD Korn Shell — A fork of pdksh (Public Domain Korn Shell). This is the default shell on Android. Testing this shell against others has taught me that many "bash-isms" are actually "ksh-isms". bash implemented many ksh extensions for compatibility.

zsh — zsh is probably the second most popular interactive shell, after bash. It's not POSIX-compliant by default, although it has options to make it POSIX-compliant. Apparently, it doesn't split words by default.

Korn Shell — ksh was an extension of the Bourne shell, developed at Bell Labs. pdksh and bash cloned many of its features.

Public Domain Korn Shell — A defunct clone of AT&T's Korn shell that survives in at least two forks: the OpenBSD shell and mksh.

Programming Languages

Tcl — An embedded scripting language that's influenced some alternative shells. It has Lisp-like properties.

Lua — Lua is an embedded scripting language, which means that the interpreter is a library. The interpreter has no global variables, and it requires explicit capabilities for I/O. While I don't like Lua the language, this aspect of Lua will influence Oil.

sed — A text stream editor using a batch execution model.

Awk — A classic Unix programming language for text processing.

Make — A classic Unix build tool that is also a Turing-complete programming language.

Shell — An interactive program to control the Unix operating system, as well as a programming language. Oil aims to treat shell as a serious programming language.

R language — A language for statistical computing, including data manipulation, modelling, and visualization.

ML — ML stands for "meta-language": a language for manipulating languages. The ML family of languages includes OCaml and Haskell, and its distinguishing feature is the data model of algebraic data types. The domain-specific language ASDL uses this data model.

CPython — The reference implementation of the Python programming language.

Python — The popular language that I wrote OSH in.

OCaml — A popular modern implementation of ML. If I hadn't prototyped OSH in Python, OCaml would have been a good choice. The compiler and runtime are well-engineered and well-documented. They may influence OPy.

M4 — GNU Autotools is written in the text preprocessor language M4. It's similar to the C preprocessor, except that it's Turing-complete. It was designed to support a dialect of Fortran.