A list of topics and anchors that the blog and other docs link to.
Oil Language — A new shell language, to which bash programs can be automatically translated. It has a superset of bash functionality, with a syntax designed all at once instead of evolved. It will also incorporate elements of Awk and Make.
OSH Language — A statically-parseable language based on the common use of shell, in particular bash. In almost all cases, it's indistinguishable from bash.
ALGOL Family of Languages — C-like imperative languages with functions, loops, conditionals, etc.
ANTLR — A tool to generate top-down parsers (
LL(*)). I ported the POSIX
shell grammar to ANTLR to machine check it, but it's not used to generate code.
yacc — A tool to generate bottom-up parsers. Bash uses yacc, which is a mistake discussed in this AOSA Book chapter on Bash.
re2c — A tool for generating lexers from regular expressions. The best part of it is that it's a library and not a framework.
mypy — A static type checker for Python.
Zephyr ASDL — A meta-language to describe the Abstract Syntax Tree for a language in an ML-like syntax. This article describes its use in Python. This SourceForge project contains the code.
Clang — A modular front end for C and C++ that supports IDEs and other tools (as well as the code-generating compiler). Oil has some similarities because we have multiple uses cases for the parser: execution, interactive completion, a tool to convert the osh language to the oil language, and more.
Protocol Buffers — A schema language, serialization format, and set of APIs created and open-sourced by Google.
sh_spec.py — A test framework written for
osh that runs shell snippets against many
Aboriginal Linux -- Shell scripts that implement the minimal Linux system that can rebuild itself.
bwk -- Some software archaeology I did on Kernighan's Awk, to research how Awk relates to the shell. (One interesting thing: they both don't implement first-class compound data structures, and thus lack garbage collection)
debootstrap -- A big shell script that Debian uses to construct its base image from binary packages.
GNU autotools -- A meta-build system that generates
shell scripts and Makefiles from
Toybox -- A reimplementation of standard Unix command line utilities, by the former maintainer of busybox.
POSIX Shell Spec: POSIX specification for the shell (
It seems that
ksh was the dominant shell at the time of standardization, so
bash implemented POSIX + a lot of ksh.
POSIX Shell Grammar: Subsection of the spec which has a BNF-style grammar.
Google Shell Style Guide -- Unofficial shell style guide at Google, which points out some deficiencies in the shell language. (Not all shell scripts at Google attempt to conform to this style.)
Chapter on Bash in the Architecture of Open Source Applications — An excellent article by bash maintainer Chet Ramey on bash's internal structure.
Context-Free Grammar -- A formalism for expressing the syntax of programming languages. Shell can only be partially specified using a CFG; the POSIX grammar is incomplete.
Parsing Expression Grammar -- An alternative formalism to context-free grammars, which may be better-suited to expressing shell syntax.
Lexical State -- A simple technique for parsing languages with "subdialects".
Precedence Climbing -- A simple algorithm for top-down parsing of expressions. It's a special case of top-down operator precedence parsing.
Top-Down Operator Precedence Parsing -- Also called Pratt parsing, this is a general algorithm for parsing expressions with multiple levels of precedence.
Recursive Descent Parsing -- A kind of hand-written top-down parser.
Top-Down Parsing -- Parsing algorithms can be categorized as either top-down or bottom-up. ANTLR uses top-down algorithms, while yacc uses bottom-up algorithms. Pratt parsing is a top-down algorithm and recursive descent is a top-down technique. See LL and LR Parsing Demystified.
Abstract Syntax Tree — In contrast to an AST, a parse tree is derived only from the rules of the grammar for a language. You don't need to annotate your parser with nontrivial "semantic actions". The exact definition is debatable, but in my usage, an AST has some simplifications or annotations over a parse tree, depending on what you need to do with it: source-to-source translation, interpretation, code generation, etc.
Algebraic Data Types — A data model of sum and product types. This model is particularly convenient for representing the structure of programming languages.
Trivia about the Unix shell language, including the common ksh/bash extensions.
Here Document -- A unique construct in shell for writing a piece of text to be fed to
stdin of a process.
Domain Specific Languages by Martin Fowler -- A book of patterns for implementing DSLs. Discusses lexical state.
GNU Bash — The most popular shell implementation.
Debian Almquist Shell — A fork of the Almquist Shell that Debian and Ubuntu use for shell scripts, but not the default login shell. If you look at the busybox
ash source code, it
is apparent that they are similar. The things I notice most about it are that
kebab-case function names aren't allowed, and it has a bug related to
readonly and tilde expansion.
MirBSD Korn Shell — A fork of
pdksh (Public Domain Korn Shell). This is the default shell on
Android. Testing this shell against others has taught me that many "bash-isms"
are actually "ksh-isms".
bash implemented many
ksh extensions for
zsh is probably the second most popular interactive shell, after bash. It's
not POSIX-compliant by default, although it has options to make it POSIX
compliant. Apparently, it doesn't split words by default.
Korn Shell — ksh was an extension of the Bourne shell, developed at Bell Labs.
bash cloned many of its features.
Tcl — An embedded scripting language that's influenced some alternative shells. It has Lisp-like properties.
Awk — A classic Unix programming language for text processing.
Make — A classic Unix build tool that is also a Turing-complete programming language.
Shell — An interactive program to control the Unix operating system, as well as a programming language. Oil aims treat shell as a serious programming language.
R language — A language for statistical computing, including data cleaning, modelling, and visualization.
ML — ML stands for "meta-language": a language for manipulating languages. The ML family of languages includes OCaml and Haskell, and its distinguishing feature is the data model of algebraic data types. The domain-specific language ASDL uses this data model.