A list of topics and anchors that the blog and other docs link to.
#algol-like
ALGOL Family of Languages —
C-like imperative languages with functions, loops, conditionals, etc.
#oil-language
Oil Language —
A new shell language, to which bash programs can be automatically
translated. It has a superset of bash's functionality, with a syntax that was
designed all at once rather than evolved over time. It will also incorporate elements of
Awk and Make.
#osh-language
OSH Language —
A statically parseable language based on the common use
of shell, in particular bash. In almost all cases, it's
indistinguishable from bash.
#opy
OPy —
A variant of Python used to write OSH and Oil. See Cobbling Together a Python
Interpreter.
#boil
Boil —
The working name for the part of Oil that subsumes GNU Make. No code for this
exists yet.
#OVM
OVM —
The virtual machine that OSH and Oil will run on. As an implementation detail,
it's a fork of the CPython VM.
#readline
readline —
A line-editing library derived from bash. It has emacs and vi modes.
#pylibc
pylibc —
An extension module to expose libc functions to Python.
Python implements its own glob() and fnmatch(), which behave differently from
the ones in libc. We may also need libc's locale-aware string functions.
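The following sketch makes the difference concrete. It calls libc's fnmatch() through the standard ctypes module, not through pylibc, whose actual API isn't shown here. It assumes a POSIX system whose C library ctypes can load. The POSIX character class `[[:alpha:]]` is one place where Python's fnmatch module and libc disagree.

```python
import ctypes
import ctypes.util
import fnmatch

# Assumption: a POSIX C library is available. This uses ctypes, not pylibc.
# If find_library() returns None, CDLL(None) loads the symbols already
# linked into the current process, which include libc on Linux.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.fnmatch.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
_libc.fnmatch.restype = ctypes.c_int


def libc_fnmatch(pattern: str, name: str) -> bool:
    """Match with libc's fnmatch(3), which returns 0 on a match."""
    return _libc.fnmatch(pattern.encode(), name.encode(), 0) == 0


def python_fnmatch(pattern: str, name: str) -> bool:
    """Match with Python's pure-Python fnmatch module (case-sensitive)."""
    return fnmatch.fnmatchcase(name, pattern)


# libc understands the POSIX character class [[:alpha:]]; Python's module
# treats it as a bracket expression followed by a literal ']'.
print(libc_fnmatch("[[:alpha:]]", "a"), python_fnmatch("[[:alpha:]]", "a"))
```

On glibc, the first call should match and the second should not. Simple patterns like `*.py` behave the same in both.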
#nice-translation
Nice Translation —
A shell-to-Oil translation that only uses recommended concepts in Oil.
#compatible-translation
Compatible Translation —
A shell-to-Oil translation that uses Oil features that exist only for the sake
of bash compatibility.
#naive-style
Naive Style —
TODO
#pedantic-style
Pedantic Style —
TODO
#antlr
ANTLR —
A tool to generate top-down parsers (LL(k), LL(*)). I ported the POSIX
shell grammar to ANTLR to machine-check it, but it's not used to generate code.
#yacc
yacc —
A tool to generate bottom-up parsers. Bash uses yacc, which is a
mistake discussed in this AOSA Book chapter on Bash.
#re2c
re2c —
A tool for generating lexers from regular expressions. The best part of it is
that it's a library and not a framework.
#mypy
mypy —
A static type checker for Python.
#zephyr-asdl
Zephyr ASDL —
A domain-specific language using the model of algebraic data types,
which are convenient for representing programs. See What is Zephyr ASDL?
and posts tagged ASDL.
This article describes its use in Python. This SourceForge project contains the code.
#clang
Clang —
A modular front end for C and C++ that supports IDEs and other tools (as
well as the code-generating compiler). Oil has some similarities because we
have multiple use cases for the parser: execution, interactive completion, a
tool to convert the OSH language to the Oil language, and more.
#protobuf
Protocol Buffers —
A schema language, serialization format, and set of APIs created and
open-sourced by Google.
#spec-test
sh_spec.py —
A test framework written for osh that runs shell snippets against many shells.
#wild-test
Wild Tests —
A test framework that tortures the OSH parser with real-world shell scripts.
#gold-test
Gold Tests —
A type of test that compares the output of OSH and bash (or another existing
shell). The assertions are implicit, so you don't have to write them.
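The idea can be sketched in a few lines, assuming the shells to compare are on `PATH`. The function names and the choice of shells are hypothetical; this is not Oil's actual test harness. The first shell's stdout and exit status serve as the "gold" reference, so no expected output is written by hand.

```python
import subprocess
from typing import List, Tuple


def run_snippet(shell: str, code: str) -> Tuple[str, int]:
    """Run a snippet with `shell -c` and capture its stdout and exit status."""
    proc = subprocess.run([shell, "-c", code], capture_output=True, text=True)
    return proc.stdout, proc.returncode


def gold_test(code: str, shells: List[str]) -> bool:
    """Pass if every shell matches the first one's stdout and exit status.

    The first shell acts as the reference, so the assertion is implicit.
    """
    expected = run_snippet(shells[0], code)
    return all(run_snippet(sh, code) == expected for sh in shells[1:])


# Hypothetical usage, assuming both bash and osh are installed:
# gold_test('x=$((1 + 2)); echo "$x"', ["bash", "osh"])
```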
#coreutils
coreutils —
The GNU implementation of ls, cp, mv, etc. It also has versions of
test, echo, and kill, which are typically shadowed by
similar-but-different shell builtins.
#find
find —
A classic Unix tool that walks a directory tree, filters its entries, and
performs actions. GNU findutils implements it. It doesn't look much like
Awk, but it has similar semantics: it applies predicates and actions to
a stream.
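The analogy to Awk can be sketched in Python. The sketch walks a tree, produces a stream of paths, and applies (predicate, action) pairs to each one, much like Awk's `pattern { action }` rules. The names are hypothetical, and this is an illustrative model, not how GNU findutils is implemented.

```python
import os
from typing import Callable, Iterator, List, Tuple

# A rule pairs a predicate (like -name) with an action (like -print).
Rule = Tuple[Callable[[str], bool], Callable[[str], None]]


def walk(root: str) -> Iterator[str]:
    """Yield every directory and file under root, like `find root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        yield dirpath
        for name in filenames:
            yield os.path.join(dirpath, name)


def find(root: str, rules: List[Rule]) -> None:
    """Apply each (predicate, action) rule to each path in the stream."""
    for path in walk(root):
        for predicate, action in rules:
            if predicate(path):
                action(path)


# Roughly `find . -name '*.sh' -print`:
# find(".", [(lambda p: p.endswith(".sh"), print)])
```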
#xargs
xargs —
A tool that builds and executes command lines from stdin. A very useful
GNU extension is xargs -P, which starts processes in parallel.
#expr
expr —
An external tool that evaluates arithmetic and string expressions for the
shell. It has been mostly subsumed by the POSIX $((1+2)) construct and bash's
[[ $mystr =~ $myregex ]] construct. (GNU autotools still generates code that
uses it.)
Themes: Correctness, security, performance.
#asan
AddressSanitizer —
A compiler tool for detecting memory errors at runtime. That is, it's a kind
of dynamic analysis. It solves roughly the same problem as Valgrind, but
it's faster. Also known as ASAN.
#afl
American Fuzzy Lop —
A fuzzer that uses compiler technology to efficiently explore code paths. In
the last few years, it's been used to surface hundreds of bugs in ubiquitous
and already well-tested pieces of open-source software. Its Wikipedia
page is also
helpful.
#perf
Linux perf —
User-space tools and kernel APIs for Linux performance analysis. Uses
CPU-specific features for accurate measurements.
#flame-graph
Flame Graph —
A relatively new technique for visualizing profiler output. It shows how much
execution time can be attributed to a particular call stack. Note that a
set of function call stacks forms a tree: each function may call several
others, so stacks that share a prefix share a path from the root.
This is also why flame graphs can be used like treemaps, e.g. to visualize space used in a file system hierarchy.
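To see why stacks form a tree, here is a minimal sketch that merges sampled call stacks into a tree of inclusive sample counts. Each stack is written root first and separated by semicolons, similar to the "folded" format used by Brendan Gregg's FlameGraph scripts. Each node's count is the width its frame would get in a flame graph. The sample data is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Node:
    """One frame in the merged call tree."""

    def __init__(self) -> None:
        self.count = 0  # inclusive samples: this frame plus everything it called
        self.children: Dict[str, "Node"] = defaultdict(Node)


def merge_stacks(samples: List[Tuple[str, int]]) -> Node:
    """Merge folded stacks like 'main;parse;lex' (root first) into one tree."""
    root = Node()
    for stack, n in samples:
        node = root
        node.count += n
        for frame in stack.split(";"):
            node = node.children[frame]  # stacks with a shared prefix share nodes
            node.count += n
    return root


def dump(node: Node, name: str = "all", depth: int = 0) -> None:
    """Print the tree; each count is the frame's width in a flame graph."""
    print("  " * depth + f"{name} {node.count}")
    for child_name, child in node.children.items():
        dump(child, child_name, depth + 1)


# Made-up samples for illustration.
root = merge_stacks([("main;parse;lex", 5), ("main;parse", 2), ("main;eval", 3)])
dump(root)
```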
#aboriginal-linux
Aboriginal Linux —
Shell scripts that implement the minimal Linux system that can rebuild
itself (discontinued as of April 2017).
#abuild
abuild —
A 2500-line shell script that builds Alpine Linux packages.
#alpine-linux
Alpine Linux —
A minimal Linux distribution based on musl
libc and busybox.
#bwk
bwk —
Some software archaeology I did on Kernighan's Awk, to research how Awk
relates to the shell. (One interesting thing: neither implements
first-class compound data structures, and thus both lack garbage collection.)
#autotools
GNU autotools —
A meta-build system that generates configure shell scripts and Makefiles
from m4 macros.
#busybox
BusyBox —
A reimplementation of standard Unix command line utilities, commonly used on
embedded Linux systems.
#debian
debian —
One of the oldest and most popular Linux distributions. It uses the apt
package manager, which wraps dpkg. Ubuntu is based on Debian.
#debootstrap
debootstrap —
Debian uses this large shell program to construct its base image
from binary packages.
#nix
Nix —
A purely-functional package manager and Linux distribution. As with nearly
all distributions, bash plays a fundamental role in building binary
packages.
#pypy
PyPy —
A Python interpreter written in Python (more precisely, in a restricted subset called RPython).
It has novel JIT technology and a focus on speed.
#tinypy
tinypy —
An interpreter for a subset of Python written in just ~2K lines of C and ~2K
lines of Python (using a very dense style). I used some tinypy code for my
pratt-parsing-demo, and it inspired the plan for Oil to have a Python
interpreter.
#toybox
Toybox —
A reimplementation of standard Unix command line utilities, by the former
maintainer of busybox.
#ninja
Ninja —
A "low-level" build system focused on incremental build speed. Higher-level
build systems like CMake generate Ninja build files.
#chroot
chroot —
A system call that gives a process a view of its own "virtual" file system.
Linux container technology like Docker or
LXC can be thought of as a "chroot on
steroids".
#tokenize
Python tokenize module —
A reimplementation of Parser/tokenizer.c in pure Python. Part of the Python
standard library.
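For example, `tokenize.generate_tokens()` takes a `readline` callable for text and yields `TokenInfo` tuples. This small sketch lists each token's type name and string for one line of code. The helper name `token_pairs` is just for illustration.

```python
import io
import tokenize
from typing import List, Tuple


def token_pairs(source: str) -> List[Tuple[str, str]]:
    """Return (token type name, token string) pairs for Python source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


for pair in token_pairs("x = 1 + 2\n"):
    print(pair)
```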
#pgen2
pgen2 —
A reimplementation of Parser/pgen.c in Python, done for lib2to3.
#compiler2
compiler2 —
compiler2 is my name for the deprecated Python 2.7 compiler module. It does
the same thing as Python/compile.c, but in Python.
#byterun
byterun —
A Python bytecode interpreter written in Python, described in the AOSA
Book.
#posix-shell-spec
POSIX Shell Spec: The POSIX specification for the shell (sh).
It seems that ksh was the dominant shell at the time of standardization, so
bash implemented POSIX plus a lot of ksh.
#posix-grammar
POSIX Shell Grammar: Subsection of the spec which has a
BNF-style grammar.
#google-style-guide
Google Shell Style Guide -- Unofficial shell style guide
at Google, which points out some deficiencies in the shell language. (Not all
shell scripts at Google attempt to conform to this style.)
#aosa-book-bash
Chapter on Bash in the Architecture of Open Source Applications —
An excellent article by bash maintainer Chet Ramey on bash's internal
structure.
#cfg
Context-Free Grammar -- A formalism for expressing the syntax of
programming languages. Shell can only be partially specified using a CFG; the
POSIX grammar is incomplete.
#peg
Parsing Expression Grammar -- An alternative formalism to context-free
grammars, which may be better-suited to expressing shell syntax.
#lexical-state
Lexical State -- A simple technique for parsing languages with
"subdialects".
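Here is a minimal sketch of the technique using a made-up two-mode lexer, not OSH's real lexer. The lexer tracks a current mode, and each mode has its own ordered table of regex rules. A space therefore ends a word outside double quotes but is literal text inside them. Some rules also switch the mode. The mode and token names are hypothetical.

```python
import re
from typing import List, Tuple

NAME = r"\$[A-Za-z_][A-Za-z0-9_]*"

# Each mode has its own ordered rules: (regex, token id, mode to switch to).
# A next mode of None means "stay in the current mode".
RULES = {
    "OUTER": [
        (re.compile(r"[ \t]+"), "WS", None),
        (re.compile(r'"'), "DQ_START", "DQ"),
        (re.compile(NAME), "VAR", None),
        (re.compile(r'[^ \t"$]+'), "LIT", None),
    ],
    "DQ": [
        (re.compile(r'"'), "DQ_END", "OUTER"),
        (re.compile(NAME), "VAR", None),
        (re.compile(r'[^"$]+'), "LIT", None),  # spaces are literal in quotes
    ],
}


def lex(s: str) -> List[Tuple[str, str]]:
    mode, pos, tokens = "OUTER", 0, []
    while pos < len(s):
        for regex, tok_id, next_mode in RULES[mode]:
            m = regex.match(s, pos)
            if m:
                tokens.append((tok_id, m.group()))
                pos = m.end()
                if next_mode is not None:
                    mode = next_mode
                break
        else:
            # No rule in the current mode matched (e.g. a bare '$').
            raise SyntaxError(f"unexpected {s[pos]!r} in mode {mode}")
    return tokens


print(lex('echo "hi $x"'))
```

The regexes themselves stay simple; the mode variable carries the context that a single regular language couldn't.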
#precedence-climbing
Precedence Climbing -- A simple algorithm for top-down
parsing of expressions. It's a special case of top-down operator precedence
parsing.
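Here is a minimal sketch of precedence climbing over integer arithmetic. The operator table and the tuple-based tree are illustrative assumptions. `expr(min_prec)` keeps consuming binary operators whose precedence is at least `min_prec`. It recurses with a higher minimum for the right operand of a left-associative operator and the same minimum for a right-associative one.

```python
import re
from typing import List, Optional

# Binary operators: (precedence, is_right_associative).
BINARY = {"+": (1, False), "-": (1, False), "*": (2, False), "/": (2, False),
          "^": (3, True)}


class Parser:
    def __init__(self, text: str) -> None:
        self.tokens: List[str] = re.findall(r"\d+|[-+*/^()]", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def primary(self):
        tok = self.advance()
        if tok == "(":
            value = self.expr(1)
            self.advance()  # consume ")"
            return value
        return int(tok)

    def expr(self, min_prec: int):
        """Parse a run of operators whose precedence is at least min_prec."""
        lhs = self.primary()
        while self.peek() in BINARY and BINARY[self.peek()][0] >= min_prec:
            op = self.advance()
            prec, right_assoc = BINARY[op]
            # A higher minimum makes equal-precedence operators group leftward.
            rhs = self.expr(prec if right_assoc else prec + 1)
            lhs = (op, lhs, rhs)
        return lhs


print(Parser("1 + 2 * 3").expr(1))
```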
#tdop-parsing
Top-Down Operator Precedence Parsing -- Also called Pratt
parsing, this is a general algorithm for parsing expressions with multiple
levels of precedence.
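Here is a minimal Pratt parser sketch for integer arithmetic with a hypothetical binding-power table. Each token has a "nud" (null denotation), used when it begins an expression, as with unary minus. It also has a "led" (left denotation), used when it follows a left operand, as with binary minus. The `while` loop in `parse()` only continues while the next operator binds more tightly than `rbp`.

```python
import re
from typing import List, Optional

# Left binding powers for infix operators.
LBP = {"+": 10, "-": 10, "*": 20, "/": 20}


class Pratt:
    def __init__(self, text: str) -> None:
        self.tokens: List[str] = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def nud(self, tok: str):
        """How a token behaves at the start of an expression (prefix)."""
        if tok == "-":
            return ("neg", self.parse(30))  # unary minus binds tighter than * and /
        if tok == "(":
            inner = self.parse(0)
            self.advance()  # consume ")"
            return inner
        return int(tok)

    def led(self, tok: str, left):
        """How a token behaves after a left operand (infix)."""
        # Passing the operator's own power makes it left-associative.
        return (tok, left, self.parse(LBP[tok]))

    def parse(self, rbp: int = 0):
        left = self.nud(self.advance())
        while self.peek() in LBP and LBP[self.peek()] > rbp:
            left = self.led(self.advance(), left)
        return left


print(Pratt("-2 * 3 + 4").parse())
```

Because the same token can have both a nud and a led, `-` is handled as unary or binary depending only on its position.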
#recursive-descent
Recursive Descent Parsing -- A kind of hand-written
top-down parser.
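Here is a minimal sketch based on a toy arithmetic grammar. Each grammar rule becomes one method, and the Python call stack mirrors the shape of the parse. This version evaluates as it parses instead of building a tree. The grammar and class name are illustrative assumptions.

```python
import re
from typing import List, Optional

# Grammar (one method per rule):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'


class Evaluator:
    def __init__(self, text: str) -> None:
        self.tokens: List[str] = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self) -> int:
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.tokens[self.pos]
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.tokens[self.pos]
            self.pos += 1
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # floor division
        return value

    def factor(self) -> int:
        tok = self.peek()
        if tok == "(":
            self.expect("(")
            value = self.expr()
            self.expect(")")
            return value
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


print(Evaluator("2 * (3 + 4) - 5").expr())
```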
#top-down-parsing
Top-Down Parsing -- Parsing algorithms can be categorized
as either top-down or bottom-up. ANTLR uses top-down algorithms,
while yacc uses bottom-up algorithms. Pratt parsing
is a top-down algorithm and recursive descent is a
top-down technique. See LL and LR Parsing Demystified.
#AST
Abstract Syntax Tree —
In contrast to an AST, a parse tree is derived only from the rules of the
grammar for a language. You don't need to annotate your parser with nontrivial
"semantic actions". The exact definition is debatable, but in my usage, an AST
has some simplifications or annotations over a parse tree, depending on what
you need to do with it: source-to-source translation, interpretation, code
generation, etc.
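Python's own `ast` module illustrates the simplification. In this sketch, the AST for `(1 + 2) * 3` has no node for the parentheses. Grouping is recorded only by the tree's shape, so redundant parentheses and different spacing produce an identical tree.

```python
import ast

tree = ast.parse("(1 + 2) * 3", mode="eval")
top = tree.body  # the BinOp for '*'

# The parenthesized sum is just the left child of the multiplication;
# no node records that parentheses appeared in the source.
print(type(top.op).__name__, type(top.left).__name__)

# Redundant parentheses and different spacing give an identical AST.
same = ast.dump(ast.parse("((1+2))*3", mode="eval")) == ast.dump(tree)
print(same)
```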
#LST
Lossless Syntax Tree —
An "abstract" syntax tree with enough detail to reproduce the original source
code.
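One common way to make a tree lossless is to keep the whitespace between tokens attached to the tokens themselves. The tree's leaves are tokens, so concatenating them reproduces the source exactly. This sketch shows that idea with a hypothetical `Token` type and a whitespace-splitting lexer. It isn't OSH's actual representation.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    leading: str  # whitespace before the token ("trivia")
    text: str     # the token itself


def lex_lossless(src: str) -> List[Token]:
    """Split source into words, keeping all whitespace with the tokens."""
    tokens = []
    pos = 0
    for m in re.finditer(r"(\s*)(\S+)", src):
        tokens.append(Token(m.group(1), m.group(2)))
        pos = m.end()
    tokens.append(Token(src[pos:], ""))  # trailing trivia, like an EOF token
    return tokens


def unparse(tokens: List[Token]) -> str:
    """Reproduce the original source from the tokens."""
    return "".join(t.leading + t.text for t in tokens)


src = "  ls   -l  /tmp \n"
print(unparse(lex_lossless(src)) == src)
```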
#adt
Algebraic Data Types —
A data model of sum and product types. This model is particularly convenient
for representing the structure of programming languages.
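Here is a minimal sketch of the model in Python, using a hypothetical expression type. Dataclasses act as product types, each bundling named fields, and a `Union` acts as the sum type. Code that consumes the tree dispatches on which alternative it holds.

```python
from dataclasses import dataclass
from typing import Union


# Product types: each variant bundles several fields together.
@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass
class Neg:
    child: "Expr"


# Sum type: an Expr is exactly one of these alternatives.
Expr = Union[Num, BinOp, Neg]


def evaluate(e: Expr) -> int:
    """Dispatch on which alternative of the sum type we have."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Neg):
        return -evaluate(e.child)
    if isinstance(e, BinOp):
        lhs, rhs = evaluate(e.left), evaluate(e.right)
        return {"+": lhs + rhs, "-": lhs - rhs, "*": lhs * rhs}[e.op]
    raise TypeError(f"not an Expr: {e!r}")


print(evaluate(BinOp("*", Num(2), Neg(Num(3)))))
```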
Trivia about the Unix shell language, including the common ksh/bash extensions.
#here-doc
Here Document —
A construct in shell for writing lines of text to be fed to stdin of a
process. Perl, Ruby, and PHP borrowed here docs from shell.
#shell-builtin
Shell Builtin —
A shell builtin is just like an external command, e.g. /bin/ls, except it's
linked into the sh binary. It takes an argv array, returns an exit code,
and uses stdin, stdout, and stderr.
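Here is a minimal sketch of that calling convention, with hypothetical names. It is not OSH's actual builtin interface. A builtin is modeled as a function from an argv list and three streams to an integer exit status. It runs inside the shell process, so it needs no fork() or exec().

```python
import io
import sys
from typing import Callable, Dict, List, TextIO

# Hypothetical builtin signature: (argv, stdin, stdout, stderr) -> exit code.
Builtin = Callable[[List[str], TextIO, TextIO, TextIO], int]


def builtin_echo(argv: List[str], stdin: TextIO, stdout: TextIO,
                 stderr: TextIO) -> int:
    # argv[0] is the command name, just as for an external command.
    stdout.write(" ".join(argv[1:]) + "\n")
    return 0


def builtin_false(argv: List[str], stdin: TextIO, stdout: TextIO,
                  stderr: TextIO) -> int:
    return 1


BUILTINS: Dict[str, Builtin] = {"echo": builtin_echo, "false": builtin_false}


def run_builtin(argv: List[str], stdin: TextIO = sys.stdin,
                stdout: TextIO = sys.stdout, stderr: TextIO = sys.stderr) -> int:
    """Run a builtin inside the shell process; no fork() or exec() needed."""
    fn = BUILTINS.get(argv[0])
    if fn is None:
        # A real shell would search PATH and fork/exec an external command.
        stderr.write(f"{argv[0]}: not a builtin in this sketch\n")
        return 127
    return fn(argv, stdin, stdout, stderr)


# Usage: redirect a builtin's stdout into an in-memory buffer.
buf = io.StringIO()
status = run_builtin(["echo", "hello", "world"], stdout=buf)
print(status, repr(buf.getvalue()))
```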
#flame-graph
Flame Graph —
A visualization that's commonly used for profiling the CPU usage of a program.
More generally, it can visualize quantities associated with each node in a
tree. Sets of stack traces form a tree when combined.
#dsl-book
Domain Specific Languages by Martin Fowler -- A book of patterns
for implementing DSLs. Discusses lexical state.
#bash
GNU Bash —
The most popular shell implementation.
#dash
Debian Almquist Shell —
A fork of the Almquist Shell that Debian and Ubuntu use for shell scripts,
although it isn't the default login shell. If you look at the busybox ash
source code, it is apparent that they are similar. The things I notice most
about it are that kebab-case function names aren't allowed, and that it has a
bug related to readonly and tilde expansion.
#mksh
MirBSD Korn Shell —
A fork of pdksh (Public Domain Korn Shell). This is the default
shell on Android. Testing this shell against others has taught me that many
"bash-isms" are actually "ksh-isms". bash implemented many ksh extensions
for compatibility.
#zsh
zsh —
zsh is probably the second most popular interactive shell, after bash. It's
not POSIX-compliant by default, although it has options to make it
POSIX-compliant. For example, by default it doesn't split unquoted variable
expansions into words.
#ksh
Korn Shell —
ksh was an extension of the Bourne shell, developed at Bell Labs.
pdksh and bash cloned many of its features.
#pdksh
Public Domain Korn Shell —
A defunct clone of AT&T's Korn shell that survives in at least two forks: the
OpenBSD shell and mksh.
#tcl
Tcl —
An embedded scripting language that's influenced some alternative shells. It
has Lisp-like properties.
#lua
Lua —
Lua is an embedded scripting language, which means that the interpreter is
a library. The interpreter has no global variables, and scripts can only
perform I/O if the host explicitly grants that capability. While I don't like
Lua as a language, this aspect of it will influence Oil.
#sed
sed —
A text stream editor using a batch execution model.
#awk
Awk —
A classic Unix programming language for text processing.
#make
Make —
A classic Unix build tool that is also a Turing-complete programming language.
#shell
Shell —
An interactive program to control the Unix operating system, as well as a
programming language. Oil aims to treat shell as a serious programming language.
#r-language
R language —
A language for statistical computing, including data manipulation, modelling,
and visualization.
#ML
ML —
ML stands for "meta-language": a language for manipulating languages.
The ML family of languages includes OCaml, and closely related languages like
Haskell share its distinguishing feature: the data model of algebraic data
types. The domain-specific language ASDL uses this data model.
#cpython
CPython —
The reference implementation of the Python programming language.
#python
Python —
The popular language that I wrote OSH in.
#ocaml
OCaml —
A popular modern implementation of ML. If I hadn't prototyped
OSH in Python, OCaml would have been a good choice. The compiler and runtime
are well-engineered and well-documented. They may influence
OPy.
#M4
M4 —
GNU Autotools is written in the text preprocessor language M4.
It's similar to the C preprocessor, except that it's Turing-complete. It was
designed to support a dialect of Fortran.