blog | oilshell.org
BayLISA Presentation Materials
2019-01-18
These are my notes for last night's BayLISA presentation on Oil.
I gave many live demos, so these notes are missing the essence of the talk.
But I expect to give a recorded talk covering this same material in the future.
The talk lasted for over 90 minutes, and there was a lively audience with many
questions and useful feedback. Thanks to Marc Rovner and the BayLISA board for
inviting me to speak!
I prepared too much material, so I skipped everything below the Design and
Implementation section. I now realize that's an entirely separate talk.
The main point of the talk is that I'd like experienced users to run their
shell scripts with OSH and send me feedback. Please:
I just released OSH 0.6.pre12, which you can download
here:
Thank you!
Audience Background
I was excited to speak in front of an experienced UNIX audience.
- How many people have heard of Oil?
- Who writes shell scripts?
- Who avoids writing shell scripts?
- Who avoids shell scripts but has to maintain scripts written by others?
- Who uses the cloud? AWS, Azure, Google Cloud, Kubernetes, etc.
- Python?
- C and C++?
Intro
About Me
Work:
- Electronic Arts (2002-2004)
- Google (2005-2016)
- Dev Tools, Web Services, Distributed Systems
- Big Data, Data Science, Machine Learning
- Open Source (2016-present)
Languages I Use:
- Python, C, C++, R, JavaScript, and many domain-specific languages
- And shell to glue everything together!
What the Project Is: OSH and Oil
- OSH: a compatible shell to run your existing programs.
- Close to feature-complete.
- It needs polish, optimization, and attention to a few corner cases.
- Oil language: Designed on paper.
Takeaways for this Talk
- Understand the motivation for the project, and how it's different from existing shells.
- Find people to run real shell scripts with OSH and give feedback.
- OSH is almost ready for early adopters to use interactively. I need to
switch to it myself first.
Motivation
- Shell scripts are useful; you can be very productive with them.
- T-Shirt: I didn't understand this until 2010 or so.
- Shell is more widely applicable now than 15 years ago. I expect it will
still be important in 15 years.
- The shell language is hard to learn.
- Shell scripts are fragile.
- Shell implementations like bash are old and buggy. They have
surprising corner cases and bad error messages.
Slogans to Explain the Project
- Our upgrade path out of bash.
- Languages have strong network effects.
- Shell was already old and regretted 25 years ago.
If Tcl does become the "standard scripting language", users will curse it
for years — the way people curse Fortran, MSDOS, Unix shell syntax,
and other de facto standards they feel stuck with.
— Richard Stallman on Usenet, 1994
A second problem with using a Bourne shell compatible language is
that field splitting and file name generation are done on every
command word. In purely string processing applications, this is not
the desired default [...] With ksh it is possible to disable field
splitting and/or file name generation on a per function basis, which
makes it possible to eliminate this common source of errors.
— David Korn in a Usenix paper on ksh, 1994
... nobody really knows what the Bourne shell’s grammar is. Even examination
of the source code is little help. The parser is implemented by recursive
descent, but the routines corresponding to the syntactic categories all have
a flag argument that subtly changes their operation depending on the context.
— Tom Duff in a paper on Plan 9's rc shell, 1991
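Korn's point about field splitting and file name generation is easy to see in a few lines. In this sketch, an unquoted expansion is split into words and each word is then globbed. Quoting suppresses both, and set -f turns off only the globbing. The names nargs and splitting_demo are made up, and the sketch assumes the temporary directory's path contains no whitespace.

```shell
#!/usr/bin/env bash
# nargs prints how many arguments it received.
nargs() { echo "$#"; }

splitting_demo() {
  local dir s
  # Assumes the temporary directory path contains no whitespace.
  dir=$(mktemp -d)
  touch "$dir/a.txt" "$dir/b.txt"
  s="$dir/*.txt is a glob"

  nargs $s     # unquoted: split into 4 words; the first is globbed to 2 files -> 5
  nargs "$s"   # quoted: no splitting, no globbing -> 1
  set -f       # turn off file name generation (globbing)
  nargs $s     # still split into 4 words, but the pattern stays literal -> 4
  set +f       # restore globbing
  rm -r "$dir"
}
```

By default, set -f in bash applies to the whole shell rather than to one function, which is why the sketch turns it back off with set +f.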
- Let's take shell seriously as a programming language. It's used as one,
but not implemented as one.
- An experiment in software engineering.
- Can we replace bash with something much simpler?
- Can we implement it with normal parsing algorithms rather than thousands of
lines of ad hoc code?
- Can I teach shell "with a straight face"?
- Also, I want to write an OSH manual "with a straight face".
- If shell were a better language, it would be appropriate to expand its usage to more domains.
- Data science, e.g. Why Use Make?
- Shell should be the language for describing the architecture of
distributed systems (the last part of the talk, if there's time).
- i.e. this is motivation for the Oil language.
- More: Why Create a New Unix Shell?
A Story About a 30-Year-Old Security Problem
Suppose you run the following script as root. How many people think the user
who supplies the number should be able to delete your hard drive?
# print-incremented.sh
x=$(cat user-supplied-number.txt)
echo $(( x + 1 ))
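In bash, at least, that user can. Inside $(( )), the value of x is itself evaluated as an arithmetic expression, and an array subscript in that expression is expanded, including command substitution. So a file containing text like a[$(some command)] makes the script run that command as root. One defensive pattern is to reject anything that isn't a plain decimal number before it reaches $(( )). This is a minimal sketch. The function names are hypothetical, and the 10# prefix is a bash/ksh extension, not POSIX.

```shell
#!/usr/bin/env bash
# A defensive variant of print-incremented.sh (a sketch).

# is_decimal: succeed only if $1 is a non-empty string of digits 0-9.
is_decimal() {
  case $1 in
    '' | *[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    *) return 0 ;;
  esac
}

# print_incremented FILE: print the number in FILE plus one.
print_incremented() {
  local x
  x=$(cat "$1") || return 1
  if ! is_decimal "$x"; then
    echo "refusing non-numeric input: $x" >&2
    return 1
  fi
  # 10# forces base 10, so "08" is 8 rather than an invalid octal number.
  echo $(( 10#$x + 1 ))
}
```

Usage: print_incremented user-supplied-number.txt. The check uses a case pattern, so the untrusted string is only matched as text and never evaluated as arithmetic.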
Related blog posts (from October 2016):
Audit: What do shell scripts use arithmetic for?
- counters: i=0; (( i++ ))
- calculating with the output of wc -l
- calculating with seconds: date +%s
- parsing $bin --version
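Each of these patterns feeds text from outside the script into $(( )), which is why the story above matters. The following sketch shows what each pattern typically looks like. The function names and the sample version string are illustrative and are not taken from the audited scripts.

```shell
#!/usr/bin/env bash
# One sketch per pattern from the audit; function names are hypothetical.

# 1. Counters: count stdin lines containing a substring. Portable form: i=$((i + 1)).
count_matching() {   # $1: substring to look for
  local i=0 line
  while IFS= read -r line; do
    case $line in
      *"$1"*) i=$((i + 1)) ;;
    esac
  done
  echo "$i"
}
# Caveat: in bash, (( i++ )) with i=0 evaluates to 0, so its exit status is 1,
# which aborts a script running under set -e.

# 2. Calculating with the output of wc -l, e.g. rows in a CSV file minus its header.
#    Reading from stdin keeps the file name out of wc's output.
data_rows() {
  local n
  n=$(wc -l < "$1")
  echo $(( n - 1 ))
}

# 3. Calculating with seconds since the epoch from date +%s.
elapsed_since() {   # $1: an earlier value of date +%s
  local now
  now=$(date +%s)
  echo $(( now - $1 ))
}

# 4. Parsing $bin --version, assuming output shaped like "Python 3.7.1".
major_version() {
  local v=${1##* }    # drop everything up to the last space -> 3.7.1
  echo "${v%%.*}"     # keep what precedes the first dot -> 3
}
```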
Quiz on POSIX Shell
This section was meant to motivate POSIX shell as a bad language — not
just bash. But I skipped it because motivation was already well understood.
- Env binding / Assignment
- Redirects
- Subshell vs. Brace Group
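The quiz questions themselves aren't reproduced here. The sketch below shows the behaviors these three topics usually turn on. The function names are made up.

```shell
#!/usr/bin/env bash
# Env binding vs. assignment: "NAME=value cmd" puts NAME only in cmd's
# environment, while a bare "NAME=value" sets a variable in the current shell.
env_binding_demo() {
  unset GREETING
  GREETING=hi sh -c 'echo "child sees: $GREETING"'
  echo "parent sees: ${GREETING:-<unset>}"
  # Expansion happens before the binding, so GREETING=hi echo "$GREETING"
  # would print an empty line, not "hi".
}

# Redirects are processed left to right.
redirect_order_demo() {
  local out
  out=$(mktemp)
  { echo out; echo err >&2; } > "$out" 2>&1   # both lines land in the file
  echo "a: $(paste -s -d ' ' "$out")"
  { echo out; echo err >&2; } 2>&1 > "$out"   # stderr goes to the old stdout
  echo "b: $(paste -s -d ' ' "$out")"
  rm -f "$out"
}

# ( ) runs commands in a subshell; { } runs them in the current shell.
grouping_demo() {
  local x=outer
  ( x=subshell )      # the change is lost when the subshell exits
  echo "after ( ): $x"
  { x=brace; }        # the ; (or a newline) before } is required
  echo "after { }: $x"
}
```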
Project Status and History
Summary:
- OSH can run big hairy programs.
- Making it faster with "OPy" is still an open problem.
- The Oil language is designed but not implemented.
Details:
- April 2016: Started working on this incarnation of the project.
Established the testing strategy.
- October 2016: "Proved" the structure of the parser, and started the blog.
- Spring 2017: OSH runs my own shell scripts.
- July 2017: OSH runs Python's autotools-generated configure script. First
release.
- January 2018: Big milestones that led me to continue the project.
- Summer 2018: OPy, OVM, refactoring/rewrites, toward the goal of Hollowing
out the Python Interpreter
- October 2018: Running Bash Completion Scripts with OSH
- December 2018: Interactive shell UI.
Other facts:
- Small contributions from about 20 people over the course of the project.
- Current release is OSH 0.6.pre11.
- Features: see the OSH Quick Reference.
- Unsupported: job control, printf, advanced history expansion.
Demos
Static Parsing and Error Messages
- Examples: commands, functions, quoted strings, brace expansion
- Syntax errors ahead of time
- More detailed syntax errors with column numbers (Clang vs. GCC)
Notes in retrospect: The audience liked this section, but there was some
confusion. I could have given a crisper demo and explained it more clearly.
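For context on ahead-of-time errors, bash parses and executes a script one top-level command at a time. As a result, commands before a syntax error may already have run when the error is reported. bash -n parses a script without executing it. The following is a sketch; syntax_demo is a made-up name.

```shell
#!/usr/bin/env bash
# syntax_demo: run a script that ends with a syntax error, then check it with bash -n.
syntax_demo() {
  local script
  script=$(mktemp)
  # The last command is an unterminated for loop (no "done").
  printf 'echo "this line runs"\nfor x in a b; do\n  echo "$x"\n' > "$script"

  echo "--- bash script.sh"
  bash "$script" 2>/dev/null    # runs line 1, then reports the syntax error
  echo "exit status: $?"

  echo "--- bash -n script.sh"
  bash -n "$script" 2>/dev/null # parse only: nothing is executed
  echo "exit status: $?"

  rm -f "$script"
}
```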
Running Scripts from Linux Distros
Success with Aboriginal, Alpine, and Debian Linux
- Debian
- Aboriginal Linux
- Alpine Linux
Running Interactive Completion Scripts
- Partially working.
- What's the only thing worse than an ad hoc shell parser in C?
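For readers who haven't looked inside one, a bash completion script typically defines a function and registers it with the complete builtin. When the user presses TAB, bash calls the function with COMP_WORDS and COMP_CWORD set, then reads candidates back from the COMPREPLY array. Here is a minimal sketch for a hypothetical command named mytool. Real bash-completion scripts are far larger and lean on many more bash features.

```shell
#!/usr/bin/env bash
# Completion function for a hypothetical command "mytool".
_mytool_complete() {
  local cur=${COMP_WORDS[COMP_CWORD]}   # the word being completed
  # compgen -W keeps the words from the list that start with $cur.
  COMPREPLY=( $(compgen -W 'start stop status' -- "$cur") )
}
# Register it; bash calls it when the user presses TAB after "mytool".
complete -F _mytool_complete mytool
```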
OSH to Oil translation
JSON Crash Dump
- Claim: This should always be on in production!
Development Process (testing)
Design and Implementation
As noted, I skipped everything below for the sake of time. It's a separate
presentation.
Line Counts
OSH runs many real bash scripts, but it's much smaller than bash.
- Bash 4.4: 124K physical, 88K significant (excluding GNU readline)
- OSH: 21K physical, 11K significant
- Ratios: 5.9x physical, 8x significant
Caveats:
- OSH still has fewer features than bash (60-80%?)
- OSH ships with a portion of the Python VM
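The ratios follow from the counts above, which are in thousands of lines. Shell arithmetic is integer-only, so a quick check can hand the division to awk. ratio is a made-up helper name.

```shell
#!/usr/bin/env bash
# ratio A B: print A / B to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}
ratio 124 21   # physical lines: bash vs. OSH
ratio 88 11    # significant lines: bash vs. OSH
```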
Implementation Style / Metaprogramming
Oil uses domain-specific languages to "compress" the code:
- re2c for all character-level processing
- Zephyr ASDL for the core data structure representing shell code
- The OPy compiler for all other code
Summary: All the code in Oil is processed by our own compilers.
Benefits of metaprogramming:
- Source code readability and size
- Correctness
Architecture: The Parser as a Library
See doc/architecture-notes.md
- As far as I can tell, OSH is the only POSIX shell that:
- Uses its own parser for completion. (Demo)
- Uses its own parser for history expansion, e.g. !$ picks off the last
unevaluated word.
The OSH front end is written in a particular style:
- Lexing with modes (via re2c).
- Static parsing in a single pass (as much as possible).
- Representation of the code with Zephyr ASDL and the
Lossless Syntax Tree.
- The LST can be used for multiple purposes:
- Execution
- Translation to another language
- Static analysis.
- Future: auto-formatting like gofmt and clang-format.
- Future: use for syntax highlighting
This style enables easy language composition.
Open Problems
- Speed.
- The project's scope is too big.
- Even if I don't get to the Oil language, replacing bash is valuable.
- Recent idea: cut scope on bootstrapping.
Criteria for the OSH Language Definition (analogy to POSIX)
- Rough criteria to include features:
- It's in POSIX, or
- Two or more shells implement it in the same way, as verified by the spec
tests, or
- An "important" bash script needs it.
- Excluded: bash trivia that scripts don't use in practice. "Demand-driven".
- Discovery: I think OSH will be a significantly simpler language than bash.
- Discovery: all shells are highly POSIX compliant, for the areas of the language
that POSIX specifies. But POSIX covers only a small portion of, say, dash,
let alone bash.
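The second criterion is mechanical enough to sketch: run the same snippet under several shells and compare the results. This only illustrates the idea and is not OSH's actual spec test harness. compare_shells is a made-up name, and shells that aren't installed are reported and skipped.

```shell
#!/usr/bin/env bash
# compare_shells SNIPPET SHELL...: run SNIPPET with each shell's -c and print the output.
compare_shells() {
  local snippet=$1 sh out
  shift
  for sh in "$@"; do
    if ! command -v "$sh" >/dev/null 2>&1; then
      echo "$sh: (not installed)"
      continue
    fi
    out=$("$sh" -c "$snippet" 2>&1)
    echo "$sh: $out"
  done
}

# Example: "local" isn't in POSIX, but many shells implement it.
compare_shells 'f() { local x=1; echo "$x"; }; f' bash dash mksh zsh
```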
Recap
- I argued why we need a new Unix shell. An individual user has limited choice
in practice: it's an ecosystem problem.
- Main way to help: run OSH on your shell scripts and report OSH language corner cases.
- Also see issues tagged help-wanted on the issue tracker.
- Oil is an experiment, but I think it will be a practical tool as well. There
are some open problems but they seem within reach.
Q&A
General areas:
- Motivation
- OSH Language Definition
- OSH Design and Implementation; Software Architecture
- Oil Language Design (speculative)
- Tentative Roadmap
Future: Oil Language
This is future work, so I saved it for last.
- A nicer, statically parsed version of everything in shell (globs, regexes,
brace expansion, loops, functions, pipelines, etc.)
- Roughly speaking:
- the "single-line" language has the same syntax, with some edits
- the "block" language has different syntax: for, if, case, functions
- Python/JavaScript-like functions, and an expression language (proc and func)
- Ruby-like blocks (e.g. in the style of Rake, and Chef and Vagrant configs)
- Goal: Subsume 1970s-style macro processing:
- sh in Make
- sh in Docker
- sh in Ruby DSLs
- sh in YAML (various systems around Kubernetes)
- sh in systemd unit files, Ubuntu Upstart configuration
- Future Goal: Service Definitions. A distributed system is a bunch of
heterogeneous processes and ports.
- Application config and flags
- Container definition (file system, kernel settings)
- Scheduler
- Auth
Slogan:
- Old sludge: sh, make, awk, sed, autotools, m4
- New sludge: sh, YAML, Go templates, JSON, Docker, Ruby, Python