Syntactic Concepts in the Oil Language

These documents introduce the Oil language:

In contrast, the concepts introduced below may help advanced users remember Oil and its syntax. Read on to learn about:

Table of Contents
Command vs. Expression Mode
Lexer Modes
More Information
Sigils and Sigil Pairs
Valid Contexts
Parse Options to Take Over (), @, set, and =
Static Parsing
Aside: Duplicate Functionality in Bash
Related Links
Related Documents
Appendix: Hand-Written vs. Generated Parsers

Command vs. Expression Mode

The Oil parser starts out in command mode:

echo "hello $name"

for i in 1 2 3 {
  echo $i
}

But it switches to expression mode in a few places:

var x = 42 + a[i]      # the RHS of an assignment is an expression

echo $len('foo')       # interpolated function call

echo $[mydict['key']]  # interpolated Oil expressions with $[]

See Command vs. Expression Mode for details.

Lexer Modes

Lexer modes are a technique that Oil uses to manage the complex syntax of shell, which evolved over many decades.

For example, : means something different in each of these lines:

PATH=/bin:/usr/bin          # Literal string
echo ${x:-default}          # Part of an operator
echo $(( x > y ? 42 : 0 ))  # Arithmetic operator
var myslice = a[3:5]        # Oil expression

To solve this problem, Oil has a lexer that can run in many modes. Multiple parsers read from this single lexer, but they demand different tokens, depending on the parsing context.
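
The three shell cases above can be checked directly. This is a minimal sketch, assuming bash; the variable names dirs, fallback, a, b, and ternary are made up for illustration, and the Oil slice a[3:5] is left out because it needs Oil itself.

```shell
# The same ':' character, read three different ways by bash.

dirs=/bin:/usr/bin              # literal character inside a string

unset x
fallback=${x:-default}          # part of the ':-' operator (x is unset)

a=5
b=3
ternary=$(( a > b ? 42 : 0 ))   # separator of the arithmetic ternary

echo "$dirs"
echo "$fallback"
echo "$ternary"
```

Each line puts the parser in a different context, so the lexer has to hand back a different kind of token for the same character.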

More Information

Sigils and Sigil Pairs

A sigil is a symbol like the $ in $mystr.

A sigil pair is a sigil with opening and closing delimiters, like ${var} and @(seq 3).

An appendix of A Feel For Oil's Syntax lists the sigil pairs in the Oil language.
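
Two of these forms already exist in plain shell, so the difference between a sigil and a sigil pair can be tried without Oil. A small sketch, assuming bash; the names mystr, plain, braced, and substituted are illustrative only.

```shell
mystr=hello

plain="$mystr"                           # sigil: $ followed by a name
braced="${mystr}_world"                  # sigil pair: ${ } marks where the name ends
substituted="$(printf '%s' "$mystr")"    # sigil pair: $( ) encloses a command

echo "$plain"
echo "$braced"
echo "$substituted"
```

Without the braces, "$mystr_world" would refer to a different variable named mystr_world, which is one reason the closing delimiter matters.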

Valid Contexts

Each sigil pair may be available in command mode, expression mode, or both.

For example, command substitution is available in both:

echo $(hostname)      # command mode
var x = $(hostname)   # expression mode

Array literals only make sense in expression mode:

var myarray = %(one two three)

echo one two three  # no array literal needed

The $'' syntax for C-style strings makes sense in command mode:

echo $'foo\n'  # the bash-compatible way to do it

but in expression mode, we prefer r'' and c'':

var raw      = r'c:\Program Files\'
var newlines = c'foo\n'

var newlines = $'foo\n'  # also accepted

A sigil pair often changes the lexer mode to parse what's inside.

Parse Options to Take Over (), @, set, and =

Most users don't have to worry about parse options. Instead, they run either bin/osh or bin/oil, which are actually aliases for the same binary. The difference is that bin/oil has the option group oil:all on by default.

Nonetheless, here are two examples.

The parse_at option (in group oil:basic) turns @ into the splice operator when it's at the front of a word:

$ var myarray = %(one two three)

$ echo @myarray         # @ isn't an operator in shell

$ shopt -s parse_at     # parse the @ symbol
$ echo @myarray
one two three

$ echo '@myarray'       # quote it to get the old behavior

The parse_set option (in group oil:all) lets you use set as a keyword to mutate vars. It's shorter than setvar.

set -o errexit          # set is a shell builtin

shopt -s parse_set      # parse set as a keyword

set x = 42 + a[i]       # Now it accepts a LHS and RHS

builtin set -o errexit  # One way to use the set builtin

Static Parsing

POSIX specifies that Unix shell has multiple stages of parsing and evaluation. For example:

$ x=2
$ code='3 * x'
$ echo $(( code ))  # Silent eval of a string.  Dangerous!
6

Oil expressions are parsed in a single stage, and then evaluated, which makes it more like Python or JavaScript:

$ setvar code = '3 * x'
$ echo $[ code ]
3 * x

Another example: shell assignment builtins like readonly and local are dynamically parsed, while Oil assignment keywords like const and var are statically parsed.
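
To see what dynamic parsing of assignment builtins means in practice, here is a short sketch, assuming bash. The function make_local and the variable names are hypothetical. In both cases the builtin receives an ordinary string and only splits it into a name and a value when it runs.

```shell
make_local() {
  # $1 is a variable name and $2 its value; both are known only at runtime.
  local "$1=$2"
  # Indirect expansion reads the variable that was just created.
  printf '%s\n' "${!1}"
}
from_local=$(make_local greeting hello)

# The whole NAME=VALUE pair can come from data as well.
assignment='limit=10'
readonly "$assignment"

echo "$from_local $limit"
```

Because the name is computed at runtime, a tool that only reads the source text cannot tell which variables this code defines. That is the property the statically parsed const and var avoid.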

Aside: Duplicate Functionality in Bash

It's confusing that bash has both statically- and dynamically-parsed variants of the same functionality.

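Boolean expressions: the [ builtin is dynamically parsed, while the [[ keyword is statically parsed.

C-style string literals: echo -e '\n' is dynamically parsed, while echo $'\n' is statically parsed.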
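
A sketch of both pairs, assuming bash; all variable names are illustrative. The [ builtin and echo -e interpret their arguments when they run, while [[ and $'' are handled by the parser.

```shell
# Boolean expressions: [ is a builtin, so its operator can arrive as data.
op='-d'
if [ "$op" / ]; then dynamic_test=yes; else dynamic_test=no; fi

# [[ is syntax: the operator is written literally in the source.
if [[ -d / ]]; then static_test=yes; else static_test=no; fi

# C-style strings: echo -e decodes the escape when the command runs,
dynamic_str=$(echo -e 'foo\tbar')
# while $'' is decoded when the script is parsed.
static_str=$'foo\tbar'

echo "$dynamic_test $static_test"
[ "$dynamic_str" = "$static_str" ] && echo "same string"
```

Both variants of each pair produce the same result here. They differ in when the text is interpreted.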

Related Links

Related Documents

Appendix: Hand-Written vs. Generated Parsers

The OSH language is parsed "by hand", while the Oil language is parsed with tables generated from a grammar (a modified version of Python's pgen).

This is mostly an implementation detail, but users may notice that OSH gives more specific error messages!

Hand-written parsers give you more control over errors. Eventually the Oil language may have a hand-written parser as well. Either way, feel free to file bugs about error messages that confuse you.

Generated on Wed Nov 18 14:50:55 PST 2020