Syntactic Concepts in the Oil Language

Oil is an extension of the Unix shell, which means that it's a large language. The concepts introduced here may help advanced users remember the syntax.

However, new users should read these docs to learn the syntax:

Read on to learn about:

Table of Contents
Sigils and Sigil Pairs
Valid Contexts
Parse Options to Take Over (), @, set, and =
Static Parsing
Aside: Duplicate Functionality in Bash
Related Links
Command vs. Expression Mode
Lexer Modes
More Information
Related Documents
Appendix: Hand-Written vs. Generated Parsers

Sigils and Sigil Pairs

A sigil is a symbol like the $ in $mystr.

A sigil pair is a sigil with opening and closing delimiters, like ${var} and @(seq 3).

See A Feel For Oil's Syntax for a list of sigils and sigil pairs.

Valid Contexts

Each sigil pair may be available in command mode, expression mode, or both.

For example, command substitution is available in both:

echo $(hostname)      # command mode
var x = $(hostname)   # expression mode

Array literals only make sense in expression mode:

var myarray = %(one two three)

echo one two three  # no array literal needed

The $'' syntax for C-style strings makes sense in command mode:

echo $'foo\n'  # the bash-compatible way to do it

but in expression mode, we prefer r'' and c'':

var raw      = r'c:\Program Files\'
var newlines = c'foo\n'

var newlines = $'foo\n'  # also accepted

A sigil pair often changes the lexer mode to parse what's inside.

Parse Options to Take Over (), @, set, and =

Most users don't have to worry about parse options. Instead, they run either bin/osh or bin/oil, which are actually aliases for the same binary. The difference is that bin/oil has the option group oil:all on by default.

Nonetheless, here are two examples.

The parse_at option (in group oil:basic) turns @ into the splice operator when it's at the front of a word:

$ var myarray = %(one two three)

$ echo @myarray         # @ isn't an operator in shell
@myarray

$ shopt -s parse_at     # parse the @ symbol
$ echo @myarray
one two three

$ echo '@myarray'       # quote it to get the old behavior
@myarray
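
For comparison, here is a small sketch of the same situation in plain bash, which has no parse_at option. It assumes bash rather than Oil. The helper count_args is hypothetical and exists only to make the number of arguments visible; "${myarray[@]}" is bash's own way to pass each element as a separate argument.

```shell
#!/usr/bin/env bash
# Assumption: plain bash, not Oil, so there is no parse_at option.
myarray=(one two three)

# '@' has no special meaning at the front of a word, so it stays literal.
literal=$(echo @myarray)

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# The bash way to pass every array element as its own argument.
spliced_count=$(count_args "${myarray[@]}")

echo "$literal"
echo "$spliced_count"
```

With parse_at on, Oil's @myarray plays the role that "${myarray[@]}" plays here.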

The parse_set option (in group oil:all) lets you use set as a keyword to mutate vars. It's shorter than setvar.

set -o errexit          # set is a shell builtin

shopt -s parse_set      # parse set as a keyword

set x = 42 + a[i]       # Now it accepts a LHS and RHS

builtin set -o errexit  # One way to use the set builtin

Static Parsing

POSIX specifies that Unix shell has multiple stages of parsing and evaluation. For example:

$ x=2 
$ code='3 * x'
$ echo $(( code ))  # Silent eval of a string.  Dangerous!
6
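
The following bash sketch shows that this re-parsing also happens through several levels of variables, and one possible way to guard against it. The names indirect and is_int are made up for this illustration, and the integer check is an assumption about what counts as safe input, not a recommendation from the Oil project.

```shell
#!/usr/bin/env bash
# Arithmetic evaluation re-parses variable contents, level by level.
x=2
code='3 * x'
indirect='code + 1'
nested=$(( indirect ))     # indirect -> code -> x

# Hypothetical guard: accept only decimal integer literals.
is_int() {
  [[ $1 =~ ^-?[0-9]+$ ]]
}

# Refuse to evaluate strings that are not plain integers.
if is_int "$code"; then
  guarded=$(( code ))
else
  guarded='rejected'
fi

echo "$nested"
echo "$guarded"
```

The guard treats the string as data first, and only lets it reach the arithmetic evaluator once it is known to be a plain number.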

Oil expressions are parsed in a single stage and then evaluated, which makes Oil more like Python or JavaScript:

$ setvar code = '3 * x'
$ echo $[ code ]
3 * x

Another example: shell assignment builtins like readonly and local are dynamically parsed, while Oil assignment keywords like const and var are statically parsed.

Aside: Duplicate Functionality in Bash

It's confusing that bash has both statically- and dynamically-parsed variants of the same functionality.

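Boolean expressions: [ -d /tmp ] is a builtin whose arguments are parsed at runtime, while [[ -d /tmp ]] is parsed statically as part of the language.

C-style string literals: echo -e '\n' interprets the escape at runtime, while echo $'\n' is parsed statically.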
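
A minimal bash sketch of the difference between the two styles, assuming bash's builtin [ and echo. The variable names are hypothetical. The status values in the comments are what bash reports for a malformed [ expression and for a false [[ expression.

```shell
#!/usr/bin/env bash
empty=''

# Dynamic: $empty expands to nothing, so [ receives "= foo" and
# fails to parse its arguments at runtime (status 2 in bash).
dynamic_status=0
[ $empty = foo ] 2>/dev/null || dynamic_status=$?

# Static: [[ ]] is parsed before expansion, so the comparison is
# well-formed and simply evaluates to false (status 1).
static_status=0
[[ $empty = foo ]] || static_status=$?

# Dynamic: the echo builtin interprets \t when it runs.
dynamic_str=$(echo -e 'a\tb')

# Static: the parser turns \t into a tab when it reads the literal.
static_str=$'a\tb'

echo "$dynamic_status $static_status"
```

Both string forms produce the same bytes. The difference is when the escape is interpreted.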

Related Links

Command vs. Expression Mode

The Oil parser starts out in command mode:

echo "hello $name"

for i in 1 2 3 {
  echo $i
}

But it switches to expression mode in a few places:

var x = 42 + a[i]      # the RHS of an assignment is an expression

echo $len('foo')       # interpolated function call

echo $[mydict['key']]  # interpolated Oil expressions with $[]

See Command vs. Expression Mode for details.

Lexer Modes

Lexer modes are a technique that Oil uses to manage the complex syntax of shell, which evolved over many decades.

For example, : means something different in each of these lines:

PATH=/bin:/usr/bin          # Literal string
echo ${x:-default}          # Part of an operator
echo $(( x > y ? 42 : 0 ))  # Arithmetic operator
var myslice = a[3:5]        # Oil expression

To solve this problem, Oil has a lexer that can run in many modes. Multiple parsers read from this single lexer, but they demand different tokens, depending on the parsing context.
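
The idea can be sketched as a lookup from lexer mode to token kind. This is only an illustration written in bash: the function colon_token, the mode names, and the token names are all hypothetical and are not Oil's real identifiers.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: mode and token names are made up for
# illustration and are not Oil's real identifiers.
colon_token() {
  local mode=$1
  case "$mode" in
    word)   echo 'literal-chars' ;;   # PATH=/bin:/usr/bin
    var_op) echo 'var-op-colon' ;;    # ${x:-default}
    arith)  echo 'arith-colon' ;;     # x > y ? 42 : 0
    expr)   echo 'expr-colon' ;;      # a[3:5]
    *)      echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Each parser asks for the next token in the mode it is currently in.
for mode in word var_op arith expr; do
  printf '%-7s -> %s\n' "$mode" "$(colon_token "$mode")"
done
```

The same input character yields a different token depending on which mode the caller requests, which is how several parsers can share one lexer.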

More Information

Related Documents

Appendix: Hand-Written vs. Generated Parsers

The OSH language is parsed "by hand", while the Oil language is parsed with tables generated from a grammar (a modified version of Python's pgen).

This is mostly an implementation detail, but users may notice that OSH gives more specific error messages!

Hand-written parsers give you more control over errors. Eventually the Oil language may have a hand-written parser as well. Either way, feel free to file bugs about error messages that confuse you.


Generated on Thu Oct 8 13:33:44 PDT 2020