Docs for version 0.8.pre9 | oilshell.org

Simple Word Evaluation in Unix Shell

This document describes Oil's word evaluation semantics (shopt -s simple_word_eval) for experienced shell users. It may also be useful to those who want to implement this behavior in another shell.

The main idea is that Oil behaves like a traditional programming language:

It's parsed from start to end in a single pass.
It's evaluated in a single step too.

That is, parsing and evaluation aren't interleaved, and code and data aren't confused.

Table of Contents

An Analogy: Word Expressions Should Be Like Arithmetic Expressions

Design Goals

Examples

No Implicit Splitting, Dynamic Globbing, or Empty Elision

Splicing, Static Globbing, and Brace Expansion

Where These Rules Apply

Opt In to the Old Behavior With Explicit Expressions

More Word Evaluation Issues

More shopt Options

Arithmetic Is Statically Parsed

Summary

Notes

An Analogy: Word Expressions Should Be Like Arithmetic Expressions

In Oil, "word expressions" like

$x
"hello $name"
$(hostname)
'abc'$x${y:-${z//pat/replace}}"$(echo hi)$((a[i] * 3))"

are parsed and evaluated in a straightforward way, like this expression when x == 2:

1 + x / 2 + x * 3        → 8  # Python, JS, Ruby, etc. work this way

In contrast, in shell, words are "expanded" in multiple stages, like this:

1 + "x / 2 + \"x * 3\""  → 8  # Hypothetical, confusing language

That is, it would be odd if Python looked inside a program's strings for expressions to evaluate, but that's exactly what shell does! There are multiple places where there's a silent eval, and you need quoting to inhibit it. Neglecting this can cause security problems due to confusing code and data (links below).

In other words, the defaults are wrong. Programmers are surprised by shell's behavior, and it leads to incorrect programs.

So in Oil, you can opt out of the multiple "word expansion" stages described in the POSIX shell spec. Instead, there's only one stage: evaluation.
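
To make the staging concrete, here is a minimal sketch for a plain POSIX shell (not Oil). The count_args helper, the variable names, and the scratch directory are assumptions made for this illustration. It switches the later stages off one at a time, showing that a single unquoted $words goes through substitution, then splitting, then globbing:

```sh
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

# Scratch directory so the glob results are predictable.
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch a.py b.py

words='*.py x y'   # plain data, but the shell reinterprets it

all_stages=$(count_args $words)          # split into 3 words, then *.py globs to 2 files
split_only=$(set -f; count_args $words)  # set -f switches globbing off (subshell only)
no_stages=$(count_args "$words")         # quotes switch off splitting and globbing

echo "split+glob=$all_stages split=$split_only quoted=$no_stages"
# prints: split+glob=4 split=3 quoted=1
```

The same string yields 4, 3, or 1 arguments depending on which stages run, which is the kind of surprise a single evaluation step avoids.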

Design Goals

The new semantics should be easily adoptable by existing shell scripts.

Importantly, bin/osh is POSIX-compatible and runs real bash scripts. You can gradually opt into stricter and saner behavior with shopt options (or by running bin/oil). The most important one is simple_word_eval, and the others are listed below.
Even after opting in, the new syntax shouldn't break many scripts. If it does break, the change to fix it should be small. For example, echo @foo is not too common, and it can be made bash-compatible by quoting it: echo '@foo'.

Examples

In the following examples, the argv command prints the argv array it receives in a readable format:

$ argv one "two three"
['one', 'two three']
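
The argv command is not a standard utility. If you want to follow along in an ordinary shell, a rough stand-in can be written as a function. This is only a sketch that imitates the output format shown above; it is not the tool used to produce this document, and it does not escape quotes inside arguments:

```sh
# Hypothetical stand-in for the argv command used in this document.
# Prints its arguments as a bracketed, comma-separated list.
argv() {
  out='['
  sep=''
  for arg in "$@"; do
    out="$out$sep'$arg'"   # wrap each argument in single quotes
    sep=', '
  done
  printf '%s\n' "$out]"
}

argv one "two three"   # prints ['one', 'two three']
argv ''                # prints ['']
```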

I also use Oil's var keyword for assignments. (TODO: This could be rewritten with shell assignment for the benefit of shell implementers)

No Implicit Splitting, Dynamic Globbing, or Empty Elision

In Oil, the following constructs always evaluate to one argument:

Variable / "parameter" substitution: $x, ${y}
Command sub: $(echo hi) or backticks
Arithmetic sub: $(( 1 + 2 ))

That is, quotes aren't necessary to avoid:

Word Splitting, which uses $IFS.
Empty Elision. For example, x=''; ls $x passes ls no arguments.
Dynamic Globbing. Globs are dynamic when the pattern comes from program data rather than the source code.

Here's an example showing that each construct evaluates to one arg in Oil:

oil$ var pic = 'my pic.jpg'  # filename with spaces
oil$ var empty = ''
oil$ var pat = '*.py'        # pattern stored in a string

oil$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']

In contrast, shell applies splitting, globbing, and empty elision after the substitutions. Each of these operations returns an indeterminate number of strings:

sh$ pic='my pic.jpg'  # filename with spaces
sh$ empty=
sh$ pat='*.py'        # pattern stored in a string

sh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my', 'pic.jpg', 'a.py', 'b.py', 'contents', 'of', 'foo.txt', '3']

To get the desired behavior, you have to use double quotes:

sh$ argv "${pic}" "$empty" "$pat" "$(cat foo.txt)" "$((1 + 2))"
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
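
The shell half of this comparison can be reproduced as a script. The sketch below assumes a scratch directory containing a.py, b.py, and a foo.txt with the contents used above; count_args is a hypothetical helper that reports only the number of arguments:

```sh
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

# Recreate the files the example assumes.
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch a.py b.py
echo 'contents of foo.txt' > foo.txt

pic='my pic.jpg'   # filename with spaces
empty=
pat='*.py'         # pattern stored in a string

# Unquoted: 2 (split) + 0 (elided) + 2 (globbed) + 3 (split) + 1
unquoted=$(count_args ${pic} $empty $pat $(cat foo.txt) $((1 + 2)))

# Quoted: exactly one argument per substitution
quoted=$(count_args "${pic}" "$empty" "$pat" "$(cat foo.txt)" "$((1 + 2))")

echo "unquoted: $unquoted, quoted: $quoted"   # prints: unquoted: 8, quoted: 5
```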

Splicing, Static Globbing, and Brace Expansion

The constructs in the last section evaluate to a single argument. In contrast, these three constructs evaluate to 0 to N arguments:

Splicing an array: "$@" and "${myarray[@]}"
Static Globbing: echo *.py. Globs are static when they occur in the program text.
Brace expansion: {alice,bob}@example.com

In Oil, shopt -s parse_at enables these shortcuts for splicing:

@myarray for "${myarray[@]}"
@ARGV for "$@"

Example:

oil$ var myarray = @('a b' c)  # array with 2 elements
oil$ set -- 'd e' f            # 2 arguments

oil$ argv @myarray @ARGV *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']

is just like:

bash$ myarray=('a b' c)
bash$ set -- 'd e' f

bash$ argv "${myarray[@]}" "$@" *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
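
The "0 to N" part is easy to overlook. A small POSIX sketch (count_args is a hypothetical helper) shows that "$@" really can produce zero arguments, unlike a quoted string, which always produces one:

```sh
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

set -- 'd e' f
spliced=$(count_args "$@")   # 2: each positional parameter stays one argument
joined=$(count_args "$*")    # 1: "$*" joins everything into a single string

set --                       # now there are no positional parameters
spliced_none=$(count_args "$@")   # 0: the splice disappears entirely
joined_none=$(count_args "$*")    # 1: still one (empty) argument

echo "$spliced $joined $spliced_none $joined_none"   # prints: 2 1 0 1
```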

Unchanged: quotes disable globbing and brace expansion:

$ echo *.py
foo.py bar.py

$ echo "*.py"            # globbing disabled with quotes
*.py

$ echo {spam,eggs}.sh
spam.sh eggs.sh

$ echo "{spam,eggs}.sh"  # brace expansion disabled with quotes
{spam,eggs}.sh

Where These Rules Apply

These rules apply when a sequence of words is being evaluated, exactly as in shell:

Command: echo $x foo
For loop: for i in $x foo; do ...
Array Literals: a=($x foo) and var a = @($x foo) (oil-array)

Shell has other word evaluation contexts like:

sh$ x="${not_array[@]}"
sh$ echo hi > "${not_array[@]}"

which aren't affected by simple_word_eval.
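
As a reminder of how such a single-word context already behaves in POSIX shell, the sketch below contrasts the right-hand side of an assignment with a command argument. The variable names, the scratch directory, and the count_args helper are assumptions for illustration:

```sh
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch a.py b.py

src='my  pic *.py'   # two spaces and a glob pattern
copy=$src            # assignment: no splitting or globbing, even unquoted

as_args=$(count_args $src)   # sequence of words: split, then globbed

[ "$copy" = "$src" ] && echo "assignment kept the string intact"
echo "as command arguments: $as_args"   # prints: as command arguments: 4
```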

Opt In to the Old Behavior With Explicit Expressions

Oil can express everything that shell can.

Split with @split(mystr, IFS?)
Glob with @glob(mypat)
Elision with @maybe(s)
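
For comparison, these are rough POSIX shell idioms for requesting each behavior on purpose. They are a sketch, not a description of how Oil implements @split, @glob, or @maybe, and the function names are hypothetical. One known difference: in POSIX shell a glob that matches nothing is left as the literal pattern.

```sh
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

# Explicit split: split $1 on the separator $2, with globbing off.
# The ( ) body runs in a subshell, so IFS and set -f don't leak out.
split_count() (
  set -f
  IFS=$2
  count_args $1
)

# Explicit glob: expand the pattern in $1, with splitting off.
glob_count() (
  IFS=
  count_args $1
)

# Explicit elision: pass $1 along only if it is non-empty.
maybe_count() {
  count_args ${1:+"$1"}
}

scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch a.py b.py

split_count 'usr bin:etc:home' ':'   # prints 3
glob_count '*.py'                    # prints 2
maybe_count ''                       # prints 0
maybe_count 'my pic.jpg'             # prints 1
```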

More Word Evaluation Issues

More shopt Options

nullglob - A glob that matches nothing evaluates to zero arguments instead of to the pattern text itself. That is, it doesn't evaluate to code.
dashglob - True by default, but disabled when Oil is enabled, so that globs don't return files that begin with -. This avoids confusing flags and files.
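
The two problems these options address can be seen in a shell running with its default settings. The sketch below assumes a scratch directory holding a file named -n; the file names and counters are illustrative only:

```sh
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch -- -n notes.txt

# 1. A glob that matches nothing is passed through as the literal pattern.
set -- *.nomatch
unmatched=$1
echo "unmatched glob gave: $unmatched"   # prints: unmatched glob gave: *.nomatch

# 2. A bare * returns the file named -n, which a command may read as a flag.
dash_files=0
for f in *; do
  case $f in
    -*) dash_files=$((dash_files + 1)) ;;
  esac
done
echo "results starting with a dash: $dash_files"   # prints 1

# Common workaround without dashglob: prefix the glob with ./
prefixed=0
for f in ./*; do
  case $f in
    -*) prefixed=$((prefixed + 1)) ;;
  esac
done
echo "with ./ prefix: $prefixed"                   # prints 0
```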

Strict options cause fatal errors:

strict_tilde - A failed tilde expansion is a fatal error instead of evaluating to the literal word. That is, it doesn't evaluate to code.
strict_word_eval - Invalid slices and invalid UTF-8 aren't ignored.
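
For context, this is what a failed tilde expansion looks like without strict_tilde. POSIX leaves the result undefined for an unknown login name; the sketch assumes a shell such as bash or dash, which leave the word unchanged, and a user name chosen so that it is very unlikely to exist:

```sh
# Assumption: no account with this name exists on the system.
result=$(echo ~no_such_user_for_this_example)

# In bash and dash the word comes back untouched, tilde included.
echo "$result"
```

The command receives a string starting with ~ as if it were a path, which is the silent failure that strict_tilde turns into an error.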

Arithmetic Is Statically Parsed

This is an intentional incompatibility described in the Known Differences doc.

Summary

Oil word evaluation is enabled with shopt -s simple_word_eval, and proceeds in a single step.

Variable, command, and arithmetic substitutions predictably evaluate to a single argument, regardless of whether they're empty or have spaces. There's no implicit splitting, globbing, or elision of empty words.

You can opt into those behaviors with explicit expressions like @split(mystr), which evaluates to an array.

Oil also supports shell features that evaluate to 0 to N arguments: splicing, globbing, and brace expansion.

There are other options that "clean up" word evaluation. All options are designed to be gradually adopted by other shells, shell scripts, and eventually POSIX.

Notes

Tip: View the Syntax Tree With -n

This gives insight into how Oil parses shell:

$ osh -n -c 'echo ${x:-default}$(( 1 + 2 ))'
(C {<echo>} 
  {
    (braced_var_sub
      token: <Id.VSub_Name x>
      suffix_op: (suffix_op.Unary op_id:Id.VTest_ColonHyphen arg_word:{<default>})
    ) 
    (word_part.ArithSub
      anode: 
        (arith_expr.Binary
          op_id: Id.Arith_Plus
          left: (arith_expr.ArithWord w:{<Id.Lit_Digits 1>})
          right: (arith_expr.ArithWord w:{<Id.Lit_Digits 2>})
        )
    )
  }
)

You can pass --ast-format text for more details.

Evaluation of the syntax tree is a single step.

Generated on Wed Aug 19 00:07:37 PDT 2020
