Grammar for Variable Substitutions

2016-10-26

At the bottom of this page is a grammar for the language accepted inside ${}. I used it to help me write a recursive descent parser.

I've tested it on over a hundred thousand lines of shell, and it appears to parse everything correctly. (The previous iteration had problems with constructs like ${@:1:2}.)

Some observations. Substitutions can be nested, since the operand of an operator is itself a word:

$ echo ${a-${a-${a-unset}}}
unset
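To make the nesting concrete, here is a minimal Python sketch of an expander for just ${NAME} and ${NAME-WORD}. It is an illustration under stated assumptions, not the parser described in this post: the function names (expand_word and its helpers) are hypothetical, variables come from a plain dict, and the default is expanded eagerly, which bash does not do.

```python
import re

NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def expand_word(text, env):
    """Expand every ${NAME} and ${NAME-WORD} in text, using env as the variables."""
    result, _ = _expand(text, env, 0, in_braces=False)
    return result


def _expand(text, env, pos, in_braces):
    """Expand from pos; inside braces, stop at the '}' that closes the caller."""
    out = []
    while pos < len(text):
        if text.startswith("${", pos):
            value, pos = _expand_var_sub(text, env, pos + 2)
            out.append(value)
        elif in_braces and text[pos] == "}":
            return "".join(out), pos  # the caller consumes the '}'
        else:
            out.append(text[pos])
            pos += 1
    if in_braces:
        raise ValueError("missing closing '}'")
    return "".join(out), pos


def _expand_var_sub(text, env, pos):
    """Handle NAME ('-' WORD)? '}' starting just after '${'."""
    match = NAME_RE.match(text, pos)
    if match is None:
        raise ValueError("expected a variable name at offset %d" % pos)
    name, pos = match.group(), match.end()
    default = ""
    if text.startswith("-", pos):
        # The operand is a WORD, so recurse: it may contain another ${...}.
        # Simplification: the default is expanded eagerly, even if unused.
        default, pos = _expand(text, env, pos + 1, in_braces=True)
    if not text.startswith("}", pos):
        raise ValueError("expected '}' at offset %d" % pos)
    # '-' (unlike ':-') substitutes only when the variable is unset.
    return env.get(name, default), pos + 1


print(expand_word("${a-${a-${a-unset}}}", {}))         # unset
print(expand_word("${a-${a-${a-unset}}}", {"a": ""}))  # prints an empty line
```

The recursion in _expand_var_sub mirrors the grammar: the operand after the operator is a WORD, and a WORD can contain another substitution.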

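The grammar tries to strike a balance between being faithful to bash and following the philosophy of early errors, that is, rejecting questionable code at parse time instead of letting it misbehave at run time. Bash accepts some code as syntactically valid, but then doesn't interpret it correctly.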

For example, bash allows multiple subscripts during parsing, but ignores them during execution:

$ array=(abc def ghi)
> echo ${array[0]}
> echo ${array[0][1]}
> echo ${array[0][1][2]}
> echo ${array[0][1][2] : 1 : 2}
abc
abc
abc
bc

Here is an example where slices are accepted, but ignored:

$ array=(abc def ghi jkl)
> echo ${#array[@]}  # length of array, OK
> echo ${array[@] : 1 : 2}  # slice of array, OK
> echo ${#array[@] : 1 : 2}  # why is this 4?
4
def ghi
4

The Oil parser disallows both of these constructs, since bash doesn't seem to implement them correctly.

The Grammar

(This is just part of the word grammar.)

NAME        = [a-zA-Z_][a-zA-Z0-9_]*
NUMBER      = [0-9]+                    # ${10}, ${11}, ...

Subscript   = '[' ('@' | '*' | ArithExpr) ']'
VarSymbol   = '!' | '@' | '#' | ...
VarOf       = NAME Subscript?
            | NUMBER   # no subscript allowed, none of these are arrays
            | VarSymbol

TEST_OP     = '-' | ':-' | '=' | ':=' | '+' | ':+' | '?' | ':?'
STRIP_OP    = '#' | '##' | '%' | '%%'
CASE_OP     = ',' | ',,' | '^' | '^^'

UnaryOp     = TEST_OP | STRIP_OP | CASE_OP | ...
Match       = ('/' | '#' | '%')? WORD  # optional: match all / prefix / suffix
VarExpr     = VarOf
            | VarOf UnaryOp WORD
            | VarOf ':' ArithExpr (':' ArithExpr )?
            | VarOf '/' Match '/' WORD

LengthExpr  = '#' VarOf     # can't apply operators after length

RefOrKeys   = '!' VarExpr   # CAN apply operators after a named ref
                            # ${!ref[0]} vs ${!keys[@]} resolved later

PrefixQuery = '!' NAME ('*' | '@')   # list variable names with a prefix

VarSub      = LengthExpr
            | RefOrKeys
            | PrefixQuery
            | VarExpr
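
As a rough illustration of how these rules map onto recursive descent functions, here is a Python sketch covering part of the grammar. It is not the actual parser: all function names are hypothetical, WORD and ArithExpr are kept as opaque strings, the '/' replacement form is left out, and VarSymbol is limited to the three symbols listed above. It takes the text between ${ and } and returns a tuple describing the parse.

```python
import re

NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
NUMBER_RE = re.compile(r"[0-9]+")
VAR_SYMBOLS = ("!", "@", "#")  # only the symbols the grammar spells out
# Longest operators first, so ':-' wins over ':' and '##' over '#'.
UNARY_OPS = [":-", ":=", ":+", ":?", "##", "%%", ",,", "^^",
             "-", "=", "+", "?", "#", "%", ",", "^"]


class ParseError(Exception):
    pass


def parse_var_of(body, pos):
    """VarOf = NAME Subscript? | NUMBER | VarSymbol"""
    match = NAME_RE.match(body, pos)
    if match:
        pos = match.end()
        subscript = None
        if body.startswith("[", pos):
            end = body.find("]", pos)  # assumes no nested brackets
            if end == -1:
                raise ParseError("unterminated subscript")
            subscript = body[pos + 1:end]
            pos = end + 1
        return ("VarOf", match.group(), subscript), pos
    match = NUMBER_RE.match(body, pos)
    if match:
        return ("VarOf", match.group(), None), match.end()
    if body.startswith(VAR_SYMBOLS, pos):
        return ("VarOf", body[pos], None), pos + 1
    raise ParseError("expected a variable at offset %d" % pos)


def parse_var_expr(body, pos):
    """VarExpr = VarOf | VarOf UnaryOp WORD | VarOf ':' ArithExpr (':' ArithExpr)?"""
    var_of, pos = parse_var_of(body, pos)
    rest = body[pos:]
    if not rest:
        return var_of
    for op in UNARY_OPS:
        if rest.startswith(op):
            return ("UnaryOp", var_of, op, rest[len(op):])
    # Spaces before ':' are tolerated because the examples above use them.
    sliced = rest.lstrip()
    if sliced.startswith(":"):
        # Simplification: an ArithExpr containing ':' would be split wrongly.
        parts = [p.strip() for p in sliced[1:].split(":", 1)]
        return ("Slice", var_of, parts[0], parts[1] if len(parts) == 2 else None)
    raise ParseError("unexpected %r after variable" % rest)


def parse_var_sub(body):
    """VarSub = LengthExpr | RefOrKeys | PrefixQuery | VarExpr"""
    if body.startswith("#") and len(body) > 1:
        var_of, pos = parse_var_of(body, 1)
        if pos != len(body):
            raise ParseError("operators are not allowed after a length")
        return ("LengthExpr", var_of)
    if body.startswith("!") and len(body) > 1:
        match = NAME_RE.match(body, 1)
        if match and body[match.end():] in ("*", "@"):
            return ("PrefixQuery", match.group())
        return ("RefOrKeys", parse_var_expr(body, 1))
    return parse_var_expr(body, 0)


print(parse_var_sub("a:-1"))   # ('UnaryOp', ('VarOf', 'a', None), ':-', '1')
print(parse_var_sub("a: -1"))  # ('Slice', ('VarOf', 'a', None), '-1', None)
```

In this sketch the two bash quirks shown earlier come out as early errors: after one Subscript the next token has to be an operator or the end, so array[0][1] raises ParseError, and so does #array[@] : 1 : 2, because LengthExpr accepts nothing after its VarOf.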