Language Composition in Shell


In the last post, we saw that lexer modes are useful for language composition.

This post will show how different languages are composed in shell.

The point is to justify the claim that shell is itself many languages, even leaving aside the fact that you can embed Perl or Python in here docs.

I will try to keep the examples fairly basic for those who don't know shell, but inevitably there will be some unfamiliar constructs.

That is all there is to this post: a bunch of code showing different compositions, and why we need lexer modes.

I will also list the 14 modes in the appendix.

In the next post, I will talk about another issue with the Oil lexer: lexer hints.

Why Is Shell Interesting?

Language Composition in Shell

The above examples are compositions of languages with different lexical structures:

  1. Expressions outside string literals vs. expressions inside them.
  2. Arithmetic expressions (a+b) vs. regular expressions /ab+/.
  3. Regular expressions like abc+ vs. character classes like [^abc].

Shell takes this composition of languages to an extreme. Here is an overview:

The Word Language is Recursive

To a first approximation, shell words are expressions that eventually get evaluated into an element of an argv array.

This command has three words, the first of which is the command name:

ls /bin /lib

The word language is recursive:

$ echo ${a:-default}
> echo ${a:-${b:-default}}
> echo ${a:-${b:-${c:-default}}}
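
To see how the nesting evaluates, here is a minimal sketch that runs the deepest expansion with different variables set. The helper name `show` and the values are made up for illustration; it assumes bash, though `${var:-default}` is also POSIX.

```shell
# Hypothetical helper: evaluate the three-level nested default.
show() {
  echo "${a:-${b:-${c:-default}}}"
}

unset a b c
show          # nothing is set, so the innermost default is used

c=from_c
show          # only c is set

b=from_b
show          # b shadows c

a=from_a
show          # a shadows everything nested inside its default
```

Each default is itself a word, so it is only evaluated when the variable to its left is unset or empty.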

There are also assignment words:

$ name=world
> echo hello $name
hello world

The right-hand side of an assignment can be an array literal:

$ declare -a myarray=(1 2 3)
> echo "${myarray[@]}"
1 2 3

And the entries can be arbitrary words:

$ declare -a myarray=(${a:-default} 2 3)
> echo "${myarray[@]}"
default 2 3

Commands Are Made of Words
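
As a rough sketch of what this heading claims (assuming bash; the variable and function names are hypothetical): the command name, its arguments, and the items of a `for` loop are all words, so each of them can hold expansions.

```shell
cmd=echo
greeting=hello
unset name

# Simple command: the command name and each argument are words.
greet() {
  $cmd "$greeting" "${name:-world}"
}
greet

# Compound command: the items a for loop iterates over are words too.
for item in "$greeting" "${name:-world}"; do
  $cmd "item: $item"
done
```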

Arithmetic Expressions Are Made of Words
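
A small sketch of this idea, assuming bash (the names `x`, `nums`, and `i` are made up): the operands inside `$(( ))` can be word-level constructs such as `${x:-5}`, an array element, or a command substitution.

```shell
unset x
nums=(10 20 30)
i=1

# Three operands, each written in the word language:
#   ${x:-5}     default value, since x is unset
#   ${nums[i]}  array element (the subscript is itself arithmetic)
#   $(echo 7)   command substitution
sum=$(( ${x:-5} + ${nums[i]} + $(echo 7) ))
echo "$sum"
```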

Boolean Expressions Are Made of Words
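
Likewise for `[[ ]]`, here is a minimal sketch assuming bash (the variables `dir` and `mode` are hypothetical): the operands of the boolean operators are words, and the right-hand side of `==` is a word interpreted as a pattern.

```shell
unset mode
dir=/

# -d takes a word; == compares a word against a pattern word.
if [[ -d $dir && ${mode:-debug} == de* ]]; then
  result=yes
else
  result=no
fi
echo "$result"
```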

Mutual Recursion: Words Can Contain Commands and Arithmetic

Mutual recursion.
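
One way to picture the mutual recursion, as a sketch assuming bash: a double-quoted word contains a command substitution, whose command has a word containing an arithmetic substitution, whose operand is yet another command substitution. The variable name `nested` is just for illustration.

```shell
# word -> command -> word -> arithmetic -> word -> command
nested="total: $(echo "n=$(( 1 + $(echo 2) ))")"
echo "$nested"
```

A lexer has to switch modes at each boundary: inside double quotes, inside `$( )`, and inside `$(( ))` the same characters mean different things.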

Other Recursive Shell Sublanguages

I mentioned in the last post that there are at least four recursive (and mutually-recursive) sublanguages to shell: Command, Word, Arith, and Bool. Here are some more:

(1) Brace expansion

$ type=sh; echo {build,test,release}/*.{py,$type}
build/*.py build/*.sh test/*.py test/*.sh release/*.py release/*.sh

The braces can be nested arbitrarily:

$ echo _{-{X{1..3}Y}-,{a,b}}_
_-{X1Y}-_ _-{X2Y}-_ _-{X3Y}-_ _a_ _b_
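
Here is another small sketch of nesting, assuming bash (the array name `words` and the `.txt` suffix are made up). Capturing the expansion in an array makes it easy to count the resulting words.

```shell
# Alternatives nest three levels deep: a | b{1 | 2{x | y}}
words=( {a,b{1,2{x,y}}}.txt )

echo "${#words[@]}"
echo "${words[@]}"
```

Because none of the resulting words contain glob characters, no filename matching happens here.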

(2) Extended globs. Globs are not a recursive language, but [extended globs][extglob] are.

$ [[ --help == --@(help|verbose=@(1|2)) ]] && echo TRUE
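
To make the nesting more concrete, this sketch (assuming bash; the function name `matches` is hypothetical) tries the same pattern against several flags. The inner `@(1|2)` only applies inside the `verbose=` alternative.

```shell
shopt -s extglob   # required for extended globs outside [[ ]]; harmless here

# Hypothetical helper: succeeds if the argument matches the nested pattern.
matches() {
  [[ $1 == --@(help|verbose=@(1|2)) ]]
}

for flag in --help --verbose=1 --verbose=2 --verbose=3 --verbose; do
  if matches "$flag"; then
    echo "$flag: match"
  else
    echo "$flag: no match"
  fi
done
```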

(3) Regular Expressions.
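
As a sketch of the regular expression case, assuming bash and its `=~` operator (the names `re` and `matches_re` are made up), here is a regex with nested groups that accepts the same flags as the extended glob in (2):

```shell
# Nested groups: group 1 is the whole alternative, group 2 is the level.
re='^--(help|verbose=(1|2))$'

# Hypothetical helper: succeeds if the argument matches the regex.
matches_re() {
  [[ $1 =~ $re ]]
}

if matches_re --verbose=2; then
  echo "level: ${BASH_REMATCH[2]}"
fi
```

Keeping the regex in a variable sidesteps the quoting rules for the right-hand side of `=~`.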

Appendix: Non-Recursive Shell Sublanguages