[ Is a Builtin, But [[ Is Part of the Language


The current theme of this blog is to show how the oil parser works. But let's make sure we understand an important concept first: parse-time errors vs. runtime errors.

The way I write conditions in shell is like this:

if test -d /tmp; then
  echo "/tmp is a dir"
fi

This is the same thing:

if [ -d /tmp ]; then
  echo "/tmp is a dir"
fi

But yet another construct for conditional expressions is [[. The Google Shell Style Guide recommends using it, reasoning that:

[[ ... ]] reduces errors as no pathname expansion or word splitting takes place between [[ and ]].

[[ ... ]] allows for regular expression matching where [ ... ] does not.

That is, consider the following:

x='name with space.sh'
[ $x == *.sh ]
[[ $x == *.sh ]]

On the [ line, $x will be split into three arguments: name, with, and space.sh. The glob *.sh is also expanded into however many files match in the current directory (it stays literal only if nothing matches). Both of these things cause the wrong number of arguments to appear on each side of ==. In contrast, the [[ expression will have exactly one argument on the left and right of ==. (It tests if $x matches the pattern *.sh, which is true.)
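To see the splitting concretely, here is a small sketch. count_args is a hypothetical helper, not something that ships with bash or oil; it just prints how many arguments it received.

```shell
x='name with space.sh'

# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

count_args $x      # unquoted, as on the [ line: the value is split into words
count_args "$x"    # quoted: the value stays one argument

# Inside [[ ]], the unquoted $x is not split, so the match sees the whole string.
if [[ $x == *.sh ]]; then
  echo 'matches *.sh'
fi
```

The first call prints 3 and the second prints 1.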

What I didn't realize before implementing the oil parser is that this doesn't quite capture the difference between [[ and [. The more important difference is that [[ is part of the shell language, while [ is a builtin.

This means that an expression inside [[ ... ]] is parsed up front, before any code is executed. In contrast, the arguments to [ are parsed by the builtin itself at runtime.

(In terms of parsing arguments, shell builtins behave like external commands. The fact that they happen to live inside the /bin/sh binary doesn't change anything.)

The bash help doesn't capture this difference either: see help [ and help [[. The parse-time vs. runtime distinction isn't mentioned.
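One place bash does hint at the distinction is its type builtin, which reports what kind of thing a word is. A minimal sketch: kind_of is a hypothetical wrapper name, and it calls bash explicitly so the answer doesn't depend on which shell you happen to be typing into.

```shell
# Hypothetical wrapper: ask bash how it classifies a word.
kind_of() { bash -c 'type -t "$1"' bash "$1"; }

for word in test '[' '[['; do
  echo "$word is a $(kind_of "$word")"
done
```

In bash this reports builtin for test and [, and keyword for [[.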

Let's write some code to show this difference. First we create syntax errors by leaving off the right hand side of an equality test:

$ [ a == ]
/bin/bash: line 1: [: a: unary operator expected
$ [[ a == ]]
/bin/bash: line 1: unexpected argument `]]' to conditional binary operator
/bin/bash: line 1: syntax error near `]]'
/bin/bash: line 1: `[[ a == ]]'

On the face of it, these errors look similar. Now let's use the general technique of wrapping them in if false:

$ if false; then [ a == ]; else echo 'NOT PARSED'; fi
NOT PARSED
$ if false; then [[ a == ]]; else echo 'NOT PARSED'; fi
/bin/bash: line 1: unexpected argument `]]' to conditional binary operator
/bin/bash: line 1: syntax error near `;'
/bin/bash: line 1: `if false; then [[ a == ]]; else echo 'NOT PARSED'; fi'

bash parsed the first statement without issue, and executed the else clause. The stuff inside [ is just an opaque list of strings. We never executed it and never parsed it.

In contrast, it emitted a parse error for the second statement, and didn't execute any code. This is because [[ is actually part of the language.
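Another way to see the same thing, without the if false wrapper, is to ask bash to parse a snippet and not run it. This is only a sketch: it assumes bash's -n option (read commands but don't execute them) combined with -c, and parses is a made-up helper name.

```shell
# Hypothetical helper: succeed if bash can parse the snippet. Nothing is executed.
parses() { bash -n -c "$1" 2>/dev/null; }

for snippet in '[ a == ]' '[[ a == ]]'; do
  if parses "$snippet"; then
    echo "parses:      $snippet"
  else
    echo "parse error: $snippet"
  fi
done
```

The [ line parses, since to the shell it's just a command with three arguments; its error only shows up when the builtin actually runs. The [[ line is rejected before anything runs.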

Tomorrow we will use the same if false technique to compare the oil parser with popular shell parsers. We will see which errors they can catch at parse time, and which errors have to wait until runtime.

oil has the philosophy that catching errors earlier is better. You don't want to run a 4-hour script and get a syntax error after 3 hours.