
Problems With the test Builtin

2017-08-28

I recently implemented the test builtin, also known as [. Since I had already implemented the [[ variant in OSH, I thought this would be straightforward.

But as always, shell is full of surprises. In this post, I give examples of ambiguous test expressions. Then I describe the curious algorithm that bash/POSIX uses to resolve ambiguities, as well as an example where this algorithm breaks down.

You can consider this another episode of Shell: The Bad Parts.

Background

Recall the difference between [ and [[, which I described last October. That post shows the difference in both parsing and execution between these two statements:

$ if false; then [ a == ]; else echo 'NOT PARSED'; fi
NOT PARSED
$ if false; then [[ a == ]]; else echo 'NOT PARSED'; fi
/bin/bash: line 1: unexpected argument `]]' to conditional binary operator
/bin/bash: line 1: syntax error near `;'
/bin/bash: line 1: `if false; then [[ a == ]]; else echo 'NOT PARSED'; fi'

Users reported that Gentoo and Nix both invoke [ without $PATH set, which means that the coreutils executables /usr/bin/test and /usr/bin/[ won't be found.

I originally thought people could just use /usr/bin/[. But since Gentoo and Nix both invoke [ without $PATH set, that won't work. So I thought: how hard could it be to implement?

[ is just an expression language with no lexer, because its tokens arrive already split as argv words. Just replace the lexer and that's it. (Other examples: find, expr)

However, I soon found out that there are fundamental problems with the design of the test builtin.

It is an instance of string confusion.

Review

As a reminder, here is how the builtin works:

$ [ -z "" ];  echo $?;    # -z returns 0/true on an empty string
0
$ [ -z foo ]; echo $?;    #            1/false on a non-empty string
1

In bash, -a is an alias for -e:

$ [ -a / ];     echo $?;  # -a returns 0/true if the path exists
0
$ [ -a /oops ]; echo $?;  #            1/false if it doesn't
1

Three Meanings of -a

  1. Literal string
  2. Unary operator (the file exists)
  3. Binary operator (logical "and")
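Here is a minimal sketch of the three meanings, assuming bash (where unary -a is an alias for -e). The rc helper and the variable names are hypothetical, introduced only for illustration. The same word, -a, is read differently depending only on how many arguments [ receives.

```shell
# Assumes bash. rc is a hypothetical helper that prints a command's exit status.
rc() { "$@" && echo 0 || echo $?; }

# Work in a scratch directory where no file named -a exists yet.
cd "$(mktemp -d)" || exit 1

literal=$(rc [ -a ])            # 1 argument: is the string "-a" non-empty?
unary_missing=$(rc [ -a -a ])   # 2 arguments: does a file named -a exist?
touch ./-a                      # create a file that really is named -a
unary_present=$(rc [ -a -a ])   # same words, different answer
binary=$(rc [ -a -a -a ])       # 3 arguments: "-a" AND "-a", two non-empty strings

echo "literal=$literal unary_missing=$unary_missing unary_present=$unary_present binary=$binary"
```

In bash I would expect only unary_missing to be 1 (false). Other shells may not accept -a as a unary operator at all.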

Test body

What Does [ -a -a -a -a ] Mean?

if (s > t) {
  # greater than / less than. This depends on LOCALE. Maybe change
  # it to a function?
}
if (s == t) {
}

Disabled: ! -a -o ( ) < >

a = ' 3 '  # note spaces, we read them from a file
b = ' 5 '
test $a -lt $b

if (a < b)  # BAD: STRINGS THAT LOOK LIKE NUMBERS
            # maybe disallow this, too subtle!
            # I would have to write a comment

if (sortsBefore(a, b))
if (order(a, b))
if (cmp(a, b))
if (cmpLocale(a, b))

if (Int(a) < Int(b))  # This is OK
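Today's shell has the same hazard, except that the operator rather than the operand type selects the comparison. A small sketch, assuming bash (the < operator inside [ is an extension in older POSIX versions); rc is a hypothetical helper, not part of any shell:

```shell
# Assumes bash. rc is a hypothetical helper that prints a command's exit status.
rc() { "$@" && echo 0 || echo $?; }

a=10
b=9

as_int=$(rc [ "$a" -lt "$b" ])   # integer comparison: 10 < 9 is false
as_str=$(rc [ "$a" \< "$b" ])    # string comparison: "10" sorts before "9"

echo "as_int=$as_int as_str=$as_str"
```

The two lines differ only in the operator, yet they give opposite answers for strings that look like numbers.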

mystr='-a'   # -a can just be a string
myfile='-a'  # -a is a valid filename

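[ $mystr ]  -- tests whether $mystr is non-empty

[ -a ]           -> [ $mystr ]               -a is a literal string
[ -a -a ]        -> [ -a $myfile ]           unary -a: does the file named -a exist?
[ -a -a -a ]     -> [ $mystr -a $otherstr ]  binary -a: are both strings non-empty?
[ -a -a -a -a ]  -> syntax error!! But this DOES have a representation:

exists "-a" and "-a"  (unary -a, then binary -a, then a literal string)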

The 4 case POSIX thing isn't enough!

[ ( -a -a )

Also see "Three Meanings of Slash" and #

[ -z ] [ -z -a ] [ -z -a ] ] # another weird lookahead case

Another way to think of it is if there were no difference in Python between the following:

and and and "and" and "and"

equal equal equal equ

Part two:

POSIX Rules
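As I read the POSIX specification, test resolves ambiguity by dispatching on the number of arguments before it looks at any operator. The sketch below only describes how an argument list would be read; it does not evaluate anything. The function names and the operator lists are hypothetical and deliberately incomplete, and treating -a as both unary and binary follows bash rather than strict POSIX.

```shell
# Illustrative subset of operators, not the full set.
is_unary()  { case $1 in -e|-f|-d|-n|-z|-a) return 0 ;; esac; return 1; }
is_binary() { case $1 in =|!=|-eq|-ne|-lt|-gt|-a|-o) return 0 ;; esac; return 1; }

# Print how a test argument list would be interpreted, based on its length.
describe_test() {
  case $# in
    0) printf '%s\n' "false" ;;
    1) printf '%s\n' "string: is '$1' non-empty?" ;;
    2) if [ "$1" = "!" ]; then printf '%s\n' "negate: $(describe_test "$2")"
       elif is_unary "$1"; then printf '%s\n' "unary: $1 '$2'"
       else printf '%s\n' "unspecified"; fi ;;
    3) if is_binary "$2"; then printf '%s\n' "binary: '$1' $2 '$3'"
       elif [ "$1" = "!" ]; then printf '%s\n' "negate: $(describe_test "$2" "$3")"
       elif [ "$1" = "(" ] && [ "$3" = ")" ]; then printf '%s\n' "grouped: $(describe_test "$2")"
       else printf '%s\n' "unspecified"; fi ;;
    4) if [ "$1" = "!" ]; then printf '%s\n' "negate: $(describe_test "$2" "$3" "$4")"
       elif [ "$1" = "(" ] && [ "$4" = ")" ]; then printf '%s\n' "grouped: $(describe_test "$2" "$3")"
       else printf '%s\n' "unspecified"; fi ;;
    *) printf '%s\n' "unspecified" ;;
  esac
}

describe_test -a
describe_test -a -a
describe_test -a -a -a
describe_test -a -a -a -a
```

With four arguments that do not start with ! or (, the rules simply give up, which is where [ -a -a -a -a ] lands.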

Shell Style Guideline: For Oil Translation

I don't think this style guideline is very restrictive. In fact, in 10 years of writing shell scripts, I have never used [[. The 2 and 3 argument versions of test suffice for almost all purposes.
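As a rough illustration of that guideline, a compound expression can usually be split into short tests joined by the shell's own && and ||, so that [ never sees more than two or three arguments. The path and result variables here are hypothetical:

```shell
# Hypothetical file to inspect; mktemp creates it empty and readable by us.
path=$(mktemp)

# Compound form that relies on binary -a inside a single [ call:
#   [ -f "$path" -a -r "$path" ]

# Same check with two-argument tests only, joined by the shell's &&.
if [ -f "$path" ] && [ -r "$path" ]; then
  result="regular and readable"
else
  result="missing or unreadable"
fi

echo "$result"
rm -f "$path"
```

Each [ call now falls under the unambiguous two-argument rule, whatever the value of $path is.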

Options for Oil:

Style guideline / Oil:

Just use the two-argument or three-argument forms:

test -f "$path" ->

test -file $path # not quoting, get rid of

test is-file $path
test is-dir $path
test exists $path
test is-pipe $path

Alternative: if (isFile(path)), if (isDir(path))

test $path older-than $path && test $path newer-than $path  # with auto-complete
test $path is-hard-link-to $path  # with auto-complete

Philosophy: Compatible vs. Nice Translations

Because OSH will implement essentially all shell builtins, it is trivial to make Oil compatible. But we don't want the Oil language to be burdened by compatibility -- that's how we ended up with [ -a -a -a] in the first place!

So if you follow our (loose) style guidelines, you'll get the nice translation.

If you don't, you'll get the "compatible translation", with __. The __ is a visual cue that you could manually rewrite some code to be nicer in Oil.

Example Oil Translations

These are just what I'm thinking; they haven't been implemented yet.

You must follow the style guidelines above. However, I still want to retain the property of automatic conversion. So I'm thinking of having a namespace for shell builtins in Oil.

Refer to Translating Shell to Oil.

if _ test -a -a -a {
  hello
}

_ could mark old builtins. It's a subtle sign that something could be "modernized".

Another option would be if eval-sh "test -a -a -a" {} , but this seems too ugly.

Other options:

if $ test -a -a -a -{
}

if $$ test -a -a -a -{
}

if __ test -a -a -a -{
}

while __ read -r foo {
}

Or you could also do:

Appendix A: Differences between [ and [[