blog | oilshell.org

find and test: How To Read And Write Them

2021-04-10 (Last updated 2021-04-22)

This is another short post based on a Hacker News comment.

Many shell users are confused by the find command, so I show a way to remember its usage. Like many things in shell, it's awkward, but powerful once you know it.

Table of Contents
Context
My Response
Slogan: "I'm too lazy to write a lexer"
More find Usage Tips
Don't rely on implicit -a
Use -prune with .git
Help Wanted
Related

Context

Comment from ben0x539:

find is weird anyway. The stuff after the arguments aren't really flags, they're a tiny filter language, with significant ordering and operator precedence and all that stuff

My Response

Yes exactly, find is like test a.k.a [.

test -f foo -a -f bar -o -z spam
[ -f foo -a -f bar -o -z spam ]   # same thing!

can be read

isfile('foo') && isfile('bar') || emptystring('spam')
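Here is a small sketch of that reading, checked against real files. The temp directory and the `spam` variable are assumptions made for the demo (the line above tests the literal string spam, which is never empty). Note that POSIX marks -a and -o in test as obsolescent, so the chained form is the more portable one.

```shell
# Set up: foo exists, bar does not, and spam is an empty string
tmp=$(mktemp -d)
touch "$tmp/foo"
spam=''

# One test call: -a binds tighter than -o, so this is (A and B) or C
if test -f "$tmp/foo" -a -f "$tmp/bar" -o -z "$spam"; then
  one_call=true
else
  one_call=false
fi

# The same logic with separate [ calls joined by the shell's own && and ||
if { [ -f "$tmp/foo" ] && [ -f "$tmp/bar" ]; } || [ -z "$spam" ]; then
  chained=true
else
  chained=false
fi

echo "$one_call $chained"   # true true
```

Both forms agree here because the AND fails (bar is missing) and the OR branch succeeds (spam is empty).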

Likewise

find . -name '*.py' -a -executable -a -printf '%s %P\n'

can be read

Traverse '.' and evaluate at every node F:
nameMatches(F, '*.py') && isExecutable(F) && printf('%s %P\n', F)

Both tools respect -a and -o for AND / OR (with -a binding more tightly than -o), ! for NOT, and ( ) for precedence.
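To watch the expression being evaluated per node, here is a sketch against a throwaway directory. The file names are made up for the demo, and -executable and -printf are GNU find extensions, so this assumes GNU findutils.

```shell
# Three files: only run.py is both a .py file and executable
tmp=$(mktemp -d)
touch "$tmp/run.py" "$tmp/lib.py" "$tmp/run.sh"
chmod +x "$tmp/run.py" "$tmp/run.sh"

# nameMatches(F, '*.py') && isExecutable(F) && printf('%P\n', F)
executable_py=$(find "$tmp" -name '*.py' -a -executable -a -printf '%P\n')

# Same shape with ! for NOT: .py files that are not executable
plain_py=$(find "$tmp" -name '*.py' -a ! -executable -a -printf '%P\n')

echo "$executable_py"   # run.py
echo "$plain_py"        # lib.py
```

Because -a short-circuits like &&, the -printf action only runs for nodes where every test to its left was true.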

Slogan: "I'm too lazy to write a lexer"

Confusingly, you must quote ( and ). This is because they're shell operators, and the language is embedded in the argv array. Example:

find . -executable -a '(' -name '*.py' -o -name '*.sh' ')' -a -print

is like

isExecutable(F) &&
(nameMatches(F, '*.py') || nameMatches(F, '*.sh')) &&
print()

The way I remember this abuse is to think that the language designer was too lazy to write a lexer! In contrast, DSLs like awk and jq do not use this pattern. They have lexers, and hence lexical syntax. (Related: posts tagged #lexing).
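One way to see the "language in argv" idea is to print the tokens a command would receive. `show_argv` below is a hypothetical helper written for this sketch, not a standard tool.

```shell
# Hypothetical helper: print each argument it receives on its own line
show_argv() { printf '[%s]\n' "$@"; }

# The shell has already done the "lexing": one token per argv entry
tokens=$(show_argv find . '(' -name '*.py' -o -name '*.sh' ')' -print)
echo "$tokens"

# Backslash-escaping the parentheses produces the same tokens as quoting
escaped=$(show_argv \( \))
quoted=$(show_argv '(' ')')
```

The first call prints ten bracketed tokens, and the quotes around ( and ) are gone by the time find would see them. They exist only to get the characters past the shell.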

Trivia: the expr tool for arithmetic also uses this pattern:

# * must be quoted so the shell doesn't expand it as a glob
$ expr 1 + 2 '*' 3
7

But you shouldn't use it, as POSIX shell arithmetic is now universal:

$ echo $((1 + 2*3))
7

The difference between expr and $(( )) is exactly the difference between [ and [[: external/builtin vs. language, dynamic vs. static parsing.
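A quick sketch of what "embedded in argv" means for expr: it only sees operators when they arrive as separate arguments. The variable names are just for the demo.

```shell
# Five arguments: expr sees the operators and evaluates them
separate=$(expr 1 + 2 '*' 3)

# One argument: expr sees a single string and prints it back unchanged
single=$(expr '1 + 2 * 3')

# $(( )) is parsed by the shell itself, so spacing and quoting don't matter
builtin_arith=$((1 + 2*3))

echo "$separate"        # 7
echo "$single"          # 1 + 2 * 3
echo "$builtin_arith"   # 7
```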

Oil aims to do away with all this silliness with Python-like, statically parsed expressions.

More find Usage Tips

Don't rely on implicit -a

When two expressions appear side by side with no operator between them, find's parser inserts -a automatically. This abbreviation:

find . -name '*.py' -executable -printf '%s %P\n'  # missing -a!

is the same as our longer version above:

find . -name '*.py' -a -executable -a -printf '%s %P\n'

But I prefer the latter style. I think this confusing abbreviation is another reason that people have a hard time learning the syntax and execution model of find.
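The implicit -a is most surprising when it meets -o, since it binds more tightly. Here is a sketch using only POSIX primaries. The file names are made up for the demo.

```shell
tmp=$(mktemp -d)
touch "$tmp/a.py" "$tmp/b.sh"

# Reads as: -name '*.py' -o ( -name '*.sh' -a -print )
# so only the .sh file reaches -print
implicit=$(find "$tmp" -name '*.py' -o -name '*.sh' -print | sort)

# Explicit grouping and an explicit -a say what was probably meant
explicit=$(find "$tmp" '(' -name '*.py' -o -name '*.sh' ')' -a -print | sort)

echo "$implicit"   # only .../b.sh
echo "$explicit"   # .../a.py and .../b.sh
```

Writing the -a out makes it easier to spot that -print belongs to just one side of the -o.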

Use -prune with .git

Another interesting aspect of the find language is that it has actions with side effects, like -print, -printf, -exec, and -prune.

The -prune action alters the file system traversal while it is in progress, which can make it more efficient. For example, this command avoids even statting nodes under the .git subtree (not just printing them):

find . -name .git -a -prune -o -print

Or with the fictional syntax:

find . (nameMatches(F, '.git') && prune() || print())

I started using this to optimize Oil's own scripts, like the ones that parse one million lines of shell. Now I feel comfortable using this style interactively. It takes some getting used to.
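Here is a sketch of the prune idiom on a throwaway tree. The directory layout is an assumption for the demo.

```shell
# A fake repo: a .git directory with contents, plus one source file
tmp=$(mktemp -d)
mkdir -p "$tmp/.git/objects" "$tmp/src"
touch "$tmp/.git/config" "$tmp/src/main.py"

# nameMatches(F, '.git') && prune() || print()
pruned=$(cd "$tmp" && find . -name .git -a -prune -o -print | sort)

echo "$pruned"
```

The output lists ., ./src and ./src/main.py. The .git directory itself is not printed either: -prune returns true, so the right-hand side of -o is never evaluated for that node.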

(Original comment)

Help Wanted

A couple years ago, someone helped me implement a better find without this wonky syntax for Oil. But it isn't done and needs some love. If anyone wants to help, feel free to join Zulip :-)

I do think that find is more like a language than a command line tool. It's pretty powerful; e.g. I just used it to sort through 20 years of haphazard personal backups.

Related