Comments About Shell, Awk, and Make

2017-10-25

A few forum comments I've made would have made good blog posts. But I declare blog bankruptcy again, so I'll just link to and summarize the comments here.

The comments about shell are immediately useful usage tips. The comments about Awk and Make are more abstract, with one exception.

I'll try to provide the main point inline, but click through to the comment if you'd like details.

Table of Contents

Shell Usage Tips

Awk Language Design

Make: Automatic Prerequisites and Language Design

Conclusion

Shell Usage Tips

(1) Explaining syntax of time. The time construct in bash is part of the language, not a shell builtin. It's more like a for loop than cd.

Analogously: [ Is a Builtin, But [[ Is Part of the Language
This is also true of the rarely used coproc and select keywords (e.g. see help select in bash).
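
One quick way to see the distinction is to ask bash how it classifies each word. This is a minimal sketch: type -t is a standard bash builtin option, while the helper name classify is made up for illustration.

```bash
#!/usr/bin/env bash
# classify is a hypothetical helper: print how bash resolves each word.
classify() {
  local word
  for word in "$@"; do
    printf '%s\t%s\n' "$word" "$(type -t "$word")"
  done
}

classify time '[[' select   # each is reported as "keyword"
classify cd '['             # each is reported as "builtin"
```

Keywords are recognized by the parser, which is why time can wrap a whole pipeline and [[ can skip word splitting, while builtins only ever see their arguments after normal expansion.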

(2) Explaining syntax of find. find is an external command, but it's also an expression language with no lexer.

In other words, it's similar to the test / [ builtin, and its syntax has similar problems: Problems With the test Builtin: What Does -a Mean?
You can also think of find as a predicate/action language like Awk.
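
To make the "expression language" point concrete, here is a small sketch of how operator precedence in find can surprise you. The scratch directory and file names are arbitrary; -name, -o, -print and the escaped parentheses are standard find syntax.

```bash
#!/usr/bin/env bash
# Scratch directory with one file of each kind (names are arbitrary).
dir=$(mktemp -d)
touch "$dir/main.c" "$dir/main.h"

# Each argument is one token of the expression. -a (and) is implied between
# adjacent terms and binds tighter than -o (or), so this parses as
#   -name '*.c'  -o  ( -name '*.h' -a -print )
# and only the .h file is printed.
ungrouped=$(find "$dir" -name '*.c' -o -name '*.h' -print)

# Escaped parentheses are separate tokens that group the -o explicitly,
# so both files are printed.
grouped=$(find "$dir" \( -name '*.c' -o -name '*.h' \) -print | sort)

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
rm -r "$dir"
```

Here -name is the predicate and -print is the action, which is the sense in which find resembles Awk's pattern/action pairs.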

(3) help-bash: Awkward Behavior of Empty Arrays (September messages).

This long help thread is related to Thirteen Incorrect Ways and Two Awkward Ways to Use Arrays, where I talk about the copy and splice operations for arrays.

It's long and not very readable, but from it, I distilled an extended style guide for using arrays. Here is a list of valid operations:

copy and splice, mentioned above
iterate over the strings in an array
split a string into an array, using an arbitrary delimiter
join the elements of an array into a string, using an arbitrary delimiter

Using any other operation on arrays risks confusing them with strings.
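
A rough sketch of those operations in bash follows. The helper names split_on and join_by are made up for this example, and both only handle a single-character delimiter, which is narrower than the "arbitrary delimiter" case above.

```bash
#!/usr/bin/env bash
# Helper names (split_on, join_by) are hypothetical, not from the thread.

src=('a b' 'c')
more=('d e')

# Copy: quote the [@] expansion so elements with spaces stay intact.
copy=("${src[@]}")

# Splice: build a new array out of several arrays.
spliced=("${src[@]}" "${more[@]}")

# Iterate over the strings in an array.
for s in "${spliced[@]}"; do
  printf '[%s]\n' "$s"
done

# Split on a single-character delimiter (input must not contain newlines).
split_on() {  # usage: split_on DELIM STRING; result goes in global 'parts'
  IFS=$1 read -r -a parts <<< "$2"
}

# Join with a single-character delimiter; "$*" uses the first char of IFS.
join_by() {   # usage: join_by DELIM ELEMENT...
  local IFS=$1
  shift
  printf '%s\n' "$*"
}

split_on : 'usr:local:bin'   # parts=(usr local bin)
join_by / "${parts[@]}"      # prints usr/local/bin
```

Every array expansion here is the quoted "${name[@]}" form; unquoted expansions or plain $name are where arrays start to get confused with strings.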

Use set -u / set -o nounset to catch out-of-bounds access. However, a bug that was fixed only recently, in bash 4.4, makes older versions confuse empty arrays with unset variables.
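
For scripts that must run on bash older than 4.4, one commonly used workaround is the ${arr[@]+...} expansion. This is a sketch, not something taken from the thread; count_args and demo are hypothetical names.

```bash
#!/usr/bin/env bash
# count_args and demo are hypothetical names for this sketch.
count_args() { echo "$#"; }

# The body runs in a subshell, so set -u does not leak into the caller.
demo() (
  set -u
  empty=()
  full=('a b' 'c')

  # Plain "${empty[@]}" aborts with "unbound variable" before bash 4.4.
  # ${arr[@]+...} expands to nothing when the array has no elements, and
  # to the quoted elements otherwise.
  count_args ${empty[@]+"${empty[@]}"}
  count_args ${full[@]+"${full[@]}"}
)

demo   # prints 0, then 2
```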

(4) Grouping and Redirect Syntax in Shell

I explain some gotchas about shell syntax, and the semantics of > and < redirects.
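
The linked comment has the specifics. As an illustration of the kind of gotcha involved (not necessarily the ones discussed there), this sketch shows that redirects are applied left to right and that braces let one redirect cover several commands. The function emit is made up for the example.

```bash
#!/usr/bin/env bash
# emit is a hypothetical command that writes one line to each stream.
emit() { echo out; echo err >&2; }

# Inside $(...), stdout starts out as the capture pipe.

# 2>&1 first points stderr at the pipe; then stdout alone goes to /dev/null.
first=$(emit 2>&1 >/dev/null)    # captures "err"

# Here stdout goes to /dev/null first, and stderr then copies that.
second=$(emit >/dev/null 2>&1)   # captures nothing

# Braces group commands in the current shell, so one > covers both echoes.
log=$(mktemp)
{ echo a; echo b; } > "$log"
grouped=$(cat "$log")            # "a" and "b" on separate lines
rm "$log"
```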

Awk Language Design

(1) Comparing the Syntax and Semantics of Awk and JavaScript. They have surprisingly similar syntax, but different semantics.

Yet another way of putting it is that Awk is a language with a function call stack, but no heap. This of course imposes severe restrictions on the language and its containers.

But if there's no heap, then you don't need garbage collection!

Addendum: I also realized that Awk can't express my solution to the Git log in HTML problem. Python's useful re.sub() API is impossible in Awk, because it doesn't have first-class functions:

import html, re, sys  # html.escape replaces cgi.escape (removed in Python 3.8)

re.sub(
    r"\x00(.*)\x00",
    lambda match: html.escape(match.group(1)),
    sys.stdin.read())
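
For contrast, here is a sketch of what the same job tends to look like in Awk. With no way to pass a function to gsub(), the loop over matches is written by hand with match(), RSTART and RLENGTH, and the escaping function is hard-coded into the loop. The @ delimiter (used instead of NUL bytes, which some awk implementations handle poorly) and the names escape_marked and html_escape are assumptions for illustration.

```bash
#!/usr/bin/env bash
# escape_marked: HTML-escape the text between pairs of @ markers on each line.
escape_marked() {
  awk '
    function html_escape(s) {
      # "\\&" yields a literal ampersand in the replacement text.
      gsub(/&/, "\\&amp;", s)
      gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s)
      return s
    }
    {
      out = ""
      rest = $0
      # match() sets RSTART and RLENGTH for the leftmost match.
      while (match(rest, /@[^@]*@/)) {
        inner = substr(rest, RSTART + 1, RLENGTH - 2)
        out = out substr(rest, 1, RSTART - 1) html_escape(inner)
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }
  '
}

echo 'x @<b>@ y' | escape_marked   # prints: x &lt;b&gt; y
```

Swapping in a different transformation means editing the loop body, whereas the Python version only needs a different lambda.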

This indicates to me that Awk is stuck in the 1980s, but the model is useful enough that I still see lively discussions and new documents being written about it.

Make: Automatic Prerequisites and Language Design

(1) Simpler Automatic Prerequisites in GNU Make.

Make has the problem of extracting the dependency graph from C #include statements.

My initial comment here was wrong, and I wrote some code to convince myself of that. I had been following the pattern in the GNU Make Manual, which uses a gross piece of sed to massage the output of gcc -M, writing a .d file.

The commenters taught me something. I'm not convinced this is a great solution for future build tools to emulate, but it's worth thinking about.

The gcc -M interface is also pretty maddening, and I've already forgotten the details of it.

As far as I remember, this mad-scientist.net post eventually comes to the same conclusion, although the code there is long and intermingled with other concerns, like using an arbitrary output directory.

TODO: It would be nice to write up A Simpler Method for Automatic C Dependencies in GNU Make.

(2) .PHONY targets are a smell. In my opinion, Make should be treated as a dataflow language. Its purpose is to let you specify a partial order for incremental and parallel builds.

Shell is a better language for imperative actions. I mentioned the "argv dispatch pattern", i.e. using "$@" as the last line of your script. Almost all of the shell scripts in the Oil repo use this pattern.
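
A minimal sketch of the pattern, assuming a hypothetical script called run.sh with made-up task functions:

```bash
#!/usr/bin/env bash
# run.sh: a hypothetical task script. The function names are made up.

build() {
  echo "building $*"
}

deploy() {
  echo "deploying to $1"
}

# Run the function named by the first argument, passing along the rest.
"$@"
```

With this layout, ./run.sh build x y prints "building x y" and ./run.sh deploy prod prints "deploying to prod". Each function plays the role a .PHONY target would play in a Makefile.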

TODO: Write a blog post about it, and also mention the variant with better error checking:

die() { echo "$@" >&2; exit 1; }

action=${1:-}  # the first argument names the function to run
case $action in
  build|test|deploy) "$@" ;;
  *)                 die "Invalid action ${action}" ;;
esac

(3) What are Make's weaknesses as a dataflow language?

OK, maybe Make is not actually what I want it to be. I think its evolution has been confused, much like the evolution of shell.

Make is not good at specifying dataflow, for several reasons:

The multiple outputs problem. One commenter suggested the "obvious thing", which is wrong.
Make doesn't consider the absence of a prerequisite to mean the target is out of date. It has odd special cases for "intermediate files", which don't compose.
Metaprogramming the dependency graph is clumsy. (This isn't in the thread, but a recent Makefile I wrote for oilshell.org analytics drove this point home.)

The overall problem is that instead of thinking of Make as a functional/parallel language, you end up "stepping through" it, like an imperative language.

(4) There are three Turing-complete languages in GNU Make: Make, Shell, and Guile Scheme.

You can write a Lisp in shell and in Make, and Guile Scheme is already a Lisp.

It's bad enough that when writing a Makefile, you need to know two languages simultaneously, as well as the places where their syntax collides. (What does $$ mean in Make? What does it mean in shell?)

But those two languages aren't expressive enough, so they added a third language!

Conclusion

I linked to observations I've made about shell, Awk, and Make. If any of it was useful to you, let me know.

In the next post, I'll link to comments about programming language design and implementation. Depending on the feedback, I'll include more or fewer comments.