Consider this program:
    $ for i in $(seq 10); do
    > cat <<EOF
    > here doc $i
    > EOF
    > done
    here doc 1
    here doc 2
    here doc 3
    here doc 4
    here doc 5
    here doc 6
    here doc 7
    here doc 8
    here doc 9
    here doc 10
To run it, bash does this 10 times:
1. fork() a child process
2. open() a temp file for write
3. write() the expanded here doc to it. The contents depend on the iteration.
4. open() it again read-only
5. unlink() it, so it will be deleted after it's closed
6. dup2(4, 0) the resulting descriptor, so that the new process has the temp file as stdin
7. cat reads the file from disk, writing its contents to stdout
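The steps above can be imitated from the shell itself. This is only a sketch, not what bash does internally: emulate_here_doc is a hypothetical helper, fd 4 is chosen just to mirror the dup2(4, 0) call, and the fork() and execve() are left to the shell when it runs cat.

```shell
#!/bin/sh
# Sketch of the temp-file strategy, written in shell rather than C.
# emulate_here_doc is a hypothetical helper, not part of any shell.
emulate_here_doc() {
  body=$1
  tmp=$(mktemp) || return 1        # open() a temp file for writing
  printf '%s\n' "$body" > "$tmp"   # write() the already-expanded text
  exec 4< "$tmp"                   # open() it again, read-only, as fd 4
  rm -f "$tmp"                     # unlink(): the name is gone, the data is not
  cat <&4                          # cat runs with fd 4 duplicated onto stdin
  exec 4<&-                        # close fd 4, dropping the last reference
}

for i in 1 2 3; do
  emulate_here_doc "here doc $i"
done
```

The rm before the cat is the interesting part: the file has no name by the time cat runs, yet the open descriptor still reads its contents.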
    strace -ff -e open,close,unlink,read,write,execve,dup2 \
      -- $sh ./here_doc_disk.sh
    Process 4090 attached
    [pid 4090] open("/tmp/sh-thd-865008962", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3
    [pid 4090] write(4, " here doc 1\n", 15) = 15
    [pid 4090] close(4) = 0
    [pid 4090] open("/tmp/sh-thd-865008962", O_RDONLY) = 4
    [pid 4090] close(3) = 0
    [pid 4090] unlink("/tmp/sh-thd-865008962") = 0
    [pid 4090] dup2(4, 0) = 0
    [pid 4090] close(4) = 0
    [pid 4090] execve("/bin/cat", ["cat"], [/* 68 vars */]) = 0
    ...
    [pid 4090] read(0, " here doc 1\n", 65536) = 15
    [pid 4090] write(1, " here doc 1\n", 15 here doc 1
    ) = 15
mksh does the same thing, which surprised me.
dash does something more expected and elegant, which is to start cat with one end of a pipe as stdin, rather than a temp file. Strings longer than PIPE_SIZE will cause write() to block until the reader drains the pipe, but I think that just requires a little extra care in the implementation.
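At the shell level, the pipe strategy looks like an ordinary pipeline. The sketch below is not dash's code: here_string_via_pipe is a hypothetical helper, and the 100000-byte size is an arbitrary value picked to exceed the 64 KiB default pipe capacity on Linux. It illustrates that a blocked write() is harmless as long as the writer and the reader run concurrently.

```shell
#!/bin/sh
# Sketch of the pipe strategy. here_string_via_pipe is a hypothetical helper:
# it feeds its first argument, plus a newline, to the stdin of the given command.
here_string_via_pipe() {
  text=$1
  shift
  # printf and the command run as two sides of a pipeline, so a write()
  # that blocks on a full pipe resumes as soon as the reader consumes data.
  printf '%s\n' "$text" | "$@"
}

here_string_via_pipe "here doc 1" cat

# A body larger than the usual pipe capacity still gets through.
big=$(head -c 100000 /dev/zero | tr '\0' 'x')
# wc counts 100001 bytes: the body plus the newline that printf adds.
here_string_via_pipe "$big" wc -c
```

The care is needed only when a single process tries to write the whole string before anything reads it; with a separate writer, the kernel's blocking does the flow control.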
Curiously, the "here string" construct in bash also uses temp files:
    cat <<< "here string $i"
I don't see a reason to use temp files in either case, other than the fact that in ancient computing history people didn't want to hold entire "files" in memory. Compilers used to work a line at a time too.
Based on parsing real shell scripts, here docs are generally tiny, so I don't expect string size to be an issue.
I think my shell language will only have the here string operator, and implement it with pipes like dash does. From a programmer's perspective, here docs are just a weird kind of multiline string. These two invocations have the same output:
    s="\
    one
    two"
    cat <<< "$s"

    cat <<EOF
    one
    two
    EOF
That is, shell strings are already multiline. I guess I should allow some kind of line-based delimiter in the string literal syntax, because the \ is a bit ugly. But this special syntax for multiline strings doesn't need to be coupled with the notion of piping to stdin.
I showed in the last post that here doc syntax is unintuitive in other ways: quoted delimiters to eliminate expansion; the <<- variant to strip leading tabs; and the post-order traversal rule for multiple here docs on a single line.
oil implements all of this in its sh parser. But now that I fully understand the traditional syntax, I want to design something nicer for the oil language, as well as improve its implementation.