blog | oilshell.org
Here are two examples of directly manipulating file descriptors in shell. Do they look familiar to you?
# Naming file descriptors: from the Nix package repo
exec {fd}< "$fn"
# ... read from $fd ...
exec {fd}<&-
# Explicit save/restore of file descriptor: from Apache Yetus
exec 6>&1 1>"${LOG_FILE}"
# ... do work and write to stdout ...
exec 1>&6 6>&-
I wasn't familiar with these constructs before I started writing a shell, and I suspect many others aren't either.
Now that I know what they do, I argue that they're not worth filling your head with. They could go in a book called Shell: The Bad Parts.
In short, there are simpler ways of accomplishing these tasks, like:
read myvar < "$fn" # read the first line of a file into a variable
and
myfunc > "${LOG_FILE}" # redirect the stdout of a function to a file
If you're not convinced, read on for details.
In both cases, the exec builtin is used to manipulate the file descriptor table of the current process. This is unrelated to the usual use of exec, which is to replace the executable that a process is running. Type help exec in bash for details.
The two examples come from real shell scripts in "the wild". I copied the relevant parts into demo.sh in my oilshell/blog-code repo. Then I tested and rewrote them.
The first example came up on issue 26, when a user tried to run Nix build scripts with OSH.
It involves the file setup.sh in the nixpkgs repo. nixpkgs defines all packages for the Nix package manager.
I've rewritten the isElf function as isElfSimple.
These three lines:
exec {fd}< "$fn"
read -r -n 4 -u "$fd" magic
exec {fd}<&-
can be replaced with the single line:
# read the first 4 characters of $fn into $magic (-r disables backslash escapes)
read -r -n 4 magic < "$fn"
We simply redirect stdin from a file while the read builtin is running.
To look at it another way, those three lines of tortured syntax say the same thing as this short Python program, which makes the same open, read, and close calls:
import os
fd = os.open(path, os.O_RDONLY)  # get a file descriptor for a path
magic = os.read(fd, 4)           # read up to four bytes from that descriptor
os.close(fd)                     # close the descriptor
Really, that's it! It's hard to imagine a worse syntax than exec {fd}< "$fn" for opening a file and exec {fd}<&- for closing it.
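For a side-by-side comparison, here's a self-contained bash sketch of both versions. The function bodies are my own reconstruction for illustration, not a verbatim copy of setup.sh or demo.sh, and treating "ELF appears in the first four characters" as the test is an assumption about what the caller wants:

```shell
#!/usr/bin/env bash
# Sketch: two ways to check a file's 4-byte ELF magic number (\x7fELF).
# These bodies are illustrative reconstructions, not copied from nixpkgs.

# Original style: open, read, and close a named descriptor by hand.
isElf() {
  local fn=$1 fd magic
  exec {fd}< "$fn"               # open $fn on a new descriptor stored in $fd
  read -r -n 4 -u "$fd" magic    # read up to 4 characters from that descriptor
  exec {fd}<&-                   # close the descriptor
  [[ $magic =~ ELF ]]            # exit status 0 if the magic contains "ELF"
}

# Simpler style: redirect stdin for the duration of the read builtin only.
isElfSimple() {
  local fn=$1 magic
  read -r -n 4 magic < "$fn"     # stdin is restored as soon as read returns
  [[ $magic =~ ELF ]]
}

# Usage: compare both on a fake ELF header and on a plain text file.
elf=$(mktemp); txt=$(mktemp)
printf '\177ELF\002\001\001' > "$elf"
printf 'hello\n' > "$txt"
for f in "$elf" "$txt"; do
  isElf "$f"; a=$?
  isElfSimple "$f"; b=$?
  echo "$f: isElf=$a isElfSimple=$b"
done
rm -f "$elf" "$txt"
```

Both functions should report the same status for every file; the second one simply never names a descriptor.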
The second example was pointed out in the comments on OSH Runs Real Shell Programs. The Apache Yetus project is a set of tools and libraries for release automation. It makes extensive use of shell scripts.
In the file builtin-bugsystem.sh, these lines:
if [[ -n "${CONSOLE_REPORT_FILE}" ]]; then
  exec 6>&1 1>"${CONSOLE_REPORT_FILE}"
fi
echo FOO
echo BAR
if [[ -n "${CONSOLE_REPORT_FILE}" ]]; then
  exec 1>&6 6>&-
fi
can be replaced with:
doWork() {
  echo FOO
  echo BAR
}
if [[ -n "${CONSOLE_REPORT_FILE}" ]]; then
  doWork > "${CONSOLE_REPORT_FILE}"
else
  doWork
fi
By extracting a function doWork, you avoid the duplicate conditionals, as well as the need to explicitly save stdout as descriptor 6.
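To check that the two versions really write the same report, here's a small bash sketch in which a temporary file stands in for CONSOLE_REPORT_FILE. The wrapper names reportWithExec and reportWithFunction are hypothetical, chosen only for this comparison:

```shell
#!/usr/bin/env bash
# Sketch comparing the two styles; reportWithExec/reportWithFunction are hypothetical names.

doWork() {
  echo FOO
  echo BAR
}

# Style from the original script: save stdout in fd 6, redirect, restore.
reportWithExec() {
  if [[ -n "${CONSOLE_REPORT_FILE}" ]]; then
    exec 6>&1 1>"${CONSOLE_REPORT_FILE}"   # copy stdout to fd 6, then point stdout at the file
  fi
  echo FOO
  echo BAR
  if [[ -n "${CONSOLE_REPORT_FILE}" ]]; then
    exec 1>&6 6>&-                         # restore stdout from fd 6, then close fd 6
  fi
}

# Rewritten style: the redirect applies only to the doWork call.
reportWithFunction() {
  if [[ -n "${CONSOLE_REPORT_FILE}" ]]; then
    doWork > "${CONSOLE_REPORT_FILE}"
  else
    doWork
  fi
}

CONSOLE_REPORT_FILE=$(mktemp)
reportWithExec;     a=$(cat "$CONSOLE_REPORT_FILE")
reportWithFunction; b=$(cat "$CONSOLE_REPORT_FILE")
# Both calls are finished, so this goes to the original stdout.
if [[ $a == "$b" ]]; then echo "same report"; else echo "reports differ"; fi
rm -f "$CONSOLE_REPORT_FILE"
```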
A redirect attached to a single command or function call is scoped to that call: the shell implicitly saves and restores the file descriptor state around it, so there's no need to do it yourself.
For example, consider doWork > out.txt. This means:
1. Open the file: fd = open("out.txt").
2. Save the current stdout, then point stdout (descriptor 1) to fd.
3. Run the doWork function, which may call both builtins and external programs.
   - echo FOO writes to stdout, but stdout is now connected to a disk file, out.txt.
   - External programs like ls inherit stdout from the shell, so their output also goes to out.txt.
4. After doWork returns, restore stdout to whatever it was before, e.g. the terminal.
Also consider read myvar < in.txt. What does that do?
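It follows the same pattern: the shell points stdin at in.txt, runs the read builtin, then restores stdin. A quick bash sketch (using a temporary file in place of in.txt) makes the restore visible, because a second read afterward still sees the original stdin, which here is a pipe:

```shell
#!/usr/bin/env bash
# Sketch: a redirect on one builtin is undone when that builtin returns.

in_txt=$(mktemp)                  # stands in for in.txt
printf 'from the file\n' > "$in_txt"

echo 'from the pipe' | {
  read -r myvar < "$in_txt"       # stdin -> $in_txt only while this read runs
  read -r other                   # stdin is the pipe again
  echo "myvar=$myvar other=$other"  # prints: myvar=from the file other=from the pipe
}

rm -f "$in_txt"
```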
Before implementing a shell, I didn't realize that a shell needs to save and restore file descriptors. There's a long discussion on this topic in the comment thread mentioned above.
If this concept is unfamiliar to you, it might help to think of file descriptors as pointers to data structures in the kernel. Those data structures could represent a pipe() or an open() file.
From this viewpoint, shell redirection mutates process-wide global state that lives in the kernel.
File descriptors aren't literally pointers; they're small integers, because the kernel is in a different address space. And you can't copy them with a C assignment statement; you have to use a system call like dup2(1, 2), which makes descriptor 2 refer to the same open file as descriptor 1.
But otherwise the analogy holds: copying a file descriptor to a different position in the table is like copying a pointer, so the object it refers to isn't lost when the original slot is overwritten. This is more or less what a shell redirect like 2>&1 does.
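Here is the pointer-copy analogy as a minimal bash sketch. It uses the explicit exec style that this post recommends avoiding, only to make the mechanism visible; descriptor 6 is an arbitrary spare slot, mirroring the Yetus script:

```shell
#!/usr/bin/env bash
# Sketch: saving a copy of a descriptor before overwriting it.
# Shown only to illustrate the mechanism; prefer scoped redirects in real code.

log_file=$(mktemp)

exec 6>&1                      # like dup2(1, 6): slot 6 now refers to the same file as stdout
exec 1>"$log_file"             # overwrite slot 1; the old stdout survives in slot 6
echo "to the file"             # goes to $log_file
echo "to the old stdout" >&6   # still reaches the original stdout through the copy
exec 1>&6 6>&-                 # like dup2(6, 1) then close(6): restore stdout, drop the copy

echo "stdout restored"
cat "$log_file"                # prints: to the file
rm -f "$log_file"
```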
I showed that advanced file descriptor manipulation in bash can be replaced with simpler constructs.
If you're not convinced, clone the oilshell/blog-code repo and play with demo.sh. The tests show that the original code and my rewrites behave the same way:
$ ./demo.sh testIsElf
$ ./demo.sh testDoWorkAndLog
If you know of a use case where directly manipulating file descriptors is essential or preferable, please leave a comment. I'm collecting feedback for the design of the Oil language.
Let me propose a more aggressive style rule:
The only file descriptor that should appear explicitly in a shell script is 2, for stderr.
For example, I often use this log function:
log() {
  echo "$@" >&2
}
log "hello $name" # goes to stderr
It explicitly mentions descriptor 2 so that no other part of the program needs to.
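One payoff of this rule: because log writes to stderr, a caller can capture a function's stdout as data while diagnostics still reach the terminal. A short bash sketch, where getName is a hypothetical function used only for illustration:

```shell
#!/usr/bin/env bash
# Sketch: stdout carries data, stderr carries messages.

log() {
  echo "$@" >&2
}

getName() {                   # hypothetical helper for illustration
  log "looking up the name"   # diagnostic: goes to stderr
  echo "alice"                # result: goes to stdout
}

name=$(getName)               # $(...) captures only stdout; the log line still prints
log "hello $name"             # goes to stderr
```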
I've never found a need for any other file descriptor. Descriptors 0 and 1, for stdin and stdout, are the defaults for many shell constructs (read reads from stdin, and echo writes to stdout), so they don't need to be explicitly mentioned.
If you disagree with this rule, let me know.
If you like writing shell scripts, please try the first OSH release on your programs, and file bugs on GitHub.
If you'd like to run the development version, see the Contributing page. Right now I'm knocking off shell features needed to run Nix's setup.sh.