blog | oilshell.org
At the end of Shell Has a Forth-Like Quality, I mentioned that
ssh and su have odd interfaces. Instead of using part of their own
argv array as the argv array for the child (which allows Bernstein
Chaining), they accept shell strings.
This interface requires a shell to interpret, so the process tree looks like this:
su andy -c 'ls /'
  /bin/sh -c 'ls /'
    /bin/ls /
rather than this:
sudo ls /    # sudo uses its argv array
  /bin/ls /
anacrolix asked if a wrapper can fix this. Though
I've hit this problem while using
scp to copy filenames with spaces, I hadn't
thought about it until then.
This problem can be solved with Python 2's commands module:
#!/usr/bin/python
# argv_to_sh.py
import commands
import sys

# strategy: double quote if it has a single quote; otherwise single quote
for arg in sys.argv[1:]:
    sys.stdout.write(commands.mkarg(arg))
This script is invoked like this:
ssh localhost "$(./argv_to_sh.py touch 'filename with spaces')"
which is equivalent to:
ssh localhost "touch 'filename with spaces'"
In other words, it eliminates the need for double quoting, which is hard to read and write, especially when dealing with whitespace, quotes, and backslashes.
For comparison, if you assumed the composable argv array interface and ran:
ssh localhost touch 'filename with spaces'
you would get three files in your home directory, instead of one.
$ ls -1 ~/filename ~/with ~/spaces
/home/andy/filename
/home/andy/with
/home/andy/spaces
Even more oddly,
ssh does accept an array, but its elements are joined
before passing them to the shell. For example, this command has two
double-quoted arguments, but it works just like the quoted commands above, creating a single file:
ssh localhost "touch" "'filename with spaces'"
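To make the joining behavior concrete, here is a small Python simulation. It is not ssh's actual code: it assumes the command arguments are joined with single spaces and that the remote shell then re-parses the result, which is approximated here with shlex.split. The helper name remote_argv is hypothetical.

```python
import shlex

def remote_argv(ssh_command_args):
    """Approximate what the remote shell sees (hypothetical helper).

    Assumption: ssh joins its command arguments with single spaces, and
    the remote shell re-parses the resulting string. shlex.split stands
    in for the shell's word splitting and quote removal.
    """
    shell_string = ' '.join(ssh_command_args)
    return shlex.split(shell_string)

# Each list is what ssh receives after the LOCAL shell has already
# removed one level of quoting.
print(remote_argv(['touch', 'filename with spaces']))      # unquoted remotely
print(remote_argv(['touch', "'filename with spaces'"]))    # two arguments
print(remote_argv(["touch 'filename with spaces'"]))       # one shell string
```

Under these assumptions the first call splits the filename into three words, while the last two produce the same two-element argv.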
The help for
ssh is misleading, implying that it takes a single command:
usage: ssh [-1246AaCfgKkMNnqsTtVvXxYy] [-b bind_address] ...
           ...
           [-w local_tun[:remote_tun]] [user@]hostname [command]
I would notate it like this:
ssh [user@]hostname [SHELL_STRING_PART]...
It's conceivable that
ssh could spawn
ls without a remote shell.
One possible reason for accepting a shell string is that you can do remote evaluation of environment variables:
ssh localhost 'echo $HOME'  # evaluate in remote environment
ssh localhost "echo $HOME"  # evaluate in local environment
ssh localhost echo $HOME    # ditto, local environment
A more fundamental reason is that they both run processes under a different
user (uid), and the Unix convention is that the shell sets up a user's
environment, e.g. setting $USER. This isn't entirely convincing, though,
because many Unix daemons run directly under a different uid, without a
shell as a parent.
There's less justification for argument joining, but
ssh probably does this
so you can leave off the quotes in the common case:
ssh localhost 'touch filename_without_spaces'  # explicit
ssh localhost touch filename_without_spaces    # for convenience
I consider this a misfeature because it causes confusion about what the input syntax is. A different flag for each syntax would be a nicer interface:
ssh -c 'echo $HOME'                    # shell string interface
ssh -a touch 'filename with spaces'    # argv array interface
(1) Python's commands module was deprecated in favor of subprocess, and the
mkarg() function was lost. If you use a system without Python 2,
you can copy the function, reproduced here, into argv_to_sh.py:
# Make a shell command argument from a string.
# Return a string beginning with a space followed by a shell-quoted
# version of the argument.
# Two strategies: enclose in single quotes if it contains none;
# otherwise, enclose in double quotes and prefix quotable characters
# with backslash.
#
def mkarg(x):
    if '\'' not in x:
        return ' \'' + x + '\''
    s = ' "'
    for c in x:
        if c in '\\$"`':
            s = s + '\\'
        s = s + c
    s = s + '"'
    return s
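On Python 3, a rough equivalent of the script can be sketched with the standard library's shlex.quote. This is one way to port it, not code from the blog-code repository, and the function name argv_to_sh is hypothetical. Note that shlex.quote always uses single quotes and escapes embedded single quotes, so its output can differ textually from mkarg() while serving the same purpose.

```python
#!/usr/bin/env python3
import shlex
import sys

def argv_to_sh(argv):
    """Return a shell string that a POSIX shell parses back into argv."""
    return ' '.join(shlex.quote(arg) for arg in argv)

if __name__ == '__main__':
    # Same role as the Python 2 script: print the quoted command string.
    sys.stdout.write(argv_to_sh(sys.argv[1:]))
```

It would be invoked the same way as the Python 2 script, inside "$(...)" on the ssh command line.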
(2) These two interfaces correspond to the system() function (POSIX, Linux) and the
exec*() family of functions (POSIX, Linux).
In Unix, the argv array interface is more fundamental because system() is
defined in terms of exec(). That is, system(const char* command) is essentially
exec(['sh', '-c', command]).
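The relationship can be sketched in Python. This is an illustration of the idea rather than the libc implementation: the hypothetical my_system() below wraps the shell string in a ['sh', '-c', command] argv array and hands it to subprocess.call, which starts the process and waits for it without adding a shell of its own. It assumes sh is on the PATH.

```python
import subprocess

def my_system(command):
    """Run a shell string by building an argv array around it.

    The only shell involved is the one named explicitly in the list.
    Returns the exit status of the shell.
    """
    return subprocess.call(['sh', '-c', command])

status = my_system('exit 3')
print(status)
```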
(3) In contrast, the Windows
CreateProcess() API takes a string, not an array
of strings. Making each application responsible for quoting leads to problems.
(4) In Python, this is the difference between shell=True and shell=False:
# Works as intended; lists $HOME directory
subprocess.call('ls ~', shell=True)

# Tries to execute binary named `ls ~`
subprocess.call('ls ~', shell=False)
(5) Julia has a unique API that allows hygienic string interpolation and avoids the shell.
It's important to be precise about code that deals with shell strings vs. code that
deals with argv arrays. Using arrays not only saves a shell process, but
also removes surface area for command injection.
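As a sketch of where injection comes from, the Python below passes a hostile-looking filename (a made-up value) to echo in both styles. With a shell string, the semicolon is parsed as a command separator; with an argv array, it is just data. This assumes a Unix system with sh and an echo binary on the PATH.

```python
import subprocess

# Hypothetical untrusted input.
filename = 'x; echo INJECTED'

# Shell string: the shell parses the semicolon and runs a second command.
via_string = subprocess.check_output('echo ' + filename, shell=True)

# argv array: no shell is involved, so the whole value is one argument.
via_array = subprocess.check_output(['echo', filename])

print(via_string)
print(via_array)
```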
You can use the argv_to_sh.py script with ssh and su to eliminate errors
caused by incorrect double quoting.
As always, the blog-code repository on GitHub contains the code
from this article.
Working on the osh to oil converter has prompted a lot of thinking and research on ML. I borrowed the model of algebraic data types via ASDL, but now I want pattern matching too!
I may write about this, but unfortunately I think I have to bootstrap Oil in Python first, without pattern matching.
There are still many topics left on the blog TODO stack.
seschwar on Lobsters makes a good point about the su interface. You can avoid double quoting with
-c 'exec "$0" "$@"'.
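The trick works because sh -c assigns the arguments that follow the command string to $0, $1, and so on, so the constant string exec "$0" "$@" simply re-executes them as an argv array. Here is a minimal Python sketch, run against a local sh rather than su, with a hypothetical helper name. It assumes sh and a printf binary are on the PATH.

```python
import subprocess

def run_via_shell_string(argv):
    """Pass argv through sh -c without quoting it into the shell string.

    The shell string is constant; the real command travels as extra
    arguments, which sh exposes as $0 and "$@".
    """
    return subprocess.check_output(['sh', '-c', 'exec "$0" "$@"'] + list(argv))

# Arguments with spaces and quotes arrive intact, one per output line.
out = run_via_shell_string(['printf', '%s\n', 'filename with spaces', "it's"])
print(out.decode())
```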