Oil's Parser Doesn't Worry About Aliases and Prompts

2020-01-18

This is part two of "The Interactive Shell Needs a Principled Parser", which was mentioned in the January blog roadmap.

The last post argued that a shell should use its parser for history and autocompletion.

This post is something of the opposite: a shell parser shouldn't be concerned with alias expansion or the interactive prompt. Those are orthogonal concerns.

I show code "smells" rather than specific bugs, so the argument is more abstract. But I believe it's an important issue, especially when you want to expand the shell language. You'll also see that the author of the rc shell had essentially the same criticism back in 1991!

Table of Contents

The PS2 Problem

The Alias Problem

Global Variables and Grammars

Spec Test Results

Caveat

Summary

Appendix: dash Code Excerpts

Its parser is overly concerned with interactive prompts

... and alias

The PS2 Problem

In a POSIX shell,

$PS1 is the prompt for the first line of input, like $
$PS2 is the prompt for continuation lines, like >

What I call the "$PS2 problem" is simply: When the user hits Enter, does the shell execute the line of text, or does it print $PS2 and wait for more input?

$ echo hi     # Enter causes command to be executed
hi           

$ if echo hi  # prints > and prompts for more input
> then        # more input needed after 'then'
>

Oil handles this problem outside the lexer and parser, in the InteractiveLineReader.

In contrast, all the shells I've looked at litter their parsers with references to the prompt. See the appendix for evidence of that in dash.
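
To make the separation concrete, here's a minimal sketch in Python. It is not Oil's actual code: the class `PromptingLineReader`, its `reset()` and `get_line()` methods, and the toy `parse_command()` are hypothetical names invented for illustration. The point is only that the reader owns the choice between `$PS1` and `$PS2`, while the parser asks for lines and never mentions a prompt.

```python
class PromptingLineReader:
    """Hypothetical line reader: hands out input lines and picks the prompt."""

    def __init__(self, lines, ps1="$ ", ps2="> "):
        self.lines = iter(lines)   # stand-in for reading from a terminal
        self.ps1 = ps1
        self.ps2 = ps2
        self.prompt = ps1          # the first line of a command gets PS1
        self.shown = []            # prompts displayed so far, kept for inspection

    def reset(self):
        """Called by the main loop before each new command."""
        self.prompt = self.ps1

    def get_line(self):
        """Show the current prompt and return the next line (None at EOF)."""
        self.shown.append(self.prompt)
        self.prompt = self.ps2     # any further line is a continuation
        return next(self.lines, None)


def parse_command(reader):
    """Toy parser: keeps reading until every 'if' is closed by a 'fi'.

    It only asks the reader for lines. It has no idea prompts exist.
    (A real parser would not count keywords this naively.)
    """
    depth = 0
    collected = []
    while True:
        line = reader.get_line()
        if line is None:
            break
        collected.append(line)
        for word in line.split():
            if word == "if":
                depth += 1
            elif word == "fi":
                depth -= 1
        if depth <= 0:
            break
    return collected


reader = PromptingLineReader(
    ["echo hi", "if echo hi", "then", "  echo yes", "fi"])

first = parse_command(reader)    # complete after one line
reader.reset()
second = parse_command(reader)   # needs three continuation lines
print(first, second, reader.shown)
```

The main loop calls `reset()` between commands. Everything else about the prompt falls out of the order in which the parser happens to request lines.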

The Alias Problem

Over the last few years of implementing shell, I've found many times that a careful reading of the POSIX spec isn't sufficient.

Instead, I use two main techniques to determine the required behavior:

Observe what other shells do by writing spec tests (an outside view).
Read the code of popular shell implementations like bash and dash (an inside view).

When implementing alias, I was surprised that shells implemented it by littering their parsers with reads and writes of global variables. Again, see the appendix for evidence of this in dash.

Global Variables and Grammars

Does this matter? Let me make a more abstract argument first.

In my BayLISA presentation last year, I quoted a few 25-year-old complaints about shell, including this one:

... nobody really knows what the Bourne shell’s grammar is. Even examination of the source code is little help. The parser is implemented by recursive descent, but the routines corresponding to the syntactic categories all have a flag argument that subtly changes their operation depending on the context.

— Tom Duff in a paper on Plan 9's rc shell, 1991

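Duff is lamenting the "flag arguments", but global variables are strictly worse: a flag argument is at least visible at every call site, while a global can be read or written from anywhere in the parser.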

If you agree that this was a problem 25 years ago, it's even more of a problem today. The POSIX spec was silent on such issues then, and it is now. Since then, shells have grown more ad hoc features.

Oil's behavior diverges slightly from other shells, but it's designed to be documentable. Global variables are extraneous state outside the grammar, and you don't need them to describe what Oil does. To implement alias expansion, it re-invokes the parser as a library, rather than changing global flags.
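
Here's a rough sketch of what "re-invoking the parser as a library" can look like. Again, this is not Oil's implementation: `parse_words()` is a stand-in that uses Python's `shlex.split` instead of a real shell parser, and `expand_aliases()` is a hypothetical helper. It assumes two rules from POSIX: an alias that is already being expanded is not expanded again, and an alias value ending in a blank causes the next word to be checked too.

```python
import shlex


def parse_words(text):
    """Stand-in for a real parser: split a command line into words."""
    return shlex.split(text)


def expand_aliases(words, aliases, seen=frozenset()):
    """Expand the first word if it's an alias, by re-parsing the alias body.

    No global flags: the state is the argument list plus `seen`, the set
    of alias names currently being expanded (guards against recursion).
    """
    if not words or words[0] not in aliases or words[0] in seen:
        return words

    name = words[0]
    body = aliases[name]

    # Re-invoke the parser on the alias body, then expand the result.
    expanded = expand_aliases(parse_words(body), aliases, seen | {name})

    rest = words[1:]
    # A trailing blank in the alias value means the next word is
    # also eligible for alias expansion.
    if body.endswith((" ", "\t")):
        rest = expand_aliases(rest, aliases, seen)

    return expanded + rest


aliases = {
    "ll": "ls -l",
    "ls": "ls --color",   # refers to itself; must not loop forever
    "sudo": "sudo ",      # trailing space: expand the next word too
}

plain = expand_aliases(parse_words("ll /tmp"), aliases)
chained = expand_aliases(parse_words("sudo ll"), aliases)
print(plain)
print(chained)
```

Real alias bodies can contain arbitrary shell syntax, not just words, which is exactly why handing the body back to the full parser is more predictable than toggling flags in the middle of a parse.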

Spec Test Results

To be more concrete, let's look at the results of running spec/alias.test.sh. Here are the cases that shells disagree on:

#14, dash disagrees. First and second word are the same alias, but no trailing space
#22, zsh disagrees. Loop split across both iterative and recursive aliases
#29, zsh disagrees. Alias is respected inside eval
#37, bash disagrees. Here doc inside alias

So out of four shells (including mksh), none of them agree on all alias test cases. To be honest, this isn't worse than any other shell feature. Given all the global flags, I was surprised at the relative agreement!

Nonetheless I still prefer Oil's strict style, because it makes it easier to expand the language without thinking about prompts or aliases.

Caveat

As mentioned in the last post, there are still more things to implement in Oil, and there are undoubtedly cases where it behaves worse than existing shells.

Help me polish it by testing it interactively and on real shell scripts. See Help Wanted and Where To Send Feedback.

Summary

This post explained that Oil's parser is not concerned with these interactive features:

The interactive prompt ($PS1 vs. $PS2)
Alias expansion

On the other hand, the last post showed that the parser can be used as a library to implement:

History expansion
Autocompletion

Following the January blog roadmap, the next post will clarify my goals for the reduced Oil language.

Appendix: `dash` Code Excerpts

Its parser is overly concerned with interactive prompts

Dash has a ~1500-line recursive-descent parser, and it deals with the prompt throughout. Other shells are implemented similarly.

In Oil, this knowledge is confined to the InteractiveLineReader.

~/dash-0.5.8/src$ grep -n prompt parser.c
86:int doprompt;                        /* if set, prompt the user */
87:int needprompt;                      /* true if interactive and at start of line */
112:STATIC void setprompt(int);
141:    doprompt = interact;
142:    if (doprompt)
143:            setprompt(doprompt);
144:    needprompt = 0;
662:            if (needprompt) {
663:                    setprompt(2);
774:    if (needprompt) {
775:            setprompt(2);
790:                            if (doprompt)
791:                                    setprompt(2);
798:                    needprompt = doprompt;
881:            if (c == '\034' && doprompt
899:                            if (doprompt)
900:                                    setprompt(2);
920:                                    if (doprompt)
921:                                            setprompt(2);
1078:                   needprompt = doprompt;
1298:   int uninitialized_var(saveprompt);
1318:                   if (needprompt) {
1319:                           setprompt(2);
1328:                                   if (doprompt)
1329:                                           setprompt(2);
1352:                           needprompt = doprompt;
1375:           saveprompt = doprompt;
1376:           doprompt = 0;
1382:           doprompt = saveprompt;
1489:setprompt(int which)
1494:   needprompt = 0;
1495:   whichprompt = which;
1504:           out2str(getprompt(NULL));
1513:   int saveprompt;
1518:   saveprompt = doprompt;
1519:   doprompt = 0;
1523:   doprompt = saveprompt;
...

... and `alias`

Likewise, the parser has many checks for global flags, including a flag for alias expansion. In contrast, Oil invokes its parser as a library to expand aliases.

~/dash-0.5.8/src$ grep -i -n -C 1 alias parser.c
...
--
160-
161:    checkkwd = CHKNL | CHKKWD | CHKALIAS;
162-    if (nlflag == 2 && tokendlist[peektoken()])
--
203-                    }
204:                    checkkwd = CHKNL | CHKKWD | CHKALIAS;
205-                    if (tokendlist[peektoken()])
--
241-            }
242:            checkkwd = CHKNL | CHKKWD | CHKALIAS;
243-            n2 = pipeline();
--
264-            negate = !negate;
265:            checkkwd = CHKKWD | CHKALIAS;
266-    } else
--
278-                    lp = (struct nodelist *)stalloc(sizeof (struct nodelist));
279:                    checkkwd = CHKNL | CHKKWD | CHKALIAS;
280-                    lp->n = command();
--
363-            n1->nfor.var = wordtext;
364:            checkkwd = CHKNL | CHKKWD | CHKALIAS;
365-            if (readtoken() == TIN) {
--
392-            }
393:            checkkwd = CHKNL | CHKKWD | CHKALIAS;
394-            if (readtoken() != TDO)
--
409-            n2->narg.next = NULL;
410:            checkkwd = CHKNL | CHKKWD | CHKALIAS;
411-            if (readtoken() != TIN)
--
472-    /* Now check for redirection which may follow command */
473:    checkkwd = CHKKWD | CHKALIAS;
474-    rpp = rpp2;
--
512-
513:    savecheckkwd = CHKALIAS;
514-    savelinno = plinno;
--
556-                            n->type = NDEFUN;
557:                            checkkwd = CHKNL | CHKKWD | CHKALIAS;
558-                            n->ndefun.text = n->narg.text;
--
725-
726:    if (checkkwd & CHKALIAS) {
...
727:            struct alias *ap;
728:            if ((ap = lookupalias(wordtext, 1)) != NULL) {
...
