
Oil's Parser Doesn't Worry About Aliases and Prompts

2020-01-18

This is part two of "The Interactive Shell Needs a Principled Parser", which was mentioned in the January blog roadmap.

The last post argued that a shell should use its parser for history and autocompletion.

This post is something of the opposite: a shell parser shouldn't be concerned with alias expansion or the interactive prompt. Those are orthogonal concerns.

I show code "smells" rather than specific bugs, so the argument is more abstract. But I believe it's an important issue, especially when you want to expand the shell language. You'll also see that the author of the rc shell had essentially the same criticism back in 1991!

Table of Contents
The PS2 Problem
The Alias Problem
Global Variables and Grammars
Spec Test Results
Caveat
Summary
Appendix: dash Code Excerpts
Its parser is overly concerned with interactive prompts
... and alias

The PS2 Problem

In a POSIX shell, $PS1 is the primary prompt and $PS2 is the continuation prompt, which defaults to "> ".

What I call the "$PS2 problem" is simply: When the user hits Enter, does the shell execute the line of text, or does it print $PS2 and wait for more input?

$ echo hi     # Enter causes command to be executed
hi           

$ if echo hi  # prints > and prompts for more input
> then        # more input needed after 'then'
>

Oil handles this problem outside the lexer and parser, in the InteractiveLineReader.

In contrast, all the shells I've looked at litter their parsers with references to the prompt. See the appendix for evidence of that in dash.
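To make the separation concrete, here is a minimal sketch of the idea in Python. It is not Oil's actual code: the class borrows the name InteractiveLineReader from above, but its methods (reset, get_line), the toy parse_command function, and the if/fi counting are all assumptions made up for illustration. The point is only that the "parser" asks for lines and never mentions a prompt.

```python
import io


class InteractiveLineReader:
    """Hypothetical sketch: hands lines to a parser and owns all prompt state."""

    def __init__(self, in_file, out_file, ps1="$ ", ps2="> "):
        self.in_file = in_file
        self.out_file = out_file
        self.ps1 = ps1
        self.ps2 = ps2
        self.prompt = ps1  # prompt to print before the next line

    def reset(self):
        """Called by the main loop before each new command."""
        self.prompt = self.ps1

    def get_line(self):
        """Print the current prompt, read one line, then switch to PS2."""
        self.out_file.write(self.prompt)
        self.prompt = self.ps2
        line = self.in_file.readline()
        return line or None  # None at end of input


def parse_command(reader):
    """Toy stand-in for a parser: read lines until every 'if' has a 'fi'.

    It only calls reader.get_line(); it has no idea prompts exist.
    """
    lines = []
    depth = 0
    while True:
        line = reader.get_line()
        if line is None:
            break
        lines.append(line.rstrip("\n"))
        for word in line.split():
            if word == "if":
                depth += 1
            elif word == "fi":
                depth -= 1
        if depth <= 0:
            break
    return lines


# Simulate a user typing two commands; StringIO stands in for the terminal.
fake_stdin = io.StringIO("echo hi\nif echo hi\nthen echo yes\nfi\n")
fake_stdout = io.StringIO()
reader = InteractiveLineReader(fake_stdin, fake_stdout)

reader.reset()
first = parse_command(reader)   # one line, so only PS1 is printed
reader.reset()
second = parse_command(reader)  # three lines: PS1, then PS2 twice
```

All prompt state lives in the reader, so a real parser could be swapped in for parse_command without touching it.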

The Alias Problem

Over the last few years of implementing shell, I've found many times that a careful reading of the POSIX spec isn't sufficient.

Instead, I use two main techniques to determine the required behavior:

  1. Observe what other shells do by writing spec tests (an outside view).
  2. Read the code of popular shell implementations like bash and dash (an inside view).

When implementing alias, I was surprised that shells implemented it by littering their parsers with reads and writes of global variables. Again, see the appendix for evidence of this in dash.

Global Variables and Grammars

Does this matter? Let me make a more abstract argument first.

In my BayLISA presentation last year, I quoted a few 25-year-old complaints about shell, including this one:

... nobody really knows what the Bourne shell’s grammar is. Even examination of the source code is little help. The parser is implemented by recursive descent, but the routines corresponding to the syntactic categories all have a flag argument that subtly changes their operation depending on the context.

— Tom Duff in a paper on Plan 9's rc shell, 1991

Duff is lamenting the "flag arguments", but global variables are strictly worse.

If you agree that this was a problem 25 years ago, it's even more of a problem today. The POSIX spec was silent on such issues then, and it is now. Since then, shells have grown more ad hoc features.

Oil's behavior diverges slightly from other shells, but it's designed to be documentable. Global variables are extraneous state outside the grammar, and you don't need them to describe what Oil does. To implement alias expansion, it re-invokes the parser as a library, rather than changing global flags.
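Here is a rough sketch of what "re-invoking the parser as a library" could look like. This is not Oil's implementation: parse_simple_command and expand_aliases are hypothetical names, the "parser" is just a whitespace split, and details such as the POSIX rule for alias values that end in a blank are left out. What it shows is that expansion can be an ordinary function call on the alias body, with no flag set or cleared anywhere.

```python
def parse_simple_command(text):
    """Toy parser: split one command line into words.

    It reads no global flags; the result depends only on `text`.
    """
    return text.split()


def expand_aliases(words, aliases, seen=frozenset()):
    """Expand an alias in the first word by parsing the alias body again.

    `seen` holds the aliases already being expanded, so a definition
    like alias ls='ls --color' does not recurse forever.
    """
    if not words:
        return words
    name = words[0]
    if name not in aliases or name in seen:
        return words
    # Re-invoke the parser on the alias body, like any other library call.
    body_words = parse_simple_command(aliases[name])
    expanded = expand_aliases(body_words, aliases, seen | {name})
    return expanded + words[1:]


# Hypothetical alias table for the demo.
aliases = {"ll": "ls -l", "ls": "ls --color"}
result = expand_aliases(parse_simple_command("ll /tmp"), aliases)
```

Here result is ['ls', '--color', '-l', '/tmp']. The only state involved is the explicit seen argument, which is visible in the function signature rather than hidden in a global.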

Spec Test Results

To be more concrete, let's look at the results of running spec/alias.test.sh. Here are the cases that shells disagree on:

So out of four shells (including mksh), none of them agree on all alias test cases. To be honest, this isn't worse than any other shell feature. Given all the global flags, I was surprised at the relative agreement!

Nonetheless I still prefer Oil's strict style, because it makes it easier to expand the language without thinking about prompts or aliases.

Caveat

As mentioned in the last post, there are still more things to implement in Oil, and there are undoubtedly cases where it behaves worse than existing shells.

Help me polish it by testing it interactively and on real shell scripts. See Help Wanted and Where To Send Feedback.

Summary

This post explained that Oil's parser is not concerned with two interactive features: the $PS2 prompt and alias expansion.

On the other hand, the last post showed that the parser can be used as a library to implement history and autocompletion.

Following the January blog roadmap, the next post will clarify my goals for the reduced Oil language.

Appendix: dash Code Excerpts

Its parser is overly concerned with interactive prompts

Dash has a ~1500-line recursive-descent parser, and it deals with the prompt throughout. Other shells are implemented similarly.

In Oil, this knowledge is confined to the InteractiveLineReader.

~/dash-0.5.8/src$ grep -n prompt parser.c
86:int doprompt;                        /* if set, prompt the user */
87:int needprompt;                      /* true if interactive and at start of line */
112:STATIC void setprompt(int);
141:    doprompt = interact;
142:    if (doprompt)
143:            setprompt(doprompt);
144:    needprompt = 0;
662:            if (needprompt) {
663:                    setprompt(2);
774:    if (needprompt) {
775:            setprompt(2);
790:                            if (doprompt)
791:                                    setprompt(2);
798:                    needprompt = doprompt;
881:            if (c == '\034' && doprompt
899:                            if (doprompt)
900:                                    setprompt(2);
920:                                    if (doprompt)
921:                                            setprompt(2);
1078:                   needprompt = doprompt;
1298:   int uninitialized_var(saveprompt);
1318:                   if (needprompt) {
1319:                           setprompt(2);
1328:                                   if (doprompt)
1329:                                           setprompt(2);
1352:                           needprompt = doprompt;
1375:           saveprompt = doprompt;
1376:           doprompt = 0;
1382:           doprompt = saveprompt;
1489:setprompt(int which)
1494:   needprompt = 0;
1495:   whichprompt = which;
1504:           out2str(getprompt(NULL));
1513:   int saveprompt;
1518:   saveprompt = doprompt;
1519:   doprompt = 0;
1523:   doprompt = saveprompt;
...

... and alias

Likewise, the parser has many checks for global flags, including a flag for alias expansion. In contrast, Oil invokes its parser as a library to expand aliases.

~/dash-0.5.8/src$ grep -i -n -C 1 alias parser.c
...
--
160-
161:    checkkwd = CHKNL | CHKKWD | CHKALIAS;
162-    if (nlflag == 2 && tokendlist[peektoken()])
--
203-                    }
204:                    checkkwd = CHKNL | CHKKWD | CHKALIAS;
205-                    if (tokendlist[peektoken()])
--
241-            }
242:            checkkwd = CHKNL | CHKKWD | CHKALIAS;
243-            n2 = pipeline();
--
264-            negate = !negate;
265:            checkkwd = CHKKWD | CHKALIAS;
266-    } else
--
278-                    lp = (struct nodelist *)stalloc(sizeof (struct nodelist));
279:                    checkkwd = CHKNL | CHKKWD | CHKALIAS;
280-                    lp->n = command();
--
363-            n1->nfor.var = wordtext;
364:            checkkwd = CHKNL | CHKKWD | CHKALIAS;
365-            if (readtoken() == TIN) {
--
392-            }
393:            checkkwd = CHKNL | CHKKWD | CHKALIAS;
394-            if (readtoken() != TDO)
--
409-            n2->narg.next = NULL;
410:            checkkwd = CHKNL | CHKKWD | CHKALIAS;
411-            if (readtoken() != TIN)
--
472-    /* Now check for redirection which may follow command */
473:    checkkwd = CHKKWD | CHKALIAS;
474-    rpp = rpp2;
--
512-
513:    savecheckkwd = CHKALIAS;
514-    savelinno = plinno;
--
556-                            n->type = NDEFUN;
557:                            checkkwd = CHKNL | CHKKWD | CHKALIAS;
558-                            n->ndefun.text = n->narg.text;
--
725-
726:    if (checkkwd & CHKALIAS) {
...
727:            struct alias *ap;
728:            if ((ap = lookupalias(wordtext, 1)) != NULL) {
...