blog | oilshell.org

Lexer Hints: Disambiguating the Right Parenthesis

2017-12-16

Lexer Hints: How to Tokenize the Right Paren

Bash article: quote from it? To tokenize ) correctly, bash has to understand a lot of context. OSH solves this with lexer hints.

Meaning of )

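Command language: ) closes a subshell like ( echo hi ), ends a case pattern like foo), and is part of a function definition like f() { echo hi; }.

Word language: ) closes a command substitution like $(echo hi) or an array literal like a=(1 2 3).

Arith language: ) closes a grouping like (1 + 2), and )) closes an arithmetic substitution like $((1 + 2)).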
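The same characters lex differently depending on which language the parser is in. Here's a minimal sketch of mode-dependent tokenization, assuming two hypothetical modes ("command" and "arith") and made-up token IDs; these are not OSH's actual mode names or rule tables.

```python
import re

# Hypothetical rule tables: each lexer mode lists (regex, token ID) pairs.
# Within a mode, the first matching rule wins.
LEX_MODES = {
    "command": [
        (re.compile(r"\("), "Op_LParen"),
        (re.compile(r"\)"), "Op_RParen"),
        (re.compile(r"[^\s()]+"), "Lit_Chars"),
    ],
    "arith": [
        (re.compile(r"\("), "Arith_LParen"),
        (re.compile(r"\)"), "Arith_RParen"),
        (re.compile(r"[0-9]+"), "Arith_Int"),
        (re.compile(r"[-+*/]"), "Arith_Op"),
    ],
}


def tokenize(text, mode):
    """Tokenize text in a single lexer mode, skipping whitespace."""
    rules = LEX_MODES[mode]
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for pattern, tok_id in rules:
            m = pattern.match(text, pos)
            if m:
                tokens.append((tok_id, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected {text[pos]!r} at {pos} in {mode} mode")
    return tokens


print(tokenize("(1*2)", "command"))  # ')' is Op_RParen; '1*2' is a single word
print(tokenize("(1*2)", "arith"))    # ')' is Arith_RParen; '1*2' is three tokens
```

The parser, not the lexer, decides which mode to use, because only the parser knows which language it's in.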

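Ambiguity: recall that $((echo hi)) is not allowed in OSH. $(( always starts arithmetic, so a subshell inside a command substitution has to be written $( (echo hi) ). This preserves a single token of lookahead, given lexer modes.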
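Here's a sketch of why this rule keeps lookahead to one token, assuming hypothetical token IDs and a word-mode rule table that lists $(( before $(. Since the lexer always prefers the longer match, the parser can choose between arithmetic and command substitution from the first token alone. The price is that $((echo hi)) goes to the arithmetic parser, which rejects it.

```python
import re

# Hypothetical word-mode rules, tried in order. Listing "$((" before "$("
# means "$((" is always lexed as a single token.
WORD_RULES = [
    (re.compile(r"\$\(\("), "Left_DollarDParen"),  # starts $(( ... ))
    (re.compile(r"\$\("), "Left_DollarParen"),     # starts $( ... )
    (re.compile(r"[^\s$()]+"), "Lit_Chars"),
]


def classify_word(text):
    """Decide what kind of word this is from its first token alone."""
    for pattern, tok_id in WORD_RULES:
        if pattern.match(text):
            if tok_id == "Left_DollarDParen":
                return "arith_sub"
            if tok_id == "Left_DollarParen":
                return "command_sub"
            return "literal"
    raise SyntaxError(f"can't lex {text!r}")


print(classify_word("$((1 + 2))"))      # arith_sub
print(classify_word("$((echo hi))"))    # arith_sub: the arithmetic parser then rejects 'echo hi'
print(classify_word("$( (echo hi) )"))  # command_sub containing a subshell
```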

This is a somewhat ad hoc disambiguation mechanism. See "A Simple, Possibly Correct LR Parser for C11" for a good discussion of ad hoc mechanisms and the challenges of integrating them with parser generators. It can be done.

Theory

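Review: context-sensitive languages are exactly the languages recognized by linear bounded automata (LBAs), which are nondeterministic Turing machines whose tape is limited to the input.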

In other words: instead of having potentially infinite tape on which to compute, computation is restricted to the portion of the tape containing the input plus the two tape squares holding the endmarkers.
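A standard illustration is the context-sensitive language a^n b^n c^n. Below is a Python sketch in the spirit of an LBA, not a literal transition table: the tape is exactly the input plus two endmarkers, and the algorithm only overwrites cells of that tape, never allocating more.

```python
import re


def lba_accepts(word):
    """Decide a^n b^n c^n (n >= 1) using only the input cells plus two endmarkers."""
    # Finite-state shape check (a+ b+ c+), which needs no extra tape.
    if not re.fullmatch(r"a+b+c+", word):
        return False
    tape = ["<"] + list(word) + [">"]  # the whole bounded tape
    while True:
        try:
            i = tape.index("a")
        except ValueError:
            # No unmarked a's left: accept iff every b and c was matched too.
            return "b" not in tape and "c" not in tape
        try:
            j = tape.index("b", i)  # leftmost unmarked b after that a
            k = tape.index("c", j)  # leftmost unmarked c after that b
        except ValueError:
            return False
        # Cross off one a, one b, and one c by overwriting tape cells in place.
        tape[i], tape[j], tape[k] = "X", "Y", "Z"


print(lba_accepts("aabbcc"))  # True
print(lba_accepts("aabbc"))   # False
```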

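What's an example of something that's unrestricted but not context-sensitive?
I think it might be the C type-name / identifier problem, which is solved by the "lexer hack".
That's because it requires arbitrary storage: a table of typedef names. Though since that table can't be bigger than the input, it may still fit on an LBA's bounded tape.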
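Here's a toy version of the lexer hack, assuming a made-up mini-language where statements are separated by semicolons and typedef T name; declares a type name. The lexer classifies identifiers using a set that the "parser" fills in as it goes, so the same text a * b lexes differently before and after a typedef. The set grows with the number of declarations, which is the storage in question.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def lex_with_typedefs(src):
    """Toy lexer hack: identifiers are classified using a typedef table that
    grows as declarations are seen. Returns one token list per statement."""
    typedef_names = set()  # the growing state: one entry per typedef seen
    statements = []
    for stmt in src.split(";"):
        words = re.findall(r"[A-Za-z_]\w*|\S", stmt)
        if not words:
            continue
        tokens = []
        for w in words:
            if w == "typedef":
                tokens.append(("KEYWORD", w))
            elif IDENT.fullmatch(w):
                kind = "TYPE_NAME" if w in typedef_names else "IDENTIFIER"
                tokens.append((kind, w))
            else:
                tokens.append(("PUNCT", w))
        # Feedback from the "parser": 'typedef <type> <name>' declares <name>.
        if words[0] == "typedef":
            typedef_names.add(words[-1])
        statements.append(tokens)
    return statements


for toks in lex_with_typedefs("a * b; typedef int a; a * b;"):
    print(toks)  # first 'a * b' is a multiplication, second is a declaration
```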

Why does the theory matter? Because of non-determinism.

The lexer is highly structured: it has modes and a hint stack. Could it be generated from a declarative description? Unread() is a problem for that, but it unreads at most one token.
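Here's a minimal sketch of a hint stack, assuming hypothetical token IDs (Op_RParen, Right_CasePat, Right_CommandSub) and a toy driver standing in for the parser. It leaves out lexer modes and is not OSH's actual code. The parser pushes a hint saying what the next ) will mean; the lexer translates the matching token and pops the hint. Because hints nest, the ) after a case pattern inside $(...) gets a different ID from the ) that closes the command substitution.

```python
import re

# Hypothetical raw rules (first match wins); ";;" must precede the literal rule.
RULES = [
    (re.compile(r"\$\("), "Left_DollarParen"),
    (re.compile(r"\("), "Op_LParen"),
    (re.compile(r"\)"), "Op_RParen"),
    (re.compile(r";;"), "Op_DSemi"),
    (re.compile(r"[^\s$();]+"), "Lit_Chars"),
]


class HintLexer:
    """Lexer whose caller can say, ahead of time, what the next ')' means."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.hints = []  # stack of (old_id, new_id) pairs

    def push_hint(self, old_id, new_id):
        self.hints.append((old_id, new_id))

    def read(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos == len(self.text):
            return ("Eof", "")
        for pattern, tok_id in RULES:
            m = pattern.match(self.text, self.pos)
            if m:
                self.pos = m.end()
                # Only the innermost hint applies, and it's consumed on use.
                if self.hints and self.hints[-1][0] == tok_id:
                    tok_id = self.hints.pop()[1]
                return (tok_id, m.group())
        raise SyntaxError(f"unexpected {self.text[self.pos]!r} at {self.pos}")


def lex_with_toy_parser(text):
    """Stands in for the parser: pushes hints where it would know what ')' means.
    Handles only a single case arm, which is enough for this example."""
    lexer = HintLexer(text)
    tokens = []
    while True:
        tok = lexer.read()
        tokens.append(tok)
        if tok[0] == "Eof":
            return tokens
        if tok[0] == "Left_DollarParen":
            lexer.push_hint("Op_RParen", "Right_CommandSub")
        elif tok == ("Lit_Chars", "in"):
            # After 'case WORD in', the next ')' ends the case pattern.
            lexer.push_hint("Op_RParen", "Right_CasePat")


for tok in lex_with_toy_parser("$(case foo in foo) echo hi ;; esac)"):
    print(tok)
```

With no hints pushed, both ) characters come out as plain Op_RParen, and the parser would need more lookahead or backtracking to tell them apart.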

The parser is structured too, but it's kind of messy.

Footnote: Unread

To handle every case, I have to call Unread().

I think this is because of code like this:

$(case foo in foo) echo hi ;; esac)

The details are unclear, but I mention it for completeness.

Bash also has an unread operation.
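For completeness, here's what a one-token unread can look like: a minimal Python sketch that wraps a hypothetical token iterator. It's not bash's or OSH's mechanism. The point is that allowing at most one unread keeps the extra state down to a single saved token.

```python
class UnreadOneLexer:
    """Wraps an iterator of tokens; supports unreading at most one token."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)
        self._last = None       # most recently returned token
        self._pending = False   # True if _last should be returned again

    def read(self):
        if self._pending:
            self._pending = False
            return self._last
        self._last = next(self._tokens, ("Eof", ""))
        return self._last

    def unread_one(self):
        # A second unread in a row would need a bigger buffer, so forbid it.
        if self._last is None or self._pending:
            raise RuntimeError("can only unread the single most recent token")
        self._pending = True


lexer = UnreadOneLexer([("Lit_Chars", "case"), ("Lit_Chars", "foo")])
tok = lexer.read()   # ("Lit_Chars", "case")
lexer.unread_one()   # push it back
assert lexer.read() == tok
```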