Bash article. Quote from it? It has to understand a lot. OSH solves this.
Meaning of )
Command language:
Word language:
Arith language:
Ambiguity: recall that $((echo hi)) is not allowed. Disallowing it preserves a single token of lookahead, given modes.
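A minimal sketch of what "modes" buys, assuming the parser tells the lexer which sublanguage it is in on every call. The mode names, token kinds, and rule tables here are hypothetical and are not taken from the OSH source. The point is only that the same ")" character comes back as a different token depending on the mode.

```python
import re
from enum import Enum


class Mode(Enum):
    COMMAND = 1  # command language
    DQ = 2       # inside double quotes, one part of the word language
    ARITH = 3    # arithmetic language


# Hypothetical rule tables: (regex, token kind), tried in order.
_RAW_RULES = {
    Mode.COMMAND: [
        (r"[ \t]+", "WS"),
        (r"\(", "Op_LParen"),
        (r"\)", "Op_RParen"),
        (r"[^\s()]+", "Lit_Chars"),
    ],
    Mode.DQ: [
        (r'"', "Right_DoubleQuote"),
        (r'[^"]+', "Lit_Chars"),  # ')' is just literal text here
    ],
    Mode.ARITH: [
        (r"[ \t]+", "WS"),
        (r"\d+", "Lit_Digits"),
        (r"[-+*/]", "Arith_Op"),
        (r"\(", "Arith_LParen"),
        (r"\)", "Arith_RParen"),
    ],
}

# Compile each table once.
RULES = {
    mode: [(re.compile(pat), kind) for pat, kind in rules]
    for mode, rules in _RAW_RULES.items()
}


def read_token(text, pos, mode):
    """Return (kind, token_text, new_pos) for the single token starting at pos."""
    for pattern, kind in RULES[mode]:
        m = pattern.match(text, pos)
        if m:
            return kind, m.group(), m.end()
    raise ValueError(f"no token at position {pos} in mode {mode.name}")
```

Calling read_token(")", 0, mode) for each mode gives three different token kinds for the same input. Because the lexer reads exactly one token per call, the parser is free to switch modes between any two tokens.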
This is sort of an ad hoc disambiguation mechanism. See "A Simple, Possibly Correct LR Parser for C11" for a good discussion of ad hoc mechanisms and the challenges of integrating them with parser generators. It can be done.
Modes make Python's lexer non-regular?
Hint Stack makes it context-sensitive
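A rough sketch of a hint stack, under the assumption that a hint is a pair (generic token kind, specific token kind) that the parser pushes when it opens a construct and that the lexer pops when it next sees the generic kind. The class, method names, and token kinds are made up for illustration. The stack is what lets the lexer's output depend on arbitrarily deep nesting.

```python
class HintingLexer:
    """Wraps a raw token stream and relabels tokens using a stack of hints."""

    def __init__(self, raw_tokens):
        self._tokens = iter(raw_tokens)
        self._hints = []  # stack of (old_kind, new_kind)

    def push_hint(self, old_kind, new_kind):
        # The parser calls this when it opens a construct whose closer is ambiguous.
        self._hints.append((old_kind, new_kind))

    def read(self):
        kind, text = next(self._tokens)
        # Only the innermost (top) hint is consulted.
        if self._hints and self._hints[-1][0] == kind:
            _, kind = self._hints.pop()
        return kind, text


def relabel(raw_tokens):
    """Toy stand-in for the parser: push a hint for every opener it reads."""
    lexer = HintingLexer(raw_tokens)
    out = []
    for _ in range(len(raw_tokens)):
        kind, text = lexer.read()
        if kind == "Left_CommandSub":
            lexer.push_hint("Op_RParen", "Right_CommandSub")
        elif kind == "Op_LParen":
            lexer.push_hint("Op_RParen", "Right_Subshell")
        out.append((kind, text))
    return out


# Raw tokens for: $( ( echo ) )
RAW = [
    ("Left_CommandSub", "$("),
    ("Op_LParen", "("),
    ("Lit_Chars", "echo"),
    ("Op_RParen", ")"),
    ("Op_RParen", ")"),
]
```

With RAW as input, relabel turns the first ")" into Right_Subshell and the second into Right_CommandSub, even though the raw lexer produced the same Op_RParen for both.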
Review:
regular: regular expressions, not regexes.
context-free: can match parens
context-sensitive: stack, can match indentation
unrestricted: can do anything.
In other words: instead of having potentially infinite tape on which to compute, computation is restricted to the portion of the tape containing the input plus the two tape squares holding the endmarkers.
What's an example of something that's unrestricted but not context-sensitive? I think it's the C type-name / identifier problem, the "lexer hack", because that requires arbitrary storage.
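Two rows of the review above can be made concrete with a small sketch. The function names are made up for illustration. The first shows why "regexes" are not the same thing as regular expressions: a backreference recognizes the copy language ww, which no true regular expression can. The second shows that matching parens needs a counter, that is, memory that grows with the input.

```python
import re

# Backreference: matches any string of the form ww. This language is not
# regular (and not context-free either), yet a Python "regex" handles it.
COPY = re.compile(r"(.+)\1")


def is_copy(s):
    """True if s is some non-empty string repeated exactly twice."""
    return COPY.fullmatch(s) is not None


def parens_balanced(s):
    """True if every '(' in s has a matching ')'. Other characters are ignored."""
    depth = 0  # unbounded counter: this is what a finite automaton lacks
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a closer with no opener
    return depth == 0
```

For example, is_copy("abcabc") is true and is_copy("abcabd") is false, while parens_balanced("(()())") is true and parens_balanced("(()") is false.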
Why does theory matter? Because of the non-determinism
Lexer is HIGHLY STRUCTURED: modes and a hint stack. Maybe could generate it? Well Unread() is a problem. Unread at most one token.
Parser is structured too but kinda messy.
To be comprehensive, I have to Unread().
I think this is because of
$(case foo) echo hi ;; esac))
The details are unclear, but I mention it for completeness.
bash does have an unread.
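For reference, "Unread at most one token" can be sketched as a one-slot pushback buffer in front of the token stream. This is an illustration with hypothetical names, not the OSH or bash implementation. The single slot is the whole point: a second Unread() without an intervening read is an error.

```python
class UnreadableLexer:
    """Token stream with a single slot of pushback."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)
        self._pushed_back = None  # holds at most one token

    def read(self):
        # Serve the pushed-back token first, if there is one.
        if self._pushed_back is not None:
            tok = self._pushed_back
            self._pushed_back = None
            return tok
        return next(self._tokens, "EOF")

    def unread(self, tok):
        if self._pushed_back is not None:
            raise RuntimeError("can only unread one token")
        self._pushed_back = tok


lexer = UnreadableLexer(["case", "foo", ")"])
first = lexer.read()   # consume a token...
lexer.unread(first)    # ...decide it was read too early, and put it back
again = lexer.read()   # the same token comes out again
```

Keeping the buffer to one slot keeps the lexer close to a plain single-token-lookahead design, which is the property the modes were meant to preserve.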