Warning: Work in progress! Leave feedback on Zulip or GitHub if you'd like this doc to be updated.

Notes on Unicode in Shell

Table of Contents
Philosophy
List of Unicode-Aware Operations in Shell
Implementation Notes

Philosophy

Oil's Unicode support is unlike that of other shells because it's UTF-8-centric.

In other words, it resembles newer languages such as Go, Rust, Julia, and Swift, as opposed to JavaScript and Python (despite its Python heritage). The latter languages use the notion of "multibyte characters".

In particular, Oil doesn't have global variables like LANG for libc or a notion of "default encoding". In my experience, these types of globals cause correctness problems.
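As a rough illustration of why a global "default encoding" can cause correctness problems, the sketch below decodes the same two bytes under two different encodings. It is written in Python purely for demonstration and is not Oil's implementation; the variable names are hypothetical.

```python
# Illustrative sketch only; not taken from Oil's source code.
# The same bytes mean different things depending on which encoding a
# global setting happens to select.

raw = b"\xce\xbc"  # the UTF-8 encoding of U+03BC (Greek small letter mu)

# UTF-8-centric view: always decode as UTF-8, with no global to consult.
as_utf8 = raw.decode("utf-8")

# What a Latin-1 "default encoding" would make of the same bytes.
as_latin1 = raw.decode("latin-1")

utf8_count = len(as_utf8)      # one code point
latin1_count = len(as_latin1)  # two code points: the text is misread
```

Under the UTF-8 interpretation the bytes are a single character; under Latin-1 they become two unrelated characters, so any length or slicing operation silently changes its answer when the global setting changes.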

List of Unicode-Aware Operations in Shell

Where bash respects it:

This is a list of operations that SHOULD be aware of Unicode characters. OSH doesn't implement all of them yet, e.g. the globbing stuff.
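To make "aware of Unicode characters" concrete, here is a minimal sketch of a character-aware length and slice over UTF-8 bytes. It relies on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx. The helper names (`utf8_len`, `utf8_slice`) are hypothetical, the input is assumed to be valid UTF-8, and this is not necessarily how OSH implements these operations.

```python
# Illustrative sketch only; assumes the input is valid UTF-8.

def is_continuation(b: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (b & 0xC0) == 0x80

def utf8_len(data: bytes) -> int:
    """Count code points by counting bytes that start a code point."""
    return sum(1 for b in data if not is_continuation(b))

def utf8_slice(data: bytes, start: int, count: int) -> bytes:
    """Return `count` code points beginning at code point index `start`."""
    # Byte offsets where each code point begins, plus the end offset.
    starts = [i for i, b in enumerate(data) if not is_continuation(b)]
    starts.append(len(data))
    last = len(starts) - 1
    begin = starts[min(start, last)]
    end = starts[min(start + count, last)]
    return data[begin:end]

s = "na\u00efve caf\u00e9".encode("utf-8")  # 10 code points, 12 bytes

byte_length = len(s)
char_length = utf8_len(s)
middle = utf8_slice(s, 2, 3)  # code points 2, 3, and 4
```

A byte-oriented length reports 12 here, while the character-aware length reports 10, and the slice never cuts a multi-byte sequence in half.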

List of operations that depend on the locale (not implemented):

Other:

Implementation Notes

Unlike bash and CPython, Oil doesn't call setlocale(), although GNU readline may call it.

It's expected that your locale will respect UTF-8. This is true on most distros. If not, then some string operations will support UTF-8 and some won't.

For example:


Generated on Sun Jul 18 00:18:36 PDT 2021