Warning: Work in progress! Leave feedback on Zulip or Github if you'd like this doc to be updated.
Oil's unicode support is unlike that of other shells because it's UTF-8-centric.
In other words, it's like newer languages such as Go, Rust, Julia, and Swift, as opposed to JavaScript and Python (despite its Python heritage). The latter languages use the notion of "multibyte characters".
In particular, Oil doesn't have global variables like LANG for libc or a notion of "default encoding". In my experience, these types of globals cause correctness problems.
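As a rough illustration of why such globals matter, here is a minimal bash sketch (bash, not Oil) in which the same expansion gives different answers depending on the locale. The helper name byte_len is hypothetical, and the string is built from explicit UTF-8 bytes so the example does not depend on the encoding of the script file.

```shell
#!/usr/bin/env bash
# Sketch for bash (not Oil): ${#s} depends on the global locale.

# "héllo" built from explicit UTF-8 bytes: 5 code points, 6 bytes.
s=$(printf 'h\303\251llo')

# byte_len is a hypothetical helper. Its body is a subshell, so the
# LC_ALL assignment stays local to the call.
byte_len() ( LC_ALL=C; echo "${#1}" )

byte_len "$s"    # prints 6: in the C locale bash counts bytes
echo "${#s}"     # 5 under a UTF-8 locale, 6 under the C locale
```

The second result depends on whatever locale the calling environment happens to set, which is the kind of hidden global state described above.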
${#s} -- length in code points
${s:1:2} -- offsets in code points
${x#?} and family (not yet implemented)

Where bash respects it:
This is a list of operations that SHOULD be aware of Unicode characters. OSH doesn't implement all of them yet, e.g. the globbing stuff.
${#s}
${s:0:1}
Glob patterns: ? for a single character, character classes like [[:alpha:]], etc.
case $x in ?) echo 'one char' ;; esac
[[ $x == ? ]]
${s#?} (remove one character)
${s/?/x} (note: this uses our glob to ERE translator for position)
printf '%d' \'c where c is an arbitrary character. This is an obscure syntax for ord(), i.e. getting an integer from an encoded character.

List of operations that depend on the locale (not implemented):
[[ $a < $b ]] -- should use current locale? TODO: compare with sort command.
${s^} and ${s,}
printf also has time formatting, e.g. %(fmt)T in bash.

Other:
wcswidth(), which doesn't just count code points. It calculates the display width of characters, which is different in general.

Unlike bash and CPython, Oil doesn't call setlocale(). (Although GNU readline may call it.)
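To make a few of the operations listed earlier concrete, here is a hedged bash sketch (bash, not Oil) that runs ${s#?} and the printf '%d' \'c syntax under an explicitly chosen locale. The helper names drop_first and ord are hypothetical; each takes a locale name as its first argument. Only the C locale is exercised, because it is the only one guaranteed to exist.

```shell
#!/usr/bin/env bash
# Sketch for bash (not Oil): glob-based expansions under a chosen locale.

# "été" built from explicit UTF-8 bytes: 3 code points, 5 bytes.
s=$(printf '\303\251t\303\251')

# Hypothetical helpers: $1 is a locale name, $2 is the string.
# Each body is a subshell, so the LC_ALL assignment does not leak out.
drop_first() ( LC_ALL=$1; printf '%s' "${2#?}" )      # ${s#?}
ord()        ( LC_ALL=$1; printf '%d\n' "'$2" )       # printf '%d' \'c

# In the C locale, ? matches a single byte, so 4 of the 5 bytes remain.
drop_first C "$s" | wc -c

# For an ASCII character the result is the same in any locale.
ord C A    # prints 65
```

Passing the name of an installed UTF-8 locale instead (en_US.UTF-8 is only an example and may not exist on your system) should make bash operate on code points: drop_first would then leave 3 bytes, and ord would report 233 for é.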
It's expected that your locale will respect UTF-8. This is true on most distros. If not, then some string operations will support UTF-8 and some won't.
For example:
${#s} is implemented in Oil code, not libc, so it will always respect UTF-8.
[[ $s =~ $pat ]] is implemented with libc, so it is affected by the locale settings. Same with Oil's (x ~ pat).
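The libc-dependent half of this can be observed in bash, whose =~ also goes through libc's regex functions. The sketch below is for bash, not Oil, and the helper is_one_char is a hypothetical name. It tests whether a string matches ^.$ under a given locale, which shows that the locale decides whether . means one byte or one character.

```shell
#!/usr/bin/env bash
# Sketch for bash (not Oil): =~ is matched by libc, so the locale decides
# whether '.' means one byte or one character.

s=$(printf '\303\251')   # "é": 1 code point, 2 bytes in UTF-8

# is_one_char is a hypothetical helper: $1 is a locale name, $2 the string.
# The body is a subshell, so the locale change does not leak out.
is_one_char() (
  LC_ALL=$1
  pat='^.$'
  [[ $2 =~ $pat ]]
)

if is_one_char C "$s"; then
  echo 'C locale: one character'
else
  echo 'C locale: not one character'   # two bytes, so ^.$ does not match
fi
```

With the name of an installed UTF-8 locale as the first argument, the same call would be expected to succeed, since libc then decodes the two bytes as a single character.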