Oil Language Design Notes #1

2019-08-22

The last post was about stricter behavior for existing shell features. This post is about new features, which I collectively call the Oil language. (Both posts are tagged #zulip-links, since they summarize Zulip threads.)

Casual readers may want to skip this post, since we'll discuss many details that are subject to change. But if you're interested in shaping the design of the Oil language, read on.

The linked threads describe new Oil language features, which you can try now:

$ git clone https://github.com/oilshell/oil.git
$ cd oil
$ build/dev.sh minimal
$ bin/osh

osh$ var x = 1 + 2*3
osh$ echo $x
7

After trying them, please ask questions on Zulip! Many of the threads below were motivated by questions. Feedback also helps me prioritize what parts of the language to work on.

Table of Contents

Design Philosophy

The Oil Language is Now A Dialect Within OSH (thread)

Implementing a Legacy-Free Subset of Oil (thread)

New Language Features

How Oil Solves the !QEFS Problem (thread)

How to Use Arrays in Oil (thread)

Use var to declare variables and setvar to mutate them (thread)

Strict eval builtin (thread)

Other Background

What Happened to the ShellShock-alike bug you found? (thread)

Smoosh Test Results (thread)

What's Next? A Lightweight Process for Language Design

Design Philosophy

Before diving into details, here's an important update on the name "Oil".

The Oil Language is Now A Dialect Within OSH (thread)

I explained in the 2018 FAQ that the Oil project contains two separate languages, OSH and Oil. OSH runs your existing shell scripts, and Oil will be a new language that you can automatically translate them to.

This changed in the last couple of months as I've started to implement Oil. Oil is now an upgrade to OSH. It retains compatibility using three mechanisms:

New keywords like var and func. An entirely new sublanguage can be introduced to the right of a keyword, via lexer modes. I'll try to minimize the number of new keywords, favoring builtins where possible.
New parsing modes like shopt -s parse-at, which allows you to write @array instead of "${array[@]}".
New execution modes like shopt -s simple-word-eval, discussed below.

This change is essential to understand if you're following the project, and the Zulip thread has details. Summary:

I changed the definitions of OSH and Oil in the cross reference.
Why implement the Oil language like this? I make an analogy to a proposed strategy for enhancing PHP.
OSH can also be compared to C++: it's a compatible upgrade, but also a significant redesign. I hope that the ability to define subsets of OSH will rein in the complexity.
The OSH-to-Oil translation didn't end up working the way I wanted it to.

While I had a strong idea of the project's design three years ago, it's changed based on real-world experience.

Implementing a Legacy-Free Subset of Oil (thread)

After implementing several Oil features within OSH, it appears that compatibility requires surprisingly few compromises.

To back this up, it would be nice to talk about what a legacy-free implementation of Oil would look like. I won't have time to work on that since I'll be maintaining everything in the osh binary, but it's a good thought experiment.

A related thing I'd like help with is a set of Python-compatible data structures and garbage collector, e.g. what MicroPython has. That's a large project on its own!

New Language Features

How Oil Solves the !QEFS Problem (thread)

Try typing !qefs in #bash on FreeNode. It responds with advice that has been drilled into every shell programmer's head for decades.

My view is that the default is wrong, and the shell language should be changed. Oil introduces a new mode shopt -s simple-word-eval, which turns on an alternative word evaluation algorithm that disables word splitting, empty elision, and dynamic globbing. In other words, you don't need to use double quotes everywhere.
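For readers who haven't run into the pitfall, here is a minimal sketch of the legacy behavior in plain shell (not Oil). The helper count_args and the variable names are hypothetical, chosen only for illustration. It shows two of the three behaviors named above, word splitting and empty elision; dynamic globbing is left out because its result depends on the files in the current directory.

```bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

file='my file.txt'
empty=''

unquoted=$(count_args $file)    # word splitting: 'my' and 'file.txt'
quoted=$(count_args "$file")    # quoting keeps it as one argument
elided=$(count_args $empty)     # empty elision: the argument vanishes
kept=$(count_args "$empty")     # quoting keeps one empty argument

echo "unquoted=$unquoted quoted=$quoted elided=$elided kept=$kept"
# prints: unquoted=2 quoted=1 elided=0 kept=1
```

As described above, simple-word-eval is meant to make the unquoted forms behave like the quoted ones, so the double quotes stop being mandatory.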

How to Use Arrays in Oil (thread)

Oil programs should use arrays rather than word splitting (which poorly simulates arrays). But this means we need to improve array syntax and semantics, because shell arrays are confusing and awkward.

When shopt -s parse-at is enabled, you can use @flags to splice an array flags into a command, e.g. ls @flags ~/src.
The push builtin appends one or more elements to an array, and replaces awkward shell idioms.
There's a new literal syntax for arrays, var myarray = @(bare words).
This thread contains rewrites of real shell snippets from the Linux kernel. You can see what Oil looks like here!

In the future, we'll also allow splicing arrays returned from functions with echo @arrayfunc(x, y).
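For comparison, this is a sketch of the existing bash idioms that the features above are meant to replace. It is bash, not Oil, and it needs bash rather than a plain POSIX sh because it uses arrays. The helper count_args and the flag values are hypothetical, and treating += as one of the "awkward idioms" that push replaces is my assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

flags=(-l --color=never)
flags+=('--ignore=my file')    # bash append idiom (assumed to be what push replaces)

# The quoted [@] form passes each element as exactly one argument.
spliced=$(count_args "${flags[@]}")

# Forgetting the quotes re-splits the element that contains a space.
resplit=$(count_args ${flags[@]})

echo "spliced=$spliced resplit=$resplit"
# prints: spliced=3 resplit=4
```

With parse-at enabled, the intent described above is that @flags does what "${flags[@]}" does here, without the quoting ceremony.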

Use `var` to declare variables and `setvar` to mutate them (thread)

Oil behaves more like other languages in this regard. It's clear from the syntax whether you're declaring or modifying a local or global variable.

OSH has shell assignment builtins like local and declare for compatibility, but they're deprecated.
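To see why the distinction matters, consider this sketch of the ambiguity in plain shell (not Oil). The function and variable names are hypothetical. A bare assignment looks the same whether it mutates an existing global or silently creates a new one.

```bash
total=0

add() {
  total=$((total + $1))   # mutates the global; nothing in the syntax says so
  totl=0                  # typo: silently creates a brand-new global
}

add 5
echo "total=$total totl=$totl"
# prints: total=5 totl=0
```

With separate keywords for declaration and mutation, a language can in principle tell these two cases apart and reject the typo.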

Strict `eval` builtin (thread)

Based on a help-bash@ thread, I introduced another strict option shopt -s strict-eval-builtin, which changes eval to take exactly one argument.

In POSIX shell, eval and echo both have the pitfall that they join their arguments with a space, which is confusing since it's another step after word splitting! This is another place where the distinction between strings and arrays is blurred.
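The joining step can be observed directly. The following is a minimal sketch in plain shell, with a hypothetical helper count_args. It passes the same two arguments to a function directly, through eval, and through echo.

```bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Called directly, the function sees two arguments: 'a b' and 'c'.
n_direct=$(count_args 'a b' c)

# eval joins its arguments into the string "count_args a b c" and parses
# it again, so the function now sees three arguments.
n_eval=$(eval count_args 'a b' c)

# echo also joins with a space; the boundary between 'a b' and 'c' is lost.
echo_out=$(echo 'a b' c)

echo "direct=$n_direct eval=$n_eval echo='$echo_out'"
# prints: direct=2 eval=3 echo='a b c'
```

Requiring exactly one argument makes the second parse explicit: the caller has to build the code string in one place.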

echo will be fixed in the near future with shopt -s oil-echo.

Other Background

What Happened to the ShellShock-alike bug you found? (thread)

Early this year, I rediscovered a 30-year-old bug in all ksh-derived shells, including bash.

OSH doesn't have this problem. Nonetheless, Oil expressions under var and setvar replace shell arithmetic.
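As an illustration of why shell arithmetic is risky, here is a sketch of one well-known way that arithmetic evaluation in bash can execute code embedded in data. Whether this is exactly the bug discussed in the thread is my assumption, and the variable names are hypothetical. It requires bash, and it only writes to a temporary file.

```bash
#!/usr/bin/env bash
marker=$(mktemp)    # scratch file used to prove that a command ran

# Pretend this string arrived from an untrusted source. It looks like data,
# but it names an array element whose subscript holds a command substitution.
untrusted="a[\$(echo 42 | tee $marker)]"

# Arithmetic evaluates the variable's contents as an expression, which
# expands the subscript and therefore runs the embedded command.
value=$(( untrusted ))

contents=$(cat "$marker")
echo "value=$value marker contains: $contents"
rm -f "$marker"
```

If the marker file ends up containing 42, the embedded command ran even though the script never called eval.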

Smoosh Test Results (thread)

In addition to the Oil language, I'm working on POSIX shell compatibility. The Smoosh project uncovered several bugs, which I've fixed, and it exposed unimplemented features like times, command -V, and export -p.

I could use help with these features!

What's Next? A Lightweight Process for Language Design

Here is how I see things going:

I'll continue to work on draft implementations of Oil language features.
I'll post design notes on Zulip, like the ones linked above. You can try the features from HEAD, using the instructions at the top of this post.
We can discuss alternatives on Zulip. The process of explaining a language feature often leads me to change it.
I'll periodically make #zulip-links blog posts, like this one.
I'll periodically make pre-releases, which will be the last call for feedback and language changes.

I've had the Oil language brewing in my head for over 3 years, so this iterative process should yield a coherent result quickly.

In contrast, I started an rfc/ dir in the Oil repo, but it became apparent that writing the docs takes longer than implementing the features. That process may make sense in the distant future.