Backlog: Language FAQs

2021-12-20

This post has #comments and #zulip-links about the Oil and OSH languages. It's the fourth post in the Winter Backlog series, which I hope will maintain continuity while I focus on expanding the project with outside help.

(The first three were Recent Progress, Explaining the Project, and Rough Progress Assessments.)

I'd like to answer these questions more authoritatively, but the "narrative style" with links has proven effective. And most topics came from reader questions anyway. Leave a comment if anything is unclear!

Table of Contents

FAQs

What's the Difference Between OSH and Oil?

What Shell Options Should I Use For a New Script?

Do I Need Python to Use Oil?

Why Are There Two Different Tarballs?

The Influence of Shell on Oil

Design of Sigil Pairs (lobste.rs)

Shell / bash Syntax We're Keeping

Redirects to Memorize

Conclusion

Appendix: Questions From Nix Contributors

Can Oil be treated as a "restricted bash"?

Idea: Venn Diagrams for sh, bash, OSH, Oil

Is Oil production-ready?

I’m still not sure where Python comes into play.

Does the tarball contain pre-built binaries?

We want to use Nix to build Oil from the git repo, not the tarball.

Will we make a mess and end up with a hybrid half-and-half situation?

How much code is it?

FAQs

What's the Difference Between OSH and Oil?

This is a common question, which I was asked on Zulip:

#oil-discuss > OSH and Oil

and in the long Nix RFC thread. So I created this wiki page:

Wiki: OSH versus Oil

A short answer is that OSH is compatible shell-like stuff and Oil is new, Python- and Ruby-like stuff, and there isn't a sharp line between them.

Let me know if anything is unclear, and I'll update the wiki page. Also see other posts tagged #FAQ.

What Shell Options Should I Use For a New Script?

I was asked what option groups are recommended for a new OSH script:

#oil-discuss > Strict shopts in Oil vs. OSH

Even though this doc has a pink warning for "under construction", it has a good answer:

https://www.oilshell.org/release/0.9.5/doc/options.html#what-every-user-should-know-2-minutes

But I gave an even more concise answer on Zulip:

strict:all. Use this if you want to run the same script under multiple shells, like bash and OSH.
oil:basic. Use this if you're upgrading an existing script and dropping compatibility with other shells. You can use new Oil features, but you won't have to change your existing code too much.
oil:all. Use this for a brand new program.
- This is equivalent to running bin/oil rather than bin/osh.
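For the first option, a script that should still run under bash needs the `shopt` call to tolerate shells that don't know the option group. This is a minimal sketch, assuming OSH accepts the `shopt -s` spelling as well as `shopt --set`. bash has no option named `strict:all`, so there `shopt` fails and the error is discarded.

```bash
#!/usr/bin/env bash
# Turn on OSH's strict:all option group if this shell supports it.
# bash rejects the unknown option name with a nonzero status; the
# error message goes to /dev/null and `|| true` keeps the script going.
shopt -s strict:all 2>/dev/null || true

# An OSH-only script that is upgrading to Oil features would instead use:
#   shopt --set oil:basic

reached_end=yes
echo 'still running'
```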

Do I Need Python to Use Oil?

No! You don't need Python to build or use Oil.

All the source code you need is in the release tarball, which builds with a C++ compiler and make. (Remember that OSH and Oil are in the same tarball and executable.)

Why Are There Two Different Tarballs?

Each page like /release/0.9.5/ has two different tarballs:

oil-$VERSION.tar.gz
oil-native-$VERSION.tar.gz

The first one is the slow "executable spec" -- it reuses parts of the Python interpreter, and contains Python code. But I took great pains to make this invisible. You just run ./configure and make, without Python.

The second is the fast interpreter in pure C++, but it's not ready for use yet.
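For the first tarball, the build described above amounts to unpacking it and running `./configure` and `make`. The sketch below assumes the tarball is in the current directory and unpacks into an `oil-$VERSION/` directory; it skips the build if the file isn't there. Nothing in it invokes Python.

```bash
#!/usr/bin/env bash
version=0.9.5                  # assumption: the release you downloaded
tarball="oil-$version.tar.gz"

if [ -f "$tarball" ]; then
  tar -x -z -f "$tarball"
  # Build in a subshell so the caller's working directory is unchanged.
  ( cd "oil-$version" && ./configure && make )
else
  echo "no $tarball here; download it from the release page first" >&2
fi
```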

Unfortunately, we will need to rename these. That is, oil-native should be oil or oilshell. The first tarball could be oil-python, oil-reference, or oil-experiments.

The Influence of Shell on Oil

The three sections below aren't FAQs, but they may help people understand the relationship between OSH and Oil.

This relationship has evolved -- they used to be more like two different "worlds", but now they're more unified. The upgrade path is gradual, not sudden!

Design of Sigil Pairs (lobste.rs)

A user on lobste.rs was confused by Oil's syntax, and my explanation may be worthwhile.

Let's start with these "shell axioms":

```
# variable substitution with $ and ${}
echo $mystr ${mystr}

# command sub with $()
touch "$(my-command)"

# command block with { }
{ echo hi; my-command; } > out.txt
```

(Notice that shell already has some inconsistency between $() and { }.)

I claim that Oil's new "sigil pairs" are natural extensions:

```
echo $[42 + x]    # expression sub with $[]

ls @myarray       # splice array into command with @
                  # Inspired by "${myarray[@]}" and Perl.

ls @(my-command)  # split command sub with @()

# An array of "word" literals with %()
# Note that % doesn't mean hash as it does in Perl.
const x = %(foo bar *.py)
```
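To make the correspondence concrete, here is roughly how these sigil pairs line up with existing bash constructs. This is a bash sketch, not Oil code, and only an approximation: for example, bash's unquoted `$(...)` both splits and globs its output, so don't read it as a precise definition of `@(...)`. The function `my_command` is a hypothetical stand-in for `my-command`, which the post doesn't define.

```bash
#!/usr/bin/env bash
# Approximate bash spellings of the Oil sigil pairs above.

x=5
sum=$(( 42 + x ))                  # like Oil's $[42 + x]
echo "$sum"                        # prints 47

myarray=('a b' 'c')
printf '<%s>\n' "${myarray[@]}"    # like Oil's @myarray: <a b> then <c>

# Hypothetical stand-in for my-command.
my_command() { echo 'one two'; }
split=( $(my_command) )            # like Oil's @(my-command): split into words
printf '<%s>\n' "${split[@]}"      # <one> then <two>

# Like Oil's %(foo bar ...); the *.py from the post is left out so the
# result doesn't depend on what's in the current directory.
words=(foo bar)
```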

More notes:

^(echo hi) is a rare syntax for an unevaluated block.
- It's consistent with $(echo hi) and the unevaluated expression syntax ^[42 + x].
- It's not consistent with { echo hi }, but $(echo hi) already has that problem too!
We may want to add @{x|html} and @[split(x)]. These would be rare, but they're consistent.

We document design warts. Most languages don't!
I wrote A Feel for Oil's Syntax to help myself and users remember the syntax. The next release will show this updated table of sigil pairs.

Shell / bash Syntax We're Keeping

There is some "legacy" shell syntax that I've decided to keep.

#blog-ideas > Shell / bash Syntax We're Keeping (for now)

Remember that I'm trying to cut the scope of the project! And I also noted that the combined OSH + Oil language size should be minimized. That's a principle that's become more important since the early days of the project, when we had two separate worlds.

C-Style strings. The $ prefix annoys me because $myvar also uses it, but they mean different things. But I tried to add c'\n', and it was too complicated and inconsistent.
```
echo $'\n'
```
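The two meanings of the `$` prefix show up in a quick length check. In this bash sketch, `$'\n'` is a single newline character, while the single-quoted `'\n'` stays a literal backslash followed by `n`:

```bash
#!/usr/bin/env bash
c_style=$'\n'     # C-style string: the escape is interpreted, 1 character
literal='\n'      # plain single quotes: backslash and n, 2 characters
echo "${#c_style} ${#literal}"   # prints "1 2"
```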
Redirects. I don't like shell's redirect syntax, but the ugly cases aren't common.
```
echo 'error message' >&2
```
This weird bash syntax for assigning FDs to variables is occasionally useful, and we're also keeping it:
```
myproc {left}< left.txt {right}< right.txt
```
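With the `{name}<` form, bash picks an unused descriptor (10 or higher) and stores its number in the variable. The sketch below uses the same syntax with `exec`, so the descriptor stays open until it's explicitly closed. The file `left.txt` and its contents are made up for the example.

```bash
#!/usr/bin/env bash
printf 'hello from left\n' > left.txt   # example input file

exec {left}< left.txt          # open left.txt; bash stores the fd number in $left
read -r first_line <&"$left"   # read one line through that descriptor
echo "fd $left gave: $first_line"
fd_num=$left                   # remember the number before closing
exec {left}<&-                 # close it; the variable names the fd to close
```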
Process Sub. These unfortunately look like redirects, but they're actually "sigil pairs".
```
diff <(sort left.txt) <(sort right.txt)
```
That is, they're analogous to $(sort left), @(sort left), and ^(sort left)!
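One way to see the "sigil pair" reading is to print what a process substitution expands to. On systems with `/dev/fd`, bash replaces `<(...)` with a filename such as `/dev/fd/63`, and the command simply receives that path as an argument, much as `$(...)` is replaced by a command's output. The exact path varies by system; the input files here are made up for the example.

```bash
#!/usr/bin/env bash
printf 'b\na\n' > left.txt
printf 'a\nc\n' > right.txt

path=$(echo <(true))     # the word <(true) becomes a path like /dev/fd/63
echo "process sub expanded to: $path"

# diff receives two paths and reads the sorted output through them.
# diff exits 1 when its inputs differ, so that isn't treated as a failure.
diff <(sort left.txt) <(sort right.txt) || true
```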

Redirects to Memorize

I came across an insightful Hacker News comment that recommends reading shell redirect syntax as assignments.

This indeed matches what the dup2() system call does! It's like an assignment statement for "pointers" to file structs in the kernel. The programming model is imperative.

But I think it's better to just memorize a few canned patterns.

#blog-ideas > Redirects to Memorize

These patterns cover 99% of cases:

```
echo 'my message' 1>&2   # message to stderr
ls > out.txt             # stdout to a file
sort < in.txt            # stdin from a file
sort < in.txt > out.txt  # both
```

I also use this idiom:

```
mycmd 2>&1 | wc -l       # stdout and stderr to pipe
```

And this one, which is the annoying case where order matters:

```
mycmd >file.txt 2>&1     # stdout and stderr to file
```
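The assignment reading explains why order matters here: redirects apply left to right, `>file` means "fd 1 = file", and `2>&1` means "fd 2 = whatever fd 1 is right now". In this bash sketch, `mycmd` is a hypothetical command that writes one line to each stream.

```bash
#!/usr/bin/env bash
# Hypothetical mycmd: one line to stdout, one line to stderr.
mycmd() {
  echo 'to stdout'
  echo 'to stderr' >&2
}

# fd1 = file.txt, then fd2 = fd1 (file.txt): both lines land in the file.
mycmd >file.txt 2>&1

# fd2 = fd1 (still the pipe of $(...)), then fd1 = other.txt:
# only stdout goes to the file, and stderr is captured by $(...).
captured=$(mycmd 2>&1 >other.txt)
echo "captured: $captured"
```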

That's about it. Remember, Avoid Directly Manipulating File Descriptors in Shell. If you find yourself saving and restoring descriptors, you should be using shell functions instead:

```
myfunc > output
```

That does the same thing. So those patterns are all you need -- really! If you have a counterexample, let me know.
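To illustrate the advice, here are both styles side by side in bash. The first saves stdout in a spare descriptor, redirects it, and restores it by hand; the second wraps the same commands in a function and redirects the call. They write identical files. The function and file names are made up for the example.

```bash
#!/usr/bin/env bash
# Style to avoid: save stdout, redirect it, then restore it by hand.
exec {saved}>&1              # copy stdout to a new fd, number stored in $saved
exec >output1.txt            # point stdout at the file
echo 'line 1'
echo 'line 2'
exec >&"$saved" {saved}>&-   # restore stdout, then close the saved copy

# Preferred style: the redirect applies only for the duration of the call.
myfunc() {
  echo 'line 1'
  echo 'line 2'
}
myfunc >output2.txt
```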

Conclusion

I answered 4 common questions about OSH and Oil, and then summarized 3 comments on the language design.

Let me know if you have questions!

Appendix: Questions From Nix Contributors

Here are short answers to some questions that came up on the Nix RFC thread. I think oil-native is the main blocking issue for Nix, so these answers are not particularly important. But some readers may be curious.

Can Oil be treated as a "restricted bash"?

... in the sense that every Oil script can be executed by bash?

We don't have a mode for that, although it's possible in theory. OSH and Oil are "stricter" than bash, but they also have new functionality that won't run under bash, like Simple Word Evaluation.

Posts tagged #real-problems explain some of these new features.
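As a small illustration of what Simple Word Evaluation changes, here is the bash behavior it replaces: an unquoted `$x` is split into multiple arguments (and each piece is also subject to globbing). The helper `count_args` is hypothetical, just to make the count visible. The assumption is that under Oil's `simple_word_eval` option, the unquoted form passes a single argument, like the quoted one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

x='two words'
unquoted=$(count_args $x)     # bash splits $x on the space: 2 arguments
quoted=$(count_args "$x")     # quoting keeps it whole: 1 argument
echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: 2, quoted: 1"
```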

Idea: Venn Diagrams for sh, bash, OSH, Oil

I think the last question could be better answered by diagrams to explain the relationships. For now, here are some notes.

sh versus bash:

bash augments sh with constructs like arrays, [[ for logical tests (including regexes), and ${x//pattern/replace}.
bash is very POSIX compliant, especially with set -o posix. It's a myth that bash's additional features make it non-compliant! If you want to write a portable script, you should test your script under two shells, like bash and OSH.
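For reference, here are the three bash additions from the first bullet in a short script. None of them is in POSIX sh, so a script that uses them needs bash (or OSH) rather than a strictly POSIX shell like dash. The variable names and values are made up for the example.

```bash
#!/usr/bin/env bash
files=(main.sh 'my notes.txt')   # arrays
echo "${#files[@]} files"        # prints "2 files"

x='foo-bar-baz'
echo "${x//-/_}"                 # ${x//pattern/replace}: prints foo_bar_baz

re='^foo-(.*)$'                  # keeping the regex in a variable avoids quoting surprises
if [[ $x =~ $re ]]; then         # [[ with a regex match
  echo "${BASH_REMATCH[1]}"      # prints bar-baz
fi
```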

bash versus osh:

OSH runs most bash scripts. There is a large common subset that you can port to. It's the most bash-compatible shell, by a mile!
Sometimes you'll need to add quotes or a space. See How OSH Is Designed / Why OSH Isn't Bash.
OSH is stricter and simpler than bash, but also halfway to Oil.
Details:
- Known Differences Between OSH and Other Shells
- Wiki: What Is Expected to Run Under OSH

osh versus oil:

OSH is currently much more mature than Oil.
Oil has readability features like test --dir, and options like shopt --set simple_word_eval.
Oil has many new features like Python-like expressions const myint = min(3, 4) and Ruby-like blocks cd /tmp { echo $PWD }.
There are Five to Seven Essential Features of the Oil Language.
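For comparison with the Oil examples above, here are rough bash spellings. These are approximations, not translations. In particular, the subshell stands in for Oil's `cd` block on the assumption that the block form returns to the original directory afterward, but a subshell also discards any variables set inside it.

```bash
#!/usr/bin/env bash
# Oil: test --dir /tmp. bash spells it with the terse -d flag.
if test -d /tmp; then
  echo '/tmp is a directory'
fi

# Oil: const myint = min(3, 4). bash has no min(), so use arithmetic.
readonly myint=$(( 3 < 4 ? 3 : 4 ))
echo "$myint"                    # prints 3

# Oil: cd /tmp { echo $PWD }. A subshell keeps the cd from leaking out.
start=$PWD
( cd /tmp && echo "$PWD" )
echo "back in $PWD"
```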

Again, the terms "OSH" and "Oil" are fuzzy because they've evolved over time. I used to think of simple word evaluation as an OSH feature, but now it seems to logically belong in Oil.

Is Oil production-ready?

I would call it production-ready when we have the faster oil-native build. However, many people tell me they already use Oil and like it.

Practically speaking, migrating to OSH is the first step. For many bash programs that are thousands of lines, the migration is trivial -- just run it with OSH instead of bash. Try it and let me know what happens! Is it too slow?
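Trying it can be as simple as running the same script with both shells and comparing the output. The helper below is hypothetical (it isn't part of Oil), assumes `osh` is on your PATH if it's installed, and compares only stdout. It skips the comparison when `osh` isn't found.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a script under bash and osh, compare stdout.
run_under_both() {
  local script=$1
  local bash_out osh_out
  bash_out=$(bash "$script")
  if ! command -v osh >/dev/null 2>&1; then
    echo 'osh not found; skipping comparison' >&2
    return 0
  fi
  osh_out=$(osh "$script")
  if [ "$bash_out" = "$osh_out" ]; then
    echo "same output: $script"
  else
    echo "outputs differ: $script"
    return 1
  fi
}

# Usage (the script name is a placeholder):
#   run_under_both ./myscript.sh
```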

Even if you migrate to OSH and not Oil, there are benefits. This post last year mentions:

Reliable Error Handling ("fixed" errexit)
Safe Processing of User-Supplied Data (like filenames)
Eliminate Quoting Hell
Static Parsing Enables Better Error Messages and Tools

This is all implemented and done! It just needs to be faster.

I’m still not sure where Python comes into play.

Remember that Oil doesn't require Python 2 or 3 to build or use, in any form. It's packaged in several distros and none of them require Python to build.

https://repology.org/project/oil-shell/versions

That said,

Tools to generate source code are written in Python. Analogy: bash uses the parser generator yacc, and imagine if yacc were written in Python.
The "executable spec" for the interpreter is written in Python, and it's being translated to C++ with oil-native. As an analogy, it's similar to TeX or PyPy in spirit, but not in the details.

But this doesn't mean you need Python to build or use Oil. More analogies:

When you build bash, you don't actually run yacc. The code generated by yacc is included in the tarball, so you just run the C compiler.
When you build bash or coreutils or Python, you don't run autoconf. Instead, the person who prepared the tarball runs autoconf, and you run an extremely portable shell script (that is more portable than autoconf itself).

The generated C++ code is readable and I debug it directly with GDB, and use normal profiling tools on it. That is all by design. It's more readable than the output of yacc (which is a bunch of parsing tables).

Related: FAQ: Why Is Oil Written in Python?
This answer is adapted from https://github.com/NixOS/rfcs/pull/99#issuecomment-945828400

Does the tarball contain pre-built binaries?

No, it contains both hand-written and generated source code.

oil-0.9.3.tar.gz has generated Python code and a slice of the Python interpreter
oil-native-0.9.3.tar.gz has no Python code at all, only C++.

We want to use Nix to build Oil from the git repo, not the tarball.

Note: The argument below is mostly academic, since it was discovered that in Nix, bash is built from its tarball, not its repo. But it was a long conversation, and this issue came up with Guix as well, so I've copied the answer here.

I can see the appeal of having packages consistently use the Nix build system from the git repo -- for patching, and for Nix maintainers to understand.

That is OK, and I've accepted patches to make this a reality. But a Nix build of Oil will always have to be maintained in parallel with our own shell script build, because a shell is at a lower level than a package manager!

That is, a shell can build and boot an entire Unix system without a package manager. But a package manager can't do that without a shell. (This is related to my interest in the now-defunct Aboriginal Linux early in the project.)

So a shell having a build dependency on a package manager is inverted, which is why I don't take that dependency. How do you build the package manager itself?

Also note that Oil has the same goals as Nix with respect to reproducibility -- the build is very deterministic and automated, but it doesn't use Nix.

Another point: To make it easier to bootstrap Nix, I think you should avoid "bootstrapping" Oil. At the bottom levels of a Unix system, there will always be circular build dependencies. It's just a matter of where you want to "cut it off".

https://github.com/NixOS/rfcs/pull/99#issuecomment-970473965

Will we make a mess and end up with a hybrid half-and-half situation?

If you port an existing program to the large common subset of OSH and bash, this won't happen. You can always run it with bash.

If you start a new program in Oil, this won't happen.

But it is possible if you start porting to Oil but don't finish! I can imagine this happening if not enough people understand shell and Oil. I've found that there are often few knowledgeable maintainers of shell in Linux distros.

Oil is a simpler, cleaner language, but it still takes work to use the features and improve the code.

How much code is it?

Someone got the idea that Oil is 2 million lines of code! This is false.

It's smaller than bash, and well under 100K lines of code any way you count. Search for "metrics" on any release page and look at various line counts:

https://www.oilshell.org/release/0.9.3/

It's designed to be small and comprehensible (as much as a bash compatible shell can be). The core is less than 20K significant lines of code, which is 5-7x smaller than bash:

FAQ: Why Is Oil Written in Python?