On the Wiki: Project Goals and Related Projects

2017-01-19

Yesterday, Ilya Sher, the author of a new Unix shell called NGS, asked me about the motivation for oil.

I thought I had covered it in this blog, but he has a point: I haven't crisply stated the high-level goals for the project anywhere.

The reason for this is that the goals are rather grand, and I'm trying to keep the project concrete rather than speculative. It often feels like the quality of an open source project is inversely related to the amount and intensity of the marketing. A famous example is Linus Torvalds saying that Linux "won't be big and professional like GNU".

So the first half of the blog mostly said: I wrote a very compatible bash parser. That was intentionally a concrete milestone. But it won't hurt to briefly speculate, so I outlined a large set of ambitious goals here:

Project Goals on the GitHub Wiki

That list is large, but it's not even the whole story. The real motivation for oil is that, from 2010 to 2015, I tried to write a distributed operating system / cluster manager based on my experience with Google's Borg. The project had a few users, but ultimately didn't make it very far. Kubernetes has since done better.

But I came away with the belief that a distributed OS should just be a pile of hypothetical "shell scripts". This is not an insane idea: the authors of Dokku apparently had the same idea. (PaaS is a buzzword for a distributed OS.)

All this aside, the immediate goal of the project is still the same:

Automatically convert bash scripts to the new oil language, and
Run them on the new oil interpreter / VM.

I believe that a shell for a distributed OS needs to scale down as well as scale up, so I'm focusing on use cases for a single machine first.

Related Projects

The ExternalResources page lists projects relevant to oil, including NGS and other new shells like oh and elvish.

The awk-like projects are interesting. In addition to scaling down cluster management, I also want "big data" to scale down. At Google, you would see the anti-pattern of using heavy big data tools for small data, just because the data happened to be in a certain place.

Pipelines of MapReduce jobs are not unlike shell scripts. Maybe they can literally be shell scripts. A quick Google search reveals some early efforts like bashreduce.
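
To make the analogy concrete, here is a minimal sketch of a word count written as three composed stages. It only illustrates the shape of the idea: the function names are made up for this example, it runs on one machine, and it has nothing to do with how bashreduce or any real MapReduce system is implemented. The stages line up roughly with a shell pipeline that splits words, runs sort, and then runs uniq -c.

```python
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    # Map stage: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle(pairs):
    # Shuffle stage: bring equal keys next to each other, like `sort`.
    return sorted(pairs, key=itemgetter(0))


def reduce_counts(sorted_pairs):
    # Reduce stage: sum the values for each run of equal keys, like `uniq -c`.
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


def word_count(lines):
    # The stages compose the way `map | sort | reduce` would in a shell.
    return list(reduce_counts(shuffle(map_words(lines))))


print(word_count(["a b a", "b c"]))  # [('a', 2), ('b', 2), ('c', 1)]
```

Each stage only reads a stream and writes a stream, which is the property that makes the shell comparison plausible.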

And newer APIs like Google Dataflow are divided into a control plane and data plane. I think of shell as the control plane of an operating system and the kernel as the data plane.

Popping the Blog TODO Stack

Blogging takes as much time as writing code, so posts have gotten backlogged. Here's the plan for the near future:

(1) The third post in "Shell: The Good Parts" will be called The argv Dispatch Pattern (first post, second post).

I've blogged about many of the bad qualities of the shell, but I also want to acknowledge the good qualities. I wouldn't be working on a compatible shell if I didn't think it had good qualities!

(2) Pop the stack on ASDL. I left off with the post on oheap. I want to write about:

Pretty printing ASTs.
A short update on source code size.
An example of enforcing invariants with ASDL.
Higher level thoughts like: object-oriented vs. functional style in interpreters. Why is the Clang AST so huge? Heterogeneous vs. homogeneous ASTs.

I've also built up themes like parsing and metaprogramming, but I'm not sure how soon I'll tackle them and what the roadmap is like. I'll be able to write about them more intelligently after I've implemented more of oil. For now, here are the loose ends:

(3) The difficulty of Parsing. Parsing isn't a solved problem.

Parsing tools should be context-sensitive or Turing complete. Humans are better at recognizing languages than computers.
Parsing tools should be libraries and not frameworks. ANTLR and yacc are frameworks, and "real" languages don't use them. Instead, they use hand-written parsers or bespoke code generators.
Lexing and parsing should be separate. Lexing is fast but not powerful; parsing is slow but powerful.
re2c and ASDL are helpful but little-known tools. They help with the lexer and AST representation, respectively, forming "bookends" around the parser. re2c is a library and not a framework for writing lexers.
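
As a rough illustration of the lexer / parser split, here is a small sketch for a toy arithmetic language. Everything in it is an assumption made for the example: the grammar is not oil's, Python's re module stands in for a generated lexer like the ones re2c produces, and the AST is plain tuples rather than ASDL types. The lexer is a single regular expression that yields a flat token list; the parser is hand-written recursive descent that only ever looks at tokens.

```python
import re

# Lexer: one regular expression with a named group per token kind.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<OP>[-+*/()]))")


def lex(text):
    # Turn a string into a flat list of (kind, value) tokens ending in EOF.
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    # Recursive descent over tokens; it never touches the raw characters.

    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def next(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def parse(self):
        node = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return node

    def expr(self):
        # expr = term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.next()[1]
            node = (op, node, self.term())
        return node

    def term(self):
        # term = factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.next()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor = NUM | '(' expr ')'
        kind, value = self.next()
        if kind == "NUM":
            return int(value)
        if (kind, value) == ("OP", "("):
            node = self.expr()
            if self.next() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


print(Parser(lex("1 + 2 * (3 - 4)")).parse())
```

Because the parser only sees (kind, value) pairs, the lexer could be swapped for a generated one without touching the grammar code.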

(4) The importance of Metaprogramming.

Does shell need metaprogramming? Yes, autoconf.
Does Make need metaprogramming? Yes: for build variants (debug builds, ASAN, coverage, static tracepoints, etc.) and for portability (CMake, Gyp/GN). See also Bazel and Skylark.
Kinds of metaprogramming in Python: textual, AST, bytecode, reflection.
Code generation vs. metaprogramming. Examples from existing shells, as well as from other interpreters / compilers.
Metaprogramming in oil's implementation.
- ASDL is all type-generic metaprogramming: pretty-printing, binary encoding, C++ code generation
- Lexing with re2c
- core/id_kind.py
Design ideas for Metaprogramming in the oil language itself.
Updates on metaprogramming vs. type checking.
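
To show what "type-generic" means in the ASDL item above, here is a minimal sketch in Python. It is not ASDL's real schema language or API. The SCHEMA dict, the Node base class, and the function names are all hypothetical stand-ins. The point is that node classes are generated from a schema at runtime, and a single pretty-printer handles every generated type by reflecting over its field list instead of having per-type code.

```python
# Hypothetical mini-schema: constructor name -> ordered field names.
SCHEMA = {
    "Word": ["text"],
    "Command": ["name", "args"],
    "Pipeline": ["commands"],
}


class Node:
    # Base class for generated node types; FIELDS is filled in per subclass.
    FIELDS = ()

    def __init__(self, *values):
        if len(values) != len(self.FIELDS):
            raise TypeError(
                f"{type(self).__name__} takes {len(self.FIELDS)} fields")
        for name, value in zip(self.FIELDS, values):
            setattr(self, name, value)


def generate_classes(schema):
    # Build one Node subclass per schema entry at runtime.
    return {name: type(name, (Node,), {"FIELDS": tuple(fields)})
            for name, fields in schema.items()}


def pretty(obj, indent=0):
    # Format any generated node by reflecting over FIELDS.
    pad = "  " * indent
    if isinstance(obj, Node):
        lines = [f"{pad}({type(obj).__name__}"]
        for name in obj.FIELDS:
            lines.append(f"{pad}  {name}:")
            lines.append(pretty(getattr(obj, name), indent + 2))
        lines[-1] += ")"
        return "\n".join(lines)
    if isinstance(obj, list):
        if not obj:
            return f"{pad}[]"
        return "\n".join(pretty(item, indent) for item in obj)
    return f"{pad}{obj!r}"


types = generate_classes(SCHEMA)
Word, Command, Pipeline = types["Word"], types["Command"], types["Pipeline"]
tree = Pipeline([
    Command(Word("ls"), [Word("-l")]),
    Command(Word("wc"), [Word("-l")]),
])
print(pretty(tree))
```

Adding a new node type means adding one line to the schema; the printer needs no changes. A binary encoder or a C++ code generator could walk the same field lists.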

It seems that my posts on the shell language itself are more popular than posts on language design and implementation. I suspect that's partly because of the bigger audience, and partly a naming or "SEO" problem.

Programmers still need to be convinced that shell is an interesting language. Until that happens, they probably won't be interested in this blog. In addition to existing shell users, I want non-shell users to adopt oil.

Please leave comments if there are particular topics you'd like addressed. And let me know if you have any thoughts about the Project Goals.