blog | oilshell.org
The last post in the Winter Blog Backlog series will be about software architecture.
Why write about this big, fuzzy topic? Because the design of Oil revolves around some controversial architectural decisions, like untyped byte streams as a solid foundation for a modern shell language, and for large distributed systems. A JSON file is a byte stream, and that's a feature, not a bug!
Here are some claims to orient us:
But let's first recap what I wrote about software architecture this year. Does it still make sense?
Here are 3 concepts that I'll use when summarizing the 8 posts below.
I just added these concepts to the cross reference, under Software Architecture, so we can continue to refer to them. If you know existing names for them, please leave a comment. I would love to add links to high quality references.
Let's review eight posts tagged #software-architecture in 2021, and call out important slogans and claims.
I emphasized software architecture and distributed systems at the beginning of this year, stating this goal:
Start a conversation, and encourage others to view the shell language as a basis for distributed systems ... A better shell can statically and dynamically describe distributed processes, their configuration, and interconnection.
I didn't write as much as I wanted to, but the blog succeeded in this respect! I had valuable exchanges and debates with domain experts Qi Xiao, Will Hatch, and Tom Van Vleck in response to blog posts this summer.
I may refer to them in the next post. Sometimes a dialogue is the best way of explaining a difficult topic.
Shell Scripts, Audio, Images, and 3D Graphics (January 2021)
This post gave examples of the Unix-y interchange format style of architecture, in contrast to the typed API style.
JSON, XML, and CSV are common interchange formats in web and "business" programming. But such formats also arise in audio, graphics, and 3D rendering.
They are narrow waists -- you can generate data from any programming language, and then send it to a different process to render. Notably, the web uses this architecture.
A web server is a program that can generate not just text, but also audio, images, and video. A web browser is a renderer for such versionless protocols.
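To make the narrow-waist idea concrete, here is a minimal sketch in Python. It assumes a hypothetical "renderer" that runs as a separate process and only understands JSON bytes on stdin. The names `RENDERER` and `render` and the shape of the document are illustrative and not taken from Oil. The producer could be written in any language, because the only shared contract is the byte stream.

```python
import json
import subprocess
import sys

# Hypothetical renderer: a separate process that reads JSON from stdin
# and draws a text bar chart. It shares no in-memory types with the producer.
RENDERER = r'''
import json, sys
doc = json.load(sys.stdin)
for point in doc["points"]:
    print("%s: %s" % (point["label"], "#" * point["value"]))
'''


def render(doc):
    """Serialize doc to JSON bytes and pipe them to the renderer process."""
    payload = json.dumps(doc).encode("utf-8")  # the byte stream is the interface
    result = subprocess.run(
        [sys.executable, "-c", RENDERER],
        input=payload,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    chart = {"points": [{"label": "a", "value": 3}, {"label": "b", "value": 1}]}
    print(render(chart), end="")
```

Swapping the renderer for a program in another language would not change the producer at all, which is the property the interchange-format style is after.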
Here's a slogan that may come up in the next post: JSON, HTML, and QTT are more important to Oil than dicts, lists, and integers!
That is, Oil supports Python-like typed data structures in memory, but they can be thought of as "helpers". Idiomatic programs should revolve around serialized data in "versionless" formats.
Unix Shell: Philosophy, Design, and FAQs (January 2021)
I claim that a two-tiered shell design like "first class" PowerShell cmdlets and "second class" external processes creates problems of composition. Several alternative shells share this design, and I'm working hard to avoid it in Oil.
Another way of saying this is: A shell should compose seamlessly with external commands written in diverse languages. That's why shell is uniquely useful and powerful.
However, there are problems with external commands, and I want Oil to address them. For example, the problem of VMs that start slowly can be addressed by a new coprocess mechanism.
This year's work on the headless shell unexpectedly pointed us in the right direction for coprocesses. We're passing file descriptors over Unix domain sockets (FANOS), which means that persistent processes can be "isomorphic" to batch processes. More on this later.
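As a rough illustration of the kernel mechanism involved, the sketch below passes a file descriptor over a Unix domain socket using Python 3.9+ (`socket.send_fds` and `socket.recv_fds`). It is not the FANOS wire format, and the function names are made up for this example. Both ends live in one process here for brevity, where a real setup would have a shell on one side and a persistent process on the other.

```python
import os
import socket


def send_output_fd(sock, fd):
    """Send fd as ancillary data; one byte of ordinary data must accompany it."""
    socket.send_fds(sock, [b"x"], [fd])


def recv_fd_and_write(sock, message):
    """Receive a descriptor and write to it, like a batch process writing to stdout."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    out_fd = fds[0]
    os.write(out_fd, message)
    os.close(out_fd)


def demo():
    # "client" stands in for the shell, "server" for the persistent process.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()

    send_output_fd(client, write_end)
    os.close(write_end)  # the in-flight copy keeps the pipe open for the receiver

    recv_fd_and_write(server, b"hello from the persistent process\n")

    data = os.read(read_end, 1024)
    os.close(read_end)
    client.close()
    server.close()
    return data


if __name__ == "__main__":
    print(demo().decode("utf-8"), end="")
```

Because the receiver writes to a descriptor chosen by the sender, its output can land in the same pipe or file that a freshly started batch process would have used.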
Comments About Build Systems and CI Services (April 2021)
The conclusion of this post made these claims:
- CI configuration is naturally a "shell script". That is, it's a loose collection of processes that run in parallel: builds, tests, static analysis, and more.
- CI services are surprisingly general distributed systems. They're very much like operating systems — with storage, computation, and users.
I still believe this, and made progress in this direction with yesterday's Oil 0.9.6 release:
#containers stream on Zulip.
I want to write a post called An Evaluation of Docker's Usability and Design based on this experience. Here's a summary, which I may repeat in the next post:
Slogan: Shell scripts and containers are complementary. At both build time and runtime!
This post said that the most important posts to write are Five Years of Oil Milestones and The Perlis-Thompson Principle.
I addressed the first topic in Backlog: Explaining the Project. I framed the project as a series of successes and failures, or "wrong turns".
I haven't written an authoritative post about the Perlis-Thompson Principle, mainly because many commenters already understood what I meant from the sketch in this post. They even started using the term!
I made a Wiki page about it:
It isn't well organized, but it has a comprehensive set of links. I'll also note that nobody suggested an existing name for this architectural concept, which is shocking! Please send me references to existing material.
I hope that the Perlis-Thompson Principle will sit in the back of your mind when you write software. You should aim to reuse existing concepts in the Unix style, rather than creating your own! Language-oriented, multi-process composition is surprisingly flexible.
(I just added today's discussion on Table-Oriented Programming to the wiki. This is an attempt to orient programs around a single concept: the table! Importantly, it needs better language support.)
This post also enumerates many slogans, but one that I keep in mind regarding Oil is:
Shell is the language of heterogeneity and diversity.
That is, I've encountered many tradeoffs between (1) making shell more convenient for one domain or language and (2) designing general, "polyglot" mechanisms. The design of functions and JSON-like data structures is a good example.
(Aside: I want to develop these ideas in a better medium than the GitHub wiki. If you've used bidirectionally linked systems like Roam Research or open-source alternatives, let me know!)
Summer Blog Backlog: Distributed Systems (July 2021)
This post has a sharp slogan that drew attention: Kubernetes is Our Generation's Multics. I made several big claims:
Blog Review: Distributed Systems (July 2021)
Oil has always been a language and project oriented towards systems. That is, it's not a pure language project.
So even though we're reviewing 8 software architecture posts from 2021, I've written about similar things in posts going back to 2017. This post reviewed those older thoughts on #distributed-systems and #software-architecture.
Note: I now call our multi-cloud CI system "Soil" (progress report above). This post summarized an update from November 2020, when it was called "Toil".
Unix Shell: History and Trivia (August 2021)
When I started Oil, I hadn't seen Ken Thompson's original paper on Unix shell! I had read The UNIX time-sharing system (1978), which discusses the shell, but this more specific paper wasn't available online until recently.
Thompson strongly advocates simplicity and composability at both the beginning and the end of the paper. He explicitly mentions exponential codebase scaling problems with software, and talks about what Unix leaves out: records and file types.
I've read many critiques of the lack of structure in Unix, and I consider them a misunderstanding. This design was clearly intentional and, I'd argue, the reason for its success.
Again, Oil has Python-like data structures, which are records. And HTTP has metadata for file types. But these concepts are still built around the "bones" of an untyped system, which lives at a lower and foundational level.
As always, this writing exercise was useful. It reminded me of two problems with external binaries:
The binaries that a shell script invokes traditionally aren't versioned. I mentioned this important caveat in the Shell Design Philosophy post.
Solution: You should be able to describe containers with Oil. Docker's DSL is shell-like, but it has well known problems. We need a mix of declarative, functional, and imperative that fits the problem domain more closely. It should be more incremental, parallel, and reproducible.
Nix is far on the functional end of the spectrum, which doesn't suit many problems (e.g. building R packages).
These days, external binaries are often VMs that start slowly -- the JVM, node.js, CPython with SciPy, etc. Or a literal VM with Xen or QEMU!
Solution: Oil needs support for a "transparent" or "isomorphic" form of coprocess. (Bash coprocesses don't have this property, and are rarely used.)
Moreover, coprocesses and containers can be combined! I believe the functionality to compose them is present in the Linux kernel, but is missing from the current cloud ecosystem.
Writing this post also reminded me of a central unresolved issue:
We need to design and implement functions on Python-like data structures.
A few years ago, Python-like func and shell-like proc were "parallel" keywords. But I now think this is naive and will introduce composition problems. Instead, functions could be built on top of these isomorphic entities: units of code defined with proc, external processes, and coprocesses.
I think we should use QTT to serialize rows of arguments and return values, and functions should be vectorized.
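Here is one way to picture a vectorized function over serialized rows, as a sketch only. It uses plain tab-separated lines as a stand-in for QTT and ignores quoting and column types entirely, so it does not implement QTT. The names `vectorized` and `area` are hypothetical.

```python
import io


def vectorized(func):
    """Wrap func so it maps over a stream of tab-separated argument rows."""
    def run(in_stream, out_stream):
        for line in in_stream:
            args = line.rstrip("\n").split("\t")  # one row = one call's arguments
            result = func(*args)                  # func returns a tuple of values
            out_stream.write("\t".join(str(x) for x in result) + "\n")
    return run


@vectorized
def area(width, height):
    # Arguments arrive as text, as they would from any serialized row.
    return (int(width) * int(height),)


def demo():
    out = io.StringIO()
    area(io.StringIO("2\t3\n4\t5\n"), out)
    return out.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

Since the arguments and return values are just rows of text on a stream, the same interface could in principle be served by a proc, an external process, or a long-lived process.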
I reviewed past blog posts on #software-architecture and commented on their relevance to Oil. They still make sense to me! If you have questions, let me know.
Now that we've reviewed everything I've written about #software-architecture, the next post will sketch future writing in the same direction.