Retrospective: Software Architecture

2021-12-31

The last post in the Winter Blog Backlog series will be about software architecture.

Why write about this big, fuzzy topic? Because the design of Oil revolves around some controversial architectural decisions, like untyped byte streams as a solid foundation for a modern shell language, and for large distributed systems. A JSON file is a byte stream, and that's a feature, not a bug!

Here are some claims to orient us:

Byte streams are a narrow waist, solving essential interoperability problems.
They admit generic operations, like copying, compression, and encryption -- all of which are used when you send data across the network!
You can build diverse domain-specific abstractions on top of them. This includes different "algebras" of records, tables, and documents. And multiple static type systems.
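
To make these claims concrete, here is a minimal sketch in Python. The record and the helper names (`to_bytes`, `from_bytes`) are invented for illustration. The point is that the copy and compression steps never look inside the payload, while the JSON "view" is layered on top of the same bytes.

```python
import gzip
import json


def to_bytes(record):
    """Serialize a record to a UTF-8 JSON byte stream."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def from_bytes(data):
    """Interpret a byte stream as JSON: one of many possible views."""
    return json.loads(data.decode("utf-8"))


# Hypothetical record, used only for this example.
record = {"name": "oil", "tags": ["shell", "json"], "stars": 3}
payload = to_bytes(record)

# Generic operations: they work on any bytes and know nothing about JSON.
copied = bytes(payload)
compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

# The domain-specific structure survives the generic round trip.
assert copied == payload
assert from_bytes(restored) == record
```

Encryption would slot in the same way as compression: another bytes-to-bytes function that needs no schema.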

But let's first recap what I wrote about software architecture this year. Does it still make sense?

Table of Contents

Preliminaries

Review of Slogans and Claims

Blog Roadmap

Concrete Examples

Philosophy and Design

Build Systems and CI Services

Understanding and Using Shell

Backlog: Distributed Systems

Review: Distributed Systems

Unix Shell History

Recap: Oil Design Issues

Conclusion

Preliminaries

Here are 3 concepts that I'll use when summarizing the 8 posts below.

O(M × N) code explosion. A system may need bespoke code to fill in every cell of a grid, like M algorithms and N data structures, or M languages and N operating systems.
Perlis-Thompson Principle. Software with fewer concepts composes, scales, and evolves more easily. I distilled this principle from statements by Alan Perlis and Ken Thompson.
- Example: Docker flouts the principle by introducing superfluous concepts, which leads to O(M × N) amounts of code.
Narrow Waist (of an hourglass). A software concept that solves an interoperability problem, avoiding an O(M × N) explosion. All of these are narrow waists:
- Interchange formats like JSON
- Networking protocols like HTTP
- Operating system interfaces like Win32 and POSIX
- Instruction set architectures like x86, and arguably WebAssembly
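
A small counting sketch in Python shows why the waist matters. The producer and consumer names are hypothetical, and the functions only enumerate the adapters you would have to write. They don't implement any.

```python
# Hypothetical producers (M) and consumers (N), for illustration only.
producers = ["csv", "xml", "yaml"]
consumers = ["html", "pdf", "png", "svg"]


def direct_adapters(sources, sinks):
    """Without a waist: bespoke code for every (source, sink) cell."""
    return [(s, t) for s in sources for t in sinks]


def waist_adapters(sources, sinks, waist="json"):
    """With a waist: each side only needs one adapter to the waist."""
    return [(s, waist) for s in sources] + [(waist, t) for t in sinks]


print(len(direct_adapters(producers, consumers)))  # 12, i.e. M * N
print(len(waist_adapters(producers, consumers)))   # 7, i.e. M + N
```

Adding a fifth consumer costs three more adapters in the first design, and one in the second.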

I just added these concepts to the cross reference, under Software Architecture, so we can continue to refer to them. If you know existing names for them, please leave a comment. I would love to add links to high quality references.

Review of Slogans and Claims

Let's review eight posts tagged #software-architecture in 2021, and call out important slogans and claims.

Blog Roadmap

Blog Roadmap for January 2021

I emphasized software architecture and distributed systems at the beginning of this year, stating this goal:

Start a conversation, and encourage others to view the shell language as a basis for distributed systems ... A better shell can statically and dynamically describe distributed processes, their configuration, and interconnection.

I didn't write as much as I wanted to, but the blog succeeded in this respect! I had valuable exchanges and debates with domain experts Qi Xiao, Will Hatch, and Tom Van Vleck in response to blog posts this summer.

I may refer to them in the next post. Sometimes a dialogue is the best way of explaining a difficult topic.

Concrete Examples

Shell Scripts, Audio, Images, and 3D Graphics (January 2021)

This post gave examples of the Unix-y interchange format style of architecture, in contrast to the typed API style.

JSON, XML, and CSV are common interchange formats in web and "business" programming. But such formats also arise in audio, graphics, and 3D rendering.

They are narrow waists -- you can generate data from any programming language, and then send it to a different process to render. Notably, the web uses this architecture.
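
Here is a rough sketch of that style in Python. The "renderer" is a hypothetical stand-in: a second Python process that reads JSON on stdin and prints one line per point. It could just as well be written in any other language, because the only thing the two sides share is the byte stream.

```python
import json
import subprocess
import sys

# Hypothetical renderer, run as a separate process. It only sees bytes
# on stdin; it shares no in-memory types with the producer.
RENDERER = r"""
import json, sys
for p in json.load(sys.stdin)["points"]:
    print("%d,%d" % (p["x"], p["y"]))
"""


def render(doc):
    """Send a document across a process boundary as a JSON byte stream."""
    proc = subprocess.run(
        [sys.executable, "-c", RENDERER],
        input=json.dumps(doc).encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return proc.stdout.decode("utf-8")


output = render({"points": [{"x": 1, "y": 2}, {"x": 3, "y": 4}]})
print(output)
```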

A web server is a program that can generate not just text, but also audio, images, and video. A web browser is a renderer for such versionless protocols.

Here's a slogan that may come up in the next post: JSON, HTML, and QTT are more important to Oil than dicts, lists, and integers!

That is, Oil supports Python-like typed data structures in memory, but they can be thought of as "helpers". Idiomatic programs should revolve around serialized data in "versionless" formats.

Philosophy and Design

Unix Shell: Philosophy, Design, and FAQs (January 2021)

I claim that a two-tiered shell design like "first class" PowerShell cmdlets and "second class" external processes creates problems of composition. Several alternative shells share this design, and I'm working hard to avoid it in Oil.

Another way of saying this is: A shell should compose seamlessly with external commands written in diverse languages. That's why shell is uniquely useful and powerful.

However, there are problems with external commands, and I want Oil to address them. For example, the problem of VMs that start slowly can be addressed by a new coprocess mechanism.

This year's work on the headless shell unexpectedly pointed us in the right direction for coprocesses. We're passing file descriptors over Unix domain sockets (FANOS), which means that persistent processes can be "isomorphic" to batch processes. More on this later.
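
The underlying kernel mechanism can be sketched in a few lines of Python. This is not the FANOS protocol itself, only the descriptor-passing primitive it relies on. It assumes a Unix system and Python 3.9 or later, for `socket.send_fds` and `socket.recv_fds`. Both ends live in one process here to keep the example short; in real use they would be a shell and a persistent coprocess.

```python
import os
import socket


def send_fd(sock, fd):
    """Pass an open file descriptor to the peer of a Unix domain socket."""
    # At least one byte of ordinary data must accompany the descriptor.
    socket.send_fds(sock, [b"x"], [fd])


def recv_fd(sock):
    """Receive one file descriptor from the peer."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    return fds[0]


client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
read_end, write_end = os.pipe()

# The "client" hands over the write end of its pipe, much as a batch
# process would inherit stdout from the shell that started it.
send_fd(client, write_end)
received = recv_fd(server)

# The "server" writes to the descriptor it was given.
os.write(received, b"hello from the server side\n")
os.close(received)
os.close(write_end)

message = os.read(read_end, 1024)
print(message)

os.close(read_end)
client.close()
server.close()
```

Because the server writes to whatever descriptor it receives, the caller can redirect its output to a file or a pipe exactly as it would for a freshly started process.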

Slogan: Shells should "shell out"!
Related comment on WebAssembly: Programmers underestimate the degree to which languages and VMs are coupled. I've heard PowerShell described as "an awkward syntax for writing C# programs".

Build Systems and CI Services

Comments About Build Systems and CI Services (April 2021)

I mentioned the "YAML problem" in this post: YAML is the de facto control plane language for the cloud, and we need something better!
- New slogan for the project: Oil doesn't replace Python; it replaces YAML! This isn't literally true because I expect real systems to use Oil and YAML together. But it should be possible to avoid YAML.
I mentioned that we started using Ninja for building and testing mycpp. I still want to remove GNU Make from our repo entirely.
I referred to this post in a recent thread about Tvix, a rewrite of the Nix evaluator.
- Slogan: We need a middle ground between Docker and Nix.

The conclusion of this post made these claims:

CI configuration is naturally a "shell script". That is, it's a loose collection of processes that run in parallel: builds, tests, static analysis, and more.

CI services are surprisingly general distributed systems. They're very much like operating systems — with storage, computation, and users.
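
As a rough illustration of the first claim, here is a sketch in Python of a CI run as a loose collection of processes. The job names and commands are made up, and a real system would add logs, timeouts, and storage. The interface to each job is just an argv array and an exit status.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical jobs: each one is just a command line.
JOBS = {
    "build": [sys.executable, "-c", "print('built')"],
    "test": [sys.executable, "-c", "print('tested')"],
    "lint": [sys.executable, "-c", "import sys; sys.exit(1)"],
}


def run_job(item):
    """Run one job as an external process and report its exit status."""
    name, argv = item
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return name, proc.returncode


def run_all(jobs):
    """Run all jobs in parallel; threads only wait on child processes."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return dict(pool.map(run_job, jobs.items()))


statuses = run_all(JOBS)
print(statuses)
```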

I still believe this, and made progress in this direction with yesterday's Oil 0.9.6 release:

I created containers for our five continuous build jobs using Docker. I've criticized Docker in the past, but I now have a clearer picture of both its value and its design flaws.
I documented the experience in many posts in the new #containers stream on Zulip.
Our build runs on multiple clouds: GitHub Actions and sourcehut (and Travis CI before it was shut down). For diversity, I also used Red Hat's podman container runtime, and it works very well.
I still want to make the build more parallel, incremental, and reproducible. Containers already eliminate some redundant work like installing packages and compiling tarballs, but we can do more.
- This work will influence the Oil language. I want a notion of coarse-grained data dependencies (not fine-grained ones as in Bazel).

I want to write a post called An Evaluation of Docker's Usability and Design based on this experience. Here's a summary, which I may repeat in the next post:

Good: Docker absolutely solves a real problem, and its model of layers is useful. In many ways, it's a nice upgrade from shell scripts.
Bad: Docker has a very anti-Unix design. It has severe "violations" of the Perlis-Thompson Principle.
- Examples: Docker versus git. Docker versus Ninja. A few recent threads claimed that the Unix philosophy isn't a real thing, or that it's already been absorbed into good programming practice. This couldn't be further from the truth! Contrasting Docker's monolithic, code-centric design with the designs of git and Ninja would be a great way to explain this.

Slogan: Shell scripts and containers are complementary. At both build time and runtime!

Understanding and Using Shell

Summer Blog Backlog: Understanding and Using Shell (July 2021)

This post said that the most important posts to write are Five Years of Oil Milestones and The Perlis-Thompson Principle.

I addressed the first topic in Backlog: Explaining the Project. I framed the project as a series of successes and failures, or "wrong turns".

I haven't written an authoritative post about the Perlis-Thompson Principle, mainly because many commenters already understood what I meant from the sketch in this post. They even started using the term!

I made a Wiki page about it:

https://github.com/oilshell/oil/wiki/Perlis-Thompson-Principle

It isn't well organized, but it has a comprehensive set of links. I'll also note that nobody suggested an existing name for this architectural concept, which is shocking! Please send me references to existing material.

I hope that the Perlis-Thompson principle will sit in the back of your mind when you write software. You should aim to reuse existing concepts in the Unix style, rather than creating your own! Language-oriented, multi-process composition is surprisingly flexible.

(I just added today's discussion on Table-Oriented Programming to the wiki. This is an attempt to orient programs around a single concept: the table! Importantly, it needs better language support.)

This post also enumerates many slogans, but one that I keep in mind regarding Oil is:

Shell is the language of heterogeneity and diversity.

That is, I've encountered many tradeoffs between (1) making shell more convenient for one domain or language and (2) designing general, "polyglot" mechanisms. The design of functions and JSON-like data structures is a good example.

(Aside: I want to develop these ideas in a better medium than the GitHub wiki. If you've used bidirectionally linked systems like Roam Research or open-source alternatives, let me know!)

Backlog: Distributed Systems

Summer Blog Backlog: Distributed Systems (July 2021)

This post has a sharp slogan that drew attention: Kubernetes is Our Generation's Multics. I made several big claims:

The abstractions we use to program the cloud don't compose, and there are too many of them. This is bad for productivity, usability, latency, and security.
We have yet to discover a simpler system that respects the Perlis-Thompson Principle. It will be an important narrow waist of computing.
- It will be a graceful evolution of Unix, like the web is. It won't be a chunky layer of leaky abstractions on top.

Review: Distributed Systems

Blog Review: Distributed Systems (July 2021)

Oil has always been a language and project oriented towards systems. That is, it's not a pure language project.

So even though we're reviewing 8 software architecture posts from 2021, I've written about similar things in posts going back to 2017. This post reviewed those older thoughts on #distributed-systems and #software-architecture.

Metaphors:

Oil is a dynamic complement to static "bare metal" code.
- Foreshadowing: I may soften this metaphor, and re-brand around Oil and dumplings!
Plants vs. animals in distributed systems.

Note: I now call our multi-cloud CI system "Soil" (progress report above). This post summarized an update from November 2020, when it was called "Toil".

Unix Shell History

Unix Shell: History and Trivia (August 2021)

When I started Oil, I hadn't seen Ken Thompson's original paper on Unix shell! I had read The UNIX time-sharing system (1978), which discusses the shell, but this more specific paper wasn't available online until recently.

Thompson strongly advocates simplicity and composability at both the beginning and the end of the paper. He explicitly mentions exponential codebase scaling problems with software, and talks about what Unix leaves out: records and file types.

I've read many critiques of the lack of structure in Unix, and I consider them a misunderstanding. This design was clearly intentional and, I'd argue, the reason for its success.

Again, Oil has Python-like data structures, which are records. And HTTP has metadata for file types. But these concepts are still built around the "bones" of an untyped system, which lives at a lower and foundational level.

Recap: Oil Design Issues

As always, this writing exercise was useful. It reminded me of two problems with external binaries:

The binaries that a shell script invokes traditionally aren't versioned. I mentioned this important caveat in the Shell Design Philosophy post.

Solution: You should be able to describe containers with Oil. Docker's DSL is shell-like, but it has well known problems. We need a mix of declarative, functional, and imperative that fits the problem domain more closely. It should be more incremental, parallel, and reproducible.

Nix is far on the functional end of the spectrum, which doesn't suit many problems (e.g. building R packages.)

These days, external binaries are often VMs that start slowly -- the JVM, node.js, CPython with SciPy, etc. Or a literal VM with Xen or QEMU!

Solution: Oil needs support for a "transparent" or "isomorphic" form of coprocess. (Bash coprocesses don't have this property, and are rarely used.)

Moreover, coprocesses and containers can be combined! I believe the functionality to compose them is present in the Linux kernel, but is missing from the current cloud ecosystem.

Writing this post also reminded me of a central unresolved issue:

We need to design and implement functions on Python-like data structures.

A few years ago, Python-like func and shell-like proc were "parallel" keywords. But I now think this is naive and will introduce composition problems. Instead, functions could be built on top of these isomorphic entities: units of code defined with proc, external processes, and coprocesses.

I think we should use QTT to serialize rows of arguments and return values, and functions should be vectorized.

Conclusion

I reviewed past blog posts on #software-architecture and commented on their relevance to Oil. They still make sense to me! If you have questions, let me know.

Now that we've reviewed everything I've written about #software-architecture, the next post will sketch future writing in the same direction.