
Retrospective: Software Architecture

2021-12-31

The last post in the Winter Blog Backlog series will be about software architecture.

Why write about this big, fuzzy topic? Because the design of Oil revolves around some controversial architectural decisions, like untyped byte streams as a solid foundation for a modern shell language, and for large distributed systems. A JSON file is a byte stream, and that's a feature, not a bug!

Here are some claims to orient us:

  1. Byte streams are a narrow waist, solving essential interoperability problems.
  2. They admit generic operations, like copying, compression, and encryption -- all of which are used when you send data across the network!
  3. You can build diverse domain-specific abstractions on top of them. This includes different "algebras" of records, tables, and documents. And multiple static type systems.
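To make the second claim concrete, here is a minimal Python sketch, not taken from Oil itself. The `compress` and `checksum` helpers and the sample payloads are assumptions made for illustration. The point is that both operations work on any byte stream without knowing whether it holds a JSON record or a TSV table.

```python
import gzip
import hashlib
import json


def compress(data: bytes) -> bytes:
    """Generic operation: works on any byte stream, whatever it encodes."""
    return gzip.compress(data)


def checksum(data: bytes) -> str:
    """Another generic operation: a SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Two unrelated "domain" formats (hypothetical sample data), both just bytes.
record = json.dumps({"name": "oil", "stars": 1}).encode("utf-8")
table = "name\tstars\noil\t1\n".encode("utf-8")

for payload in (record, table):
    packed = compress(payload)
    # The round trip never inspects the structure inside the payload.
    assert gzip.decompress(packed) == payload
    print(len(payload), checksum(payload)[:8])
```

The structured view (a dict, a table) only reappears when a consumer chooses to parse the bytes again.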

But let's first recap what I wrote about software architecture this year. Does it still make sense?

Table of Contents
Preliminaries
Review of Slogans and Claims
Blog Roadmap
Concrete Examples
Philosophy and Design
Build Systems and CI Services
Understanding and Using Shell
Backlog: Distributed Systems
Review: Distributed Systems
Unix Shell History
Recap: Oil Design Issues
Conclusion

Preliminaries

Here are 3 concepts that I'll use when summarizing the 8 posts below.

  1. O(M × N) code explosion. A system may need bespoke code to fill in every cell of a grid, like M algorithms and N data structures, or M languages and N operating systems.
  2. Perlis-Thompson Principle. Software with fewer concepts composes, scales, and evolves more easily. I distilled this principle from statements by Alan Perlis and Ken Thompson.
  3. Narrow Waist (of an hourglass). A software concept that solves an interoperability problem, avoiding an O(M × N) explosion. All of these are narrow waists:

I just added these concepts to the cross reference, under Software Architecture, so we can continue to refer to them. If you know existing names for them, please leave a comment. I would love to add links to high quality references.
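The arithmetic behind the first and third concepts can be sketched in a few lines of Python. Everything named here (`Job`, `from_pairs`, `from_object`, `to_env_lines`, `to_keys`, `converters_needed`) is hypothetical, and JSON merely stands in for whatever format plays the narrow waist. Each producer and consumer is written once against the shared format, so the amount of code grows like M + N rather than M × N.

```python
import json


class Job:
    """Hypothetical in-memory object that one producer knows how to serialize."""

    def __init__(self, name, cpus):
        self.name = name
        self.cpus = cpus


# Producers: each turns its own in-memory shape into the shared format.
def from_pairs(pairs):
    return json.dumps(dict(pairs))


def from_object(obj):
    return json.dumps(vars(obj))


# Consumers: each only understands the shared format.
def to_env_lines(text):
    return [f"{k}={v}" for k, v in sorted(json.loads(text).items())]


def to_keys(text):
    return sorted(json.loads(text))


def converters_needed(m, n):
    """Return (bespoke pairwise converters, converters via a narrow waist)."""
    return m * n, m + n


# Any producer composes with any consumer without extra glue code.
print(to_env_lines(from_object(Job("build", 4))))
print(to_keys(from_pairs([("b", 1), ("a", 2)])))
print(converters_needed(5, 4))
```

Adding a third producer costs one function, not one function per consumer.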

Review of Slogans and Claims

Let's review eight posts tagged #software-architecture in 2021, and call out important slogans and claims.

Blog Roadmap

Blog Roadmap for January 2021

I emphasized software architecture and distributed systems at the beginning of this year, stating this goal:

Start a conversation, and encourage others to view the shell language as a basis for distributed systems ... A better shell can statically and dynamically describe distributed processes, their configuration, and interconnection.

I didn't write as much as I wanted to, but the blog succeeded in this respect! I had valuable exchanges and debates with domain experts Qi Xiao, Will Hatch, and Tom Van Vleck in response to blog posts this summer.

I may refer to them in the next post. Sometimes a dialogue is the best way of explaining a difficult topic.

 

Concrete Examples

Shell Scripts, Audio, Images, and 3D Graphics (January 2021)

This post gave examples of the Unix-y interchange format style of architecture, in contrast to the typed API style.

JSON, XML, and CSV are common interchange formats in web and "business" programming. But such formats also arise in audio, graphics, and 3D rendering.

They are narrow waists -- you can generate data from any programming language, and then send it to a different process to render. Notably, the web uses this architecture.

A web server is a program that can generate not just text, but also audio, images, and video. A web browser is a renderer for such versionless protocols.

Here's a slogan that may come up in the next post: JSON, HTML, and QTT are more important to Oil than dicts, lists, and integers!

That is, Oil supports Python-like typed data structures in memory, but they can be thought of as "helpers". Idiomatic programs should revolve around serialized data in "versionless" formats.
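As a rough illustration of the interchange format style, here is a Python sketch in which the in-memory rows exist only briefly and the program is organized around serialized JSON lines sent to a separate process. The `CONSUMER` program and the `total_bytes` helper are assumptions invented for this example; the consumer could be written in any language that reads stdin.

```python
import json
import subprocess
import sys

# Hypothetical consumer program. It is Python here only for convenience;
# nothing in the producer depends on the consumer's language.
CONSUMER = r"""
import json, sys
rows = [json.loads(line) for line in sys.stdin]
print(sum(row["bytes"] for row in rows))
"""


def total_bytes(rows):
    """Serialize rows as JSON lines, pipe them to another process, parse its reply."""
    payload = "".join(json.dumps(row) + "\n" for row in rows)
    result = subprocess.run(
        [sys.executable, "-c", CONSUMER],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    print(total_bytes([{"path": "a", "bytes": 10}, {"path": "b", "bytes": 32}]))
```

The typed data structures on either side act as helpers, while the contract between the two processes is just the byte stream.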

 

Philosophy and Design

Unix Shell: Philosophy, Design, and FAQs (January 2021)

I claim that a two-tiered shell design like "first class" PowerShell cmdlets and "second class" external processes creates problems of composition. Several alternative shells share this design, and I'm working hard to avoid it in Oil.

Another way of saying this is: A shell should compose seamlessly with external commands written in diverse languages. That's why shell is uniquely useful and powerful.

However, there are problems with external commands, and I want Oil to address them. For example, the problem of VMs that start slowly can be addressed by a new coprocess mechanism.

This year's work on the headless shell unexpectedly pointed us in the right direction for coprocesses. We're passing file descriptors over Unix domain sockets (FANOS), which means that persistent processes can be "isomorphic" to batch processes. More on this later.
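The underlying kernel mechanism can be sketched with Python's standard library (3.9+, Unix only). This is not the FANOS wire format and not Oil's implementation: `send_descriptor`, `recv_descriptor`, and `demo` are hypothetical names, and both ends live in one process to keep the example short. It only shows that a descriptor sent over a Unix domain socket lets the receiver write to the sender's pipe, much as a batch process writes to the stdout it was given.

```python
import os
import socket


def send_descriptor(sock, fd, message=b"fd"):
    """Send one file descriptor, plus a small payload, over a Unix socket."""
    socket.send_fds(sock, [message], [fd])


def recv_descriptor(sock):
    """Receive the payload and one file descriptor."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return data, fds[0]


def demo():
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        # The "client" hands the write end of its pipe to the "server".
        send_descriptor(client, write_end)
        _msg, received_fd = recv_descriptor(server)
        # The "server" writes to the descriptor it received, the way a
        # batch process would write to its own stdout.
        os.write(received_fd, b"hello from the server\n")
        os.close(received_fd)
        os.close(write_end)
        return os.read(read_end, 1024)
    finally:
        os.close(read_end)
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo())
```

In a real coprocess the server would be a long-lived process, so the startup cost is paid once while each request still gets its own descriptors.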

 

Build Systems and CI Services

Comments About Build Systems and CI Services (April 2021)

The conclusion of this post made these claims:

I still believe this, and made progress in this direction with yesterday's Oil 0.9.6 release:

  1. I created containers for our five continuous build jobs using Docker. I've criticized Docker in the past, but I now have a clearer picture of both its value and its design flaws.
  2. I documented the experience in many posts in the new #containers stream on Zulip.
  3. Our build runs on multiple clouds: Github Actions and sourcehut (and Travis CI before it was shut down). For diversity, I also used Red Hat's podman container runtime, and it works very well.
  4. I still want to make the build more parallel, incremental, and reproducible. Containers already eliminate some redundant work like installing packages and compiling tarballs, but we can do more.

I want to write a post called An Evaluation of Docker's Usability and Design based on this experience. Here's a summary, which I may repeat in the next post:

Slogan: Shell scripts and containers are complementary. At both build time and runtime!

 

Understanding and Using Shell

Summer Blog Backlog: Understanding and Using Shell (July 2021)

This post said that the most important posts to write are Five Years of Oil Milestones and The Perlis-Thompson Principle.

I addressed the first topic in Backlog: Explaining the Project. I framed the project as a series of successes and failures, or "wrong turns".

I haven't written an authoritative post about the Perlis-Thompson Principle, mainly because many commenters already understood what I meant from the sketch in this post. They even started using the term!

I made a Wiki page about it:

https://github.com/oilshell/oil/wiki/Perlis-Thompson-Principle

It isn't well organized, but it has a comprehensive set of links. I'll also note that nobody suggested an existing name for this architectural concept, which is shocking! Please send me references to existing material.

I hope that the Perlis-Thompson principle will sit in the back of your mind when you write software. You should aim to reuse existing concepts in the Unix style, rather than creating your own! Language-oriented, multi-process composition is surprisingly flexible.

(I just added today's discussion on Table-Oriented Programming to the wiki. This is an attempt to orient programs around a single concept: the table! Importantly, it needs better language support.)


This post also enumerates many slogans, but one that I keep in mind regarding Oil is:

Shell is the language of heterogeneity and diversity.

That is, I've encountered many tradeoffs between (1) making shell more convenient for one domain or language and (2) designing general, "polyglot" mechanisms. The design of functions and JSON-like data structures is a good example.

(Aside: I want to develop these ideas in a better medium than the Github wiki. If you've used bidirectionally linked systems like Roam Research or open-source alternatives, let me know!)

 

Backlog: Distributed Systems

Summer Blog Backlog: Distributed Systems (July 2021)

This post has a sharp slogan that drew attention: Kubernetes is Our Generation's Multics. I made several big claims:

 

Review: Distributed Systems

Blog Review: Distributed Systems (July 2021)

Oil has always been a language and project oriented towards systems. That is, it's not a pure language project.

So even though we're reviewing 8 software architecture posts from 2021, I've written about similar things in posts going back to 2017. This post reviewed those older thoughts on #distributed-systems and #software-architecture.

Metaphors:

Note: I now call our multi-cloud CI system "Soil" (progress report above). This post summarized an update from November 2020, when it was called "Toil".

 

Unix Shell History

Unix Shell: History and Trivia (August 2021)

When I started Oil, I hadn't seen Ken Thompson's original paper on Unix shell! I had read The UNIX time-sharing system (1978), which discusses the shell, but this more specific paper wasn't available online until recently.

Thompson strongly advocates simplicity and composability at both the beginning and the end of the paper. He explicitly mentions O(M × N) codebase scaling problems with software, and talks about what Unix leaves out: records and file types.

I've read many critiques of the lack of structure in Unix, and I consider them a misunderstanding. This design was clearly intentional, and I'd argue the reason for its success.

Again, Oil has Python-like data structures, which are records. And HTTP has metadata for file types. But these concepts are still built around the "bones" of an untyped system, which lives at a lower and foundational level.

 

Recap: Oil Design Issues

As always, this writing exercise was useful. It reminded me of two problems with external binaries:

  1. The binaries that a shell script invokes traditionally aren't versioned. I mentioned this important caveat in the Shell Design Philosophy post.

    Solution: You should be able to describe containers with Oil. Docker's DSL is shell-like, but it has well known problems. We need a mix of declarative, functional, and imperative that fits the problem domain more closely. It should be more incremental, parallel, and reproducible.

    Nix is far on the functional end of the spectrum, which doesn't suit many problems (e.g. building R packages).

  2. These days, external binaries are often VMs that start slowly -- the JVM, node.js, CPython with SciPy, etc. Or a literal VM with Xen or QEMU!

    Solution: Oil needs support for a "transparent" or "isomorphic" form of coprocess. (Bash coprocesses don't have this property, and are rarely used.)

Moreover, coprocesses and containers can be combined! I believe the functionality to compose them is present in the Linux kernel, but is missing from the current cloud ecosystem.

Writing this post also reminded me of a central unresolved issue:

  1. We need to design and implement functions on Python-like data structures.

    A few years ago, Python-like func and shell-like proc were "parallel" keywords. But I now think this is naive and will introduce composition problems. Instead, functions could be built on top of these isomorphic entities: units of code defined with proc, external processes, and coprocesses.

    I think we should use QTT to serialize rows of arguments and return values, and functions should be vectorized.
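To give a feel for what "vectorized functions over serialized rows" could mean, here is a speculative Python sketch. It uses plain TSV as a stand-in and does not implement QTT. The `vectorized` decorator and `join_path` function are hypothetical and say nothing about Oil's eventual design. Each input row is one set of arguments, and each output row is one return value.

```python
import csv
import io


def vectorized(func):
    """Lift a per-row function to one mapping a TSV stream of argument rows
    to a TSV stream of results (TSV is a stand-in here, not QTT)."""

    def run(tsv_in: str) -> str:
        reader = csv.reader(io.StringIO(tsv_in), delimiter="\t")
        out = io.StringIO()
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for row in reader:
            # One row of arguments in, one row holding the return value out.
            writer.writerow([func(*row)])
        return out.getvalue()

    return run


@vectorized
def join_path(directory, name):
    return directory.rstrip("/") + "/" + name


if __name__ == "__main__":
    print(join_path("/usr/\tbin\n/etc\tpasswd\n"), end="")
```

Because arguments and results are both byte streams of rows, the same call shape could in principle be served by a proc, an external process, or a coprocess.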

Conclusion

I reviewed past blog posts on #software-architecture and commented on their relevance to Oil. They still make sense to me! If you have questions, let me know.

Now that we've reviewed everything I've written about #software-architecture, the next post will sketch future writing in the same direction.