Blog Review: Distributed Systems

2021-07-19

This post is a continuation of Summer Blog Backlog: Distributed Systems, published 10 days ago. It reviews Oil blog posts on the cloud and #distributed-systems.

These posts sketch arguments, without much detail, but I like to ensure that each post has a central message. The bold claim from the previous post was:

Kubernetes is our generation's Multics. A better design would have fewer concepts and be more composable, following the Perlis-Thompson Principle.

Likewise, this post also has a claim:

A distributed OS can — and should — be made of shell scripts.

If that sounds crazy, read on for details!

Table of Contents

Oil Blog Posts (2017-2021)

January 2017: Project Goals

January 2019: BayLISA Presentation

November 2020: Continuous Integration with "Toil"

April 2021: Build Systems and CI Services

Another Bold Claim

Analogy: Plants vs. Animals

What's Next?

Appendix: More Comments

Ad Hoc Reuse of Github UI

Dynamic Languages, REPLs, and Distributed Systems

Oil Blog Posts (2017-2021)

Here are excerpts from older posts, which I've now tagged #distributed-systems.

January 2017: Project Goals

On an experimental project I worked on before Oil:

I came away with the belief that a distributed OS should just be a pile of hypothetical "shell scripts".

I want to again tip my hat to the Heroku-inspired Dokku. It apparently evolved from literal shell scripts into a very capable project! (I think it manages a single node rather than many, but that's a surprisingly big part of the problem.)

Pipelines of MapReduce jobs are not unlike shell scripts. Maybe they can literally be shell scripts.

Again, there have been efforts along these lines, so it's not hypothetical.

There's also very recent work in this direction in addition to PaSH and POSH. I just read the 2020 paper PB&J: Easy Automation of Data Science/Machine Learning Workflows, mentioned in the HotOS Notes last month.

I really like the code comparisons between their distributed shell utilities (P, PU, B, etc.) and Swift/T, Apache Airflow, Beam, Spark, etc. They also mention the limitation of a single-machine "reduction" vs. the MapReduce framework.

The conclusion mentions that a "JSON shell" would remove some limitations of the framework. Well, that's exactly what Oil is :-)

(Unfortunately, the paper isn't freely available; I e-mailed the authors directly and got a copy. I look forward to the open source release of the code!)
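To make the "MapReduce jobs are not unlike shell scripts" idea concrete, here is a minimal word-count sketch. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are made up for illustration and don't come from PB&J, PaSH, or POSH. It runs on one machine, so it has exactly the single-machine "reduction" limitation mentioned above; `sort` plays the role of the shuffle.

```shell
#!/bin/sh
# Word count as map | shuffle | reduce, all on a single machine.
# All function names here are hypothetical, chosen for illustration.

# map: emit one lowercase word per line, dropping empty lines
map_words() {
  tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d'
}

# shuffle: bring identical keys next to each other
shuffle() {
  LC_ALL=C sort
}

# reduce: count each run of identical keys, print "word<TAB>count"
reduce_counts() {
  uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }'
}

word_count() {
  map_words | shuffle | reduce_counts
}

# Example usage
printf 'the cat and the hat\nThe end\n' | word_count
```

Each stage is an ordinary process reading and writing lines, so any one of them could be swapped for a program in another language without touching the rest.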

January 2019: BayLISA Presentation

I gave a well-received presentation on Oil, but this material was cut for lack of time.

Slogans to Explain the Project:

Shell should be the language for describing the architecture of distributed systems

Future: Oil Language:

A distributed system is a bunch of heterogeneous processes and ports.

I also say that we should solve the Shell-in-{YAML,Docker,systemd,Ruby,Python} problem, which I again mentioned in June's post on the Oil language. I want Oil to solve this problem by adding the missing declarative part to shell.

November 2020: Continuous Integration with "Toil"

I commented on my experience porting Oil's continuous build to multiple cloud providers with a shell program called "Toil". I described this programming style as distributed shell scripting with concretions, not abstractions.

I'm not sure if this is the right slogan, since Rich Hickey uses the term concretion to mean something bad: an inflexible typed wrapper for data like the Java HTTP Request class.

In contrast, I'm using concretion to mean something good: data that's not wrapped at all! Instead, it's expressed in a versionless interchange format like JSON or TSV.
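Here is a rough sketch of what unwrapped data can look like in practice. The task names, the three-column TSV layout, and the function names are all invented for this example; they are not Toil's actual format. The point is only that plain TSV can be consumed by any tool that understands lines and tabs.

```shell
#!/bin/sh
# Hypothetical TSV of CI task results: task name, status, seconds.
# The schema and values are made up for illustration.
tasks_tsv() {
  printf 'task\tstatus\tseconds\n'
  printf 'build\tok\t42\n'
  printf 'unit-tests\tfail\t17\n'
  printf 'lint\tok\t3\n'
}

# Print the names of tasks whose status is not "ok" (skipping the header)
failed_tasks() {
  awk -F '\t' 'NR > 1 && $2 != "ok" { print $1 }'
}

# Sum the seconds column (skipping the header)
total_seconds() {
  awk -F '\t' 'NR > 1 { sum += $3 } END { print sum + 0 }'
}

tasks_tsv | failed_tasks
tasks_tsv | total_seconds
```

There is no class or client library between the producer and the consumers here. The data itself is the interface.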

I'm drafting these ideas on the Slogans, Fallacies, and Concepts wiki page, discussed in a recent post. More possible concept names: Distributed Shell Scripts and Parasitic Shell Scripts. These names are meant to get at the idea that shell can be an independent control plane, while cloud providers are the dumb data plane.

July 2021 Update: I ported Toil to Github Actions, so it now runs on 3 cloud providers! See the Zulip thread for brief comments on the experience.

April 2021: Build Systems and CI Services

Excerpt:

CI services are surprisingly general distributed systems. They're very much like operating systems — with storage, computation, and users. They're concerned with performance (scheduling, scale), security (authentication, trust), and running heterogeneous software.

The end of this post linked to an article relevant to containers. It shows how the new cloud platform fly.io "deconstructs" OCI images (standardized Docker images) with shell and some hacked up Go.

This idea goes back earlier, to 2015:

Here are a couple #comments on container tooling.

I think they just had a fixed Zone that resembled the host Zone. It was a canned set of packages.

... I just tried NearlyFreeSpeech, which uses FreeBSD jails, and it’s kind of similar.

Solaris supported containers long before Linux, and they were used in cloud products like Joyent's Manta. But they weren't as flexible as Linux containers. As usual, Linux proceeds by evolution rather than design, but you can often "rescue" something good from the mess.

A comment on Docker's shortcomings:

There are several criticisms here. A critical one is that Docker's design doesn't follow the Unix philosophy. It's code-centric, but Unix is data-centric. The Open Container Initiative is a step toward making containers data-centric. I still need to understand and use these alternative tools — there are many of them like crun and bubblewrap.

Further down the thread: Building containers from scratch pre-Docker was one of the primary motivations for Oil.

I used to have the same question as you… but then I tried to build containers from scratch, which was one of the primary motivations for Oil.

... the short answer is "Do Linux From Scratch and see how much work it is".

Another Bold Claim

The posts above state that:

- A "PaaS" like Heroku can be a shell script
- Distributed batch jobs like MapReduce can be a shell script
- Docker tooling can be a shell script
- Shell is missing a "declarative part", and Oil will add it

It's not too far from that to another bold claim:

A distributed OS can — and should — be made of shell scripts!

A shell coordinates processes, and a distributed system is literally a bunch of processes running on a bunch of computers (with few exceptions). This is true both at build time and runtime.
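As a small sketch of "a shell coordinates processes": the script below starts several workers in the background, waits for all of them, and reports whether any failed. `worker` and `run_all` are hypothetical names, and `worker` is a local stand-in. In a real system it might be an `ssh` invocation or a server written in any language.

```shell
#!/bin/sh
# worker is a hypothetical stand-in for an arbitrary process.
worker() {
  echo "$1: done"
}

# Start one background process per name, writing each one's output to
# its own log file, then wait for all of them.
# Returns non-zero if any worker failed.
run_all() {
  dir=$1
  shift
  pids=''
  for name in "$@"; do
    worker "$name" > "$dir/$name.log" &
    pids="$pids $!"
  done
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Example usage
logdir=$(mktemp -d)
run_all "$logdir" index-server image-batch
cat "$logdir"/*.log
```

The coordinating script knows nothing about what the workers do internally. It only deals with processes, exit codes, and files, which are the stable kernel-level interfaces.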

On the other hand, I can see why this won't be an appealing slogan:

- To many programmers, "shell script" means something my coworker wrote and I have to suffer with. It sounds unreliable because the language is bad.
- To me, "shell script" means something that's easy to write and test interactively, and that gracefully reuses code in multiple languages (polyglot programming).
  - It's factored into policy and mechanism. As a result, it can be flexibly reused and rewired.
  - It's reliable because it uses stable, low level interfaces (the kernel).

Both of these views are simultaneously true. I hope Oil can change the connotation of what a "shell script" is. I'm looking for a style of software that's drastically simpler.

Analogy: Plants vs. Animals

Here's part of an e-mail I sent to Stephen Kell that adds color on this "distributed OS as shell scripts" idea. It's based on experiences at Google and my experimental "PaaS" / OS project:

One metaphor I use is "plants vs. animals". Animals are your big iron written in C++, like the index servers that serve up posting lists, the batch jobs that process images from maps and satellites, etc.

And then "plants" is everything else -- the dev tools like build and test tools, and the production tools like configuration and monitoring.

Based on my experience, the plants have a lot more bearing on the health of distributed systems than the animals. Not only do they sit at the connection points, but their "biomass" is greater.

The animals are using more "known" software engineering techniques (threads, SIMD, etc.), while the plants are sort of neglected and crappy. One reason they're crappy is that they use crappy languages! Google uses a haphazard mix of Python and neglected custom DSLs for that purpose, sort of like the open source ecosystem uses a neglected mix of shell, make, awk, m4/autoconf, Perl, etc. (i.e. terrible textual metaprogramming).

Recall that one of the slogans I mentioned was Old Unix Sludge vs. New Sludge: make/awk/m4 vs. YAML/Go templates. The experience of porting Toil to Github Actions was another exercise in "YAML programming".

What's Next?

I've sketched some arguments around the shell and #distributed-systems. Let me know if they made sense!

Now I want to get back to comments on #software-architecture, especially The Perlis-Thompson Principle. It has practical implications for the design of both languages and systems.

The appendix has a few more #comments on the cloud. Tackling these problems is still in the future, but the continuous build work is a natural gateway into it.

Appendix: More Comments

Ad Hoc Reuse of Github UI

Shell scripts can reuse entire cloud services! Shell is the language of ad hoc reuse.

Dynamic Languages, REPLs, and Distributed Systems

I mentioned this comment in yesterday's section on Fallacies.

The bigger the distributed system, the more heterogeneous the code [is] ...

It's a fallacy / language design mistake to assume that you "own the world". More likely is that the program written in your language is just a small part of a bigger system.

Tweet I referenced:

Why is it that people get so into linting and type-checking within services, while they're okay letting latent dumpster fires burn across services? 😱🔥
— ⚡️ Jean Yang ⚡️ (@jeanqasaur) May 18, 2021

Shell is a "lowest common denominator" language: it combines programs written in languages with incompatible type systems.