Summer Blog Backlog: Distributed Systems

2021-07-09 (Last updated 2022-08-03)

Yesterday's post enumerated #blog-topics related to understanding and using shell. Today's post started out as a "grab bag" of other topics, but is now centered around problems with #distributed-systems.

What's the connection between shell and distributed systems?

Abstractly, a shell script invokes Unix processes, and distributed systems are (almost invariably) collections of processes.

Concretely, when you use cloud platforms like AWS or Heroku, you're using command line tools in the shell to build, test, configure, and deploy applications.

I hope Oil can be the foundation for better systems, but there's still a big gap to bridge. This post is focused on criticism of the status quo, since we have to identify problems in order to solve them.

Table of Contents

Kubernetes is Our Generation's Multics

Ken Thompson vs. Kubernetes

Images and Feelings

Kubernetes Is Unproductive

Serverless Is Unproductive

The Cloud is "Fire and Motion"

More #blog-topics

Conclusion

Appendix: Kubernetes vs. HPC Schedulers

Kubernetes is Our Generation's Multics

Let's start this post off with a bold claim: Kubernetes is Multics!

That is, it's a serious, respectable, but overly complex system that will eventually be replaced by something simpler: the Unix of distributed operating systems.

(It's arguable whether Kubernetes deserves to be called a distributed OS, but let's leave that aside for now.)

This is the same claim, phrased differently:

In the future, we'll use a distributed OS designed with regard to the Perlis-Thompson Principle.

Essentially, this means that it will have fewer concepts and be more compositional.

Ken Thompson vs. Kubernetes

Here's some more color on the claim. Recall that the definition of the Perlis-Thompson Principle is derived from this part of the first paper on the Unix shell:

A program is generally exponentially complicated by the number of notions that it invents for itself ... [It] is my belief that you should base systems on a single notion.

I'd say that "Kubernetes is exponentially complicated by notions it invents for itself".

I worked with Google's Borg cluster manager for many years, and Kubernetes is directly inspired by Borg. I may write about those experiences, but let me first invoke the images, tweets, and experience of others.

Images and Feelings

Here's some visual evidence and "feelings" around this problem of excessive complexity in the cloud.

The "Cloud Native" Computing Foundation maintains this diagram of the ecosystem around Kubernetes:

CNCF Cloud Native Interactive Landscape (diagram)

Here are reactions I found, on Twitter:

this is the only diagram i have ever seen that is more complex than the cloud native computing foundation landscape. https://t.co/G6CcTcRCi3
— Nerd Immunity (@monkchips) January 10, 2020

and Hacker News:

This whole image, to me, represents a big problem with software engineering today: https://twitter.com/dankohn1/status/989956137603747840

The industry is full of engineers who are experts in weirdly named "technologies" (which are really just products and libraries) but have no idea how the actual technologies (e.g. TCP/IP, file systems, memory hierarchy etc.) work. I don't know what to think when I meet engineers who know how to setup an ELB on AWS but don't quite understand what a socket is...

I don't think everyone needs to be an expert in low-level programming. However, I do think that there is a phenomenon where engineers overcomplicate systems because they don't understand operating system fundamentals.

To repeat the claim: A distributed OS that follows the Perlis-Thompson Principle would have fewer concepts. It would be easier to use, and easier to build software on. The diagram would be smaller and more intelligible!

I suggest the meme Ken Thompson vs. Kubernetes to remember this. Ken Thompson would have designed something simpler, with lasting value.

Kubernetes Is Unproductive

The last section had some "feelings" about complexity, not a solid argument about it.

The post below gives a lot more detail. Along with the post in the next section, about serverless productivity, it inspired my writing tagged #comments and #software-architecture earlier this year.

MetalLB taught me that it’s not possible to build robust software that integrates with Kubernetes.

GKE SRE taught me that even the foremost Kubernetes experts cannot safely operate Kubernetes at scale.

This is damning because of the author's direct experience.

My initial reaction is that a Unix-y model of contained processes, a content-addressed file system (a cross between git and BitTorrent), and named ports would be simpler. Auth and security are huge problems, and torpedoed my earlier efforts in this direction.

But these aren't fully formed thoughts; the blog post is more specific.
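
To make one ingredient of that reaction concrete, here is a minimal in-memory sketch of a content-addressed store, written in Python. It is an illustration only, not a design from the linked post or from Oil. The names `BlobStore`, `put`, and `get` are hypothetical, and the choice of SHA-256 is an assumption. The one idea it shows is that a blob's name is the hash of its bytes, which is similar in spirit to how git names objects.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store: a blob's key is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data  # identical content always maps to the same key
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Verify on read: anyone holding the key can check the content.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("corrupt blob: " + key)
        return data
```

Because the key is derived from the content, storing the same bytes twice yields a single entry, and a reader can verify whatever it fetched, even from an untrusted peer. That verification property is what git and BitTorrent have in common.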

Serverless Is Unproductive

It's not just Kubernetes that's unproductive. This article describes productivity problems with "serverless" (e.g. AWS Lambda).

Summary:

- Languages, deployment, and overall developer experience have gotten worse as distributed backends have become more powerful. This post had a nice analogy to IBM JCL (job control language), which I think of as a primitive mainframe version of Unix shell.
- Deploying your code to someone else's cloud just to test it is painful and slow. In contrast, an improved Unix shell should help you describe and run distributed systems locally.
  - I mentioned local testing in Comments on Build Systems and CI Services. When using CI services, the insult on top of injury is that your code is often YAML!
  - I'd like the cloud to have the ease of using PHP, but with any language!
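
As a rough sketch of the local-testing point above, here is what "describe a system, then run it locally" could look like in miniature. It is written in Python rather than shell or Oil, and the `SYSTEM` table and `run_locally` helper are hypothetical names invented for illustration, not part of any real tool. Each "service" is just a local process that prints one line and exits.

```python
import subprocess
import sys

# Hypothetical description of a tiny system: service name -> argv.
# Real services would be long-running servers; these print a line and exit.
SYSTEM = {
    "frontend": [sys.executable, "-c", "print('frontend ok')"],
    "backend": [sys.executable, "-c", "print('backend ok')"],
}


def run_locally(system):
    """Start every service as a local process and collect its output."""
    procs = {
        name: subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
        for name, argv in system.items()
    }
    results = {}
    for name, proc in procs.items():
        out, _ = proc.communicate()  # wait for exit and read stdout
        results[name] = (proc.returncode, out.strip())
    return results
```

Calling `run_locally(SYSTEM)` starts both processes on the local machine and returns each one's exit code and output, with no deployment step in between. The point is only that the description of the system is ordinary data that can be run locally as easily as it could be handed to a remote platform.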

The Cloud is "Fire and Motion"

Here's another feeling, this time from a ~19-year-old post by Joel Spolsky. It reminds me of today's cloud.

Fire and Motion (2002)

Think of the history of data access strategies to come out of Microsoft. ODBC, RDO, DAO, ADO, OLEDB, now ADO.NET – All New! Are these technological imperatives? The result of an incompetent design group that needs to reinvent data access every goddamn year? (That’s probably it, actually.) But the end result is just cover fire. The competition has no choice but to spend all their time porting and keeping up, time that they can’t spend writing new features.

Does anyone in 2021 regret not spending more time with OLEDB, much less know what it is?

Likewise, I think Kubernetes will be forgotten in 2041. It will have been eclipsed by simpler systems that follow the Perlis-Thompson Principle.

More wisdom from Joel:

Look closely at the software landscape. The companies that do well are the ones who rely least on big companies and don’t have to spend all their cycles catching up and reimplementing and fixing bugs that crop up only on Windows XP.

I would mentally replace "Windows XP" with "Kubernetes" in this quote, and see if it rings true. In other words, our coding efforts should be directed at the problem domain, not at fixing the underlying platform.

More #blog-topics

Classic Blog Posts by Joel Spolsky. Referencing Joel's post Fire and Motion reminds me that his blog and forum were invaluable early in my career. That post would quote more of Joel's insightful writing.

Rich Hickey's Influence. After this recent comment about dynamic languages and "the world", Qi Xiao of Elvish reminded me that Rich Hickey uses the term "situated programs".

I didn't credit Hickey in that comment, but his ideas definitely influenced me. I want to write another post covering his ideas on language design and systems. I would reference this comment on his talk "Maybe Not".

Oil is very much a language for programs situated in "the world".

Conclusion

I expect to refer to this post in the future. The links and comments are intended to add color on the design of the Oil language and the motivation for the project as a whole.

Let me know whether it was interesting or useful!

Tomorrow, I'll review more posts and comments about #distributed-systems. I expected to write a single Blog Backlog post, but it turned into three!

If you haven't read it already, check out yesterday's post on understanding and using shell.

Appendix: Kubernetes vs. HPC Schedulers

7/20 Update: I'm saving this thread for later reference. In it, I ask why one would use Kubernetes for a deployment that seems more like an HPC problem. People with HPC experience chime in on the issues in that domain.