Comments About Build Systems and CI Services

2021-04-11

Back in January, I wrote four posts tagged #comments. They were intended to introduce some big ideas quickly, in the form of a dialogue. I wanted to talk about distributed systems, #software-architecture, dev tools, and their relation to Unix shell.

The last post I published was Unix Shell: Philosophy, Design, and FAQs, and the last section previewed the rest of the series.

I didn't finish the series, but recent threads on lobste.rs and Hacker News address the same topics from a different angle. In this post, I walk through these more recent comments. They're concerned with:

Build Systems like Make, Ninja, and Google's Bazel; and
CI services like Travis CI, GitHub Actions, and sourcehut.

I'm not sure how coherent this will be, but this angle is more concrete than the previous one.

Table of Contents

A Critique of Modern CI

Travis CI Shutdown

The YAML Problem

Oil As a Better Configuration Language

Language Design: Staged Execution Models

Build Systems

Ninja and OSTree

Make vs. Ninja, and Shell

Two Problems With Bazel (and gg again)

CI Services As a Bridge Between Shell and Distributed Systems

A Critique of Modern CI

This is a fantastic post about the design of modern CI services and the bad engineering practices they encourage.

A summary of my long comment:

CI services have evolved into awkward build systems configured with YAML (echoing the original post)
In my mind, a CI config is a parallel shell script. It coordinates disparate tools that you didn’t write, on multiple machines.
I implemented Oil's continuous build (tentatively called "Toil") in Python and shell, and it runs on both Travis CI and sourcehut. I recently added Ninja for more parallelism, with great results.
I sketch an idea for a multi-cloud CI tool based on this work. It could gradually incorporate the Oil language.
- We may also need something like the bubblewrap tool for rootless containers.
- Even more speculative: using git annex. (Comment on my recent experience with this networked storage system.)

Another point is that it's important to be able to debug the continuous build locally. Specifying build logic in an open source shell language is one natural way to do that.
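To make the "parallel shell script" framing concrete, here is a minimal sketch in Python. It is not Toil's actual code: the task names and the `echo` commands are placeholders, and it assumes the tasks are independent and that `sh` is on the PATH. Each task is an ordinary shell command, so the same runner works on a laptop as on a CI machine.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tasks. Each one is a plain shell command, with no
# vendor-specific CI features involved.
TASKS = {
    'build': 'echo building',
    'unit-tests': 'echo testing',
    'lint': 'echo linting',
}


def run_task(name, command):
    """Run one shell command; return (name, exit status, combined output)."""
    proc = subprocess.run(
        ['sh', '-c', command],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return name, proc.returncode, proc.stdout


def run_all(tasks, max_workers=4):
    """Run independent tasks in parallel; return {name: (status, output)}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_task, n, c) for n, c in tasks.items()]
        results = {}
        for f in futures:
            name, status, output = f.result()
            results[name] = (status, output)
        return results


if __name__ == '__main__':
    for name, (status, output) in sorted(run_all(TASKS).items()):
        print('%-12s status=%d  %s' % (name, status, output.strip()))
```

Because the runner only shells out, debugging a failing task locally amounts to running the same command by hand.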

Travis CI Shutdown

My instinct was to implement our continuous build as a shell script, meaning that we avoid vendor-specific features.

This turned out to be a good choice, because Travis CI was acquired and is in the process of being shut down!

This comment goes into more detail on Toil. It also mentions that gg is a multi-cloud system, which I discuss further below.

The YAML Problem

Not only do we have a problem with YAML, we also have the "shell in YAML" anti-pattern.

I think it’s going to be the year (and decade) of shell scripts written in YAML … Github Actions, Gitlab runners, Kubernetes config, sourcehut, etc. :)

My comment on

Oil As a Better Configuration Language

Oil is very similar to existing purpose-built configuration languages such as Tcl and SDLang.

My comment on

TODO: I've made at least 5 comments about Oil's configuration dialect, and it would be nice to collect them all in one place. But it would be even better to finish implementing it!

Language Design: Staged Execution Models

My comment on the Modern CI critique (on Hacker News this time).

Summary: I want to add the "missing declarative" part to shell. This will let users specify graphs naturally. But we also need metaprogramming, or a staged execution model, akin to:

Make: Guile Scheme/Make metaprogramming + Shell. This is done poorly, but it's the right idea.
Modern C++: Code generators like CMake or GN + Ninja. Textual code generation is the oldest form of metaprogramming, and can work surprisingly well. (Downsides must be mitigated.)
TensorFlow: Python metaprogramming + TensorFlow (a parallel graph language).
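The two stages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any of the systems above: the `run.sh` commands, the task names, and the `(deps, command)` graph format are all made up for the example. Stage 1 is ordinary imperative code that emits a static graph; stage 2 is a separate evaluator that only sees the graph. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def generate_graph(platforms):
    """Stage 1: loops and functions (metaprogramming) emit a static graph.

    Returns {task name: (list of dependency names, command string)}.
    The ./run.sh commands are hypothetical.
    """
    graph = {}
    tests = []
    for p in platforms:
        build, test = 'build-' + p, 'test-' + p
        graph[build] = ([], './run.sh build %s' % p)
        graph[test] = ([build], './run.sh test %s' % p)
        tests.append(test)
    graph['publish'] = (tests, './run.sh publish')
    return graph


def plan(graph):
    """Stage 2: order the tasks so that dependencies come first.

    This is a dry run: it returns (name, command) pairs and runs nothing.
    """
    deps = {name: set(d) for name, (d, _) in graph.items()}
    order = TopologicalSorter(deps).static_order()
    return [(name, graph[name][1]) for name in order]


if __name__ == '__main__':
    for name, command in plan(generate_graph(['linux', 'osx'])):
        print('%-12s %s' % (name, command))
```

The point of the split is that the second stage never sees the loop. It gets a plain graph, which is what makes parallel scheduling and caching tractable.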

Build Systems

This section has various opinions on build systems. It's based on concrete experiences, but your use cases may differ.

Ninja and OSTree

I liked this blog post, so I'm saving it here. It describes a build system for boot images of embedded systems (deployed within a private company). It uses Ninja and OSTree together in a clever way.

Make vs. Ninja, and Shell

I wrote several GNU Makefiles from scratch for Oil. This turned out badly, and I've concluded that I should have used Ninja all along. These two comments go into detail:

My comment on

Ninja naturally invokes shell functions, and it can be generated by a Python script. I'd like to write those shell functions in the Oil language and write a Ninja generator in Oil :-)
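As a rough sketch of that arrangement (not Oil's actual build), the Python below generates the text of a `build.ninja` file whose rules call functions in a shell script. The script name `build.sh`, its `compile` and `link` functions, and the `_build/` output directory are assumptions for illustration. It also assumes file names contain no spaces or other characters that Ninja would need escaped.

```python
import os


def ninja_text(sources, binary='_build/app'):
    """Return build.ninja text that compiles each source, then links.

    Every rule delegates to a (hypothetical) shell function in build.sh,
    so the build logic lives in shell and Ninja only handles the graph.
    """
    lines = [
        'rule compile',
        '  command = ./build.sh compile $in $out',
        '  description = compile $in',
        '',
        'rule link',
        '  command = ./build.sh link $out $in',
        '  description = link $out',
        '',
    ]
    objects = []
    for src in sources:
        obj = '_build/' + os.path.splitext(src)[0] + '.o'
        lines.append('build %s: compile %s' % (obj, src))
        objects.append(obj)
    lines.append('build %s: link %s' % (binary, ' '.join(objects)))
    lines.append('')
    lines.append('default %s' % binary)
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    with open('build.ninja', 'w') as f:
        f.write(ninja_text(['main.c', 'util.c']))
```

For this to work, `build.sh` would define `compile` and `link` as shell functions and end with `"$@"`, so that its first argument selects the function to run.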

Two Problems With Bazel (and gg again)

The CI critique above mentions Bazel several times (because it has a nice DAG model, among other things). After a decade using it (and a small amount of time working on it), here are two problems with it:

It encourages you to rewrite your entire build configuration in its own language (Starlark, previously Python).
- In other words, it assumes the codebase is homogeneous. This is bad because open source code is very heterogeneous. (Shell is the language of heterogeneity!)
It has a very static notion of dependencies. I found it interesting that gg can solve the same problem — compiling huge C++ codebases — but it has a more dynamic notion of dependencies. I hope to learn more about it, and llama as well.
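The static/dynamic distinction can be illustrated generically. The sketch below is not how Bazel or gg works. It only contrasts a dependency list that is written down ahead of time with one that is discovered by reading file contents. The `files` dictionary stands in for the file system, and the regex handles only quoted `#include "..."` lines.

```python
import re

# Matches lines like:  #include "foo.h"   (angle-bracket includes are ignored)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def static_deps(declared, target):
    """Static: dependencies are declared before the build runs."""
    return set(declared[target])


def dynamic_deps(files, target):
    """Dynamic: follow #include lines, discovering dependencies as we go.

    `files` maps a file name to its text. Unknown files are treated as
    empty, and each header is visited once, so include cycles terminate.
    """
    seen = set()
    stack = [target]
    while stack:
        name = stack.pop()
        for header in INCLUDE_RE.findall(files.get(name, '')):
            if header not in seen:
                seen.add(header)
                stack.append(header)
    return seen


if __name__ == '__main__':
    files = {
        'main.c': '#include "a.h"\nint main(void) { return 0; }\n',
        'a.h': '#include "b.h"\n',
        'b.h': '',
    }
    declared = {'main.c': ['a.h']}  # b.h was never declared
    print(sorted(static_deps(declared, 'main.c')))
    print(sorted(dynamic_deps(files, 'main.c')))
```

In this toy case the declared list misses `b.h`, while the scan finds it. The trade-off is that the static list is known up front, whereas the dynamic set is only known after reading the files.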

CI Services As a Bridge Between Shell and Distributed Systems

This post introduced many ideas, and at first I thought they weren't that coherent. But now I see a clear connection, and even an implementation strategy:

CI configuration is naturally a "shell script". That is, it's a loose collection of processes that run in parallel: builds, tests, static analysis, and more.
- CI systems grow large, slow, and more heterogeneous over time.
- Remember, shell is the language of heterogeneity. Framing it as a shell script also makes the process of debugging it locally more obvious.
CI services are surprisingly general distributed systems. They're very much like operating systems — with storage, computation, and users. They're concerned with performance (scheduling, scale), security (authentication, trust), and running heterogeneous software.

Does that make sense? Let me know in the comments.

I'd love some help pursuing these ideas. There's too much to do on the Oil project, and I really should get back to work on the garbage collector (and the engine for an interactive shell).

But these ideas arise out of a concrete need, so they'll have to be addressed eventually. As a teaser, here's another post that shows the close relationship between shell and distributed systems: