Why Sponsor Oils? | blog | oilshell.org

A Garbage-Collected Heap in C++, Shaped Like Typed Python

2022-05-29 (Last updated 2022-05-31)

This post shows C++ snippets from our garbage-collected data structures: Str, List<T>, and Dict<K, V>. I hope it will catch the interest of programmers with experience in compilers and language runtimes.

Recent posts in the series:

  1. Oil Is Being Implemented "Middle Out". Why translate Python to C++?
  2. Brief Descriptions of a Python to C++ Translator. A technical overview for Python experts.

(If you're new to the project, see Why Create a New Unix Shell?.)

I've never worked on a garbage collector before, but I think this code is fun. It's simple, small, and does something useful. One or two people should be able to handle it — it's not a huge industrial project.

I'm sure there are many ways to do better, and I look forward to working on it with experienced engineers! Most people who work on language runtimes don't read this blog, so please share this post. This is a critical part of the project.

If you want to jump straight to the code, see mycpp/gc_heap.h. I started this post by updating the comments in that file.

Table of Contents
Overview
Influences / Prior Art
Using The Heap
Design: Two Graphs Overlaid
Graph of Application Types
Graph of GC Nodes
Important Types
"Ghost" Layout Types
Summary
Details
Operations That Resize
Tuples Aren't Heap Allocated
Fly in the Ointment: VTable Pointer
Can Field Masks Be Computed at Compile Time in C++?
Interaction with ML-Style Data Types
Cheney Choices
What's Left to Do?
Conclusion
Appendix
More FAQs
Links

Overview

Like all Unix shells, Oil is an AST interpreter. I once described it as "the first half of Crafting Interpreters, plus 20 or so syscalls"! It's a thin layer over the kernel.

Unlike all Unix shells, it's written in statically-typed Python. We translate this Python to C++ code that uses garbage-collected data structures, which currently use the Cheney semi-space algorithm.

Goals: Speed up the interpreter, and break the dependency on the CPython runtime.

Influences / Prior Art

Let's recap an appendix of the last post. The two resources I remember being most helpful are:

Other resources:

I'd appreciate more references! This list is biased toward languages I'm familiar with. For example, I've never used Smalltalk; it's also not statically typed, so I don't expect it to be that similar.

(Update: I just stumbled across this Cheney collector by the Google v8 authors: two_space_heap.cc.)

Using The Heap

These data structures are used by both hand-written and generated code. Here's some code generated from state.py Mem::GetAllVars():

Dict<Str*, Str*>* result = nullptr;
StackRoots _roots({&result});

result = Alloc<Dict<Str*, Str*>>();
// ...
result->set(name, str_val->s);

Equivalent Python:

result = {}
result[name] = str_val.s

So gc_heap::Alloc<> is like new, except unreferenced objects are periodically collected. The manual registration of stack roots is OK for our mostly generated code, but can probably be improved for hand-written bindings. (I think I backed out of that experiment, and it might be worth another try.)
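To make the rooting idea concrete, here's a minimal sketch of how an RAII helper like StackRoots might register local pointers so the collector can find (and update) them. This is a hypothetical simplification with an invented global root list, not the real implementation in gc_heap.h:

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

struct Obj;  // forward declaration of the GC object base

// Simplified global root registry; the real heap keeps this state internally.
static std::vector<Obj**> g_roots;

// RAII helper: registers the addresses of local pointers on construction,
// and pops them off the root list when the scope exits.
class StackRoots {
 public:
  explicit StackRoots(std::initializer_list<void*> roots) : n_(roots.size()) {
    for (void* p : roots) {
      g_roots.push_back(reinterpret_cast<Obj**>(p));
    }
  }
  ~StackRoots() {
    for (size_t i = 0; i < n_; ++i) {
      g_roots.pop_back();
    }
  }

 private:
  size_t n_;
};
```

Registering the *address* of each local pointer (Obj**, not Obj*) matters for a moving collector like Cheney: when an object is copied, the collector can write the new location back through the root.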


You can also look at the mycpp-examples benchmark, published with every release. It links to small Python examples and measures the speedup after translation to C++. (Unfortunately it doesn't show the C++ code.)

Here's another view of what we're talking about:

~/git/oilshell/oil$ wc -l mycpp/gc_heap.*
  303 mycpp/gc_heap.cc
 1403 mycpp/gc_heap.h
 1706 total

So it's small! More at Line Counts for Translation.

Direct Links: gc_heap.h and gc_heap.cc

Design: Two Graphs Overlaid

I suppose you can think of any garbage collector as "two graphs overlaid", but I think it's particularly true here. C++ templates and the garbage collector interact in an interesting way.

Graph of Application Types

The data model is simple — it's shaped like statically typed Python!

Graph of GC Nodes

But each of these typed nodes can be viewed as one or more Obj* instances, with an 8-byte header:

#define OBJ_HEADER()    \
  uint8_t heap_tag_;    \
  uint8_t type_tag_;    \
  uint16_t field_mask_; \
  uint32_t obj_len_;
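As a sanity check, those four fields pack into exactly 8 bytes. Here's a sketch using a standalone struct (not the actual macro) to confirm the layout:

```cpp
#include <cstdint>

// The four OBJ_HEADER() fields, written as a struct to check the layout.
struct ObjHeader {
  uint8_t heap_tag_;
  uint8_t type_tag_;
  uint16_t field_mask_;
  uint32_t obj_len_;
};

// Offsets are 0, 1, 2, and 4 -- the fields pack into exactly 8 bytes.
static_assert(sizeof(ObjHeader) == 8, "object header should be 8 bytes");
```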

What are these fields?

Here are the 5 heap tags:

namespace Tag {
const int Forwarded = 1;  // For the Cheney algorithm.
const int Global = 3;     // Neither copy nor scan.
const int Opaque = 5;     // Copy but don't scan.  List<int> and Str
const int FixedSize = 7;  // Fixed size headers: consult field_mask_
const int Scanned = 9;    // Copy AND scan for non-NULL pointers.
}

(The tags are odd numbers because I ran into an issue with vtables, explained below.)


In reading about garbage collectors, one thing I didn't find emphasized is that the GC metadata for Cheney can be smaller than for mark and sweep, especially on 64-bit machines.

For example, CPython has next and prev pointers for its sweep phase. That's 16 bytes alone, compared to our 8 bytes, which also include type info and field masks for static types.

This per-object overhead is important in all language runtimes, but especially so in Oil, because it has many small records in its fine-grained code representation.

(Update: Aidenn0 mentioned the obvious issue that Cheney has 2x space overhead, which is very true. Though I still think this difference is worth mentioning and understanding.)

Important Types

"Ghost" Layout Types

The following types are never instantiated. After examining the heap tag, the garbage collector does a reinterpret_cast<> from Obj* to LayoutX* for convenience.

// for Tag::FixedSize
class LayoutFixed : public Obj {
 public:
  // only entries denoted in field_mask will be valid
  Obj* children_[16];
};

// for Tag::Forwarded
class LayoutForwarded : public Obj {
 public:
  Obj* new_location;
};
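To make the Tag::FixedSize case concrete, here's a hedged sketch of how a collector could walk the pointer slots named by field_mask_ after the cast. CountTracedFields is a hypothetical helper for illustration, not the real scanning loop in gc_heap.cc:

```cpp
#include <cstdint>

struct Obj {
  uint8_t heap_tag_;
  uint8_t type_tag_;
  uint16_t field_mask_;
  uint32_t obj_len_;
};

// "Ghost" layout for Tag::FixedSize, as above.
struct LayoutFixed : Obj {
  Obj* children_[16];
};

// Count the non-NULL pointer fields a collector would trace (sketch).
int CountTracedFields(Obj* obj) {
  auto* fixed = reinterpret_cast<LayoutFixed*>(obj);
  int n = 0;
  for (int i = 0; i < 16; ++i) {
    if (fixed->field_mask_ & (1 << i)) {     // slot i holds a valid Obj*
      if (fixed->children_[i] != nullptr) {  // NULL slots are skipped
        n++;
      }
    }
  }
  return n;
}
```

In the real Cheney loop, each traced slot would be copied to the new semi-space and updated in place, rather than just counted.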

Summary

So GC graph nodes can be either:

Details

Here are some more facets of the design.

Operations That Resize

Tuples Aren't Heap Allocated

Instead, tuples are value types:

They're only used for destructuring function return values. For example:

s, num_chars = single_f.GetRaw()

becomes:

Tuple2<Str*, int> tup2 = single_f->GetRaw();  // value, not pointer
s = tup2.at0();
num_chars = tup2.at1();
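For illustration, here's a minimal sketch of what such a value-type tuple might look like. This is a hypothetical simplification; the real Tuple2 lives in the mycpp runtime:

```cpp
// A tuple returned by value: it lives in the caller's stack frame, so
// the garbage collector never manages it (sketch of the idea only).
template <typename A, typename B>
class Tuple2 {
 public:
  Tuple2(A a, B b) : a_(a), b_(b) {}
  A at0() const { return a_; }
  B at1() const { return b_; }

 private:
  A a_;
  B b_;
};
```

Because the whole struct is copied into the caller's frame, the tuple itself needs no rooting, though pointer members like Str* would still have to be rooted once extracted.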

So "mycpp" is a fairly different language than Python! And again, the generated code is meant to be readable and stepped through in a debugger.

In my experience, the more "exotic" C++ features you use, the more likely it is that the DWARF source location data is confusing or wrong.

In my mind, the generated C++ code is about the performance model, and the Python source is about the application logic.

Fly in the Ointment: VTable Pointer

I mentioned above that the heap tags are odd numbers. This was the result of a fun debugging session that reminded me that the compiler reserves a vtable pointer on objects with virtual methods :-)

Vtables are stored at aligned addresses, so if the collector sees an odd number at the start of an Obj* header, it assumes it's a heap tag. If it sees an even number, it advances 4 or 8 bytes past the vtable pointer, and there should be an odd heap tag there.
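Here's a sketch of that check. FindHeapTag is a hypothetical helper, and it assumes a little-endian machine where an aligned vtable pointer's low byte is even:

```cpp
#include <cstdint>

// Locate the heap tag byte, skipping a possible vtable pointer (sketch).
// Assumes little-endian: the first byte of an aligned pointer is even.
uint8_t* FindHeapTag(void* obj) {
  uint8_t* p = reinterpret_cast<uint8_t*>(obj);
  if (*p % 2 == 1) {
    return p;  // odd: this byte is already the heap tag
  }
  // Even: assume a vtable pointer comes first; the header follows it.
  return p + sizeof(void*);
}
```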

TODO: Can we make this portable C++? What about big endian machines? One goal of Oil is to be portable like Lua (few #ifdefs), not like Python (many #ifdefs). I need help with these things.

Can Field Masks Be Computed at Compile Time in C++?

As mentioned, I tend to use a modest subset of C++, similar to old versions of Google's style guide.

But the problem of computing field masks seems like a natural application of constexpr and offsetof() in C++.

Our code works, but I had some problems, and ended up using more macros than expected. I don't quite remember what happened, since it was about a year ago, but it's something to make note of.
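For illustration, here's roughly what a compile-time mask could look like with constexpr and offsetof(). Everything here (Token's fields, maskbit, kHeaderSize) is an invented example under the assumption that each pointer-sized slot after the header maps to one bit of field_mask_; the real code uses macros instead:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical application type; the first four fields mirror OBJ_HEADER().
// Kept standard-layout (no base class) so offsetof() is well-defined.
struct Token {
  uint8_t heap_tag_;
  uint8_t type_tag_;
  uint16_t field_mask_;
  uint32_t obj_len_;

  int line_num;  // not a pointer: contributes no bit
  void* val;     // pointer slot 1
  void* span;    // pointer slot 2
};

constexpr size_t kHeaderSize = 8;  // size of the OBJ_HEADER() fields

// Map a field's byte offset to its bit in field_mask_ (sketch).
constexpr uint16_t maskbit(size_t offset) {
  return 1u << ((offset - kHeaderSize) / sizeof(void*));
}

constexpr uint16_t kTokenMask =
    maskbit(offsetof(Token, val)) | maskbit(offsetof(Token, span));
```

This compiles cleanly only because Token is standard-layout; with base classes, offsetof() becomes conditionally supported, which is exactly the complication discussed below.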

Interaction with ML-Style Data Types

Our Python classes only use single inheritance. But we also use algebraic data types via Zephyr ASDL, and its code generator also generates C++ classes.

An AST is a big and elaborate data structure, and a problem in ML is that you sometimes have trivial variants that wrap the same type, adding indirection. For example, the leaf Token type can be used as a variant of both word_t and expr_t.

These little wrappers would increase the load on the garbage collector. So I implemented variants as types, which translates to "nominal" multiple inheritance in C++:

That is, there's no field inheritance in the application, but the object header is the 4 C++ fields shown. We wouldn't want those fields to be inherited twice. So that's one reason you see macros instead of inheritance in certain places.

Also, offsetof() apparently isn't specified well in the presence of inheritance (?).

Cheney Choices

femtolisp has heuristics for this that I started to copy, but I ran into problems. This needs more thought.

What's Left to Do?

I want this article to be accurate in the future, so I've moved this section to a wiki page Tasks Under NLnet Grant.

The first pass was Compiler Engineer Job, but now I realize there are many subtasks we could use help on!


To get this done, I see two opposite strategies, with gradations in between:

  1. Push this design forward as is. I believe I've figured out almost all the core issues, like stack rooting and static types. The remainder is in need of solid engineering.
  2. Find someone to "own" the whole bottom half of Oil. You can translate Oil to fast native code in any way you want!

Why not do this all myself? I touched on this in Oil is Being Implemented "Middle Out". The project is very big: it's more like implementing three languages than one!

Conclusion

So I think this is a fun and small piece of code, and it should lead us to a production quality shell. For some evidence, see Oil's Parser is 160x to 200x Faster Than It Was 2 Years Ago (2020).

Remember that Oil is the most bash-compatible shell by a mile, and has been for several years! It's also a brand-new language that I've gotten great feedback on.


Please share this post with compiler and language runtime engineers.

And please sponsor Oil if you can! We now have a small grant, but it may not cover all of this specialized work.

Appendix

More FAQs

There was some good discussion on the last post:

I expect these questions to come up, and I've made light comments on them in this series:

The real answer to many of these questions is more like "I haven't had time to look into it". So I wrote this post in hopes of parallelizing the project!

(Although "Shell has different implementation constraints than other languages" is also common!)

I started a wiki page: FAQ: Why Not Write Oil in X?

Links