
Git Log in HTML: A Harder Problem and A Safe Solution

2017-09-28

This is an update to How to Quickly and Correctly Generate a Git Log in HTML.

Reader comments showed that I oversimplified the problem, so I present a harder, more general problem here.

For the impatient, the solution is like the one in the last post, except:

  1. Use git log's %x00 to insert NUL bytes, because bash can't do this.
  2. For adversarial input, check the number of NUL bytes before escaping.
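The two steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code from either post: the function names (`log_with_nul`, `count_nul`, `check_nul_count`) and the HTML row layout are hypothetical, and it assumes four NUL bytes per commit because two fields are wrapped. Only the `git log` placeholders (`%x00`, `%h`, `%an`, `%B`) are taken from git itself.

```shell
# Hypothetical helper: print one HTML row per commit, with NUL bytes
# around the two fields that still need escaping (author name and message).
log_with_nul() {
  git log -n "$1" \
    --pretty='format:<tr><td>%h</td><td>%x00%an%x00</td><td>%x00%B%x00</td></tr>'
}

# Count the NUL bytes on stdin: delete every other byte, then count the rest.
count_nul() {
  tr -cd '\000' | wc -c | tr -d ' '
}

# Fail loudly if stdin does not contain exactly (commits * NULs per commit)
# NUL bytes.  $1: number of commits, $2: NUL bytes expected per commit.
check_nul_count() {
  expected=$(( $1 * $2 ))
  actual=$(count_nul)
  if [ "$actual" -ne "$expected" ]; then
    echo "expected $expected NUL bytes, got $actual" >&2
    return 1
  fi
}
```

One way to use it, assuming the log is first saved to a temporary file: run `log_with_nul 10 > log.bin`, then `check_nul_count 10 4 < log.bin`, and only pass `log.bin` on to the escaping step if the check succeeds.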

Read on for details and the justification.


Recap of the Argument

The previous article had multiple goals:

  1. To explain a trick for escaping HTML in shell. I said it was useful and practical, but I didn't claim it was safe against adversarial input.
  2. To make a point about programming style: I compared the naive vs. the pedantic solution.
  3. To explore a design space for the Oil shell. In particular, how should we communicate structured data?

Regarding point #2, I think some of the comments missed the forest for the trees, so let me repeat it here:

  1. There is a naive way to solve this problem, a.k.a. the quick-but-unsafe way.

    Someone argued for ignoring escaping, so this is not a strawman.

  2. There is a pedantic way, a.k.a. the correct-but-clunky way.

    This also isn't a strawman. Someone said they implemented something similar in Haskell using libgit bindings, and then later switched to shell because of dependency problems. The main problem that I pointed out was dependencies.

  3. Both styles have drawbacks. I presented a middle-ground solution: quick and correct, for some definition of correct.

    I used the trick of surrounding substrings to be escaped with 0x01 and 0x02 bytes.
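As a reminder of what that trick looks like, here is a small sketch. It is not the code from the previous post: the `escape_marked` function is a hypothetical stand-in written in awk, and it only handles `&`, `<`, and `>`. The markers are produced with `printf` here; in bash, `$'\x01'` and `$'\x02'` would give the same bytes.

```shell
# Hypothetical filter: HTML-escape only the text between a 0x01 byte and
# the next 0x02 byte, and drop the marker bytes themselves.
escape_marked() {
  awk '
  BEGIN { inside = 0 }
  {
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\001") { inside = 1; continue }   # start marker
      if (c == "\002") { inside = 0; continue }   # end marker
      if (inside) {
        if (c == "&") c = "&amp;"
        else if (c == "<") c = "&lt;"
        else if (c == ">") c = "&gt;"
      }
      out = out c
    }
    print out
  }'
}

# The surrounding markup is left alone; only the marked substring is escaped.
start=$(printf '\001')
end=$(printf '\002')
printf '<td>%sFix <b> & friends%s</td>\n' "$start" "$end" | escape_marked
```

The state persists across lines, so a marked region may span a multi-line commit description. The weakness is the one discussed later in this post: nothing stops the input itself from containing a 0x01 or 0x02 byte.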

  4. The middle-ground solution is useful, and I chose to use it in "real life". But it also has drawbacks. How can we do better in the Oil shell? This is still an open question.

A Harder Variant of the Problem

The harder problem includes the full git commit description. And you can imagine adding any of the 20 or more fields that git log supports. Here's an excerpt of example output:

4c353d1 Wed Sep 20 00:44:45 2017 -0700 Andy Chu
Simplify, and add a git-specific solution.
a089617 Tue Sep 19 11:23:33 2017 -0700 Andy Chu
Modify example:

- Use bash $'' because it's not specific to git.  It can be used with
  other tools too.
- 0x01 and 0x02 so as not to confuse the issue of NUL in bash strings

These solutions are the worst part of shell! Think of it from the side of the pedantic owners.

Overall point:

Oil discussion: can be saved for the wiki

TODO: Link to this post from the first post at the top:

The Worst Part of Shell: Pushing the Problem Around

The original problem was that git commit descriptions can have HTML metacharacters like < and >.

In section 4 of the last post, I admitted that I'm just pushing the problem around. Instead of avoiding < and >, now we're avoiding 0x01 and 0x02 bytes. (It's trivial to insert those bytes: git commit -m $'\x01'.)

However, I received 5 or 10 alternative solutions on Reddit, Lobsters, and Hacker News, and all of them pushed the problem around in different ways:

These assumptions might be fine in some situations, but what bugs me is that none of the solutions had error checking. If the assumption is violated, then who knows what the program does?

This is a major reason that shell scripts have a reputation for being hard to debug. Data is not validated, and error handling is an afterthought.

A Secure and Maintainable Solution

That said, in this post I'm going to focus on the simplest way to do something both quick and correct with existing tools.

One philosophy of Oil is not to reinvent the wheel unnecessarily. See my comment about "Make": I'm not just trying to address one pet peeve about shell. I'm trying to replace the entire thing, and that means thoroughly understanding how shell sits in its ecosystem.

This post is a bunch of odds and ends.

My solution doesn't assume any of these things. Moreover, it checks its assumption.

Note that grep -- $'\x00' does NOT work: bash can't pass a NUL byte as an argument. This is something I need to fix in the shell. That is a horrible design choice! But it's also a problem with grep.
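A short demonstration of why this fails, as a sketch under the assumption that the script runs in bash: strings in bash are NUL-terminated C strings, so `$'\x00'` collapses to the empty string, and grep receives an empty pattern that matches every line. The variable names are only for illustration.

```shell
#!/usr/bin/env bash
# In bash, $'\x00' cannot hold a NUL byte; it becomes the empty string.
pattern=$'\x00'
echo "pattern length: ${#pattern}"

# grep therefore gets an empty pattern, which matches every line,
# including lines that contain no NUL byte at all.
matches=$(printf 'no nul here\nnone here either\n' | grep -c -- "$pattern")
echo "lines matched: $matches"

# One alternative that never passes NUL as an argument: let tr do the
# filtering, naming the byte with an octal escape that tr itself interprets.
nul_bytes=$(printf 'a\0b\0' | tr -cd '\000' | wc -c | tr -d ' ')
echo "NUL bytes: $nul_bytes"
```

The `tr` form works because the shell only passes the four characters `\000` to `tr`, which decodes the escape on its own side.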

This was my fault. I oversimplified the problem in the name of having short code snippets.

The real problem was more complicated than people thought. I am not interested in solutions that assume the format of the git hash, or that make assumptions about where spaces and commas can appear.

This is the worst part of shell. It is annoying and error-prone to think about such things.

I don't want to change the format.

Limitations of the solution: only one kind of escaping.

Pedantic Solution

This commenter experienced exactly what I thought.

Here's an interesting result: I got at least 10 shell one-liners in response. If you think that the pedantic solution is right, please post it in a comment or Github gist.

The second task for you: Make sure that your friend can clone the gist and run it. The first thing he or she might say is: "oops this version is wrong".

How Tools Should Integrate with Oil

I was going to talk more about the Oil way. I have enough thoughts on that to fill a blog post.

But unfortunately, Oil is still far away. I need to publish a new roadmap, but it looks like I'm going to work on OSH for a while. For now, I think this discussion can take place on Reddit. I've started a thread here.

Conclusion

The NUL count check is not that satisfying. It reduces the "whipupitude" of the shell. I would like to do something better in Oil.