This is an update to How to Quickly and Correctly Generate a Git Log in HTML.
Reader comments showed that I oversimplified the problem, so here I present a harder, more general problem.
For the impatient, the solution is like the one in the last post, except:

- Use git log's `%x00` to insert NUL bytes, because bash can't do this.
- Check the NUL byte count before escaping.

Read on for details and the justification.
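Here is a minimal Python sketch of what such an escaping filter could look like. It assumes the git format string wraps each untrusted field in a pair of `%x00` bytes, for example `git log --pretty='format:<tr><td>%h</td><td>%x00%s%x00</td></tr>'`. The function names are hypothetical. The filter splits its input on NUL, HTML-escapes every other chunk (the ones that came from git fields), and refuses to run if the NUL count is odd.

```python
import html
import sys


def escape_nul_delimited(data: bytes) -> bytes:
    """HTML-escape every chunk that sits between a pair of NUL bytes.

    Chunks at even indices come from the literal HTML in the format string;
    chunks at odd indices are untrusted git fields, so those get escaped.
    """
    parts = data.split(b"\x00")
    # The assumption check: an even number of NULs yields an odd number of
    # parts. Anything else means a field was left unterminated, so fail loudly.
    if len(parts) % 2 != 1:
        raise ValueError(f"expected an even number of NUL bytes, got {len(parts) - 1}")
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Decode leniently so an invalid byte in a commit message can't crash us.
            text = part.decode("utf-8", errors="replace")
            out.append(html.escape(text).encode("utf-8"))
        else:
            out.append(part)
    return b"".join(out)


def main(stdin=None, stdout=None) -> int:
    """Filter stdin to stdout; return a nonzero status on malformed input."""
    stdin = sys.stdin.buffer if stdin is None else stdin
    stdout = sys.stdout.buffer if stdout is None else stdout
    try:
        stdout.write(escape_nul_delimited(stdin.read()))
    except ValueError as e:
        print(f"escape: {e}", file=sys.stderr)
        return 1
    return 0
```

You could save this as a script whose entry point is `sys.exit(main())` (say, a hypothetical `escape.py`) and put it at the end of the `git log` pipeline. The fields are delimited by NUL rather than newlines, so multi-line fields such as `%B` (the full commit message) pass through intact.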
The previous article had multiple goals:
Regarding point #2, I think some of the comments missed the forest for the trees, so let me repeat it here:
There is a naive way to solve this problem, a.k.a. the quick-but-unsafe way.
Someone argued for ignoring escaping, so this is not a strawman.
There is a pedantic way, a.k.a. the correct-but-clunky way.
This also isn't a strawman. Someone said they implemented something similar in Haskell using libgit bindings, and then later switched to shell because of dependency problems. The main problem that I pointed out was dependencies.
Both styles have drawbacks. I presented a middle-ground solution: quick and correct, for some definition of correct.
I used the trick of surrounding substrings to be escaped with 0x01 and
0x02 bytes.
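For comparison, here is a rough Python rendering of the 0x01/0x02 trick. It is not the previous post's code, and `escape_marked_fields` is a hypothetical name. It also adds an error check: the markers must strictly alternate 0x01, 0x02, 0x01, 0x02. A stray marker byte in a commit message then makes it fail instead of silently producing garbage.

```python
import html
import re

# One marked field: 0x01, then anything except the markers, then 0x02.
_FIELD = re.compile(rb"\x01([^\x01\x02]*)\x02")


def escape_marked_fields(data: bytes) -> bytes:
    """HTML-escape every substring wrapped in 0x01 ... 0x02."""
    # Check the assumption: markers must strictly alternate, starting with 0x01.
    # A commit message containing its own 0x01 or 0x02 byte violates this.
    markers = re.findall(rb"[\x01\x02]", data)
    expected = [b"\x01", b"\x02"] * (len(markers) // 2)
    if markers != expected:
        raise ValueError("unbalanced 0x01/0x02 markers; input contains stray marker bytes")

    def repl(m):
        # Escape the field's contents and drop the marker bytes themselves.
        text = m.group(1).decode("utf-8", errors="replace")
        return html.escape(text).encode("utf-8")

    return _FIELD.sub(repl, data)
```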
The middle-ground solution is useful, and I chose to use it in "real life". But it also has drawbacks. How can we do better in the Oil shell? This is still an open question.
The harder problem includes the full git commit description. And you can
imagine adding any of the 20 or more fields that git log supports. Here's an
excerpt of example output:
| Hash | Date | Author | Description |
|---|---|---|---|
| 4c353d1 | Wed Sep 20 00:44:45 2017 -0700 | Andy Chu | Simplify, and add a git-specific solution. |
| a089617 | Tue Sep 19 11:23:33 2017 -0700 | Andy Chu | Modify example: - Use bash $'' because it's not specific to git. It can be used with other tools too. - 0x01 and 0x02 so as not to confuse the issue of NUL in bash strings |
These solutions are the worst part of shell! Think of it from the pedantic owners' point of view.
Overall point:
Oil discussion: can be saved for the wiki
TODO: Link to this post from the first post at the top:
The original problem was that git commit descriptions can have HTML
metacharacters like < and >.
In section 4 of the last post, I admitted that I'm just
pushing the problem around. Instead of avoiding < and >, now we're
avoiding 0x01 and 0x02 bytes. (It's trivial to insert those bytes: `git commit -m $'\x01'`.)
However, I received 5 or 10 alternative solutions on Reddit, Lobsters, and Hacker News, and all of them pushed the problem around in different ways:
These assumptions might be fine in some situations, but what bugs me is that none of the solutions had error checking. If the assumption is violated, then who knows what the program does?
This is a major reason that shell scripts have a reputation for being hard to debug. Data is not validated, and error handling is an afterthought.
That said, in this post I'm going to focus on the simplest way to do something both quick and correct with existing tools.
One philosophy of Oil is not to reinvent the wheel unnecessarily. See my comment about "Make": I'm not just trying to address one pet peeve about shell. I'm trying to replace the entire thing, and that means thoroughly understanding how shell sits in its ecosystem.
This post is a bunch of odds and ends.
My solution doesn't assume any of these things. Moreover, it checks its assumption.
`grep $'\x00'` does NOT NOT NOT work. This is something I need to fix in the shell. That is a horrible design choice! But it's also a problem with grep.
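Here is why `grep $'\x00'` fails. Bash strings can't hold NUL, so `$'\x00'` expands to the empty string. Grep then runs with an empty pattern, which matches every line. A real check for NUL has to treat the data as bytes. This small Python sketch contrasts the intended check with what the bash command actually does; the function names are hypothetical.

```python
from typing import List


def _lines(data: bytes) -> List[bytes]:
    # Split into lines the way grep sees them: a trailing newline does not
    # start an extra empty line, and empty input has no lines at all.
    if not data:
        return []
    if data.endswith(b"\n"):
        data = data[:-1]
    return data.split(b"\n")


def lines_with_nul(data: bytes) -> List[int]:
    # What `grep -n $'\x00'` was supposed to do: 1-based numbers of the lines
    # that contain a NUL byte.
    return [n for n, line in enumerate(_lines(data), start=1) if b"\x00" in line]


def what_bash_grep_does(data: bytes) -> List[int]:
    # What actually happens: bash truncates $'\x00' at the NUL, so grep gets
    # the empty pattern, and the empty pattern matches every line.
    pattern = b""
    return [n for n, line in enumerate(_lines(data), start=1) if pattern in line]
```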
This was my fault. I oversimplified the problem in the name of having short code snippets.
The real problem was more complicated than people thought. I am not interested in solutions that assume the format of the git hash, or that make assumptions about where spaces and commas appear.
This is the worst part of shell. It is annoying and error-prone to think about such things.
I don't want to change the format.
Rewriting escaping by hand is error-prone. Note that you also have to handle the escaping for &.
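Here is one concrete way hand-written escaping goes wrong. `&` has to be replaced before `<` and `>`. Otherwise the `&` in a freshly inserted `&lt;` gets escaped a second time. This small Python sketch compares the two orders with the standard library's `html.escape`; the two function names are hypothetical.

```python
import html


def escape_wrong_order(s: str) -> str:
    # Bug: replacing & last also rewrites the & inside "&lt;" and "&gt;".
    return s.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;")


def escape_right_order(s: str) -> str:
    # & goes first, so only ampersands from the original text are touched.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
```

For example, `escape_wrong_order("a<b")` returns `"a&amp;lt;b"`, which a browser displays as the literal text `a&lt;b`.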
A goal of the Oil shell is to reduce the number of escaping languages you have to remember!!! Unix is a cacophony of messy languages. The messiness leads to insecurity.
Use %x00. I left this out for GENERALITY.
Also: safety:
git log --multiline | grep $'\x00'
Limitations of the solution: only one kind of escaping.
Point out the relation to UTF-8: a 0x00 byte in UTF-8 text can only be U+0000 (NUL) itself; it never appears inside the encoding of another character.
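This property is easy to check mechanically. In UTF-8, every byte of a multi-byte sequence is 0x80 or above. So the byte 0x00 can only ever mean U+0000. Here is a brute-force Python sketch over every encodable code point; the function name is hypothetical.

```python
def nul_byte_only_encodes_nul() -> bool:
    """Return True if 0x00 appears in no UTF-8 encoding except U+0000's."""
    for cp in range(1, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points have no UTF-8 encoding
        if b"\x00" in chr(cp).encode("utf-8"):
            return False
    # U+0000 itself is encoded as the single byte 0x00.
    return "\x00".encode("utf-8") == b"\x00"
```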
Style proposal:
At the very least, tools should support a way to output `\0`. A format-string escape like git's `%x00` is probably preferable.
`find -print0 | xargs -0` already uses NUL as a delimiter, so this convention is established.
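Consuming that convention is simple once you stop thinking in lines. Here is a minimal sketch that assumes NUL-terminated records like those `find -print0` produces; `split_nul_records` is a hypothetical name. It also checks that the final record is terminated, so a truncated stream is rejected rather than silently accepted.

```python
from typing import List


def split_nul_records(data: bytes) -> List[bytes]:
    """Split NUL-terminated records, e.g. the output of `find -print0`."""
    if not data:
        return []
    # With -print0 every record ends in NUL, so a well-formed stream does too.
    if not data.endswith(b"\x00"):
        raise ValueError("last record is not NUL-terminated (truncated input?)")
    return data[:-1].split(b"\x00")
```

Filenames containing spaces or even newlines come through intact. Splitting on whitespace or newlines can't offer that.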
In fact, I might add patches to your tools!
Pedantic Solution
This commenter ran into exactly the problem I expected.
Here's an interesting result: I got at least 10 shell one-liners in response. If you think that the pedantic solution is right, please post it in a comment or GitHub gist.
The second task for you: Make sure that your friend can clone the gist and run it. The first thing he or she might say is: "oops this version is wrong".
I was going to talk more about the Oil way. I have enough thoughts on that to fill a blog post.
But unfortunately, Oil is still far away. I need to publish a new roadmap, but it looks like I'm going to work on [OSH][osh-language] for a while. For now, I think this discussion can take place on Reddit. I've started a thread here.
The NUL count check is not that satisfying. It reduces the "whipupitude" of the shell. I would like to do something better in Oil.