2016-12-30

1. What's the shortest function that implements the spec? We found one with four instructions, but didn't rule out the possibility of one with three.

2. In general, how does the minimum solution length relate to the dimensions of the problem (the distribution of 221 values into 21 groups, and 8 bits)? When does the problem become impossible?

This is a combinatorics problem. Fifteen years ago I would have asked on a Usenet math newsgroup.

A reason not to use this technique is that the numbers will change as the code evolves, and I don't necessarily want to do a new search every time.

1. Can we optimize the search? I noticed symmetries in the results -- i.e. there were 12 functions that had the same output distribution as `(id &
1. & (id + 13)`. Most search spaces for superoptimization are so big that they need to use stochastic techniques, but we could probably do some math for this specific problem to prune the search space.
1. What if the enum labeling was constrained by two or more lookup tables? That is, how would we find two expressions to replace two lookup tables on the same set of identifiers? This will probably happen in oil, and that's another reason I won't use this solution right away.

2. Our problem was to map 221 arbitrary IDs to 21 arbitary IDs, which is an easier problem than mapping 221 integers to 21 integers. So easy that we can solve it with single-threaded Python.

What other kinds of functions can be easily superoptimized?

1. How does the solution compare with perfect hashing? I believe perfect hash functions would be longer because the problem is more constrained. And some algorithms generate hashes that require lookup tables, which defeats the purpose of this optimization. (The Python code in this blog post may be useful.)

2. Do any compilers do this? They might do other kinds of superoptimization, but this problem appears unique because allowing the enum labels to vary makes the search exponentially easier.

C and C++ specify that unlabeled enums are assigned values from 0 to N-1, but the compiler should be able to detect if the real value is actually "observed" by any code. Strongly-typed enums in C++ 11 require an explicit `static_cast<int>` to be used as an integer.

1. What other techniques are there for replacing lookup tables with functions? Several google searches yielded nothing.

2. Finally, what's the difference between this code generated by GCC versus Clang? I wanted to verify that we're actually removing a memory access, so I compiled the code with `-O3` with both compilers.

The Clang code produces a sequence of four instructions as you might expect (`xor add and and`), but GCC produces a sequence with a `movzx` instruction in the middle.

I suspect there's no real difference: the `movzx` is a register-to-register operation, and it happens at the end of the Clang code too. So we have indeed removed a memory access with our algorithm.

This function:

```kind = 175 & id & ((id ^ 173) + 11)
```

compiles to this:

```blog-code/id-kind-func\$ ./run compare-gcc-clang
...
GCC

0000000000400580 <_Z10LookupKind2Id>:
400580:       89 f8                   mov    eax,edi
400582:       81 e7 af 00 00 00       and    edi,0xaf  # 0xaf is 175
40058b:       0f b6 c0                movzx  eax,al
40058e:       83 c0 0b                add    eax,0xb  # 0xb is 11
400591:       21 f8                   and    eax,edi
400593:       c3                      ret
400594:       66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
40059b:       00 00 00
40059e:       66 90                   xchg   ax,ax

...
Clang

00000000004005f0 <_Z10LookupKind2Id>:
4005f0:       40 88 f8                mov    al,dil
4005f5:       04 0b                   add    al,0xb   # 0xb is 11
4005f7:       40 20 f8                and    al,dil
4005fa:       24 af                   and    al,0xaf  # 0xaf is 175
4005fc:       0f b6 c0                movzx  eax,al
4005ff:       c3                      ret
```

Versions:

```blog-code/id-kind-func\$ ./run.sh show-versions
c++ (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
...

clang version 3.8.0 (tags/RELEASE_380/final)
Target: x86_64-unknown-linux-gnu
Is there any other difference between these two compilations? I'm not sure what the `nop` and `xchg` are for in the GCC listing -- they both look like no-op instructions.