Performance testing
While developing it (and introducing small changes) I was continuously testing the performance of these 4 implementations. It was very disappointing that every time I ran the tests I got different results. Unfortunately my PC does many things (Skype, Steam, virus scan), so every test is run under slightly different conditions.
I could run those tests for a long time and take averages, but averages also have a disadvantage: one poor run can ruin the whole result.
I decided to run the tests for a long time (actually 2 days of continuous work) and take the best results. In an ideal world I would run performance tests on a machine doing nothing else (100% of the CPU available to the test), so taking the best results is as close to that as I can get, and one poor run (when Skype decided to take 70% of the CPU for no reason) won't spoil the whole test.
Some 'bests' were a little bit surprising though, so I used 'averages' to confirm them.
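To illustrate the "take the best, confirm with the average" idea, here is a minimal C++ sketch. It is not the harness used for these tests; the names `TimingSummary`, `Summarize` and `Measure` are made up for this example, and the workload is whatever callable you pass in.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary type: the fastest run and the arithmetic mean of all runs.
struct TimingSummary {
    double best_ms;
    double mean_ms;
};

// Reduce a list of per-run timings to "best" and "average".
inline TimingSummary Summarize(const std::vector<double>& samples_ms) {
    assert(!samples_ms.empty());
    const double best = *std::min_element(samples_ms.begin(), samples_ms.end());
    const double sum = std::accumulate(samples_ms.begin(), samples_ms.end(), 0.0);
    return {best, sum / static_cast<double>(samples_ms.size())};
}

// Run `work` the given number of times and time each run separately.
template <typename Work>
TimingSummary Measure(Work&& work, int runs) {
    assert(runs > 0);
    std::vector<double> samples_ms;
    samples_ms.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples_ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return Summarize(samples_ms);
}
```

With samples of 10.0, 10.5, 35.0 and 10.2 ms (one run disturbed by a background process), the best stays at 10.0 ms while the mean is pulled up to about 16.4 ms, which is exactly the effect described above.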
Note: LZ4 itself comes in 2 flavours: 32-bit and 64-bit. One was supposed to be run on x86 and the other on x64. But I decided to implement both for both architectures. And I'm glad I did (see below).
So finally we have 4 different approaches (Mixed, C++/CLI, Unsafe, Safe), all of them in two flavours (32 and 64-bit) and all of them can be run on x86 and x64 giving 16 different combinations.

As a picture is worth 1024 words:
Compression

Things you probably expected:
- Mixed Mode: is the best (32bit/x86 and 64bit/x64)
- Mixed Mode: is a little bit better in 64bit mode (64bit/x64)
- Safe: is the slowest one (no kidding)
There were things that surprised me, though:
- Mixed Mode: (not a surprise to me, but it requires explanation) the big dip on 64bit/x86 is probably due to the fact that BitScanForward64 is not available on x86, so it falls back to De Bruijn
- C++/CLI: I don't know why C++/CLI is slower than my C# Unsafe, but it is (all the way across the graph). Did I improve the algorithm a little, or is the C++/CLI compiler much worse than the C# compiler?
- Safe: 32bit/x64 is (surprisingly) faster than 64bit/x64 (so after switching the platform to 64-bit it's worth staying with the 32-bit algorithm)
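For readers who haven't met the De Bruijn fallback mentioned above: it finds the index of the lowest set bit with a multiplication and a table lookup instead of a dedicated CPU instruction. The sketch below is a generic C++ illustration, not the code from LZ4 or from my implementations. The function names are invented for this example, and the constant is one commonly published 64-bit De Bruijn sequence (LZ4 may well use a different one). On a 32-bit target the 64-bit multiply has to be emulated, which is one plausible reason for the dip.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 64-bit De Bruijn sequence: every 6-bit window of it is unique.
// (Illustrative constant; not necessarily the one LZ4 uses.)
constexpr std::uint64_t kDeBruijn64 = 0x03F79D71B4CB0A89ULL;

// Build the lookup table from the constant itself, so the table cannot
// drift out of sync with the multiplier.
inline const std::array<int, 64>& DeBruijnTable() {
    static const std::array<int, 64> table = [] {
        std::array<int, 64> t{};
        for (int bit = 0; bit < 64; ++bit) {
            t[((std::uint64_t{1} << bit) * kDeBruijn64) >> 58] = bit;
        }
        return t;
    }();
    return table;
}

// Portable stand-in for a "bit scan forward" instruction:
// returns the index (0..63) of the lowest set bit. `value` must be non-zero.
inline int BitScanForward64Portable(std::uint64_t value) {
    assert(value != 0);
    const std::uint64_t lowest = value & (~value + 1);  // isolate lowest set bit
    return DeBruijnTable()[(lowest * kDeBruijn64) >> 58];
}
```

For example, `BitScanForward64Portable(0x50)` returns 4, because bit 4 is the lowest bit set in binary 0101 0000.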
Decompression

This is something I wasn't expecting. Every single algorithm behaves a little bit differently.
- Mixed Mode: 64bit/x64 is actually a little bit slower than 32bit/x86
- C++/CLI: this time C++/CLI is better than Unsafe when it matters (32bit/x86 and 64bit/x64)
- Unsafe: 32bit/x64 beats 64bit/x64 and 64bit/x86 beats 32bit/x86 (again: on a 64-bit platform the 32-bit algorithm is better, on a 32-bit platform the 64-bit algorithm is better; that's an anomaly)
- Safe: on both platforms (x86 and x64) 64-bit decompression is faster than its 32-bit counterpart (so whatever platform you are on, use the 64-bit decompressor)
I confirmed all the anomalies with average values:
