I visited Fabrice Bellard's website
(Bellard is a brilliant French programmer)
for the first time in a while. His very first entry intrigued me: it links to the
Large Text Compression Benchmark.
This is a comprehensive comparison of how well various compression algorithms do at
compressing a 1-GB XML dump of Wikipedia from 2006. (Of course, Bellard's own program, nncp, had the best result as of the site's last update in August 2021, though nncp takes over 2 days to get that result, and that's using a GPU.)
So I went through the list and decided to try
Mathieu Chartier's mcm
entry, since it seemed to have the best combination of speed and compression performance. I compiled
it with MinGW gcc 11 and ran
my own benchmark on an input of nearly the same uncompressed size: a tarball of my Win32/64 package for MinGW gcc 11,
which is 1,032,924,160 bytes. The results,
along with results from several other standard compression utilities, are below.
Indeed, mcm gets the best compression, but not by much over xz. The widely used 7-Zip also turns in a very respectable score, with a good blend of speed and
compression performance. If you are interested in trying mcm on Windows, here is a
Win64 mcm .exe file (command-line based).
| Program | Flags | Run time (s) | Compressed size (bytes) | Compression ratio (bits per byte) | Compression speed (MB/s) | Compressed size (% of original) |
|---|---|---|---|---|---|---|
| mcm | -x11 | 255.6 | 94,643,694 | 0.733 | 3.85 | 9.16% |
| mcm | -h11 | 235.1 | 95,544,308 | 0.740 | 4.19 | 9.25% |
| mcm | -x10 | 244.5 | 95,928,885 | 0.743 | 4.03 | 9.29% |
| mcm | -m11 | 218.5 | 96,805,951 | 0.750 | 4.51 | 9.37% |
| mcm | -h10 | 222.2 | 96,837,087 | 0.750 | 4.43 | 9.38% |
| xz | -9 | 264.9 | 97,335,200 | 0.754 | 3.72 | 9.42% |
| mcm | -x9 | 235.8 | 97,362,670 | 0.754 | 4.18 | 9.43% |
| mcm | -m10 | 199.5 | 98,080,521 | 0.760 | 4.94 | 9.50% |
| mcm | -h9 | 217.3 | 98,276,574 | 0.761 | 4.53 | 9.51% |
| mcm | -m9 | 193.1 | 99,489,105 | 0.771 | 5.10 | 9.63% |
| mcm | -x8 | 242.4 | 103,514,788 | 0.802 | 4.06 | 10.02% |
| mcm | -h8 | 211.7 | 104,457,794 | 0.809 | 4.65 | 10.11% |
| mcm | -m8 | 193.7 | 105,651,443 | 0.818 | 5.09 | 10.23% |
| mcm | -t11 | 131.9 | 108,643,785 | 0.841 | 7.47 | 10.52% |
| 7z | -t7z -mx=9 -ms=on | 186.9 | 108,736,435 | 0.842 | 5.27 | 10.53% |
| mcm | -t10 | 130.3 | 110,483,748 | 0.856 | 7.56 | 10.70% |
| mcm | -t9 | 127.5 | 112,022,467 | 0.868 | 7.73 | 10.85% |
| mcm | -t8 | 127.8 | 120,184,123 | 0.931 | 7.71 | 11.64% |
| 7z | (none) | 150.8 | 169,579,190 | 1.313 | 6.53 | 16.42% |
| xz | -0 | 51.1 | 307,871,492 | 2.384 | 19.29 | 29.81% |
| bzip2 | --best | 84.1 | 328,167,083 | 2.542 | 11.71 | 31.77% |
| bzip2 | --fast | 77.2 | 340,557,458 | 2.638 | 12.75 | 32.97% |
| gzip | --best | 107.1 | 354,146,627 | 2.743 | 9.20 | 34.29% |
| zip | (none) | 49.5 | 363,251,119 | 2.813 | 19.88 | 35.17% |
| gzip | --fast | 19.2 | 391,682,939 | 3.034 | 51.31 | 37.92% |
(Run times are on a Core i9-9900 PC running Windows 11.)
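
For anyone who wants to check the numbers or add rows of their own, here is a minimal Python sketch of how the three derived columns can be recomputed from the compressed size and run time. The formulas are my reading of the table, not something stated above: bits per byte is 8 times the compressed size divided by the original size, the percentage is the compressed size over the original size, and the speed figures only match if "MB" is taken as 2^20 bytes. The function name `derived_metrics` is made up for this illustration.

```python
# Size of the uncompressed tarball used in the benchmark, in bytes.
ORIGINAL_BYTES = 1_032_924_160


def derived_metrics(compressed_bytes, run_time_s, original_bytes=ORIGINAL_BYTES):
    """Return (bits per byte, speed in MB/s, percent of original size).

    Assumption: 1 MB = 2**20 bytes, which is what reproduces the
    speed values listed in the results.
    """
    bits_per_byte = 8 * compressed_bytes / original_bytes
    speed_mb_s = original_bytes / 2**20 / run_time_s
    percent = 100 * compressed_bytes / original_bytes
    return round(bits_per_byte, 3), round(speed_mb_s, 2), round(percent, 2)


if __name__ == "__main__":
    # mcm -x11: 94,643,694 bytes in 255.6 s
    print(derived_metrics(94_643_694, 255.6))
    # gzip --fast: 391,682,939 bytes in 19.2 s
    print(derived_metrics(391_682_939, 19.2))
```

Run on the mcm -x11 and gzip --fast rows, this gives (0.733, 3.85, 9.16) and (3.034, 51.31, 37.92), matching the listed values.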