Willus.com's 2011 Win32/64 C Compiler Benchmarks: I. BACKGROUND
3. Compiler Options
For each compiler, I tried to use every option I could find that maximizes the
performance of the compiled executable, including math shortcuts, omitting frame
pointers, etc. I took the Intel compiler options from those Intel commonly uses
in its own SPEC CPU 2006 result submissions.
There are four significant additional options that I wanted to try out (where possible):
1. 32-bit vs. 64-bit compiles.
2. Inter-procedural optimizations (IPO; gcc calls this link-time optimization and
Microsoft calls it link-time code generation).
3. Profiled compiles (i.e. compile the code with a profile-generation option, run it,
and then compile it again using the profile data generated by the run).
4. Automatic generation of parallel (multi-threaded) code (denoted by // in the
results tables).
I tried out all of these options in various combinations on Intel and gcc 4.6.3,
and I included Microsoft VC++ 2010 on the first two options (64-bit and IPO).
I discuss the effects of each of these in the Summary.
Some of these options are also available on some of the other compilers (e.g. gcc 3.4.2),
but on the rest of the compilers I just used a baseline set of optimization flags.
Company | Version | Command-line Options
Microsoft Visual C/C++ 2010 | v16.00.40219.01 for 80x86/x64
  All compiles:
    /Ox                     maximum optimizations (includes /Ob2 /Og /Oi /Ot /Oy)
    /GS-                    disable security checks
    /fp:fast                "fast" floating-point model (less predictable results)
    /Qfast_transcendentals  generate inline FP intrinsics
    /arch:SSE2              enable use of SSE2 instructions (not available on 64-bit compiles!)
  Inter-procedural optimizations (IPO):
    /GL                     enable link-time code generation
Intel 2011 | v12.1.1.259 Oct 11, 2011
  All compiles:
    -QxHOST         generate instructions for the highest instruction set available on the host CPU (not used on x264; it did not work)
    -O3             maximum optimization level
    -Qprec-div-     don't improve the precision of floating-point divides
    -Qopt-prefetch  enable prefetch insertion optimization
    -Qauto-ilp32    shrink 64-bit pointers/longs to 32 bits when it is safe to do so
    /F1000000000    reserve 1 GB of stack
    -fp:fast=2      most aggressive optimizations on floating-point data
    -Qstd=c99       C99 compliance (for x264 only)
  Inter-procedural optimizations (IPO):
    -Qipo           inter-procedural optimizations (also affects the linker and librarian)
  Profiled:
    -Qprof-gen / -Qprof-use   generate / use profile data
  Multi-threaded:
    -Qparallel      enable multi-threaded (parallel) code generation
MinGW (gcc 4.6.3) | v4.6.3 Dec 9, 2011 (pre-release)
  All compiles:
    -Ofast                     maximum optimization level + -ffast-math
    -march=native              generate instructions for the highest instruction set available on the host CPU. Note that you generally want only -mtune=native, since an executable built with -march=native may not run on lesser CPUs, but I wanted absolute maximum performance.
    -fomit-frame-pointer       remove the frame pointer for all functions
    -momit-leaf-frame-pointer  don't keep the frame pointer in a register for leaf functions
    -Wall                      show all compiler warnings
    -std=gnu99                 C99 compliance (for x264 only)
  Inter-procedural optimizations (IPO):
    -flto                      link-time optimizations (I also tried -fwhole-program, but it had virtually no effect.)
  Profiled:
    -fprofile-generate / -fprofile-use   generate / use profile data
  Multi-threaded:
    -fgraphite-identity        enable the identity transformation for GRAPHITE
    -floop-interchange         perform loop-interchange transformations on loops
    -floop-block               perform loop-blocking transformations on loops
    -floop-parallelize-all     identify loops that can be parallelized
    -ftree-loop-distribution   perform loop distribution
    -ftree-parallelize-loops   parallelize loops (split their iteration space across multiple threads)
MinGW (gcc 3.4.2) | v3.4.2 Sept 6, 2004
  All compiles:
    -O3                        maximum optimization level
    -ffast-math                fast floating-point algorithms
    -fomit-frame-pointer       remove the frame pointer for all functions
    -momit-leaf-frame-pointer  don't keep the frame pointer in a register for leaf functions
    -Wall                      show all compiler warnings
    -std=gnu99                 C99 compliance (for x264 only)
Digital Mars | v8.52 2004
  All compiles:
    -o+all  run optimizer with "all" flag
    -6      generate P6 code
    -mn     Win32 memory model
    -ff     fast in-line 8087 code
Tiny CC | v0.9.25 May 29, 2009
  All compiles:
    (no flags specified)