Willus.com


CONTENTS

I. BACKGROUND
    1. Overview
    2. The Compilers
    3. Compiler Options
    4. The Programs
    5. Test Hardware
    6. Compiler Issues
    7. Other notes

II. RESULTS
    1. BW1D
    2. BZIP2
    3. CRAFTY
    4. K2PDFOPT (v1.30)
    5. LAME
    6. MESHER
    7. MODEL3D
    8. RESIZER
    9. TRANSCEND
    10. X264
    11. AVERAGE

III. SUMMARY

IV. COMMENTS

Willus.com's 2011 Win32/64 C Compiler Benchmarks:
I. BACKGROUND
3. Compiler Options

For each compiler, I tried to use every option that maximizes the performance of the compiled executable, including math shortcuts, omitted frame pointers, etc. I pulled the Intel compiler options from those commonly used by Intel in their own SPEC CPU 2006 results submissions. There are four significant additional options that I wanted to try out (where possible): 1. 32-bit vs. 64-bit compiles, 2. inter-procedural optimizations (IPO; gcc calls this link-time optimization and Microsoft calls it link-time code generation), 3. profiled compiles (i.e. compile the code with the profile-generation option, run it, and then compile it again using the data generated by the run), and 4. automatic generation of parallel (multi-threaded) code (denoted by // in the results tables). I tried out all of these options in various combinations on Intel and gcc 4.6.3, and I tried the first two (64-bit and IPO) on Microsoft VC++ 2010. I discuss the effects of each of these in the Summary. Some of these options are available on the other compilers (gcc 3.4.2), but on the rest of the compilers I just used a baseline set of optimization flags.


Compiler: Microsoft Visual C/C++ 2010
Version: v16.00.40219.01 for 80x86/x64
Command-line Options:
    All Compiles
        /Ox                      maximum optimizations (includes /Ob2 /Og /Oi /Ot /Oy)
        /GS-                     disable security checks
        /fp:fast                 "fast" floating point model (less predictable results)
        /Qfast_transcendentals   generate inline FP intrinsics
        /arch:SSE2               enable use of SSE2-enabled CPU (not available on 64-bit compiles!)
    Inter-procedural Optimizations (IPO)
        /GL                      enable link-time code generation

Compiler: Intel 2011
Version: v12.1.1.259 Oct 11, 2011
Command-line Options:
    All Compiles
        -QxHOST                  generate instructions for the highest instruction set available on the host CPU (not used on x264--did not work)
        -O3                      max optimization level
        -Qprec-div-              don't improve precision of float divides
        -Qopt-prefetch           enable pre-fetch insertion optimization
        -Qauto-ilp32             shrink 64-bit pointers/longs to 32-bit when safe to do so
        /F1000000000             reserve 1 GB of stack
        -fp:fast=2               most aggressive optimizations on floating point data
        -Qstd=c99                C99 compliance (for x264 only)
    Inter-procedural Optimizations (IPO)
        -Qipo                    interprocedural optimizations (affects linker and librarian also)
    Profiled
        -Qprof-gen/use           generate/use profile data
    Multi-threaded
        -Qparallel               enable multi-threaded (parallel) code generation

Compiler: MinGW (gcc 4.6.3)
Version: v4.6.3 Dec 9, 2011 (pre-release)
Command-line Options:
    All Compiles
        -Ofast                   maximum optimization level + -ffast-math
        -march=native            generate instructions for the highest instruction set available on the host CPU. Note that generally you just want to use -mtune=native since -march=native may not run on lesser CPUs, but I wanted absolute max performance.
        -fomit-frame-pointer     remove frame pointer for all functions
        -momit-leaf-frame-pointer   don't keep frame pointer in a register for leaf functions
        -Wall                    show all compiler warnings
        -std=gnu99               C99 compliance (for x264 only)
    Inter-procedural Optimizations (IPO)
        -flto                    link-time optimizations (I tried -fwhole-program also, but it had virtually no effect.)
    Profiled
        -fprofile-generate/use   generate/use profile data
    Multi-threaded
        -fgraphite-identity      enable identity transformation for GRAPHITE
        -floop-interchange       perform loop-interchange transformations on loops
        -floop-block             perform loop-blocking transformations on loops
        -floop-parallelize-all   identify loops that can be parallelized
        -ftree-loop-distribution perform loop distribution
        -ftree-parallelize-loops

Compiler: MinGW (gcc 3.4.2)
Version: v3.4.2 Sept 6, 2004
Command-line Options:
    All Compiles
        -O3                      maximum optimization level
        -ffast-math              fast floating point algorithms
        -fomit-frame-pointer     remove frame pointer for all functions
        -momit-leaf-frame-pointer   don't keep frame pointer in a register for leaf functions
        -Wall                    show all compiler warnings
        -std=gnu99               C99 compliance (for x264 only)

Compiler: Digital Mars
Version: v8.52 2004
Command-line Options:
    All Compiles
        -o+all                   run optimizer with "all" flag
        -6                       generate P6 code
        -mn                      Win32 memory model
        -ff                      fast in-line 8087 code

Compiler: Tiny CC
Version: v0.9.25 May 29, 2009
Command-line Options:
    All Compiles
        (No flags specified)
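
To make the profiled-compile procedure described above concrete, here is a rough sketch of what the three steps might look like with the gcc flags from the table. This is only an illustration, not the actual build script used for these benchmarks: the function name pgo_steps, the file names bench.c and bench, and the training run taking no arguments are all assumptions made for the example. The function only prints the commands (a dry run), so it can be tried without a compiler installed.

```shell
#!/bin/sh
# Sketch of a two-pass profiled build using gcc flags from the table.
# Hypothetical names (not from the article): pgo_steps, bench.c, bench.

# Print the three steps of a profiled build.
#   $1 = C source file, $2 = name of the executable to produce
pgo_steps() {
    src=$1
    exe=$2
    # Pass 1: build an instrumented executable that records profile data.
    echo "gcc -Ofast -march=native -fprofile-generate -o $exe $src"
    # Training run: assumed here to need no arguments.
    echo "./$exe"
    # Pass 2: rebuild, letting the optimizer use the recorded profile.
    echo "gcc -Ofast -march=native -fprofile-use -o $exe $src"
}

# Dry run: show the commands for a hypothetical bench.c.
pgo_steps bench.c bench
```

Piping the output to sh would actually run the steps. The training run should exercise the same code paths as the benchmark itself, since the second compile optimizes for whatever the profile recorded.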


This page last modified
Sunday, 08-Feb-2015 18:58:10 MST