

  
  
  
Willus.com's 2011 Win32/64 C Compiler Benchmarks:
III. SUMMARY

The Contenders
In my 2002 Benchmarks, Intel did very well. On the two benchmarks common to both 2002 and 2011 (BW1D and LAME), the fastest Intel-compiled code averaged almost 80% faster than the fastest gcc-compiled code in 2002, and Intel had a similar margin over Microsoft. In 2011, on the other hand, the outcome is similar, but the balance of power is shifting. Intel is still the overall winner based on a geometric mean of the performance scores, but this time its margins are only 7 - 26% over gcc (depending on the compile flags and 32/64-bit; Intel's best results, which were 64-bit, averaged 18% faster than gcc's) and 13 - 22% over Microsoft. On six of the ten benchmarks, the fastest gcc-compiled executable is either faster than or equivalent to (within 6% of) the fastest Intel-compiled executable. MS VC 2010 had only one result within 6% of Intel's fastest score. This sort of performance seems roughly in line with what other people are getting or have gotten recently between Intel and gcc (Principled Technologies benchmarked Intel v9 vs. gcc 4.1.1 in Linux on SPEC CPU2000 SPECfp_rate_base in 2006 and showed Intel to be 22% faster). In addition, the fastest gcc 4.6.3 results show an average 20% improvement over gcc 3.4.2 (32-bit to 32-bit). Clearly the field has gained significant ground on the performance king, and that's good news for FOSS fans. The GCC team is to be commended, but they've still got work to do. Intel remains the compiler to beat.

The Rest
Digital Mars and Tiny CC certainly leave a lot to be desired in terms of run-time performance of their compiled executables (2.3X and 4.5X slower than Intel, respectively, on average), but for ease of setup and use, install size, and fast compiling, they are hard to beat, especially Tiny CC, which, on average, compiles and links C applications nearly 4X faster than its closest competitor (Microsoft), 7X faster than gcc 4.6.3, and over 10X faster than Intel! If your codes are small and/or already run plenty fast enough, Tiny CC is an excellent, easy-to-use option. If you need C++, too, then Digital Mars is your best option for installation compactness. I also want to give a nod to Open Watcom. Although it was not included in the results because it had trouble compiling all of the benchmarks (it is similar in capability and vintage to Digital Mars), it comes with an excellent set of tools and a good install package if you are just starting out and want to try out C programming with an easy-to-setup, easy-to-use compiler.

Intel's Transcendent Transcendental Performance
Two of Intel's clearest-cut victories are in BW1D and TRANSCEND. Both of these benchmarks make heavy use of sincos(), pow(), and other transcendental C math functions, so I wrote some programs to isolate the performance of these functions a la my MinGW fast math function testing code. Sure enough, Intel is definitely doing some in-line transcendental magic. On my sincos() loop test, the Intel-compiled version is 3 times faster than gcc's best result. On the pow() loop, Intel's margin increases--over 4 times. But the biggest margin is the exp() test loop: the Intel-compiled code is 8 times faster than gcc! I am sure this kind of margin on transcendentals is what lifts Intel to its impressive performance in BW1D and TRANSCEND. Hopefully some talented programmer can disassemble Intel's lightning fast transcendental C calls and mimic them in an open-source fast math library that can be tightly coupled to gcc. With something like that, I think gcc would be neck-and-neck with Intel for the best overall performance in these benchmarks.

[Update 1-19-2012: There is a discussion of the fast Intel math results in this gcc mailing list thread.]

Multi-threaded?
It sure doesn't look like the multi-threaded (automatic parallelization, denoted by an X under the // column in the results tables) compile options do anything on these examples (or I don't write code that parallelizes well!). In many cases, the performance gets worse, but when I watched my CPU performance in Task Manager, I could see that BW1D, for example, very occasionally utilized 2 threads with the Intel-compiled version (not so for the gcc version). I did some tests to convince myself that the compile options really work as advertised. They make a huge difference, for example, on the Intel-compiled versions of my transcendental function tests (because these consisted of a simple loop)--increasing the performance by a factor of 3 on my 4-thread CPU. But for these same transcendental test loops as compiled by gcc, the automatic parallelization had no effect. I was able to write an example of code that gcc could parallelize, but I had to spell out the parallel loops very obviously for gcc to turn them into multi-threaded loops. Intel's compiler seems to have more advanced automatic parallelization capability than gcc.
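
To make "spelling out the parallel loops very obviously" concrete, the sketch below contrasts a loop whose iterations are fully independent with one that carries a value from each iteration to the next. It is an illustration, not the example code mentioned above; the function names are hypothetical. The flags in the comment (gcc's -ftree-parallelize-loops=N and Intel's -Qparallel on Windows) are the auto-parallelization switches these compilers document, but whether they are exactly the options used for the results tables is an assumption, and neither compiler is guaranteed to parallelize any given loop.

```c
#include <stddef.h>

/* Candidate for automatic parallelization, e.g. with
     gcc -O3 -ftree-parallelize-loops=4 ...
     icl -O3 -Qparallel ...
   Each iteration reads only in[i] and writes only out[i], so the
   iterations can run in any order or on different threads, provided
   the caller passes arrays that do not overlap. */
void scale_square(double *out, const double *in, size_t n,
                  double a, double b)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = a * in[i] * in[i] + b;
}

/* Poor candidate: acc is carried from one iteration to the next
   (a loop-carried dependency), so iteration i cannot start until
   iteration i-1 has finished. */
void running_sum(double *out, const double *in, size_t n)
{
    double acc = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}
```

Real benchmark code tends to look more like the second function, or hides its loops behind pointers and function calls, which may be one reason the parallelization options show so little effect on the ten benchmarks.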

IPO, Profiling, and 64 bits
Interestingly, Intel gets more mileage out of the inter-procedural optimization (IPO) flags than gcc, but gcc gets a bit more out of the profiling flags. With Intel-compiled codes, the average benefit from IPO (-Qipo) is 6 to 10%, compared to 0 - 2% with gcc-compiled codes (-flto). Microsoft scores a 5% average IPO boost on 32-bit codes, but, strangely, 64-bit IPO with MS VC actually degraded performance on average. With Intel-compiled codes, the average benefit from profiling is 0 - 5%, compared to 4 - 9% with gcc-compiled codes. The advantage of compiling to 64-bit executables is significant--64-bit codes are faster than their 32-bit equivalents on every benchmark for Intel and gcc--by 10% to 20% on average, and in some cases by up to 70% (Crafty). A couple of MSVC's 64-bit results were actually slower than the 32-bit equivalents, but overall the performance improvement averaged around 10% for 64-bit VC 2010 codes.
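
As a rough picture of what the IPO flags can buy, consider a small helper that is called once per element from a loop in a different source file. The listing below shows both pieces in one block for brevity; the file names in the comments are hypothetical, and the code is an illustration rather than an excerpt from any of the benchmarks.

```c
#include <stddef.h>

/* ---- imagine this function lives in its own file, vec.c ---- */
double mul_add(double acc, double x, double y)
{
    return acc + x * y;
}

/* ---- and this one lives in a separate file, dot.c ----
   Compiled file by file, the compiler sees only a declaration of
   mul_add() here and must emit a real call for every element.
   With inter-procedural optimization enabled at compile and link time
   (-flto for gcc, -Qipo for Intel), the optimizer can see both files
   at once and may inline the helper into the loop. */
double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        acc = mul_add(acc, a[i], b[i]);
    return acc;
}
```

How much this helps depends on how often a program makes small cross-file calls in its hot loops, which would be consistent with the IPO gains varying so much from compiler to compiler and benchmark to benchmark.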

My Compiler
These benchmarks did not make me want to switch from MinGW/gcc for my day-to-day compiling. Intel may have the performance edge, but the cleaner setup for gcc, its FOSS roots, its much smaller disk footprint, its wide array of compiling options, and its robust execution are all big draws for me. Plus gcc supports C++, Java, Fortran 95, Objective C, and Ada, and it supports a number of platforms, which makes porting my codes between Windows and Linux easier. I recommend my gcc 4.6.3 build.

Final Thoughts
As I pointed out in 2002, picking the right compiler should, in the end, involve more than just a few benchmark results. You'll want to find out what you are comfortable with, which will largely come down to how easy the compiler is to use, what extra tools it comes with, how good the documentation is, and exactly what you need it for. If you are a serious programmer, I highly recommend that you try more than one of these compilers. You can find links to these compilers on my Win32 C/C++ Compilers web page.

I'd again like to thank my very patient family for giving me the time to post this benchmark.


Go to the next page if you would like to post a comment.


      

 
This page last modified
Sunday, 08-Feb-2015 18:58:10 MST