Some Fast Math Functions
[Note 2011: My inline math functions are essentially not necessary
anymore with recent versions of gcc, but I leave this page up out of historical
interest. See Note 3 below.]
When MinGW's pow function became 10x slower in release 3.0, some of my
programs that used it heavily became much slower, so I started investigating
ways to implement some faster math functions. I first patched the 3.0
pow() function to go back to how it was in 2.0, but then I decided to
be more aggressive.
The floating-point unit in most modern Intel and AMD CPUs (e.g. Pentiums
and Athlons) has many built-in transcendental functions such as sine,
cosine, arc-tangent, etc. These built-ins are automatically used by
the Microsoft C run-time library DLL which MinGW links to by default,
but making calls to the DLL typically incurs significant overhead.
You can use the header file here to in-line some of these functions
for faster performance on Pentiums and Athlons. It requires the
-ffast-math compile flag. I took some of the code from Chapter 14
(pp. 807-808) of the Art of Assembly Language, linked below. Note that
the exp() and atan2() in-line versions are actually slower when compiled
for a 64-bit Opteron (SuSE Linux 8.0).
Also note that these in-line functions do not do any error
checking or trapping of any kind.
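To give a rough idea of what in-lining an x87 built-in looks like, here is a minimal sketch using GCC extended asm. It is not copied from x87inline.h: the names x87_sincos and x87_atan2 are made up for illustration, and the portable fallback branch is an assumption added so the sketch still compiles on non-x86 targets. Like the header, it does no error checking or range reduction.

```c
#include <math.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* Hypothetical helper: sine and cosine of x from a single fsincos.
 * After fsincos, ST(0) holds the cosine and ST(1) holds the sine,
 * which is why cosine is bound to "=t" and sine to "=u".
 * No range check: fsincos only handles |x| < 2^63. */
static inline void x87_sincos(double x, double *s, double *c)
{
    double sin_out, cos_out;
    __asm__ ("fsincos" : "=t" (cos_out), "=u" (sin_out) : "0" (x));
    *s = sin_out;
    *c = cos_out;
}

/* Hypothetical helper: atan2(y, x) from fpatan.
 * fpatan computes arctan(ST(1) / ST(0)) and pops the stack once,
 * so x goes in ST(0), y in ST(1), and st(1) is listed as clobbered. */
static inline double x87_atan2(double y, double x)
{
    double r;
    __asm__ ("fpatan" : "=t" (r) : "0" (x), "u" (y) : "st(1)");
    return r;
}

#else

/* Assumed fallback for other compilers or CPUs: plain libm calls. */
static inline void x87_sincos(double x, double *s, double *c)
{
    *s = sin(x);
    *c = cos(x);
}

static inline double x87_atan2(double y, double x)
{
    return atan2(y, x);
}

#endif
```

A call such as x87_sincos(angle, &s, &c) then compiles down to the instruction itself plus whatever loads and stores the compiler needs, with no call into the run-time library. Whether that is actually faster depends on the compiler version, the CPU, and the arguments, so it is worth timing against the default before relying on it.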
NOTE! My in-line pow() function now returns correct
results if the first argument is zero (Rev 1.01).
NOTE 2! GCC
v4.0 will include a more complete set of fast math intrinsics for
x87-compatible processors, including fsincos.
NOTE 3! (4-11-2010) I've noticed lately that the difference
between my in-lines and the gcc 3.x/4.x defaults depends significantly on
what arguments are sent to the functions. Sometimes mine are faster; sometimes
the gcc defaults are faster. In general, with gcc 4.x, I've found that only
my sincos in-line gives me any benefit over the gcc default on Core 2 processors,
and it's not by much.
x87inline.h | x87test.c | Art of Assembly | In-line Assy How-To | In-line Assy Linux Docs | Gnu C In-line Assy docs
Results: PIII | P4 Xeon | Opteron (32-bit) | Opteron (64-bit)