Subsections
We present benchmarks of majority optimizations discussed. We also present the
same benchmarks performed on IA-32 and Alpha system where possible to give an
comparison of effectivity of individual optimizations on these architectures. We hope
this to be useful to apply earlier published results on compiler optimization
(such as [FDO])
to the new platform and give a guide of what optimizations are most important.
We also present results with two different optimization levels -- the standard
optimization (-O2) used by the majority of distributions today and
aggressive optimization (-O3 -ftracer -funroll-loops -funit-at-a-time with profile
feedback) we found to give best overall SPEC score.
We did use modified prerelease of GCC 3.3 as used by SuSE Linux 8.2 for AMD64.
All the runs were performed on SuSE Linux on dedicated machines, however
important amount of random noise remains (especially for benchmarks Mesa, Gzip,
Perl and Twolf). Due to time limitations the benchmarks were performed with
one iteration only except for the benchmarks in the Table
and that were computed
with 3 iterations. Because the runs were not done on final hardware and
because we didn't satisfy the conditions for reportable runs in all tests, we
present relative numbers only.
Each table is divided into two sections -- first part includes
optimizations enabled by default at given optimization level, while the
other part contains optimization that user needs to enable by hand
either because they are ineffective, inappropriate for given settings or
does not obey the language standards. Each table also contains
comparison of two runs with equal settings in the first line to present
rough approximation of the noise in the numbers. Both performance and
sizes of the stripped binaries are presented.
The numbers always represent relative speedup (or code size increase) from the run with the
specified feature disabled to the run with specified feature enabled. For
instance -fomit-frame-pointer run in the table
compare performance of -O2 -fno-omit-frame-pointer to -O2
-fomit-frame-pointer. The benchmark ``standard optimization'' compare
-O0 to -O2.
The Following benchmarks were performed:
- aggressive optimization
- compare performance of unoptimized code (-O0) to the aggressive optimization settings described above.
- all prologue using move
- eliminate use of all push and pop operations in the prologues and epilogues except for cases where single register is saved. See Section 2.2.
- -fasynchronous-unwind-tables
- enable production of DWARF2 unwind information. See Section 2.3.
- -fbranch-probabilities
- enable profile feedback based optimizations. We implemented majority of transformations described on [FDO] with the exception of function in-lining and switch statement expansion.
- -fgcse
- enable global optimizers including (limited form of) partial redundancy elimination, load motion, constant propagation and copy propagation. GCC does contain loop invariant hoisting and extended basic block based value numbering pass making the global optimizers partly redundant.
- -fguess-branch-probability
- enable optimizations driven by static profile estimation. The profile is estimated by methods based on [profile] when profile feedback is not available.
- -finline-functions
- enable function in-lining.
- -fold-unroll-loops
- enable old loop unroller that actually unrolls some loops on Alpha.
- -fomit-frame-pointer
- enable elimination of frame pointer by using stack pointer instead. See Section 2.2.
- -foptimize-sibling-calls
- transform call to leaf function into jump.
- -fpeel-loops
- enable loop peeling.
- -fpic
- produce position independent code. See Section 2.7.
- -freorder-blocks
- enable intra-function basic block reordering and duplication based on significantly modified software trace cache algorithm [STC].
- -fschedule-insns2
- enable post-register allocation local scheduling. See Section 3.3.
- -fschedule-insns
- enable pre-register allocation region scheduling (not available for IA-32 and AMD64).
- -fstrength-reduce
- enable strength reduction.
- -fstrict-aliasing
- enable ANSI-C type based aliasing.
- full sized loads and moves
- avoids use of instructions initializing just portion of the destination registers. See Section 3.2 and 3.1.
- -ftracer
- enable super-block formation using algorithm similar
to [FDO]. The super-blocks are unified again after optimizations by
cross-jumping pass so this transformation is not used to improve scheduling as
commonly described in the literature. It is aimed to improve CSE and other
transformation by simplifying the control flow.
- -funit-at-a-time
- enable optimizations on whole compilation unit. At the moment GCC perform stronger function in-lining (in-lining of small functions called before defined and static functions called once) and use register calling conventions for static functions on IA-32. Only effective for C compiler.
- -funroll-all-loops
- enable loop unrolling of all small enough loops in the hot spots.
- -funroll-loops
- enable loop unrolling for loops with known
induction variable. While working on the paper we noticed that our new
implementation has important flaw avoiding loops from being unrolled on Alpha
architecture.
- -m64
- enable 64-bit code generation (used in comparisons relative to IA-32 code).
- -mfpmath=sse
- eliminate use SSE(2) instruction set for scalar floating point calculations.
- -mcmodel
- controls code and data segment size limits. See Section 2.7.
- -mred-zone
- enable use of 128 bytes below stack pointer for local data. See Section 2.2.
- partial SSE moves
- eliminate use of movlpd for double precision loads and movsd for register to register moves. See Section 3.2.
- prologue using move
- eliminate use of hot push and pop operations in the prologues and epilogues. See Section 2.2.
- standard optimization
- compare performance of unoptimized code (-O0) to the standard optimization settings (-O2).
Table 1:
Compilation Time Cost (AMD Opteron)
options |
slowdown |
|
0.00%
|
-fstrict-aliasing |
-1.13%
|
-fasynchronous-unwind-tables |
-0.38%
|
-freorder-blocks |
0.00%
|
-fomit-frame-pointer |
0.37%
|
-mred-zone |
0.38%
|
-mfpmath=sse |
0.75%
|
-maccumulate-outgoing-args |
0.75%
|
-foptimize-sibling-calls |
0.76%
|
-fguess-branch-probabilities |
1.54%
|
-fschedule-insns2 |
2.33%
|
-fgcse |
6.88%
|
-ffast-math |
-1.88%
|
-ftracer |
0.00%
|
-frename-registers |
0.74%
|
-funroll-loops |
3.38%
|
-fpic |
3.39%
|
-funroll-all-loops |
5.32%
|
-mcmodel=medium |
2.27%
|
-fbranch-probabilities |
142.74%
|
Performance (relative speedups in percents):
Table:
64-bit SPECint 2000 with Standard Optimization (AMD Opteron)
options |
gzip |
vpr |
gcc |
mcf |
crafty |
parser |
eon |
perl |
gap |
vortex |
bzip2 |
twolf |
avg |
|
1.32 |
0.14 |
-0.45 |
-0.45 |
-0.17 |
0.19 |
0.41 |
0.11 |
0.60 |
0.28 |
0.27 |
-0.54 |
0.13
|
standard optimization |
105.37 |
82.29 |
90.55 |
12.06 |
87.14 |
58.23 |
451.70 |
97.05 |
101.18 |
75.30 |
142.14 |
55.99 |
93.40
|
-fguess-branch |
4.40 |
4.45 |
2.90 |
0.00 |
2.73 |
0.19 |
5.58 |
5.96 |
7.43 |
21.60 |
2.56 |
-1.46 |
4.10
|
probabilities |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fschedule-insns2 |
1.62 |
1.44 |
2.40 |
0.22 |
0.32 |
0.78 |
4.90 |
1.28 |
-0.45 |
4.34 |
0.41 |
0.93 |
1.46
|
-fstrict-aliasing |
1.48 |
4.62 |
1.93 |
0.00 |
3.68 |
0.58 |
-2.34 |
1.75 |
0.75 |
4.27 |
4.79 |
-2.34 |
1.19
|
-mfpmath=sse |
1.93 |
3.98 |
-0.23 |
0.00 |
-0.09 |
-0.39 |
2.11 |
0.00 |
1.81 |
3.94 |
0.27 |
0.80 |
1.06
|
prologue using move |
-0.74 |
0.14 |
0.34 |
0.00 |
4.04 |
0.98 |
-0.43 |
1.43 |
0.30 |
5.71 |
-0.28 |
0.13 |
0.93
|
full sized loads and moves |
-1.76 |
-0.29 |
-0.46 |
0.88 |
0.96 |
-0.20 |
24.90 |
-1.52 |
-0.45 |
-1.04 |
0.97 |
-3.61 |
0.93
|
-fgcse |
1.17 |
4.28 |
-1.77 |
1.35 |
0.48 |
1.38 |
2.33 |
1.75 |
-1.48 |
1.55 |
1.26 |
0.13 |
0.92
|
-foptimize |
1.62 |
0.43 |
-0.12 |
0.00 |
3.33 |
0.00 |
2.33 |
-0.35 |
1.51 |
2.44 |
0.27 |
0.26 |
0.92
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
|
|
-finline-functions |
1.62 |
0.71 |
0.22 |
1.11 |
0.32 |
3.08 |
0.30 |
-1.04 |
0.58 |
-0.99 |
2.21 |
0.67 |
0.65
|
-fomit-frame-pointer |
0.29 |
1.58 |
0.56 |
0.67 |
5.00 |
1.57 |
-3.03 |
3.07 |
-0.60 |
0.47 |
2.41 |
-3.48 |
0.39
|
-freorder-blocks |
3.61 |
-0.29 |
-0.57 |
0.22 |
2.31 |
-0.78 |
0.72 |
4.06 |
0.75 |
3.45 |
1.84 |
-5.31 |
0.39
|
-maccumulate- |
1.92 |
-0.58 |
0.78 |
0.45 |
0.24 |
-0.39 |
1.04 |
-0.12 |
-0.60 |
-1.13 |
0.13 |
0.80 |
0.26
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
|
|
-mred-zone |
1.47 |
0.14 |
1.35 |
-0.23 |
1.30 |
-0.20 |
-1.73 |
0.00 |
-0.30 |
-0.29 |
0.55 |
-0.67 |
0.13
|
partial SSE moves |
-0.30 |
5.89 |
-0.92 |
0.00 |
0.07 |
0.00 |
-1.17 |
0.00 |
0.00 |
-0.10 |
-0.14 |
-3.36 |
-0.27
|
aggressive optimization |
6.34 |
4.97 |
8.81 |
0.67 |
1.29 |
25.43 |
24.14 |
12.29 |
7.51 |
5.69 |
5.42 |
4.65 |
8.40
|
-fbranch-probabilities |
5.95 |
1.71 |
7.13 |
0.22 |
-0.65 |
16.76 |
2.98 |
3.90 |
0.14 |
6.95 |
0.27 |
3.73 |
4.07
|
-funroll-all-loops |
4.16 |
0.42 |
5.60 |
0.00 |
-4.28 |
0.77 |
16.42 |
4.02 |
1.35 |
0.57 |
1.82 |
1.46 |
2.50
|
-funroll-loops |
3.71 |
0.28 |
4.17 |
0.00 |
0.08 |
0.58 |
15.35 |
1.61 |
1.35 |
-4.78 |
0.55 |
3.32 |
2.23
|
all prologue using move |
-0.60 |
0.56 |
2.38 |
-0.23 |
-0.40 |
0.58 |
3.73 |
3.19 |
-0.15 |
-4.29 |
0.55 |
4.68 |
1.05
|
-ffast-math |
1.78 |
0.28 |
0.67 |
0.00 |
-0.25 |
-0.20 |
0.31 |
-0.81 |
0.15 |
2.67 |
1.12 |
2.64 |
0.78
|
-frename-registers |
-0.15 |
0.56 |
-0.68 |
0.00 |
0.08 |
0.58 |
1.34 |
-2.19 |
-0.76 |
-1.25 |
0.97 |
4.92 |
0.65
|
-funit-at-a-time |
0.89 |
2.71 |
0.79 |
0.45 |
0.72 |
0.38 |
0.00 |
-0.47 |
-0.45 |
0.68 |
0.69 |
-0.93 |
0.39
|
-ftracer |
3.12 |
0.14 |
1.57 |
0.00 |
1.13 |
-0.20 |
1.76 |
0.91 |
-7.81 |
-3.83 |
1.40 |
2.40 |
0.13
|
-cmodel=medium |
-4.30 |
-1.00 |
-0.45 |
0.00 |
-10.84 |
0.00 |
2.18 |
-3.57 |
-5.83 |
-6.27 |
-2.23 |
-0.27 |
-2.51
|
-fpic |
-9.11 |
-1.72 |
-1.68 |
0.89 |
-18.21 |
-0.78 |
-1.36 |
-16.79 |
-3.76 |
-15.16 |
-6.18 |
-1.48 |
-6.20
|
File size (relative increase of the size of stripped binaries in percents):
options |
gzip |
vpr |
gcc |
mcf |
crafty |
parser |
eon |
perl |
gap |
vortex |
bzip2 |
twolf |
total |
standard optimization |
-11.24 |
-23.04 |
-23.74 |
-20.59 |
-17.13 |
-13.77 |
-13.71 |
-20.00 |
-36.54 |
-9.42 |
-15.83 |
-39.29 |
-22.31
|
-maccumulate- |
-0.42 |
-4.02 |
-3.47 |
-3.34 |
-0.35 |
-3.30 |
-3.15 |
-3.29 |
-4.31 |
-3.60 |
5.16 |
-2.51 |
-3.25
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fomit-frame-pointer |
-0.26 |
1.72 |
-1.13 |
-0.20 |
0.04 |
-3.76 |
-1.94 |
-1.24 |
-1.07 |
2.08 |
-0.08 |
-0.99 |
-0.71
|
-fstrict-aliasing |
0.00 |
-0.68 |
-0.15 |
0.00 |
0.00 |
0.00 |
0.22 |
0.00 |
-0.34 |
-0.66 |
0.00 |
-5.02 |
-0.40
|
-mred-zone |
0.00 |
-0.11 |
-0.19 |
0.00 |
-0.02 |
0.00 |
-0.76 |
0.59 |
-0.02 |
0.00 |
0.00 |
-0.04 |
-0.09
|
-fschedule-insns2 |
0.00 |
0.02 |
-0.15 |
0.00 |
0.01 |
0.00 |
0.02 |
0.00 |
0.00 |
0.02 |
0.00 |
-0.07 |
-0.05
|
-fgcse |
-0.11 |
0.04 |
-0.16 |
0.19 |
0.03 |
0.11 |
0.44 |
0.68 |
-0.01 |
-0.68 |
0.00 |
-1.16 |
-0.05
|
-foptimize |
0.00 |
-0.03 |
0.08 |
0.00 |
-0.02 |
0.00 |
-0.76 |
0.48 |
-0.16 |
-0.01 |
-0.23 |
-0.10 |
-0.03
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
|
|
partial SSE moves |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.16 |
0.00 |
0.00 |
0.00 |
0.00 |
0.01 |
0.02
|
full sized loads and moves |
0.00 |
0.00 |
0.04 |
0.00 |
1.21 |
0.00 |
0.00 |
0.00 |
0.08 |
-0.01 |
0.00 |
0.11 |
0.08
|
-mfpmath=sse |
0.00 |
-0.64 |
-0.15 |
0.00 |
0.00 |
0.00 |
2.34 |
-0.01 |
0.00 |
0.00 |
0.00 |
-1.64 |
0.13
|
prologue using move |
-0.11 |
1.06 |
1.01 |
0.00 |
1.26 |
-0.34 |
0.91 |
0.84 |
1.44 |
2.55 |
0.00 |
0.16 |
1.14
|
-freorder-blocks |
7.06 |
2.71 |
4.43 |
0.00 |
4.05 |
3.67 |
1.07 |
5.72 |
3.42 |
5.60 |
10.89 |
4.22 |
4.19
|
-finline-functions |
-0.73 |
1.15 |
8.85 |
-0.20 |
0.24 |
28.60 |
0.12 |
6.55 |
3.37 |
1.99 |
29.84 |
0.68 |
5.49
|
-fguess-branch |
7.00 |
4.41 |
5.82 |
0.00 |
3.60 |
3.34 |
2.64 |
6.67 |
5.85 |
8.74 |
10.89 |
3.97 |
5.66
|
probabilities |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fasynchronous |
7.12 |
10.28 |
7.38 |
6.31 |
3.76 |
17.16 |
4.83 |
9.26 |
9.04 |
7.88 |
18.14 |
5.34 |
7.71
|
unwind-tables |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fbranch-probabilities |
-4.91 |
-2.07 |
-2.20 |
0.82 |
0.11 |
0.02 |
-2.44 |
-3.92 |
-3.74 |
-4.72 |
-7.30 |
-1.80 |
-2.85
|
-funit-at-a-time |
-22.64 |
-4.95 |
-1.50 |
0.00 |
0.00 |
0.00 |
0.00 |
-0.82 |
-0.08 |
-0.01 |
0.00 |
-0.10 |
-1.09
|
-ffast-math |
0.00 |
-0.03 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
-0.68 |
0.00 |
-0.02 |
0.00 |
0.01 |
-0.09
|
-frename-registers |
0.00 |
0.26 |
0.97 |
0.00 |
0.28 |
0.00 |
1.99 |
0.68 |
0.24 |
0.04 |
0.00 |
1.83 |
0.78
|
all prologue using move |
-0.73 |
4.14 |
1.14 |
-0.96 |
-0.33 |
2.18 |
1.35 |
0.87 |
1.60 |
0.52 |
-0.77 |
2.38 |
1.17
|
-ftracer |
0.00 |
1.27 |
1.29 |
0.00 |
0.13 |
0.00 |
2.50 |
2.01 |
2.46 |
1.31 |
0.00 |
1.54 |
1.56
|
-funroll-loops |
13.30 |
7.92 |
3.18 |
1.34 |
4.22 |
7.11 |
1.26 |
2.70 |
12.57 |
0.02 |
9.82 |
8.70 |
4.21
|
-funroll-all-loops |
13.30 |
9.53 |
4.29 |
24.50 |
4.71 |
14.20 |
1.43 |
3.38 |
15.76 |
0.66 |
9.82 |
14.40 |
5.71
|
-fpic |
12.11 |
6.53 |
3.62 |
1.14 |
21.40 |
9.38 |
1.92 |
6.48 |
15.53 |
9.16 |
7.06 |
16.66 |
7.55
|
-mcmodel=medium |
13.62 |
8.10 |
7.10 |
0.00 |
17.57 |
7.44 |
6.35 |
8.29 |
8.35 |
6.64 |
9.90 |
13.33 |
8.09
|
aggressive optimization |
-14.42 |
4.03 |
21.89 |
5.12 |
6.44 |
44.45 |
-0.47 |
8.80 |
7.38 |
0.73 |
40.05 |
3.93 |
11.08
|
Performance (relative speedups in percents):
Table:
64-bit SPECfp 2000 with Standard Optimization (AMD Opteron)
options |
wupwise |
swim |
mgrid |
applu |
mesa |
art |
equake |
ammp |
sixtrack |
apsi |
avg |
|
-0.28 |
-0.13 |
0.00 |
0.00 |
0.23 |
-2.07 |
0.14 |
0.00 |
0.00 |
0.00 |
-0.16
|
standard optimization |
102.22 |
54.49 |
633.14 |
220.37 |
79.20 |
22.69 |
90.76 |
111.08 |
204.34 |
192.64 |
142.52
|
-mfpmath=sse |
9.30 |
0.12 |
3.31 |
2.38 |
11.68 |
102.55 |
0.28 |
8.32 |
11.53 |
6.01 |
12.43
|
-fguess-branch- |
7.62 |
0.00 |
6.42 |
2.78 |
7.48 |
0.42 |
-2.23 |
-1.27 |
-0.29 |
4.72 |
2.75
|
probabilities |
|
|
|
|
|
|
|
|
|
|
|
partial SSE moves |
2.86 |
0.13 |
2.95 |
3.21 |
3.34 |
-3.26 |
0.86 |
3.11 |
3.86 |
3.33 |
2.12
|
full sized loads and moves |
2.13 |
0.26 |
1.35 |
1.98 |
6.38 |
0.69 |
0.00 |
2.00 |
1.45 |
1.55 |
1.78
|
-fstrict-aliasing |
0.00 |
0.12 |
0.00 |
0.19 |
2.22 |
5.22 |
-2.23 |
0.90 |
0.00 |
5.08 |
1.44
|
-fschedule-insns2 |
2.23 |
0.00 |
7.72 |
0.78 |
0.34 |
-1.40 |
-2.50 |
0.90 |
4.50 |
1.01 |
1.28
|
-freorder-blocks |
0.97 |
0.12 |
0.18 |
0.19 |
13.09 |
2.28 |
0.28 |
0.00 |
-1.42 |
0.00 |
1.28
|
-fomit-frame-pointer |
2.51 |
0.00 |
4.53 |
0.38 |
-0.58 |
-1.80 |
-1.13 |
0.90 |
-0.29 |
3.63 |
0.95
|
prologue using move |
-3.24 |
0.00 |
0.00 |
0.00 |
3.58 |
0.69 |
0.00 |
-0.14 |
0.00 |
0.00 |
0.15
|
-finline-functions |
0.13 |
0.12 |
0.00 |
0.19 |
1.85 |
-1.51 |
1.84 |
-0.52 |
0.28 |
-0.17 |
0.15
|
-foptimize |
0.82 |
0.12 |
0.18 |
0.19 |
-0.46 |
-0.97 |
0.00 |
0.12 |
0.00 |
0.00 |
0.00
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
-mred-zone |
0.00 |
0.00 |
0.00 |
0.38 |
0.57 |
0.97 |
-2.10 |
-0.26 |
0.00 |
0.16 |
0.00
|
-maccumulate- |
0.55 |
-0.13 |
0.18 |
0.00 |
0.45 |
-3.46 |
0.00 |
0.00 |
-0.29 |
0.33 |
-0.16
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
-fgcse |
1.37 |
0.00 |
-7.19 |
-5.15 |
-0.23 |
0.69 |
0.42 |
-0.64 |
-4.14 |
-2.13 |
-1.71
|
aggressive optimization |
5.57 |
-0.91 |
6.60 |
4.26 |
4.14 |
-1.93 |
7.96 |
3.58 |
10.63 |
-2.34 |
3.15
|
-funroll-all-loops |
2.72 |
-0.13 |
1.88 |
2.32 |
-1.50 |
5.58 |
0.42 |
3.58 |
-0.29 |
1.16 |
1.58
|
-funroll-loops |
2.72 |
0.00 |
1.88 |
2.51 |
-0.92 |
2.67 |
2.13 |
3.58 |
-0.29 |
1.16 |
1.57
|
-ffast-math |
0.81 |
0.00 |
0.00 |
2.13 |
1.26 |
-3.16 |
0.99 |
4.74 |
0.57 |
1.50 |
0.94
|
all prologue using move |
4.18 |
0.00 |
-0.39 |
0.19 |
0.23 |
-0.98 |
1.86 |
-0.27 |
1.14 |
0.34 |
0.63
|
-fbranch-probabilities |
-3.44 |
0.12 |
-0.94 |
0.38 |
15.14 |
-1.40 |
-0.15 |
-0.65 |
0.85 |
-3.35 |
0.15
|
-funit-at-a-time |
0.13 |
0.12 |
-0.19 |
0.00 |
3.93 |
-3.54 |
0.14 |
0.12 |
0.00 |
-0.17 |
0.15
|
-frename-registers |
-3.54 |
-0.26 |
5.66 |
-0.39 |
-7.23 |
-1.11 |
4.97 |
3.46 |
0.86 |
-0.34 |
0.15
|
-ftracer |
-0.82 |
0.00 |
0.00 |
0.00 |
-2.87 |
-2.35 |
-0.15 |
0.77 |
0.86 |
-0.67 |
-0.64
|
-cmodel=medium |
2.73 |
-0.26 |
-0.19 |
-0.39 |
-3.69 |
-0.83 |
-0.72 |
-1.03 |
-14.95 |
-0.17 |
-1.90
|
-fpic |
0.95 |
0.00 |
0.37 |
-0.97 |
1.72 |
-0.29 |
0.71 |
-0.13 |
-20.98 |
-0.17 |
-1.90
|
File size (relative increase of the size of stripped binaries in percents):
options |
wupwise |
swim |
mgrid |
applu |
mesa |
art |
equake |
ammp |
sixtrack |
apsi |
total |
standard optimization |
-25.71 |
-26.52 |
-36.03 |
-60.14 |
-34.62 |
-15.82 |
-33.14 |
-32.33 |
-38.32 |
-30.33 |
-36.85
|
-maccumulate- |
-1.63 |
-0.71 |
-1.83 |
-0.71 |
-3.40 |
-2.07 |
-1.80 |
-2.77 |
-1.12 |
-1.17 |
-1.89
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
-fschedule-insns2 |
0.00 |
0.00 |
0.00 |
0.05 |
0.00 |
0.00 |
0.00 |
0.02 |
-0.43 |
0.00 |
-0.21
|
-mred-zone |
0.00 |
0.00 |
-0.19 |
-2.31 |
-0.13 |
-0.08 |
-0.14 |
-0.12 |
-0.03 |
-0.12 |
-0.14
|
-fgcse |
0.00 |
-8.64 |
-4.00 |
-10.19 |
-0.74 |
1.91 |
-0.38 |
0.00 |
1.70 |
-3.61 |
-0.07
|
-fstrict-aliasing |
0.00 |
0.00 |
0.00 |
0.00 |
-0.13 |
0.07 |
0.00 |
-0.05 |
0.00 |
0.00 |
-0.04
|
-foptimize |
0.00 |
0.00 |
0.00 |
0.00 |
-0.24 |
0.00 |
0.00 |
0.04 |
-0.02 |
0.68 |
-0.02
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
full sized loads and moves |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.40 |
0.00 |
0.75 |
0.08
|
-fomit-frame-pointer |
0.00 |
0.47 |
0.75 |
-1.97 |
-0.05 |
0.39 |
-0.14 |
0.37 |
0.12 |
5.74 |
0.43
|
partial SSE moves |
0.00 |
0.23 |
0.00 |
0.71 |
0.79 |
0.00 |
0.00 |
0.24 |
0.43 |
0.89 |
0.53
|
prologue using move |
-0.28 |
0.00 |
0.00 |
0.11 |
1.78 |
0.00 |
0.00 |
0.26 |
-0.02 |
0.70 |
0.53
|
-freorder-blocks |
0.00 |
0.47 |
0.00 |
0.11 |
2.44 |
0.00 |
0.00 |
2.62 |
0.86 |
1.37 |
1.38
|
-mfpmath=sse |
0.00 |
2.16 |
0.00 |
6.26 |
-1.57 |
0.00 |
-0.14 |
3.19 |
2.65 |
4.39 |
1.60
|
-fguess-branch |
-0.28 |
1.43 |
0.00 |
-0.36 |
5.10 |
12.16 |
10.56 |
3.04 |
0.41 |
1.19 |
2.09
|
probabilities |
|
|
|
|
|
|
|
|
|
|
|
-finline-functions |
0.00 |
0.00 |
0.00 |
0.00 |
5.39 |
19.96 |
0.13 |
0.42 |
1.29 |
1.50 |
2.45
|
-fasynchronous- |
9.34 |
3.15 |
6.75 |
1.92 |
10.46 |
16.55 |
13.01 |
6.21 |
1.25 |
3.83 |
4.67
|
unwind-info |
|
|
|
|
|
|
|
|
|
|
|
-fbranch-probabilities |
0.64 |
0.15 |
0.76 |
0.19 |
-5.23 |
0.70 |
0.61 |
-2.11 |
-0.28 |
-0.06 |
-1.58
|
-ffast-math |
0.00 |
-0.95 |
0.00 |
0.58 |
-0.83 |
-13.04 |
-0.27 |
-5.57 |
0.86 |
0.00 |
-0.35
|
-funit-at-a-time |
0.00 |
0.00 |
0.00 |
0.00 |
-0.07 |
0.00 |
0.00 |
-0.03 |
0.00 |
0.00 |
-0.03
|
all prologue using move |
-0.28 |
1.40 |
0.37 |
1.29 |
0.78 |
-1.02 |
-0.40 |
2.26 |
0.61 |
1.96 |
0.86
|
-ftracer |
0.00 |
0.00 |
0.00 |
0.00 |
2.37 |
0.07 |
0.00 |
5.45 |
0.43 |
3.35 |
1.51
|
-frename-registers |
0.00 |
0.47 |
0.00 |
2.65 |
1.78 |
0.00 |
0.00 |
2.60 |
2.58 |
0.86 |
2.10
|
-funroll-loops |
1.93 |
24.69 |
6.32 |
6.42 |
7.95 |
20.05 |
0.65 |
11.14 |
3.02 |
6.63 |
5.63
|
-funroll-all-loops |
1.93 |
24.69 |
7.25 |
6.42 |
8.19 |
20.05 |
2.35 |
11.14 |
3.02 |
6.63 |
5.73
|
-fpic |
0.45 |
0.23 |
0.93 |
2.24 |
5.92 |
9.28 |
7.71 |
4.91 |
8.04 |
3.75 |
6.51
|
-mcmodel=medium |
0.09 |
4.93 |
0.00 |
7.49 |
3.53 |
0.85 |
1.83 |
5.45 |
24.62 |
6.36 |
14.32
|
aggressive optimization |
71.81 |
164.20 |
125.37 |
57.30 |
11.28 |
97.53 |
52.54 |
12.91 |
26.21 |
34.10 |
26.45
|
Performance (relative speedups in percents):
Table:
64-bit SPECint 2000 with Aggressive Optimization (AMD Opteron)
options |
gzip |
vpr |
gcc |
mcf |
crafty |
parser |
eon |
perl |
gap |
vortex |
bzip2 |
twolf |
avg |
|
-0.28 |
-0.41 |
0.20 |
-0.45 |
0.00 |
-0.16 |
0.00 |
-0.11 |
0.84 |
0.00 |
0.13 |
0.38 |
0.12
|
aggressive optimization |
112.35 |
91.73 |
103.60 |
14.72 |
86.01 |
97.56 |
589.65 |
130.46 |
111.79 |
74.46 |
151.98 |
56.79 |
106.81
|
-fbranch-probabilities |
8.40 |
2.62 |
10.71 |
0.22 |
3.38 |
21.72 |
27.67 |
27.67 |
14.24 |
10.37 |
4.39 |
-1.56 |
9.49
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
|
|
full sized loads and moves |
1.00 |
0.67 |
-0.53 |
0.00 |
0.97 |
-0.48 |
56.39 |
1.79 |
0.71 |
0.62 |
0.13 |
4.64 |
4.61
|
-fbranch-probabilities |
2.69 |
0.00 |
5.62 |
-0.45 |
2.62 |
19.85 |
-0.92 |
11.94 |
4.06 |
2.29 |
1.07 |
0.51 |
3.77
|
-m64 |
9.90 |
0.27 |
3.39 |
-22.19 |
42.29 |
-2.13 |
45.66 |
0.30 |
-1.25 |
6.29 |
8.28 |
-13.33 |
3.38
|
-funroll-loops |
1.69 |
0.54 |
0.41 |
0.22 |
0.88 |
1.41 |
16.94 |
7.59 |
0.56 |
1.73 |
0.93 |
4.62 |
3.12
|
-freorder-blocks |
4.95 |
1.22 |
4.51 |
0.22 |
3.89 |
1.89 |
2.40 |
13.06 |
-0.56 |
-1.42 |
0.40 |
1.15 |
2.48
|
-fomit-frame-pointer |
0.13 |
0.00 |
2.19 |
0.44 |
2.03 |
1.73 |
2.31 |
5.38 |
-0.28 |
1.08 |
1.47 |
5.05 |
2.10
|
-fstrict-aliasing |
-0.56 |
4.80 |
0.82 |
0.44 |
1.04 |
1.89 |
1.61 |
2.08 |
1.72 |
1.64 |
5.88 |
1.15 |
1.85
|
-finline-functions |
-0.42 |
0.54 |
1.55 |
2.02 |
1.86 |
5.21 |
1.01 |
-0.31 |
0.42 |
3.62 |
3.13 |
2.75 |
1.85
|
-ftracer |
-0.69 |
-0.27 |
0.30 |
0.00 |
1.12 |
0.78 |
5.20 |
3.93 |
0.14 |
0.27 |
0.53 |
4.90 |
1.60
|
-fschedule-insns2 |
0.27 |
2.62 |
0.41 |
0.22 |
4.24 |
0.46 |
2.57 |
1.55 |
0.99 |
3.34 |
1.61 |
0.64 |
1.47
|
-mred-zone |
-0.42 |
0.13 |
0.61 |
0.66 |
0.96 |
0.31 |
-1.33 |
1.56 |
-0.56 |
7.01 |
-0.14 |
3.56 |
1.22
|
-fgcse |
2.70 |
4.06 |
1.14 |
-0.23 |
3.47 |
-0.77 |
-0.51 |
-0.82 |
2.29 |
1.27 |
0.93 |
0.25 |
1.10
|
-mfpmath=sse |
-0.28 |
2.48 |
-0.52 |
0.66 |
1.95 |
0.78 |
9.05 |
0.72 |
0.14 |
-2.80 |
-0.14 |
1.42 |
1.10
|
-frename-registers |
-0.42 |
1.22 |
-1.13 |
-0.45 |
4.24 |
0.46 |
-1.90 |
-0.72 |
-0.97 |
1.91 |
1.47 |
4.81 |
0.98
|
-funit-at-a-time |
-0.56 |
3.50 |
-1.23 |
0.22 |
1.12 |
0.93 |
0.16 |
-1.42 |
2.73 |
3.43 |
-0.27 |
2.64 |
0.98
|
prologue using move |
-0.43 |
0.54 |
1.06 |
0.43 |
1.06 |
0.79 |
-2.75 |
1.89 |
3.63 |
6.29 |
-0.14 |
-0.26 |
0.86
|
partial SSE moves |
-0.29 |
0.81 |
0.10 |
-0.44 |
0.00 |
0.63 |
0.00 |
0.62 |
0.00 |
0.26 |
-0.40 |
4.78 |
0.73
|
-foptimize |
0.00 |
-0.14 |
0.61 |
0.22 |
0.96 |
0.78 |
1.96 |
0.00 |
-1.93 |
-1.86 |
-0.27 |
3.15 |
0.60
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
|
|
-maccumulate- |
-0.28 |
0.94 |
-0.11 |
-0.23 |
2.53 |
0.46 |
1.18 |
-0.72 |
2.43 |
-0.81 |
0.13 |
0.63 |
0.48
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fstrength-reduce |
-0.42 |
0.26 |
-1.22 |
0.00 |
0.64 |
0.00 |
-0.59 |
-1.81 |
0.42 |
4.30 |
-0.14 |
-0.13 |
0.00
|
all prologue using move |
-1.13 |
-0.27 |
-0.32 |
-0.22 |
1.28 |
0.94 |
6.46 |
-0.11 |
1.54 |
-1.33 |
0.39 |
0.50 |
0.61
|
-ffast-math |
-0.28 |
0.40 |
-1.24 |
-0.23 |
-1.92 |
0.00 |
0.08 |
0.10 |
0.56 |
1.34 |
-0.27 |
-3.56 |
-0.73
|
-fpeel-loops |
0.00 |
0.13 |
-1.13 |
0.22 |
-1.20 |
-0.62 |
0.08 |
-1.34 |
-1.69 |
-3.86 |
-0.40 |
-0.26 |
-0.73
|
-funroll-all-loops |
0.00 |
0.13 |
0.10 |
0.00 |
-0.48 |
-0.16 |
-0.84 |
2.04 |
-2.12 |
-5.58 |
0.26 |
-7.90 |
-1.70
|
-cmodel=medium |
-5.12 |
-1.21 |
-2.97 |
0.44 |
-10.61 |
-0.78 |
-1.09 |
0.00 |
0.28 |
-4.85 |
-0.67 |
-7.74 |
-3.28
|
-fpic |
-12.73 |
-1.89 |
-2.36 |
-0.89 |
-13.88 |
-6.96 |
-4.36 |
-12.79 |
-2.11 |
-18.23 |
-10.03 |
-8.87 |
-8.12
|
File size (relative increase of the size of stripped binaries in percents):
options |
gzip |
vpr |
gcc |
mcf |
crafty |
parser |
eon |
perl |
gap |
vortex |
bzip2 |
twolf |
total |
aggressive optimization |
-24.01 |
-19.87 |
-6.95 |
-16.43 |
-11.81 |
24.89 |
-14.11 |
-12.48 |
-31.74 |
-8.77 |
17.87 |
-36.91 |
-13.57
|
-fbranch-probabilities |
-12.51 |
-8.07 |
-5.50 |
-0.95 |
-2.64 |
-2.55 |
-5.80 |
-7.77 |
-14.58 |
-5.56 |
-12.11 |
-10.22 |
-7.10
|
-maccumulate- |
-1.79 |
-1.55 |
-2.33 |
-1.44 |
-0.87 |
-2.85 |
-3.31 |
-1.77 |
-4.10 |
-3.78 |
3.05 |
-1.85 |
-2.58
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fgcse |
0.73 |
-1.16 |
-1.95 |
-0.37 |
-1.92 |
-1.27 |
-0.32 |
-0.59 |
-0.38 |
-0.68 |
-0.06 |
-3.33 |
-1.23
|
-fomit-frame-pointer |
-1.38 |
1.02 |
-0.81 |
-0.91 |
-0.27 |
-1.20 |
-1.94 |
-1.43 |
-1.10 |
1.41 |
-0.06 |
-1.20 |
-0.72
|
-fstrict-aliasing |
0.12 |
-1.14 |
-0.11 |
-0.73 |
0.00 |
0.36 |
0.36 |
-0.58 |
-0.56 |
-0.66 |
0.00 |
-5.14 |
-0.46
|
-mred-zone |
0.00 |
-0.06 |
-0.06 |
0.00 |
0.00 |
0.00 |
-0.34 |
-0.04 |
-0.02 |
0.12 |
0.00 |
-0.05 |
-0.05
|
-fschedule-insns2 |
-0.07 |
-0.06 |
-0.07 |
-0.19 |
0.01 |
0.07 |
0.00 |
-0.01 |
-0.02 |
0.00 |
0.00 |
-0.04 |
-0.03
|
-foptimize |
0.06 |
-0.04 |
0.10 |
0.00 |
0.00 |
-0.04 |
-0.45 |
-0.20 |
0.13 |
-0.01 |
-0.06 |
-0.05 |
-0.03
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fstrength-reduce |
0.24 |
0.11 |
-0.01 |
0.18 |
0.01 |
0.03 |
-0.02 |
0.00 |
0.10 |
0.00 |
0.00 |
0.12 |
0.02
|
partial SSE moves |
0.00 |
0.27 |
0.00 |
0.00 |
0.00 |
0.01 |
0.24 |
0.00 |
0.00 |
0.00 |
0.00 |
0.01 |
0.03
|
full sized loads and moves |
0.18 |
0.09 |
0.17 |
0.00 |
0.00 |
0.40 |
0.01 |
0.00 |
0.13 |
0.00 |
0.00 |
0.07 |
0.10
|
-mfpmath=sse |
0.00 |
-1.35 |
-0.05 |
-0.55 |
-0.14 |
-0.08 |
3.34 |
-0.58 |
0.00 |
0.00 |
0.00 |
-1.39 |
0.15
|
prologue using move |
0.00 |
0.07 |
0.14 |
0.00 |
-0.05 |
0.40 |
-0.02 |
0.45 |
0.28 |
0.37 |
-0.06 |
0.06 |
0.20
|
-funroll-loops |
1.73 |
0.98 |
0.34 |
3.97 |
1.51 |
3.22 |
0.28 |
0.04 |
1.00 |
0.00 |
0.00 |
0.77 |
0.52
|
-freorder-blocks |
0.24 |
0.11 |
1.05 |
-0.55 |
0.00 |
-0.04 |
0.20 |
0.63 |
0.36 |
0.00 |
0.00 |
0.21 |
0.53
|
-frename-registers |
1.35 |
1.18 |
1.26 |
0.00 |
1.47 |
0.71 |
2.27 |
0.67 |
0.62 |
0.66 |
0.00 |
2.19 |
1.16
|
-ftracer |
0.67 |
1.36 |
1.57 |
2.61 |
2.02 |
2.29 |
0.44 |
1.30 |
1.61 |
2.01 |
0.00 |
0.58 |
1.43
|
-fbranch-probabilities |
6.09 |
4.09 |
5.60 |
5.44 |
6.03 |
9.87 |
-0.21 |
3.90 |
3.58 |
4.49 |
7.78 |
3.27 |
4.40
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
|
|
-funit-at-a-time |
-14.10 |
2.25 |
12.02 |
0.00 |
2.04 |
5.62 |
0.00 |
4.14 |
6.08 |
2.66 |
7.60 |
1.92 |
5.94
|
-m64 |
16.48 |
-2.64 |
8.02 |
18.47 |
-19.00 |
15.52 |
0.25 |
11.38 |
9.65 |
-5.69 |
8.64 |
-3.44 |
3.90
|
-finline-functions |
8.71 |
7.94 |
23.54 |
2.80 |
3.51 |
39.11 |
-0.09 |
11.96 |
9.86 |
4.17 |
39.65 |
2.71 |
12.98
|
-ffast-math |
0.00 |
-0.02 |
0.03 |
0.00 |
0.00 |
0.00 |
0.00 |
-0.05 |
0.00 |
-0.02 |
0.00 |
0.01 |
0.00
|
-funroll-all-loops |
0.00 |
0.23 |
0.04 |
2.18 |
0.00 |
1.26 |
0.00 |
0.57 |
0.09 |
0.00 |
0.00 |
-2.94 |
0.03
|
-fpic |
16.27 |
4.69 |
-6.01 |
0.18 |
17.87 |
-21.91 |
0.96 |
1.39 |
6.50 |
7.12 |
-21.77 |
14.97 |
0.38
|
-fpeel-loops |
1.57 |
0.39 |
0.35 |
1.63 |
1.98 |
5.80 |
0.00 |
0.57 |
0.96 |
0.00 |
0.00 |
1.25 |
0.66
|
all prologue using move |
2.18 |
2.85 |
1.30 |
1.45 |
0.26 |
2.63 |
2.31 |
1.71 |
2.95 |
2.77 |
-0.72 |
2.62 |
1.91
|
-mcmodel=medium |
14.15 |
9.85 |
7.56 |
19.12 |
18.58 |
7.95 |
5.97 |
9.93 |
9.90 |
7.91 |
21.15 |
12.94 |
9.01
|
Performance (relative speedups in percents):
Table:
64-bit SPECfp 2000 with Aggressive Optimization (AMD Opteron)
options |
wupwise |
swim |
mgrid |
applu |
mesa |
art |
equake |
ammp |
sixtrack |
apsi |
avg |
|
1.30 |
0.00 |
0.89 |
0.56 |
-5.34 |
-0.28 |
0.00 |
-0.13 |
-1.29 |
1.21 |
-0.16
|
aggressive optimization |
101.11 |
53.87 |
686.79 |
225.30 |
101.38 |
26.80 |
100.81 |
123.51 |
225.00 |
180.97 |
149.23
|
-m64 |
5.00 |
-0.27 |
16.25 |
9.79 |
28.55 |
83.54 |
-1.31 |
19.17 |
28.33 |
20.86 |
19.34
|
-mfpmath=sse |
13.97 |
0.12 |
2.40 |
2.33 |
7.04 |
100.28 |
1.79 |
16.64 |
22.22 |
5.67 |
13.80
|
-fbranch-probabilities |
-0.83 |
0.39 |
10.83 |
3.96 |
19.62 |
2.23 |
-0.28 |
6.85 |
2.24 |
0.70 |
3.98
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
partial SSE moves |
1.58 |
0.13 |
2.18 |
1.76 |
0.70 |
1.27 |
-2.51 |
3.17 |
6.14 |
2.54 |
1.74
|
-fstrict-aliasing |
0.13 |
0.00 |
0.00 |
0.00 |
-0.90 |
4.49 |
1.37 |
5.49 |
0.00 |
4.71 |
1.73
|
full sized loads and moves |
-2.25 |
0.26 |
3.31 |
1.16 |
4.29 |
2.40 |
2.92 |
0.86 |
2.25 |
0.89 |
1.57
|
-fschedule-insns2 |
0.13 |
0.12 |
13.06 |
0.57 |
-9.93 |
1.53 |
-0.68 |
5.49 |
3.71 |
1.58 |
1.41
|
-ftracer |
0.27 |
0.00 |
-0.19 |
-0.19 |
-2.85 |
0.97 |
1.79 |
1.10 |
0.00 |
0.34 |
0.15
|
-mred-zone |
-0.95 |
0.00 |
-0.19 |
1.15 |
-2.32 |
0.13 |
1.09 |
0.00 |
0.00 |
0.00 |
-0.16
|
prologue using move |
-1.53 |
0.13 |
-0.18 |
-0.20 |
0.91 |
-0.84 |
-0.14 |
0.00 |
0.00 |
-0.18 |
-0.16
|
-frename-registers |
0.00 |
0.00 |
4.52 |
-0.76 |
-12.07 |
1.83 |
3.21 |
1.84 |
1.39 |
-1.03 |
-0.31
|
-fbranch-probabilities |
-1.61 |
0.00 |
-0.37 |
-0.57 |
7.36 |
-0.83 |
-0.14 |
-0.49 |
0.83 |
-4.16 |
-0.32
|
-fomit-frame-pointer |
-1.08 |
0.00 |
0.54 |
0.95 |
-11.17 |
-0.69 |
0.68 |
0.85 |
0.00 |
1.94 |
-0.62
|
-finline-functions |
0.00 |
0.12 |
-0.19 |
0.00 |
-12.12 |
2.97 |
1.23 |
0.36 |
-0.28 |
0.00 |
-0.77
|
-maccumulate- |
3.20 |
-0.13 |
0.00 |
-0.19 |
-9.94 |
-0.70 |
0.40 |
-0.13 |
-0.28 |
0.00 |
-0.78
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
-freorder-blocks |
1.08 |
0.00 |
-0.19 |
-0.19 |
-11.27 |
1.11 |
0.13 |
1.72 |
0.00 |
0.00 |
-0.78
|
-funroll-loops |
-2.43 |
-0.13 |
0.00 |
1.34 |
-11.02 |
0.83 |
0.54 |
3.25 |
0.00 |
0.34 |
-0.78
|
-foptimize |
-1.20 |
0.00 |
-0.37 |
0.00 |
-13.20 |
0.97 |
-0.28 |
-0.49 |
0.00 |
0.34 |
-1.23
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
-fstrength-reduce |
-1.85 |
0.00 |
-0.37 |
5.20 |
-13.15 |
-0.14 |
0.95 |
-0.85 |
1.39 |
-2.04 |
-1.23
|
-funit-at-a-time |
-0.96 |
0.12 |
-0.19 |
-0.19 |
-11.26 |
0.00 |
1.09 |
0.00 |
0.00 |
0.00 |
-1.24
|
-fgcse |
-1.46 |
-0.39 |
-7.52 |
-4.36 |
-12.53 |
1.26 |
0.40 |
-0.13 |
-1.63 |
-3.19 |
-3.02
|
-ffast-math |
-2.01 |
0.00 |
-0.19 |
1.13 |
14.99 |
-0.70 |
2.16 |
1.45 |
-0.83 |
2.94 |
1.86
|
-fpeel-loops |
9.94 |
0.00 |
-0.19 |
0.18 |
0.00 |
-0.83 |
-1.22 |
0.00 |
0.00 |
-0.18 |
0.62
|
-funroll-all-loops |
-0.41 |
0.12 |
0.00 |
-0.19 |
0.00 |
0.98 |
-1.49 |
-0.13 |
0.00 |
0.17 |
-0.16
|
-fpic |
5.42 |
-0.13 |
0.00 |
-0.95 |
14.84 |
0.55 |
-1.76 |
0.00 |
-20.67 |
-0.18 |
-0.63
|
all prologue using move |
-5.90 |
0.00 |
-0.89 |
-0.39 |
0.20 |
-0.28 |
0.54 |
-0.62 |
0.00 |
0.17 |
-0.78
|
-cmodel=medium |
-0.54 |
-0.13 |
-0.55 |
-1.71 |
9.68 |
-3.19 |
-1.76 |
-3.88 |
-16.53 |
-1.22 |
-2.01
|
File size (relative increase of the size of stripped binaries in percents):
options |
wupwise |
swim |
mgrid |
applu |
mesa |
art |
equake |
ammp |
sixtrack |
apsi |
total |
aggressive optimization |
-16.48 |
-15.91 |
-34.31 |
-57.92 |
-33.11 |
8.36 |
-29.40 |
-26.61 |
-36.44 |
-25.42 |
-34.22
|
-fbranch-probabilities |
0.55 |
-8.26 |
-2.73 |
-3.79 |
-12.90 |
-10.98 |
-9.59 |
-7.97 |
-4.00 |
-7.95 |
-7.22
|
-maccumulate- |
-1.93 |
-0.62 |
-1.78 |
-0.78 |
-3.49 |
-0.97 |
-0.99 |
-1.92 |
-0.80 |
-1.19 |
-1.67
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
-mred-zone |
0.00 |
-0.21 |
-0.37 |
-2.03 |
-0.77 |
-0.13 |
-0.13 |
-0.03 |
-0.01 |
-0.30 |
-0.30
|
-fstrict-aliasing |
0.00 |
0.00 |
0.00 |
0.00 |
-0.75 |
6.80 |
-10.04 |
0.00 |
0.00 |
-0.18 |
-0.27
|
-fgcse |
0.00 |
-8.64 |
-4.00 |
-10.19 |
-0.74 |
1.91 |
-0.38 |
0.00 |
1.70 |
-3.61 |
-0.07
|
-fschedule-insns2 |
0.00 |
0.00 |
0.00 |
0.00 |
-0.10 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
-0.03
|
prologue using move |
-0.09 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.06 |
0.00
|
-foptimize |
0.00 |
0.00 |
-0.37 |
0.00 |
-0.18 |
0.00 |
0.00 |
0.00 |
0.36 |
0.10 |
0.13
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
full sized loads and moves |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.12 |
0.00 |
0.34 |
0.08 |
0.17
|
-freorder-blocks |
0.00 |
0.00 |
0.00 |
0.00 |
0.03 |
0.24 |
0.49 |
0.00 |
0.42 |
-0.09 |
0.21
|
-funit-at-a-time |
0.00 |
0.00 |
0.00 |
0.00 |
0.11 |
0.12 |
4.67 |
1.85 |
0.00 |
0.00 |
0.23
|
-fomit-frame-pointer |
8.70 |
0.82 |
0.91 |
-1.92 |
-0.51 |
-0.73 |
-0.38 |
0.51 |
0.40 |
5.13 |
0.57
|
-fstrength-reduce |
0.00 |
0.00 |
0.18 |
-0.51 |
0.03 |
0.00 |
0.12 |
0.00 |
1.20 |
0.12 |
0.59
|
partial SSE moves |
0.00 |
0.20 |
0.18 |
0.39 |
0.77 |
0.60 |
0.00 |
0.00 |
0.82 |
0.23 |
0.65
|
-ftracer |
11.68 |
0.41 |
-1.26 |
0.00 |
0.03 |
0.36 |
0.87 |
5.54 |
0.00 |
0.92 |
0.70
|
-funroll-loops |
10.37 |
14.33 |
2.03 |
2.81 |
0.03 |
6.59 |
3.06 |
2.39 |
0.35 |
2.96 |
1.09
|
-fbranch-probabilities |
12.12 |
15.26 |
2.69 |
3.25 |
0.02 |
19.33 |
5.59 |
8.65 |
0.43 |
4.67 |
1.92
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
-frename-registers |
8.99 |
0.82 |
0.54 |
2.99 |
2.38 |
1.85 |
1.76 |
2.69 |
2.57 |
1.58 |
2.53
|
-finline-functions |
0.00 |
0.00 |
0.00 |
0.00 |
5.92 |
18.22 |
4.94 |
2.41 |
1.27 |
1.84 |
2.75
|
-mfpmath=sse |
8.70 |
2.96 |
2.03 |
8.08 |
-0.75 |
6.59 |
3.99 |
5.54 |
5.28 |
5.13 |
3.72
|
-m64 |
45.40 |
201.01 |
156.05 |
26.51 |
17.41 |
39.81 |
27.06 |
23.41 |
28.79 |
38.22 |
28.68
|
-ffast-math |
0.00 |
-0.83 |
0.00 |
0.94 |
-0.85 |
-6.44 |
-4.84 |
-8.23 |
0.40 |
-0.18 |
-0.81
|
-funroll-all-loops |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.24 |
0.61 |
0.00 |
0.00 |
0.00 |
0.01
|
-fpeel-loops |
0.00 |
0.00 |
0.00 |
1.39 |
0.00 |
0.36 |
1.36 |
0.00 |
0.00 |
0.12 |
0.07
|
all prologue using move |
-0.49 |
8.82 |
1.79 |
1.28 |
2.22 |
8.41 |
0.99 |
2.15 |
0.36 |
4.23 |
1.49
|
-fpic |
0.65 |
-6.38 |
2.35 |
1.11 |
5.32 |
-3.71 |
13.13 |
2.23 |
6.58 |
3.47 |
5.21
|
-mcmodel=medium |
0.00 |
9.45 |
2.17 |
7.98 |
5.43 |
10.44 |
11.27 |
5.24 |
23.48 |
6.72 |
14.49
|
Performance (relative speedups in percents):
Table:
32-bit SPECint 2000 with Aggressive Optimization (AMD Opteron)
options |
gzip |
vpr |
gcc |
mcf |
crafty |
parser |
eon |
perl |
gap |
vortex |
bzip2 |
twolf |
avg |
|
1.06 |
-0.14 |
0.42 |
0.69 |
0.11 |
0.00 |
-0.13 |
0.20 |
-0.28 |
0.85 |
0.71 |
3.52 |
0.75
|
aggressive optimization |
96.74 |
76.81 |
73.11 |
14.74 |
56.38 |
83.61 |
349.45 |
111.06 |
98.34 |
71.82 |
122.25 |
67.09 |
89.12
|
-march=i386 to k8 |
5.23 |
8.41 |
3.45 |
0.17 |
9.02 |
6.80 |
82.00 |
-0.52 |
0.41 |
14.78 |
2.45 |
8.52 |
10.08
|
-fbranch-probabilities |
8.34 |
2.37 |
12.33 |
1.40 |
4.25 |
7.49 |
17.57 |
14.35 |
8.99 |
12.75 |
6.47 |
0.87 |
7.37
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fbranch- |
2.94 |
0.41 |
10.33 |
0.17 |
2.91 |
5.43 |
0.61 |
8.82 |
2.41 |
8.26 |
6.45 |
0.77 |
3.89
|
probabilities |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fomit-frame-pointer |
8.64 |
1.36 |
0.84 |
0.17 |
2.26 |
6.51 |
0.73 |
0.41 |
4.58 |
2.66 |
6.25 |
3.78 |
3.26
|
-fgcse |
1.99 |
1.52 |
-2.27 |
-0.69 |
0.57 |
-4.36 |
5.14 |
8.00 |
2.67 |
2.93 |
1.86 |
2.98 |
1.77
|
-finline-functions |
0.90 |
1.96 |
0.00 |
2.84 |
2.91 |
6.62 |
1.86 |
0.82 |
1.41 |
3.34 |
1.87 |
1.78 |
2.17
|
-ftracer |
0.15 |
1.94 |
4.58 |
-0.52 |
-0.34 |
-2.23 |
3.94 |
9.70 |
0.13 |
1.74 |
3.05 |
0.77 |
1.78
|
-fschedule-insns2 |
2.30 |
2.22 |
2.47 |
-0.35 |
2.32 |
0.15 |
0.12 |
1.87 |
-0.69 |
2.04 |
1.73 |
2.70 |
1.52
|
-funit-at-a-time |
-0.60 |
8.91 |
3.47 |
-0.18 |
2.55 |
-1.50 |
0.12 |
7.50 |
-1.10 |
1.83 |
0.28 |
-0.67 |
1.39
|
-freorder-blocks |
1.99 |
0.68 |
7.88 |
-0.87 |
3.52 |
0.76 |
-0.37 |
1.24 |
-0.83 |
2.23 |
2.01 |
-1.00 |
1.26
|
-funroll-loops |
-0.31 |
-0.55 |
0.00 |
0.34 |
0.22 |
-1.79 |
6.77 |
0.72 |
0.69 |
2.71 |
1.14 |
3.53 |
1.25
|
-march=ppro to k8 |
5.91 |
-1.89 |
2.37 |
0.34 |
0.45 |
-4.22 |
2.63 |
0.30 |
1.11 |
0.38 |
2.75 |
2.60 |
1.13
|
-maccumulate- |
0.60 |
-0.28 |
0.53 |
0.00 |
2.67 |
-2.08 |
5.95 |
2.62 |
0.27 |
4.06 |
1.00 |
-2.15 |
0.88
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
|
|
-frename-registers |
-0.30 |
1.65 |
-0.94 |
-1.04 |
0.68 |
-2.67 |
-1.57 |
0.00 |
-0.14 |
2.74 |
0.85 |
5.49 |
0.75
|
-foptimize |
-0.16 |
0.27 |
2.24 |
0.34 |
-0.34 |
-1.93 |
-1.21 |
-0.11 |
0.69 |
1.93 |
0.56 |
0.11 |
0.25
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fstrict-aliasing |
1.07 |
-1.37 |
0.21 |
1.39 |
-0.12 |
0.00 |
0.12 |
0.10 |
0.55 |
0.09 |
0.71 |
0.55 |
0.25
|
-fstrength-reduce |
-0.16 |
0.54 |
-0.53 |
-1.04 |
0.57 |
-2.51 |
0.12 |
0.00 |
-1.10 |
-1.14 |
0.28 |
1.10 |
-0.25
|
-funroll-all-loops |
3.10 |
-0.28 |
0.31 |
-0.87 |
0.11 |
2.73 |
0.49 |
2.98 |
0.68 |
1.14 |
-0.15 |
1.98 |
1.00
|
-mfpmath=sse |
1.83 |
2.32 |
1.28 |
-1.38 |
0.11 |
0.45 |
0.36 |
0.51 |
1.39 |
0.94 |
0.85 |
0.32 |
0.75
|
-ffast-math |
-0.31 |
1.09 |
0.63 |
0.34 |
-0.46 |
0.15 |
0.12 |
0.72 |
0.55 |
0.86 |
0.42 |
0.44 |
0.50
|
-fpeel-loops |
2.29 |
0.00 |
-0.32 |
-0.52 |
0.90 |
3.17 |
0.00 |
0.10 |
-3.43 |
-0.29 |
0.70 |
-1.20 |
0.00
|
-fpic |
-20.49 |
-5.64 |
-17.55 |
-3.28 |
-29.60 |
-28.19 |
-10.27 |
-29.75 |
-23.00 |
-35.03 |
-25.65 |
-17.66 |
-20.81
|
File size (relative increase of the size of stripped binaries in percents):
options |
gzip |
vpr |
gcc |
mcf |
crafty |
parser |
eon |
perl |
gap |
vortex |
bzip2 |
twolf |
total |
aggressive optimization |
-18.85 |
-6.25 |
3.51 |
-21.10 |
2.34 |
33.46 |
-4.21 |
-6.83 |
-22.83 |
-2.91 |
33.80 |
-22.33 |
-4.05
|
-fbranch-probabilities |
-14.82 |
-8.93 |
-5.82 |
0.67 |
-1.96 |
-3.46 |
-5.89 |
-7.95 |
-14.56 |
-3.10 |
-11.81 |
-10.11 |
-6.87
|
-fgcse |
1.21 |
-1.15 |
-1.23 |
0.00 |
2.31 |
-0.93 |
0.20 |
0.52 |
0.21 |
0.51 |
-1.59 |
-1.60 |
-0.28
|
-foptimize |
0.07 |
0.11 |
0.09 |
0.00 |
0.07 |
0.00 |
-1.44 |
0.05 |
0.01 |
-0.03 |
-1.18 |
-0.02 |
-0.14
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fstrict-aliasing |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00
|
-fstrength-reduce |
0.21 |
0.09 |
0.02 |
0.00 |
0.05 |
-0.29 |
0.03 |
0.09 |
0.12 |
0.00 |
0.00 |
-0.19 |
0.02
|
-fschedule-insns2 |
-0.15 |
0.21 |
-0.07 |
0.00 |
-0.07 |
0.00 |
1.63 |
-0.02 |
-0.04 |
-0.01 |
0.00 |
0.03 |
0.15
|
-march=ppro to k8 |
-2.15 |
1.33 |
-0.40 |
0.00 |
-0.36 |
0.00 |
5.56 |
-0.29 |
-0.49 |
0.10 |
-1.18 |
0.31 |
0.40
|
-funroll-loops |
3.06 |
0.81 |
0.32 |
0.00 |
1.16 |
2.91 |
0.08 |
0.21 |
0.88 |
0.08 |
2.31 |
0.31 |
0.48
|
-frename-registers |
0.49 |
0.48 |
0.52 |
0.00 |
0.51 |
0.00 |
1.42 |
0.81 |
0.22 |
0.10 |
1.02 |
0.31 |
0.55
|
-freorder-blocks |
-0.08 |
-0.06 |
1.22 |
0.00 |
0.50 |
-0.03 |
0.17 |
0.82 |
0.29 |
0.10 |
0.53 |
0.22 |
0.62
|
-fomit-frame-pointer |
-1.77 |
2.89 |
0.39 |
0.00 |
-0.14 |
0.77 |
4.52 |
-0.79 |
0.17 |
2.38 |
-2.80 |
-0.11 |
0.95
|
-ftracer |
0.00 |
1.33 |
1.78 |
0.00 |
4.56 |
2.91 |
0.31 |
2.07 |
1.71 |
2.56 |
0.29 |
0.31 |
1.80
|
-fbranch-probabilities |
6.98 |
3.72 |
6.73 |
0.67 |
9.29 |
9.37 |
-0.26 |
4.48 |
3.81 |
4.67 |
6.41 |
2.35 |
4.93
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
|
|
-maccumulate- |
1.29 |
6.40 |
6.00 |
0.00 |
1.95 |
2.47 |
0.38 |
2.07 |
4.64 |
19.88 |
3.13 |
4.36 |
5.87
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
|
|
-funit-at-a-time |
-11.69 |
6.01 |
13.64 |
0.00 |
2.27 |
6.00 |
0.00 |
4.45 |
7.07 |
2.65 |
6.53 |
1.86 |
6.58
|
-march=i386 to k8 |
1.43 |
9.46 |
9.78 |
0.00 |
3.65 |
6.00 |
8.00 |
4.13 |
6.70 |
21.21 |
4.02 |
8.63 |
9.24
|
-finline-functions |
10.90 |
8.91 |
28.84 |
0.00 |
3.79 |
39.55 |
0.16 |
13.26 |
10.95 |
4.65 |
50.44 |
2.30 |
14.46
|
-ffast-math |
0.00 |
-0.79 |
0.01 |
0.00 |
-0.02 |
0.00 |
0.00 |
-0.13 |
0.00 |
-1.23 |
0.00 |
-0.06 |
-0.21
|
-funroll-all-loops |
0.00 |
0.25 |
0.05 |
0.00 |
0.07 |
2.83 |
0.00 |
0.03 |
0.07 |
0.03 |
1.19 |
0.21 |
0.15
|
-fpeel-loops |
2.19 |
1.15 |
0.39 |
0.00 |
2.81 |
6.13 |
0.00 |
0.21 |
0.88 |
0.02 |
1.25 |
1.61 |
0.72
|
-fpic |
12.59 |
6.19 |
-4.89 |
0.00 |
14.80 |
-27.60 |
10.58 |
4.43 |
1.15 |
1.35 |
-21.21 |
9.83 |
0.84
|
-mfpmath=sse |
-0.08 |
1.15 |
-0.03 |
0.00 |
-0.06 |
0.00 |
10.10 |
0.17 |
0.00 |
0.00 |
1.19 |
-1.80 |
1.13
|
Performance (relative speedups in percents):
Table:
32-bit SPECfp 2000 with Aggressive Optimization (AMD Opteron)
options |
wupwise |
swim |
mgrid |
applu |
mesa |
art |
equake |
ammp |
sixtrack |
apsi |
avg |
|
0.13 |
0.00 |
0.00 |
-0.21 |
0.28 |
2.57 |
-0.14 |
0.00 |
6.00 |
0.00 |
0.72
|
aggressive optimization |
77.83 |
27.22 |
445.45 |
148.97 |
56.22 |
-30.46 |
92.25 |
101.18 |
122.37 |
156.08 |
98.56
|
-march=i386 to k8 |
6.02 |
0.00 |
2.53 |
3.17 |
13.31 |
1.54 |
-0.65 |
1.49 |
-3.05 |
2.11 |
2.41
|
-fbranch-probabilities |
3.49 |
0.39 |
4.74 |
4.28 |
0.72 |
1.81 |
-1.42 |
7.93 |
-2.16 |
0.20 |
1.66
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
-fomit-frame-pointer |
-0.14 |
0.12 |
3.49 |
2.25 |
9.32 |
1.02 |
0.38 |
0.29 |
0.00 |
1.03 |
1.63
|
-march=ppro to k8 |
8.34 |
0.00 |
0.00 |
-0.82 |
10.41 |
-1.50 |
0.26 |
-0.59 |
-0.94 |
-0.62 |
1.10
|
-fstrength-reduce |
10.13 |
-0.26 |
1.46 |
1.03 |
-8.02 |
-1.54 |
0.13 |
0.89 |
-0.32 |
3.64 |
0.91
|
-funroll-loops |
3.93 |
0.00 |
0.00 |
0.61 |
-7.65 |
1.81 |
0.52 |
4.62 |
0.95 |
-0.21 |
0.36
|
-fstrict-aliasing |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
-1.27 |
-0.13 |
0.14 |
0.00 |
0.00 |
0.00
|
-frename-registers |
0.81 |
0.12 |
-0.62 |
0.00 |
-5.69 |
-0.52 |
1.98 |
-0.15 |
0.63 |
0.62 |
-0.19
|
-funit-at-a-time |
0.13 |
0.00 |
0.00 |
0.00 |
-5.75 |
0.25 |
2.25 |
0.29 |
0.00 |
0.00 |
-0.19
|
-ftracer |
1.65 |
0.00 |
0.00 |
0.00 |
-6.54 |
0.51 |
0.39 |
2.26 |
-0.32 |
-0.82 |
-0.37
|
-finline-functions |
0.00 |
0.00 |
0.00 |
0.00 |
-7.14 |
3.70 |
1.85 |
-0.15 |
0.00 |
-0.21 |
-0.37
|
-maccumulate- |
2.20 |
0.00 |
0.20 |
0.20 |
-6.37 |
-0.76 |
-0.40 |
0.00 |
0.00 |
0.41 |
-0.37
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
-foptimize |
-0.27 |
0.00 |
0.00 |
0.00 |
-6.44 |
2.84 |
0.00 |
0.14 |
-0.32 |
0.00 |
-0.37
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
-fschedule-insns2 |
-0.54 |
0.13 |
1.04 |
2.72 |
-6.49 |
-0.26 |
-1.67 |
1.34 |
-6.48 |
1.04 |
-0.72
|
-freorder-blocks |
0.68 |
-0.13 |
0.20 |
0.00 |
-4.78 |
-1.52 |
-0.13 |
1.04 |
-1.55 |
-0.62 |
-0.73
|
-fbranch-probabilities |
1.78 |
0.00 |
-0.21 |
-2.80 |
0.00 |
-2.53 |
0.26 |
-1.17 |
-0.63 |
-2.23 |
-0.91
|
-fgcse |
2.21 |
-0.39 |
0.20 |
-2.40 |
-3.99 |
2.02 |
-0.13 |
-0.59 |
-10.68 |
0.20 |
-1.43
|
-mfpmath=sse |
2.43 |
0.25 |
3.29 |
-0.21 |
12.53 |
97.20 |
-0.14 |
1.47 |
13.20 |
3.30 |
10.14
|
-ffast-math |
1.21 |
0.25 |
0.00 |
2.04 |
3.13 |
-0.26 |
3.89 |
0.58 |
-0.95 |
3.09 |
1.44
|
-fpeel-loops |
3.78 |
0.00 |
0.00 |
2.25 |
0.00 |
0.51 |
-0.26 |
0.00 |
0.00 |
0.00 |
0.54
|
-funroll-all-loops |
0.00 |
0.12 |
0.00 |
0.00 |
0.00 |
-2.54 |
-0.26 |
0.14 |
0.00 |
0.00 |
-0.19
|
-fpic |
-5.15 |
0.25 |
-3.72 |
3.46 |
-0.43 |
-1.31 |
-10.15 |
-2.36 |
-11.64 |
-1.45 |
-3.10
|
File size (relative increase of the size of stripped binaries in percents):
options |
wupwise |
swim |
mgrid |
applu |
mesa |
art |
equake |
ammp |
sixtrack |
apsi |
total |
aggressive optimization |
-3.88 |
-1.94 |
-20.88 |
-25.85 |
-23.54 |
14.89 |
-16.01 |
-17.99 |
-17.79 |
-11.69 |
-18.60
|
-fbranch-probabilities |
0.24 |
-2.71 |
0.69 |
-4.31 |
-14.27 |
-7.93 |
-4.87 |
-11.72 |
-4.35 |
-7.09 |
-7.78
|
-fstrict-aliasing |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00
|
-march=ppro to -march=k8 |
0.00 |
0.53 |
0.00 |
-4.45 |
1.81 |
0.77 |
0.10 |
0.41 |
-0.60 |
0.00 |
0.04
|
-funit-at-a-time |
0.00 |
0.00 |
0.00 |
0.00 |
0.24 |
0.00 |
3.26 |
0.43 |
0.00 |
0.00 |
0.14
|
-freorder-blocks |
0.00 |
0.00 |
0.00 |
0.04 |
0.15 |
-0.12 |
0.43 |
0.31 |
0.28 |
0.00 |
0.20
|
-foptimize |
0.00 |
0.00 |
0.00 |
0.00 |
0.26 |
0.00 |
0.00 |
0.02 |
0.23 |
1.56 |
0.30
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
-frename-registers |
0.00 |
0.26 |
0.00 |
0.04 |
0.25 |
0.33 |
0.65 |
0.02 |
0.61 |
0.00 |
0.38
|
-ftracer |
7.98 |
0.00 |
0.00 |
0.00 |
0.07 |
0.44 |
0.76 |
5.35 |
0.00 |
1.44 |
0.66
|
-funroll-loops |
5.15 |
6.26 |
0.00 |
1.15 |
0.06 |
7.76 |
1.21 |
0.43 |
0.07 |
2.48 |
0.57
|
-fgcse |
-1.84 |
3.89 |
0.00 |
-4.45 |
-0.79 |
0.11 |
0.54 |
0.09 |
2.85 |
-3.17 |
0.76
|
-fbranch-probabilities |
10.49 |
6.75 |
0.69 |
1.60 |
-0.55 |
11.19 |
2.58 |
7.22 |
0.63 |
3.16 |
1.38
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
-fschedule-insns2 |
0.00 |
0.81 |
0.00 |
1.44 |
0.63 |
0.66 |
1.32 |
3.68 |
2.53 |
2.07 |
1.90
|
-fomit-frame-pointer |
2.10 |
1.60 |
0.00 |
4.64 |
2.24 |
0.00 |
0.54 |
4.52 |
1.22 |
9.28 |
2.41
|
-fstrength-reduce |
0.00 |
-1.33 |
0.00 |
31.50 |
-0.04 |
-2.69 |
-0.22 |
-0.54 |
4.37 |
3.07 |
3.14
|
-finline-functions |
0.00 |
0.00 |
0.00 |
0.00 |
6.28 |
13.54 |
6.74 |
1.95 |
1.85 |
3.97 |
3.23
|
-march=i386 to -march=k8 |
7.17 |
-4.61 |
0.00 |
1.44 |
6.05 |
0.55 |
0.87 |
-0.68 |
4.19 |
8.23 |
4.35
|
-maccumulate- |
7.52 |
1.91 |
0.00 |
0.71 |
3.53 |
1.23 |
1.77 |
0.43 |
6.49 |
9.36 |
5.03
|
outgoing-args |
|
|
|
|
|
|
|
|
|
|
|
-ffast-math |
0.00 |
-0.81 |
0.00 |
0.23 |
-1.37 |
-31.41 |
-31.50 |
-6.71 |
-0.07 |
-0.78 |
-1.89
|
-funroll-all-loops |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.11 |
0.65 |
0.00 |
0.00 |
0.00 |
0.01
|
-fpeel-loops |
0.77 |
0.00 |
0.00 |
0.42 |
0.00 |
0.22 |
1.19 |
0.00 |
0.06 |
0.00 |
0.08
|
-fpic |
4.90 |
-6.17 |
0.00 |
-25.58 |
9.63 |
-3.10 |
2.72 |
5.98 |
7.44 |
-0.10 |
5.74
|
-mfpmath=sse |
4.04 |
7.23 |
0.00 |
10.72 |
2.53 |
7.29 |
8.28 |
15.12 |
8.83 |
6.81 |
7.33
|
Performance (relative speedups in percents):
Table:
64-bit SPECint 2000 with Aggressive Optimization (DEC Alpha EV56/600Mhz)
options |
gzip |
vpr |
gcc |
mcf |
crafty |
parser |
eon |
perl |
gap |
vortex |
bzip2 |
twolf |
avg |
|
0.00 |
-0.66 |
0.71 |
0.00 |
1.63 |
0.00 |
0.60 |
0.00 |
8.02 |
5.84 |
-0.55 |
4.72 |
1.96
|
aggressive optimization |
143.98 |
77.03 |
73.26 |
16.94 |
105.84 |
141.75 |
505.83 |
119.81 |
128.84 |
94.27 |
180.89 |
71.33 |
115.27
|
-fschedule-insns2 |
16.23 |
10.00 |
1.51 |
2.20 |
11.18 |
2.75 |
20.56 |
8.84 |
2.87 |
3.78 |
15.38 |
5.63 |
8.08
|
-fschedule-insns |
|
|
|
|
|
|
|
|
|
|
|
|
|
-funit-at-a-time |
1.42 |
2.63 |
3.73 |
3.67 |
-2.83 |
28.33 |
18.57 |
16.66 |
0.00 |
16.42 |
3.52 |
5.51 |
7.63
|
-finline-functions |
5.18 |
2.63 |
2.18 |
1.47 |
14.63 |
31.62 |
1.19 |
8.33 |
0.00 |
22.13 |
4.73 |
2.70 |
6.84
|
-fbranch-probabilities |
1.45 |
7.58 |
6.06 |
2.22 |
15.47 |
27.50 |
5.03 |
14.00 |
2.08 |
-3.48 |
0.00 |
0.65 |
6.16
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fbranch-probabilities |
9.30 |
2.66 |
6.81 |
5.97 |
-4.33 |
29.66 |
-1.18 |
9.93 |
4.22 |
17.51 |
2.31 |
-0.66 |
5.44
|
-fschedule-insns2 |
7.87 |
6.16 |
3.75 |
0.72 |
7.69 |
-0.89 |
7.84 |
7.38 |
1.42 |
3.14 |
7.89 |
5.63 |
5.03
|
-fomit-frame-pointer |
0.00 |
0.00 |
2.94 |
0.00 |
5.34 |
2.01 |
5.76 |
7.69 |
3.52 |
3.18 |
5.26 |
1.33 |
2.63
|
-freorder-blocks |
0.71 |
0.00 |
2.15 |
0.00 |
14.10 |
1.31 |
-6.94 |
5.62 |
3.52 |
4.48 |
2.22 |
2.64 |
2.63
|
-fgcse |
5.42 |
0.00 |
1.41 |
0.72 |
-1.02 |
-0.65 |
14.86 |
1.19 |
2.09 |
2.54 |
2.82 |
-0.66 |
1.94
|
-fif-conversion |
2.96 |
5.47 |
0.00 |
2.20 |
4.97 |
0.65 |
13.15 |
0.00 |
2.08 |
3.20 |
-0.56 |
1.31 |
2.61
|
-fstrength-reduce |
-3.53 |
-1.28 |
1.44 |
2.18 |
-3.30 |
-0.65 |
22.30 |
-2.96 |
2.08 |
-1.87 |
2.27 |
4.08 |
1.97
|
-funroll-loops |
-1.42 |
0.00 |
2.18 |
0.00 |
22.29 |
0.00 |
3.65 |
-0.60 |
-1.37 |
1.87 |
0.00 |
-3.88 |
1.30
|
-fstrict-aliasing |
-2.88 |
4.08 |
-0.71 |
0.73 |
2.13 |
8.45 |
-16.97 |
4.34 |
4.25 |
3.16 |
4.59 |
0.65 |
0.65
|
-frename-registers |
0.71 |
0.64 |
0.71 |
-0.72 |
5.40 |
0.66 |
5.73 |
2.40 |
0.68 |
-12.50 |
-1.11 |
3.44 |
0.65
|
-foptimize |
-2.12 |
0.00 |
0.71 |
-1.42 |
-14.80 |
-0.65 |
2.42 |
-2.49 |
0.68 |
1.86 |
1.11 |
-0.65 |
-1.28
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
|
|
-ftracer |
0.00 |
-4.55 |
0.00 |
-2.16 |
-12.07 |
-0.65 |
3.06 |
-2.95 |
0.00 |
1.25 |
1.11 |
-7.10 |
-2.59
|
-ffast-math |
-1.44 |
-3.73 |
-2.12 |
2.18 |
7.65 |
0.00 |
-1.78 |
-0.59 |
1.37 |
8.60 |
-0.55 |
1.32 |
1.29
|
-funroll-all-loops |
0.70 |
-0.65 |
-2.78 |
0.72 |
2.59 |
1.30 |
5.16 |
4.40 |
0.00 |
-3.04 |
0.55 |
-3.25 |
0.00
|
-fpeel-loops |
0.00 |
3.28 |
-0.71 |
-1.43 |
4.44 |
1.30 |
-3.51 |
-2.36 |
0.00 |
0.61 |
0.00 |
-1.30 |
0.00
|
-fold-unroll-loops |
0.00 |
0.64 |
0.00 |
0.72 |
-4.62 |
1.31 |
10.71 |
3.03 |
-1.37 |
-2.54 |
-6.63 |
0.00 |
0.00
|
-fpic |
0.00 |
-2.64 |
0.00 |
0.73 |
-13.23 |
3.63 |
-4.10 |
-0.65 |
-1.40 |
-3.71 |
5.48 |
-2.65 |
-2.05
|
File size (relative increase of the size of stripped binaries in percents):
options |
gzip |
vpr |
gcc |
mcf |
crafty |
parser |
eon |
perl |
gap |
vortex |
bzip2 |
twolf |
total |
aggressive optimization |
-38.22 |
-29.20 |
-9.28 |
-42.75 |
-28.90 |
5.66 |
-49.91 |
-12.38 |
-36.23 |
-17.64 |
-3.00 |
-39.40 |
-22.85
|
-fbranch-probabilities |
-10.66 |
-1.50 |
-2.43 |
0.79 |
-0.71 |
2.11 |
-4.12 |
-6.17 |
0.00 |
-3.29 |
-9.80 |
-5.73 |
-3.09
|
-fomit-frame-pointer |
-10.98 |
-3.61 |
-1.53 |
0.00 |
-1.19 |
-3.23 |
-7.01 |
-2.35 |
-2.88 |
-2.10 |
-1.09 |
-3.01 |
-2.64
|
-fgcse |
-0.25 |
-1.53 |
-1.07 |
0.00 |
-0.87 |
-1.56 |
-1.29 |
-0.48 |
0.08 |
0.01 |
-10.13 |
0.00 |
-0.84
|
-fstrict-aliasing |
0.03 |
-1.22 |
0.00 |
0.00 |
-0.07 |
-0.28 |
0.26 |
-0.20 |
-0.53 |
-0.26 |
0.00 |
-3.01 |
-0.28
|
-freorder-blocks |
-0.04 |
0.01 |
0.31 |
0.00 |
-0.43 |
0.01 |
-1.35 |
-0.23 |
-0.27 |
0.00 |
0.00 |
0.00 |
-0.09
|
-foptimize |
0.06 |
-0.01 |
0.23 |
0.00 |
0.00 |
0.01 |
-1.26 |
-0.04 |
0.10 |
0.00 |
0.00 |
0.00 |
-0.02
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
|
|
|
-frename-registers |
0.06 |
-0.09 |
0.00 |
0.00 |
-0.10 |
0.02 |
0.08 |
-0.09 |
0.01 |
-0.03 |
0.00 |
0.00 |
-0.02
|
-fif-conversion |
-0.10 |
-0.19 |
0.28 |
0.00 |
0.15 |
-0.21 |
-1.31 |
0.05 |
0.04 |
0.00 |
0.00 |
0.00 |
-0.01
|
-fstrength-reduce |
0.06 |
0.33 |
0.00 |
0.00 |
0.23 |
-0.48 |
0.05 |
0.06 |
0.20 |
0.01 |
0.01 |
0.00 |
0.04
|
-funroll-loops |
0.06 |
0.00 |
0.27 |
0.00 |
0.12 |
0.34 |
0.00 |
0.05 |
0.00 |
0.00 |
0.00 |
0.00 |
0.12
|
-ftracer |
0.04 |
0.63 |
2.22 |
0.00 |
3.66 |
5.31 |
0.11 |
2.09 |
-0.11 |
3.25 |
0.00 |
3.19 |
1.99
|
-funit-at-a-time |
-20.22 |
0.29 |
9.22 |
0.83 |
1.09 |
6.54 |
-4.12 |
4.22 |
0.00 |
-1.08 |
0.30 |
-2.99 |
3.12
|
-fbranch-probabilities |
0.46 |
4.61 |
5.48 |
0.79 |
5.40 |
6.52 |
0.20 |
4.37 |
0.06 |
4.34 |
0.42 |
3.34 |
3.90
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
|
|
|
-fschedule-insns2 |
0.00 |
4.24 |
4.73 |
0.00 |
3.87 |
0.00 |
5.63 |
3.53 |
4.29 |
3.47 |
0.00 |
3.41 |
4.06
|
-fschedule-insns2 |
0.00 |
4.42 |
5.01 |
0.00 |
3.87 |
0.00 |
7.14 |
4.76 |
5.25 |
4.69 |
0.00 |
3.41 |
4.76
|
-fschedule-insns |
|
|
|
|
|
|
|
|
|
|
|
|
|
-finline-functions |
0.47 |
8.20 |
23.93 |
0.79 |
3.89 |
43.62 |
-4.17 |
14.35 |
0.00 |
2.22 |
52.11 |
-2.89 |
11.68
|
-ffast-math |
-0.31 |
-0.09 |
-0.01 |
-0.40 |
-0.06 |
-0.12 |
-0.04 |
-0.01 |
-0.07 |
-0.04 |
-0.12 |
-0.03 |
-0.04
|
-funroll-all-loops |
0.99 |
0.31 |
0.00 |
0.00 |
0.43 |
2.29 |
0.00 |
0.11 |
0.00 |
0.02 |
0.00 |
0.00 |
0.13
|
-fpeel-loops |
12.32 |
0.57 |
0.03 |
0.00 |
2.11 |
6.22 |
0.00 |
0.18 |
0.00 |
0.04 |
0.22 |
0.00 |
0.49
|
-fpic |
-1.53 |
1.09 |
0.12 |
0.39 |
1.78 |
5.18 |
2.52 |
1.25 |
1.35 |
0.21 |
1.28 |
0.80 |
0.92
|
-fold-unroll-loops |
12.39 |
8.85 |
-1.48 |
0.00 |
5.54 |
5.61 |
2.90 |
2.75 |
13.59 |
0.00 |
11.26 |
9.30 |
2.83
|
Performance (relative speedups in percents):
Table:
64-bit SPECfp 2000 with Aggressive Optimization (DEC Alpha EV56/600Mhz)
options |
wupwise |
swim |
mgrid |
applu |
mesa |
art |
equake |
ammp |
apsi |
avg |
|
0.00 |
-0.75 |
-0.21 |
0.00 |
0.93 |
0.00 |
0.83 |
-0.84 |
-1.76 |
0.00
|
-fschedule-insns2 |
14.49 |
10.74 |
50.22 |
17.06 |
28.57 |
7.60 |
17.08 |
24.61 |
25.41 |
21.69
|
-fschedule-insns |
|
|
|
|
|
|
|
|
|
|
-fschedule-insns2 |
1.93 |
0.00 |
0.92 |
3.25 |
34.50 |
7.73 |
4.67 |
5.26 |
0.00 |
5.78
|
-fstrength-reduce |
9.27 |
0.75 |
2.71 |
4.88 |
2.85 |
1.19 |
2.56 |
0.84 |
1.81 |
3.17
|
-fbranch-probabilities |
3.12 |
0.00 |
1.44 |
1.33 |
14.21 |
7.10 |
3.41 |
-0.83 |
0.90 |
3.14
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
-ftracer |
1.85 |
0.00 |
1.02 |
0.20 |
8.54 |
1.14 |
0.82 |
0.84 |
6.79 |
2.36
|
-fbranch-probabilities |
1.85 |
-0.75 |
-1.83 |
0.40 |
5.85 |
1.65 |
8.03 |
1.69 |
-1.74 |
1.56
|
-funit-at-a-time |
2.48 |
0.74 |
-0.21 |
0.40 |
7.25 |
-1.68 |
10.00 |
2.56 |
-3.45 |
1.56
|
-fstrict-aliasing |
0.00 |
0.00 |
-0.21 |
0.00 |
2.35 |
-6.56 |
9.00 |
0.84 |
0.90 |
0.77
|
-fomit-frame-pointer |
2.48 |
-0.75 |
-0.41 |
0.00 |
4.34 |
-0.58 |
6.19 |
0.00 |
-0.88 |
0.77
|
-fgcse |
0.60 |
0.00 |
0.00 |
-2.18 |
3.33 |
6.50 |
6.14 |
0.84 |
-2.61 |
0.76
|
-finline-functions |
1.85 |
-0.75 |
-0.31 |
0.20 |
7.42 |
-9.40 |
2.56 |
0.84 |
-2.59 |
0.00
|
-freorder-blocks |
0.00 |
-0.75 |
0.00 |
0.10 |
4.32 |
-5.24 |
6.14 |
-1.64 |
0.00 |
0.00
|
-frename-registers |
0.60 |
-1.49 |
0.40 |
0.40 |
5.85 |
-1.66 |
0.82 |
0.84 |
-1.79 |
0.00
|
-foptimize |
-0.61 |
0.00 |
-1.42 |
0.10 |
2.35 |
-4.66 |
5.21 |
0.00 |
-1.76 |
0.00
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
-fif-conversion |
0.00 |
0.00 |
0.20 |
0.20 |
0.94 |
1.10 |
4.31 |
-0.84 |
-0.90 |
0.00
|
-funroll-loops |
0.60 |
-2.99 |
-1.01 |
0.10 |
1.87 |
-3.98 |
0.83 |
0.00 |
-0.88 |
-0.77
|
-fold-unroll-loops |
6.66 |
-0.75 |
0.20 |
2.43 |
-36.75 |
3.48 |
-4.96 |
1.66 |
3.63 |
-3.08
|
-ffast-math |
-0.60 |
0.00 |
0.10 |
0.30 |
-0.47 |
2.90 |
-5.47 |
-0.83 |
-2.59 |
-0.76
|
-fpic |
0.63 |
0.00 |
-0.21 |
0.20 |
-2.04 |
2.95 |
0.00 |
0.00 |
0.87 |
0.00
|
-funroll-all-loops |
0.00 |
-0.75 |
0.71 |
0.70 |
0.00 |
7.55 |
-0.82 |
-4.17 |
-1.79 |
0.00
|
-fpeel-loops |
3.63 |
0.00 |
0.20 |
6.06 |
0.00 |
4.06 |
-4.14 |
0.00 |
0.00 |
0.76
|
File size (relative increase of the size of stripped binaries in percents):
options |
wupwise |
swim |
mgrid |
applu |
mesa |
art |
equake |
ammp |
apsi |
total |
-fbranch-probabilities |
0.37 |
-0.11 |
0.20 |
0.15 |
-7.43 |
-6.42 |
-0.06 |
-0.92 |
-2.47 |
-4.77
|
-funit-at-a-time |
0.37 |
-0.11 |
0.20 |
0.15 |
-7.37 |
-6.42 |
0.57 |
0.03 |
-2.47 |
-4.61
|
-fomit-frame-pointer |
0.00 |
-0.53 |
-1.53 |
-0.35 |
-3.45 |
-7.19 |
-2.12 |
-4.38 |
-1.30 |
-2.96
|
-fgcse |
0.00 |
-26.92 |
0.57 |
-8.87 |
-1.06 |
-7.19 |
0.25 |
-0.02 |
-0.74 |
-1.93
|
-fstrict-aliasing |
0.00 |
0.00 |
0.00 |
0.00 |
-0.31 |
-7.19 |
-2.17 |
-0.10 |
-0.15 |
-0.44
|
-fif-conversion |
0.00 |
-0.21 |
-0.09 |
-0.08 |
-0.22 |
-0.73 |
0.05 |
0.31 |
-0.03 |
-0.11
|
-foptimize |
0.00 |
0.00 |
0.00 |
0.00 |
-0.04 |
0.00 |
-0.06 |
-0.01 |
0.00 |
-0.02
|
sibling-calls |
|
|
|
|
|
|
|
|
|
|
-freorder-blocks |
0.00 |
0.10 |
0.00 |
0.02 |
0.28 |
-0.19 |
0.11 |
-0.43 |
-0.20 |
0.07
|
-funroll-loops |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.00 |
0.02 |
0.02 |
0.00
|
-frename-registers |
0.00 |
0.10 |
0.16 |
-0.08 |
0.13 |
0.18 |
0.22 |
0.11 |
-0.32 |
0.05
|
-finline-functions |
0.37 |
0.20 |
0.28 |
0.13 |
-0.74 |
19.29 |
8.46 |
1.21 |
-1.87 |
0.09
|
-ftracer |
8.22 |
0.00 |
0.08 |
0.02 |
0.22 |
0.55 |
0.22 |
1.01 |
1.72 |
0.79
|
-fbranch-probabilities |
9.36 |
0.84 |
1.36 |
-1.20 |
0.00 |
2.79 |
0.97 |
4.33 |
1.80 |
1.17
|
-fguess-branch... |
|
|
|
|
|
|
|
|
|
|
-fstrength-reduce |
7.50 |
2.68 |
3.28 |
7.17 |
-0.17 |
0.00 |
0.22 |
0.37 |
7.24 |
1.70
|
-fschedule-insns2 |
3.87 |
2.47 |
3.73 |
6.90 |
5.47 |
3.11 |
4.91 |
5.97 |
6.53 |
5.64
|
-fschedule-insns2 |
3.78 |
2.47 |
4.26 |
11.04 |
5.55 |
3.11 |
5.66 |
5.97 |
6.97 |
6.25
|
-fschedule-insns |
|
|
|
|
|
|
|
|
|
|
-fpic |
-1.98 |
-0.42 |
0.24 |
-2.57 |
-0.09 |
2.76 |
1.10 |
-1.06 |
-0.08 |
-3.83
|
-ffast-math |
-0.21 |
-2.00 |
-1.05 |
-0.97 |
0.27 |
0.30 |
-0.60 |
-0.76 |
-0.61 |
-0.17
|
-fpeel-loops |
0.00 |
0.00 |
0.00 |
0.90 |
0.00 |
7.73 |
2.60 |
0.00 |
0.00 |
0.30
|
-funroll-all-loops |
0.00 |
0.00 |
0.81 |
0.29 |
0.00 |
7.73 |
1.13 |
0.37 |
0.27 |
0.34
|
-fold-unroll-loops |
2.71 |
36.40 |
15.56 |
5.15 |
4.52 |
23.73 |
6.75 |
15.75 |
8.91 |
7.79
|
Real World Performance
One of the main goals has been to develop system ready for both enterprise and
desktop (workstation) use. While the need of 64-bit addressing space for
the enterprise is well understood, the effect on desktop performance is
often discussed. The main drawback of 64-bit system, as discussed in section
2.1 is the increased memory footprint of the programs
and subsequent slowdown of program startup times critical for today desktop
systems.
In this section we present few simple benchmarks of this phenomenon on SuSE
Linux 8.2. Both the 32-bit and 64-bit version of the system were installed on
the equally sized ReiserFS partitions in the default configuration. The tests
were performed in the same order on both systems with reboots in between.
Additional packages were installed as needed. We hope this procedure to
minimize amount of the noise in the numbers.
Table:
Desktop Performance Relative to 32-bit System
test |
speedup |
bootup time |
-0.9% |
KDE startup from disk |
18.1% |
KDE startup from cache |
14.6% |
The Table compares startup times of several programs. As can be seen,
the 64-bit system, perhaps surprisingly, is significantly faster in two of them and comparable in bootup times.
The Table compares compilation of the package gimp.
Table:
Gimp Compilation Times Relative to 32-bit System
|
|
speedup |
|
test |
real |
user |
system |
tar xjf |
17.7% |
9.8% |
4% |
./configure |
-4.3% |
0.7% |
-31% |
make |
12.9% |
19.8% |
-39% |
As can be seen on Table the memory consumption grows up by
about 1/4 as expected, but due to relative compactness of CISC AMD64
instruction set, the increase is much smaller than one seen after switching to
RISC or VLIW systems.
Table:
Memory Resources Consumption
test |
32-bit |
64-bit |
increase |
konqueror |
14 M |
18 M |
28% |
gimp |
8.6 M |
9.9 M |
15% |
mozilla |
22 M |
27 M |
22% |
In fact Tables and shows decrease in the code section sizes.
Table:
Size of Common Binaries in /usr/bin
section |
32-bit |
64-bit |
increase |
.text |
56216 K |
53419 K |
-5% |
.bss |
18169 K |
21098 K |
16% |
.data |
10239 K |
14076 K |
37% |
.rodata |
17543 K |
19734 K |
12% |
.eh_frame |
546 K |
8269 K |
1414% |
.rela.plt |
358 K |
1076 K |
200% |
.rela.dyn |
40 K |
126 K |
215% |
total |
80435 K |
91141 K |
13% |
Table:
Size of Common Shared Libraries
section |
32-bit |
64-bit |
increase |
.text |
71967 K |
67526 K |
-7% |
.bss |
33463 K |
11557 K |
-72% |
.dynstr |
13608 K |
13587 K |
-1% |
.rodata |
12119 K |
12217 K |
0% |
.dynsym |
11424 K |
7611 K |
66% |
.eh_frame |
6367 K |
12730 K |
99% |
.data |
6018 K |
9695 K |
61% |
.rela.dyn |
4382 K |
12844 K |
193% |
.plt |
3898 K |
6499 K |
66% |
.rela.plt |
1293 K |
3888 K |
200% |
.got |
823 K |
1654 K |
100% |
total |
171812 K |
198111 K |
15% |
The major growths can be seen in the section .eh_frame that is usually not load into the memory
and sections related to the dynamic relocations. According to our benchmarks these are not critical,
since dynamic loader is still slightly faster in 64-bit version compared to 32-bit.
Overall, we can recommend use of 64-bit system instead of 32-bit on
AMD64 machines intended for desktop use as long as memory consumption increased
by 25% is not major limitation (that is hardly the case for computers sold
today).
Jan Hubicka
2003-05-04