Summary

We believe that over our two-semester effort we accomplished our original goal and greatly exceeded it: our infrastructure improvements allowed us to implement several successful optimization passes (and a few less successful ones, see the discussion in [*] and [*]) in less time than was previously possible. The overall benefits we benchmarked are competitive with results reported by other compiler teams.

While the original reorganization of the compiler to use the CFG was done mainly by Jan Hubicka in the early stages of the project, the other participants (Zdenek Dvorák, Josef Zlomek and Pavel Nejedlý) were able to familiarize themselves with the necessary parts of the compiler in less than one month. They then used the CFG infrastructure to develop new optimization passes, substantially improved GCC's ability to predict the profile statically, built a framework for measuring the effectiveness of individual branch prediction heuristics, made the basic block profiler implementation more usable and robust, and, where needed, found and fixed latent GCC implementation problems. They also provided important feedback on the design of the CFG code and its documentation, and implemented several extensions (such as the new natural loop tree representation, accurate debugging output support routines, and robust profiler code).

A number of optimizations were implemented, and we believe most of them were successful. Among the most important are the register allocation changes, code placement, and the new loop unrolling code. All the new passes are significantly shorter and easier to maintain than the older GCC code.

For instance, the CFG code is shared across all optimizations. The code layout and basic block duplication module is built on top of it and is reused in several optimization passes (block reordering, loop optimizer, tracer). We implemented natural loop discovery in 1100 lines and loop body duplication in 900 lines; the latter is used by loop peeling, unrolling and unswitching. Our loop unroller is just 700 lines long, as opposed to the 4000 lines of monolithic unrolling code in the old loop optimizer, which also borrowed 3000 lines of instruction chain duplication code from the function inlining module. Our new loop optimizer, although much poorer in features than the old code, already outperforms it on the SPEC2000 benchmark at a much lower code size expense, and we got loop peeling and unswitching essentially for free.

We have found our implementations to be significantly shorter than those in Open64 and competitive with those in the Impact compiler, the other two large compiler projects with publicly available sources. We also believe our implementations are easier to understand, but we must leave that judgment to the independent reader.

We have successfully merged major infrastructural changes into the GCC mainline tree. A significant amount of this work is already present in the 3.1 branch, which is to be released in 5 days (April 15th). Version 3.1 will be the first official GCC release to support profile feedback and to perform a limited number of profile-based optimizations: register allocation, code alignment and simplistic basic block reordering.

Some more changes have been merged into the development tree since it was unfrozen last month, and only some of the new optimizations are still waiting to be integrated. Merging of the remaining changes is scheduled for after the 3.2.0 release, because the GCC maintainers are now focusing on pushing out the release.

Our special thanks go to Richard Henderson, one of the most active GCC maintainers, who kindly reviewed the majority of our patches and approved them for inclusion into mainline, provided useful comments and ideas, and even developed and improved our code in areas we were unsure about; and to Andreas Jaeger, who tested and benchmarked our work using the SPEC2000 benchmark suite.


Jan Hubicka 2003-05-04