A control flow graph is a data structure built on top of the intermediate code representation (RTL instruction chain or trees) abstracting the control flow behavior of compiled function. It is an oriented graph where nodes are basic blocks and edges represent possible control flows from one basic block to another.
The basic block (node of control flow graph) is defined as a structure:
typedef struct basic_block_def { /* The first and last insns of the block. */ rtx head, end; /* The first and last trees of the block. */ tree head_tree; tree end_tree; /* The edges into and out of the block. */ edge pred, succ; /* Liveness info. */ /* The registers that are modified within this block. */ regset local_set; /* The registers that are conditionally modified within this block. In other words, registers that are set only as part of a COND_EXEC. */ regset cond_local_set; /* The registers that are live on entry to this block. */ regset global_live_at_start; /* The registers that are live on exit from this block. */ regset global_live_at_end; /* Auxiliary info specific to a pass. */ void *aux; /* The index of this block. */ int index; /* Loop info. */ /* The loop depth of this block. */ int loop_depth; /* Outermost loop containing the block. */ struct loop *loop_father; /* Immediate dominator. */ struct basic_block_def *dominator; /* Expected number of executions: calculated in profile.c. */ gcov_type count; /* Expected frequency. Normalized to be in range 0 to BB_FREQ_MAX. */ int frequency; /* Various flags. See BB_* below. */ int flags; }Basic block is a straight-line sequence of code that can be entered only at the beginning and leaved only at the end. Basic blocks are represented using
basic_block
data type that contains pointers to the head (first
instruction of the basic block) and the end (last instruction) as well as other
information maintained about each block.
In the RTL function representation, the head pointer always points either to
NOTE_INSN_BASIC_BLOCK
or to CODE_LABEL
, if present. Basic block
ends by control flow instruction or last instruction before following
CODE_LABEL
.
Special basic blocks ENTRY_BLOCK_PTR
and EXIT_BLOCK_PTR
represent
possible entry points and exits from the compiled function.
The BASIC_BLOCK
array contains all basic blocks in the order they appear
in the insn stream.
The RTL instruction stream contains not only the ``real'' instructions, but also notes. Notes may or may not appear inside basic block. Any function that moves or duplicates the basic blocks needs to take care of updating of these notes. Many of notes expect that code consists of linear regions making such updates difficult.
Additionally the jump table vectors are represented as ``instructions'' inside
the insn chain. These vectors never appear in the basic block and should be
always placed just after table jump instructions referencing them. After
removing the table-jump it is difficult to eliminate the code computing address
and referencing the vector, so cleaning up the vectors is postponed to liveness
analysis and thus the vectors may appear in the insn chain without any purpose.
Before any edge is made fall-thru, the existence of such construct in the way
needs to be checked by calling can_fallthru
function.
Edges in the control flow graphs are described by structure:
/* Control flow edge information. */ typedef struct edge_def { /* Links through the predecessor and successor lists. */ struct edge_def *pred_next, *succ_next; /* The two blocks at the ends of the edge. */ struct basic_block_def *src, *dest; /* Instructions queued on the edge. */ rtx insns; /* Auxiliary info specific to a pass. */ void *aux; int flags; /* see EDGE_* below */ int probability; /* biased by REG_BR_PROB_BASE */ gcov_type count; /* Expected number of executions calculated in profile.c */ } *edge;
Edges represent possible control flow transfers from the end of basic block to the head of another. Single linked lists of edges to predecessors and successors are maintained for each basic block.
In common case edges correspond to branches or ``fall-thru'' edges to the following
basic block, but there are several other reasons why edge may be created. The
type of edge can be obtained via flags
field.
There are following types:
Edges corresponding to destinations of jump instructions have no flags defined. These edges are used for unconditional or conditional jumps and table jumps and are most convenient to manipulate with as they may be freely redirected.
Fall-thru edges are present in case the basic block may continue execution to the following
one without branching and do have EDGE_FALLTHRU
flag set.
Unlike other types of edges, these edges must come into following basic block in the
insn stream and thus function force_nonfallthru
is available to insert
jump in the case that redirection is needed. This may require creation of
a new basic block.
Exception handling edges represent possible control
transfers from trap to the
exception handler. The definition of trap varies. In C++ only function
calls can throw, but for Java exceptions like division by zero or segmentation
fault are defined and thus each instruction possibly throwing this kind of
exception needs to be handled as control flow instruction.
The destination of edge is specified by REG_EH_REGION
note attached to
the insn.
The flags of such notes set EDGE_EH
and EDGE_ABNORMAL
. In case
of the call EDGE_ABNORMAL_CALL
flag is set too.
When updating the insn stream it is easy to change possibly trapping
instruction to non-trapping. Opposite conversion is difficult and should not
happen. Predicate may_trap_p
may be used to check whether
instruction still may trap or not. The edges can be eliminated via
purge_dead_edges
call.
Sibling calls terminate the function in a non-standard way and thus an edge
to the exit must be present. EDGE_ABNORMAL_CALL
and EDGE_ABNORMAL
are set in such case.
Computed jumps contain edges to all labels in the function referenced from
the code. All those edges have EDGE_ABNORMAL
flag set. The edges
used to represent computed jumps often cause compile time performance
problems, since functions consisting of many taken labels and many computed
jumps may have very dense flow graphs, so these edges need to be handled
with special care.
GCC allows nested functions to return into caller using goto statement
referring to label passed to as an argument. The labels passed to nested
functions contain special code to cleanup after function call. Such section of
code is referred as nonlocal goto receivers. In the case function
contains such nonlocal goto receivers, the edge from the call to label is
present having EDGE_ABNORMAL
and EDGE_ABNORMAL_CALL
flags set.
By definition, execution of function starts by basic block 0, so there is always
an edge from entry block to the first real basic block. The alternate entry
points are specified by CODE_LABEL
with LABEL_ALTERNATE_NAME
defined. This feature is currently used for multiple entry point prologues
and is limited to post-reload passes only. In future full support for multiple
entry functions defined by Fortran 90 needs to be implemented.
In the pre-reload representation function terminates by the last instruction in the insn chain and no explicit return instructions are used. This corresponds to the fall-thru edge into exit block. After reload optimal RTL epilogues are used, that use explicit (conditional) return instructions that are represented by edges with no flags set.
In many cases compiler must make choice whether to trade speed in one part of code for speed in another, or trade code size for code speed. In such cases it is useful to know information about how often given block executes and that is the purpose for maintaining profile within flow graph.
GCC allows the profile to be either feedback based or statically estimated.
The feedback based profile is produced by compiling the program with instrumentation, executing it on the train run and reading the numbers of executions of basic block edges back to the compiler while re-compiling the program to produce final executable. This method provides very accurate information about where program spends most of time on the train run. Whether it matches the average run depends, of course, on the choice of train data set, but several studies has shown, that the behavior of program usually changes just marginally over different data sets.
When profile feedback is not available, compiler attempts to predict the
behavior of each branch in the program using a set of heuristics (see
predict.def
for details) and compute estimated frequencies of
each basic block by propagating the probabilities over the graph.
Each basic block contains two fields -- frequency and count.
Frequency is an estimation how often is basic block executed within a
function and is represented as integer scaled in the range
0-BB_FREQ_BASE
. Most frequent basic block in function is initially set
to BB_FREQ_BASE
and rest of frequencies are scaled according to that.
During optimization, the frequency of most frequent basic block can both
decrease (for instance by loop unrolling) or grow (for instance by
cross-jumping optimization).
The count contains number of execution measured during training run and is nonzero only when profile feedback is available. This value is represented as 64bit integer. Most optimization passes can use only the frequencies of basic block, while few passes may want to know exact counts. The frequencies should always match the counts after scaling, however during updating of the profile information numerical error may accumulate into quite large errors.
Similarly each edge contains probability field--an integer in range
from 0 to REG_BR_PROB_BASE
. It represents probability of passing control from
the end of source basic block to the destination. The probability that control
flow arrives via given edge to the destination basic block is called reverse
probability and is not directly represented, but it may be easily computed
from frequencies of basic blocks. EDGE_FREQUENCY
macro is available to
compute how frequently is given edge taken. count
field is present for
each edge as well, representing same information as for basic block.
The basic block frequencies are not represented in the RTL instruction stream,
the edge frequencies are represented only for conditional jump via
REG_BR_PROB
, since they are used when instructions are output to
the assembly file and flow graph is no longer maintained.
The name of the file corresponding to a source file is formed simply by changing
file suffix to .da
, e.g. for test.c
the corresponding file is
test.da
.
As mentioned before, the file consists of several sections, each represening single run. The structure of each section is shown on figure
The function name in the figure is stored as a (4 bytes), the length (4 bytes), the name itself (padded to 4-byte boundary) followed by a (4 bytes).
The extension block is used for storage of other important data which may be emited by future versions of gcc. This allows use of particular profile with different versions of gcc.
In current version, extension block contains the following information:
The content of the file can be examined by utility gcov
, which outputs
the corresponding source file together with execution counts for each line
(so-called line coverage).
There is currently no utility for manipulating the profile output file structure,
e.g. removing runs or merging two files together.
Important task is to keep both control flow graph and profile up-to-date with the instruction stream during optimization passes. Reconstruction of control flow graph after each pass is not an option, as it is too expensive and we lose profile information.
At the moment, the basic block boundaries are maintained transparently during
emitting instruction, so rarely there is need to move them manually (such as in
case someone wants to output instruction outside basic block explicitly). Each
instruction has defined BLOCK_FOR_INSN
value that represents pointer to the
basic block owning it, so the basic block list may be better viewed as integral
part of instruction chain, than structure built on the top of it.
Updating of edges is not transparent and optimization pass is required to do
that manually. However only few cases occur in practice. Commonly the
optimization pass simplifies the instruction possibly eliminating some edge.
This may happen by simplifying the conditional jump into unconditional, but
also by simplifying possibly trapping instruction to non-trapping while
compiling Java. The pass may call purge_dead_edges
on given basic block
to remove unneeded edges, if any.
Other common scenario is redirection of branch instructions, but this is best
modeled as redirection of edges in the control flow graph and thus use of
redirect_edge_and_branch
is preferred over more low level functions,
such as redirect_jump
that operate on RTL chain only.
Last common case is inserting control flow instruction into middle of basic
block. The find_sub_basic_blocks
may be used to split existing basic
block and add necessary edges, or split_block
may be used when
instruction in middle of basic block wants to become target of branch
instruction.
For global optimizer, a common operation is to split edges in the flow graph
and insert instruction to them. This can be easily done via function
commit_insn_to_edge
that emits instruction ``to the edge'' caching it
for later commit_edge_insertions
call that will care creation of new
basic block where needed and flushing the instruction to actual instruction
stream.
While debugging the optimization pass, an verify_flow_info
function may
be useful to find bugs in the control flow graph updating code.
More delicate task than updating control flow graph is to update profile. Many
of the function to modify flow graph, like redirect_edge_and_branch
do
not have enough information to easily update profile, so updating profile is
in majority cases left on the caller. Since it is difficult to discover bugs in
the profile updating code, as they manifest themselves only by producing worse
code and checking profile consistency is not possible, because of numeric
error accumulation, special care needs to be taken into this issue.
It is important to point out, that REG_BR_PROB_BASE
and
BB_FREQ_BASE
are both set low enough to be possible to compute second
power of any frequency or probability in the flow graph, it is not possible to
even square the count
field, as modern CPUs are fast enough to execute
operations quickly.
Liveness information is useful to determine whether register X is ``live'' at given point of program, that means that it contains important value. This information is used, for instance, during register allocation pass, as the pseudo registers need to be assigned to unique hard register or stack slot only when they are live. The hard registers and stack slots may be freely reused for other values when they are dead.
The liveness information is stored partly in the RTL instruction chain and
partly in the flow graph. RTL chain stores local information: each instruction
may contain REG_DEAD
note representing that value of given register is
no longer needed or REG_UNUSED
note representing that the value computed
by instruction is never used. The second is useful for instructions computing
multiple values at once.
Each basic block contains bitmaps representing liveness of each register at
entry and exit of basic block (global_live_at_start
and
global_live_at_end
). flow.c
contains function to compute
liveness of each register at any given place in the instruction stream using
this information.
Liveness is expensive to compute and thus it is desirable to keep it up to date
during optimization passes. This can be easily accomplished using flags
field of basic block. The functions modifying instruction stream automatically
set BB_DIRTY
flag of basic block, so the pass may simply use
clear_bb_for_blocks
before doing any modifications and then ask dataflow
modulule via function update_life_info_in_dirty_blocks
to get liveness
updated.
This scheme works reliably as long as no control flow graph transformations are done. The task of updating liveness after control flow graph changes is more difficult as normal iterative data flow may produce invalid results or get into cycle when the initial solution is not bellow the desired one. Only simple transformations, like splitting basic blocks or emitting to the edge are safe, as functions to implement them already know how to update liveness locally.
Loop tree describes the structure of loops. Nodes correspond to loops, while
edges reflect the subloop/superloop stucture. Root of the tree is a dummy loop
containing the whole function (including ENTRY_BLOCK_PTR
as latch and
EXIT_BLOCK_PTR
as header).
Struct loop is defined as follows:
struct loop { /* Index into loops array. */ int num; /* Basic block of loop header. */ basic_block header; /* Basic block of loop latch. */ basic_block latch; /* Number of blocks contained within the loop. */ int num_nodes; /* The loop nesting depth. */ int depth; /* Array of superloops of the loop. */ struct loop **pred; /* The outer (parent) loop or NULL if outermost loop. */ struct loop *outer; /* The first inner (child) loop or NULL if innermost loop. */ struct loop *inner; /* Link to the next (sibling) loop. */ struct loop *next; /* Loop that is copy of this loop (must be valid only just after copying the loop). */ struct loop *copy; ... /* Fields specific for the old loop optimizer are omitted. */ };
The whole loop tree (plus additional information) is stored in struct loops:
struct loops { /* Number of natural loops in the function. */ int num; /* Array of pointers to loops. Subloops should always have greater index than their parents. */ struct loop **parray; /* Pointer to root of loop hierarchy tree. */ struct loop *tree_root; ... /* Fields specific for old loop optimizer are omitted. */ };
Field loop_father
of basic_block
is a pointer to the innermost loop containing
the basic block.
Functions for building, modifying and querying the loop structure are provided
in cfgloop.c
and cfgloopanal.c
; some higher level functions for manipulating
it may be found in loop-new.c
.