This section describes the standard function calling sequence,
including stack frame layout, register usage, parameter passing and so
on.
The standard calling sequence requirements apply only to global
functions. Local functions that are not reachable from other
compilation units may use different conventions. Nevertheless, it is
recommended that all functions use the standard calling sequence when
possible.
Registers and the Stack Frame
The AMD64 architecture provides 16 general purpose 64-bit registers.
In addition, the architecture provides 16 SSE registers, each 128 bits
wide, and 8 x87 floating point registers, each 80 bits wide. Each of
the x87 floating point registers may be referred to in MMX/3DNow!
mode as a 64-bit register. All of these registers are global to all
procedures in a running program.
This subsection discusses usage of each register. Registers %rbp, %rbx and
%r12 through %r15 "belong" to the calling function, and the
called function is required to preserve their values. In other words,
a called function must preserve these registers' values for its
caller. Remaining registers "belong" to the called
function.[3.5] If a
calling function wants to preserve such a register value across a
function call, it must save the value in its local stack frame.
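The ownership rule can be expressed as a simple lookup. In this minimal sketch the function name survives_call is hypothetical; it only encodes the callee-preserved registers listed above and treats every other register name as belonging to the called function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Registers the called function must preserve, as listed in the text.
 * All remaining registers "belong" to the called function. */
static const char *const callee_saved[] = {
    "%rbp", "%rbx", "%r12", "%r13", "%r14", "%r15"
};

/* Hypothetical helper: nonzero if a caller may rely on `reg` keeping
 * its value across a call without saving it in its own frame. */
static int survives_call(const char *reg)
{
    size_t i;

    for (i = 0; i < sizeof callee_saved / sizeof callee_saved[0]; i++)
        if (strcmp(reg, callee_saved[i]) == 0)
            return 1;
    return 0;
}
```

A caller that needs, say, a value in %rdi after a call must therefore spill it to its stack frame, while a value in %rbx can stay put.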
The CPU shall be in x87 mode upon entry to a function. Therefore,
every function that uses the MMX registers is required to issue
an emms or femms instruction after using the MMX registers, before
returning or calling another function.[3.6] The direction flag in the %eflags
register must be clear on function entry and on function return.
In addition to registers, each function has a frame on the run-time
stack. This stack grows downwards from high addresses. The figure
"Stack Frame with Base Pointer" shows the stack organization.
Figure: Stack Frame with Base Pointer
The end of the input argument area shall be aligned on a 16-byte
boundary. In other words, the value (%rsp + 8) is always a multiple
of 16 when control is transferred to the function entry point. The
stack pointer, %rsp, always points to the end of the latest allocated
stack frame.[3.7]
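The alignment rule reduces to simple arithmetic. The sketch below models %rsp as an integer rather than reading the real register; the helper names rsp_at_entry and entry_is_aligned are hypothetical. It assumes the caller's %rsp is 16-byte aligned at the call instruction, so the 8-byte return address pushed by call leaves (%rsp + 8) aligned at the callee's entry point.

```c
#include <assert.h>
#include <stdint.h>

/* The caller keeps %rsp 16-byte aligned at the call instruction; `call`
 * then pushes an 8-byte return address before the callee runs. */
static uint64_t rsp_at_entry(uint64_t rsp_at_call)
{
    assert(rsp_at_call % 16 == 0);   /* caller-side precondition */
    return rsp_at_call - 8;
}

/* At entry the return address occupies 0(%rsp), so the end of the input
 * argument area is at %rsp + 8, which must be a multiple of 16. */
static int entry_is_aligned(uint64_t rsp)
{
    return (rsp + 8) % 16 == 0;
}
```

Note that %rsp itself is 8 modulo 16 at entry; a function that pushes %rbp in its prologue restores 16-byte alignment of %rsp.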
The 128-byte area beyond the location pointed to by %rsp is considered
to be reserved and shall not be modified by signal or interrupt
handlers.[3.8] Therefore, functions may use this area for
temporary data that is not needed across function calls. In
particular, leaf functions may use this area for their entire stack
frame, rather than adjusting the stack pointer in the prologue and
epilogue.
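The reserved area (often called the red zone) is the address range [%rsp - 128, %rsp). A minimal sketch, with hypothetical helper names and integers standing in for real addresses:

```c
#include <assert.h>
#include <stdint.h>

#define RED_ZONE_SIZE 128u  /* bytes below %rsp that handlers must not touch */

/* Nonzero if addr lies in the reserved area [rsp - 128, rsp). */
static int in_red_zone(uint64_t addr, uint64_t rsp)
{
    assert(rsp >= RED_ZONE_SIZE);
    return addr >= rsp - RED_ZONE_SIZE && addr < rsp;
}

/* A leaf function whose locals need at most 128 bytes can keep its whole
 * frame in this area and skip adjusting %rsp in prologue and epilogue. */
static int leaf_frame_fits_red_zone(uint64_t frame_bytes)
{
    return frame_bytes <= RED_ZONE_SIZE;
}
```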
Parameter Passing
After the argument values have been computed, they are placed in
registers or pushed on the stack. How values are passed is
described in the following sections.
We first define a number of classes to classify arguments. The
classes correspond to AMD64 register classes and are defined as follows:
- INTEGER: This class consists of integral types that fit into one of
the general purpose registers.
- SSE: This class consists of types that fit into an SSE register.
- SSEUP: This class consists of types that fit into an SSE register
and can be passed and returned in its most significant half.
- X87, X87UP: These classes consist of types that will be returned via
the x87 FPU.
- COMPLEX_X87: This class consists of types that will be returned
via the x87 FPU.
- NO_CLASS: This class is used as initializer in the algorithms. It
is used for padding and empty structures and unions.
- MEMORY: This class consists of types that will be passed and
returned in memory via the stack.
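A classifier needs a representation of these classes. One possible sketch in C uses an enumeration; the enumerator names (CLASS_INTEGER and so on) and the helper class_name are illustrative choices, not names defined by this ABI.

```c
#include <assert.h>
#include <string.h>

/* NO_CLASS is listed first so that a zero-initialized eightbyte starts
 * out as NO_CLASS, matching its role as the initializer. */
enum arg_class {
    CLASS_NO_CLASS,
    CLASS_INTEGER,
    CLASS_SSE,
    CLASS_SSEUP,
    CLASS_X87,
    CLASS_X87UP,
    CLASS_COMPLEX_X87,
    CLASS_MEMORY
};

/* Printable name of a class, e.g. for tracing a classifier. */
static const char *class_name(enum arg_class c)
{
    switch (c) {
    case CLASS_NO_CLASS:    return "NO_CLASS";
    case CLASS_INTEGER:     return "INTEGER";
    case CLASS_SSE:         return "SSE";
    case CLASS_SSEUP:       return "SSEUP";
    case CLASS_X87:         return "X87";
    case CLASS_X87UP:       return "X87UP";
    case CLASS_COMPLEX_X87: return "COMPLEX_X87";
    case CLASS_MEMORY:      return "MEMORY";
    }
    return "unknown";
}
```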
The size of each argument is rounded up to a whole number of
eightbytes.[3.9]
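The rounding is the usual round-up-to-a-multiple-of-8 computation. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Round an argument size up to a whole number of eightbytes (8-byte units). */
static size_t round_up_to_eightbytes(size_t size)
{
    return (size + 7) & ~(size_t)7;
}

/* Number of eightbytes an argument occupies after rounding. */
static size_t eightbyte_count(size_t size)
{
    return round_up_to_eightbytes(size) / 8;
}
```

For example, a 12-byte structure occupies 16 bytes, that is, two eightbytes.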
The basic types are assigned their natural classes:
The classification of aggregate (structures and arrays) and union
types works as follows:
- If the size of an object is larger than two eightbytes, or
in C++, is a non-POD[3.10] structure or union type, or contains unaligned fields, it has class
MEMORY.[3.11]
- Both eightbytes get initialized to class NO_CLASS.
- Each field of an object is classified recursively so that always
two fields are considered. The class of each eightbyte is computed by
merging the classes of the fields it contains:
  - If both classes are equal, this is the resulting class.
  - If one of the classes is NO_CLASS, the resulting class is the other class.
  - If one of the classes is MEMORY, the result is the MEMORY class.
  - If one of the classes is INTEGER, the result is INTEGER.
  - If one of the classes is X87, X87UP or COMPLEX_X87,
  MEMORY is used as class.
  - Otherwise class SSE is used.
- Then a post-merger cleanup is done:
  - If one of the classes is MEMORY, the whole argument is passed in memory.
  - If SSEUP is not preceded by SSE, it is converted to SSE.
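The merge rules and the post-merger cleanup can be sketched in C as follows. This is a simplified, hypothetical model rather than a complete classifier: it assumes aggregates have already been flattened into scalar fields, each carrying its natural class and lying entirely within one eightbyte, and it does not check the non-POD or unaligned-field conditions.

```c
#include <assert.h>
#include <stddef.h>

enum arg_class {
    CLASS_NO_CLASS, CLASS_INTEGER, CLASS_SSE, CLASS_SSEUP,
    CLASS_X87, CLASS_X87UP, CLASS_COMPLEX_X87, CLASS_MEMORY
};

static int is_x87_class(enum arg_class c)
{
    return c == CLASS_X87 || c == CLASS_X87UP || c == CLASS_COMPLEX_X87;
}

/* Merge two classes, applying the rules above in order. */
static enum arg_class merge_classes(enum arg_class a, enum arg_class b)
{
    if (a == b)
        return a;
    if (a == CLASS_NO_CLASS)
        return b;
    if (b == CLASS_NO_CLASS)
        return a;
    if (a == CLASS_MEMORY || b == CLASS_MEMORY)
        return CLASS_MEMORY;
    if (a == CLASS_INTEGER || b == CLASS_INTEGER)
        return CLASS_INTEGER;
    if (is_x87_class(a) || is_x87_class(b))
        return CLASS_MEMORY;
    return CLASS_SSE;
}

/* Hypothetical flattened field: byte offset plus its natural class. */
struct field {
    size_t offset;
    enum arg_class cls;
};

/* Classify an aggregate of `size` bytes into two eightbyte classes. */
static void classify_aggregate(const struct field *fields, size_t nfields,
                               size_t size, enum arg_class out[2])
{
    size_t i;

    if (size > 16) {                       /* larger than two eightbytes */
        out[0] = out[1] = CLASS_MEMORY;
        return;
    }
    out[0] = out[1] = CLASS_NO_CLASS;      /* initialize both eightbytes */
    for (i = 0; i < nfields; i++) {
        size_t eb = fields[i].offset / 8;  /* eightbyte holding the field */
        assert(eb < 2);
        out[eb] = merge_classes(out[eb], fields[i].cls);
    }
    /* Post-merger cleanup. */
    if (out[0] == CLASS_MEMORY || out[1] == CLASS_MEMORY) {
        out[0] = out[1] = CLASS_MEMORY;
        return;
    }
    if (out[0] == CLASS_SSEUP)
        out[0] = CLASS_SSE;
    if (out[1] == CLASS_SSEUP && out[0] != CLASS_SSE)
        out[1] = CLASS_SSE;
}
```

For instance, struct { int a, b; double d; } yields INTEGER for the first eightbyte (two INTEGER fields) and SSE for the second, while struct { float f; int i; } merges SSE with INTEGER into a single INTEGER eightbyte.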
Once arguments are classified, the registers get assigned (in
left-to-right order) for passing as follows:
- If the class is MEMORY, pass the argument on the stack.
- If the class is INTEGER, the next available register of the
sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is
used.[3.12]
- If the class is SSE, the next available SSE register is used; the
registers are taken in order from %xmm0 to %xmm7.
- If the class is SSEUP, the eightbyte is passed in the upper
half of the last used SSE register.
- If the class is X87, X87UP or COMPLEX_X87, it is passed in memory.
Figure: Register Usage
If no register is available for any eightbyte of an
argument, the whole argument is passed on the stack. If registers have
already been assigned for some eightbytes of this argument, those
assignments are reverted.
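Because of the all-or-nothing rule, an implementation can count the registers an argument needs before committing any of them; declining to consume registers has the same effect as reverting partial assignments. The sketch below assumes arguments of at most two already-classified eightbytes and uses hypothetical names (struct reg_state, assign_argument).

```c
#include <assert.h>
#include <stddef.h>

enum arg_class {
    CLASS_NO_CLASS, CLASS_INTEGER, CLASS_SSE, CLASS_SSEUP,
    CLASS_X87, CLASS_X87UP, CLASS_COMPLEX_X87, CLASS_MEMORY
};

enum arg_location { LOC_REGISTERS, LOC_STACK };

/* Registers still free for argument passing. */
struct reg_state {
    int gpr_left;   /* starts at 6: %rdi, %rsi, %rdx, %rcx, %r8, %r9 */
    int sse_left;   /* starts at 8: %xmm0 .. %xmm7 */
};

/* Decide where one argument goes, given the classes of its n eightbytes.
 * Registers are consumed only if every eightbyte gets one. */
static enum arg_location assign_argument(struct reg_state *st,
                                         const enum arg_class *cls, size_t n)
{
    int need_gpr = 0, need_sse = 0;
    size_t i;

    assert(n <= 2);
    for (i = 0; i < n; i++) {
        assert(cls[i] != CLASS_NO_CLASS);  /* sketch expects classified eightbytes */
        switch (cls[i]) {
        case CLASS_INTEGER: need_gpr++; break;
        case CLASS_SSE:     need_sse++; break;
        case CLASS_SSEUP:   break;  /* upper half of the last used SSE register */
        default:            return LOC_STACK;  /* MEMORY, X87, X87UP, COMPLEX_X87 */
        }
    }
    if (need_gpr > st->gpr_left || need_sse > st->sse_left)
        return LOC_STACK;               /* nothing was taken, so nothing to revert */
    st->gpr_left -= need_gpr;
    st->sse_left -= need_sse;
    return LOC_REGISTERS;
}
```

With one general purpose register left, a two-INTEGER structure goes to the stack and the remaining register is still available for a later INTEGER argument.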
Once registers are assigned, the arguments passed in memory are pushed
on the stack in reverse (right-to-left[3.13]) order.
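Pushing right-to-left leaves the leftmost memory argument at the lowest address. The following sketch simulates the pushes to compute each argument's offset from %rsp just before the call. It assumes no argument needs more than eightbyte alignment (stricter alignment requirements are ignored), and its helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round a size up to a whole number of eightbytes. */
static size_t round8(size_t size)
{
    return (size + 7) & ~(size_t)7;
}

/* sizes[0..n-1] are the stack-passed arguments in source (left-to-right)
 * order. Simulates pushing them right-to-left and stores in offsets[i] the
 * offset of argument i from %rsp after all pushes, i.e. just before the
 * call. Returns the total size of the argument area. At callee entry the
 * return address sits below them, so argument i is at offsets[i] + 8. */
static size_t stack_offsets(const size_t *sizes, size_t n, size_t *offsets)
{
    size_t total = 0, sp, i;

    for (i = 0; i < n; i++)
        total += round8(sizes[i]);
    sp = total;                    /* relative %rsp before any push */
    for (i = n; i-- > 0; ) {       /* rightmost argument is pushed first */
        sp -= round8(sizes[i]);
        offsets[i] = sp;
    }
    assert(sp == 0);               /* leftmost argument ends up at 0(%rsp) */
    return total;
}
```

Since the first memory argument always lands at 0(%rsp), its address is known statically regardless of how many arguments follow it.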
For calls that may target functions that use varargs or stdargs
(prototype-less calls or calls to functions containing an ellipsis
(...) in the declaration), %al[3.14] is used as a hidden argument to specify
the number of SSE registers used. The contents of %al do not need to
match the number of registers exactly, but must be an upper bound on
the number of SSE registers used and must be in the range 0 to 8 inclusive.
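At the C level the %al convention is invisible: the compiler emits it at call sites of variadic or unprototyped functions. The ordinary variadic function below shows the kind of call affected; the function sum_doubles is hypothetical, and the comments describe what a compiler following this ABI is expected to do rather than anything the C code can observe.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` double arguments passed through the ellipsis.
 * For a call such as sum_doubles(3, 1.0, 2.0, 3.5), a compiler following
 * this ABI passes count in %edi, the doubles in %xmm0..%xmm2, and sets %al
 * to an upper bound on the SSE registers used (typically the exact count,
 * here 3). The callee's va_start machinery may use %al to decide how many
 * SSE argument registers to save. */
static double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, double);   /* each double comes from an SSE slot or the stack */
    va_end(ap);
    return total;
}
```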
The returning of values is done according to the following algorithm:
- Classify the return type with the classification algorithm.
- If the type has class MEMORY, then the caller provides space for
the return value and passes the address of this storage in %rdi as
if it were the first argument to the function. In effect, this
address becomes a "hidden" first argument.
On return, %rax will contain the address that was passed in by
the caller in %rdi.
- If the class is INTEGER, the next available register of the
sequence %rax, %rdx is used.
- If the class is SSE, the next available SSE register of the
sequence %xmm0, %xmm1 is used.
- If the class is SSEUP, the eightbyte is returned in the upper half of the
last used SSE register.
- If the class is X87, the value is returned on the X87 stack in
%st0 as an 80-bit x87 number.
- If the class is X87UP, the value is returned together with the
previous X87 value in %st0.
- If the class is COMPLEX_X87, the real part of the value is
returned in %st0 and the imaginary part in %st1.
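The return rules map each eightbyte class to a location. A sketch under stated assumptions: at most two eightbytes per value, the classes already computed, and hypothetical names (return_registers and the CLASS_* enumerators).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum arg_class {
    CLASS_NO_CLASS, CLASS_INTEGER, CLASS_SSE, CLASS_SSEUP,
    CLASS_X87, CLASS_X87UP, CLASS_COMPLEX_X87, CLASS_MEMORY
};

/* Fill regs[i] with where eightbyte i of a return value classified as
 * cls[0..n-1] is returned. */
static void return_registers(const enum arg_class *cls, size_t n,
                             const char **regs)
{
    static const char *const int_regs[] = { "%rax", "%rdx" };
    static const char *const sse_regs[] = { "%xmm0", "%xmm1" };
    size_t i, next_int = 0, next_sse = 0;

    assert(n <= 2);
    for (i = 0; i < n; i++) {
        switch (cls[i]) {
        case CLASS_MEMORY:
            regs[i] = "memory";          /* caller's buffer; address returned in %rax */
            break;
        case CLASS_INTEGER:
            regs[i] = int_regs[next_int++];
            break;
        case CLASS_SSE:
            regs[i] = sse_regs[next_sse++];
            break;
        case CLASS_SSEUP:
            assert(next_sse > 0);        /* cleanup guarantees a preceding SSE */
            regs[i] = sse_regs[next_sse - 1];   /* upper half of last used register */
            break;
        case CLASS_X87:
        case CLASS_X87UP:
            regs[i] = "%st0";            /* X87UP travels with the X87 value */
            break;
        case CLASS_COMPLEX_X87:
            regs[i] = "%st0 and %st1";   /* real part, imaginary part */
            break;
        default:
            regs[i] = "none";            /* NO_CLASS padding */
            break;
        }
    }
}
```

For example, a structure classified as SSE followed by INTEGER comes back in %xmm0 and %rax, because the INTEGER and SSE sequences are consumed independently.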
As an example of the register passing conventions, consider the
declarations and the function call shown in the figure
"Parameter Passing Example". The corresponding register
allocation is given in the figure "Register Allocation Example"; the
stack frame offsets given there show the frame before the function is called.
Figure: Parameter Passing Example
Figure: Register Allocation Example
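To illustrate how the passing rules combine, the following sketch allocates locations for a hypothetical call. It assumes the declarations typedef struct { int a, b; double d; } structparm; and void func(int e, int f, structparm s, int g, int h, long double ld, double m, double n, int i, int j, int k);, with eightbyte classes computed by hand (s is INTEGER then SSE; ld is X87 then X87UP). Stack offsets are relative to %rsp before the call and ignore alignment beyond eightbytes; all names in the code are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only the classes this example needs. */
enum arg_class { CLASS_INTEGER, CLASS_SSE, CLASS_X87, CLASS_X87UP };

/* One argument: name, size in bytes, and classes of its n eightbytes. */
struct arg {
    const char *name;
    size_t size;
    size_t n;
    enum arg_class cls[2];
};

static const char *const gprs[] = { "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9" };
static const char *const xmms[] = { "%xmm0", "%xmm1", "%xmm2", "%xmm3",
                                    "%xmm4", "%xmm5", "%xmm6", "%xmm7" };

/* Assign locations left to right. loc[i][j] names the register holding
 * eightbyte j of argument i, or NULL if argument i is passed on the stack,
 * in which case stack_off[i] is its offset from %rsp before the call. */
static void allocate(const struct arg *args, size_t nargs,
                     const char *loc[][2], size_t *stack_off)
{
    size_t i, j, gpr = 0, sse = 0, off = 0;

    for (i = 0; i < nargs; i++) {
        size_t need_gpr = 0, need_sse = 0;
        int memory = 0;

        for (j = 0; j < args[i].n; j++) {
            if (args[i].cls[j] == CLASS_INTEGER)
                need_gpr++;
            else if (args[i].cls[j] == CLASS_SSE)
                need_sse++;
            else
                memory = 1;        /* X87 and X87UP are passed in memory */
        }
        if (memory || gpr + need_gpr > 6 || sse + need_sse > 8) {
            loc[i][0] = loc[i][1] = NULL;
            stack_off[i] = off;    /* leftmost memory argument at lowest address */
            off += (args[i].size + 7) & ~(size_t)7;
            continue;
        }
        for (j = 0; j < args[i].n; j++)
            loc[i][j] = args[i].cls[j] == CLASS_INTEGER ? gprs[gpr++] : xmms[sse++];
        if (args[i].n == 1)
            loc[i][1] = NULL;
    }
}
```

Under these assumptions e, f, s.a/s.b, g, h and i take %rdi, %rsi, %rdx, %rcx, %r8 and %r9; s.d, m and n take %xmm0 to %xmm2; ld, j and k are passed on the stack at offsets 0, 16 and 24.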
Footnotes
- [3.5] Note that in contrast to the Intel386 ABI, %rdi
and %rsi belong to the called function, not the caller.
- [3.6] All x87 registers are caller-saved, so
callees that make use of the MMX registers may use the faster
femms instruction.
- [3.7] The conventional use of %rbp as a frame
pointer for the stack frame may be avoided by using %rsp (the stack
pointer) to index into the stack frame. This technique saves two
instructions in the prologue and epilogue and makes one additional
general-purpose register (%rbp) available.
- [3.8] Locations within 128 bytes can be addressed using
one-byte displacements.
- [3.9] Therefore the stack will always be eightbyte aligned.
- [3.10] The term POD is from the ANSI/ISO C++ Standard, and
stands for Plain Old Data. Although the exact definition is
technical, a POD is essentially a structure or union that could
have been written in C; there cannot be any member
functions, or base classes, or similar C++ extensions.
- [3.11] A non-POD object cannot be passed in registers
because such objects must have well-defined addresses; the address
at which an object is constructed (by the caller) and the address
at which the object is destroyed (by the callee) must be the same.
Similar issues apply when returning a non-POD object from a
function.
- [3.12] Note that %r11 is neither required to be
preserved, nor is it used to pass arguments. Making this register
available as a scratch register means that code in the PLT need not
spill any registers when computing the address to which control
needs to be transferred. %rax is used to indicate the number of
SSE arguments passed to a function requiring a variable number of
arguments. %r10 is used for passing a function's static chain
pointer.
- [3.13] Right-to-left order
on the stack makes the handling of functions that take a variable
number of arguments simpler. The location of the first argument can
always be computed statically, based on the type of that argument.
It would be difficult to compute the address of the first argument
if the arguments were pushed in left-to-right order.
- [3.14] Note that the rest of %rax is undefined; only the contents
of %al are defined.
Jan Hubicka
2003-05-04