The following sections describe only the differences from the Intel386 ABI.
The AMD64 architecture usually does not allow arbitrary 64-bit constants to be encoded as immediate operands of instructions. Most instructions accept 32-bit immediates that are sign-extended to 64 bits. Additionally, 32-bit operations with register destinations implicitly zero-extend their result, which makes loading a 64-bit immediate whose upper half is 0 even cheaper.
Likewise, branch instructions accept 32-bit immediate operands that are sign-extended and added to the instruction pointer. Similarly, an instruction-pointer-relative addressing mode exists for data accesses, with the same limitations.
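To make these encoding limits concrete, here is a minimal C sketch that classifies a 64-bit value by the 32-bit forms that could carry it, and checks whether a branch target is within reach of a sign-extended 32-bit displacement. The helper names are hypothetical; a constant that fits neither 32-bit form would have to be loaded with movabs.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v can be an imm32 that the CPU sign-extends to 64 bits. */
static bool fits_sign_extended_imm32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* True if v can be produced by a 32-bit register write, which
   implicitly zero-extends into the upper half of the register. */
static bool fits_zero_extended_imm32(int64_t v) {
    return v >= 0 && v <= (int64_t)UINT32_MAX;
}

/* True if a branch or %rip-relative access whose instruction ends at
   next_ip can reach target with a sign-extended 32-bit displacement.
   The unsigned difference is reinterpreted as a signed distance. */
static bool reachable_rel32(uint64_t next_ip, uint64_t target) {
    int64_t disp = (int64_t)(target - next_ip);
    return fits_sign_extended_imm32(disp);
}
```

For example, -1 fits only the sign-extended form, 0xffffffff fits only the zero-extended form, and 0x100000000 fits neither.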
In order to improve performance and reduce code size, it is desirable to use different code models depending on the requirements.
Code models define constraints for symbolic values that allow the compiler to generate better code. Code models differ mainly in addressing (absolute versus position-independent), code size, data size, and address range. We define only a small number of code models that are of general interest.
In the small code model, the virtual address of code executed is known at link time, and all symbols are located in the virtual address range $0$ to $2^{31}-2^{10}-1$. This allows the compiler to encode symbolic references with offsets in the range $-2^{31}$ to $2^{10}$ directly in sign-extended immediate operands, with offsets in the range $0$ to $2^{31}+2^{10}$ in zero-extended immediate operands, and to use instruction-pointer-relative addressing for symbols with offsets in the range $-2^{10}$ to $2^{10}$.
This is the fastest code model and we expect it to be suitable for the vast majority of programs.
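The small-model ranges can be checked mechanically. The sketch below assumes that every symbol lies in $[0, 2^{31}-2^{10}-1]$, which is the largest range consistent with the zero-extended offsets reaching $2^{31}+2^{10}$, and verifies that the extreme symbol/offset combinations still fit the corresponding 32-bit encodings. Since each range is an interval and addition is monotonic, testing the four corners covers every combination. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound on small-model symbol addresses. */
#define SMALL_SYM_MAX ((INT64_C(1) << 31) - (INT64_C(1) << 10) - 1)

static bool fits_simm32(int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }
static bool fits_uimm32(int64_t v) { return v >= 0 && v <= (int64_t)UINT32_MAX; }

/* Check symbol + offset at the corners of both offset ranges. */
static bool small_model_ranges_hold(void) {
    const int64_t syms[] = {0, SMALL_SYM_MAX};
    const int64_t soffs[] = {-(INT64_C(1) << 31), INT64_C(1) << 10};
    const int64_t uoffs[] = {0, (INT64_C(1) << 31) + (INT64_C(1) << 10)};
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            if (!fits_simm32(syms[i] + soffs[j])) return false;
            if (!fits_uimm32(syms[i] + uoffs[j])) return false;
        }
    return true;
}
```

The top corner is tight: the largest symbol plus $2^{10}$ is exactly $2^{31}-1$, and plus $2^{31}+2^{10}$ it is exactly $2^{32}-1$.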
The kernel of an operating system is usually rather small but runs in the negative half of the address space. So, in the kernel code model, we define all symbols to be in the range $2^{64}-2^{31}$ to $2^{64}-2^{10}$.
This code model has advantages similar to those of the small model, but allows encoding of zero-extended symbolic references only for offsets from $2^{31}$ to $2^{31}+2^{10}$. The range of offsets for sign-extended references changes to $0$ to $2^{31}+2^{10}$.
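The same corner check works for the kernel model if addresses are treated as unsigned 64-bit values that wrap modulo $2^{64}$, as the hardware's address arithmetic does. A minimal sketch, with hypothetical names, follows. Worked through at the corners, the sign-extended upper bound turns out to be exclusive: the highest kernel symbol plus $2^{31}+2^{10}$ lands exactly on 0x80000000, which is no longer a valid sign-extended imm32, so the sketch uses $2^{31}+2^{10}-1$.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kernel-model symbol range: 2^64 - 2^31 to 2^64 - 2^10. */
#define KERNEL_SYM_MIN (UINT64_C(0) - (UINT64_C(1) << 31)) /* 0xffffffff80000000 */
#define KERNEL_SYM_MAX (UINT64_C(0) - (UINT64_C(1) << 10)) /* 0xfffffffffffffc00 */

/* A sign-extended imm32 covers the lowest and the highest 2 GiB. */
static bool addr_fits_simm32(uint64_t a) {
    return a <= UINT64_C(0x7fffffff) || a >= UINT64_C(0xffffffff80000000);
}

/* A zero-extended imm32 covers the lowest 4 GiB. */
static bool addr_fits_uimm32(uint64_t a) {
    return a <= UINT64_C(0xffffffff);
}

/* Check the extreme symbol/offset combinations; unsigned addition
   wraps modulo 2^64. The sign-extended upper bound is exclusive. */
static bool kernel_ranges_hold(void) {
    const uint64_t syms[] = {KERNEL_SYM_MIN, KERNEL_SYM_MAX};
    const uint64_t soffs[] = {0, (UINT64_C(1) << 31) + (UINT64_C(1) << 10) - 1};
    const uint64_t uoffs[] = {UINT64_C(1) << 31,
                              (UINT64_C(1) << 31) + (UINT64_C(1) << 10)};
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            if (!addr_fits_simm32(syms[i] + soffs[j])) return false;
            if (!addr_fits_uimm32(syms[i] + uoffs[j])) return false;
        }
    return true;
}
```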
The medium code model does not make any assumptions about the range of symbolic references to data sections. The size and address of the text section have the same limits as in the small code model.
This model requires the compiler to use movabs instructions to access static data and to load addresses into registers, but keeps the advantages of the small code model for manipulating addresses in the text section (especially needed for branches).
The large code model makes no assumptions about addresses and sizes of sections.
The compiler is required to use the movabs instruction, as in the medium code model, even for dealing with addresses inside the text section. Additionally, indirect branches are needed when branching to addresses whose offset from the current instruction pointer is unknown.
It is possible to avoid the limitation on the text section by breaking up the program into multiple shared libraries, so we do not expect this model to be needed in the foreseeable future.
In the small position-independent code (PIC) model, unlike the previous models, the virtual addresses of instructions and data are not known until dynamic link time, so all addresses have to be relative to the instruction pointer.
Additionally, the maximum distance between a symbol and the end of an instruction is limited to $2^{31}-2^{10}-1$, allowing the compiler to use the instruction-pointer-relative branches and addressing modes supported by the hardware for every symbol with an offset in the range $-2^{10}$ to $2^{10}$.
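The distance limit is chosen so that the distance plus the largest allowed offset still fits the signed 32-bit displacement. A minimal sketch of that check, with hypothetical names, where the distance is measured from the end of the instruction in either direction:

```c
#include <stdbool.h>
#include <stdint.h>

#define PIC_MAX_DIST ((INT64_C(1) << 31) - (INT64_C(1) << 10) - 1)
#define PIC_MAX_OFF (INT64_C(1) << 10)

/* A %rip-relative branch or access encodes (target - end of insn)
   as a sign-extended 32-bit displacement. */
static bool fits_rel32(int64_t disp) {
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Check the extreme combinations of distance and symbol offset. */
static bool small_pic_ranges_hold(void) {
    const int64_t dists[] = {-PIC_MAX_DIST, PIC_MAX_DIST};
    const int64_t offs[] = {-PIC_MAX_OFF, PIC_MAX_OFF};
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            if (!fits_rel32(dists[i] + offs[j])) return false;
    return true;
}
```

The positive corner is exact: $(2^{31}-2^{10}-1) + 2^{10} = 2^{31}-1$.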
The medium PIC model is like the small PIC model, but makes no assumptions about the distance of symbols to the data section.
In the medium PIC model, instruction-pointer-relative addressing cannot be used directly for accessing static data, since the offset can exceed the limits of the displacement field in the instruction. Instead, a sequence consisting of movabs, lea, and add has to be used.
The large PIC model is like the medium PIC model, but makes no assumptions about the distance of symbols at all.
The large PIC model imposes the same limitation as the medium PIC model on addressing static data. Additionally, references to the global offset table, references to the procedure linkage table, and branch destinations need to be calculated in a similar way.
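The arithmetic behind the movabs, lea, and add sequence can be sketched in C. In this sketch, lea computes the GOT address from the instruction pointer with a 32-bit displacement, movabs loads a 64-bit link-time constant assumed to be the symbol's offset from the GOT base, and add combines them. The addresses and function names are hypothetical, and in the large PIC model the GOT address itself would have to be computed in a similar way rather than with a single lea.

```c
#include <stdint.h>

/* lea GOT(%rip): the GOT is assumed to be within disp32 reach of the
   code, so its address is next_ip plus a sign-extended displacement. */
static uint64_t lea_got(uint64_t next_ip, int32_t got_disp32) {
    return next_ip + (uint64_t)(int64_t)got_disp32;
}

/* movabs + add: a 64-bit constant (symbol minus GOT base), fixed at
   link time, is added to the GOT address. Neither constant depends on
   the load address, so the code stays position independent. */
static uint64_t medium_pic_address(uint64_t next_ip, int32_t got_disp32,
                                   int64_t sym_minus_got) {
    uint64_t got_base = lea_got(next_ip, got_disp32);
    return got_base + (uint64_t)sym_minus_got;
}
```

Shifting the whole object by any load offset shifts the computed address by the same amount, even when the symbol is too far away for a direct 32-bit displacement.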
AMD64 does not need any function prologue for calculating the global offset table address since it does not have an explicit GOT pointer.
This section describes only objects with static storage. Stack-resident objects are excluded, since programs always compute their virtual addresses relative to the stack or frame pointer.
Because only the movabs instruction can use 64-bit addresses directly, the code model determines whether %rip-relative addressing is used or whether the address is built in a register and memory is accessed through that register.
For absolute addresses, %rip-relative encoding can be used in the small model. In the medium model, the movabs instruction has to be used to access addresses.
Position-independent code cannot contain absolute addresses. To access a global symbol, its address has to be loaded from the Global Offset Table (GOT). The address of the GOT entry can be obtained with a %rip-relative instruction in the small model.
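Accessing a global symbol through the GOT is a double indirection: one load fetches the symbol's address from its GOT slot, and a second load fetches the value. The C sketch below models this with an ordinary array standing in for the GOT; the names are hypothetical, and in GNU assembler syntax the small-model pair typically looks like `movq counter@GOTPCREL(%rip), %rax` followed by `movl (%rax), %eax`.

```c
/* Hypothetical global and a stand-in GOT holding its address; in a
   real process the dynamic linker fills the slot at load time. */
static int counter = 41;
static void *got[] = { &counter };

/* got_slot plays the role of the %rip-relative GOT entry address. */
static int load_via_got(void *const *got_slot) {
    int *addr = (int *)*got_slot;  /* first load: address from the GOT */
    return *addr;                  /* second load: the value itself */
}
```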
Not done yet.
Some otherwise portable C programs depend on the argument passing scheme, implicitly assuming that 1) all arguments are passed on the stack, and 2) arguments appear in increasing order on the stack. Programs that make these assumptions have never been portable, but they have worked on many implementations. However, they do not work on the AMD64 architecture because some arguments are passed in registers. Portable C programs must use the header file <stdarg.h> to handle variable argument lists.
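A minimal portable example: the function below walks its variable arguments only through `<stdarg.h>`, so it works whether the caller placed them in registers or on the stack. The function name is illustrative.

```c
#include <stdarg.h>

/* Sums `count` int arguments without assuming where they were passed. */
static long sum_ints(int count, ...) {
    va_list ap;
    long total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

With eight variadic ints, only five of them can follow `count` in the six integer argument registers on AMD64, so the remainder arrive on the stack; `va_arg` hides that difference.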
When a function taking variable arguments is called, %rax must be set to the total number of floating-point parameters passed to the function in SSE registers.
The prologue of a function taking a variable argument list and known to call the macro va_start is expected to save the argument registers to the register save area. Each argument register has a fixed offset in the register save area, as defined in the figure.
Only registers that might be used to pass arguments need to be saved. Other registers are not accessed and can be used for other purposes. If a function is known never to accept arguments passed in registers, the register save area may be omitted entirely.
The prologue should use %rax to avoid unnecessarily saving XMM registers. This is especially important for integer-only programs, to avoid initializing the XMM unit.
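The save-area layout and the role of %rax can be sketched in C. The sketch places the six integer argument registers (%rdi, %rsi, %rdx, %rcx, %r8, %r9) in 8-byte slots starting at offset 0 and the SSE argument registers in 16-byte slots starting at offset 48; it assumes %xmm0 to %xmm7 are the SSE argument registers, as in the current psABI, giving a 176-byte area. The prologue function is hypothetical: it saves only as many XMM registers as %al reports, so a caller passing no floating-point arguments causes no XMM state to be touched. Real compilers may instead test %al for zero and then save all eight.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    NUM_GP_ARG_REGS = 6,   /* %rdi, %rsi, %rdx, %rcx, %r8, %r9 */
    NUM_SSE_ARG_REGS = 8,  /* %xmm0 to %xmm7 (assumption) */
    GP_SLOT = 8,
    SSE_SLOT = 16,
    SAVE_AREA_SIZE = NUM_GP_ARG_REGS * GP_SLOT + NUM_SSE_ARG_REGS * SSE_SLOT
};

/* Fixed offset of the i-th integer argument register. */
static size_t gp_save_offset(unsigned i) { return (size_t)i * GP_SLOT; }

/* Fixed offset of the i-th SSE argument register. */
static size_t sse_save_offset(unsigned i) {
    return (size_t)NUM_GP_ARG_REGS * GP_SLOT + (size_t)i * SSE_SLOT;
}

/* Hypothetical prologue: always saves the integer registers, but only
   the first `al` SSE registers. Returns the number of SSE slots saved. */
static unsigned save_arg_regs(unsigned char area[SAVE_AREA_SIZE],
                              const uint64_t gp[NUM_GP_ARG_REGS],
                              unsigned char xmm[NUM_SSE_ARG_REGS][SSE_SLOT],
                              unsigned al) {
    for (unsigned i = 0; i < NUM_GP_ARG_REGS; i++)
        memcpy(area + gp_save_offset(i), &gp[i], GP_SLOT);
    unsigned n = al < NUM_SSE_ARG_REGS ? al : NUM_SSE_ARG_REGS;
    for (unsigned i = 0; i < n; i++)
        memcpy(area + sse_save_offset(i), xmm[i], SSE_SLOT);
    return n;
}
```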
The va_list type is an array containing a single element of a structure that holds the information needed to implement the va_arg macro. The C definition of the va_list type is given in the figure.
The va_start macro initializes the structure as follows:
The algorithm for the generic va_arg(l, type) implementation is defined as follows:
The va_arg macro is usually implemented as a compiler builtin and expanded in simplified forms for each particular type. Figure is a sample implementation of the va_arg macro.
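For illustration, here is a self-contained C sketch of the va_list element and simplified va_start and va_arg operations for 8-byte integer and double arguments. The structure fields follow the ABI's va_list element; the `sketch_` names are hypothetical, chosen to avoid clashing with the real `<stdarg.h>`. The sketch assumes a save area holding six 8-byte integer slots followed by eight 16-byte SSE slots, so `gp_offset` is exhausted at 48 and `fp_offset` at 176. Arguments larger than one slot, structures, and over-aligned types are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Element type of va_list, as defined by the ABI. */
typedef struct {
    unsigned int gp_offset;   /* offset of next integer slot in reg_save_area */
    unsigned int fp_offset;   /* offset of next SSE slot in reg_save_area */
    void *overflow_arg_area;  /* next argument passed on the stack */
    void *reg_save_area;      /* start of the register save area */
} sketch_va_list[1];

enum { GP_END = 6 * 8, FP_END = GP_END + 8 * 16 }; /* 48 and 176 */

/* named_gp/named_fp: named parameters that consumed integer/SSE
   argument registers; exhausted counters are clamped to the end. */
static void sketch_va_start(sketch_va_list ap, void *reg_save_area,
                            void *first_stack_arg,
                            unsigned named_gp, unsigned named_fp) {
    ap->reg_save_area = reg_save_area;
    ap->overflow_arg_area = first_stack_arg;
    ap->gp_offset = named_gp < 6 ? named_gp * 8 : GP_END;
    ap->fp_offset = named_fp < 8 ? GP_END + named_fp * 16 : FP_END;
}

/* va_arg for an 8-byte integer argument. */
static int64_t sketch_va_arg_i64(sketch_va_list ap) {
    int64_t v;
    if (ap->gp_offset <= GP_END - 8) {          /* a register slot is left */
        memcpy(&v, (char *)ap->reg_save_area + ap->gp_offset, sizeof v);
        ap->gp_offset += 8;
    } else {                                    /* fetch from the stack */
        memcpy(&v, ap->overflow_arg_area, sizeof v);
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
    }
    return v;
}

/* va_arg for a double argument. */
static double sketch_va_arg_double(sketch_va_list ap) {
    double v;
    if (ap->fp_offset <= FP_END - 16) {         /* an SSE slot is left */
        memcpy(&v, (char *)ap->reg_save_area + ap->fp_offset, sizeof v);
        ap->fp_offset += 16;
    } else {                                    /* fetch from the stack */
        memcpy(&v, ap->overflow_arg_area, sizeof v);
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
    }
    return v;
}
```

Because va_list is a one-element array, passing it to a function passes a pointer to the structure, so callees update the caller's state in place.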