VEX prefix

The VEX prefix (from "vector extensions") and VEX coding scheme are an extension to the x86 and x86-64 instruction set architecture for microprocessors from Intel, AMD and others.

Features

The VEX coding scheme allows the definition of new instructions and the extension or modification of previously existing instruction codes. This serves the following purposes:

The opcode map is extended to make space for future instructions.
It allows instruction codes to have up to four operands (plus immediate), where the original scheme allows only two operands (plus immediate).
It allows the size of SIMD vector registers to be extended from the 128-bit XMM registers to 256-bit registers named YMM. There is room for further extensions of the register size.
It allows existing two-operand instructions to be modified into non-destructive three-operand forms where the destination register is different from both source registers. For example, c = a + b instead of a = a + b (where register a is changed by the instruction).

The VEX prefix replaces the most commonly used instruction prefix bytes and escape codes. In many cases, the number of prefix bytes and escape bytes that are replaced is the same as the number of bytes in the VEX prefix, so that the total length of the VEX-encoded instruction is the same as the length of the legacy instruction code. In other cases, the VEX-encoded version is longer or shorter than the legacy code. In 32-bit mode, VEX-encoded instructions can access only the first 8 YMM/XMM registers, because the encodings for the other registers would be interpreted as the legacy LDS and LES instructions (which are not supported in 64-bit mode).
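As a rough illustration of that length comparison, the following Python sketch counts the bytes that precede the opcode byte in each scheme. The helper names (`legacy_overhead`, `vex_overhead`) and the three sample instructions are illustrative choices, not taken from the text above. The sketch also assumes that the two-byte VEX form is usable only when the escape code is 0F and none of the X, B or W bits is needed.

```python
# Illustrative sketch: bytes that precede the opcode byte in each encoding.
ESCAPE_LENGTH = {"0F": 1, "0F 38": 2, "0F 3A": 2}


def legacy_overhead(simd_prefix: bool, escape: str, rex: bool) -> int:
    """Legacy encoding: optional 66/F2/F3 prefix, optional REX, escape bytes."""
    return int(simd_prefix) + int(rex) + ESCAPE_LENGTH[escape]


def vex_overhead(escape: str, needs_x_b_or_w: bool) -> int:
    """VEX encoding: two bytes when the short form suffices, otherwise three."""
    return 2 if escape == "0F" and not needs_x_b_or_w else 3


# (legacy bytes, VEX bytes) for three hand-picked register-to-register cases.
examples = {
    "addps xmm0, xmm1": (legacy_overhead(False, "0F", False), vex_overhead("0F", False)),
    "paddd xmm0, xmm1": (legacy_overhead(True, "0F", False), vex_overhead("0F", False)),
    "pshufb xmm0, xmm8": (legacy_overhead(True, "0F 38", True), vex_overhead("0F 38", True)),
}

for name, (legacy, vex) in examples.items():
    verdict = "same" if legacy == vex else ("longer" if vex > legacy else "shorter")
    print(f"{name}: legacy {legacy} bytes, VEX {vex} bytes ({verdict})")
```

The three cases were picked so that the VEX form comes out one byte longer, the same length, and one byte shorter than the legacy form.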

The two-byte VEX prefix contains the following components:

A bit named R̅, similar to the REX.R prefix bit used in the x86-64 instruction set extension.
Four bits named v̅, specifying a second source register operand.
A bit named L specifying 256-bit vector length.
Two bits named p to replace operand size prefixes and operand type prefixes (66, F2, F3).
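A minimal sketch of packing these fields into the two prefix bytes. It assumes the layout R̅, v̅ (four bits), L, p (two bits) from the most to the least significant bit, a leading byte of C5h, and that the overbarred fields are stored as the bitwise inverse of the logical value. The function name and the convention of passing logical (non-inverted) values are choices made for this example.

```python
def encode_vex2(r: int, vvvv: int, l: int, pp: int) -> bytes:
    """Build a two-byte VEX prefix from logical (non-inverted) field values."""
    if r not in (0, 1) or l not in (0, 1):
        raise ValueError("r and l are single bits")
    if not 0 <= vvvv <= 15 or not 0 <= pp <= 3:
        raise ValueError("vvvv is 4 bits wide, pp is 2 bits wide")
    byte1 = ((r ^ 1) << 7) | ((~vvvv & 0xF) << 3) | (l << 2) | pp
    return bytes([0xC5, byte1])


# No implied prefix, 128-bit length, second source register 0.
print(encode_vex2(r=0, vvvv=0, l=0, pp=0).hex())
# Implied 66 prefix (pp=1), 256-bit length, second source register 1.
print(encode_vex2(r=0, vvvv=1, l=1, pp=1).hex())
```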

The three-byte VEX prefix additionally contains:

Three bits named X̅, B̅ and W, similar to the corresponding bits in the REX prefix.
Five bits named m. Two of the m bits are used for replacing existing escape codes and for specifying the length of the instruction. The remaining three m bits are reserved for future use, such as specifying vector lengths greater than 256 bits, specifying different instruction lengths, or extending the opcode space. In 2013, however, Intel chose to introduce a new encoding scheme, the EVEX prefix, rather than use the remaining m bits.
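The three-byte form can be sketched in the same way. This example assumes a leading byte of C4h, a second byte laid out as R̅, X̅, B̅, m (five bits), and a third byte laid out as W, v̅ (four bits), L, p (two bits). As before, the function name and the use of logical (non-inverted) arguments are assumptions of the sketch.

```python
def encode_vex3(r: int, x: int, b: int, mmmmm: int,
                w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Build a three-byte VEX prefix from logical (non-inverted) field values."""
    for bit in (r, x, b, w, l):
        if bit not in (0, 1):
            raise ValueError("r, x, b, w and l are single bits")
    if not 0 <= mmmmm <= 31 or not 0 <= vvvv <= 15 or not 0 <= pp <= 3:
        raise ValueError("field value out of range")
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | mmmmm
    byte2 = (w << 7) | ((~vvvv & 0xF) << 3) | (l << 2) | pp
    return bytes([0xC4, byte1, byte2])


# Escape code 0F 38 (m = 2), B set, implied 66 prefix, 128-bit length.
print(encode_vex3(r=0, x=0, b=1, mmmmm=2, w=0, vvvv=0, l=0, pp=1).hex())
```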

Technical description

Intel 64 instruction format using VEX prefix
Field	[Prefixes]	[VEX]	OPCODE	ModR/M	[SIB]	[DISP]	[IMM]
# of bytes		0,2,3	1	1	0,1	0,1,2,4	0,1

The VEX coding scheme uses a code prefix consisting of two or three bytes, which is added to existing or new instruction codes.^[1]

In the x86 architecture, instructions with a memory operand may use the ModR/M byte, which specifies the addressing mode. This byte has three bit fields:

mod, bits [7:6] - combined with the r/m field, encodes either 8 registers or 24 addressing modes. Also encodes opcode information for some instructions.
reg/opcode, bits [5:3] - depending on primary opcode byte, specifies either a register or three more bits of opcode information.
r/m, bits [2:0] - can specify a register as an operand, or combine with the mod field to encode an addressing mode.
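The three fields can be pulled apart with shifts and masks. This is a small sketch; the function name `split_modrm` is chosen here for illustration.

```python
def split_modrm(byte: int) -> tuple[int, int, int]:
    """Return (mod, reg_or_opcode, rm) for a ModR/M byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M is a single byte")
    mod = (byte >> 6) & 0b11      # bits [7:6]
    reg = (byte >> 3) & 0b111     # bits [5:3]
    rm = byte & 0b111             # bits [2:0]
    return mod, reg, rm


mod, reg, rm = split_modrm(0xC1)
print(mod, reg, rm)  # mod == 3 means r/m names a register, not memory
```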

The base-plus-index and scale-plus-index forms of 32-bit addressing (encoded with r/m = 100 and mod ≠ 11) require another addressing byte, the SIB byte. It has the following fields:

scale factor, encoded with bits [7:6]
index register, bits [5:3]
base register, bits [2:0].
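A sketch of the SIB layout, together with the condition under which a SIB byte follows ModR/M. The function names are illustrative. The two scale bits are decoded as a power of two (1, 2, 4 or 8), which is how x86 defines the scale factor.

```python
def needs_sib(modrm: int) -> bool:
    """A SIB byte follows when r/m == 100 and mod != 11."""
    mod, rm = (modrm >> 6) & 0b11, modrm & 0b111
    return mod != 0b11 and rm == 0b100


def split_sib(byte: int) -> tuple[int, int, int]:
    """Return (scale_factor, index, base) for a SIB byte."""
    scale_factor = 1 << ((byte >> 6) & 0b11)   # bits [7:6] encode 1, 2, 4 or 8
    index = (byte >> 3) & 0b111                # bits [5:3]
    base = byte & 0b111                        # bits [2:0]
    return scale_factor, index, base


print(needs_sib(0x04), split_sib(0x88))
```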

To use 64-bit addressing and the additional registers present in the x86-64 architecture, the REX prefix was introduced; it provides additional space for encoding addressing modes. Bit-field W changes the operand size to 64 bits, R expands reg to 4 bits, B expands r/m (or opreg in the few opcodes such as "POP reg" which encode the register number in their 3 lowest opcode bits), and X and B expand index and base in the SIB byte. However, the REX prefix is encoded quite inefficiently, wasting half of its 8 bits.
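The following sketch decodes a REX byte and shows how one of its bits extends a 3-bit register field to 4 bits. The names `decode_rex` and `extend_register` are illustrative. The fixed high nibble 0100 is the half of the byte that carries no information.

```python
def decode_rex(byte: int):
    """Return the W, R, X, B bits of a REX prefix, or None if not a REX byte."""
    if (byte & 0xF0) != 0x40:      # high nibble must be 0100
        return None
    return {
        "W": (byte >> 3) & 1,
        "R": (byte >> 2) & 1,
        "X": (byte >> 1) & 1,
        "B": byte & 1,
    }


def extend_register(high_bit: int, low3: int) -> int:
    """Combine a REX extension bit with a 3-bit field into a 4-bit register number."""
    return (high_bit << 3) | low3


rex = decode_rex(0x41)
print(rex, extend_register(rex["B"], 0b000))
```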

REX and VEX encoding
REX
	7	6	5	4	3	2	1	0
Byte 0	0	1	0	0	W	R	X	B
3-byte VEX
	7	6	5	4	3	2	1	0
Byte 0 (C4h)	1	1	0	0	0	1	0	0
Byte 1	R̅	X̅	B̅	m₄	m₃	m₂	m₁	m₀
Byte 2	W	v̅₃	v̅₂	v̅₁	v̅₀	L	p₁	p₀
2-byte VEX
	7	6	5	4	3	2	1	0
Byte 0 (C5h)	1	1	0	0	0	1	0	1
Byte 1	R̅	v̅₃	v̅₂	v̅₁	v̅₀	L	p₁	p₀

The VEX prefix provides a compact representation of the REX prefix, as well as various other prefixes, to expand the addressing mode, register enumeration and operand size and width:

The R̅, X̅ and B̅ bits are the inverses of the REX prefix's R, X and B bits; these provide a fourth (high) bit for the register index fields (ModRM reg; SIB index; and ModRM r/m, SIB base, or opcode reg, respectively), allowing access to 16 instead of 8 registers. The W bit is equivalent to the REX prefix's W bit, and specifies a 64-bit operand; for non-integer instructions, it is a general opcode extension bit.
v̅ is the inversion of an additional source register index.
m replaces the leading opcode escape bytes. The values 1, 2 and 3 are equivalent to the escape sequences 0F, 0F 38 and 0F 3A; all other values are reserved. The 2-byte VEX prefix always corresponds to the 0F escape.
L indicates the vector length; 0 for 128-bit SSE (XMM) registers, and 1 for 256-bit AVX (YMM) registers.
p encodes additional prefix bytes. The values 0, 1, 2, and 3 correspond to implied prefixes of none, 66, F3, and F2. These encode the operand type for SSE instructions: packed single, packed double, scalar single and scalar double, respectively.
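The field descriptions above can be turned into a small decoder. This sketch assumes the input is already known to start with a VEX prefix (as it would be in 64-bit mode) and returns logical, un-inverted values. The function name, the dictionary keys and the "reserved" placeholder are choices made for this example.

```python
ESCAPES = {1: "0F", 2: "0F 38", 3: "0F 3A"}
IMPLIED_PREFIX = {0: None, 1: "66", 2: "F3", 3: "F2"}


def decode_vex(data: bytes) -> dict:
    """Decode a two- or three-byte VEX prefix into logical field values."""
    if data[0] == 0xC5:                       # two-byte form
        last = data[1]
        r_inv, x_inv, b_inv = last >> 7, 1, 1   # X, B absent: logical 0
        m, w, length = 1, 0, 2                  # always the 0F escape
    elif data[0] == 0xC4:                     # three-byte form
        byte1, last = data[1], data[2]
        r_inv, x_inv, b_inv = byte1 >> 7, (byte1 >> 6) & 1, (byte1 >> 5) & 1
        m, w, length = byte1 & 0x1F, last >> 7, 3
    else:
        raise ValueError("not a VEX prefix")
    return {
        "length": length,
        "R": r_inv ^ 1, "X": x_inv ^ 1, "B": b_inv ^ 1,
        "W": w,
        "escape": ESCAPES.get(m, "reserved"),
        "vvvv": ~(last >> 3) & 0xF,           # stored inverted
        "vector_bits": 256 if (last >> 2) & 1 else 128,
        "implied_prefix": IMPLIED_PREFIX[last & 0b11],
    }


print(decode_vex(bytes.fromhex("c4c279")))
print(decode_vex(bytes.fromhex("c5f5")))
```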

Register addressing in 64-bit mode using VEX prefix
Addressing mode	Bit 3	Bits [2:0]	Register type	Common Usage
REG	VEX.R	ModRM.reg	General purpose, Mask, Vector	Register operand
RM (if ModRM.mod=11)	VEX.B	ModRM.r/m	GPR, Mask, Vector	Register operand
RM	VEX.B	ModRM.r/m	GPR	Register memory address
BASE	VEX.B	SIB.base	GPR	Base + Index * Scale memory address
INDEX	VEX.X	SIB.index	GPR	Base + Index * Scale memory address
VIDX	VEX.X	SIB.index	Vector	Base + VectorIndex * Scale memory address
NDS/NDD	VEX.v₃v₂v₁v₀		GPR, Mask, Vector	Register operand
IS4	Imm8[7:4]		Vector	Register operand
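The table can be read as a recipe: each 4-bit register number is a VEX bit placed above a 3-bit ModRM or SIB field. The sketch below applies that recipe for the REG, RM, BASE, INDEX and NDS rows. It takes the VEX bits as logical (already un-inverted) values, and it deliberately ignores special encodings such as displacement-only addressing or "no index", so it is an illustration of the table rather than a complete operand decoder. All names are hypothetical.

```python
def register_numbers(r: int, x: int, b: int, vvvv: int,
                     modrm: int, sib=None) -> dict:
    """Combine logical VEX bits with ModRM/SIB fields into register numbers.

    Simplification: special addressing encodings are not handled.
    """
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    regs = {"REG": (r << 3) | reg, "NDS": vvvv}
    if mod != 3 and rm == 4:
        if sib is None:
            raise ValueError("this addressing form requires a SIB byte")
        regs["BASE"] = (b << 3) | (sib & 7)
        regs["INDEX"] = (x << 3) | ((sib >> 3) & 7)
    else:
        # Register operand when mod == 3, otherwise the address register.
        regs["RM"] = (b << 3) | rm
    return regs


print(register_numbers(r=0, x=0, b=1, vvvv=0, modrm=0xC0))
print(register_numbers(r=1, x=1, b=0, vvvv=15, modrm=0x04, sib=0x88))
```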

Instructions coded with the VEX prefix can have up to four variable operands (in registers or memory) and one constant operand (immediate value). Instructions that need more than three variable operands use immediate operand bits to specify a 4th register operand (IS4 above). At most one of the operands can be a memory operand; and at most one of the operands can be an immediate constant of 4 or 8 bits. The remaining operands are registers.
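A short sketch of how a fourth register operand can be read out of the immediate byte, assuming 64-bit mode where all four bits of Imm8[7:4] are significant. The function name is illustrative, and the meaning of the low four immediate bits is left aside.

```python
def is4_register(imm8: int) -> int:
    """Return the register number carried in bits [7:4] of the immediate byte."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 is a single byte")
    return imm8 >> 4


# An immediate byte of 0x40 selects register 4 as the fourth operand.
print(is4_register(0x40))
```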

The AVX instruction set is the first instruction set extension to use the VEX coding scheme. The AVX instruction set uses the VEX prefix only for instructions using the SIMD XMM and YMM registers.

However, the VEX coding scheme has also been used for other instruction types in subsequent instruction set extensions. For example, BMI introduced VEX-coded arithmetic and bit manipulation instructions that operate on general purpose registers. AVX-512 introduced 8 mask registers and added instructions to manipulate them; these instructions use VEX encoding. VEX.R, VEX.B or VEX.v₃ is ignored when the corresponding field is used to encode a mask register.

The VEX prefix's initial-byte values, C4h and C5h, are the same as the opcodes of the LES and LDS instructions, respectively. These instructions are not supported in 64-bit mode. To resolve the ambiguity in 32-bit mode, the VEX specification exploits the fact that the ModRM byte of a legal LDS or LES instruction cannot be of the form 11xxxxxx (which would specify a register operand). Various bit fields in the VEX prefix's second byte are inverted to ensure that the byte is always of this form in 32-bit mode. Similarly, the one-byte REX prefix has its four high-order bits set to 0100 (4h), so it occupies the sixteen opcodes 0x40-0x4F. Previously, those opcodes were individual INC and DEC instructions for the eight standard processor registers; x86-64 code must use the ModR/M forms of INC and DEC instead.^[2]
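The 32-bit disambiguation rule amounts to a two-bit test on the byte that follows C4h or C5h. A minimal sketch, with an illustrative function name:

```python
def is_vex_in_32bit_mode(code: bytes) -> bool:
    """In 32-bit mode, C4h/C5h start a VEX prefix only if the next byte is 11xxxxxx.

    Any other pattern is a ModRM byte with a memory operand, that is, a
    legacy LES (C4h) or LDS (C5h) instruction.
    """
    return (
        len(code) >= 2
        and code[0] in (0xC4, 0xC5)
        and (code[1] & 0xC0) == 0xC0
    )


print(is_vex_in_32bit_mode(bytes.fromhex("c5f858c1")))  # second byte F8h = 11111000
print(is_vex_in_32bit_mode(bytes.fromhex("c401")))      # second byte 01h = 00000001
```

Because the top two bits of the second byte hold inverted extension bits, requiring them to be 1 is also what limits 32-bit code to the first eight registers.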

Legacy SIMD instructions with a VEX prefix added are equivalent to the same instructions without the VEX prefix, with the following differences:

The VEX-encoded instruction can have one more operand, making it non-destructive.
A 128-bit XMM instruction without VEX prefix leaves the upper half of the full 256-bit YMM register unchanged, while the VEX-encoded version sets the upper half to zero.
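The second difference can be modelled with plain integers. In this sketch a YMM register is represented as a 256-bit Python integer, and the two functions show what a 128-bit result write does to it under each encoding. This is a toy model with hypothetical names, not a description of any real API.

```python
MASK128 = (1 << 128) - 1


def write_xmm_legacy(ymm: int, result128: int) -> int:
    """Non-VEX 128-bit write: the upper half of the YMM register is preserved."""
    return (ymm & ~MASK128) | (result128 & MASK128)


def write_xmm_vex(ymm: int, result128: int) -> int:
    """VEX-encoded 128-bit write: the upper half of the YMM register is zeroed."""
    return result128 & MASK128


ymm0 = (0xAB << 128) | 0x11          # upper half holds 0xAB, lower half 0x11
print(hex(write_xmm_legacy(ymm0, 0x22)))
print(hex(write_xmm_vex(ymm0, 0x22)))
```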

Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX instructions that leave the upper half of the register unchanged, for reasons of efficiency.

History

In August 2007, AMD proposed the SSE5 instruction set extension, which included a new coding scheme for instructions with three operands, using an extra byte named DREX. SSE5 was intended for the Bulldozer processor core, due to begin production in 2011.^[3]^[4]
In March 2008, Intel proposed the AVX instruction set, using the new VEX coding scheme.^[5]
In August 2008, commentators deplored the expected incompatibility between AMD and Intel instruction sets, and proposed that AMD revise their plans and replace the DREX scheme with the more flexible and extensible VEX scheme.^[6]
In May 2009, AMD announced a revision of the proposed SSE5 instruction set to make it compatible with the AVX instruction set and the VEX coding scheme. The revised SSE5 is called XOP.^[7]
In January 2011, support for the AVX instruction set arrived with Intel's Sandy Bridge microprocessor architecture.
In 2011, the AMD Bulldozer processor added support for the AVX, XOP and FMA4 instruction sets, all using the VEX scheme.^[8]
In 2013, Intel Haswell processors added support for the FMA3 instruction set.

References

[1] Intel Corporation (January 2009). "Intel Advanced Vector Extensions Programming Reference".
[2] Intel Corporation (2016-09-01). "Intel® 64 and IA-32 Architectures Developer's Manual: Vol. 2A". p. 2-8. Retrieved 2021-09-13.
[3] "128-Bit SSE5 Instruction Set". AMD Developer Central. Retrieved 2009-06-02.
[4] Hruska, Joel (November 14, 2008). "AMD Fusion now pushed back to 2011". Ars Technica.
[5] "Intel Software Network". Intel. Archived from the original on 2008-04-07. Retrieved 2008-04-05.
[6] "AMD and Intel incompatible - What to do?". AMD Developer Forums. Retrieved 2012-08-10.
[7] "AMD64 Architecture Programmer's Manual Volume 4: 128-Bit and 256-Bit Media Instructions" (PDF). AMD. December 22, 2010.
[8] "Striking a balance". Dave Christie, AMD Developer blogs. Archived from the original on 2013-11-09. Retrieved 2012-08-10.
