How Many x86-64 Instructions Are There Anyway?
x86 is an enormously popular instruction set that is used on most desktop computers and servers (but usually not on mobile devices like phones). Given this wide use, it might seem like an easy question to ask how many x86 instructions there are, but it turns out this is more intricate than it looks.
Background
x86 is the name of a whole family of so-called instruction set architectures (ISA for short). The ISA defines the available instructions of a CPU, but also the registers, the memory architecture, how interrupts are handled, and much more. It is different from the microarchitecture, though, which determines how the ISA is implemented. Different microarchitectures can implement the same ISA, which is what happens with different AMD and Intel microarchitectures implementing essentially the same ISA. Since the introduction of x86 by Intel in 1978, the architecture has evolved extensively, but with almost full backward compatibility. Partly for this reason, the instruction set is full of weird corner-cases and historical artifacts. Here, I will look at the 64-bit variant (sometimes called x86-64) of the ISA and attempt to count the number of instructions. Specifically, the ISA from the Haswell microarchitecture, one of the latest ISA available at the time of writing. Due to the backward compatibility, this instruction set still includes most of the instructions from the very first generation of x86, and anything in between.
Attempts 1 and 2: Mnemonics
Machine instructions (the ones and zeros that tell your computer what to do next) have a portion called the opcode that determines what operation is performed. A mnemonic is a symbolic name for these opcodes. For instance, to add one to the eax
register, one could use the instruction addl $1, %eax
. Here, addl
is the mnemonic, and so we can count the number of unique mnemonics to determine the number of instructions. I've written a small program to do this:
Great, so we are done and there are 1,279 instructions, right? Actually, the story is more complicated. Those familiar with assembly will have realized that the l
in addl
is a suffix that indicates the bit width of the operands. For instance, to add one to a 16-bit register (instead of the 32-bit register eax
), one would use addw $1, %ax
. Both of these instructions are doing addition, maybe we should not count them separately. In fact, had we used Intel syntax for the instructions, then the same operation would be written as add eax, 1
(note the absence of a width suffix, and the inverted ordering of operands). Let's try that:
Attempt 3: Mnemonics and Operand Types
If we look a little closer at mnemonics, we find that the situation is actually not so simple. Let's again look at the two instructions to add a constant to a 16 and 32-bit register, addw $0x1, %ax
and addl $0x1, %eax
, respectively1. It turns out, that on 64-bit machines we consider here, they actually have relatively different semantics. The former just adds one to ax
.
On the other hand, addl $0x1, %eax
also does the addition, but additionally zeros out the upper 32 bits of rax
. Here is an example illustrating this behavior:
Note that the 32-bit addition on the right zeroes out the upper 32 bits of rax
(marked in red), but the 16-bit addition (as well as the 8-bit addition not shown here) don't do that. That doesn't really seem like the same instruction any longer, even if both have the same (Intel) mnemonic. For another example why mnemonics can be misleading consider the instruction to perform a 32-bit signed multiplication: imull
. There is a variant that multiplies two registers, and there is a variant that multiplies two registers and a constant2. Again, even though both of these share the same mnemonic, they hardly seem like the same instruction (one multiplies two numbers, the other three numbers).
Therefore, maybe we should really count all the different combinations of mnemonic as well as the argument types. For instance, addl $1, %eax
would be add_imm32_r32
. This ensures that we correctly identify the examples shown as different instructions, and leaves us with 3,683 instructions:
Attempt 4: Mnemonics and Operand Widths
Now we have a lot of instructions, and some of them seem like they are not really different instructions. For instance, the instruction to add a constant to a 32-bit register and the instruction to add a constant to a 32-bit memory location are counted separately, when the operation they perform could be argued to be identical. Thus, maybe a better way would be to consider the mnemonic in addition to the operand width. Our running example would be add_32_32
, regardless of whether the operand is a register, memory location or constant. This way, we end up with 2,034 instructions:
Attempt 5: Counting everything (added 2018-08-09)
All the attempts so far assume that we actually know what instructions a given CPU has, and that there is a textual representation for these instructions.
However, these assumptions might be less well-founded than it seems, as people have discovered hidden and undocumented instructions in modern CPUs.
The most thorough examination of the space of all instructions has been done by Christopher Domas with sandsifter,
where he systematically explores the space of bit-level encoded instructions. Through clever tricks he is able to tell which bit
sequences make up valid instructions, and which don't. This has the advantage of making very few assumptions and in principle allows
us to count all bit sequences that denote valid instructions. Note that this will
count addl $1, %eax
and addl $2, %eax
as separate
instructions, because assembled as bits they are 0x83, 0xC0, 0x01
and 0x83, 0xC0, 0x02
,
respectively. And there is another problem: there are actually too many instructions to count them all this way. On x86-64, instructions can be up to 15 bytes long, making far to many to efficiently count them.
But even if we don't get an actual number here, it is another interesting way to (at least in principle if not in practice) count instructions. Christophers whitepaper is worth a read for much more details on his approach and findings.
It's All Wrong
Unfortunately, there are even more weird corner cases and historical artifacts in x86 that make counting instructions hard. For instance, we used mnemonics to group the same "mathematical operation" regardless of the operands, but it turns out that doing a bitwise xor
operation is done using xor
(in Intel syntax), but also pxor
, vpxor
, vxorpd
, vxorps
, xorpd
, and xorps
depending on the operand size and other things (and even more in AT&T syntax)3. So if attempt 2 tries to group the same operation regardless of the operation, then that count is actually too high.
And on the other end of the spectrum, we tried to count every different operation separately in attempt 3. However, it turns out there are instructions that really are a family of instructions. For instance, consider cmpsd
that compares two floating point numbers. A third operand, an 8-bit constant, determines if the comparison is for equality, inequality, if the first number is larger, and 5 more variants. These clearly seem like different instructions, yet we have counted them as a single one.
Unfortunately, I can't think of an easy automatic way of counting such nuances, so I will leave it at the numbers I have.
Even more to the extreme we could go with attempt 5, where I suggested counting every valid bit sequeence and point to some work that has attempted to move in that directly. Unfortunately there are too many to actually get a count, but it is an interesting way to think about the problem nonetheless.
Conclusions
As I have hopefully convinced you by now, this innocent looking question is more difficult to answer than it might seem. It depends on your notion of what an instruction is, and is complicated by the many intricacies of the complex x86-64 instruction set. Here is an overview of possible answers:
Measure | Count | Comment |
---|---|---|
AT&T mnemonic (e.g., addl ) |
1,279 | Counts the number of unique mnemonics in AT&T syntax. |
Intel mnemonic (e.g., add ) |
981 | This is a rough estimate of the number of different kinds of operations the x86 instruction set can perform, ignoring the operand type and size. There are various caveats as described earlier, such as some operations having multiple different mnemonics. |
Mnemonic and operand types (e.g., add_r32_imm32 ) |
3,683 | Distinguishes instructions in every way possible and is thus a sort of upper bound on the number of instructions. If you are looking for a very fine-grained notion of instruction, this is a good measure. |
Mnemonic and operand width (e.g., add_32_32 ) |
2,034 | This is an estimate of the number of different kinds of instructions, if an operation on 8 bits is considered to be different from the same operation on 16 bits (because sometimes, these actually have different semantics, as shown with the zeroing of the upper 32 bits for the addl instruction). Does not distinguish instructions that operate on registers vs. constants vs. memory locations. |
Every valid bit sequence (e.g., 0x83, 0xC0, 0x01 ) |
? | Counts every bit seqeuence that makes a valid x86 instruction. Unfortunately there are too many to count. |
Personally, the 4th attempt seems to capture my intuitive understanding of what an instruction is the closest. So, if I ever have to cross the bridge of death and get asked how many x86-64 instructions there are, I shall answer 2,034.
All the numbers in this blog post (except for attempt 5) have been obtained through a small program making use of our awesome C++11 library for working with x86-64 assembly. Just like the library, my program is available as open source on GitHub.
-
From here on I will use AT&T syntax because that's what I'm used to. It will not have any effect on the numbers. ↩
-
Actually, the situation is even more complicated for
imull
(no surprise at this point). There are seven different variants: The first one,imull_r32
, just takes a single register and useseax
as an implicit input, and stores 64 bits of output. Next,imull_r32_r32
andimull_r32_r32
multiply two 32-bit numbers and store 32 bits of output, and finallyimull_r32_m32_imm8
,imull_r32_r32_imm8
,imull_r32_r32_imm32
, andimull_r32_m32_imm32
multiply two registers (or memory locations) as well as a constant (8 or 32 bits) and result in 32 output bits. ↩ -
Some of these different mnemonics to perform a bitwise exclusive or operation are due to the operand size. It turns out that for newer instructions that work on the
xmm
andymm
vector registers, different mnemonics depending on the operand size, both in AT&T as well as in Intel syntax. For the bitwise operations, there are also different mnemonics depending on the type of data (integer, single precision floating point or double precision floating point), even though the operation is on the bit level and, therefore, the same for all types (yes, different mnemonics perform semantically exactly the same operation). This is because internally, processors may have several pipelines for different data types for efficiency reasons, and thus using a bitwise mnemonics of the wrong type will compute the result correctly, but may be slower because the data has to be moved from one pipeline to another. ↩
Questions or comments? Send me an email or find me on twitter @stefan_heule.