How many bytes does the push instruction push onto the stack when I don't specify the operand size?

In this case, how many bytes does the push instruction push onto the stack? Does the number of bytes pushed depend on the operand size (so in my example it will push 1 byte)?

asked Jul 16, 2017 at 11:24 by user8315006

The native register size, to keep the stack aligned. In 32-bit mode, it will push 4 bytes. In 64-bit mode, it will push 8 bytes.

Commented Jul 16, 2017 at 11:26

BTW, you could have tested this with a debugger. Just single-step the instruction and see how esp / rsp changes. You could also look at the disassembly output and notice that they both assemble to the same machine code.

Commented Jul 16, 2017 at 23:59

2 Answers

Does the number of bytes pushed depend on the operand size

It doesn't depend on the value of the number. The technical x86 term for how many bytes push pushes is "operand-size", but that's a separate thing from whether the number fits in an imm8 or not.

(so in my example it will push 1 byte)?

No, the size of the immediate is not the operand-size. It always pushes 4 bytes in 32-bit code, or 8 bytes in 64-bit code, unless you do something weird.

Recommendation: always just write push 123 or push 0x12345 to use the default push size for the mode you're in, and let the assembler pick the encoding. That is almost always what you want. If that's all you wanted to know, you can stop reading now.

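First of all, it's useful to know what sizes of push are even possible in x86 machine code: the mode's default operand-size (16 bits in 16-bit mode, 32 bits in 32-bit mode, 64 bits in 64-bit mode), or the non-default size selected by a 0x66 operand-size prefix (32 bits in 16-bit mode, 16 bits in 32-bit and 64-bit modes).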

There are no other options. The stack pointer is always decremented by the operand-size of the push (see footnote 2). (So it's possible to "misalign" the stack by pushing 16 bits.) pop has the same choices of size: 16, 32, or 64, except no 32-bit pop in 64-bit mode.

This applies whether you're pushing a register or an immediate, and regardless of whether the immediate fits in a sign-extended imm8 or needs an imm32 (or imm16 for 16-bit pushes). (A 64-bit push imm32 sign-extends to 64-bit. There is no push imm64, only mov reg, imm64.)
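As a rough illustration of the last two paragraphs, here is a small NASM sketch for 32-bit mode. The registers and values are arbitrary choices of mine, not taken from the question. The comments show the machine code I'd expect in a listing and how far ESP moves for each instruction.

```nasm
bits 32

push eax         ; 50              ESP -= 4 (register, default operand-size)
push 123         ; 6A 7B           ESP -= 4 (imm8, sign-extended to 32 bits)
push 0x12345     ; 68 45 23 01 00  ESP -= 4 (imm32)
push ax          ; 66 50           ESP -= 2 (16-bit push: stack is no longer 4-byte aligned)

pop  ax          ; 66 58           ESP += 2 (undo the 16-bit push)
add  esp, 12     ; 83 C4 0C        discard the three 4-byte pushes
```

Single-stepping this in a debugger and watching ESP is the easiest way to confirm the sizes.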

In NASM source code, push 123 assembles to the operand-size that matches the mode you're in. In your case, I think you're writing 32-bit code, so push 123 is a 32-bit push, even though it can (and does) use the push imm8 encoding.

Your assembler always knows what kind of code it's assembling, since it has to know when to use or not use operand-size prefixes when you do force the operand-size.

MASM is the same; the only thing that might be different is the syntax for forcing a different operand-size.

Anything you write in assembler will assemble to one of the valid machine-code options (because the people who wrote the assembler know what is and isn't encodable), so no, you can't push a single byte with a push instruction. If you wanted that, you could emulate it with dec esp / mov byte [esp], 123.
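Spelled out as a minimal 32-bit sketch, the emulated one-byte push and a matching one-byte "pop" could look like this. The pop half and the use of AL are my own additions for illustration.

```nasm
bits 32

; emulated 1-byte push of the value 123
dec  esp                 ; ESP -= 1 (the stack is now misaligned)
mov  byte [esp], 123     ; store only one byte at the new top of stack

; emulated 1-byte pop into AL
mov  al, [esp]           ; load the byte back
inc  esp                 ; ESP += 1, restoring the original stack pointer
```

Note that dec and inc write most of the flags (everything except CF). If that matters, lea esp, [esp-1] and lea esp, [esp+1] adjust the stack pointer without touching flags.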

NASM Examples:

Output from nasm -l /dev/stdout to dump a listing to the terminal, along with the original source line.

Lightly edited to separate opcode and prefix bytes from the operands. (Unlike objdump -drwC -Mintel, NASM's listing format doesn't leave spaces between bytes in the machine-code hexdump.)

    68 80000000    push 128
    6A 80          push -128              ;; signed imm8 is -128 to +127
    6A 7B          push byte 123
    6A 7B          push dword 123         ;; still optimized to the imm8 encoding
    68 7B000000    push strict dword 123
    6A 80          push strict byte 0x80  ;; will decode as push -128
    ****************** warning: signed byte value exceeds bounds [-w+number-overflow]

dword is normally an operand-size thing, while strict dword is how you request that the assembler doesn't optimize it to a smaller encoding.

All the preceding instructions are 32-bit pushes (or 64-bit in 64-bit mode, with the same machine code). All the following instructions are 16-bit pushes, regardless of what mode you assemble them in. (If assembled in 16-bit mode, they won't have a 0x66 operand-size prefix)

    66 6A 7B       push word 123
    66 68 8000     push word 128
    66 68 7B00     push strict word 123

NASM seems to treat the byte and dword overrides as applying to the size of the immediate, but word applies to the operand-size of the instruction. Using o32 push 12 in 64-bit mode doesn't get a warning either. push eax does, though: "error: instruction not supported in 64-bit mode".

Notice that push imm8 is encoded as 6A ib in all modes. With no operand-size prefix, the operand size is the mode's size. (e.g. 6A FF decodes in long mode as a 64-bit operand-size push with an operand of -1, decrementing RSP by 8 and doing an 8-byte store.)
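To see that for yourself, a sketch like the following (my own test file, assembled as a flat binary with a listing) puts the same source line in each mode. I'd expect identical machine code for all three, with only the meaning changing.

```nasm
bits 16
push -1          ; 6A FF   16-bit operand-size: SP  -= 2, stores 0xFFFF

bits 32
push -1          ; 6A FF   32-bit operand-size: ESP -= 4, stores 0xFFFFFFFF

bits 64
push -1          ; 6A FF   64-bit operand-size: RSP -= 8, stores 0xFFFFFFFFFFFFFFFF
```

The CPU decides how to interpret 6A FF from the mode it is executing in, which is why the assembler has to know the target mode.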

The address-size prefix only affects the explicit addressing mode used for push with a memory source, e.g. in 64-bit mode: push qword [rsi] (no prefixes) vs. push qword [esi] (address-size prefix for 32-bit addressing mode). push dword [rsi] is not encodable, because nothing can make the operand-size 32-bit in 64-bit code (see footnote 1). push qword [esi] does not truncate rsp to 32-bit. Apparently "Stack Address Width" is a different thing, probably set in a segment descriptor. (It's always 64 in 64-bit code on a normal OS, I think even for Linux's x32 ABI: ILP32 in long mode.)
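A short 64-bit sketch of those memory-source cases, with the encodings I'd expect in the comments. The push word line is my addition, to show that the operand-size prefix is independent of the address-size prefix.

```nasm
bits 64

push qword [rsi]     ; FF 36      64-bit address, 64-bit push (RSP -= 8)
push qword [esi]     ; 67 FF 36   32-bit address (ESI zero-extended), still a 64-bit push via RSP
push word  [rsi]     ; 66 FF 36   64-bit address, 16-bit push (RSP -= 2)

; push dword [rsi]   ; left commented out: the assembler rejects it in 64-bit mode
```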

When would you ever want to push 16 bits? If you're writing in asm for performance reasons, then probably never. In my code-golf adler32, a narrow push -> wide pop took fewer bytes of code than shift/OR to combine two 16b integers into a 32b value.
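The narrow push -> wide pop trick can be sketched like this in 32-bit NASM. The register assignment (high half in DX, low half in CX, result in EAX) is an assumption for illustration, not the actual code from that code-golf answer.

```nasm
bits 32

; combine two 16-bit values: EAX = (DX << 16) | CX
push dx          ; 66 52   pushed first, so it ends up at the higher address (high half)
push cx          ; 66 51   pushed second, so it ends up at the lower address (low half)
pop  eax         ; 58      one 32-bit pop reads both halves; ESP is back where it started
```

That is 5 bytes of code. It goes through memory, so it's a size optimization only, not a speed one.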

Or maybe in an exploit for 64-bit code, you might want to push some data onto the stack without gaps. You can't just use push imm32, because that sign-extends to 64-bit. You could do it in 16-bit chunks with multiple 16-bit push instructions, but it's still probably more efficient to mov rax, imm64 / push rax (10B + 1B = 11B for an 8B imm payload). Or push 0xDEADBEEF / mov dword [rsp+4], 0xDEADC0DE (5B + 8B = 13B, and it doesn't need a register). Four 16-bit pushes would take 16B.
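Here are the three options side by side as a 64-bit NASM sketch. Each one on its own leaves the 8-byte value 0xDEADC0DEDEADBEEF at [rsp]. The combined 64-bit constant is just the two example values from above glued together.

```nasm
bits 64

; Option A: 10 + 1 = 11 bytes, clobbers RAX
mov  rax, 0xDEADC0DEDEADBEEF
push rax

; Option B: 5 + 8 = 13 bytes, no register needed
push 0xDEADBEEF                  ; sign-extends: upper 4 bytes are wrong for now
                                 ; (the assembler may warn that the value doesn't fit a signed dword)
mov  dword [rsp+4], 0xDEADC0DE   ; overwrite the upper 4 bytes

; Option C: 4 x 4 = 16 bytes, highest word first because the stack grows down
push word 0xDEAD
push word 0xC0DE
push word 0xDEAD
push word 0xBEEF
```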

Footnotes:

  1. In fact, REX.W=0 is ignored and doesn't modify the operand-size away from its default 64-bit. NASM, YASM, and GAS all assemble push r12 to 41 54, not 49 54. GNU objdump thinks 49 54 is unusual, and decodes it as 49 54 rex.WB push r12. (Both execute the same.) Microsoft agrees as well, using a 40h REX as padding on push rbx in some Windows DLLs.

     Intel just says that 32-bit pushes are "not encodable" (N.E. in the table) in long mode. I don't understand why W=1 isn't the standard encoding for push / pop when a REX prefix is needed, but apparently the choice is arbitrary.

     Fun fact: only stack instructions and a few others default to 64-bit operand size in 64-bit mode. In machine code, add rax, rdx needs a REX prefix (with the W bit set). Otherwise it would decode as add eax, edx. But you can't decrease the operand-size with a REX.W=0 when it defaults to 64-bit, only increase it when it defaults to 32. http://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix lists the instructions that default to 64-bit in 64-bit mode. Note that jrcxz doesn't strictly belong in that list, because the register it checks (cx/ecx/rcx) is determined by address-size, not operand-size, so it can be overridden to 32-bit (but not 16-bit) in 64-bit mode. loop is the same.

     It's strange that Intel's instruction reference manual entry for push (HTML extract: http://felixcloutier.com/x86/PUSH.html) shows what would happen for a 32-bit operand-size push in 64-bit mode (the only case where stack address width can be 64, so it uses rsp). Perhaps it's achievable somehow with some non-standard settings in the code-segment descriptor, so you can't do it in normal 64-bit code running under a normal OS. Or more likely it's an oversight, and that's what would happen if it was encodable, but it's not.
  2. Except that segment registers are 16-bit, but a normal push fs will still decrement the stack pointer by the stack-width (operand-size). Intel documents that recent Intel CPUs only do a 16b store in that case, leaving the rest of the 32 or 64b unmodified.

     x86 doesn't officially have a stack width that's enforced in hardware. It's a software / calling-convention term, e.g. char and short args passed on the stack in any calling convention are padded out to 4B or 8B, so the stack stays aligned. (Modern 32 and 64-bit calling conventions such as the x86-32 System V psABI used by Linux keep the stack 16B aligned before function calls, even though an arg "slot" on the stack is still only 4B.)

     Anyway, "stack width" is only a programming convention on any architecture. The closest thing in the x86 ISA to a "stack width" is the default operand-size of push / pop. But you can manipulate the stack pointer however you want, e.g. sub esp,1. You can, but don't, for performance reasons :P