ASM x86 Introduction

Strider

Hi, I thought I would write something about assembler on the x86 architecture. Assembly language is the human-readable form of a CPU's instruction set: each mnemonic stands for one machine instruction. The language is very primitive and very hardware-oriented, but still quite intuitive.

Assembler can be seen as an API of the CPU because, as already mentioned, we only use the instruction set the CPU provides. In x86 assembler this instruction set is large, with many instructions that are hard to remember. A big reason for this is that x86 is a CISC design.

What is CISC? (In a nutshell)

CISC stands for Complex Instruction Set Computing. Here we have not only simple instructions but also many complex ones, for example instructions that read from memory, calculate, and write the result back in one go. CISC is often found in machines built on the Von Neumann architecture, which today is almost every PC you have at home, although the two concepts are independent of each other. Many (not all) CISC instructions need more than one processor cycle to complete.

What is RISC? (In a nutshell)

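RISC stands for Reduced Instruction Set Computing. As the name suggests, the instruction set is reduced to simple instructions. RISC is often found together with the Harvard architecture, e.g. in microcontrollers. Harvard-style ideas are also present in an x86 CPU, e.g. in the separate level 1 caches for instructions and data. The advantage of RISC is that most instructions are designed to complete in one cycle. A single instruction is therefore fast, but a program may need more instructions to do the same work, so RISC is not automatically faster than CISC overall.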

What is Harvard or Von-Neumann?

Both are computer architectures, which differ essentially in how memory and buses are organized. In the Von Neumann architecture, program and data share one memory, which is reached over a common data bus and address bus.

dia1.png

In the Harvard architecture we have separate memories for program and data. Each memory has its own data bus and its own address bus, so an instruction and a data word can be fetched at the same time. This architecture can be found in AVR/Arduino microcontrollers.

dia2.png
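To make the difference a bit more tangible, here is a tiny Python sketch. It is only a toy model: the class names `VonNeumannMachine` and `HarvardMachine` and the string "instructions" are made up for illustration. It shows one practical consequence of the two layouts: with a shared memory, a data write can land on top of an instruction, while with separate memories it cannot.

```python
# Toy model only: class names and the string "instructions" are hypothetical.

class VonNeumannMachine:
    """Program and data live in ONE shared memory."""

    def __init__(self, program, data_size=4):
        # Code is placed first, data cells follow in the same memory.
        self.memory = list(program) + [0] * data_size

    def store(self, address, value):
        # A store can reach any address, including the code.
        self.memory[address] = value

    def fetch(self, pc):
        return self.memory[pc]


class HarvardMachine:
    """Program and data live in TWO separate memories."""

    def __init__(self, program, data_size=4):
        self.program_memory = list(program)
        self.data_memory = [0] * data_size

    def store(self, address, value):
        # A store can only reach the data memory.
        self.data_memory[address] = value

    def fetch(self, pc):
        return self.program_memory[pc]


program = ["load", "add", "halt"]

vn = VonNeumannMachine(program)
vn.store(1, "nop")          # overwrites the second instruction
vn_second = vn.fetch(1)

hv = HarvardMachine(program)
hv.store(1, "nop")          # same address, but in the data memory
hv_second = hv.fetch(1)

print(vn_second, hv_second)
```

In the shared-memory machine the second instruction has been replaced, in the Harvard machine it is untouched.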

Registers

To get started with assembler, we need to know how the CPU is constructed. In an x86 32-bit CPU we have registers, which are needed to process the instructions. The picture below shows the most important registers. There are more registers in the CPU, but I will talk about them in the next posts. For the beginning, the 10 registers from the picture are enough. Each of these registers is 32 bits wide.

dia3.png

These registers are divided into 3 groups. The first group is the "general purpose registers": EAX, EBX, ECX, EDX, ESI and EDI.

The registers EAX to EDX are multi-purpose registers, each of which also has a traditional special role.

  • EAX (accumulator register): the preferred target for arithmetic operations. Some instructions, such as mul and div, use it implicitly.
  • EBX (base register): is or was used to hold the base address when addressing memory.
  • ECX (count register): is or was used as a counter for loops or shift operations.
  • EDX (data register): often holds a second part of an operand, e.g. the upper half of a multiplication result or of the dividend in a division.

Today all 4 of these registers can also be used freely for general purposes.
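Two of these special roles can be modelled in a few lines of Python. This is only a sketch, and the function names `mul32` and `count_loop` are made up for illustration. `mul32` mimics the unsigned `mul` instruction, which multiplies EAX by its operand and splits the 64-bit result across EDX (upper half) and EAX (lower half). `count_loop` mimics a countdown with the `loop` instruction, which decrements ECX and jumps back while ECX is not 0.

```python
MASK32 = 0xFFFFFFFF  # keeps a value inside 32 bits


def mul32(eax, operand):
    """Model of unsigned `mul operand`: EDX:EAX = EAX * operand."""
    product = (eax & MASK32) * (operand & MASK32)  # up to 64 bits wide
    new_eax = product & MASK32           # low 32 bits end up in EAX
    new_edx = (product >> 32) & MASK32   # high 32 bits end up in EDX
    return new_eax, new_edx


def count_loop(ecx):
    """Model of a `loop` countdown; this sketch assumes ecx >= 1."""
    iterations = 0
    while True:
        iterations += 1      # the loop body runs here
        ecx -= 1             # `loop` decrements ECX ...
        if ecx == 0:         # ... and falls through once it reaches 0
            break
    return iterations


print(mul32(6, 7))                  # small product, EDX stays 0
print(mul32(0xFFFFFFFF, 2))         # product no longer fits into 32 bits
print(count_loop(5))
```

The second call is the interesting one: the result does not fit into EAX alone, so the overflowing part shows up in EDX.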

The registers ESI and EDI are two special members of the "general purpose registers" group, because they are often used as indices, e.g. into arrays. ESI is the source index and EDI the destination index: string instructions such as movs read from the address in ESI and write to the address in EDI.
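As a rough illustration of the source/destination idea, here is a Python model of what `rep movsb` does when the direction flag is clear: it copies ECX bytes from the address in ESI to the address in EDI and advances both indices. The function name `rep_movsb` and the flat `bytearray` standing in for memory are assumptions of this sketch.

```python
def rep_movsb(memory, esi, edi, ecx):
    """Copy ECX bytes from memory[ESI] to memory[EDI], direction flag clear."""
    while ecx != 0:
        memory[edi] = memory[esi]  # move one byte from source to destination
        esi += 1                   # both indices move forward ...
        edi += 1
        ecx -= 1                   # ... and the counter moves toward 0
    return esi, edi, ecx


memory = bytearray(b"hello...........")   # 16 bytes of toy memory
esi, edi, ecx = rep_movsb(memory, esi=0, edi=8, ecx=5)
print(memory, esi, edi, ecx)
```

After the copy, ESI and EDI point just behind the source and destination blocks, and ECX is 0.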

The 2nd group is called "pointer registers", because these registers only hold addresses. This group includes the registers ESP, EIP and EBP.

The register ESP always points to the top element of the stack, which is why it is also called the stack pointer. The stack pointer can be overwritten manually. Normally, if no manual intervention takes place, it is adjusted automatically. On x86 the stack grows toward lower addresses, so ESP is decremented when an element is pushed onto the stack and incremented again when the top element is popped off.
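The direction is easy to get wrong, so here is a minimal Python model of a 32-bit `push` and `pop`. On x86 the stack grows toward lower addresses: `push` first lowers ESP by 4 and then stores the value, `pop` reads the value and then raises ESP by 4. The class name `Stack` and the 64-byte memory size are arbitrary choices for this sketch.

```python
import struct


class Stack:
    """Toy model of the x86 stack for 32-bit values."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.esp = size  # empty stack: ESP sits just past the highest address

    def push(self, value):
        self.esp -= 4    # ESP moves DOWN first ...
        # ... then the value is stored little-endian at the new top
        struct.pack_into("<I", self.memory, self.esp, value & 0xFFFFFFFF)

    def pop(self):
        (value,) = struct.unpack_from("<I", self.memory, self.esp)
        self.esp += 4    # ESP moves back UP after reading
        return value


stack = Stack()
stack.push(0x11111111)
stack.push(0x22222222)
esp_after_pushes = stack.esp     # two pushes lowered ESP by 8
top = stack.pop()                # last value pushed comes off first
print(hex(top), esp_after_pushes, stack.esp)
```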

The EIP register, also called the instruction pointer or program counter, always points to the next instruction to be executed. It cannot be overwritten directly with simple instructions such as mov, add, inc or sub. It only changes indirectly, through control-flow instructions such as jmp, call and ret. After each instruction, the instruction pointer is advanced by the length of that instruction, so that it points to the following one. On a jump, e.g. jmp 0xdeadbeef, the address given in jmp is loaded into the instruction pointer instead. And the whole game goes on like this.
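The "game" can be sketched as a tiny fetch-and-execute loop in Python. Everything here is a simplification: the program is a made-up dictionary from address to (mnemonic, operand, length), and the jump takes an absolute target, whereas a real jmp is usually encoded relative to the next instruction. The point is only to show that EIP advances by the instruction length and that a jump overwrites it.

```python
# address -> (mnemonic, operand, length in bytes); a hypothetical toy program
program = {
    0x00: ("inc", None, 1),
    0x01: ("jmp", 0x08, 5),
    0x06: ("inc", None, 1),   # skipped by the jump
    0x07: ("inc", None, 1),   # skipped by the jump
    0x08: ("inc", None, 1),
    0x09: ("hlt", None, 1),
}


def run(program, eip=0):
    """Execute the toy program and return (eax, list of visited addresses)."""
    eax = 0
    trace = []
    while True:
        mnemonic, operand, length = program[eip]  # fetch at EIP
        trace.append(eip)
        eip += length            # EIP now points at the NEXT instruction
        if mnemonic == "inc":
            eax += 1
        elif mnemonic == "jmp":
            eip = operand        # a jump replaces the value in EIP
        elif mnemonic == "hlt":
            return eax, trace


eax, trace = run(program)
print(eax, [hex(address) for address in trace])
```

The trace shows that the instructions at 0x06 and 0x07 are never fetched, because the jump moved EIP past them.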

In addition to the instruction pointer we have the base pointer EBP, which by convention marks the base of the current function's stack frame. As soon as a function is called, call pushes the return address (the address of the instruction after the call) onto the stack. When we are done in the function, the CPU has to know where we came from, and this is exactly what that saved address is for: ret pops it back into the instruction pointer. The called function typically saves the caller's EBP on the stack and then copies ESP into EBP. From then on, arguments and local variables can be reached at fixed offsets from EBP, even while ESP keeps moving. You may also ask yourself how the CPU can process multiple programs without problems. The answer is that each program has its own block of memory (its address space), which contains stack and heap as well as instructions and data. Keeping these blocks apart is the job of the operating system and the CPU's memory management, not of the base pointer.
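Here is a small Python model of a function call on 32-bit x86, assuming the common frame-pointer convention (`push ebp` / `mov ebp, esp` on entry, `mov esp, ebp` / `pop ebp` on exit). The class `Cpu`, its method names and all addresses are invented for this sketch. Note where the return address lives: `call` pushes it onto the stack, and EBP only marks the base of the frame.

```python
class Cpu:
    """Toy model of ESP, EBP and EIP during a function call."""

    def __init__(self):
        self.stack = {}       # address -> 32-bit value
        self.esp = 0x1000
        self.ebp = 0x2000     # pretend frame base of the caller
        self.eip = 0x100

    def push(self, value):
        self.esp -= 4
        self.stack[self.esp] = value

    def pop(self):
        value = self.stack[self.esp]
        self.esp += 4
        return value

    def call(self, target, return_address):
        self.push(return_address)   # the return address goes on the STACK
        self.eip = target

    def enter_frame(self):          # push ebp ; mov ebp, esp
        self.push(self.ebp)
        self.ebp = self.esp

    def leave_frame(self):          # mov esp, ebp ; pop ebp
        self.esp = self.ebp
        self.ebp = self.pop()

    def ret(self):
        self.eip = self.pop()       # the return address goes back into EIP


cpu = Cpu()
cpu.call(target=0x400, return_address=0x105)
cpu.enter_frame()
cpu.push(42)                               # a local variable at [ebp - 4]
saved_return = cpu.stack[cpu.ebp + 4]      # return address sits at [ebp + 4]
local_value = cpu.stack[cpu.ebp - 4]
cpu.leave_frame()
cpu.ret()
print(hex(saved_return), local_value, hex(cpu.eip), hex(cpu.esp), hex(cpu.ebp))
```

After `ret`, EIP is back behind the call, and ESP and EBP have the values they had before the call.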

The last group is the flag register (EFLAGS). It holds the flags Carry, Parity, Adjust, Zero, Sign, Trap, Interrupt, Direction and Overflow. Each flag is one bit in the register. The status flags (Carry, Parity, Adjust, Zero, Sign and Overflow) are updated by arithmetic and logic instructions, not by every single instruction. For example, if we calculate 1-1, the Zero flag is set, and if a result is negative (its highest bit is 1), the Sign flag is set. Trap, Interrupt and Direction are control flags that change how the CPU behaves.
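To see how the status flags follow from a result, here is a Python sketch that computes four of them for a 32-bit subtraction, the way `sub` (or `cmp`) would. The function name `sub32_flags` and the dictionary layout are my own choices, and Parity and Adjust are left out to keep it short.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def sub32_flags(a, b):
    """Return (result, flags) for the 32-bit subtraction a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "ZF": result == 0,                  # Zero: the result is 0
        "SF": bool(result & SIGN_BIT),      # Sign: highest bit of the result
        "CF": a < b,                        # Carry: an unsigned borrow occurred
        # Overflow: the operands had different signs and the sign of the
        # result differs from the sign of a (signed result does not fit)
        "OF": bool((a ^ b) & (a ^ result) & SIGN_BIT),
    }
    return result, flags


print(sub32_flags(1, 1))            # 1 - 1: Zero flag
print(sub32_flags(1, 2))            # 1 - 2: Sign and Carry flags
print(sub32_flags(0x80000000, 1))   # most negative value - 1: Overflow flag
```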

Register sizes

Now that we know which register does what, let's look at how a register is divided. Each register I mentioned is 32 bits wide, but this is only the case when we work at the 32-bit level. In older 16-bit code we have to deal with 16-bit registers, and these are still available as the lower halves of the 32-bit registers.

dia4.png

Here we see a register in the picture (e.g. EAX). At 32 bits the register is called EAX (Extended AX). Its lower 16 bits are called AX. AX is again divided into two bytes (8 bits each), AH (AX high) and AL (AX low): AH is the high byte and AL the low byte of AX. AL, AH, AX and EAX can all be read and written in assembler. EBX, ECX and EDX are split in the same way. ESI, EDI, EBP and ESP have 16-bit forms (SI, DI, BP, SP) but no high and low byte names in 32-bit mode. EIP and the flags register cannot be accessed like this.
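The overlap of EAX, AX, AH and AL can be modelled with a few bit masks. In this Python sketch the class name `Eax` is made up, and the properties only mimic what happens in 32-bit mode: writing to AX, AH or AL changes just those bits of EAX and leaves the rest untouched.

```python
class Eax:
    """Toy model of EAX and its sub-registers AX, AH and AL."""

    def __init__(self, value=0):
        self.eax = value & 0xFFFFFFFF

    @property
    def ax(self):                      # bits 0..15
        return self.eax & 0xFFFF

    @ax.setter
    def ax(self, value):
        self.eax = (self.eax & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def ah(self):                      # bits 8..15
        return (self.eax >> 8) & 0xFF

    @ah.setter
    def ah(self, value):
        self.eax = (self.eax & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def al(self):                      # bits 0..7
        return self.eax & 0xFF

    @al.setter
    def al(self, value):
        self.eax = (self.eax & 0xFFFFFF00) | (value & 0xFF)


reg = Eax(0x12345678)
print(hex(reg.ax), hex(reg.ah), hex(reg.al))
reg.al = 0xFF        # only the lowest byte changes
print(hex(reg.eax))
reg.ax = 0xBEEF      # only the lower half changes
print(hex(reg.eax))
```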

I hope I could give you a little introduction to assembler, and see you in the next post 😄