ASM x86 – Loops & Delays

August 24, 2019 · 4 min read

Hi, in the fourth part of my series, I want to talk about loops and delays.

We saw in the last post how jumps work and how they can be linked to conditions. That's what we're going to exploit today to build ourselves loops and delays.

Loops and delays are basically the same thing. The difference is that a delay does nothing but keep the CPU busy, while a loop executes useful code on every pass.

A loop looks like this:

section .text

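global _start

_start:
    xor ecx, ecx            ; ecx = 0, our loop counter

_loop:
    cmp ecx, 0xa            ; has the counter reached 10?
    je _end                 ; yes: leave the loop
    inc ecx                 ; no: count one more pass
    jmp _loop               ; and check again

_end:
    mov edx, 0xc0cac01a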

In this example I have built a simple for loop that runs 10 times and then sets the EDX register to 0xC0CAC01A. At the beginning, the ECX register is cleared; we use it as the loop counter. Execution falls straight from _start into the loop, where we check whether ECX holds the value 0xa (10 in decimal). If it does, we exit the loop. If it does not, we increase the value in the register by 1 and jump back to the beginning of the loop.

Exiting a loop is done by jumping to a label outside the loop. In this case we jump to the label _end, where, at the end of the program, we set the EDX register to 0xC0CAC01A ("Coca Cola").
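A common variation counts down instead of up, so that the decrement itself sets the zero flag and no separate cmp is needed. The following is a minimal sketch of that idea, not code from the example above. The exit sequence at the end assumes a 32-bit Linux target (int 0x80 with eax = 1), which this post does not state.

```nasm
section .text

global _start

_start:
    mov ecx, 0xa            ; counter starts at 10 and counts down

_loop:
    ; loop body would go here; it runs 10 times
    dec ecx                 ; ecx = ecx - 1, sets ZF when ecx becomes 0
    jnz _loop               ; ZF clear: ecx is not 0 yet, go around again

_end:
    mov edx, 0xc0cac01a

    ; Assumption: 32-bit Linux. Exit cleanly via sys_exit(0).
    mov eax, 1              ; syscall number 1 = sys_exit
    xor ebx, ebx            ; exit status 0
    int 0x80
```

Because the check comes after the body, the body in this variant always runs at least once.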

It is also possible to build a while loop, but here we have to sacrifice a register or a memory location to hold the condition.

section .text

global _start

_start:
	xor ecx, ecx            ; ecx = 0: the event has not happened yet

_loop:
	cmp ecx, 0x1            ; has something set ecx to 1?
	je _end                 ; yes: leave the loop
	; do stuff
	jmp _loop               ; no: go around again

_end:
	mov edx, 0xc0cac01a

Here is a small example of a while loop. We stay in the loop until an event sets the ECX register to 1; as soon as ECX is 1, we jump out of the loop. The "do stuff" comment marks the place for the code that runs on each pass and, at some point, sets ECX to 1. As written, nothing ever sets ECX, so this skeleton would spin forever. In practice it is smarter to use a memory location, on the stack or elsewhere, instead of a register.
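The memory-based variant mentioned above could look roughly like this. It is only a sketch: the name flag, its placement in .bss, and the line that sets it are assumptions made for illustration, as is the 32-bit Linux exit sequence at the end.

```nasm
section .bss

flag: resd 1                    ; hypothetical 32-bit flag, starts out as 0

section .text

global _start

_start:

_loop:
    cmp dword [flag], 0x1       ; has the event happened yet?
    je _end                     ; yes: leave the loop
    ; do stuff; as a stand-in for a real event, set the flag here
    mov dword [flag], 0x1
    jmp _loop                   ; go back and re-check the condition

_end:
    mov edx, 0xc0cac01a

    ; Assumption: 32-bit Linux. Exit cleanly via sys_exit(0).
    mov eax, 1                  ; syscall number 1 = sys_exit
    xor ebx, ebx                ; exit status 0
    int 0x80
```

With the condition in memory, the loop body is free to use ECX for other work.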

We can also build an infinite loop by simply omitting the condition. It would look something like this:

section .text

global _start

_start:
	xor ecx, ecx

_loop:
	; do stuff
	jmp _loop               ; no condition: always jump back

_end:
	mov edx, 0xc0cac01a     ; never reached

If we look closely at loops, we see that running through a loop many times can take a noticeable amount of time. This is exactly how we can build delays: by simply counting up a register. We have to keep in mind that each CPU model has a different clock rate, so we need to know how many times per second the CPU clocks. For a CPU that clocks at 2.5 GHz, a delay of about one second needs a count on the order of $10^{9}$. We divide that total count by 10 and spread it over two nested loops: an inner loop that does the counting and an outer loop that repeats it 10 times.

Normally delays are calculated using the number of clock cycles each instruction takes (cycles per instruction), but I'll leave that out for now.
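As a rough sketch of that calculation, and not a cycle-accurate model: if one pass through a counting loop costs about $c$ clock cycles and the CPU runs at $f$ cycles per second, then $N$ passes take roughly the time $t$ below. The symbols $N$, $c$, $f$, and $t$ are introduced here for illustration only, and $c = 2$ in the worked line is an assumed value, since the real cost depends on the CPU.

```latex
t \approx \frac{N \cdot c}{f}
\quad\Longrightarrow\quad
N \approx \frac{t \cdot f}{c}
= \frac{1\,\text{s} \times 2.5 \times 10^{9}\,\text{Hz}}{2}
= 1.25 \times 10^{9}
= 10 \times \left(125 \times 10^{6}\right)
```

Under these assumed numbers, the count comes out as 10 passes of $125 \times 10^{6}$, which matches the split used in the delay example that follows.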

section .text

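global _start

_start:
    xor ecx, ecx            ; ecx = outer counter
    xor edx, edx            ; edx = inner counter

_loop_10:
    cmp ecx, 0xa            ; 10 outer passes done?
    je _end                 ; yes: the delay is over
    inc ecx
    xor edx, edx            ; restart the inner counter
    jmp _loop_inner

_loop_inner:
    cmp edx, 0x7735940      ; 125,000,000 inner passes done?
    je _loop_10             ; yes: back to the outer loop
    inc edx
    jmp _loop_inner

_end:
    push 0xc0cac01a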

Here we see the count split across the two loops. The inner loop, _loop_inner, counts to 0x7735940, so it runs $125 \times 10^{6}$ times per pass, and the outer loop, _loop_10, calls it 10 times, for $1.25 \times 10^{9}$ iterations in total. What does "call" mean here? We simply jump from one loop to the other: once the inner loop is done, we jump back to the outer loop and do the whole thing again from the beginning. If we run this program, we see that it takes about 1 second to finish. We can't determine the time exactly, but we can approximate it. I made a small example video where we can see the delay, with a rough measurement using the "date" command.

I hope you enjoyed it, and I'll see you again in the next post 😄