[PoP:Rev] Reverse Engineering x86 ELF: 0x2

New Concepts Covered


In this article, we will look at a program that reads 3 integers from user input, passes them to a function that adds them, and prints the sum of the integers.

Source Code

#include <stdio.h>

int add(int x, int y, int z)
{
	int i = 0;

	/* Please do not do the following.
	   It's vulnerable to integer overflow. */

	i = x + y + z;
	
	return i;
}

int main()
{
	int x, y, z;

	printf("Enter 3 integers: ");
	scanf("%d %d %d", &x, &y, &z);

	printf("The sum is %d\n", add(x, y, z));

	return 0;
}

Disassembly

Dump of assembler code for function main:
   0x0804848d <+0>:	push   ebp
   0x0804848e <+1>:	mov    ebp,esp
   0x08048490 <+3>:	sub    esp,0xc
   0x08048493 <+6>:	push   0x8048570
   0x08048498 <+11>:	call   0x8048330 <printf@plt>
   0x0804849d <+16>:	add    esp,0x4
   0x080484a0 <+19>:	lea    eax,[ebp-0xc]
   0x080484a3 <+22>:	push   eax
   0x080484a4 <+23>:	lea    eax,[ebp-0x8]
   0x080484a7 <+26>:	push   eax
   0x080484a8 <+27>:	lea    eax,[ebp-0x4]
   0x080484ab <+30>:	push   eax
   0x080484ac <+31>:	push   0x8048583
   0x080484b1 <+36>:	call   0x8048350 <__isoc99_scanf@plt>
   0x080484b6 <+41>:	add    esp,0x10
   0x080484b9 <+44>:	mov    ecx,DWORD PTR [ebp-0xc]
   0x080484bc <+47>:	mov    edx,DWORD PTR [ebp-0x8]
   0x080484bf <+50>:	mov    eax,DWORD PTR [ebp-0x4]
   0x080484c2 <+53>:	push   ecx
   0x080484c3 <+54>:	push   edx
   0x080484c4 <+55>:	push   eax
   0x080484c5 <+56>:	call   0x804846b <add>
   0x080484ca <+61>:	add    esp,0xc
   0x080484cd <+64>:	push   eax
   0x080484ce <+65>:	push   0x804858c
   0x080484d3 <+70>:	call   0x8048330 <printf@plt>
   0x080484d8 <+75>:	add    esp,0x8
   0x080484db <+78>:	mov    eax,0x0
   0x080484e0 <+83>:	leave
   0x080484e1 <+84>:	ret
End of assembler dump.

Dump of assembler code for function add:
   0x0804846b <+0>:	push   ebp
   0x0804846c <+1>:	mov    ebp,esp
   0x0804846e <+3>:	sub    esp,0x4
   0x08048471 <+6>:	mov    DWORD PTR [ebp-0x4],0x0
   0x08048478 <+13>:	mov    edx,DWORD PTR [ebp+0x8]
   0x0804847b <+16>:	mov    eax,DWORD PTR [ebp+0xc]
   0x0804847e <+19>:	add    edx,eax
   0x08048480 <+21>:	mov    eax,DWORD PTR [ebp+0x10]
   0x08048483 <+24>:	add    eax,edx
   0x08048485 <+26>:	mov    DWORD PTR [ebp-0x4],eax
   0x08048488 <+29>:	mov    eax,DWORD PTR [ebp-0x4]
   0x0804848b <+32>:	leave
   0x0804848c <+33>:	ret
End of assembler dump.

Local Variables

0x0804848d <+0>:	push   ebp
0x0804848e <+1>:	mov    ebp,esp
0x08048490 <+3>:	sub    esp,0xc

Similar to the previous article, the disassembly starts off with a function prologue. However, there is an additional sub instruction in this scenario. In the above, esp is being decremented by 0xc, which is 12 in decimal. If we look back at the source code for main, we see that three integers (x, y, z) are declared at the start of the function. Since integers are 4 bytes each on this platform, sub esp, 0xc allocates space on the stack for those 3 local variables.

Calling Functions

When doing reverse engineering, I generally find it useful to start off by looking for call instructions. This is because programs are essentially a sequence of function calls. Therefore, splitting the disassembly into chunks around each call instruction gives us a good idea of what is going on.

After finding each call instruction, I would then look at the preceding lines of code to find all the corresponding push instructions. By doing so, we are effectively identifying all the parameters passed into the aforementioned function call.

Let’s use this strategy to continue our analysis of main.

printf

0x08048493 <+6>:	push   0x8048570
0x08048498 <+11>:	call   0x8048330 <printf@plt>
0x0804849d <+16>:	add    esp,0x4

From the above call chunk, we are able to derive the following line of code,

printf(0x8048570);

If we look at the documentation for printf, we know that its first parameter is of type const char *. This is the (format) string that will be printed. So if we go with the assumption that 0x8048570 is the address of the string to be printed and verify it using gdb,

Note: Most disassemblers will resolve this for you by default.

gef➤  print (char *) 0x8048570
$1 = 0x8048570 "Enter 3 integers: "

we will find that it’s as we expect. Hence, we have effectively reversed the following line of code:

printf("Enter 3 integers: ");

Immediately after the call instruction, add esp, 0x4 is executed. This cleans up the parameters that were pushed onto the stack for the function call. In this scenario, we only pushed one parameter onto the stack (4 bytes), so esp is incremented by 4. This cleanup is done in accordance with the cdecl calling convention, whereby the caller, main, performs cleanup on behalf of the callee, printf.

scanf

0x080484a0 <+19>:	lea    eax,[ebp-0xc]
0x080484a3 <+22>:	push   eax
0x080484a4 <+23>:	lea    eax,[ebp-0x8]
0x080484a7 <+26>:	push   eax
0x080484a8 <+27>:	lea    eax,[ebp-0x4]
0x080484ab <+30>:	push   eax
0x080484ac <+31>:	push   0x8048583
0x080484b1 <+36>:	call   0x8048350 <__isoc99_scanf@plt>
0x080484b6 <+41>:	add    esp,0x10

In this example, we see three repetitions of lea followed by push. If we refer to the Intel manual, we will see that lea stands for load effective address. In short, lea r32, m computes the address of the memory operand m and loads it into the 32-bit register r32, without reading memory.

In this case, we are loading the address ebp-0xc into the register eax. Square brackets around an operand normally refer to the value stored at that address, but lea only computes the address and never reads the value. Therefore, lea eax, [ebp-0xc] translates to: load the effective address of the value stored at ebp-0xc into the 32-bit register eax. This is the equivalent of taking an address with & in C.

If we recall, a sub esp, 0xc was done in the function prologue to make space for local variables. Currently, we are pushing the addresses of [ebp-0xc], [ebp-0x8], and [ebp-0x4] which are our three local variables onto the stack. This is followed by 0x8048583 which is the address of the format string %d %d %d.

More observant readers would have noticed at this point that while scanf() expects a format string followed by the corresponding addresses to read data into, the 3 integer addresses were pushed onto the stack first, followed by the format string. This is due to cdecl, which specifies that parameters for function calls should be pushed from right to left.

This time, add esp, 0x10 is done for cleanup because we pushed four 4-byte arguments (16 bytes, or 0x10) onto the stack.

Based on this information, we would have reversed even more of this program.

int x, y, z;

printf("Enter 3 integers: ");
scanf("%d %d %d", &x, &y, &z);

add

0x080484b9 <+44>:	mov    ecx,DWORD PTR [ebp-0xc]
0x080484bc <+47>:	mov    edx,DWORD PTR [ebp-0x8]
0x080484bf <+50>:	mov    eax,DWORD PTR [ebp-0x4]
0x080484c2 <+53>:	push   ecx
0x080484c3 <+54>:	push   edx
0x080484c4 <+55>:	push   eax
0x080484c5 <+56>:	call   0x804846b <add>
0x080484ca <+61>:	add    esp,0xc

Based on what we have done so far, we can quickly translate the above to

add(x, y, z);

What is more interesting to discuss is the following snippet from the disassembly of the add function.

0x08048471 <+6>:	mov    DWORD PTR [ebp-0x4],0x0
0x08048478 <+13>:	mov    edx,DWORD PTR [ebp+0x8]
0x0804847b <+16>:	mov    eax,DWORD PTR [ebp+0xc]
0x0804847e <+19>:	add    edx,eax
0x08048480 <+21>:	mov    eax,DWORD PTR [ebp+0x10]
0x08048483 <+24>:	add    eax,edx
0x08048485 <+26>:	mov    DWORD PTR [ebp-0x4],eax
0x08048488 <+29>:	mov    eax,DWORD PTR [ebp-0x4]
0x0804848b <+32>:	leave
0x0804848c <+33>:	ret

If we are familiar with how stack frames work, we would be able to tell that [ebp+0x8], [ebp+0xc], and [ebp+0x10] are the three parameters passed into add, namely x, y, and z. From lines +13 to +24, we can see that eax = x + y + z after line +24 executes.

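In the next two lines (+26 and +29), the value in eax is copied to the local variable at [ebp-0x4], which is i, and then copied back into eax. The round trip looks redundant, but it mirrors the source line by line (i = x + y + z; followed by return i;), as is typical of code compiled without optimization. Going back to the previous article, we know that the value ends up in eax because eax holds the return value by convention.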

Based on this, we will be able to add on to what we have reversed.

int add(int x, int y, int z)
{
	int i = 0;
	
	i = x + y + z;

	return i;
}

int main()
{
	int x, y, z;

	printf("Enter 3 integers: ");
	scanf("%d %d %d", &x, &y, &z);

	add(x, y, z);
}

Let’s continue with the next few lines in main,

0x080484cd <+64>:	push   eax
0x080484ce <+65>:	push   0x804858c
0x080484d3 <+70>:	call   0x8048330 <printf@plt>
0x080484d8 <+75>:	add    esp,0x8

Since we know that eax contains the return value of add(x, y, z) and 0x804858c is the address of the format string The sum is %d\n, we can derive the following code.

int add(int x, int y, int z)
{
	int i = 0;
	
	i = x + y + z;

	return i;
}

int main()
{
	int x, y, z;

	printf("Enter 3 integers: ");
	scanf("%d %d %d", &x, &y, &z);

	printf("The sum is %d\n", add(x, y, z));
}

and if we account for the return value being set to 0, followed by the function epilogue,

0x080484db <+78>:	mov    eax,0x0
0x080484e0 <+83>:	leave
0x080484e1 <+84>:	ret

we would have effectively reversed the following C code from the disassembly.

int add(int x, int y, int z)
{
	int i = 0;
	
	i = x + y + z;

	return i;
}

int main()
{
	int x, y, z;

	printf("Enter 3 integers: ");
	scanf("%d %d %d", &x, &y, &z);

	printf("The sum is %d\n", add(x, y, z));

	return 0;
}