Assembly: Purpose of loading the effective address before a call to a function?


Source C Code:

    #include <stdio.h>

    int main()
    {
      int i;
      for(i = 0; i < 10; i++)
      {
        printf("Hello World!\n");
      }
    }

Dump of Intel syntax x86 assembler code for function main:

  1.  0x000055555555463a <+0>:     push   rbp
  2.  0x000055555555463b <+1>:     mov    rbp,rsp 
  3.  0x000055555555463e <+4>:     sub    rsp,0x10
  4.  0x0000555555554642 <+8>:     mov    DWORD PTR [rbp-0x4],0x0
  5.  0x0000555555554649 <+15>:    jmp    0x55555555465b <main+33>
  6.  0x000055555555464b <+17>:    lea    rdi,[rip+0xa2]    # 0x5555555546f4
  7.  0x0000555555554652 <+24>:    call   0x555555554510 <puts@plt>
  8.  0x0000555555554657 <+29>:    add    DWORD PTR [rbp-0x4],0x1
  9.  0x000055555555465b <+33>:    cmp    DWORD PTR [rbp-0x4],0x9
  10. 0x000055555555465f <+37>:    jle    0x55555555464b <main+17>
  11. 0x0000555555554661 <+39>:    mov    eax,0x0
  12. 0x0000555555554666 <+44>:    leave  
  13. 0x0000555555554667 <+45>:    ret    
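As a reading aid, here is a sketch of the same control flow written back into C with goto. It assumes the unoptimized layout in the listing above: jump to the condition first, then the body, the increment, and a compare against 9 with jle (i <= 9 is the same test as i < 10). The function name run_loop and the extra counter are hypothetical additions so the behaviour can be checked; they are not part of the original program.

```c
#include <stdio.h>

/* Hypothetical transliteration of the listing's control flow using goto.
   Returns the number of iterations so the behaviour is easy to check. */
int run_loop(void)
{
    int i = 0;                 /* mov DWORD PTR [rbp-0x4],0x0 */
    int count = 0;             /* not in the original; added for checking */
    goto check;                /* jmp <main+33>: test the condition first */
body:
    puts("Hello World!");      /* lea rdi,[rip+0xa2] ; call puts@plt */
    count++;
    i += 1;                    /* add DWORD PTR [rbp-0x4],0x1 */
check:
    if (i <= 9)                /* cmp DWORD PTR [rbp-0x4],0x9 ; jle <main+17> */
        goto body;
    return count;              /* the real main returns 0 here (mov eax,0x0) */
}
```

Calling run_loop() prints the line ten times and returns 10.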

I'm currently working through "Hacking: The Art of Exploitation, 2nd Edition" by Jon Erickson, and I'm just starting to tackle assembly.

I have a few questions about the translation of the provided C code to Assembly, but I am mainly wondering about my first question.

1st Question: What is the purpose of line 6? (lea rdi,[rip+0xa2]).

My current working theory is that this is used to save where the next instructions will jump to, in order to track what is going on. I believe this line corresponds to the printf function in the source C code.

So essentially, it's loading the effective address of rip+0xa2 (0x5555555546f4) into the register rdi, simply to track where it will jump to for the printf function?

2nd Question: What is the purpose of line 11? (mov eax,0x0) I do not see any prior use of the register EAX and am not sure why it needs to be set to 0.

The LEA puts a pointer to the string literal into a register (RDI), as the first argument for puts. The search terms you're looking for are "calling convention" and/or "ABI" (and also "RIP-relative addressing"). See also: Why is the address of static variables relative to the Instruction Pointer?

The small offset between code and data (only +0xa2) is because the .rodata section gets linked into the same ELF segment as .text, and your program is tiny. (Newer gcc + ld versions will put it in a separate page so it can be non-executable.)

The compiler can't use the shorter, more efficient mov edi, address in position-independent code such as your Linux PIE executable. It would do that with gcc -fno-pie -no-pie.

mov eax,0 implements the implicit return 0 at the end of main that C99 and C++ guarantee. EAX is the return-value register in all x86 calling conventions.

If you don't use gcc -O2 or higher, you won't get peephole optimizations like xor-zeroing (xor eax,eax).
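The reason xor-zeroing works is plain arithmetic: any value XORed with itself is 0, so xor eax,eax clears EAX whatever it held, and its encoding (2 bytes) is shorter than mov eax,0x0 (5 bytes, which matches the listing, where it spans offsets +39 to +44). A minimal C sketch of the identity, using a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: x XOR x clears every bit, whatever x was.
   This identity is what lets "xor eax,eax" replace "mov eax,0x0". */
uint32_t xor_self(uint32_t x)
{
    return x ^ x;
}
```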

This:

lea    rdi,[rip+0xa2]

Is a typical position-independent LEA, putting the string's address into a register (instead of loading from that memory address).
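In C terms, LEA behaves like taking an address with &, while a MOV from memory behaves like dereferencing. The sketch below is only an analogy under that assumption: msg stands in for the string literal at 0x5555555546f4, lea_like and mov_like are hypothetical names, and the compiler is not required to emit exactly these instructions for them.

```c
/* Stand-in for the "Hello World!" literal in .rodata (hypothetical name). */
static const char msg[] = "Hello World!";

/* Analogous to "lea rdi,[rip+disp]": computes the address, reads no memory. */
const char *lea_like(void)
{
    return &msg[0];
}

/* Analogous to a load such as "movzx eax, BYTE PTR [rip+disp]":
   reads the byte stored at that address. */
char mov_like(void)
{
    return msg[0];
}
```

lea_like() hands back the pointer itself (what puts needs in RDI), whereas mov_like() returns the character 'H' that lives at that address.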

Your executable is position-independent, meaning that it can be loaded at any address at runtime. Therefore, the real address of the argument to be passed to puts() has to be calculated at runtime, since the base address of the program could be different each time. Also, puts() is used instead of printf() because the compiler optimized the call: the format string contains no conversion specifiers and ends with a newline, so there is nothing to format, and puts() (which appends the newline itself) does the same job.
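The two calls are not fully interchangeable, which is why GCC only does this substitution when the return value is ignored: printf returns the number of characters written, while puts only promises a nonnegative value on success. A small sketch, with print_both as a hypothetical name; because its results are used here, GCC should keep the printf call as written.

```c
#include <stdio.h>

/* Hypothetical helper: prints the same line twice, once per function,
   and checks what each call reports. */
int print_both(void)
{
    int n = printf("Hello World!\n");  /* returns characters written: 13 */
    int r = puts("Hello World!");      /* appends '\n' itself; nonnegative on success */
    return n == 13 && r >= 0;
}
```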

In this case, the binary was most probably loaded with the base address 0x555555554000. The string to use is stored in your binary at offset 0x6f4. Since the next instruction is at offset 0x652, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x6f4 - 0x652) = rip + 0xa2, which is what you see above. See this answer of mine for another example.
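The same arithmetic as a short C sketch, using the offsets from the listing (0x652 for the instruction after the LEA, 0x6f4 for the string) and the assumed load base 0x555555554000. The helper names are hypothetical. The displacement depends only on the two offsets, so it stays the same for any load base.

```c
#include <stdint.h>

/* Hypothetical helper: the displacement encoded in "lea rdi,[rip+disp]".
   RIP already points at the next instruction when the LEA executes. */
int64_t rip_displacement(uint64_t target_off, uint64_t next_insn_off)
{
    return (int64_t)(target_off - next_insn_off);
}

/* Hypothetical helper: the address the CPU computes at run time,
   i.e. RIP (load base + offset of the next instruction) + displacement. */
uint64_t effective_address(uint64_t load_base, uint64_t next_insn_off, int64_t disp)
{
    return load_base + next_insn_off + (uint64_t)disp;
}
```

With base 0x555555554000 this gives 0x5555555546f4, the address gdb annotates next to the LEA; with any other base, the same 0xa2 still lands on the string.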

The purpose of:

mov eax,0x0

Is to set the return value of main(). On x86-64, the calling convention is to return integer values in the rax register (eax if the value is 32 bits, which is the case here since main returns an int). See the table entry for x86-64 at the end of this page.

Even if you don't add an explicit return statement, main() is a special function, and the compiler will add a default return 0 for you.

If you compile with debug info and symbols (e.g. gcc -g), everything will be easier. The code is also easier to read if you enable some optimization.

There is a very useful tool, godbolt (Compiler Explorer). Here is your example: https://godbolt.org/z/9sRFmU

In the asm listing there, you can clearly see that this line loads the address of the string literal, which is then printed by the function.

EAX is considered volatile (call-clobbered), and main returns zero by default; that's the reason it is zeroed.

The calling convention is explained here: https://en.wikipedia.org/wiki/X86_calling_conventions

Here are some more interesting cases: https://godbolt.org/z/M4MeGk

Comments
  • Thanks for fixing the same problem in your answer on How does this program know the exact location where this string is stored?. At the time I didn't manage to convince you to change it, but I wasn't going to let the same error happen again :P
  • @PeterCordes yeah, I only realized what you meant now :') sorry about that.