In the previous post, my first instinct when figuring out how to make a Linux syscall in assembly was to look at specs and documentation, but a former colleague wrote in to ask why I didn't just look at the output of GCC. Also, why not just return 110 from main rather than calling exit(110)? These are good questions, so let's discuss them!
Philosophically speaking, knowledge I get from reading specs and knowledge I get from reverse engineering feel different; from specs I expect to get a full picture of how things are supposed to work, and from reverse engineering I expect to see a particular example of how things do work in at least some cases. It really depends on the situation which of these two approaches is more useful or true! And even though I prefer knowledge from specs, reverse engineering can be a good way to figure out what specs to even look at and how to interpret them.
Practically speaking, when I tried looking at the output of GCC for this particular problem, it wasn't very useful:
$ cat >with_exit.c <<'EOF'
#include <stdlib.h>
int main(void) { exit(110); }
EOF
$ gcc -S with_exit.c -o /dev/stdout
.file "with_exit.c"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $110, %edi
call exit
.cfi_endproc
.LFE2:
.size main, .-main
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
The two relevant lines are movl $110, %edi and call exit, but that's like saying "to exit, call the exit function", which isn't very enlightening. The problem is that in C, every syscall is wrapped by a standard library function, and what we're looking for is inside there rather than in our program. I thought it would maybe get inlined if I turned on optimizations, but no:
$ gcc -O2 -S with_exit.c -o /dev/stdout
.file "with_exit.c"
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.section .text.startup,"ax",@progbits
.LHOTB0:
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB15:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $110, %edi
call exit
.cfi_endproc
.LFE15:
.size main, .-main
.section .text.unlikely
.LCOLDE0:
.section .text.startup
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
What about using return instead? I got similarly unenlightening results:
$ cat >with_return.c <<'EOF'
int main(void) { return 110; }
EOF
$ gcc -S with_return.c -o /dev/stdout
.file "with_return.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $110, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
$ gcc -O2 -S with_return.c -o /dev/stdout
.file "with_return.c"
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.section .text.startup,"ax",@progbits
.LHOTB0:
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
movl $110, %eax
ret
.cfi_endproc
.LFE0:
.size main, .-main
.section .text.unlikely
.LCOLDE0:
.section .text.startup
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
In both the unoptimized and the optimized version, return 110 simply translates to the two instructions movl $110, %eax and ret (return).
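To see that those two instructions really are all a function needs, here is a small sketch that writes them by hand and calls the result from C. It assumes x86-64 Linux and GCC (or a compatible compiler), and the name return_110 is made up for this example.

```c
/* Hand-written equivalent of `int return_110(void) { return 110; }`.
   Assumes x86-64 and a GCC-compatible compiler and assembler. */
__asm__(
    ".pushsection .text\n"
    ".globl return_110\n"
    ".type return_110, @function\n"
    "return_110:\n"
    "    movl $110, %eax\n"   /* integer return values travel in %eax */
    "    ret\n"               /* jump back to whoever called us */
    ".popsection\n"
);

/* Tell the C compiler about the symbol defined in the assembly above. */
int return_110(void);
```

Calling return_110() from C yields 110, because the x86-64 calling convention hands integer return values back in %eax. What happens to that value once main itself returns is a separate question.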
This is as far as GCC gets us, but in the spirit of reverse engineering, let's turn to GDB for help!
I thought it would be reasonably simple to use GDB to step through all the assembly instructions that happen when exit(110) gets called, and it is simple (the only GDB knowledge I used was the commands break main to set up a breakpoint, run to get to it, display/i $pc to show the next assembly instruction at each step, and stepi to start stepping through the code), but it's also extremely long. You can see the output of my GDB session if you're curious, but be warned that it's stepping through about 1375 instructions. As far as I can tell, some of the reasons are:
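- Before exit can run, it has to be determined whether exit even exists, where it lives in memory, etc., and this seems to involve hash tables and string comparisons.
- exit does a bunch of cleanup, such as looking for any functions registered with atexit or on_exit and calling them.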
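The handler part of that cleanup is easy to observe from C. The sketch below is an illustration rather than something taken from the GDB session: a child process registers a handler with atexit and then leaves either through exit or through _exit, which skips the library cleanup. The names handler_ran, cleanup_handler and report_fd are hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;  /* write end of the pipe, used by the handler */

/* Registered with atexit in the child; leaves a one-byte mark in the pipe. */
static void cleanup_handler(void) {
    const char mark = 'x';
    ssize_t written = write(report_fd, &mark, 1);
    (void)written;
}

/* Returns 1 if the handler ran in the child, 0 if it did not, -1 on error. */
int handler_ran(int use_libc_exit) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        atexit(cleanup_handler);
        if (use_libc_exit) exit(0);  /* runs registered handlers first */
        _exit(0);                    /* goes to the kernel without them */
    }

    close(fds[1]);
    char buf;
    ssize_t n = read(fds[0], &buf, 1);  /* 0 means EOF: nothing was written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

Under these assumptions, handler_ran(1) reports that the handler ran and handler_ran(0) reports that it did not, which is one visible piece of the extra work exit does before the process ends.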
In the end though, it does use the syscall instruction to invoke the Linux kernel. As for the arguments passed to the kernel, I'm not sure I would want to extract them from this GDB session.
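For comparison, here is a minimal sketch of the same kind of request made without calling exit at all. The register assignment (syscall number in rax, first argument in rdi) comes from the x86-64 Linux syscall convention rather than from the GDB session, and the helper names raw_exit and status_of_raw_exit are hypothetical. It uses the plain exit syscall, which is not necessarily the one the C library ends up issuing.

```c
#include <sys/syscall.h>  /* SYS_exit */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask the kernel to end the calling thread, bypassing the library's exit.
   x86-64 Linux only: syscall number in rax, first argument in rdi.
   The syscall instruction itself clobbers rcx and r11. */
static void raw_exit(int status) {
    __asm__ volatile(
        "syscall"
        :
        : "a"((long)SYS_exit), "D"((long)status)
        : "rcx", "r11", "memory");
    __builtin_unreachable();  /* the kernel never returns from this call */
}

/* Run raw_exit in a child process and report the status the parent sees,
   or -1 if something went wrong along the way. */
int status_of_raw_exit(int status) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) raw_exit(status);

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

With a single-threaded child, status_of_raw_exit(110) should hand back 110, the same status a shell would see, with no lookup and no cleanup in between.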
So that's what happens when calling exit, but what if we just return from main? We saw above that this turns into the ret instruction, but that can't be used to jump back into the kernel directly, so there has to be something implicitly running after main. This can be confirmed by looking at some of the ELF information:
$ gcc with_return.c -o with_return
$ readelf --file-header --symbols with_return
ELF Header:
...
Entry point address: 0x4003e0
...
Symbol table '.symtab' contains 66 entries:
Num: Value Size Type Bind Vis Ndx Name
...
59: 00000000004003e0 42 FUNC GLOBAL DEFAULT 14 _start
...
61: 00000000004004d6 11 FUNC GLOBAL DEFAULT 14 main
...
The first instructions from this program that get executed are at the entry point 0x4003e0, which is the address of the symbol _start, not main. So _start is responsible for doing a bunch of setup before actually calling main. And once main is done, it returns to whatever _start was doing for the cleanup. We can see this in GDB:
$ gdb with_return
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from with_return...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x4004da
(gdb) run
Starting program: with_return
Breakpoint 1, 0x00000000004004da in main ()
(gdb) display/i $pc
1: x/i $pc
=> 0x4004da <main+4>: mov $0x6e,%eax
(gdb) stepi
0x00000000004004df in main ()
1: x/i $pc
=> 0x4004df <main+9>: pop %rbp
(gdb)
0x00000000004004e0 in main ()
1: x/i $pc
=> 0x4004e0 <main+10>: retq
(gdb)
__libc_start_main (main=0x4004d6 <main>, argc=1, argv=0x7fffffffde38,
init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffffffde28) at ../csu/libc-start.c:325
325 ../csu/libc-start.c: No such file or directory.
1: x/i $pc
=> 0x7ffff7a2d830 <__libc_start_main+240>: mov %eax,%edi
(gdb)
0x00007ffff7a2d832 325 in ../csu/libc-start.c
1: x/i $pc
=> 0x7ffff7a2d832 <__libc_start_main+242>:
callq 0x7ffff7a47030 <__GI_exit>
(gdb)
__GI_exit (status=110) at exit.c:104
Once it's done, main returns to __libc_start_main, which moves the return value from %eax into %edi and then calls __GI_exit with it. In the previous GDB session, this appears on line 2971 out of 5784, more than halfway through! So it looks like the extra work for looking up exit rather than using return is a pretty big chunk.
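The last two instructions of that session, mov %eax,%edi followed by the call to exit, amount to passing main's return value straight to exit. A rough C sketch of that step, with made-up names (pretend_main, start_like, status_after_return) and none of the real startup code's setup, could look like this:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a program's main; the name is made up for this sketch. */
static int pretend_main(void) { return 110; }

/* Roughly the step seen at the end of the GDB session:
   take the entry function's return value and pass it to exit. */
static void start_like(int (*entry)(void)) {
    exit(entry());
}

/* Run the sketch in a child process and report the exit status the
   parent observes, or -1 on error. */
int status_after_return(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) start_like(pretend_main);

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Here status_after_return() gives 110, so returning from main and calling exit(110) end up on the same path. The difference is only in who makes the call to exit.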