Exploring syscalls with GCC and GDB

In the previous post, my first instinct when figuring out how to make a Linux syscall in assembly was to look at specs and documentation. A former colleague wrote in to ask why I didn't just look at the output of GCC. Also, why not just return 110 from main rather than calling exit(110)? These are good questions, so let's discuss them!

Philosophically speaking, knowledge I get from reading specs and knowledge I get from reverse engineering feel different: from specs I expect to get a full picture of how things are supposed to work, and from reverse engineering I expect to see a particular example of how things do work in at least some cases. Which of these two approaches is more useful, or more accurate, really depends on the situation! And even though I prefer knowledge from specs, reverse engineering can be a good way to figure out which specs to even look at and how to interpret them.

Practically speaking, when I tried looking at the output of GCC for this particular problem, it wasn't very useful:

$ cat >with_exit.c <<'EOF'
#include <stdlib.h>
int main(void) { exit(110); }
EOF
$ gcc -S with_exit.c -o /dev/stdout
  .file "with_exit.c"
  .text
  .globl main
  .type main, @function
main:
.LFB2:
  .cfi_startproc
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  movq %rsp, %rbp
  .cfi_def_cfa_register 6
  movl $110, %edi
  call exit
  .cfi_endproc
.LFE2:
  .size main, .-main
  .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
  .section .note.GNU-stack,"",@progbits

The two relevant lines are movl $110, %edi and call exit, but that's like saying "to exit, call the exit function", which isn't very enlightening. The problem is that in C, syscalls are normally made through wrapper functions in the standard library, and what we're looking for is inside those wrappers rather than in our program. I thought the call would maybe get inlined if I turned on optimizations, but no:

$ gcc -O2 -S with_exit.c -o /dev/stdout
  .file "with_exit.c"
  .section .text.unlikely,"ax",@progbits
.LCOLDB0:
  .section .text.startup,"ax",@progbits
.LHOTB0:
  .p2align 4,,15
  .globl main
  .type main, @function
main:
.LFB15:
  .cfi_startproc
  subq $8, %rsp
  .cfi_def_cfa_offset 16
  movl $110, %edi
  call exit
  .cfi_endproc
.LFE15:
  .size main, .-main
  .section .text.unlikely
.LCOLDE0:
  .section .text.startup
.LHOTE0:
  .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
  .section .note.GNU-stack,"",@progbits
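
None of these listings show the syscall itself, because it is buried inside the library. As a rough sketch of what the wrapper eventually boils down to, the syscall function declared in <unistd.h> lets a C program invoke the kernel by number. The helper name below is made up for illustration, and I'm assuming the exit_group syscall, which is what glibc's _exit uses on Linux. The child process terminates without running any of the cleanup that exit would do.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that terminates through a raw
 * system call, bypassing exit(), and return the exit status that the
 * parent observes (or -1 on error). */
int exit_status_via_raw_syscall(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                        /* fork failed */

    if (pid == 0) {
        /* Child: ask the kernel directly to end the process. */
        syscall(SYS_exit_group, status);
        _exit(127);                       /* not reached */
    }

    /* Parent: collect the child's exit status. */
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Calling exit_status_via_raw_syscall(110) from a small main should report the same status that with_exit produces, just without going through the library's exit path.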

What about using return instead? I got similarly unenlightening results:

$ cat >with_return.c <<'EOF'
int main(void) { return 110; }
EOF
$ gcc -S with_return.c -o /dev/stdout
  .file "with_return.c"
  .text
  .globl main
  .type main, @function
main:
.LFB0:
  .cfi_startproc
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  movq %rsp, %rbp
  .cfi_def_cfa_register 6
  movl $110, %eax
  popq %rbp
  .cfi_def_cfa 7, 8
  ret
  .cfi_endproc
.LFE0:
  .size main, .-main
  .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
  .section .note.GNU-stack,"",@progbits
$ gcc -O2 -S with_return.c -o /dev/stdout
  .file "with_return.c"
  .section .text.unlikely,"ax",@progbits
.LCOLDB0:
  .section .text.startup,"ax",@progbits
.LHOTB0:
  .p2align 4,,15
  .globl main
  .type main, @function
main:
.LFB0:
  .cfi_startproc
  movl $110, %eax
  ret
  .cfi_endproc
.LFE0:
  .size main, .-main
  .section .text.unlikely
.LCOLDE0:
  .section .text.startup
.LHOTE0:
  .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
  .section .note.GNU-stack,"",@progbits

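In both the unoptimized and the optimized version, return 110 essentially translates to two instructions: movl $110, %eax puts the value in the return register, and ret returns to the caller. (The unoptimized version also saves and restores %rbp around them.)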

This is as far as GCC gets us, but in the spirit of reverse engineering, let's turn to GDB for help!

What happens inside the stdlib

I thought it would be reasonably simple to use GDB to step through all the assembly instructions that happen when exit(110) gets called. It is simple: the only GDB knowledge I used was the commands break main to set up a breakpoint, run to get to it, display/i $pc to show the next assembly instruction at each step, and stepi to step through the code one instruction at a time. But it's also extremely long. You can see the output of my GDB session if you're curious, but be warned that it steps through about 1375 instructions. As far as I can tell, some of the reasons are:

- The standard library functions are dynamically linked, so the program actually has to look up whether a function called exit even exists, where it lives in memory, etc., and this seems to involve hash tables and string comparisons.
- Once it's found, exit does a bunch of cleanup:
  - Look for global destructors and call them.
  - Look for functions registered with atexit or on_exit and call them.
  - Various bits of IO cleanup.
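
The atexit part of that cleanup is easy to observe from C without stepping through any assembly. Here is a minimal sketch, with made-up helper names, that registers a handler in a child process and reports through a pipe whether the handler ran. It compares exit, which does the cleanup, with _exit, which goes more or less straight to the kernel.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* write end of the pipe, used by the handler */

/* Handler registered with atexit: writes one byte so the parent can
 * tell that it ran. */
static void cleanup_handler(void)
{
    ssize_t n = write(report_fd, "x", 1);
    (void)n;
}

/* Hypothetical helper: returns 1 if the atexit handler ran in the
 * child, 0 if it did not, and -1 on error. */
int handler_ran(int use_libc_exit)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        atexit(cleanup_handler);
        if (use_libc_exit)
            exit(110);       /* runs atexit handlers, then terminates */
        _exit(110);          /* terminates without running them */
    }

    close(fds[1]);
    char c;
    ssize_t n = read(fds[0], &c, 1);  /* 0 bytes (EOF) means nothing was written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

With this, handler_ran(1) should report that the handler ran and handler_ran(0) that it did not, which is one small, visible slice of the work hiding in those 1375 instructions.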

In the end though, it does use the syscall instruction to invoke the Linux kernel. As for the arguments passed to the kernel, I'm not sure I would want to extract them from this GDB session.

What happens after main

So that's what happens when calling exit, but what if we just return from main? We saw above that this turns into the ret instruction, but that can't be used to jump back into the kernel directly, so there has to be something implicitly running after main. This can be confirmed by looking at some of the ELF information:

$ gcc with_return.c -o with_return
$ readelf --file-header --symbols with_return
ELF Header:
...
  Entry point address:               0x4003e0
...
Symbol table '.symtab' contains 66 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
...
    59: 00000000004003e0    42 FUNC    GLOBAL DEFAULT   14 _start
...
    61: 00000000004004d6    11 FUNC    GLOBAL DEFAULT   14 main
...

The first instructions from this program that get executed are at the entry point 0x4003e0, which is the address of the symbol _start, not main. So _start is responsible for doing a bunch of setup before main actually gets called. And once main is done, it returns into that startup code, which takes care of the cleanup. We can see this in GDB:

$ gdb with_return
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from with_return...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x4004da
(gdb) run
Starting program: with_return

Breakpoint 1, 0x00000000004004da in main ()
(gdb) display/i $pc
1: x/i $pc
=> 0x4004da <main+4>: mov    $0x6e,%eax
(gdb) stepi
0x00000000004004df in main ()
1: x/i $pc
=> 0x4004df <main+9>: pop    %rbp
(gdb)
0x00000000004004e0 in main ()
1: x/i $pc
=> 0x4004e0 <main+10>: retq
(gdb)
__libc_start_main (main=0x4004d6 <main>, argc=1, argv=0x7fffffffde38,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fffffffde28) at ../csu/libc-start.c:325
325 ../csu/libc-start.c: No such file or directory.
1: x/i $pc
=> 0x7ffff7a2d830 <__libc_start_main+240>: mov    %eax,%edi
(gdb)
0x00007ffff7a2d832 325 in ../csu/libc-start.c
1: x/i $pc
=> 0x7ffff7a2d832 <__libc_start_main+242>:
    callq  0x7ffff7a47030 <__GI_exit>
(gdb)
__GI_exit (status=110) at exit.c:104

Once it's done, main returns to __libc_start_main, which moves the return value from %eax into %edi and then calls __GI_exit, glibc's internal name for exit. In the previous GDB session, this same point appears on line 2971 out of 5784, more than halfway through! So it looks like the extra work of looking up exit, compared to just returning from main, accounts for a pretty big chunk of that session.
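
In C terms, the tail end of the startup code behaves roughly like exit(main(argc, argv)). The sketch below is a simplified model of that, not glibc's actual code: fake_main and status_after_return are hypothetical names, and the "startup" step runs in a forked child so that the parent can observe the resulting exit status.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the program's main (hypothetical name). */
int fake_main(void)
{
    return 110;
}

/* Simplified model of what happens after main returns: the return
 * value is handed to exit(). Returns the exit status seen by the
 * parent, or -1 on error. */
int status_after_return(int (*entry)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
        exit(entry());   /* mirrors: mov %eax,%edi ; callq exit */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Calling status_after_return(fake_main) should give 110, the same status the shell sees after running with_return.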