Binary exploits

Binary exploitation can be used for a lot of different things, and the same techniques can be used to find vulnerabilities in most compiled programs.

The hacker knows that it is not the code written by the programmer that gets executed by the computer. Since a program is compiled into machine code, it is actually the machine code that gets executed. And the human-readable representation of machine code is assembly. So yeah, let's learn some assembly.

A lot in this chapter is just my notes from reading Hacking - The Art of Exploitation. So if you really want to learn binary exploitation you should probably stop reading here and just pick up that book instead, it is a lot better.

What's up with all the hexadecimal?

Hexadecimal is a base 16 counting system. It uses 0-9 plus A-F, so the digits are 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f. That means we can count from 0 to 15 with a single digit, so F == 15, and 10 in hexadecimal == 16 in decimal. This is pretty convenient because one byte is made up of 8 bits, and with 8 bits (each 0 or 1) you can form 256 (2^8) different values, 0 through 255. Two hexadecimal digits can represent exactly 16 × 16 = 256 values, from 00 to ff (255). Just like two decimal digits can represent values up to 99.

So two hexadecimal digits can represent any byte value. So one byte can be translated into a two digit hexadecimal value.

00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 0a, 0b, 0c, 0d, 0e, 0f, 10, 11, 12, 13, 14....
a0, a1, a2, a3, a4...
f0, f1, f2, f3...

And so on.

Memory address

So in order to analyze assembly we are going to write a short program in C.

#include <stdio.h>

int main(){

  for (size_t i = 0; i < 10; i++) {
    puts("Hello world");
  }

  return 0;
}

So we have written a program in C and then compiled it. Now we want to look at the assembly code to see what code is actually going to be run by the machine.

objdump -D programName

This will give us some crazy output like this:

00000000004004e6 <main>:
 4004e6:	55                   	push   %rbp
 4004e7:	48 89 e5             	mov    %rsp,%rbp
 4004ea:	48 83 ec 10          	sub    $0x10,%rsp
 4004ee:	48 c7 45 f8 00 00 00 	movq   $0x0,-0x8(%rbp)
 4004f5:	00 
 4004f6:	eb 0f                	jmp    400507 <main+0x21>
 4004f8:	bf a4 05 40 00       	mov    $0x4005a4,%edi
 4004fd:	e8 be fe ff ff       	callq  4003c0 <puts@plt>
 400502:	48 83 45 f8 01       	addq   $0x1,-0x8(%rbp)
 400507:	48 83 7d f8 09       	cmpq   $0x9,-0x8(%rbp)
 40050c:	76 ea                	jbe    4004f8 <main+0x12>
 40050e:	b8 00 00 00 00       	mov    $0x0,%eax
 400513:	c9                   	leaveq 
 400514:	c3                   	retq   
 400515:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
 40051c:	00 00 00 
 40051f:	90                   	nop

This is just the part for the main function of the program. The full output is a lot longer, but the rest is not interesting to us right now.

00000000004004e6 This number represents a place in memory. It is like an address. It could be written in base 10 if we wanted to (4195558), and it would still be the correct address. But for the convenience described above, the address is written in hexadecimal. The address is 16 hex digits long because this is a 64-bit binary: 16 hex digits × 4 bits each = 64 bits. So a 64-bit process can in theory have 2^64 (1.84467441 × 10^19) memory addresses.

So on the first line after the main-line we see the number 55. All these numbers are actually machine code, but instead of writing it in binary (55 would be 01010101) it is written in hexadecimal. The mnemonics to the right of those numbers are the instructions written in assembly. They are written so that we, humans, can understand it a bit easier. Instead of having to remember that 90 means nop, we just have to remember nop. So that's great. Makes it a lot easier to understand machine code.

So instead of having to memorize 10010000 it is represented as 90 in hexadecimal. And instead of having to remember 90 in hexadecimal we just have to remember nop. Pretty great. But in the end they mean the same thing, they are just represented in three different ways.

AT&T Syntax or Intel Syntax

There are basically two types of assembly language representation: the AT&T syntax and the Intel syntax. The AT&T syntax is the default in the GNU tools found on Linux distributions, so when we run objdump, like in the example above, the output is in AT&T syntax. We can tell that it is AT&T because it has all those $ and % signs. If you add -M intel to your objdump command you will see the output in Intel syntax. One thing to watch out for is the operand order: AT&T writes source,destination while Intel writes destination,source. But in the end both describe the same machine code, tomato tomato.

We can set the syntax in gdb with the following command:

set dis intel
#or
set disassembly-flavor intel

Registers

Okay, so the processor in your computer has something called registers. Registers are like internal variables for your processor. They are predefined, in the sense that you can't create registers. They are already there. You can think of them as micro-memories, or just variables. They are used by the processor to make stuff faster: instead of having to look up a specific place in memory it has its own micro-memory. There are only a handful of them. A 32-bit x86 processor has 8 general-purpose registers and a 64-bit one has 16, plus a few special ones like the instruction pointer and the flags register. So it is not that much to remember. The names of the registers are a bit different between 64-bit and 32-bit processors. A 64-bit processor can run 32-bit binaries, but a 32-bit processor can't run 64-bit binaries. If you want to know what type a binary is you just type:

file binaryName

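These are the names for 32 bit registers, divided into sub-groups. On 64 bit the same registers start with R instead of E (RAX, RBP, RIP and so on), and there are eight extra ones named R8 to R15.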

General registers

These registers are mainly used as temporary storage for the processor.

EAX - Accumulator

EBX - Base

ECX - Counter

EDX - Data

Index and pointers

ESI - Source index

EDI - Destination index

EBP - Base pointer - This one stores an address, normally the base of the current function's stack frame.

EIP - Instruction pointer. Like a child points a finger at each word while reading a book, the instruction pointer is that finger. It always points to the next instruction the processor is going to execute. This is an important pointer.

ESP - Stack pointer - This one also stores an address, the current top of the stack.

Segment registers

CS

DS

ES

FS

GS

SS

Indicator

EFLAGS

So let's take a look at them in a real program. Let's run the program above but this time with a debugger, the GNU Debugger (gdb).

gdb -q ./myprogram

First we set a breakpoint with the command break main, which stops the program at the start of the main function (just after its first few setup instructions, as the main+4 in the output shows). Then we type info registers to see what we have in our registers.

Breakpoint 1, 0x00000000004004ea in main ()
(gdb) info registers
#### General registers
rax            0x4004e6	4195558
rbx            0x0	0
rcx            0x0	0
rdx            0x7fffffffe798	140737488349080

#### Index and pointers
rsi            0x7fffffffe788	140737488349064
rdi            0x1	1
rbp            0x7fffffffe6a0	0x7fffffffe6a0
rsp            0x7fffffffe6a0	0x7fffffffe6a0
r8             0x400590	4195728
r9             0x7ffff7dea6d0	140737351952080
r10            0x83e	2110
r11            0x7ffff7a57520	140737348203808
r12            0x4003f0	4195312
r13            0x7fffffffe780	140737488349056
r14            0x0	0
r15            0x0	0
rip            0x4004ea	0x4004ea <main+4>

#### Indicator
eflags         0x246	[ PF ZF IF ]

#### Segment registers
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0

So hexadecimal is used as a way to represent binary out of convenience.

In Intel syntax, assembly is written in the following form:

mnemonic    destination,source

The mnemonics are instructions like mov, push and sub. The destination and source are registers, addresses in memory, or values.

Mnemonics

mov    rbp,rsp

So here we move the current value in rsp (stack pointer) to rbp (base pointer). This is pretty standard at the beginning of a function. We take the stack pointer and say that the base pointer is equal to it for now.

sub    rsp,0x10

Here we read: Subtract 0x10 from rsp. So the stack pointer register is now equal to what it was before minus 0x10 (16 bytes), which makes room on the stack for local variables.

add/inc ;add or increment

Flow control

cmp ; is used to compare values.
jmp ; jump to a different part of the program.

For example

cmp    QWORD PTR [rbp-0x8],0x9
jbe    4004f8 <main+0x12>

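Here we are making a comparison: the 8-byte value stored at address rbp-0x8 (our loop counter i) is compared with 0x9. jbe stands for jump if below or equal, so this is the loop condition i <= 9, which is the same as i < 10. Then we have an address, 4004f8, which is the point in the program where the loop body starts. So it makes the comparison, and if it is true it jumps back to the beginning of the loop body. If it is false, execution just continues with the next instruction after the loop.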

Vulnerable functions

access()

Notes

Warning: Using access() to check if a user is authorized to, for example, open a file before actually doing so using open(2) creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it. For this reason, the use of this system call should be avoided. (In the example just described, a safer alternative would be to temporarily switch the process's effective user ID to the real ID and then call open(2).)

http://linux.die.net/man/2/access
