Binary exploits
Binary exploitation is about finding and abusing vulnerabilities in compiled programs.
The hacker knows that it is not the code written by the programmer that gets executed by the computer. Since a program is compiled into machine code, it is actually the machine code that gets executed. And the human-readable representation of machine code is assembly. So yeah, let's learn some assembly.
A lot of this chapter is just my notes from reading Hacking: The Art of Exploitation. So if you really want to learn binary exploitation you should probably stop reading here and just pick up that book instead. It is a lot better.
What's up with all the hexadecimal?
Hexadecimal is a base-16 counting system. It uses the digits 0-9 plus a-f, so the digits are 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f. That means we can count to 15 without using two digits, so f == 15, and 10 in hexadecimal == 16 in decimal. This is pretty convenient because one byte is made up of 8 bits, and with 8 bits (each 0 or 1) you can form 256 (2^8) different values, 0 through 255. Two hexadecimal digits cover exactly that range, from 00 to ff (16 × 16 = 256 values), just like two decimal digits cover the 100 values from 00 to 99.
So two hexadecimal digits can represent any byte value. So one byte can be translated into a two digit hexadecimal value.
And so on.
Memory address
So in order to analyze assembly we are going to write a short program in C.
So we have written a program in C and then compiled it. Now we want to look at the assembly code to see what code is actually going to be run by the machine.
Running objdump on the binary will give us some crazy output like this
This is just the part about the main function of the program. The full output is a lot longer, but the rest is not interesting to us at the moment.
00000000004004e6
This number represents a place in memory. It is like an address. It could be written in base 10 if we wanted to, and it would still be the correct address. But for the convenience described above, the address is written in hexadecimal. The address is 16 hexadecimal digits long because the binary uses a 64-bit addressing scheme: each hexadecimal digit covers 4 bits, and 16 × 4 = 64. So a 64-bit process can in principle refer to 2^64 (about 1.84 × 10^19) memory addresses.
So on the first line after the main line we see the number 55. All these numbers are actually machine code, but instead of writing it in binary (55 would be 01010101) it is written in hexadecimal. The mnemonics to the right of those numbers are the instructions written in assembly. They are written so that we humans can understand them a bit more easily. Instead of having to remember that 90 means nop, we just have to remember nop. So that's great. It makes machine code a lot easier to understand.
So instead of having to memorize 10010000, it is represented as 90 in hexadecimal. And instead of having to remember 90 in hexadecimal, we just have to remember nop. Pretty great. But in the end they mean the same thing; they are just represented in three different ways.
AT&T Syntax or Intel Syntax
There are basically two ways of writing x86 assembly: the AT&T syntax and the Intel syntax. The AT&T syntax is the default for the GNU tools found on Linux distributions, so when we run objdump, like in the example above, the output is in AT&T syntax. We can tell that it is AT&T because it has all those $ and % signs. If you add -M intel to your objdump command you will see the output in Intel syntax. The two also order their operands differently: AT&T puts the source first and the destination second, while Intel puts the destination first. But in the end it doesn't really matter which one you read, tomato tomato.
We can set the syntax in gdb with the set disassembly-flavor command, for example set disassembly-flavor intel.
Registers
Okay, so the processor in your computer has something called registers. Registers are like internal variables for your processor. They are predefined, in the sense that you can't create registers; they are already there. You can think of them as micro-memories, or just variables. The processor uses them to make things faster: instead of having to look up a specific place in memory, it has its own micro-memory. The basic 32-bit x86 set listed below has only 16 registers, so it is not that much to remember. The names of the registers differ a bit between 64-bit and 32-bit processors (for example, the 32-bit EAX and ESP are called RAX and RSP on 64-bit). A 64-bit processor can run 32-bit binaries, but a 32-bit processor can't run 64-bit binaries. If you want to know which type a binary is, you can run the file command on it.
These are the names of the 32-bit registers. They are divided into sub-groups.
General registers
These registers are mainly used as temporary memory for the processor.
EAX - Accumulator
EBX - Base
ECX - Counter
EDX - Data
Index and pointers
ESI - Source index
EDI - Destination index
EBP - Base pointer - This one stores an address in its little micro-memory.
EIP - Instruction pointer - Like a child pointing a finger at each word it reads in a book, the instruction pointer is that finger. It always points to the instruction the processor is reading. This is an important pointer.
ESP - Stack pointer - This one also stores an address.
Segment registers
CS
DS
ES
FS
GS
SS
Indicator
EFLAGS
So let's take a look at them in a real program. Let's run the program above, but this time with a debugger, the GNU Debugger (gdb).
First we set a breakpoint with the command break main to stop the program right before the main function is run. Then we start the program with run, and once it stops at the breakpoint we type info registers to see what we have in our registers.
So hexadecimal is used as a way to represent binary out of convenience.
So assembly, in Intel syntax, is written in the following form: mnemonic destination, source
The mnemonics are instructions like mov, push and sub. The destination and source are registers, addresses in memory, or values.
Mnemonics
So here we move the current value in rsp (stack pointer) to rbp (base pointer), which in Intel syntax is written mov rbp, rsp. This is pretty standard at the beginning of a function. We take the base pointer and say that it is equal to the stack pointer for now.
Here we read: subtract 0x10 from rsp (sub rsp, 0x10 in Intel syntax). So the stack pointer register is now equal to what it was before minus 0x10, which makes room for 16 bytes on the stack.
Flow control
For example
Here we are making a comparison: the value stored at rbp-0x8 is compared with 0x9. And jbe stands for jump if below or equal. I am guessing that is the loop. Then we have an address, 4004f8, which is the address of the point in the program where the loop body starts. So it makes a comparison, and if the value is below or equal to 0x9 it jumps back to the beginning of the loop; otherwise execution falls through and the loop ends.
Vulnerable functions
access()
Notes
Warning: Using access() to check if a user is authorized to, for example, open a file before actually doing so using open(2) creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it. For this reason, the use of this system call should be avoided. (In the example just described, a safer alternative would be to temporarily switch the process's effective user ID to the real ID and then call open(2).)
http://linux.die.net/man/2/access