Copyright F-Secure 2010. All rights reserved. Protecting the irreplaceable | f-secure.com
Reverse Engineering II: The Basics
This document is only to be distributed to teachers and students of the Malware Analysis and
Antivirus Technologies course and should only be used in accordance with the course guidelines.
Agenda
• Very basics
• Intel x86 crash course
• Basics of C reversing
Binary Numbers
1011                 = 0xB     - Nibble (4 bits)
1011 1101            = 0xBD    - Byte (8 bits)
1011 1101 0011 1001  = 0xBD39  - Word (16 bits)
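The grouping above can be checked with a few shifts and masks. This is a minimal sketch; the function names are made up for illustration and are not part of the course material.

```c
#include <stdint.h>

/* Return nibble number `index` of a 16-bit word; index 0 is the lowest 4 bits. */
unsigned nibble_of(uint16_t word, unsigned index)
{
    return (word >> (4u * index)) & 0xFu;
}

/* Return the high (most significant) byte of a 16-bit word. */
unsigned high_byte(uint16_t word)
{
    return (word >> 8) & 0xFFu;
}

/* Return the low (least significant) byte of a 16-bit word. */
unsigned low_byte(uint16_t word)
{
    return word & 0xFFu;
}
```

For the word 0xBD39, nibble 3 is 0xB, the high byte is 0xBD and the low byte is 0x39.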
Byte Order a.k.a. Endianness
Offset:  00 01
Bytes:   12 34
         = 0x3412 (Little Endian)
         = 0x1234 (Big Endian)
Offset:  00 01
Word:    1234
         = 0x1234 (Little Endian)
         = 0x3412 (Big Endian)
Little Endian Dword
Offset:  00 01 02 03
Bytes:   12 34 56 78   = 0x78563412
Offset:  00 01 02 03
Dword:   12345678      = 0x12345678
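The two readings of the same four bytes can be sketched in C. The helper names `read_le32` and `read_be32` are assumptions chosen for this example; the code builds the value byte by byte, so it behaves the same on any host.

```c
#include <stdint.h>

/* Interpret four bytes as a little-endian dword: the byte at offset 0
   is the least significant one. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Interpret four bytes as a big-endian dword: the byte at offset 0
   is the most significant one. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```

With the bytes 12 34 56 78 at offsets 00 to 03, the little-endian reading is 0x78563412 and the big-endian reading is 0x12345678.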
Endianness Matters
• Data exchange between computers
• Networking protocols
• File formats for disk storage
System Endianness
Little Endian           Big Endian           Switchable Endianness
Intel x86               PowerPC (exc. G5)    ARM
Intel 8051              Sparc (exc. v9)      Alpha
Most microcontrollers   System/370           Intel IA64
ASCII Code
Range         Contents                         Examples
0x00 - 0x1F   Control characters               Backspace, line feed
0x20 - 0x3F   Digits and punctuation           0-9 <> = .,: *-()!
0x40 - 0x5F   Upper-case letters and special   ABCD... @[]\^_
0x60 - 0x7E   Lower-case letters and special   abcd... `{}|~
ASCII Example
H  e  l  l  o     1  2  3  4
48 65 6C 6C 6F 20 31 32 33 34
http://en.wikipedia.org/wiki/ASCII
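The byte values in the example can be reproduced with a short routine. This is an illustrative sketch; `ascii_to_hex` is a hypothetical helper name, not a standard library function.

```c
#include <stdio.h>
#include <string.h>

/* Write the bytes of `text` as space-separated upper-case hex into `out`.
   Returns the number of characters written, or -1 if `out` is too small. */
int ascii_to_hex(const char *text, char *out, size_t out_size)
{
    size_t len = strlen(text);
    size_t needed = len ? len * 3 : 1;  /* "XX" per byte, separators, NUL */
    char *p = out;
    size_t i;

    if (out_size < needed)
        return -1;
    for (i = 0; i < len; i++) {
        if (i > 0)
            *p++ = ' ';
        p += sprintf(p, "%02X", (unsigned)(unsigned char)text[i]);
    }
    *p = '\0';
    return (int)(p - out);
}
```

Passing "Hello 1234" yields the hex string shown on the slide, including 20 for the space character.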
Unicode Strings
ff fe  48 00  65 00  6c 00  6c 00  6f 00
BOM    H      e      l      l      o
UTF-16 / UCS-2
http://en.wikipedia.org/wiki/UTF-16/UCS-2
http://en.wikipedia.org/wiki/Category:Unicode
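For plain ASCII text, the UTF-16 little-endian layout shown above is easy to produce by hand. The sketch below assumes 7-bit ASCII input only; `ascii_to_utf16le` is a name invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a 7-bit ASCII string as UTF-16LE with a leading byte-order mark.
   Returns the number of bytes written, or 0 if `out` is too small or the
   input contains a non-ASCII byte. */
size_t ascii_to_utf16le(const char *text, uint8_t *out, size_t out_size)
{
    size_t n = 0;

    if (out_size < 2)
        return 0;
    out[n++] = 0xFF;            /* BOM U+FEFF, low byte first */
    out[n++] = 0xFE;
    for (; *text; text++) {
        unsigned char ch = (unsigned char)*text;
        if (ch > 0x7F || out_size - n < 2)
            return 0;
        out[n++] = ch;          /* low byte: the ASCII code */
        out[n++] = 0x00;        /* high byte: zero for ASCII */
    }
    return n;
}
```

Encoding "Hello" produces 12 bytes: the BOM ff fe followed by each ASCII code and a zero byte.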
String Storage
• ASCIIZ: Zero-terminated ASCII
• Pascal: Size byte + ASCII string
• Delphi: Size Dword + ASCII or Unicode string
H e l l o
ASCIIZ: 48 65 6C 6C 6F 00
Pascal: 05 48 65 6C 6C 6F
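The difference between the two layouts shows up in how the length is found. A minimal sketch, with hypothetical helper names, assuming the one-byte size prefix described above:

```c
#include <stddef.h>
#include <stdint.h>

/* Length of a zero-terminated (ASCIIZ) string: scan until the 0x00 byte. */
size_t asciiz_length(const uint8_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Length of a Pascal string: the first byte holds the size. */
size_t pascal_length(const uint8_t *s)
{
    return s[0];
}

/* Pointer to the first character of a Pascal string (just past the size byte). */
const uint8_t *pascal_chars(const uint8_t *s)
{
    return s + 1;
}
```

Both byte sequences from the slide report a length of 5; the ASCIIZ variant needs a scan, while the Pascal variant reads it directly.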
Introduction to Intel x86
• Started with 8086 in 1978
• Continued with 8088, 80186, 80286, 386, 486, Pentium, 686 ...
• CISC architecture
• 32-bit is called x86-32 or IA-32
• 64-bit is called x86-64, AMD64, EM64T
• 80386 introduced in 1985
• Has a 32-bit word length
• Has eight general-purpose registers
• Supports paging and virtual memory
• Addresses up to 4GiB of memory
Data Register Layout
[Figure: data register layout. Image Copyright © 1997-2008 Intel Corporation]
Data Registers
Register               Name            Typical use
AL / AH / AX / EAX     Accumulator     Arithmetic operations
BL / BH / BX / EBX     Data index      General data storage, index
CL / CH / CX / ECX     Loop counter    Loop constructs
DL / DH / DX / EDX     Data register   Arithmetic
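The relationship between a 32-bit register and its 16-bit and 8-bit parts can be modelled with masks. This is only a sketch of the register aliasing; the function names are invented, and EAX is used as the example in the comments.

```c
#include <stdint.h>

/* AX: the low 16 bits of EAX. */
uint16_t reg_x(uint32_t e)
{
    return (uint16_t)(e & 0xFFFFu);
}

/* AH: bits 8-15 of EAX. */
uint8_t reg_h(uint32_t e)
{
    return (uint8_t)((e >> 8) & 0xFFu);
}

/* AL: bits 0-7 of EAX. */
uint8_t reg_l(uint32_t e)
{
    return (uint8_t)(e & 0xFFu);
}

/* Writing AL replaces only bits 0-7 and leaves the rest of EAX intact. */
uint32_t write_l(uint32_t e, uint8_t value)
{
    return (e & 0xFFFFFF00u) | value;
}
```

With EAX = 0x12345678, AX is 0x5678, AH is 0x56 and AL is 0x78; writing 0xAB to AL gives 0x123456AB.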
Address Registers
Register    Name                  Typical use
IP / EIP    Instruction Pointer   Program execution
SP / ESP    Stack Pointer         Stack operation
BP / EBP    Base Pointer          Stack frame
SI / ESI    Source Index          String operation
DI / EDI    Destination Index     String operation
Segment Registers
Register        Name             Typical use
CS              Code Segment     Program code
DS              Data Segment     Program data
ES / FS / GS    Other Segments   Other uses
EFLAGS Register
[Figure: EFLAGS register layout. Image Copyright © 1997-2008 Intel Corporation]
Mnemonic Examples
MOV EAX, 1    Move 1 to EAX
ADD EDX, 5    Add 5 to EDX
SUB EBX, 2    Subtract 2 from EBX
AND ECX, 0    Bit-wise AND ECX with 0 (clears ECX)
XOR EDX, 4    Bit-wise exclusive OR EDX with 4
SHL ECX, 6    Shift ECX left by 6 bits
ROR EBX, 3    Bit-wise rotate EBX right by 3 bits
INC ECX       Increment ECX by 1
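Most of these instructions map directly to C operators (`+`, `-`, `&`, `^`, `<<`), but C has no rotate operator. The sketch below shows one common way to express SHL and ROR on 32-bit values; the function names are illustrative only.

```c
#include <stdint.h>

/* SHL: shift a 32-bit value left; like the CPU, only the low 5 bits
   of the count are used. */
uint32_t shl32(uint32_t value, unsigned count)
{
    return value << (count & 31u);
}

/* ROR: rotate a 32-bit value right; bits shifted out on the right
   re-enter on the left. */
uint32_t ror32(uint32_t value, unsigned count)
{
    count &= 31u;
    if (count == 0)
        return value;           /* avoid an undefined shift by 32 */
    return (value >> count) | (value << (32u - count));
}
```

For example, shifting 1 left by 6 gives 64, and rotating 1 right by 3 moves the bit to position 29 (0x20000000).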
More Mnemonics
JNZ label     Jump if not zero (not equal)
JMP label     Unconditional jump to label
CALL func     Call function
RET           Return from function
LOOP label    Decrement ECX, jump to label if ECX is not zero
PUSH EAX      Push EAX onto the stack
POP EDI       Pop the top of the stack into EDI
LODSB         Load byte from DS:ESI into AL and step ESI
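LODSB and LOOP often appear together in byte-processing loops. The following is a hypothetical example, not taken from the course material: a small checksum loop in assembly (shown in the comment) and a C sketch of the same logic, assuming the direction flag is clear and the count is at least 1.

```c
#include <stdint.h>

/* C rendering of this illustrative assembly loop:
 *
 *       xor   edx, edx
 *   next:
 *       lodsb             ; AL = [ESI], ESI++
 *       add   dl, al
 *       loop  next        ; ECX--, jump if ECX != 0
 *
 * The caller must pass count >= 1, as the assembly version would
 * also misbehave with ECX = 0.
 */
uint8_t sum_bytes(const uint8_t *esi, uint32_t ecx)
{
    uint8_t dl = 0;
    do {
        uint8_t al = *esi++;        /* LODSB */
        dl = (uint8_t)(dl + al);    /* ADD DL, AL (wraps at 8 bits) */
    } while (--ecx != 0);           /* LOOP */
    return dl;
}
```

Recognizing this do-while shape is useful when reading disassembly: LOOP tests the counter after the body, so the body always runs at least once.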
Basic Data Types
• char - 1 byte
• short - 2 bytes
• int - 4 bytes (platform word on 32-bit x86)
• long - 4 bytes on 32-bit x86 and on Windows; 8 bytes on most 64-bit Unix systems
• float - 4 bytes floating point
• double - 8 bytes floating point
Pointers and Arrays
• Pointers can point to any memory location
• One-dimensional arrays are flat memory
• Multi-dimensional arrays are flat memory too; arrays of pointers to rows are a separate construct
a[0] a[1] a[2] a[3]
char a[4];
char *b, c;
c = a[2];       /* array indexing */
b = a;          /* b points to a[0] */
c = *(b+2);     /* same element as a[2] */
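The equivalence of indexing and pointer arithmetic, and the flat layout of a two-dimensional array, can be sketched as follows. The helper names are assumptions made for this example.

```c
#include <stddef.h>

/* a[i] and *(b + i) read the same element when b points to a[0]. */
char element_via_pointer(const char *b, size_t i)
{
    return *(b + i);
}

/* Element (row, col) of a rows x cols char array stored as flat memory,
   one row after another. */
char element_2d(const char *flat, size_t cols, size_t row, size_t col)
{
    return flat[row * cols + col];
}
```

In disassembly, an access such as `m[1][2]` on a `char m[2][3]` typically appears as a single base address plus the computed offset `1 * 3 + 2`.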
Composite Types: Structure
• Memory is allocated for all members
• Members are accessible separately
struct {
    unsigned int id;
    unsigned short age;
    char name[16];
} record;
Alignment
• By default, structure members are aligned to their natural boundaries, up to the platform word size
• The #pragma pack(n) directive changes the maximum alignment
• #pragma pack(1) removes all padding
• Important when reconstructing structures
Structure Storage
Aligned                  Packed
long id;                 long id;
short age;               short age;
char name[16];           char name[16];
2-byte padding
sizeof(record) = 24      sizeof(record) = 22
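The two sizes can be reproduced with `#pragma pack`, which GCC, Clang and MSVC all accept. This sketch uses fixed-width types so that the 4-byte and 2-byte members do not depend on the platform; the struct names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Default alignment: the struct is padded to a multiple of 4 bytes. */
struct record_aligned {
    uint32_t id;        /* 4 bytes */
    uint16_t age;       /* 2 bytes */
    char     name[16];  /* 16 bytes */
};

/* Packed: no padding is inserted anywhere. */
#pragma pack(push, 1)
struct record_packed {
    uint32_t id;
    uint16_t age;
    char     name[16];
};
#pragma pack(pop)
```

On mainstream x86 compilers `sizeof(struct record_aligned)` is 24 and `sizeof(struct record_packed)` is 22. `offsetof` from `<stddef.h>` reports where each member actually sits, which helps when matching a reconstructed structure against offsets seen in disassembly.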
Composite Types: Union
• Memory is allocated for the largest member
• Holds only one member at a time
union foo {
    int one;
    char two;
};
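Because all members of a union share the same storage, writing one member changes what the others read. A minimal sketch, reusing the `union foo` layout from the slide; the helper function is hypothetical.

```c
#include <stddef.h>

union foo {
    int  one;
    char two;
};

/* Store `value` through the int member, then read the storage back
   through the char member. Which byte of the int this returns depends
   on the machine's endianness. */
unsigned char first_byte_after_write(int value)
{
    union foo u;
    u.one = value;
    return (unsigned char)u.two;
}
```

The union is as large as its largest member, so `sizeof(union foo)` equals `sizeof(int)` here. A value whose bytes are all identical, such as 0x11111111, reads back as 0x11 regardless of byte order.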
Control Structures
• Conditional Branch
• Iteration
• Switch-Case
• Goto label
Conditional Branch: if
int example_if()
{
    int foo = 0;
    if (foo)
    {
        do_one_thing();
    }
    else
    {
        do_another();
    }
}
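var_C = dword ptr -0Ch
    push    ebp                 ; save the caller's frame pointer
    mov     ebp, esp            ; set up the stack frame
    sub     esp, 18h            ; reserve stack space for locals
    mov     [ebp+var_C], 0      ; foo = 0
    cmp     [ebp+var_C], 0      ; if (foo)
    jz      short loc_1F27      ; foo == 0: take the else branch
    call    _do_one_thing
    jmp     short locret_1F2C   ; skip the else branch
loc_1F27:
    call    _do_another
locret_1F2C:
    leave                       ; restore esp and ebp
    retn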
Iteration: for
int example_for()
{
    int i;
    for (i=0; i<10; i++)
    {
        if (check_something(i))
            break;
    }
}
    push    ebp
    mov     ebp, esp
    sub     esp, 28h
    mov     [ebp+var_C], 0      ; i = 0
    jmp     short loc_1F51      ; jump to the loop condition
loc_1F3D:
    mov     eax, [ebp+var_C]
    mov     [esp], eax          ; pass i as the argument
    call    _check_something
    test    eax, eax
    jnz     short locret_1F57   ; nonzero result: break
    lea     eax, [ebp+var_C]
    inc     dword ptr [eax]     ; i++
loc_1F51:
    cmp     [ebp+var_C], 9
    jle     short loc_1F3D      ; loop while i <= 9 (i < 10)
locret_1F57:
    leave
    retn
Iteration: while
int example_while()
{
    int i = 0;
    while (i < 100)
    {
        if (check_something(i))
            break;
    }
}
    push    ebp
    mov     ebp, esp
    sub     esp, 28h
    mov     [ebp+var_C], 0      ; i = 0
    jmp     short loc_1F77      ; jump to the loop condition
loc_1F68:
    mov     eax, [ebp+var_C]
    mov     [esp], eax          ; pass i as the argument
    call    _check_something
    test    eax, eax
    jnz     short locret_1F7D   ; nonzero result: break
loc_1F77:
    cmp     [ebp+var_C], 64h
    jl      short loc_1F68      ; loop while i < 100
locret_1F7D:
    leave
    retn
Branching: Switch-Case
int example_switch()
{
    int i = 1;
    switch (i)
    {
        case 0:
            do_one_thing();
            break;
        case 1:
            do_another();
            break;
        default:
            check_something(i);
    }
}
    push    ebp
    mov     ebp, esp
    sub     esp, 38h
    mov     [ebp+var_C], 1      ; i = 1
    mov     eax, [ebp+var_C]
    mov     [ebp+var_1C], eax   ; temporary copy of i for the switch
    cmp     [ebp+var_1C], 0
    jz      short loc_1FAB      ; case 0
    cmp     [ebp+var_1C], 1
    jz      short loc_1FB2      ; case 1
    mov     eax, [ebp+var_C]    ; default
    mov     [esp], eax
    call    _check_something
    jmp     short locret_1FB9
loc_1FAB:
    call    _do_one_thing
    jmp     short locret_1FB9   ; break
loc_1FB2:
    call    _do_another
    jmp     short $+2           ; break (jumps to the next instruction)
locret_1FB9:
    leave
    retn
Branching: Goto
int example_goto(void)
{
    open_files();
    if (do_one_thing())
        goto error;
    if (do_another())
        goto error;
    close_files();
    return 1;
error:
    close_files();
    return 0;
}
    push    ebp
    mov     ebp, esp
    sub     esp, 18h
    call    _open_files
    call    _do_one_thing
    test    eax, eax
    jnz     short loc_1FE6      ; goto error
    call    _do_another
    test    eax, eax
    jnz     short loc_1FE6      ; goto error
    call    _close_files
    mov     [ebp+var_C], 1      ; return value 1
    jmp     short loc_1FF2
loc_1FE6:                       ; error:
    call    _close_files
    mov     [ebp+var_C], 0      ; return value 0
loc_1FF2:
    mov     eax, [ebp+var_C]    ; load the return value into eax
    leave
    retn
Function Calling Conventions
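• Common calling conventions:
• __stdcall - Standard convention for Windows API calls; the callee cleans the stack
• __cdecl - Most common C calling convention; the caller cleans the stack
• __fastcall - Passes the first arguments in registers
• __thiscall - Passes the ‘this’ pointer in ECX in C++
• Most important: who cleans the stack, the caller or the callee?
• Mixing conventions between caller and callee corrupts the stack and usually crashes the program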
Simple C Program
int foobar(int x, int y)
{
    int z;
    return x;
}
int main(void)
{
    int z = foobar(1, 2);
}
__cdecl Calls
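Caller:
    PUSH arg2
    PUSH arg1
    CALL function
    ADD  ESP, 8         ; caller removes the two arguments
Callee:
    PUSH EBP
    MOV  EBP, ESP
    SUB  ESP, 4         ; reserve space for loc1
    MOV  EAX, [EBP+8]   ; return value = arg1
    MOV  ESP, EBP
    POP  EBP
    RET
Stack (highest address first):
    ARG2         arg2: EBP+12
    ARG1         arg1: EBP+8
    RET Addr.
    Saved EBP
    LOC1         loc1: EBP-4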
__stdcall Calls
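Caller:
    PUSH arg2
    PUSH arg1
    CALL function       ; no cleanup needed after the call
Callee:
    PUSH EBP
    MOV  EBP, ESP
    SUB  ESP, 4         ; reserve space for loc1
    MOV  EAX, [EBP+8]   ; return value = arg1
    MOV  ESP, EBP
    POP  EBP
    RETN 8              ; callee removes the two arguments
Stack (highest address first):
    ARG2         arg2: EBP+12
    ARG1         arg1: EBP+8
    RET Addr.
    Saved EBP
    LOC1         loc1: EBP-4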
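The only difference between the two conventions is who adjusts ESP after the call. The sketch below models that with a toy stack in C; it is not real calling-convention code, the `struct cpu` type and all function names are invented, and the EBP frame setup is left out for brevity.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy CPU state: a small stack of dwords and two registers. */
struct cpu {
    uint32_t stack[STACK_WORDS];
    int      esp;   /* index of the top of the stack; grows downward */
    uint32_t eax;
};

void stack_push(struct cpu *c, uint32_t v) { c->stack[--c->esp] = v; }
uint32_t stack_pop(struct cpu *c)          { return c->stack[c->esp++]; }

/* Callee shared by both conventions: returns its first argument.
   `ret_bytes` models the operand of RETN (0 for __cdecl, 8 for __stdcall). */
void callee(struct cpu *c, int ret_bytes)
{
    /* On entry: stack[esp] = return address, then arg1, then arg2. */
    c->eax = c->stack[c->esp + 1];  /* like MOV EAX, [EBP+8] */
    (void)stack_pop(c);             /* RET pops the return address */
    c->esp += ret_bytes / 4;        /* RETN n also discards n bytes */
}

/* __cdecl: the caller removes the arguments. Returns ESP afterwards. */
int call_cdecl(struct cpu *c, uint32_t arg1, uint32_t arg2)
{
    stack_push(c, arg2);
    stack_push(c, arg1);
    stack_push(c, 0);               /* CALL pushes a (dummy) return address */
    callee(c, 0);
    c->esp += 2;                    /* ADD ESP, 8 */
    return c->esp;
}

/* __stdcall: the callee removes the arguments. Returns ESP afterwards. */
int call_stdcall(struct cpu *c, uint32_t arg1, uint32_t arg2)
{
    stack_push(c, arg2);
    stack_push(c, arg1);
    stack_push(c, 0);
    callee(c, 8);                   /* RETN 8 */
    return c->esp;
}
```

In both cases ESP ends up where it started. If the cleanup were done twice or not at all, ESP would be off by 8 bytes, which is the stack corruption caused by mixing conventions.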
Reading
C Programming Information:
http://www.cprogramming.com/
Intel x86 Function-call Conventions:
http://www.unixwiz.net/techtips/win32-callconv-asm.html