FACULTY OF ENGINEERING
THESIS SUBMITTED FOR THE PROGRAMME
MASTER OF ARTIFICIAL INTELLIGENCE
ACADEMIC YEAR 2006-2007
Software Security through Targeted Diversification
Mantadelis Theofrastos
Du Xiaodai
Promotor: Prof. Bart Preneel
Daily leaders: Jan Cappaert
Nessim Kisserli
Contents
1 Introduction
  1.1 Software protection
  1.2 Security through diversity
  1.3 Targeted diversification
2 Defending with targeted diversification
  2.1 Introduction
  2.2 CPU simulator
    2.2.1 The necessity of building a CPU model
    2.2.2 Overview of the basic execution environment
    2.2.3 Class design: states of CPU simulator
    2.2.4 The model of a snippet: bridge between CPU simulator and genetic algorithm
    2.2.5 Extension of the CSnippet class
    2.2.6 Modelling a register on a bit level
    2.2.7 Weaknesses of the symbolic framework
    2.2.8 Symbolic analyzer
  2.3 Genetic programming
    2.3.1 Introduction
    2.3.2 Genetic computing algorithm
    2.3.3 Predefined vs. random initial population
    2.3.4 Fitness value and fitness function
    2.3.5 Roulette-wheel selection
    2.3.6 Genetic operator for reproduction
    2.3.7 Discussion of insertion function
    2.3.8 Discussion of convergence
3 Attacking targeted diversification
  3.1 Introduction
  3.2 Program representation and program analysis
    3.2.1 Graph data structures
    3.2.2 Control Flow Graph
    3.2.3 Searching graphs
    3.2.4 Representation of a program as a search tree
  3.3 An attack algorithm
    3.3.1 Best Matching Node Search
    3.3.2 Heuristics
    3.3.3 Heuristics of Best Matching Node Search
    3.3.4 Selection of heuristics and automated generation
  3.4 A comparison algorithm
    3.4.1 Longest Common Subsequence
    3.4.2 Longest Common Subsequence at a Diverse Population
4 Experimental results
  4.1 Diverse snippet generation by genetic computing algorithm
    4.1.1 Predefined initial population vs. random initial population
    4.1.2 Tuning the parameters of genetic computing algorithm
    4.1.3 Conclusion and weakness
  4.2 Attacking diversified software containing snippets
    4.2.1 Best Matching Node Search experimental results
    4.2.2 Weaknesses
    4.2.3 Conclusions
5 Conclusions
  5.1 General Conclusions
  5.2 Further work
    5.2.1 For BMNS algorithm
    5.2.2 A new diversification model
    5.2.3 Homogeneity of instructions in a snippet
A Appendix
  A.1 Assembly instructions in diversified software
  A.2 Shortest edit script
  A.3 Best Matching Node Search algorithm in pseudocode
List of Tables
List of Figures
Bibliography
Abstract
This thesis discusses software security through targeted diversification.
It is a continuation of the previous year's thesis by Merckx [20]. The
first part focuses on defending software with targeted diversification:
we implement a genetic computing algorithm that generates code snippets.
This implementation is presented in Chapter 2. An important requirement
for every software protection method is that it be tested against attack
attempts, so this thesis also focuses on attacking a diversified
population. The second part presents an attack scheme that targets a
diversified software population; this scheme is presented in Chapter 3.
In Chapter 4 we present our experimental results, and in Chapter 5 we
combine the empirical knowledge gained from the two parts of the thesis
and draw conclusions about defensive systems that use diversification.
Keywords: software protection, targeted diversification, cracking, similarity measures, control flow graph analysis.
Credits: A big thank-you to the best daily supervisors Jan Cappaert and Nessim Kisserli.
Chapter 1
Introduction
1.1 Software protection
Software protection started a long “cat-and-mouse” struggle between devel-
opers and crackers. Software protection is a broader term which involved the
copy protection of computer software and the counter measure of software
cracking. Usually, the term copy protection is used interchangeably with
software protection.
Copy protection, also known as copy prevention or copy restriction, is a
system for preventing the unauthorized reproduction of copyrighted media
like movies, music and computer software [23]. Very often software copy
protection is achieved by integrating security code in the application. Though
security code itself can be susceptible to attack, software protection ideally
makes the software itself resistant to attack [20]. Several modern Digital
Rights Management (DRM) techniques and technical protection measures
were discussed by Merckx in the previous version of this thesis [20].
Apart from anti-piracy measures like copy protection, software protec-
tion also includes mechanisms against tampering, reverse engineering and
exploitations. These software protection techniques are commonly applied
to software distributions as a countermeasure against cracking. Table 1.1
lists the current user-level software protection techniques and the phase of
the cracking process that each technique targets.
Countermeasure              Efficient against   Circumvented by
Static tamper resistance    Tampering           Loaders, analysis, (key generators, serials)
Dynamic tamper resistance   Tampering           Loaders, analysis, (key generators, serials)
Anti-debugger code          Dynamic analysis    Debugger plug-ins, patching
Obfuscation                 Analysis            Deobfuscators, analysis
Encryption, packers         Static analysis     Unpackers, dynamic analysis
Table 1.1: Merckx’s table of countermeasures
According to Main and van Oorschot, the process of devising an attack
on an application to defeat the program’s security code typically follows four
stages: "Analysis", "Tampering", "Automation" and "Distribution". In the
preceding work [20], Merckx discussed state-of-the-art countermeasures and
concluded that they all target either the analysis or the tampering phase
of the cracking process. Once an individual has successfully cracked the
application, these techniques provide no protection against the distribution
and widespread applicability of the crack. The effort to break the piracy
chain has obviously failed, since cracks are still widely available, even for
heavily protected applications.
In the following sections, we discuss class breaks and how to interrupt the
piracy chain before the automation and distribution phases.
1.2 Security through diversity
The goal of introducing diversification in software security is to prevent the
situation where breaking one instance leads to breaking all instances.
As shown in [18] and [20], cracking an application is a multi-step procedure,
the steps being "Analysis", "Tampering", "Automation" and finally
"Distribution". It is almost impossible to stop the cracking procedure at the
"Analysis" or "Tampering" step, mainly because of the open architecture of
modern technology.
The diversification of an application aims to stop the cracking procedure at
its "Automation" step. If a cracker succeeds in bypassing the current
protection mechanisms and disabling the security code of his instance of the
application, there is no guarantee that the same
attack would work if the software population has a certain degree of diver-
sity. A classical automated attack can still be devised but its chance of being
applicable to a specific instance is diminished. Figure 1.1 shows a hypothet-
ical curve of the number of distributed cracked instances of an application
during the four stages of the cracking procedure. Preventing the automation
step dramatically reduces the number of cracked software instances. In
security systems, a "class break" occurs when the protection system fails and
the failure affects all instances of that software. We can see that a class
break appears when the crackers manage to automate the cracking procedure and
distribute the result. This hypothetical curve is for illustration purposes
only and should not be viewed as backed by a mathematical model, although
research suggests that the spread of pirated content resembles that of
epidemiological diseases and can be modelled by various economic
equations [28].
Figure 1.1: Cracked software distribution.
Because decisions in software are made by "if-then-else" structures or their
equivalents, a protection mechanism will almost always consist of one or more
such structures. When the program is compiled, these structures are translated
into conditional branch or jump instructions. Ultimately the cracker needs to
locate and tamper with those branch instructions, which is why this attack is
sometimes called "branch jamming".
The most common method of automation is the creation of a small patch: a small
file that automatically reproduces the "Tampering" step. Patching a file is an
easy procedure, requiring only that the target file (to be patched) be very
similar to the source file from which the patch was created. The idea of
diversification is to destroy this similarity between source and target files,
thus forcing the patch to fail.
Typically, patches work by locating one or more specific addresses in the
target file and modifying one or more bytes. More general patching techniques
exist that search for specific byte patterns in the target file and modify
them.
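As a hedged illustration (our own sketch, not taken from the thesis implementation), the two patching styles described above can be written in a few lines; the function names and byte values are hypothetical:

```python
# Minimal sketch of the two patching styles: address-based and search-based.
# The byte values are illustrative x86 opcodes (0x74 = jz, 0xEB = jmp).

def patch_at_address(image: bytearray, offset: int, new_bytes: bytes) -> None:
    """Address-based patch: overwrite bytes at a fixed file offset."""
    image[offset:offset + len(new_bytes)] = new_bytes

def patch_by_search(image: bytearray, pattern: bytes, new_bytes: bytes) -> bool:
    """Search-based patch: find a byte pattern and overwrite it."""
    offset = bytes(image).find(pattern)
    if offset == -1:
        return False          # pattern not found: the patch fails
    image[offset:offset + len(new_bytes)] = new_bytes
    return True

image = bytearray(b"\x90\x90\x74\x05\x90")   # ...74 05: a conditional jump
patch_at_address(image, 2, b"\xeb")          # turn jz into jmp ("branch jamming")
print(image.hex())                            # 9090eb0590
```

A diversified instance defeats the first style by shifting the offset and the second by changing the surrounding bytes so that the pattern no longer matches.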
Diversification forces both patching methods to fail on target instances which
differ from the source used to create the patch. A similar approach, software
ageing, which requires software to be updated at regular intervals, is
presented in [16].
1.3 Targeted diversification
In practice, the goal of diversification is to generate syntactically different
but semantically equivalent programs. These diverse programs hide the true
semantic difference between this version and a previous version amidst a large
number of artificial syntactic differences.
According to the conclusions of the previous thesis, diverse programs created
by inserting harmless snippets (discussed further in Section 2.3.4) are
resistant to current automated tampering attacks. While multiple precautions
against potential attacks on the diversification scheme were discussed, no
actual experiments supported them. In this thesis, we focus on the "arms race"
between the defence, creating diverse populations of snippets, and the attack,
using our BMNS algorithm (see Section 3.3.1).
Through experiments, we will show when and how diversification fails to
protect the software, and how diversification can be modelled so that it
overcomes the current attack methods.
Chapter 2
Defending with targeted
diversification
2.1 Introduction
In this chapter we illustrate how genetic programming techniques can be
adapted to create code snippets that look like real code but do not affect a
program when inserted into it. The ideas we present extend and improve those
of Merckx [20]. The goal of these snippets is to provide a means of creating
diverse program instances - with the same functionality - that are resistant
to global attacks, such as commonly known "cracks" and more intelligent
patching programs that tamper with software in a malicious way.
First we explain why and how we establish a CPU model to evaluate snippets.
Following the conclusion of the previous thesis, diversification comes from
the insertion of junk code (harmless snippets), which modifies the offsets of
some original instructions; the CPU model guarantees that the snippets are
indeed harmless. Then we explain how we implement the genetic computing
algorithm that generates diverse snippets. Improvements to the diversity
scheme are discussed based on the experimental results.
2.2 CPU simulator
2.2.1 The necessity of building a CPU model
As mentioned before, the final goal is to change the offsets of crucial
instructions by inserting harmless snippets. Given the high risk of directly
executing unverified snippets, a CPU simulator is necessary to assess the
overall effect of a snippet's execution on the host program. In our
implementation, a state model of the CPU's basic execution environment is set
up; by comparing the state before and after execution of a snippet, we judge
whether the snippet is harmless. To build the CPU model, we first describe the
relevant parts of the CPU's execution environment.
2.2.2 Overview of the basic execution environment
Any program or task running on a processor is given a set of resources for
executing instructions and for storing code, data, and state information.
These resources make up the basic execution environment for a processor.
The basic execution environment is used jointly by the application programs
and the operating system or executive running on the processor [5].
Basic program execution registers - On the x86 architecture, the eight
general-purpose registers, the six segment registers, the EFLAGS register, and
the EIP (instruction pointer) register comprise a basic execution environment
in which a set of general-purpose instructions can be executed. These
instructions perform basic integer arithmetic, handle program flow control,
operate on bit and byte strings, and address memory. We limit our simulation
of the basic execution environment to the 8 general-purpose registers, the
stack, and the extended flags register. The EIP register is modified when a
snippet is inserted into the original code, but as the environment before and
after the inserted snippet is guaranteed to match, the change is harmless.
Stack - To support procedure or subroutine calls and the passing of pa-
rameters between procedures or subroutines, a stack and stack management
resources are included in the execution environment. The stack is located in
memory. The key point is to keep the stack pointer and the stack contents
identical to the state they were in before the snippet executed.
Many other parts of the execution environment exist in modern processors, such
as the x87 FPU registers and the MMX registers. A more complete implementation
could make snippets harder for crackers to recognize; however, it may also be
harder to guarantee the correct execution of more complex snippets.
2.2.3 Class design: states of CPU simulator
According to the requirements of the execution environment, the main objects -
bit, register and stack - make up the main state of the CPU simulator. A
collection object, "CInstructionCollections", stores the state modifications
made by every instruction the CPU simulator can execute. When the CPU
simulator executes a snippet, the initial state is recorded first; the
instructions are then read one by one, and the state of the CPU simulator is
modified according to the detailed procedure stored in the collection object.
See Figure 2.1.

Through this implementation, the CPU simulator can take charge of the
verification of snippets and output a detailed report to the snippet object.
This report is used by the genetic algorithm to evaluate the degree of
harmlessness of the snippet.

An initial fitness value is defined inside the class
"CInstructionCollections" for each type of instruction. The fitness function
of the genetic algorithm can then use both the initial fitness values and the
combination of different instructions as references when scoring a snippet.
Figure 2.1: The CPU simulator.
1. Read snippets from a file and store them in objects.
2. Emulate snippets.
(a) Initialize the CPU state.
(b) For every instruction DO Until <the end of this snippet>,
i. Read instruction and find the matched operation.
ii. Change the CPU state according to the matched oper-
ation.
(c) Report the difference between the initial state and the final state.
Table 2.1: The CPU simulator pseudocode.
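The loop of Table 2.1 can be sketched as a runnable toy. This is a Python stand-in for the C++ classes described above; the state layout and the three instruction handlers are simplified, hypothetical examples, not the thesis's actual instruction set:

```python
# Toy sketch of the simulator loop in Table 2.1: record the initial state,
# apply one handler per instruction, then report the state differences.
import copy

HANDLERS = {  # each handler mutates the CPU state for one instruction
    "push": lambda st, arg: st["stack"].append(st["regs"][arg]),
    "pop":  lambda st, arg: st["regs"].__setitem__(arg, st["stack"].pop()),
    "inc":  lambda st, arg: st["regs"].__setitem__(arg, st["regs"][arg] + 1),
}

def emulate(snippet, state):
    initial = copy.deepcopy(state)            # (a) record the initial state
    for opcode, arg in snippet:               # (b) execute instruction by instruction
        HANDLERS[opcode](state, arg)
    return {k: (initial[k], state[k])         # (c) report the differences
            for k in initial if initial[k] != state[k]}

state = {"regs": {"eax": 7, "ebx": 1}, "stack": []}
diff = emulate([("push", "eax"), ("inc", "eax"), ("pop", "eax")], state)
print(diff)   # {} - no state change, so this snippet is judged harmless
```

A non-empty report would list exactly which parts of the state a snippet failed to restore, which is the information the fitness function needs.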
2.2.4 The model of a snippet: bridge between CPU simulator and genetic algorithm
The term snippet, as mentioned before, refers to a sequence of instructions.
The class "CSnippet" has many useful attributes which not only speed up
verification but also support the genetic algorithm. Figure 2.2 shows the main
functions of the class "CSnippet".
Figure 2.2: Class of snippet.
First, the operation "open(filename)" reads the assembly instructions from a
file and stores them in the vector "_instructions" as objects of the class
"CInstruction".
The routine "updateMap()" is triggered whenever the contents of
"_instructions" change. For general instructions such as transfer, arithmetic
and logic instructions, the CPU simulator only needs to change the affected
CPU state according to the manual of the real CPU (IA-32 Intel, in our case).
But for (un)conditional jump instructions within a large snippet, a more
complex structure that keeps track of the execution order is necessary.
A vector object "_map" is established for storing the structure "LabelMap".
The instruction container of the snippet already assigns a uniform index to
each instruction; each label and the index of that label are stored in an
independent container, as in Figure 2.3. When a snippet with an
(un)conditional jump instruction is executed, instead of searching the whole
instruction container for the destination label, it is more efficient to look
up the exact index of the label in the "LabelMap" container.
Figure 2.3: An instance of LabelMap.
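The LabelMap idea amounts to building a label-to-index dictionary once, so that each jump resolves in a single lookup rather than a scan. A minimal sketch (our own illustration, with instructions represented as plain strings):

```python
def build_label_map(instructions):
    """Map each label to the index it occupies in the instruction list."""
    return {ins[:-1]: i for i, ins in enumerate(instructions) if ins.endswith(":")}

snippet = [".L1:", "pushl %ebx", "jmp .L2", ".L2:", "popl %ebx"]
label_map = build_label_map(snippet)
print(label_map[".L2"])   # 3 - the jump target's index, found without a linear scan
```

The map is rebuilt whenever the instruction container changes, mirroring the role of the update routine described above.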
Another problem worth mentioning is the renaming of labels. After several
generations of genetic computing, labels with the same name may appear within
one snippet, because roulette-wheel selection and the genetic operations
(introduced in the genetic computing sections) can generate a new snippet from
different combinations of a single snippet.
The attributes "_harmless", "_fitness" and "_msg" are only set after
verification by the CPU simulator. "_harmless" marks whether the snippet is
harmless or not; "_fitness" keeps the total fitness value, which is the key
reference for the genetic algorithm when evaluating and generating the next
generation; "_msg" stores the detailed report of the modifications to each CPU
state.
2.2.5 Extension of the CSnippet class
Considering the difficulty of successfully generating a harmless result, we
define three types of snippets: harmful snippets, harmless snippets and
semi-harmful snippets.
Harmless snippets - These are snippets whose execution has no effect
on the state of the CPU, i.e. the contents of all the registers are restored
(including the extended flags register) as is the state of the stack. This is the
’ideal’ snippet.
Semi-harmful snippets - These are snippets whose execution modifies the state
of the CPU minimally (e.g. fails to restore the contents of one or two
registers). Such snippets may still be used after careful selection of an
insertion point (e.g. a point in the program where the contents of those
registers are no longer needed).
Harmful snippets - These are snippets which cannot be considered semi-
harmful and are of no use to us (besides their limited contribution to the
gene pool).
Establishing new snippet attributes that describe the details of the CPU state
modifications is a good extension for future work. The semi-harmful snippet
and the insertion function, discussed in the genetic algorithm sections, also
come from this idea.
Considering the effectiveness of a snippet at hiding the crucial instructions,
we define three further types of snippets: Type A, Type B and Type C.
• Type A - A snippet without jump instructions, which does not modify the CFG (Section 3.2.2).
• Type B - A snippet that modifies the CFG.
• Type C - A Type A or Type B snippet which also imitates the Before, After and After Branch blocks of the crucial node.
2.2.6 Modelling a register on a bit level
Figure 2.4 shows the UML class diagram to model general purpose registers,
flag registers and a single bit. Because some instructions only modify a single
bit of the flag register, modelling at bit level is necessary. Moreover, instead
of only storing 0 or 1 values, a symbol is used to represent the content a
register bit. For example “EAX00” represents the least significant bit of
register EAX.
Figure 2.4: Class of register
Not only the flag register bits but also the general-purpose registers are
better modelled as sets of bits. For instance, suppose the instruction
"ROL AL, 2" is executed while the value of AL is 10101010. Ignoring the effect
on the flags, the algebraic value of AL after the execution is exactly the
same as before, so an algorithm that tracks only the algebraic value of the
register would conclude that the state is unchanged, although this holds only
for this particular value of AL. With the introduction of symbolic values, the
initial state of AL (the lowest eight bits of EAX) is marked as:
EAX07 | EAX06 | EAX05 | EAX04 | EAX03 | EAX02 | EAX01 | EAX00
After the execution of instruction, the result is:
EAX05 | EAX04 | EAX03 | EAX02 | EAX01 | EAX00 | EAX07 | EAX06
With this symbolic representation, it is clear that the state of the EAX
register is changed by the instruction "ROL AL, 2". A symbolic representation
is thus correct in all cases, independent of the register contents.
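The rotation example above can be reproduced with a minimal symbolic model, one list entry per bit symbol. This is our own illustrative sketch, not the thesis's C++ register classes:

```python
def rol(bits, n):
    """Rotate a list of bit symbols left by n positions (MSB first)."""
    n %= len(bits)
    return bits[n:] + bits[:n]

al = ["EAX07", "EAX06", "EAX05", "EAX04", "EAX03", "EAX02", "EAX01", "EAX00"]
rotated = rol(al, 2)          # models ROL AL, 2
print(rotated[0], rotated[-1])  # EAX05 EAX06 - the symbols have moved
print(rotated != al)            # True: the symbolic state differs for any value
```

Comparing symbol lists detects the state change that a purely algebraic model would miss for values like 10101010.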
2.2.7 Weaknesses of the symbolic framework
While the symbolic representation is more correct than the algebraic
representation, it makes basic arithmetic such as ADD and SUB more difficult
to model, because under the symbolic representation there are no real values
inside the CPU state. For instance, suppose the instruction "ADD EAX, 1" is
executed. At the bit level, the least significant bit of register EAX changes
from "EAX00" to "EAX00 + 1", and the following bits may also be affected via
the carry.

Without the information of the initial value of EAX, it is impossible to judge
whether a carry should be propagated to "EAX01", or to decide whether the
carry flag should be set. Solutions to this problem are discussed in the
sections on the symbolic analyzer (2.2.8) and the insertion function (2.3.7).

In practice, the symbolic representation successfully models instructions such
as XCHG, MOV, PUSH, POP and NOP, which only affect whole registers of the CPU
state (i.e. operate at register level rather than bit level).
2.2.8 Symbolic analyzer
To improve the descriptive capability of the symbolic representation, a
symbolic analyzer can be introduced to solve some logical and arithmetical
calculations. For instance:
XOR EAX, EBX
NOT EAX
XOR EAX, EBX
NOT EAX

After the execution of these instructions, the content of the least
significant bit of EAX is:

name: EAX00
content: ¬((¬(EAX00 ∧ EBX00)) ∧ EBX00)

Because of the properties of ∧ (bitwise exclusive or) and ¬ (bitwise
complement), this expression simplifies to EAX00: the content of EAX is the
same as before, and the snippet is harmless. An analyzer that can solve
equations of this form would be a useful further extension of our modelling
framework. It could then also solve equations such as
((((EAX + EBX) . 3) . 2) / 1) − EBX/2 − EBX/2, where . means ROR and / means
ROL.
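Pending a full symbolic analyzer, a cheap check at the single-bit level is exhaustive evaluation: a bit symbol can only take the values 0 and 1, so an expression over a few symbols can be proven equal to the identity by trying every assignment. The sketch below (our own illustration) traces one EAX bit through the four-instruction sequence XOR EAX, EBX; NOT EAX; XOR EAX, EBX; NOT EAX:

```python
from itertools import product

def bit_trace(a, b):
    """Trace one EAX bit (a) and one EBX bit (b) through the four instructions."""
    x = a ^ b   # XOR EAX, EBX
    x ^= 1      # NOT EAX
    x ^= b      # XOR EAX, EBX
    x ^= 1      # NOT EAX
    return x

# Every assignment of the two bits returns the original EAX bit,
# so the sequence leaves EAX unchanged (flags aside).
print(all(bit_trace(a, b) == a for a, b in product((0, 1), repeat=2)))  # True
```

Exhaustive evaluation scales badly in the number of symbols, which is why a real analyzer would simplify the expressions algebraically instead.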
2.3 Genetic programming
2.3.1 Introduction
In keeping with the work done in the previous thesis, we use genetic
programming to generate snippets according to the specific requirements of the
original code, in order to achieve software diversity [20].

Improvements to the diversity scheme are made in two areas: evolving new ways
to defeat attackers in the inevitable arms race with crackers, and improving
the genetic model to produce 'better' snippets. First, let us begin with the
genetic computing algorithm.
2.3.2 Genetic computing algorithm
Definition 2.1. A genetic algorithm (or GA) is a search technique used in
computing to find exact or approximate solutions to optimization and search
problems.
A typical genetic algorithm requires two things to be defined [9]:
• A genetic representation of the solution domain.
• A fitness function to evaluate the solution domain.
In our case, the representation of a solution is the snippet, a sequence of
instructions. The fitness function quantifies the harmlessness of snippets;
that is why a CPU model, which can track the execution environment, is needed.
Once the genetic representation and the fitness function are defined, the GA
initializes a population of solutions and then improves it through selection
and reproduction. The execution procedure is shown below.
1. Initialization: choose initial population
2. Evaluate the fitness of each individual in the population
3. Repeat
(a) Selection: select best-ranking individuals to reproduce
(b) Reproduction: breed a new generation through crossover and
mutation (genetic operations) to produce offspring
(c) Evaluate the individual fitness of the offspring
(d) Replace worst ranked part of population with offspring
4. Until <terminating condition>
Table 2.4: Genetic computing algorithm pseudocode.
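The steps of Table 2.4 can be sketched as a runnable toy. This is our own generic example with a one-max objective (count of 1-bits), not the thesis's snippet fitness; all constants are arbitrary illustrative choices:

```python
# Generic GA skeleton following Table 2.4: initialize, evaluate, select,
# reproduce (crossover + mutation), replace the worst-ranked half.
import random

random.seed(0)
LENGTH, POP_SIZE, GENERATIONS = 16, 20, 60

def fitness(ind):                 # toy objective: number of 1-bits
    return sum(ind)

def crossover(a, b):              # one-point crossover
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.05):       # bit-flip mutation
    return [bit ^ 1 if random.random() < rate else bit for bit in ind]

# 1. Initialization: random population
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):                        # 3. repeat
    pop.sort(key=fitness, reverse=True)             # 2. evaluate and rank
    parents = pop[:POP_SIZE // 2]                   # (a) select best-ranking
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(POP_SIZE - len(parents))]   # (b) reproduce
    pop = parents + children                        # (d) replace worst half
print(fitness(max(pop, key=fitness)))               # approaches LENGTH
```

In the thesis's setting the individuals are snippets and the fitness comes from the CPU simulator's report rather than from a bit count.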
The main procedure consists of four main steps: initialization, selection,
reproduction and termination.
2.3.3 Predefined vs. random initial population
It is possible to generate the initial population randomly and then use the
fitness function to select suitable snippets for the next round of
reproduction. In this way, the result is more unpredictable: the GA
automatically makes the best choice for us according to the definition of the
fitness function. The cost for the GA of discovering a fit individual from a
random initial population is the disadvantage of this approach, but it allows
us to explore a space we otherwise would not.

In some cases, we want the final result to have some similarity with the
initial one; a predefined initial population is a way to achieve this goal. By
setting up the initial population manually, a "clue" is given to the GA that
guides the production of the next generation.

Which approach is better therefore depends on the requirements of the
environment.
2.3.4 Fitness value and fitness function
Definition 2.2. A fitness function is a particular type of objective function
that quantifies the optimality of a solution in a genetic algorithm.
Based on this function, more optimal snippets are selected, and a new
generation derived from them will hopefully be even better.
The implementation and evaluation of the fitness function is an important
factor in the speed and efficiency of the algorithm. The fitness function
assigns a fitness value to each individual, and the genetic algorithm follows
the gradient of the fitness values to find the best-fitted individual. The
term fitness landscape can be considered another way of looking at the fitness
function; a further discussion appears in Section 2.3.8 on convergence.
For our implementation, a strategy combining an initial fitness value with a
fitness function is used. To decide the value of an instruction, an initial
fitness value is first assigned to it, depending on the complexity and the
practical usage rate of each instruction: if we want an instruction to have a
greater chance of appearing in the following generations, a higher initial
fitness value should be assigned to it. The snippet, which consists of several
instructions, then has a total fitness value. Based on this value, a further
verification occurs, depending on the relationships between the combinations
of instructions.
Below are some factors which can be used in the fitness function:

Harmless, semi-harmful and harmful snippets
A harmless snippet is a snippet whose execution harms none of the states of
the execution environment. It is the ideal snippet, and it can be inserted at
any place in the original code. In practice, however, completely harmless
snippets normally have only simple structures, and the instructions that can
appear in them are limited. This makes it difficult to generate diverse
results, which in turn makes locating the snippets easier for the cracker.

A semi-harmful snippet influences only one or a few states of the CPU.
Snippets of this kind look more like real code and are more difficult to
track. They can be categorized into different levels as a reference for the
fitness score:
• Lv1. affects one or more flags
• Lv2. swaps or assigns the contents of registers
• Lv3. swaps content at the bit level
• Lv4. loses information at the bit level
To insert snippets of Lv1 and Lv2, an insertion function, which will be
discussed later, has to be established first in order to locate suitable
positions that satisfy their requirements. A level-3 snippet such as
"rol %eax, $16" swaps the first and last sixteen bits of EAX on a 32-bit
architecture; there is still a (very small) chance of restoring the content by
rotating it back. But for a level-4 snippet such as "shl %eax, $8", the first
eight bits are already lost and can never be restored.
The harmful snippet is the worst case for the GA. Snippets that not only
change CPU states but also corrupt state at the bit level, overwrite the
original contents of the stack, or get stuck in an infinite loop belong to
this category. Because of their destructive influence, these snippets make it
harder for the GA to evolve harmless snippets: if such snippets are chosen to
build the next generation, it is difficult for the algorithm to converge to a
good result.
In practice, we first designed an encouragement strategy that assigns a high
fitness value to harmless snippets, a lower fitness value to semi-harmful
snippets, and a punishment to harmful snippets.
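The encouragement strategy can be sketched as a scoring function. The categories follow the text, but the numeric constants are our own illustrative choices, not the thesis's actual values:

```python
# Illustrative scoring: reward harmless snippets, give a smaller
# (level-dependent) reward to semi-harmful ones, punish harmful ones.
LEVEL_SCORE = {1: 40, 2: 30, 3: 15, 4: 5}   # Lv1 (flags) down to Lv4 (info loss)

def harm_score(category, level=None):
    """Score one snippet by harm category, rewarding restorable behaviour."""
    if category == "harmless":
        return 100                  # the ideal snippet: full reward
    if category == "semi-harmful":
        return LEVEL_SCORE[level]   # milder, restorable damage scores higher
    return -50                      # harmful: punished

print(harm_score("harmless"),
      harm_score("semi-harmful", level=1),
      harm_score("harmful"))        # 100 40 -50
```

In the full fitness function this score would be combined with the other factors below, such as snippet length and homogeneity.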
The length of the snippet
The length of the snippet is not in itself a crucial factor of the fitness
function, since restricting the length is not necessary in our case. But the
longer the snippet, the trickier the combinations of instructions that can be
implemented: a long snippet gives more room to perform complex combinations of
instructions. We could even let the snippet really do something instead of
being pure junk code.
Snippet01            Snippet02
.L1:                 jmp .L3
pushl %ebx           .L2:
movl %ebx, %eax      addl %eax, $1
subl %eax, $10       cmpl %eax, %ebx
jmp .L2              jne .L2
.L4:                 popl %ebx
                     jmp .L4
                     .L3:
Consider snippet01 and snippet02 above. Looking at snippet01 alone, it changes
the content of register EAX, so it could only be inserted into the original
code at places where the contents of the EAX register are no longer needed.
But taken together with snippet02, the two snippets form one united snippet
that is harmless. After stepping into snippet01, the execution flow jumps
directly to snippet02, where the register EAX is restored; it then jumps back
to snippet01 and returns to the flow of the original code. Because there is an
unconditional jump at the beginning of snippet02, snippet02 is safe to insert
anywhere in the original code. Here we only demonstrate the possible solution
and do not consider the effect on the flags; in practice, more complex chains
of actions occur.
Compared with long snippets, short snippets contain less complexity, but frequent short snippets might blend in more with the surrounding code. In our implementation, a variable is established to hold the desired length of a snippet; a departure from this predefined value decreases the fitness value of the snippet.
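As an illustration, such a length term of the fitness function might look as follows in Python (the name, weighting and linear shape are our own assumptions, not the thesis implementation):

```python
def length_term(snippet_length, preferred_length, weight=1.0):
    """Fitness contribution of snippet length: zero at the preferred length,
    increasingly negative the further the snippet departs from it."""
    return -weight * abs(snippet_length - preferred_length)

# A snippet of the preferred length is not penalized; others are.
assert length_term(8, 8) == 0.0
assert length_term(12, 8) < length_term(9, 8) < 0.0
```

The term is simply added to the other fitness contributions, so the GA is steered towards the preferred length without forbidding other lengths outright.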
Homogeneity of instructions in a snippet
A snippet’s homogeneity is directly related to its fitness, as illustrated in Section 5.2.3. The idea of snippet homogeneity comes from the result of the attack in the worst-case scenario (Section 4.2.1). The results show that a snippet should have similarities with the blocks Before, After and After Branch of the critical node (Section 3.2.4).
Other parameters
There are many other parameters which can be taken into account by the fitness function. For instance, the number of repeated instructions in a snippet can be considered as a factor, because snippets made up of repeated instructions lack the complexity needed to prevent a cracker from locating them. But the trade-off of which attribute makes the main contribution, and of how many attributes may flatten the gradient of the fitness landscape, has to be considered depending on the environment.
2.3.5 Roulette-wheel selection
There are several generic selection algorithms. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as rating every solution may be very time-consuming. In our work, we built a computing algorithm based on roulette-wheel selection, which is one of the most popular and well-studied selection methods.
Figure 2.5: Roulette-wheel selection
The selection process is more like a roulette wheel game in which each
candidate solution represents a pocket on the wheel. The sizes of the pockets
are proportionate to the probability of selection of the snippets. Selecting N
snippets from the population is equivalent to playing N games on the roulette
wheel.
Candidate snippets with a higher fitness have a greater chance of being selected. There is also a chance that some weaker snippets, with lower fitness values, survive the selection process. Though these snippets may be weak, they may include combinations of instructions which prove useful during the reproduction process. The main steps of roulette-wheel selection are shown below.
1. Normalize the fitness value. Normalization means multiplying the fit-
ness value of each individual by a fixed number, so that the sum of all
fitness values equals 1. The population is sorted by descending fitness
values.
2. Compute the accumulated normalized fitness values. The accumulated
fitness value of an individual is the sum of its own fitness value plus the
fitness values of all the previous individuals. The accumulated fitness
of the last individual should of course be 1.
3. Randomly select individuals until the target population size is reached. For each selection, a random number R between 0 and 1 is chosen; the selected individual is the first one whose accumulated normalized value is greater than R.
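The three steps can be sketched as follows in Python; the representation of individuals and the interface are our own assumptions, not the thesis implementation:

```python
import random

def roulette_wheel_select(population, fitnesses, n, rng=random):
    """Select n individuals with probability proportional to their fitness."""
    # Step 1: normalize the fitness values so they sum to 1, sorted descending.
    total = sum(fitnesses)
    ranked = sorted(zip(population, fitnesses), key=lambda pair: pair[1],
                    reverse=True)
    # Step 2: accumulate the normalized values; the last entry reaches 1.
    accumulated, running = [], 0.0
    for individual, fitness in ranked:
        running += fitness / total
        accumulated.append((individual, running))
    # Step 3: spin the wheel n times; pick the first individual whose
    # accumulated value exceeds the random number R.
    selected = []
    for _ in range(n):
        r = rng.random()
        selected.append(next((ind for ind, acc in accumulated if acc > r),
                             accumulated[-1][0]))
    return selected
```

Fitter snippets occupy a larger slice of the accumulated range, so they are drawn more often, while weaker snippets still keep a non-zero chance to survive.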
It is obvious that this selection algorithm causes the snippets in the population pool to converge to the same configuration, because the genetic algorithm is an optimizing search algorithm and a snippet with high fitness has the chance to be chosen several times. Modifying this step can give the next generation more diversity: we can force the selection to ignore a snippet which has already been chosen up to a threshold, keeping the population diverse for the production of the next generation. But in this way we interfere with the scheme of the genetic algorithm. A better choice is to dynamically increase the probability of the mutation operation when the difference in the population drops below a threshold; through more mutation, the diversity of the population spreads again. In a naive way, we can simply exploit the random character of the selection and force the algorithm to stop early, within a few generations. After executing the genetic algorithm several times, only the best solution (snippet) of each run is picked and a new population is built based on them.
2.3.6 Genetic operator for reproduction
In genetic algorithms, the genetic operator is used to vary the programming
of a chromosome or chromosomes (instruction or snippet in our case) from
one generation to the next. These processes result in the next generation
population of snippets is different from the previous generation. Generally,
the average fitness value will have increased by this procedure for the pop-
ulation, since only the best candidate from the first generation are selected
for breeding. At the same time, a small proportion of less fit snippets are
selected because of the roulette-round selection.
Insertion crossover
Several types of crossover have been introduced in the genetic algorithm literature [9]. In our specific case, a uniform length of individuals is not necessary and the rate of successful generation is more important. For this reason an insertion crossover, which simply inserts one snippet directly into another, has been chosen as the main genetic operator because of its success at producing harmless snippets. As Figure 2.6 shows, the insertion point is generated randomly.
Figure 2.6: Insertion crossover
The disadvantage of this kind of crossover is that after several generations the length of the snippets increases swiftly. Allowing the fitness function to reward or punish snippets for their length is a solution to this problem: a parameter can be defined for the optimal snippet length, and snippets whose length diverges from this preference receive a lower fitness value.
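A sketch of this operator, with snippets modelled as plain lists of instruction mnemonics (a representation we choose purely for illustration):

```python
import random

def insertion_crossover(parent_a, parent_b, rng=random):
    """Insert the whole of parent_b into parent_a at a random point."""
    point = rng.randrange(len(parent_a) + 1)   # the insertion point is random
    return parent_a[:point] + parent_b + parent_a[point:]
```

The child keeps both parents intact, which is why this operator is good at preserving harmless snippets, and also why the snippet length grows quickly over the generations.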
After some generations of the genetic algorithm, the insertion crossover has a high chance of combining two identical snippets into a redundant successor. This repetitive composition within the snippet makes the convergence happen more swiftly and finally yields a bad result. To avoid this, we can dynamically decrease the rate of insertion crossover and increase the rate of cut-and-splice crossover (see next section). More mutation operations during evolution will also help to restrain this tendency.
Cut and splice crossover
The “cut and splice” crossover operation follows the idea of the one-point crossover of the previous thesis. To distinguish it from the popular one-point crossover of genetic algorithms, we use “cut and splice” as a new name which describes the operation more clearly.
The insertion crossover more or less guarantees that a snippet survives into the next generation, but the “cut and splice” crossover is more dangerous and may create harmful successors even when all their parents were harmless in the previous generation.
Figure 2.7: “Cut and splice” crossover
Each parent has its own randomly chosen crossover point, and the instructions after each point are swapped to generate the children. This kind of crossover has a stronger effect on the characteristics of a snippet: in the worst case, the CPU state after executing such a snippet is corrupted at the bit level. As a trade-off, compared with the insertion crossover it brings more diversity while still keeping some characteristics of the parents.
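Using the same list-of-mnemonics representation as before (our own assumption), the operator can be sketched as:

```python
import random

def cut_and_splice(parent_a, parent_b, rng=random):
    """Cut each parent at its own random point and swap the tails."""
    cut_a = rng.randrange(len(parent_a) + 1)   # independent cut points, so the
    cut_b = rng.randrange(len(parent_b) + 1)   # children may change length
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2
```

No instruction is lost or duplicated between the two children, but each child may be longer or shorter than either parent, unlike classic one-point crossover.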
Randomly shift mutation
Some argue that crossover is the most important operator, while mutation is only necessary to ensure that potential solutions are not lost. Others argue that crossover only serves to propagate innovations originally found by mutation. There are many references in Fogel [13] that support the importance of mutation-based search, and it is clear that the mutation operator brings more diversity to a population. On a search level, mutations help the GA explore random portions of the domain’s search space, thus helping it avoid being trapped in local optima; see the discussion of convergence in Section 2.3.8.
In our case, a random shift mutation operator is established, as illustrated in Figure 2.8. The two randomly chosen instructions A and B are swapped by the mutation.
Figure 2.8: Randomly shift mutation
From the point of view of the instruction level, it is risky to change the order of execution. The change of CPU state is unpredictable, and certain instructions may cause modifications at the bit level rather than at the state-variable level. Such changes are much more difficult to restore, and it is harder for the genetic algorithm to find a totally harmless snippet in further evaluations. But in some cases, if only one or a few CPU states are changed, and none at the bit level, the result may still work in combination with the insertion function.
Unlike the procedure of the previous thesis, we do not randomly choose one instruction from the instruction pool to replace one in the snippet. By only swapping existing instructions, the probability of remaining harmless is greater than when randomly picking from the instruction pool.
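The operator reduces to a swap of two positions; again the list representation is our own illustration:

```python
import random

def random_shift_mutation(snippet, rng=random):
    """Swap two randomly chosen instructions; nothing new is drawn from an
    instruction pool, only the order of execution changes."""
    mutant = list(snippet)
    a, b = rng.sample(range(len(mutant)), 2)   # two distinct positions
    mutant[a], mutant[b] = mutant[b], mutant[a]
    return mutant
```

Because the instruction mix stays identical, only the ordering is disturbed, which is exactly why this mutation is more likely to remain harmless than drawing a fresh instruction from a pool.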
2.3.7 Discussion of insertion function
The genetic algorithm is an optimizing search algorithm which tries to locate the best solution according to the fitness value. For our purpose, however, the best solution is not needed; instead we expect snippets with acceptable fitness values that are as diverse as possible. We have shown in Section 4.1 that the totally harmless snippets generated by our genetic algorithm work, but the snippet diversity and the combinations of snippet instructions are extremely limited. If we simply insert these snippets into assembly code at random, it does not cause much trouble for a cracker to locate the crucial instruction. A trickier insertion algorithm is needed to decide where and under which conditions to insert a snippet. To challenge the BMNS algorithm (Section 3.3.1), the genetic algorithm should use instructions similar to those found in the critical sections, together with the same crucial instruction as in the original code, to confuse the cracker.
1. Locate the crucial instruction.
2. Choose the snippets with this specified instruction from the
predefined snippet library as the initial population.
3. Run the genetic algorithm several loops to generate diverse
results.
4. Choose the harmless and semi-harmless snippets in the results
as candidates for insertion.
5. Use insertion function to locate the suitable position where
the CPU states satisfy the requirement of specified snippet
and add the compatible snippet inside the original code.
Table 2.6: Insertion algorithm pseudocode.
The insertion function makes the semi-harmless snippets functionally harmless in particular environments. For instance, consider a semi-harmless snippet that only changes the zero flag to 0. The insertion function has to find a place where ZF is already 0, or where it is not live at all.
Figure 2.9 shows part of the original program code in which a “jz” instruction occurs. In Basic Block 01, which is after label L1 and before the next instruction that can modify the flag register, we can declare that the zero flag is stable at 0. The semi-harmless snippet can therefore be inserted here, because resetting the zero flag to 0 does not cause any risk for the execution of the original code.
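Such a check can be sketched over a linear instruction list. The sketch below (our own illustration) uses the fact that on the fall-through path of a `jz` the zero flag must have been 0, and its set of flag-writing mnemonics is deliberately incomplete:

```python
# Instructions that rewrite EFLAGS in this sketch (deliberately incomplete).
FLAG_WRITERS = {"cmpl", "testl", "addl", "subl", "incl", "decl"}

def zf_zero_positions(instructions):
    """Indices where ZF is known to be 0: on the fall-through path of a jz
    (the jump was not taken, so ZF was 0), until the flags are rewritten."""
    positions, zf_known_zero = [], False
    for index, opcode in enumerate(instructions):
        if opcode == "jz":
            zf_known_zero = True       # not taken here implies ZF == 0
        elif opcode in FLAG_WRITERS:
            zf_known_zero = False      # knowledge about ZF is lost
        elif zf_known_zero:
            positions.append(index)    # safe spot for a ZF-clearing snippet
    return positions
```

Every returned index is a position where inserting a snippet that clears ZF cannot change the behaviour of the original code.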
Figure 2.9: JZ conditional jump
To implement this algorithm, a snippet library for each potential instruction has to be defined first. By tuning the fitness function, we can give the genetic algorithm various tendencies, so that it generates snippets that change specified CPU states. A snippet library can be built by sorting the resulting snippets by their tendencies. The insertion function then only needs to pick a snippet from this library according to the environment in the original code.
2.3.8 Discussion of convergence
In evolutionary algorithms, convergence means that the population contains substantially similar individuals. Individuals with the better phenotypes [26] are selected to have more children than the less fit ones. Selection according to “survival of the fittest” reduces the spread of phenotypes in the population. The spread is reduced unequally, so that individuals cluster more tightly around the better phenotypes discovered so far [17]. The crossover and mutation operations spread the genotypes again, but after many generations the selections and genetic operations cause the phenotypes to become concentrated: the population spread caused by mutation is balanced by selection.
In many problems, a GA may have a tendency to converge towards local optima rather than the global optimum of the problem. This means that it does not know how to sacrifice short-term fitness to gain longer-term fitness. Whether this problem occurs depends on the shape of the fitness landscape, the nature of the problem, and the quality of the representation of the problem domain.
An obvious alternative for this search process is for our explorer to start at some point in the landscape and simply follow ascending gradients. This approach is called hill climbing. Even if the explorer cannot see the landscape around him, he can still climb a hill by always choosing the steepest direction.
Figure 2.10: Sketch of a fitness landscape. The arrows indicate the preferred flow of a population on the landscape, and the points A, B, and C are local optima. The red ball indicates a population. [9]
In our case, after several generations of the genetic algorithm, all the snippets in the population pool converge to the same sequence of instructions and seldom change any more. The population (the red ball) can then be considered to have climbed the local peak A. But peak B is clearly the highest peak in this landscape, and if the landscape is only a segment of the problem domain, the highest peak is uncertain.
This problem may be alleviated by using a different fitness function, by increasing the rate of mutation, or by using selection techniques that maintain a diverse population of solutions. But the No Free Lunch theorem [34] proves that there is no general solution to this problem.
However, the goal of our project is to achieve snippet diversity, not to search for the snippet that best fits the fitness function. With the random-selection character of the genetic algorithm, we can reach some level of diversity in the middle phase of the evolution. As the evolution progresses, the genetic algorithm tends to drop this diversity and converge to one optimal result. So, to preserve the diversity, it is better to stop the evolution when the difference in the population drops to a threshold. We can simply take the snippets in the population at this point, or change the mutation rate and continue the evolution to explore other parts of the search space.
One naive solution is simply to take the snippets in the population when the threshold is reached and to run the algorithm again; the diversity of the result then depends mostly on the random character of the selection. Another solution is to continually increase the mutation rate once the convergence reaches the threshold, and to keep evolving in order to explore the unvisited part of the domain until an appropriate level of difference is achieved.
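The second strategy fits in a few lines; the threshold and the two rates here are arbitrary placeholders, not values from the thesis:

```python
def adaptive_mutation_rate(population_diversity, threshold,
                           base_rate=0.05, boosted_rate=0.30):
    """Raise the mutation rate once the diversity of the population falls
    below the threshold, pushing the search away from the local optimum."""
    return boosted_rate if population_diversity < threshold else base_rate

assert adaptive_mutation_rate(0.8, 0.2) == 0.05   # diverse: mutate rarely
assert adaptive_mutation_rate(0.1, 0.2) == 0.30   # converged: mutate a lot
```

In a real run, the diversity measure itself would have to be defined, for instance as the fraction of distinct snippets in the population.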
Chapter 3
Attacking targeted
diversification
3.1 Introduction
This chapter is dedicated to a method we developed that specifically attacks software protected by targeted diversification. We named this method Best Matching Node Search (BMNS). The goal of this attack method is to test the effectiveness of protecting software through targeted diversification.
We start the chapter with some theory about the Control Flow Graph (CFG) (Section 3.2.2) and graph searching (Section 3.2.3). The rest of the chapter is dedicated to the explanation of the BMNS algorithm (Section 3.3.1) and, finally, to the Longest Common Subsequence (LCS) (Section 3.4.1) and how it can assist us in attacking a diversified software population.
The idea of the attack method is simple. It is based on the following assumptions.
Assumption 3.1. The critical instruction is a conditional jump.
Assumption 3.2. We know which the critical instruction is, where it is
located in each diverse instance of the program, and which instructions sur-
round it.
The first assumption is safe as long as the protection of the software is implemented in high-level programming languages using “if-then-else” structures. These structures produce conditional jump instructions at the binary level. The cracking techniques that tamper with conditional jump instructions are the most common ones and are known as “branch jamming” [20].
The second assumption simply states that the attacker can successfully analyse and tamper with the binary code of one instance of a diversified program.
The attack method generates a tree similar to the CFG of the program and looks in that tree for a “fingerprint” that locates the critical instruction. To achieve this, the attacker derives heuristics from the instructions surrounding the critical conditional jump instruction. Using these heuristics, a search algorithm traverses the tree and locates the most similar conditional jump, which should be the desired critical instruction.
We also investigate the case in which the attacker has more than one instance from the diverse population. The attacker can then use comparison algorithms that automatically extract the instructions common around the node in all instances, obtaining the “fingerprint” this way. With those instructions the attacker can automatically generate the heuristics needed to locate the critical instruction.
3.2 Program representation and program analysis
3.2.1 Graph data structures
A graph is an abstract data type: abstract because we can represent many different data types as graphs. A graph is a very general data representation, and the graph data structure concept is taken directly from the graphs of mathematics.
A graph consists of two different sets of objects. The first set of objects is called points, nodes or vertices; we will refer to them as nodes. The second set of objects is called edges or lines; we will refer to them as edges.
So a graph consists of a set of nodes and a set of edges that establish relationships (connections) between the nodes. In a proper, undirected graph, an edge from node A to node B is considered to be the same as the edge from node B to node A. In a directed graph (digraph), each direction is a different directed edge.
Definition: [10] A graph or undirected graph G is an ordered pair
G := (V, E) that is subject to the following conditions:
1. V is a set, whose elements are called points, vertices or nodes,
2. E is a set of pairs (unordered) of distinct vertices, called edges
or lines.
The vertices belonging to an edge are called the ends, endpoints,
or end vertices of the edge.
V (and hence E) are usually taken to be finite sets, and many of the
well-known results are not true (or are rather different) for infinite
graphs because many of the arguments fail in the infinite case.
The order of a graph is |V |, the number of vertices.
A graph’s size is |E|, the number of edges.
The degree of a vertex is the number of other vertices it is connected
to by edges.
Table 3.1: Graphs definition.
In practice, two main data structures are used for representing graphs:
1. The adjacency list, which represents each node as a data structure that contains a list of all adjacent nodes.
2. The adjacency matrix, in which the rows and columns of a two-dimensional array represent source and destination vertices, and an entry indicates whether an edge exists between the two vertices.
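The two representations can be sketched as follows for a small undirected graph (a Python illustration of the general concept, not code from the thesis):

```python
def to_adjacency_list(node_count, edges):
    """Each node maps to the list of its adjacent nodes."""
    adjacency = {node: [] for node in range(node_count)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)     # undirected: store both directions
    return adjacency

def to_adjacency_matrix(node_count, edges):
    """matrix[a][b] is 1 if and only if an edge joins a and b."""
    matrix = [[0] * node_count for _ in range(node_count)]
    for a, b in edges:
        matrix[a][b] = matrix[b][a] = 1
    return matrix
```

The list form is compact for sparse graphs, while the matrix form answers "is there an edge between a and b?" in constant time.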
Graphs are used in many areas of both mathematics and computer science; they are a very flexible way to represent relationships between objects. All trees can be represented by graphs, but not all graphs are trees. The defining characteristic of a tree is that there is a single unique path along edges from the root to any particular node; that means that in a tree no two edges join at the same node.
For further reading on graphs, see [4], [10] and [30].
3.2.2 Control Flow Graph
A Control Flow Graph (CFG) is a directed graph of the execution paths that a program can follow. The nodes of the CFG are blocks of code, called basic blocks. Each basic block has one entry point and ends when the first jump is encountered; the jump is the block’s exit point, so basic blocks contain no jumps or branches themselves. The directed edges of the CFG are the jumps that exit each block and lead to the start of the next block. The CFG contains two special blocks: the entry block, where the execution of the program begins, and the exit block, in which all execution ends.
Table 3.2 shows the Pascal code of a small program, and Figure 3.1 shows the control flow graph of the program [29].
FOR i := 0 TO 30 DO
BEGIN
    s := a[i];
    IF s < 0 THEN
        a[i] := (s+4)^2
    ELSE
        a[i] := cos(s+4);
    b[i] := s+4;
END
Table 3.2: A CFG example.
Figure 3.1: The CFG of the example at Table 3.2.
The CFG is mainly used by compilers for optimizing the source code; optimizations like detecting dead loops are performed with the use of the CFG. Furthermore, the CFG is also used by static analysis tools.
For further reading on Control Flow Graphs, see [2], [10] and [22].
3.2.3 Searching graphs
Definition 3.1. Depth first search: [25] Any search algorithm that considers outgoing edges of a vertex before any neighbours of the vertex, that is, outgoing edges of the vertex’s predecessor in the search. Extremes are searched first.
This is easily implemented with recursion. Such an algorithm marks all vertices in a directed graph in the order they are discovered and finished, partitioning the graph into a forest.
Depth-first search (DFS) is a searching algorithm used to traverse a tree, tree structure, or graph. Usually one starts from the root node of the tree (when exploring a graph, any node may be selected as the root) and explores as deep as possible, reaching the first leaf of the tree before backtracking.
DFS is an uninformed search that always expands the first child node appearing in the search tree and goes deeper and deeper until it finds the goal node, or until it reaches a node that has no child nodes (a leaf node). Then the algorithm backtracks, returning to the last unexpanded node and exploring it. DFS is implemented as a recursive algorithm; in non-recursive implementations, all newly expanded nodes are added to a LIFO (Last In, First Out) stack for exploration.
A problem with DFS is that some search trees are deeper than memory can hold; when DFS searches such a tree it suffers from non-termination and cannot find the solution. Likewise, if a tree structure contains an infinite loop, DFS will again fail to terminate and will never visit some of the tree’s nodes. The simple solution of “remembering the visited nodes” does not always work because of insufficient memory. The usual solution is to maintain an increasing limit on the depth of the tree; this searching method is called iterative deepening depth-first search.
For the graph shown in Figure 3.2, considering all edges bidirectional, a depth-first search starting at A, assuming that the left edges in the graph are chosen before the right edges and that the search remembers previously visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, D, F, E, C, G.
Figure 3.2: Searching a graph example.
Performing the same search without remembering previously visited nodes results in visiting the nodes in the order A, B, D, F, E, A, B, D, F, E, and so on: the search is caught in the A, B, D, F, E cycle forever and never reaches C or G.
To conduct a DFS search:
1. Form a one-element queue consisting of a zero-length path
that contains only the root node.
2. Do until the queue is empty,
(a) Remove the first path from the queue.
(b) Create new paths by extending the first path to all the
neighbours of the terminal node.
(c) Reject all new paths that introduce loops.
(d) Add the new paths, if any, to the front of the queue.
Table 3.3: The DFS algorithm.
Another algorithm to explore a graph, tree, or tree structure is breadth-first search (BFS).
Definition 3.2. Breadth first search: [24] A search algorithm that considers
neighbours of a vertex, that is, outgoing edges of the vertex’s predecessor in
the search, before any outgoing edges of the vertex. Extremes are searched
last.
For the graph shown in Figure 3.2, a BFS algorithm starting from node A, again assuming that the left edges in the graph are chosen before the right edges and that the search remembers previously visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, C, E, D, F, G.
To conduct a BFS search:
1. Form a one-element queue consisting of a zero-length path
that contains only the root node.
2. Do until the queue is empty,
(a) Remove the first path from the queue.
(b) Create new paths by extending the first path to all the
neighbours of the terminal node.
(c) Reject all new paths that introduce loops.
(d) Add the new paths, if any, to the back of the queue.
Table 3.4: The BFS algorithm.
The main difference between DFS and BFS can be clearly seen in Tables 3.3 and 3.4. The last line of the DFS algorithm adds the new paths to the front of the queue, forcing the algorithm to traverse in depth first, while in BFS the new paths are added to the back of the queue.
The space complexity of DFS is much lower than that of BFS, and DFS also lends itself much better to heuristic methods of choosing a likely-looking branch. The time complexity of both algorithms is proportional to the number of vertices plus the number of edges in the graphs they traverse, O(|V| + |E|).
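That one-line difference shows up directly in code. The sketch below (our own illustration on a small example graph, not the graph of Figure 3.2) keeps a single routine for both searches, differing only in where new nodes join the queue:

```python
from collections import deque

def search_order(adjacency, root, mode):
    """Visit order of a graph search; 'dfs' adds new nodes at the front of
    the queue and 'bfs' at the back, mirroring Tables 3.3 and 3.4."""
    queue, visited, order = deque([root]), set(), []
    while queue:
        node = queue.popleft()
        if node in visited:
            continue                    # remember visited nodes: reject loops
        visited.add(node)
        order.append(node)
        neighbours = [n for n in adjacency[node] if n not in visited]
        if mode == "dfs":
            queue.extendleft(reversed(neighbours))   # front: go deeper first
        else:
            queue.extend(neighbours)                 # back: finish this level

    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
assert search_order(graph, "A", "dfs") == ["A", "B", "D", "C", "E"]
assert search_order(graph, "A", "bfs") == ["A", "B", "C", "D", "E"]
```

On this graph, DFS dives through B to D before touching C, while BFS finishes the whole level B, C before descending to D and E.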
For further reading on depth-first and breadth-first search, see [8], [32] and [33].
3.2.4 Representation of a program as a search tree
Before mentioning how we search and identify a certain node with our algo-
rithm, we present a binary tree that is a simplification to the control flow
graph 3.2.2. A downside of our tree representation is that it is infinite. How-
ever, this has no practical implications to our algorithm as it is capable of
detecting loops.
To explain how we can represent any program as a tree, we must first explain how the conditional jump instructions operate. A conditional jump instruction checks the state of one or more status flags in the flags register (EFLAGS) and, depending on the state of the flags, performs a jump to a target instruction, changing the execution of the program /refDUJMP.
A small example: the jump-on-carry (jc) instruction first checks the state of the carry flag; if the carry flag is set to 1, the instruction jumps to a target instruction, otherwise (the carry flag is 0) the execution of the program continues normally with the next instruction after the jc. The state of the flags changes through the execution of instructions and through the processor states.
We can now represent any program as a search tree if we consider the conditional jump instructions (jc, jnc, je, jne, jo, jno, etc.) as the tree nodes. These nodes always expand into two separate branches.
Before each of the tree nodes there will be a number of instructions; we will refer to those instructions as the Before node instructions. After every node there will be two sets of instructions. The first set, which we will refer to as the After node instructions, contains the instructions that follow if the conditions of the node are not met (sometimes called the “fall-through” path). The second set contains the instructions that the node would jump to if the conditions are met; we will refer to this set as the After Branch node instructions. The latter is sometimes called the “target” path as well. Figure 3.3 illustrates a node and the three blocks.
Figure 3.3: The three node blocks.
We generate the tree by following the execution of the program. All the instructions that we encounter are placed in the instruction sets (Before, After, After Branch). If we encounter an unconditional jump instruction (jmp) during the execution, we jump to the stated location; the jump instruction itself is not included in the instruction sets. The node instructions are not included in the instruction sets either.
We must make some assumptions to simplify the generation of the tree.
Assumption 3.3. The conditional jump we are looking for, it must not be
a dynamic jump.
Assumption 3.4. There are no dynamic jmp instructions that we must
follow.
Assumption 3.5. The call instruction will always be followed by a ret in-
struction that will give back the control to the calling location.
Assumption 3.6. We can disassemble properly, reading only the “opcode” of the instructions and ignoring their address-based parameters.
The first two assumptions refer to jump instructions that generate the jump target address from a register. Such instructions are difficult to follow, because we would need to execute all the instructions that modify the register being used.
The third assumption is a simplification telling us that it is not necessary to follow the call instructions, because every call should return to the next instruction. If the critical instruction is inside one of the subroutines invoked by a call instruction, we can search that subroutine separately.
The last assumption is needed so that the search algorithm looks only at the constant information and not at information that changes because of diversification. It can be implemented by building a table of how many bytes must be read for the instruction code and how many for the parameters of each instruction (see Section A.1). This protects the algorithm from reading data bytes that happen to have the same hexadecimal value as the instructions we are looking for and interpreting them wrongly.
Table 3.5 shows a small assembly program that we will use as an example of how we create the search tree. Notice that the parameters of most instructions are omitted; we only keep the parameters of the conditional and unconditional jump instructions.
0: addl 7: subl 14: decl 21: addl
1: subl 8: jc 7 15: cmpl 22: addl
2: cmpl 9: addl 16: jc 20 23: imul
3: jc 8 10: addl 17: addl 24: idiv
4: jmp 6 11: incl 18: addl 25: jc 2
5: addl 12: imul 19: cmpl 26: addl
6: cmpl 13: subl 20: jc 10 27: nop
Table 3.5: The example program.
The program execution starts with the addl instruction, followed by subl and cmpl, reaching our first node, jc 8. This will also be our starting node.
The instructions that were found before it (addl, subl, cmpl) are kept as the Before set of that node.
Next, the algorithm expands the node until it finds two new nodes or the end of the program.
It expands the node by first following the instructions that would be executed if the condition of the conditional jump instruction were not met; this creates the After set with the instructions (cmpl, subl), and we find the next node, jc 7.
If you look at the program you will notice that the next instruction is jmp 6; the jump instruction is not added to the sets but is followed instead. Because of this, the fifth instruction, addl, is not added to the set, since it will not be executed in this case.
The last step in expanding the node is to follow the conditional jump as if the condition were met. That means we jump to the eighth instruction, where we encounter the node jc 7 again, but without encountering any instructions before it. The After Branch set of node jc 8 is therefore empty.
Continuing like this, we construct the tree of Figure 3.4. The nodes
coloured blue are loops. We consider a node a loop when it has the same
Before instructions and reaches the same node again. We can see in the tree
that the node jc 7 is repeated many times; it can be reached in
three different ways. The three different approaches to the instruction (jc 7 )
give us three different nodes.
We will represent a node by x : y, where:
• x is the address of the node.
• y is the entrance address via which the node was reached.
The node representation of the first node jc 8 thus becomes 3:0. The
node jc 8 is also encountered as node 3:2 at the bottom of the tree (see Figure 3.5).
The node jc 7 has three different representations, 8:4, 8:8, and 8:7 ; the node jc 20
has two, 16:9 and 16:10 ; the node jc 10 is represented as 20:17
and 20:20 ; finally, the node jc 2 is represented only as 25:21.
With this representation it is easier to detect loops: we only need to com-
pare the address of the node and the entrance address, instead of comparing
the instruction blocks. We must distinguish nodes by their entrance
address, and not only by the node address. This way the same conditional
Figure 3.4: The represented search tree.
Figure 3.5: The represented tree with node representation.
jump instruction, when reached from a different entrance address, will have a
different Before block and thus may give a different heuristic value.
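The x:y representation and the resulting loop test can be sketched as follows; `Node` and `isLoop` are illustrative names, not part of the thesis implementation.

```cpp
#include <cassert>
#include <set>
#include <utility>

// A node of the search tree: the address of the conditional jump and the
// entrance address via which it was reached (the "x:y" notation of the text).
using Node = std::pair<int, int>;  // {address, entranceAddress}

// Loop detection: a node is a loop if the same {address, entrance} pair has
// been expanded before. Comparing two integers replaces the costly
// comparison of whole instruction blocks.
bool isLoop(std::set<Node>& expanded, int address, int entrance) {
    return !expanded.insert({address, entrance}).second;
}
```

Reaching node 8:4 a second time is reported as a loop, while reaching the same address via a new entrance (8:8) is not.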
3.3 An attack algorithm
3.3.1 Best Matching Node Search
Before explaining the algorithm, it is necessary to explain what we mean
when we refer to a node. For our search algorithm, a node is a conditional
jump instruction in the program. Earlier, in Section 3.2.4, we
explained how these nodes are created and how we can represent a program
as a tree by following its execution.
The goal of the Best Matching Node Search algorithm is to find the node
that marks the critical instruction in any program belonging to the diverse
population of the original program. These programs will have the critical
instruction in different locations because of the diversification. The BMNS
algorithm generates a search tree with the conditional jump instructions as
the tree nodes. Afterwards, the algorithm searches the tree to find the node
with the highest similarity to the critical instruction of the program. This
similarity measure is in effect a “fingerprint” of the critical instruction, as
found by the attacker. We create this “fingerprint” from the instructions that
are executed before and after the conditional jump instruction.
This efficient search is meant to be used for automating the attack
against a diverse software population. It should be able to
locate the critical instruction that the attacker needs to patch in each member
of a diverse population of software.
Our aim with this algorithm is to find out whether the harmless snippets are
sufficient to hide the critical instruction from search attacks. If, with good
heuristics, we can easily and reliably detect the location of the critical
instruction, and if the cost of using such an algorithm is small, then the
harmless snippets are incapable of blocking the automation of the cracking
procedure, and thus fail their initial goal.
For the selection of the heuristics we assume that the attacker knows
which instructions are modelled for use as snippets. This way we can avoid
selecting those instructions as heuristics and thus make the similarity mea-
sure more efficient. This assumption is merely for optimisation purposes and
is in no way required for the BMNS algorithm to successfully locate the
critical instruction.
Initialize Q with first Node
WHILE Q IS NOT empty
Take first Node from Q
StartAddress = Node.StartAddress
CurAddress = Node.StartAddress
WHILE Tree[CurAddress] IS NOT empty
Collect Before Block
IF Found a Node THEN
After = ExpandNode(CurAddress + 1)
IF NOT EXIST AfterNode IN Q THEN
Add AfterNode(ExpAddress:CurAddress + 1) to Q
END IF
AfterBranch = ExpandNode(Node.JumpAddress)
IF NOT EXIST AfterBranchNode IN Q THEN
Add AfterBranchNode(ExpAddress:Node.JumpAddress) to Q
END IF
Compute Node.Heuristics
IF Node.Heuristics > BestNode.Heuristics THEN
BestNode = Node
END IF
Add Node to ExpandedQ
Remove Node from Q
END IF
Increase CurAddress
END WHILE
END WHILE
Table 3.6: The BMNS algorithm.
In Table 3.6 you can see the BMNS algorithm in pseudocode. A more
detailed pseudocode version can be found in Appendix A.3, and an
implementation in C++ of the algorithm is available on demand.
The BMNS algorithm is based on the DFS algorithm. We only made
modifications to meet the specifications of our specific problem, and thus it
should retain the same time complexity as the DFS algorithm.
To calculate the time complexity of the BMNS algorithm, we first must
calculate the time complexity of the heuristics function, which is O(n), with
n in the worst case being the maximum size of the blocks. Because the block
size is chosen to be much smaller than the program size, this gives us a
constant time complexity O(1), which we can ignore.
The complexity of expanding one node is again O(n), with n now being
the total length of the node; in the worst case n is the length of
the whole program, if the program has only one node. For the expansion
of all nodes this gives us O(m1 + ... + mi + ... + mn), with n being the
number of nodes and mi the length of each node. Finally, because the sum
m1 + ... + mn is roughly the length of the program, we can write that the
time complexity for expanding all nodes is O(l), where l is the length of
the program.
To complete the time complexity of the algorithm we need to account for
the traversal between the nodes. This time complexity is O(n), with n being
the number of nodes the program has. This gives us a total complexity
of O(n + l), with l being the length of the program and n the number of nodes.
Indeed, the BMNS algorithm retains the same time complexity as the DFS
or BFS algorithms. The complexity calculations were based on [4].
3.3.2 Heuristics
Two fundamental goals in computer science are:
1. finding algorithms with provably good run times, and
2. finding algorithms with provably good or optimal solution quality.
A heuristic is an algorithm that gives up one or both of these goals. For
example, it may usually find pretty good solutions, but there is no proof that
the solutions cannot get arbitrarily bad; or it may usually run reasonably
quickly, but there is no guarantee that this will always be the case. Therefore,
we define a heuristic algorithm as follows:
Definition 3.3. A heuristic algorithm is a programming strategy based on
trial-and-error methods and feedback evaluation. It guarantees neither opti-
mal solutions nor good execution times, but it is often usable in practice due
to its reasonably good results.
3.3.3 Heuristics of Best Matching Node Search
In a non-diverse program population, the critical instruction remains at
the same position and has exactly the same parameters. This makes it
easy to always locate and change it in the same way for the whole population,
and thus allows us to automate the cracking of the program. This is
sometimes called a “global attack” and constitutes the “class break” of the
protection system. Through diversification, on the other hand, each instance
of the program has the critical instruction at a different location, and most
likely with different parameters. This makes the automation of the cracking
more complicated. The crackers can still locate and change the critical
instruction in their own instance of the program, but can they locate it
automatically in all diversified instances?
In this section we will discuss how we create a “fingerprint” using the
assembly instructions which are not affected by the diversification.
We know that the inserted snippets are harmless, and that they
do not actually change the semantics of the assembly program. Hence, when
executed, a snippet must restore the program state back to the state it had
before the snippet started executing (see Section 2.3.4). What snippets do
change is the location of parts of the assembly code. This relocation also
changes part of the assembly instructions, as shown in Section A.1. The
important point is that the actual assembly source remains the same and
merely has noise (the harmless snippets) inserted in between.
Taking into account the above, and the fact that the snippets do not change
the critical instruction or its neighbouring instructions, we can generate a
“fingerprint” that will identify the critical instruction. This is done by fol-
lowing the execution flow of the program. Besides the inserted harmless
snippet instructions, we will also encounter the normal instructions of
the program, which remain the same. Finding those instructions allows us to
generate a “fingerprint” that identifies the critical instruction in all, or
almost all, instances of the diverse population.
To find the critical instruction, we look at the surrounding instructions.
As we saw in Section 3.2.4, there are three different blocks of surrounding
instructions to look at.
Figure 3.6: A node and its surrounding instructions.
For example, in Figure 3.6 we see the instructions around the node 20:17
from the example program in Figure 3.5. If we take those instructions as
heuristics, we get the heuristic blocks of Table 3.7. An asterisk following
an instruction indicates that we ignore the parameters of that instruction,
and an asterisk that separates two instructions indicates that any number of
other instructions may occur between them and will be ignored.
Node read blocks
Before After After Branch
1 addl * pushl * addl *
2 movl * movl * pushl *
3 pushl * subl * exch *
4 movl * popl * movl *
5 movl * cmpl * popl *
6 incl *
Heuristic blocks
Before After After Branch
1 addl * addl * addl *
* * *
2 addl * addl * incl *
* * *
3 cmpl * imul * imul *
* *
4 idiv * subl *
*
5 decl *
*
6 cmpl *
Table 3.7: Example of calculating heuristic value of a
node.
Searching the execution tree of the program, it is very rare to find
a node different from node 20:17 that has exactly the same surrounding in-
structions. This enables us to look for the node 20:17 without knowing its
location in the assembly. Moreover, by relaxing the search criteria and only
looking for the best matching surrounding instructions, we can ignore
possible snippets that have been inserted around and close to the node.
A problem that appears is that if the instructions selected for the heuris-
tics come from an inserted snippet, then in other instances those instruc-
tions will not exist. This can result in several matches,
including false positives.
The three heuristic blocks can be used in a variety of ways, with each
way creating a different heuristics function. We use them by comparing each
of the three heuristic blocks with the equivalent constant blocks that were
expanded around the node. The comparison means that we must find the same
instructions in the constant blocks, in the same order. We could assign a weight
to each found instruction, but mainly we use the order in which the
instructions were found: the first instruction has a magnitude of one, the
second of two, and so on.
The weight parameter of the heuristics can play an important role if one
block can give a much greater value than the other blocks, as the example
below illustrates. We could increase the weights of the Before and
After blocks so that they are closer to the value of the After Branch block. This
protects against finding a node that is similar only in the After Branch block
and not similar in the Before and After blocks.
Another important use of the weights is that if there are multiple
similar nodes that differ only in one heuristic block, then the
weights can increase the value of that specific block, making it more impor-
tant and thereby discarding some duplicate nodes.
As an example, we reconsider the blocks shown in Table 3.7. The Before
block compared with the “Before heuristics” produces a heuristic value of
1 for the addl instruction; the After block with the “After heuristics”
gives a heuristic value of 0, since not even the first addl instruction is found;
and finally the After Branch block gets a heuristic value of 3, for finding
an addl instruction in the beginning and also finding the second instruction
of the heuristics block, the incl, at the end. In total the node has a
heuristic value of 4. The maximum heuristic value that a node could reach
with these heuristics is 6 for the Before block, 10 for the After block, and 21
for the After Branch block; in total a node can reach a heuristic value of 37.
Different heuristics give different maximum values.
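The described comparison can be sketched as follows. One detail left open above is whether a heuristic instruction that cannot be found aborts the match or is simply skipped; the sketch below skips it and keeps searching for the next heuristic instruction, which reproduces the block values 1, 0 and 3 of the example. The function name is illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Heuristic value of one block: match the heuristic instructions as an
// ordered subsequence of the instructions read around the node (parameters
// are already stripped; the "*" wildcards of Table 3.7 are implicit).
// The i-th heuristic instruction found in order (1-based) contributes i.
int blockHeuristicValue(const std::vector<std::string>& heuristics,
                        const std::vector<std::string>& block) {
    int value = 0;
    std::size_t pos = 0;  // current search position inside the block
    for (std::size_t i = 0; i < heuristics.size(); ++i) {
        for (std::size_t j = pos; j < block.size(); ++j) {
            if (block[j] == heuristics[i]) {
                value += static_cast<int>(i) + 1;  // weight = order of match
                pos = j + 1;
                break;
            }
        }
    }
    return value;
}
```

Running it on the three blocks of Table 3.7 reproduces the example: 1 + 0 + 3 = 4 in total.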
Table 3.8 shows the assembly source of a compiled program that
sorts three numbers. Instead of implementing a swap function,
we purposefully repeated the same swap source
code for the three swaps. This way there is a large similarity between the
instructions of the program. We use this example to show what similar nodes
look like, and also to show that even in this case we can find heuristics that
can locate each of the nodes.
0: movl $-1, -4(%ebp) 16: movl %eax, -4(%ebp) 0: movl 16: movl
1: movl -8(%ebp), %edx 17: movl -8(%ebp), %eax 1: movl 17: movl
2: movl -12(%ebp), %eax 18: movl %eax, -16(%ebp) 2: movl 18: movl
3: cmpl %eax, %edx 19: movl -4(%ebp), %eax 3: cmpl 19: movl
4: jge 11 20: movl %eax, -8(%ebp) 4: jge 11 20: movl
5: movl -12(%ebp), %eax 21: movl -12(%ebp), %edx 5: movl 21: movl
6: movl %eax, -4(%ebp) 22: movl -16(%ebp), %eax 6: movl 22: movl
7: movl -8(%ebp), %eax 23: cmpl %eax, %edx 7: movl 23: cmpl
8: movl %eax, -12(%ebp) 24: jge 31 8: movl 24: jge 31
9: movl -4(%ebp), %eax 25: movl -16(%ebp), %eax 9: movl 25: movl
10: movl %eax, -8(%ebp) 26: movl %eax, -4(%ebp) 10: movl 26: movl
11: movl -8(%ebp), %edx 27: movl -12(%ebp), %eax 11: movl 27: movl
12: movl -16(%ebp), %eax 28: movl %eax, -16(%ebp) 12: movl 28: movl
13: cmpl %eax, %edx 29: movl -4(%ebp), %eax 13: cmpl 29: movl
14: jge 21 30: movl %eax, -12(%ebp) 14: jge 21 30: movl
15: movl -16(%ebp), %eax 31: cmpl $-1, -4(%ebp) 15: movl 31: cmpl
32: jne 0 32: jne 0
Table 3.8: The sort three example program with and without
parameters.
Figure 3.7 shows the program of Table 3.8 represented as a tree. In the
tree representation it is easier to notice that all nodes of the program use
only the instruction movl and look very similar to one another.
Figure 3.7: The represented search tree.
The difference lies mainly in the number of movl instructions each
node has. An important remark is that in a real implementation part of
the instruction parameters would be read and give different values to the
instructions. For example, if we look at instructions 5 and 6, we can notice
that they have different parameters, such as a constant versus a register. Those
parameters would not change under diversification and would give different
heuristic values that we could use.
But even without taking the extra parameters into account, we can notice
that the node 14:5 has the greatest number of movl instructions both in
its Before instructions and in its After instructions. This makes that node
distinguishable from the other nodes. Node 32:25 is also unique, for having the
greatest number of movl instructions in its After Branch block. For the remaining
nodes, 4:0 and 24:15, unfortunately any heuristics we select will
lead to duplicate results. The main reason is node 14:5,
which will always get at least the same heuristic value as those two nodes.
3.3.4 Selection of heuristics and automated generation
We have seen how to generate a search tree from an assembly source,
how to search that tree, and that the heuristics should be drawn from the
neighbouring instructions. Next we define more precisely
which instructions should be used as heuristics, and investigate the
possibility of improving our heuristics automatically.
Having only a single instance of the program to be attacked forces us to select
manually which instructions will be used for the heuristics. It is very im-
portant to know that some instructions are more relevant as heuristics than
others. The main reasons for that are:
Lemma 3.1. The rarity of an instruction in the assembly: the rarer it is,
the more useful that instruction will be for the matching.
Lemma 3.2. The knowledge of whether an instruction has been modelled for
snippets: if an instruction has been modelled for snippets and is used in harmless
material, then that instruction is a poor choice for use in heuristics.
Lemma 3.3. A sequence of instructions can be unique. Some
instructions might appear in a specific order that distinguishes them from
any other part of the source. Such a set of instructions should be used
to identify the location.
Let us elaborate on these rules of thumb. The rarer an instruction is,
the lower the probability of encountering it close to a node. That makes
the instruction a good candidate for the search algorithm.
An instruction that is modelled for diversification may easily be found
almost everywhere. Worse, in a different diversified instance
that instruction might not occur at all! If we accidentally use an instruction
from a harmless snippet, because at that point it looked important,
then in a different instance of the program that instruction could be missing,
which could completely disrupt our entire analysis.
The order of the instructions is a very important element. It is
in fact the whole idea of the “fingerprint” search. In an assembly file even the
rarest instruction will likely occur multiple times. But how probable is
it that a part of the source code with a different function than another part
will have the same instructions in the exact same order?
From the above lemmas we can define a characterizing block as follows.
Definition 3.4. A characterizing block is one for which, even if we perform
the BMNS algorithm with heuristics for that block only, we still find the target
node.
That means that the node has a sequence of instructions that is unique
in the whole assembly. When a node has characterizing blocks it is easier to
locate, and such a block should always be used for the heuristics.
It is also interesting to examine the case where the attacker has more
than one diversified instance of the program at his disposal. In that case
the attacker can easily develop an automation tool for the selection of the
heuristics.
For this automation tool the attacker would use the search algorithm
to find the three instruction blocks around the critical instruction. This
procedure would be repeated for each of the different instances the attacker
has. Thus the attacker would obtain the instruction blocks around
the critical node multiple times. These blocks can then be compared with each
other and give an idea of the original instructions they include.
For this comparison a diffing algorithm can be used. The
attacker would use a custom-made diffing tool that generates the
instructions common to the blocks of all instances he has access to.
These common instructions can be used for the selection and improvement
of the heuristics.
A diffing tool used for the purpose of automating the generation of heuris-
tics should have the following characteristics:
1. It must be able to compare from two up to n instances of the
diversified program.
2. It must return only the Longest Common Subsequence (LCS) of the
instances, and not the differences.
It is necessary that the diffing tool is able to compare multiple instances.
The diversified population might contain similar snippets, which will generate
many common instructions around the nodes. This makes the diffing proce-
dure more difficult and demands more instances for the automated creation
of the heuristics. The more instances the attacker has, the more efficient
the automated generation of heuristics will be.
It is important that even with only two instances the diffing can help the
generation of the heuristics by disregarding many snippets.
The attacker will mainly use the diffing as a guideline to obtain efficient
heuristics faster.
For more information on diffing see Section 3.4.1.
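As a sketch of this automated heuristic generation, the attacker could fold a pairwise LCS over the instruction blocks collected from the instances. Iterated pairwise LCS only approximates the n-way LCS (the general problem is NP-hard, see Section 3.4.1), but it already discards most snippet instructions. The instruction sequences below are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Longest common subsequence of two instruction sequences, via the classic
// dynamic programming length table followed by backtracking.
std::vector<std::string> lcs2(const std::vector<std::string>& a,
                              const std::vector<std::string>& b) {
    std::vector<std::vector<int>> L(a.size() + 1,
                                    std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            L[i][j] = (a[i - 1] == b[j - 1])
                          ? L[i - 1][j - 1] + 1
                          : std::max(L[i - 1][j], L[i][j - 1]);
    std::vector<std::string> out;
    for (std::size_t i = a.size(), j = b.size(); i > 0 && j > 0;) {
        if (a[i - 1] == b[j - 1]) { out.push_back(a[i - 1]); --i; --j; }
        else if (L[i - 1][j] >= L[i][j - 1]) --i;
        else --j;
    }
    std::reverse(out.begin(), out.end());
    return out;
}

// Fold the pairwise LCS over all instances: the result is a common
// subsequence of every instance (though not necessarily the longest one),
// which filters out most snippet instructions around the critical node.
std::vector<std::string> commonInstructions(
        const std::vector<std::vector<std::string>>& instances) {
    if (instances.empty()) return {};
    std::vector<std::string> common = instances[0];
    for (std::size_t k = 1; k < instances.size(); ++k)
        common = lcs2(common, instances[k]);
    return common;
}
```

With three hypothetical instances whose blocks share the core (cmpl, jc, addl) but contain different snippet noise, the fold recovers exactly that core.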
3.4 A comparison algorithm
3.4.1 Longest Common Subsequence
The longest common subsequence problem is about finding the largest
common part of two or more sequences. In most of its applications
it is used with two sequences.
The LCS problem is NP-hard when comparing n different sequences.
There exist algorithms that find the LCS of two sequences in
polynomial time [14], but the general problem for n sequences is solved in
exponential time. The recursive algorithm shown in Table 3.9 solves the LCS
problem for two sequences in exponential time.
FUNCTION lcs(x, y)
n = length(x), m = length(y)
IF n = 0 OR m = 0 THEN RETURN “”
best = lcs(x[1, n - 1], y[1, m])
IF length(best) < length(lcs(x[1, n], y[1, m - 1])) THEN
best = lcs(x[1, n], y[1, m - 1])
END IF
IF x[n] = y[m] AND length(best) < length(lcs(x[1, n - 1], y[1, m - 1])) + 1
THEN best = lcs(x[1, n - 1], y[1, m - 1]) + x[n]
END IF
RETURN best
Table 3.9: A recursive LCS algorithm [1].
To solve the LCS problem in polynomial time, dynamic programming
must be used; a polynomial-time algorithm is shown in Table 3.10. Even
though this algorithm has polynomial time complexity, it unfortunately
needs a lot of memory: the memory space grows quadratically, Θ(n²). When
comparing texts the algorithm can be improved by the use of hashing tables,
which improves both the speed and the memory requirements.
Hashing replaces the strings with numbers. This reduces the memory
needed by the algorithm, which only has to handle numerical identities
of the strings. It also improves the execution time, because instead of
comparing text the algorithm compares numbers, which is much faster.
FUNCTION lcs(x, y)
n = length(x), m = length(y)
FOR i = 0 TO n
FOR j = 0 TO m
IF i = 0 OR j = 0 THEN
table[i, j] = “”
ELSE IF x[i] = y[j] THEN
table[i, j] = table[i - 1, j - 1] + x[i]
ELSE IF length(table[i - 1, j]) >= length(table[i, j - 1]) THEN
table[i, j] = table[i - 1, j]
ELSE
table[i, j] = table[i, j - 1]
END IF
END FOR
END FOR
RETURN table[n, m]
Table 3.10: A dynamic LCS algorithm [1].
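The hashing idea described above can be sketched as follows: instruction strings are interned to integer identities once, and the quadratic table then stores only lengths (as in Table 3.11) rather than strings. This is an illustrative sketch with hypothetical helper names, not the thesis implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Replace instruction strings by numerical identities before running the
// dynamic LCS, so the inner loop compares integers instead of strings.
std::vector<int> hashTokens(const std::vector<std::string>& tokens,
                            std::unordered_map<std::string, int>& ids) {
    std::vector<int> out;
    for (const auto& t : tokens) {
        // emplace is a no-op if the token was seen before
        auto it = ids.emplace(t, static_cast<int>(ids.size())).first;
        out.push_back(it->second);
    }
    return out;
}

// Length table of Table 3.11: L[i][j] is the LCS length of a[0..i) and b[0..j).
int lcsLength(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<std::vector<int>> L(a.size() + 1,
                                    std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            L[i][j] = (a[i - 1] == b[j - 1]) ? L[i - 1][j - 1] + 1
                                             : std::max(L[i][j - 1], L[i - 1][j]);
    return L[a.size()][b.size()];
}
```

On the second example of Table 3.12 (A = ABCD, B = ACBDC) the length computed this way is 3, matching the length table of Table 3.13.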
Another optimization that is often used is to first compare the beginning
and ending parts of the sequences, and then run the algorithm only on the
part where differences occur. This optimization reduces the size of
the input, and for large sequences with small differences it can give a much
faster execution. In the worst case, when the first and last elements of the
sequences already differ, it only costs two extra comparisons.
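This trimming step can be sketched as follows; `trimCommonEnds` and `TrimResult` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Strip the common prefix and suffix of two sequences before diffing, so the
// quadratic LCS table is only built for the middle part where they differ.
// Returns the half-open ranges [begin, endA) and [begin, endB) still to compare.
struct TrimResult { std::size_t begin, endA, endB; };

TrimResult trimCommonEnds(const std::vector<std::string>& a,
                          const std::vector<std::string>& b) {
    std::size_t begin = 0;
    while (begin < a.size() && begin < b.size() && a[begin] == b[begin])
        ++begin;
    std::size_t endA = a.size(), endB = b.size();
    while (endA > begin && endB > begin && a[endA - 1] == b[endB - 1]) {
        --endA;
        --endB;
    }
    return {begin, endA, endB};
}
```

The guards against `begin` keep the two passes from overlapping when one sequence is a prefix or suffix of the other.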
The dynamic programming algorithm actually keeps an M×N array L
that contains the lengths of the common subsequences of the two sequences.
The array is filled following the recursive definition shown in Ta-
ble 3.11. It takes O(mn) time to fill the array, and the last element L[m, n]
contains the total length of the LCS.
L[i, j] = 0 if i = 0 or j = 0
L[i, j] = L[i - 1, j - 1] + 1 if i, j > 0 and ai = bj
L[i, j] = max(L[i, j - 1], L[i - 1, j]) otherwise
with 0 ≤ i ≤ m and 0 ≤ j ≤ n,
and a1a2...am, b1b2...bn the sequences to compare
Table 3.11: Calculating the length array L.
Table 3.12 shows two examples of pairs of sequences and their longest
common subsequences. We notice that even in small examples the prob-
lem has multiple solutions.
Using the dynamic LCS algorithm and the second example from Table 3.12,
we can construct Table 3.13, which displays the LCS length table. Each number
is the length of an LCS of the corresponding prefixes. To read this table and
find the LCS, one just follows where the number changes. For example, L[1, 1]
equals 1, which gives us the element A; the next change is at L[2, 3], where the
cell value equals 2 and the element is C; finally, the last element, D, is given by
L[4, 4], which equals 3.
Example 1. Sequences A = CBADAABCC, B = ABBCADACBDDBA
Longest Common Subsequences = BADAC, BADAA,
BADAB, CADAA, CADAB, . . .
Example 2. Sequences A = ABCD, B = ACBDC
Longest Common Subsequences = ABC, ABD, ACD
Table 3.12: An LCS example.
We must note that most algorithms in use only find and report the
first LCS they encounter. Finding all the LCSs contained in
the length table requires an extra algorithm with time complexity
O(mn) [14].
The LCS is further used to generate the Shortest Edit Script (SES) (Ap-
pendix A.2), which is the smallest script that transforms one sequence into
another. Generating the SES automatically is equivalent to generating the
LCS. Usually, “diffing” refers to generating the SES.
0 1 2 3 4
A B C D
0 0 0 0 0 0
1 A 0 1 1 1 1
2 C 0 1 1 2 2
3 B 0 1 2 2 2
4 D 0 1 2 2 3
5 C 0 1 2 3 3
Table 3.13: A length table generated by the dynamic
LCS algorithm.
For further reading on the Longest Common Subsequence problem, see
[1], [11], [14], and [15].
3.4.2 Longest Common Subsequence at a Diverse Population
An algorithm that finds the LCS of multiple instances could be used to
attack a diverse population of software. The idea of this attack is that
extracting the LCS would be equivalent to extracting the original source code.
For the LCS to be equal to the original source code, we assume that all the
differences in the diverse population are introduced by the use of snippets.
That software can then be cracked and published, or someone could make
a patch with the SES that would transform any instance into the cracked LCS
version.
The first thing we should ask is whether such an algorithm is possible.
Usually diffing is applied to two or at most three instances. An algorithm for n
instances is feasible, but because the problem is NP-hard, as shown in
Section 3.4.1, it will run slowly. That, however, is not the real problem
of this cracking method. The algorithm could be executed until comple-
tion on a suitably powerful machine, requiring neither monitoring nor user
intervention. Even if the algorithm took a few days to execute, it
would still yield the desired result.
Using the cracked LCS can be prevented by calculating a checksum for
every diverse instance; this checksum depends on the diversity of
the instance. The checksum can then be verified to ensure that the correct
diverse instance is being executed. This protection could be applied by and for
the libraries of the application, making it difficult to modify all of them.
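A sketch of such an instance-dependent check, assuming a simple FNV-1a checksum over the instance's code bytes (the thesis does not prescribe a particular checksum function, and the function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Instance-dependent integrity check: every diversified instance stores the
// checksum of its own code bytes, so a patch derived from the LCS of the
// population will fail verification. FNV-1a is used purely as an example.
std::uint32_t fnv1a(const std::vector<std::uint8_t>& code) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (std::uint8_t byte : code) {
        hash ^= byte;
        hash *= 16777619u;  // FNV prime
    }
    return hash;
}

// At run time the instance re-hashes its code and compares the result
// against the checksum embedded at diversification time.
bool instanceIntact(const std::vector<std::uint8_t>& code,
                    std::uint32_t expected) {
    return fnv1a(code) == expected;
}
```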
Still, the above approach can be successfully attacked. There is also another
technique that can easily be used to protect the software from LCS extraction.
As we have seen, the inserted snippets are harmless, but their individual in-
structions are harmful. A small number of similar snippets could be inserted
at the same locations in each instance, with the purpose of modifying
the LCS and inserting truly harmful instructions into it. Table 3.14 shows
two snippets that, when an algorithm extracts the LCS, will leave a
trace in it and thereby turn from harmless into harmful.
Considering the similarity between the snippets, there is a signif-
icant probability that some of the instructions contained in the snippets will
be included in the LCS. Of course this will destroy the program, by leaving
behind instructions that are harmful.
Snippet 1 Snippet 2 Snippets’ LCS
pushl %eax exch %eax, %ebx
movl %ebx, %eax pushl %ebx
exch %eax, %ebx movl %eax, %ebx exch %eax, %ebx
popl %eax popl %eax popl %eax
Table 3.14: Two similar snippets and their LCS.
So we can conclude that implementing an LCS algorithm for n instances
is possible, but the result of the LCS algorithm will not be suitable for
attacking the diverse software.
Chapter 4
Experimental results
4.1 Diverse snippet generation by genetic computing algorithm
4.1.1 Predefined initial population vs. random initial population
Figure 4.1 shows an experiment with the genetic computing algorithm using
a random initial population of one hundred snippets, fifty generations, a
thirty percent crossover rate, and a thirty percent mutation rate. The randomly
generated snippets have a fixed length of three. We select this length for the
initial population so that the insertion crossover operator does not yield
overly long harmless snippets. Figure 4.2 shows an experiment under the same
conditions, but with a predefined initial population of twenty snippets. Each of
these snippets is harmless and manually created with a random length. We can
see from the number of different and harmful snippets that the behaviour of
the GA differs between the predefined and the random initial population.
Even though most of the randomly generated snippets in the initial population
are harmful, after several generations the genetic computing algorithm
significantly decreases the number of harmful snippets. Also, based on the
curves of the best and average fitness values, the GA is improving
the snippets step by step.
Figure 4.1: Random initial population
Figure 4.2: Predefined initial population
A difference between the randomly generated and the predefined initial
population is that, for the latter, the curves of different snippets and of
harmful snippets fluctuate more. This is reasonable, because the structure
and combination of instructions in the predefined initial population are more
complex than in the short, randomly generated initial population.
4.1.2 Tuning the parameters of the genetic computing algorithm
We tested the influence of the genetic operators based on different combi-
nations of operator rates. All results were obtained under the same conditions:
fifty generations, a random initial population of one hundred snippets with
fixed length four, and the same fitness function. All experiments yielded
reasonable harmless snippets.
Figure 4.3: Crossover rate 0.05%, mutation rate 0.5%
Comparing the results of Figure 4.3 and Figure 4.4, it is surpris-
ing to find that the run with a low crossover rate and a high rate of
mutations has the better average fitness; the average fitness even converges
to the best fitness value. Judging from how quickly the number of
harmful snippets decreases, the crossover operators indeed have more influence
on the health of the snippets. This suggests that our design of the random
shift mutation is not effective enough.
Figure 4.4: Crossover rate 0.5%, mutation rate 0.05%
4.1.3 Conclusion and weakness
Using a strategic combination of early stopping (when a threshold of dis-
tinctness in the population is reached) and careful adjustment of the fitness
function and genetic operators, the genetic computing algorithm is shown to
be suitable for creating diverse snippets.
As mentioned in Section 2.3.8, a genetic algorithm, as an optimising search
method, has the tendency to converge to one best result. According to
our experiments, the difference in the population drops significantly in the
first five generations. By tuning the parameters and fitness function of the
genetic algorithm, we can slow down this process and even bring more diversity
to the next generations. But by the nature of convergence, this diversity orig-
inates from a few individuals and eventually leads to limited snippet diversity.
For instance:
Snippet01          Snippet02          Snippet03
pushl %ebx         nop                nop
exch %eax, %ebx    pushl %ebx         pushl %ebx
pushl %ebx         exch %eax, %ebx    exch %eax, %ebx
movl %eax, %ecx    pushl %ebx         pushl %ebx
nop                exch %eax, %ebx    exch %eax, %ebx
popl %eax          movl %eax, %ecx    movl %eax, %ecx
pushl %ebx         movl %eax, %ecx    movl %eax, %ecx
movl %eax, %ecx    movl %eax, %ecx    movl %eax, %ecx
nop                popl %eax          movl %eax, %ecx
exch %eax, %ebx    popl %ebx          nop
movl %eax, %ecx                       popl %eax
nop                                   popl %ebx
popl %eax
popl %ebx
Table 4.1: Generated snippets.
Table 4.1 shows three snippets that were automatically generated by the genetic computing algorithm after fifty generations. These snippets are clearly similar and may descend from the same individual.
We implemented only a few instructions because of the complexity of the chain effects of assembly instructions. It is reasonable to hypothesize that introducing more complex instructions would decrease the rate of successfully generating harmless snippets but would increase the diversity of the snippets.
4.2 Attacking diversified software containing snippets
4.2.1 Best Matching Node Search experimental results
The following experiments were performed to evaluate the effectiveness of protecting software through diversification by inserting harmless code snippets. It is important to know whether the simple harmless snippets that we modelled can actually make a difference in protecting software.
For the experiments we used a number of different tools.
1. A genetic computing algorithm utility, used for generating diverse harmless snippets.
2. A snippet insertion utility, which takes the generated snippets and inserts them randomly into an existing assembly file. This utility also transforms the assembly labels into line numbers, as a compiler would transform labels into addresses.
3. A custom diffing utility, used for the automated generation of heuristics for each node of the assembly file.
4. A search utility, which performs the BMNS algorithm of Section 3.3.1.
All four utilities were implemented; their source code can be found in the Appendix.
The goal of our experiments was to locate the same conditional jump instructions in a diversified program population with the use of the BMNS algorithm.
We used four different snippet libraries of ten snippets each, generated by the genetic computing algorithm utility. We also took four different source codes and compiled them without assembling to obtain their assembly. Using the snippet insertion utility we generated ten different instances of each of the assemblies with each of the four libraries. This gave us sixteen different sets of diversified populations, each containing ten differently diversified programs. The snippets were inserted randomly into the assemblies. The number of snippets was set between 100 and 150; exactly how many snippets were inserted into each assembly was random.
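The random insertion step can be sketched as follows. This is a minimal illustration of the snippet insertion utility described above; the function and parameter names are our own, and the label-to-line-number translation is omitted.

```python
import random

def insert_snippets(asm_lines, snippet_library, low=100, high=150):
    """Insert a random number of snippets (between low and high),
    each drawn from the library, at random positions in an assembly listing."""
    diversified = list(asm_lines)
    for _ in range(random.randint(low, high)):
        snippet = random.choice(snippet_library)
        position = random.randrange(len(diversified) + 1)
        diversified[position:position] = snippet
    return diversified
```

Each call produces a differently diversified instance of the same assembly file, which is how the sixteen sets of ten instances were generated.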
The four generated libraries all contained “Type A” snippets (Section 2.2.5). The snippets differed in size, in the instructions used and in the order of the instructions.
Assembly File   File 1    File 2    File 3    File 4    Snippet Results
Snippet         9.00      10.00¹    10.00     9.00
Library 1       6.67¹,²   8.66¹     5.00²     10.00¹
                10.00     10.00     10.00     8.00
                10.00     3.25¹,²   10.00     10.00¹    8.72
Snippet         10.00     10.00¹    10.00     10.00¹
Library 2       10.00     6.50²     5.00²     5.00¹,²
                10.00     10.00     10.00     2.00²
                7.00      10.00¹    10.00¹    8.75¹     8.39
Snippet         5.00¹,²   10.00     10.00¹    10.00
Library 3       10.00¹    10.00     5.00²     9.00
                9.00¹     10.00     10.00¹    5.00¹,²
                9.00      10.00¹    10.00¹    9.50¹     8.84
Snippet         10.00     10.00¹    10.00     10.00¹
Library 4       5.50¹,²   5.00²     10.00     10.00
                10.00¹    10.00¹    4.00¹,²   10.00
                10.00     10.00¹    10.00     10.00¹    9.03
File Results    8.82      8.96      8.69      8.52      Total = 8.75
All the results are out of ten. ¹ Manually modified heuristics. ² Identical nodes exist.
Table 4.2: The experimental results with “Type A” snippets.
For each of these sixteen sets we randomly selected four different nodes, which we considered to contain a critical instruction. We then used the heuristics automatically generated by the diffing utility to try to locate each node in each diverse assembly. If the automated heuristics located the node at least nine out of ten times, we kept that result; otherwise we tried to improve the results by modifying the heuristics manually. The results of how often we located a node using the BMNS algorithm can be seen in Table 4.2.
The automated heuristics were generated by looking at the node from only one calling block. Some nodes may have multiple calling blocks, and for those nodes any of the calling blocks could be used to generate the heuristics. Usually, before modifying the automated heuristics, we first tried the automated heuristics from the rest of the calling blocks.
Whenever the automated heuristics were not sufficient to locate the correct node, we modified them manually. The modified heuristics are marked in the table with ¹.
The results show how many times the BMNS algorithm found the correct node out of the ten diverse instances. When a result is a decimal number, it means that the BMNS algorithm found other duplicate nodes in addition to the correct node. This behaviour is observed for the following reasons:
1. The heuristics are not adequate.
2. The node looks very similar to, or even is identical with, other node(s) in the program.
3. The snippet insertion destroys the “fingerprint” of the node.
For the results in Table 4.2 the heuristics used are the best possible; the results are close to optimal. We have marked with ² the results that come from a node that has identical or very similar other nodes. In those cases we usually find the critical node in all instances, but we also get other nodes with the same heuristic values. When there exists one node identical to the node we search for, the BMNS algorithm finds both nodes with the same heuristic value, and that gives us a score of five out of ten. When a node is called from a different previous block we consider it a different node; because of that we sometimes find the wanted node more than once, which increases the success results.
Tables 4.3 and 4.4 show how many times we increased the success results of the BMNS algorithm by modifying the automated heuristics, and how many identical nodes we encountered. Table 4.3 groups these results by assembly file and Table 4.4 by snippet library.
Assembly File File 1 File 2 File 3 File 4
# of identical nodes 3 3 4 3
# of manually modified heuristics 6 9 5 9
All the results are out of sixteen.
Table 4.3: The assembly file results.
Snippet Library Library 1 Library 2 Library 3 Library 4
# of identical nodes 3 4 3 3
# of manually modified heuristics 6 6 9 8
All the results are out of sixteen.
Table 4.4: The snippet libraries results.
Figure 4.5: Average results by assembly file.

Figure 4.5 shows the average success results of the BMNS algorithm depending on the assembly file used. The magenta coloured line is the total average success of the algorithm. We notice that assembly file four has the worst results. The node used for the seventh experiment on that file had another four identical nodes; because of that, its success is only two out of ten. That bad result affected the assembly file's average significantly.
Figure 4.6 displays the average success results depending on the snippet library used. Again, the magenta coloured line represents the total average success of the algorithm. The results show that snippet library two has the smallest success. This result, too, is strongly affected by the identical nodes at the seventh experiment of assembly file four. Looking at Table 4.4, we notice that snippet library two had one more identical node than the rest of the snippet libraries.
Figure 4.6: Average results by snippet library.

In Figure 4.7, the magenta coloured line represents the total average success, and the yellow coloured line represents the average success of the individual file. We notice that around four of the sixteen nodes of each file have a success value below the average. The results are similar for the snippet libraries (Figure 4.8).
Overall, the success results are affected more by the number of identical nodes than by the actual characteristics of the “Type A” snippets (Section 2.2.5). The assembly files have on average three identical nodes in the sample of sixteen random nodes. The remaining nodes could be located with a very good success rate.
A last observation about the above experiments: more than 80 percent of the time, the snippets were inserted within at least one of the three blocks of the critical node.
Figure 4.7: Results of the assembly files.

Figure 4.8: Results of the snippet libraries.

The results in Table 4.5 were obtained by performing the experiments with a snippet library that uses “Type B” snippets (Section 2.2.5). This library changes the control flow graph of the program, which creates some extra difficulties for the attack. First, it adds “fake” nodes, increasing the size of the tree significantly. Also, if an added snippet is inserted near the critical node, the information of the heuristic block there is mangled. We can see in Table 4.5 that the success of the BMNS algorithm decreased for some nodes, while other nodes were less affected. We elaborate more on the influence of snippets in Section 4.2.2.
“Type B” snippet library
File 1    File 4
9         4.5¹
9¹        9¹
9         6.5¹
9         6.67      Total = 7.83
All the results are out of ten. ¹ Manually modified heuristics.
Table 4.5: The experimental results with “Type B” snippets.
Finally, we experimented with a snippet library that contains “Type C” snippets (Section 2.2.5). This means that the snippets imitate the instructions surrounding the critical node. The problem with these snippets is that they are not harmless, and it is not known whether they can be made harmless by a genetic computing algorithm while still keeping their similarity to the critical node instructions.
This experiment was performed on a specific node that had characterizing blocks (Section 3.4) and performed well with the previous snippet libraries. The node was successfully located 5.66 times out of 10 with heuristics selected by looking at only one diverse instance, and 8.33 times out of 10 with heuristics taken after comparing the node across five diverse instances. As can be seen from Table 4.6, the results do not improve if we compare more than five diverse instances. We also note that comparing only two instances produces better information than comparing four instances. This could be the result of comparing against a bad instance containing many snippets.
# instances compared    1      2      4      5      7      10
Result                  5.66   8      7      8.33   8.33   8.33
Table 4.6: The experimental results with “Type C” snippets.
4.2.2 Weaknesses
The BMNS algorithm performed very well with “Type A” snippets 2.2.5.
But when the insertion of “Type B” snippets 2.2.5 was tested we observed
a decrease to the performance of the algorithm. We can hypothesise that a
node is affected more from inserting “Type B” snippets 2.2.5:
Hypothesis 4.1. When a node has only one characterizing block 3.4.
Hypothesis 4.2. When the node’s After Branch block is part of one of the
other block.
Hypothesis 4.3. If all three blocks get mangled from the insertion of the
snippets.
The first hypothesis says that when we locate the node because one specific block is unique, and the other two blocks do not actually influence the result of the BMNS algorithm, then the probability that a snippet destroys the node's fingerprint is high.
The second hypothesis covers the case where the After Branch block is the same as one of the other blocks. That means that mangling one of the two blocks by inserting a snippet also mangles the other block.
The third hypothesis is a rare case for random snippet insertion. But in general, if the snippets mangle all three blocks, by being inserted into each block, then that node no longer has recognisable instructions in its surrounding blocks.
To better overcome the difficulties created by snippets with conditional jumps, the BMNS algorithm could prune the search tree by assuming that it looks for a specific conditional jump as the critical instruction. This would prune many of the nodes in the tree, leaving the remaining conditional jump instructions available for heuristic calculation.
Another modification that could be implemented in the BMNS algorithm is to keep a short history of the previously visited nodes; using that history, the algorithm can calculate heuristics by combining blocks. This modification would protect against the destruction of the blocks and would improve the results when the inserted snippets use conditional jump instructions.
We saw in the experiments that inserting snippets that are similar to the critical node and use the same conditional jump as the node affects the result of the algorithm significantly (Table 4.6). The reasons are the same as for inserting snippets with conditional jump instructions, as explained above, but this specific case introduces an extra difficulty.
Hypothesis 4.4. Snippets that imitate the critical instruction blocks affect the BMNS algorithm more.
Adding “Type C” snippets (Section 2.2.5) can make other nodes more similar to the heuristics than the original node is after it has been affected by snippets. Even though our experiment showed that the node is located successfully some of the time, it is very probable that with a better insertion function, one that does not insert the snippets randomly, the BMNS algorithm will have difficulty locating the correct node among the best nodes.
Finally, if a diversification model like the one presented in Section 5.2.2 becomes functional, the BMNS algorithm will fail to locate the critical instruction. Such a diversification model could only be attacked by a program that analyses the execution and understands the use of the instructions. An algorithm like that would actually crack every instance of any program protected by a method the algorithm knows.
The fact that the BMNS algorithm sometimes locates duplicate nodes, which exist in the original source code, is not an actual weakness of the algorithm. Even if the algorithm finds a large number of similar nodes, as long as it locates the critical instruction within those nodes, the crack could patch all of the nodes, producing multiple cracked programs. Some of those programs would be negatively affected by the patch, but a simple execution of each of them will reveal which is the correctly cracked program.
4.2.3 Conclusions
After experimenting with diverse program instances, we concluded that targeted diversification can improve software protection under certain conditions. The BMNS algorithm presented in Section 3.3.1 shows that it can locate the critical instructions in a diverse population most of the time.
The experiments clearly showed that snippets that do not alter the CFG (Section 3.2.2) of the program do not hinder the automation of cracking at all. Altering the CFG complicates things for the attackers, but not enough. As long as the original semantics remain unchanged inside the program, the attackers can find ways to locate them. Diversification should attempt to alter real parts of the source code, changing not only the CFG but also the content of the binary code.
Looking at software defence from another perspective, one could argue that a patch that partly disassembles the program and generates a CFG-like tree to locate the position to be patched is an expensive attack. But this depends much on the protected software. Small, cheap utilities distributed through the internet could benefit from such protection, while attackers will certainly use this more advanced patching system against expensive specialized applications. There also exist crackers (white hats [18]) who attack software not for profit, but to demonstrate that all systems can be cracked and to find vulnerabilities in security systems.
Chapter 5
Conclusions
5.1 General Conclusions
Our work illustrates that the arms race between the “Defenders” and the “Attackers” is a never-ending game of evolving techniques to either protect or attack. Reality shows that no matter what protection systems are developed, after a short period someone will develop an attack method to defeat the new protection measures.
Because genetic algorithms are mostly used as optimization algorithms, their characteristic convergence will eventually destroy the diversity of the snippet population. Using a strategic combination of early stopping (when a threshold of distinctness in the population is reached) with careful adjustment of the fitness function and genetic operators, the genetic computing algorithm is shown to be suitable for creating diverse snippets. Although only a limited number of instructions is implemented in our CPU model, due to the complications of the symbolic representation, reasonably diverse and harmless snippets were produced from an initial population of basic snippets. Combining these results with the results of the attack chapter, one important point is clear: snippets generated without considering instruction homogeneity with the critical node offer only minimal resistance against the BMNS algorithm.
The BMNS algorithm is a new attack system that specifically targets protection systems that use diversification to hide the critical instruction of software. It is capable of searching in a simplified CFG and uses a “fingerprint” to locate the critical instruction. The empirical results in Section 4.2.1 show that diversification can be attacked by using this advanced search method. In the previous year's thesis [20], it was claimed that diversification protects against search algorithms. This is true only as long as the search algorithm merely scans the diversified file for specific bytes.
Based on our empirical experience, we can say that snippets which do not modify the CFG of the program perform poorly. To make diversification a realistic protection method, the snippets have to modify the CFG of the program. Even then, the BMNS algorithm showed resilience and managed to perform well enough.
We also experimented with snippets that both imitate the critical instruction and modify the CFG. The experiments we performed with “Type C” snippets revealed the necessity of a targeted snippet insertion function instead of the simple solution of a random insertion function.
Diversification is already evolving towards methods that modify and obfuscate the CFG, so the naïve BMNS algorithm will lose the information required to identify the critical node [19]. But analysis tools are also improving to uncover the effects of these methods [6].
5.2 Further work
5.2.1 For BMNS algorithm
Implementing and testing the BMNS algorithm suggested possible improvements and additions. Some of the additions might only improve the results against the current diversification methods, but there is one interesting improvement that would evolve the BMNS algorithm and threaten further, more advanced diversification systems.
Namely, the BMNS algorithm should be able to keep a history of visited nodes. This way it could merge basic blocks that were separated by the insertion of snippets that extend the CFG by inserting opaque control flow transfers.
This modification would extend the BMNS algorithm to overcome snippets that use conditional jumps to change the CFG. Even targeted snippets inserted directly to protect the critical node would be bypassed by this addition.
5.2.2 A new diversification model
Diversification is a promising approach to protect software. We believe that with an appropriate diversification method, the automation of cracking could be thwarted to a great extent.
We want to propose a diversification system that actually changes the structure of the software. This can be accomplished at the compilation level. To develop software one can use different high-level programming languages and different compilers, resulting in a diverse population of program instances that all have exactly the same input/output behaviour. The system we propose is based on a single high-level source code that can be compiled in many different ways, generating functionally equivalent but structurally very different binaries.
Figure 5.1: Diversified compiler model.
Figure 5.1 presents a schematic of the proposed diversifying compiler model. Each time the compiler needs to compile a high-level instruction, it selects from a pool of different assembly implementations, either randomly or with more advanced methods. This comes down to compiling one high-level program into many different binaries.
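The selection step of this model can be sketched as a toy "compiler" that maps each high-level operation to a pool of semantically equivalent assembly sequences and picks one at random per compilation. All names and the pool contents are hypothetical illustrations, not part of an actual implementation.

```python
import random

# Each high-level operation maps to semantically equivalent
# assembly implementations (a tiny hypothetical pool).
IMPLEMENTATIONS = {
    "copy eax->ebx": [
        ["movl %eax, %ebx"],
        ["pushl %eax", "popl %ebx"],
        ["xorl %ebx, %ebx", "addl %eax, %ebx"],
    ],
}

def diversify(program):
    """Compile a list of high-level operations, choosing a random
    equivalent implementation for each one."""
    binary = []
    for op in program:
        binary.extend(random.choice(IMPLEMENTATIONS[op]))
    return binary
```

Repeated compilations of the same program then yield structurally different but functionally equivalent instruction sequences.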
A diversified population generated by a compiler like this will most likely not have many common patterns for an automated crack to locate and patch. This diverse population cannot be achieved with our current state-of-the-art snippets, because here the diversification works on parts of the software itself.
Some easy strategies to implement in a compiler that generates a diverse population could be: randomly ordering the subroutines, changing the registers used, randomly locating the program data, using inverted conditional jumps, etc. It is interesting to note the difficulty of modifying the control flow graph: none of the mentioned strategies actually modifies its structure. The strategies should aim to destroy as much commonality as possible.
Attacking such a diversified population would require either a search algorithm that understands the meaning of the instructions and automatically performs analysis and tampering, thus actually automating the crack procedure and resembling a universal cracker, or a very effective decompiler that recovers source code close to the original high-level source code, so that a high-level semantic analysis can recognize which instruction is the critical one.
5.2.3 Homogeneity of instructions in a snippet
At this moment, the generation of diverse snippets by our genetic computing algorithm has little relation to the original code. The initial fitness value of each instruction could be based on the statistical properties of the original code, but the information about the structure and combination of instructions is not yet taken into account. Considering the attack results in the worst scenario (Section 4.2.1), only snippets that closely resemble the Before, After and After Branch blocks of the critical node result in a significant increase in the cracking cost. Hence, to maximize the efficiency of snippets in obstructing crackers, the homogeneity of instructions in a snippet must strongly influence an individual's fitness.
We introduce a strategy based on this weakness of the BMNS algorithm.
Figure 5.2: Homogeneity encouragement strategy.

Figure 5.2 shows part of the assembly instructions from the critical node used by the BMNS algorithm. We defined an encouragement strategy according to the different levels of homogeneity. In this scheme, if the auto-generated snippet has the same structure as the original assembly code, a large bonus (×6) is assigned to the fitness value. If only part of the snippet satisfies these requirements, a smaller bonus is assigned. In this way, more information from the original code contributes to the fitness value. Through the evolution of the genetic algorithm, the population will then tend to converge to homogeneous snippets.
We could measure the similarity of the snippets to the structure of the original assembly by using an LCS algorithm [12].
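As a sketch, the homogeneity score could be the LCS length between the snippet's instruction sequence and the critical node's, normalised by the length of the original block. The LCS computation is the standard dynamic programming algorithm; the normalisation and all names are our own assumptions.

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest common subsequence length."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def homogeneity(snippet_ops, original_ops):
    # similarity in [0, 1]; could scale the fitness bonus of Figure 5.2
    return lcs_length(snippet_ops, original_ops) / max(len(original_ops), 1)
```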
Appendix A
Appendix
A.1 Assembly instructions in diversified software
The assembly instructions have a specific format. This section explains that format and shows that only a small part of an assembly instruction is modified when we use diversification. Because of that, we can use heuristics to locate a specific instruction in the assembly.
Figure A.1 shows the most general format of an assembly instruction. Every assembly instruction contains a subset of the shown sections and always contains the “Opcode” part.
Figure A.1: Assembly instruction format.
First we must note that an assembly instruction is a series of bytes, which we represent in hexadecimal format. There also exists a symbolic representation that is easier for humans to read and understand.
Each part of an assembly instruction has a specific size range in bytes. The first part, the “Instruction Prefixes”, is not commonly used any more. When present, it is one to four bytes long, and diversification does not modify it.
The second part of an assembly instruction is the most important: the “Opcode”. It specifies which instruction is going to be executed and varies in size from one to three bytes. Sometimes three additional opcode bits are encoded in the next part of the instruction. This part is not changed by diversification, and it is the part we will mostly use in heuristics to locate the critical instruction.
The third part of an assembly instruction is the “ModR/M” byte. It defines which register the instruction uses or which type of memory addressing the instruction will use. Some encodings also need a second byte, called “SIB”. This fourth part is connected with the “ModR/M” part and only appears if the “ModR/M” part requires it. These two parts are an extension of the “Opcode” and should also be used for the heuristics. There is a possibility that diversification changes part of this code by changing the distance an instruction jumps, which would force a change from a short jump to a long jump. Because we do not look at the jump instructions but only follow them, this specific modification does not affect our method.
The fifth part is the “Displacement”. It tells the instruction which memory location to access and is zero, one, two or four bytes long, depending on how far away the necessary data lies. This is the main part of the instruction that is changed by diversification; the remaining parts stay the same.
The last part of an assembly instruction is the “Immediate”, which is similar to the “Displacement” part. It can have a length of zero, one, two or four bytes and holds either data or a constant address from the source code. If it contains an address, it will be modified by the diversification.
For both the “Displacement” and the “Immediate” part, the size is defined in the “ModR/M” part of the instruction, so reading that part tells us how to skip this information.
Let us illustrate with an example how the instruction mov eax, ebx (Intel representation) looks in its hexadecimal format. The eax and ebx registers are 32-bit registers, which tells us that the mov instruction must be the one for 32-bit registers. Looking at the manual we find that the opcode of the mov instruction for 32-bit registers has the hexadecimal value 89. Next we need to determine the “ModR/M” part of the instruction for moving from the ebx register to the eax register. Again looking at the manual we find that the “ModR/M” byte has the hexadecimal value D8. Finally, the instruction mov eax, ebx has the value 89 D8 in hexadecimal.
Now let us look at a more complex example that contains an address, for instance the instruction mov [label], eax. This instruction tells the processor to store the contents of the eax register at the memory location [label]. It is again a 32-bit mov instruction and thus has the same opcode 89, but the “ModR/M” byte changes. Looking at the manual we find that the hexadecimal value of the “ModR/M” byte is now 05. This information is not enough for the instruction to be executed: we also need a displacement to the address [label]. So the last part of the instruction is a 32-bit displacement address whose value will change in each instance of the diverse population. Finally we have the instruction 89 05 xx xx xx xx, where xx xx xx xx contains the displacement address in hexadecimal form. Still, we can see that a part of the instruction does not change and might still be located with a search for the appropriate opcode.
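The two encodings above can be reproduced with a small helper that packs the ModR/M byte. This is a sketch for this example only; a real encoder must also handle prefixes, SIB bytes and the other addressing modes, and the zero displacement below is just a placeholder for the bytes diversification changes.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod, reg, rm):
    """Pack the 2-bit mod, 3-bit reg and 3-bit r/m fields into one byte."""
    return (mod << 6) | (reg << 3) | rm

def mov_reg_reg(dst, src):
    # opcode 89 /r: MOV r/m32, r32 with mod = 11 (register direct)
    return bytes([0x89, modrm(0b11, REG32[src], REG32[dst])])

def mov_mem_reg(src):
    # opcode 89 /r with mod = 00, r/m = 101: MOV [disp32], r32;
    # the four displacement bytes are the part diversification changes
    return bytes([0x89, modrm(0b00, REG32[src], 0b101)]) + b"\x00\x00\x00\x00"
```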
For further reading on assembly instructions, see [5].
A.2 Shortest edit script
The Shortest Edit Script (SES) is the smallest script that transforms one sequence into another. Automatically generating the SES is equivalent to generating the LCS. When we refer to diffing, we mean generating the SES.
The algorithm presented in Table A.1 generates the LCS/SES in polynomial time for two sequences (a, b). The time complexity of the algorithm is O(ND), where N is the sum of the lengths of the two sequences and D is the length of the produced SES.
CONSTANT MAX = M + N
INTEGER V[-MAX..MAX]
V[1] = 0
FOR D = 0 TO MAX
  FOR k = -D TO D STEP 2
    IF k = -D OR (k ≠ D AND V[k - 1] < V[k + 1]) THEN
      x = V[k + 1]
    ELSE
      x = V[k - 1] + 1
    END IF
    y = x - k
    WHILE x < N AND y < M AND a[x + 1] = b[y + 1]
      x = x + 1
      y = y + 1
    END WHILE
    V[k] = x
    IF x ≥ N AND y ≥ M THEN
      Length of an SES is D
      STOP
    END IF
  END FOR
END FOR
Length of an SES is greater than MAX
Table A.1: The greedy LCS/SES algorithm [21].
An example of an LCS and the SES of two sequences can be found in Table A.2. In the SES, the symbol D means delete, so 1D, 2D means that we must delete the characters at positions 1 and 2; the symbol I stands for insert and means that we must insert a symbol after the given position, so 3IB means insert B after character 3. All the modifications are considered to take place simultaneously, so the positions mentioned refer to the initial sequence. The SES modifies set A and produces set B.
Sets A = ABCABBA and B = CBABAC
Longest Common Subsequences = CABA, BABA, CBBA...
Shortest Edit Script = 1D, 2D, 3IB, 6D, 7IC
Table A.2: An example of SES.
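The pseudocode of Table A.1 translates almost directly into a runnable function returning the SES length D. This sketch uses 0-based indexing and a dictionary for V; it computes only the length, not the script itself.

```python
def ses_length(a, b):
    """Length of the shortest edit script between sequences a and b,
    following the greedy O(ND) algorithm of Table A.1."""
    n, m = len(a), len(b)
    v = {1: 0}  # V[k]: furthest x reached on diagonal k
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]      # move down (insertion)
            else:
                x = v[k - 1] + 1  # move right (deletion)
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x += 1            # follow the diagonal (matching symbols)
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m
```

For the sequences of Table A.2, the function returns 5, matching the five operations of the listed edit script.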
For further reading on the Shortest Edit Script see [11], and [21].
A.3 Best Matching Node Search algorithm in pseudocode
GLOBAL HeuBefore, HeuAfter, HeuAfterBranch, Tree
READ HeuBefore, HeuAfter, HeuAfterBranch, Tree
CurAddress = 0, StartAddress = 0
NextInstruction = NULL, Node = NULL
Before = empty, After = empty, AfterBranch = empty
Q = empty, ExpandedQ = empty
WHILE Tree[CurAddress] IS NOT NULL
NextInstruction = Tree[CurAddress]
IF NextInstruction = “jmp <address>” THEN
CurAddress = $address
ELSE IF NextInstruction = conditional jump instruction THEN
Add Node(CurAddress:StartAddress) to Q
EXIT WHILE
ELSE IF NextInstruction = any other instruction THEN
Increase CurAddress
END IF
86
END WHILE
WHILE Q IS NOT empty
Take first Node from Q
StartAddress = Node.StartAddress
CurAddress = Node.StartAddress
WHILE Tree[CurAddress] IS NOT NULL
NextInstruction = Tree[CurAddress]
IF NextInstruction = Jmp $address THEN
CurAddress = $address
ELSE IF NextInstruction = conditional jump instruction THEN
After = ExpandNode(CurAddress + 1)
AfterBranch = ExpandNode(Node.JumpAddress)
Node.Heuristics = CalculateHeu(Before, HeuBefore)
Node.Heuristics += CalculateHeu(After, HeuAfter)
Node.Heuristics += CalculateHeu(AfterBranch, HeuAfterBranch)
IF Node.Heuristics > BestNode.Heuristics THEN
BestNode = Node
END IF
Add Node to ExpandedQ
Remove Node from Q
EXIT WHILE
ELSE IF NextInstruction = any other instruction THEN
IF Before IS full THEN
Shift all instructions one place higher
Before[last] = NULL
END IF
Add NextInstruction to Before
Increase CurAddress
END IF
END WHILE
END WHILE
Table A.3: The BMNS algorithm in pseudocode.
FUNCTION ExpandNode(ExpAddress) RETURNS Block
Block = empty
StartAddress = ExpAddress
WHILE Tree[ExpAddress] IS NOT NULL
NextInstruction = Tree[ExpAddress]
IF NextInstruction = “jmp $address” THEN
ExpAddress = $address
ELSE IF NextInstruction = conditional jump instruction THEN
IF NOT EXIST Node IN Q THEN
Add Node(ExpAddress:StartAddress) to Q
END IF
EXIT WHILE
ELSE IF NextInstruction = any other instruction THEN
IF Block IS NOT full THEN
Add NextInstruction to Block
END IF
Increase ExpAddress
END IF
END WHILE
RETURN Block
Table A.4: The expand node function in pseudocode.
FUNCTION CalculateHeu(Block, HeuBlock) RETURNS H
H = 0, I = 0, J = 0
WHILE Block[I] IS NOT NULL
IF Block[I] = HeuBlock[J] THEN
H = H + 1
J = J + 1
END IF
I = I + 1
END WHILE
RETURN H
Table A.5: The heuristic function in pseudocode.
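The single greedy pass of Table A.5 translates almost directly into Python. This is a sketch: names follow the pseudocode, the NULL sentinel becomes ordinary iteration, and a bounds check stops J from running past the end of HeuBlock.

```python
def calculate_heu(block, heu_block):
    """Count in-order matches between an instruction block and a heuristic
    block, advancing through heu_block greedily on each match."""
    h = j = 0
    for instruction in block:
        if j < len(heu_block) and instruction == heu_block[j]:
            h += 1   # one more instruction of the heuristic block matched
            j += 1   # only advance in heu_block on a match
    return h

# Two of the three heuristic instructions appear in order in the block:
print(calculate_heu(["mov eax, 1", "add eax, 2", "jmp L1"],
                    ["mov eax, 1", "jmp L1"]))  # 2
```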
List of Tables
1.1 Merckx’s table of countermeasures . . . . . . . . . . . . . . . . . . 3
2.1 The CPU simulator pseudocode. . . . . . . . . . . . . . . . . . . . 9
2.4 Genetic computing algorithm pseudocode. . . . . . . . . . . . . . . 16
2.6 Insertion algorithm pseudocode. . . . . . . . . . . . . . . . . . . . . 26
3.1 Graphs definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 A CFG example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 The DFS algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 The BFS algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5 The example program. . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.6 The BMNS algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.7 Example of calculating heuristic value of a node. . . . . . . . . . . 49
3.8 The sort three example program with and without parameters. . . 51
3.9 A recursive LCS algorithm [1]. . . . . . . . . . . . . . . . . . . . . 56
3.10 A dynamic LCS algorithm [1]. . . . . . . . . . . . . . . . . . . . . . 57
3.11 Calculating the length array L. . . . . . . . . . . . . . . . . . . . . 58
3.12 An LCS example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.13 A length table generated by the dynamic LCS algorithm. . . . . . 59
3.14 Two similar snippets and their LCS. . . . . . . . . . . . . . . . . . 60
4.1 Generated snippets. . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 The experimental results with “Type A” snippets. . . . . . . . . . 67
4.3 The assembly file results. . . . . . . . . . . . . . . . . . . . . . . . 69
4.4 The snippet libraries results. . . . . . . . . . . . . . . . . . . . . . 69
4.5 The experimental results with “Type B” snippets. . . . . . . . . . 73
4.6 The experimental results with “Type C” snippets. . . . . . . . . . 74
A.1 The greedy LCS/SES algorithm [21]. . . . . . . . . . . . . . . . . . 85
A.2 An example of SES. . . . . . . . . . . . . . . . . . . . . . . . . . . 86
A.3 The BMNS algorithm in pseudocode. . . . . . . . . . . . . . . . . . 87
A.4 The expand node function in pseudocode. . . . . . . . . . . . . . . 88
A.5 The heuristic function in pseudocode. . . . . . . . . . . . . . . . . 88
List of Figures
1.1 Cracked software distribution. . . . . . . . . . . . . . . . . . . . . . 4
2.1 The CPU simulator. . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Class of snippet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 An instance of LabelMap. . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Class of register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 Roulette-wheel selection . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6 Insertion crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.7 “Cut and splice” crossover . . . . . . . . . . . . . . . . . . . . . . . 24
2.8 Randomly shift mutation . . . . . . . . . . . . . . . . . . . . . . . 25
2.9 JZ conditional jump . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.10 Sketch of a fitness landscape. The arrows indicate the preferred flow
of a population on the landscape, and the points A, B, and C are
local optima. The red ball indicates a population. [9] . . . . . . . . 28
3.1 The CFG of the example at Table 3.2. . . . . . . . . . . . . . . . . 34
3.2 Searching a graph example. . . . . . . . . . . . . . . . . . . . . . . 36
3.3 The three node blocks. . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 The represented search tree. . . . . . . . . . . . . . . . . . . . . . . 42
3.5 The represented tree with node representation. . . . . . . . . . . . 43
3.6 A node and its surrounding instructions. . . . . . . . . . . . . . . . 48
3.7 The represented search tree. . . . . . . . . . . . . . . . . . . . . . . 52
4.1 Random initial population . . . . . . . . . . . . . . . . . . . . . . . 62
4.2 Predefined initial population . . . . . . . . . . . . . . . . . . . . . 62
4.3 Crossover 0.05%, mutation rate 0.5% . . . . . . . . . . . . . . . . 63
4.4 Crossover 0.5%, mutation rate 0.05% . . . . . . . . . . . . . . . . 64
4.5 Average results by assembly file. . . . . . . . . . . . . . . . . . . . 70
4.6 Average results by snippet library. . . . . . . . . . . . . . . . . . . 71
4.7 Results of the assembly files. . . . . . . . . . . . . . . . . . . . . . 72
4.8 Results of the snippet libraries. . . . . . . . . . . . . . . . . . . . . 72
5.1 Diversified compiler model. . . . . . . . . . . . . . . . . . . . . . . 79
5.2 Homogeneity encouragement strategy . . . . . . . . . . . . . . . . 81
A.1 Assembly instruction format. . . . . . . . . . . . . . . . . . . . . . 82
Bibliography
[1] The Algorithmist. Longest common subsequence.
http://www.algorithmist.com/index.php/Longest_Common_Subsequence, 2007.
[2] Frances E. Allen. Control flow analysis. SIGPLAN Not., 5(7):1–19, 1970.
[3] Bertrand Anckaert, Bjorn De Sutter, and Koen De Bosschere. Software piracy
prevention through diversity. In DRM ’04: Proceedings of the 4th ACM work-
shop on Digital rights management, pages 63–71, New York, NY, USA, 2004.
ACM Press.
[4] Timothy Budd. Classic Data Structures in Java. Addison-Wesley, 2001.
[5] Intel Corporation. IA-32 Intel Architecture Software Developer’s Manual, vol-
umes 1–3, 1997–2005.
[6] Mila Dalla Preda, Matias Madou, Koen De Bosschere, and Roberto Gia-
cobazzi. Opaque predicates detection by abstract interpretation. In Proceed-
ings of the 1st International Workshop on Emerging Applications of Abstract
Interpretation (EAAI06), pages 35–50, Vienna, Austria, 2006. ENTCS.
[7] Thomas Dullien. Graph-based comparison of executable objects. In Symposium
sur la Sécurité des Technologies de l’Information et des Communications.
University of Technology in Florida, 2005.
[8] Wikipedia The Free Encyclopedia. Depth-first search, breadth-first search.
http://www.wikipedia.org/, 2007.
[9] Wikipedia The Free Encyclopedia. Genetic algorithm, fitness landscape.
http://www.wikipedia.org/, 2007.
[10] Wikipedia The Free Encyclopedia. Graph (mathematics), graph (data struc-
ture), graph theory, control flow graph. http://www.wikipedia.org/, 2007.
[11] Wikipedia The Free Encyclopedia. Longest common subsequence problem,
diff. http://www.wikipedia.org/, 2007.
[12] H. Fashandi and A.M.E. Moghaddam. A new rotation invariant similarity
measure for trajectories. In Computational Intelligence in Robotics and Au-
tomation, 2005. CIRA 2005. Proceedings. 2005 IEEE International Sympo-
sium, pages 631–634, 2005.
[13] David B. Fogel. Evolutionary Computation: Toward a New Philosophy of
Machine Intelligence. Wiley-IEEE, 2006.
[14] Ronald I. Greenberg. Fast and simple computation of all longest common
subsequences, 2002.
[15] Ronald I. Greenberg. Bounds on the number of longest common subsequences,
2003.
[16] Markus Jakobsson and Michael K. Reiter. Discouraging software piracy using
software aging. In Security and Privacy in Digital Rights Management : ACM
CCS-8 Workshop DRM 2001. Springer Berlin / Heidelberg, 2001.
[17] William B. Langdon and Riccardo Poli. Foundations of Genetic Programming.
Springer, 2002.
[18] A. Main and P.C. van Oorschot. Software protection and application security:
Understanding the battleground, 2004.
[19] Anirban Majumdar and Clark Thomborson. Manufacturing opaque predi-
cates in distributed systems for code obfuscation. In ACSC ’06: Proceedings
of the 29th Australasian Computer Science Conference, pages 187–196, Dar-
linghurst, Australia, Australia, 2006. Australian Computer Society, Inc.
[20] Gert Merckx. Software security through targeted diversification. Master’s
thesis, Katholieke Universiteit Leuven, 2005-2006.
[21] Eugene W. Myers. An O(ND) difference algorithm and its variations. Algo-
rithmica, 1(2):251–266, 1986.
[22] Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. Principles of Program
Analysis. Springer, 2005.
[23] Thomas Obnigene. DVD glossary.
http://www.filmfodder.com/movies/dvd/glossary/glossary.htm, 2007.
[24] National Institute of Standards and Technology. breadth-first search.
http://www.nist.gov/dads/HTML/breadthfirst.html, 2007.
[25] National Institute of Standards and Technology. depth-first search.
http://www.nist.gov/dads/HTML/depthfirst.html, 2007.
[26] Justinian P. Rosca. Analysis of complexity drift in genetic programming.
Genetic Programming 1997: Proceedings of the Second Annual Conference,
pages 286–294, 1997.
[27] Todd Sabin. Comparing binaries with graph isomorphisms.
http://www.bindview.com/Services/Razor/Papers/2004/
comparing binaries.cfm, 2004.
[28] Margaret Sackeyfio. Mathematical modeling of music downloading and online
piracy. Master’s thesis, Baruch College, 2005.
[29] Olin Shivers. Control-Flow Analysis of Higher-Order Languages or Taming
Lambda. PhD thesis, School of Computer Science Carnegie Mellon University
Pittsburgh, 1991.
[30] Jeremy P. Spinrad. Efficient Graph Representations. American Mathematical
Society, 2003.
[31] Jörg Tiedemann. Automatic construction of weighted string similarity mea-
sures. Department of Linguistics, Uppsala University, 1999.
[32] Kent State University. Graph algorithms, depth first search (dfs), breadth
first search (bfs).
http://www.personal.kent.edu/∼rmuhamma/Algorithms/algorithm.html.
[33] Patrick Henry Winston. Artificial Intelligence Third Edition. Addison-Wesley,
1992.
[34] David H. Wolpert and William G. Macready. No free lunch theorems for optimization.
IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.