FACULTY OF ENGINEERING
THESIS SUBMITTED FOR THE PROGRAMME
MASTER OF ARTIFICIAL INTELLIGENCE
ACADEMIC YEAR 2006-2007
Software Security through Targeted Diversification
Mantadelis Theofrastos
Du Xiaodai
Promotor: Prof. Bart Preneel
Daily leaders: Jan Cappaert
Nessim Kisserli
Contents
1 Introduction
  1.1 Software protection
  1.2 Security through diversity
  1.3 Targeted diversification
2 Defending with targeted diversification
  2.1 Introduction
  2.2 CPU simulator
    2.2.1 The necessity of building a CPU model
    2.2.2 Overview of the basic execution environment
    2.2.3 Class design: states of CPU simulator
    2.2.4 The model of a snippet: bridge between CPU simulator and genetic algorithm
    2.2.5 Extension of the CSnippet class
    2.2.6 Modelling a register on a bit level
    2.2.7 Weaknesses of the symbolic framework
    2.2.8 Symbolic analyzer
  2.3 Genetic programming
    2.3.1 Introduction
    2.3.2 Genetic computing algorithm
    2.3.3 Predefined vs. random initial population
    2.3.4 Fitness value and fitness function
    2.3.5 Roulette-wheel selection
    2.3.6 Genetic operator for reproduction
    2.3.7 Discussion of insertion function
    2.3.8 Discussion of convergence
3 Attacking targeted diversification
  3.1 Introduction
  3.2 Program representation and program analysis
    3.2.1 Graph data structures
    3.2.2 Control Flow Graph
    3.2.3 Searching graphs
    3.2.4 Representation of a program as a search tree
  3.3 An attack algorithm
    3.3.1 Best Matching Node Search
    3.3.2 Heuristics
    3.3.3 Heuristics of Best Matching Node Search
    3.3.4 Selection of heuristics and automated generation
  3.4 A comparison algorithm
    3.4.1 Longest Common Subsequence
    3.4.2 Longest Common Subsequence at a Diverse Population
4 Experimental results
  4.1 Diverse snippet generation by genetic computing algorithm
    4.1.1 Predefined initial population vs. random initial population
    4.1.2 Tuning the parameters of genetic computing algorithm
    4.1.3 Conclusion and weakness
  4.2 Attacking diversified software containing snippets
    4.2.1 Best Matching Node Search experimental results
    4.2.2 Weaknesses
    4.2.3 Conclusions
5 Conclusions
  5.1 General Conclusions
  5.2 Further work
    5.2.1 For BMNS algorithm
    5.2.2 A new diversification model
    5.2.3 Homogeneity of instructions in a snippet
A Appendix
  A.1 Assembly instructions in diversified software
  A.2 Shortest edit script
  A.3 Best Matching Node Search algorithm in pseudocode
List of Tables
List of Figures
Bibliography
Abstract
This thesis discusses software security through targeted diversification.
It is a continuation of the previous year's thesis by Merckx [20]. The
first part focuses on defending software with targeted diversification:
we implement a genetic computing algorithm that generates code snippets.
This implementation is presented in Chapter 2. An important requirement
for every software protection method is that it be tested against attack
attempts, so this thesis also focuses on attacking a diversified
population. The second part presents an attack scheme that targets a
diversified software population; this scheme is presented in Chapter 3.
In Chapter 4 we present our experimental results, and in Chapter 5 we
combine the empirical knowledge gained from the two parts of the thesis
and draw conclusions about defensive systems that use diversification.
Keywords: software protection, targeted diversification, cracking, similarity measures, control flow graph analysis.
Credits: A big thank-you to the best daily supervisors Jan Cappaert and Nessim Kisserli.
Chapter 1
Introduction
1.1 Software protection
Software protection started a long “cat-and-mouse” struggle between devel-
opers and crackers. Software protection is a broader term which involved the
copy protection of computer software and the counter measure of software
cracking. Usually, the term copy protection is used interchangeably with
software protection.
Copy protection, also known as copy prevention or copy restriction, is a
system for preventing the unauthorized reproduction of copyrighted media
like movies, music and computer software [23]. Very often software copy
protection is achieved by integrating security code in the application. Though
security code itself can be susceptible to attack, software protection ideally
makes the software itself resistant to attack [20]. Several modern Digital
Rights Management (DRM) techniques and technical protection measures
were discussed by Merckx in the previous version of this thesis [20].
Apart from anti-piracy measures like copy protection, software protec-
tion also includes mechanisms against tampering, reverse engineering and
exploitations. These software protection techniques are commonly applied
to software distributions as a countermeasure against cracking. Table 1.1
lists the current user-level software protection techniques and the phase of
the cracking process that each technique targets.
Countermeasure              Efficient against   Circumvented by
Static tamper resistance    Tampering           Loaders, analysis, (key generators, serials)
Dynamic tamper resistance   Tampering           Loaders, analysis, (key generators, serials)
Anti-debugger code          Dynamic analysis    Debugger plug-ins, patching
Obfuscation                 Analysis            Deobfuscators, analysis
Encryption, packers         Static analysis     Unpackers, dynamic analysis
Table 1.1: Merckx’s table of countermeasures
According to Main and van Oorschot, the process of devising an attack
on an application to defeat the program’s security code typically follows four
stages: "Analysis", "Tampering", "Automation" and "Distribution". In the
preceding work [20], Merckx discussed state-of-the-art countermeasures and
concluded that they all target either the analysis or the tampering phase
of the cracking process. Once an individual has successfully cracked the
application, these techniques provide no protection against the distribution
and widespread applicability of the crack. The effort to break the piracy
chain has obviously failed, since cracks are still widely available, even for
heavily protected applications.
In the following sections, we discuss class breaks and how to interrupt the
piracy chain before the automation and distribution phases.
1.2 Security through diversity
The goal of introducing diversification in software security is to prevent the
situation where breaking one instance leads to breaking all instances.
As shown in [18] and [20], cracking an application is a multi-step procedure,
the steps being "Analysis", "Tampering", "Automation" and finally
"Distribution". It is almost impossible to stop the cracking procedure at the
"Analysis" or "Tampering" step, mainly because of the open architecture of
modern technology.
The diversification of an application aims to stop the cracking procedure at
its "Automation" step. If a cracker succeeds in bypassing the current
protection mechanisms and disabling the security code of his instance of the
application, there is no guarantee that the same
attack would work if the software population has a certain degree of diver-
sity. A classical automated attack can still be devised but its chance of being
applicable to a specific instance is diminished. Figure 1.1 shows a hypothet-
ical curve of the number of distributed cracked instances of an application
during the four stages of the cracking procedure. Preventing the automation
step dramatically reduces the number of cracked software instances. In
security systems, a "class break" occurs when the protection system fails and
the failure affects all instances of that software. We can see that a class
break appears when the crackers manage to automate the cracking procedure and
distribute the result. This hypothetical curve is for illustration purposes
only and should not be viewed as backed by a mathematical model, although
research suggests that the spread of pirated content resembles that of
epidemiological diseases and can be modelled by various economic
equations [28].
Figure 1.1: Cracked software distribution.
Because decisions in software are made by "if-then-else" structures or their
equivalents, a protection mechanism will almost always consist of one or more
such structures. When the program is compiled, these structures are translated
into conditional branch or jump instructions. Ultimately the cracker needs to
locate and tamper with those branch instructions, which is why this attack is
sometimes called "branch jamming".
The most common method of automation is the creation of a small patch: a small
file that automatically reproduces the "Tampering" step. Patching a file is an
easy procedure, requiring only that the target file (to be patched) be very
similar to the source file from which the patch was created. The idea of
diversification is to destroy this similarity between source and target files,
thus forcing the patch to fail.
Typically, patches work by locating one or more specific addresses in the
target file and modifying one or more bytes. More general patching techniques
exist that search for specific byte patterns in the target file and modify
them.
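As a hedged illustration (our own sketch, not taken from the thesis implementation), the two patching styles described above can be written in a few lines; the function names and byte values are hypothetical:

```python
# Minimal sketch of the two patching styles: address-based and search-based.
# The byte values are illustrative x86 opcodes (0x74 = jz, 0xEB = jmp).

def patch_at_address(image: bytearray, offset: int, new_bytes: bytes) -> None:
    """Address-based patch: overwrite bytes at a fixed file offset."""
    image[offset:offset + len(new_bytes)] = new_bytes

def patch_by_search(image: bytearray, pattern: bytes, new_bytes: bytes) -> bool:
    """Search-based patch: find a byte pattern and overwrite it."""
    offset = bytes(image).find(pattern)
    if offset == -1:
        return False          # pattern not found: the patch fails
    image[offset:offset + len(new_bytes)] = new_bytes
    return True

image = bytearray(b"\x90\x90\x74\x05\x90")   # ...74 05: a conditional jump
patch_at_address(image, 2, b"\xeb")          # turn jz into jmp ("branch jamming")
print(image.hex())                            # 9090eb0590
```

A diversified instance defeats the first style by shifting the offset and the second by changing the surrounding bytes so that the pattern no longer matches.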
Diversification forces both patching methods to fail on target instances which
differ from the source used to create the patch. A similar approach, software
ageing, which requires software to be updated at regular intervals, is
presented in [16].
1.3 Targeted diversification
In practice, the goal of diversification is to generate syntactically different
but semantically equivalent programs. These diverse programs hide the true
semantic difference between this version and a previous version amidst a large
number of artificial syntactic differences.
According to the conclusions of the previous thesis, diverse programs created
by inserting harmless snippets (discussed further in Section 2.3.4) are
resistant to current automated tampering attacks. While multiple precautions
against potential attacks on the diversification scheme were discussed, no
actual experiments supported them. In this thesis, we focus on the "arms race"
between the defence, creating diverse populations of snippets, and the attack,
using our BMNS algorithm (see Section 3.3.1).
Through experiments, we will show when and how diversification fails to
protect the software, and how diversification can be modelled so that it
overcomes the current attack methods.
Chapter 2
Defending with targeted
diversification
2.1 Introduction
In this chapter we illustrate how genetic programming techniques can be
adapted to create code snippets that look like real code but do not affect a
program when inserted into it. The ideas we present extend and improve those
of Merckx [20]. The goal of these snippets is to provide a means of creating
diverse program instances - with the same functionality - that are resistant
to global attacks, such as commonly known "cracks" and more intelligent
patching programs that tamper with software in a malicious way.
First we explain why and how we establish a CPU model to evaluate snippets.
Following the conclusion of the previous thesis, diversification comes from
the insertion of junk code (harmless snippets), which modifies the offsets of
some original instructions; the CPU model guarantees that the snippets are
indeed harmless. Then we explain how we implement the genetic computing
algorithm that generates diverse snippets. Improvements to the diversity
scheme are discussed based on the experimental results.
2.2 CPU simulator
2.2.1 The necessity of building a CPU model
As mentioned before, the final goal is to change the offsets of crucial
instructions by inserting harmless snippets. Given the high risk of directly
executing unverified snippets, a CPU simulator is necessary to assess the
overall effect of a snippet's execution on the host program. In our
implementation, a state model of the CPU's basic execution environment is set
up; by comparing the state before and after execution of a snippet, we judge
whether the snippet is harmless. To build the CPU model, we first describe the
relevant parts of the CPU's execution environment.
2.2.2 Overview of the basic execution environment
Any program or task running on a processor is given a set of resources for
executing instructions and for storing code, data, and state information.
These resources make up the basic execution environment for a processor.
The basic execution environment is used jointly by the application programs
and the operating system or executive running on the processor [5].
Basic program execution registers - On the x86 architecture, the eight
general-purpose registers, the six segment registers, the EFLAGS register, and
the EIP (instruction pointer) register comprise a basic execution environment
in which a set of general-purpose instructions can be executed. These
instructions perform basic integer arithmetic, handle program flow control,
operate on bit and byte strings, and address memory. We limit our simulation
of the basic execution environment to the 8 general-purpose registers, the
stack, and the extended flags register. The EIP register is modified when a
snippet is inserted into the original code, but as the environment before and
after the inserted snippet is guaranteed to match, the change is harmless.
Stack - To support procedure or subroutine calls and the passing of pa-
rameters between procedures or subroutines, a stack and stack management
resources are included in the execution environment. The stack is located in
memory. The key point is to keep the stack pointer and the stack contents
identical to the state they were in before the snippet executed.
Many other parts of the execution environment exist in modern processors, such
as the x87 FPU registers and the MMX registers. A more complete implementation
could make snippets harder for crackers to recognize; however, it may also be
harder to guarantee the correct execution of more complex snippets.
2.2.3 Class design: states of CPU simulator
According to the requirements of the execution environment, the main objects -
bit, register and stack - make up the main state of the CPU simulator. A
collection object, "CInstructionCollections", stores the state modifications
made by every instruction the CPU simulator can execute. When the CPU
simulator executes a snippet, the initial state is recorded first; the
instructions are then read one by one, and the state of the CPU simulator is
modified according to the detailed procedure stored in the collection object.
See Figure 2.1.

Through this implementation, the CPU simulator can take charge of the
verification of snippets and output a detailed report to the snippet object.
This report is used by the genetic algorithm to evaluate the degree of
harmlessness of the snippet.

An initial fitness value is defined inside the class
"CInstructionCollections" for each type of instruction. The fitness function
of the genetic algorithm can then use both the initial fitness values and the
combination of different instructions as references when scoring a snippet.
Figure 2.1: The CPU simulator.
1. Read snippets from a file and store them in objects.
2. Emulate snippets.
(a) Initialize the CPU state.
(b) For every instruction DO Until <the end of this snippet>,
i. Read instruction and find the matched operation.
ii. Change the CPU state according to the matched oper-
ation.
(c) Report the difference between the initial state and the final state.
Table 2.1: The CPU simulator pseudocode.
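The loop of Table 2.1 can be sketched as a runnable toy. This is a Python stand-in for the C++ classes described above; the state layout and the three instruction handlers are simplified, hypothetical examples, not the thesis's actual instruction set:

```python
# Toy sketch of the simulator loop in Table 2.1: record the initial state,
# apply one handler per instruction, then report the state differences.
import copy

HANDLERS = {  # each handler mutates the CPU state for one instruction
    "push": lambda st, arg: st["stack"].append(st["regs"][arg]),
    "pop":  lambda st, arg: st["regs"].__setitem__(arg, st["stack"].pop()),
    "inc":  lambda st, arg: st["regs"].__setitem__(arg, st["regs"][arg] + 1),
}

def emulate(snippet, state):
    initial = copy.deepcopy(state)            # (a) record the initial state
    for opcode, arg in snippet:               # (b) execute instruction by instruction
        HANDLERS[opcode](state, arg)
    return {k: (initial[k], state[k])         # (c) report the differences
            for k in initial if initial[k] != state[k]}

state = {"regs": {"eax": 7, "ebx": 1}, "stack": []}
diff = emulate([("push", "eax"), ("inc", "eax"), ("pop", "eax")], state)
print(diff)   # {} - no state change, so this snippet is judged harmless
```

A non-empty report would list exactly which parts of the state a snippet failed to restore, which is the information the fitness function needs.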
2.2.4 The model of a snippet: bridge between CPU simulator and genetic algorithm
The term snippet, as mentioned before, refers to a sequence of instructions.
The class "CSnippet" has many useful attributes which not only speed up
verification but also support the genetic algorithm. Figure 2.2 shows the main
functions of the class "CSnippet".
Figure 2.2: Class of snippet.
First, the operation "open(filename)" reads the assembly instructions from a
file and stores them in the vector "_instructions" as objects of the class
"CInstruction".
The routine "updateMap()" is triggered whenever the contents of
"_instructions" change. For general instructions such as transfer, arithmetic
and logic instructions, the CPU simulator only needs to change the affected
CPU state according to the manual of the real CPU (IA-32 Intel, in our case).
But for (un)conditional jump instructions within a large snippet, a more
complex structure that keeps track of the execution order is necessary.
A vector object "_map" is established for storing the structure "LabelMap".
The instruction container of the snippet already assigns a uniform index to
each instruction; each label and the index of that label are stored in an
independent container, as in Figure 2.3. When a snippet with an
(un)conditional jump instruction is executed, instead of searching the whole
instruction container for the destination label, it is more efficient to look
up the exact index of the label in the "LabelMap" container.
Figure 2.3: An instance of LabelMap.
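The LabelMap idea amounts to building a label-to-index dictionary once, so that each jump resolves in a single lookup rather than a scan. A minimal sketch (our own illustration, with instructions represented as plain strings):

```python
def build_label_map(instructions):
    """Map each label to the index it occupies in the instruction list."""
    return {ins[:-1]: i for i, ins in enumerate(instructions) if ins.endswith(":")}

snippet = [".L1:", "pushl %ebx", "jmp .L2", ".L2:", "popl %ebx"]
label_map = build_label_map(snippet)
print(label_map[".L2"])   # 3 - the jump target's index, found without a linear scan
```

The map is rebuilt whenever the instruction container changes, mirroring the role of the update routine described above.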
Another problem worth mentioning is the renaming of labels. After several
generations of genetic computing, labels with the same name may appear within
one snippet, because roulette-wheel selection and the genetic operations
(introduced in the genetic computing sections) can generate a new snippet from
different combinations of a single snippet.
The attributes "_harmless", "_fitness" and "_msg" are only set after
verification by the CPU simulator. "_harmless" marks whether the snippet is
harmless or not; "_fitness" keeps the total fitness value, which is the key
reference for the genetic algorithm when evaluating and generating the next
generation; "_msg" stores the detailed report of the modifications to each CPU
state.
2.2.5 Extension of the CSnippet class
Considering the difficulty of successfully generating a harmless result, we
define three types of snippets: harmful snippets, harmless snippets and
semi-harmful snippets.
Harmless snippets - These are snippets whose execution has no effect
on the state of the CPU, i.e. the contents of all the registers are restored
(including the extended flags register) as is the state of the stack. This is the
’ideal’ snippet.
Semi-harmful snippets - These are snippets whose execution modifies the state
of the CPU minimally (e.g. fails to restore the contents of one or two
registers). Such snippets may still be used after careful selection of an
insertion point (e.g. a point in the program where the contents of those
registers are no longer needed).
Harmful snippets - These are snippets which cannot be considered semi-
harmful and are of no use to us (besides their limited contribution to the
gene pool).
Establishing new snippet attributes that describe the details of the CPU state
modifications is a good extension for future work. The semi-harmful snippet
and the insertion function, discussed in the genetic algorithm sections, also
come from this idea.
Considering the effectiveness of a snippet at hiding the crucial instructions,
we define three further types of snippets: Type A, Type B and Type C.
• Type A - A snippet without jump instructions, which does not modify the CFG (Section 3.2.2).
• Type B - A snippet that modifies the CFG.
• Type C - A Type A or Type B snippet which also imitates the Before, After and After Branch blocks of the crucial node.
2.2.6 Modelling a register on a bit level
Figure 2.4 shows the UML class diagram to model general purpose registers,
flag registers and a single bit. Because some instructions only modify a single
bit of the flag register, modelling at bit level is necessary. Moreover, instead
of only storing 0 or 1 values, a symbol is used to represent the content a
register bit. For example “EAX00” represents the least significant bit of
register EAX.
Figure 2.4: Class of register
Not only the flag register bits but also the general-purpose registers are
better modelled as sets of bits. For instance, suppose the instruction
"ROL AL, 2" is executed while the value of AL is 10101010. Ignoring the effect
on the flags, the algebraic value of AL after the execution is exactly the
same as before, so an algorithm that tracks only the algebraic value of the
register would conclude that the state is unchanged, although this holds only
for this particular value of AL. With the introduction of symbolic values, the
initial state of AL (the lowest eight bits of EAX) is marked as:
EAX07 | EAX06 | EAX05 | EAX04 | EAX03 | EAX02 | EAX01 | EAX00
After the execution of instruction, the result is:
EAX05 | EAX04 | EAX03 | EAX02 | EAX01 | EAX00 | EAX07 | EAX06
With this symbolic representation, it is clear that the state of the EAX
register is changed by the instruction "ROL AL, 2". A symbolic representation
is thus correct in all cases, independent of the register contents.
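The rotation example above can be reproduced with a minimal symbolic model, one list entry per bit symbol. This is our own illustrative sketch, not the thesis's C++ register classes:

```python
def rol(bits, n):
    """Rotate a list of bit symbols left by n positions (MSB first)."""
    n %= len(bits)
    return bits[n:] + bits[:n]

al = ["EAX07", "EAX06", "EAX05", "EAX04", "EAX03", "EAX02", "EAX01", "EAX00"]
rotated = rol(al, 2)          # models ROL AL, 2
print(rotated[0], rotated[-1])  # EAX05 EAX06 - the symbols have moved
print(rotated != al)            # True: the symbolic state differs for any value
```

Comparing symbol lists detects the state change that a purely algebraic model would miss for values like 10101010.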
2.2.7 Weaknesses of the symbolic framework
While the symbolic representation is more correct than the algebraic
representation, it makes basic arithmetic such as ADD and SUB more difficult
to model, because under the symbolic representation there are no real values
inside the CPU state. For instance, suppose the instruction "ADD EAX, 1" is
executed. At the bit level, the least significant bit of register EAX changes
from "EAX00" to "EAX00 + 1", and the following bits may also be affected via
the carry.

Without the information of the initial value of EAX, it is impossible to judge
whether a carry should be propagated to "EAX01", or to decide whether the
carry flag should be set. Solutions to this problem are discussed in the
sections on the symbolic analyzer (2.2.8) and the insertion function (2.3.7).

In practice, the symbolic representation successfully models instructions such
as XCHG, MOV, PUSH, POP and NOP, which only affect whole registers of the CPU
state (i.e. operate at register level rather than bit level).
2.2.8 Symbolic analyzer
To improve the descriptive capability of the symbolic representation, a
symbolic analyzer can be introduced to solve some logical and arithmetical
calculations. For instance:
XOR EAX, EBX
NOT EAX
XOR EAX, EBX
NOT EAX

After the execution of these instructions, the content of the least
significant bit of EAX is:

name: EAX00
content: ¬((¬(EAX00 ∧ EBX00)) ∧ EBX00)

Because of the properties of ∧ (bitwise exclusive or) and ¬ (bitwise
complement), this expression simplifies to EAX00: the content of EAX is the
same as before, and the snippet is harmless. An analyzer that can solve
equations of this form would be a useful further extension of our modelling
framework. It could then also solve equations such as
((((EAX + EBX) . 3) . 2) / 1) − EBX/2 − EBX/2, where . means ROR and / means
ROL.
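Pending a full symbolic analyzer, a cheap check at the single-bit level is exhaustive evaluation: a bit symbol can only take the values 0 and 1, so an expression over a few symbols can be proven equal to the identity by trying every assignment. The sketch below (our own illustration) traces one EAX bit through the four-instruction sequence XOR EAX, EBX; NOT EAX; XOR EAX, EBX; NOT EAX:

```python
from itertools import product

def bit_trace(a, b):
    """Trace one EAX bit (a) and one EBX bit (b) through the four instructions."""
    x = a ^ b   # XOR EAX, EBX
    x ^= 1      # NOT EAX
    x ^= b      # XOR EAX, EBX
    x ^= 1      # NOT EAX
    return x

# Every assignment of the two bits returns the original EAX bit,
# so the sequence leaves EAX unchanged (flags aside).
print(all(bit_trace(a, b) == a for a, b in product((0, 1), repeat=2)))  # True
```

Exhaustive evaluation scales badly in the number of symbols, which is why a real analyzer would simplify the expressions algebraically instead.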
2.3 Genetic programming
2.3.1 Introduction
In keeping with the work done in the previous thesis, we use genetic
programming to generate snippets according to the specific requirements of the
original code, in order to achieve software diversity [20].

Improvements to the diversity scheme are made in two areas: evolving new ways
to defeat attackers in the inevitable arms race with crackers, and improving
the genetic model to produce 'better' snippets. First, let us begin with the
genetic computing algorithm.
2.3.2 Genetic computing algorithm
Definition 2.1. A genetic algorithm (or GA) is a search technique used in
computing to find exact or approximate solutions to optimization and search
problems.
A typical genetic algorithm requires two things to be defined [9]:
• A genetic representation of the solution domain.
• A fitness function to evaluate the solution domain.
In our case, the representation of a solution is the snippet, a sequence of
instructions. The fitness function quantifies the harmlessness of snippets;
that is why a CPU model, which can track the execution environment, is needed.
Once the genetic representation and the fitness function are defined, the GA
initializes a population of solutions and then improves it through selection
and reproduction. The execution procedure is shown below.
1. Initialization: choose initial population
2. Evaluate the fitness of each individual in the population
3. Repeat
(a) Selection: select best-ranking individuals to reproduce
(b) Reproduction: breed a new generation through crossover and
mutation (genetic operations) to produce offspring
(c) Evaluate the individual fitness of the offspring
(d) Replace worst ranked part of population with offspring
4. Until <terminating condition>
Table 2.4: Genetic computing algorithm pseudocode.
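The steps of Table 2.4 can be sketched as a runnable toy. This is our own generic example with a one-max objective (count of 1-bits), not the thesis's snippet fitness; all constants are arbitrary illustrative choices:

```python
# Generic GA skeleton following Table 2.4: initialize, evaluate, select,
# reproduce (crossover + mutation), replace the worst-ranked half.
import random

random.seed(0)
LENGTH, POP_SIZE, GENERATIONS = 16, 20, 60

def fitness(ind):                 # toy objective: number of 1-bits
    return sum(ind)

def crossover(a, b):              # one-point crossover
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.05):       # bit-flip mutation
    return [bit ^ 1 if random.random() < rate else bit for bit in ind]

# 1. Initialization: random population
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):                        # 3. repeat
    pop.sort(key=fitness, reverse=True)             # 2. evaluate and rank
    parents = pop[:POP_SIZE // 2]                   # (a) select best-ranking
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(POP_SIZE - len(parents))]   # (b) reproduce
    pop = parents + children                        # (d) replace worst half
print(fitness(max(pop, key=fitness)))               # approaches LENGTH
```

In the thesis's setting the individuals are snippets and the fitness comes from the CPU simulator's report rather than from a bit count.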
The main procedure consists of four main steps: initialization, selection,
reproduction and termination.
2.3.3 Predefined vs. random initial population
It is possible to generate the initial population randomly and then use the
fitness function to select suitable snippets for the next round of
reproduction. In this way, the result is more unpredictable: the GA
automatically makes the best choice for us according to the definition of the
fitness function. The cost for the GA of discovering a fit individual from a
random initial population is the disadvantage of this approach, but it allows
us to explore a space we otherwise would not.

In some cases, we want the final result to have some similarity with the
initial one; a predefined initial population is a way to achieve this goal. By
setting up the initial population manually, a "clue" is given to the GA that
guides the production of the next generation.

Which approach is better therefore depends on the requirements of the
environment.
2.3.4 Fitness value and fitness function
Definition 2.2. A fitness function is a particular type of objective function
that quantifies the optimality of a solution in a genetic algorithm.
Based on this function, more optimal snippets are selected, and a new
generation derived from them will hopefully be even better.
The implementation and evaluation of the fitness function is an important
factor in the speed and efficiency of the algorithm. The fitness function
assigns a fitness value to each individual, and the genetic algorithm follows
the gradient of the fitness values to find the best-fitted individual. The
term fitness landscape can be considered another way of looking at the fitness
function; a further discussion appears in Section 2.3.8 on convergence.
For our implementation, a strategy combining an initial fitness value with a
fitness function is used. To decide the value of an instruction, an initial
fitness value is first assigned to it, depending on the complexity and the
practical usage rate of each instruction: if we want an instruction to have a
greater chance of appearing in the following generations, a higher initial
fitness value should be assigned to it. The snippet, which consists of several
instructions, then has a total fitness value. Based on this value, a further
verification occurs, depending on the relationships between the combinations
of instructions.
Below are some factors which can be used in the fitness function:

Harmless, semi-harmful and harmful snippets
A harmless snippet is a snippet whose execution harms none of the states of
the execution environment. It is the ideal snippet, and it can be inserted at
any place in the original code. In practice, however, completely harmless
snippets normally have only simple structures, and the instructions that can
appear in them are limited. This makes it difficult to generate diverse
results, which in turn makes locating the snippets easier for the cracker.

A semi-harmful snippet influences only one or a few states of the CPU.
Snippets of this kind look more like real code and are more difficult to
track. They can be categorized into different levels as a reference for the
fitness score:
• Lv1. affects one or more flags
• Lv2. swaps or assigns the contents of registers
• Lv3. swaps content at the bit level
• Lv4. loses information at the bit level
To insert snippets of Lv1 and Lv2, an insertion function, which will be
discussed later, has to be established first in order to locate suitable
positions that satisfy their requirements. A level-3 snippet such as
"rol %eax, $16" swaps the first and last sixteen bits of EAX on a 32-bit
architecture; there is still a (very small) chance of restoring the content by
rotating it back. But for a level-4 snippet such as "shl %eax, $8", the first
eight bits are already lost and can never be restored.
The harmful snippet is the worst case for the GA. Snippets that not only
change CPU states but also corrupt state at the bit level, overwrite the
original contents of the stack, or get stuck in an infinite loop belong to
this category. Because of their destructive influence, these snippets make it
harder for the GA to evolve harmless snippets: if such snippets are chosen to
build the next generation, it is difficult for the algorithm to converge to a
good result.
In practice, we first designed an encouragement strategy that assigns a high
fitness value to harmless snippets, a lower fitness value to semi-harmful
snippets, and a punishment to harmful snippets.
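The encouragement strategy can be sketched as a scoring function. The categories follow the text, but the numeric constants are our own illustrative choices, not the thesis's actual values:

```python
# Illustrative scoring: reward harmless snippets, give a smaller
# (level-dependent) reward to semi-harmful ones, punish harmful ones.
LEVEL_SCORE = {1: 40, 2: 30, 3: 15, 4: 5}   # Lv1 (flags) down to Lv4 (info loss)

def harm_score(category, level=None):
    """Score one snippet by harm category, rewarding restorable behaviour."""
    if category == "harmless":
        return 100                  # the ideal snippet: full reward
    if category == "semi-harmful":
        return LEVEL_SCORE[level]   # milder, restorable damage scores higher
    return -50                      # harmful: punished

print(harm_score("harmless"),
      harm_score("semi-harmful", level=1),
      harm_score("harmful"))        # 100 40 -50
```

In the full fitness function this score would be combined with the other factors below, such as snippet length and homogeneity.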
The length of the snippet
The length of the snippet is not in itself a crucial factor of the fitness
function, since restricting the length is not necessary in our case. But the
longer the snippet, the trickier the combinations of instructions that can be
implemented: a long snippet gives more room to perform complex combinations of
instructions. We could even let the snippet really do something instead of
being pure junk code.
Snippet01            Snippet02
.L1:                 jmp .L3
pushl %ebx           .L2:
movl %ebx, %eax      addl %eax, $1
subl %eax, $10       cmpl %eax, %ebx
jmp .L2              jne .L2
.L4:                 popl %ebx
                     jmp .L4
                     .L3:
Consider snippet01 and snippet02 above. Looking at snippet01 alone, it changes
the content of register EAX, so it could only be inserted into the original
code at places where the contents of the EAX register are no longer needed.
But taken together with snippet02, the two snippets form one united snippet
that is harmless. After stepping into snippet01, the execution flow jumps
directly to snippet02, where the register EAX is restored; it then jumps back
to snippet01 and returns to the flow of the original code. Because there is an
unconditional jump at the beginning of snippet02, snippet02 is safe to insert
anywhere in the original code. Here we only demonstrate the possible solution
and do not consider the effect on the flags; in practice, more complex chains
of actions occur.
Compared with long snippets, short snippets contain less complexity, but frequent short snippets might blend in more with the surrounding code. In our implementation, a variable is established to hold the desired length of a snippet; a departure from this predefined value decreases the fitness value of the snippet.
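As an illustration, such a length term of the fitness function might look as follows in Python (the name, weighting and linear shape are our own assumptions, not the thesis implementation):

```python
def length_term(snippet_length, preferred_length, weight=1.0):
    """Fitness contribution of snippet length: zero at the preferred length,
    increasingly negative the further the snippet departs from it."""
    return -weight * abs(snippet_length - preferred_length)

# A snippet of the preferred length is not penalized; others are.
assert length_term(8, 8) == 0.0
assert length_term(12, 8) < length_term(9, 8) < 0.0
```

The term is simply added to the other fitness contributions, so the GA is steered towards the preferred length without forbidding other lengths outright.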
Homogeneity of instructions in a snippet
A snippet’s homogeneity is directly related to its fitness, as illustrated in Section 5.2.3. The idea of snippet homogeneity comes from the result of the attack in the worst-case scenario (Section 4.2.1). The results show that a snippet should have similarities with the blocks Before, After and After Branch of the critical node (Section 3.2.4).
Other parameters
There are many other parameters which can be taken into account by the fitness function. For instance, the number of repeated instructions in a snippet can be considered as a factor, because snippets made up of repeated instructions lack the complexity needed to prevent a cracker from locating them. But the trade-off of which attribute makes the main contribution, and of how many attributes may flatten the gradient of the fitness landscape, has to be considered depending on the environment.
2.3.5 Roulette-wheel selection
There are several generic selection algorithms. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as rating every solution may be very time-consuming. In our work, we built a computing algorithm based on roulette-wheel selection, which is one of the most popular and well-studied selection methods.
Figure 2.5: Roulette-wheel selection
The selection process is more like a roulette wheel game in which each
candidate solution represents a pocket on the wheel. The sizes of the pockets
are proportionate to the probability of selection of the snippets. Selecting N
snippets from the population is equivalent to playing N games on the roulette
wheel.
Candidate snippets with a higher fitness have a greater chance of being selected. There is also a chance that some weaker snippets, with lower fitness values, survive the selection process. Though these snippets may be weak, they may include combinations of instructions which prove useful during the reproduction process. The main steps of roulette-wheel selection are shown below.
1. Normalize the fitness value. Normalization means multiplying the fit-
ness value of each individual by a fixed number, so that the sum of all
fitness values equals 1. The population is sorted by descending fitness
values.
2. Compute the accumulated normalized fitness values. The accumulated
fitness value of an individual is the sum of its own fitness value plus the
fitness values of all the previous individuals. The accumulated fitness
of the last individual should of course be 1.
3. Randomly select individuals until the target population size is reached. For each selection, a random number R between 0 and 1 is chosen; the selected individual is the first one whose accumulated normalized value is greater than R.
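The three steps can be sketched as follows in Python; the representation of individuals and the interface are our own assumptions, not the thesis implementation:

```python
import random

def roulette_wheel_select(population, fitnesses, n, rng=random):
    """Select n individuals with probability proportional to their fitness."""
    # Step 1: normalize the fitness values so they sum to 1, sorted descending.
    total = sum(fitnesses)
    ranked = sorted(zip(population, fitnesses), key=lambda pair: pair[1],
                    reverse=True)
    # Step 2: accumulate the normalized values; the last entry reaches 1.
    accumulated, running = [], 0.0
    for individual, fitness in ranked:
        running += fitness / total
        accumulated.append((individual, running))
    # Step 3: spin the wheel n times; pick the first individual whose
    # accumulated value exceeds the random number R.
    selected = []
    for _ in range(n):
        r = rng.random()
        selected.append(next((ind for ind, acc in accumulated if acc > r),
                             accumulated[-1][0]))
    return selected
```

Fitter snippets occupy a larger slice of the accumulated range, so they are drawn more often, while weaker snippets still keep a non-zero chance to survive.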
It is obvious that this selection algorithm causes the snippets in the population pool to converge to the same configuration, because the genetic algorithm is an optimizing search algorithm and a snippet with high fitness has the chance to be chosen several times. Modifying this step can give the next generation more diversity: we can force the selection to ignore a snippet which has already been chosen up to a threshold, keeping the population diverse for the production of the next generation. But in this way we interfere with the scheme of the genetic algorithm. A better choice is to dynamically increase the probability of the mutation operation when the difference in the population drops below a threshold; through more mutation, the diversity of the population spreads again. In a naive way, we can simply exploit the random character of the selection and force the algorithm to stop early, within a few generations. After executing the genetic algorithm several times, only the best solution (snippet) of each run is picked and a new population is built based on them.
2.3.6 Genetic operator for reproduction
In genetic algorithms, the genetic operator is used to vary the programming
of a chromosome or chromosomes (instruction or snippet in our case) from
one generation to the next. These processes result in the next generation
population of snippets is different from the previous generation. Generally,
the average fitness value will have increased by this procedure for the pop-
ulation, since only the best candidate from the first generation are selected
for breeding. At the same time, a small proportion of less fit snippets are
selected because of the roulette-round selection.
Insertion crossover
Several types of crossover have been introduced in the genetic algorithm literature [9]. In our specific case, a uniform length of individuals is not necessary and the rate of successful generation is more important. For this reason an insertion crossover, which simply inserts one snippet directly into another, has been chosen as the main genetic operator because of its success at producing harmless snippets. As Figure 2.6 shows, the insertion point is generated randomly.
Figure 2.6: Insertion crossover
The disadvantage of this kind of crossover is that after several generations the length of the snippets increases swiftly. Allowing the fitness function to reward or punish snippets for their length is a solution to this problem: a parameter can be defined for the optimal snippet length, and snippets whose length diverges from this preference receive a lower fitness value.
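A sketch of this operator, with snippets modelled as plain lists of instruction mnemonics (a representation we choose purely for illustration):

```python
import random

def insertion_crossover(parent_a, parent_b, rng=random):
    """Insert the whole of parent_b into parent_a at a random point."""
    point = rng.randrange(len(parent_a) + 1)   # the insertion point is random
    return parent_a[:point] + parent_b + parent_a[point:]
```

The child keeps both parents intact, which is why this operator is good at preserving harmless snippets, and also why the snippet length grows quickly over the generations.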
After some generations of the genetic algorithm, the insertion crossover has a high chance of combining two identical snippets into a redundant successor. This repetitive composition within the snippet makes the convergence happen more swiftly and finally yields a bad result. To avoid this, we can dynamically decrease the rate of insertion crossover and increase the rate of cut-and-splice crossover (see next section). More mutation operations during evolution will also help to restrain this tendency.
Cut and splice crossover
The “cut and splice” crossover operation follows the idea of the one-point crossover of the previous thesis. To distinguish it from the popular one-point crossover of genetic algorithms, we use “cut and splice” as a new name which describes the operation more clearly.
The insertion crossover more or less guarantees that a snippet survives into the next generation, but the “cut and splice” crossover is more dangerous and may create harmful successors even when all their parents were harmless in the previous generation.
Figure 2.7: “Cut and splice” crossover
Each parent has its own randomly chosen crossover point, and the instructions after each point are swapped to generate the children. This kind of crossover has a stronger effect on the characteristics of a snippet: in the worst case, the CPU state after executing such a snippet is corrupted at the bit level. As a trade-off, compared with the insertion crossover it brings more diversity while still keeping some characteristics of the parents.
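Using the same list-of-mnemonics representation as before (our own assumption), the operator can be sketched as:

```python
import random

def cut_and_splice(parent_a, parent_b, rng=random):
    """Cut each parent at its own random point and swap the tails."""
    cut_a = rng.randrange(len(parent_a) + 1)   # independent cut points, so the
    cut_b = rng.randrange(len(parent_b) + 1)   # children may change length
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2
```

No instruction is lost or duplicated between the two children, but each child may be longer or shorter than either parent, unlike classic one-point crossover.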
Randomly shift mutation
Some argue that crossover is the most important operator, while mutation is only necessary to ensure that potential solutions are not lost. Others argue that crossover only serves to propagate innovations originally found by mutation. There are many references in Fogel [13] that support the importance of mutation-based search, and it is clear that the mutation operator brings more diversity to a population. On a search level, mutations help the GA explore random portions of the domain’s search space, thus helping it avoid being trapped in local optima; see the discussion of convergence in Section 2.3.8.
In our case, a random shift mutation operator is established, as illustrated in Figure 2.8. The two randomly chosen instructions A and B are swapped by the mutation.
Figure 2.8: Randomly shift mutation
From the point of view of the instruction level, it is risky to change the order of execution. The change of CPU state is unpredictable, and certain instructions may cause modifications at the bit level rather than at the state-variable level. Such changes are much more difficult to restore, and it is harder for the genetic algorithm to find a totally harmless snippet in further evaluations. But in some cases, if only one or a few CPU states are changed, and none at the bit level, the result may still work in combination with the insertion function.
Unlike the procedure of the previous thesis, we do not randomly choose one instruction from the instruction pool to replace one in the snippet. By only swapping existing instructions, the probability of remaining harmless is greater than when randomly picking from the instruction pool.
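The operator reduces to a swap of two positions; again the list representation is our own illustration:

```python
import random

def random_shift_mutation(snippet, rng=random):
    """Swap two randomly chosen instructions; nothing new is drawn from an
    instruction pool, only the order of execution changes."""
    mutant = list(snippet)
    a, b = rng.sample(range(len(mutant)), 2)   # two distinct positions
    mutant[a], mutant[b] = mutant[b], mutant[a]
    return mutant
```

Because the instruction mix stays identical, only the ordering is disturbed, which is exactly why this mutation is more likely to remain harmless than drawing a fresh instruction from a pool.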
2.3.7 Discussion of insertion function
The genetic algorithm is an optimizing search algorithm which tries to locate the best solution according to the fitness value. For our purpose, however, the best solution is not needed; instead we expect snippets with acceptable fitness values that are as diverse as possible. We have shown in Section 4.1 that the totally harmless snippets generated by our genetic algorithm work, but the snippet diversity and the combinations of snippet instructions are extremely limited. If we simply insert these snippets into assembly code at random, it does not cause much trouble for a cracker to locate the crucial instruction. A trickier insertion algorithm is needed to decide where and under which conditions to insert a snippet. To challenge the BMNS algorithm (Section 3.3.1), the genetic algorithm should use instructions similar to those found in the critical sections, together with the same crucial instruction as in the original code, to confuse the cracker.
1. Locate the crucial instruction.
2. Choose the snippets with this specified instruction from the
predefined snippet library as the initial population.
3. Run the genetic algorithm several loops to generate diverse
results.
4. Choose the harmless and semi-harmless snippets in the results
as candidates for insertion.
5. Use insertion function to locate the suitable position where
the CPU states satisfy the requirement of specified snippet
and add the compatible snippet inside the original code.
Table 2.6: Insertion algorithm pseudocode.
The insertion function makes the semi-harmless snippets functionally harmless in particular environments. For instance, consider a semi-harmless snippet that only changes the zero flag to 0. The insertion function has to find a place where ZF is already 0, or where it is not live at all.
Figure 2.9 shows part of the original program code in which a “jz” instruction occurs. In Basic Block 01, which is after label L1 and before the next instruction that can modify the flag register, we can declare that the zero flag is stable at 0. The semi-harmless snippet can therefore be inserted here, because resetting the zero flag to 0 does not cause any risk for the execution of the original code.
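Such a check can be sketched over a linear instruction list. The sketch below (our own illustration) uses the fact that on the fall-through path of a `jz` the zero flag must have been 0, and its set of flag-writing mnemonics is deliberately incomplete:

```python
# Instructions that rewrite EFLAGS in this sketch (deliberately incomplete).
FLAG_WRITERS = {"cmpl", "testl", "addl", "subl", "incl", "decl"}

def zf_zero_positions(instructions):
    """Indices where ZF is known to be 0: on the fall-through path of a jz
    (the jump was not taken, so ZF was 0), until the flags are rewritten."""
    positions, zf_known_zero = [], False
    for index, opcode in enumerate(instructions):
        if opcode == "jz":
            zf_known_zero = True       # not taken here implies ZF == 0
        elif opcode in FLAG_WRITERS:
            zf_known_zero = False      # knowledge about ZF is lost
        elif zf_known_zero:
            positions.append(index)    # safe spot for a ZF-clearing snippet
    return positions
```

Every returned index is a position where inserting a snippet that clears ZF cannot change the behaviour of the original code.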
Figure 2.9: JZ conditional jump
To implement this algorithm, a snippet library for each potential instruction has to be defined first. By tuning the fitness function, we can give the genetic algorithm various tendencies, so that it generates snippets that change specified CPU states. A snippet library can be built by sorting the resulting snippets by their tendencies. The insertion function then only needs to pick a snippet from this library according to the environment in the original code.
2.3.8 Discussion of convergence
In evolutionary algorithms, convergence means that the population contains substantially similar individuals. Individuals with the better phenotypes [26] are selected to have more children than the less fit ones. Selection according to “survival of the fittest” reduces the spread of phenotypes in the population. The spread is reduced unequally, so that individuals cluster more tightly around the better phenotypes discovered so far [17]. The crossover and mutation operations spread the genotypes again, but after many generations the selections and genetic operations cause the phenotypes to become concentrated: the population spread caused by mutation is balanced by selection.
In many problems, a GA may have a tendency to converge towards local optima rather than the global optimum of the problem. This means that it does not know how to sacrifice short-term fitness to gain longer-term fitness. Whether this problem occurs depends on the shape of the fitness landscape, the nature of the problem, and the quality of the representation of the problem domain.
An obvious alternative for this search process is for our explorer to start at some point in the landscape and simply follow ascending gradients. This approach is called hill climbing. Even if the explorer cannot see the landscape around him, he can still climb a hill by always choosing the steepest direction.
Figure 2.10: Sketch of a fitness landscape. The arrows indicate the preferred flow of a population on the landscape, and the points A, B, and C are local optima. The red ball indicates a population. [9]
In our case, after several generations of the genetic algorithm, all the snippets in the population pool converge to the same sequence of instructions and seldom change any more. The population (the red ball) can then be considered to have climbed the local peak A. But peak B is clearly the highest peak in this landscape, and if the landscape is only a segment of the problem domain, the highest peak is uncertain.
This problem may be alleviated by using a different fitness function, by increasing the rate of mutation, or by using selection techniques that maintain a diverse population of solutions. But the No Free Lunch theorem [34] proves that there is no general solution to this problem.
However, the goal of our project is to achieve snippet diversity, not to search for the snippet that best fits the fitness function. With the random-selection character of the genetic algorithm, we can reach some level of diversity in the middle phase of the evolution. As the evolution progresses, the genetic algorithm tends to drop this diversity and converge to one optimal result. So, to preserve the diversity, it is better to stop the evolution when the difference in the population drops to a threshold. We can simply take the snippets in the population at this point, or change the mutation rate and continue the evolution to explore other parts of the search space.
One naive solution is simply to take the snippets in the population when the threshold is reached and to run the algorithm again; the diversity of the result then depends mostly on the random character of the selection. Another solution is to continually increase the mutation rate once the convergence reaches the threshold, and to keep evolving in order to explore the unvisited part of the domain until an appropriate level of difference is achieved.
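The second strategy fits in a few lines; the threshold and the two rates here are arbitrary placeholders, not values from the thesis:

```python
def adaptive_mutation_rate(population_diversity, threshold,
                           base_rate=0.05, boosted_rate=0.30):
    """Raise the mutation rate once the diversity of the population falls
    below the threshold, pushing the search away from the local optimum."""
    return boosted_rate if population_diversity < threshold else base_rate

assert adaptive_mutation_rate(0.8, 0.2) == 0.05   # diverse: mutate rarely
assert adaptive_mutation_rate(0.1, 0.2) == 0.30   # converged: mutate a lot
```

In a real run, the diversity measure itself would have to be defined, for instance as the fraction of distinct snippets in the population.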
Chapter 3
Attacking targeted
diversification
3.1 Introduction
This chapter is dedicated to a method we developed that specifically attacks software protected by targeted diversification. We named this method Best Matching Node Search (BMNS). The goal of this attack method is to test the effectiveness of protecting software through targeted diversification.
We start the chapter with some theory about the Control Flow Graph (CFG) (Section 3.2.2) and graph searching (Section 3.2.3). The rest of the chapter is dedicated to the explanation of the BMNS algorithm (Section 3.3.1) and, finally, to the Longest Common Subsequence (LCS) (Section 3.4.1) and how it can assist us in attacking a diversified software population.
The idea of the attack method is simple. It is based on the following assumptions.
Assumption 3.1. The critical instruction is a conditional jump.
Assumption 3.2. We know which the critical instruction is, where it is
located in each diverse instance of the program, and which instructions sur-
round it.
The first assumption is safe as long as the protection of the software is implemented in high-level programming languages using “if-then-else” structures. These structures produce conditional jump instructions at the binary level. The cracking techniques that tamper with conditional jump instructions are the most common ones and are known as “branch jamming” [20].
The second assumption simply states that the attacker can successfully analyse and tamper with the binary code of one instance of a diversified program.
The attack method generates a tree similar to the CFG of the program and looks in that tree for a “fingerprint” that locates the critical instruction. To achieve this, the attacker derives heuristics from the instructions surrounding the critical conditional jump instruction. Using these heuristics, a search algorithm traverses the tree and locates the most similar conditional jump, which should be the desired critical instruction.
We also investigate the case in which the attacker has more than one instance from the diverse population. The attacker can then use comparison algorithms that automatically extract the instructions common around the node in all instances, obtaining the “fingerprint” this way. With those instructions the attacker can automatically generate the heuristics needed to locate the critical instruction.
3.2 Program representation and program analysis
3.2.1 Graph data structures
A graph is an abstract data type: abstract because we can represent many different data types as graphs. A graph is a very general data representation, and the graph data structure concept is taken directly from the graphs of mathematics.
A graph consists of two different sets of objects. The first set of objects is called points, nodes or vertices; we will refer to them as nodes. The second set of objects is called edges or lines; we will refer to them as edges.
So a graph consists of a set of nodes and a set of edges that establish relationships (connections) between the nodes. In a proper, undirected graph, an edge from node A to node B is considered to be the same as the edge from node B to node A. In a directed graph (digraph), each direction is a different directed edge.
Definition: [10] A graph or undirected graph G is an ordered pair
G := (V, E) that is subject to the following conditions:
1. V is a set, whose elements are called points, vertices or nodes,
2. E is a set of pairs (unordered) of distinct vertices, called edges
or lines.
The vertices belonging to an edge are called the ends, endpoints,
or end vertices of the edge.
V (and hence E) are usually taken to be finite sets, and many of the
well-known results are not true (or are rather different) for infinite
graphs because many of the arguments fail in the infinite case.
The order of a graph is |V |, the number of vertices.
A graph’s size is |E|, the number of edges.
The degree of a vertex is the number of other vertices it is connected
to by edges.
Table 3.1: Graphs definition.
In practice, two main data structures are used for representing graphs:
1. The adjacency list, which represents each node as a data structure that contains a list of all adjacent nodes.
2. The adjacency matrix, in which the rows and columns of a two-dimensional array represent source and destination vertices, and an entry indicates whether an edge exists between the two vertices.
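The two representations can be sketched as follows for a small undirected graph (a Python illustration of the general concept, not code from the thesis):

```python
def to_adjacency_list(node_count, edges):
    """Each node maps to the list of its adjacent nodes."""
    adjacency = {node: [] for node in range(node_count)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)     # undirected: store both directions
    return adjacency

def to_adjacency_matrix(node_count, edges):
    """matrix[a][b] is 1 if and only if an edge joins a and b."""
    matrix = [[0] * node_count for _ in range(node_count)]
    for a, b in edges:
        matrix[a][b] = matrix[b][a] = 1
    return matrix
```

The list form is compact for sparse graphs, while the matrix form answers "is there an edge between a and b?" in constant time.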
Graphs are used in many areas of both mathematics and computer science; they are a very flexible way to represent relationships between objects. All trees can be represented by graphs, but not all graphs are trees. The defining characteristic of a tree is that there is a single unique path along edges from the root to any particular node; that means that in a tree no two edges join at the same node.
For further reading on graphs, see [4], [10] and [30].
3.2.2 Control Flow Graph
A Control Flow Graph (CFG) is a directed graph of the execution paths that a program can follow. The nodes of the CFG are blocks of code, called basic blocks. Each basic block has one entry point and ends when the first jump is encountered; the jump is the block’s exit point, so basic blocks contain no jumps or branches themselves. The directed edges of the CFG are the jumps that exit each block and lead to the start of the next block. The CFG contains two special blocks: the entry block, where the execution of the program begins, and the exit block, in which all execution ends.
Table 3.2 shows the Pascal code of a small program, and Figure 3.1 shows the control flow graph of the program [29].
FOR i := 0 TO 30 DO
BEGIN
    s := a[i];
    IF s < 0 THEN
        a[i] := (s+4)^2
    ELSE
        a[i] := cos(s+4);
    b[i] := s+4;
END
Table 3.2: A CFG example.
Figure 3.1: The CFG of the example at Table 3.2.
The CFG is mainly used by compilers for optimizing the source code; optimizations like detecting dead loops are performed with the use of the CFG. Furthermore, the CFG is also used by static analysis tools.
For further reading on Control Flow Graphs, see [2], [10] and [22].
3.2.3 Searching graphs
Definition 3.1. Depth first search: [25] Any search algorithm that considers outgoing edges of a vertex before any neighbours of the vertex, that is, outgoing edges of the vertex’s predecessor in the search. Extremes are searched first.
This is easily implemented with recursion. Such an algorithm marks all vertices in a directed graph in the order they are discovered and finished, partitioning the graph into a forest.
Depth-first search (DFS) is a searching algorithm used to traverse a tree, tree structure, or graph. Usually one starts from the root node of the tree (when exploring a graph, any node may be selected as the root) and explores as deep as possible, reaching the first leaf of the tree before backtracking.
DFS is an uninformed search that always expands the first child node appearing in the search tree and goes deeper and deeper until it finds the goal node, or until it reaches a node that has no child nodes (a leaf node). Then the algorithm backtracks, returning to the last unexpanded node and exploring it. DFS is implemented as a recursive algorithm; in non-recursive implementations, all newly expanded nodes are added to a LIFO (Last In, First Out) stack for exploration.
A problem with DFS is that some search trees are deeper than memory can hold; when DFS searches such a tree it suffers from non-termination and cannot find the solution. Likewise, if a tree structure contains an infinite loop, DFS will again fail to terminate and will never visit some of the tree’s nodes. The simple solution of “remembering the visited nodes” does not always work because of insufficient memory. The usual solution is to maintain an increasing limit on the depth of the tree; this searching method is called iterative deepening depth-first search.
For the graph shown in Figure 3.2, considering all edges bidirectional, a depth-first search starting at A, assuming that the left edges in the graph are chosen before the right edges and that the search remembers previously visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, D, F, E, C, G.
Figure 3.2: Searching a graph example.
Performing the same search without remembering previously visited nodes results in visiting the nodes in the order A, B, D, F, E, A, B, D, F, E, and so on: the search is caught in the A, B, D, F, E cycle forever and never reaches C or G.
To conduct a DFS search:
1. Form a one-element queue consisting of a zero-length path
that contains only the root node.
2. Do until the queue is empty,
(a) Remove the first path from the queue.
(b) Create new paths by extending the first path to all the
neighbours of the terminal node.
(c) Reject all new paths that introduce loops.
(d) Add the new paths, if any, to the front of the queue.
Table 3.3: The DFS algorithm.
Another algorithm to explore a graph, tree, or tree structure is breadth-first search (BFS).
Definition 3.2. Breadth first search: [24] A search algorithm that considers
neighbours of a vertex, that is, outgoing edges of the vertex’s predecessor in
the search, before any outgoing edges of the vertex. Extremes are searched
last.
For the graph shown in Figure 3.2, a BFS algorithm starting from node A, again assuming that the left edges in the graph are chosen before the right edges and that the search remembers previously visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, C, E, D, F, G.
To conduct a BFS search:
1. Form a one-element queue consisting of a zero-length path
that contains only the root node.
2. Do until the queue is empty,
(a) Remove the first path from the queue.
(b) Create new paths by extending the first path to all the
neighbours of the terminal node.
(c) Reject all new paths that introduce loops.
(d) Add the new paths, if any, to the back of the queue.
Table 3.4: The BFS algorithm.
The main difference between DFS and BFS can be clearly seen in Tables 3.3 and 3.4. The last line of the DFS algorithm adds the new paths to the front of the queue, forcing the algorithm to traverse in depth first, while in BFS the new paths are added to the back of the queue.
The space complexity of DFS is much lower than that of BFS, and DFS also lends itself much better to heuristic methods of choosing a likely-looking branch. The time complexity of both algorithms is proportional to the number of vertices plus the number of edges in the graphs they traverse, O(|V| + |E|).
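That one-line difference shows up directly in code. The sketch below (our own illustration on a small example graph, not the graph of Figure 3.2) keeps a single routine for both searches, differing only in where new nodes join the queue:

```python
from collections import deque

def search_order(adjacency, root, mode):
    """Visit order of a graph search; 'dfs' adds new nodes at the front of
    the queue and 'bfs' at the back, mirroring Tables 3.3 and 3.4."""
    queue, visited, order = deque([root]), set(), []
    while queue:
        node = queue.popleft()
        if node in visited:
            continue                    # remember visited nodes: reject loops
        visited.add(node)
        order.append(node)
        neighbours = [n for n in adjacency[node] if n not in visited]
        if mode == "dfs":
            queue.extendleft(reversed(neighbours))   # front: go deeper first
        else:
            queue.extend(neighbours)                 # back: finish this level

    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
assert search_order(graph, "A", "dfs") == ["A", "B", "D", "C", "E"]
assert search_order(graph, "A", "bfs") == ["A", "B", "C", "D", "E"]
```

On this graph, DFS dives through B to D before touching C, while BFS finishes the whole level B, C before descending to D and E.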
For further reading on depth-first and breadth-first search, see [8], [32] and [33].
3.2.4 Representation of a program as a search tree
Before mentioning how we search and identify a certain node with our algo-
rithm, we present a binary tree that is a simplification to the control flow
graph 3.2.2. A downside of our tree representation is that it is infinite. How-
ever, this has no practical implications to our algorithm as it is capable of
detecting loops.
To explain how we can represent any program as a tree, we must first explain how the conditional jump instructions operate. A conditional jump instruction checks the state of one or more status flags in the flags register (EFLAGS) and, depending on the state of the flags, performs a jump to a target instruction, changing the execution of the program /refDUJMP.
A small example: the jump-on-carry (jc) instruction first checks the state of the carry flag; if the carry flag is set to 1, the instruction jumps to a target instruction, otherwise (the carry flag is 0) the execution of the program continues normally with the next instruction after the jc. The state of the flags changes through the execution of instructions and through the processor states.
We can now represent any program as a search tree if we consider the conditional jump instructions (jc, jnc, je, jne, jo, jno, etc.) as the tree nodes. These nodes always expand into two separate branches.
Before each of the tree nodes there will be a number of instructions; we will refer to those instructions as the Before node instructions. After every node there will be two sets of instructions. The first set, which we will refer to as the After node instructions, contains the instructions that follow if the conditions of the node are not met (sometimes called the “fall-through” path). The second set contains the instructions that the node would jump to if the conditions are met; we will refer to this set as the After Branch node instructions. The latter is sometimes called the “target” path as well. Figure 3.3 illustrates a node and the three blocks.
Figure 3.3: The three node blocks.
We generate the tree by following the execution of the program. All the instructions that we encounter are placed in the instruction sets (Before, After, After Branch). If we encounter an unconditional jump instruction (jmp) during the execution, we jump to the stated location; the jump instruction itself is not included in the instruction sets. The node instructions are not included in the instruction sets either.
We must make some assumptions to simplify the generation of the tree.
Assumption 3.3. The conditional jump we are looking for, it must not be
a dynamic jump.
Assumption 3.4. There are no dynamic jmp instructions that we must
follow.
Assumption 3.5. The call instruction will always be followed by a ret in-
struction that will give back the control to the calling location.
Assumption 3.6. We can disassemble properly, reading only the “opcode” of the instructions and ignoring their address-based parameters.
The first two assumptions refer to jump instructions that generate the jump target address from a register. Such instructions are difficult to follow, because we would need to execute all the instructions that modify the register being used.
The third assumption is a simplification telling us that it is not necessary to follow the call instructions, because every call should return to the next instruction. If the critical instruction is inside one of the subroutines invoked by a call instruction, we can search that subroutine separately.
The last assumption is needed so that the search algorithm looks only at the constant information and not at information that changes because of diversification. It can be implemented by building a table of how many bytes must be read for the instruction code and how many for the parameters of each instruction (see Section A.1). This protects the algorithm from reading data bytes that happen to have the same hexadecimal value as the instructions we are looking for and interpreting them wrongly.
Table 3.5 shows a small assembly program that we will use as an example of how we create the search tree. Notice that the parameters of most instructions are omitted; we only keep the parameters of the conditional and unconditional jump instructions.
0: addl 7: subl 14: decl 21: addl
1: subl 8: jc 7 15: cmpl 22: addl
2: cmpl 9: addl 16: jc 20 23: imul
3: jc 8 10: addl 17: addl 24: idiv
4: jmp 6 11: incl 18: addl 25: jc 2
5: addl 12: imul 19: cmpl 26: addl
6: cmpl 13: subl 20: jc 10 27: nop
Table 3.5: The example program.
The program execution starts with the addl instruction, followed by subl and cmpl, reaching our first node, jc 8. This will also be our starting node.
The instructions that were found before it (addl, subl, cmpl) are kept as the Before set of that node.
Next, the algorithm expands the node until it finds two new nodes or the end of the program.
It expands the node by first following the instructions that would be executed if the condition of the conditional jump instruction were not met; this creates the After set with the instructions (cmpl, subl), and we find the next node, jc 7.
If you look at the program you will notice that the next instruction is jmp 6; the jump instruction is not added to the sets but is followed instead. Because of this, the fifth instruction, addl, is not added to the set, since it will not be executed in this case.
The last step in expanding the node is to follow the conditional jump as if the condition were met. That means we jump to the eighth instruction, where we encounter the node jc 7 again, but without encountering any instructions before it. The After Branch set of node jc 8 is therefore empty.
Continuing like this, we construct the tree of Figure 3.4. The nodes
coloured blue are loops. We consider a node a loop when it has the same
Before instructions and reaches the same node again. We can see in the tree
that the node jc 7 is repeated many times; it can be reached in
three different ways. The three different approaches to the instruction (jc 7 )
give us three different nodes.
We will represent a node by x : y, where:
• x is the address of the node.
• y is the entrance address via which the node was reached.
The node representation of the first node jc 8 thus becomes 3:0. The
node jc 8 is also encountered as node 3:2 at the bottom of the tree (see Figure 3.5).
The node jc 7 has three different representations, 8:4, 8:8, and 8:7 ; the node jc 20
has two, 16:9 and 16:10 ; the node jc 10 is represented as 20:17
and 20:20 ; finally, the node jc 2 is represented only as 25:21.
With this representation it is easier to detect loops: we only need to com-
pare the address of the node and the entrance address, instead of comparing
the instruction blocks. We must distinguish nodes by their entrance
address, and not only by the node address. This way the same conditional
Figure 3.4: The represented search tree.
Figure 3.5: The represented tree with node representation.
jump instruction, when reached from a different entrance address, will have a
different Before block and thus may give a different heuristic value.
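The x:y representation and the resulting loop test can be sketched as follows; `Node` and `isLoop` are illustrative names, not part of the thesis implementation.

```cpp
#include <cassert>
#include <set>
#include <utility>

// A node of the search tree: the address of the conditional jump and the
// entrance address via which it was reached (the "x:y" notation of the text).
using Node = std::pair<int, int>;  // {address, entranceAddress}

// Loop detection: a node is a loop if the same {address, entrance} pair has
// been expanded before. Comparing two integers replaces the costly
// comparison of whole instruction blocks.
bool isLoop(std::set<Node>& expanded, int address, int entrance) {
    return !expanded.insert({address, entrance}).second;
}
```

Reaching node 8:4 a second time is reported as a loop, while reaching the same address via a new entrance (8:8) is not.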
3.3 An attack algorithm
3.3.1 Best Matching Node Search
Before explaining the algorithm, it is necessary to explain what we mean
when we refer to a node. For our search algorithm, a node is a conditional
jump instruction in the program. Earlier, in Section 3.2.4, we
explained how these nodes are created and how we can represent a program
as a tree by following its execution.
The goal of the Best Matching Node Search algorithm is to find the node
that marks the critical instruction in any program belonging to the diverse
population of the original program. These programs will have the critical
instruction in different locations because of the diversification. The BMNS
algorithm generates a search tree with the conditional jump instructions as
the tree nodes. Afterwards, the algorithm searches the tree to find the node
with the highest similarity to the critical instruction of the program. This
similarity measure is in effect a “fingerprint” of the critical instruction, as
found by the attacker. We create this “fingerprint” from the instructions that
are executed before and after the conditional jump instruction.
This efficient search is meant to be used for automating the attack
against a diverse software population. It should be able to
locate the critical instruction that the attacker needs to patch in each member
of a diverse population of software.
Our aim with this algorithm is to find out whether the harmless snippets are
sufficient to hide the critical instruction from search attacks. If, with good
heuristics, we can easily and reliably detect the location of the critical
instruction, and if the cost of using such an algorithm is small, then the
harmless snippets are incapable of blocking the automation of the cracking
procedure, and thus fail their initial goal.
For the selection of the heuristics we assume that the attacker knows
which instructions are modelled for use as snippets. This way we can avoid
selecting those instructions as heuristics and thus make the similarity mea-
sure more efficient. This assumption is merely for optimisation purposes and
is in no way required for the BMNS algorithm to successfully locate the
critical instruction.
Initialize Q with first Node
WHILE Q IS NOT empty
Take first Node from Q
StartAddress = Node.StartAddress
CurAddress = Node.StartAddress
WHILE Tree[CurAddress] IS NOT empty
Collect Before Block
IF Found a Node THEN
After = ExpandNode(CurAddress + 1)
IF NOT EXIST AfterNode IN Q THEN
Add AfterNode(ExpAddress:CurAddress + 1) to Q
END IF
AfterBranch = ExpandNode(Node.JumpAddress)
IF NOT EXIST AfterBranchNode IN Q THEN
Add AfterBranchNode(ExpAddress:Node.JumpAddress) to Q
END IF
Compute Node.Heuristics
IF Node.Heuristics > BestNode.Heuristics THEN
BestNode = Node
END IF
Add Node to ExpandedQ
Remove Node from Q
END IF
Increase CurAddress
END WHILE
END WHILE
Table 3.6: The BMNS algorithm.
In Table 3.6 you can see the BMNS algorithm in pseudocode. A more
detailed pseudocode version can be found in Appendix A.3, and an
implementation in C++ of the algorithm is available on demand.
The BMNS algorithm is based on the DFS algorithm. We only made
modifications to meet the specifications of our specific problem, and thus it
should retain the same time complexity as the DFS algorithm.
To calculate the time complexity of the BMNS algorithm, we first must
calculate the time complexity of the heuristics function, which is O(n), with
n in the worst case being the maximum size of the blocks. Because the block
size is chosen to be much smaller than the program size, this gives us a
constant time complexity O(1), which we can ignore.
The complexity of expanding one node is again O(n), with n now being
the total length of the node; in the worst case n is the length of
the whole program, if the program has only one node. For the expansion
of all nodes this gives us O(m1 + ... + mi + ... + mn), with n being the
number of nodes and mi the length of each node. Finally, because the sum
m1 + ... + mn is roughly the length of the program, we can write that the
time complexity for expanding all nodes is O(l), where l is the length of
the program.
To complete the time complexity of the algorithm we need to account for
the traversal between the nodes. This time complexity is O(n), with n being
the number of nodes the program has. This gives us a total complexity
of O(n + l), with l being the length of the program and n the number of nodes.
Indeed, the BMNS algorithm retains the same time complexity as the DFS
or BFS algorithms. The complexity calculations were based on [4].
3.3.2 Heuristics
Two fundamental goals in computer science are:
1. finding algorithms with provably good run times, and
2. finding algorithms with provably good or optimal solution quality.
A heuristic is an algorithm that gives up one or both of these goals. For
example, it may usually find pretty good solutions, but there is no proof that
the solutions cannot get arbitrarily bad; or it may usually run reasonably
quickly, but there is no guarantee that this will always be the case. Therefore,
we define a heuristic algorithm as follows:
Definition 3.3. A heuristic algorithm is a programming strategy based on
trial-and-error methods and feedback evaluation. It guarantees neither opti-
mal solutions nor good execution times, but it is often usable in practice due
to its reasonably good results.
3.3.3 Heuristics of Best Matching Node Search
In a non-diverse program population, the critical instruction remains at
the same position and has exactly the same parameters. This makes it
easy to always locate and change it in the same way for the whole population,
and thus allows us to automate the cracking of the program. This is
sometimes called a “global attack” and constitutes the “class break” of the
protection system. Through diversification, on the other hand, each instance
of the program has the critical instruction at a different location, and most
likely with different parameters. This makes the automation of the cracking
more complicated. The crackers can still locate and change the critical
instruction in their own instance of the program, but can they locate it
automatically in all diversified instances?
In this section we will discuss how we create a “fingerprint” using the
assembly instructions which are not affected by the diversification.
We know that the inserted snippets are harmless, and that they
do not actually change the semantics of the assembly program. Hence, when
executed, a snippet must restore the program state back to the state it had
before the snippet started executing (see Section 2.3.4). What snippets do
change is the location of parts of the assembly code. This relocation also
changes part of the assembly instructions, as shown in Section A.1. The
important point is that the actual assembly source remains the same and
merely has noise (the harmless snippets) inserted in between.
Taking into account the above, and the fact that the snippets do not change
the critical instruction or its neighbouring instructions, we can generate a
“fingerprint” that will identify the critical instruction. This is done by fol-
lowing the execution flow of the program. Besides the inserted harmless
snippet instructions, we will also encounter the normal instructions of
the program, which remain the same. Finding those instructions allows us to
generate a “fingerprint” that identifies the critical instruction in all, or
almost all, instances of the diverse population.
To find the critical instruction, we look at the surrounding instructions.
As we saw in Section 3.2.4, there are three different blocks of surrounding
instructions to look at.
Figure 3.6: A node and its surrounding instructions.
For example, in Figure 3.6 we see the instructions around the node 20:17
from the example program in Figure 3.5. If we take those instructions as
heuristics, we get the heuristic blocks of Table 3.7. An asterisk following
an instruction indicates that we ignore the parameters of that instruction,
and an asterisk that separates two instructions indicates that any number of
other instructions may occur between them and will be ignored.
Node read blocks
Before After After Branch
1 addl * pushl * addl *
2 movl * movl * pushl *
3 pushl * subl * exch *
4 movl * popl * movl *
5 movl * cmpl * popl *
6 incl *
Heuristic blocks
Before After After Branch
1 addl * addl * addl *
* * *
2 addl * addl * incl *
* * *
3 cmpl * imul * imul *
* *
4 idiv * subl *
*
5 decl *
*
6 cmpl *
Table 3.7: Example of calculating heuristic value of a
node.
Searching the execution tree of the program, it is very rare to find
a node different from node 20:17 that has exactly the same surrounding in-
structions. This enables us to look for the node 20:17 without knowing its
location in the assembly. Moreover, by relaxing the search criteria and only
looking for the best matching surrounding instructions, we can ignore
possible snippets that have been inserted around and close to the node.
A problem that appears is that if the instructions selected for the heuris-
tics come from an inserted snippet, then in other instances those instruc-
tions will not exist. This can result in several matches,
including false positives.
The three heuristic blocks can be used in a variety of ways, with each
way creating a different heuristics function. We use them by comparing each
of the three heuristic blocks with the equivalent constant blocks that were
expanded around the node. The comparison means that we must find the same
instructions in the constant blocks, in the same order. We could assign a weight
to each found instruction, but mainly we use the order in which the
instructions were found: the first instruction has a magnitude of one, the
second of two, and so on.
The weight parameter of the heuristics can play an important role if one
block can give a much greater value than the other blocks, as the example
below illustrates. We could increase the weights of the Before and
After blocks so that they are closer to the value of the After Branch block. This
protects against finding a node that is similar only in the After Branch block
and not similar in the Before and After blocks.
Another important use of the weights is that if there are multiple
similar nodes that differ only in one heuristic block, then the
weights can increase the value of that specific block, making it more impor-
tant and thereby discarding some duplicate nodes.
As an example, we reconsider the blocks shown in Table 3.7. The Before
block compared with the “Before heuristics” produces a heuristic value of
1 for the addl instruction; the After block with the “After heuristics”
gives a heuristic value of 0, since not even the first addl instruction is found;
and finally the After Branch block gets a heuristic value of 3, for finding
an addl instruction in the beginning and also finding the second instruction
of the heuristics block, the incl, at the end. In total the node has a
heuristic value of 4. The maximum heuristic value that a node could reach
with these heuristics is 6 for the Before block, 10 for the After block, and 21
for the After Branch block; in total a node can reach a heuristic value of 37.
Different heuristics give different maximum values.
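The described comparison can be sketched as follows. One detail left open above is whether a heuristic instruction that cannot be found aborts the match or is simply skipped; the sketch below skips it and keeps searching for the next heuristic instruction, which reproduces the block values 1, 0 and 3 of the example. The function name is illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Heuristic value of one block: match the heuristic instructions as an
// ordered subsequence of the instructions read around the node (parameters
// are already stripped; the "*" wildcards of Table 3.7 are implicit).
// The i-th heuristic instruction found in order (1-based) contributes i.
int blockHeuristicValue(const std::vector<std::string>& heuristics,
                        const std::vector<std::string>& block) {
    int value = 0;
    std::size_t pos = 0;  // current search position inside the block
    for (std::size_t i = 0; i < heuristics.size(); ++i) {
        for (std::size_t j = pos; j < block.size(); ++j) {
            if (block[j] == heuristics[i]) {
                value += static_cast<int>(i) + 1;  // weight = order of match
                pos = j + 1;
                break;
            }
        }
    }
    return value;
}
```

Running it on the three blocks of Table 3.7 reproduces the example: 1 + 0 + 3 = 4 in total.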
Table 3.8 shows the assembly source of a compiled program that
sorts three numbers. Instead of implementing a swap function,
we purposefully repeated the same swap source
code for the three swaps. This way there is a large similarity between the
instructions of the program. We use this example to show what similar nodes
look like, and also to show that even in this case we can find heuristics that
can locate each of the nodes.
0: movl $-1, -4(%ebp) 16: movl %eax, -4(%ebp) 0: movl 16: movl
1: movl -8(%ebp), %edx 17: movl -8(%ebp), %eax 1: movl 17: movl
2: movl -12(%ebp), %eax 18: movl %eax, -16(%ebp) 2: movl 18: movl
3: cmpl %eax, %edx 19: movl -4(%ebp), %eax 3: cmpl 19: movl
4: jge 11 20: movl %eax, -8(%ebp) 4: jge 11 20: movl
5: movl -12(%ebp), %eax 21: movl -12(%ebp), %edx 5: movl 21: movl
6: movl %eax, -4(%ebp) 22: movl -16(%ebp), %eax 6: movl 22: movl
7: movl -8(%ebp), %eax 23: cmpl %eax, %edx 7: movl 23: cmpl
8: movl %eax, -12(%ebp) 24: jge 31 8: movl 24: jge 31
9: movl -4(%ebp), %eax 25: movl -16(%ebp), %eax 9: movl 25: movl
10: movl %eax, -8(%ebp) 26: movl %eax, -4(%ebp) 10: movl 26: movl
11: movl -8(%ebp), %edx 27: movl -12(%ebp), %eax 11: movl 27: movl
12: movl -16(%ebp), %eax 28: movl %eax, -16(%ebp) 12: movl 28: movl
13: cmpl %eax, %edx 29: movl -4(%ebp), %eax 13: cmpl 29: movl
14: jge 21 30: movl %eax, -12(%ebp) 14: jge 21 30: movl
15: movl -16(%ebp), %eax 31: cmpl $-1, -4(%ebp) 15: movl 31: cmpl
32: jne 0 32: jne 0
Table 3.8: The sort three example program with and without
parameters.
Figure 3.7 shows the program of Table 3.8 represented as a tree. In the
tree representation it is easier to notice that all nodes of the program use
only the instruction movl and look very similar to one another.
Figure 3.7: The represented search tree.
The difference lies mainly in the number of movl instructions each
node has. An important remark is that in a real implementation part of
the instruction parameters would be read and give different values to the
instructions. For example, if we look at instructions 5 and 6, we can notice
that they have different parameters, such as a constant versus a register. Those
parameters would not change under diversification and would give different
heuristic values that we could use.
But even without taking the extra parameters into account, we can notice
that the node 14:5 has the greatest number of movl instructions both in
its Before instructions and in its After instructions. This makes that node
distinguishable from the other nodes. Node 32:25 is also unique, for having the
greatest number of movl instructions in its After Branch block. For the remaining
nodes, 4:0 and 24:15, unfortunately any heuristics we select will
lead to duplicate results. The main reason is node 14:5,
which will always get at least the same heuristic value as those two nodes.
3.3.4 Selection of heuristics and automated generation
We have seen how to generate a search tree from an assembly source,
how to search that tree, and that the heuristics should be drawn from the
neighbouring instructions. Next we define more precisely
which instructions should be used as heuristics, and investigate the
possibility of improving our heuristics automatically.
Having only a single instance of the program to be attacked forces us to select
manually which instructions will be used for the heuristics. It is very im-
portant to know that some instructions are more relevant as heuristics than
others. The main reasons for that are:
Lemma 3.1. The rarity of an instruction in the assembly: the rarer it is,
the more useful that instruction will be for the matching.
Lemma 3.2. The knowledge of whether an instruction has been modelled for
snippets: if an instruction has been modelled for snippets and is used in harmless
material, then that instruction is a poor choice for use in heuristics.
Lemma 3.3. A sequence of instructions can be unique. Some
instructions might appear in a specific order that distinguishes them from
any other part of the source. Such a set of instructions should be used
to identify the location.
Let us elaborate on these rules of thumb. The rarer an instruction is,
the lower the probability of encountering it close to a node. That makes
the instruction a good candidate for the search algorithm.
An instruction that is modelled for diversification may easily be found
almost everywhere. Worse, in a different diversified instance
that instruction might not occur at all! If we accidentally use an instruction
from a harmless snippet, because at that point it looked important,
then in a different instance of the program that instruction could be missing,
which could completely disrupt our entire analysis.
The order of the instructions is a very important element. It is
in fact the whole idea of the “fingerprint” search. In an assembly file even the
rarest instruction will likely occur multiple times. But how probable is
it that a part of the source code with a different function than another part
will have the same instructions in the exact same order?
From the above lemmas we can define a characterizing block as follows.
Definition 3.4. A characterizing block is one for which, even if we perform
the BMNS algorithm with heuristics for that block only, we still find the target
node.
That means that the node has a sequence of instructions that is unique
in the whole assembly. When a node has characterizing blocks it is easier to
locate, and such a block should always be used for the heuristics.
It is also interesting to examine the case where the attacker has more
than one diversified instance of the program at his disposal. In that case
the attacker can easily develop an automation tool for the selection of the
heuristics.
For this automation tool the attacker would use the search algorithm
to find the three instruction blocks around the critical instruction. This
procedure would be repeated for each of the different instances the attacker
has. Thus the attacker would obtain the instruction blocks around
the critical node multiple times. These blocks can then be compared with each
other and give an idea of the original instructions they include.
For this comparison a diffing algorithm can be used. The
attacker would use a custom-made diffing tool that generates the
instructions common to the blocks of all instances he has access to.
These common instructions can be used for the selection and improvement
of the heuristics.
A diffing tool used for the purpose of automating the generation of heuris-
tics should have the following characteristics:
1. It must be able to compare from two up to n instances of the
diversified program.
2. It must return only the Longest Common Subsequence (LCS) of the
instances, and not the differences.
It is necessary that the diffing tool is able to compare multiple instances.
The diversified population might contain similar snippets, which will generate
many common instructions around the nodes. This makes the diffing proce-
dure more difficult and demands more instances for the automated creation
of the heuristics. The more instances the attacker has, the more efficient
the automated generation of heuristics will be.
It is important that even with only two instances the diffing can help the
generation of the heuristics by disregarding many snippets.
The attacker will mainly use the diffing as a guideline to obtain efficient
heuristics faster.
For more information on diffing see Section 3.4.1.
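As a sketch of this automated heuristic generation, the attacker could fold a pairwise LCS over the instruction blocks collected from the instances. Iterated pairwise LCS only approximates the n-way LCS (the general problem is NP-hard, see Section 3.4.1), but it already discards most snippet instructions. The instruction sequences below are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Longest common subsequence of two instruction sequences, via the classic
// dynamic programming length table followed by backtracking.
std::vector<std::string> lcs2(const std::vector<std::string>& a,
                              const std::vector<std::string>& b) {
    std::vector<std::vector<int>> L(a.size() + 1,
                                    std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            L[i][j] = (a[i - 1] == b[j - 1])
                          ? L[i - 1][j - 1] + 1
                          : std::max(L[i - 1][j], L[i][j - 1]);
    std::vector<std::string> out;
    for (std::size_t i = a.size(), j = b.size(); i > 0 && j > 0;) {
        if (a[i - 1] == b[j - 1]) { out.push_back(a[i - 1]); --i; --j; }
        else if (L[i - 1][j] >= L[i][j - 1]) --i;
        else --j;
    }
    std::reverse(out.begin(), out.end());
    return out;
}

// Fold the pairwise LCS over all instances: the result is a common
// subsequence of every instance (though not necessarily the longest one),
// which filters out most snippet instructions around the critical node.
std::vector<std::string> commonInstructions(
        const std::vector<std::vector<std::string>>& instances) {
    if (instances.empty()) return {};
    std::vector<std::string> common = instances[0];
    for (std::size_t k = 1; k < instances.size(); ++k)
        common = lcs2(common, instances[k]);
    return common;
}
```

With three hypothetical instances whose blocks share the core (cmpl, jc, addl) but contain different snippet noise, the fold recovers exactly that core.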
3.4 A comparison algorithm
3.4.1 Longest Common Subsequence
The longest common subsequence problem is about finding the largest
common part of two or more sequences. In most of its applications
it is used with two sequences.
The LCS problem is NP-hard when comparing n different sequences.
There exist algorithms that find the LCS of two sequences in
polynomial time [14], but the general problem for n sequences is solved in
exponential time. The recursive algorithm shown in Table 3.9 solves the LCS
problem for two sequences in exponential time.
FUNCTION lcs(x, y)
n = length(x), m = length(y)
IF n = 0 OR m = 0 THEN RETURN “”
best = lcs(x[1, n - 1], y[1, m])
IF length(best) < length(lcs(x[1, n], y[1, m - 1])) THEN
best = lcs(x[1, n], y[1, m - 1])
END IF
IF x[n] = y[m] AND length(best) < length(lcs(x[1, n - 1], y[1, m - 1])) + 1
THEN best = lcs(x[1, n - 1], y[1, m - 1]) + x[n]
END IF
RETURN best
Table 3.9: A recursive LCS algorithm [1].
To solve the LCS problem in polynomial time, dynamic programming
must be used; a polynomial-time algorithm is shown in Table 3.10. Even
though this algorithm has polynomial time complexity, it unfortunately
needs a lot of memory: the memory space grows quadratically, Θ(n²). When
comparing texts the algorithm can be improved by the use of hashing tables,
which improves both the speed and the memory requirements.
Hashing replaces the strings with numbers. This reduces the memory
needed by the algorithm, which only has to handle numerical identities
of the strings. It also improves the execution time, because instead of
comparing text the algorithm compares numbers, which is much faster.
FUNCTION lcs(x, y)
n = length(x), m = length(y)
FOR i = 0 TO n
FOR j = 0 TO m
IF i = 0 OR j = 0 THEN
table[i, j] = “”
ELSE IF x[i] = y[j] THEN
table[i, j] = table[i - 1, j - 1] + x[i]
ELSE IF length(table[i - 1, j]) >= length(table[i, j - 1]) THEN
table[i, j] = table[i - 1, j]
ELSE
table[i, j] = table[i, j - 1]
END IF
END FOR
END FOR
RETURN table[n, m]
Table 3.10: A dynamic LCS algorithm [1].
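The hashing idea described above can be sketched as follows: instruction strings are interned to integer identities once, and the quadratic table then stores only lengths (as in Table 3.11) rather than strings. This is an illustrative sketch with hypothetical helper names, not the thesis implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Replace instruction strings by numerical identities before running the
// dynamic LCS, so the inner loop compares integers instead of strings.
std::vector<int> hashTokens(const std::vector<std::string>& tokens,
                            std::unordered_map<std::string, int>& ids) {
    std::vector<int> out;
    for (const auto& t : tokens) {
        // emplace is a no-op if the token was seen before
        auto it = ids.emplace(t, static_cast<int>(ids.size())).first;
        out.push_back(it->second);
    }
    return out;
}

// Length table of Table 3.11: L[i][j] is the LCS length of a[0..i) and b[0..j).
int lcsLength(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<std::vector<int>> L(a.size() + 1,
                                    std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            L[i][j] = (a[i - 1] == b[j - 1]) ? L[i - 1][j - 1] + 1
                                             : std::max(L[i][j - 1], L[i - 1][j]);
    return L[a.size()][b.size()];
}
```

On the second example of Table 3.12 (A = ABCD, B = ACBDC) the length computed this way is 3, matching the length table of Table 3.13.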
Another optimization that is often used is to first compare the beginning
and ending parts of the sequences, and then run the algorithm only on the
part where differences occur. This optimization reduces the size of
the input, and for large sequences with small differences it can give a much
faster execution. In the worst case, when the first and last elements of the
sequences already differ, it only costs two extra comparisons.
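This trimming step can be sketched as follows; `trimCommonEnds` and `TrimResult` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Strip the common prefix and suffix of two sequences before diffing, so the
// quadratic LCS table is only built for the middle part where they differ.
// Returns the half-open ranges [begin, endA) and [begin, endB) still to compare.
struct TrimResult { std::size_t begin, endA, endB; };

TrimResult trimCommonEnds(const std::vector<std::string>& a,
                          const std::vector<std::string>& b) {
    std::size_t begin = 0;
    while (begin < a.size() && begin < b.size() && a[begin] == b[begin])
        ++begin;
    std::size_t endA = a.size(), endB = b.size();
    while (endA > begin && endB > begin && a[endA - 1] == b[endB - 1]) {
        --endA;
        --endB;
    }
    return {begin, endA, endB};
}
```

The guards against `begin` keep the two passes from overlapping when one sequence is a prefix or suffix of the other.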
The dynamic programming algorithm actually keeps an M×N array L
that contains the lengths of the common subsequences of the two sequences.
The array is filled following the recursive definition shown in Ta-
ble 3.11. It takes O(mn) time to fill the array, and the last element L[m, n]
contains the total length of the LCS.
L[i, j] = 0 if i = 0 or j = 0
L[i, j] = L[i - 1, j - 1] + 1 if i, j > 0 and ai = bj
L[i, j] = max(L[i, j - 1], L[i - 1, j]) otherwise
with 0 ≤ i ≤ m and 0 ≤ j ≤ n,
and a1a2...am, b1b2...bn the sequences to compare
Table 3.11: Calculating the length array L.
Table 3.12 shows two examples of pairs of sequences and their longest
common subsequences. We notice that even in small examples the prob-
lem has multiple solutions.
Using the dynamic LCS algorithm and the second example from Table 3.12,
we can construct Table 3.13, which displays the LCS length table. Each number
is the length of an LCS of the corresponding prefixes. To read this table and
find the LCS, one just follows where the number changes. For example, L[1, 1]
equals 1, which gives us the element A; the next change is at L[2, 3], where the
cell value equals 2 and the element is C; finally, the last element, D, is given by
L[4, 4], which equals 3.
Example 1. Sequences A = CBADAABCC, B = ABBCADACBDDBA
Longest Common Subsequences = BADAC, BADAA,
BADAB, CADAA, CADAB, . . .
Example 2. Sequences A = ABCD, B = ACBDC
Longest Common Subsequences = ABC, ABD, ACD
Table 3.12: An LCS example.
We must note that most algorithms in use only find and report the
first LCS they encounter. Finding all the LCSs contained in
the length table requires an extra algorithm with time complexity
O(mn) [14].
The LCS is further used to generate the Shortest Edit Script (SES) (Ap-
pendix A.2), which is the smallest script that transforms one sequence into
another. Generating the SES automatically is equivalent to generating the
LCS. Usually, “diffing” refers to generating the SES.
0 1 2 3 4
A B C D
0 0 0 0 0 0
1 A 0 1 1 1 1
2 C 0 1 1 2 2
3 B 0 1 2 2 2
4 D 0 1 2 2 3
5 C 0 1 2 3 3
Table 3.13: A length table generated by the dynamic
LCS algorithm.
For further reading on the Longest Common Subsequence problem, see
[1], [11], [14], and [15].
3.4.2 Longest Common Subsequence at a Diverse Population
An algorithm that finds the LCS of multiple instances could be used to
attack a diverse population of software. The idea of this attack is that
extracting the LCS would be equivalent to extracting the original source code.
For the LCS to be equal to the original source code, we assume that all the
differences in the diverse population are introduced by the use of snippets.
That software can then be cracked and published, or someone could make
a patch with the SES that would transform any instance into the cracked LCS
version.
The first thing we should ask is whether such an algorithm is possible.
Usually diffing is applied to two or at most three instances. An algorithm for n
instances is feasible, but because the problem is NP-hard, as shown in
Section 3.4.1, it will run slowly. That, however, is not the real problem
of this cracking method. The algorithm could be executed until comple-
tion on a suitably powerful machine, requiring neither monitoring nor user
intervention. Even if the algorithm took a few days to execute, it
would still yield the desired result.
Using the cracked LCS can be prevented by calculating a checksum for
every diverse instance; this checksum depends on the diversity of
the instance. The checksum can then be verified to ensure that the correct
diverse instance is being executed. This protection could be applied by and for
the libraries of the application, making it difficult to modify all of them.
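A sketch of such an instance-dependent check, assuming a simple FNV-1a checksum over the instance's code bytes (the thesis does not prescribe a particular checksum function, and the function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Instance-dependent integrity check: every diversified instance stores the
// checksum of its own code bytes, so a patch derived from the LCS of the
// population will fail verification. FNV-1a is used purely as an example.
std::uint32_t fnv1a(const std::vector<std::uint8_t>& code) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (std::uint8_t byte : code) {
        hash ^= byte;
        hash *= 16777619u;  // FNV prime
    }
    return hash;
}

// At run time the instance re-hashes its code and compares the result
// against the checksum embedded at diversification time.
bool instanceIntact(const std::vector<std::uint8_t>& code,
                    std::uint32_t expected) {
    return fnv1a(code) == expected;
}
```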
Still, the above approach can be successfully attacked. There is also another
technique that can easily be used to protect the software from LCS extraction.
As we have seen, the inserted snippets are harmless, but their individual in-
structions are harmful. A small number of similar snippets could be inserted
at the same locations in each instance, with the purpose of modifying
the LCS and inserting truly harmful instructions into it. Table 3.14 shows
two snippets that, when an algorithm extracts the LCS, will leave a
trace in it and thereby turn from harmless into harmful.
Considering the similarity between the snippets, there is a signif-
icant probability that some of the instructions contained in the snippets will
be included in the LCS. Of course this will destroy the program, by leaving
behind instructions that are harmful.
Snippet 1 Snippet 2 Snippets’ LCS
pushl %eax exch %eax, %ebx
movl %ebx, %eax pushl %ebx
exch %eax, %ebx movl %eax, %ebx exch %eax, %ebx
popl %eax popl %eax popl %eax
Table 3.14: Two similar snippets and their LCS.
So we can conclude that implementing an LCS algorithm for n instances
is possible, but the result of the LCS algorithm will not be suitable for
attacking the diverse software.
Chapter 4
Experimental results
4.1 Diverse snippet generation by genetic computing algorithm
4.1.1 Predefined initial population vs. random initial population
Figure 4.1 shows an experiment with the genetic computing algorithm using
a random initial population of one hundred snippets, fifty generations, a
thirty percent crossover rate, and a thirty percent mutation rate. The randomly
generated snippets have a fixed length of three. We select this length for the
initial population so that the insertion crossover operator does not yield
overly long harmless snippets. Figure 4.2 shows an experiment under the same
conditions, but with a predefined initial population of twenty snippets. Each of
these snippets is harmless and manually created with a random length. We can
see from the number of different and harmful snippets that the behaviour of
the GA differs between the predefined and the random initial population.
Even though most of the randomly generated snippets in the initial population
are harmful, after several generations the genetic computing algorithm
significantly decreases the number of harmful snippets. Also, based on the
curves of the best and average fitness values, the GA is improving
the snippets step by step.
Figure 4.1: Random initial population
Figure 4.2: Predefined initial population
A difference between the randomly generated and the predefined initial
population is that, for the latter, the curves of different snippets and of
harmful snippets fluctuate more. This is reasonable, because the structure
and combination of instructions in the predefined initial population are more
complex than in the short, randomly generated initial population.
4.1.2 Tuning the parameters of the genetic computing algorithm
We tested the influence of the genetic operators based on different combi-
nations of operator rates. All results were obtained under the same conditions:
fifty generations, a random initial population of one hundred snippets with
fixed length four, and the same fitness function. All experiments yielded
reasonable harmless snippets.
Figure 4.3: Crossover rate 0.05%, mutation rate 0.5%
Comparing the results of Figure 4.3 and Figure 4.4, it is surpris-
ing to find that the run with a low crossover rate and a high rate of
mutations has the better average fitness; the average fitness even converges
to the best fitness value. Judging from how quickly the number of
harmful snippets decreases, the crossover operators indeed have more influence
on the health of the snippets. This suggests that our design of the random
shift mutation is not effective enough.
Figure 4.4: Crossover rate 0.5%, mutation rate 0.05%
4.1.3 Conclusion and weakness
Using a strategic combination of early stopping (when a threshold of dis-
tinctness in the population is reached) and careful adjustment of the fitness
function and genetic operators, the genetic computing algorithm is shown to
be suitable for creating diverse snippets.
As mentioned in Section 2.3.8, a genetic algorithm, as an optimising search
method, has the tendency to converge to one best result. According to
our experiments, the difference in the population drops significantly in the
first five generations. By tuning the parameters and fitness function of the
genetic algorithm, we can slow down this process and even bring more diversity
to the next generations. But by the nature of convergence, this diversity orig-
inates from a few individuals and eventually leads to limited snippet diversity.
For instance:
Snippet01          Snippet02          Snippet03
pushl %ebx         nop                nop
exch %eax, %ebx    pushl %ebx         pushl %ebx
pushl %ebx         exch %eax, %ebx    exch %eax, %ebx
movl %eax, %ecx    pushl %ebx         pushl %ebx
nop                exch %eax, %ebx    exch %eax, %ebx
popl %eax          movl %eax, %ecx    movl %eax, %ecx
pushl %ebx         movl %eax, %ecx    movl %eax, %ecx
movl %eax, %ecx    movl %eax, %ecx    movl %eax, %ecx
nop                popl %eax          movl %eax, %ecx
exch %eax, %ebx    popl %ebx          nop
movl %eax, %ecx                       popl %eax
nop                                   popl %ebx
popl %eax
popl %ebx
Table 4.1: Generated snippets.
Table 4.1 shows three snippets that were automatically generated by the genetic computing algorithm after fifty generations. These snippets are clearly similar and may descend from the same individual.
We implemented only a few instructions because of the complexity of the chain effects of assembly instructions. It is reasonable to hypothesize that introducing more complex instructions would decrease the rate of successfully generating harmless snippets but would increase the diversity of the snippets.
4.2 Attacking diversified software containing snippets
4.2.1 Best Matching Node Search experimental results
The following experiments were performed to evaluate the effectiveness of protecting software through diversification by inserting harmless code snippets. It is important to know whether the simple harmless snippets that we modelled can actually make a difference in protecting software.
For the experiments we used a number of different tools.
1. A genetic computing algorithm utility, used for generating diverse harmless snippets.
2. A snippet insertion utility, which takes the generated snippets and inserts them randomly into an existing assembly file. This utility also transforms the assembly labels into line numbers, as a compiler would transform labels into addresses.
3. A custom diffing utility, used for the automated generation of heuristics for each node of the assembly file.
4. A search utility, which performs the BMNS algorithm of Section 3.3.1.
All four utilities were implemented; their source code can be found in the Appendix.
The goal of our experiments was to locate the same conditional jump instructions in a diversified program population with the use of the BMNS algorithm.
We used four different snippet libraries of ten snippets each, generated by the genetic computing algorithm utility. We also took four different source codes and compiled them without assembling to obtain their assembly. Using the snippet insertion utility we generated ten different instances of each of the assemblies with each of the four libraries. This gave us sixteen different sets of diversified populations, each containing ten differently diversified programs. The snippets were inserted randomly into the assemblies. The number of snippets was set between 100 and 150; exactly how many snippets were inserted into each assembly was random.
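The random insertion step can be sketched as follows. This is a minimal illustration of the snippet insertion utility described above; the function and parameter names are our own, and the label-to-line-number translation is omitted.

```python
import random

def insert_snippets(asm_lines, snippet_library, low=100, high=150):
    """Insert a random number of snippets (between low and high),
    each drawn from the library, at random positions in an assembly listing."""
    diversified = list(asm_lines)
    for _ in range(random.randint(low, high)):
        snippet = random.choice(snippet_library)
        position = random.randrange(len(diversified) + 1)
        diversified[position:position] = snippet
    return diversified
```

Each call produces a differently diversified instance of the same assembly file, which is how the sixteen sets of ten instances were generated.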
The four generated libraries all contained “Type A” snippets (Section 2.2.5). The snippets differed in size, in the instructions used and in the order of the instructions.
Assembly File   File 1    File 2    File 3    File 4    Snippet Results
Snippet         9.00      10.00¹    10.00     9.00
Library 1       6.67¹,²   8.66¹     5.00²     10.00¹
                10.00     10.00     10.00     8.00
                10.00     3.25¹,²   10.00     10.00¹    8.72
Snippet         10.00     10.00¹    10.00     10.00¹
Library 2       10.00     6.50²     5.00²     5.00¹,²
                10.00     10.00     10.00     2.00²
                7.00      10.00¹    10.00¹    8.75¹     8.39
Snippet         5.00¹,²   10.00     10.00¹    10.00
Library 3       10.00¹    10.00     5.00²     9.00
                9.00¹     10.00     10.00¹    5.00¹,²
                9.00      10.00¹    10.00¹    9.50¹     8.84
Snippet         10.00     10.00¹    10.00     10.00¹
Library 4       5.50¹,²   5.00²     10.00     10.00
                10.00¹    10.00¹    4.00¹,²   10.00
                10.00     10.00¹    10.00     10.00¹    9.03
File Results    8.82      8.96      8.69      8.52      Total = 8.75
All the results are out of ten. ¹ Manually modified heuristics. ² Identical nodes exist.
Table 4.2: The experimental results with “Type A” snippets.
For each of these sixteen sets we randomly selected four different nodes, which we considered to contain a critical instruction. We then used the heuristics automatically generated by the diffing utility to try to locate each node in each diverse assembly. If the automated heuristics located the node at least nine out of ten times, we kept that result; otherwise we tried to improve the results by modifying the heuristics manually. The results of how often we located a node using the BMNS algorithm can be seen in Table 4.2.
The automated heuristics were generated by looking at the node from only one calling block. Some nodes may have multiple calling blocks, and for those nodes any of the calling blocks could be used to generate the heuristics. Usually, before modifying the automated heuristics, we first tried the automated heuristics from the rest of the calling blocks.
Whenever the automated heuristics were not sufficient to locate the correct node, we modified them manually. The modified heuristics are marked in the table with ¹.
The results show how many times the BMNS algorithm found the correct node out of the ten diverse instances. When a result is a decimal number, it means that the BMNS algorithm found other duplicate nodes in addition to the correct node. This behaviour is observed for the following reasons:
1. The heuristics are not adequate.
2. The node looks very similar to, or even is identical with, other node(s) in the program.
3. The snippet insertion destroys the “fingerprint” of the node.
For the results in Table 4.2 the heuristics used are the best possible; the results are close to optimal. We have marked with ² the results that come from a node that has identical or very similar other nodes. In those cases we usually find the critical node in all instances, but we also get other nodes with the same heuristic values. When there exists one node identical to the node we search for, the BMNS algorithm finds both nodes with the same heuristic value, and that gives us a score of five out of ten. When a node is called from a different previous block we consider it a different node; because of that we sometimes find the wanted node more than once, which increases the success results.
Tables 4.3 and 4.4 show how many times we increased the success results of the BMNS algorithm by modifying the automated heuristics, and how many identical nodes we encountered. Table 4.3 groups these results by assembly file and Table 4.4 by snippet library.
Assembly File File 1 File 2 File 3 File 4
# of identical nodes 3 3 4 3
# of manually modified heuristics 6 9 5 9
All the results are out of sixteen.
Table 4.3: The assembly file results.
Snippet Library Library 1 Library 2 Library 3 Library 4
# of identical nodes 3 4 3 3
# of manually modified heuristics 6 6 9 8
All the results are out of sixteen.
Table 4.4: The snippet libraries results.
Figure 4.5: Average results by assembly file.

Figure 4.5 shows the average success results of the BMNS algorithm depending on the assembly file used. The magenta coloured line is the total average success of the algorithm. We notice that assembly file four has the worst results. The node used for the seventh experiment on that file had another four identical nodes; because of that, its success is only two out of ten. That bad result affected the assembly file's average significantly.
Figure 4.6 displays the average success results depending on the snippet library used. Again, the magenta coloured line represents the total average success of the algorithm. The results show that snippet library two has the smallest success. This result, too, is strongly affected by the identical nodes at the seventh experiment of assembly file four. Looking at Table 4.4, we notice that snippet library two had one more identical node than the rest of the snippet libraries.
Figure 4.6: Average results by snippet library.

In Figure 4.7, the magenta coloured line represents the total average success, and the yellow coloured line represents the average success of the individual file. We notice that around four of the sixteen nodes of each file have a success value below the average. The results are similar for the snippet libraries (Figure 4.8).
Overall, the success results are affected more by the number of identical nodes than by the actual characteristics of the “Type A” snippets (Section 2.2.5). The assembly files have on average three identical nodes in the sample of sixteen random nodes. The remaining nodes could be located with a very good success rate.
A last observation about the above experiments: more than 80 percent of the time, the snippets were inserted within at least one of the three blocks of the critical node.
Figure 4.7: Results of the assembly files.

Figure 4.8: Results of the snippet libraries.

The results in Table 4.5 were obtained by performing the experiments with a snippet library that uses “Type B” snippets (Section 2.2.5). This library changes the control flow graph of the program, which creates some extra difficulties for the attack. First, it adds “fake” nodes, increasing the size of the tree significantly. Also, if an added snippet is inserted near the critical node, the information of the heuristic block there is mangled. We can see in Table 4.5 that the success of the BMNS algorithm decreased for some nodes, while other nodes were less affected. We elaborate more on the influence of snippets in Section 4.2.2.
“Type B” snippet library
File 1    File 4
9         4.5¹
9¹        9¹
9         6.5¹
9         6.67      Total = 7.83
All the results are out of ten. ¹ Manually modified heuristics.
Table 4.5: The experimental results with “Type B” snippets.
Finally, we experimented with a snippet library that contains “Type C” snippets (Section 2.2.5). This means that the snippets imitate the instructions surrounding the critical node. The problem with these snippets is that they are not harmless, and it is not known whether they can be made harmless by a genetic computing algorithm while still keeping their similarity to the critical node instructions.
This experiment was performed on a specific node that had characterizing blocks (Section 3.4) and performed well with the previous snippet libraries. The node was successfully located 5.66 times out of 10 with heuristics selected by looking at only one diverse instance, and 8.33 times out of 10 with heuristics taken after comparing the node across five diverse instances. As can be seen from Table 4.6, the results do not improve if we compare more than five diverse instances. We also note that comparing only two instances produces better information than comparing four instances. This could be the result of comparing against a bad instance containing many snippets.
# instances compared    1      2      4      5      7      10
Result                  5.66   8      7      8.33   8.33   8.33
Table 4.6: The experimental results with “Type C” snippets.
4.2.2 Weaknesses
The BMNS algorithm performed very well with “Type A” snippets 2.2.5.
But when the insertion of “Type B” snippets 2.2.5 was tested we observed
a decrease to the performance of the algorithm. We can hypothesise that a
node is affected more from inserting “Type B” snippets 2.2.5:
Hypothesis 4.1. When a node has only one characterizing block 3.4.
Hypothesis 4.2. When the node’s After Branch block is part of one of the
other block.
Hypothesis 4.3. If all three blocks get mangled from the insertion of the
snippets.
The first hypothesis says that when we locate the node because one specific block is unique, and the other two blocks do not actually influence the result of the BMNS algorithm, then the probability that a snippet destroys the node's fingerprint is high.
The second hypothesis covers the case where the After Branch block is the same as one of the other blocks. That means that mangling one of the two blocks by inserting a snippet also mangles the other block.
The third hypothesis is a rare case for random snippet insertion. But in general, if the snippets mangle all three blocks, by being inserted into each block, then that node no longer has recognisable instructions in its surrounding blocks.
To better overcome the difficulties created by snippets with conditional jumps, the BMNS algorithm could prune the search tree by assuming that it looks for a specific conditional jump as the critical instruction. This would prune many of the nodes in the tree, leaving the remaining conditional jump instructions available for heuristic calculation.
Another modification that could be implemented in the BMNS algorithm is to keep a short history of the previously visited nodes; using that history, the algorithm can calculate heuristics by combining blocks. This modification would protect against the destruction of the blocks and would improve the results when the inserted snippets use conditional jump instructions.
We saw in the experiments that inserting snippets that are similar to the critical node and use the same conditional jump as the node affects the result of the algorithm significantly (Table 4.6). The reasons are the same as for inserting snippets with conditional jump instructions, as explained above, but this specific case introduces an extra difficulty.
Hypothesis 4.4. Snippets that imitate the critical instruction blocks affect the BMNS algorithm more.
Adding “Type C” snippets (Section 2.2.5) can make other nodes more similar to the heuristics than the original node is after it has been affected by snippets. Even though our experiment showed that the node is located successfully some of the time, it is very probable that with a better insertion function, one that does not insert the snippets randomly, the BMNS algorithm will have difficulty locating the correct node among the best nodes.
Finally, if a diversification model like the one presented in Section 5.2.2 becomes functional, the BMNS algorithm will fail to locate the critical instruction. Such a diversification model could only be attacked by a program that analyses the execution and understands the use of the instructions. An algorithm like that would actually crack every instance of any program protected by a method the algorithm knows.
The fact that the BMNS algorithm sometimes locates duplicate nodes, which exist in the original source code, is not an actual weakness of the algorithm. Even if the algorithm finds a large number of similar nodes, as long as it locates the critical instruction within those nodes, the crack could patch all of the nodes, producing multiple cracked programs. Some of those programs would be negatively affected by the patch, but a simple execution of each of them will reveal which is the correctly cracked program.
4.2.3 Conclusions
After experimenting with diverse program instances, we concluded that targeted diversification can improve software protection under certain conditions. The BMNS algorithm presented in Section 3.3.1 shows that it can locate the critical instructions in a diverse population most of the time.
The experiments clearly showed that snippets that do not alter the CFG (Section 3.2.2) of the program do not hinder the automation of cracking at all. Altering the CFG complicates things for the attackers, but not enough. As long as the original semantics remain unchanged inside the program, the attackers can find ways to locate them. Diversification should attempt to alter real parts of the source code, changing not only the CFG but also the content of the binary code.
Looking at software defence from another perspective, one could argue that a patch that partly disassembles the program and generates a CFG-like tree to locate the position to be patched is an expensive attack. But this depends much on the protected software. Small, cheap utilities distributed through the internet could benefit from such protection, while attackers will certainly use this more advanced patching system against expensive specialized applications. There also exist crackers (white hats [18]) who attack software not for profit, but to demonstrate that all systems can be cracked and to find vulnerabilities in security systems.
Chapter 5
Conclusions
5.1 General Conclusions
Our work illustrates that the arms race between the “Defenders” and the “Attackers” is a never-ending game of evolving techniques to either protect or attack. Reality shows that no matter what protection systems are developed, after a short period someone will develop an attack method to defeat the new protection measures.
Because genetic algorithms are mostly used as optimization algorithms, their characteristic convergence will eventually destroy the diversity of the snippet population. Using a strategic combination of early stopping (when a threshold of distinctness in the population is reached) with careful adjustment of the fitness function and genetic operators, the genetic computing algorithm is shown to be suitable for creating diverse snippets. Although only a limited number of instructions is implemented in our CPU model, due to the complications of the symbolic representation, reasonably diverse and harmless snippets were produced from an initial population of basic snippets. Combining these results with the results of the attack chapter, one important point is clear: snippets generated without considering instruction homogeneity with the critical node offer only minimal resistance against the BMNS algorithm.
The BMNS algorithm is a new attack system that specifically targets protection systems that use diversification to hide the critical instruction of software. It is capable of searching in a simplified CFG and uses a “fingerprint” to locate the critical instruction. The empirical results in Section 4.2.1 show that diversification can be attacked by using this advanced search method. In the previous year's thesis [20], it was claimed that diversification protects against search algorithms. This is true only as long as the search algorithm merely scans the diversified file for specific bytes.
Based on our empirical experience, we can say that snippets which do not modify the CFG of the program perform poorly. To make diversification a realistic protection method, the snippets have to modify the CFG of the program. Even then, the BMNS algorithm showed resilience and managed to perform well enough.
We also experimented with snippets that both imitate the critical instruction and modify the CFG. The experiments we performed with “Type C” snippets revealed the necessity of a targeted snippet insertion function instead of the simple solution of a random insertion function.
Diversification is already evolving towards methods that modify and obfuscate the CFG, so the naïve BMNS algorithm will lose the information required to identify the critical node [19]. But analysis tools are also improving to uncover the effects of these methods [6].
5.2 Further work
5.2.1 For BMNS algorithm
Implementing and testing the BMNS algorithm suggested possible improvements and additions. Some of the additions might only improve the results against the current diversification methods, but there is one interesting improvement that would evolve the BMNS algorithm and threaten further, more advanced diversification systems.
Namely, the BMNS algorithm should be able to keep a history of visited nodes. This way it could merge basic blocks that were separated by the insertion of snippets that extend the CFG by inserting opaque control flow transfers.
This modification would extend the BMNS algorithm to overcome snippets that use conditional jumps to change the CFG. Even targeted snippets inserted directly to protect the critical node would be bypassed by this addition.
5.2.2 A new diversification model
Diversification is a promising approach to protect software. We believe that with an appropriate diversification method, the automation of cracking could be thwarted to a great extent.
We want to propose a diversification system that actually changes the structure of the software. This can be accomplished at the compilation level. To develop software one can use different high-level programming languages and different compilers, resulting in a diverse population of program instances that all have exactly the same input/output behaviour. The system we propose is based on a single high-level source code that can be compiled in many different ways, generating functionally equivalent but structurally very different binaries.
Figure 5.1: Diversified compiler model.
Figure 5.1 presents a schematic of the proposed diversifying compiler model. Each time the compiler needs to compile a high-level instruction, it selects from a pool of different assembly implementations, either randomly or with more advanced methods. This comes down to compiling one high-level program into many different binaries.
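The selection step of this model can be sketched as a toy "compiler" that maps each high-level operation to a pool of semantically equivalent assembly sequences and picks one at random per compilation. All names and the pool contents are hypothetical illustrations, not part of an actual implementation.

```python
import random

# Each high-level operation maps to semantically equivalent
# assembly implementations (a tiny hypothetical pool).
IMPLEMENTATIONS = {
    "copy eax->ebx": [
        ["movl %eax, %ebx"],
        ["pushl %eax", "popl %ebx"],
        ["xorl %ebx, %ebx", "addl %eax, %ebx"],
    ],
}

def diversify(program):
    """Compile a list of high-level operations, choosing a random
    equivalent implementation for each one."""
    binary = []
    for op in program:
        binary.extend(random.choice(IMPLEMENTATIONS[op]))
    return binary
```

Repeated compilations of the same program then yield structurally different but functionally equivalent instruction sequences.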
A diversified population generated by a compiler like this will most likely not have many common patterns for an automated crack to locate and patch. This diverse population cannot be achieved with our current state-of-the-art snippets, because here the diversification works on parts of the software itself.
Some easy strategies to implement in a compiler that generates a diverse population could be: randomly ordering the subroutines, changing the registers used, randomly locating the program data, using inverted conditional jumps, etc. It is interesting to note the difficulty of modifying the control flow graph: none of the mentioned strategies actually modifies its structure. The strategies should aim to destroy as much commonality as possible.
Attacking such a diversified population would require either a search algorithm that understands the meaning of the instructions and automatically performs analysis and tampering, thus actually automating the crack procedure and resembling a universal cracker, or a very effective decompiler that recovers source code close to the original high-level source code, so that a high-level semantic analysis can recognize which instruction is the critical one.
5.2.3 Homogeneity of instructions in a snippet
At this moment, the generation of diverse snippets by our genetic computing algorithm has little relation to the original code. The initial fitness value of each instruction could be based on the statistical properties of the original code, but the information about the structure and combination of instructions is not yet taken into account. Considering the attack results in the worst scenario (Section 4.2.1), only snippets that closely resemble the Before, After and After Branch blocks of the critical node result in a significant increase in the cracking cost. Hence, to maximize the efficiency of snippets in obstructing crackers, the homogeneity of instructions in a snippet must strongly influence an individual's fitness.
We introduce a strategy based on this weakness of the BMNS algorithm.
Figure 5.2: Homogeneity encouragement strategy.

Figure 5.2 shows part of the assembly instructions from the critical node used by the BMNS algorithm. We defined an encouragement strategy according to the different levels of homogeneity. In this scheme, if the auto-generated snippet has the same structure as the original assembly code, a large bonus (×6) is assigned to the fitness value. If only part of the snippet satisfies these requirements, a smaller bonus is assigned. In this way, more information from the original code contributes to the fitness value. Through the evolution of the genetic algorithm, the population will then tend to converge to homogeneous snippets.
We could measure the similarity of the snippets to the structure of the original assembly by using an LCS algorithm [12].
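As a sketch, the homogeneity score could be the LCS length between the snippet's instruction sequence and the critical node's, normalised by the length of the original block. The LCS computation is the standard dynamic programming algorithm; the normalisation and all names are our own assumptions.

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest common subsequence length."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def homogeneity(snippet_ops, original_ops):
    # similarity in [0, 1]; could scale the fitness bonus of Figure 5.2
    return lcs_length(snippet_ops, original_ops) / max(len(original_ops), 1)
```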
Appendix A
Appendix
A.1 Assembly instructions in diversified software
The assembly instructions have a specific format. This section explains that format and shows that only a small part of an assembly instruction is modified when we use diversification. Because of that, we can use heuristics to locate a specific instruction in the assembly.
Figure A.1 shows the most general format of an assembly instruction. Every assembly instruction contains a subset of the shown sections and always contains the “Opcode” part.
Figure A.1: Assembly instruction format.
First we must note that an assembly instruction is a series of bytes, which we represent in hexadecimal format. There also exists a symbolic representation that is easier for humans to read and understand.
Each part of an assembly instruction has a specific size range in bytes. The first part, the “Instruction Prefixes”, is not commonly used any more. When present, it is one to four bytes long, and diversification does not modify it.
The second part of an assembly instruction is the most important: the “Opcode”. It specifies which instruction is going to be executed and varies in size from one to three bytes. Sometimes three additional opcode bits are encoded in the next part of the instruction. This part is not changed by diversification, and it is the part we will mostly use in heuristics to locate the critical instruction.
The third part of an assembly instruction is the “ModR/M” byte. It defines which register the instruction uses or which type of memory addressing the instruction will use. Some encodings also need a second byte, called “SIB”. This fourth part is connected with the “ModR/M” part and only appears if the “ModR/M” part requires it. These two parts are an extension of the “Opcode” and should also be used for the heuristics. There is a possibility that diversification changes part of this code by changing the distance an instruction jumps, which would force a change from a short jump to a long jump. Because we do not look at the jump instructions but only follow them, this specific modification does not affect our method.
The fifth part is the “Displacement”. It tells the instruction which memory location to access and is zero, one, two or four bytes long, depending on how far away the necessary data lies. This is the main part of the instruction that is changed by diversification; the remaining parts stay the same.
The last part of an assembly instruction is the “Immediate”, which is similar to the “Displacement” part. It can have a length of zero, one, two or four bytes and holds either data or a constant address from the source code. If it contains an address, it will be modified by the diversification.
For both the “Displacement” and the “Immediate” part, the size is defined in the “ModR/M” part of the instruction, so reading that part tells us how to skip this information.
Let us illustrate with an example how the instruction mov eax, ebx (Intel representation) looks in its hexadecimal format. The eax and ebx registers are 32-bit registers, which tells us that the mov instruction must be the one for 32-bit registers. Looking at the manual we find that the opcode of the mov instruction for 32-bit registers has the hexadecimal value 89. Next we need to determine the “ModR/M” part of the instruction for moving from the ebx register to the eax register. Again looking at the manual we find that the “ModR/M” byte has the hexadecimal value D8. Finally, the instruction mov eax, ebx has the value 89 D8 in hexadecimal.
Now let us look at a more complex example that contains an address, for instance the instruction mov [label], eax. This instruction tells the processor to store the contents of the eax register at the memory location [label]. It is again a 32-bit mov instruction and thus has the same opcode 89, but the “ModR/M” byte changes. Looking at the manual we find that the hexadecimal value of the “ModR/M” byte is now 05. This information is not enough for the instruction to be executed: we also need a displacement to the address [label]. So the last part of the instruction is a 32-bit displacement address whose value will change in each instance of the diverse population. Finally we have the instruction 89 05 xx xx xx xx, where xx xx xx xx contains the displacement address in hexadecimal form. Still, we can see that a part of the instruction does not change and might still be located with a search for the appropriate opcode.
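The two encodings above can be reproduced with a small helper that packs the ModR/M byte. This is a sketch for this example only; a real encoder must also handle prefixes, SIB bytes and the other addressing modes, and the zero displacement below is just a placeholder for the bytes diversification changes.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod, reg, rm):
    """Pack the 2-bit mod, 3-bit reg and 3-bit r/m fields into one byte."""
    return (mod << 6) | (reg << 3) | rm

def mov_reg_reg(dst, src):
    # opcode 89 /r: MOV r/m32, r32 with mod = 11 (register direct)
    return bytes([0x89, modrm(0b11, REG32[src], REG32[dst])])

def mov_mem_reg(src):
    # opcode 89 /r with mod = 00, r/m = 101: MOV [disp32], r32;
    # the four displacement bytes are the part diversification changes
    return bytes([0x89, modrm(0b00, REG32[src], 0b101)]) + b"\x00\x00\x00\x00"
```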
For further reading on assembly instructions, see [5].
A.2 Shortest edit script
The Shortest Edit Script (SES) is the smallest script that transforms one sequence into another. Automatically generating the SES is equivalent to generating the LCS. When we refer to diffing, we mean generating the SES.
The algorithm presented in Table A.1 generates the LCS/SES in polynomial time for two sequences (a, b). The time complexity of the algorithm is O(ND), where N is the sum of the lengths of the two sequences and D is the length of the produced SES.
CONSTANT MAX = M + N
INTEGER V[-MAX..MAX]
V[1] = 0
FOR D = 0 TO MAX
  FOR k = -D TO D STEP 2
    IF k = -D OR (k ≠ D AND V[k - 1] < V[k + 1]) THEN
      x = V[k + 1]
    ELSE
      x = V[k - 1] + 1
    END IF
    y = x - k
    WHILE x < N AND y < M AND a[x + 1] = b[y + 1]
      x = x + 1
      y = y + 1
    END WHILE
    V[k] = x
    IF x ≥ N AND y ≥ M THEN
      Length of an SES is D
      STOP
    END IF
  END FOR
END FOR
Length of an SES is greater than MAX
Table A.1: The greedy LCS/SES algorithm [21].
An example of an LCS and the SES of two sequences can be found in Table A.2. In the SES, the symbol D means delete, so 1D, 2D means that we must delete the characters at positions 1 and 2; the symbol I stands for insert and means that we must insert a symbol after the given position, so 3IB means insert B after character 3. All the modifications are considered to take place simultaneously, so the positions mentioned refer to the initial sequence. The SES modifies set A and produces set B.
Sets A = ABCABBA and B = CBABAC
Longest Common Subsequences = CABA, BABA, CBBA...
Shortest Edit Script = 1D, 2D, 3IB, 6D, 7IC
Table A.2: An example of SES.
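The pseudocode of Table A.1 translates almost directly into a runnable function returning the SES length D. This sketch uses 0-based indexing and a dictionary for V; it computes only the length, not the script itself.

```python
def ses_length(a, b):
    """Length of the shortest edit script between sequences a and b,
    following the greedy O(ND) algorithm of Table A.1."""
    n, m = len(a), len(b)
    v = {1: 0}  # V[k]: furthest x reached on diagonal k
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]      # move down (insertion)
            else:
                x = v[k - 1] + 1  # move right (deletion)
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x += 1            # follow the diagonal (matching symbols)
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m
```

For the sequences of Table A.2, the function returns 5, matching the five operations of the listed edit script.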
For further reading on the Shortest Edit Script see [11], and [21].
A.3 Best Matching Node Search algorithm in pseudocode
GLOBAL HeuBefore, HeuAfter, HeuAfterBranch, Tree
READ HeuBefore, HeuAfter, HeuAfterBranch, Tree
CurAddress = 0, StartAddress = 0
NextInstruction = NULL, Node = NULL
Before = empty, After = empty, AfterBranch = empty
Q = empty, ExpandedQ = empty
WHILE Tree[CurAddress] IS NOT NULL
NextInstruction = Tree[CurAddress]
IF NextInstruction = “jmp <address>” THEN
CurAddress = $address
ELSE IF NextInstruction = conditional jump instruction THEN
Add Node(CurAddress:StartAddress) to Q
EXIT WHILE
ELSE IF NextInstruction = any other instruction THEN
Increase CurAddress
END IF
86
END WHILE
WHILE Q IS NOT empty
Take first Node from Q
StartAddress = Node.StartAddress
CurAddress = Node.StartAddress
WHILE Tree[CurAddress] IS NOT NULL
NextInstruction = Tree[CurAddress]
IF NextInstruction = Jmp $address THEN
CurAddress = $address
ELSE IF NextInstruction = conditional jump instruction THEN
After = ExpandNode(CurAddress + 1)
AfterBranch = ExpandNode(Node.JumpAddress)
Node.Heuristics = CalculateHeu(Before, HeuBefore)
Node.Heuristics += CalculateHeu(After, HeuAfter)
Node.Heuristics += CalculateHeu(AfterBranch, HeuAfterBranch)
IF Node.Heuristics > BestNode.Heuristics THEN
BestNode = Node
END IF
Add Node to ExpandedQ
Remove Node from Q
EXIT WHILE
ELSE IF NextInstruction = any other instruction THEN
IF Before IS full THEN
Shift all instructions one place higher
Before[last] = NULL
END IF
Add NextInstruction to Before
Increase CurAddress
END IF
END WHILE
END WHILE
Table A.3: The BMNS algorithm in pseudocode.
FUNCTION ExpandNode(ExpAddress) RETURNS Block
Block = empty
StartAddress = ExpAddress
WHILE Tree[ExpAddress] IS NOT NULL
NextInstruction = Tree[ExpAddress]
IF NextInstruction = “jmp $address” THEN
ExpAddress = $address
ELSE IF NextInstruction = conditional jump instruction THEN
IF NOT EXIST Node IN Q THEN
Add Node(ExpAddress:StartAddress) to Q
END IF
EXIT WHILE
ELSE IF NextInstruction = any other instruction THEN
IF Block IS NOT full THEN
Add NextInstruction to Block
END IF
Increase ExpAddress
END IF
END WHILE
RETURN Block
Table A.4: The expand node function in pseudocode.
FUNCTION CalculateHeu(Block, HeuBlock) RETURNS H
H = 0, I = 0, J = 0
WHILE Block[I] IS NOT NULL
IF Block[I] = HeuBlock[J] THEN
H = H + 1
J = J + 1
END IF
I = I + 1
END WHILE
RETURN H
Table A.5: The heuristic function in pseudocode.
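The single greedy pass of Table A.5 translates almost directly into Python. This is a sketch: names follow the pseudocode, the NULL sentinel becomes ordinary iteration, and a bounds check stops J from running past the end of HeuBlock.

```python
def calculate_heu(block, heu_block):
    """Count in-order matches between an instruction block and a heuristic
    block, advancing through heu_block greedily on each match."""
    h = j = 0
    for instruction in block:
        if j < len(heu_block) and instruction == heu_block[j]:
            h += 1   # one more instruction of the heuristic block matched
            j += 1   # only advance in heu_block on a match
    return h

# Two of the three heuristic instructions appear in order in the block:
print(calculate_heu(["mov eax, 1", "add eax, 2", "jmp L1"],
                    ["mov eax, 1", "jmp L1"]))  # 2
```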
List of Tables
1.1 Merckx’s table of countermeasures . . . . . . . . . . . . . . . . . . 3
2.1 The CPU simulator pseudocode. . . . . . . . . . . . . . . . . . . . 9
2.4 Genetic computing algorithm pseudocode. . . . . . . . . . . . . . . 16
2.6 Insertion algorithm pseudocode. . . . . . . . . . . . . . . . . . . . . 26
3.1 Graphs definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 A CFG example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 The DFS algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 The BFS algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5 The example program. . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.6 The BMNS algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.7 Example of calculating heuristic value of a node. . . . . . . . . . . 49
3.8 The sort three example program with and without parameters. . . 51
3.9 A recursive LCS algorithm [1]. . . . . . . . . . . . . . . . . . . . . 56
3.10 A dynamic LCS algorithm [1]. . . . . . . . . . . . . . . . . . . . . . 57
3.11 Calculating the length array L. . . . . . . . . . . . . . . . . . . . . 58
3.12 An LCS example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.13 A length table generated by the dynamic LCS algorithm. . . . . . 59
3.14 Two similar snippets and their LCS. . . . . . . . . . . . . . . . . . 60
4.1 Generated snippets. . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 The experimental results with “Type A” snippets. . . . . . . . . . 67
4.3 The assembly file results. . . . . . . . . . . . . . . . . . . . . . . . 69
4.4 The snippet libraries results. . . . . . . . . . . . . . . . . . . . . . 69
4.5 The experimental results with “Type B” snippets. . . . . . . . . . 73
4.6 The experimental results with “Type C” snippets. . . . . . . . . . 74
A.1 The greedy LCS/SES algorithm [21]. . . . . . . . . . . . . . . . . . 85
A.2 An example of SES. . . . . . . . . . . . . . . . . . . . . . . . . . . 86
A.3 The BMNS algorithm in pseudocode. . . . . . . . . . . . . . . . . . 87
A.4 The expand node function in pseudocode. . . . . . . . . . . . . . . 88
A.5 The heuristic function in pseudocode. . . . . . . . . . . . . . . . . 88
List of Figures
1.1 Cracked software distribution. . . . . . . . . . . . . . . . . . . . . . 4
2.1 The CPU simulator. . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Class of snippet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 An instance of LabelMap. . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Class of register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 Roulette-wheel selection . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6 Insertion crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.7 “Cut and splice” crossover . . . . . . . . . . . . . . . . . . . . . . . 24
2.8 Randomly shift mutation . . . . . . . . . . . . . . . . . . . . . . . 25
2.9 JZ conditional jump . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.10 Sketch of a fitness landscape. The arrows indicate the preferred flow
of a population on the landscape, and the points A, B, and C are
local optima. The red ball indicates a population. [9] . . . . . . . . 28
3.1 The CFG of the example at Table 3.2. . . . . . . . . . . . . . . . . 34
3.2 Searching a graph example. . . . . . . . . . . . . . . . . . . . . . . 36
3.3 The three node blocks. . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 The represented search tree. . . . . . . . . . . . . . . . . . . . . . . 42
3.5 The represented tree with node representation. . . . . . . . . . . . 43
3.6 A node and its surrounding instructions. . . . . . . . . . . . . . . . 48
3.7 The represented search tree. . . . . . . . . . . . . . . . . . . . . . . 52
4.1 Random initial population . . . . . . . . . . . . . . . . . . . . . . . 62
4.2 Predefined initial population . . . . . . . . . . . . . . . . . . . . . 62
4.3 Crossover 0.05%, mutation rate 0.5% . . . . . . . . . . . . . . . . 63
4.4 Crossover 0.5%, mutation rate 0.05% . . . . . . . . . . . . . . . . 64
4.5 Average results by assembly file. . . . . . . . . . . . . . . . . . . . 70
4.6 Average results by snippet library. . . . . . . . . . . . . . . . . . . 71
4.7 Results of the assembly files. . . . . . . . . . . . . . . . . . . . . . 72
4.8 Results of the snippet libraries. . . . . . . . . . . . . . . . . . . . . 72
5.1 Diversified compiler model. . . . . . . . . . . . . . . . . . . . . . . 79
5.2 Homogeneity encouragement strategy . . . . . . . . . . . . . . . . 81
A.1 Assembly instruction format. . . . . . . . . . . . . . . . . . . . . . 82
Bibliography
[1] The Algorithmist. Longest common subsequence.
http://www.algorithmist.com/index.php/Longest_Common_Subsequence, 2007.
[2] Frances E. Allen. Control flow analysis. SIGPLAN Not., 5(7):1–19, 1970.
[3] Bertrand Anckaert, Bjorn De Sutter, and Koen De Bosschere. Software piracy
prevention through diversity. In DRM ’04: Proceedings of the 4th ACM work-
shop on Digital rights management, pages 63–71, New York, NY, USA, 2004.
ACM Press.
[4] Timothy Budd. Classic Data Structures in Java. Addison-Wesley, 2001.
[5] Intel Corporation. IA-32 Intel Architecture Software Developer’s Manual, vol-
umes 1–3, 1997–2005.
[6] Mila Dalla Preda, Matias Madou, Koen De Bosschere, and Roberto Gia-
cobazzi. Opaque predicates detection by abstract interpretation. In Proceed-
ings of the 1st International Workshop on Emerging Applications of Abstract
Interpretation (EAAI06), pages 35–50, Vienna, Austria, 2006. ENTCS.
[7] Thomas Dullien. Graph-based comparison of executable objects. In Symposium
sur la Sécurité des Technologies de l’Information et des Communications.
University of Technology in Florida, 2005.
[8] Wikipedia The Free Encyclopedia. Depth-first search, breadth-first search.
http://www.wikipedia.org/, 2007.
[9] Wikipedia The Free Encyclopedia. Genetic algorithm, fitness landscape.
http://www.wikipedia.org/, 2007.
[10] Wikipedia The Free Encyclopedia. Graph (mathematics), graph (data struc-
ture), graph theory, control flow graph. http://www.wikipedia.org/, 2007.
[11] Wikipedia The Free Encyclopedia. Longest common subsequence problem,
diff. http://www.wikipedia.org/, 2007.
[12] H. Fashandi and A.M.E. Moghaddam. A new rotation invariant similarity
measure for trajectories. In Computational Intelligence in Robotics and Au-
tomation, 2005. CIRA 2005. Proceedings. 2005 IEEE International Sympo-
sium, pages 631–634, 2005.
[13] David B. Fogel. Evolutionary Computation: Toward a New Philosophy of
Machine Intelligence. Wiley-IEEE, 2006.
[14] Ronald I. Greenberg. Fast and simple computation of all longest common
subsequences, 2002.
[15] Ronald I. Greenberg. Bounds on the number of longest common subsequences,
2003.
[16] Markus Jakobsson and Michael K. Reiter. Discouraging software piracy using
software aging. In Security and Privacy in Digital Rights Management : ACM
CCS-8 Workshop DRM 2001. Springer Berlin / Heidelberg, 2001.
[17] William B. Langdon and Riccardo Poli. Foundations of Genetic Programming.
Springer, 2002.
[18] A. Main and P.C. van Oorschot. Software protection and application security:
Understanding the battleground, 2004.
[19] Anirban Majumdar and Clark Thomborson. Manufacturing opaque predi-
cates in distributed systems for code obfuscation. In ACSC ’06: Proceedings
of the 29th Australasian Computer Science Conference, pages 187–196, Dar-
linghurst, Australia, Australia, 2006. Australian Computer Society, Inc.
[20] Gert Merckx. Software security through targeted diversification. Master’s
thesis, Katholieke Universiteit Leuven, 2005-2006.
[21] Eugene W. Myers. An O(ND) difference algorithm and its variations. Algo-
rithmica, 1(2):251–266, 1986.
[22] Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. Principles of Program
Analysis. Springer, 2005.
[23] Thomas Obnigene. DVD glossary.
http://www.filmfodder.com/movies/dvd/glossary/glossary.htm, 2007.
[24] National Institute of Standards and Technology. breadth-first search.
http://www.nist.gov/dads/HTML/breadthfirst.html, 2007.
[25] National Institute of Standards and Technology. depth-first search.
http://www.nist.gov/dads/HTML/depthfirst.html, 2007.
[26] Justinian P. Rosca. Analysis of complexity drift in genetic programming.
Genetic Programming 1997: Proceedings of the Second Annual Conference,
pages 286–294, 1997.
[27] Todd Sabin. Comparing binaries with graph isomorphisms.
http://www.bindview.com/Services/Razor/Papers/2004/
comparing binaries.cfm, 2004.
[28] Margaret Sackeyfio. Mathematical modeling of music downloading and online
piracy. Master’s thesis, Baruch College, 2005.
[29] Olin Shivers. Control-Flow Analysis of Higher-Order Languages or Taming
Lambda. PhD thesis, School of Computer Science Carnegie Mellon University
Pittsburgh, 1991.
[30] Jeremy P. Spinrad. Efficient Graph Representations. American Mathematical
Society, 2003.
[31] Jörg Tiedemann. Automatic construction of weighted string similarity mea-
sures. Department of Linguistics, Uppsala University, 1999.
[32] Kent State University. Graph algorithms, depth first search (dfs), breadth
first search (bfs).
http://www.personal.kent.edu/∼rmuhamma/Algorithms/algorithm.html.
[33] Patrick Henry Winston. Artificial Intelligence Third Edition. Addison-Wesley,
1992.
[34] David H. Wolpert and William G. Macready. No free lunch theorems for optimization.
IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.