reverse engineering techniques
DESCRIPTION
REVERSE ENGINEERING TECHNIQUES. Presented By Sachin Saboji 2sd06cs079. CONTENTS. Introduction Front End Data Analysis Control Flow Analysis Back End Challenges Conclusion Bibliography. Introduction. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/1.jpg)
REVERSE ENGINEERING TECHNIQUES
Presented BySachin Saboji2sd06cs079
![Page 2: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/2.jpg)
CONTENTS Introduction Front End Data Analysis Control Flow Analysis Back End Challenges Conclusion Bibliography
![Page 3: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/3.jpg)
INTRODUCTION The purpose of reverse engineering is to
understand a software system in order to facilitate enhancement, correction, documentation, redesign, or reprogramming in a different programming language.
![Page 4: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/4.jpg)
What is Reverse Engineering Reverse engineering is the process of
analyzing a subject system to identify the system’s components and their interrelationships and create representations of the system in another form or at a higher level of abstraction.
![Page 5: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/5.jpg)
Difference between Reverse Engineering and Forward Engineering.
REVERSE ENGINEERING FORWARD ENGINEERING
Given an application, deduce tentative requirements. Given requirements, develop an application.
Less certain. An implementation can yield different requirements,
depending on the reverse engineer’s interpretation.
More certain. The developer has requirements and must deliver
an application that implements them.
Adaptive. The reverse engineer must find out what the developer
actually did.
Prescriptive. Developers are told how to work.
Less mature. Skilled staff sparse. More mature. Skilled staff readily available.
Can be performed 10 to 100 times faster than forward
engineering.(days to weeks of work).
Time consuming (months to years of work).
The model can be imperfect. Salvaging partial information is still
useful.
The model must be correct and complete or the application will
fail.
![Page 6: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/6.jpg)
How Reverse Engineering can be achieved.
1. Decompilers2. The Phases of a Decompilers.3. The Grouping of Phases.
![Page 7: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/7.jpg)
Decompilers
A decompiler is a program that reads a program written in a machine language and translates it into an equivalent program in a high level language.
![Page 8: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/8.jpg)
The Phases of a Decompiler A decompiler is structured in a similar
way to a compiler, by a series of phases that transform the source machine program from one representation to another.
No lexical analysis or scanning phase in the decompiler.
![Page 9: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/9.jpg)
The Grouping of Phases
The decompiler phases are normally grouped into three different modules
Front-end UDM Back-end
![Page 10: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/10.jpg)
FRONT END
The front-end is a machine dependent module, which takes as input the binary source program, parses the program, and produces as output the control flow graph and intermediate language representation of the program.
![Page 11: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/11.jpg)
Syntax Analysis
The syntax analyzer is the first phase of the decompiler.
This sequence of bytes is checked for syntactic structure.
Valid strings are represented by a parse tree, which is input into the next phase, the semantic analyzer.
![Page 12: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/12.jpg)
Semantic Analysis The semantic analysis phase determines
the meaning of a group of machine instructions.
It collects information on the individual instructions of a subroutine.
![Page 13: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/13.jpg)
Intermediate Code Generation In a decompiler, the front-end translates
the machine language source code into an intermediate representation, which is suitable for analysis by the Universal Decompiling Machine.
The low-level intermediate representation is implemented in quadruples
opcode dest src1 src2
![Page 14: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/14.jpg)
Control Flow Graph Generation
The control flow graph generation phase constructs a call graph of the source program, and a control flow graph of basic blocks for each subroutine of the program.
Basic Blocks A basic block is a sequence of
instructions that has one entry point and one exit point.
![Page 15: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/15.jpg)
Control Flow Graphs
A control flow graph is a directed graph that represents the flow of control of a program.
The nodes of this graph represent basic blocks of the program, and the edges represent the flow of control between nodes.
Graph Optimizations Flow-of-control optimization is the method by
which redundant jumps are eliminated.
![Page 16: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/16.jpg)
Universal Decompiling Machine
The UDM is an intermediate module that is completely machine and language independent
It performs the core of the Decompiling analysis.
Two phases included in this module are the data flow and the control flow analyzers.
![Page 17: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/17.jpg)
Data Flow Analysis
The low-level intermediate code generated by the front-end is an assembler type representation that makes use of registers and condition codes.
Data flow analysis is the process of collecting information about the way variables are used in a program, and summarizing it in the form of sets.
![Page 18: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/18.jpg)
Code Improving Optimizations
This section presents the code-improving transformations used by a decompiler.
Dead-Register Elimination Dead-Condition Code Elimination Condition Code Propagation
![Page 19: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/19.jpg)
Control Flow Analysis The control flow graph constructed by the
front-end has no information on high-level language control structures.
Such a graph can be converted into a structured high-level language graph by means of a structuring algorithm.
![Page 20: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/20.jpg)
Structuring Algorithms In Decompilation, the aim of a structuring algorithm
is to determine the underlying control structures of an arbitrary graph, thus converting it into a functional and semantic equivalent graph.
Structuring Loops In order to structure loops, a loop in terms of a graph
representation needs to be defined. This representation must be able to not only determine the extent of a loop, but also provide a nesting order for the loops.
![Page 21: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/21.jpg)
Finding the type of Loop Finding the Loop Follow Node
Application OrderThe structuring algorithms presented in the
previous sections determine the entry and exit (i.e. header and follow) nodes of sub graph that represent high-level loops.
![Page 22: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/22.jpg)
The Back-end
This module is composed by the code generator.
The code generator generates code for a predefined target high-level language.
Generating Code for a Basic Block For each basic block, the instructions in
the basic block are mapped to an equivalent instruction of the target language.
![Page 23: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/23.jpg)
Generating Code for asgn Instructions Generating Code for call Instructions Generating Code for ret Instructions Generating Code from Control Flow
Graphs
![Page 24: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/24.jpg)
Challenges The main difficulty of a decompiler parser is the
separation of code from data, that is, the determination of which bytes in memory represent code and which ones represent data.
Less mature. Skilled staff sparse. The model can be imperfect. Salvaging partial
information is still useful. Less certain. An implementation can yield different
requirements, depending on the reverse engineer’s interpretation.
The determination of data types such as arrays, records, and pointers.
![Page 25: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/25.jpg)
Conclusion This seminar report has presented
techniques for the reverse compilation or Decompilation of binary programs.
A decompiler for a different machine can be built by writing a new front-end for that machine, and a decompiler for a different target high-level language can be built by writing a new back-end for the target language.
![Page 26: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/26.jpg)
Further work on Decompilation can be done in two areas: the separation of code and data, and the determination of data types such as arrays, records, and pointers.
![Page 27: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/27.jpg)
Bibliography
A. V. Aho, R. Sethi, and J. D. Ullman. Principles of Compiler Design. Narosa Publication House, 1985.
Michael Blaha and James Rumbaugh. Object-Oriented Modeling and Design with UML. Second Edition, Pearson Education 2008.
Cristina Cifuentes. Reverse Compilation Techniques, PhD Thesis. Queensland University of Technology, 1994.
http://www.cc.gatech.edu/reverse.
![Page 28: REVERSE ENGINEERING TECHNIQUES](https://reader031.vdocuments.site/reader031/viewer/2022020920/56815d51550346895dcb59f8/html5/thumbnails/28.jpg)
THANK YOU