Practical (Introduction to) Reverse Engineering

Post on 25-Feb-2016

TRANSCRIPT

Slide 1

Julio Auto

Practical (Introduction to) Reverse Engineering

Agenda
- Part I - 101
  - Why this presentation? (I mean... WHY?!?!)
  - A few concepts (Mumble jumble++)
  - Demo (Show me the goods)
- Part II - 1337
  - Advancing RE (Do your own!)
  - Something extra (Finish pretty, er... almost)
- Linkz, lulz, refz, and shoutz
- Q & (maybe) A

Why?
- Suggested by the H2HC crew
- Based on my article "Cracking CrackMes", published earlier this year while working for my previous employer, Scanit ME
- RE is getting lots of attention, and many people seem interested in learning it
- Still, it remains largely a black art

Why? (2)
- It seems, then, that moving up from ground zero is the most problematic step
- This presentation tries to help fix that
- It aims to expose instantly useful knowledge
- And pointers to where to go digging deeper
- Instead of advanced research _results_, basic _techniques_ and _processes_
- Note: we'll be targeting the Windows platform most of the time in this talk

Concepts
- "Reverse Engineering" is a very self-explanatory term
- You take something and, from there, try to learn how (some aspect of) it was engineered
- It's also obviously broad
- For example, it's often used to describe the process through which you generate a higher-level, architectural view of a piece of software given its source code

My Own Concept
- Think of the times you asked yourself "why" and "how" and let it go without an answer...
- ...RE is not letting go

A Few Applications
- Malware Analysis
- Vulnerability Analysis
- Security Assessment of 3rd-party COTS
- Evaluation/Breaking of copy-protection schemes
- Assorted "hows" and "whys"

Why Still a Black Art?
- Perhaps because people think it's only good for SW cracking
- Perhaps because DRM has become a nightmare no one is happy with, and related laws everywhere bash reversers too hard every now and then (does anybody remember Dmitry Sklyarov, the DMCA and all that madness?)
- Perhaps because many people still think it should be illegal (wtf?!)

How To Learn
- The "Crack-Me" approach
  - The one I illustrate in the paper I mentioned
  - Small and targeted challenges with different levels and obstacles to choose from
- The "real life" approach
  - Choose a real-world problem and attack it
  - Tough but rewarding
- We'll demo a bit of both

Tools of The Trade
- Probably millions of tools that can give you some useful piece of info about your target
- I'll try to restrict myself to the most relevant/common, then
- Unfortunately, many of the best tools are commercial
- On the other hand, many of them have free/student/evaluation versions
- For the rest... Well, remember the "real life" approach? ;)

Debuggers
- Obvious importance
- Fairly good variety
- It's nice to play with all of them and know your way around them
- But mastering them all is quite hard, so you'll most likely settle on your debugger of choice in little time
- Choose your debugger well!

Debuggers (2)
- WinDbg
  - My personal choice of debugger
  - Developed by MSFT
  - Comes for free in the "Debugging Tools for Windows" package
  - Amazingly rich in features
  - Extensible with some C++ programming
    - Not the easiest or simplest dev environment
    - Very rich API, though
  - Poor interface

Debuggers (3)
- Visual Studio Debugger
  - It's crap, not suited for reversing
  - But it's pretty and nice for developers :>
  - Seriously, don't try to go very far reversing with it
  - It may use up the rest of your sanity

Debuggers (4)
- OllyDbg
  - Enjoys quite a lot of popularity in the reversing community
  - Nice interface
  - In particular, a nice disassembly view
  - Comes in a few tuned versions, one of the most popular being...

Debuggers (5)
- Immunity Debugger
  - Developed by Immunity Inc. (someone from the dev team in the audience?)
  - Extends OllyDbg with a Python interpreter and exposes a couple of debugging modules for the user to interact with
  - Very neat plugin support
  - Embeds a command line with WinDbg-aliased commands
  - Maintains a forum to support developers/users of ImmDbg plugins

Debuggers (6)
- gdb
  - The standard debugger on *NIX systems
  - Quite a complete debugger
  - Not the best thing in the RE world, but overall a good debugger

Disassemblers
- Reading assembly is not the sweetest thing for most people
- The way the code is represented is extremely important and makes an increasingly great difference in big RCE tasks
- Therefore, being comfortable with your disassembler is essential

Disassemblers (2)
- Pretty much every debugger is capable of disassembling
- Apart from that, there are lots of other tools that can do it too
- On Linux, objdump is pretty much a standard tool
- However, one particular tool is especially known for its disassembly features

Disassemblers (3)
- IDA Pro
  - Supports many binary formats and architectures
  - Displays the code in graphs, which greatly enhances the visualization
    - Block-level CFGs
  - Many things can be customized/adjusted
    - Graph layout, data types, annotations...
  - Quite frankly, it's in every reverser's toolkit
  - IDA Pro is a commercial tool, currently in version 5.3
  - But version 4.9 is available in a free edition

System Monitoring Tools
- All of those from the SysInternals Suite
  - Process Explorer
  - RegMon
  - FileMon
  - TCPView
  - Etc...

Advanced Tools
- Binary Differs
  - BinDiff
- Decompilers
  - Hex-Rays
- RE Frameworks
  - ERESI ;)
  - PaiMei and all the PyThings

Demo
- We'll try and beat a crack-me challenge
- This crack-me was taken from a real competition
  - HITB Dubai 2007 CTF
- Perhaps it can serve as a tip for H2HC's CTF as well

RE Advanced Topics
- Cutting to the chase, advancing RE basically means automating stuff
- Many of the RE tools are scriptable/programmable/extensible
- Developing smart ways to deal with repetitive tasks is the way to more effective analyses

RE Advanced Topics (2)
- Less often, you might see opportunities to advance RE in ways not based on automation
  - Defeating a new anti-debug trick
  - Developing new environments for RE
    - Virtualization, Sandboxing...
  - Or even radically changing paradigms
    - E.g. the graph-based approach to binary navigation

RE Advanced Topics (3)
- Perhaps the most important lesson here is not to reinvent the wheel
- Re-use the tools you have!
- You'll be amazed at how much stuff you can do by gluing pieces together
- Having said that...
  - Perhaps the tools you have are not perfect
  - Or you might wanna re-do something just for learning
  - But be sure to have the right goals in mind!

Teaching By (Bad) Examples
- I wanted to do something really neat to show these concepts in practice
- Unfortunately, I didn't manage to finish it in time
  - The thing is currently under test/final touches
- However, the idea is so cool and in such a (relatively) advanced state that I decided to talk about it anyway

Problem
- Suppose you have ways to reproduce a high-profile, possibly exploitable bug
  - Yay!
- BUT....
  - The target is closed-source software
  - The target is as large and complex as an operating system and way less documented
  - The input is huge and has a complex, possibly undisclosed format
  - The source of the bug can be anywhere in the input
  - From user input to the actual bug/crash, about 3 million instructions execute
- WHAT DO YOU DO????

Introducing LEP
- LEP tries to answer a big question in this problem:
  - What exact part of this input is causing the bug?
- If you can answer this question and somehow correlate the answer with the input format, you may gain a great deal of understanding of the bug
- For this, I have invented a new technique: Staged Partial Tracing-Based Backwards Taint Analysis
  - Because not sounding like a Ph.D. is so 2001 :>
  - And also because we all just love new terms we can go media-cuckoo about

Introducing LEP (2)
- One-liner idea: if we know when our input is brought into memory and know where it's mapped, we can trace the program from this point to the crash and then go backwards, analyzing the dataflow to find out where the faulting data came from
- We do it in two stages, with a component for each: the tracer and the analyzer
- Simple, huh?

Fundamental Concepts
- When we trace the program, it becomes linear, i.e. control flow is irrelevant
  - Dataflow becomes evident
  - Aliasing is not an issue (in essence, it disappears)
- All the info we need is available at runtime
  - In particular, effective addresses
- If the input is as big as the problem states, it should be no problem to find it in memory
- We get most of the info we need from the disassembly text (ASCII)! It's like hacking with grep again!

LEP Tracer
- A WinDbg extension
- Traces every instruction until the program raises an exception
- Dumps the following instruction info to a file:
  - Mnemonic
  - Destination operand
  - Source operand
  - Dependences of the source operand
    - e.g. mov eax,[ecx+edx*2]

LEP Tracer (2)
- Discards control-flow changing instructions
- Discards in/out instructions (all relevant input should be in memory already?)
- Discards other groups of instructions that will be supported as we go
  - FPU, MMX, SSE{2,3}, etc...
  - This might be one of the reasons it currently doesn't work
- Tries to parse the right info even when the debugger is too stupid to work as expected
  - Why not compute effective addresses in rep-prefixed instructions?

LEP Analyzer
- Reads the file generated by the tracer and goes bottom-up investigating the dataflow
- You have to specify the piece of data that causes the last instruction to fail
  - Usually (always?) a register
- And the memory range(s) into which your input was mapped at the time the trace was taken
- Ignores register slices for simplicity
  - al = ah = ax = eax = rax

LEP Analyzer (2)
- When the source operand of a given instruction is an immediate/constant, LEP tries its best to evaluate whether it _transforms_ or _overwrites_ the destination
- If it overwrites, we finish the analysis for this branch
  - mov eax, deadf0f0h
- Else, if it transforms, we keep looking for another def of the same destination operand
  - inc eax
- This gives a very special meaning to LEP's existence
  - Otherwise, searching for occurrences of the faulting data inside the input could be just as effective
- LEP also tries to identify non-obvious constant overwrites
  - xor eax, eax

Engineering Tech-Talk
- LEP was intended to be written entirely in Python
  - Didn't work, for performance reasons
- LEP Tracer is written in C++, since it's a WinDbg extension
  - It makes use of a reference of the x86 instruction set written in XML
  - The XML is mapped to C++ using CodeSynthesis XSD XML Data Binding
- LEP Analyzer was first written in Python
  - Then I also re-wrote it in C++
- LEP Analyzer's search algorithm was initially a DFS
  - Then I implemented it as a BFS

Demo II
- Placeholder slide :>

LEP Release
- As much as I like to make my software free and wide open, I have chosen not to release LEP to the public for now
- Instead, I'm willing to share it with whoever contacts me directly (by e-mail, for example)
- Basically, I just wanna know who is using it and what it's gonna be used for
- Makes no difference if you come from Wall Street or from an underground cracking ghetto; drop me a line

Linkz & Refz
- Cracking CrackMes
  - http://www.scanit.net/rd/wp/wp04
- X86 Opcode and Instruction Reference, by MazeGen
  - http://ref.x86asm.net/
- CodeSynthesis XSD XML Data Binding for C++
  - http://www.codesynthesis.com/products/xsd/
- Thousands of elite RE projects
  - http://www.google.com
- Seriously though, contact me if you can't find anything

Greetz & Shoutz
- Filipe Balestra, for lending me the bug used in the 2nd demo
- H2HC crew, for letting me ruin their conference again
- The ERESI team, with whom I have most of my discussions about RE, program analysis, etc.
- All of the great people that I know from the security scene
- It's simply impossible to mention each and every one of you, but you know who you are!

Questions?
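
Sketch: the bottom-up walk from the LEP Analyzer slides

What follows is a minimal Python sketch of the backwards dataflow search described on the LEP Analyzer slides. It is not LEP's actual code, which the slides do not show. The `Step` record layout, the `OVERWRITE` mnemonic set, and the names `norm` and `find_input_origins` are all assumptions made for illustration. The trace is assumed to be already linear, to carry effective addresses for memory operands as integers, and to end with the faulting instruction. Address-register dependences (the `ecx` and `edx` in `mov eax,[ecx+edx*2]`) are left out for brevity.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple, Union

Loc = Union[str, int]  # register name, or effective memory address

# Register slices are collapsed, as on the slide: al = ah = ax = eax = rax.
ALIASES = {}
for r in "abcd":
    for name in (r + "l", r + "h", r + "x", "r" + r + "x"):
        ALIASES[name] = "e" + r + "x"

# Hypothetical set of mnemonics whose destination is fully replaced.
OVERWRITE = {"mov", "movzx", "lea"}


@dataclass
class Step:
    mnemonic: str
    dst: Loc
    srcs: Tuple[Loc, ...] = ()  # empty means an immediate/constant source


def norm(loc: Loc) -> Loc:
    """Map a register slice to its full register; leave addresses alone."""
    return ALIASES.get(loc, loc) if isinstance(loc, str) else loc


def find_input_origins(trace: List[Step], faulting: Loc,
                       input_ranges: Iterable[Tuple[int, int]]) -> Set[int]:
    """Walk the trace bottom-up (BFS) from the faulting location and
    return the input addresses its value was derived from."""
    ranges = list(input_ranges)
    origins: Set[int] = set()
    seen = set()
    # Start just before the faulting instruction, which never completed.
    queue = deque([(len(trace) - 2, norm(faulting))])
    while queue:
        idx, loc = queue.popleft()
        if (idx, loc) in seen:
            continue
        seen.add((idx, loc))
        # Find the most recent definition of loc at or before idx.
        while idx >= 0 and norm(trace[idx].dst) != loc:
            idx -= 1
        if idx < 0:
            # Never written inside the trace: it is input if it lies in range.
            if isinstance(loc, int) and any(lo <= loc < hi for lo, hi in ranges):
                origins.add(loc)
            continue
        step = trace[idx]
        srcs = tuple(norm(s) for s in step.srcs)
        zeroing = step.mnemonic in ("xor", "sub") and srcs == (loc,)
        if zeroing or (step.mnemonic in OVERWRITE and not srcs):
            continue  # constant overwrite: this branch ends here
        for s in srcs:
            queue.append((idx - 1, s))
        if step.mnemonic not in OVERWRITE:
            queue.append((idx - 1, loc))  # transform: look for an earlier def
    return origins


demo_trace = [
    Step("mov", "ebx", (0x1010,)),  # mov ebx, [input + 0x10]
    Step("mov", "eax"),             # mov eax, deadf0f0h  (overwrite)
    Step("mov", "eax", ("ebx",)),   # mov eax, ebx
    Step("inc", "eax"),             # inc eax             (transform)
    Step("xor", "ecx", ("ecx",)),   # xor ecx, ecx        (zeroing)
    Step("add", "eax", ("ecx",)),   # add eax, ecx
    Step("mov", "edx", ("eax",)),   # faulting instruction, e.g. mov edx, [eax]
]

if __name__ == "__main__":
    hits = find_input_origins(demo_trace, "eax", [(0x1000, 0x1100)])
    print(sorted(hex(a) for a in hits))  # prints ['0x1010']
```

The queue makes this a breadth-first search, matching the BFS mentioned on the Engineering Tech-Talk slide; popping from the right end of the deque instead would give the earlier DFS behaviour.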
