USENIX Association Proceedings of the General Track: 2003 USENIX Annual Technical Conference San Antonio, Texas, USA June 9-14, 2003 THE ADVANCED COMPUTING SYSTEMS ASSOCIATION © 2003 by The USENIX Association All Rights Reserved For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.


A Binary Rewriting Defense against Stack based Buffer Overflow Attacks

Manish Prasad and Tzi-cker Chiueh
SUNY Stony Brook

{mprasad,chiueh}@cs.sunysb.edu

Abstract

Buffer overflow attack is the most common and arguably the most dangerous attack method used in Internet security breach incidents reported in the public literature. Various solutions have been developed to address the buffer overflow vulnerability problem in both research and commercial communities. Almost all the solutions that provide adequate protection against buffer overflow attacks are implemented as compiler extensions and hence require the source code of the programs being protected to be available so that they can be re-compiled. While this requirement is reasonable in many cases, there are scenarios in which it is not feasible, e.g., legacy applications that are purchased from an outside vendor. The work reported in this paper explores application of static binary translation to protect Internet software from buffer overflow attacks. Specifically, we use a binary rewriting approach to augment existing Win32/Intel Portable Executable (PE) binary programs with a return address defense (RAD) mechanism [1], which protects the integrity of the return address on the stack with a redundant copy. This paper presents the disassembly and instrumentation issues involved in static binary translation, how our tool achieves satisfactory disassembly precision in the presence of indirect branches, position-independent code sequences, hand crafted assembly code and arbitrary code/data mixing, and how it ensures safe binary instrumentation in most practical cases. The paper reports our experiences with this approach, based on results of applying the resulting prototype to rewriting several commercial grade Windows applications (Ftp server, Telnet Server, DNS server, DHCP server, Outlook Express, MS FrontPage, MS Publisher, Telnet, Ftp, Winhlp, Notepad, CL compiler, MS NetMeeting, MS PowerPoint, MS Access, etc.), as well as experimentation with published buffer overflow exploits.

1 Introduction

Buffer overflow attacks exploit a particular type of program weakness: lack of array/buffer bound check in the compiler or in the applications. Accordingly, the ideal solution to the buffer overflow vulnerability problem is to build a bound checking mechanism into the compiler, or to require applications to strictly follow a programming guideline that checks the bound of an array/buffer upon each access. Neither solution is considered practical at this point. A more promising approach is to transform a given application into a form that is immune from buffer overflow attack without requiring any modification to the compiler or the application itself.

[Figure 1 diagram. Stack entries from high address to low address: function parameters (pushed by caller); return address (pushed by CALL inst); previous frame pointer (pushed by callee); local variables; local buffer (target of unbounded copy); other local variables. The stack grows toward low addresses, while the buffer grows (direction of copy) toward high addresses. Entries marked * could potentially be overwritten by a stack based buffer overflow.]

Figure 1: The typical stack layout of a function when it is called, and how some of the stack entries, including the return address, could be corrupted by an unsafe copy operation.

To understand a buffer overflow attack, consider a typical stack layout when a function is called, as shown in Figure 1. Lack of bound checking during a buffer copy operation causes areas adjacent to the buffer (as shown by * in Figure 1) to be overwritten. A generic buffer overflow attack [2] involves exploiting such an unsafe copy to overwrite the return address on the stack with the address of a piece of malicious code, which is injected by the attacker and most likely also resides on the stack; when the RET instruction (which pops the return address from the stack) in the victim function is executed, program control is transferred to the injected malicious code.
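The overwrite described above can be illustrated with a small simulation. The following Python sketch is not part of the original tool: it models a single stack frame as a byte array, and the helper names (build_frame, unsafe_copy, return_address) and all addresses are hypothetical. It only shows how a copy without a bounds check, running from low to high addresses, replaces the saved return address.

```python
import struct

def build_frame(buf_size: int, saved_ebp: int, ret_addr: int) -> bytearray:
    """Model one x86 stack frame as bytes, from low to high address:
    [local buffer][previous frame pointer][return address]."""
    return bytearray(buf_size) + bytearray(struct.pack("<II", saved_ebp, ret_addr))

def unsafe_copy(frame: bytearray, data: bytes) -> None:
    """Copy with no bounds check, in the spirit of strcpy(): writing starts
    at the buffer's lowest address and runs toward higher addresses."""
    frame[0:len(data)] = data

def return_address(frame: bytearray) -> int:
    """Read the 32-bit return address stored in the last 4 bytes of the frame."""
    return struct.unpack_from("<I", frame, len(frame) - 4)[0]

# Hypothetical values: a 16-byte buffer, a saved frame pointer, a return address.
frame = build_frame(buf_size=16, saved_ebp=0x0012FF80, ret_addr=0x00401234)

# 16 bytes fill the buffer, 4 bytes clobber the saved frame pointer, and the
# last 4 bytes replace the return address with an attacker-chosen value.
payload = b"A" * 16 + b"B" * 4 + struct.pack("<I", 0x0012FF00)
unsafe_copy(frame, payload)

print(hex(return_address(frame)))  # the return address is now attacker controlled
```

In this model a later RET would transfer control to the attacker-chosen address, which is the situation RAD is designed to detect.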

A less common form of buffer overflow attack involves corrupting memory pointer variables on the stack instead of return addresses [18]. The requirements for such an attack to occur are:


1. A pointer variable p that is physically located on the stack after the buffer a[] to be overflowed,

2. An overflow bug that allows overwriting this pointer p by overflowing a[] (taking user-specified data as source), usually with the address of a Global Offset Table (GOT) entry, which contains the address of a dynamic library function,

3. A copy function such as str(n)cpy/memcpy which takes *p as the destination and user-specified data as the source, without p being modified between overflow and copy,

4. A call to a common library function (like printf), the GOT entry of which is to be overwritten.

So leveraging the unsafe copy, the attacker overflows a[], thus overwriting p with the address of the GOT entry of a common library function, say printf(). By providing the address of the exploit code as input to the safe copy taking *p as the destination, the attacker manages to corrupt the GOT entry of printf(). Any subsequent call to printf() then transfers control to the exploit code.
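The four requirements listed above can be walked through in a toy model. This Python sketch is only an illustration under stated assumptions: memory is a dictionary of 32-bit words, the stack region holds an 8-byte buffer a[] followed by the pointer p, and every address and name (GOT_PRINTF, EXPLOIT, SCRATCH, read_p, call_printf) is made up for the example.

```python
import struct

# All addresses below are made-up values for illustration only.
GOT_PRINTF = 0x00403000   # hypothetical address of printf's GOT entry
REAL_PRINTF = 0x77C4186A  # hypothetical address of the real printf
EXPLOIT = 0x0012FE00      # hypothetical address of the injected code
SCRATCH = 0x00500000      # legitimate destination that p originally points to

# Simulated word-addressable memory: address -> 32-bit value.
memory = {GOT_PRINTF: REAL_PRINTF, SCRATCH: 0}

# Stack region from low to high address: a[8], then the pointer p.
stack = bytearray(8) + bytearray(struct.pack("<I", SCRATCH))

def read_p(stack: bytearray) -> int:
    """Return the current value of the pointer p stored after a[]."""
    return struct.unpack_from("<I", stack, 8)[0]

def call_printf() -> int:
    """Return the address that a call through printf's GOT entry would reach."""
    return memory[GOT_PRINTF]

# Requirement 2: overflowing a[] overwrites p with the GOT entry's address.
attacker_input = b"A" * 8 + struct.pack("<I", GOT_PRINTF)
stack[0:len(attacker_input)] = attacker_input

# Requirement 3: a later, in-bounds copy writes attacker data to *p.
memory[read_p(stack)] = EXPLOIT

# Requirement 4: the next call to printf reaches the exploit code.
print(hex(call_printf()))
```

Note that the return address is never touched in this scenario, which is why return-address checks alone do not stop it.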

Another form of buffer overflow attack overwrites up to the old base pointer [19] on the stack with the address of malicious code, so that when the caller function returns, the control is transferred to the exploit code.

A seemingly non-stop stream of buffer overflow attacks [3, 4] has called for an effective and practical solution to protect application programs against such attacks on the Windows platform. Past approaches to protecting programs against buffer overflow vulnerabilities relied on compiler extensions, either to perform array bounds check or to prevent the return address on the stack from being overwritten. Although fairly successful in preventing most conventional buffer overflow attacks [2], all known systems that take such an approach require the availability of the protected application's source code. While integrating software protection into a compiler is technically desirable, this approach exhibits several practical limitations. First, requiring access to source code makes it difficult to protect third-party legacy applications whose source code is unavailable for various reasons. Second, because modern software applications tend to be built on third-party libraries, access to the source code of these libraries again is unlikely. Finally, for those program segments that are written in assembly code directly, their high-level-language source code that is amenable to compiler analysis simply does not exist. The goal of this paper is to describe our experiences and the extent of success that we have achieved in applying a combination of well known disassembly techniques to implement a binary rewriting solution that aims to provide the same level of protection as its compiler-based counterpart.

There are two major technical challenges in applying binary rewriting to the buffer overflow attack problem. First, to determine where to insert protection instructions, the boundary of each function in an input program needs to be clearly identified, which in turn requires an accurate disassembler that can correctly decode each instruction in an executable binary. Unfortunately, 100% disassembly accuracy is difficult because the problem of distinguishing code from data embedded into code regions is fundamentally undecidable. Second, even if the function boundaries are successfully identified, inserting protection code into a given binary without disturbing the addresses used in its existing instructions is itself non-trivial. The main problem here is that in many cases the binary does not have enough spare space to hold a jump instruction to the inserted protection code, let alone to hold the protection code itself.

Section 2 surveys related work in the areas of static binary translation and buffer overflow defense. Section 3 discusses the design and implementation of our disassembly engine and the binary instrumentation issues with emphasis on the approaches we employ to ensure program safety and to preserve the semantics of input programs. Section 4 details the software architecture and implementation of the binary rewriting RAD prototype. Section 5 presents experimental results on the prototype's resistance to attacks, ability to preserve the semantics of applications, space cost and performance overheads. Finally, Section 6 summarizes the main results of this work and charts out directions for future research.

2 Related Work

Among past efforts in binary rewriting, ATOM [5] and EEL [6] run on RISC architectures, where the disassembly problem is simplified due to uniform instruction size. Etch [7] is a tool for rewriting Win32/Intel PE executables primarily for optimization. LEEL [8] works on Linux/x86 binaries, albeit with limitations with respect to control flow analysis in presence of indirect control transfer instructions and arbitrary code/data mixing. UQBT [9] is an architecture independent static binary translation framework for migrating legacy applications across processor architectures. Galen Hunt's Detours [24] is a system for run-time binary interception of Win32 functions.

Most buffer overflow defense proposals involve compile-time analysis and transformation. Stackguard [10] and Microsoft Compiler Extension [11] place 'canary words' on the stack between the local variables and the return address at the function prologue, and monitor the return address on the stack by checking the integrity of the 'canary word' at the epilogue. Both are vulnerable to attacks based on corrupting old frame pointers [19] on the stack or local pointer variables [18]. Stackshield [12] and RAD [1] save a copy of the return address at the prologue and compare it with the return address on the stack at the epilogue. Both are resilient to frame pointer based attacks [19] but vulnerable to memory pointer corruption [18] attacks. Our binary rewriting implementation is based on this model of buffer overflow defense. IBM's gcc extension [14] also does local variable reordering, placing pointer variables at lower addresses than buffers in addition to the above, offering some protection against memory pointer attacks [18], unless the unsafe copy is from higher to lower indices of the array. CASH [15] and others [16] perform array bounds check to prevent overflow of buffers. CASH achieves significant overhead reduction (4% overhead) by exploiting Intel segmentation hardware as compared to the others in this category, which typically incur very high overhead (70% to 140%).

Other proposed approaches to protect programs from buffer overflow attacks rely on run-time interception and checking. Lucent Bell Labs' Libsafe [20] intercepts unsafe library calls at run-time and performs bounds checking on the arguments; e.g., for strcpy(), it checks the length of the source string against the upper bound on the length of the destination string, based on the current frame pointer value. Although it prevents the return address from being modified, it is possible to corrupt local pointer variables. Libverify [20] performs dynamic binary translation to perform return address check. However, we suspect that Libverify might incur high overhead, since it adds checking code and performs code instrumentation at run-time. A very recent example of applying optimized run-time interpretation to security problems is program shepherding [21], which is built on top of a dynamic optimization framework called RIO. Apart from offering advantages like complete transparency, it achieves significant overhead reduction as compared to what one would expect from an interpretation/emulation system, using a variety of optimization techniques, viz. traces and interpreted code caching. Another example of dynamic binary translation (applied to parallel computing) is Paradyn [27]. Jun Xu et al. [28] suggest a hardware-based solution against buffer overflow attacks without requiring access to program source code.

To the best of our knowledge, the binary-rewriting RAD system described in this paper is the first attempt that employs static binary translation to protect existing binaries against buffer overflow attacks without requiring access to program source code, symbol tables or relocation information.

3 Binary-Rewriting Return Address Defense

A successful binary rewriting RAD system requires identifying the boundary of every procedure in the input program and inserting a protection instruction sequence into every procedure without disturbing the input binary's internal referencing structure. The following two subsections discuss in more detail these two issues and their associated solutions.

3.1 Binary Disassembly

3.1.1 Disassembly Challenges

To accurately locate the procedure boundary, one needs to identify each instruction in the binary through a disassembler. There are two main classes of disassembly algorithms [22]. A linear sweep algorithm starts with the first byte in the code section and proceeds by decoding each byte, including any intermediate data byte, as code, until an illegal instruction is encountered. A recursive traversal algorithm starts at the program's main entry point and proceeds by following each branch instruction encountered in a depth-first or breadth-first manner, essentially a control flow analysis. Neither approach is 100% precise. The chief impediments to accurate disassembly are:

1. Data embedded in the code regions,
2. Variable instruction size,
3. Indirect branch instructions,
4. Functions without explicit CALL sites within the executable's code segment,
5. Position independent code (PIC) sequences, and
6. Hand crafted assembly code.

1) and 2) render the linear sweep algorithm less effective than ideal, whereas 3), 4) and 5) degrade the efficacy of the control flow analysis used in the recursive traversal algorithm.

Distinguishing code from data in a binary file is a fundamentally undecidable problem. Because the linear sweep algorithm decodes each byte as code as long as it looks like a legitimate code byte, it ends up interpreting many data bytes as instructions. The reason for this behavior is that, in the Intel x86 instruction set, 248 out of 256 possibilities can be a legitimate starting byte for an instruction, making it more likely to mistake data for instructions. The fact that the Intel x86 instruction set allows variable instruction size further aggravates the problem of code/data distinction. Consider the following example sequence of bytes:

0x0F 0x85 0xC0 0x0F 0x85 .....

If we consider 0x0F as a code byte then we'll end up with the following disassembly:

jne offset


On the other hand, if we consider 0x0F as a data byte and 0x85 as a code byte, then we get something like:

0x0F // data
test eax, eax
jne offset

Thus a single disassembly error could result in many subsequent bytes being interpreted incorrectly, with the extent of error potentially unbounded. In contrast, a fixed-instruction-size architecture exhibits a self-correcting property: an interpretation error for one instruction word does not propagate to the next instruction word.
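The two interpretations above can be reproduced with a deliberately tiny decoder. The sketch below is an illustration only, not the disassembler used in this work: it knows just two IA-32 encodings (0F 85 rel32 for a near jne and 85 C0 for test eax, eax) and reports every other byte as data. The function name decode and the trailing bytes of the sample sequence are assumptions made for the example.

```python
import struct

def decode(code: bytes, start: int):
    """Toy x86 decoder: returns (offset, text) pairs starting at 'start'.
    Only two encodings are recognized; anything else is emitted as 'db'."""
    out = []
    i = start
    while i < len(code):
        if code[i:i + 2] == b"\x0f\x85" and i + 6 <= len(code):
            # 0F 85 cd: jne rel32, a 6-byte instruction.
            rel32 = struct.unpack_from("<i", code, i + 2)[0]
            out.append((i, f"jne {rel32:+#x}"))
            i += 6
        elif code[i:i + 2] == b"\x85\xc0":
            # 85 C0: test eax, eax, a 2-byte instruction.
            out.append((i, "test eax, eax"))
            i += 2
        else:
            out.append((i, f"db {code[i]:#04x}"))
            i += 1
    return out

# The byte sequence from the text, completed with a made-up 32-bit offset.
code = bytes([0x0F, 0x85, 0xC0, 0x0F, 0x85, 0x10, 0x00, 0x00, 0x00])

# Treating the first 0x0F as code swallows the next four bytes as an offset.
for line in decode(code, 0):
    print(line)

# Treating the first 0x0F as data yields test eax, eax followed by jne.
for line in decode(code, 1):
    print(line)
```

Starting one byte later changes every instruction boundary that follows, which is the error propagation the paragraph describes.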

The recursive traversal algorithms cannot obtain 100% accurate disassembly results either, because it is difficult to construct a complete control flow of an input binary in the presence of indirect branch instructions such as call/jmp reg32 (e.g. call eax) or call/jmp m32 (e.g. jmp dword[esp + xx]). One solution to this problem is to perform additional data flow analysis such as inter-procedural slicing and/or constant propagation [23] to figure out at compile time the value of the register or memory location used in indirect control transfer instructions. Apart from being difficult to implement, such an approach tends to greatly increase the disassembly time and itself does not guarantee 100% accuracy.

Procedures for which no explicit call sites in the input program can be identified include exception or signal handlers, callback functions, which are rife in GUI applications, and procedures all calls to which are through indirect branch instructions. Because there is no identifiable call to these functions, they cannot be discovered through control flow analysis, and as a result may be misclassified as data. In practice, signal/exception handlers pose few problems because their entry points are included in the program header in some cases.

The addressing in position independent code (PIC) does not rely on any particular position in the program's address space. Thus PIC code and jump table never have absolute address references. Instead the references are in the form of offsets with respect to a base value that is known at run time, mostly through the eip value. For example,

10: call 10
15:
  :
25: pop eax                // gets the return addr
                           // value 15 into eax
26: call dword[eax + 20]   // call foo
  :
  :
35:                        // foo

In this case, in spite of having explicit CALL sites, standard control flow analysis cannot discover the target location of the function foo().

Hand crafted assembly code makes it difficult to identify procedure boundaries because it does not necessarily follow the code conventions established by standard compilers. These conventions provide useful hints to resolve potential ambiguities. As an example of code convention violation, some assembly code programs jump from one function into another function without going through the latter's main entry point.

3.1.2 Disassembly Engine Implementation

Our disassembly engine is built on the x86 instruction set parsing and disassembly capabilities of an existing disassembler [13]. We use a combination of well-known disassembly techniques, viz. recursive traversal and linear sweep (described briefly in the previous subsection 3.1.1) and complement them with compiler-independent pattern matching heuristics. We assume that the data expected in a code section are typically dispatch tables (address bytes), strings and compiler alignment bytes. Since the goal of this project is to insert protection code into every procedure of the input binary, we should identify as many code bytes as possible; otherwise the transformed binary may have security holes. However, we place maintenance of original program semantics at a higher priority than security, so whenever in doubt we mark bytes as data instead of code, thus avoiding unsafe binary instrumentation. The following is a step-by-step description of the disassembly process:

1. Identify potential address bytes for dispatch table discovery and strings. Dispatch tables typically contain code section addresses. Since we know the address range of the code section, we can mark any sequence of 4 bytes, which have a value that lies within this address range as 'potential address' bytes. Sequences of printable characters that have a certain minimum length and are terminated by a null character are marked as 'potential strings.'

2. Starting from the program's main entry point, which is obtained from the input binary's PE header [17], we perform a control flow analysis on the binary to traverse the paths of the program's control flow graph. All code bytes identified in this step are marked as 'definitely code' and all associated data bytes marked as 'definitely data'. We also identify targets of CALL instructions as function entry points and targets of conditional and unconditional jump instructions as jump targets. Since this step can distinguish data from code with 100% accuracy, it overrides analysis results from other steps whenever there is a conflict. For example, the following byte sequence will be identified as a call instruction because the result from Step (2) overrides that of Step (1).

Identified as instruction 'call dword[0x30001344]' in Step (2)
<--------------------------->
0xFF 0x15 0x44 0x13 0x00 0x30
          <----------------->
Identified as 'potential address' (0x30001344) in Step (1)

3. To identify the entry points of potential callback functions, for which there are no explicit call sites, we look for instruction sequences such as:

push imm32
mov reg32, imm32

Typically the target address of a callback function is passed as an argument to some function, with which the callback function is registered. Such an argument could be passed through the stack as an immediate value (push imm32) or through a register, which contains the address value (mov reg32, imm32; push reg32). If the byte at imm32 has not been identified as a 'potential address' or a 'potential string' in Step (1), and if it looks like a legitimate instruction starting byte, we consider it as a function entry point (although despite being a legitimate code byte it may not actually be a function entry point) and proceed disassembling the subsequent bytes as instructions.

4. To identify other types of functions for which there are no explicit call sites, we next look for bytes in the code section that have not been identified as code or data yet. Every time such a byte is located, we start instruction parsing if it looks like a legitimate instruction starting byte. In both Steps (3) and (4), the point where such instruction parsing begins is called a 'reset point'. Instruction parsing continues until an unconditional branch instruction (ret or jmp) is encountered. If the result of an instruction parsing procedure is inconsistent with any previously identified byte or leads to an illegal instruction byte, the result since the reset point is revoked and all the bytes from the 'reset point' to the current position are marked as data, thus avoiding any potentially unsafe binary rewriting. After an unconditional branch we look for the next suitable 'reset point' to start the next instruction parsing attempt.

5. Because any sequence of instruction bytes should end with an unconditional branch instruction (jmp or ret), we look for code sequences that end without an unconditional branch (jmp or ret) instruction and mark such code sequences as data. This check provides a final line of defense to eliminate any potentially incorrectly identified instruction sequences. For example, in the following code sequence, Bytes 1 to 3 will be marked as data.

1: mov eax, ebx
3: push eax
4: data

Also, the byte next to an unconditional branch has to be either a data byte or, if it is a code byte, a branch target (as the previous instruction, being an unconditional branch, doesn't fall through). Therefore, in the case that the byte next to an unconditional branch is a code byte and has not been marked as a function entry point or a jump target, we mark it as a function entry point, even though it could just as well be the target of a branch instruction inside the same function.

The motivation behind this "optimistic" identification of functions, as seen in steps 3) and 5), will be explained in subsection 4.3.
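Step 1 of the process above lends itself to a short illustration. The following Python sketch is a minimal reading of that step under stated assumptions, not the tool's actual code: addresses are taken to be 32-bit little-endian values, "printable" is taken to mean ASCII 0x20 to 0x7E, the minimum string length of 4 is an arbitrary choice, and the function names and the sample section contents are hypothetical.

```python
import struct

def potential_addresses(section: bytes, code_lo: int, code_hi: int):
    """Return (offset, value) for every 4-byte little-endian value that
    lies inside the code section's address range [code_lo, code_hi)."""
    hits = []
    for off in range(len(section) - 3):
        value = struct.unpack_from("<I", section, off)[0]
        if code_lo <= value < code_hi:
            hits.append((off, value))
    return hits

def potential_strings(section: bytes, min_len: int = 4):
    """Return (offset, text) for runs of printable ASCII characters that
    are at least min_len long and are terminated by a null byte."""
    hits = []
    start = None
    for off, byte in enumerate(section):
        if 0x20 <= byte < 0x7F:
            if start is None:
                start = off
        else:
            if start is not None and byte == 0 and off - start >= min_len:
                hits.append((start, section[start:off].decode("ascii")))
            start = None
    return hits

# Sample bytes: the call instruction from Step 2, a null byte, a string, a nop.
section = bytes.fromhex("ff1544130030") + b"\x00" + b"hello\x00" + b"\x90"

print(potential_addresses(section, 0x30001000, 0x30002000))
print(potential_strings(section))
```

In this sample the address operand at offset 2 is flagged as a 'potential address', and it is the control flow analysis of Step 2 that would later reclassify those bytes as part of a call instruction.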

3.2 Binary Instrumentation

Because it is not always possible to derive the high-level control flow of an input binary, the process of inserting additional code to counter buffer overflow attacks must proceed in a way that does not disturb the memory references used in the instructions of the binary program that is to be protected.

3.2.1 Where to Insert RAD Code

The additional code required by RAD [1] involves

• Saving a copy of the return address on the stack in the return address repository (RAR) at the function prologue, and

• Checking the return address on the stack against the saved copy in the RAR at the function epilogue, popping it off the RAR in the event of a match, or flagging an exception otherwise.
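The two operations above amount to a push at the prologue and a compare-and-pop at the epilogue. The following Python sketch models that behavior only; the class and exception names are hypothetical, and the real RAD code is machine code inserted into the binary rather than anything resembling this class.

```python
class ReturnAddressViolation(Exception):
    """Raised when the return address on the stack differs from the saved copy."""

class ReturnAddressRepository:
    """Toy model of the RAR: a stack of saved return addresses."""

    def __init__(self):
        self._saved = []

    def prologue(self, return_address: int) -> None:
        # Function prologue: save a copy of the return address.
        self._saved.append(return_address)

    def epilogue(self, return_address_on_stack: int) -> None:
        # Function epilogue: compare with the saved copy, pop on a match,
        # and flag an exception otherwise.
        if not self._saved or self._saved[-1] != return_address_on_stack:
            raise ReturnAddressViolation(hex(return_address_on_stack))
        self._saved.pop()

rar = ReturnAddressRepository()
rar.prologue(0x00401000)   # outer function is entered
rar.prologue(0x00402000)   # inner function is entered
rar.epilogue(0x00402000)   # inner function returns with an intact address
try:
    rar.epilogue(0x0012FF00)   # outer return address was overwritten
except ReturnAddressViolation as exc:
    print("attack detected:", exc)
```

The addresses used here are made up; the point is only that a mismatch between the stack and the repository is caught before the RET executes.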

Instead of adding function prologues and epilogues to every function, we choose to do so only for 'interesting' functions, which are functions that contain a sequence of instructions for stack frame allocation and deallocation for local variables. A function without local variables could never be vulnerable to a stack based buffer overflow.
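One way to recognize an 'interesting' function is to match the stack frame allocation sequence at its entry point. The sketch below is an assumption about how such a check could look, not the matcher used in this work: it recognizes push ebp, then mov ebp, esp (either of its two standard IA-32 encodings), then sub esp, x or add esp, -x with an 8-bit or 32-bit immediate. The function name frame_prologue_length is hypothetical.

```python
MOV_EBP_ESP = (b"\x8b\xec", b"\x89\xe5")  # two encodings of mov ebp, esp

def frame_prologue_length(code: bytes, off: int = 0) -> int:
    """Return the byte length of a frame-allocating prologue at 'off',
    or 0 if the bytes there do not match the expected pattern."""
    if code[off:off + 1] != b"\x55":                 # push ebp
        return 0
    if code[off + 1:off + 3] not in MOV_EBP_ESP:     # mov ebp, esp
        return 0
    opcode = code[off + 3:off + 5]
    # 83 EC ib: sub esp, imm8.  83 C4 ib: add esp, imm8 (must be negative).
    if opcode in (b"\x83\xec", b"\x83\xc4") and off + 6 <= len(code):
        imm_is_negative = code[off + 5] >= 0x80
        if opcode == b"\x83\xec" or imm_is_negative:
            return 6
    # 81 EC id: sub esp, imm32.  81 C4 id: add esp, imm32 (must be negative).
    if opcode in (b"\x81\xec", b"\x81\xc4") and off + 9 <= len(code):
        imm_is_negative = code[off + 8] >= 0x80
        if opcode == b"\x81\xec" or imm_is_negative:
            return 9
    return 0

# push ebp; mov ebp, esp; sub esp, 0x10 allocates locals, so it qualifies.
print(frame_prologue_length(bytes.fromhex("558bec83ec10")))
# push ebp; mov ebp, esp; pop ebp; ret has no local variables.
print(frame_prologue_length(bytes.fromhex("558bec5dc3")))
```

A non-zero result also gives the number of bytes available for patching at the function entry.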

3.2.2 How to Insert RAD Code

So as to not disturb the original binary's address space, we choose to create a separate new code section, not present in the original PE binary (information regarding the PE format is in [25]), appended to the end of the original binary to hold the additional prologue and epilogue code for each function. Moreover this new section, mapped to a non-interfering portion of the address space, will be set as read-only. Thus neither the RAD code is corrupted by the application nor is the application corrupted by the RAD code. To redirect control to the inserted code at a function's prologue and epilogue, we need to replace some instructions at the function prologue and epilogue with a JMP to the corresponding RAD code. When such an instrumented function is invoked, the JMP instruction, which replaces the prologue, transfers control first to the RAD prologue code, then executes the original prologue instructions and then jumps back to the original function to continue execution from the instruction immediately after the original function prologue. Epilogue instructions are replaced in a similar manner. However, the execution proceeds first with a JMP to the epilogue code in the new section, first executing the original epilogue instructions until the RET, then the RAD epilogue checking code, and then returns if there are no problems. Because the size of an unconditional JMP instruction is 5 bytes, we need at least 5 bytes worth of instruction space to accommodate a JMP instruction. Instructions that are targets of existing branch instructions cannot be replaced.
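The 5-byte figure comes from the near JMP encoding: opcode E9 followed by a 32-bit displacement measured from the end of the JMP itself. The following Python sketch shows the arithmetic and a patching step under stated assumptions. The function names and addresses are hypothetical, and filling leftover bytes with NOP (0x90) is a choice made for this example; the text above does not say what the tool does with bytes left over after the JMP.

```python
import struct

JMP_REL32_SIZE = 5  # opcode E9 plus a 32-bit displacement

def encode_jmp(src: int, dst: int) -> bytes:
    """Encode 'jmp dst' for placement at address src. The displacement is
    relative to the address of the instruction following the JMP."""
    rel = (dst - (src + JMP_REL32_SIZE)) & 0xFFFFFFFF
    return b"\xe9" + struct.pack("<I", rel)

def patch_with_jmp(image: bytearray, offset: int, replaced_len: int,
                   src_addr: int, stub_addr: int) -> bytes:
    """Overwrite replaced_len bytes at 'offset' with a JMP to stub_addr and
    return the original bytes so they can be copied into the stub."""
    if replaced_len < JMP_REL32_SIZE:
        raise ValueError("need at least 5 bytes to hold a JMP")
    original = bytes(image[offset:offset + replaced_len])
    padding = b"\x90" * (replaced_len - JMP_REL32_SIZE)  # assumption: NOP fill
    image[offset:offset + replaced_len] = encode_jmp(src_addr, stub_addr) + padding
    return original

# push ebp; mov ebp, esp; sub esp, 0x10; ret  (function at 0x401000)
image = bytearray.fromhex("558bec83ec10c3")
saved = patch_with_jmp(image, 0, 6, src_addr=0x00401000, stub_addr=0x00405000)
print(saved.hex(), image.hex())
```

The stub in the new section would run the RAD prologue code, execute the saved original bytes, and then jump back to the instruction after the replaced prologue, using the same displacement arithmetic in the opposite direction.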

A function prologue, which needs to allocate stack space for local variables, typically comprises 3 instructions:

1. push ebp       // save old frame ptr
                  // (1 byte instruction)
2. mov ebp, esp   // set the top of the stack as the
                  // current frame ptr (2 byte inst)
3. sub esp, x     // allocate x bytes on the stack for
                  // local variables (3 to 6 byte instruction)
   or
   add esp, -x

Alternatively it could also be done using the ENTER instruction; however, most compilers do not use ENTER for stack frame allocation. Thus, an 'interesting' function prologue includes at least 6 bytes worth of instructions. Hence, we can comfortably instrument an 'interesting' function prologue to redirect control to the RAD prologue code using a 5-byte JMP instruction. On the other hand, a typical stack frame deallocation instruction sequence looks like one of the following three cases:

1. add esp, x     // dealloc. stack space, x bytes
                  // were allocated (3-6 byte inst)
   pop ebp        // restore caller's frame ptr (1 byte)
   ret            // return (1 byte)
2. mov esp, ebp   // dealloc. stack space, any number of bytes
                  // allocated on the stack (2 byte instruction)
   pop ebp        // restore caller's frame ptr (1 byte)
   ret            // return (1 byte)
3. leave          // dealloc. stack frame & restores
                  // old frame ptr (1 byte)
   ret            // return (1 byte)

From 2) and 3), we see that stack frame deallocation could be done with 2 to 4 bytes worth of instructions. So we need to replace some more instructions in addition to the stack frame deallocation instructions to hold a JMP instruction. In most cases, we do find enough space this way. However, it is possible that the first instruction of the stack frame deallocation sequence is a jump target, e.g.:

   jne x
    :
x: leave
   ret

In this case, if we replace instructions prior to LEAVE, then the jump target x would be disturbed. From our experience, the scenario of not being able to find 5 bytes worth of instructions at a function's epilogue does occur in practice but is relatively rare. For such a situation to occur in practice, two conditions need to be met:

a) Most development environments on Windows, by default, set certain compilation options which generate calls to stack checking code, prior to stack frame deallocation, to check for adherence to certain calling conventions (which basically dictate caller and callee duties as regards function frame initialization and cleanup). Calling convention adherence check is desirable because of functions being called using function pointers and calls to library functions. If we disable these options the compiler won't generate these stack checking calls and thus will not generate extra bytes prior to stack frame deallocation.

b) There should be a high level code sequence like:

goto label;
  :
  :
label:
return;

So in such rare scenarios (our experiments show typically 0.03 - 3% of all functions, sec. 5.2.2, table 9), we use a simple although expensive approach to solve this problem. When not enough instructions are available, we replace the first byte of the instruction prior to ret


entry in the IAT, at run time, contains the address of an imported function and the on-disk binary file. For each IAT entry there is a corresponding entry in the INT, which points to the name of the corresponding imported function. At load time, the loader overwrites the IAT entries with the virtual addresses where the corresponding functions are mapped. To locate the entry point of VirtualProtect(), we look up the INT by its name and retrieve the index of the associated IAT entry if there is a match. If VirtualProtect() is already imported by the input program, then its entry point is readily available from its IAT entry. If there is a match in the INT but the corresponding IAT entry is empty, then we need to dynamically resolve the entry point of VirtualProtect() by calling GetModuleHandle(), which gets the base address of the DLL containing the API function in question, and then calling GetProcAddress(), which gets the address of the desired API function from the export directory of its containing DLL. The entry points of GetModuleHandle() and GetProcAddress() themselves are obtained from the IAT through their containing DLL, kernel32.dll. In the rare case that none of these API functions is imported, the PE component needs to emulate the operation of GetModuleHandle() and GetProcAddress(). This emulation idea is derived from undocumented virus code and is based on the following observation: at the program entry point, the top of the stack contains a return address, which points to somewhere inside the function CreateProcess(), which in turn belongs to the DLL kernel32.dll. With this address, one can scan through memory until the location where kernel32.dll is mapped is found. Once the base address of kernel32.dll is known, the entry points of GetModuleHandle() and GetProcAddress() are available through the DLL's export table, and in turn the entry point of any API function can be identified through these two functions.
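The fallback of recovering the base of kernel32.dll from an address known to lie inside it can be sketched as follows. This is a minimal simulation in Python, not the prototype's code: memory is modelled as a dictionary of byte regions, and the scan strategy (stepping down in 64 KB units and checking for the 'MZ' and 'PE\0\0' signatures) is an assumption about how such a scan is commonly done, since the text does not specify it. All function names and the example addresses are hypothetical.

```python
import struct

ALLOC_GRANULARITY = 0x10000  # assumption: modules are mapped on 64 KB boundaries
E_LFANEW_OFFSET = 0x3C       # DOS header field holding the offset of the PE header


def make_toy_image(size, e_lfanew=0x80):
    """Build a zero-filled toy image carrying only the two signatures we test for."""
    image = bytearray(size)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, E_LFANEW_OFFSET, e_lfanew)
    image[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    return bytes(image)


def make_reader(regions):
    """Return read(addr, n) over {base: bytes}; unmapped bytes read as zero.

    Real code would have to guard against access violations instead.
    """
    def read(addr, n):
        for base, data in regions.items():
            if base <= addr and addr + n <= base + len(data):
                return data[addr - base:addr - base + n]
        return b"\0" * n
    return read


def looks_like_pe_image(read, base):
    """Check for 'MZ' at base and the PE signature at the offset the DOS header names."""
    if read(base, 2) != b"MZ":
        return False
    (e_lfanew,) = struct.unpack("<I", read(base + E_LFANEW_OFFSET, 4))
    return read(base + e_lfanew, 4) == b"PE\0\0"


def find_module_base(read, addr_inside, max_steps=4096):
    """Scan downward from an address inside a module to the module's base."""
    candidate = addr_inside & ~(ALLOC_GRANULARITY - 1)
    for _ in range(max_steps):
        if candidate < 0:
            break
        if looks_like_pe_image(read, candidate):
            return candidate
        candidate -= ALLOC_GRANULARITY
    return None


# Hypothetical layout: a toy "kernel32.dll" mapped at 0x77E80000.
KERNEL32_BASE = 0x77E80000
read = make_reader({KERNEL32_BASE: make_toy_image(0x30000)})
return_address = KERNEL32_BASE + 0x2ABCD  # stands in for the value on top of the stack
found_base = find_module_base(read, return_address)
```

Once the base is known, the export table can be walked from the PE header to find GetModuleHandle() and GetProcAddress(); that step is omitted here.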

Finally, the entry point of the executable in the PE header should be changed to the new initialization code, which locates the entry point of VirtualProtect() and calls it to protect two pages surrounding the RAR.

4.3 Limitations

4.3.1 Security Weaknesses Due to Disassembly Limitations

Two aspects of disassembly that relate to sources of false security alarms and security loopholes are:

• Functions Missed by our Disassembly Engine (False Negatives)

• Falsely Identified Functions (False Positives)

Let's look at each of these aspects and evaluate when and how these could result in false alarms or missed attacks. There is a trade-off between security and program correctness. While we make every effort to seal all known security holes, we consider preserving original program semantics as a more important goal than attempting perfect security.

False Negatives

These (if any) are mainly callback functions (without an explicit CALL within the code section) and/or functions invoked using Position-Independent Code (PIC) sequences (wherein there would be no absolute address references to a function within the code section). These are not covered by pure control flow analysis. Despite this, typically, only a certain fraction of such functions get missed. The following "representative" scenarios should make this clear. Functions missed by the control flow analysis step could be:

a) partly/fully misidentified as data
b) identified fully as code

If the start or the end of a function is misidentified as data, then we might miss out on an interesting prologue or epilogue respectively. Either case results in an unprotected return, which might turn out to be a security loophole if that particular function has a buffer overflow vulnerability. The same holds when a function is fully identified as data. If a piece of code somewhere in the middle of a function is misidentified as data, then the function is divided into two, and hence all returns in this function beyond this dividing point would be treated as part of an uninteresting function. They are therefore left uninstrumented and could miss an attack.

A function with its body fully identified as code could still be missed during control flow analysis, with its unidentified entry point preceded either by data or by an unconditional branch instruction from the previous function. In either case, we would indeed mark the function entry point (last step (5) of the disassembly engine, Section 3.1.2). When data preceding the entry point of such a function aligns properly with the code bytes to form a legitimate instruction sequence, an originally interesting prologue could become uninteresting, thus exposing an attack opportunity. In all the cases presented so far, however, program semantics are not jeopardized. But if data misidentified as code turns an uninteresting function prologue into an interesting one, it might generate a false alarm if the epilogue happens to be interesting. Another false alarm scenario arises if the function entry point is preceded by some data and the first identified code byte happens to be a jump target (which happens with inter-procedural jumps), in which case the two functions get merged into one. However, inter-procedural jumps occur only in handcrafted assembly or in setjmp()/longjmp() cases.


Apart from functions, jump targets reached by PIC jump tables could be missed. This could affect program correctness if these targets happen to be within instrumented prologues or epilogues, a very unlikely scenario, though.

False Positives

Functions with multiple entry points are treated as two separate functions. Targets of PIC jump tables, which cannot be discovered statically, could get marked as function entry points if they lie immediately after an unconditional branch or a sequence of data bytes (last step (5) of the disassembly engine, Section 3.1.2). Code section addresses which appear as immediate (imm32) operands to mov r32, imm32 or push imm32 could be identified as function entry points even if they are targets of an indirect jump (non-PIC jump table targets are, however, treated specially and identified).

Function boundary identification helps prevent scenarios where the prologue is instrumented but the epilogue is not, and vice versa. The latter case could cause false alarms: the epilogue checking code would try to find a match for the return address on the stack, but since that address was never saved (there is no prologue instrumentation code), it would not be found in the Return Address Repository (RAR), and a false exception would be flagged. We want to avoid that altogether, which can be achieved by "optimistic" identification of functions. This false identification, however, could result in a function having an instrumented prologue but an uninstrumented epilogue. If such a function is called frequently in a manner that it exits from an uninstrumented epilogue, then the RAR will eventually overflow, since there is no code to pop the return address off the RAR. Another potential problem due to false identification is missed attacks. If false identification causes an "entry point" to be inserted within a function body, then the single function gets divided into two. Here the "second" function won't have an interesting prologue, hence all subsequent returns in this function will be missed. However, false identification of functions never jeopardizes program correctness unless, of course, an entire chunk of data misidentified as code forms a function with both an interesting prologue and an interesting epilogue, which is an extremely unlikely scenario.
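The RAR overflow scenario can be illustrated with a small model. This is only a sketch under stated assumptions: the 16 KB repository size is taken from Table 1, while the 4-byte slot size (one 32-bit return address per slot, with the whole repository used for addresses) and all class and function names are assumptions made for illustration.

```python
RAR_BYTES = 16 * 1024        # RAR size reported in Table 1
ADDRESS_BYTES = 4            # assumption: one 32-bit return address per slot
RAR_SLOTS = RAR_BYTES // ADDRESS_BYTES


class RarOverflow(Exception):
    """Raised when a prologue tries to push into a full RAR."""


class BoundedRar:
    """A fixed-capacity stack of saved return addresses."""

    def __init__(self, slots=RAR_SLOTS):
        self.slots = slots
        self.entries = []

    def prologue(self, return_address):
        if len(self.entries) >= self.slots:
            raise RarOverflow("RAR full after %d entries" % len(self.entries))
        self.entries.append(return_address)


def calls_before_overflow(rar, return_address=0x00401234):
    """Call a function whose epilogue is uninstrumented until the RAR fills up."""
    calls = 0
    try:
        while True:
            rar.prologue(return_address)  # instrumented prologue saves the address
            calls += 1                    # function exits; nothing pops the entry
    except RarOverflow:
        return calls
```

Under these assumptions, every call through the uninstrumented exit leaks one slot, so the number of such calls the repository tolerates equals its slot count.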

In summary, PIC, indirect branches and callback functions could cause some security loopholes in the input programs to be unprotected. Empirical results show that indirect branches typically are 5-8% of all branch instructions (Section 5.2.1, Table 5). Only a fraction of this (if any at all) could possibly result in a missed attack.

As for false alarms, they could arise due to handcrafted assembly code, mostly with inter-procedural jumps and/or entry and exit points in different functions. An example of such a case was observed when we instrumented Microsoft Access. Here, the control jumps from Fn1 to label, which is in Fn2, and exits from Fn2.

Fn1:              // no 'interesting' prologue
    :
    jne label
    :
    ret           // no 'interesting' epilogue

Fn2:              // 'interesting' prologue
    :
label:
    :
    ret           // 'interesting' epilogue

Fn1 has an uninstrumented prologue, so its return address is not saved in the RAR, and Fn2's epilogue is instrumented, so a return address check is done on exit from Fn2. The RAD epilogue of Fn2 will flag an exception, since it cannot find the on-stack return address in the RAR, thus raising a false alarm.
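The Fn1/Fn2 false alarm can be reproduced in a toy model of the RAR. The sketch below is a simplification written for illustration: the epilogue is assumed to search the repository from the most recent entry downward for the on-stack return address, and the class, exception, and function names are hypothetical rather than taken from the prototype.

```python
class ReturnAddressNotFound(Exception):
    """Stands in for the exception the RAD epilogue raises on a failed check."""


class Rar:
    """Toy Return Address Repository: a stack of saved return addresses."""

    def __init__(self):
        self.entries = []

    def prologue(self, return_address):
        self.entries.append(return_address)

    def epilogue(self, on_stack_return_address):
        # Search from the most recent entry downward for a matching address.
        for i in range(len(self.entries) - 1, -1, -1):
            if self.entries[i] == on_stack_return_address:
                del self.entries[i:]   # drop the match and anything above it
                return
        raise ReturnAddressNotFound(hex(on_stack_return_address))


def call_fn2_normally(rar, ret_to_caller):
    rar.prologue(ret_to_caller)    # Fn2's interesting prologue saves the address
    rar.epilogue(ret_to_caller)    # Fn2's interesting epilogue finds it again
    return "ok"


def call_fn1_jumping_into_fn2(rar, ret_to_caller):
    # Fn1 has no instrumented prologue, so nothing is saved in the RAR.
    # Control jumps to `label` inside Fn2 and leaves through Fn2's epilogue.
    try:
        rar.epilogue(ret_to_caller)
    except ReturnAddressNotFound:
        return "false alarm"
    return "ok"
```

The return address is perfectly legitimate in the second case; the check fails only because the matching save never happened.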

Other recipes for false alarms include data misidentified as code which looks exactly like an interesting prologue, or an entire chunk of data which appears like an interesting function, both of which are rather uncommon.

4.3.2 Potential Attacks Due to Limitations of RAD

As in RAD [1], the current binary-rewriting RAD prototype can protect applications from any kind of buffer overflow attack that corrupts the return address on the stack. Thus it can resist conventional stack smashing attacks and frame pointer based attacks [19]. However, it cannot prevent memory pointer corruption attacks [18], which do not affect the return address in any way. They simply modify the contents of the import table (Global Offset Table - GOT or Import Address Table - IAT), which makes it impossible for RAD to detect them. Fortunately, no actual network security breach incidents based on this type of attack have been reported.

Step                            Size
Return Address Repository       16 Kbytes
Exception Handler               130 bytes
Installing Exception Handler    19 bytes
Set up RAD mine-zones           55 bytes
Search for VirtualProtect()     371 bytes
Total                           16.2 - 16.6 KBytes

Table 1: The constant space overhead of binary-rewriting RAD. The "Search for VirtualProtect()" row corresponds to the step that searches the kernel32.dll export table for the entry point of VirtualProtect().

4.3.3 Multi-Threaded Applications

The current implementation doesn't handle multi-threaded applications. An idea for implementing the solution for multi-threaded applications comes from [26]. We can access the Thread Information Block (TIB) structure using the FS segment register. Code generated by compilers to set up exception handlers and to allocate storage for thread local variables typically reveals this use of the FS register. The TIB contains an array of slots for thread local storage. What we could do is have a separate RAR space for each thread (taking care that the RAR spaces of two threads don't bump into each other), and store the address of the RAR in one of the thread local storage slots, which can be used by the RAD prologue and epilogue code to figure out which RAR to work with. However, the use of the FS register, although a well-known fact in the Windows world, still falls into the category of undocumented information. There would probably be Win32 API functions that do something like this; however, the cost of invoking an API call at every RAD prologue and epilogue would be prohibitive.
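The per-thread RAR idea can be sketched at a high level as follows. This is an analogy rather than an implementation: Python's threading.local stands in for a thread local storage slot in the TIB, a Python list stands in for a thread's RAR space, and the function names are hypothetical.

```python
import threading

_tls = threading.local()   # stands in for a thread-local-storage slot in the TIB


def current_rar():
    """Return this thread's RAR, creating it on first use."""
    rar = getattr(_tls, "rar", None)
    if rar is None:
        rar = _tls.rar = []
    return rar


def prologue(return_address):
    current_rar().append(return_address)


def epilogue(return_address):
    rar = current_rar()
    if not rar or rar[-1] != return_address:
        raise RuntimeError("return address mismatch")
    rar.pop()


def worker(thread_id, depth, results):
    # Nested calls: each prologue saves a (hypothetical) return address.
    for level in range(depth):
        prologue((thread_id, level))
    saved = list(current_rar())
    # Matching epilogues unwind this thread's RAR only.
    for level in reversed(range(depth)):
        epilogue((thread_id, level))
    results[thread_id] = (saved, len(current_rar()))


def run_threads(count=2, depth=3):
    results = {}
    threads = [threading.Thread(target=worker, args=(i, depth, results))
               for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each thread looks up its own repository before every push and pop, entries saved by one thread never appear in another thread's RAR, which is the property the TIB-based scheme is meant to provide.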

4.3.4 Self-Modifying Code

Self-modifying code, like those missed functions due to indirect branches, makes control flow analysis difficult. Moreover, if a piece of code is added only at run-time to the heap, there is no way RAD can add checks to it.

5 Experimental Results

To validate the correctness of the binary-rewriting RAD prototype, we need to verify that the RAD code is injected into appropriate places in the input binary and that the RAD code does protect the input binary from buffer overflow attacks in a way that does not incur significant space overhead or run-time performance cost. In the following subsections, we present results that show that the current binary-rewriting RAD prototype does a reasonable job in disassembly accuracy and low-overhead protection against buffer overflow attacks.

5.1 Micro-Benchmark Results

To establish the baseline performance for the binary-rewriting RAD prototype, we apply it to a set of synthetic programs and measure its space and performance overhead. Table 1 shows the constant space overhead associated with binary-rewriting RAD, which excludes the per-function prologue and epilogue RAD code. Every instrumented function needs prologue and epilogue checking code, which take 38 and 41 bytes, respectively.

We then measure the performance overhead of an instrumented function due to its prologue and epilogue RAD code. Depending on whether the epilogue RAD code is triggered by a jump instruction or by a software exception handler, the measured performance overhead is different. We tested three different instrumented functions:

• void fn() that does nothing and invokes prologue and epilogue RAD code through a jump instruction

• void fn() that does nothing and invokes prologue RAD code through a jump and epilogue RAD code through a software exception

• void fn() that does some amount of computation (incrementing a variable 25000 times), without making any other function calls

The performance penalty of the binary-rewriting RAD prototype is defined as

$$\text{Relative penalty} = \frac{\text{cycle count with RAD} - \text{original cycle count}}{\text{original cycle count}}$$

and was measured using the Pentium performance counter, which has a resolution of 2 nsec.

For the do-nothing test function case, the overhead of RAD is 34.25%, which is higher than expected, considering that the prologue and epilogue RAD code is just about 9 to 11 instructions in size, each of which is a simple instruction such as 'push reg32', 'pop reg32', 'mov reg32, mem32', 'cmp reg32, mem32', 'add mem32, imm32', or 'sub reg32, imm32', none of which appears to be costly. We believe this performance overhead is due to additional instruction cache misses that arise because the code region of the test function is separate from that of its prologue and epilogue RAD code. If a function's epilogue does not contain enough space to hold a jump instruction, the epilogue RAD code is implemented inside an exception handler. The additional performance overhead due to exception delivery and return, as compared to two jump instructions, is almost four times as large, as shown in the second test function case of Table 2. When the test function is doing something more computation-intensive, as in the third test function case, the relative performance overhead of RAD immediately becomes negligible. Compared with the original RAD system [1], which works on source code only, binary-rewriting RAD performs better in all three cases, because its prologue and epilogue code is implemented in assembly and is thus more efficient. This result is somewhat surprising, as the original RAD system places per-function prologue and epilogue code together with the associated function, rather than in a separate code region, and therefore does not incur the additional instruction cache miss penalty.
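As a quick check of the numbers discussed above, the relative penalty can be recomputed from the cycle counts reported in Table 2. The snippet is only a worked example; the function and variable names are chosen here for illustration.

```python
# (original cycle count, cycle count with RAD), copied from Table 2
TABLE_2 = {
    "Null function": (292, 392),
    "Null function + epilogue": (271, 641),
    "Incrementing function": (350_425, 350_722),
}


def relative_penalty_percent(original, with_rad):
    """Relative penalty as a percentage of the original cycle count."""
    return 100.0 * (with_rad - original) / original


def added_cycles(case):
    """Extra cycles attributable to the RAD prologue and epilogue code."""
    original, with_rad = TABLE_2[case]
    return with_rad - original


penalties = {case: relative_penalty_percent(*cycles)
             for case, cycles in TABLE_2.items()}

# How much more the exception-based epilogue costs than the jump-based one.
exception_vs_jump = (added_cycles("Null function + epilogue")
                     / added_cycles("Null function"))

for case, percent in penalties.items():
    print(f"{case}: {percent:.3f}%")
print(f"exception/jump added-cycle ratio: {exception_vs_jump:.2f}")
```

The ratio of added cycles between the two null-function cases is what the text summarizes as the exception path being almost four times as expensive.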

Test Function               Cycle Counts - Original   Cycle Counts - RAD   Relative Penalty
Null function               292                       392                  34.25%
Null function + epilogue    271                       641                  136.53%
Incrementing function       350,425                   350,722              0.085%

Table 2: Per-function performance overhead due to the RAD code injected into an instrumented function. The Null function + epilogue case is the same as the Null function case except its epilogue is invoked through a software interrupt. The Incrementing function case corresponds to a function that increments a variable 25000 times. All measurements are in terms of Pentium cycle counts.

5.2 Macro-Benchmark Results

We experimented with a wide variety of commercial grade Windows applications, including BIND DNS server, DHCP server, a third-party FTP server, Microsoft Telnet Server, MS FrontPage, MS Publisher, MS PowerPoint, MS Access, Outlook Express, CL compiler, MSDEV.EXE (Visual C++ development environment), Windows Help (Winhlp), and Notepad. After rewriting, all the above programs behave exactly the same as before, except MS Access, which generated a false alarm due to hand crafted assembly code (described in Section 4.3), and the third-party FTP server, which has an internal exception handler that conflicts with the debugger exception handler that binary-rewriting RAD installs. The initial experiences collected from running the binary-rewriting RAD prototype against a wide array of regular desktop applications and Internet servers, which are the prime targets for buffer overflow attacks, convinced us that this prototype is sufficiently mature to preserve the program semantics of complex production-grade applications while providing them with protection against buffer overflow attacks. Of course, more exhaustive tests are required to be absolutely sure about the accuracy of disassembly and the protection strength of RAD.

5.2.1 Disassembly Accuracy

The binary-rewriting RAD prototype uses control flow analysis and a set of other heuristics to distinguish code from embedded data. In general, control flow analysis is quite effective in identifying the code regions for non-interactive applications, which usually do not have many call-back functions. However, for interactive GUI applications, such as those in the Microsoft Office suite, control-flow analysis alone is not quite as effective because of the hidden call-back functions. Therefore these applications represent the most challenging test programs for a disassembler. Table 3 shows the disassembly accuracy of three such programs, MS Powerpoint, MS Frontpage, and MS Publisher. The disassembly accuracy of all these programs is above 99%. The way we measure disassembly accuracy is to manually inspect the resulting assembly code and determine whether the instructions look "reasonable." From our experience, instructions that are disassembled from data tend to appear out of place and thus can be easily detected.

Since the manual inspection method used above may not seem rigorous enough, to further verify our disassembly results, we experimented with certain Cygwin ported Unix applications (with available sources) on Windows, compiled with gcc's profiling options and then analyzed them offline with gprof.

Thus the % of both missed functions and unprotected returns in interesting functions appears to be reasonable. The missed functions in Apache were typically functions without any absolute address references in the code section, which were invoked through a table of function pointers to which the addresses of those functions were assigned statically. The table, being a static array variable, is located in the .data section and so were the function addresses. The static call graph generated by gprof also shows the parents of these functions as 'unidentified'.

Because the results obtained from control flow analysis are guaranteed to be correct, it is useful to measure the percentage of instructions that can be identified through control flow analysis, which gives an indication of how useful other heuristics are in identifying instructions, especially for GUI-intensive interactive applications. Table 5 shows the total number of instruction bytes in each test application and the percentage of them that control flow analysis can successfully detect. As expected, for non-interactive applications, which rarely use any call-back functions, control flow analysis can achieve a very high detection accuracy, more than 97%. However, for interactive applications, the percentage is around 80%. The difference between the coverage percentages in Tables 3 and 5 for the three programs, MS Powerpoint, MS Frontpage, and MS Publisher, represents the contribution of the pattern-based heuristics that binary-rewriting RAD employs to the total code region coverage.

Finally, because control flow analysis plays such an important role in the disassembly process, it is instructive to investigate deeper why it cannot detect all the instructions in the program. Other than functions that do not have explicit call sites, indirect branch instructions are the main culprit. We measure the percentage of indirect control branch instructions in the test applications and the results are shown in Table 5. Again, interactive applications such as MS Powerpoint and Access tend to have a higher percentage of indirect branches than others, which reflects the event-driven programming style of these applications, and correspondingly a more extensive use of function pointers and switch statements.


Application      Code section size   No. of incorrectly decoded bytes (approximation)   Accuracy
MS PowerPoint    4.059 MB            2500                                               99.93%
MS Publisher     2.314 MB            50                                                 99.99%
MS FrontPage     983 KB              900                                                99.91%

Table 3: Disassembly accuracy achieved as measured through manual inspection of the resulting assembly code. Higher accuracy means that more bytes are successfully disassembled.

Application   No. of functions   No. of functions   Miss %   No. of returns   % of returns
              (source code)      (disassembly)               unprotected      unprotected
Gzip          234                234 (80)           0        2                0.85
Wget          626                626 (140)          0        3                0.48
Apache        1191               1159 (350)         2.69     38               3.19
Whois         148                148 (15)           0        0                0
OpenSSL       2820               2812 (780)         0.283    7                0.248

Table 4: Evaluation of disassembly results by comparison with original program sources. The numbers in parentheses in the third column represent the number of falsely identified functions.

In summary, our disassembly results appear to be better than the previous best reported in the literature [22], which claims 99.9% precision using binaries with relocation information. Most of their experiments, however, were on smaller programs, all of which were plain command line programs without the GUI callback functions that make disassembly tougher. Furthermore, the presence of symbol table information in binaries (possibly inadvertently) eliminates problems regarding function boundary identification. However, there is a question of the symbol table format. It could either be the generic COFF symbol table, supported by the PE/COFF binary formats, or it could be a compiler specific format, like the VC++ .pdb. Apparently, compilers tend to favor their proprietary formats for symbol tables over the generic COFF format. This is evident since the default compilation options for generating debug information do not produce COFF symbol tables, and generate proprietary symbol tables instead.

5.2.2 Run-Time Overhead

An important consideration in the design of RAD is the minimization of performance overhead due to per-function prologue and epilogue RAD code. The relative performance overhead of RAD with respect to a test application is defined as

$$\text{Relative overhead} = \frac{\text{execution time with RAD} - \text{original execution time}}{\text{original execution time}}$$

The results are in Table 6, which shows the run-time performance overhead of binary-rewriting RAD for typical Internet applications is quite small, around 1%. The space overhead of binary-rewriting RAD for real applications is also quite reasonable, as shown in Table 7. The highest percentage is still smaller than 35%. Both results demonstrate that the overhead of binary-rewriting RAD is quite reasonable for practical applications, given the additional protection it provides.

Because invoking epilogue RAD code through an exception handler is around four times as expensive as invoking it through a jump instruction, it is important to find out how frequently epilogue RAD code is invoked through the exception handling mechanism. If it occurs frequently, then perhaps a more sophisticated mechanism needs to be developed. Table 7 also shows the percentage of functions in the test applications whose epilogue RAD code is triggered via an exception handler. The statistics in Table 7 show that the percentage of functions that do not have enough instruction space for a jump instruction is fairly low, less than 3%, which justifies our design decision of using this expensive solution in these infrequent cases. Please note that these are results of static analysis. It is possible that, at run time, one of these functions accounts for 50% of all invocations, in which case performance might be seriously hit. While it is possible to instrument binaries to report the percentage of functions called at run time that need the INT 3 software interrupt, it is not clear that this would say much: since the percentage of such functions (from our experiments) is typically 0.03% to 2.5%, we can only argue that, probabilistically, the percentage of such functions among those called at run time would be of a similar order.
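To make the static-versus-dynamic caveat concrete, the following back-of-the-envelope sketch estimates the average number of cycles RAD adds per call when a given fraction of dynamic calls goes through the INT 3 path. The per-call costs are derived from the null-function rows of Table 2; the linear mixing model and the function name are assumptions made for illustration, not measurements from the paper.

```python
JUMP_ADDED_CYCLES = 392 - 292        # jump-based epilogue, Table 2
EXCEPTION_ADDED_CYCLES = 641 - 271   # exception-based epilogue, Table 2


def expected_added_cycles(exception_fraction):
    """Average extra cycles per call for a given share of INT 3 exits."""
    if not 0.0 <= exception_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return ((1.0 - exception_fraction) * JUMP_ADDED_CYCLES
            + exception_fraction * EXCEPTION_ADDED_CYCLES)


# Static share of about 2.5% versus a hypothetical hot function at 50%.
typical_case = expected_added_cycles(0.025)
hot_function_case = expected_added_cycles(0.5)
```

Under this simple model the average cost stays close to the jump-only cost when the dynamic share matches the static percentages, and more than doubles if half of all calls take the exception path.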

Application          % Covered by Control Flow Analysis   % of Indirect Branch Instructions
WFtpd (Ftp server)   97.13%                               5.81%
BIND (DNS server)    97.32%                               5.42%
MS Access            84.57%                               8.29%
Notepad              97.54%                               1.73%
MS Powerpoint        80.11%                               7.54%
Windows Help         99.67%                               1.41%
MS FrontPage         87.20%                               8.98%
MS Publisher         93.86%                               8.94%

Table 5: Column 2 shows the percentage of a program's code section bytes that is detected purely through control flow analysis. Column 3 shows the percentage of indirect branch instructions among all the branch instructions. RET instructions are not included in this count.

Application       Original execution time (msec)   Binary RAD execution time (msec)   % Overhead
BIND              122.56                           123.85                             1.05%
DHCP server       122                              123.5                              1.23%
PowerPoint        145                              150                                3.44%
Outlook Express   138.2                            140                                1.29%

Table 6: Whole program performance overhead due to the insertion of binary-rewriting RAD code. For BIND, the response time measurement is averaged over 10 queries issued using the client program dig.exe. For the DHCP server, the measurement is the startup and initialization time averaged over 6 runs. For PowerPoint, the measurement is the time taken to render a 90 Kbyte presentation averaged over 6 runs. For Outlook Express, the measurement is the startup and initialization time averaged over 6 runs.

5.3 Resilience to Buffer Overflow Attacks

The Windows help program (Winhlp32.exe) on Windows NT 4.0 with Service Pack 4 has a buffer overflow vulnerability, which occurs when it reads a content file (.CNT) with a very long heading string. We instrumented Winhlp32.exe using our binary-rewriting RAD tool, and the augmented binary successfully resists the attack mounted by a published exploit code [3].

6 Conclusions and Future Work

We have presented a buffer overflow defense mechanism using static binary translation based on the RAD [1] model. To the best of our knowledge, this is the first work reported in the open literature that applies static binary translation technology to a concrete application security problem. While robust binary rewriting infrastructure, such as Etch [7], does exist, published papers on these systems have never documented in detail the design and implementation issues involved, the solutions adopted to address them, and their effectiveness in a quantitative manner. Our contribution lies not in inventing new approaches to static binary translation but in being the first study to implement state-of-the-art techniques in a working system and evaluate their effectiveness on commercial-grade Windows applications. We believe that this paper covers most binary translation issues in substantial depth and detail and presents a comprehensive set of experimental results to demonstrate the efficacy of the design decisions we have made. Finally, the resulting binary-rewriting RAD system achieves qualified success as an important tool to protect legacy applications whose source code is not available against buffer overflow attacks, and thus significantly broadens the applicability of buffer overflow defense mechanisms developed in the research literature. Although in a few cases it may not achieve the stated goal of providing the same level of protection as its compiler-based counterpart, this is primarily due to a fundamental deficiency that, as far as we can tell, none of the known works in the binary translation literature has handled better.

Currently, we are exploring more robust and foolproof fall-back mechanisms to deal with scenarios of incorrect disassembly and lack of sufficient space for 'in place' translation. As an immediate next step, we intend to apply our binary translation engine to Dynamically Linked Libraries (DLLs), since a major chunk of Windows services are implemented as DLLs. Finally, we aim to apply the lessons from exploring static binary translation techniques to build copy- and tamper-resistant software.

7 Acknowledgments

We thank Sang Cho for his open source disassembler [13], and our shepherd, Dawn Song, and the anonymous reviewers for their valuable feedback.


Application          % Increase in Size of Executable File   % of Functions Using the INT 3 Handler
WFtpd (Ftp server)   34.06%                                  2.57%
BIND (DNS server)    32.65%                                  0.00%
MS Access            11.29%                                  2.61%
MS Powerpoint        9.74%                                   0.83%
Windows Help         32.79%                                  0.098%
MS FrontPage         16.45%                                  0.031%
MS Publisher         10.84%                                  1.58%

Table 7: Column 2 shows the space overhead of binary-rewriting RAD for different test applications in terms of percentage increase in size of the executable file after rewriting. Column 3 shows the percentage of functions among those identified that need to invoke RAD epilogue code through the INT 3 handler.

References[1] Tzi-cker Chiueh and Fu-hau Hsu,RAD: A compile time

solution for buffer overflow attacks, 21st IEEE Interna-tional Conference on Distributed Computing Systems(ICDCS), Phoenix, AZ, April 2001

[2] Aleph One, Smashing the stack for fun and profit,Phrack Magazine 7 (49), November 1996

[3] David Litchfield, Windows NT buffer over-runs Winhlp32: http://community.core-sdi.com/ juliano/mnemonix-whlpbo.htm

[4] dark spyrit, Win32 Buffer Overflows - Location, Ex-ploitation and Defense, Phrack Magazine 55 (15), May2000

[5] A. Srivastava and A. Eustace,ATOM: A System forBuilding Customized Program Analysis Tools, SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI), pages 196–205, June1994.

[6] James Larus and Eric Schnarr,EEL: Machine-independent executable editing, SIGPLAN Conferenceon Programming Languages, Design and Implementa-tion, pages 291–300, June 1995.

[7] Ted Romer, Geoff Voelker, Dennis Lee, Alec Wolman,Wayne Wong, Hank Levy, and Brian Bershad.Instru-mentation and optimization of win32/intel executablesusing Etch. In USENIX Windows NT Workshop, 1997.

[8] LEEL, http://www.geocities.com/fasterlu/leel.htm

[9] C. Cifuentes and M. Van Emmerik,UQBT: Adapt-able Binary Translation at Low Cost, IEEE Computer,March 2000.

[10] Crispin Cowan et al.,Stackguard: Automatic adaptivedetection and prevention of buffer-overflow attacks, 7thUSENIX Security Symposium, San Antonio, TX, Jan-uary 1998.

[11] Microsoft compiler extension for buffer overflow defense, http://go.microsoft.com/fwlink/?Linkid=7260

[12] Stackshield, www.angelfire.com/sk/stackshield/

[13] Win32 Disassembler, www.geocities.com/ sangcho

[14] Hiroaki Etoh, GCC extension for protecting applications from stack-smashing attacks, http://www.trl.ibm.co.jp/projects/security/ssp

[15] CASH: Checking Array Bound Violation Using Segmentation Hardware, http://www.ecsl.cs.sunysb.edu/softsecure/project.html

[16] R. Jones and P. Kelly, Backwards-compatible bounds checking for arrays and pointers in C programs, http://www-ala.doc.ic.ac.uk/ phjk/BoundsChecking.html

[17] Intel Architecture Software Developer’s Manual, Volume 3: System Programmer’s Guide.

[18] Bulba and Kil3r, Bypassing StackGuard and StackShield, Phrack, 5(56), May 2000.

[19] Klog, The frame pointer overwrite, Phrack Magazine 55 (8), May 2000.

[20] Arash Baratloo, Timothy Tsai, and Navjot Singh, Transparent run-time defense against stack smashing attacks, USENIX Annual Technical Conference, June 2000.

[21] Vladimir Kiriansky, Derek Bruening, and Saman Amarasinghe, Secure Execution Via Program Shepherding, 11th USENIX Security Symposium, San Francisco, California, August 2002.

[22] Benjamin Schwarz, Saumya Debray, and Gregory Andrews, Disassembly of executable code revisited, Working Conference on Reverse Engineering, October 2002.

[23] C. Cifuentes and M. Van Emmerik, Recovery of Jump Table Case Statements from Binary Code, International Workshop on Program Comprehension, May 1999.

[24] Galen Hunt and Doug Brubacher, Detours: Binary Interception of Win32 Functions, 3rd USENIX NT Symposium, Seattle, July 1999.

[25] Matt Pietrek, An In-Depth Look into the Win32 Portable Executable File Format, MSDN Magazine, February 2002.

[26] Matt Pietrek, Under the Hood, Microsoft Systems Journal, 11(5), May 1996.

[27] Barton P. Miller, Mark D. Callaghan, Jonathan M. Cargille, Jeffrey K. Hollingsworth, R. Bruce Irvin, Karen L. Karavanic, Krishna Kunchithapadam, and Tia Newhall, The Paradyn Parallel Performance Measurement Tools, IEEE Computer 28(11), pp. 37-46, November 1995.

[28] Jun Xu, Zbigniew Kalbarczyk, Sanjay Patel, and Ravishankar K. Iyer, Compiler and Architecture Support for Defense against Buffer Overflow Attacks, 2nd Workshop on Evaluating and Architecting System Dependability (EASY), San Jose, CA, October 2002.