
8/13/2019 005 Pietrek Matt Just Enough Assembly to Survive Kit HOOD SVE


Figure 1 Common Intel x86 Registers

EAX Multipurpose. Return values from a function are usually stored in EAX. Low 16 bits are referenced as AX. AX can be further subdivided into AL (the low 8 bits), and AH (the upper 8 bits of AX).

EBX Multipurpose. Low 16 bits are referenced as BX. BX can be further subdivided into BL (the low 8 bits), and BH (the upper 8 bits of BX).

ECX Multipurpose. Often used as a counter, for example, to hold the number of loop iterations that should be performed. Low 16 bits are referenced as CX. CX can be further subdivided into CL (the low 8 bits), and CH (the upper 8 bits of CX).

EDX Multipurpose. Low 16 bits are referenced as DX. DX can be further subdivided into DL (the low 8 bits), and DH (the upper 8 bits of DX).

ESI Multipurpose. In certain operations that move or compare memory, ESI contains the source address. Low 16 bits are referenced as SI.

EDI Multipurpose. In certain operations that move or compare memory, EDI contains the destination address. Low 16 bits are referenced as DI.

ESP Stack pointer. Implicitly changed by PUSH, POP, CALL, and RET instructions.

EBP Base pointer. Usually points to the current stack frame for a procedure. Procedure parameters are usually at positive offsets from EBP (for example, EBP+8). Local variables are usually at negative offsets (for example, EBP-16). Sometimes, optimizing compilers won't use a stack frame, and use EBP as a multipurpose register.

EFLAGS Rarely directly referenced. Instead, instructions implicitly set or clear bitfields within the EFLAGS register to represent a certain state. For example, when the result of a mathematical operation is zero, the Zero flag is toggled on in the EFLAGS register. The conditional jump instructions make use of the EFLAGS register.

FS 16-bit. Under Win32, the FS register points to a data structure with information pertaining to the current thread. FS is a segment register (segment registers are beyond the scope of this discussion). Intel CPUs have six segment registers, but the operating system sets them up and maintains them. Win32 compilers only need to explicitly refer to the FS segment register, which is used for things like structured exception handling and thread local storage.
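The way AX, AL, and AH overlap EAX in Figure 1 can be sketched in C++ with masks and shifts. This is only an illustrative model: the Reg32 type and its member names are hypothetical and are not part of any compiler or debugger API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 32-bit general-purpose register such as EAX.
struct Reg32 {
    std::uint32_t value;

    // AX: the low 16 bits of EAX.
    std::uint16_t ax() const { return static_cast<std::uint16_t>(value & 0xFFFFu); }

    // AL: the low 8 bits of AX.
    std::uint8_t al() const { return static_cast<std::uint8_t>(value & 0xFFu); }

    // AH: the upper 8 bits of AX (bits 8 through 15 of EAX).
    std::uint8_t ah() const { return static_cast<std::uint8_t>((value >> 8) & 0xFFu); }

    // Writing AL replaces only the low byte; the other 24 bits are untouched.
    void setAl(std::uint8_t b) {
        value = (value & 0xFFFFFF00u) | b;
        assert(al() == b);
    }
};
```

With EAX holding 0x12345678, this model reports AX as 0x5678, AH as 0x56, and AL as 0x78. The same layout applies to EBX, ECX, and EDX.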

Procedure Entry and Exit

These instructions are automatically inserted by the compiler to create a standard method for accessing parameters and local variables. This method is called a stack frame, as in "frame of reference." In fact, the Intel CPU dedicates the EBP register to maintaining a stack frame. For this group of instructions, it's especially important to note that not every procedure will use exactly the same sequence, and that certain things may be omitted entirely.

Sequence PUSH EBP / MOV EBP,ESP / SUB ESP,XX

Purpose Sets up the EBP stack frame for a new procedure

Examples

PUSH EBP

MOV EBP, ESP

SUB ESP, 24

Description "PUSH EBP" saves the previous frame pointer on the stack. "MOV EBP,ESP" sets the EBP register to the same value as the stack pointer (ESP). "SUB ESP,XX" creates space for local variables below the EBP frame.

In optimized code, you may see this sequence interspersed with other instructions (for example, "PUSH ESI"). Since "PUSH EBP" and "MOV EBP,ESP" both use the EBP register, a processor with multiple pipelines would ordinarily need to stall one of the pipelines. By interspersing other instructions that don't use the EBP register, the processor can do more work in the same amount of time.

Instruction ENTER

Purpose Sets up the EBP stack frame for a new procedure

Examples

ENTER 8, 0 ; Sets up stack frame with 8 bytes of local variables

Description The ENTER instruction first became available on the 80286 processor. It was intended to replace the "PUSH EBP / MOV EBP,ESP / SUB ESP,XX" sequence with a single, smaller instruction. On current processors the ENTER instruction is slower than the three-instruction sequence, so ENTER is rarely used.

Sequence MOV ESP,EBP / POP EBP

Purpose Removes the EBP stack frame before leaving a procedure

Description The "MOV ESP,EBP" instruction bumps up the stack pointer past any space allocated for local variables on the stack. "POP EBP" restores the stack frame pointer to point at the previous EBP frame. This sequence is normally followed by a return instruction to return control to the calling procedure.
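To see how the entry and exit sequences mirror each other, here is a minimal C++ sketch of a toy stack machine. The ToyCpu type, its 64-slot memory, and its method names are assumptions made purely for illustration; a real CPU does all of this in hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy machine: a downward-growing stack of 32-bit slots
// addressed by byte offsets, plus ESP and EBP registers.
struct ToyCpu {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(64, 0);
    std::uint32_t esp = 64 * 4;  // byte address just past the top of the stack
    std::uint32_t ebp = 0;

    // PUSH: decrement ESP by one DWORD, then store the value at [ESP].
    void push(std::uint32_t v) {
        assert(esp >= 4);
        esp -= 4;
        mem[esp / 4] = v;
    }

    // POP: read the value at [ESP], then increment ESP by one DWORD.
    std::uint32_t pop() {
        assert(esp + 4 <= mem.size() * 4);
        std::uint32_t v = mem[esp / 4];
        esp += 4;
        return v;
    }

    // PUSH EBP / MOV EBP,ESP / SUB ESP,localBytes
    void enterFrame(std::uint32_t localBytes) {
        push(ebp);
        ebp = esp;
        assert(localBytes % 4 == 0 && localBytes <= esp);
        esp -= localBytes;
    }

    // MOV ESP,EBP / POP EBP
    void leaveFrame() {
        esp = ebp;
        ebp = pop();
    }
};
```

After enterFrame(24) followed by leaveFrame(), both ESP and EBP hold the values they had before the frame was created, which is why the caller's frame is intact when the procedure returns.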

Instruction LEAVE

Purpose Removes the EBP stack frame before leaving

Description The LEAVE instruction is the inverse of the ENTER instruction. It can also be used to remove a frame set up by the "PUSH EBP / MOV EBP,ESP" sequence. The LEAVE instruction is only 1 byte long, which is smaller than the longer "MOV ESP,EBP / POP EBP" sequence. Unlike the ENTER instruction, there's no performance penalty for using it, so some compilers use LEAVE.

Instruction PUSH register

Purpose Saves the previous values of register variables

Examples

PUSH EBX

PUSH ESI

PUSH EDI

Description Sometimes compilers use a general-purpose register to hold the value of parameters or local variables. This can be more efficient than storing the same value in memory. These are commonly known as register variables. The EBX, ESI, and EDI registers are most often used as register variables.

The convention most compilers use is that register variable values are preserved across procedure calls. If the compiler decides to use register variables in a procedure, it is responsible for preserving the value of the registers that it alters (typically, EBX, ESI, and EDI). Typically, compilers preserve these register values on the stack as part of setting up the procedure's stack frame. If the compiler uses only one or two of the aforementioned registers, it needs to preserve only those registers.

Instruction POP register

Purpose Restores the previous values of register variables

Examples

POP EDI

POP ESI

POP EBX

Description In preparing to return from a procedure, the register variable registers need to be restored to their previous values. These instructions remove a value from the stack and place it into the designated register.

Accessing Variables

The Intel CPU has many instructions that work with variables, which are just locations in memory. For example, you can add or subtract from a variable representing a counter. Likewise, a variable may contain a pointer to something. There are just too many instructions to describe here, and in most cases the instruction name gives a good clue about what the instruction is doing. However, I will show how variables of different storage classes appear in assembly language.

Instruction instruction [global]

Purpose Global/static variables

Examples

MOV EAX,[00401234]

MOV [00401238],ESI

PUSH [77852432]

ADD [00620428],00001000

Description When you see an instruction that includes an actual machine address inside the square brackets, it's accessing memory that was declared as either a global or static variable. These addresses are known at program load time, so the instruction contains the actual memory address to read or write.

Instruction instruction [parameter]

Purpose Procedure parameters and this pointers

Examples

MOV ESI,[EBP+14]

MOV [ESP+30],EAX

ADD [EBP+0C],2

OR [ESP+20],00000010

Description Parameters to procedures are usually passed on the thread's stack. Since these values are pushed before the procedure call and before the called procedure sets up its stack frame, the parameters appear at positive offsets from the stack frame base pointer (EBP). Just about any instruction that makes reference to memory above EBP (for example, "[EBP+8]") is making use of a procedure parameter. The advantage of using EBP for accessing parameters is that EBP doesn't change throughout the lifetime of a procedure. This makes it easier to keep track of the procedure's parameters.

Prior to the 80386, the only effective way to access parameters was with the base pointer register. The 386 added the ability to access memory just as easily with displacements from the stack pointer (ESP) register. Thus, optimized code can dispense with setting up an EBP frame and still reference parameters by using positive offsets from ESP. For example, "ADD [ESP+20],4" adds four to whatever DWORD is at [ESP+20]. From a debugging standpoint, using ESP to access parameters is inconvenient. Since ESP can change during a procedure, a given parameter may be at different offsets from ESP at different points in a procedure's code.

One last word on parameters. In C++, the this pointer of a member function is really a hidden parameter. Usually the this pointer is the last parameter pushed on the stack before the call. In Visual Basic, the self-referential me is the same thing as the C++ this pointer.

Instruction instruction [local]

Purpose Local variables

Examples

MOV ESI,[EBP-14]

MOV [EBP-30],EAX

SUB [ESP],2

AND [ESP+4],00000010

Description From the vantage point of an assembly instruction, local variables aren't much different than parameters when an EBP frame is used. The only distinction is that local variables are at negative offsets from the EBP stack frame. You can get an idea of how big the sandbox for local variables will be by examining the "SUB ESP,XX" instruction near the beginning of the procedure.

Things do get messy when the compiler decides to omit an EBP frame. When this happens, the compiler addresses both local variables and parameters as positive offsets from the ESP register. There's no good way to tell a local apart from a parameter in this situation except to find out how much space the procedure has allocated for locals (see above). If the offset is less than the space allocated, it's a local. Otherwise, it's probably a parameter.

Instruction LEA variable

Purpose Load Effective Address

Examples

LEA EAX,[ESP+14]

LEA EDX,[EBP-24]

Description Despite the square brackets, LEA doesn't actually read memory or dereference a pointer. Instead, it loads the first operand with an address specified by the second operand. For example, "LEA EAX,[ESP+14]" takes the current value of the ESP register, adds 14 to it, and puts the result in EAX.

LEA's primary use is to obtain the address of local variables and parameters. For example, in C++, if you use the & operator on a local variable or parameter, the compiler will likely generate an LEA instruction. As another example, "LEA EAX,[EBP-8]" loads EAX with the address of the local variable at EBP-8.

A less obvious use of LEA is as a fast multiplication. For example, multiplying a value by 5 is relatively expensive. Using "LEA EAX,[EAX*4+EAX]" turns out to be faster than the MUL instruction. The LEA instruction uses hardwired address generation tables that make multiplying by a select set of numbers very fast (for example, multiplying by 3, 5, and 9). Twisted, but true.
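The multiplication trick follows directly from the addressing form base + index*scale, where the scale may only be 1, 2, 4, or 8. A minimal C++ sketch of that arithmetic, with the hypothetical helper names lea, times3, times5, and times9:

```cpp
#include <cassert>
#include <cstdint>

// Models the address arithmetic of "LEA dst,[base + index*scale]".
// No memory is read; the result is just the computed number.
inline std::uint32_t lea(std::uint32_t base, std::uint32_t index, std::uint32_t scale) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}

// LEA EAX,[EAX*2+EAX]
inline std::uint32_t times3(std::uint32_t x) { return lea(x, x, 2); }

// LEA EAX,[EAX*4+EAX]
inline std::uint32_t times5(std::uint32_t x) { return lea(x, x, 4); }

// LEA EAX,[EAX*8+EAX]
inline std::uint32_t times9(std::uint32_t x) { return lea(x, x, 8); }
```

Using the same register as both base and index is what yields the multipliers 3, 5, and 9: each is one of the legal scale factors plus one.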



Calling Procedures

Instruction CALL location

Purpose Transfer control to another procedure

Examples

CALL 00682568

CALL [00401234]

CALL ESI

CALL [EAX+24]

Description The CALL instruction doesn't need much explanation in itself. It pushes the address of the instruction following it onto the stack, then transfers control to the address given by the argument. The various ways of specifying a target address are worth mentioning, however.

The simplest form of the CALL instruction is when the argument contains the destination address as an immediate value (for example, "CALL 00682568"). This type of call is almost always to another location within the same module (EXE or DLL). Slightly more complicated is when the CALL instruction indirects through an address (for example, "CALL [00401234]"). You'll see this form of CALL instruction when calling a function imported from another module. It's also seen when calling through a function pointer stored in a global variable.

Two other forms of CALL instruction use registers as part of their address. If just a register name is specified (for example, "CALL ESI"), the CPU transfers to whatever address is in the register. If a register is used within brackets, perhaps with an additional displacement ("CALL [EAX+24]"), the instruction is calling through a table of function addresses. Where would these come from? You may know these tables by the more familiar name of vtables. In the preceding instruction example, the displacement selects the table entry: dividing it by the size of a DWORD (4 bytes) gives the zero-based slot number. Read as decimal, 24 divided by 4 is slot 6; if the disassembler prints displacements in hexadecimal, as most do, 0x24 divided by 4 is slot 9.
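A call of the form "CALL [EAX+displacement]" can be sketched in C++ as an indexed call through an array of function pointers. The names Fn, slotFromDisplacement, and callThroughTable are hypothetical, and the sketch assumes 4-byte table entries as on 32-bit x86.

```cpp
#include <cassert>
#include <cstddef>

// One entry in a table of function addresses (a vtable-like array).
using Fn = int (*)();

// Converts a byte displacement such as the 24 in "CALL [EAX+24]" into a
// zero-based slot number. Assumes each entry is one 4-byte DWORD.
inline std::size_t slotFromDisplacement(std::size_t displacementBytes) {
    assert(displacementBytes % 4 == 0);
    return displacementBytes / 4;
}

// Equivalent of "CALL [table + displacementBytes]": fetch the function
// address stored in the table, then call it.
inline int callThroughTable(const Fn* table, std::size_t displacementBytes) {
    return table[slotFromDisplacement(displacementBytes)]();
}
```

Because slots are numbered from zero, a displacement of 0 reaches the first function in the table and a displacement of 8 reaches the third.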

Instruction PUSH value

Purpose Places a parameter onto the stack in preparation for calling a procedure

Examples

PUSH [00405234] ; Push a global variable

PUSH [EBP+C] ; Push a parameter

PUSH [EBP-14] ; Push a local variable

PUSH EAX ; Push whatever is in EAX

PUSH 12345678 ; Push an immediate value.

Description When it comes to passing parameters, all variations of the PUSH instruction are used by the compiler. Global variables, local variables, parameters, the results of a calculation, and immediate values can all be passed with a single instruction. When you see a sequence of PUSH instructions prior to a CALL instruction, the odds are good that the PUSHes are putting the parameters onto the stack.

As mentioned earlier, if a member function or method is being called, the this or me pointer is usually passed last. In some cases, the this pointer is passed in the ECX register instead. You can identify when this occurs by looking for code that initializes the ECX register and then does nothing with it before the CALL instruction.

Instruction RET

Purpose Return from a procedure call

Examples

RET

RET 8

Description The RET instruction returns from a procedure call. It simply pops whatever value is currently at [ESP] into the EIP (instruction pointer) register. The "RET XX" form does the same thing, and then adds XX to the ESP value. This is how __stdcall procedures clear parameters off the stack before returning to their caller. (Most Win32 APIs are __stdcall based.) By dividing the number of cleared bytes by four (the size of a DWORD), you can usually figure out how many parameters a procedure takes. For instance, a procedure that returns with a "RET 8" instruction takes two parameters.

Functions that return an integer or pointer value usually return the value in the EAX register. By examining what's in EAX before executing the RET instruction, you can see the function's return value.

Instruction ADD ESP, value

Purpose Removes parameters off the stack

Examples

ADD ESP,24

Description When calling procedures that don't remove parameters before returning, it's up to the calling function to remove its parameters. This is the case with cdecl functions, which is the default for C and C++ code. The "ADD ESP,XX" instruction bumps up the stack pointer so that any passed parameters are below the resulting ESP.

If the function doesn't take a variable number of parameters, the "ADD ESP,XX" instruction gives insight into how many parameters the called procedure accepts. (See the description above for "RET XX".) If the called procedure takes a variable number of parameters (like printf and wsprintf do), the "ADD ESP,XX" instruction tells you how many parameters were passed for that particular CALL.
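The arithmetic behind both "RET XX" and "ADD ESP,XX" is the same division by the size of a DWORD. A minimal sketch, using the hypothetical helper name dwordParamCount and assuming every parameter occupies exactly one DWORD (a double or a structure passed by value takes more, so treat the result as an estimate):

```cpp
#include <cassert>

// Estimates how many DWORD-sized parameters were removed from the stack,
// given the byte count from "RET XX" or "ADD ESP,XX".
inline unsigned dwordParamCount(unsigned cleanedBytes) {
    assert(cleanedBytes % 4 == 0);  // stack cleanup is done in whole DWORDs
    return cleanedBytes / 4;
}
```

For "RET 8" this gives two parameters. Disassemblers usually print the operand in hexadecimal, so "ADD ESP,14" means 0x14 (20) bytes, or five parameters.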

Flow Control

In the context of this column, flow control means code that affects which portions of a program's code are subsequently executed. At the simplest level, this means conditional execution (colloquially known as if statements). More complex flow control sequences such as while loops and for statements are usually built from the lower-level if statement constructs. In one case though (the LOOP instruction), the processor has built-in knowledge of these higher-level language constructs.

Before I get to these instruction sequences, let me highlight two things that can easily trip you up. For starters, the term "Jcc" is used as a stand-in for any of the 16 conditional jump instructions. The cc means condition code.

More insidiously, there are several sets of Jcc instructions that are aliases for one another. For example, JZ (Jump if Zero flag set) is the same instruction as JE (Jump if Equal). Likewise, JNZ (Jump if Zero flag NOT set) is the same instruction as JNE (Jump if Not Equal). Unfortunately, some disassemblers use the JZ/JNZ form, while others use the JE/JNE form. Is this confusing? Yes! The moral of the story: be prepared to mentally substitute an aliased form of the instruction if it makes the code easier to understand.

Sequence CMP value, value / Jcc location

Purpose Compare two values, and branch accordingly

Examples

CMP EAX,2

JE 10036728

CMP [EBP+20],1000

JNE 00427824

Description The CMP instruction is used when two values are to be compared. The CMP instruction sets or clears a variety of flags, including the Zero, Sign, and Overflow flags. From this, a variety of Jcc instructions can then be used to branch accordingly. Most often, the JE and JNE instructions follow a CMP instruction.

The following C++ code sequence would be implemented with a CMP / JNE sequence:

if ( MyVariable == 2 )

{

// Whatever code you want

}

If the CMP instruction determines that MyVariable isn't 2, the flag will be set so that the JNE instruction that follows will skip over the code in curly brackets.

Sequence TEST value, value / Jcc location

Purpose Determine if a bit is set, and branch accordingly

Examples

TEST EAX,EAX

JNZ 00400124

TEST EDX,00400024

JZ 77f85624

Description The TEST instruction does a logical AND of the two arguments, which sets or clears the Zero flag in the EFLAGS register. The next instruction (JZ or JNZ) does a jump to the target address if the Zero flag is set or cleared, depending on the instruction used. If the JZ/JNZ doesn't jump, execution continues at the following instruction.

This sequence is typically used to test one or more bits as part of an if statement. For example, this C++ code could be implemented using a "TEST / JZ" sequence.

if ( MyVariable & 0x00400024 )

{

// Whatever code you want

}

If MyVariable has any of the same bitfields set as in the value 0x00400024, the Zero flag won't be set. This prevents the JZ instruction from jumping, and execution falls into the code in the curly brackets.
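The relationship between TEST, CMP, and the Zero flag can be modeled in a few lines of C++. This sketch tracks only the Zero flag, and every name in it (Flags, testInstr, cmpInstr, and so on) is hypothetical; the real instructions update several other flags as well.

```cpp
#include <cassert>
#include <cstdint>

// Only the Zero flag is modeled here.
struct Flags { bool zero; };

// TEST a,b: AND the operands, discard the result, keep the flags.
inline Flags testInstr(std::uint32_t a, std::uint32_t b) { return Flags{ (a & b) == 0 }; }

// CMP a,b: subtract b from a, discard the result, keep the flags.
inline Flags cmpInstr(std::uint32_t a, std::uint32_t b) { return Flags{ (a - b) == 0 }; }

// JZ (alias JE) jumps when the Zero flag is set; JNZ (alias JNE) when it is clear.
inline bool jzTaken(Flags f) { return f.zero; }
inline bool jnzTaken(Flags f) { return !f.zero; }

// if ( myVariable & 0x00400024 ) { body }
// TEST / JZ: the JZ skips the body when none of the tested bits is set.
inline bool bodyRunsForBitTest(std::uint32_t myVariable) {
    return !jzTaken(testInstr(myVariable, 0x00400024u));
}

// if ( myVariable == 2 ) { body }
// CMP / JNE: the JNE skips the body when the values differ.
inline bool bodyRunsForEquality(std::uint32_t myVariable) {
    return !jnzTaken(cmpInstr(myVariable, 2u));
}
```

In both cases the conditional jump tests the opposite of the C++ condition, because its job is to skip the body rather than to enter it.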

Instruction JMP location

Purpose Transfer control to some other location

Examples

JMP 10047820

if ( MyVariable == 2 )

{

// some code

// JMP past "else" code

}

else

{

// some other code

}

The second place where JMP instructions crop up is as part of a loop. At the end of the loop's code, some code sequence determines if it's time to break out of the loop. If the loop isn't finished, a JMP instruction transfers control back to the beginning of the loop's code.

The third scenario where you'll see JMP instructions is when a procedure has a common exit sequence. That is, no matter how many return statements there are in the procedure, there's only one spot in the code that cleans up the stack frame and returns. In this situation, a return statement in the middle of the procedure's code is implemented as a JMP to the common exit sequence code.

It's also possible that you'll encounter a JMP instruction from a goto statement. Fortunately, most programmers don't bother with goto's anymore.

Finally, if you see a JMP instruction that simply jumps to the next instruction, you're probably in code that wasn't compiled with optimizations enabled.

Instructions LOOP, LOOPZ, LOOPNZ

Purpose Jump back to the beginning of a loop's code, if conditions are right

Examples

LOOP 00401234

LOOPZ 65432108

Description The LOOP instruction uses the contents of the ECX register as a counter. Each invocation of the LOOP instruction decrements the ECX register. In the simplest case, the LOOP instruction branches back to the beginning of the instruction sequence if ECX isn't zero. The LOOPZ and LOOPNZ only branch if ECX is nonzero, and the Zero flag in EFLAGS is set accordingly.

The C++ for loop construct can be implemented with the LOOP instruction if the number of iterations is known ahead of time. Before executing the actual code inside the loop, ECX is loaded with the number of iterations. At the end of the code inside the loop is a LOOP instruction. After the specified number of iterations, ECX becomes zero and the LOOP instruction doesn't branch.
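The countdown behavior of LOOP maps naturally onto a do/while in C++. The sketch below uses a hypothetical helper, runLoop, and assumes the loop body itself leaves ECX alone.

```cpp
#include <cassert>
#include <cstdint>

// Runs `body` the way a block ending in a LOOP instruction would:
// the body executes, then ECX is decremented, and control branches
// back to the top while ECX is nonzero. Returns the final ECX value.
template <typename Body>
inline std::uint32_t runLoop(std::uint32_t ecx, Body body) {
    // With ECX == 0 on entry, the decrement would wrap around and the
    // loop would run 2^32 times, so this sketch rejects that case.
    assert(ecx != 0);
    do {
        body();
    } while (--ecx != 0);
    return ecx;
}
```

Loading ECX with 4 before the loop runs the body exactly four times and leaves ECX at zero, matching the description above. Note that the check happens at the bottom, so the body always runs at least once.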

Bitwise Manipulation

The bitwise instructions are used to turn individual bits on and off in a value. The value can be a global variable, a local variable, a parameter, or a register. Here, I'll show the two most common instructions, AND and OR. There's also an XOR instruction, but it's less commonly used.

Instruction AND value, bitfield

Purpose Performs a logical AND of the bitfields of two operands

Examples

AND EAX,00001000

AND [ESI+4],00000004

Description Unlike the TEST instruction (see above), the AND instruction actually modifies the destination operand. For example, in C++, the statement

MyVar &= 0x00010001;

could be implemented as:

AND [MyVar],00010001h

The AND instruction is also used to turn off particular bitfields. To do this, the desired bits to be turned off are set to the off (zero) state in the source operand. All of the bits to be left alone are set to true in the source operand.
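Building the source operand for such an AND is usually done by inverting the set of bits you want off. A minimal C++ sketch, with the hypothetical helper name clearBits:

```cpp
#include <cassert>
#include <cstdint>

// Turns off every bit of `value` that is set in `bitsToClear`.
// The mask has 0 where bits should go off and 1 everywhere else,
// which is exactly the source operand an AND instruction would use.
inline std::uint32_t clearBits(std::uint32_t value, std::uint32_t bitsToClear) {
    const std::uint32_t mask = ~bitsToClear;
    return value & mask;
}
```

For example, clearing bit 2 uses the mask 0xFFFFFFFB, so in a disassembly listing you would expect to see something like "AND EAX,FFFFFFFB".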



Instruction CMPSB, CMPSW, CMPSD

Purpose Compares two strings in memory

Examples

REPE CMPSB

Description These instructions are used to compare the BYTEs, WORDs, or DWORDs pointed to by the ESI and EDI registers with the EFLAGS set appropriately after the comparison. Each iteration of the CMPSx instruction causes the ESI and EDI registers to increment by the appropriate amount (one, two, or four bytes).

It's not hard to see how the C++ memcmp function could be implemented by using the REPE prefix with the CMPSx instructions. The REPE prefix causes the CMPSx instruction to keep iterating while the two memory locations are equal and ECX is nonzero. The memcmp function could be implemented using "REPE CMPSB", although optimized code will use "REPE CMPSD" for the bulk of the string and "REPE CMPSB" for the last three or fewer bytes.
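The DWORD-then-BYTE strategy can be sketched in portable C++. This is an illustration only: blocksEqual is a hypothetical name, and unlike memcmp it reports only equality, not which block sorts first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Compares n bytes at a and b, four bytes at a time for the bulk
// ("REPE CMPSD") and one byte at a time for the last 0 to 3 bytes
// ("REPE CMPSB"). Returns true when the blocks are identical.
inline bool blocksEqual(const void* a, const void* b, std::size_t n) {
    const unsigned char* pa = static_cast<const unsigned char*>(a);  // plays the role of ESI
    const unsigned char* pb = static_cast<const unsigned char*>(b);  // plays the role of EDI

    // Bulk phase: the counter starts at n / 4, like ECX before REPE CMPSD.
    for (std::size_t dwords = n / 4; dwords != 0; --dwords) {
        std::uint32_t x, y;
        std::memcpy(&x, pa, 4);  // memcpy avoids unaligned-access problems
        std::memcpy(&y, pb, 4);
        if (x != y) return false;  // REPE stops at the first mismatch
        pa += 4;
        pb += 4;
    }

    // Tail phase: the counter starts at n % 4, like ECX before REPE CMPSB.
    for (std::size_t tail = n % 4; tail != 0; --tail) {
        if (*pa != *pb) return false;
        ++pa;
        ++pb;
    }
    return true;
}
```

In both phases the pointers advance after every comparison, just as ESI and EDI do, and the counter reaching zero ends the loop the same way ECX does.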

Instruction MOVSB, MOVSW, MOVSD

Purpose Moves BYTEs, WORDs, or DWORDs from the source string to the destination string

Examples

REP MOVSD

Description The MOVSx instructions copy memory pointed to by ESI into the memory pointed at by EDI. After each iteration, ESI and EDI are incremented. Typically, MOVSx is used with the REP prefix to copy a predetermined number of BYTEs, WORDs, or DWORDs. The number to copy is specified in the ECX register. The C++ memcpy function can be implemented using "REP MOVSB".

Instruction STOSB, STOSW, STOSD

Purpose Sets a series of BYTEs, WORDs, or DWORDs to a specified value

Examples

REP STOSB

Description The STOSx instructions copy the value in AL, AX, or EAX into the memory pointed to by the EDI register. Typically, STOSB is used with the REP prefix to copy the number of bytes specified in the ECX register. The C++ memset function can be implemented with "REP STOSB", or by a combination of "REP STOSD" and "REP STOSB".
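The combined "REP STOSD" plus "REP STOSB" approach can be sketched in C++ as a two-phase fill. The fillBytes name is hypothetical; the sketch only mirrors what a compiler's inline memset expansion might do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fills n bytes at dest with `value`, four bytes at a time for the bulk
// ("REP STOSD") and one byte at a time for the remainder ("REP STOSB").
inline void fillBytes(void* dest, unsigned char value, std::size_t n) {
    unsigned char* p = static_cast<unsigned char*>(dest);  // plays the role of EDI

    // Replicate the byte into all four bytes of a DWORD, as the setup
    // code would do with EAX before the REP STOSD.
    const std::uint32_t pattern = value * 0x01010101u;

    for (std::size_t dwords = n / 4; dwords != 0; --dwords) {
        std::memcpy(p, &pattern, 4);
        p += 4;
    }
    for (std::size_t tail = n % 4; tail != 0; --tail) {
        *p++ = value;
    }
}
```

Filling 7 bytes therefore takes one DWORD store followed by three BYTE stores, and nothing past the seventh byte is touched.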

Miscellaneous

In this final group are random instructions that you'll often encounter. Of the list, "XOR EAX,EAX" is most prevalent.

Instruction XOR register, register

Purpose Sets a register's value to zero

Examples

XOR EAX,EAX

Description Using the XOR instruction to zero out a register takes less space than the equivalent MOV instruction. For example, "MOV EAX,0" takes five bytes, while "XOR EAX,EAX" uses only two bytes. Is using XOR twisted? Yes. But after years of stepping through assembly code, you too will automatically substitute "zero out the register" when you see this instruction.

Instruction MOVZX DWORD value, byte or word value

Purpose Copies an unsigned value into a larger type

Examples

MOVZX EAX,BYTE PTR [EBP+8]

MOVZX EAX,WORD PTR [00451234]

Description In most languages, a value of a smaller type can be copied into or used in place of a larger type. For example, in C++ an unsigned char can be copied into an unsigned short (aka a WORD). Likewise, an unsigned short can be used where an unsigned long is expected. The compiler uses MOVZX (move with zero extend) to convert the smaller type into a larger type. In C++, BYTEs can be converted to WORDs or DWORDs, and WORDs can be converted to DWORDs.

Instruction MOVSX DWORD value, byte or word value

Purpose Copies a signed value into a larger type

Examples

MOVSX EAX,BYTE PTR [EBP+8]

MOVSX EAX,WORD PTR [77f81234]

Description In most languages, a value of a smaller type can be copied into or used in place of a larger type. For example, in C++ a char can be copied into a short. Likewise, a short can be used where a long is expected. The compiler uses MOVSX (move with sign bit extend) to convert the smaller type into a larger type. In C++, chars can be converted to shorts or longs, and shorts can be converted to longs.
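The difference between MOVZX and MOVSX shows up only when the top bit of the smaller value is set. A minimal C++ sketch of the conversions a compiler would translate into these instructions; the four helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned widening: the upper bits are filled with zeros (MOVZX).
inline std::uint32_t zeroExtend8(std::uint8_t v) { return v; }
inline std::uint32_t zeroExtend16(std::uint16_t v) { return v; }

// Signed widening: the upper bits are copies of the sign bit (MOVSX).
inline std::int32_t signExtend8(std::int8_t v) { return v; }
inline std::int32_t signExtend16(std::int16_t v) { return v; }
```

The byte 0xFF becomes 0x000000FF when treated as unsigned, but 0xFFFFFFFF (that is, -1) when treated as signed. For values with the top bit clear, such as 2, both instructions produce the same result.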

Instructions MOV EAX,FS:[0], MOV FS:[0],ESP

Purpose Establish a new structured exception handling frame

Examples

MOV EAX,FS:[00000000]

PUSH EAX

MOV FS:[00000000h],ESP

Description In Win32, the FS register points to the Thread Environment Block (TEB). A data structure unique for each thread, the TEB contains values that the system uses to control the thread. At offset 0 in the TEB is a pointer to the first node in the structured exception handling chain. When you see code that uses FS:[0], it's usually setting up or tearing down a try block.

Instruction MOV EAX,FS:[18]

Purpose Makes a linear pointer to the TEB

Examples

MOV EAX,FS:[18]

MOV EAX,[EAX+24]

Description The TEB is always pointed to by the FS register. To make code portable, it's helpful to use a flat, linear address for the TEB. The TEB's linear address can be found at offset 0x18 in the TEB. Code that reads from FS:[18] is preparing to read some other value from the TEB. Step through all three instructions in GetCurrentThreadId under Windows NT to see this for yourself.

Instruction MOV ECX,FS:[2C]

Purpose Makes a pointer to the Thread Local Storage (TLS) array

Examples

MOV ECX,DWORD PTR FS:[0000002C]

MOV EDX,DWORD PTR [ECX+EAX*4]

Description At offset 0x2C in the TEB is a pointer to the TLS array for the thread. This array contains 64 DWORDs, each corresponding to a particular index value that would be passed to TlsGetValue. Code that uses FS:[2C] is using TLS.

To The Code!

To show many of the instructions and sequences that I've described, I wrote the InstructionDemo program. A quick look at the source code in Figure 2 shows that the two functions don't do anything worthwhile. But the code is well commented, pointing out the particular instruction or instruction sequence it's designed to produce.

I compiled InstructionDemo.CPP with the following command line:

CL InstructionDemo.CPP

I then disassembled the relevant parts of the executable and annotated the listing. Above each instruction or sequence is the C++ code responsible for it (see Figure 3). This is similar to what the Developer Studio IDE does when you select "Go To Disassembly" in the source window. Many of the instructions don't need explanation, but it's worthwhile to point out a few things.

First, examine the instructions at offset 0x401000. They're establishing the stack frame for the procedure, including creation of space below the frame for local variables. If you look throughout the procedure, you won't see the EBX and ESI registers used, so the stack frame only preserves the EDI register.

After a whole bunch of variable initialization instructions, notice that the signed type promotion (char to long) at offset 0x401040 requires two instructions. This is because (in the general case) the Intel architecture doesn't allow one instruction to reference two memory addresses. Therefore, the assignment must go through a register that acts as an intermediate location.

Also interesting is the if statement starting at offset 0x40104D. After the code that executes when the expression evaluates to TRUE, note the JMP instruction at offset 0x401060. This JMP instruction makes the CPU skip over all the code for the else clause. A bit later (at offset 0x40106C), another if statement uses the TEST instruction to see if bitfields are set. In that sequence, the compiler treats the ECX register as a private, unnamed local variable.

Examining the for loop at offset 0x4010A9 is interesting because of the way the compiler orders the initialization, termination condition, and post-iteration code. The MOV instruction at 0x4010A9 performs the initialization, and then control JMPs past the post-iteration code to get to the termination condition code. The termination condition code looks very similar to an if statement. If you understand what the code is doing here, you can see how a for statement could be rewritten using if and goto statements.
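As a sketch of that rewrite, the two functions below compute the same sum. The second one mirrors the instruction order described above: initialization, a jump over the post-iteration code, the termination test, the body, and a jump back. The function and label names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// The loop as it would normally be written.
inline unsigned long sumWithFor(int n) {
    unsigned long total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}

// The same loop spelled out with if and goto, in the order the
// unoptimized compiler output lays the pieces out.
inline unsigned long sumWithGoto(int n) {
    assert(n >= 0);
    unsigned long total = 0;
    int i;

    i = 0;                      // initialization (the MOV)
    goto test;                  // JMP past the post-iteration code
increment:
    i++;                        // post-iteration code
test:
    if (!(i < n)) goto done;    // termination condition (CMP / JNL)
    total += i;                 // loop body
    goto increment;             // JMP back to the post-iteration code
done:
    return total;
}
```

The inverted test in the if statement corresponds to the conditional jump that leaves the loop, which is why the listing uses a "jump if not less" instruction for an "i less than" condition.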



Starting at offset 0x4010E1, the code begins pushing parameters on the stack in preparation for calling printf. It's important to realize that the parameters are passed right to left. Note that there are two distinct LEA instructions. The first calculates the address of the szBuffer array, while the second calculates the address of the argc parameter. After the call to printf at offset 0x4010F9, the code cleans all the pushed parameters off the stack with the "ADD ESP,14" instruction.

In the MySubProcedure code starting at offset 0x40112E, the stack frame setup is considerably more complex than the prior procedure's. The instructions like "PUSH 00405058" and "MOV EAX,FS:[00000000]" are building a frame for the structured exception handler code that results from using __try. Also, this time the stack frame setup code preserves all the register variable registers (EBX, ESI, and EDI).

At offset 0x401154, the code modifies the TLS variable called tlsVariable. The "MOV ECX,DWORD PTR FS:[0000002C]" instruction loads the ECX register with a pointer to the array of 64 DWORDs that each thread uses for TLS. The next instruction uses an advanced addressing form to index into the array and read the slot corresponding to a particular TLS index. ECX contains the pointer to the array, while EAX contains the TLS index. The code multiplies EAX by four (the size of a DWORD), and adds it to the TLS array pointer.



Figure 2 InstructionDemo.CPP

//==========================================
// Matt Pietrek
// Microsoft Systems Journal, February 1998
// Program: InstructionDemo.CPP
// FILE: InstructionDemo.CPP
//==========================================
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>
#include <string.h>

// Force these functions inline (/O2 would normally do this)
#pragma intrinsic( memset, strlen, strcmp )

__declspec(thread) int tlsVariable = 0; // Make a thread local variable

int g_myGlobalVariable; // Make a global variable

void MySubProcedure( void );

int main( int argc, char *argv[] )
{
    char szBuffer[128];
    char *pszString = "Hello";
    unsigned long localUnsignedLong = 2;
    unsigned char localUnsignedChar = 2;
    long localSignedLong = 2;
    char localSignedChar = 2;
    int i;

    g_myGlobalVariable = 0x12345678; // Assignment to global

    localSignedLong = localSignedChar; // signed type promotion

    // Conditional execution
    if ( localUnsignedLong == 2 )
        localSignedLong = 1;
    else
        localSignedLong = 2;

    // Using TEST
    if ( localUnsignedLong & 0x00040008 )
        i = 3;

    // AND'ing off bitfields
    localUnsignedLong &= 0x01020304;

    // OR'ing on bitfields
    localSignedLong |= 0x05060708;

    // LOOP code
    for ( i = 0; i < 4; i++ )
        localUnsignedLong += i;

    // Procedure invocation
    printf( "%u %u %08X %s", localUnsignedLong, argc, &argc, szBuffer );

    // Using STOSD / STOSB
    memset( szBuffer, 0, sizeof(szBuffer) );

    // Using SCASB
    i = strlen( szBuffer );

    MySubProcedure( );

    return 0;
}

void MySubProcedure( void )
{
    tlsVariable = 2;

    // Use of try/except code
    __try
    {
        g_myGlobalVariable = 2;
    }
    __except( EXCEPTION_EXECUTE_HANDLER )
    {
        g_myGlobalVariable = 4;
    }
}



Figure 3 InstructionDemo Mixed Source and Assembly

int main( int argc, char *argv[] )
{
401000: PUSH EBP
401001: MOV EBP,ESP
401003: SUB ESP,00000098
401009: PUSH EDI

char *pszString = "Hello";
40100A: MOV DWORD PTR [EBP-0000008C],00406030

unsigned long localUnsignedLong = 2;
401014: MOV DWORD PTR [EBP-00000088],00000002

unsigned char localUnsignedChar = 2;
40101E: MOV BYTE PTR [EBP-00000094],02

long localSignedLong = 2;
401025: MOV DWORD PTR [EBP-00000084],00000002

char localSignedChar = 2;
40102F: MOV BYTE PTR [EBP-00000098],02

g_myGlobalVariable = 0x12345678; // Assignment to global
401036: MOV DWORD PTR [004088E8],12345678

localSignedLong = localSignedChar; // signed type promotion
401040: MOVSX EAX,BYTE PTR [EBP-00000098]
401047: MOV DWORD PTR [EBP-00000084],EAX

// Conditional execution
if ( localUnsignedLong == 2 )
40104D: CMP DWORD PTR [EBP-00000088],02
401054: JNE 00401062

localSignedLong = 1;
401056: MOV DWORD PTR [EBP-00000084],00000001

else
401060: JMP 0040106C

localSignedLong = 2;
401062: MOV DWORD PTR [EBP-00000084],00000002

// Using TEST
if ( localUnsignedLong & 0x00040008 )
40106C: MOV ECX,DWORD PTR [EBP-00000088]
401072: AND ECX,00040008
401078: TEST ECX,ECX
40107A: JE 00401086

i = 3;
40107C: MOV DWORD PTR [EBP-00000090],00000003

// AND'ing off bitfields
localUnsignedLong &= 0x01020304;
401086: MOV EDX,DWORD PTR [EBP-00000088]
40108C: AND EDX,01020304
401092: MOV DWORD PTR [EBP-00000088],EDX

// OR'ing on bitfields
localSignedLong |= 0x05060708;
401098: MOV EAX,DWORD PTR [EBP-00000084]
40109E: OR EAX,05060708
4010A3: MOV DWORD PTR [EBP-00000084],EAX

// LOOP code
for ( i = 0; i < 4; i++ )
4010A9: MOV DWORD PTR [EBP-00000090],00000000
4010B3: JMP 004010C4
4010B5: MOV ECX,DWORD PTR [EBP-00000090]
4010BB: ADD ECX,01
4010BE: MOV DWORD PTR [EBP-00000090],ECX
4010C4: CMP DWORD PTR [EBP-00000090],04
4010CB: JNL 004010E1

localUnsignedLong += i;



4010CD: MOV EDX,DWORD PTR [EBP-00000088]
4010D3: ADD EDX,DWORD PTR [EBP-00000090]
4010D9: MOV DWORD PTR [EBP-00000088],EDX
4010DF: JMP 004010B5

// Procedure invocation
printf( "%u %u %08X %s", localUnsignedLong, argc, &argc, szBuffer );
4010E1: LEA EAX,[EBP-80]
4010E4: PUSH EAX
4010E5: LEA ECX,[EBP+08]
4010E8: PUSH ECX
4010E9: MOV EDX,DWORD PTR [EBP+08]
4010EC: PUSH EDX
4010ED: MOV EAX,DWORD PTR [EBP-00000088]
4010F3: PUSH EAX
4010F4: PUSH 00406038
4010F9: CALL 004011C0
4010FE: ADD ESP,14

// Using STOSD / STOSB
memset( szBuffer, 0, sizeof(szBuffer) );
401101: MOV ECX,00000020
401106: XOR EAX,EAX
401108: LEA EDI,[EBP-80]
40110B: REP STOSD

// Using SCASB
i = strlen( szBuffer );
40110D: LEA EDI,[EBP-80]
401110: OR ECX,FF
401113: XOR EAX,EAX
401115: REPNE SCASB
401117: NOT ECX
401119: ADD ECX,FF
40111C: MOV DWORD PTR [EBP-00000090],ECX

MySubProcedure( );
401122: CALL 0040112E

return 0;
401127: XOR EAX,EAX

}
401129: POP EDI
40112A: MOV ESP,EBP
40112C: POP EBP
40112D: RET

void MySubProcedure( void )
{
40112E: PUSH EBP
40112F: MOV EBP,ESP
401131: PUSH FF
401133: PUSH 00405058
401138: PUSH 004012F8
40113D: MOV EAX,FS:[00000000]
401143: PUSH EAX
401144: MOV DWORD PTR FS:[00000000],ESP
40114B: SUB ESP,08
40114E: PUSH EBX
40114F: PUSH ESI
401150: PUSH EDI
401151: MOV DWORD PTR [EBP-18],ESP

tlsVariable = 2;
401154: MOV EAX,[004088EC]
401159: MOV ECX,DWORD PTR FS:[0000002C]
401160: MOV EDX,DWORD PTR [ECX+EAX*4]
401163: MOV DWORD PTR [EDX+00000004],00000002

__try
{
40116D: MOV DWORD PTR [EBP-04],00000000

g_myGlobalVariable = 2;
401174: MOV DWORD PTR [004088E8],00000002
40117E: MOV DWORD PTR [EBP-04],FFFFFFFF
401185: JMP 004011A1

__except( EXCEPTION_EXECUTE_HANDLER )
401187: MOV EAX,00000001
40118C: RET



40118D: MOV ESP,DWORD PTR [EBP-18]

g_myGlobalVariable = 4;
401190: MOV DWORD PTR [004088E8],00000004

}
40119A: MOV DWORD PTR [EBP-04],FFFFFFFF

}
4011A1: MOV ECX,DWORD PTR [EBP-10]
4011A4: MOV DWORD PTR FS:[00000000],ECX
4011AB: POP EDI
4011AC: POP ESI
4011AD: POP EBX
4011AE: MOV ESP,EBP
4011B0: POP EBP
4011B1: RET

Wrap-up

In the real world, you will no doubt encounter instructions beyond what I've described here. But now you should be familiar with most of the commonly used registers and how memory is addressed. You should be able to tell a local variable apart from a parameter. You should also be able to distinguish these type classes from global and TLS variables.

Beyond the basic theory, I've also shown a reasonably large subset of the instructions that Win32 compilers generate. It's unlikely that my introduction will enable you to start writing your code in MASM. Still, with this working knowledge, you can be more confident when your debugger takes you to dark, scary places in other people's code, especially when even the dim light of source code isn't available.

Read more about assembly language in the June 1998 installment of Under the Hood.



Common Instructions

Instructions INC value, DEC value
Purpose Increments or decrements integer value by 1
Example

INC ESI

INC [EBP-8]

DEC [EAX+4]

The INC and DEC instructions are used to increment and decrement values kept in memory or registers. As you might imagine, these instructions map precisely to the ++ and -- operators in C++ for standard integer operations.

You could use the ADD or SUB instructions to achieve the same effect as INC and DEC, although it would be more expensive in terms of size. Since they are so commonly used, the smallest versions of the INC/DEC instructions take only a single byte. Looking at the Intel opcode map, you'll see that there's an opcode for each of the eight general-purpose registers that INC can be used against (EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP). Another eight opcodes are used for the DEC instruction and the same set of registers.

Instructions MUL value, value; DIV value, value
Purpose Multiplication and division
Example

MUL EAX,EDX

MUL AL,BYTE PTR [EBP-14h]

DIV EAX,EBX

I didn't cover the ADD and SUB instructions in my February column since their operation is straightforward. However, the MUL and DIV instructions have some quirks that make them difficult to read and downright awkward to write. Throughout this column, when I mention (E)AX, I'm referring to AL, AX, or EAX. Likewise, when I mention (E)DX, I'm referring to DL, DX, or EDX.

Both MUL and DIV treat their operands as unsigned values. The operands can't be immediate values (such as 3); rather, they must be in registers or memory. You may have noticed that the destination value (the first argument) always seems to be (E)AX. This is by design. The use of the (E)AX register is an implicit part of the instruction. Beyond the implicit use of (E)AX, the (E)DX register is also silently involved. The high bits of the MUL result end up in (E)DX. Likewise, for the DIV instruction, (E)DX holds the remainder and (E)AX holds the quotient.

If you write any assembler code, MUL and DIV get even weirder. The assembler (both MASM and the Visual C++ inline assembler) won't let you specify the (E)AX operand. Thus, if you want the instruction MUL EAX,ECX, you would write MUL ECX. It's just another example of the intuitive language syntax that's made assembly language wildly popular in recent years.
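The implicit register pairing is easier to see when modeled in C++. The sketch below covers only the 32-bit forms; RegPair, mul32, and div32 are hypothetical names introduced for illustration, and the asserts stand in for the divide fault a real CPU would raise.

```cpp
#include <cassert>
#include <cstdint>

// Holds the two implicit result registers.
struct RegPair {
    std::uint32_t eax;
    std::uint32_t edx;
};

// Models the 32-bit "MUL src": EDX:EAX = EAX * src, unsigned.
RegPair mul32(std::uint32_t eax, std::uint32_t src) {
    const std::uint64_t product = static_cast<std::uint64_t>(eax) * src;
    return { static_cast<std::uint32_t>(product),         // low half  -> EAX
             static_cast<std::uint32_t>(product >> 32) }; // high half -> EDX
}

// Models the 32-bit "DIV src": divides EDX:EAX by src, unsigned.
// Quotient goes to EAX, remainder to EDX.
RegPair div32(std::uint32_t eax, std::uint32_t edx, std::uint32_t src) {
    assert(src != 0);                 // a real CPU faults on divide by zero
    const std::uint64_t dividend = (static_cast<std::uint64_t>(edx) << 32) | eax;
    const std::uint64_t quotient = dividend / src;
    assert(quotient <= 0xFFFFFFFFu);  // a real CPU faults if the quotient overflows EAX
    return { static_cast<std::uint32_t>(quotient),
             static_cast<std::uint32_t>(dividend % src) };
}
```

For example, mul32(0x80000000, 4) leaves 0 in the EAX field and 2 in the EDX field, which is why code that cares about overflow inspects EDX after a MUL.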

Instructions IMUL value, value; IDIV value, value
Purpose Signed multiplication and division
Example

IMUL WORD PTR [EBP+8]

IMUL EDX,ECX,8



IDIV EAX,DWORD PTR [EDX]

The IMUL and IDIV instructions treat the operands as signed values. Contrast this to MUL and DIV, which work on unsigned values. IDIV uses (E)AX as the implicit first operand, just as DIV does. Also, like its DIV counterpart, IDIV only works with register or memory values. IMUL, on the other hand, doesn't fit the general patterns of MUL, DIV, and IDIV. It can work with immediate values and it can have a non-(E)AX register as the destination. There's even a form of the IMUL instruction that takes three operands, a distinction that few instructions in the Intel opcode set share (SHLD and SHRD, described later, are two others).
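The practical difference between the signed and unsigned forms shows up when the same bit pattern is divided both ways. This is a small C++ sketch with hypothetical helper names (asSigned, divideLikeDiv, divideLikeIdiv); it models only the arithmetic, not the EDX:EAX register usage.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit pattern as a signed value without changing any bits.
std::int32_t asSigned(std::uint32_t bits) {
    std::int32_t value = 0;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Unsigned division, as DIV treats its operands.
std::uint32_t divideLikeDiv(std::uint32_t bits, std::uint32_t divisor) {
    assert(divisor != 0);
    return bits / divisor;
}

// Signed division, as IDIV treats its operands.
std::int32_t divideLikeIdiv(std::uint32_t bits, std::int32_t divisor) {
    assert(divisor != 0);
    const std::int32_t value = asSigned(bits);
    assert(!(value == INT32_MIN && divisor == -1)); // would overflow
    return value / divisor;
}
```

With the pattern 0xFFFFFFF8, the unsigned divide by 2 gives 0x7FFFFFFC, while the signed divide sees -8 and gives -4.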

Instructions PUSHAD, POPAD
Purpose Saves or restores all general-purpose registers via the stack

PUSHAD pushes EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI onto the stack, in that order, and POPAD pops them back in the reverse order (discarding the saved ESP value). These instructions are used in situations where many registers may be modified and the programmer wants to leave no evidence of the execution in the code. Although interrupt handlers are passé for most programmers, they're a perfect example of where PUSHAD and POPAD come in handy. Besides taking fewer opcodes than eight individual PUSH instructions, they also execute faster (five clock cycles on a Pentium).

Instructions PUSHFD, POPFD
Purpose Push or pop the EFLAGS register

In some cases, it's inconvenient to use the flags set by a prior operation immediately. Alternatively, you may want to make sure that some operation you're about to execute won't change the current flag values. For these situations, PUSHFD and POPFD are the easiest methods to save and restore those bits.

PUSHFD is one of the atomic components of an interrupt. When an interrupt or an exception occurs, the following code effectively executes:

PUSHFD, PUSH CS, PUSH EIP.

Following the three pushes, the EIP register changes to the interrupt handler address contained in the appropriate slot in the Interrupt Descriptor Table (IDT). Likewise, the IRETD effectively does a POPFD as part of returning from an interrupt.

Instructions SHL, SHR, SHLD, SHRD
Purpose Shift bits to the left or right
Example

SHL EBX,3

SHR EBX,CL

SHLD EDX,ECX,4

SHRD ESI,EDI,CL

The SHL and SHR instructions are logically equivalent to the C++ << and >> operators. Many of you probably recall that bitwise shifting is a quick way to perform multiplication and division by powers of 2. For example, the SHL EBX,3 instruction has the same effect as multiplying EBX by 8 (2^3 == 8). Indeed, if you write C++ code that multiplies or divides an unsigned value by 2, 4, 8, 16, and so on, it will most likely compile to a SHL or SHR instruction.

When shifting left, the low-order bits are filled with zeroes. The final high-order bit that's "shifted out" is moved to the carry flag (CF). In other words, the carry flag is like a virtual 33rd bit. When shifting right, the high-order bits are filled with zeroes, and the last bit shifted out moves to the carry flag.
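The carry-flag behavior can be sketched in C++ as follows. This is a model for shift counts of 1 through 31 only; ShiftResult, shl32, and shr32 are hypothetical names, and the carry field stands in for CF.

```cpp
#include <cassert>
#include <cstdint>

struct ShiftResult {
    std::uint32_t value; // the shifted register contents
    bool carry;          // models CF: the last bit shifted out
};

// Models SHL reg,count for 1 <= count <= 31.
ShiftResult shl32(std::uint32_t value, unsigned count) {
    assert(count >= 1 && count <= 31);
    const bool carry = ((value >> (32 - count)) & 1u) != 0;
    return { static_cast<std::uint32_t>(value << count), carry };
}

// Models SHR reg,count for 1 <= count <= 31 (zero-fills from the top).
ShiftResult shr32(std::uint32_t value, unsigned count) {
    assert(count >= 1 && count <= 31);
    const bool carry = ((value >> (count - 1)) & 1u) != 0;
    return { static_cast<std::uint32_t>(value >> count), carry };
}
```

Here shl32(5, 3) yields 40 with the carry clear, matching the multiply-by-8 reading, while shl32(0x80000001, 1) yields 2 with the carry set because the top bit fell off the end.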

Instruction ADD [EAX],AL
Purpose None

You may see a lot of this particular instruction, and you'll probably see it repeated. However, ADD [EAX],AL has no special significance. The opcode bytes for this instruction are 00 00. In other words, it's what you'll see if you're viewing a series of data bytes that all contain the value 0. Nothing to see here. You can all go home now.

Instruction CLD
Purpose Clears the direction flag

In my February 1998 column, I described the string instructions LODSx, SCASx, STOSx, and MOVSx. Each of these instructions uses the ESI or EDI register to point at the memory to be read or written to. These instructions are typically used in conjunction with the REP, REPE, or REPNE prefixes, which cause the string instruction to execute several times until some specific condition is met.

After each REPx-induced iteration, the CPU changes the ESI or EDI register to point to an adjacent memory location. The direction in which the registers move is given by the direction flag. If the direction flag is clear, ESI or EDI is incremented after each instruction (thus causing the next higher memory location to be referenced in the next iteration). When the direction flag is set, ESI or EDI decrements after each iteration.

Most of the time it's easiest to work moving forward in memory (toward higher addresses) so that the direction flag is usually clear. However, it's generally not safe to assume that the flag is clear. Thus, you'll often see the CLD instruction somewhere before a string operation such as REP MOVSB.
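To illustrate why the direction flag matters, here is a rough C++ model of REP STOSB. It is a sketch under simplifying assumptions: memory is a std::vector, EDI is an index into it rather than an address, and repStosb is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models REP STOSB: stores AL at [EDI], then moves EDI up or down
// depending on the direction flag, repeating ECX times.
// Returns the final EDI value.
std::size_t repStosb(std::vector<std::uint8_t>& memory,
                     std::size_t edi,
                     std::uint8_t al,
                     std::size_t ecx,
                     bool directionFlag) {
    while (ecx != 0) {
        assert(edi < memory.size());          // a real CPU would fault instead
        memory[edi] = al;
        edi = directionFlag ? edi - 1 : edi + 1;
        --ecx;
    }
    return edi;
}
```

With the flag clear the fill walks toward higher indexes; with it set, the same call walks downward from its starting point. Code that assumed the flag was clear would overwrite the wrong bytes, which is the reason for the defensive CLD.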

Instructions NOT value, NEG value
Purpose Negation of values
Example

NOT DWORD PTR [EBP-8]

NEG EDX

The NOT instruction does ones-complement negation. That is, it applies the NOT operation to each bit in the operand. An initial value of 0 will become 0xFFFFFFFF after a NOT instruction. The C++ ~ operator is typically implemented via the NOT instruction.

The NEG instruction does twos-complement negation. (If you're not 100 percent up on ones versus twos-complement negation, don't feel bad. I learned this stuff 10 years ago in college, and I've completely forgotten it!) An easier way to think of the NEG instruction is that it puts a - sign in front of the value. Thus, using NEG on -3 yields 3, while NEG applied to 4 yields -4. To summarize, you can think of NOT as affecting individual bits, while NEG operates on the entire value.
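In C++ terms the two instructions correspond to the ~ and unary - operators. The sketch below uses hypothetical wrapper names (notOp, negOp) and also shows the usual relationship between the two: a twos-complement negation is a ones-complement negation plus 1.

```cpp
#include <cassert>
#include <cstdint>

// Models NOT: flips every bit of the operand.
std::uint32_t notOp(std::uint32_t value) {
    return ~value;
}

// Models NEG: arithmetic negation of the whole value.
std::int32_t negOp(std::int32_t value) {
    assert(value != INT32_MIN); // the one value with no positive counterpart
    return -value;
}
```

For instance, notOp(4) is 0xFFFFFFFB, and adding 1 gives 0xFFFFFFFC, which is the bit pattern of negOp(4), that is, -4.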

Instruction NOP
Purpose No operation

The NOP instruction does nothing and affects nothing. It's a single-byte opcode that executes in one clock cycle and is primarily used to pad code. For example, a compiler might want the beginning of a procedure to start on a 16-byte boundary. The compiler/linker would insert enough NOP instructions between the end of one procedure and the beginning of the next procedure to create the desired alignment.

If you're confident in your assembler abilities, the NOP instruction can be applied to code in memory or in the executable file. You might know that some instruction you're about to execute will cause a fault in a debugger. If you want to skip that instruction, use the debugger to write enough NOP opcodes (0x90) to eliminate the instruction. This is useful to squash hardcoded INT 3 breakpoint instructions while you're running under the debugger, effectively not stopping at the breakpoint. Really advanced users can implement NOP instructions to obliterate entire regions of code in an executable. (Warning! Harder than it looks.)

Another advanced use of the NOP instruction is when you want to make it easy to patch or hook into your code. At the beginning of a procedure or block of code, put in enough NOP instructions for the desired goal. Subsequent patching or hooking code can write JMPs, CALLs, or whatever into the NOP area.

Instruction INT 3
Purpose Debugger interrupt

INT 3 has two uses: one intended by the original CPU designers, the other accidental. The INT 3 instruction is the standard method to suspend a program and transfer control to a debugger. In normal use, programs don't include INT 3 instructions in their code. Rather, when you set a traditional breakpoint with a debugger, it temporarily overwrites the target instruction with an INT 3 instruction. (The LODPRF32 program from my July 1995 column illustrates this.) Note that an INT 3 instruction is the heart of the DebugBreak API for Intel CPUs.

The other offbeat use of the INT 3 instruction is as a paranoid NOP. In those cases where a NOP would be used for padding (and theoretically never executed), an INT 3 can be used instead. Like NOP, an INT 3 instruction is only a single byte. The key difference is that if a bug crept in and you executed the INT 3 instruction, you'd pop into the debugger. In the same scenario, the CPU would blithely sail through NOP instructions and wreak havoc someplace farther away from the original error.

The Microsoft linker uses INT 3s as paranoid NOPs when creating padding for incremental linking. The linker also uses them as padding between procedures it wants to align on a particular memory boundary. Usually this alignment is on a multiple of 16 bytes unless you have the "optimize for size" compiler option set. Figure 1 shows a section of code from CALC.EXE that illustrates INT 3 padding in action.

Figure 1 INT 3 Padding

1285EC4: INT 3
1285EC5: INT 3
1285EC6: INT 3
1285EC7: INT 3
1285EC8: INT 3
1285EC9: INT 3
1285ECA: INT 3
1285ECB: INT 3
1285ECC: INT 3
1285ECD: INT 3
1285ECE: INT 3
1285ECF: INT 3
1285ED0: CMP DWORD PTR [0128F4E8],01
1285ED7: JNE 01285EDE
1285ED9: CALL 012875B0
1285EDE: MOV EAX,DWORD PTR [ESP+04]
1285EE2: PUSH EAX
1285EE3: CALL 012875F0
1285EE8: ADD ESP,04
1285EEB: PUSH 000000FF
1285EF0: CALL DWORD PTR [0128F4E4]
1285EF6: ADD ESP,04
1285EF9: RET
1285EFA: INT 3
1285EFB: INT 3
1285EFC: INT 3
1285EFD: INT 3
1285EFE: INT 3
1285EFF: INT 3
1285F00: MOV EAX,DWORD PTR [ESP+04]
1285F04: MOV [0128F4F0],EAX
1285F09: RET

Instruction LOCK
Purpose This instruction locks the memory bus during the next instruction
Example

LOCK INC DWORD PTR [EDX+04]

Technically speaking, LOCK is an instruction prefix rather than an instruction in its own right. In a multiprocessor environment, multiple processors could access the same memory location at the same time. The LOCK prefix ensures that the instruction associated with it will have exclusive access to the destination memory location.

If you've ever examined the EnterCriticalSection API, you'll see that if the critical section isn't currently held, the code essentially just increments a counter. A LOCK prefix is used with an INC instruction to guarantee that one thread won't increment the counter while another thread on another CPU is reading it. You'll also see the LOCK prefix used with multiprocessor synchronization APIs such as InterlockedExchange and InterlockedIncrement.

A final thought on the LOCK prefix: you may recall a bug on older Pentium CPUs where a particular instruction sequence could cause the CPU to freeze up. (See the February 1998 Editor's Note if you need a refresher.) That instruction sequence isn't a valid sequence, and the LOCK prefix plays a vital role in the ensuing CPU meltdown.

Common Instruction Sequences

Sequence
CMP register_X, immediate_value_A
JE XXXXXXXX
CMP register_X, immediate_value_B
JE XXXXXXXX

Purpose C++ switch statement
Example

CMP EAX,1

JE 00400248

CMP EAX,3

JE 0040026E

CMP EAX,7

JE 004002A0

This sequence (compare and JMP if equal) is the most straightforward encoding of a C++ switch statement that I've seen. It's also very easy to pick out when you encounter it in a debugger. In the example code, the switch statement would look something like this:



switch ( value )

{

case 1: // code for case 1



}

The trick to understanding this code sequence is realizing that compiler-generated code for switch statements usually differs from your mental model. The code for all the case comparisons is usually generated in one place. Following the value comparison code are discrete blobs of code that implement the code specified for a particular case. The value comparison code is optimized to quickly figure out just which case blob to jump to.

By no means is this sequence the only encoding for switch statements. More efficient encodings may involve JMP tables or subtractive countdowns using the zero flag. However, these encodings definitely don't fit into my criteria of "just enough to get by."
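The "comparisons first, case blobs afterward" layout can be mimicked in C++ itself. In this sketch the case values 1, 3, and 7 come from the CMP example above, while the function names and the returned marker values are hypothetical; the goto-based version is only there to mirror the shape of the generated code, not to suggest a style.

```cpp
#include <cassert>

// The switch as you would write it. Each case returns a distinct marker.
int viaSwitch(int value) {
    switch (value) {
        case 1:  return 100;
        case 3:  return 300;
        case 7:  return 700;
        default: return 0;
    }
}

// The same logic laid out the way the CMP/JE sequence reads:
// all comparisons in one place, then the discrete case blobs.
int viaCompareChain(int value) {
    if (value == 1) goto case1;   // CMP EAX,1 / JE ...
    if (value == 3) goto case3;   // CMP EAX,3 / JE ...
    if (value == 7) goto case7;   // CMP EAX,7 / JE ...
    return 0;                     // falls through: no case matched
case1:
    return 100;
case3:
    return 300;
case7:
    return 700;
}

// Usage check: both forms agree for matching and non-matching inputs.
void checkSwitchForms() {
    const int samples[] = {0, 1, 3, 5, 7};
    for (int v : samples) {
        assert(viaSwitch(v) == viaCompareChain(v));
    }
}
```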

Sequence opcode [register+offset]
Purpose Structure member access
Example

PUSH [EAX+157C]

MOV EAX,[ESI+34]

ADD [EAX+44],ESI

Here's a common scenario: you have a pointer to a structure or class instance with which you read, write, or otherwise manipulate some field. In this situation, the compiler typically puts the pointer value into a register. The offset of the specified field within the structure is then added to the register. For instance, consider this structure:

struct Foo {

int i;

short j;

char k;

};

If you had a pointer to an instance of this structure and wanted to add 2 to each structure member, the code would look something like this (assuming ESI points to the structure instance):

ADD DWORD PTR [ESI],2 ;; Foo.i



ADD WORD PTR [ESI+4],2 ;; Foo.j

ADD BYTE PTR [ESI+6],2 ;; Foo.k

Note that for the first structure field (i), the field offset is 0, so no addition is needed. The i field is 4 bytes long, placing the next field (j) at offset 4. The j field is a short, so it's only two bytes long. The final field (k) is at offset 6, which I arrived at by adding 4 and 2.

Compilers must place structure fields into memory locations in exactly the same sequence as the structure is declared. Thus, you can usually look at any structure or class definition and figure out the offsets of various fields. Be aware that compilers often place padding between structure fields so that each field starts at some natural boundary (typically 4 or 8 bytes). Using #pragma pack lets you specify the exact padding (or lack thereof) in your structure definitions.
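Rather than adding field sizes by hand, you can ask the compiler for the offsets with the standard offsetof macro. The sketch below reuses the Foo structure from the text; the constant names and addTwoToEachField are hypothetical, and the offsets of 0, 4, and 6 assume a 4-byte int and a 2-byte short with default packing.

```cpp
#include <cassert>
#include <cstddef>

struct Foo {
    int i;
    short j;
    char k;
};

// Offsets as the compiler actually laid them out.
constexpr std::size_t kOffsetI = offsetof(Foo, i);
constexpr std::size_t kOffsetJ = offsetof(Foo, j);
constexpr std::size_t kOffsetK = offsetof(Foo, k);

// The C++ counterpart of the three ADD instructions in the text.
void addTwoToEachField(Foo* foo) {
    assert(foo != nullptr);
    foo->i += 2;                              // ADD DWORD PTR [ESI],2
    foo->j = static_cast<short>(foo->j + 2);  // ADD WORD PTR [ESI+4],2
    foo->k = static_cast<char>(foo->k + 2);   // ADD BYTE PTR [ESI+6],2
}
```

Note that sizeof(Foo) will usually be larger than the 7 bytes the fields occupy, because the compiler pads the end of the structure so that arrays of Foo keep each int aligned.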

Sequence MOV value,EAX, many times in a row
Purpose Serial initialization of several variables to the same value
Example

MOV EAX,0

MOV [EBP-4],EAX

MOV [EBP-10],EAX

MOV [EBP-18],EAX

When a collection of variables is assigned the same value, the compiler may load the value into a register and copy the register into each of the variables. For example, at the beginning of a function you might initialize several int variables to the value 0. The example code sequence shows one way this might be encoded.

Sequence
CMP register_X,01
SBB register_X, register_X
NEG register_X

Purpose Converts 0 input value to 1, all other values to 0
Example

CMP EAX,01

SBB EAX,EAX

NEG EAX

In many cases, generated code needs to inspect a value to determine if it's 0. If so, the result of the inspection should be nonzero (typically 1). If the input value is any value other than 0, the result should be 0. Using 0 to mean Boolean FALSE, and everything else being TRUE, this instruction sequence does a logical NOT of the input value.

The code comprising this instruction sequence certainly isn't intuitive. Its distinctive characteristic is the use of the SBB instruction (integer subtraction with borrow). SBB is rarely used outside of this sequence.

The first instruction (CMP) sets or clears the carry flag as appropriate. SBB then uses the carry flag as part of its subtraction. Since the two arguments to SBB in this sequence are always the same, the carry flag alone determines the outcome (which is always 0 or -1). The NEG instruction finishes up by changing a -1 to a 1 and leaving 0 values alone.
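Stepping through the three instructions in C++ makes the trick easier to follow. This is a sketch that models the carry flag as a plain integer; logicalNotViaSbb is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Emulates CMP EAX,1 / SBB EAX,EAX / NEG EAX one instruction at a time.
std::uint32_t logicalNotViaSbb(std::uint32_t eax) {
    // CMP EAX,1: computes EAX - 1 and sets the carry flag on an unsigned
    // borrow, which happens only when EAX is 0.
    const std::uint32_t carry = (eax < 1u) ? 1u : 0u;

    // SBB EAX,EAX: EAX = EAX - EAX - carry, so 0 or 0xFFFFFFFF (-1).
    eax = eax - eax - carry;

    // NEG EAX: turns -1 into 1 and leaves 0 alone.
    eax = 0u - eax;

    assert(eax == 0u || eax == 1u);
    return eax;
}
```

An input of 0 produces 1, and every other input, including 0xFFFFFFFF, produces 0, which is exactly what the C++ ! operator gives for an integer operand.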



Oops! How did I Get Here?

Let's examine some of the common clues you can look for when something faults and you're rudely popped into the debugger. Think of this as a first aid quick reference. You won't find instructions on surgery here, but the common cuts and scrapes can be dealt with.

Picture this scenario: everything is working fine until suddenly your program stops in the debugger because of a fault, and none of the code looks familiar. Never fear. The faulting address usually yields some sort of information that steers you toward a resolution.

One of the more common and easy to find bugs is calling through a NULL function pointer. The signature characteristic of this bug is that the instruction pointer (EIP) is 0 or very close to 0.

Under Windows NT, the first 64KB of the address space is off limits, so the fault occurs exactly at address 0. In Windows 95, it's slightly more tricky. Memory at address 0 is accessible, but it's certainly not code. In this case, the faulting address may or may not be 0. However, the faulting address will almost certainly be just a little bit higher (for example, 0x00000003). When this happens, the CPU miraculously manages to execute one or two "instructions" before it hits something that triggered a fault.

Regardless of where you faulted, the vital information you need to know is: where were you executing before the NULL function pointer was called? In these situations the stack window may not be helpful, since the calling routine almost certainly won't appear in the stack window. This is a by-product of the way call stacks are walked. (See my May 1997 column for details on stack walking.)

Luckily, when a NULL pointer call happens, there is a way to see where you came from. A CALL instruction pushes a return address on the stack. If you can find this return address, you can change the code window to display at that location. To find the return address, use the data window to display memory starting at the ESP value. Make sure that the memory is being displayed in the DWORD format. The first DWORD at ESP is most likely the return address. Remember, the return address you obtain will be for the instruction after the bad CALL instruction. You'll need to back up in the code window to see the code that led up to the CALL.

In Figure 2, I've shown a NULL function pointer fault in the Visual Studio debugger. In the register window, the ESP value is 0x12FF7C. This is the same address at which I've set the data pane to display memory in DWORD format. The left column is the memory address. The second value in the top row (0x00401009) is the DWORD at ESP, which is the return address.



Figure 2 A NULL Pointer Fault

Incidentally, if the DWORD at ESP doesn't turn out to be a valid return address, it's certainly worth your while to look further up on the stack for values that look like they could be return addresses. If something looks like a valid address, change the code window to display at that address and see if you can make sense of it. If your ESP register is bogus, try looking for return addresses at positive offsets from the EBP register. Remember, this isn't an exact science. You're sifting through the rubble, looking for something that will give you a clue as to where you'll start doing more in-depth investigation.

Moving away from NULL function pointers, let's say you've faulted in some code that you don't recognize, but the faulting address is nowhere near 0. What's worse, the code looks like garbage. In other words, it doesn't look like the normal instructions you'd see. Instead, you see instructions such as ARPL, AAA, and OUTSB. There are two likely ways your code got there. First, you may have called through a corrupted function pointer. Second, you may have corrupted the return address on the stack. When the RET instruction executed, control transferred to the bogus address.

In either situation, the underlying problem is valid code addresses that were overwritten with garbage. In this case, your chance of getting a valid return address is lessened. However, you may be able to get an idea of what happened by looking at the faulting address. Try interpreting the fault address as a stream of data; you may find a pattern.



Figure 3 HoseStack.cpp

#include <stdio.h>
#include <string.h>

int main()
{
    char szBuffer[4];

    strcpy( szBuffer, "Hello World!\n" );
    printf( szBuffer );

    return 0;
}

Figure 3 shows the code for a small Hello World program with a big bug. The szBuffer array is only four characters wide, while the strcpy function copies all 14 bytes of "Hello World!\n" (13 characters plus the terminating null). This buffer overrun actually overwrites the stack frame where function main's return address is stored. When I run the program, it correctly prints out "Hello World!," but then faults at address 0x21646C72.

The faulting address yields a clue if you think of the address as a pattern of bytes. In memory, 0x21646C72 is stored as four sequential bytes: 0x72, 0x6C, 0x64, and 0x21. Note that each of these values is above 0x20, and below 0x80. That happens to be the range of printable ASCII characters. Looking up the four bytes in an ASCII table, you get

0x72 = 'r'

0x6C = 'l'

0x64 = 'd'

0x21 = '!'

As you can see, those four bytes form the end of the string "Hello World!" You could then search your code for places where rld! appears. While not a perfect answer, you'll have substantially narrowed down the places to begin an initial search for the problem. Admittedly, this is a contrived example and there are tools available that find these types of memory overwrites. Nonetheless, I've found many obnoxiously difficult bugs only because I noticed a familiar pattern in the corrupted data.
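If you find yourself doing this lookup often, a few lines of C++ can do it for you. The helper below is a sketch with a hypothetical name (addressAsAscii); it splits a 32-bit address into its bytes in little-endian memory order and returns them as text only if all four are printable ASCII.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns the four bytes of the address, lowest byte first (the order in
// which an Intel CPU stores them), as a string. Returns an empty string
// if any byte falls outside the printable ASCII range.
std::string addressAsAscii(std::uint32_t address) {
    std::string text;
    for (int i = 0; i < 4; ++i) {
        const unsigned char byte =
            static_cast<unsigned char>((address >> (8 * i)) & 0xFFu);
        if (byte < 0x20 || byte > 0x7E) {
            return std::string();
        }
        text.push_back(static_cast<char>(byte));
    }
    assert(text.size() == 4);
    return text;
}
```

Passing the faulting address 0x21646C72 returns "rld!", while an ordinary code address such as 0x00401009 returns an empty string because it contains non-printable bytes.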

Other Common Causes of Faults

Figure 4 String Instructions and Registers

MOVSB, MOVSW, MOVSD Reads from ESI, writes to EDI

SCASB, SCASW, SCASD Reads from EDI

STOSB, STOSW, STOSD Writes to EDI

LODSB, LODSW, LODSD Reads from ESI



Common sources of faults are the string instructions shown in Figure 4. Usually string instructions were either given bad data to start with or they operated past their intended range of memory. Remember, these string instructions implicitly use ESI, EDI, or both registers. They're almost always used with a REP, REPE, or REPNE prefix, which causes the instruction to execute multiple times with the registers incrementing or decrementing after each iteration.

Tracking down the core cause of a fault from one of these string instructions is almost always trivial. Figure 4 shows which registers the instructions use. Regardless of the particular instruction in the group, the registers are pointer values. It's immediately noticeable if a NULL pointer is the culprit. For example, if the faulting instruction is REP STOSB and you see that EDI is 0, you know that the CPU was trying to write using a NULL pointer.

If the registers in question aren't 0, check if their value is a multiple of 4KB, the size of a page on Intel CPUs. It's entirely possible that the instruction has executed successfully a number of times until the ESI or EDI register pointed to a page of memory that's not accessible. An easy way to know if you're on a page boundary is to look at the bottom three digits of the hex address. If they are 000, you're on a page boundary.
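The "bottom three hex digits" test is just a mask of the low 12 bits. Here is a minimal C++ sketch; kPageSize, isOnPageBoundary, and pageBase are hypothetical names, and the 4KB page size is the figure given in the text.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPageSize = 0x1000; // 4KB
static_assert((kPageSize & (kPageSize - 1)) == 0, "page size must be a power of 2");

// True when the low 12 bits (the bottom three hex digits) are all zero.
bool isOnPageBoundary(std::uint32_t address) {
    return (address & (kPageSize - 1)) == 0;
}

// Rounds an address down to the start of the page that contains it.
std::uint32_t pageBase(std::uint32_t address) {
    return address & ~(kPageSize - 1);
}
```

For example, 0x00131000 is on a page boundary, while 0x0012FF7C is not; its page starts at 0x0012F000.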

You can double-check this invalid memory diagnosis by trying to display memory at the value of ESI or EDI. If the debugger can't see it, your code can't either. I'm assuming you're using an application debugger such as Visual Studio. If you're using a system-level debugger, this may not be true since the memory may only be visible from kernel mode. On the other hand, if you're using a system-level debugger, you probably already know how to track down this kind of problem.

If you use recursive functions (or just lots of stack space), stack faults might plague you. Unfortunately, the operating system and debugger don't go out of their way to clarify that it's a stack overflow problem. For example, Figure 5 shows a very simple program that recurses until it runs out of stack space and faults. Figure 6 shows the none too helpful fault dialog that results.

Figure 5 RecursionOverflow.cpp

int foo( int i )
{
    return foo( i );
}

int main()
{
    return foo( 2 );
}

Figure 6 An Unhelpful Fault Dialog

If you select Cancel to debug, the Visual Studio debugger briefly tells you that a stack overflow occurred, but not at the same time as it shows you the faulting instruction. However, there are clues you can infer from the debugger that would indicate a stack overflow. For starters, the ESP register value is probably on a 4KB boundary. Likewise, the faulting instruction is probably a PUSH. There are other ways to cause a stack fault, but most of the time it will look something like what I've described.

While I'm on the subject of the stack, my final tidbit this month is on problems caused by PUSHing or POPing too much data to or from the stack. When whole programs were written in assembly language, programmers spent a lot of time matching up every PUSH instruction with an equivalent POP or ADD ESP,XX instruction. However, since compilers are so widespread, this tedious process isn't normally necessary.

Believe it or not, it's still sometimes necessary to verify that what's pushed on the stack eventually gets removed. For example, if the code for calling a __stdcall function places two DWORD values on the stack, the called function should end with a RET 8 instruction. Likewise, if you see a __cdecl function being called with three DWORD parameters, there should be an ADD ESP,0Ch instruction following the call. More importantly, the called function should return with a simple RET instruction. If you're not familiar with __cdecl versus __stdcall functions, see my February 1998 column.

These kinds of stack parameter mismatch problems can be minimized by following a few simple rules. First, make sure that there's only one prototype for any given function. Put that prototype in a .H file, never in a .C or .CPP file. Finally, make sure that the source file that actually defines the function includes the .H file. If you follow all these steps, you'll get a compiler or linker error rather than a bogus program.

I've seen programmers cheat by including prototypes for just one or two functions in their code modules. (You know who you are!) These functions have a prototype in a .H file, but the programmer doesn't want to incur the overhead of bringing in a whole .H file for just a few items. Inevitably something changes and the programmer ends up counting PUSHs, POPs, and ADD ESPs because the code crashes.

Wrap-up

I use "DUMPBIN /DISASM filename.obj" to look at the code generated by the C++ compiler. However, Paul DiLascia (my fellow MSJ columnist) mentioned that Visual C++ has a compiler switch, /FAs, that produces an .ASM file from the input C++ code. The .ASM file that is generated contains all the necessary blood and guts that go along with hardcore assembler programming. Although you may never need to program in assembler, it's always enlightening to see what your tools are doing under the hood.

Have a question about programming in Windows? Send it to Matt [email protected].
