
Improvement of Compiled Instruction Set Simulator by Increasing Flexibility and Reducing Compile Time

Moo-Kyoung Chung, Chung-Min Kyung
Department of EECS, KAIST


Outline

Previous works
  Introduction to Instruction Set Simulator (ISS)
    • Native code execution
    • Interpretive ISS
    • Compiled ISS
Improvement of compiled ISS
  New approaches
  Reducing compile time
  Increasing flexibility
Experimental result
Conclusion


Instruction Set Simulator (ISS)

Used for processor design and software design
  Instruction set simulation / architecture exploration
  Early system verification
  Pre-silicon software development
Essential for hardware/software co-simulation
  For embedded system and SoC design
  Connected with an HDL simulator or emulator

[Figure: two co-simulation setups. (1) The ISS (S/W) and an HDL simulator or SystemC (H/W) both run on the host. (2) The ISS (S/W) runs on the host and the H/W runs on an emulator or H/W prototype.]


Instruction Set Simulator

Native code execution
Interpretive ISS
Compiled ISS
  Static compiled ISS
  Dynamic compiled ISS


Native Code Execution

Target application code is compiled for the host machine and executed on the host machine
Fastest
Inaccurate
  Only for functionality verification
  Only supports high-level languages
    • Cannot support hardware-dependent instructions
    • Cannot support assembly language
    • Cannot support libraries or an OS whose source code is not available
  Difficult to measure performance
    • Target processor instructions may differ from the host processor instructions
  Difficult to handle I/O access
    • Trap-based method


Interpretive ISS

Simulation loop (fetch, decode, execute)
Flexible and accurate
  Works in a similar way to the processor itself
Easy to implement, easy to estimate performance
Almost all commercially available simulators are of this type
Slow
  Several million simulated instructions per second (MIPS)

    for( ; ; ){
        inst = fetch( pc );
        opcode = decode( inst );
        switch( opcode ){
            …
            case ADD:
                …
                break;
        }
    }
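The loop above can be made concrete with a minimal sketch. It assumes a hypothetical three-instruction toy encoding (opcode in the top byte) rather than any real target ISA; `ToyCpu`, `toy_run`, and the opcode values are illustrative names, not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy encoding: [31:24] opcode, [19:16] rd, [11:8] rn, [7:0] imm8. */
enum { OP_HALT = 0, OP_ADDI = 1, OP_SUBI = 2 };

typedef struct {
    uint32_t r[16];     /* register file */
    uint32_t pc;        /* program counter, counted in words */
    uint64_t executed;  /* number of simulated instructions */
} ToyCpu;

/* Interpret prog until OP_HALT, an unknown opcode, or the end of the program. */
static void toy_run(ToyCpu *cpu, const uint32_t *prog, size_t len)
{
    assert(cpu != NULL && prog != NULL);
    for (;;) {
        if (cpu->pc >= len)
            return;
        uint32_t inst = prog[cpu->pc++];        /* fetch */
        uint32_t opcode = inst >> 24;           /* decode */
        uint32_t rd = (inst >> 16) & 0xFu;
        uint32_t rn = (inst >> 8) & 0xFu;
        uint32_t imm = inst & 0xFFu;
        cpu->executed++;
        switch (opcode) {                       /* execute */
        case OP_ADDI: cpu->r[rd] = cpu->r[rn] + imm; break;
        case OP_SUBI: cpu->r[rd] = cpu->r[rn] - imm; break;
        case OP_HALT: return;
        default:      return;                   /* unknown opcode: stop */
        }
    }
}
```

Every simulated instruction pays for the fetch, the field extraction, and the `switch` dispatch at run time, which is the overhead the compiled approaches on the following slides try to remove.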


Static Compiled ISS

[Figure: generation flows of a static compiled ISS. The target application code is compiled by the target compiler into the target executable binary.
(A) Using binary translation: target executable binary -> binary translation -> host executable (ISS).
(B) Using C intermediate code: target executable binary -> C code generation -> simulation C code -> host C compiler -> host executable (ISS).
Annotation: no compatibility for host machine.]


Static Compiled ISS

Advantage
  Fast
    • Faster than the corresponding interpretive simulator
    • Moves the instruction fetch and decode steps into the compile process
    • The host C compiler optimizes the simulation C code.
      – Powerful optimization effect of the host C compiler
      – Unnecessary activities of the processor hardware are not simulated.
      – e.g., the carry flag does not need to be updated for every data processing instruction if the following instructions overwrite it without using it.
  Accurate
  Easy to estimate performance
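The carry-flag example can be sketched in C. This is an illustration under assumptions, not generated simulator output: the two-instruction target sequence, the `Cpu` struct, and the `adds` helper are hypothetical, and whether the dead store is actually removed depends on the host compiler and its optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t r[16];
    uint32_t carry;   /* simulated carry flag */
} Cpu;

/* Simulation body of a flag-setting add: returns a + b and updates the carry flag. */
static inline uint32_t adds(Cpu *c, uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    c->carry = (sum < a);   /* carry out of bit 31 */
    return sum;
}

/* Compiled-style block for the hypothetical target sequence
 *     adds r0, r0, r1
 *     adds r0, r0, r2
 * Both bodies are inlined into one C function, so the host compiler can see
 * that the first carry store is overwritten before anything reads it and is
 * free to delete it. An interpreter dispatching one instruction at a time
 * cannot know that and must perform both stores. */
static void block_compiled(Cpu *c)
{
    assert(c != NULL);
    c->r[0] = adds(c, c->r[0], c->r[1]);
    c->r[0] = adds(c, c->r[0], c->r[2]);
}
```

Only the carry produced by the second `adds` is observable after the block, which is what makes the first update unnecessary work.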


Static Compiled ISS

Disadvantage
  Cannot support dynamic program code
    • All the target instructions must be compiled statically, before simulation starts.
      – Self-modifying code
      – External code (loading)
      – Dynamic linking library
      – Multiple instruction sets (ARM/Thumb)
  Enormous compile time overhead for the software designer
  Indirect branch instruction
    • Hard for the C compiler to optimize the code
    • Performance (simulation speed) drop
  Large application
    • Enormous memory usage
    • Generated binary is much larger than the original binary.
  Low locality of binary
    • Basic blocks are larger by the same scale


Dynamic Compiled ISS

Dynamic compilation
  Moving the compilation step into simulation run time
  Using binary translation
    • Cannot use intermediate C code
    • Problems with compiling chunks of C code at run time
  Using a translation cache
Relatively slow
  Run-time compilation (binary translation) overhead
Flexible and relatively accurate

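[Figure: dynamic compiled ISS flow. Target executable -> instruction fetch -> translation cache hit? Yes: execution. No: binary translation, result stored in the translation cache, then execution.]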
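The translation-cache flow can be sketched as follows. Real binary translation emits host machine code; as a stand-in, this sketch "translates" an instruction into a pre-decoded function pointer plus operand, which is an assumption made to keep the example short and portable. The encoding, `TransCache`, `translate`, and `step` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TC_SLOTS 64u

typedef struct { uint32_t acc; } State;
typedef void (*HostOp)(State *s, uint32_t imm);

static void op_add(State *s, uint32_t imm) { s->acc += imm; }
static void op_sub(State *s, uint32_t imm) { s->acc -= imm; }

typedef struct {
    int      valid;
    uint32_t pc;    /* tag: target address (word index) of the instruction */
    HostOp   fn;    /* stand-in for the translated host code */
    uint32_t imm;
} TcEntry;

typedef struct {
    TcEntry  slot[TC_SLOTS];   /* direct-mapped translation cache */
    unsigned translations;     /* number of cache misses so far */
} TransCache;

/* Stand-in for binary translation.
 * Hypothetical encoding: bit 31 selects subtract, the low 16 bits are the immediate. */
static TcEntry translate(uint32_t pc, uint32_t inst)
{
    TcEntry e = { 1, pc, (inst >> 31) ? op_sub : op_add, inst & 0xFFFFu };
    return e;
}

/* Simulate the instruction at word index pc of text. */
static void step(TransCache *tc, State *s, const uint32_t *text, uint32_t pc)
{
    assert(tc != NULL && s != NULL && text != NULL);
    TcEntry *e = &tc->slot[pc % TC_SLOTS];
    if (!e->valid || e->pc != pc) {        /* cache miss: translate and store */
        *e = translate(pc, text[pc]);
        tc->translations++;
    }
    e->fn(s, e->imm);                      /* cache hit path: execute directly */
}
```

Running a two-instruction loop body three times performs only two translations; the remaining four executions are served from the cache, which is where a dynamic compiled ISS recovers its translation overhead.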


ISS

Accuracy
  Static compiled ISS = Interpretive ISS > Dynamic compiled ISS > Native code execution
Simulation speed
  Native code execution > Static compiled ISS > Dynamic compiled ISS > Interpretive ISS
Simplicity
  Native code execution > Interpretive ISS > Static compiled ISS > Dynamic compiled ISS
Compilation speed
  Native code execution = Interpretive ISS = Dynamic compiled ISS > Static compiled ISS
Flexibility
  Interpretive ISS = Dynamic compiled ISS > Native code execution > Static compiled ISS


Objective

How to reduce the compile time (startup cost) of the static compiled ISS?
How to increase the flexibility of the static compiled ISS?


Improvement of Compiled-ISS

New approach
  Using the object files (relocatable format, ELF) as input files instead of the binary executable file
  Making the generated simulation program have the same data and control flow as the target program
  Giving the static compiled ISS a built-in interpreter
Advantages
  Reducing the compile time and recompile time
  Increasing flexibility
    • Supporting indirect branches efficiently
    • Supporting dynamic code
  Fast speed
    • Taking all the advantages of the static compiled ISS


ISS Generation Flow

[Figure: ISS generation flow. Target source files (Source 1, 2, 3) -> cross compile (compile, excluding link) -> relocatable files and library files (Object 1, 2, 3) -> simulation C code generation -> target simulation C code (C 1, 2, 3) -> host compile -> host executable ISS (simulator).
Annotations: a relocatable file is the output after the COMPILE step and before the LINK step of C compilation; having the same structure.]


C Code Generation

[Figure: C code generation for one object file. Object 1 -> ELF loader (symbol, text) -> code analysis (decoder: decoded info.; code analyzer: CFG, DFG, decoded data) -> simulation code generation (C code generation) -> Simulation C Code 1.]

ELF (Executable and Linkable Format) is one of the most widely used file formats for object, executable and library files.
Identical structure to the target source file
Extracting the CFG and DFG using the symbol table and decoded information


Generating Constructed C Code

(A) Target C Source

    int result;
    void cfunction( int number )
    {
        if( number >= 5 )
            result = number - 5;
    }

(B) Object File (Disassemble)

    ...
    14:[e51b3010] ldr r3,[r11,#0x10]
    18:[e3530004] cmps r3,#0x4
    1c:[da000003] ble #0xc
    20:[e59f300c] ldr r3,#0xc
    24:[e51b2010] ldr r2,[r11,#0x10]
    28:[e2422005] sub r2,r2,#0x5
    2c:[e5832000] str r2,[r3,#0x0]
    ...

(C) Simulation C Code

    int T_result;
    void T_cfunction()
    {
        ...
        /* 14: ldr */
        LDType=W;Rd=3;Rn=11;LDDir=PRE_DOWN;Imm=0x10;LDWBack=0; LDR_L_I();
        /* 18: cmps */
        Rn=3;Imm=0x4;SType=SHT_LSL;SAmt=0x0; CMP_I();
        /* 1c: ble */
        WR_COND();
        Imm=0xc;Cond=0x000d; B();
        if( conpass ) goto T___newsym_30;
        /* 20: ldr, then the symbol is bound to the host variable */
        LDType=W;Rd=3;Rn=15;LDDir=PRE_UP;Imm=0xc;LDWBack=0; LDR_L_I();
        R[3] = &T_result;
        /* 24: ldr */
        LDType=W;Rd=2;Rn=11;LDDir=PRE_DOWN;Imm=0x10;LDWBack=0; LDR_L_I();
        /* 28: sub */
        Rd=2;Rn=2;Imm=0x5;SType=SHT_LSL;SAmt=0x0; SUB_I();
        /* 2c: str, then the store to the host variable */
        LDType=W;Rd=2;Rn=3;LDDir=PRE_UP;Imm=0x0;LDWBack=0; STR_L_I();
        *(R[3]+0) = R[2];
    T___newsym_30:
        ...
    }
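The structure of listing (C) can be illustrated with a smaller, self-contained sketch. It is not the generator's actual output: the handler bodies, the flag variables, the `B_LE` helper, and passing `number` as a host argument instead of loading it from the simulated stack are simplifying assumptions. What it keeps from the slide is the pattern of setting operand variables (`Rd`, `Rn`, `Imm`) before calling a per-instruction handler, a `goto` only where the target code branches, and a host variable (`T_result`) standing for the target symbol `result`.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated target state (simplified, names loosely follow the slide). */
static uint32_t R[16];                 /* general-purpose registers */
static int flag_n, flag_z, flag_v;     /* condition flags written by CMP_I */
static int conpass;                    /* 1 if the last branch condition held */

/* Operand variables filled in by the generated code before each handler call. */
static int Rd, Rn;
static uint32_t Imm;

static int32_t T_result;               /* host variable for target global `result` */

static void CMP_I(void)                /* compare register with immediate */
{
    assert(Rn >= 0 && Rn < 16);
    uint32_t a = R[Rn], res = a - Imm;
    flag_n = (int)(res >> 31);
    flag_z = (res == 0);
    flag_v = (int)(((a ^ Imm) & (a ^ res)) >> 31);   /* signed overflow of a - Imm */
}

static void SUB_I(void)                /* subtract immediate */
{
    assert(Rd >= 0 && Rd < 16 && Rn >= 0 && Rn < 16);
    R[Rd] = R[Rn] - Imm;
}

static void B_LE(void)                 /* evaluate the signed "less or equal" condition */
{
    conpass = flag_z || (flag_n != flag_v);
}

/* Generated-style function: same control flow as
 *     if (number >= 5) result = number - 5;                                  */
static void T_cfunction(int32_t number)
{
    R[3] = (uint32_t)number;               /* stands in for: ldr r3,[r11,...] */
    Rn = 3; Imm = 4; CMP_I();              /* cmps r3,#4 */
    B_LE();                                /* ble skip */
    if (conpass) goto T_skip;
    Rd = 2; Rn = 3; Imm = 5; SUB_I();      /* sub r2,r3,#5 */
    T_result = (int32_t)R[2];              /* str r2,[result] via the host symbol */
T_skip:
    return;
}
```

Calling `T_cfunction(9)` leaves `T_result` at 4, while `T_cfunction(3)` leaves it untouched, matching the target function's behavior.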


Reducing Compile Time

Previous static compiled ISS
  The simulation C code has one large function that contains all of the simulation code generated from the target binary.
    • As the function size increases, the C code compilation time increases further because of host compiler optimization.
  Even a slight change to the source code triggers the whole time-consuming compilation process.
How to reduce compile time
  The simulation C code is composed of many small functions.
    • The generated C file has the same structure as the target C code
      – The same CFG/DFG
    • This speeds up compiler optimization and reduces compile time
  Selective compilation
    • Compiling only the files that have changed
      – Using the "make" utility
    • This speeds up regeneration of the ISS


Reducing Compile Time

[Figure: previous and new simulator generation processes for a target program consisting of A.c, B.c, C.c and D.c.
Previous simulator generation process: A.c ... D.c -> compile -> A.o ... D.o -> link -> Appl.exe -> code gen -> Gen.c (generated C code) -> compile -> Gen.o -> link -> simulator. Compiling Gen.c is time-consuming because it handles one large function, and every re-compilation has to repeat all of the time-consuming compilation steps.
New simulator generation process: A.c ... D.c -> compile -> A.o ... D.o -> code gen -> A_G.c ... D_G.c (generated C code) -> compile -> A_G.o ... D_G.o -> link -> simulator. This is fast because it handles many small C files, and only the modified files go through these steps.]


Supporting Indirect Branch

The branch target address cannot be determined at compile time
Previous static compiled ISS
  • To support indirect branches, a label must be inserted at the start of every instruction's simulation code in the simulation C file.
  • Those labels split basic blocks apart.
  • This interferes with compiler optimization.
Runtime branch target search
  • In normal C usage, there is a symbol (label) at every possible branch target address in the target C code.
  • Since the simulation C code has the same CFG as the target code, it also has the corresponding symbols.
  • I made a so-called "Dynamic Branch Handler" which finds the destination symbol (label) and jumps to that address.
  • The indirect branch can be handled without adding labels.


Supporting Indirect Branch

Generated C code of previous compiled ISS

    Target_code(){
    Label_250:  ...
    Label_700:  ...
    Label_1000: Inst.1 simulation code
    Label_1001: Inst.2 simulation code
    Label_1002: Inst.3 simulation code
    Label_1003: Inst.4 simulation code
    Label_1004: // Bx R1
                goto addr2(R1)
                ...
    }

  The ISS does not know at compile time which address will be the destination.
  Unnecessary labels interfere with optimization and make simulation slow.

Generated C code of new approach

    Function_A(){...}
    Function_B(){...}
    Function_C(){
        Inst.1 simulation code
        Inst.2 simulation code
        Inst.3 simulation code
        Inst.4 simulation code
        // Bx R1
        target = DynamicBH( R1 )
        (*target)()
        ...
    }

  The ISS knows the possible branch destination addresses, where there should be a label.
  No redundant labels, so the host C compiler can optimize better.
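A run-time branch target search of this kind could look like the sketch below. The slides do not describe how the Dynamic Branch Handler is implemented, so the sorted symbol table, the binary search, and calling the target as a host function are assumptions; the addresses and the mapping of `Function_A` to `Function_C` onto them are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*SimFunc)(void);

/* One entry per symbol found in the object files: target address -> generated host function. */
typedef struct {
    uint32_t addr;
    SimFunc  fn;
} BranchSym;

static int calls_a, calls_b, calls_c;   /* only here to make the effect observable */
static void Function_A(void) { calls_a++; }
static void Function_B(void) { calls_b++; }
static void Function_C(void) { calls_c++; }

/* Hypothetical symbol table, sorted by target address. */
static const BranchSym sym_table[] = {
    { 250u,  Function_A },
    { 700u,  Function_B },
    { 1000u, Function_C },
};
#define SYM_COUNT (sizeof sym_table / sizeof sym_table[0])

/* Binary search needs a sorted table; checked in debug builds. */
static int table_is_sorted(void)
{
    for (size_t i = 1; i < SYM_COUNT; i++)
        if (sym_table[i - 1].addr >= sym_table[i].addr)
            return 0;
    return 1;
}

/* Find the generated function whose symbol sits at target_addr, or NULL if there is none. */
static SimFunc DynamicBH(uint32_t target_addr)
{
    size_t lo = 0, hi = SYM_COUNT;
    assert(table_is_sorted());
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sym_table[mid].addr == target_addr)
            return sym_table[mid].fn;
        if (sym_table[mid].addr < target_addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Simulation code for `bx r1`: returns 1 if a compiled target was called, 0 otherwise. */
static int sim_bx(uint32_t r1)
{
    SimFunc target = DynamicBH(r1);
    if (target == NULL)
        return 0;          /* no symbol here; an interpreter fallback could take over */
    (*target)();
    return 1;
}
```

Because only symbol addresses are searchable, the generated functions need no per-instruction labels; an address without a symbol, such as 1004 here, is simply reported as not found.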


Supporting Dynamic Code

Static compiled ISS
  Cannot support run-time changes of the executed code.
    • Self-modifying code
    • External memory code
    • Downloaded code
Dynamic code handler
  Built-in interpreter
    • Handles the dynamic code.
    • Fetch, decode, dispatch cache
  Target processor resources are shared between the two ISSs
  It is necessary to check whether the binary to be executed has been modified.
  The code executed by the interpretive block runs without the speed improvement of the compiled ISS.


Supporting Dynamic Code

[Figure: simulation flow and data access between the compiled ISS and the dynamic code handler (built-in interpreter), which share the target processor resource data.
Compiled ISS: for the next PC, checks "TEXT range?" and "Modified code?" against the self-modifying code table, then executes the instruction simulation code.
Dynamic code handler: takes over for external code and self-modifying code, that is, when the next instruction was not compiled at static time; it consists of a dispatch cache manager, a dispatch cache, decode and execute.
Data access (addr/data): a store to TEXT is checked ("Modified?") and recorded in the self-modifying code table.
Note: code run by the built-in interpreter does not get the speedup of the compiled ISS, but this applies only to the dynamic code.]
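The decision between the compiled code and the built-in interpreter can be sketched as below. The word-granular modification table, the fixed TEXT size, and the names `DynCode`, `on_store`, and `select_engine` are assumptions for illustration; the slides do not give the actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEXT_WORDS 16u   /* size of the statically compiled TEXT, in 32-bit words */

typedef enum { RUN_COMPILED, RUN_INTERPRETER } Engine;

typedef struct {
    uint32_t text_base;              /* first address of the statically compiled TEXT */
    uint8_t  modified[TEXT_WORDS];   /* self-modifying code table, one flag per word */
} DynCode;

static int in_text(const DynCode *d, uint32_t addr)
{
    return addr >= d->text_base && addr - d->text_base < 4u * TEXT_WORDS;
}

/* Data-access hook: a store that lands in TEXT marks that word as modified. */
static void on_store(DynCode *d, uint32_t addr)
{
    assert(d != NULL);
    if (in_text(d, addr))
        d->modified[(addr - d->text_base) / 4u] = 1;
}

/* Choose which engine simulates the instruction at next_pc. */
static Engine select_engine(const DynCode *d, uint32_t next_pc)
{
    assert(d != NULL);
    if (!in_text(d, next_pc))
        return RUN_INTERPRETER;                      /* external code */
    if (d->modified[(next_pc - d->text_base) / 4u])
        return RUN_INTERPRETER;                      /* self-modified code */
    return RUN_COMPILED;                             /* unchanged since static compilation */
}
```

Only the addresses that were overwritten or lie outside TEXT lose the compiled-code speedup; every other instruction keeps running as compiled simulation code.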


Experimental Result

Performance of compiled ISS
Platform
  • CPU: Intel Xeon 2 GHz, 512K cache
  • OS: Red Hat Linux 7.2
Target processor
  • ARM7
Target application
  • IDCT
  • Matrix multiply
  • FIR
  • JPEG decoder
  • MP3 decoder


Simulation Speed

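Benchmark          Executed      Native       GNU (GDB)      Commercial     OBSIM
(Target Program)   Instruction   Execution    ISS (sec.)     ISS (sec.)     (sec.)
                   Count         (sec.)
FIR                1,812 M       1.51 (x1)    217.8 (x145)   225.5 (x150)   65.8 (x44)
IDCT               1,140 M       0.81 (x1)    137 (x169)     143 (x176)     31.2 (x38)
Matrix Multiply    1,601 M       0.97 (x1)    179.1 (x185)   198.3 (x205)   33.9 (x35)

The GNU (GDB) ISS and the commercial ISS are interpretive ISSs. Values in parentheses are run times relative to native execution.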
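The run times can be converted into simulated MIPS and into the "xN" slowdown figures with two divisions. The helper names below are illustrative; the numbers in the usage note are taken from the table row with 1,812 M executed instructions.

```c
#include <assert.h>

/* Millions of simulated instructions per second: instruction count (in millions) / run time. */
static double mips(double million_instructions, double seconds)
{
    assert(seconds > 0.0);
    return million_instructions / seconds;
}

/* Slowdown relative to native execution, as in the "xN" figures of the table. */
static double slowdown(double sim_seconds, double native_seconds)
{
    assert(native_seconds > 0.0);
    return sim_seconds / native_seconds;
}
```

For that row, `mips(1812.0, 65.8)` is about 27.5 MIPS for OBSIM and `mips(1812.0, 217.8)` is about 8.3 MIPS for the GNU (GDB) ISS, while `slowdown(65.8, 1.51)` is about 43.6, which the slide rounds to x44.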


Compile Time

                                                OBSIM                        Existing Method
Benchmark          Source                       Total Compile   Recompile    Total Compile   Recompile
(Target Program)                                Time (sec.)     Time (sec.)  Time (sec.)     Time (sec.)
MP3 Decoder        18 C files, 199,220 lines    65.2            9.0          78.6            76.9
JPEG Decoder       12 C files, 137,875 lines    43.9            7.2          52.3            50.7


Summary

New approach
  Keeping the speed of the static compiled ISS
  Reducing the compile time
  Increasing the flexibility
    • Supporting indirect branches without speed loss
    • Supporting dynamic code
Practical use
  Co-simulation for embedded system exploration
    • Fast simulation speed
    • Fast compilation/recompilation speed
    • Easy to estimate performance
    • Powerful semi-hosting features
