school of electrical engineering and computer science...

H.-S. Oh, B.-J. Kim, H.-K. Choi, S.-M. Moon School of Electrical Engineering and Computer Science Seoul National University, Korea


Post on 04-Jul-2020


Page 1: School of Electrical Engineering and Computer Science ...jtres2012.imm.dtu.dk/slides/JTRES_2012_dalvik_oh.pdf · Android apps are programmed using Java Android uses DVM instead of

H.-S. Oh, B.-J. Kim, H.-K. Choi, S.-M. Moon School of Electrical Engineering and Computer Science

Seoul National University, Korea

2 Virtual Machine & Optimization Lab

  Android apps are programmed using Java

  Android uses DVM instead of JVM for running Java

  Some people believe that Android is successful partly due to DVM; is this really true?

  How does DVM perform compared to JVM?
•  Evaluate both on the same board using the same benchmarks

  How does DVM affect the performance of Android apps?
•  Analyze the runtime profile

  Comparison of DVM and JVM
  Evaluation of DVM and JVM
  Evaluation of Android apps
  Conclusion

  DVM is the VM for executing Java on the Android platform
•  It runs the Java code in applications, the framework, and the core libraries

•  It executes dex files instead of the class files used by the Java VM (JVM)

•  The DX tool converts class files to dex files (class-to-dex)

•  A dex file uses a different bytecode ISA

  DVM has a register-based bytecode, while JVM has a stack-based bytecode

Java source code

public static int add(int a, int b) {
    int c = a + b;
    return c;
}

JVM

0: iload_0
1: iload_1
2: iadd
3: istore_2
4: iload_2
5: ireturn

DVM

|0000: add-int v0, v1, v2
|0002: return v0
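
To make the difference in bytecode count concrete, here is a minimal sketch that runs the add example through two toy interpreters. The opcode names are borrowed from the listings above, but the class `StackVsRegister`, its methods, and the instruction encodings are hypothetical and are not the real JVM or Dalvik formats.

```java
// Hypothetical mini-interpreters; the opcodes mirror the add() example
// but are NOT the real JVM or Dalvik encodings.
class StackVsRegister {
    /** Stack-based: operands are pushed and popped implicitly. Returns {result, dispatch count}. */
    static int[] runStack(int a, int b) {
        String[] code = {"iload_0", "iload_1", "iadd", "istore_2", "iload_2", "ireturn"};
        int[] locals = {a, b, 0};
        int[] stack = new int[4];
        int sp = 0;            // index of the next free stack slot
        int dispatched = 0;    // number of bytecodes the interpreter loop handled
        for (String op : code) {
            dispatched++;
            switch (op) {
                case "iload_0": stack[sp++] = locals[0]; break;
                case "iload_1": stack[sp++] = locals[1]; break;
                case "iload_2": stack[sp++] = locals[2]; break;
                case "iadd": {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case "istore_2": locals[2] = stack[--sp]; break;
                case "ireturn": return new int[] {stack[--sp], dispatched};
                default: throw new IllegalStateException(op);
            }
        }
        throw new IllegalStateException("missing ireturn");
    }

    /** Register-based: each instruction names its operands. Returns {result, dispatch count}. */
    static int[] runRegister(int a, int b) {
        // Each row: opcode, destination, source 1, source 2 (opcode 0 = add-int, 1 = return).
        int[][] code = {{0, 0, 1, 2}, {1, 0, 0, 0}};
        int[] v = {0, a, b};   // arguments arrive in v1 and v2, as in the DVM listing
        int dispatched = 0;
        for (int[] insn : code) {
            dispatched++;
            if (insn[0] == 0) {
                v[insn[1]] = v[insn[2]] + v[insn[3]];
            } else {
                return new int[] {v[insn[1]], dispatched};
            }
        }
        throw new IllegalStateException("missing return");
    }

    public static void main(String[] args) {
        int[] s = runStack(3, 4);
        int[] r = runRegister(3, 4);
        System.out.println("stack:    result=" + s[0] + " dispatches=" + s[1]);
        System.out.println("register: result=" + r[0] + " dispatches=" + r[1]);
    }
}
```

With these toy encodings the stack version dispatches six instructions and the register version two, matching the lengths of the two listings. Each register instruction carries more operand bits, though, so fewer dispatches do not imply a smaller program.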

  The DVM interpreter is expected to be faster than the JVM’s because it executes fewer bytecodes and makes fewer operand accesses
•  According to Shi’s “stack vs. register” paper [TACO’08]
•  DVM has two interpreters (an assembly version and a C version), while our JVM has a C version only

  Higher performance requires just-in-time compilation, which translates bytecode to native code at runtime

  Both VMs employ adaptive compilation
•  Interpret initially; when a hot spot is found, compile it

  DVM’s JIT compilation unit is a hot path called a trace, while JVM’s is a hot method
•  The trace is meant to give a lower memory footprint, yet competitive performance
•  But, the reality is …

[Figure: a loop's basic blocks 1 to 7 and the traces formed from them, with blocks 4 and 7 duplicated]

  Interpret initially, counting at each trace entry
•  Trace entry: the target of a jump, or the bytecode following a trace

  If counter > threshold, trace recording starts

  Trace recording stops on meeting a branch or a method call; the trace is enqueued for JITC

  A join basic block (BB) can be compiled multiple times

  Chaining is used for control transfer at the end of a trace: chaining cells are added
•  [Jump to a VM internal function + address cache]

  Code quality: traces are too short (~3 bytecodes)
•  Fewer optimizations, higher overhead of chaining cells

  Precision of hot trace detection
•  Counters are shared among traces to reduce space

  Register allocation
•  Virtual registers cannot be mapped to physical registers globally

–  v0=v0+v1 requires two loads (from v0 and v1) and a store to v0

These issues can negatively affect both performance and memory
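
The trace selection steps and the shared counters can be sketched together as follows. This is a minimal illustration under stated assumptions: the class `TraceSelector`, the table size, the modulo hash, the threshold, and the counter reset are all hypothetical and are not taken from Dalvik's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of threshold-based trace selection with a shared
// counter table. Table size, hash, and threshold are illustrative
// assumptions, not Dalvik's actual values.
class TraceSelector {
    static final int SLOTS = 8;
    static final int THRESHOLD = 3;
    final int[] counters = new int[SLOTS];   // shared: several trace entries map to one slot
    final List<List<Integer>> compileQueue = new ArrayList<>();

    /**
     * Called by the interpreter at a trace entry (a jump target or the
     * bytecode following a trace). endsTrace[pc] marks branches and calls.
     */
    void onTraceEntry(int pc, boolean[] endsTrace) {
        int slot = Math.floorMod(pc, SLOTS);
        if (++counters[slot] <= THRESHOLD) {
            return;                           // still cold: keep interpreting
        }
        counters[slot] = 0;                   // assumed reset so the slot can warm up again
        List<Integer> trace = new ArrayList<>();
        for (int i = pc; i < endsTrace.length; i++) {
            trace.add(i);                     // record bytecodes along the executed path
            if (endsTrace[i]) {
                break;                        // recording stops at a branch or method call
            }
        }
        compileQueue.add(trace);              // hand the trace to the JIT compiler
    }

    public static void main(String[] args) {
        boolean[] endsTrace = new boolean[16];
        endsTrace[2] = true;
        endsTrace[11] = true;
        TraceSelector selector = new TraceSelector();
        for (int i = 0; i < 3; i++) {
            selector.onTraceEntry(1, endsTrace);   // entry at pc 1 runs three times
        }
        selector.onTraceEntry(9, endsTrace);       // pc 9 shares slot 1 and trips the threshold
        System.out.println(selector.compileQueue);
    }
}
```

In this run the entry at pc 9 is queued for compilation after a single execution, because it inherits the count accumulated by pc 1. That is the kind of imprecision shared counters can introduce in exchange for a smaller table.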

Java Source Code

public static int factorial( ) {
    int result = 1;
    for(int i = 1 ; i < 10000 ; i++) {
        result = result * i;
    }
    return result;
}

Dalvik Bytecode

|0000: const/4 v0, #int 1 // #1
|0001: move v1, v0
|0002: const/16 v2, #int 10000 // #2710
|0004: if-ge v0, v2, 000a // +0006
|0006: add-int/2addr v1, v0
|0007: add-int/lit8 v0, v0, #int 1 // #01
|0009: goto 0002 // -0007
|000a: return v1

Generated Machine code (12 instructions generated)

// if-ge v0, v2, 000a
LDR R3, [RFP, #0]
CMP R3, R2
STR R2, [RFP, #8]
BGE label2
B label1

label2:
……

label1:
// add-int/2addr v1, v0
LDR R0, [RFP, #4]
LDR R1, [RFP, #0]
ADDS R0, R0, R1
STR R0, [RFP, #4]

// add-int/lit8 v0, v0, #int 1
ADDS R1, R1, #1

// goto 0002
STR R0,[RFP, #4]
STR R1,[RFP, #0]

Java Source Code

public static int factorial( ) {
    int result = 1;
    for(int i = 1 ; i < 10000 ; i++) {
        result = result * i;
    }
    return result;
}

Java Bytecode

|0000: iconst_1
|0001: istore_0
|0002: iconst_1
|0003: istore_1
|0004: iload_1
|0005: sipush 10000
|0008: if_icmpge <21>
|0011: iload_0
|0012: iload_1
|0013: iadd
|0014: istore_0
|0015: iinc 1 1
|0018: goto <4>
|0021: iload_0
|0022: ireturn

Generated Machine code (8 instructions generated)

L2:
// sipush 10000
LDR v8, [pc, #+0] @const 10000

// if_icmpge <21>
CMP v4, v8 LSL #0
BGE L1

// iload_0
// iload_1
// iadd
ADD v3, v3, v4 LSL #0

// istore_0
STR v3, [rJFP, #-8]

// iinc 1 1
ADD v4, v4, #1
STR v4, [rJFP, #-4]

// goto <4>
B L2

L1:
……

  Tablet PC with ARM Cortex-A8 and 1GB memory

  Android 2.3 Gingerbread on Linux 2.6.35

  PhoneME advanced JVM (HotSpot) on Linux 2.6.32

  EEMBC GrinderBench

  DVM JITC generates Thumb2 code, while JVM JITC generates ARM code
•  Thumb2 reduces code size by 15% and performance by 6%

[Figure: interpreter comparison on Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Interpreter, DVM Interpreter, DVM C Interpreter]

DVM assembly interpreter is faster than JVM’s, but its C interpreter is similar

[Figure: dynamic bytecode count on Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Dynamic Bytecode Count, DVM Dynamic Bytecode Count]

DVM executes 40% fewer bytecode instructions

[Figure: dynamic bytecode size on Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Dynamic Bytecode Size, DVM Dynamic Bytecode Size]

DVM requires a 60% larger program than the JVM to do the same job

[Figure: JITC comparison on Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM JITC, DVM JITC]

DVM with JITC is three times slower than JVM with JITC

[Figure: compiled bytecode size on Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Compiled Bytecode Size, DVM Compiled Bytecode Size]

DVM compiles a smaller amount of bytecode because of its trace-based JITC

[Figure: generated code size on Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Generated Code Size, DVM Generated Code Size]

DVM generates 35% more machine code than the JVM

How many times is a Dalvik bytecode translated redundantly?

         Chess   kXML   Parallel   PNG    RegEx   Avg.
Ratio    1.18    1.08   1.15       1.15   1.13    1.13

How many instructions are generated for 1 byte of bytecode?

[Figure: generated instructions per byte of bytecode on Chess, kXML, Parallel, PNG, RegEx, and Geomean]

JVM: ~1.3 instructions per byte of JVM bytecode
DVM: ~2.7 instructions per byte of DVM bytecode = ~4.5 instructions per byte of JVM bytecode

Chaining cell overhead

[Figure: compile time on Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Compile Time, DVM Compile Time]

[Figure: compile overhead (%) on Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Compile Overhead, DVM Compile Overhead]

DVM compilation time is 4 times longer

[Figure: effect of trace extension on Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: DVM Original, DVM Trace Extension, DVM Trace Extension (Opt)]

Even if we extend the trace and add more optimizations, the impact is not high

  Low code quality due to short traces and little optimization
•  Expanding the trace would not help much

  Little difference for the Jelly Bean JITC
•  A preliminary implementation of a naïve method-based JITC is included (but currently disabled)

  One question: how come Android apps work fine?

  Profile results based on OProfile
•  DVM portion (interpreter and JITC code)
•  Native portion (kernel+library and native app)

  Run the apps for ~5 sec (since EEMBC runs ~5 sec)

Applications          Category         Running Details
AngryBirds            Game             Load the stage 1-1
DoodleJump            Game             Play for 5 seconds
Seesmic               SNS              Refresh Facebook feed
Twitter               SNS              Refresh timeline
Astro File Manager    File Navigator   Search file system
Google Sky Map        Navigation       Navigate constellations

Fortunately, the DVM portion of the running time is much smaller, so a slower DVM affects the apps much less

[Figure: running-time breakdown per app, 0% to 100%; series: Native, Native app, DVM]
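
A back-of-the-envelope model shows why a small DVM share limits the damage. The sketch below is only an illustration: the class `SlowdownModel` and the 20% share are hypothetical, since no exact share is stated here, and the 3x factor is the JITC gap reported for the benchmarks.

```java
// Back-of-the-envelope model; the VM share used in main() is a hypothetical
// value, not a number taken from the profile.
class SlowdownModel {
    /**
     * Whole-app slowdown when only the VM portion gets slower.
     * vmShare: fraction of the baseline running time spent in the VM (0.0 to 1.0).
     * vmSlowdown: how many times slower the VM portion becomes.
     */
    static double overallSlowdown(double vmShare, double vmSlowdown) {
        return (1.0 - vmShare) + vmShare * vmSlowdown;
    }

    public static void main(String[] args) {
        // Assumed 20% VM share with a VM portion that is 3x slower.
        System.out.println(overallSlowdown(0.2, 3.0));
        // If the whole run were VM code, the full 3x would show through.
        System.out.println(overallSlowdown(1.0, 3.0));
    }
}
```

Under these assumed numbers the whole app slows down by roughly 1.4x rather than 3x, and the effect shrinks further as the native share grows.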

[Figure: time breakdown per app, 0% to 100%; series: Interpreter (except GC), GC, JITC]

  The garbage collection (GC) portion is far too high
•  GC for the benchmarks takes less than 2%
•  GC might be too frequent or take too long

  The JITC portion is much smaller than the interpreter’s: why?
•  Fewer hot spots than in the benchmarks?
•  Lower reuse of JITC-generated code?

[Figure: loop iteration counts of apps and benchmarks; numbers are log scale]

App loops iterate far fewer times than benchmark loops.

[Figure: method call counts of apps and benchmarks; numbers are log scale]

App methods are called far fewer times than benchmark methods

[Figure: trace execution counts of apps and benchmarks; numbers are log scale]

App traces are executed far fewer times than benchmark traces

[Figure: number of traces generated by apps and benchmarks]

Apps generate many more traces than benchmarks

  Apps generate more traces, yet app traces are executed far fewer times than benchmark traces
•  Perhaps not even enough to justify the JITC overhead

Is JITC really useful for app performance?
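
Whether compiling a trace pays off can be framed as a break-even count: the one-time compile cost has to be recovered by the time saved on each later execution. The sketch below assumes a simple linear cost model; the class `JitBreakEven` and all cost values are hypothetical and are not measurements from this study.

```java
// Hypothetical cost model for deciding whether compiling a trace pays off.
// All costs are in arbitrary time units chosen for illustration.
class JitBreakEven {
    /** Smallest execution count at which compiling is no slower than interpreting. */
    static long breakEvenExecutions(long compileCost, long interpCost, long compiledCost) {
        if (compiledCost >= interpCost) {
            throw new IllegalArgumentException("compiled code must be faster per execution");
        }
        long savingPerRun = interpCost - compiledCost;
        return (compileCost + savingPerRun - 1) / savingPerRun;   // ceiling division
    }

    /** True when compile cost plus compiled runs is no more than interpreting every run. */
    static boolean paysOff(long executions, long compileCost, long interpCost, long compiledCost) {
        return compileCost + executions * compiledCost <= executions * interpCost;
    }

    public static void main(String[] args) {
        // Assumed costs: 1000 units to compile, 10 per interpreted run, 2 per compiled run.
        System.out.println("break-even executions: " + breakEvenExecutions(1000, 10, 2));
        System.out.println("pays off at 100 runs: " + paysOff(100, 1000, 10, 2));
        System.out.println("pays off at 200 runs: " + paysOff(200, 1000, 10, 2));
    }
}
```

With these assumed costs a trace must run 125 times before compilation breaks even, so a trace that runs only a handful of times costs more to compile than it saves.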

[Figure: Interpreter vs. JITC on Angrybirds, DoodleJump, Seesmic, Twitter, Astro File Manager, Google Sky Map, and Geomean (loading time only); series: Interpreter, JITC]

App performance goes down when we turn on the JIT compiler

  We believe Dalvik’s trace-based JITC has a severe performance problem in its current form

  We do not experience any critical problem in running the Android apps, though
•  The Dalvik portion of the total running time is not dominant

  Unlike benchmarks, Android apps lack hot spots
•  This calls for faster warm-spot detection or ahead-of-time compilation
