introduction to the java bytecode - so@t - 20130924

2013-09-24 Java ByteCode 1 Yohan BESCHI – Java Developer @yohanbeschi +Yohan Beschi

Upload: yohanbeschi

Post on 10-May-2015


Page 1: Introduction to the Java bytecode - So@t - 20130924

2013-09-24 Java ByteCode 1

Yohan BESCHI – Java Developer

@yohanbeschi

+Yohan Beschi

Page 2: Introduction to the Java bytecode - So@t - 20130924

A word about me

⦿ Started coding more than 15 years ago

⦿ Expertise ⦿ RIAs

⦿ Performances

⦿ Industrialization

⦿ Writings ⦿ So@t Blogger

⦿ Developpez.com Writer

⦿ InfoQ FR Editor


Page 3: Introduction to the Java bytecode - So@t - 20130924


Page 4: Introduction to the Java bytecode - So@t - 20130924

Why this talk?

⦿ To understand JVM exceptions

⦿ It can help when dealing with performance issues

⦿ To write a compiler for the JVM

⦿ And most importantly, it's fun!

Page 5: Introduction to the Java bytecode - So@t - 20130924

What we won’t see

⦿ A detailed explanation of the JVMS

⦿ Features from Java 5 and higher

⦿ Tools like ASM, BCEL, Javassist, etc.

⦿ JSR-292


Page 6: Introduction to the Java bytecode - So@t - 20130924

What we will see

⦿ An introduction to the inner workings of the JVM

⦿ A big part of the JVM instruction set

⦿ Unicode and Java

⦿ An introduction to the Class File Format


Page 7: Introduction to the Java bytecode - So@t - 20130924

Terminology used in this talk

⦿ A JVM, THE JVM or Hotspot = A virtual machine following the JVMS

⦿ Java Compiler = javac


Page 8: Introduction to the Java bytecode - So@t - 20130924

Bytecode - What and Why ?

⦿ Intermediate language between the Java Source Code and machine code

⦿ Close to an Assembly Language

⦿ Efficient execution by an interpreter


Page 9: Introduction to the Java bytecode - So@t - 20130924

JIT Compilation

⦿ JIT = Just In Time

⦿ Interpreted bytecode is slower than compiled machine code

⦿ Used to improve runtime performance

⦿ Optimizations

⦿ Caching


Page 10: Introduction to the Java bytecode - So@t - 20130924

What will I learn ? (1/6)

package org.bytecode;

public class Demo {
    public static void main(String[] args) {
        final int sum = add(3, 5);
        System.out.println(sum);
    }

    private static int add(int i, int j) {
        return i + j;
    }
}

$ javac -g:none org/bytecode/Demo.java

Page 11: Introduction to the Java bytecode - So@t - 20130924

What will I learn ? (2/6)

public class org.bytecode.Demo

minor version: 0

major version: 51

flags: ACC_PUBLIC, ACC_SUPER

... to be continued ...

$ javap -verbose -p org/bytecode/Demo

Page 12: Introduction to the Java bytecode - So@t - 20130924

What will I learn? (3/6)

Constant pool:

#1 = Methodref #6.#14 // java/lang/Object."<init>":()V

#2 = Methodref #5.#15 // org/bytecode/Demo.add:(II)I

#3 = Fieldref #16.#17 // java/lang/System.out:Ljava/io/PrintStream;

#4 = Methodref #18.#19 // java/io/PrintStream.println:(I)V

#5 = Class #20 // org/bytecode/Demo

#6 = Class #21 // java/lang/Object

#7 = Utf8 <init>

#8 = Utf8 ()V

#9 = Utf8 Code

#10 = Utf8 main

#11 = Utf8 ([Ljava/lang/String;)V

#12 = Utf8 add

#13 = Utf8 (II)I

#14 = NameAndType #7:#8 // "<init>":()V

#15 = NameAndType #12:#13 // add:(II)I

#16 = Class #22 // java/lang/System

#17 = NameAndType #23:#24 // out:Ljava/io/PrintStream;

#18 = Class #25 // java/io/PrintStream

#19 = NameAndType #26:#27 // println:(I)V

#20 = Utf8 org/bytecode/Demo

#21 = Utf8 java/lang/Object

#22 = Utf8 java/lang/System

#23 = Utf8 out

#24 = Utf8 Ljava/io/PrintStream;

#25 = Utf8 java/io/PrintStream

#26 = Utf8 println

#27 = Utf8 (I)V


Page 13: Introduction to the Java bytecode - So@t - 20130924

What will I learn? (4/6)

{

public org.bytecode.Demo();

flags: ACC_PUBLIC

Code:

stack=1, locals=1, args_size=1

0: aload_0

1: invokespecial #1 // Method java/lang/Object."<init>":()V

4: return

... to be continued ...


Page 14: Introduction to the Java bytecode - So@t - 20130924

What will I learn? (5/6)

public static void main(java.lang.String[]);

flags: ACC_PUBLIC, ACC_STATIC

Code:

stack=2, locals=2, args_size=1

0: iconst_3

1: iconst_5

2: invokestatic #2 // Method add:(II)I

5: istore_1

6: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;

9: iload_1

10: invokevirtual #4 // Method java/io/PrintStream.println:(I)V

13: return

... to be continued ...


Page 15: Introduction to the Java bytecode - So@t - 20130924

What will I learn? (6/6)

private static int add(int, int);

flags: ACC_PRIVATE, ACC_STATIC

Code:

stack=2, locals=2, args_size=2

0: iload_0

1: iload_1

2: iadd

3: ireturn

}

... end ...


Page 16: Introduction to the Java bytecode - So@t - 20130924

Class File as a Text File using PJBA*

.class org/isk/bytecode/Adder

.method add(II)I

iload_0

iload_1

iadd

ireturn

.methodend

.classend


*PJBA: Plume Java Bytecode Assembler

Page 17: Introduction to the Java bytecode - So@t - 20130924

Descriptors (1/2)


Descriptor Type

Z boolean

B byte

S short

C char

I int

J long

F float

D double

V void

[<type> Array of type <type>

L<type>; Object of type <type>

Page 18: Introduction to the Java bytecode - So@t - 20130924

Descriptors (2/2)

⦿ Descriptors are used to define fields and methods

Bytecode    Java

add(II)I    int add(int i1, int i2)

concat(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;    String concat(String s1, String s2)

merge([Z[Z)[Z    boolean[] merge(boolean[] a1, boolean[] a2)
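The mapping in the table above can be checked from Java itself. The following is a minimal sketch, assuming a Java 7+ runtime where java.lang.invoke.MethodType is available; the class name DescriptorDemo and the helper descriptor() are hypothetical and not part of the talk.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper class, used only to illustrate method descriptors.
class DescriptorDemo {

    // Builds the JVM method descriptor for a return type and its parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // int add(int i1, int i2)
        System.out.println(descriptor(int.class, int.class, int.class));
        // String concat(String s1, String s2)
        System.out.println(descriptor(String.class, String.class, String.class));
        // boolean[] merge(boolean[] a1, boolean[] a2)
        System.out.println(descriptor(boolean[].class, boolean[].class, boolean[].class));
    }
}
```

Note that the method name is not part of the descriptor; the table prefixes it only for readability.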

Page 19: Introduction to the Java bytecode - So@t - 20130924

Introduction to the JVM


Page 20: Introduction to the Java bytecode - So@t - 20130924

The JVM in a few words

⦿ Application Virtual Machine

⦿ Stack based

⦿ Symbolic references

⦿ Garbage collection

⦿ Platform independent

⦿ Network Byte Order (i.e. big-endian)

Page 21: Introduction to the Java bytecode - So@t - 20130924

From Source code to the JVM (1/2)

[Diagram: Java Code (.java) -> Java Compiler (javac) -> Java ByteCode (.class) -> Java Virtual Machine, which contains the Class Loader, the Execution Engine and the Runtime Data Areas]

Page 22: Introduction to the Java bytecode - So@t - 20130924

From Source code to the JVM (2/2)

⦿ ClassLoader: loads the bytecode from class files into the Runtime Data Areas

⦿ Execution Engine: executes the bytecode

⦿ Runtime Data Areas: areas used during a program execution

⦿ Some areas are created when the JVM starts; others are created per thread.

Page 23: Introduction to the Java bytecode - So@t - 20130924

Run-Time Data Areas (1/2)

[Diagram: Run-Time Data Areas. Per thread: Program Counter, Java Stack, Native Method Stack. Shared: Heap, Method Area (containing the Run-Time Constant Pool)]

Page 24: Introduction to the Java bytecode - So@t - 20130924

Run-Time Data Areas (2/2)

⦿ Heap: run-time data area from which memory for all class instances and arrays is allocated

⦿ Method Area: stores per-class structures

⦿ Run-Time Constant Pool: is a per-class or per-interface run-time representation of the constantPool table in a class file


Page 25: Introduction to the Java bytecode - So@t - 20130924

Runtime Data Areas (2/2)

⦿ Threads: daemon and non-daemon

⦿ Program counter: address of the Java Virtual Machine instruction currently being executed

⦿ JVM Stacks: LIFO stacks of Frames

⦿ Native Method Stacks

⦿ Frames: store data and partial results, perform dynamic linking, return values for methods, and dispatch exceptions.

Page 26: Introduction to the Java bytecode - So@t - 20130924

Threads and Stack Frames

[Diagram: a Java Virtual Machine running four threads (Thread 1 to Thread 4); each thread has its own stack of frames (F1, F2, ...), and the stacks have different depths]

Page 27: Introduction to the Java bytecode - So@t - 20130924

Frames (1/2)

[Diagram: inside the Java Virtual Machine, a Frame holds its Local Variables (slots 0 to 8 shown) and its Operand Stack, and is linked to its Class, which provides the Method Code (pointed to by the PC) and the Constant Pool]

Page 28: Introduction to the Java bytecode - So@t - 20130924

Frames (2/2)

⦿ Local Variables: array of variables

⦿ Operand Stack: LIFO stack of operands

⦿ Dynamic Linking: translates symbolic method references into concrete method references and translates variable accesses into appropriate offsets in storage structures associated with the run-time location of these variables.

⦿ Java Stack (Frame) != Operand Stack


Page 29: Introduction to the Java bytecode - So@t - 20130924

[Animation 1/10: Frame 1 executes main]

public static void main(String[] a) {
    push literal 1
    push literal 2
    invoke static method add()
    store the result in lv1
    // …
}

Frame 1 (main) - Local Variables: ra - Stack: (empty)

Page 30: Introduction to the Java bytecode - So@t - 20130924

[Animation 2/10: main pushes literal 1]

Frame 1 (main) - Local Variables: ra - Stack: 1

Page 31: Introduction to the Java bytecode - So@t - 20130924

[Animation 3/10: main pushes literal 2]

Frame 1 (main) - Local Variables: ra - Stack: 2, 1 (top first)

Page 32: Introduction to the Java bytecode - So@t - 20130924

[Animation 4/10: invoking add() creates Frame 2]

public static int add(int i1, int i2) {
    push lv0
    push lv1
    add the top of the stack
    return the top of the stack
}

Frame 2 (add) - Local Variables: 1, 2 - Stack: (empty)

Frame 1 (main, inactive) - Local Variables: ra - Stack: 2, 1 (top first)

Page 33: Introduction to the Java bytecode - So@t - 20130924

[Animation 5/10: Frame 2 becomes the active frame]

Frame 2 (add) - Local Variables: 1, 2 - Stack: (empty)

Frame 1 (main, inactive) - Local Variables: ra - Stack: 2, 1 (top first)

Page 34: Introduction to the Java bytecode - So@t - 20130924

[Animation 6/10: add pushes lv0]

Frame 2 (add) - Local Variables: 1, 2 - Stack: 1

Frame 1 (main, inactive) - Local Variables: ra - Stack: 2, 1 (top first)

Page 35: Introduction to the Java bytecode - So@t - 20130924

[Animation 7/10: add pushes lv1]

Frame 2 (add) - Local Variables: 1, 2 - Stack: 2, 1 (top first)

Frame 1 (main, inactive) - Local Variables: ra - Stack: 2, 1 (top first)

Page 36: Introduction to the Java bytecode - So@t - 20130924

[Animation 8/10: add replaces the two values on top of the stack with their sum]

Frame 2 (add) - Local Variables: 1, 2 - Stack: 3

Frame 1 (main, inactive) - Local Variables: ra - Stack: 2, 1 (top first)

Page 37: Introduction to the Java bytecode - So@t - 20130924

[Animation 9/10: add returns the top of its stack to the caller]

Frame 2 (add) - Local Variables: 1, 2 - Stack: 3

Frame 1 (main) - Local Variables: ra - Stack: 3

Page 38: Introduction to the Java bytecode - So@t - 20130924

[Animation 10/10: Frame 2 is gone and main stores the result in lv1]

Frame 1 (main) - Local Variables: ra, 3 - Stack: 3

Page 39: Introduction to the Java bytecode - So@t - 20130924

JVM Instructions


Page 40: Introduction to the Java bytecode - So@t - 20130924

JVM types (1/2)

⦿ int

⦿ long

⦿ float

⦿ double

⦿ reference


Page 41: Introduction to the Java bytecode - So@t - 20130924

JVM types (2/2)

⦿ boolean, byte, short and char are treated as int

⦿ But we can have arrays of byte, short and char

⦿ long and double values take two slots in the operand stack and the local variables

⦿ A reference is a pointer to an object in the heap


Page 42: Introduction to the Java bytecode - So@t - 20130924

Mnemonics (1/3)

⦿ A mnemonic is the textual form of an operation (iadd, lload_1, etc.)

⦿ Each mnemonic matches a number between 0 and 255 (1 byte) in a class file.

⦿ This number is called an operation code or simply an opcode

Page 43: Introduction to the Java bytecode - So@t - 20130924

Mnemonics (2/3)

Letter Type Size (in bits)

b byte 8

s short 16

c char 16

i int 32

l long 64

f float 32

d double 64

a reference 32/64*

* Depending on the JVM

Page 44: Introduction to the Java bytecode - So@t - 20130924

Mnemonics (3/3)

⦿ Instructions dealing with the stack or the local variables start with a letter corresponding to a type

⦿ For example, the instruction iadd adds two ints

⦿ In a class file, instructions can only exist in a method.

Page 45: Introduction to the Java bytecode - So@t - 20130924

Arguments and operands

⦿ An argument follows an instruction

⦿ ldc « Hello World! »

⦿ An operand is from the operand stack

⦿ iadd


Page 46: Introduction to the Java bytecode - So@t - 20130924

Returning a value (1/2)


Hex Mnemonic

0xac ireturn

0xad lreturn

0xae freturn

0xaf dreturn

0xb0 areturn

0xb1 return

Page 47: Introduction to the Java bytecode - So@t - 20130924

Returning a value (2/2)

public static void doNothing() {

return; // Optional

}


.method doNothing()V

return

.methodend

Page 48: Introduction to the Java bytecode - So@t - 20130924

Predefined Constants (1/3)

Hex Mnemonic

0x01 aconst_null

0x02 iconst_m1

0x03 iconst_0

0x04 iconst_1

0x05 iconst_2

0x06 iconst_3

0x07 iconst_4

0x08 iconst_5

Hex Mnemonic

0x09 lconst_0

0x0a lconst_1

0x0b fconst_0

0x0c fconst_1

0x0d fconst_2

0x0e dconst_0

0x0f dconst_1

Page 49: Introduction to the Java bytecode - So@t - 20130924

Predefined Constants (2/3)

⦿ The JVM supports constants of type int, float, long, double and String

⦿ These instructions push the constant to the stack


Page 50: Introduction to the Java bytecode - So@t - 20130924

Predefined Constants (3/3)

public static double get() {

return 1.0;

}


.method get()D

dconst_1

dreturn

.methodend

Page 51: Introduction to the Java bytecode - So@t - 20130924

User defined constants (1/3)


Hex Mnemonic Argument

0x10 bipush n

0x11 sipush n

0x12 ldc n

0x13 ldc_w n

0x14 ldc2_w n

Page 52: Introduction to the Java bytecode - So@t - 20130924

User defined constants (2/3)

⦿ These instructions push the constant to the stack

⦿ bipush is used for constants between -128 and 127

⦿ sipush is used for constants between -32,768 and 32,767

⦿ The ldc instructions are used for all other values.
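The choice between the predefined constants, bipush, sipush and ldc for an int value can be sketched as a simple range check. This is only an illustration: the class PushSelector and the method pushInt() are hypothetical names, and the output uses PJBA-style text in which ldc takes the literal value (in a real class file its argument is a constant pool index).

```java
// Hypothetical helper: picks the shortest instruction that pushes an int constant.
class PushSelector {

    static String pushInt(int value) {
        if (value == -1) {
            return "iconst_m1";                 // predefined constant -1
        }
        if (value >= 0 && value <= 5) {
            return "iconst_" + value;           // predefined constants 0 to 5
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;           // fits in one signed byte
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;           // fits in two signed bytes
        }
        return "ldc " + value;                  // everything else goes through the constant pool
    }

    public static void main(String[] args) {
        for (int v : new int[] {-1, 3, 17, 14909, 100000}) {
            System.out.println(v + " -> " + pushInt(v));
        }
    }
}
```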

Page 53: Introduction to the Java bytecode - So@t - 20130924

User defined constants (3/3)

public static short get() {

return 14909;

}


.method get()S

sipush 14909

ireturn

.methodend

Page 54: Introduction to the Java bytecode - So@t - 20130924

ldc, ldc_w, ldc2_w

⦿ For these instructions the argument (n) is not the actual value, but an index in the Constant Pool

⦿ « _w » means wide. The size of the index is 2 bytes instead of 1.

⦿ « ldc » and « ldc_w » are used for values of type int, float and String

⦿ « ldc2_w » is used for values of type double and long. « 2 » means two slots in the operand stack


Page 55: Introduction to the Java bytecode - So@t - 20130924

Local Variables (1/6) – Loading


Hex Mnemonic Argument

0x15 iload n

0x16 lload n

0x17 fload n

0x18 dload n

0x19 aload n

0x1a iload_0

0x1b iload_1

0x1c iload_2

0x1d iload_3

Hex Mnemonic

0x1e lload_0

0x1f lload_1

0x20 lload_2

0x21 lload_3

0x22 fload_0

0x23 fload_1

0x24 fload_2

0x25 fload_3

Hex Mnemonic

0x26 dload_0

0x27 dload_1

0x28 dload_2

0x29 dload_3

0x2a aload_0

0x2b aload_1

0x2c aload_2

0x2d aload_3

Page 56: Introduction to the Java bytecode - So@t - 20130924

Local Variables (2/6) – Loading

public static int load(int i) {

return i;

}


.method load(I)I

iload_0

ireturn

.methodend

Page 57: Introduction to the Java bytecode - So@t - 20130924

Local Variables (3/6) - Storing


Hex Mnemonic Argument

0x36 istore n

0x37 lstore n

0x38 fstore n

0x39 dstore n

0x3a astore n

0x3b istore_0

0x3c istore_1

0x3d istore_2

0x3e istore_3

Hex Mnemonic

0x3f lstore_0

0x40 lstore_1

0x41 lstore_2

0x42 lstore_3

0x43 fstore_0

0x44 fstore_1

0x45 fstore_2

0x46 fstore_3

Hex Mnemonic

0x47 dstore_0

0x48 dstore_1

0x49 dstore_2

0x4a dstore_3

0x4b astore_0

0x4c astore_1

0x4d astore_2

0x4e astore_3

Page 58: Introduction to the Java bytecode - So@t - 20130924

Local Variables (4/6) – Storing

public static void store() {

int i = 17;

double d = 3.5;

}


.method store()V

bipush 17

istore_0

ldc2_w 3.5

dstore_1

return

.methodend

Page 59: Introduction to the Java bytecode - So@t - 20130924

Local Variables (5/6)

⦿ « n » is the index in the Local Variables

⦿ Slots in Local Variables are not typed, but you need to be careful about the size of each type (example following)


Page 60: Introduction to the Java bytecode - So@t - 20130924

Local Variables (6/6)

ldc "hello world"
astore_2
ldc2_w 3.14d
dstore_1
aload_2   # error!

# dstore_1 stored a double in slots 1 and 2, so the String
# stored in slot 2 can no longer be accessed
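The slot clash above can be mimicked with a plain array standing in for the Local Variables. This is a toy model under stated assumptions, not how a JVM stores values: the class LocalsDemo and the SECOND_HALF marker are hypothetical and only make the second slot of the double visible.

```java
import java.util.Arrays;

// Toy model of a frame's local variable array (one array cell per slot).
class LocalsDemo {

    // Marker for the slot that holds the second half of a long or double.
    static final Object SECOND_HALF = "<second half of double>";

    static Object[] run() {
        Object[] locals = new Object[4];
        locals[2] = "hello world";   // astore_2
        locals[1] = 3.14d;           // dstore_1 takes slot 1 ...
        locals[2] = SECOND_HALF;     // ... and slot 2, overwriting the String reference
        return locals;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```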

Page 61: Introduction to the Java bytecode - So@t - 20130924

Math (1/5) – Arithmetic Operations


0x60 iadd

0x61 ladd

0x62 fadd

0x63 dadd

0x64 isub

0x65 lsub

0x66 fsub

0x67 dsub

0x68 imul

0x69 lmul

0x6a fmul

0x6b dmul

0x6c idiv

0x6d ldiv

0x6e fdiv

0x6f ddiv

0x70 irem

0x71 lrem

0x72 frem

0x73 drem

0x74 ineg

0x75 lneg

0x76 fneg

0x77 dneg

Page 62: Introduction to the Java bytecode - So@t - 20130924

Math (2/5) – Notations

⦿ Infix notation : 3 + 4 * 7

⦿ Prefix notation : + 3 * 4 7

⦿ Postfix notation : 3 4 7 * +

Let’s see an example !!


Page 63: Introduction to the Java bytecode - So@t - 20130924

Math (3/5) – Notations

public static int add() {
    return 2 * (7 - 5) * (8 - 5);
}

# Infix: 2 * (7 - 5) * (8 - 5)

# Postfix: 2 7 5 - * 8 5 - *

Page 64: Introduction to the Java bytecode - So@t - 20130924

Math (4/5) – Notations

.method add()I
  # Stack before -> after
  iconst_2    # [] -> 2
  bipush 7    # 2 -> 2, 7
  iconst_5    # 2, 7 -> 2, 7, 5
  isub        # 2, 7, 5 -> 2, 2 (7 - 5 = 2)
  imul        # 2, 2 -> 4 (2 * 2 = 4)
  bipush 8    # 4 -> 4, 8
  iconst_5    # 4, 8 -> 4, 8, 5
  isub        # 4, 8, 5 -> 4, 3 (8 - 5 = 3)
  imul        # 4, 3 -> 12 (4 * 3 = 12)
  ireturn
.methodend
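The stack trace above is ordinary postfix evaluation. As a rough sketch of the same idea in Java, the hypothetical class PostfixDemo below evaluates a postfix expression with a Deque playing the role of the operand stack; it supports only +, - and * on ints and does no error handling.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical postfix evaluator: numbers are pushed, operators pop two values.
class PostfixDemo {

    static int evaluate(String postfix) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : postfix.trim().split("\\s+")) {
            if (token.equals("+") || token.equals("-") || token.equals("*")) {
                int right = stack.pop();   // the top of the stack is the right operand
                int left = stack.pop();
                switch (token) {
                    case "+": stack.push(left + right); break;   // like iadd
                    case "-": stack.push(left - right); break;   // like isub
                    default:  stack.push(left * right); break;   // like imul
                }
            } else {
                stack.push(Integer.parseInt(token));             // like iconst_n / bipush n
            }
        }
        return stack.pop();                                      // like ireturn
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 7 5 - * 8 5 - *"));
        System.out.println(evaluate("3 4 7 * +"));
    }
}
```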

Page 65: Introduction to the Java bytecode - So@t - 20130924

Math (5/5) – A few more…

<< >> >>>

0x78 ishl

0x79 lshl

0x7a ishr

0x7b lshr

0x7c iushr

0x7d lushr

& | ^

0x7e iand

0x7f land

0x80 ior

0x81 lor

0x82 ixor

0x83 lxor

casting

0x85 i2l

0x86 i2f

0x87 i2d

0x88 l2i

0x89 l2f

0x8a l2d

0x8b f2i

0x8c f2l

0x8d f2d

0x8e d2i

0x8f d2l

0x90 d2f

int to byte, char and short

0x91 i2b

0x92 i2c

0x93 i2s

Page 66: Introduction to the Java bytecode - So@t - 20130924

Stack instructions (1/2)


Hex Mnemonic Description

0x57 pop Pop the first element off the stack

0x58 pop2 Pop the first two elements off the stack

0x59 dup Duplicate the first element and push it to the stack

0x5a dup_x1 Duplicate the first element and add it under the second

0x5b dup_x2 Duplicate the first element and add it under the third

0x5c dup2 Duplicate the first two elements and push them to the stack (keeping the order)

0x5d dup2_x1 Duplicate the first two elements and add them under the third one (keeping the order)

0x5e dup2_x2 Duplicate the first two elements and add them under the fourth one (keeping the order)

0x5f swap Swap the first two elements

Page 67: Introduction to the Java bytecode - So@t - 20130924

Stack instructions (2/2)

⦿ One element = one slot in the operand stack

⦿ long and double values must be considered as two elements each

⦿ The JVMS refers to long and double as category 2 types (taking 2 slots); other types are category 1 (see "Types and the Java Virtual Machine" in the JVMS)

Page 68: Introduction to the Java bytecode - So@t - 20130924

pop - 1/2

[Diagram: Frame 1 executes "iconst_1, pop". After iconst_1, the operand stack holds 1.]

Page 69: Introduction to the Java bytecode - So@t - 20130924

pop - 2/2

[Diagram: after pop, the operand stack is empty.]

Page 70: Introduction to the Java bytecode - So@t - 20130924

dup - 1/2

[Diagram: Frame 1 executes "iconst_2, iconst_1, dup". Before dup, the operand stack holds 2, 1 (bottom to top).]

Page 71: Introduction to the Java bytecode - So@t - 20130924

dup - 2/2

[Diagram: after dup, the operand stack holds 2, 1, 1 (bottom to top).]

Page 72: Introduction to the Java bytecode - So@t - 20130924

dup2_x2 - (form 3) 1/2

[Diagram: Frame 1 executes "dconst_1, iconst_1, iconst_2, dup2_x2". Before dup2_x2, the operand stack holds 1.0, 1, 2 (bottom to top); form 3 means the two top values are category 1 and the value below them is category 2.]

Page 73: Introduction to the Java bytecode - So@t - 20130924

dup2_x2 - (form 3) 2/2

[Diagram: after dup2_x2, the operand stack holds 1, 2, 1.0, 1, 2 (bottom to top).]
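The stack instructions are easiest to follow when the operand stack is modelled slot by slot. The sketch below is a simplified model with hypothetical names (SlotStack, PAD): a double occupies two slots, and dup2X2 copies the two top slots under the fourth one. It does not check the category rules that a real verifier enforces.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified slot-level model of an operand stack (index 0 is the bottom).
class SlotStack {

    // Marker for the second slot taken by a category 2 value (long or double).
    static final Object PAD = "<2nd slot>";

    private final List<Object> slots = new ArrayList<>();

    void pushInt(int value) { slots.add(value); }

    void pushDouble(double value) { slots.add(value); slots.add(PAD); }

    // pop: removes the top slot.
    void pop() { slots.remove(slots.size() - 1); }

    // dup: duplicates the top slot.
    void dup() { slots.add(slots.get(slots.size() - 1)); }

    // dup2_x2: duplicates the two top slots and inserts the copy under the fourth slot.
    void dup2X2() {
        int n = slots.size();
        List<Object> topTwo = new ArrayList<>(slots.subList(n - 2, n));
        slots.addAll(n - 4, topTwo);
    }

    List<Object> slots() { return slots; }

    public static void main(String[] args) {
        SlotStack stack = new SlotStack();
        stack.pushDouble(1.0);   // dconst_1
        stack.pushInt(1);        // iconst_1
        stack.pushInt(2);        // iconst_2
        stack.dup2X2();          // dup2_x2 (form 3)
        System.out.println(stack.slots());
    }
}
```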

Page 74: Introduction to the Java bytecode - So@t - 20130924

Unicode & Java


Page 75: Introduction to the Java bytecode - So@t - 20130924

Unicode 101

⦿ Unicode 6.2 contains a repertoire of more than 110,000 characters covering 100 scripts

⦿ Each character is associated with a number called a Code Point

⦿ Unicode defines a codespace of 1,114,112 code points in the range U+0000 to U+10FFFF

⦿ Unicode is a character set, not an encoding

⦿ Unicode defines two families of encodings: the Unicode Transformation Format (UTF) and the Universal Character Set (UCS)

Page 76: Introduction to the Java bytecode - So@t - 20130924

UTF 101 – UTF-8

⦿ In UTF-8 a character can be encoded in 1, 2, 3 or 4 bytes


Range Byte 1 Byte 2 Byte 3 Byte 4

U+0000 - U+007F 0xxxxxxx

U+0080 - U+07FF 110xxxxx 10xxxxxx

U+0800 - U+FFFF 1110xxxx 10xxxxxx 10xxxxxx

U+10000 - U+1FFFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Page 77: Introduction to the Java bytecode - So@t - 20130924

UTF 101 – UTF-16

⦿ In UTF-16 a character can be encoded in 2 or 4 bytes

⦿ Code Points from the BMP

⦿ U+0410 (А - CYRILLIC CAPITAL LETTER A) => 0x04 0x10

⦿ Code Points from a supplementary plane

⦿ U+64321 => 0xD9 0x50 0xDF 0x21
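Both UTF-16 examples can be reproduced with the standard Character.toChars method, which returns one char for a BMP code point and a surrogate pair for a supplementary one. The class Utf16Demo and the method utf16Hex() are hypothetical names used for this sketch; the bytes are printed in big-endian order.

```java
// Hypothetical helper: prints the UTF-16 code units of a code point as big-endian bytes.
class Utf16Demo {

    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // toChars yields 1 char for the BMP, 2 chars (high and low surrogate) otherwise.
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("0x%02X 0x%02X ", unit >> 8, unit & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(utf16Hex(0x0410));    // CYRILLIC CAPITAL LETTER A
        System.out.println(utf16Hex(0x64321));   // supplementary plane
    }
}
```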


Page 78: Introduction to the Java bytecode - So@t - 20130924

UTF 101 – UTF-32

⦿ In UTF-32 a character is encoded in 4 bytes. Its code point doesn’t need any transformation

⦿ U+64321 => 0x00 0x06 0x43 0x21


Page 79: Introduction to the Java bytecode - So@t - 20130924


Why should I care about Unicode ?

Page 80: Introduction to the Java bytecode - So@t - 20130924

Java Source File encoding

⦿ Java source files can be encoded in various encodings (usually UTF-8)…

⦿ But you MUST always indicate to the compiler what it is…

⦿ Using the option -encoding


http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html

Page 81: Introduction to the Java bytecode - So@t - 20130924

Class File encoding

⦿ In a class file all strings (packages, classes, fields, methods and literals) are encoded in Modified UTF-8

⦿ Modified UTF-8 is almost like UTF-8 but:

⦿ The NULL character (U+0000) is encoded using 2 bytes

⦿ Only formats with 1, 2 or 3 bytes are used (which is enough for the BMP)

⦿ For supplementary planes, each surrogate of the UTF-16 pair is encoded as a separate 3-byte character
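These differences can be observed with java.io.DataOutputStream.writeUTF, which writes a 2-byte length followed by the string in modified UTF-8. The sketch below strips that length prefix; the class ModifiedUtf8Demo and the method encode() are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: returns the modified UTF-8 bytes of a string.
class ModifiedUtf8Demo {

    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);   // 2-byte length prefix + modified UTF-8 payload
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);   // drop the length prefix
    }

    public static void main(String[] args) {
        String supplementary = new String(Character.toChars(0x64321));
        System.out.println("NUL, modified UTF-8 bytes: " + encode("\0").length);
        System.out.println("NUL, standard UTF-8 bytes: "
                + "\0".getBytes(StandardCharsets.UTF_8).length);
        System.out.println("U+64321, modified UTF-8 bytes: " + encode(supplementary).length);
        System.out.println("U+64321, standard UTF-8 bytes: "
                + supplementary.getBytes(StandardCharsets.UTF_8).length);
    }
}
```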

Page 82: Introduction to the Java bytecode - So@t - 20130924

JVM encoding

⦿ The JVM encodes Strings in UTF-16…

⦿ Therefore extreme care should be taken when handling an external stream of data (a file or from the network)


Page 83: Introduction to the Java bytecode - So@t - 20130924

Class File Format


Page 84: Introduction to the Java bytecode - So@t - 20130924

Class File Structure (1/2)

ClassFile {
    int magic;
    short minorVersion;
    short majorVersion;
    short constantPoolCount;
    ConstantPoolEntry[] constantPool;
    short accessFlags;
    short thisClass;
    short superClass;
    short interfacesCount;
    short[] interfaces;
    short fieldsCount;
    Field[] fields;
    short methodsCount;
    Method[] methods;
    short attributesCount;
    Attribute[] attributes;
}

byte, short and int should be considered as unsigned types

Page 85: Introduction to the Java bytecode - So@t - 20130924

Class File Structure (2/2)


public class org.bytecode.Demo

minor version: 0

major version: 51

flags: ACC_PUBLIC, ACC_SUPER

Constant pool:

#1 = Methodref #6.#14 // java/lang/Object."<init>":()V

#2 = Methodref #5.#15 // org/bytecode/Demo.add:(II)I

#3 = Fieldref #16.#17 // java/lang/System.out:Ljava/io/PrintStream;

#4 = Methodref #18.#19 // java/io/PrintStream.println:(I)V

#5 = Class #20 // org/bytecode/Demo

#6 = Class #21 // java/lang/Object

#7 = Utf8 <init>

#8 = Utf8 ()V

...

private static int add(int, int);

flags: ACC_PRIVATE, ACC_STATIC

Code:

stack=2, locals=2, args_size=2

0: iload_0

1: iload_1

2: iadd

3: ireturn

...

Page 86: Introduction to the Java bytecode - So@t - 20130924

Content of a class file (1/2)

⦿ A class file is a binary file where each element has a well-defined size (except strings as we shall see).

⦿ To write and read class files, the JDK provides two classes:

⦿ java.io.DataOutputStream

⦿ java.io.DataInputStream


Page 87: Introduction to the Java bytecode - So@t - 20130924

Content of a class file (2/2)

⦿ From an AST it's quite simple to generate a class file:

DataOutputStream dos = new DataOutputStream(…);
dos.writeInt(this.magic);
dos.writeShort(this.minorVersion);
dos.writeShort(this.majorVersion);
dos.writeShort(this.constantPoolCount);
//…
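As a self-contained illustration of the same calls, the sketch below writes only the first three fields of a class file into memory and reads them back with DataInputStream. It does not produce a loadable class; the class ClassHeaderDemo and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical round trip over the first 8 bytes of a class file.
class ClassHeaderDemo {

    // Writes magic, minorVersion and majorVersion in big-endian order.
    static byte[] writeHeader(int minorVersion, int majorVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            dos.writeInt(0xCAFEBABE);
            dos.writeShort(minorVersion);
            dos.writeShort(majorVersion);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the same three fields back, treating the shorts as unsigned.
    static String describe(byte[] header) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(header))) {
            int magic = dis.readInt();
            int minor = dis.readUnsignedShort();
            int major = dis.readUnsignedShort();
            return String.format("magic=0x%08X version=%d.%d", magic, major, minor);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(writeHeader(0, 51)));
    }
}
```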

Page 88: Introduction to the Java bytecode - So@t - 20130924

magic (1/2)


ClassFile {

int magic;

short minorVersion;

short majorVersion;

short constantPoolCount;

ConstantPoolEntry[] constantPool;

// ..

}

⦿ Its value is always 0xCAFEBABE

Page 89: Introduction to the Java bytecode - So@t - 20130924

minorVersion and majorVersion (1/2)


ClassFile {

int magic;

short minorVersion;

short majorVersion;

short constantPoolCount;

ConstantPoolEntry[] constantPool;

// ..

}

Page 90: Introduction to the Java bytecode - So@t - 20130924

minorVersion and majorVersion (2/2)


⦿ Indicate the version of the class file format

⦿ Oracle's JVM implementation in:

⦿ JDK release 1.0.2 supports class file format versions 45.0 through 45.3 inclusive.

⦿ JDK releases 1.1.* support class file format versions in the range 45.0 through 45.65535 inclusive.

⦿ For k ≥ 2, JDK release 1.k supports class file format versions in the range 45.0 through 44+k.0 inclusive.
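The last rule is plain arithmetic: for JDK 1.7 (k = 7) the highest supported version is 44 + 7 = 51.0, which matches the "major version: 51" shown by javap earlier. A small sketch of that rule, with hypothetical names (ClassVersionDemo, maxMajorVersion, supports):

```java
// Hypothetical helper encoding the "45.0 through (44+k).0" rule for JDK 1.k, k >= 2.
class ClassVersionDemo {

    // Highest major version supported by JDK release 1.k.
    static int maxMajorVersion(int k) {
        if (k < 2) {
            throw new IllegalArgumentException("the rule only applies for k >= 2");
        }
        return 44 + k;
    }

    // True if version major.minor lies in the range 45.0 through (44+k).0 inclusive.
    static boolean supports(int k, int major, int minor) {
        if (major < 45) {
            return false;
        }
        int max = maxMajorVersion(k);
        return major < max || (major == max && minor == 0);
    }

    public static void main(String[] args) {
        System.out.println(maxMajorVersion(7));
        System.out.println(supports(6, 51, 0));
    }
}
```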

Page 91: Introduction to the Java bytecode - So@t - 20130924

Constant Pool (1/3)


ClassFile {

int magic;

short minorVersion;

short majorVersion;

short constantPoolCount;

ConstantPoolEntry[] constantPool;

// ..

}

Page 92: Introduction to the Java bytecode - So@t - 20130924

Constant Pool (2/3)

⦿ The constant pool is a central part of a class file.

⦿ It has no equivalent in Java.

⦿ It’s like a symbol table, doing a mapping between the code and constants of several kinds.

⦿ The index of the array constantPool starts from 1.


Page 93: Introduction to the Java bytecode - So@t - 20130924

Constant Pool (3/3)

⦿ A ConstantPoolEntry has this format:

ConstantPoolEntry {

byte tag;

byte[] info;

}

⦿ « tag » defines the type of constant

⦿ The content of the byte array (info) is different from tag to tag


Page 94: Introduction to the Java bytecode - So@t - 20130924

Constant types

⦿ As of JDK 1.4 there are 11 kinds of constants:

Constant Type Value

ConstantUtf8 1

ConstantInteger 3

ConstantFloat 4

ConstantLong 5

ConstantDouble 6

ConstantClass 7

ConstantString 8

ConstantFieldref 9

ConstantMethodref 10

ConstantInterfaceMethodref 11

ConstantNameAndType 12

Page 95: Introduction to the Java bytecode - So@t - 20130924

ConstantUTF8

⦿ The most common constant

⦿ Used for all kinds of strings (package name, class name, method name, etc.)

public class ConstantUTF8 {

byte tag = 0x01;

short length;

byte[] string;

}


Page 96: Introduction to the Java bytecode - So@t - 20130924

ConstantInt and ConstantFloat

⦿ Used to store int and float values!

public class ConstantInt {
    byte tag = 0x03;
    int value;
}

public class ConstantFloat {
    byte tag = 0x04;
    // The float's IEEE 754 bit pattern is stored as an int
    int value;
}

Page 97: Introduction to the Java bytecode - So@t - 20130924

ConstantLong and ConstantDouble

⦿ Used to store long and double values!

public class ConstantLong {
    byte tag = 0x05;
    long value;
}

public class ConstantDouble {
    byte tag = 0x06;
    // The double's IEEE 754 bit pattern is stored as a long
    long value;
}
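The « converted to » comments refer to reinterpreting the IEEE 754 bit pattern, not to a numeric cast. A small sketch using the standard `Float.floatToIntBits` and `Double.doubleToLongBits` methods (the wrapper class `FloatingBits` is made up for this example):

```java
// Hypothetical wrapper for illustration only.
final class FloatingBits {
    /** The 4 bytes that would be stored for a float constant. */
    static int floatValue(float f) {
        return Float.floatToIntBits(f);
    }

    /** The 8 bytes that would be stored for a double constant. */
    static long doubleValue(double d) {
        return Double.doubleToLongBits(d);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(floatValue(1.0f))); // prints 3f800000
        System.out.println(Long.toHexString(doubleValue(1.0)));    // prints 3ff0000000000000
        System.out.println((int) 1.9f);                            // a cast would lose data: prints 1
    }
}
```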


Page 98: Introduction to the Java bytecode - So@t - 20130924

ConstantString

⦿ Used for String constants.

⦿ Unlike ConstantUTF8, ConstantString contains the index of a ConstantUTF8 in the constant pool.

public class ConstantString {

byte tag = 0x08;

short utf8Index;

}


Page 99: Introduction to the Java bytecode - So@t - 20130924

ConstantClass

⦿ A ConstantClass works like a ConstantString, except that the referenced ConstantUTF8 holds a fully qualified class name such as « java/lang/Object » or, because an array is an object, an array descriptor such as « [[I »

public class ConstantClass {

byte tag = 0x07;

short utf8Index;

}
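To see what these names look like for real classes, one possible sketch derives them from `Class.getName()` by swapping dots for slashes. `InternalNames` is a name invented for this example, and the trick is only meant for top-level classes and arrays.

```java
// Hypothetical helper for illustration only.
final class InternalNames {
    /** Name as it would appear in the ConstantUTF8 referenced by a ConstantClass. */
    static String internalName(Class<?> c) {
        return c.getName().replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(internalName(Object.class));   // prints java/lang/Object
        System.out.println(internalName(int[][].class));  // prints [[I
        System.out.println(internalName(String[].class)); // prints [Ljava/lang/String;
    }
}
```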


Page 100: Introduction to the Java bytecode - So@t - 20130924

ConstantNameAndType

⦿ Contains the indexes of two ConstantUTF8 entries holding the name and the type/descriptor of a field or a method

public class ConstantNameAndType {

byte tag = 0x0C;

short nameUtf8Index;

short descriptorUtf8Index;

}


Page 101: Introduction to the Java bytecode - So@t - 20130924

The Last Three (1/2)

⦿ ConstantFieldref, ConstantMethodref and ConstantInterfaceMethodref each contain:

⦿ the index of a ConstantClass

⦿ the index of a ConstantNameAndType

Page 102: Introduction to the Java bytecode - So@t - 20130924

The Last Three (2/2)

public class ConstantFieldref {
    byte tag = 0x09;
    short classIndex;
    short nameAndTypeIndex;
}

public class ConstantMethodref {
    byte tag = 0x0A;
    short classIndex;
    short nameAndTypeIndex;
}

public class ConstantInterfaceMethodref {
    byte tag = 0x0B;
    short classIndex;
    short nameAndTypeIndex;
}
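Resolving one of these references means following several indexes in a row. The sketch below uses a deliberately simplified in-memory pool, which is an assumption of this example and not the real binary layout. A String stands for a ConstantUTF8, and an int[] holds the indexes stored in a ConstantClass, ConstantNameAndType or ConstantMethodref.

```java
// Hypothetical resolver over a simplified pool model, for illustration only.
final class PoolResolver {
    static String resolveMethodref(Object[] pool, int index) {
        int[] methodref = (int[]) pool[index];          // {class index, name-and-type index}
        int[] constantClass = (int[]) pool[methodref[0]]; // {utf8 index of the class name}
        int[] nameAndType = (int[]) pool[methodref[1]];   // {utf8 index of name, utf8 index of descriptor}
        return pool[constantClass[0]] + "." + pool[nameAndType[0]] + ":" + pool[nameAndType[1]];
    }

    public static void main(String[] args) {
        Object[] pool = {
            null,                // index 0 is never used
            "java/lang/Object",  // 1: ConstantUTF8
            new int[] { 1 },     // 2: ConstantClass -> 1
            "<init>",            // 3: ConstantUTF8
            "()V",               // 4: ConstantUTF8
            new int[] { 3, 4 },  // 5: ConstantNameAndType -> 3, 4
            new int[] { 2, 5 }   // 6: ConstantMethodref -> 2, 5
        };
        System.out.println(resolveMethodref(pool, 6)); // prints java/lang/Object.<init>:()V
    }
}
```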


Page 103: Introduction to the Java bytecode - So@t - 20130924

accessFlags (1/3)


ClassFile {

// …

short constantPoolCount;

ConstantPoolEntry[] constantPool;

short accessFlags;

short thisClass;

short superClass;

// …

}

Page 104: Introduction to the Java bytecode - So@t - 20130924

accessFlags (2/3)

⦿ Indicates the modifiers of a class using bit masks. Each bit stands for one modifier: it is set if the bit equals 1 and not set if it equals 0

Flag name        Value     Java keyword
ACC_PUBLIC       0x0001    public
ACC_FINAL        0x0010    final
ACC_SUPER        0x0020    -
ACC_INTERFACE    0x0200    interface
ACC_ABSTRACT     0x0400    abstract

Page 105: Introduction to the Java bytecode - So@t - 20130924

accessFlags (3/3)

⦿ For example:

0000 0ab0 00cd 000e

Where:

a = 0x0400 = 0000 0100 0000 0000 (ACC_ABSTRACT)

b = 0x0200 = 0000 0010 0000 0000 (ACC_INTERFACE)

c = 0x0020 = 0000 0000 0010 0000 (ACC_SUPER)

d = 0x0010 = 0000 0000 0001 0000 (ACC_FINAL)

e = 0x0001 = 0000 0000 0000 0001 (ACC_PUBLIC)
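Testing a flag comes down to a bitwise AND with its mask. A minimal sketch, with `ClassAccessFlags` being a made-up helper that only knows the five class-level flags listed on the previous slide:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for illustration only.
final class ClassAccessFlags {
    static final int ACC_PUBLIC    = 0x0001;
    static final int ACC_FINAL     = 0x0010;
    static final int ACC_SUPER     = 0x0020;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT  = 0x0400;

    /** Names of the flags whose bit is set in accessFlags. */
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        if ((accessFlags & ACC_PUBLIC) != 0)    names.add("ACC_PUBLIC");
        if ((accessFlags & ACC_FINAL) != 0)     names.add("ACC_FINAL");
        if ((accessFlags & ACC_SUPER) != 0)     names.add("ACC_SUPER");
        if ((accessFlags & ACC_INTERFACE) != 0) names.add("ACC_INTERFACE");
        if ((accessFlags & ACC_ABSTRACT) != 0)  names.add("ACC_ABSTRACT");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
        System.out.println(decode(0x0601)); // prints [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```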


Page 106: Introduction to the Java bytecode - So@t - 20130924

thisClass & superClass (1/2)


ClassFile {

// …

short constantPoolCount;

ConstantPoolEntry[] constantPool;

short accessFlags;

short thisClass;

short superClass;

// …

}

Page 107: Introduction to the Java bytecode - So@t - 20130924

thisClass & superClass (2/2)

⦿ Each contains the index of a ConstantClass.

⦿ « this » and « super » have the same meaning as in Java.

⦿ thisClass refers to the fully qualified name of the current class.

⦿ superClass refers to the fully qualified name of the superclass (java/lang/Object by default).

Page 108: Introduction to the Java bytecode - So@t - 20130924

Not this time…


ClassFile {

// …

short interfacesCount;

short[] interfaces;

short fieldsCount;

Field[] fields;

// …

short attributesCount;

Attribute[] attributes;

}

Page 109: Introduction to the Java bytecode - So@t - 20130924

methods


ClassFile {

// …

short methodsCount;

Method[] methods;

// …

}

Page 110: Introduction to the Java bytecode - So@t - 20130924

methods

⦿ Each Java method can be represented like this in a class file

class Method {

short accessFlags;

short nameIndex;

short descriptorIndex;

short attributesCount;

Attribute[] attributes;

}


Page 111: Introduction to the Java bytecode - So@t - 20130924

Method – accessFlags (1/2)

⦿ Each Java method can be represented like this in a class file

class Method {

short accessFlags;

short nameIndex;

short descriptorIndex;

short attributesCount;

Attribute[] attributes;

}


Page 112: Introduction to the Java bytecode - So@t - 20130924

Method – accessFlags (2/2)

⦿ They work like the accessFlags of a ClassFile and indicate the modifiers of a method

Flag Name           Value     Java Keyword
ACC_PUBLIC          0x0001    public
ACC_PRIVATE         0x0002    private
ACC_PROTECTED       0x0004    protected
ACC_STATIC          0x0008    static
ACC_FINAL           0x0010    final
ACC_SYNCHRONIZED    0x0020    synchronized
ACC_NATIVE          0x0100    native
ACC_ABSTRACT        0x0400    abstract
ACC_STRICT          0x0800    strictfp
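The constants in `java.lang.reflect.Modifier` use the same bit values as the flags in this table, so `Modifier.toString` can serve as a quick decoder. This is a sketch limited to the flags listed here, since the meaning of some other bits depends on whether they describe a class, a field or a method. `MethodFlags` is a made-up wrapper.

```java
import java.lang.reflect.Modifier;

// Hypothetical wrapper for illustration only.
final class MethodFlags {
    /** Java keywords for the modifier bits set in accessFlags. */
    static String describe(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0009)); // prints public static
        System.out.println(describe(0x001A)); // prints private static final
    }
}
```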

Page 113: Introduction to the Java bytecode - So@t - 20130924

nameIndex & descriptorIndex (1/2)

⦿ Each Java method can be represented like this in a class file

class Method {

short accessFlags;

short nameIndex;

short descriptorIndex;

short attributesCount;

Attribute[] attributes;

}


Page 114: Introduction to the Java bytecode - So@t - 20130924

nameIndex & descriptorIndex (2/2)

⦿ Each contains the index of a ConstantUTF8 holding, respectively, the name and the descriptor of the method
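To get a feel for what a method descriptor looks like, one option is to let the JDK build it. The sketch below assumes Java 7 or later, where `java.lang.invoke.MethodType` offers `toMethodDescriptorString()`. It is used here only as a convenient formatter, and `Descriptors` is a made-up class name.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper for illustration only.
final class Descriptors {
    /** Descriptor string for a method with the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[] args)
        System.out.println(descriptor(void.class, String[].class));      // prints ([Ljava/lang/String;)V
        // int add(int a, long b)
        System.out.println(descriptor(int.class, int.class, long.class)); // prints (IJ)I
    }
}
```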


Page 115: Introduction to the Java bytecode - So@t - 20130924

attributes

⦿ Each Java method can be represented like this in a class file

class Method {

short accessFlags;

short nameIndex;

short descriptorIndex;

short attributesCount;

Attribute[] attributes;

}


Page 116: Introduction to the Java bytecode - So@t - 20130924

Attribute (1/3)

⦿ The Attribute structure can be found inside other ones:

⦿ ClassFile

⦿ Field

⦿ Method

⦿ Code

Page 117: Introduction to the Java bytecode - So@t - 20130924

Attribute (2/3)

⦿ There are several different kinds of attributes (9 for the JDK 1.4):

⦿ SourceFile

⦿ ConstantValue

⦿ Code

⦿ Exceptions

⦿ InnerClasses

⦿ Synthetic

⦿ LineNumberTable

⦿ LocalVariableTable

⦿ Deprecated

We will see only the Code attribute today.

Page 118: Introduction to the Java bytecode - So@t - 20130924

Attribute (3/3) - Structure

Attribute {

short nameIndex;

int attributeLength;

byte[] info;

}


Page 119: Introduction to the Java bytecode - So@t - 20130924

Code Attribute

Code {
    short attributeNameIndex;
    int attributeLength;
    short maxStack;
    short maxLocals;
    int codeLength;
    byte[] code;
    short exceptionsCount;
    Exception[] exceptions;
    short attributesCount;
    Attribute[] attributes;
}

Page 120: Introduction to the Java bytecode - So@t - 20130924

attributeNameIndex (1/2)

Code {
    short attributeNameIndex;
    int attributeLength;
    short maxStack;
    short maxLocals;
    int codeLength;
    byte[] code;
    short exceptionsCount;
    Exception[] exceptions;
    short attributesCount;
    Attribute[] attributes;
}

Page 121: Introduction to the Java bytecode - So@t - 20130924

attributeNameIndex (2/2)

⦿ Contains the index of a ConstantUTF8 whose value is « Code » (the type name of the attribute)

Page 122: Introduction to the Java bytecode - So@t - 20130924

attributeLength (1/2)

Code {
    short attributeNameIndex;
    int attributeLength;
    short maxStack;
    short maxLocals;
    int codeLength;
    byte[] code;
    short exceptionsCount;
    Exception[] exceptions;
    short attributesCount;
    Attribute[] attributes;
}

Page 123: Introduction to the Java bytecode - So@t - 20130924

attributeLength (2/2)

⦿ It is the length of the attribute in bytes, not counting its first six bytes (attributeNameIndex and attributeLength).

⦿ It can be calculated like this:

2 + 2 + 4 // maxStack + maxLocals + codeLength

+ code.length

+ 2 // exceptionsCount

+ 8 * exceptions.length // an Exception takes 8 bytes

+ 2 // attributesCount

+ attributesBytes // total size in bytes of all nested attributes
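The same calculation as a small Java sketch. The class and parameter names are made up for this example, and the last term is taken to be the total byte size of the nested attributes rather than their count.

```java
// Hypothetical helper for illustration only.
final class CodeAttributeLength {
    static int attributeLength(int codeLength, int exceptionCount, int nestedAttributesBytes) {
        return 2 + 2 + 4              // maxStack + maxLocals + codeLength
             + codeLength             // the bytecode itself
             + 2                      // exceptionsCount
             + 8 * exceptionCount     // an Exception takes 8 bytes
             + 2                      // attributesCount
             + nestedAttributesBytes; // all nested attributes, headers included
    }

    public static void main(String[] args) {
        // 5 bytes of bytecode, no exception handler, no nested attribute
        System.out.println(attributeLength(5, 0, 0)); // prints 17
    }
}
```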


Page 124: Introduction to the Java bytecode - So@t - 20130924

maxStack & maxLocals (1/2)

Code {
    short attributeNameIndex;
    int attributeLength;
    short maxStack;
    short maxLocals;
    int codeLength;
    byte[] code;
    short exceptionsCount;
    Exception[] exceptions;
    short attributesCount;
    Attribute[] attributes;
}

Page 125: Introduction to the Java bytecode - So@t - 20130924

maxStack & maxLocals (2/2)

⦿ Respectively, the maximum depth of the operand stack and the number of local variable slots

⦿ These sizes can be worked out from the instructions used in the method.
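A rough sketch of that idea for straight-line code only: follow the stack depth after each instruction and keep the highest value. The per-instruction deltas are supplied by hand here, which is an assumption of this example. A real tool would look them up per opcode and also follow branches.

```java
// Hypothetical calculator for illustration only; ignores branches and exception handlers.
final class MaxStack {
    /** deltas[i] is the net change in operand stack depth caused by instruction i. */
    static int maxStack(int[] deltas) {
        int depth = 0;
        int max = 0;
        for (int delta : deltas) {
            depth += delta;
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        // iload_1 (+1), iload_2 (+1), iadd (pops 2, pushes 1: -1), ireturn (-1)
        System.out.println(maxStack(new int[] { 1, 1, -1, -1 })); // prints 2
    }
}
```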


Page 126: Introduction to the Java bytecode - So@t - 20130924

code (1/2)

Code {
    short attributeNameIndex;
    int attributeLength;
    short maxStack;
    short maxLocals;
    int codeLength;
    byte[] code;
    short exceptionsCount;
    Exception[] exceptions;
    short attributesCount;
    Attribute[] attributes;
}

Page 127: Introduction to the Java bytecode - So@t - 20130924

code (2/2)

⦿ Contains all the instructions of a method

⦿ Each instruction takes 1 byte

⦿ + the size of its arguments

⦿ Only ¼ of the instruction set has arguments
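The « 1 byte + arguments » layout means a reader must know each opcode's argument size to find the next instruction. The sketch below walks a code array with a tiny, hand-picked table covering four opcodes that take arguments. Every other opcode is assumed to take none, which holds for this example but not for the full instruction set. `InstructionCounter` is a made-up name.

```java
// Hypothetical walker for illustration only; knows just a handful of opcodes.
final class InstructionCounter {
    static int operandSize(int opcode) {
        switch (opcode) {
            case 0x10: return 1; // bipush <byte>
            case 0x11: return 2; // sipush <short>
            case 0x12: return 1; // ldc <constant pool index>
            case 0xB6: return 2; // invokevirtual <constant pool index>
            default:   return 0; // e.g. iload_1 (0x1B), iadd (0x60), ireturn (0xAC)
        }
    }

    static int countInstructions(byte[] code) {
        int count = 0;
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            pc += 1 + operandSize(opcode); // 1 byte for the opcode, plus its arguments
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // bipush 42, iload_1, iadd, ireturn: 4 instructions in 5 bytes
        byte[] code = { 0x10, 0x2A, 0x1B, 0x60, (byte) 0xAC };
        System.out.println(countInstructions(code)); // prints 4
    }
}
```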


Page 128: Introduction to the Java bytecode - So@t - 20130924

It’s only the beginning


Page 130: Introduction to the Java bytecode - So@t - 20130924
