EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications

Gaurav Chadha, Scott Mahlke, Satish Narayanasamy
University of Michigan, Electrical Engineering and Computer Science
August 2014


Page 1: EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications

Gaurav Chadha, Scott Mahlke, Satish Narayanasamy

University of Michigan, Electrical Engineering and Computer Science

August 2014

Page 2: Evolution of the Web

[Figure: server and client diagrams for Web 1.0 and Web 2.0, labeled "published content" and "user generated content"]

Web 1.0
• Static web pages
• Passively view content

Web 2.0
• Dynamic web pages
• Collaborate and generate content

Page 3: Evolution of the Web

[Figure: the Web 1.0 and Web 2.0 server and client diagrams from the previous slide, with added "compute" labels]

• Rich user experience

Page 4: Evolution of the Web

[Figure: Web 1.0, yahoo.com in 1996; Web 2.0, yahoo.com in 2014]

• 30x more instructions executed
• Good client-side performance
• Rich user experience
• Browser responsiveness

Page 5: Core Specialization

[Figure: multi-core processor with Core 1 to Core 4, each with private caches]

Page 6: Web Core

[Figure: multi-core processor with Core 1 to Core 4, each with private caches; labels: WebBoost, Web Core]

Page 7: WebBoost 1.0

[Figure: breakdown of web browser computational components (JavaScript, GC, Parsing, CSS, Paint, Layout, Other) for Web 1.0 and Web 2.0]

• Web client-side script performance → browser responsiveness
• Script performance: high L1-I cache misses
• Goal: specialized instruction prefetcher for web client-side script

Page 8: Poor I-Cache Performance

• Web pages tend to support numerous functionalities
– Large instruction footprint
– Lack of hot code
– Examples: graphics effects, image editing, online forms, document editing, web personalization, games, audio & video

• Web client-side script inefficiencies: code bloat
– JIT compiled by the JS engine
– Dynamic typing
– JS engines: V8, IonMonkey, Nitro, Chakra

Page 9: Lack of Hot Code

[Figure: % of L1-I misses (y-axis, 0 to 100) versus L1-I cache blocks (x-axis) for PARSEC, SPECint 2006, and Web Apps; annotations: 95%, 860, 20,400]
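
One way to read a plot like this, assuming it shows how many L1-I cache blocks account for a given share of all misses, is to sort blocks by miss count and accumulate. The sketch below is illustrative only: the function name `blocks_for_coverage` and the toy miss counts are hypothetical and are not taken from the slides.

```python
def blocks_for_coverage(miss_counts, percent=95):
    """Return how many cache blocks, hottest first, cover `percent` of all misses."""
    total = sum(miss_counts)
    running = 0
    for n, count in enumerate(sorted(miss_counts, reverse=True), start=1):
        running += count
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return n
    return len(miss_counts)

# Hypothetical workloads: one with a few hot blocks, one with misses spread evenly.
hot_code = [900, 50, 30, 10, 10]
flat_code = [10] * 100

hot_blocks = blocks_for_coverage(hot_code)
flat_blocks = blocks_for_coverage(flat_code)
print(hot_blocks, flat_blocks)
```

A workload with hot code reaches the threshold with very few blocks, while a flat miss profile needs almost all of them, which is the contrast the slide title points at.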

Page 10: Poor I-Cache Performance

• Compared to conventional programs, JS code incurs many more L1-I misses
• Perfect I-Cache: 53% speedup

[Figure: L1-I MPKI (y-axis, 0 to 30) for amazon, bing, cnn, facebook, gmaps, gdocs, pixlr, PARSEC, and SPECint 2006]

Page 11: Problem Statement

• Problem: Poor web client-side script I-Cache performance

• Opportunity: Web client-side scripts are executed in an event-driven model

• Solution:
– Specialized prefetcher that is customized for the event-driven execution model
– Identifies distinct events in the instruction stream

Page 12: Outline

Event-driven Web Applications

EFetch

Facets of Instruction Prefetching

Design and Architecture

Methodology

Results

Conclusion

Page 13: Web Browser Events

• External input event: Mouse Click
• Internal browser event: On Load

Page 14: Event-driven Web Applications

[Figure: the Renderer Thread pops events (E1, E2, E3) from the head of the Event Queue and executes them on the JS engine. Events are inserted into the queue, and events can generate other events. When the Event Queue is empty, the program waits.]

• External input events: mouse click, keyboard key press, GPS events
• Internal events: timer event, DOMContentLoaded
• Poor I-Cache performance
– Different events tend to execute different code
– Events typically execute for a very short duration
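
The event-driven model described on this slide can be sketched as a simple loop. This is a minimal illustration, not browser code: `run_event_loop` and the handler table are hypothetical names, and a real renderer thread would block and wait on an empty queue rather than return.

```python
from collections import deque

def run_event_loop(initial_events, handlers):
    """Pop events from the head of the queue and run their handlers.

    A handler may return new events, which are appended to the queue
    ("events generate other events"). Returns the execution order.
    """
    queue = deque(initial_events)
    executed = []
    while queue:                     # a real renderer would wait here when empty
        event = queue.popleft()      # pop the event at the head
        executed.append(event)
        handler = handlers.get(event, lambda: [])
        queue.extend(handler())      # generated events join the tail
    return executed

# Hypothetical handlers: the click handler schedules a timer event.
handlers = {
    "DOMContentLoaded": lambda: [],
    "click": lambda: ["timer"],
    "timer": lambda: [],
}
order = run_event_loop(["DOMContentLoaded", "click"], handlers)
print(order)
```

Because each handler is short and runs different code from the one before it, the instruction working set changes with nearly every iteration of the loop.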

Page 15: EFetch

• Event Fetch: instruction prefetcher for event-driven web applications

• Technique:
– Uses an event ID to identify distinct events in the instruction stream
– Event ID is augmented to create an event signature that predicts control flow well

[Figure: Renderer Thread with events E1, E2, E3 and their Event ID]

Page 16: Event Signature

[Figure: Renderer Thread with events E1, E2, E3]

Event ID (Event Type, Event Handler)
• Formed by the browser
• Uniquely identifies an event

Function Call Context
• Formed in the hardware from the context depth (3) ancestor functions in the call stack

Event Signature (Event ID and Function Call Context)
• Correlates well with the program control flow
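
A minimal sketch of how an event ID and a call context of depth 3 could be folded into one signature. The slides do not say how the two are combined in hardware, so the rotate-and-XOR hash, the 16-bit width, and the name `event_signature` are all assumptions made for illustration.

```python
CONTEXT_DEPTH = 3  # number of ancestor functions used, as stated on the slide

def event_signature(event_id, call_stack, depth=CONTEXT_DEPTH):
    """Fold an event ID and the top `depth` call-stack entries into 16 bits.

    The rotate-and-XOR hash is an assumption for this sketch; the slide only
    states that the signature is formed in hardware from the call context.
    """
    signature = event_id & 0xFFFF
    for fn_addr in call_stack[-depth:]:
        # Rotate left by one bit so the order of calls affects the result.
        signature = ((signature << 1) | (signature >> 15)) & 0xFFFF
        signature ^= fn_addr & 0xFFFF
    return signature

# Hypothetical function addresses on the call stack (innermost last).
sig_a = event_signature(0x12, [0x100, 0x200, 0x300])
sig_b = event_signature(0x12, [0x100, 0x200, 0x400])   # different context
sig_c = event_signature(0x13, [0x100, 0x200, 0x300])   # different event
```

The same function reached from a different event or a different chain of callers gets a different signature, which is what lets predictions be made per context.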

Page 17: Instruction Prefetcher: Facets

What to prefetch?

When to prefetch?

Instruction Prefetcher

Page 18: What to Prefetch?

• Naïve solution: On a function call, prefetch the function body
– But this is too late

• Our approach: On a function call, predict its callees and prefetch their function body addresses

Event signature (formed from the event ID) and its predicted callees:
c1 : <I-Cache Addr>
c2 : <I-Cache Addr>
c3 : <I-Cache Addr>
(ci = callee)

Page 19: Duplication of Addresses

• A function can appear in two distinct event signatures

• Its body addresses might be duplicated

[Figure: call graph in which an event reaches h through both f and g]

Callee I-Cache addresses per context:
event, f, h : < A, B, C >
event, g, h : < A, C, D >

Page 20: Compacting I-Cache Addresses

The I-Cache addresses of h, < A, B, C > under event, f, h and < A, C, D > under event, g, h, are stored once as the merged list < A, B, C, D >. Each context then keeps only a callee bit vector that selects entries from the merged list:

event, f, h : ( 1, 1, 1, 0 )
event, g, h : ( 1, 0, 1, 1 )
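
The compaction in this example can be reproduced in a few lines. This is a sketch under simple assumptions: addresses are represented by the letters used on the slide, and `compact` and `expand` are hypothetical helper names.

```python
def compact(address_lists):
    """Merge per-context address lists into one shared list plus bit vectors."""
    shared = sorted(set().union(*address_lists.values()))
    vectors = {
        context: tuple(int(addr in addrs) for addr in shared)
        for context, addrs in address_lists.items()
    }
    return shared, vectors

def expand(shared, vector):
    """Recover one context's address list from the shared list and its bit vector."""
    return [addr for addr, bit in zip(shared, vector) if bit]

# Body addresses of callee h as seen from the two contexts on the slide.
bodies_of_h = {
    ("event", "f"): ["A", "B", "C"],
    ("event", "g"): ["A", "C", "D"],
}
shared, vectors = compact(bodies_of_h)
print(shared)    # ['A', 'B', 'C', 'D']
print(vectors)   # {('event', 'f'): (1, 1, 1, 0), ('event', 'g'): (1, 0, 1, 1)}
```

Each address is stored once per function, and every additional context costs only one bit per address.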

Page 21: Recording Callees and Function Bodies

• Context Table: maps an event signature to its callees (c1, c2), each with a bit vector

• Function Table: maps a callee to its I-Cache addresses, e.g. < A, B, C, D >
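
A software sketch of the two tables, assuming the Context Table is keyed by event signature and the Function Table by callee. The class name `EFetchTables`, its methods, and the use of a Python integer as the bit vector are illustrative choices, not details from the slides; table sizes and replacement are ignored.

```python
from collections import defaultdict

class EFetchTables:
    """Toy model of the Context Table and Function Table (unbounded size)."""

    def __init__(self):
        self.function_table = defaultdict(list)  # callee -> body addresses
        self.context_table = defaultdict(dict)   # signature -> {callee: bit vector}

    def record(self, signature, callee, addr):
        """Record that `callee` touched `addr` when called under `signature`."""
        body = self.function_table[callee]
        if addr not in body:
            body.append(addr)
        entry = self.context_table[signature]
        entry[callee] = entry.get(callee, 0) | (1 << body.index(addr))

    def predict(self, signature):
        """Return (callee, addresses to prefetch) pairs for this signature."""
        predictions = []
        for callee, bits in self.context_table.get(signature, {}).items():
            body = self.function_table[callee]
            predictions.append((callee, [a for i, a in enumerate(body) if (bits >> i) & 1]))
        return predictions

tables = EFetchTables()
for addr in ["A", "B", "C"]:
    tables.record("sig_f", "h", addr)   # hypothetical signature names
for addr in ["A", "C", "D"]:
    tables.record("sig_g", "h", addr)

print(tables.predict("sig_f"))
print(tables.predict("sig_g"))
```

Looking up a signature yields the callees expected in that context, and each callee's bit vector selects which of its recorded addresses to prefetch.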

Page 22: Instruction Prefetcher: Facets

What to prefetch?

When to prefetch?

Instruction Prefetcher

Page 23: When to Prefetch?

• When?: Important to prefetch sufficiently in advance, but not too early

• Goal: Prefetch the next predicted function
– Able to hide LLC hit latency
– Typically sufficient due to the low instruction miss rate in the LLC

• Our design: Keep track of a speculative call stack, the Predictor Stack

Page 24: Predictor Stack

• Maintains the call stack as predicted by the prefetcher
• Helps prefetch the next function predicted to be called

[Figure: example with functions f, h, and i showing the Call Stack, the Predictor Stack, and the function prefetched at each call and return]
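
A minimal sketch of a speculative call stack that stays one function ahead of execution. It assumes each function has an ordered list of predicted callees and ignores mispredictions and recovery, which the slides do not detail; the class `PredictorStack` and its methods are hypothetical.

```python
class PredictorStack:
    """Walks the predicted call tree, yielding the next function to prefetch."""

    def __init__(self, predicted_callees):
        self.predicted_callees = predicted_callees  # function -> ordered callees
        self.frames = []                            # [function, next callee index]

    def start(self, root):
        """Begin speculation at the function that was just called."""
        self.frames = [[root, 0]]

    def next_prediction(self):
        """Advance to the next predicted call; return None when none remain."""
        while self.frames:
            fn, idx = self.frames[-1]
            callees = self.predicted_callees.get(fn, [])
            if idx < len(callees):
                self.frames[-1][1] += 1             # remember progress in this frame
                self.frames.append([callees[idx], 0])
                return callees[idx]
            self.frames.pop()                       # predicted return
        return None

# Example from the slide's cast of functions: f is predicted to call h, then i.
stack = PredictorStack({"f": ["h", "i"]})
stack.start("f")
first = stack.next_prediction()    # prefetch h while f runs
second = stack.next_prediction()   # prefetch i while h runs
third = stack.next_prediction()    # nothing left to predict
```

Calling `next_prediction` once per actual function call keeps the prefetcher one predicted function ahead, which matches the goal of prefetching the next predicted function.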

Page 25: Architecture

[Figure: architecture block diagram. Labels: Call Stack, Function Call Context, Event-ID, Event Signature, Context Table (ci, bv), Function Table (b1, b2, d, EA), predicted callees and addresses, Predictor Stack, Prefetch Queue]

Page 26: Methodology

• Instrumented an open source browser, Chromium
– It uses the V8 JS engine, shared with Google Chrome

• Browsing sessions of popular websites were studied
– Their instruction traces were simulated with Sniper Sim

• Our focus was on JS code execution, which was simulated

Page 27: Architectural Details

• Modeled after Samsung Exynos 5250

• Core: 4-wide OoO, 1.66 GHz

• L1-(I,D) Cache: 32 KB, 2-way

• L2 Cache: 2 MB, 16-way

• Energy Modeling: Vdd = 1.2 V, 45 nm

Page 28: Related Work

• We compare EFetch with the following designs:
– L1I-64KB: hardware overhead of EFetch provisioned towards extra L1-I cache capacity (64 KB)
– N2L: Next-2 line prefetcher
– CGP: Call Graph Prefetching (Annavaram et al., HPCA '01)
– PIF: Proactive Instruction Fetch (Ferdman et al., MICRO '11)
– RDIP: Return address stack Directed Instruction Prefetching (Kolli et al., MICRO '13)

Page 29: Prefetcher Efficacy

[Figure: stacked bars for N2L, CGP, PIF, RDIP, and EFetch on amazon, bing, cnn, fb, gmaps, gdocs, and pixlr; y-axis: % of (L1-I misses + L1-I prefetch hits), 0 to 250; series: Prefetch Hits, Misses, Erroneous Prefetches]

Page 30: Performance

[Figure: % performance improvement (y-axis, 0 to 40) of L1I-64KB, N2L, CGP, PIF, RDIP, and EFetch on amazon, bing, cnn, fb, gmaps, gdocs, pixlr, and their geometric mean]

Page 31: Energy Consumption

Design          CGP   PIF   RDIP   EFetch
Overhead (KB)   32    204   63     39

[Figure: relative energy consumed (y-axis, 0 to 1.2) by N2L, CGP, PIF, RDIP, and EFetch on amazon, bing, cnn, fb, gmaps, gdocs, pixlr, and their arithmetic mean]

• Prefetching hardware structures consume little energy
– Ranging from 0.01% of the total energy consumed for EFetch to 1.06% for PIF
• Erroneous prefetches consume a significant fraction of energy

Page 32: Energy, Performance, Area

[Figure: scatter plot of Energy (y-axis, 0.7 to 1.05) versus Performance (x-axis, 1.15 to 1.4) for EFetch, PIF, CGP, RDIP, and N2L]

Page 33: Conclusion

• Web 2.0 places greater demands on client-side computing

• I-Cache performance is poor for web client-side script execution

• EFetch exploits the event-driven nature of web client-side script execution

• It achieves a 29% performance improvement over no prefetching


Page 35: Performance Potential

[Figure: % performance improvement (y-axis, 0 to 80) on amazon, bing, cnn, fb, gmaps, gdocs, pixlr, and their geometric mean]

Perfect I-Cache: 53% speedup