juergen ributzka david stephenson timothy kong dee lee fred chow guang r. gao

14
Open64 A Software Pipelining Framework for Simple Processor Cores •Juergen Ributzka •David Stephenson •Timothy Kong •Dee Lee •Fred Chow •Guang R. Gao

Upload: brendan-richard

Post on 18-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Open64

A Software Pipelining Framework for Simple Processor Cores

•Juergen Ributzka•David Stephenson•Timothy Kong•Dee Lee•Fred Chow•Guang R. Gao

Page 2: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 2Open64

Overview

• SiCortex Multiprocessor• Software Pipelining Framework• Results• Future Work

3/23/2009

Page 3: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 3Open64

SiCortex Multiprocessor

• RISC• 6 cores (MIPS 5KF)• 500 – 700 MHz• In-Order Execution• Limited-Dual Issue• Low Power (12 – 16 Watt)

3/23/2009

Page 4: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 4Open64

Software Pipelining Framework

3/23/2009

DDG

MII

MS

MVE

RA

CG

Front-End

Middle-End

Back-End

FORTRAN C/C++ Java

Loop NestOptimizer

GlobalOptimizer

CodeGenerator

Page 5: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 5Open64

Data Dependence Graph (DDG)

• Cyclic Data Dependence Graph• No register anti- and output-

dependencies• Only single BBs

3/23/2009

DDG

MII

MS

MVE

RA

CG

S1

S2

S1

S2

Page 6: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 6Open64

Minimum Initiation Interval

• Limited by:– Resource Requirements– Recurrence Requirements

3/23/2009

DDG

MII

MS

MVE

RA

CG

Page 7: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 7Open64

Modulo Scheduling (MS)

• Huff Modulo Scheduling– Lifetime sensitive– Uses backtracking

• Hyper Node Reduction Scheduling– Lifetime sensitive– No backtracking

3/23/2009

DDG

MII

MS

MVE

RA

CG

Page 8: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 8Open64

Modulo Variable Expansion (MVE)

• Separate TN set for every iteration• # of unrollings depends on longest

lifetime

3/23/2009

DDG

MII

MS

MVE

RA

CG

TN10 op1 TNXX…TN20 op2 TN10 TNYY

TN11 op1 TNXX…TN21 op2 TN11 TNYY

TN10 op1 TNXX…TN20 op2 TN10 TNYY

Iteration

Cycle

Page 9: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 9Open64

Register Allocation (RA)

• Only kernels are fully register allocated

• No regions– SWP kernels are not a black box for GRA

and LRA anymore• Currently no register spilling support

3/23/2009

DDG

MII

MS

MVE

RA

CG

Page 10: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 10Open64

Code Generation (CG)

• Prologues and epilogues are partially register allocated

• Need several epilogues due to different register sets

3/23/2009

DDG

MII

MS

MVE

RA

CG

P

P0

K0

K1 E0

E1

E2

E

Page 11: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 11Open64

Benchmark Results

3/23/2009

BT CG EP FT IS LU MG SP UA0.9

0.95

1

1.05

1.1

1.15

1.2

NAS Parallel Benchmark

ABI n32ABI 64

spee

dup

Page 12: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 12Open64

Benchmark Results

3/23/2009

410.

bwav

es

433.

milc

434.

zeus

mp

435.

grom

acs

436.

cact

usAD

M

437.

lesli

e3d

444.

nam

d

447.

deal

II

450.

sopl

ex

453.

povr

ay

454.

calcu

lix

459.

Gem

sFDT

D

465.

tont

o

470.

lbm

481.

wrf

482.

sphi

nx3

400.

perlb

ench

401.

bzip

2

403.

gcc

429.

mcf

445.

gobm

k

456.

hmm

er

458.

sjeng

462.

libqu

antu

m

464.

h26r

ef

471.

omne

tpp

473.

asta

r

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05

SPEC2006

ABI n32ABI 64

spee

dup

Page 13: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 13Open64

Conclusion and Future Work

• Spilling support needed to enable more loops for SWP

• Screen out loops with small trip count during runtime to reduce SWP overhead

• Hit-under-miss support to overcome current hardware limitations

3/23/2009

Page 14: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 14Open64

Questions ?

3/23/2009