CMP Design Choices Finding Parameters that Impact CMP Performance Sam Koblenski and Peter McClone

Post on 21-Jan-2016

Page 1

CMP Design Choices

Finding Parameters that Impact CMP Performance

Sam Koblenski and Peter McClone

Page 2

Outline

Introduction
Assumptions
Plackett & Burman Analysis
    Simulation methods
    Statistical Design
    Plackett & Burman Results
Mean Value Analysis
    MVA Implementation
    MVA Results
    AMVA Implementation
    AMVA Results
Complementary Results
Conclusions

Page 3

Introduction

Two-part study
Design space is huge; how can we reduce it?
Method 1
    Plackett & Burman (PB) analysis finds critical parameters
    Design uses extreme values of parameters
    Detailed architecture design can focus on a few parameters

Page 4

Introduction (cont.)

Method 2
    Mean Value Analysis (MVA) model of a CMP
    Simply designed to compute throughput
    Design choices can be narrowed down quickly
    Intuition is gained and patterns/parameter relationships identified

Page 5

Assumptions - PB Design

In-order cores approximated as OoO with a small window
Die size = 300 mm² (16 MB cache @ 65 nm)
L2 cache size expanded to fill the die
    Discrete sizes: 4, 8, 12 MB
    Associativity can be non-power-of-2
Core size measured in Cache Byte Equivalents (CBE):

Pipeline Organization   Pipeline Width   CBE
In-Order                1                50 kB
In-Order                4                100 kB
Out-of-Order            1                75 kB
Out-of-Order            4                250 kB
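
The area budget above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, 1 MB is taken as 1024 kB, and because the slide does not say whether the CBE figures include the L1 caches, L1 area is an optional extra that defaults to zero.

```python
# Sketch of the die-area budget; all names here are illustrative.
DIE_BUDGET_KB = 16 * 1024      # die holds the equivalent of 16 MB of cache
L2_SIZES_MB = (4, 8, 12)       # discrete L2 sizes listed on the slide

# Core area in cache byte equivalents (kB), keyed by (organization, width).
CORE_CBE_KB = {
    ("in-order", 1): 50,
    ("in-order", 4): 100,
    ("out-of-order", 1): 75,
    ("out-of-order", 4): 250,
}


def l2_size_mb(organization, width, cores, l1_kb_per_core=0):
    """Return the largest discrete L2 size (MB) that fits beside the cores.

    Returns None when even the smallest L2 size does not fit.
    l1_kb_per_core is an assumption: set it only if L1 area is counted
    separately from the core's CBE figure.
    """
    per_core_kb = CORE_CBE_KB[(organization, width)] + l1_kb_per_core
    remaining_mb = (DIE_BUDGET_KB - cores * per_core_kb) / 1024
    fitting = [size for size in L2_SIZES_MB if size <= remaining_mb]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    print(l2_size_mb("out-of-order", 4, 16))
    print(l2_size_mb("out-of-order", 4, 16, l1_kb_per_core=128))
```

Under these assumptions, sixteen 4-wide out-of-order cores still leave room for the 12 MB L2, while counting a separate 128 kB L1 per core drops the L2 to 8 MB.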

Page 6

Simulation Methodology

Simics with Ruby & Opal
16P sims used cache warmup files
2P sims ran for more transactions
Attempted OLTP and JBB benchmarks

Benchmark   Processors   Transactions
OLTP        2            200
OLTP        16           100
JBB         2            20000
JBB         16           10000

Page 7

Plackett & Burman Design

Motivation
    Narrow a huge design space
    Minimize simulation runs (experiments)
Preliminaries
    Performance measure
    Extreme parameter values
    Number of parameters (N < 4Xn-1)
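
To give a feel for how much a PB design shrinks the experiment count, the sketch below compares it with a two-level full factorial. It assumes the usual PB rule, which is not spelled out on the slide: the number of runs is the smallest multiple of 4 strictly greater than the number of parameters, doubled when a foldover is added. The function names are hypothetical.

```python
def pb_runs(num_parameters, foldover=False):
    """Runs in a Plackett & Burman design for the given parameter count.

    Assumes the run count is the smallest multiple of 4 strictly greater
    than the number of parameters; a foldover doubles it.
    """
    runs = 4 * (num_parameters // 4 + 1)
    return 2 * runs if foldover else runs


def full_factorial_runs(num_parameters):
    """Runs needed to try every combination of low/high values."""
    return 2 ** num_parameters


if __name__ == "__main__":
    for n in (7, 11):
        print(n, pb_runs(n), pb_runs(n, foldover=True), full_factorial_runs(n))
```

For 11 two-level parameters this gives 12 runs (24 with foldover) against 2048 for the full factorial.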

Page 8

PB Design Example

Run      A  B  C  D  E  F  G   Time
1        +  +  +  -  +  -  -      9
2        -  +  +  +  -  +  -     11
3        -  -  +  +  +  -  +      2
4        +  -  -  +  +  +  -      1
5        -  +  -  -  +  +  +      9
6        +  -  +  -  -  +  +     74
7        +  +  -  +  -  -  +      7
8        -  -  -  -  -  -  -      4
9        -  -  -  +  -  +  +     17
10       +  -  -  -  +  -  +     76
11       +  +  -  -  -  +  -      6
12       -  +  +  -  -  -  +     31
13       +  -  +  +  -  -  -     19
14       -  +  -  +  +  -  -     33
15       -  -  +  -  +  +  -      6
16       +  +  +  +  +  +  +    112

Effect  191 19 111 -13 79 55 239
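
The effect row can be reproduced with a short script. This sketch assumes the standard construction: the first seven rows are cyclic right shifts of the first row, the eighth row is all low values, and the last eight rows are the foldover (every sign flipped), so the final run is all high values. Each effect is the sum of the run times weighted by that parameter's sign. The helper names are hypothetical.

```python
GENERATOR = [+1, +1, +1, -1, +1, -1, -1]   # signs of A..G in the first run
TIMES = [9, 11, 2, 1, 9, 74, 7, 4, 17, 76, 6, 31, 19, 33, 6, 112]
NAMES = "ABCDEFG"


def pb_design_with_foldover(generator):
    """Build the PB sign matrix (cyclic shifts, all-low row, then foldover)."""
    n = len(generator)
    # Rows 1..n: cyclic right shifts of the generator row.
    base = [generator[-shift:] + generator[:-shift] for shift in range(n)]
    base.append([-1] * n)                              # closing all-low row
    foldover = [[-s for s in row] for row in base]     # flip every sign
    return base + foldover


def effects(design, responses):
    """Effect of each parameter: sum of sign * response over all runs."""
    columns = range(len(design[0]))
    return [sum(row[j] * y for row, y in zip(design, responses)) for j in columns]


def rank_parameters(names, effect_values):
    """Order parameter names by decreasing magnitude of effect."""
    pairs = sorted(zip(names, effect_values), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in pairs]


if __name__ == "__main__":
    design = pb_design_with_foldover(GENERATOR)
    result = effects(design, TIMES)
    print(result)
    print(rank_parameters(NAMES, result))
```

Only the magnitude of an effect matters for ranking, so G, A, and C stand out as the parameters worth detailed study in this example.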

Page 9

PB Design Parameter Values

Parameter               Low Value (-)    High Value (+)
Number of Cores         2                16
Pipeline Organization   In-Order         Out-of-Order
Pipeline Width          1                4
L1 Cache Size           16 kB            128 kB
L1 Associativity        Direct Mapped    32-Way
L2 Cache Size           Die Area - Core Area
L2 Associativity        Direct Mapped    32-Way
L2 Banks                2                32
L2 Latency              50 Cycles        12 Cycles
L2 Directory Latency    25 Cycles        6 Cycles
Pin Bandwidth           400              10000
Memory Latency          300 Cycles       100 Cycles
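
One way to turn a row of PB signs into a simulator configuration is a simple lookup, sketched below. The key names and the parameter ordering are assumptions made for illustration; they are not taken from the authors' scripts. L2 cache size is left out because it is derived from the remaining die area rather than set directly, and direct mapped is written as associativity 1.

```python
# (name, low value, high value); names and ordering are illustrative only.
PB_PARAMETERS = [
    ("cores", 2, 16),
    ("pipeline_organization", "in-order", "out-of-order"),
    ("pipeline_width", 1, 4),
    ("l1_size_kb", 16, 128),
    ("l1_associativity", 1, 32),
    ("l2_associativity", 1, 32),
    ("l2_banks", 2, 32),
    ("l2_latency_cycles", 50, 12),
    ("l2_directory_latency_cycles", 25, 6),
    ("pin_bandwidth", 400, 10000),      # units are not given on the slide
    ("memory_latency_cycles", 300, 100),
]


def config_for_row(signs):
    """Map one PB design row (+1 / -1 per parameter) to parameter values."""
    if len(signs) != len(PB_PARAMETERS):
        raise ValueError("expected one sign per parameter")
    return {
        name: (high if sign > 0 else low)
        for (name, low, high), sign in zip(PB_PARAMETERS, signs)
    }


if __name__ == "__main__":
    print(config_for_row([+1] * len(PB_PARAMETERS)))
```

Note that "high" means the value expected to help performance, so the high latency settings are the smaller cycle counts.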

Page 10

PB Results

Extreme values stressed the simulator
    Have not completed an entire set of runs yet
    Possibly necessary to build a custom L2 network for each run

Page 11

PB Results for JBB

[Figure: chart of PB results for JBB. Vertical axis runs from 0 to 20. Horizontal axis categories: Cores, In/Out, Width, L1 Size, L1 Assoc, L2 Assoc, L2 Banks, L2 Latency, Directory Latency, Pin BW, Memory Latency.]

Page 12

Assumptions - MVA

The time between memory requests is exponentially distributed

Processor cores exhibit the same average behavior with respect to their service times and miss rates

Doubling the size of the cache multiplies the miss rate by 1/√2 (a reduction of about 29%)

An in-order core takes approximately the same area as 50 kB of cache
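
The cache-size assumption is a square-root rule: the miss rate scales with the inverse square root of the cache size. A minimal sketch, with a hypothetical function name and made-up example numbers:

```python
import math


def scaled_miss_rate(base_miss_rate, base_size_kb, new_size_kb):
    """Miss rate after resizing a cache, under the 1/sqrt(2)-per-doubling rule."""
    return base_miss_rate * math.sqrt(base_size_kb / new_size_kb)


if __name__ == "__main__":
    # Illustrative numbers only: a 10% miss rate at 16 kB.
    for size_kb in (16, 32, 64, 128):
        print(size_kb, scaled_miss_rate(0.10, 16, size_kb))
```

Under this rule the cache must quadruple in size to halve the miss rate, which is what makes the trade between core count and cache area interesting.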

Page 13

MVA Design

Simple Closed Model:

Page 14

MVA Design

Two phases of this model design
First: use the exact MVA equations
    Use average time between memory accesses as an application parameter
    Solve for throughput
Second: use Approximate MVA (AMVA)
    Use an iterative method to converge on this service time
    Solve for throughput

Page 15

Exact MVA

To solve the MVA equations, we determine the mean residence time at each service center:
    $R_p$: processor/L1 residence time
    $R_{L2}$: L2 residence time
    $R_M$: memory residence time

The case with one core is trivial. Use this case to solve for additional cores:
    $R_{n,p} = D_p \, (1 + Q_{n-1,p})$
where $D_p$ is the service demand at the processor and $Q_{n-1,p}$ is the mean queue length there with $n-1$ cores in the system.
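
The recursion above is the textbook exact MVA algorithm for a closed queueing network, sketched below. The assumptions are mine rather than the authors': each core is modeled as one circulating customer, every center (processor/L1, L2, memory) is a queueing center, and the service demands in the example are made-up numbers, not measurements from the study.

```python
def exact_mva(demands, customers, think_time=0.0):
    """Exact MVA for a closed network of queueing centers.

    demands: dict mapping center name -> service demand per request
    customers: number of circulating customers (here, cores)
    Returns (throughput, residence times, queue lengths) at full population.
    """
    queue = {center: 0.0 for center in demands}        # empty system: Q = 0
    residence = {center: 0.0 for center in demands}
    throughput = 0.0
    for n in range(1, customers + 1):
        # R[n] = D * (1 + Q[n-1]): own service plus waiting behind the queue
        residence = {c: d * (1.0 + queue[c]) for c, d in demands.items()}
        throughput = n / (think_time + sum(residence.values()))
        # Little's law at each center gives the queue lengths for population n
        queue = {c: throughput * r for c, r in residence.items()}
    return throughput, residence, queue


if __name__ == "__main__":
    # Hypothetical demands in cycles per memory request.
    demands = {"processor": 20.0, "l2": 12.0, "memory": 100.0}
    for cores in (1, 2, 4, 8, 16):
        x, _, _ = exact_mva(demands, cores)
        print(cores, x)
```

With one customer the queues are empty, so the residence times equal the demands; this is the trivial single-core case that seeds the recursion.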

Page 16

Exact MVA results

Using data from simulation runs, throughput was calculated
    Miss rates, number of memory requests
Results are erratic
    Not consistent with simulation results
    Source of the problem is most likely processor service time!

Page 17

Approximate MVA Design

An iterative method can be used to converge on a service time
    Uses total R as an input parameter
Iterative method works well with approximate MVA
    Goal is to match the total average residence time of a memory request
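
A sketch of how such a fit could look is given below. The slides do not say which AMVA variant or search method was used, so this is an assumption throughout: it uses the Schweitzer approximation (the queue seen on arrival is taken as (N-1)/N of the full-population queue) and a plain bisection on the processor service demand until the model's total residence time matches a measured target. Function names and numbers are hypothetical.

```python
def amva(demands, customers, tol=1e-12, max_iter=100_000):
    """Schweitzer approximate MVA; returns (throughput, residence times)."""
    n = customers
    queue = {c: n / len(demands) for c in demands}   # start from an even spread
    for _ in range(max_iter):
        # Arrival-instant queue approximated as (n - 1) / n of the full queue.
        residence = {c: d * (1.0 + (n - 1) / n * queue[c])
                     for c, d in demands.items()}
        throughput = n / sum(residence.values())
        new_queue = {c: throughput * r for c, r in residence.items()}
        delta = max(abs(new_queue[c] - queue[c]) for c in demands)
        queue = new_queue
        if delta < tol:
            break
    return throughput, residence


def fit_processor_demand(other_demands, customers, target_residence, steps=60):
    """Bisect on the processor demand until total residence time hits the target.

    Returns None when the other centers alone already exceed the target,
    i.e. no non-negative processor demand can match the measurement.
    """
    def total(processor_demand):
        model = {**other_demands, "processor": processor_demand}
        _, residence = amva(model, customers)
        return sum(residence.values())

    if total(0.0) > target_residence:
        return None
    lo, hi = 0.0, target_residence   # total residence is at least the demand
    for _ in range(steps):
        mid = (lo + hi) / 2
        if total(mid) < target_residence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    others = {"l2": 1.0, "memory": 3.0}      # hypothetical demands
    print(fit_processor_demand(others, 4, 14.0))
    print(fit_processor_demand(others, 4, 1.0))
```

The None case shows one way a fit can fail: if the measured total residence time is smaller than what the memory system alone produces in the model, no processor service time can close the gap.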

Page 18

Approximate MVA Results

Convergence using the AMVA equations does not always occur

Total measured residence time cannot be reached with this model and parameter set.

Variation of input values without convergence implies flaws in the model structure

There is a complex relationship between the memory system and the rate at which a core issues requests that must be modeled 

Page 19

Complementary Results

The initial goal was to produce PB results that identify which parameters the MVA model should focus on

Results from both approaches could cross-verify correctness

Page 20

Conclusions

Simics has a STEEP learning curve
    <5 weeks is not enough time for valid/any results
Refinement of a PB design leads to long lead times on valid results
CMPs complicate the relationship between cores and the memory subsystem
Design methodologies that focus simulation runs are necessary
More results and conclusions to follow

Page 21

Questions?