Dynamically Heterogeneous Cores Through 3D Resource Pooling

Houman Homayoun, Vasileios Kontorinis, Amirali Shayan, Ta-Wei Lin, Dean M. Tullsen. Speaker: Houman Homayoun, National Science Foundation CI Fellow, University of California San Diego


TRANSCRIPT

Page 1:

Dynamically Heterogeneous Cores Through 3D Resource Pooling

Houman Homayoun, Vasileios Kontorinis, Amirali Shayan, Ta-Wei Lin, Dean M. Tullsen

Speaker: Houman Homayoun, National Science Foundation CI Fellow

University of California San Diego

Page 2:

Why Heterogeneity?

• Existing general-purpose CMP designs use only homogeneous cores.
• A general-purpose, one-size-fits-all core is not necessarily the most efficient.

One processor optimized for each application!

[Figure: Core 1, Core 2]

Page 3:

Static vs. Dynamic Heterogeneity

• Prior proposals (e.g., Kumar 2003) propose static heterogeneity.
  • Increases the chance of finding an appropriate core
  • Does not guarantee a perfect match
• Others have proposed solutions for dynamic heterogeneity (Core Fusion, TFlex).
  • Due to the difficulty of sharing resources at a fine granularity, they enable only coarse-grain sharing.
  • Big (combined) cores or small cores.

Page 4:

Outline

• Resource Pooling
• Why 3D?
• Design Solutions
• Adaptive Policies
• Results
• Conclusion

Page 5:

Application Resource Utilization

Page 6:

Application Resource Utilization

[Figure: utilization of the ROB, LDSQ, RF, and IQ]

Page 7:

Application Resource Utilization

[Figure: dual-core machine; utilization of the ROB, LDSQ, RF, and IQ for Application 1 and Application 2, with a region marked "underutilized"]

Page 8:

Dynamic Heterogeneity Through Resource Pooling

[Figure: Core 1 and Core 2, each with a Register File and a ROB]

Page 9:

Outline

• Need for Heterogeneity
• Why 3D?
• Design Solutions
• Adaptive Policies
• Results
• Conclusion

Page 10:

Why Not Share in 2D?

• Long wire delay in 2D
• In 2D, sharing is not efficient

[Figure labels: Demanding; 500 psec; 5 nsec]

Page 11:

Our Solution: 3D

Page 12:

Our Solution: 3D

• Fast interconnection network
  • As fast as a few ps (three orders of magnitude smaller than 2D)
• Minimize the communication latency

[Figure labels: 5 psec; 5000 psec]

A principal advantage: no change to the fundamental pipeline design of 2D architectures, yet the design still exploits 3D to provide greater energy proportionality and core customization.
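To put the two latency labels in perspective, the short sketch below compares them with each other and against a clock period. The 5 psec and 5000 psec values come from the slide; the 2 GHz clock and the function names are assumptions made only for illustration.

```python
import math

# Latencies quoted on the slide, in picoseconds.
LATENCY_3D_PS = 5.0      # "as fast as a few ps" vertical connection
LATENCY_2D_PS = 5000.0   # long 2D wire

def orders_of_magnitude(slow_ps: float, fast_ps: float) -> float:
    """Return log10 of the ratio between two latencies."""
    return math.log10(slow_ps / fast_ps)

def cycles_needed(latency_ps: float, clock_ghz: float) -> int:
    """Whole clock cycles a signal with this latency occupies (at least one)."""
    period_ps = 1000.0 / clock_ghz  # 1 GHz corresponds to a 1000 ps period
    return max(1, math.ceil(latency_ps / period_ps))

if __name__ == "__main__":
    ASSUMED_CLOCK_GHZ = 2.0  # hypothetical clock, not stated in the talk
    print(orders_of_magnitude(LATENCY_2D_PS, LATENCY_3D_PS))
    print(cycles_needed(LATENCY_2D_PS, ASSUMED_CLOCK_GHZ))
    print(cycles_needed(LATENCY_3D_PS, ASSUMED_CLOCK_GHZ))
```

Under that assumed clock, the 2D latency spans many cycles while the 3D latency fits well inside one, which is the property that makes cycle-level sharing plausible.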

Page 13:

Outline

• Need for Heterogeneity
• Why 3D?
• Design Solutions
• Adaptive Policies
• Results
• Conclusion

Page 14:

Stackable Structures for Resource Pooling

• Performance-bottleneck and power-hungry resources
  • Reorder Buffer and Register File (SRAM)
  • Instruction Queue and Load and Store Queue (CAM+SRAM)
• Our goal: share units across multiple cores with minimal impact on design spec (latency, number of ports, and power)
• Use previously proposed modular design
  • Each partition is a self-standing and independently usable unit
  • Effective in reducing power and access delay

[Figure: Register File split into independent partitions (Part 1, Part 2, Part 3, Part 4)]

Page 15:

Example of Resource Sharing

[Figure: Register File in Core 0 and Register File in Core 1 connected by TSVs; labels: Decoder, MUX, Partition, Free]

• Additional logic to decide whether a partition is empty
• Additional logic to route the signal to the right partition
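The two pieces of additional logic named on this slide can be illustrated with a small software model. This is only a sketch under assumptions: the class and method names, the per-partition entry counter, and the "local"/"tsv" path labels are hypothetical and are not taken from the talk, which describes hardware rather than code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Partition:
    home_core: int               # layer (core) that physically holds the partition
    owner: Optional[int] = None  # core currently using it, or None if free
    entries_in_use: int = 0      # live entries; zero means the partition is empty

    def is_empty(self) -> bool:
        """Emptiness check: models the 'is the partition empty' logic."""
        return self.entries_in_use == 0

class PooledRegisterFile:
    def __init__(self, num_cores: int = 2, parts_per_core: int = 4) -> None:
        self.partitions: List[Partition] = [
            Partition(home_core=c)
            for c in range(num_cores)
            for _ in range(parts_per_core)
        ]

    def free_partitions(self) -> List[int]:
        return [i for i, p in enumerate(self.partitions)
                if p.owner is None and p.is_empty()]

    def claim(self, core: int) -> Optional[int]:
        """Give `core` a free partition, preferring one on its own layer."""
        free = self.free_partitions()
        if not free:
            return None
        # False sorts before True, so local partitions come first.
        free.sort(key=lambda i: self.partitions[i].home_core != core)
        self.partitions[free[0]].owner = core
        return free[0]

    def route(self, core: int, index: int) -> str:
        """Routing check: pick the local path or the cross-layer (TSV) path."""
        part = self.partitions[index]
        if part.owner != core:
            raise PermissionError("core does not own this partition")
        return "local" if part.home_core == core else "tsv"

    def release(self, index: int) -> bool:
        """Return a partition to the pool only once it holds no live entries."""
        part = self.partitions[index]
        if not part.is_empty():
            return False
        part.owner = None
        return True
```

In this model a core that exhausts its own partitions is handed one from the other layer, and every access to it is routed over the vertical path until the partition drains and is released.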

Page 16:

Outline

• Need for Heterogeneity
• Why 3D?
• Design Solutions
• Adaptive Policies
• Results
• Conclusion

Page 17:

Adaptive Policies for Resource Pooling

• Several issues need to be considered
  • Ownership
  • Fast releasing
  • Fast reallocation
  • Cycle-by-cycle adaptation
  • Prevent starvation
• A simple adaptive policy specification (MinMax policy)
  • Set limits on the size of resources: how far they can grow (MAX) or shrink (MIN)
  • Use a free list
  • Use central arbitration
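A MinMax-style policy with a free list and a central arbiter can be sketched in a few lines. This is a minimal illustration, not the arbitration hardware from the talk: the class name, the method names, and the partition counts in the usage note are all assumptions.

```python
from typing import Dict, List

class MinMaxArbiter:
    """Central arbiter: each core holds between MIN and MAX partitions."""

    def __init__(self, num_cores: int, total_parts: int,
                 min_parts: int, max_parts: int) -> None:
        if max_parts < min_parts:
            raise ValueError("MAX must be at least MIN")
        if num_cores * min_parts > total_parts:
            raise ValueError("not enough partitions to give every core MIN")
        self.min_parts = min_parts
        self.max_parts = max_parts
        ids = iter(range(total_parts))
        # Every core starts with exactly MIN partitions (prevents starvation).
        self.held: Dict[int, List[int]] = {
            c: [next(ids) for _ in range(min_parts)] for c in range(num_cores)
        }
        # Whatever is left over goes on the shared free list.
        self.free_list: List[int] = list(ids)

    def request(self, core: int) -> bool:
        """Grow `core` by one partition if it is below MAX and one is free."""
        if len(self.held[core]) >= self.max_parts or not self.free_list:
            return False
        self.held[core].append(self.free_list.pop(0))
        return True

    def release(self, core: int) -> bool:
        """Shrink `core` by one partition, but never below MIN."""
        if len(self.held[core]) <= self.min_parts:
            return False
        self.free_list.append(self.held[core].pop())
        return True
```

For example, with four cores, sixteen partitions, MIN of 2 and MAX of 8 (hypothetical numbers), one busy core can grow to eight partitions while the other three are still guaranteed their two each.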

Page 18:

MinMax Policy Example

[Figure: an Arbitration Unit and a Free List manage a shared Register File for Cores 1 to 4, running Applications 1 to 4; label: MIN]

Pages 19 to 21:

MinMax Policy Example (continued; the extracted labels are identical to Page 18)

Page 22:

Outline

• Need for Heterogeneity
• Why 3D?
• Design Solutions
• Adaptive Policies
• Results
• Conclusion

Page 23:

Baseline Architecture

• Processor model
  • High-end architecture: four OoO cores with an issue width of 4
  • Medium-end architecture: four OoO cores with an issue width of 2
• 3D floorplans (different performance, flexibility, and temperature tradeoffs)
  • (1) Conventional (thermal-optimized design)
  • (2) Proposed (performance-optimized design)

[Figure: floorplans (1) and (2)]

Page 24:

Evaluation

• Workloads: 1 thread, 2 threads, 4 threads
• Metrics: power, performance, temperature, energy-delay

[Figure: Core 1 to Core 4; legend: active core, idle core, link]

Page 25:

Single Thread Performance

• Standard SPEC2K and SPEC2006 benchmarks
• Single benchmark (3 out of 4 cores are idle)

[Figure: speedup]

Page 26:

Multi-Thread Performance

• 2Thr: 2 idle cores + underutilized resources in the active cores
• 4Thr: no idle cores, only underutilized resources

[Figure: normalized weighted speedup (%)]

Gains are dramatic when some cores are idle.
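The chart's metric is a weighted speedup, which the slide does not define. The sketch below assumes the commonly used definition (for each thread, IPC when co-running divided by IPC when running alone, summed over threads) and assumes normalization means the percentage gain over a baseline configuration. The function names and all IPC values are made up for illustration and are not results from the talk.

```python
from typing import Sequence

def weighted_speedup(ipc_shared: Sequence[float],
                     ipc_alone: Sequence[float]) -> float:
    """Sum of per-thread IPC ratios (co-running IPC over stand-alone IPC)."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one stand-alone IPC per thread")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))

def gain_percent(ws_config: float, ws_baseline: float) -> float:
    """Percentage improvement of one configuration over the baseline."""
    return 100.0 * (ws_config / ws_baseline - 1.0)

if __name__ == "__main__":
    alone = [2.0, 1.0]            # hypothetical stand-alone IPCs
    baseline = [1.0, 0.5]         # hypothetical IPCs without pooling
    pooled = [1.5, 0.5]           # hypothetical IPCs with pooling
    ws_base = weighted_speedup(baseline, alone)
    ws_pool = weighted_speedup(pooled, alone)
    print(gain_percent(ws_pool, ws_base))
```

Because each thread is scaled by its own stand-alone IPC, a large speedup on a low-IPC thread counts as much as the same relative speedup on a high-IPC thread.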

Page 27:

Medium-end vs High-end

Resource pooling makes the medium core significantly more competitive with the high-end.

[Figure: normalized weighted speedup (%); labels: 28% (0 idle cores), 14% (2 idle cores), only 3% (3 idle cores); arrow: increasing resource sharing]

Page 28:

Power

[Figure: power (Watt); labels: 3X, 4X]

• Pooling pays a small price in power because of the enhanced throughput.
• Large speedups on low-IPC threads and a high average speedup, but a smaller increase in total instruction throughput and thus a smaller increase in power.

Page 29:

Temperature

[Figure: temperature (Celsius)]

Interestingly, the temperature of the medium resource-pooling core is comparable to the high-end core.

Page 30:

Efficiency

Even still, at equal temperature, the more modest cores have a significant advantage in energy efficiency measured in MIPS²/W (MIPS²/W is the inverse of energy-delay product).

[Figure: normalized MIPS²/W; label: 2X]
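The relationship between MIPS²/W and energy-delay product can be checked with a short derivation. The symbols are introduced here and do not appear on the slide: assume a fixed workload of $I$ instructions, execution time $T$ in seconds, average power $P$ in watts, energy $E$, and energy-delay product $\mathrm{EDP}$.

```latex
\begin{align*}
\mathrm{MIPS} &= \frac{I}{10^{6}\,T}, \qquad
E = P\,T, \qquad
\mathrm{EDP} = E\,T = P\,T^{2} \\[4pt]
\frac{\mathrm{MIPS}^{2}}{P}
  &= \frac{I^{2}}{10^{12}\,T^{2}\,P}
   = \frac{I^{2}}{10^{12}} \cdot \frac{1}{\mathrm{EDP}}
\end{align*}
```

For a fixed instruction count, the factor $I^{2}/10^{12}$ is a constant, so MIPS²/W is proportional to the inverse of the energy-delay product, and a higher value means better efficiency.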

Page 31:

Conclusions

• Homogeneous cores are inherently inefficient for a diverse workload.
  • Cores are typically overprovisioned as a result.
• 3D stacking of cores enables fine-grain sharing (pooling) of resources not possible in 2D designs.
• Our dynamically heterogeneous 3D architecture allows the processor to construct the right core for each application dynamically, maximizing energy efficiency.
• Our 3D pooling architecture
  • Leverages our experience in 2D pipeline design, yet still gains significant benefit from 3D.
  • Adapts to the specific demands of an application within a few cycles.
  • Reduces reliance on overprovisioned cores, instead grabbing larger resources only when needed.

Page 32:

End of presentation