Stories, Not Words: Abstract Datatype Instruction Sets Martha Kim Columbia University Workshop on New Directions in Computer Architecture 6/5/2011 Sunday, June 5, 2011


Stories, Not Words: Abstract Datatype

Instruction Sets

Martha Kim, Columbia University

Workshop on New Directions in Computer Architecture

6/5/2011


The Utilization Wall

• Exponential decrease in percentage of transistors that can be operated at full frequency.

• In 45nm TSMC process, 7% of 300mm² die can operate at full frequency

• In 32nm, 3.5%

[Figure labels: Moore’s Law (manufacturable transistors); Power budget (operable transistors)]

Venkatesh et al. Conservation cores: Reducing the energy of mature computations. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 205–218, Pittsburgh, Pennsylvania, March 2010.


Specialization Is a Promising Approach

R. Hameed et al., “Understanding sources of inefficiency in general-purpose chips,” ISCA '10

G. Venkatesh et al., “Conservation cores: reducing the energy of mature computations,” ASPLOS '10

J. Kelm, D. Johnson, W. Tuohy, S. Lumetta, and S. Patel, “Cohesion: a hybrid memory model for accelerators,” ISCA '10

H. Franke et al., “Introduction to the wire-speed processor and architecture,” IBM Journal of Research and Development, vol. 54, no. 1, pp. 3:1–3:11, 2010.

V. Govindaraju, C. Ho, and K. Sankaralingam, “Dynamically Specialized Datapaths for energy efficient computing,” HPCA ’11

M. Lyons, M. Hempstead, G. Wei, and D. Brooks, “The Accelerator Store framework for high-performance, low-power accelerator-based systems,” Computer Architecture Letters, vol. 9, no. 2, pp. 53–56, 2010.

C. Cascaval, S. Chatterjee, H. Franke, K. Gildea, and P. Pattnaik, “A taxonomy of accelerator architectures and their programming models,” IBM Journal of Research and Development, vol. 54, no. 5, p. 5, 2010.

R. Hou et al., “Efficient data streaming with on-chip accelerators: Opportunities and challenges,” HPCA ’11

N. Goulding et al., “GreenDroid: A Mobile Application Processor for Silicon’s Dark Future,” Hot Chips ’10.


An Ideal Accelerator System

High Performance

Low Energy

Easy to Program

Software Portability


Accelerator Design Processes

We need a design flow that facilitates usability

[Figure: design flows compared, stages listed in the order they appear on the slide. Flagged with "!": Application, Microarch., Arch. Final flow shown: Application, Arch., Microarch.]


Extending Software Abstractions to Hardware

Application

Libraries

Machine Code

Micro-ops

Execution core

Caches

Memory

Raise HW/SW interface

Extend interfaces from libraries to hardware

Exploit interfaces with specialized hardware


Abstract Datatype Processing

SW: class HashTable with put(k,v) and v get(k)

Arch: put $h, $k, $v and get $h, $k, $v

UArch: Hash Table Processor
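The three levels on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the talk's implementation: the `HashTableProcessor` class, its `new_table` and `execute` methods, and the handle allocation scheme are hypothetical names introduced here. Only the `put`/`get` operations and the `$h, $k, $v` operand order come from the slide.

```python
class HashTableProcessor:
    """Hypothetical stand-in for the UArch level: owns all table storage."""

    def __init__(self):
        self._tables = {}       # handle -> private dict (assumed storage model)
        self._next_handle = 0

    def new_table(self):
        # Assumed allocation step; the slides do not show how handles are created.
        handle = self._next_handle
        self._next_handle += 1
        self._tables[handle] = {}
        return handle

    def execute(self, opcode, h, k, v=None):
        # Arch level: one instruction per abstract operation.
        if opcode == "put":     # put $h, $k, $v
            self._tables[h][k] = v
            return None
        if opcode == "get":     # get $h, $k, $v  (the result lands in $v)
            return self._tables[h].get(k)
        raise ValueError(f"unknown opcode: {opcode}")


class HashTable:
    """SW level: the interface the application already uses."""

    def __init__(self, processor):
        self._hw = processor
        self._h = processor.new_table()

    def put(self, k, v):
        self._hw.execute("put", self._h, k, v)

    def get(self, k):
        return self._hw.execute("get", self._h, k)


htp = HashTableProcessor()
counts = HashTable(htp)
counts.put("the", 3)
print(counts.get("the"))   # 3
print(counts.get("cat"))   # None (missing-key behavior is an assumption)
```

The application only ever sees `HashTable.put` and `HashTable.get`, so the class body could be swapped for a pure software implementation without touching callers.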


Compilation & Execution

[Figure: the Sequence Labeling application uses the SparseVec and HashTable types; Dispatch feeds the SV, HT, and GP units]


The Software Fallback

[Figure: Dispatch feeding only the SV and GP units; no HT unit is shown]
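One way to picture the fallback is a dispatcher that routes each typed operation to a matching accelerator when one exists and to the general-purpose core otherwise. The sketch below is an assumption-laden illustration: the `Unit` and `Dispatch` classes, the `issue` method, and the use of a Python dict as the table are all hypothetical. The slides only name the SV, HT, and GP units and the Dispatch stage.

```python
class Unit:
    """Hypothetical execution unit; every unit here shares one software model."""

    def __init__(self, name):
        self.name = name
        self.executed = 0   # number of operations this unit has run

    def run(self, op, table, key, value=None):
        self.executed += 1
        if op == "put":
            table[key] = value
            return None
        if op == "get":
            return table.get(key)
        raise ValueError(f"unknown op: {op}")


class Dispatch:
    def __init__(self, accelerators):
        # accelerators: type name -> Unit, e.g. {"HashTable": Unit("HT")}
        self.accelerators = accelerators
        self.gp = Unit("GP")   # the general-purpose core is always present

    def issue(self, type_name, op, table, key, value=None):
        # Route to the type's accelerator if one exists, else fall back to GP.
        unit = self.accelerators.get(type_name, self.gp)
        return unit.name, unit.run(op, table, key, value)


def program(dispatch):
    """The same application code, whichever units are present."""
    table = {}
    dispatch.issue("HashTable", "put", table, "label", 7)
    return dispatch.issue("HashTable", "get", table, "label")


with_ht = Dispatch({"SparseVec": Unit("SV"), "HashTable": Unit("HT")})
without_ht = Dispatch({"SparseVec": Unit("SV")})

print(program(with_ht))      # ('HT', 7)
print(program(without_ht))   # ('GP', 7)
```

Both runs return the same value; only the unit that executed the operation differs, which is the portability property the fallback is meant to provide.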


An Ideal Accelerator System

High Performance

Low Energy

Easy Use - align hardware interfaces with those software is already using

Portability - software fallback plan


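Enforcing Data Encapsulation

[Figure: the CPU issues the following instructions to the Sparse Vector Accelerator]

set $v,$i,$x

get $v,$i,$x

dot $v1,$v2,$p

[Figure residue: operand fields v, i, x and data blocks labeled A, B, C, D, and I; the layout is not recoverable]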
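A rough software model of encapsulation: the vector contents live only inside the accelerator, and the CPU reaches them through the three instructions alone. Everything beyond `set`, `get`, and `dot` is an assumption made for illustration, including the `alloc` helper, the dict-based storage, the 0.0 default for absent indices, and the hypothetical `read_vector` export that copies on read (the conservative policy mentioned under Research Challenges).

```python
class SparseVectorAccelerator:
    """Hypothetical model: vector contents live only inside the accelerator."""

    def __init__(self):
        self._store = {}   # handle $v -> {index: value}
        self._next = 0

    def alloc(self):
        # Assumed helper; handle creation is not shown on the slides.
        v = self._next
        self._next += 1
        self._store[v] = {}
        return v

    def set(self, v, i, x):    # set $v,$i,$x
        self._store[v][i] = float(x)

    def get(self, v, i):       # get $v,$i,$x
        # Absent indices read as zero (assumed sparse-vector semantics).
        return self._store[v].get(i, 0.0)

    def dot(self, v1, v2):     # dot $v1,$v2,$p
        a, b = self._store[v1], self._store[v2]
        if len(a) > len(b):
            a, b = b, a        # iterate over the vector with fewer entries
        return sum(x * b[i] for i, x in a.items() if i in b)

    def read_vector(self, v):
        # Hypothetical bulk export: copy on read, so the caller never
        # receives a reference into the accelerator's private storage.
        return dict(self._store[v])


acc = SparseVectorAccelerator()
a, b = acc.alloc(), acc.alloc()
acc.set(a, 0, 2.0)
acc.set(a, 5, 3.0)
acc.set(b, 5, 4.0)
acc.set(b, 9, 1.0)
print(acc.dot(a, b))       # 12.0

snapshot = acc.read_vector(a)
snapshot[5] = 100.0        # mutates the CPU-side copy only
print(acc.dot(a, b))       # 12.0
```

Copying on every read is safe but costs data movement, which is why a cheaper enforcement mechanism is listed as an open question.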


Specialized Caching for Sparse Vectors

[Figure: hit rate (0% to 100%) vs. storage capacity (128 B to 2048 B) for Standard Cache and VecStore]


Key Reuse in Hash Tables

[Figure: percentage of hash operations (0% to 100%) vs. number of keys (0.1 to 100000, log scale) for LZW Compress and Parser]

386-entry table: 26% of the table accounts for 99% of dynamic accesses

94K-entry table: 0.1% of the table accounts for 75% of dynamic accesses
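The kind of statistic quoted above can be computed from a key-access trace. The function below is a sketch with assumptions stated plainly: `hot_key_share` is a hypothetical name, the trace format (a sequence of accessed keys) is assumed, and the number of distinct keys in the trace stands in for the table size. The toy trace is illustrative and is not data from the talk.

```python
from collections import Counter


def hot_key_share(trace, coverage):
    """Return (hot_keys, fraction_of_distinct_keys): the smallest number of
    most-frequently accessed keys that together account for at least
    `coverage` (0..1) of all accesses, and that number as a fraction of
    the distinct keys seen in the trace."""
    if not trace:
        raise ValueError("trace must contain at least one access")
    counts = Counter(trace)
    needed = coverage * len(trace)
    covered = 0
    hot = 0
    # Walk keys from most to least frequently accessed.
    for _, n in counts.most_common():
        if covered >= needed:
            break
        covered += n
        hot += 1
    return hot, hot / len(counts)


# Toy trace: 10 accesses over 4 distinct keys.
trace = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
print(hot_key_share(trace, 0.75))   # (2, 0.5)
print(hot_key_share(trace, 0.5))    # (1, 0.25)
```

A result such as `(2, 0.5)` reads as "half of the keys cover at least 75% of accesses", the same shape of claim as the annotations on the plot.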


Exploiting Key Reuse

[Figure: reduction in HTX-M accesses (0% to 100%) vs. cache capacity (1 to 1000, log scale); series: Compress HTX-M Accesses, Parser HTX-M Accesses, Compress HTX-M Entrystore Accesses, Parser HTX-M Entrystore Accesses]

[Figure: Hash Table Accelerator (HTX), with components labeled HTX-C and HTX-M, accepts put $h,$k,$v and get $h,$k,$v]
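The effect being measured can be imitated in software with a small cache of hot entries placed in front of the full table. This sketch makes several assumptions the slides do not confirm: that HTX-C plays the role of a cache in front of HTX-M, that replacement is LRU, and that writes go through to the backing table. `BackingTable` and `CachedTable` are hypothetical names.

```python
from collections import OrderedDict


class BackingTable:
    """Stand-in for HTX-M: the full table; counts every access it serves."""

    def __init__(self):
        self.data = {}
        self.accesses = 0

    def put(self, k, v):
        self.accesses += 1
        self.data[k] = v

    def get(self, k):
        self.accesses += 1
        return self.data.get(k)


class CachedTable:
    """Stand-in for HTX-C in front of HTX-M (LRU and write-through assumed)."""

    def __init__(self, backing, capacity):
        self.backing = backing
        self.capacity = capacity
        self.cache = OrderedDict()

    def _insert(self, k, v):
        self.cache[k] = v
        self.cache.move_to_end(k)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used

    def put(self, k, v):
        self.backing.put(k, v)               # write-through
        self._insert(k, v)

    def get(self, k):
        if k in self.cache:
            self.cache.move_to_end(k)        # hit: no backing access
            return self.cache[k]
        v = self.backing.get(k)              # miss: one backing access
        if v is not None:
            self._insert(k, v)
        return v


backing = BackingTable()
htx = CachedTable(backing, capacity=2)
for k in "abc":
    htx.put(k, ord(k))       # 3 backing accesses; cache now holds b and c
for k in "cbcbcbca":
    htx.get(k)               # 7 hits on b/c, 1 miss on a
print(backing.accesses)      # 4
```

Without the cache the same trace would cost 11 backing accesses (3 puts plus 8 gets), so a two-entry cache removes 7 of them in this toy case.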


Summary

Extend software’s encapsulated datatypes into hardware accelerators

Natural alignment with standard software engineering

Accelerator utility on all applications that use a particular type

A software fallback that ensures portability

Aggressive optimization of computation and data movement


Research Challenges

What are the appropriate types to target? What is the lower bound in complexity? Is there a max number of types a hardware system can support?

How do I implement polymorphism efficiently? (e.g., priority queue with arbitrary types and user-defined sort function)

How do I optimize enforcement of data encapsulation? (copy-on-read is conservative)

Can the execution model support parallel execution?

What is type-specific coherence like? Simpler? Uglier?

What is the appropriate system-level resource allocation between general and specialized? Between different types?


Thank You
