Parallel gem5 Simulation of Many-Core Systems with Software-Programmable Memories

Bryan Donyanavard*, Tiago Muck*, Majid Shoushtari*, Nikil Dutt

Computer Science

*all listed authors are equal contributors

2nd gem5 User Workshop, June 2015 (6/11/2015) · duttgroup.ics.uci.edu

Target Platform

• A mesh-based many-core architecture with distributed, software-programmable data memories

[Figure: 3×3 mesh of tiles (Tile1 to Tile9), each with a router (R), plus two MM blocks. Tile detail labels: CPU Core, NI, PMMU, Instruction Cache.]

Enabling Simulation of Target Platform

• Software Programmable On-chip Memories (SPMs)
  • SPM Architectural Components
  • SPM Programming API
  • Communication Infrastructure for Distributed SPMs
• Simulator Speedup via Parallelization
  • Event Queues

SPM Integration for Many-cores

Existing Memory Hierarchies

• Classic Memory Model
  • No explicit memory controller
  • No NoC
• Ruby
  • Protocol dependent

New Memory Hierarchy

• We add the following infrastructure to the Classic Memory Model
  • SPM
  • Paged Memory Management Unit (PMMU)
  • Address Translation Table (ATT)
  • SPM Governor

[Figure: New memory hierarchy. Components shown: CPU 1, CPU 2, SPM 1, SPM 2, DMA (×2), BUS, Main Memory, PMMU 1, PMMU 2, ATT1, ATT2, Mesh NoC, SPM Governor.]

New Memory Hierarchy

• SPM
  • Receives and responds to memory requests from CPU (in place of old Cache)
  • Receives and responds to memory requests from PMMU (remote requests over NoC)
  • Forwards all main memory requests to memory bus (mimicking a DMA)

New Memory Hierarchy

• PMMU
  • Receives and responds to memory requests from SPM (local CPU)
  • Receives and responds to memory requests from NoC (remote CPU)
  • Processes SPM allocate and free requests from SPM Governor

New Memory Hierarchy

• ATT
  • Holds the mapping from a thread’s virtual addresses to SPM physical addresses
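To make the ATT's role concrete, here is a minimal C sketch of a page-granular lookup. It is an illustration only, not the gem5 code: the page size, table capacity, linear search, and every identifier (`att_entry`, `att_table`, `att_map`, `att_translate`) are assumptions introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ATT_PAGE_BITS 9u                      /* assumed 512-byte SPM pages */
#define ATT_PAGE_SIZE (1u << ATT_PAGE_BITS)
#define ATT_ENTRIES   8u                      /* assumed table capacity */

/* One hypothetical ATT entry: a virtual page mapped to a page of some SPM. */
typedef struct {
    bool     valid;
    uint64_t vpage;     /* virtual page number of the owning thread */
    uint32_t spm_id;    /* which tile's SPM holds the page */
    uint32_t spm_page;  /* page index inside that SPM */
} att_entry;

typedef struct {
    att_entry entries[ATT_ENTRIES];
} att_table;

/* Record a mapping for the page containing vaddr; false if the table is full. */
bool att_map(att_table *att, uint64_t vaddr, uint32_t spm_id, uint32_t spm_page)
{
    assert(att != NULL);
    for (size_t i = 0; i < ATT_ENTRIES; i++) {
        if (!att->entries[i].valid) {
            att->entries[i] = (att_entry){ true, vaddr >> ATT_PAGE_BITS,
                                           spm_id, spm_page };
            return true;
        }
    }
    return false;
}

/* Translate a virtual address. On a hit, writes the SPM id and the byte
 * offset inside that SPM and returns true; on a miss returns false. */
bool att_translate(const att_table *att, uint64_t vaddr,
                   uint32_t *spm_id, uint32_t *spm_offset)
{
    assert(att != NULL && spm_id != NULL && spm_offset != NULL);
    uint64_t vpage = vaddr >> ATT_PAGE_BITS;
    for (size_t i = 0; i < ATT_ENTRIES; i++) {
        const att_entry *e = &att->entries[i];
        if (e->valid && e->vpage == vpage) {
            *spm_id = e->spm_id;
            *spm_offset = (e->spm_page << ATT_PAGE_BITS)
                        | (uint32_t)(vaddr & (ATT_PAGE_SIZE - 1u));
            return true;
        }
    }
    return false;
}
```

In this sketch a hit gives the SPM to access (local or remote) and the offset within it; a miss would presumably be served from main memory, which is also an assumption.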

New Memory Hierarchy

• Governor
  • Maintains global state of all memory mapped to SPMs
  • Receives SPM alloc and free requests from all executing threads (via pseudo instructions)
  • Determines memory mapping based on gem5-user-defined policy and system state
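Since the mapping policy is user-defined, one possible policy can be sketched in C as follows. This is a hypothetical example, not the authors' policy: it prefers the requester's local SPM and falls back to the first remote SPM with room. The names `spm_state`, `governor_alloc`, and `governor_free`, and the page-count bookkeeping, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-SPM bookkeeping kept by the Governor. */
typedef struct {
    unsigned free_pages;
} spm_state;

/* Example policy (an assumption): try the requester's local SPM first,
 * then the lowest-numbered remote SPM with enough room.
 * Returns the chosen SPM index, or -1 if no SPM can hold the request. */
int governor_alloc(spm_state *spms, size_t nspm, size_t local, unsigned pages)
{
    assert(spms != NULL && local < nspm);
    if (spms[local].free_pages >= pages) {
        spms[local].free_pages -= pages;
        return (int)local;
    }
    for (size_t i = 0; i < nspm; i++) {
        if (i != local && spms[i].free_pages >= pages) {
            spms[i].free_pages -= pages;
            return (int)i;
        }
    }
    return -1;
}

/* Return pages to the SPM that held them. */
void governor_free(spm_state *spms, size_t spm, unsigned pages)
{
    assert(spms != NULL);
    spms[spm].free_pages += pages;
}
```

A different policy (for example, balancing load across tiles) would only change the body of `governor_alloc`; the global state it reads stays the same.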

SPM Programming API

• Programmer’s Interface
  • SPM_ARRAY_ALLOC (BASE_PTR, LENGTH, DATA_TYPE)
  • SPM_ARRAY_FREE (BASE_PTR, LENGTH, DATA_TYPE)

int *arr1 = (int*) malloc(LENGTH * sizeof(int));
...
SPM_ARRAY_ALLOC(arr1, LENGTH, int);   /* ask the Governor to map arr1 into SPM */

for (i = 0; i < LENGTH; i++) {
    arr1[i] = i;
}

SPM_ARRAY_FREE(arr1, LENGTH, int);    /* release the SPM mapping */
...
free(arr1);

NoC Integration

• Integrated simple network mesh NoC from Ruby into Classic Memory Model

• PMMU handles crossover from QueuedPorts to MessageBuffers, and acts as network node

[Figure: PMMU between the Network and the SPM/Governor. MessageBuffer pointers shown: m_PMMU_requestToNetwork_ptr, m_PMMU_responseToNetwork_ptr, m_PMMU_requestFromSPM_ptr, m_PMMU_responseFromSPM_ptr, m_PMMU_requestToSPM_ptr, m_PMMU_responseToSPM_ptr. Other labels: SPM, Governor, and two QueuedPorts Master/Slave pairs.]

SPM Communication Protocol

• Protocol to enable SPM
  • SPMRequestMsg : NetworkMessage
    • SPMRequestType_READ
    • SPMRequestType_WRITE
    • SPMRequestType_ALLOC
    • SPMRequestType_DEALLOC
  • SPMResponseMsg : NetworkMessage
    • SPMResponseType_WRITE_ACK
    • SPMResponseType_DATA
    • SPMResponseType_GOV_ACK
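The message types above can be written down as plain C enums to show how a request might be matched with its response. This is a sketch under assumptions: the type names come from the slide, but the struct fields, the flattening of the NetworkMessage base into plain structs, and the request-to-response pairing (inferred from the names only) are not confirmed by the source.

```c
#include <assert.h>
#include <stdint.h>

/* Type names follow the slide; the numeric values are arbitrary. */
typedef enum {
    SPMRequestType_READ,
    SPMRequestType_WRITE,
    SPMRequestType_ALLOC,
    SPMRequestType_DEALLOC
} SPMRequestType;

typedef enum {
    SPMResponseType_WRITE_ACK,
    SPMResponseType_DATA,
    SPMResponseType_GOV_ACK
} SPMResponseType;

/* Hypothetical message layouts (field names are assumptions). */
typedef struct {
    SPMRequestType type;
    uint64_t       addr;
    uint32_t       requestor;    /* node id of the sender */
} SPMRequestMsg;

typedef struct {
    SPMResponseType type;
    uint64_t        addr;
    uint32_t        destination; /* node id the reply is routed to */
} SPMResponseMsg;

/* Assumed pairing of requests and responses, inferred from the names. */
SPMResponseType spm_expected_response(SPMRequestType req)
{
    switch (req) {
    case SPMRequestType_READ:    return SPMResponseType_DATA;
    case SPMRequestType_WRITE:   return SPMResponseType_WRITE_ACK;
    case SPMRequestType_ALLOC:
    case SPMRequestType_DEALLOC: return SPMResponseType_GOV_ACK;
    }
    assert(0 && "unknown SPMRequestType");
    return SPMResponseType_GOV_ACK;
}

/* Build the reply to a request, addressed back to its sender. */
SPMResponseMsg spm_make_response(const SPMRequestMsg *req)
{
    assert(req != 0);
    SPMResponseMsg rsp = { spm_expected_response(req->type),
                           req->addr, req->requestor };
    return rsp;
}
```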

Simulator Speedup

Simulation Speedup via Parallelization

• Built on top of current multiple queue infrastructure
• Main changes
  • Assigning a separate queue for each CPU and its “private” SimObjects (e.g. I$, D$) by setting the eventq_index param. All other SimObjects are associated with a global queue
  • Quantum-based synchronization replaced by event-based synchronization
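The queue assignment rule can be modelled in a few lines of C. In gem5 itself `eventq_index` is a SimObject parameter set in the configuration script; the sketch below only mimics the rule. The `sim_object` struct, the `owner_cpu` field, and the choice of index 0 for the global queue are assumptions made here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a simulated object. */
typedef struct {
    const char *name;
    int         owner_cpu;     /* CPU id for a CPU or its private objects, -1 if shared */
    unsigned    eventq_index;  /* filled in by assign_event_queues() */
} sim_object;

#define GLOBAL_EVENTQ 0u       /* assumed index of the shared (system) queue */

/* Give CPU n and its private objects queue n + 1; everything else shares
 * the global queue. Returns the number of queues in use. */
unsigned assign_event_queues(sim_object *objs, size_t count)
{
    assert(objs != NULL || count == 0);
    unsigned max_index = GLOBAL_EVENTQ;
    for (size_t i = 0; i < count; i++) {
        objs[i].eventq_index = (objs[i].owner_cpu >= 0)
            ? (unsigned)objs[i].owner_cpu + 1u
            : GLOBAL_EVENTQ;
        if (objs[i].eventq_index > max_index)
            max_index = objs[i].eventq_index;
    }
    return max_index + 1u;
}
```

With two CPUs, their caches land on queues 1 and 2 while a shared bus stays on queue 0, giving three queues in total.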

Event Synchronization

• Event on the system queue executes after all CPU queues

• CPU queues block until shared event is handled

• Using synchronization events to create a barrier
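The ordering rule in these bullets can be illustrated with a small sequential model in C. It is a sketch, not the gem5 mechanism: real CPU queues would run on separate threads, whereas here they are drained one after another. All names (`event_queue`, `exec_trace`, `simulate`) are hypothetical, and letting CPU events at the same tick as a shared event run before it is an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long tick_t;
#define TICK_MAX  (~0ULL)
#define TRACE_CAP 64

/* A queue is modelled as a sorted list of event times plus a cursor. */
typedef struct {
    const tick_t *ticks;   /* event times, ascending */
    size_t        count;
    size_t        next;    /* index of the next unprocessed event */
} event_queue;

/* Execution trace: which queue ran an event, and at what tick. */
typedef struct {
    int    queue[TRACE_CAP];   /* CPU queue id, or -1 for the system queue */
    tick_t tick[TRACE_CAP];
    size_t len;
} exec_trace;

static void record(exec_trace *tr, int queue, tick_t tick)
{
    assert(tr->len < TRACE_CAP);
    tr->queue[tr->len] = queue;
    tr->tick[tr->len] = tick;
    tr->len++;
}

/* Run one CPU queue up to and including `limit`, then stop (block). */
static void run_until(event_queue *q, int id, tick_t limit, exec_trace *tr)
{
    while (q->next < q->count && q->ticks[q->next] <= limit)
        record(tr, id, q->ticks[q->next++]);
}

/* Run all queues to completion; returns the number of barriers formed. */
size_t simulate(event_queue *cpu_q, size_t ncpu, event_queue *sys_q,
                exec_trace *tr)
{
    size_t barriers = 0;
    while (sys_q->next < sys_q->count) {
        tick_t x = sys_q->ticks[sys_q->next];
        /* Every CPU queue advances to tick x and waits at the barrier. */
        for (size_t i = 0; i < ncpu; i++)
            run_until(&cpu_q[i], (int)i, x, tr);
        /* Only then is the shared event handled. */
        record(tr, -1, x);
        sys_q->next++;
        barriers++;
    }
    /* No shared events left: the CPU queues run freely. */
    for (size_t i = 0; i < ncpu; i++)
        run_until(&cpu_q[i], (int)i, TICK_MAX, tr);
    return barriers;
}
```

In the trace, no CPU event later than tick x appears before the system event at tick x, which is the barrier behaviour the bullets describe.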

[Figure: CPU 0 and CPU 1, each with private $I and $D, on CPU EQ 0 and CPU EQ 1; toL2Bus on the System EQ; each queue holds pending events (Ev).]

[Figure, final animation step: the same layout with a System EQ event at Tick x and events labelled BrEv (likely barrier events) added to the queues; Tick x-1, Tick x, and Tick x+1 are marked.]

Synchronization Layers for Race Conditions

Preliminary Simulator Speedups

• Microbenchmarks on cache-based classic memory
  • Cache-light: computationally intensive
    • Small number of accesses to L2/system queue
  • Cache-heavy: memory intensive with very high $ miss rate
    • Many accesses to L2/system queue
    • CPU queues block all the time

Advantageous for memory without coherence (e.g. SPM)

Feedback

• Comments and suggestions?
• Interested in participating?
• Contact us:
  • Bryan – [email protected]
  • Majid – [email protected]
  • Tiago – [email protected]

Thank you

Q&A

duttgroup.ics.uci.edu