a detailed on -chip network model inside a full-system simulator€¦ · full-state directory...

42
Garnet2.0: A Detailed On-Chip Network Model Inside a Full-System Simulator Tushar Krishna Assistant Professor School of ECE and CS Georgia Institute of Technology gem5 workshop ARM Research Summit September 11, 2017 [email protected] http://synergy.ece.gatech.edu/tools/garnet

Upload: others

Post on 01-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Garnet2.0: A Detailed On-Chip

Network Model Inside a Full-System Simulator

Tushar KrishnaAssistant Professor

School of ECE and CSGeorgia Institute of Technology

gem5 workshopARM Research Summit

September 11, [email protected]

http://synergy.ece.gatech.edu/tools/garnet

Page 2: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Overviewu What are Networks-On-Chip (NoCs)?

u Why model NoCs accurately

u Garnet2.0u Configuration

u Topology

u Routing

u Flow-Control

u Router Microarchitecture

u System Integrationu Network Interface

u Network Parameters

u Running Garnet with Ruby

u Ruby Garnet Standalone

u Output Stats

u Extensions and FAQs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 2

Page 3: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Overviewu What are Networks-On-Chip (NoCs)?

u Why model NoCs accurately

u Garnet2.0u Configuration

u Topology

u Routing

u Flow-Control

u Router Microarchitecture

u System Integrationu Network Interface

u Network Parameters

u Running Garnet with Ruby

u Ruby Garnet Standalone

u Output Stats

u Extensions and FAQs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 3

Page 4: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Networks-on-Chip

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 4

Page 5: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Introduction to NoCs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 5

Core + L1$

Core + L1$

Core + L1$

Core + L1$

Core + L1$

Core + L1$

L2$ L2$ L2$ L2$ L2$ L2$

On-Chip NetworkReq

Fwd

Rsp

LD (A) R1A

Page 6: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Modern NoCs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 6

L3$/Directory

L2$

L1D$

Core

L1I$ Network

Interface

Router

“Tile”

Network Interface converts cache messages (ctrl or data) into packets.

Packets get broken down into one or more flits depending on NoC link width

Page 7: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Network Architecture

uTopology

uRouting

uFlow Control

uRouter Microarchitecture

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 7

Page 8: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Topology:How to connect nodes with links

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 8

~Road Network

Page 9: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Routing: Which path should a message take

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 9

~Series of road segments from source to destination

Page 10: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Flow Control:When does a message stop/proceed

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 10

~Traffic Signals / Stop signs at end of each road segment

Page 11: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Router Microarchitecture: How to build the routers

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 11

~Design of traffic intersection (number of lanes, algorithm for turning red/green)

Page 12: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Overviewu What are Networks-On-Chip (NoCs)?

u Why model NoCs accurately

u Garnet2.0u Configuration

u Topology

u Routing

u Flow-Control

u Router Microarchitecture

u System Integrationu Network Interface

u Network Parameters

u Running Garnet with Ruby

u Ruby Garnet Standalone

u Output Stats

u Extensions and FAQs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 12

Page 13: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Why model NoCs accurately?

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 13

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

BaselineNoC

(IntelSCC)

FANOUT+FANINNoC

(Krishnaetal,MICRO2011)

SMARTNoC

(Krishnaetal,HPCA2013)

Normalize

dRu

ntim

e

64-coreCMPwithdifferentNoCMicroarchitectures

Full-stateDirectory HyperTransport TokenCoherence

0

0.2

0.4

0.6

0.8

1

1.2

BaselineNoC(IntelSCC)

SMARTNoC(Krishnaetal,HPCA2013)

Norm

alize

dRu

ntime

64-coreCMPwithdifferentNoCMicroarchitectures

SharedL2 PrivateL2

Case Study I: Directory vs. Broadcast Protocols

Case Study II: Private vs. Shared L2

64-core CMP running PARSEC workloads in full-system gem5. Average runtime plotted.

Different NoC Microarchitectures may lead to different microarchitectural decisions

and new design optimization opportunities

Page 14: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Overviewu What are Networks-On-Chip (NoCs)?

u Why model NoCs accurately

u Garnet2.0u Configuration

u Topology

u Routing

u Flow-Control

u Router Microarchitecture

u System Integrationu Network Interface

u Network Parameters

u Running Garnet with Ruby

u Ruby Garnet Standalone

u Output Stats

u Extensions and FAQs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 14

Page 15: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Garnet2.0u Detailed NoC Model

uCurrently part of Ruby Memory System in gem5uOriginal version (5-stage pipeline) released in 2009

uDeveloped by Niket Agarwal (currently at Google) and myself

uNew version (1-stage pipeline, more configurability) released in 2016

u Resourcesu Source: src/mem/ruby/network/garnet2.0u gem5 wiki page: www.gem5.org/garnet2.0uDev patches + practice labs:

http://synergy.ece.gatech.edu/garnetSeptember 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 15

Page 16: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Topology

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 16

Core + L1$

Core + L1$

Core + L1$

Core + L1$

Core + L1$

Core + L1$

L2$

L2$ L2$ L2$

L2$ L2$

Dir Dir

Dir Dir

DMA

Step 1: Instantiate routers and connect to controllersvia “Ext Links”

Step 2: Instantiate “Int Links” and connect to routers in desired topology

Each topology is a python file in configs/topologies/

This is an example of MeshDirCorners_XY.py

ExtLink(bi-directional)

IntLink(uni-directional)

R

R

R

R

R

R

Page 17: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Topology Configurable Parametersu Router

u router_latency (in cycles)u Can be set per router

u Defined in src/mem/ruby/network/BasicRouter.py

u Linku link_latency (in cycles)

u Can be set per link

u Defined in src/mem/ruby/network/BasicLink.py

u weight (i.e., link weight)u To bias routing algorithm [later slides]

u src_outport (string) and dst_inport (string)u Port direction (e.g., “East”)

u Helps with readability of Config file + Adaptive routing algorithms

u bw_multiplier (value)u Used by Ruby’s simple network model, NOT by Garnet

u Link bandwidth is set inside Garnet via the ni_flit_size parameter [later slides]

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 17

Page 18: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Default Topologies Supported

uPt2Pt

uCrossbar

uMeshuMesh_XY

uMesh_westfirst

uMeshDirCorners_XY

uCluster

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 18

Page 19: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Routingu Routing table

uAutomatically populated based on topologyuAll messages use shortest path

u In case of multiple options, the path with the smaller weight is chosen

uDeterministic Routing

u CustomuUsers can leverage outport/inport direction names

associated with each port to implement custom algorithms (say adaptive)

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 19

Page 20: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Deadlock Avoidanceu Deadlock: A condition in which a set of agents wait

indefinitely trying to acquire a set of resources

u Packet A holds buffer u (in 1) and wants buffer v (in 2)u Packet B holds buffer v (in 2) and wants buffer w (in 3)u Packet C holds buffer w (in 3) and wants buffer x (in 0)u Packet D holds buffer x (in 0) and wants buffer u (in 1)

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 20

0 1

23

A

BC

Du

vw

x

Page 21: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Deadlock-free Routing Algorithmsu Deadlocks may occur if the turns taken form a cycle

u Removing some turns can make the routing algorithm deadlock free

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 21

West-First Turn ModelXY Model

Page 22: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Deadlock-free Routing Algorithms

uAssign weights to bias which links used first (to ensure no cyclic dependence)

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 22

SA

DC

DA DB

SB SC

XY Routing: Always go X first, then Y

1 1

1 1

1 1

1 1

1 1

1 1

2

2

2

2

2

2

2

2

2

2

2

2

Mesh_XY

Page 23: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Deadlock-free Routing Algorithms

uAssign weights to bias which links used first (to ensure no cyclic dependence)

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 23

SA

DC

DA DB

SB SC

West-first Routing: Go W first, then N/S

2 2

2 2

2 2

1 1

1 1

1 1

2

2

2

2

2

2

2

2

2

2

2

2

Mesh_westfirst

Page 24: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Flow Controlu Virtual Channels

u Coherence Protocol requires certain number of virtual networks / message classes to avoid protocol deadlocksu This is the minimum number of VCs required

u Within each vnet, there can be more than one VC for boosting network performanceu In Garnet, only one packet can use a VC inside a router at a time

u VCs in vnets carrying control messages are 1-flit deep

u VCs in vnets carrying data (cacheline) messages are 4-5 flit deep

u Creditsu Each VC conveys its buffer availability by sending credits

to its upstream router

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 24

Page 25: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Conventional VC Router Microarch

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 25

Crossbar SwitchInput Buffers

VC 1VC 2

VC n

VA: VC AllocationInput VCs arbitrate for “output” VCs (Input VCs at next router)

SA: Switch AllocationInput ports arbitrate for output ports

Input Buffers

VC 0VC 1

VC n

BW VA ST LT

VC Allocator

SW Allocator

BW: Buffer Write

ST: Switch Traversal

LT: Link Traversal

RC: Route Compute

RC

Route Compute

SA BR

BR: Buffer Read

VN0

VN 1

FLIT

Page 26: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Single-Cycle Router Implementation

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 26

Router.ccàwakeup()

InputUnit.ccàwakeup()

* Buffer Write* Route Compute

OutputUnit.ccàwakeup()

* Rcv Credit* Update State

VirtualChannel.cc

OutVCState.cc

CrossbarSwitch.ccàwakeup()

* Switch Traversal: Push winner flits on link queue* Schedule output link wakeup for next cycle

* Arbitrate outports* VC Allocate* send credit

(schedule creditlinkwakeup next cycle)

* Buffer Read (send flit to switch)

SwitchAllocator.ccàwakeup()

has_free_vc()

select_free_vc() CreditLink.cc

NetworkLink.cc

NetworkLink.cc

CreditLink.cc

1 23

4

VC 1VN0

VC 2VC 3

VC 0

VN1

* Arbitrate inports

For multi-cycle router, add delay in flit

before it can do SA

Page 27: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Overviewu What are Networks-On-Chip (NoCs)?

u Why model NoCs accurately

u Garnet2.0u Configuration

u Topology

u Routing

u Flow-Control

u Router Microarchitecture

u System Integrationu Network Interface

u Network Parameters

u Running Garnet with Ruby

u Ruby Garnet Standalone

u Output Stats

u Extensions and FAQs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 27

Page 28: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

System Organization

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 28

RouterRouter

NetworkInterface

NetworkLink

CreditLink

GarnetNetwork.cc

Ruby Memory System

Coherence Protocol

Stats

Interface is “MessageBuffers

” from Ruby

Page 29: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Network Interface

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 29

Core + L1$

Core + L1$

Core + L1$

Core + L1$

L2$

L2$

L2$

L2$

Dir

Dir

DMA R

R

R

R

NI

NI

NI

NI

NI

NINI

NI

NINI

NI

Ingress

Cach

e Co

ntro

ller

To/From Router

Flitisize

Egress

VC Select

credit<VN, msg>

VC 0VC 1

VN0

VC 2VC 3

VN1

Msg_size decides number of flits

Can add additional info in flit

NetworkInterface.ccàwakeup()

Page 30: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Garnet Configurable Parametersu ni_flit_size

u Default = 16B (128b) à 1-flit ctrl, 5-flit datau This sets the bandwidth of each physical link

u vcs_per_vnetu Total VCs in each inport = num_vnets * vcs_per_vnet

u buffers_per_data_vcu Default = 4

u buffers_per_ctrl_vcu Default = 1

u routing_algorithmu Weight-based table or custom

u Defined in: src/mem/ruby/network/garnet2.0/GarnetNetwork.py

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 30

Page 31: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Command Line Parameters

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 31

garnet2.0/GarnetNetwork.py

BasicRouter.pyBasicLink.py

Network.py

src/mem/ruby/network/

Definitions and Default Values

configs/

- Overrides default values in Ruby- All parameters in in these .py files can be specified from command line

network/Network.py

ruby/Ruby.py common/Options.py

example/garnet_synth_test.py

Page 32: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Running Garnet with Ruby

uBuild Ruby Coherence Protocol

uProtocol determines number of message classes/virtual networks required

u Invoke garnet2.0 from command line with appropriate network parameters

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 32

scons build/X86_MOESI_hammer/gem5.opt PROTOCOL=MOESI_hammer

./build/X86_MOESI_hammer/gem5.opt configs/example/fs.py--network=garnet2.0 --topology=Mesh_XY ...

Page 33: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Running Garnet Standaloneu Build the Garnet_standalone protocol

u Dummy protocol just for traffic injection via Garnet Synthetic Traffic tester (next slide)

u 3 Virtual Networks: vnet 0 and vnet 1 inject ctrl (1-flit) packets, vnet 2 injects data (5-flit) packets

u Run user-specified synthetic traffic

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 33

scons build/NULL/gem5.debug PROTOCOL=Garnet_Standalone

./build/NULL/gem5.debug configs/example/garnet_synth_traffic.py--network=garnet2.0 --topology=... \--synthetic=uniform_random \--injectionrate=0.01 \--num-packets-max=100 \

Page 34: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Garnet Synthetic Trafficu Dummy CPU Model [only works with Garnet_standalone protocol]

u src/cpu/testers/garnet_synthetic_traffic

u Inject packets for user-specified traffic pattern at user-specified injection rate

u uniform_random

u tornado

u bit_complement

u bit_reverse

u bit_rotation

u neighbor

u shuffle

u transpose

u Ability to inject continuously or fixed number of packets, and/or for fixed time, and/or from fixed source and/or to fixed destination in one/more vnets

u Heavy customization very useful for debugging

u All parameters described here: http://www.gem5.org/Garnet_Synthetic_Traffic

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 34

Page 35: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Overviewu What are Networks-On-Chip (NoCs)?

u Why model NoCs accurately

u Garnet2.0u Configuration

u Topology

u Routing

u Flow-Control

u Router Microarchitecture

u System Integrationu Network Interface

u Network Parameters

u Running Garnet with Ruby

u Ruby Garnet Standalone

u Output Stats

u Extensions and FAQs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 35

Page 36: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Sample Simulation

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 36

./build/NULL/gem5.debug configs/example/garnet_synth_traffic.py--topology=Mesh_XY --num-cpus=16 --num-dirs=16 --mesh-rows=4

m5out/config.ini

Page 37: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Output Stats

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 37

m5out/stats.txt

Packet and Flit stats (#injected, #received, queueing latency at NIC, and

network latency) per VN and overall

More stats can be added via GarnetNetwork.cc

Page 38: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Overviewu What are Networks-On-Chip (NoCs)?

u Why model NoCs accurately

u Garnet2.0u Configuration

u Topology

u Routing

u Flow-Control

u Router Microarchitecture

u System Integrationu Network Interface

u Network Parameters

u Running Garnet with Ruby

u Ruby Garnet Standalone

u Output Stats

u Extensions and FAQs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 38

Page 39: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Extensions and FAQsu System Level Modeling

u How can I integrate Garnet into my own simulator (or with the gem5 classic memory system)?u This should not be too hard - would just require some changes to the NI code on how

it receives its inputs. If anyone wants to try it, I would love to give pointers.

u How do I print a network trace?u You can add some code to the NI. I have a patch on my website for reference.

u Can garnet read a network trace?u You can run Garnet in a standalone mode, and have it inject traffic from a trace

instead of a fixed synthetic pattern. Alternately, try to use cpu/testers/traffic_gen

u Does Garnet report NoC area and power numbers?u The output of Garnet can be fed into DSENT (Sun et al, NOCS 2012) which is present

in ext/dsent. We will add an automatic stats.txt parser for it soon.

u Can we model multiple clock domains?u The entire Ruby memory system (including Garnet inside it) is one clock domain. To

mimic a multi-clock domain design, you can schedule wakeups of slow routers intelligently at some multiples of the clock rather than every cycle.

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 39

Page 40: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Extensions and FAQsu Topology

u How do we model multiple BW links in Garnet?u Inherently, that would require support in the router to manage multiple flit

sizes. Instead, you can add multiple links between the same nodes if you want to model higher bandwidth. If they have the same weight, Garnet will randomly send over the two.

u Can we model a heterogeneous CPU-GPU system?u Yes. The current AMD GPU model models a cluster of CPUs connected to a GPU.

u Can we model indirect networks such as Clos?u Yes, there can be additional routers that are not connected to any controller

and act purely as switches.

u Can we model large-scale HPC networks?u Garnet can model any sized network. You can run 256-node standalone

simulations easily. However, beyond that gem5 cannot instantiate more directories (which it uses as destination nodes). I have a patch to run 1024-node synthetic sims on my website. But these run quite slowly.

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 40

Page 41: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Extensions and FAQsu Routing

u How do we model an adaptive routing algorithm?u If you want to use internal NoC metrics (such as number of credits at an

output port) for making routing decisions, do not use table-based routing. Instead, set the routing-algorithm to custom, and implement your own routing function inside RoutingUnit.cc. See outportComputeXY() for reference.

u Flow Controlu Can we implement alternate deadlock avoidance schemes (such as

escape VCs or dateline)?u You can update the vc selection scheme inside SwitchAllocator to control

which VCs get allocated.

u Microarchitectureu Can we model variable number of VCs in each router?

u Currently the codebase is very tied to the global vcs_per_vnet parameter. If you want to model variable number of VCs, one hack could be to have everyone instantiate the same number of VCs, but modify the VC select to never allocate certain VCs

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 41

Page 42: A Detailed On -Chip Network Model Inside a Full-System Simulator€¦ · Full-state Directory HyperTransport Token Coherence 0 0.2 0.4 0.6 0.8 1 1.2 Baseline NoC (Intel SCC) SMART

Conclusionsu Garnet2.0 is an open-source research vehicle

u Use it and contribute to it!u Being actively maintained by the following people

u Georgia Tech: Tushar Krishnau AMD Research: Brad Beckmann, Onur Kayiran, Jieming Yin, Matt

Porembasu If you have any questions, email on gem5-users or gem5-dev mailing lists

u Would love to have it integrated with the classic memory model

u Useful Resources:u www.gem5.org/Interconnection_Networku www.gem5.org/garnet2.0u http://synergy.ece.gatech.edu/garnet

September 11, 2017Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology 42

Thank you!