amacprotocolfor reliablebroadcastcommunica7onsin...
TRANSCRIPT
A MAC protocol for Reliable Broadcast Communica7ons in
Wireless Network-‐on-‐Chip
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Sergi Abadal ([email protected]) Albert Mestres, Josep Torrellas, Eduard Alarcón, and Albert Cabellos-‐Aparicio
UPC and UIUC
Context
11/28/16 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016 2
Current trends are leading to larger manycores Wireless on-‐chip communica<on holds promise for the implementa<on of fast networks for these mul<processors In complement of a wired NoC, wireless provides Low latency Natural broadcast capabili<es Flexibility
Mo<va<on
11/28/16 3
As the core density increases, more wireless interfaces can be expected on chip Need for arbitra<on strategies (MAC protocols)
That provide low latency That support broadcast traffic That are reliable (packets cannot be lost) That scale with number of wireless nodes
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Contribu<on: BRS-‐MAC
11/28/16 4
We propose a BRS-‐MAC, a protocol based on three pillars: Broadcast, Reliability, Sensing We develop analy<cal models to explore the performance of BRS-‐MAC We compare the obtained performance with that of token passing and a wired mesh
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Emerging Interconnect Technologies
Nanophotonics Vantrease et al, 2008
Transmission Line Interconnects Oh et al, 2013
Wireless on-‐chip Communica<on
Transceiver (transmi8er and receiver circuits)
On-‐chip Antenna
Core
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
On-‐chip Wireless Communica<on
11/28/16 6
PROS Inherently broadcast Low latency Simplicity / Flexibility / Non-‐intrusiveness
CONS Less energy efficient than TL or photonics Low bandwidth
S. Abadal et al, “Broadcast-‐Enabled Massive Mul7core Architectures: A Wireless RF Approach,” IEEE MICRO, vol. 35, no. 5, pp. 52–61, 2015.
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Wired-‐Wireless Network-‐on-‐Chip
Hybrid Network Interface (HNIF)
Controller
eNIF
wNIF
Transceiver
MAC PHY CORE
S +
MEM
ORY
Antenna
Router
Wired-‐Wireless Network-‐on-‐Chip
Hybrid Network Interface (HNIF)
Controller
eNIF
wNIF
Transceiver
MAC PHY CORE
S +
MEM
ORY
Antenna
Router
Medium Access Control (MAC)
11/28/16 9
The MAC layer defines mechanisms to ensure that all nodes can access to a shared medium in a reliable manner Simultaneous accesses to the same channel collide and need to retry
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
t
t
Core 1
Core 2
MAC Context Analysis
11/28/16 10 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Chip scenario Physically constrained – need for lightweight MAC Unlike most environments, the chip scenario is staEc and known beforehand
“Everyone sees everything” Easy to reach consensus Collisions can be detected Simpler protocols
MAC Context Analysis
11/28/16 11 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Applica<on Requirements Low latency – need for fast solu<on Reliability – MAC cannot lose packets Scalability – protocol must scale to many cores
Traffic Characteris<cs Broadcast is the objec<ve of our WNoC* Variable – flexibility is desirable
*S. Abadal, E. Alarcón, A. Cabellos-‐Aparicio, and J. Torrellas, “WiSync: An Architecture for Fast Synchroniza7on through On-‐Chip Wireless Communica7on,” in Proceedings of the ASPLOS ’16, April 2016.
MAC in Wireless NoC
11/28/16 12 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
MAC in Wireless NoC
Channeliza<on Low latency, reliability due to mul<ple channels Inherently rigid, hard to implement broadcast Does not scale è becomes area/power hungry
Coordinated Access (token passing) High throughput due to the absence of collisions S<ll somehow rigid Scaling problems è protocol becomes slow
Random access?
S. Deb, A. Ganguly, P. P. Pande, B. Belzer, and D. Heo, “Wireless NoC as Interconnec7on Backbone for Mul7core Chips: Promises and Challenges,” IEEE J. Emerg. Sel. Top. Circuits Syst., vol. 2, no. 2, pp. 228–239, 2012.
BRS-‐MAC protocol
11/28/16 14
For all this, we propose BRS-‐MAC Broadcast (B): designed to serve broadcast fast Reliability (R): guarantee correct delivery Sensing (S): based on carrier sensing and collision detec<on principles
Seeks to provide the low latency, scalability, and flexibility desired via random access
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
BRS-‐MAC – Basic Assump<ons
11/28/16 15
Nodes can sense the medium In the event of a collision, at least one of the receivers can detect such situa<on and no<fy the transminers At least one collision no<fica<on will reach all the colliding transminers
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
BRS-‐MAC – Algorithm
11/28/16 16
TX: Check if the channel is free. If so, transmit a preamble. Then, listen for a Nega<ve ACK: If there is a NACK, there was a collision. Back off (exponen<al) and retransmit If silence, preamble was OK. Con<nue with the rest of the transmission (cannot collide)
1 N 2 3 4
1 N
1 N Core 1
Core 2 1 N 2 3 4
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
BRS-‐MAC – Algorithm
11/28/16 17
RX: listen for new transmissions. When a preamble is received, check for errors, then If errors, transmit a NACK. If no errors, stay silent.
1 N 2 3 4
1 N
1 N Core 1
Core 2 1 N 2 3 4
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Core 3
NAC
K
BRS-‐MAC – Key Decisions
11/28/16 18
There are several key design decisions Collision detec<on at the RX cannot normally be assumed, but here we have a sta<c scenario Preamble contains the size of packet, receivers can calculate the transmission dura<on NACK mechanism is chosen because
Success will be more probable than collision Only 1 NACK is needed to cancel a colliding tx N-‐1 ACKs are needed to confirm a successful tx
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Analy<cal Modeling – Outline
11/28/16 19 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
BRS-‐MAC Load (G)
Throughput (S)
Delay (D)
L. Kleinrock and F. Tobagi, “Packet Switching in Radio Channels: Part I-‐-‐Carrier Sense Mul7ple-‐Access Modes and Their Throughput-‐Delay Characteris7cs,” IEEE Trans. Commun., vol. 23, no. 12, pp. 1400–1416, 1975.
Analy<cal Modeling – Equa<ons
11/28/16 20 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Throughput: 𝑆= 𝐸{𝑈}/𝐸{𝐵}+𝐸{𝐼} U – occupancy of the channel (successful) B + I – <me between transmissions (busy+idle)
Delay: 𝐷= 𝑁↓𝑟𝑒 𝑅+ 𝑁↓𝑐 𝑇↓𝐾𝑂 + 𝑇↓𝑂𝐾 Nre, Nc – number of retransmissions, collisions R – dura<on of the backoff periods TOK, TKO – <me lost in a collision, delay of successful transmission
Modeling Approaches
11/28/16 21 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Classical (worst-‐case) Posi<on of nodes is unknown, they can move Assume a constant propaga<on <me
Network-‐on-‐Chip (exact) Posi<ons are known a priori Propaga<on <me can be calculated exactly Homogeneous deployment is assumed
Analy<cal Modeling
11/28/16 22 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Necessary assump<ons (Kleinrock et al) All arrivals follow Poisson process Traffic uniformly distributed among infinite popula<on (reasonable in manycores) Negligible <me to switch between TX and RX Negligible <me to sense the channel busy
L. Kleinrock and F. Tobagi, “Packet Switching in Radio Channels: Part I-‐-‐Carrier Sense Mul7ple-‐Access Modes and Their Throughput-‐Delay Characteris7cs,” IEEE Trans. Commun., vol. 23, no. 12, pp. 1400–1416, 1975.
Closed-‐form expressions
11/28/16 23 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
𝑆= 𝑒↑−𝑎𝐺 /𝑒↑−𝑎𝐺 (1−𝑏)+𝑏+2𝑎+1/𝐺
𝑆↑𝑒 ≈1−𝐺𝛼𝑎/1+(2+𝛼)𝑎 −(1−𝑏)𝐺𝛼𝑎+1/𝐺
𝑁↓𝑐 = 𝑎+1/𝐺/𝐸{𝐵}+𝐸{𝐼} 𝐺/𝑆 −1
𝑁↓𝑐↑𝑒 = 𝛼𝑎+1/𝐺/𝐸{𝐵↑𝑒 }+𝐸{𝐼} 𝐺/𝑆↑𝑒 −1
THROUGHPUT
DELAY worst-‐case
exact
Closed-‐form expressions
11/28/16 24 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
𝑆= 𝑒↑−𝒂𝐺 /𝑒↑−𝒂𝐺 (1−𝑏)+𝑏+2𝒂+1/𝐺
𝑆↑𝑒 ≈1−𝐺𝜶𝒂/1+(2+𝜶)𝒂 −(1−𝑏)𝐺𝜶𝒂+1/𝐺
𝑁↓𝑐 = 𝒂+1/𝐺/𝐸{𝐵}+𝐸{𝐼} 𝐺/𝑆 −1
𝑁↓𝑐↑𝑒 = 𝜶𝒂+1/𝐺/𝐸{𝐵↑𝑒 }+𝐸{𝐼} 𝐺/𝑆↑𝑒 −1
Worst-‐case propaga<on <me appears in all expressions.
Parameter rela<ng average exact propaga<on with worst-‐case propaga<on <me
Closed-‐form expressions
11/28/16 25 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
𝑆= 𝑒↑−𝑎𝐺 /𝑒↑−𝑎𝐺 (1−𝒃)+𝒃+2𝑎+1/𝐺
𝑆↑𝑒 ≈1−𝐺𝛼𝑎/1+(2+𝛼)𝑎 −(1−𝒃)𝐺𝛼𝑎+1/𝐺
𝑁↓𝑐 = 𝑎+1/𝐺/𝐸{𝐵}+𝐸{𝐼} 𝐺/𝑆 −1
𝑁↓𝑐↑𝑒 = 𝛼𝑎+1/𝐺/𝐸{𝐵↑𝑒 }+𝐸{𝐼} 𝐺/𝑆↑𝑒 −1
Length of the preamble and NACK period.
Evalua<on Environment
11/28/16 26
Implemented BRS-‐MAC within event-‐based simulator to validate models Propaga<on <mes modeled accurately Overlapping transmissions è collision
As baseline, we implemented classic CSMA with ideal acknowledging Explored impact of a and b parameters
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
L. Kleinrock and F. Tobagi, “Packet Switching in Radio Channels: Part I-‐-‐Carrier Sense Mul7ple-‐Access Modes and Their Throughput-‐Delay Characteris7cs,” IEEE Trans. Commun., vol. 23, no. 12, pp. 1400–1416, 1975.
Valida<on of the model
11/28/16 27 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Impact of propaga<on <me
11/28/16 28 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Impact of preamble length
11/28/16 29 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Performance Comparison
11/28/16 30
Wireless Token Passing Passing the token takes 1 clock cycle Cannot be overlapped with data transmission
Aggressive Wired Mesh Each hop takes 2 clock cycles without conten<on Tree mul<cast Mul<port alloca<on and mul<cast crossbar
Same per-‐link bandwidth in all cases 100% broadcast traffic
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Performance Comparison
11/28/16 31 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
a = 0.1 b = 0.1
Performance Comparison
11/28/16 32 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
Latency increases with number of nodes due to increasing token round-‐trip <me (TOKEN) or network diameter (MESH)
Performance Comparison
11/28/16 33 NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
BRS-‐MAC has the best latency of all op<ons BRS-‐MAC has a reasonable
satura<on throughput
Conclusions
11/28/16 34
We presented BRS-‐MAC, a random access protocol for Wireless Network-‐on-‐Chip We accurately modeled and explored its performance BRS-‐MAC achieves much lower latency than token passing or mesh networks, with reasonable satura<on throughput
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016
The end
11/28/16 35
Thanks for your a8enEon. QuesEons?
NoCArc ’16 @ MICRO-‐49 – Taipei, Taiwan – October 15, 2016