LECTURE 7: Interconnection Architectures
TRANSCRIPT
Charis Theocharides
HMY 656: Advanced Computer Architecture
Spring Semester 2007
Interconnect
• An interconnect consists of:
  – Medium.
  – Channels.
  – Nodes and switches.
• A host connects to the network through a node.
• Information is divided into transmission units, called packets.
The Ubiquitous Microchip
Sources: Sony, Philips, McLaren Mercedes, Apple, Airbus, Lexus, Toshiba
System-Level Design Issues
• System complexity: higher-level abstraction and specification; system-level reuse.
• System reliability: robustness to internal and external noise; self-sufficient recovery.
• System power consumption: energy-performance trade-offs.
• Integration of heterogeneous technologies: component variety; top-down planning; synchronization; interconnect.
Source: International Technology Roadmap for Semiconductors, June 2005
A Critical Bottleneck - Interconnect
Source: Gordon Moore, Chairman Emeritus, Intel Corp.
[Figure: Delay (psec, 0-300) versus technology generation (0.8 μm down to the deep-submicron regime): transistor/gate delay keeps falling while interconnect delay rises.]
The Billion Transistor Era
Intel Itanium 2 (codename Montecito): 1.7 BILLION transistors per die! (Photo by Intel)
• Feature sizes diminishing RAPIDLY into the nanometer regime.
• Transistor densities skyrocketing.
• Gate delays are scaling down.
• What about global wiring delays? As wire cross-sections decrease, resistance INCREASES!
• Interconnects are also an issue in terms of AREA, POWER, and RELIABILITY.
The INTERCONNECT can no longer be ignored!
Wiring Delays Keep Increasing Relentlessly!
[Figure: Delay for Metal 1 and global wiring versus feature size, 250nm down to 32nm (2005 ITRS); relative-delay curves for global wiring without repeaters, global wiring with repeaters, and gate delay.]
Global interconnect delays are NOT scaling like gate delays! (Photo by IBM)
System-on-Chip (SoC) Design Revolution
• Until now: physical components, ASIC design, system-on-board integration.
• Now/Future: design IP blocks, logical components, system-on-chip integration.
Drivers: increasing circuit complexity, IP re-use, platform-based design, on-chip interconnect scalability.
Features of current SoCs
• Reuse of design and test:
  – Hard cores: available as layouts or netlists.
  – Soft cores: available as synthesizable HDL code.
• SoC design: selection and specialization of cores.
  – Example: in a processor core you may have the option of selecting the number of registers.
• Standard interfaces.
• Plug'n'Play approach: plug the core into a "predefined" area and expect it to work.
On-Chip Interconnects
• Requirements for an on-chip interconnect
• Buses
• Switching networks: circuit-switching, packet-switching
• Comparison
• Immutability and uniqueness
Requirements: General goals
A reusable on-chip interconnect must:
• Support a modern, IP-block based methodology.
• Provide a pre-made/black-box/push-a-button product to the system designer.
• Provide a standard interface for access.
• Support a wide range of configuration parameters.
• Be effective and efficient!
Generic performance goals: latency and latency jitter; bandwidth; power; performance scalability.
Requirements
• Topology and protocol requirements: long connections must be asynchronous; architectural scalability; general purpose; programmable.
• Reliability requirements: performance guarantees; noise resistance; dynamic fault tolerance.
• Design environment requirements: early performance estimates must be possible; economic use of resources; implemented as an IP-block (or a set of them).
Buses: Definition (I)
A bus is an interconnection structure in which all connected hosts spatially share the communication mechanism.
Communication is broadcast, and multiplexed in time.
Hierarchical buses are an extension of this idea.
Buses: Definition (II)
Buses: Advantages vs. Disadvantages
Advantages:
• Economic.
• Simple communication mechanism.
• Simple priority (arbitration) implementation.
• Memory-mapped communication.
Disadvantages:
• Deficient scalability.
• Contention.
• Full functional test impossible.
• Lack of modularity.
Switching Networks: Definition (I)
The communication medium is divided into segments called links.
Connections between links are dynamically controlled by switches.
The network architecture defines the topology and routing scheme.
Switching Networks: Definition (II)
If a path between hosts is maintained throughout the transmission, we talk about circuit-switching.
Circuit-switching: Advantages and Disadvantages
Advantages:
• Resources decoupled.
• High accumulated bandwidth.
• Stable connection parameters.
Disadvantages:
• Blocking.
• Circuit set-up penalty.
Switching Networks: Definition (III)
If each packet is routed independently, we talk about packet-switching.
Packet-switching: Advantages and Disadvantages
Advantages:
• Alternate paths available for each packet: congestion avoidance, fault tolerance.
• Flexibility and programmability.
• Modularity.
Disadvantages:
• Routing and reordering.
• Header penalty.
• Nodes need buffering.
• Difficult QoS guarantees.
Comparison
The flaws of packet switching can be alleviated at design time, using software tools.
Immutability and Uniqueness: Definition
Once the on-chip interconnect is built on silicon, it is immutable.
A NoC is unique in the sense that it fits its target application's requirements better than any other application's; for those other applications, we would be able to instantiate a better NoC.
Immutability and Uniqueness: Applications
Because of immutability, NoCs can be optimized further than LANs or other macro-networks.
Uniqueness allows NoCs to take part in system-level tasks that are carried out with a certain level of knowledge about the system and the application running in it.
On-chip Interconnect Scalability
• Non-scalable: global wiring complexity.
• Shared-medium, bus-based architectures: segmented bus, hierarchical bus.
• Ring-based architectures: IBM Cell microprocessor (8 cores).
• Crossbar-based architectures: Sun UltraSPARC T1 (Niagara) (8 cores); Microsoft Xbox 360 CPU (by IBM) (3 cores).
• Point-to-point architectures.
Buses are becoming spaghetti
[Figure: CEVA-X1620 block diagram - core (data controller, program controller, TAG), DMA, L2 SRAM, internal data and program memories, GPIO, timers, PMU/ICU, user peripherals, and a tangle of buses: AHB master/slave bridges, peripheral APB and APB system control, ARM data bus, DMA data buses 1 and 2, core data and program buses, TDM interface, and an accelerator bus.]
Learning from FPGAs
• Universal logic blocks.
• Regular layout and interconnection resources.
• Programmability.
Enter the Network-on-Chip (NoC)!
• Replace global wires with a resource-constrained network.
• Structured interconnect layout.
• Electrical properties OPTIMIZED and WELL CONTROLLED.
NoCs are like IP blocks for wiring!
[Figure: regular grid of processing elements (PEs).]
Systems-on-Chip → Networks-on-Chip
[Figure: SoC with heterogeneous blocks (VGA core, ADC/DAC, analog, DSP, ALU core) interconnected as a NoC.]
What are Networks-on-Chip (NoC)?
Processing Elements (PEs) interconnected via a packet-based network.
[Figure: 3x3 grid of routers (R), each with a network interface controller (NIC), joined by b-bit links.]
Regular Network on Chip
[Figure: regular mesh of PEs, each attached to a router.]
Networks On Chip
Messages are packetized at the PE-network interface, routed to their destinations, and there de-packetized back into data.
[Figure: a message (MSG) packetized at the source, carried as packets across the network, and decoded at the destination.]
The NoC Paradigm Shift
[Figure: a bus replaced by a network of computing modules, network routers, and network links.]
• Architectural paradigm shift: replace the spaghetti by a customized network.
• Usage paradigm shift: pack everything in packets.
• Organizational paradigm shift: confiscate communications from logic designers; create a new discipline, a new back-end responsibility (already done for the power grid, clock grid, ...).
Why go there?
• Efficient sharing of wires.
• Lower area / lower power / faster operation.
• Shorter design time, lower design effort.
• Scalability.
NoC Customization
• Trim routers / ports / links.
• Place modules.
• Adjust link capacities.
Network Components
• Network interface: hardware located between each PE and its router; creates data/control packets for outgoing data; decodes incoming data/control packets.
• Network router/switch: receives packets and routes them based on the routing algorithm; a crossbar switch is used for switching; contains buffering capacity for switching; optional error control, QoS hardware, etc.
• Network links: physical channels between each router-router and router-PE pair; typically unidirectional; low-swing signals used for low power consumption.
Related Work in NoCs
• Architectural impact of NoCs:
  – Network on a Chip: An Architecture for Billion Transistor Era [Hemani, Jantsch, Kumar, Postula, Oberg, Millberg, Lindqvist - IEEE NorChip Conference 2000]
  – Route Packets, Not Wires: On-Chip Interconnection Networks [Dally, Towles - DAC 2001]
  – Networks on Chips: A New SoC Paradigm [Benini, De Micheli - IEEE Computer, January 2002]
  – The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs [Taylor et al. - IEEE Micro, March/April 2002]
• High performance:
  – Low-Latency Virtual-Channel Routers for On-Chip Networks [Mullins, West, Moore - ISCA 2004]
• Power-performance, temperature-performance:
  – Power-Driven Design of Router Microarchitectures in On-Chip Networks [Wang, Peh, Malik - MICRO 2003]
  – Thermal Modeling, Characterization and Management of On-Chip Networks [Shang, Peh, Kumar, Jha - MICRO 2004]
• Performance-reliability:
  – Networks-on-Chip: The Quest for On-Chip Fault-Tolerant Communication [Marculescu - ISVLSI 2003]
Micronetwork Control
A protocol stack is employed to effectively utilize the micronetwork architecture.
Abstraction into data link, network, and transport layers.
[Stack: Physical - Data link - Network - Transport - Application]
Data Link Layer
Requirement: to raise the reliability of the physical link up to a minimum required level.
• The physical layer is not sufficiently reliable; data is packetized.
• Performance vs. error-probability tradeoff, depending on packet size.
• Error control schemes: alternating-bit, go-back-N, and selective repeat.
Network Layer
Requirement: to implement end-to-end delivery control in network architectures with many communication channels.
• Switching algorithms: circuit, packet, and cut-through switching.
• Routing algorithms:
  – Deterministic routing - good for regular traffic patterns.
  – Adaptive routing - good for irregular traffic (the case for SoCs).
Transport Layer
Requirement: to provide reliable end-to-end services (e.g., TCP).
• Packetization at the source; resequencing and reassembling at the destination; flow control and negotiation.
• Deterministic approach: service-quality guarantees, with resource underutilization.
• Statistical approach: more efficient, but no quality guarantee.
Micronetwork Control
• Further work is needed to predict the tradeoff curves.
• The architecture and protocol can be tailored to the target system or applications.
• Impact of architecture and control design on communication energy consumption.
Commercial NoCs
• ST Microelectronics: STNoC, dubbed "Spidergon"; complex multimedia chips; proprietary topology.
• Philips Electronics NV: Æthereal NoC; quality-of-service; network connections configurable at run-time.
• Arteris SA: licensable NoC design tools; IP cores of NoC components.
Æthereal Network-on-Silicon
• Research in progress (Philips).
• IP cores are connected by the network.
• Packet-switched router network.
• Protocol stack-based design.
• Provides guaranteed services, which simplifies IP design and the composition of IPs.
Generic Router I/O and Architecture
[Figure: M×N router built around a routing decision unit.]
Generic NoC Router Architecture
[Figure: router datapath - an incoming flit passes through error detection/correction at the nth input port, the routing decision unit, virtual-channel registers and virtual-channel arbitration, then the crossbar switch with crossbar arbitration, and leaves through the nth output port; NACK/ACK signals from the next router drive retransmission registers in the forward flow, returning ACK/NACK or corrected data.]
A Conventional NoC Router
[Figure: 5-input, 5-output router. Each input port (from East, West, North, South, and the local PE) has virtual-channel buffers VC 0-VC 2 selected by a VC identifier; the control logic comprises the routing unit (RC), VC allocator (VA), and switch allocator (SA), feeding a 5x5 crossbar with outputs to East, West, North, South, and the PE.]
The Typical NoC Router Pipeline
[Figure: pipeline stages - routing, VC allocation (VC arbiter), switch allocation (SA arbiter), crossbar traversal; flits enter and leave through virtual channels VC 1..VC V.]
• L.S. Peh et al. (HPCA 2001): 3-stage pipeline.
• Look-ahead routing (ISCA 2006, DAC 2005): 2-stage pipeline.
• R. Mullins et al. (ISCA 2004): 1-stage pipeline.
Network-On-Chip Issues
• Power consumption: overhead power consumed in routers, network interfaces, and overhead data transmission/encoding (data such as addresses, control bits, etc.).
• Reliability: reliable data transmission is a necessary concept for any on-chip network; the network guarantees data transmission from PE A to PE B.
• Performance: network net throughput, defined as the rate of useful data that can be sent over the network; network utilization.
In general, similar to traditional networks!
Topologies
Different interconnection topologies tend to have different traffic patterns - the way nodes are connected in a network impacts latency, bandwidth, and traffic pattern.
• 2D mesh: low cost; some nodes connect to more neighbors than others; tends to generate "hot spots" in the center of the topology.
• 2D torus: lower message latency; a folded torus is used to avoid long wire delays; all nodes in a torus connect to the same number of neighbors; uniform traffic density.
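The latency claim above can be made concrete with a small sketch (function names are ours): average hop distance between distinct node pairs in a k×k mesh versus a k×k torus, where the torus lets each dimension wrap around.

```python
# Sketch: average hop count in a k x k 2D mesh vs. 2D torus, illustrating
# why a torus gives lower message latency. Names are illustrative only.

def mesh_hops(a, b, k):
    """Manhattan distance between nodes a=(x1,y1), b=(x2,y2) in a k x k mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def torus_hops(a, b, k):
    """Same, but each dimension may wrap, so per-dimension distance <= k // 2."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, k - dx) + min(dy, k - dy)

def average_hops(dist, k):
    """Average hop distance over all ordered pairs of distinct nodes."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(dist(a, b, k) for a, b in pairs) / len(pairs)

k = 4
print(average_hops(mesh_hops, k))   # mesh average hops
print(average_hops(torus_hops, k))  # torus average hops: strictly lower
```

For a 4x4 network the torus average comes out noticeably below the mesh average, matching the "lower message latency" bullet.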
Packets and Addressing
• Packet size; number of packets per message; both header and payload are part of a packet; packet length (explicit or implicit).
• Addressing scheme: e.g., a 6-bit encoding for at most an 8×8 array.
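As a minimal sketch of the 6-bit addressing example above, assume (our choice, not stated in the slides) that 3 bits encode the row and 3 bits the column of the 8×8 array:

```python
# Hypothetical 6-bit address encoding for an 8x8 array: 3 row bits, 3 column bits.

def encode_addr(row, col):
    """Pack (row, col) of an 8x8 array into a 6-bit address."""
    assert 0 <= row < 8 and 0 <= col < 8
    return (row << 3) | col

def decode_addr(addr):
    """Unpack a 6-bit address back into (row, col)."""
    return (addr >> 3) & 0x7, addr & 0x7

addr = encode_addr(5, 3)
print(f"{addr:06b}")      # 101011
print(decode_addr(addr))  # (5, 3)
```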
Switching Activity
• Virtual cut-through: a packet is forwarded as soon as the destination can accept it in its entirety; buffering requirements are pretty high.
• Store-and-forward: a packet is received in its entirety and then forwarded; again, high buffering requirements.
• Wormhole routing: packets are broken down into flits (the smallest bufferable chunk); flits are routed as soon as the destination can accept a single flit; much smaller buffering requirements; the preferred method of switching in NoCs today.
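The flit decomposition used by wormhole routing can be sketched as follows (our own simplification: a head flit carrying the destination, body flits with payload, and a tail flit that releases the reserved path):

```python
# Sketch of packet-to-flit decomposition for wormhole routing.
# The head/body/tail structure is standard; field layout here is illustrative.

def packetize(dest, payload, flit_bytes=4):
    """Split a payload (bytes) into typed flits of flit_bytes each."""
    chunks = [payload[i:i + flit_bytes]
              for i in range(0, len(payload), flit_bytes)] or [b""]
    flits = [("HEAD", dest)]                # head flit reserves the route
    flits += [("BODY", c) for c in chunks]  # body flits follow the head
    flits.append(("TAIL", None))            # tail flit frees the channels
    return flits

flits = packetize(dest=(2, 1), payload=b"hello world!")
print(len(flits))  # 1 head + 3 body + 1 tail = 5
```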
Routing Algorithms (Deterministic)
X-Y routing, hierarchical routing, hot-potato routing.
[Figure: example source (S) to destination (D) paths on a mesh for each algorithm.]
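X-Y routing, the simplest of the deterministic algorithms listed above, routes a packet fully along the X dimension first and then along Y. A minimal sketch (function and direction names are ours):

```python
# Sketch of deterministic X-Y routing on a 2D mesh: exhaust the X offset
# first, then the Y offset. Deadlock-free on a mesh because turns from Y
# back to X never occur.

def xy_route(src, dst):
    """Return the list of hops (E/W, then N/S) from src=(x,y) to dst=(x,y)."""
    hops = []
    x, y = src
    while x != dst[0]:                      # X dimension first
        step = 1 if dst[0] > x else -1
        hops.append("E" if step > 0 else "W")
        x += step
    while y != dst[1]:                      # then Y dimension
        step = 1 if dst[1] > y else -1
        hops.append("N" if step > 0 else "S")
        y += step
    return hops

print(xy_route((0, 0), (2, 1)))  # ['E', 'E', 'N']
```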
Wiring and Tiling
Crossbar Switches
[Figure: 5×5 crossbar with inputs and outputs E, W, N, S, PE, plus control.]
• Delay through a crossbar increases significantly with the number of ports.
• Places a limit on the connectivity of the network.
• Power-hungry operation.
Virtual Channels
[Figure: virtual channels VC #1..VC #n per input (N, S, E, W, PE) feeding a crossbar; a priority determination unit drives the virtual channel selection signal for each output link.]
• The virtual channel concept is used to provide QoS and deadlock avoidance.
• Buffer capacity is a limiting factor.
• VC hardware can be complex.
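One simple policy the priority determination unit above could implement is round-robin arbitration among the VCs competing for an output link. A sketch under that assumption (the slides do not specify the policy):

```python
# Sketch of round-robin arbitration among virtual channels competing for
# one output link. The policy choice is ours, purely illustrative.

def rr_arbiter(requests, last_grant):
    """Grant the first requesting VC after last_grant, wrapping around.
    requests: list of bools, one per VC. Returns the granted VC index or None."""
    n = len(requests)
    for offset in range(1, n + 1):
        vc = (last_grant + offset) % n
        if requests[vc]:
            return vc
    return None  # no VC requested this cycle

# VCs 0 and 2 request; VC 0 was granted last cycle, so VC 2 wins this one.
print(rr_arbiter([True, False, True], last_grant=0))  # 2
```

Round-robin keeps the hardware small (a priority rotator) while guaranteeing that no requesting VC starves.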
Virtualization
• Mapping of more than one node onto physical PE hardware.
• Allows a larger number of nodes on-chip; shared hardware.
• Configuration memory overhead at each PE.
[Figure: six logical nodes (B1-B4, C1-C2) mapped in pairs onto three router-attached PEs.]
Network Parameters
• Channel width.
• Flit size - in a combined GT-BE (guaranteed-throughput / best-effort) router, the flit size should be a multiple of the block size to avoid alignment problems.
• Number of channels - two nodes can have more than one channel between them.
• Buffer memory parameters - critical, since we cannot drop packets:
  – Flit buffer depth.
  – Flit buffer organization: shared between channels, or individual buffers per channel.
Hot Spots in NoC
Hot spot: a module that occasionally cannot digest all the traffic addressed to it.
• Results in temporary massive delay build-up; results in blocking the net!
• This is NOT congestion on the net - higher network capacity won't help.
• Examples: a port to off-chip DRAM; a shared resource on chip.
HotSpots in QNoC (cont'd)
When the HotSpot (HS) clogs, worms "get stuck" in the network and block other worms.
Two problems: performance and fairness.
[Figure: IP1-IP3 interfaces and the hot-spot IP's interface, with worms blocked in the network.]
HS Affects the System
The HS is not a local problem: traffic destined elsewhere suffers too!
The green packet experiences long delay even though it does NOT share any link with the HS traffic.
• Network performance: as HS module utilization grows, a large part of the system becomes clogged.
• Source (un)fairness: module location greatly affects QoS.
  – Example: at 90% utilization, a distant module experiences 10x the latency of a close one.
Simulation results for a 4x4 NoC with 10 Gbit/s links and a 6 Gbit/s HS module.
[Figure: 4x4 router grid (nodes 1-16) containing the HS module, with blocked output ports marked.]
Cooling down the Hot Spot
When the spot gets hot, block new packets to it.
This is prevention.
How? With credit-based allocation.
[Figure: IP1, IP3, IP4 and the hot-spot IP2 (HS) on the NoC; IP2's enhanced interface contains a scheduler and flow control.]
HotSpot Credit-Based Allocation
[Figure sequence: IP1, IP3, and IP4 communicate with the hot-spot IP2 (HS) through its enhanced interface (scheduler + flow control), which allocates credits; only credited packets toward the HS enter the NoC.]
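The credit-based idea above can be sketched as follows. This is our abstraction (class and method names are hypothetical): the hot spot's enhanced interface tracks outstanding credits, and a source may inject a packet toward the hot spot only while credits remain.

```python
# Sketch of credit-based allocation at a hot-spot interface: deny injection
# when the HS is saturated so blocked packets wait at the source, not in the net.

class CreditManager:
    """Tracks packets the hot-spot interface has admitted but not yet consumed."""
    def __init__(self, capacity):
        self.capacity = capacity  # packets the HS can absorb at once
        self.in_flight = 0

    def request_credit(self):
        """A source asks permission before sending; deny when the HS is full."""
        if self.in_flight < self.capacity:
            self.in_flight += 1
            return True
        return False              # packet stays at its source

    def packet_consumed(self):
        """The HS digested a packet; one credit is freed."""
        self.in_flight -= 1

hs = CreditManager(capacity=2)
print(hs.request_credit())  # True
print(hs.request_credit())  # True
print(hs.request_credit())  # False: HS saturated, new packets are blocked
hs.packet_consumed()
print(hs.request_credit())  # True again
```

This is prevention in the slide's sense: denied packets never enter the network, so they cannot clog links that other traffic needs.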
Power Consumption
Network power has been identified as a limitation - ~40% of total power!
Energy(flit) = [E(write buffer) + E(read buffer) + E(arbitration) + E(crossbar) + E(link)] × # of hops
             = [E(buffers) + E(arbitration) + E(crossbar) + E(link)] × # of hops
Energy(packet) = Energy(flit) × # flits/packet
Energy per packet depends on the number of flits per packet and the number of hops the packet travels through the network. The larger the network, the more hops. We need better routing algorithms, topologies, etc.
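Plugging example numbers into the energy model above (all values are made-up placeholders in picojoules, purely for illustration):

```python
# The slide's energy model with placeholder per-component energies (pJ).

def flit_energy(e_buffers, e_arbitration, e_crossbar, e_link, hops):
    """Energy(flit) = [E(buffers) + E(arbitration) + E(crossbar) + E(link)] * hops."""
    return (e_buffers + e_arbitration + e_crossbar + e_link) * hops

def packet_energy(e_flit, flits_per_packet):
    """Energy(packet) = Energy(flit) * # flits/packet."""
    return e_flit * flits_per_packet

e_flit = flit_energy(e_buffers=1.2, e_arbitration=0.3,
                     e_crossbar=0.8, e_link=1.7, hops=4)  # 4.0 pJ/hop * 4 = 16.0 pJ
print(packet_energy(e_flit, flits_per_packet=5))          # 80.0 pJ
```

The linear dependence on both hop count and flits per packet is exactly why the slide calls for better routing algorithms and topologies: fewer hops directly scale the whole sum down.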
NoC Reliability
• Shrinking feature sizes result in decreasing Vdd and Vt.
• Crosstalk, coupling noise, soft errors, and process variations affect reliability; reliability is a critical design issue.
• The communication protocol (in NoCs) requires error protection mechanisms.
• Error protection consumes energy and increases latency.
• Traditional macro-networks provide ideas on error detection/correction schemes.
• Error detection + data retransmission vs. error correction.
Recall: NoC Designs
[Figure: 4x4 grid of switches (S), each attached to a resource through a network interface (NI).]
• PE - switch(es) - PE communication.
• While we view one PE as a sender and another PE as a receiver, the same concept can be applied switch-to-switch.
• As such, there are multiple transmissions for each data packet.
• At what level should protection be applied?
Error Detection Schemes
End-to-end error detection:
• Parity check or CRC codes can be added to each packet/flit.
• A CRC/parity encoder is integrated into each sender NI.
• Packets are encoded, transmitted, and stored (for retransmission).
• The receiver NI checks for errors.
• The Ack/Nack signal to the sender is either piggybacked with a response packet or sent individually (the Open Core Protocol requires request-response transactions).
• A time-out mechanism is necessary.
• Sequence numbers are needed for each packet (for re-ordering and for identifying duplicate packets).
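The end-to-end scheme above can be sketched in software. This is our illustration, not the slides' implementation: the sender NI appends a sequence number and a CRC-32 (via Python's `zlib`; a real NI would use a small hardware CRC), and the receiver NI recomputes the CRC to decide between ACK and NACK.

```python
import zlib

# Sketch of end-to-end error detection: seq number + CRC-32 per packet.
# Packet layout (2-byte seq | payload | 4-byte CRC) is our own choice.

def ni_send(seq, payload):
    """Sender NI: build packet = seq | payload | CRC over both."""
    body = seq.to_bytes(2, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")

def ni_receive(packet):
    """Receiver NI: recompute CRC; return ('ACK', seq, payload) or ('NACK',)."""
    body, crc = packet[:-4], packet[-4:]
    if zlib.crc32(body) != int.from_bytes(crc, "big"):
        return ("NACK",)  # sender must retransmit its stored copy
    return ("ACK", int.from_bytes(body[:2], "big"), body[2:])

pkt = ni_send(seq=7, payload=b"flit data")
print(ni_receive(pkt))                                 # ('ACK', 7, b'flit data')
corrupted = pkt[:2] + bytes([pkt[2] ^ 0xFF]) + pkt[3:]  # flip one payload byte
print(ni_receive(corrupted))                           # ('NACK',)
```

The sequence number is what lets the receiver re-order packets and discard duplicates caused by a time-out firing just before a slow ACK arrives.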
End-to-End Retransmission
[Figure: sender NI (encoder, packet buffers) - switch A - network switch - switch B - receiver NI (decoder, queuing buffers); a credit signal flows back from receiver to sender.]
Error Detection Schemes
Switch-to-switch error detection: similar to end-to-end, only done at each switch. Can be done at the packet or flit level:
• Switch-to-switch flit with parity or CRC - each flit contains its own check bits.
• Switch-to-switch packet with parity or CRC - check bits are added to the tail flit.
Two sets of buffers are needed:
• Regular operation (queuing buffers).
• Storing packets not yet acknowledged by the receiver (for retransmission); capacity as with the queuing buffers (2NL+1 for flit level, and 2NL+f for packet level with f flits/packet).
The ACK/NACK can be a single wire now.
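Interpreting NL above as the product N·L from the slide's buffer-sizing expressions (our reading), the retransmission buffer capacities can be tabulated directly:

```python
# Sketch: retransmission buffer capacity per link, per the slide's formulas
# 2NL+1 (flit-level ACK) and 2NL+f (packet-level, f flits per packet).
# Treating NL as a single product term is our interpretation.

def retx_buffer_flits(nl, f=None):
    """Flits of retransmission buffering per link; f=None means flit level."""
    return 2 * nl + (1 if f is None else f)

print(retx_buffer_flits(nl=3))       # flit level:   2*3 + 1 = 7
print(retx_buffer_flits(nl=3, f=4))  # packet level: 2*3 + 4 = 10
```

The packet-level variant needs the extra f flits because an error is only detected at the tail flit, so a whole packet must remain replayable.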
Switch to Switch Retransmission
[Figure: sender NI (encoder, packet buffers) - switch A - switch - switch B - receiver NI (decoder); each hop has queuing + retransmission circular buffers, with ACK, data-valid, and credit signals exchanged between switches.]
Observations
For end-to-end and hybrid schemes:
• Power consumption is mainly in the buffers at the NI; the communication pattern can provide buffer requirement information.
• Increase in traffic (ACK/NACK packets) - merge multiple ACKs/NACKs into a single packet?
For switch-to-switch (packet/flit) schemes:
• Retransmission buffers are responsible for most of the power consumption; efficient buffer allocation based on application demands needs to be explored.
• Out-of-order arrival and duplicate rejection mechanisms are also necessary, and add power overhead.
• It is not efficient to block network traffic while waiting for a retransmitted packet.
Observations
• End-to-end is more efficient when links are long (multi-cycle links).
• Switch-to-switch is more efficient for short links and high hop counts (NI buffering becomes an issue).
• Low error rates result in similar performance for both; higher error rates favor the hybrid mechanism.
• End-to-end is a subset of the hybrid scheme, hence we can selectively disable the correction circuitry.
• Hierarchical networks: switch-based for local communication, end-based for global communication.
Case Study: A Neural Network
[Figure: face-detection network - an input image passes through lighting correction into 10x10 input receptive fields, a first hidden layer of 5x5 neurons, a second hidden layer of 5x20 neurons, and an output neuron answering "FACE?".]
From Spaghetti Wires to NoC!
• Traffic flow - regular!
• Some computation values are initialized during configuration.
• Data flows in and out through I/O (ingress/egress) nodes.
Research Ideas - Challenges
• Performance: throughput, bandwidth, frequency.
• Energy: reduce the number of hops and packets; optimizations on individual components.
• Reliability: transient errors (e.g., soft errors) occurring within a router; energy reduction via application-specific characteristics.
• What about the application space? Which applications benefit from NoC implementations, and in which cases is the NoC overhead a negative factor?
Vision and multimedia applications are huge beneficiaries!
3D Chip Design
New challenges = new opportunities.
How about the third dimension?
3D Stacking = Increased Locality!
Many more neighbors within close reach!
• Multiple layers of active devices.
• Vertical interconnects between layers.
[Figure: 2D chip (a single device layer on silicon) vs. 3D chip (device layers 1 and 2 connected by vertical interconnect). Courtesy: K. Bernstein, IBM]
Chip-Level 3D Integration
• Reduced global interconnect length.
• Delay/power reduction.
• Bandwidth increase.
• Smaller footprint.
• Mixed-technology integration.
3D Benefit: Increased Locality
[Figure: 2D vs. 3D vicinity of a CPU node - nodes reachable within 1, 2, and 3 hops; bus-based inter-layer communication via a dTDMA bus pillar.]
3-D Networks on Chip
To NoC or not to NoC?
Adopting just any network feature for a NoC may be a mistake:
• You can create an elegant regular topology - but ASICs are irregular.
• You can create a non-blocking network - but hot spots can block networks of infinite capacity.
• You can guarantee service (it's easy to verify) - but it is extremely hard to configure; best effort is simpler.
• You can use lots of buffers - and dissipate lots of power.
• You can create complex routing - but fixed, simple single-path routing saves energy and area.
• You can try to balance traffic - but single-path routing works better with links of uneven capacity.
• You can make packets conflict with each other - better to use priority levels and pre-emption.