EE 587 SoC Design & Test, Partha Pande, School of EECS, Washington State University...
SoC Physical Design Issues
Interconnect Architectures and Signal Integrity
Design Challenges
1. Non-scalable global wire delay
2. Moving signals across a large die within one clock cycle is not possible.
3. Current interconnect architecture: buses are inherently non-scalable.
4. Transmission of digital signals along wires is not reliable.
Bus Non-Scalability
Clock cycle depends on the parasitics and the bus length
Multiple bus segments
• More than one design iteration
• Converges to a network
Bus Architectures
Split Bus Architecture
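$$E = 0.5\,V^2 s_w \left[ \sum_{M_i, M_j \in BUS_1} \mathrm{xfer}(M_i, M_j)\, C_{BUS_1} + \sum_{M_i, M_j \in BUS_2} \mathrm{xfer}(M_i, M_j)\, C_{BUS_2} + \sum_{M_i \in BUS_1,\, M_j \in BUS_2} \mathrm{xfer}(M_i, M_j)\, \left(C_{BUS_1} + C_{BUS_2}\right) \right]$$
where $V$ is the supply voltage, $s_w$ the switching activity, $\mathrm{xfer}(M_i, M_j)$ the number of transfers between modules $M_i$ and $M_j$, and $C_{BUS_1}$, $C_{BUS_2}$ the capacitances of the two bus segments.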
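The split-bus energy formula can be evaluated directly. The sketch below is a minimal illustration: the functions `split_bus_energy` and `single_bus_energy`, the traffic table and the capacitances are all hypothetical, each `xfer` entry is assumed to count the transfers between one pair of modules, and the unsplit bus is assumed to present $C_{BUS_1} + C_{BUS_2}$ to every transfer.

```python
def split_bus_energy(xfer, segment_of, c_bus, vdd, s_w):
    """Evaluate E = 0.5 * V^2 * s_w * [...] for a bus split into segments 1 and 2.

    xfer       : {(mi, mj): number of transfers between modules mi and mj}
    segment_of : {module: 1 or 2}, the bus segment each module is attached to
    c_bus      : {1: C_BUS1, 2: C_BUS2}, the segment capacitances
    """
    weighted_c = 0.0
    for (mi, mj), count in xfer.items():
        si, sj = segment_of[mi], segment_of[mj]
        if si == sj:
            # Both modules on the same segment: only that segment is charged.
            weighted_c += count * c_bus[si]
        else:
            # The transfer crosses the split: both segments are charged.
            weighted_c += count * (c_bus[1] + c_bus[2])
    return 0.5 * vdd ** 2 * s_w * weighted_c


def single_bus_energy(xfer, c_total, vdd, s_w):
    """Same traffic on one unsplit bus: every transfer charges the whole bus."""
    return 0.5 * vdd ** 2 * s_w * c_total * sum(xfer.values())


# Hypothetical example: A and B on segment 1, C and D on segment 2.
xfer = {("A", "B"): 100, ("C", "D"): 80, ("A", "C"): 10}
segment_of = {"A": 1, "B": 1, "C": 2, "D": 2}
c_bus = {1: 2.0, 2: 3.0}  # arbitrary capacitance units

e_split = split_bus_energy(xfer, segment_of, c_bus, vdd=1.0, s_w=0.5)
e_single = single_bus_energy(xfer, c_bus[1] + c_bus[2], vdd=1.0, s_w=0.5)
print(e_split, e_single)  # 122.5 237.5
```

Because most of this made-up traffic stays local to a segment, splitting roughly halves the energy; heavy cross-segment traffic would erode that benefit.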
Achievable Clock Cycle in a Bus segment
Minimize Power Consumption
Modification of interconnect architectures
Incorporate parallelism (ITRS 2003 & ISSCC 2004)
Decoupling of communication and processing
Modular architecture
Minimize use of global wires
Locality in communication
SoC Micro architecture Trend
Blocks of 50-100K gates: no global wire delay problem
Block-based hierarchical design style that uses block sizes of 50-100K gates
Single synchronous clock regions will span only a small fraction of the chip area
Different self-synchronous IPs communicate via network-oriented protocols
Structured network wiring leads to deterministic electrical parameters, which reduces latency and increases bandwidth
Failures due to the inherently unreliable physical medium can be addressed by introducing error correction mechanisms
New design paradigm
New designs: very large number of functional blocks
Moving bits around efficiently
• Develop on-chip infrastructure to solve future inter-block communication bottlenecks
Development of infrastructure IPs
• SoC = (SFIP + SI2P)
Silicon Backplane
MIPS SoC-it
The network-on-chip paradigm
Driven by
Increased levels of integration
Complexity of large SoCs
– New designs containing hundreds of IP blocks
Need for platform-based design methodologies
DSM constraints (power, delay, time-to-market, etc.)
Decoupling of functionality from communication
Dedicated infrastructure for data transport
[Figure: a bus-based SoC (high-bandwidth memory interface, high-performance ARM processor, high-bandwidth ARM processor, DMA bus master, bridge, UART, PIO/keypad, timer; AHB and APB buses) and the corresponding NoC infrastructure built from switches and links]
NoC Features
Some Common Architectures
(a) Mesh, (b) Folded Torus (FT), and (c) Butterfly Fat Tree (BFT)
[Figure: the three topologies; legend: functional IP, switch]
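One simple way to compare the mesh and the folded torus is the minimal number of switch-to-switch hops. The sketch below assumes an n x n grid of switches, minimal routing, and treats the folded torus logically as a torus (folding only changes the physical wire layout). The BFT is not modeled, and the function names are hypothetical.

```python
def mesh_hops(src, dst, n):
    """Minimal hop count between switches (x, y) in an n x n mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def torus_hops(src, dst, n):
    """Minimal hop count in an n x n (folded) torus: each dimension can wrap around."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, n - dx) + min(dy, n - dy)


def average_hops(n, hops):
    """Average minimal hop count over all ordered pairs of distinct switches."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    dists = [hops(a, b, n) for a in nodes for b in nodes if a != b]
    return sum(dists) / len(dists)


print(average_hops(4, mesh_hops))   # about 2.667
print(average_hops(4, torus_hops))  # about 2.133
```

The wraparound links shorten the worst-case and average paths, at the cost of longer physical wires, which is why the torus is folded in layout.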
Data Transmission
Packet-based communication
Low memory requirement
Packet switching
Wormhole routing
Packets are broken down into flow control units, or flits, which are then routed in a pipelined fashion
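The flit decomposition used by wormhole routing can be sketched as follows. This assumes a common head/body/tail format in which only the head flit carries routing information, a hypothetical 32-bit flit width, and a contention-free (zero-load) latency model in which the remaining flits follow the head one per cycle. None of these names or parameters come from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple

FLIT_BITS = 32  # hypothetical flit width


@dataclass
class Flit:
    kind: str         # "head", "body" or "tail"
    payload: int = 0  # FLIT_BITS-wide slice of the packet (unused in the head flit here)
    dest: int = -1    # routing information, carried only by the head flit


def packetize(data: int, data_bits: int, dest: int) -> List[Flit]:
    """Split a packet (data_bits >= 1) into a head flit, body flits and a tail flit."""
    n = -(-data_bits // FLIT_BITS)  # ceiling division: number of payload flits
    mask = (1 << FLIT_BITS) - 1
    chunks = [(data >> (i * FLIT_BITS)) & mask for i in range(n)]
    flits = [Flit("head", dest=dest)]
    flits += [Flit("body", c) for c in chunks[:-1]]
    flits.append(Flit("tail", chunks[-1]))
    return flits


def reassemble(flits: List[Flit]) -> Tuple[int, int]:
    """Inverse of packetize: returns (dest, data)."""
    assert flits[0].kind == "head" and flits[-1].kind == "tail"
    data = 0
    for i, f in enumerate(flits[1:]):
        data |= f.payload << (i * FLIT_BITS)
    return flits[0].dest, data


def zero_load_latency(hops: int, n_flits: int, cycles_per_hop: int = 1) -> int:
    """Wormhole latency without contention: the head flit needs hops * cycles_per_hop
    cycles, and the remaining flits arrive one per cycle behind it (pipelined)."""
    return hops * cycles_per_hop + (n_flits - 1)


pkt = 0x11223344556677889900  # an 80-bit packet
flits = packetize(pkt, 80, dest=5)
print([f.kind for f in flits], zero_load_latency(3, len(flits)))
```

Since the flits are pipelined, the packet length adds to latency only once rather than once per hop, and each switch needs to buffer only a few flits instead of a whole packet.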
Connecting Different IP Blocks Using Tree Architecture
Communication Pipelining
• The delay of each pipeline stage must be constrained to within 15 FO4
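The 15 FO4 constraint fixes how many pipeline stages a long path needs. The sketch below is a back-of-the-envelope estimate: the helper `pipeline_stages` and the numbers in the example are hypothetical, the path is assumed to be divisible evenly between stages, and register overhead is ignored.

```python
import math


def pipeline_stages(path_delay_ps: float, fo4_ps: float, budget_fo4: float = 15.0) -> int:
    """Minimum number of stages so that no stage exceeds budget_fo4 FO4 delays.

    Simplifying assumptions: the path can be cut anywhere (so the delay divides
    evenly), and register setup/clock-to-q overhead is ignored.
    """
    stage_budget_ps = budget_fo4 * fo4_ps
    return max(1, math.ceil(path_delay_ps / stage_budget_ps))


# Hypothetical numbers: a 1000 ps inter-switch path with a 30 ps FO4 delay.
print(pipeline_stages(1000.0, 30.0))  # 3 stages of at most 450 ps each
```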
Signal Integrity
According to the ITRS, signal integrity will become a major issue in future technologies
Causes of this inherent unreliability:
Shrinking geometries and layout dimensions
Reduction in the charge used for storing bits
Increased probability of transient events like:
Crosstalk
Ground Bounce
Alpha particle hits
Micro network Protocol Stack
On-Chip Signal Transmission
Future global wires will function as lossy transmission lines
Reduced-swing signaling
Noise due to crosstalk, electromagnetic interference, and other factors will have increased impact
It will not be possible to abstract the physical layer of on-chip networks as a fully reliable, fixed-delay channel
At the micronetwork stack layers atop the physical layer, noise is a source of local transient malfunctions
Coding Schemes
Low-Power Coding
Reducing self-transition activity
Crosstalk Avoidance Coding
Reducing coupling with adjacent lines
Error Control Coding
SEC, SECDED
Low Power Coding
Reduction of self-transition activity
Bus-Invert Code: if the current data word differs from the previously transmitted word in more than half of its bits, the word is inverted and an invert bit is sent to the decoder
Effectiveness decreases as bus width increases
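Bus-invert coding can be sketched in a few lines. This is a minimal version that assumes the bus and invert line both start at 0 and, as a simplification, bases the invert decision only on the data lines. The function names are hypothetical.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")


def bus_invert_encode(words, width):
    """Bus-invert coding: returns the (bus_value, invert_bit) pairs actually driven."""
    mask = (1 << width) - 1
    prev_bus = 0  # assume the data lines start at all zeros
    out = []
    for w in words:
        if popcount((w ^ prev_bus) & mask) > width / 2:
            bus, inv = w ^ mask, 1  # more than half the lines would toggle: invert
        else:
            bus, inv = w & mask, 0
        out.append((bus, inv))
        prev_bus = bus
    return out


def bus_invert_decode(driven, width):
    mask = (1 << width) - 1
    return [bus ^ mask if inv else bus for bus, inv in driven]


def transitions(driven):
    """Total toggles on the data lines plus the invert line (both start at 0)."""
    prev_bus, prev_inv, total = 0, 0, 0
    for bus, inv in driven:
        total += popcount(bus ^ prev_bus) + (inv ^ prev_inv)
        prev_bus, prev_inv = bus, inv
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
coded = bus_invert_encode(words, width=8)
print(transitions([(w, 0) for w in words]), transitions(coded))  # 24 3
```

The example is the best case. For random data on a wide bus, the Hamming distance between consecutive words clusters around half the width, so the invert decision saves less. This is why effectiveness drops as the bus widens.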
Error Control Coding
Linear block codes: in an (n, k) linear block code, a data block k bits long is mapped onto an n-bit code word
Forward Error Correction (FEC) or Automatic Repeat Request (ARQ)
Redundant wires
Possibility of voltage reduction
Energy efficiency is an important criterion
Codec overhead
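As a concrete (n, k) linear block code, the sketch below implements a textbook Hamming single-error-correcting encoder and syndrome decoder. It places parity bits at power-of-two positions, which is one common arrangement and is assumed here. For k = 4 it yields the (7, 4) code; for k = 32 it yields 38 bits. Function names are hypothetical.

```python
def hamming_encode(data):
    """Hamming single-error-correcting (SEC) encoder.

    data: list of 0/1 data bits. Parity bits sit at the power-of-two positions
    1, 2, 4, ... of the code word; data bits fill the remaining positions.
    """
    k = len(data)
    r = 0
    while (1 << r) < k + r + 1:  # smallest r with 2^r >= k + r + 1
        r += 1
    n = k + r
    code = [0] * (n + 1)         # index 0 unused; positions 1..n
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):      # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        # Parity bit p makes the number of 1s among positions with bit i set even.
        code[p] = sum(code[q] for q in range(1, n + 1) if q & p) % 2
    return code[1:]


def syndrome(code):
    """XOR of the (1-based) positions holding a 1; 0 for a valid code word."""
    s = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            s ^= pos
    return s


def hamming_decode(code):
    """Correct at most one flipped bit, then strip the parity positions."""
    code = list(code)
    s = syndrome(code)
    if 0 < s <= len(code):
        code[s - 1] ^= 1         # the syndrome is the position of the error
    return [b for pos, b in enumerate(code, start=1) if pos & (pos - 1)]


data = [1, 0, 1, 1]
word = hamming_encode(data)      # (7, 4) code: 3 redundant wires
print(len(word), len(hamming_encode([0] * 32)))  # 7 38
```

The redundant wires and the XOR trees in the encoder and decoder are the codec overhead that has to be weighed against any supply-voltage reduction that the added error resilience permits.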
Worst Case Crosstalk
Transition from 101 to 010 pattern or vice versa
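Due to the Miller effect, the worst-case capacitance of the victim (middle) wire becomes $(1+4\lambda)C_L$, where $C_L$ is the load capacitance of the wire and $\lambda$ is the ratio of the coupling capacitance to $C_L$
[Figure: victim wire between aggressor wires 1 and 2 switching in the opposite direction, with victim and aggressor rise times marked]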
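The $(1+4\lambda)C_L$ worst case can be reproduced with a common first-order coupling model. Each switching wire sees $\lambda C_L$ multiplied by $|d_i - d_j|$ from each neighbour $j$, where $d$ is $+1$ for a rise, $-1$ for a fall and $0$ for no change. The sketch below assumes that model and uses hypothetical function names. It also shows why placing two copies of each bit on adjacent wires, as the joint codes do, caps the multiplier at 2.

```python
from itertools import product


def coupling_factor(prev, curr):
    """Worst coupling multiplier m over the switching wires of a bus transition,
    so that C_eff = (1 + m * lambda) * C_L for the worst wire.

    Model: for wire i with transition d_i in {+1, 0, -1},
    m_i = sum over adjacent wires j of |d_i - d_j|.
    """
    d = [c - p for p, c in zip(prev, curr)]
    worst = 0
    for i, di in enumerate(d):
        if di == 0:
            continue  # a quiet wire has no transition delay to degrade
        m = sum(abs(di - d[j]) for j in (i - 1, i + 1) if 0 <= j < len(d))
        worst = max(worst, m)
    return worst


def duplicate(bits):
    """Place two copies of each bit on adjacent wires."""
    return [b for b in bits for _ in range(2)]


def worst_over_all(n_bits, mapping):
    pats = list(product([0, 1], repeat=n_bits))
    return max(coupling_factor(mapping(list(a)), mapping(list(b)))
               for a in pats for b in pats)


print(coupling_factor([1, 0, 1], [0, 1, 0]))  # 4 -> (1 + 4*lambda) * C_L
print(worst_over_all(3, lambda x: x))         # 4 for an uncoded 3-bit bus
print(worst_over_all(3, duplicate))           # 2 -> (1 + 2*lambda) * C_L
```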
Joint Crosstalk Avoidance and Single Error Correction Codes
Reduce crosstalk as well as correct errors due to other transient events
Duplicate Add Parity (DAP)
Dual Rail Code (DR)
Boundary Shift Code (BSC)
Modified Dual Rail Code (MDR)
Worst-case crosstalk capacitance is reduced to $(1+2\lambda)C_L$
Duplicate-Add-Parity Code
Each bit is duplicated
A parity bit is computed from one copy
Same as Dual Rail Code
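A DAP encoder and a single-error-correcting decoder can be sketched as below. The slide describes only the encoding. The decoding rule is an assumption: trust copy A when it agrees with the parity bit, otherwise use copy B. It is checked here against every single-wire error. Function names are hypothetical.

```python
def dap_encode(data):
    """Duplicate-Add-Parity: each bit on two adjacent wires, plus one parity wire."""
    wires = []
    for b in data:
        wires += [b, b]
    parity = sum(data) % 2  # parity of one copy (the data itself)
    return wires + [parity]


def dap_decode(wires):
    """Single-error correction: if copy A agrees with the parity bit, use it;
    otherwise the single error is in copy A or the parity wire, so use copy B."""
    k = (len(wires) - 1) // 2
    copy_a = wires[0:2 * k:2]
    copy_b = wires[1:2 * k:2]
    return copy_a if sum(copy_a) % 2 == wires[-1] else copy_b


data = [1, 0, 1, 1]
sent = dap_encode(data)  # 2k + 1 = 9 wires
print(sent)
```

The parity bit only has to indicate which copy to trust, so a k-bit word costs k + 1 extra wires and a single XOR tree in the codec.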
Crosstalk Avoidance Double Error Correction Code (CADEC)
The 32-bit flit is Hamming coded and then an overall parity is calculated
All bits apart from the overall parity are duplicated
The original 32-bit flit becomes 77 bits
Minimum Hamming distance is 7
Worst-case crosstalk capacitance is reduced to $(1+2\lambda)C_L$
[Figure: CADEC encoder. 32-bit input → (38,32) Hamming encoding → DAP duplication of the 38 Hamming bits onto bits 0-75, with the overall parity as bit 76 → 77-bit output]
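The CADEC encoding steps can be sketched as follows. This assumes a Hamming encoder with parity at power-of-two positions, and assumes that the overall parity is taken over the 38 Hamming bits and is not duplicated, giving 2 × 38 + 1 = 77 wires. The check at the end covers only single-bit flits differences; it is not a proof of the minimum distance. Function names are hypothetical.

```python
def hamming_encode(data):
    """Hamming SEC encoder: parity bits at power-of-two positions 1, 2, 4, ..."""
    k, r = len(data), 0
    while (1 << r) < k + r + 1:
        r += 1
    n = k + r
    code = [0] * (n + 1)  # 1-based positions
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        code[p] = sum(code[q] for q in range(1, n + 1) if q & p) % 2
    return code[1:]


def cadec_encode(flit):
    """32-bit flit -> (38,32) Hamming word -> duplicate the 38 bits (bits 0..75)
    -> append the overall parity of the Hamming word as bit 76."""
    h = hamming_encode(flit)
    duplicated = [b for b in h for _ in range(2)]
    return duplicated + [sum(h) % 2]


def distance(a, b):
    return sum(x != y for x, y in zip(a, b))


flit = [(0xC0FFEE42 >> i) & 1 for i in range(32)]
word = cadec_encode(flit)

# Distance between code words whose flits differ in exactly one bit.
flips = []
for i in range(32):
    other = list(flit)
    other[i] ^= 1
    flips.append(distance(word, cadec_encode(other)))
print(len(word), min(flips))  # 77 7
```

A Hamming distance of 3 becomes 6 after duplication, and the unduplicated overall parity adds one more differing bit whenever the Hamming distance is odd, which gives 7.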
Energy Savings with Joint Codes
Due to increased error resilience, lower noise margins can be tolerated, and hence the operating voltage can be reduced
Coding adds overhead in terms of extra wires and codec logic
Voltage Swing Reduction for CADEC
[Figure: word error rate (10^-20 to 10^-10) versus voltage swing V (0.4 to 1) for ED, DAP, and CADEC]
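The probability of word error for DAP, where $k$ is the number of data bits and $\varepsilon$ the bit error probability:
$$P_{DAP} \approx \frac{3k(k+1)}{2}\,\varepsilon^2$$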
32 )4()( nnPCADEC
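For DAP, single wire errors are always corrected, so word errors start with two simultaneous wire errors. The leading-order word error probability is therefore the number of failing two-error patterns times $\varepsilon^2$. The sketch below counts these patterns by brute force. It assumes independent bit errors and the hypothetical decoding rule "use copy A if it matches the parity bit, otherwise copy B". The count does not depend on the data word: the decoder's choice depends only on the error pattern, and the output is wrong exactly when the chosen copy contains an error. For every k tried, the result equals $3k(k+1)/2$.

```python
from itertools import combinations


def dap_encode(data):
    wires = [b for b in data for _ in range(2)]  # two adjacent copies per bit
    return wires + [sum(data) % 2]               # parity of one copy


def dap_decode(wires):
    k = (len(wires) - 1) // 2
    copy_a, copy_b = wires[0:2 * k:2], wires[1:2 * k:2]
    return copy_a if sum(copy_a) % 2 == wires[-1] else copy_b


def failing_double_errors(k):
    """Count the 2-wire error patterns that DAP decodes incorrectly."""
    data = [0] * k  # the count does not depend on the data word
    sent = dap_encode(data)
    fails = 0
    for i, j in combinations(range(len(sent)), 2):
        received = list(sent)
        received[i] ^= 1
        received[j] ^= 1
        fails += dap_decode(received) != data
    return fails


def p_word_error_dap(k, eps):
    """Leading-order word error probability: (#failing pairs) * eps^2."""
    return failing_double_errors(k) * eps ** 2


for k in (1, 4, 8, 32):
    print(k, failing_double_errors(k), 3 * k * (k + 1) // 2)
```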
Energy Savings with CADEC
Communication Pipelining
Inter- and Intra-switch stages
Pipelined Data Transfer
[Figure: pipelined data transfer through intra-switch pipelined stages, encoders, inter-switch link stages, and decoders]
Latency Characteristics
[Figure: average message latency (cycles, 0 to 2000) versus injection load (0.1 to 1) for uncoded and coded links]
• The codes should be optimized
• The codec can be merged with existing pipeline stages, so there is no latency penalty
Adaptive Supply Voltage Links
Dynamic Voltage Scaling (DVS)
DVS schemes dynamically adjust the processor clock frequency and supply voltage to just meet the instantaneous performance requirement, making the system energy-aware.
Communication architectures display a wide variance in their utilization depending on the communication patterns of applications.
An adaptive supply voltage link adapts the link's frequency and supply voltage in accordance with the instantaneous traffic bandwidth.
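The link-level DVS idea can be sketched as a lookup: pick the slowest operating point whose bandwidth still covers the measured traffic, and scale dynamic energy with $V_{dd}^2$. The operating points, link width and function names below are hypothetical. A practical controller would also need traffic prediction and would have to account for voltage transition time, and leakage is ignored here.

```python
# Hypothetical (frequency in GHz, supply voltage in V) operating points of a link.
LEVELS = [(0.5, 0.7), (1.0, 0.9), (1.5, 1.1), (2.0, 1.2)]


def choose_level(demand_gbps, width_bits):
    """Slowest level whose bandwidth (f * width) still covers the current demand."""
    for f_ghz, vdd in LEVELS:
        if f_ghz * width_bits >= demand_gbps:
            return f_ghz, vdd
    return LEVELS[-1]  # demand exceeds the link: run at the top level


def relative_energy_per_bit(vdd, v_max=LEVELS[-1][1]):
    """Dynamic energy per transferred bit scales as C * Vdd^2, so relative to v_max
    it is (vdd / v_max)^2 (leakage is ignored in this sketch)."""
    return (vdd / v_max) ** 2


for demand in (10, 20, 40, 80):  # Gb/s on a 32-bit link
    f, v = choose_level(demand, 32)
    print(demand, f, v, round(relative_energy_per_bit(v), 3))
```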
Repeater Insertion & Coding
Repeater insertion reduces interconnect wire delay
It increases power dissipation due to large drivers
Crosstalk avoidance codes (CACs) reduce coupling capacitance
Joint repeater insertion and CAC is a promising solution to reduce power in global wires
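Both sides of the repeater trade-off can be illustrated with Bakoglu's classic delay model. This is a textbook model swapped in here; it is not taken from the slides or from the referenced paper. The sketch computes the delay-optimal number of segments k and the repeater size h, then reports the repeater input capacitance that must also be switched. All parameter values and function names are hypothetical, and CAC is not modeled.

```python
import math


def repeated_wire_delay(r_w, c_w, r0, c0, k, h):
    """Bakoglu-style 50% delay of a wire (total resistance r_w, capacitance c_w)
    split into k segments, each driven by a repeater h times a minimum inverter
    with output resistance r0 and input capacitance c0."""
    seg_r, seg_c = r_w / k, c_w / k
    per_stage = 0.7 * (r0 / h) * (seg_c + h * c0) + seg_r * (0.4 * seg_c + 0.7 * h * c0)
    return k * per_stage


def optimal_repeaters(r_w, c_w, r0, c0):
    """Closed-form optimum of the model above (minimizes delay, not power)."""
    k = math.sqrt(0.4 * r_w * c_w / (0.7 * r0 * c0))
    h = math.sqrt(r0 * c_w / (r_w * c0))
    return k, h


# Hypothetical global wire and minimum-inverter parameters (ohms, farads).
r_w, c_w, r0, c0 = 2e3, 2e-12, 1e4, 1e-15
k, h = optimal_repeaters(r_w, c_w, r0, c0)
t_plain = repeated_wire_delay(r_w, c_w, r0, c0, 1, 1)
t_rep = repeated_wire_delay(r_w, c_w, r0, c0, k, h)
extra_c = k * h * c0  # repeater input capacitance that must also be switched
print(f"k={k:.1f} h={h:.0f} delay {t_plain * 1e12:.0f} ps -> {t_rep * 1e12:.0f} ps, "
      f"extra switched capacitance {extra_c / c_w:.0%} of the wire")
```

With these numbers the delay drops by more than an order of magnitude, but the repeaters add switched capacitance comparable to the wire itself. That penalty is what motivates combining fewer or smaller repeaters with coupling reduction from CACs.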
Reference: "A Low-Power Bus Design Using Joint Repeater Insertion and Coding"
[Figure: results at 130 nm]
[Figure: results at 45 nm]
Reliability
Crosstalk, electromigration, material ageing, etc.
Transient failures
Error control coding
Crosstalk avoidance coding
Power and area trade-off
Permanent failures
Spare switches and links
Overall routing complexity
Effect on system performance