programmable soc aka fpga · 2014-09-24 · wired comms all programmable smarter • multiple...
Post on 05-Jul-2020
10 Views
Preview:
TRANSCRIPT
© Copyright 2013 Xilinx .
Ivo Bolsens, Senior Vice President & CTO
Beating Moore’s Law with the All
Programmable SOC aka FPGA
Page 1
© Copyright 2013 Xilinx .
Moore’s Law
© Copyright 2013 Xilinx .
Node Name, Chip Dimensions or Marketing?
3
Metal Half Pitch
Node Name
Gate Length
16/14nm 20nm
28nm
40nm
65nm
90nm
Metal Pitch
Metal Half Pitch
IEEE Spectrum Nov 2013: The End of the Shrink
© Copyright 2013 Xilinx .
Scalable Architecture : FPGA
2013 : FPGA = UltraSCALE SOC
Logic
IO
Memory
Processing
Serdes
DSP
Modular
Hierarchical
Re-Use
2001 : FPGA =
Glue Logic
© Copyright 2013 Xilinx .
Moore’s Law is About Logic, IO and Memory
5
I/O SerDes Logic
ESD
DCAP
Dri
ver
Inductor Bond pad
© Copyright 2013 Xilinx .
Non-Uniform Block Scaling
6
0%
20%
40%
60%
80%
100%
65 40 28 20
AR
EA
SC
AL
ING
NODE
Logic/IO/GT Area Scaling
Logic
IO
GT3
.8G
b/s
32G
b/s
12.5
Gb
/s
6.6
Gb
/s
330
Mb
/s
106
6M
b/s
186
6M
b/s
2400M
b/s
© Copyright 2013 Xilinx .
Mix Has Also Changed, Putting More Pressure
on Area
7
0%
20%
40%
60%
80%
100%
65 40 28 20NODE
Logic/IO/GT Area Breakdown
GT
IO
Logic
Are
a S
cali
ng
© Copyright 2013 Xilinx .
What is Driving the Bandwidth Increase?
© Copyright 2013 Xilinx .
2.2X/2 years
Cable Access Growth Exceeds Moore’s Law
(Nielsen’s Law)
© Copyright 2013 Xilinx .
Similar Story in USB, WLAN and Wireless
2.5X/2 years
© Copyright 2013 Xilinx .
Driven by Streaming Video
HD 2-5Mb/s
UHD 25Mb/s
© Copyright 2013 Xilinx .
What is impact on FPGA
© Copyright 2013 Xilinx .
Page 13
Maximum Density Of Xilinx FPGA By Node
Node (nm)
kL
C’s
331
759
2000
4400
200
400
800
1600
3200
6400
20nm 28nm 40nm 65nm
2.4X/Node
© Copyright 2013 Xilinx .
0
5
10
15
20
25
30
35
Page 14
Maximum Xilinx SerDes Rate (by Pin)
Node (nm)
1.8X/Node
20nm 28nm 40nm 65nm
© Copyright 2013 Xilinx .
0.5
1.2
2.8
5.9
0.1
1.0
10.0
1 2 3 4
Aggregate Xilinx SerDes Bandwidth by Node
15
2.3X/Node
Tb
/s
Node (nm)
20nm 28nm 40nm 65nm
© Copyright 2013 Xilinx .
32
64
128
256
512
1 2 3 4
DDR
HMC
Aggregate Xilinx FPGA Memory BW by Node
16
65nm 40nm 28nm 20nm
1.6X/Node
Node (nm)
2.5X/Node
GB
/s
Sandy Bridge (76)
Ivy Bridge (101)
Nehalem (51)
© Copyright 2013 Xilinx .
3-D Stacking, Providing More Silicon Area at
Lower Cost and Lower Power
© Copyright 2013 Xilinx .
Cost Comparison: Monolithic vs Multi-Die
Die Area
Die
Co
st
Monolithic
Multi-Die
“Moore’s Law is really about economics” – Gordon Moore
© Copyright 2013 Xilinx .
Why is First 3D Logic Product an FPGA?
Natural partition using
“long lines”
Very low “opportunity cost”
No 3rd party dependence
“Size matters” to customers
Compelling value proposition
“next generation density in
this generation technology”
© Copyright 2013 Xilinx .
Virtex 2000T: Homogeneous Stacked Silicon
Interconnect Technology (SSIT)
FPGA slice
FPGA Slices
Side-by-Side Silicon
Interposer
Silicon Interposer
>10K routing connections
between slices
~1ns latency
© Copyright 2013 Xilinx .
Silicon Interposer
Microbumps
Through-Silicon Vias
Elements of SSIT
Package Substrate
28nm FPGA Slice 28nm FPGA Slice 28nm FPGA Slice 28nm FPGA Slice
C4 Bumps
BGA Balls
Microbumps • Access to power / ground / IOs
Through-silicon Vias (TSV) • Only bridge power / ground / IOs to C4 bumps
Side-by-Side Die Layout • Minimal heat flux issues
Passive Silicon Interposer (65nm Generation) • 4 conventional metal layers connect micro bumps & TSVs
New!
FPGA
Package
Interposer
Microbumps
TSV C4
Package
Ball
1mm
0.2mm
0.04mm
© Copyright 2013 Xilinx .
22
Interconnects Energy
Efficiency
(pJ/bit)
Inter-Die on Si Interposer 1 0.4
Intra-Package MCM 2 0.54
Inter Package (low loss cable)3 2.6
Short-Reach SerDes on PCB 4 13.7
1 Xilinx SSI technology 2 23.3, ISSCC 2013 (Poulton, Dally, et. al.) 3 23.2 ISSCC 2013 ((Mansuri et al) 4 Pawlowski, Hot Chips 23
ISSCC 2014
Interposer Optimizes Energy/Bit
© Copyright 2013 Xilinx .
7V580T – Dual FPGA Slice with 8x28Gb/s
SerDes Die
Virtex-7 HT @ 28Gbps
Page 23
© Copyright 2013 Xilinx .
1.0
2.0
4.0
8.0
16.0
1 2 3 4
LC Count
SerDes BW
Comms BW
HMC
Moore
DDR BW
Xilinx LC counts have exceeded Moore’s Law and are keeping up
with Communications requirements
SerDes aggregate bandwidth is keeping up with Comms growth
I/O (in particular DRAM) aggregate bandwidth is falling behind
– Serial memory (HMC) is mitigating. Interposer based memory (HBM) will
further mitigate.
Page 24
Takeaways N
orm
alized
Gro
wth
Node (nm)
65nm 40nm 28nm 20nm
© Copyright 2013 Xilinx .
Future Will be More Interesting
© Copyright 2013 Xilinx .
Industry Debates on Transistor Cost
Page 26
© Copyright 2013 Xilinx .
28nm =
2x 45nm
cost
> $170 M
28/22-nm
32-nm
45-nm
65-nm
90-nm
130-nm
180-nm
0 45 90 135 180
($ Million)
Estimated Chip Design Cost, by Process Node, Worldwide, 2011
Design cost ($M)
Mask cost ($M)
Embedded software ($M)
Yield ramp-up cost ($M)
Design Cost
Page 27
© Copyright 2013 Xilinx .
Page 28
Growing Challenge for ASIC & ASSP
Eroding customer confidence in vendors
High cost burden from over design for diverse needs
No ability to differentiate or customize
>50% of Top 16 ASSP Vendors Losing Money
Source – Public reports, Xilinx estimates
2012
© Copyright 2013 Xilinx .
Trend Wireless : Scalable Platforms
Page 29
Coverage
Capacity
Home
Office
Dense
Indoor (Malls,
Transport
Hubs)
Urban
Infill
Wide Area
4-16 Users
<100mW
30-60 Users
<250mW
30-200 Users
<1W
~200 Users
<1-10W
Single Sector
~200 Users/
sector
20-100W
Multi-Sector
Residential
Femto
Enterprise
Femto
Picocell
Microcell
Macrocell
Macrocell +
Active Antennas
Wide Area 2-3x Data
Capacity of RRU
Outdoor Indoor
Source ALU
© Copyright 2013 Xilinx .
Trend Wired : Software Defined
From “Virtualizing the Net” by Jon Turner
Page 30
© Copyright 2013 Xilinx .
Computing
Communicating
Storing
Gaming
Streaming
Latency (ms)
GB, Latency (s)
GB, BW (mean, var)
GB, BW (mean, var)
latency (ms)
Latency (ms), BW (mean)
Trend Services : Different Figures of Merit
© Copyright 2013 Xilinx .
Software Defined Networking gained massive
industry mindshare
The best thing about OpenFlow or SDN, is that it’s brought back a new hope to networking.
Networking is cool again- Jayshree, CEO - Arista Networks
© Copyright 2013 Xilinx .
Trend in Embedded : More Intelligence…Smart
MACHINES THAT UNDERSTAND
The Next Big,
Digital Economy;
‘Smart Energy’ The energy market is undergoing
a major transformation…
Smart Factories For factory management in the future, it will become
essential to strive to implement smart capabilities…
SMART Data Center Revolution
New Opportunities to Control Costs
and Increase Strategic Advantage…
Smart wireless networks
to the rescue Carriers are turning toward more intelligent
network management…
Page 33
© Copyright 2013 Xilinx .
Trend Data Center : Scalability
Big Data Increasing Volume, Velocity, and Variety
Security Both outside and inside
Low power Reduce operation and cooling costs
Page 34
© Copyright 2013 Xilinx .
Impact of trends
(1) Networking
Page 35
New network fabrics
• Faster, Fatter, and Flatter
Software defined
networking
• Software control plane
• Hardware data plane
Content-aware
networking
• Deep packet inspection
• Enhanced security
© Copyright 2013 Xilinx .
Impact of trends
(2) Compute
Page 36
ARM-based microservers
• Improved performance per watt
Hybrid SoC
• CPU+accelerators+fabric
• Cost and power reduction
Larger memory
• Hybrid NVRAM and DRAM
• Latency reduction
© Copyright 2013 Xilinx .
Impact of trends
(3) Storage
Page 37
Specialized functions
• Compression, encryption, memcached
Custom SSD controllers
• Higher performance
• Reduced latency
Data-aware storage
• Integrated database support
• Offload from processor
© Copyright 2013 Xilinx .
Page 38
Programmable & Smart Across All Markets
Embedded
Data Center
Wired Comms
All Programmable Smarter • Multiple Spectrums
• Multiple Standards (LTE, 3G)
• Multiple Levels of QoS
• Self Organizing Networks (SON)
• Cognitive Radio
• Smart Antenna
• Network Function Virtualization (NFV)
• Multiple Stds (400Gb etc.)
• Dynamic QoS Provisioning
• Context Aware Network Services
• Self-Healing Networks
• Video Caching at the Edge
• Software Defined Networks (SDN)
• Multiple Stds (FCoE, iSCSI ...)
• Config Storage (SAN, NAS, SSD…)
• Data Pre-Processing & Analytics
• Virtualized Resource Optimization
• Intelligent Appliances
• Changing Resolutions (MPixel, Fps)
• Emerging Video Stds (UHD, 8K/4K)
• Evolving Video Processing Algorithms
• Object Detection & Analytics
• Automotive Collision Avoidance
• Industrial Machine Vision
Wireless Comms
© Copyright 2013 Xilinx .
The All Programmable Platform
Security : Bit level operations
Packet Processing : Wide Datapaths
DSP Processing : Pipelined Datapaths
Graphics Processing : Parallel Micro-Engines
System Management : Finite State machines
© Copyright 2013 Xilinx .
The Era of Heterogeneous Processing Unit
Page 40
© Copyright 2013 Xilinx .
X86 or ARM Accelerator
Accelerator
Accelerator MicroBlaze micro
engines
uEngine uEngine
uEngine ARM multicore
Programming the ’XSOC’
Platform
Accelerator
Accelerator
Accelerator DSP engines
Dedicated Accelerators
Host
X = Connected
X = Scalable
X = Parallel
X = Heterogeneous
X = Configurable
X = Xilinx
© Copyright 2013 Xilinx .
Delivering All Programmable XSOC Heterogeneous Multi-Processing
Page 42
© Copyright 2013 Xilinx .
Page 43
Programming Accelerators from C/C++
C/C++ Algorithm
Vivado HLS
RTL Design
Vivado
(ISE/EDK)
Bitstream
Enables software programmers
to target Xilinx FPGAs
– Software-programmability
– Portability: 7 series, Zynq
Delivers productivity increase
for RTL designers
– C/C++ level verification and
testbench reuse
– Earlier area/latency reports
– Software-driven design exploration
Directives
More Turns Per Day (Verification and Architecture Exploration)
Report
© Copyright 2013 Xilinx .
From C Algorithm to FPGA Implementation
Video frames/second
0
50
100
150
200
5.1
196
DSP C2FPGA
0
2
4
6
FPGA resources
RTL C2FPGA
FPGA: >38 times better performance than DSP video processor
QOR: C2FPGA equal to or better than RTL synthesis
Ease-of-use: C2FPGA 2x fewer lines of C code than DSP processor
Page 44
Quality of Results
© Copyright 2013 Xilinx .
HW/SW Design Flow
C-compiler
SW-drivers
AXI
Libraries
Wires
C-synthesis
CPU FPGA
Middleware
Concurrent
SW
Hardware
Video Codec CPU
Encryption
LTE Modem Memory Data
Mo
ve
me
nt
Inte
rco
nn
ec
t
Application
Platform
Page 45
© Copyright 2013 Xilinx .
HW/SW Design Flow: SW Programmer View
C-compiler
SW-drivers
AXI
Libraries
Wires
C-synthesis
CPU FPGA
Middleware
Hardware
Concurrent
SW
Video Codec CPU
Encryption
LTE Modem Memory Data
Mo
ve
me
nt
Inte
rco
nn
ec
t
Application
Application Programming Page 46
© Copyright 2013 Xilinx .
Towards Heterogeneous Multi-core
Page 47
FPGA V
ideo
co
de
c
En
cry
ptio
n
Pa
cke
t
Pro
ce
ssin
g
FF
T
Se
arc
h
Application-Specific
ARM Processor
A9 A9
HS-SW Interfacing
Domain Specific API
Hardware / Software partitioning & interfacing
Commercial Software
Ecosystem
OpenCL
C
Compile / Debug
C-HLS
Accelerator synth
© Copyright 2013 Xilinx .
Programmable Platform:
CPU + FPGA Peer Processing
Core
L1 Cache
Shared L2 Cache
DDR
MemCon
Coherency Engine
Over NOC Interconnect
Logic
Coherency Engine
Accel
L1 Cache
Capabilities
Coherent Caches for HW
Coherent Caches for SW
Coherency Management
Core
L1 Cache
Accel
L1 Cache
I/O
Device DMA
Coherency Benefits:
Peer Processing: Direct Cache-2-Cache data movement
Latency: Very low latency access to CPU (FPGA) data
Usability: No SW cache flush needed
SRAM
Page 48
© Copyright 2013 Xilinx .
OpenCL SDK
(Application Programming)
Base Systems
(Domain Customization)
FPGA Boards
OpenCL Domain Specific Platforms
QPI
PCIe Host
Bridge
Packets
In
Packets
Out
DMA
MEMC
AXI
X86
FPGA
Video
In
Video
Out
MEMC
AXI
Zynq
A
9
A
9 Zynq PS
Host code
compiler
Kernel
compiler
(Vivado HLS)
OpenCL Compiler
ZED board Alphadata
© Copyright 2013 Xilinx .
More transistors, more performance, lower power
Architecture Innovations to create Value
– Connectivity
– Granularity
– 3D Integration
All Programmable Platforms : heterogeneous and scalable
– Programmable IO, Memory, Interconnect, DSP, Micro
New Programming Abstractions that support
– Parallelism
– Heterogeneity
Page 50
Conclusions
© Copyright 2013 Xilinx .
51
© Copyright 2013 Xilinx .
Hands-on introduction to Zynq for:
– Technical/non-technical managers
– Hardware/software engineers
– Academics and students
Book divided into three main sections
1) High level introduction to Zynq
• What is it?
• What can I do with it?
• How do I use it?
2) Technical overview
• Embedded system, Zynq, AXI, IP design, HLS,
System Design (Vivado)
3) Operating systems for Zynq
• Background, Linux overview, Linux on Zynq
Page 52
The Zynq Book Embedded Processing with the ARM Cortex-A9 on the
Xilinx Zynq All Programmable SoC
© Copyright 2013 Xilinx .
Page 53
Free PDF of The Zynq Book
From the book’s website
– www.zynqbook.com
Same content as the
hardcopy version of the book
Also download tutorials and
source files from website
Open source license for
academia
– Enables book contents to be
re- used for non-commercial
teaching and research
top related