creating alternate title designs借助sdsoc快速開發 複雜的嵌入式應用 may 2017
TRANSCRIPT
借助SDSoC快速開發複雜的嵌入式應用
May 2017
© Copyright 2017 Xilinx.
SoC application-like programming
Tools and IP for system-level
profiling
Rapid system performance
estimation by fast SW/HW
partitioning and implementation
Full system optimizing compiler
What Is
C/C++ Development
System-level Profiling
Specify C/C++ Functions
for Acceleration
Full System Optimizing Compiler
ARM Codemain()
Connectivity
GCC Vivado
Acceleratorfunc()
Page 2
© Copyright 2017 Xilinx.
What Is
C/C++ Development
System-level Profiling
Specify C/C++ Functions
for Acceleration
Full System Optimizing Compiler
ARM Codemain()
Connectivity
GCC Vivado
Acceleratorfunc()
“With SDSoC, I was able to complete
a full Zynq design in 4 days.
Then, I did the same design
without SDSoC and took me 3 weeks
with the same QoR.”
Daniele B, Xilinx DSP Specialist for EMEA
21 4Days Days
Page 3
© Copyright 2017 Xilinx.
Typical SoC Development Flow and Challenges
APP(){
funcA();
funcB();
funcC();}
HW-SW partition?
funcA funcB, funcC
HW-SW Connectivity?
funcA funcB,
funcC
Datamover
PS-PL interfaces
SW drivers
Processing System (PS) Programmable Logic (PL)
• Hard to know final system bottle neck during early system design
System Team
• Manually translating algorithms to HDL takes long time
• Manually implementing data mover network takes time
Hardware Team
• Write drivers for register control interface
• Write and debug device driver of DMA
Software Team
Page 4
© Copyright 2017 Xilinx.
Before SDSoC: HW-SW Partition Exploration
PL
PS
ApplicationSDKC/C++
DriverSDK, OS ToolsC
IP IntegratorIPI projectDatamover
PS-PL interface
IPVivadoHLS
Verilog, VHDL
HW-SW partition
spec
Met
Req
?
Involving multiple discipline to explore architecturePage 5
© Copyright 2017 Xilinx.
After SDSoC: Automatic System Generation
C/C++
Select functions
for PL
PL
PS
IP
Application
Driver
Datamover
PS-PL interface
Met
Req
?
C/C++ to System in hours, days
func1();<-SW
func2();<-HW
func3();<-HW
Page 6
© Copyright 2017 Xilinx.
SDSoC: Makes Everyone More Productive
• Explore HW-SW partitions and architecture rapidly in C/C++
System Team
• No need to translate algorithms to HDL
• No need to implement data mover network
• Build IO system into a re-usable platform
Hardware Team
• Now I can accelerate my code in HW
• No need to write device driver
Software Team
Page 7
© Copyright 2017 Xilinx.
Find Optimal Solution with Fast Iterations
System Team
C/C++
func1();<-SW
func2();<-SW
func3();<-SW
1 2 3SW
HW
One Click to toggle SW and HW implementation
One line #pragma to change data port and data mover type
C/C++
func1();<-SW
func2();<-HW
func3();<-HW
1
2 3
ACP, DMA
C/C++
func1();<-SW
#pragma SDS
func2();<-HW
func3();<-HW
1
2 3
HP, SGDMA
Page 8
© Copyright 2017 Xilinx.
Look into Details by Event Trace Tool
System Team
Enable by one click during compile time
Differentiate among SW, Data Mover and Accelerator IP
Page 9
© Copyright 2017 Xilinx.
Automatic HW & SW Connection
Software Team
Hardware Team
Auto generate register access device driver
Auto generate data motion DMA driver
Easy to integrate RTL based IP into SDSoC using C-
callable IP
PL
PS
HDL IP
Application
Driver
Datamover
PS-PL interface
Page 10
© Copyright 2017 Xilinx.
Architecture-aware Algorithm Implementation
Hardware Team
Algorithm spec in C/C++
Functional specification
Translate to RTL
Optimize RTL
Algorithm spec in C/C++
Optimize C/C++
Traditional Flow
SDSoC Flow
Manual RTL translation is not required
Algorithm and HW team work together to optimize IP
Optimize RTL
Page 11
© Copyright 2017 Xilinx.
Easy to Create Base Platform for Custom Boards
Application
Driver
Interface
IPs
Interface
IPs
Application
Driver
AXI Bus
Platform
Processing Systems (PS)
Programmable Logic (PL)
Platform = Vivado project + Bootable software images
– HW: define AXI interface and interrupt in Vivado project in tcl
– SW: define boot images (FSBL, kernel, rootfs, etc) in GUI tool
Software Team
Hardware Team
Page 12
© Copyright 2017 Xilinx.
Easy to Create Base Platform
Platform = Vivado project + Bootable software images
SDSoC generated IP can hook to AXI in base platform
Software Team
Hardware Team
Application
Driver
Interface
IPs
Interface
IPs
Application
Driver
AXI Bus
Platform
Application
Driver
IP IP IP IP
Connectivity
Generated
Page 13
© Copyright 2017 Xilinx.
Examples and Use Cases
Page 14
© Copyright 2017 Xilinx.
main(){
imread(A);
imread(B);
denseOpticalFlowPyrltr(A,B,out);
imshow(out);}
MIPI
AXIPS
PL
Linux
Libraries
Application
Drivers
denseOptical
FlowPyrltrHDMI
Xilinx ZU9
Frames/s 60
Power (W) 4.8
Latency (ms) 16.7
Utlization 15%
• Benchmarks do not include the camera inputs and HDMI/DP
• LK dense optical flow, non-pyramidal, non-iterative, Window size 53x53
DMA
AXI-S
ZCU102 EV
Platform
SDSoC
Generated
Platform
DMA
AXI-S
Page 15
Computer Vision Design Example4K60 Dense Optical Flow
© Copyright 2017 Xilinx.
main(){
imread(A);
imread(B);
stereoRectify(A,B,C,D);
stereoLBM(C,D,out);
imshow(out);}
USB3
AXIPS
PL
SDSoC
Generated
Platform
Stereo
LBM
DMA
AXI-S
Stereo
Rectify
Linux
Libraries
Application
Drivers
Xilinx ZU9
Frames/s 140
Power (W) 4.8
Latency (ms) 7.1
Utlization 14%
• SAD based stereo localBM
• Benchmarks do not include the camera inputs and HDMI/DP outputs
HDMI
ZCU102 EV
Platform
Page 16
Computer Vision Design ExampleStereo Disparity Map
© Copyright 2017 Xilinx.
Zynq UltraScale
Target Device ZU4EV or ZU5EV
< 6W total power
Linux based OS
H.264 HP, H.265 encoder
Use Case: Surveillance
4K60
H.265
DDR3/4
CNN (Face Detect)CMOS
Sensor
VCU
PS
ISPLVDS
DDR3/4
Overlay
ENET
DP
CNN (Face Detect)
Page 17
© Copyright 2017 Xilinx.
Use Case: Machine Vision
Target Device ZU2 or ZU3
Linux based OS
Zynq UltraScale
4K60
DDR3/4
CMOS
Sensor
PS
ISPLVDS
DDR3/4
USB3
1G Eth
GTHFeature
Extract
Color
Inspect
GigE
Coax GTH
Page 18
© Copyright 2017 Xilinx.
Use Case: Drone/UAV
Target Device ZU4EV, ZU5EV or ZU7EV
Linux based OS
Zynq UltraScale
2K60
DDR3/4
CMOS
Sensor
ISP
MIPI
DDR3/4
PS
Radio
DPOptical
ADC
2K60CMOS
Sensor
4K60CMOS
Sensor
4K60CMOS
Sensor
MIPI
MIPI
MIPI
CNN
Stereo
Vision
VCUH.265
Page 19
© Copyright 2017 Xilinx.
Use Case: AR
Target Device ZU4EV, ZU5EV or ZU7EV
Linux based OS
Zynq UltraScale
DDR3/4
ISP
Gyro/
Accer
DDR3/4
PS DPPosition
tracking
2K60CMOS
Sensor
4K60CMOS
Sensor
4K60CMOS
Sensor
MIPI
MIPI
MIPI
CNN (Eye
tracking)
Stereo
Vision
VCU
H.265
WiFi
Gesture
Page 20
© Copyright 2017 Xilinx.
Embedded Vision Development Kits
Base Zynq Board ZCU102 ZCU106 ZC702 ZC706
Device ZU9 (16nm) ZU7 (16nm) Z7020 (28nm) Z7045 (28nm)
CPU Quad Cortex A53 up to 1.5GHz Dual Cortex A9 up to 1.0GHz
Peak GOPS @ INT8 7857 5386 571 2331
On-chip RAM (Mbits) 32.1 38.0 4.9 19.1
Inputs USB3, MIPI, HDMI USB3, MIPI, HDMI HDMI* HDMI*
Outputs HDMI, DisplayPort HDMI, DisplayPort HDMI HDMI
Video Codec Units No 4K60 Encode/Decode No No
reVISION Support xFopencv, xFdnn xFopencv, xFdnn xFopencv, xFdnn xFopencv, xFdnn
Sensor Inputs Sony IMX274 Quad OnSemi AR0231 StereoLab Zed Stereo eCon camera
Spec 3840x2160 @ 60 FPS 1920x1080 @ 30 FPS 3840x1080 @ 30 FPS 1920x1080 @ 60 FPS
Interface MIPI via FMC MIPI via FMC USB3 USB3
* Requires an HDMI IO FMC card
Page 21
© Copyright 2017 Xilinx.
SDSoC 2016.4 Enhancements
Zynq Ultrascale+ MPSoC enhancements
– Full support to zcu102 platform, with additional clocks
– Increased high performance HP port data width to 128-bits
– Support for custom Zynq Ultrascale+ MPSoC platforms
System compiler enhancements
– Support of packed structs and scalar widths up to 1024-bits (was 32-bits)
– Support for HLS dataflow and multi-buffered BRAM-mapped array
arguments at the function top level
– Support for SG-DMA on MIG accessible DDR
HW/SW event trace support for async hardware functions
Page 22
© Copyright 2017 Xilinx.
Application Development
Algorithm Development
Platform Development
CNNGoogLeNet
SSD
FCN …
DNN
Page 23
© Copyright 2017 Xilinx.
Algorithm
to RTL
Bitstream
Generation
OpenCV
Apps
ML Apps
Traditional RTL flow
System
Integration
Ease of Use
OpenCVMachine Learning
Tim
e o
n A
pp
sT
ime
on
Pla
tfo
rm
Removing the Barrier to Broad Adoption
Page 24
© Copyright 2017 Xilinx.
SDSoC Resources
Access SDSoC Portal for learning materials
– http://www.xilinx.com/sdsoc
Page 25
© Copyright 2017 Xilinx.
SDSoC Basic Usage
– UG1028: SDSoC Tuturial
– UG1127: SDSoC Environment User Guide
– Video: SDSoC Development Environment Demo
Custom Platform Creation
– UG1146: Platform Development Guide
– Video: SDSoC Custom Platform Generation
C-Callable IP
– UG1127 Chapter 5
SDSoC Optimization Guide
– UG1235
Documents
Page 26
© Copyright 2017 Xilinx.
Conclusion: Faster Time-to-Market using SDSoC
Page 27
• Explore HW-SW partitions and architecture rapidly in C/C++
System Team
• Build IO system into a re-usable platform for software team
• Help software team to optimize C code rather than translate C into RTL manually
Hardware Team
• Focus on algorithm function and performance
Software Team
© Copyright 2017 Xilinx.
Page 28