Exploring Computation-Communication Tradeoffs in Camera Systems
Amrita Mazumdar, Thierry Moreau, Sung Kim, Meghan Cowan, Armin Alaghi, Luis Ceze, Mark Oskin, Visvesh Sathe
IISWC 2017
Camera applications are a prominent workload with tight constraints
Example systems: video surveillance cameras, 3D-360 virtual reality camera rig, energy harvesting camera, augmented reality glasses
Constraints: large data size, light weight, real-time processing, low-power
Hardware implementations compound the camera system design space
Constraints: power, time, size, bandwidth
Implementations: ASIC, FPGA, DSP, CPU, GPU
Example camera system: DogChat™
We can represent camera applications as camera processing pipelines to clarify design space exploration
Pipeline: sensor → block 1 → block 2 → block 3 → block 4 (blocks are functions in the application)
DogChat™ pipeline: sensor → image processing → face detection → feature tracking → image rendering
Developers can trade off between computation and communication costs
DogChat™ pipeline: sensor → image processing → face detection → feature tracking → image rendering
The pipeline is split between in-camera processing and blocks offloaded to cloud
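The computation-communication tradeoff above can be sketched as a search over cut points: everything before the cut runs in the camera, and the output of the last in-camera block is sent to the cloud. This is a minimal sketch, not the authors' method. The `Block` type, the `evaluate_cuts` function, the additive cost model, and all numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    compute_cost: float  # hypothetical cost of running this block in-camera
    output_size: float   # hypothetical size of the data this block emits

def evaluate_cuts(sensor_output_size, blocks, cost_per_unit_sent):
    """Return (blocks run in camera, total cost) for every possible cut point."""
    results = []
    for cut in range(len(blocks) + 1):
        in_camera = blocks[:cut]
        compute = sum(b.compute_cost for b in in_camera)
        # Data sent is the output of the last in-camera block, or raw sensor data.
        sent = in_camera[-1].output_size if in_camera else sensor_output_size
        results.append(([b.name for b in in_camera], compute + sent * cost_per_unit_sent))
    return results

# Placeholder numbers chosen only to exercise the code; they are not measurements.
pipeline = [
    Block("image processing", 2.0, 8.0),
    Block("face detection", 3.0, 1.0),
    Block("feature tracking", 4.0, 0.5),
    Block("image rendering", 6.0, 8.0),
]
cuts = evaluate_cuts(sensor_output_size=10.0, blocks=pipeline, cost_per_unit_sent=1.0)
best_blocks, best_cost = min(cuts, key=lambda c: c[1])
print(best_blocks, best_cost)
```

With these placeholder costs the cheapest cut falls after face detection, because that block shrinks the data enough to pay for its own computation. Different costs move the cut.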
Optional and required blocks in camera pipelines introduce more tradeoffs
Blocks: sensor → image processing → face detection → feature tracking → image rendering, plus edge detection, motion detection, motion tracking
Each block is marked as required or optional
Custom hardware platforms explode the camera system design space
Blocks (each required or optional): sensor, image processing, face detection, feature tracking, image rendering, edge detection, motion detection, motion tracking
Candidate platforms for each block: ASIC, FPGA, DSP, CPU, GPU
In-camera processing pipelines can help us evaluate these tradeoffs!
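To give a rough sense of how quickly the design space grows, the sketch below counts configurations when every optional block can be included or left out and every included block can be mapped to any platform. The split into `REQUIRED` and `OPTIONAL` blocks is an assumption (the transcript does not preserve which blocks the slide marks as required), and the count ignores block ordering and the choice of offload point.

```python
from itertools import combinations, product

PLATFORMS = ["ASIC", "FPGA", "DSP", "CPU", "GPU"]
# Assumption: the required/optional split below is illustrative only.
REQUIRED = ["image processing", "face detection", "feature tracking", "image rendering"]
OPTIONAL = ["edge detection", "motion detection", "motion tracking"]

def enumerate_designs(required, optional, platforms):
    """Yield one dict (block -> platform) per pipeline configuration."""
    for k in range(len(optional) + 1):
        for chosen in combinations(optional, k):
            blocks = list(required) + list(chosen)
            for assignment in product(platforms, repeat=len(blocks)):
                yield dict(zip(blocks, assignment))

design_count = sum(1 for _ in enumerate_designs(REQUIRED, OPTIONAL, PLATFORMS))
print(design_count)  # 135000
```

Even this small example yields 135,000 candidate designs, which is why a structured way of evaluating pipelines helps.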
Challenges for modern camera systems
Low-power: face authentication for energy-harvesting cameras with ASIC design (pipeline: motion detection → face detection → neural network)
Low latency: real-time virtual reality for multi-camera rigs with FPGA acceleration (pipeline: prep → align → depth → stitch)
Face authentication with energy harvesting cameras
WISPcam energy-harvesting camera: powered by RF, 1 frame / second, ~1 mW processing / frame
Is this Armin? ✅
CPU-based face authentication neural networks can exceed WISPcam power budgets
With an on-chip CPU: sensor → neural network (on-chip CPU) → other application functions (cloud)
With ASIC hardware: sensor → motion detection → face detection → neural network (on-chip circuit) → other application functions (cloud)
Adding optional blocks can reduce power consumption for a neural network
Exploring design tradeoffs in ASIC accelerators
Evaluated NN topology and hardware impact on energy and accuracy
Selected a 400-8-1 network topology and used 8-bit datapaths for optimal energy/accuracy point
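To make the 400-8-1 topology and 8-bit datapath concrete, here is a minimal software sketch of a forward pass on 8-bit fixed-point values. It is not the accelerator's implementation. The fixed-point format (7 fractional bits), the helper names `quantize`, `layer`, and `forward`, and the random weights are all assumptions, and the sigmoid is computed with `math.exp` rather than whatever the hardware sigmoid unit uses.

```python
import math
import random

def quantize(x, frac_bits=7, bits=8):
    """Round x to a signed fixed-point integer with `bits` total bits (assumed format)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))

def layer(inputs_q, weights_q, frac_bits=7):
    """One fully connected layer on 8-bit integers with a sigmoid activation."""
    outputs = []
    for row in weights_q:
        acc = sum(w * x for w, x in zip(row, inputs_q))  # wide integer accumulator
        real = acc / float(1 << (2 * frac_bits))          # undo both scale factors
        outputs.append(quantize(1.0 / (1.0 + math.exp(-real)), frac_bits))
    return outputs

def forward(pixels_q, hidden_w, output_w):
    """400 inputs -> 8 hidden neurons -> 1 output score."""
    return layer(layer(pixels_q, hidden_w), output_w)[0]

# Random placeholder weights and inputs; a real network would be trained.
rng = random.Random(0)
hidden_w = [[quantize(rng.uniform(-1, 1)) for _ in range(400)] for _ in range(8)]
output_w = [[quantize(rng.uniform(-1, 1)) for _ in range(8)]]
pixels_q = [quantize(rng.random()) for _ in range(400)]
score = forward(pixels_q, hidden_w, output_w)
print(score)
```

Narrow operands keep multipliers and memories small, while the accumulator stays wide so that 400 products can be summed without overflow.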
[Figure: neural network accelerator block diagram (SNNAP): DMA master, bus scheduler, processing units (PU) with SRAM, control, and processing elements PE0 to PE3 built from multiply and add stages, an accumulator FIFO, and a sigmoid unit with its own FIFO]
[Figure: face detection accelerator block diagram (VJ): integral image accumulator (pixels in, input row, previous row, integral row output), window buffer, and a classifier unit with feature, threshold, and stage units]
Face detection: streaming face detection accelerator
Explored classifier and other algorithm parameters to optimize energy efficiency
many more details in paper!
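The face detection diagram on this slide mentions an integral image accumulator that works from an input row and a previous row. The sketch below shows one common way to compute an integral image in a streaming fashion and to read a rectangle sum from it with four lookups. It is an illustration of the general technique under that reading of the diagram, not the accelerator's design, and the function names are hypothetical.

```python
def integral_rows(rows):
    """Yield integral-image rows one at a time, keeping only the previous row."""
    previous = None
    for row in rows:
        running = 0
        current = []
        for x, pixel in enumerate(row):
            running += pixel                                     # prefix sum within this row
            above = previous[x] if previous is not None else 0   # integral value directly above
            current.append(running + above)
        yield current
        previous = current

def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in the inclusive rectangle, using four integral lookups."""
    def at(r, c):
        return ii[r][c] if r >= 0 and c >= 0 else 0
    return at(bottom, right) - at(top - 1, right) - at(bottom, left - 1) + at(top - 1, left - 1)

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = list(integral_rows(image))
print(ii)                        # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
print(box_sum(ii, 1, 1, 2, 2))   # 28, i.e. 5 + 6 + 8 + 9
```

Because each output row depends only on the current input row and the previous integral row, the computation needs a single row of storage rather than a full frame buffer.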
Evaluation: which pipeline achieves the lowest overall power?
Synthesized ASIC accelerators with Synopsys tools
Constructed simulator to evaluate power consumption on real-world video input
Computed power for computation and transfer of resulting data for each pipeline configuration
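One way to think about the per-configuration accounting described above is a duty-cycle model: each filtering block only passes some frames along, so later blocks and the radio are active less often. The sketch below is an assumed model with made-up numbers, not the authors' simulator. The `pipeline_power` function, the block powers, and the pass rates are all hypothetical, and sensor power is left out.

```python
def pipeline_power(blocks, transfer_power_uw):
    """Average (compute, transfer) power in µW for a filtering pipeline.

    blocks: list of (name, power_uw_when_active, pass_rate) in pipeline order.
    Each block only runs on frames that every earlier block passed along.
    """
    active_fraction = 1.0
    compute = 0.0
    for _name, power_uw, pass_rate in blocks:
        compute += active_fraction * power_uw
        active_fraction *= pass_rate
    transfer = active_fraction * transfer_power_uw
    return compute, transfer

# Placeholder values chosen only to exercise the model; they are not measurements.
nn_only = pipeline_power([("NN", 400.0, 0.5)], transfer_power_uw=10_000.0)
motion_then_nn = pipeline_power(
    [("motion", 10.0, 0.1), ("NN", 400.0, 0.5)], transfer_power_uw=10_000.0
)
print(nn_only, sum(nn_only))
print(motion_then_nn, sum(motion_then_nn))
```

In this toy setup a cheap prefilter lowers both terms: the neural network runs on fewer frames and fewer results are transmitted.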
Which pipeline achieves the lowest power consumption?

| platform configuration | compute (ratio) | transfer (ratio) |
|---|---|---|
| sensor | <1% | >99% |
| sensor, motion | <1% | >99% |
| sensor, face detect | 10% | 90% |
| sensor, NN | 16% | 84% |
| sensor, motion, face detect | >99% | <1% |
| sensor, motion, NN | >99% | <1% |
| sensor, face detect, NN | >99% | <1% |
| sensor, motion, face detect, NN | >99% | <1% |

[Chart: power per configuration in µW on a log axis from 1 to 1,000,000; bar labels as extracted: 160; 419; 257,236; 132; 782,090; 374; 3,731; 11,340]
Prefilters reduce overall power
Prefilters with NN use less power than just using NN
The chart also marks the most power-efficient configuration and the most power-efficient configuration with an on-chip NN
In-camera processing for face authentication
In isolation, even well-designed hardware can show sub-optimal performance
Optional blocks can improve the overall cost if they balance compute and communication better than the original design
Pipeline: motion detection → face detection → neural network
Low latency: real-time virtual reality for multi-camera rigs with FPGA acceleration
Pipeline: prep → align → depth → stitch
Producing real-time VR video from a camera rig
Input: 16 GoPro cameras, 4K at 30 fps, 3.6 GB/s raw video
Output: 3D-360 stereo video, 1.8 GB/s
Goal: 30 fps
Cloud processing prevents real-time video
VR pipeline is usually offloaded to perform heavy computation
Pipeline (offloaded to cloud): sensor → prep → image align → depth from flow → image stitch → stream to viewer
Processing time: prep 5%, image align 20%, depth from flow 70%, image stitch 5%
Need to accelerate “depth from flow” to achieve high performance
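The focus on “depth from flow” follows from Amdahl's law: the stage that takes 70% of processing time bounds how much the whole pipeline can gain. The sketch below works through that arithmetic, assuming the stages run one after another so their time shares add up. The function name and the sample stage speedups are illustrative only.

```python
def overall_speedup(fraction_accelerated, stage_speedup):
    """Amdahl's law: whole-pipeline speedup when one stage gets faster."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / stage_speedup)

depth_share = 0.70  # share of processing time attributed to "depth from flow" on the slide

for stage_speedup in (2, 10, 100):
    print(stage_speedup, round(overall_speedup(depth_share, stage_speedup), 2))

# Limit as the accelerated stage becomes infinitely fast.
upper_bound = 1.0 / (1.0 - depth_share)
print(round(upper_bound, 2))
```

Under this assumption, accelerating only that stage caps the overall gain at roughly 3.3x, after which the remaining stages and data transfer dominate.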
Offloading before the costly step doesn’t avoid compute-communication tradeoffs
Pipeline: sensor → prep → image align → depth from flow → image stitch → stream to viewer
[Chart: video frame size (MB) after each pipeline stage, axis from 0 to 600]
Image alignment step produces significant intermediate data
Offloading early still transfers 2x the final output size
Evaluation: which pipeline achieves the highest frame rate?
Designed a simple parallel accelerator for Xilinx Zynq SoC, simulated for Virtex UltraScale+
Evaluated against CPU and GPU implementations in Halide
Assumed 2 GB/s network link for communication
Implementation details in paper
Which pipeline achieves the highest frame rate?

| pipeline configuration | compute (FPS) | transfer (FPS) | effective FPS |
|---|---|---|---|
| sensor | 100 | 15.8 | 15.8 |
| sensor, prep | 100 | 15.8 | 15.8 |
| sensor, prep, align | 100 | 3.95 | 4.0 |
| sensor, prep, align, depth (CPU) | 0.09 | 5.27 | 0.1 |
| sensor, prep, align, depth (GPU) | 11.2 | 5.27 | 5.3 |
| sensor, prep, align, depth (FPGA) | 174 | 5.27 | 5.3 |
| sensor, prep, align, depth (CPU), stitch | 0.09 | 31.6 | 0.1 |
| sensor, prep, align, depth (GPU), stitch | 11.2 | 31.6 | 11.2 |
| sensor, prep, align, depth (FPGA), stitch | 174 | 31.6 | 31.6 |

[Chart: effective FPS per configuration, axis from 0 to 35]
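The effective frame rates in the chart are consistent with taking the slower of the compute and transfer rates for each configuration. The sketch below applies that rule to the compute and transfer figures listed above. The `min` rule is inferred from the numbers rather than stated on the slide, and the names `effective_fps` and `CONFIGS` are introduced here for illustration.

```python
# (configuration, compute FPS, transfer FPS) as listed on the slide.
CONFIGS = [
    ("sensor", 100, 15.8),
    ("sensor prep", 100, 15.8),
    ("sensor prep align", 100, 3.95),
    ("sensor prep align depth (CPU)", 0.09, 5.27),
    ("sensor prep align depth (GPU)", 11.2, 5.27),
    ("sensor prep align depth (FPGA)", 174, 5.27),
    ("sensor prep align depth (CPU) stitch", 0.09, 31.6),
    ("sensor prep align depth (GPU) stitch", 11.2, 31.6),
    ("sensor prep align depth (FPGA) stitch", 174, 31.6),
]

def effective_fps(compute_fps, transfer_fps):
    """Assumed rule: the pipeline runs at the rate of its slower half."""
    return min(compute_fps, transfer_fps)

results = {name: effective_fps(c, t) for name, c, t in CONFIGS}
real_time = [name for name, fps in results.items() if fps >= 30]

for name, fps in results.items():
    print(f"{name}: {fps}")
print("meets the 30 fps goal:", real_time)
```

Only the full in-camera pipeline with FPGA-accelerated depth clears the 30 fps goal: every other configuration is limited either by slow computation or by the size of the data it has to transfer.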
CPU results are slowest
Data size is too big after depth for offloading
Full pipeline with FPGA is the only one that achieves real-time frame rate
In-camera processing for real-time VR video
Computation and communication together highlight benefits not seen when considered separately
For VR video, in-camera processing pipelines enable applications that could not even be achieved via cloud offload
Pipeline: prep → align → depth → stitch
In-camera processing pipelines help characterize camera systems
In-camera pipelines evaluate computation-communication trade-offs
Use hardware-software co-design to balance constraints and optimize designs
Achieve optimal performance by considering bottlenecks in context of full system
Thank you!