Where Tegra meets Titan - NVIDIA (on-demand.gputechconf.com/gtc/2016/presentation/s...)
TRANSCRIPT
![Page 1: Where Tegra meets Titan](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/1.jpg)
Where Tegra meets Titan
Prof Tom Drummond
![Page 2](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/2.jpg)
Computer vision is easy!
But first a diversion to 10th-century Persia …
… and the first recorded game of chess
![Page 9](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/9.jpg)
The rice and the chessboard
First half of the chessboard: 100 tons of rice
Second half of the chessboard: 400 billion tons of rice = 1000 years of production
And the moral of the story is …
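The slide's numbers follow from doubling. Square k holds 2^(k-1) grains, so squares 1 to 32 hold 2^32 - 1 grains and squares 33 to 64 hold 2^64 - 2^32. Below is a small C++ sketch of that arithmetic. The function names are ours, and the figure of 25 mg per grain is an assumption that the talk does not state.

```cpp
#include <cassert>
#include <cstdint>

// Total grains on squares first..last (1-based, inclusive); square k holds 2^(k-1).
// The full board sums to 2^64 - 1, which still fits exactly in a uint64_t.
std::uint64_t grains_on_squares(int first, int last) {
    assert(1 <= first && first <= last && last <= 64);
    std::uint64_t total = 0;
    for (int k = first; k <= last; ++k) {
        total += std::uint64_t{1} << (k - 1);  // doubling on every square
    }
    return total;
}

// ASSUMPTION (not from the slides): one grain of rice weighs about 25 mg,
// i.e. 25e-9 tonnes.
constexpr double kTonnesPerGrain = 25e-9;

double tonnes(std::uint64_t grains) {
    return static_cast<double>(grains) * kTonnesPerGrain;
}
```

Under that assumption the first half comes to roughly 107 tonnes and the second half to roughly 4.6 × 10^11 tonnes. Both are the same order of magnitude as the slide's 100 tons and 400 billion tons. The exact figures depend on the grain mass chosen.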
![Page 12](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/12.jpg)
The transistor and the chessboard
How many on the last square…?
1974: Intel 8080 (6,000 transistors)
1978: Intel 8086 (29,000 transistors)
1982: Intel 80286 (134,000 transistors)
1993: Intel Pentium (3,000,000 transistors)
2004: Intel P4 Prescott (125,000,000 transistors)
This notebook: > 2 trillion transistors
2004: Nvidia NV40 (222,000,000 transistors)
2006: Nvidia G80 (484,000,000 transistors)
2008: Nvidia GT200 (1,400,000,000 transistors)
2010: Nvidia GF104 (1,900,000,000 transistors)
2012: Nvidia GK104 (3,540,000,000 transistors)
2015: Nvidia GM200 (8,000,000,000 transistors)
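Both lists imply a doubling time of roughly two years. The minimal C++ helper below (hypothetical name) extracts the doubling period from two (year, count) points. It assumes pure exponential growth between them.

```cpp
#include <cassert>
#include <cmath>

// Years per doubling implied by two (year, transistor count) data points,
// assuming the count grows exponentially between them.
double doubling_period_years(double year0, double count0,
                             double year1, double count1) {
    assert(count0 > 0 && count1 > count0 && year1 > year0);
    // number of doublings = log2(count1 / count0)
    return (year1 - year0) / std::log2(count1 / count0);
}
```

For example, `doubling_period_years(1974, 6000, 2004, 125e6)` gives about 2.09 years for the Intel list. `doubling_period_years(2004, 222e6, 2015, 8e9)` gives about 2.13 years for the Nvidia list.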
![Page 14](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/14.jpg)
Can run Moore’s law backwards!
Q: According to Moore’s law, when was there just one transistor? A: 1948
In Nov 1947, Bardeen, Brattain and Shockley attached two gold contacts to a crystal of germanium…
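Suppose the count doubles every T years and equals N in a given year. Then running the trend backwards, the count was 1 about T · log2(N) years earlier. The sketch below works under that pure-exponential assumption; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// If count = 2^((year - year_one) / period), then
// year_one = year - period * log2(count): the year the trend predicts exactly 1.
double year_of_first_transistor(double year, double count, double period_years) {
    assert(count >= 1 && period_years > 0);
    return year - period_years * std::log2(count);
}
```

Start from the 8080's 6,000 transistors in 1974. A 2-year doubling period gives about 1948.9. The roughly 2.09-year period implied by the 8080 and Prescott counts gives about 1947.8, close to the slide's answer of 1948.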
![Page 15](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/15.jpg)
Power
Moore’s law gives us increasing compute power
BUT
With great power comes great …
![Page 16](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/16.jpg)
Moore’s Law is not always our friend!
Even with GPUs, compute on mobile devices is limited
Can’t put a K40 on a quadrotor!
![Page 17](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/17.jpg)
Moore’s Law is not always our friend!
Even with GPUs, compute on mobile devices is limited
But a TX1 fits just fine! (Stereolabs TX1-enabled drone)
![Page 18](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/18.jpg)
ACRV
The Australian Research Council Centre of Excellence for Robotic Vision
• $25.5M over 7 years
• 13 Chief Investigators in 4 Universities
• 16 Research Fellows
• ~50 PhD students
• Research into:
  – Semantics (deep learning)
  – Robust vision (all weathers)
  – Vision and Action (closing the loop)
  – Algorithms and Architecture (constrained resources)
![Page 19](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/19.jpg)
Distributed Robotic Vision
Simplest method is to just partition the problem somewhere, giving some tasks to the mobile and some to the server
(Figure: pipeline split into a mobile part and a server part)
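One way to decide where to partition is to estimate end-to-end latency for every possible split point. The C++ sketch below uses an entirely hypothetical cost model. Its inputs are the per-stage mobile and server times, the size of the data crossing the network, a round-trip time and a bandwidth. It ignores the size of the reply and any overlap between stages.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cost of one stage of a linear pipeline (illustrative numbers only).
struct Stage {
    double mobile_ms;  // time if the stage runs on the mobile (e.g. a TX1)
    double server_ms;  // time if the stage runs on the server GPU
    double input_kb;   // data that must be sent if the split is just before this stage
};

// Returns s minimizing latency when stages [0, s) run on the mobile and
// stages [s, n) on the server; s == n means "everything on the mobile".
std::size_t best_split(const std::vector<Stage>& stages,
                       double rtt_ms, double kb_per_ms) {
    assert(kb_per_ms > 0.0);
    const std::size_t n = stages.size();
    std::size_t best = n;
    double best_latency = std::numeric_limits<double>::infinity();
    for (std::size_t s = 0; s <= n; ++s) {
        double latency = 0.0;
        for (std::size_t i = 0; i < s; ++i) latency += stages[i].mobile_ms;
        if (s < n) {
            latency += rtt_ms + stages[s].input_kb / kb_per_ms;  // ship data, await reply
            for (std::size_t i = s; i < n; ++i) latency += stages[i].server_ms;
        }
        if (latency < best_latency) {
            best_latency = latency;
            best = s;
        }
    }
    return best;
}
```

Consider three stages costing {4, 20, 30} ms on the mobile and {1, 2, 3} ms on the server, with inputs of {900, 1200, 160} KB. A 10 ms round trip at 10 KB/ms favours splitting before the last stage (split 2). A 100 ms round trip favours keeping everything on the mobile (split 3), which is the latency problem the next slide raises. These numbers are invented for illustration.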
![Page 20](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/20.jpg)
Distributed Robotic Vision
But often this isn’t the best solution, e.g. latency introduced by the network may be a problem
Many interesting solutions don’t look like this, e.g.:
(Figure: flow diagram with the stages Obtain sensor data; Extract summary information; Compute accurate solution; Compute approximate solution; Compare; Calculate output; Update local model; Bring correction up to date; Calculate and send correction; Compute approximate solution)
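The diagram lists stages but does not say how the correction is applied. One possible reading is sketched in C++ below, with hypothetical names:

1. The mobile keeps its fast approximate estimate.
2. The server computes an accurate answer for an older frame and returns it.
3. The mobile compares that answer with the approximate value it had for the same frame and carries the difference forward to the present.

The 1-D pose and the constant-offset drift model are simplifying assumptions.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical 1-D model: the mobile dead-reckons every frame (fast, drifting);
// the server later sends an accurate pose for an OLD frame (network latency).
// ASSUMPTION: drift up to that frame acts as a constant offset, so the correction
// measured at the old frame can be carried forward unchanged to "now".
class CorrectedEstimator {
public:
    // Integrate one frame of odometry; returns the corrected current estimate.
    double on_frame(int frame, double odometry_step) {
        raw_ += odometry_step;
        raw_history_[frame] = raw_;  // kept so late corrections can be aligned
        return estimate();
    }

    // Server reply for a past frame: compare it with the raw estimate we had
    // for that frame and store the difference as the current offset.
    void on_correction(int frame, double accurate_pose) {
        auto it = raw_history_.find(frame);
        assert(it != raw_history_.end());
        offset_ = accurate_pose - it->second;
    }

    double estimate() const { return raw_ + offset_; }

private:
    double raw_ = 0.0;     // uncorrected dead-reckoned pose
    double offset_ = 0.0;  // latest correction from the server
    std::unordered_map<int, double> raw_history_;
};
```

Suppose odometry reports each unit step as 1.25. After 10 frames the raw estimate is 12.5. A late server reply saying frame 5 was really at 5.0 yields an offset of -1.25 and a corrected current estimate of 11.25. Drift accumulated after frame 5 is still uncorrected, so corrections must keep arriving. A real system would also prune `raw_history_`.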
![Page 21](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/21.jpg)
Distributed Robotic Vision
We want to create solutions that enable robotics in a distributed sensing and compute environment
(Figure: network of compute devices: 3 × TX1, 8 × K40, 2 × CPU)
![Page 22](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/22.jpg)
Distributed Localisation Service
(Figure: pipeline diagram)
• CCTV1: Extract landmarks (Build image pyramid, Build descriptors) → Index → Match
• CCTV2: Extract landmarks (Build image pyramid, Build descriptors) → Index → Match
• Robot: Extract landmarks (Build image pyramid, Build descriptors)
• Compute 1: Compute robot pose
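The slide does not say how Index and Match are implemented. Assume ORB-style binary descriptors (256 bits, the size ORB uses) compared by Hamming distance. A brute-force matcher can then be sketched as follows. The names and the `max_distance` threshold are hypothetical, and a deployed service would replace the inner loop with an index.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A 256-bit binary descriptor stored as four 64-bit words.
using Descriptor = std::array<std::uint64_t, 4>;

// Number of differing bits between two descriptors.
int hamming(const Descriptor& a, const Descriptor& b) {
    int d = 0;
    for (std::size_t w = 0; w < a.size(); ++w) {
        d += static_cast<int>(std::bitset<64>(a[w] ^ b[w]).count());
    }
    return d;
}

// For each query (e.g. robot) descriptor, return the index of the closest
// indexed (e.g. CCTV) descriptor, or -1 if no distance is <= max_distance.
std::vector<int> match(const std::vector<Descriptor>& queries,
                       const std::vector<Descriptor>& index, int max_distance) {
    assert(max_distance >= 0);
    std::vector<int> result;
    for (const auto& q : queries) {
        int best = -1;
        int best_d = max_distance + 1;
        for (std::size_t i = 0; i < index.size(); ++i) {
            const int d = hamming(q, index[i]);
            if (d < best_d) { best_d = d; best = static_cast<int>(i); }
        }
        result.push_back(best);
    }
    return result;
}
```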
![Page 23](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/23.jpg)
Distributed Localisation Service
==3031== NVPROF is profiling process 3031, command: ./ComputeOrb 1
Frame# 1
Elapsed time : 5.955523 ms
Frame Elapsed time : 7.765627 ms
numCorners: 28304, nmsnumCorners: 5073
==3031== Profiling application: ./ComputeOrb 1
==3031== Profiling result:

| Time(%) | Time | Calls | Avg | Min | Max | Name |
|---|---|---|---|---|---|---|
| 57.18% | 3.2379ms | 1 | 3.2379ms | 3.2379ms | 3.2379ms | OrbDescriptors(…) |
| 30.57% | 1.7312ms | 1 | 1.7312ms | 1.7312ms | 1.7312ms | (…) |
| 4.29% | 242.92us | 1 | 242.92us | 242.92us | 242.92us | fastcorner(…) |
| 4.00% | 226.31us | 1 | 226.31us | 226.31us | 226.31us | harris(…) |
| 1.46% | 82.553us | 1 | 82.553us | 82.553us | 82.553us | NMS(…) |
| 0.73% | 41.458us | 1 | 41.458us | 41.458us | 41.458us | cleansweep(…) |

Speedup over CPU* implementation is 4-5×

*Intel Core 2 Quad Q8400 @ 2.66 GHz
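The profile lists the kernels `fastcorner`, `harris`, `NMS`, `cleansweep` and `OrbDescriptors`. The counters `numCorners: 28304` and `nmsnumCorners: 5073` presumably record the corner count before and after non-maximum suppression. A CPU reference for 3×3 non-maximum suppression over a dense score image might look like the sketch below. The window size, the strict-maximum tie rule and the names are assumptions, not details from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Corner { int x, y; float score; };

// 3x3 non-maximum suppression on a row-major score image (width * height):
// keep a pixel if its score is positive and strictly greater than all 8
// neighbours. Border pixels are skipped; equal neighbours suppress each other.
std::vector<Corner> nms3x3(const std::vector<float>& score, int width, int height) {
    assert(score.size() == static_cast<std::size_t>(width) * height);
    std::vector<Corner> kept;
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            const float s = score[y * width + x];
            if (s <= 0.0f) continue;  // not a corner candidate
            bool is_max = true;
            for (int dy = -1; dy <= 1 && is_max; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    if (score[(y + dy) * width + (x + dx)] >= s) { is_max = false; break; }
                }
            }
            if (is_max) kept.push_back({x, y, s});
        }
    }
    return kept;
}
```

A GPU kernel would typically assign one thread per pixel and write survivors to a compacted list. That compaction is a separate step, not shown here.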
![Page 24](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/24.jpg)
Sub-pixel localisation
Timing results:

| Method | Time (µs/keypoint) |
|---|---|
| Inverse Additive | 672 |
| Inverse Compositional | 367 |
| Ours | 7 |

(Figure: pipeline diagram. Camera 1 and Camera 2 each: Find landmarks, Extract image patch. Compute 1: Compute matrix. Compute sub-pixel correspondence on many subsequent frames, shown once per camera.)
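The table compares against the inverse additive and inverse compositional Lucas-Kanade formulations; the talk's own method is not described. The diagram computes a matrix once and then reuses it on many subsequent frames. To show why, here is a minimal C++ sketch of the standard inverse compositional baseline, reduced to a 1-D, translation-only patch. The gradient and the (1×1) Hessian depend only on the template, so each new frame needs only a few cheap iterations. The names and the linear-interpolation sampler are our assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear interpolation of a 1-D signal at real position x (clamped to range).
double sample(const std::vector<double>& img, double x) {
    const double max_x = static_cast<double>(img.size() - 1);
    if (x <= 0.0) return img.front();
    if (x >= max_x) return img.back();
    const std::size_t i = static_cast<std::size_t>(x);
    const double t = x - static_cast<double>(i);
    return (1.0 - t) * img[i] + t * img[i + 1];
}

// Translation-only inverse compositional alignment in 1-D.
struct InverseCompositional1D {
    std::vector<double> tmpl, grad;  // template samples and their gradient
    int x0;                          // template start in the reference signal
    double hessian = 0.0;            // sum of squared gradients (1x1 "matrix")

    // Precompute once per template ("compute matrix").
    InverseCompositional1D(const std::vector<double>& ref, int start, int len)
        : x0(start) {
        assert(start >= 1 && start + len < static_cast<int>(ref.size()));
        for (int k = 0; k < len; ++k) {
            const int x = start + k;
            tmpl.push_back(ref[x]);
            grad.push_back(0.5 * (ref[x + 1] - ref[x - 1]));  // central difference
            hessian += grad.back() * grad.back();
        }
        assert(hessian > 0.0);
    }

    // Per new frame: find shift p such that img(x0 + k + p) ~= tmpl[k],
    // reusing the precomputed gradient and Hessian in every iteration.
    double track(const std::vector<double>& img, double p = 0.0, int iters = 10) const {
        for (int it = 0; it < iters; ++it) {
            double b = 0.0;
            for (std::size_t k = 0; k < tmpl.size(); ++k) {
                const double err = sample(img, x0 + static_cast<double>(k) + p) - tmpl[k];
                b += grad[k] * err;
            }
            const double dp = b / hessian;
            p -= dp;  // composing with the inverse of a pure translation
            if (std::fabs(dp) < 1e-6) break;
        }
        return p;
    }
};
```

Take a Gaussian profile (σ = 4) as the reference and the same profile shifted by 0.3 samples as the new frame. Then `InverseCompositional1D(ref, 5, 21).track(img)` returns a shift very close to 0.3.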
![Page 25](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/25.jpg)
Approximate Nearest Neighbor
Big data in high-dimensional spaces
Given a query point, find the nearest reference point
Solution: FANNG (Fast Approximate Nearest Neighbor Graphs) @ CVPR 2016
Can serve 1.2M queries/second at 90% recall in a database of 1M reference points in 128D space on a Titan X
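FANNG's graph construction and its backtracking search are described in the CVPR 2016 paper, not on this slide. The core idea of graph-based nearest neighbour search is walking across a neighbour graph towards the query. It can be sketched as plain greedy descent. This simplified version (hypothetical names) stops at the first local minimum, so it has lower recall than a backtracking search.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<float>;

float squared_distance(const Point& a, const Point& b) {
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Greedy descent on a neighbour graph: from `start`, keep moving to the
// neighbour closest to the query; stop when no neighbour is closer.
// Terminates because the distance strictly decreases at every move.
std::size_t greedy_graph_search(const std::vector<Point>& points,
                                const std::vector<std::vector<std::size_t>>& neighbours,
                                const Point& query, std::size_t start) {
    assert(start < points.size());
    std::size_t current = start;
    float current_d = squared_distance(points[current], query);
    for (;;) {
        std::size_t best = current;
        float best_d = current_d;
        for (std::size_t n : neighbours[current]) {
            const float d = squared_distance(points[n], query);
            if (d < best_d) { best = n; best_d = d; }
        }
        if (best == current) return current;  // local minimum reached
        current = best;
        current_d = best_d;
    }
}
```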
![Page 26](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/26.jpg)
Approximate Nearest Neighbor
The CUDA implementation requires a short priority queue, BUT
`int array[30]; // very slow: dynamically indexed, so it spills to local memory (off-chip, like global memory)`
Solution is to treat a warp as a single unit, with the array spread over the warp in a single register:
`int array; // there are 32 of these in a warp (one per lane)`
`...`
`// find the first entry in array that is > thresh (__ffs is 1-based and returns 0 if none)`
`int pq = __ffs(__ballot_sync(0xffffffff, array > thresh)); // pre-CUDA 9 code used __ballot(...)`
`...`
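The ballot trick can be checked by modelling it on the host. Each of the 32 lanes holds one entry, and `__ballot_sync` packs the per-lane predicates into a 32-bit mask. `__ffs` then returns the 1-based position of the lowest set bit. When the entries are sorted across lanes, that is the first lane whose entry exceeds `thresh`. The C++ below is such a model (hypothetical names), not GPU code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWarpSize = 32;
using WarpInts = std::array<int, kWarpSize>;  // lanes[i] = lane i's `array` register

// Model of __ballot_sync(0xffffffff, array > thresh): bit i set iff lane i's
// predicate is true.
std::uint32_t ballot_greater(const WarpInts& lanes, int thresh) {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (lanes[lane] > thresh) mask |= (std::uint32_t{1} << lane);
    }
    return mask;
}

// Model of __ffs: 1-based position of the least significant set bit, 0 if none.
int find_first_set(std::uint32_t mask) {
    for (int bit = 0; bit < 32; ++bit) {
        if (mask & (std::uint32_t{1} << bit)) return bit + 1;
    }
    return 0;
}
```

Suppose the lanes hold 0, 2, 4, …, 62 and `thresh = 9`. The mask has bits 5 to 31 set, and `find_first_set` returns 6, which means lane 5.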
![Page 27](https://reader033.vdocuments.site/reader033/viewer/2022042413/5f2cde1ef3dbfb30cd54501a/html5/thumbnails/27.jpg)
Approximate Nearest Neighbor
Want to keep the array sorted when we insert a new value, discarding the largest value
Example with new_value = 8. Each thread computes shift value = max(new_value, array). After the shuffle, each thread sees the shift value of the previous thread. Write the new value if less than array.

| thread | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| array | 1 | 2 | 4 | 5 | 9 | 11 | 13 | 15 |
| shift value = max(new_value, array) | 8 | 8 | 8 | 8 | 9 | 11 | 13 | 15 |
| shuffle (each thread sees this value) | | 8 | 8 | 8 | 8 | 9 | 11 | 13 |
| array after writing new value if less than array | 1 | 2 | 4 | 5 | 8 | 9 | 11 | 13 |
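The insertion step in the table can be modelled the same way. Below is a host-side C++ sketch for the slide's 8-lane example (hypothetical names). On the GPU the hand-off would be `__shfl_up_sync(0xffffffff, shift, 1)` over all 32 lanes. Lane 0 needs special handling: `__shfl_up_sync` returns lane 0's own value, so lane 0 must use `new_value` directly.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

constexpr int kLanes = 8;  // the slide's example; a real warp has 32 lanes
using Lanes = std::array<int, kLanes>;

// Insert new_value into an array sorted across lanes, discarding the largest.
// Lane t computes shift = max(new_value, array[t]) and passes it to lane t+1
// (shuffle up by one); lane t then keeps min(array[t], value received).
Lanes warp_sorted_insert(const Lanes& array, int new_value) {
    assert(std::is_sorted(array.begin(), array.end()));
    Lanes shift{};
    for (int t = 0; t < kLanes; ++t) shift[t] = std::max(new_value, array[t]);
    Lanes result{};
    for (int t = 0; t < kLanes; ++t) {
        const int received = (t == 0) ? new_value : shift[t - 1];  // lane 0 has no upstream lane
        result[t] = std::min(array[t], received);  // write new value if less than array
    }
    return result;
}
```

`warp_sorted_insert({1, 2, 4, 5, 9, 11, 13, 15}, 8)` reproduces the slide's result `{1, 2, 4, 5, 8, 9, 11, 13}`. A value larger than every entry leaves the array unchanged.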