Autonomous Vehicles: The Intersection of Robotics and Artificial Intelligence


Page 1: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Autonomous Vehicles Webinar: The intersection of robotics and artificial intelligence

Streaming live via Hangouts, 8pm CT, August 28th, 2016

Undergraduate student at the University of Illinois at Urbana-Champaign, Class of 2017

B.S. Mechanical Engineering, Minor in Electrical Engineering

Previous: PwC, Cummins, UIUC RA

Page 2: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Overview

I. What is an AV?
II. Technology
    A. AI + Robotics = AVs
    B. “Self-Driving Stack”
        1. Sensing
        2. Processing
        3. Actuation

III. Up Next

Page 3: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

What is an autonomous vehicle (AV)?

Within the context of this discussion, we are focusing on roadway motor vehicles.

At its simplest, an AV might be a car with cruise-control capability; at its most complex, an entirely driverless vehicle.

Much like everything else in tech, there is a lot of contention over how the classification should be structured (what counts as “full autonomy”, etc.?). Thankfully, the U.S. Dept. of Transportation has developed an official tiering with very clear distinctions.

Autonomous vehicles (AVs) are vehicles that are capable of movement with limited or no outside instruction or intervention.

Page 4: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Autonomy, per the U.S. Dept. of Transportation:

SOURCE:http://www.nhtsa.gov/About+NHTSA/Press+Releases/U.S.+Department+of+Transportation+Releases+Policy+on+Automated+Vehicle+Development

Level 1: Automation at this level involves one or more specific control functions. Examples include electronic stability control or pre-charged brakes, where the vehicle automatically assists with braking to enable the driver to regain control of the vehicle or stop faster than would be possible by acting alone.

Level 2: This level involves automation of at least two primary control functions designed to work in unison to relieve the driver of control of those functions. An example of combined functions enabling a Level 2 system is adaptive cruise control in combination with lane centering.

Level 3: Vehicles at this level of automation enable the driver to cede full control of all safety-critical functions under certain traffic or environmental conditions, and in those conditions to rely heavily on the vehicle to monitor for changes requiring transition back to driver control. The driver is expected to be available for occasional control, but with sufficiently comfortable transition time.

Level 4: The vehicle is designed to perform all safety-critical driving functions and monitor roadway conditions entirely. The driver could provide destination input but is not expected to be available for control at any time during the trip. This includes unoccupied vehicles.

Page 5: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

AI + robotics = AVs

Page 6: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

The intersection of artificial intelligence and robotics

AI: An intelligent system that is capable of taking in information/data and acting upon that data, and of learning how to draw further insight.

Robotics: The study of the design and control of mechanical systems. In a closed loop, these systems are capable of controlling themselves using sensory information.

● Modern machine learning and AI techniques are capable of this for specific tasks (AlphaGo, image classification)

● These same techniques, especially deep learning, could be applied to vehicles to teach them to drive, given high volumes of data

● Robotics is a well-understood field of study with decades of research and progress

● It has been applied to planes, cars, etc., but in an extremely limited fashion

● Autonomy cannot be “hard-coded”; it must be “learned”

Page 7: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

The intersection of artificial intelligence and robotics: where the magic happens

Autonomous vehicles have long been a scientific dream. Planes have had autopilot, “self-flying” features for decades, so why is it taking so long for cars? Existing infrastructure and roads cannot support rule-based robotic systems: there are too many possible scenarios that can occur when driving, so rules for robotic vehicles cannot be “hard-coded”.

True autonomy requires artificial intelligence: intelligence that resembles the human capability to decipher 3D space changing in time. With decades of advances in machine learning and artificial intelligence, we are nearing a time when machines are better at understanding roads than we are.

Page 8: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Technology Deep-Dive

Page 9: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

There is a lot going on under the hood; let’s try to simplify it.

[Diagram: LIDAR data feeding a Graph SLAM pipeline to produce a pose graph]

Page 10: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

The “Self-Driving Stack”: the architecture of autonomy

1. Sensing
2. Processing
3. Actuation
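To make the three stages concrete, here is a minimal sketch of the stack as a control loop in Python. All names and the command format are hypothetical illustrations of the architecture, not a real AV API.

```python
# A minimal sketch of the three-stage self-driving loop described above.
# Function names and the command dictionary are hypothetical, for illustration.

import time

def sense():
    """Gather raw environment data (stand-ins for LIDAR, camera, etc.)."""
    return {"lidar_points": [], "camera_frame": None}

def process(observation):
    """Turn raw sensor data into driving commands (planning/learning lives here)."""
    return {"throttle": 0.1, "steering_angle": 0.0}

def actuate(command):
    """Send commands to the control unit (in a real car, over a bus like CAN)."""
    print(f"throttle={command['throttle']}, steering={command['steering_angle']}")

for _ in range(3):                     # a few iterations for demonstration
    observation = sense()              # 1. Sensing
    command = process(observation)     # 2. Processing
    actuate(command)                   # 3. Actuation
    time.sleep(0.05)                   # run the loop at ~20 Hz
```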

Page 11: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Commands are sent to the Control Unit, which tells the engine/motor to speed up or slow down. An analogous process occurs for vehicle steering.

Sensor data is passed on to algorithms and is processed locally (GPUs) or over a distributed network (the Cloud).

Autonomous Vehicle Architecture


Video Camera (still images processing, pixels)

LIDAR (light-radar, point clouds)

Specific sensors (e.g. red light detection, pedestrian detection)

1. Sensing
2. Processing
3. Actuation

Page 12: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Autonomous Vehicle Architecture

[Diagram: sensing (1) feeding processing/computation (2), which drives electromechanical actuation (3)]

Page 13: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Technology Deep-Dive: Sensing

1. Sensing
2. Processing
3. Actuation

Page 14: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

LIDAR, video cameras, and radar/sonic sensors are most commonly used for gathering vehicle environment data

Video Camera (still images processing, pixels)

LIDAR (light-radar, point clouds)

Specific sensors (e.g. red light detection, stop signs)

Sensing

● “Light radar”: LIDAR
● Generates point clouds that are 3D representations of the driving environment
● Seen as the high-resolution input data that is integral to SLAM + RRT techniques

● Simple video cameras input feeds of still images that can be processed for lanes, obstacles, pedestrians, etc.

● Cheap and effective, now being heavily implemented as the data of choice for deep learning

● Case-specific sensors are heavily leveraged to provide insight in areas that LIDAR and cameras cannot handle in a general way

● Ex) a specific camera pointed at where stoplights are, feeding directly into a specific algorithm for sensing red, yellow, and green colors
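As a concrete illustration of that last example, here is a minimal sketch of case-specific red-light sensing via color thresholding, assuming OpenCV. The HSV thresholds and pixel count are illustrative values; a production system would use a trained detector rather than fixed thresholds.

```python
# Minimal sketch: detecting a red traffic light in a camera frame by color
# thresholding with OpenCV. Threshold values are illustrative only.

import cv2
import numpy as np

def red_light_visible(frame_bgr: np.ndarray) -> bool:
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so check two bands near 0 and 180.
    lower = cv2.inRange(hsv, (0, 120, 120), (10, 255, 255))
    upper = cv2.inRange(hsv, (170, 120, 120), (180, 255, 255))
    mask = cv2.bitwise_or(lower, upper)
    # Declare a detection if enough pixels fall in the red bands.
    return cv2.countNonZero(mask) > 500

frame = cv2.imread("stoplight.jpg")  # hypothetical input image
if frame is not None:
    print("red light!" if red_light_visible(frame) else "no red light")
```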

Page 15: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

A deep-dive on LIDAR Sensing

● LIDAR has quickly become a go-to sensor for autonomous applications. Velodyne is an industry leader with relatively cheap, easy to calibrate units

● LIDAR units send out pulses of light and measure the time to return, which can be used to compute the distance of an object

● A rotating LIDAR sensor gathering distances of objects at different angles can gather enough points of data to construct a “point cloud”

● It is evident how useful point clouds are: much like the human eye, they give a 3D representation of space in real time
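The time-of-flight math above is simple enough to sketch directly: the round-trip time of a pulse gives distance, and distances at known beam angles become point-cloud coordinates. Sample values below are made up for illustration.

```python
# Minimal sketch of the LIDAR math described above: a pulse's round-trip
# time gives distance, and distances at known angles give 2D point-cloud
# coordinates. Sample values are illustrative.

import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def distance_from_round_trip(t_seconds: float) -> float:
    # The pulse travels out and back, so halve the total path.
    return SPEED_OF_LIGHT * t_seconds / 2.0

def polar_to_point(distance_m: float, angle_rad: float) -> tuple:
    # One rotating-beam return becomes one (x, y) point in the cloud.
    return (distance_m * math.cos(angle_rad), distance_m * math.sin(angle_rad))

# A ~66.7-nanosecond round trip corresponds to an object ~10 m away.
d = distance_from_round_trip(66.7e-9)
print(round(d, 2), polar_to_point(d, math.radians(45)))
```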

Page 16: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Researchers at MIT in collaboration with DARPA have been able to fabricate and implement a solid-state LIDAR chip:

“Our lidar chips promise to be orders of magnitude smaller, lighter, and cheaper than lidar systems available on the market today. They also have the potential to be much more robust because of the lack of moving parts, with a non-mechanical beam steering 1,000 times faster than what is currently achieved in mechanical lidar systems.”

“At the moment, our on-chip lidar system can detect objects at ranges of up to 2 meters, though we hope to achieve a 10-meter range within a year. The minimum range is around 5 centimeters. We have demonstrated centimeter longitudinal resolution and expect 3-cm lateral resolution at 2 meters. There is a clear development path towards lidar on a chip technology that can reach 100 meters, with the possibility of going even farther.”

Massive size and price reductions of LIDAR sensors could fundamentally change the approach to autonomous vehicles, drones, prosthetics, etc.

“MIT and DARPA pack LIDAR sensor onto single chip”, IEEE Spectrum, Aug 4, 2016

A new, cheaper, solid-state LIDAR is emerging (Sensing)

SOURCE: http://spectrum.ieee.org/tech-talk/semiconductors/optoelectronics/mit-lidar-on-a-chip

Page 17: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

The sensing stage needs to gather lots of data from different sources in order to fully understand the environment

Video Camera (still images processing, pixels)

LIDAR (light-radar, point clouds)

Specific sensors (e.g. red light detection, stop signs)

Sensing

Page 18: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Technology Deep-Dive: Processing

1. Sensing
2. Processing
3. Actuation

Page 19: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

The Processing Stack (Processing)

Input Data
● LIDAR point cloud data
● Video camera feed

Computational Methods
● Motion planning / mapping: RRT*, SLAM, kinematics
● Machine learning / deep learning: end-to-end, DNNs, CNNs
● Rule-based systems: intersections, left turns

Computational Muscle
● Local: CPUs, GPUs, SoCs on board; large amounts of flash memory
● Distributed: “cloud” compute; powerful endpoints, limited only by the speed of data communication

Output Commands (sent on to the actuation stage)

Page 20: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Computational Methods

● Motion Planning
● Artificial Intelligence (ML/Deep Learning)

Page 21: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Motion Planning - Algorithm 1: SLAM Processing

What is the world around me? (mapping)
● Sense from various positions
● Integrate measurements to produce a map

Where am I in the world? (localization)
● Sense
● Relate sensor readings to a world model (a priori maps)
● Compute (probabilistic) location relative to the model

**above points taken from the CMU paper cited below

Depicted to the right is a Kalman filter being applied to position measurements and sensory information, which in turn generates a Gaussian distribution over the possible positions

Simultaneous localization and mapping (SLAM)

SOURCE: http://www.cs.cmu.edu/~motionplanning/lecture/Chap8-Kalman-Mapping_howie.pdf
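To make the Kalman-filter idea concrete, here is a minimal 1D sketch of the localization step: fuse a motion prediction with a noisy position measurement, keeping a Gaussian (mean, variance) belief over where the vehicle is. The noise values and measurements are made up for illustration.

```python
# A minimal 1D Kalman-filter sketch of the localization idea above.

def kalman_step(mean, var, motion, motion_var, measurement, meas_var):
    # Predict: shift the estimate by the commanded motion; uncertainty grows.
    mean, var = mean + motion, var + motion_var
    # Update: blend in the measurement, weighted by relative certainty.
    k = var / (var + meas_var)            # Kalman gain
    mean = mean + k * (measurement - mean)
    var = (1 - k) * var
    return mean, var

mean, var = 0.0, 1.0  # initial belief: at x = 0 m, variance 1 m^2
for motion, z in [(1.0, 1.1), (1.0, 2.05), (1.0, 2.9)]:
    mean, var = kalman_step(mean, var, motion, 0.2, z, 0.5)
    print(f"position ~ {mean:.2f} m (variance {var:.3f})")
```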

Page 29: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Motion Planning - Algorithm 1: SLAM Processing

[Figure: seven-panel SLAM walkthrough showing the location likelihood distribution converging]

SOURCE: http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-412j-cognitive-robotics-spring-2005/projects/1aslam_blas_repo.pdf

Page 30: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Motion Planning - Algorithm 2: RRTs Processing

● Rapidly-exploring Random Trees (RRTs) are a set of exploratory algorithms that are useful for trajectory planning

● With a set of polygonal obstacles, an RRT can generate a possible path from the starting configuration to the ending (goal) configuration

● Sample paths are then input to a controller/model representation of the vehicle dynamics, and the predicted trajectory of the vehicle is computed

● The runtime of these algorithms can vary, since accuracy depends on the number of samples taken

Once a probabilistic localization is realized, a probabilistic path can be generated using RRTs

SOURCE: http://acl.mit.edu/papers/KuwataTCST09.pdf
http://www.staff.science.uu.nl/~gerae101/pdf/compare.pdf
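A minimal 2D RRT sketch of the sample/extend loop described above: draw random points, extend the tree from the nearest node toward each sample, and discard extensions that hit an obstacle. The single circular obstacle, bounds, and step size are illustrative assumptions.

```python
# Minimal 2D RRT sketch: sample, find nearest node, extend, check collision.

import math
import random

STEP = 0.5
OBSTACLE = (5.0, 5.0, 1.5)  # (cx, cy, radius) -- hypothetical obstacle

def collides(p):
    cx, cy, r = OBSTACLE
    return math.hypot(p[0] - cx, p[1] - cy) < r

def rrt(start, goal, iters=2000):
    nodes, parent = [start], {start: None}
    for _ in range(iters):
        sample = (random.uniform(0, 10), random.uniform(0, 10))
        near = min(nodes, key=lambda n: math.dist(n, sample))
        theta = math.atan2(sample[1] - near[1], sample[0] - near[0])
        new = (near[0] + STEP * math.cos(theta), near[1] + STEP * math.sin(theta))
        if collides(new):
            continue
        nodes.append(new)
        parent[new] = near
        if math.dist(new, goal) < STEP:   # close enough: trace the path back
            path, n = [goal], new
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
    return None  # no path found within the sample budget

print(rrt((0.0, 0.0), (9.0, 9.0)))
```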

Page 31: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Motion Planning - SLAM + RRTs = advanced guesswork Processing

● In order to obtain a higher-resolution probabilistic model of the ideal trajectory, more samples need to be taken and more computations performed; hence the need for massive compute power!

● It is understandable that a car driving 60 mph would have issues performing this depth of computation in a rapidly changing environment

For a more in-depth understanding of how algorithmic robotic motion planning works, check out SLAM for Dummies

A probabilistic path generated from probabilistic input poses issues for vehicles moving at high speeds

SOURCE: http://workshops.acin.tuwien.ac.at/clutter2014/papers/ric2014_submission_9.pdf
http://acl.mit.edu/papers/KuwataTCST09.pdf

**white spots represent sampled points used to generate RRT

Page 32: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Artificial Intelligence (ML/Deep Learning) Processing

● Newly emerging methodologies all revolve around deep learning via neural nets
    ○ RNNs, CNNs, GANs, autoencoding, etc.
● Two main forces driving adoption of these methods:
    ○ Cheaper and more powerful local and cloud computing (GPUs)
    ○ Open-source deep learning platforms (TensorFlow)

These deep learning methodologies are injecting intelligence into vehicles, feeding them massive amounts of data, and letting them learn

Please check out this Deep Learning Playground for a better visualization of the concept

Artificial Intelligence Methods

[Figure: feature extraction performed by a CNN on video from a forward-facing camera; the model was able to determine road edges with relative accuracy (via NVIDIA)]

[Figure: lane-centering generator that predicts the path of vehicles based on video input from a front-facing camera (via Comma.ai)]
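To ground the idea, here is a minimal Keras sketch of an end-to-end steering model in the spirit of the NVIDIA paper cited on the next slide: a CNN maps raw camera frames directly to a steering angle. The layer sizes here are illustrative, not NVIDIA's actual architecture.

```python
# Minimal sketch of an end-to-end steering model: CNN from camera frame to
# steering angle. Layer sizes are illustrative.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(66, 200, 3)),              # normalized RGB camera frame
    tf.keras.layers.Conv2D(24, 5, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(36, 5, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(48, 5, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1),                        # predicted steering angle
])

# Trained as regression against recorded human steering angles.
model.compile(optimizer="adam", loss="mse")
model.summary()
```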

Page 33: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Artificial Intelligence (ML/Deep Learning) Processing

Page 34: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Important Academic Papers Regarding Deep Learning Processing

● NVIDIA - “End to End Learning for Self-Driving Cars”: video input from a forward-facing camera is trained against steering-wheel position, and deep learning networks prove capable of detecting important road features with limited additional nudging in the right direction

● Comma.ai - “Learning a Driving Simulator”: using video input with no additional training metadata (IMU, wheel angle), auto-encoded video was generated, predicting many frames into the future while maintaining road features

● Radford et al. (Facebook AI) - “Unsupervised Representation Learning with Deep Convolutional GANs”: seminal work on deep generative modeling that enabled the Comma.ai breakthrough and similar work, e.g. “Autoencoding Blade Runner”

● NYU & Facebook AI - “Deep Multi-Scale Video Prediction Beyond Mean Square Error”

The implications of these papers: deep learning is a highly promising solution for AVs

Page 35: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Computational Muscle

● Onboard: CPUs, GPUs, SoCs
● Distributed computing (Cloud)

Page 36: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Computational muscle limited to local compute, for now Processing

● Current self-driving solutions are all implemented with local compute due to the need for simplicity, focusing on software first

● Utilizing GPUs and special SoCs to perform simple operations (i.e. with pixels and point clouds) at massive scale in parallel

● New TPUs (tensor processing units) are being designed specifically for machine learning and AI, and new platforms are emerging specifically for AVs

● A distributed network offering massive computational muscle would be ideal, but does not offer immediate simplicity due to latency, security, reliability, ...

● Movement toward an “AWS for AVs” is a huge opportunity that many companies are actively working on

Two paradigms exist currently: local compute (CPUs, SoCs, GPUs) and distributed computation over a network (Cloud)

Google’s new TPU that powered AlphaGo

Page 37: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Technology Deep-Dive: Actuation

1. Sensing
2. Processing
3. Actuation

Page 38: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

The actuation stage is primarily based on the field of controls and electromechanical systems (Actuation)

● The control unit is circuit hardware that manages electromechanical systems within a car

● A large amount of low-level control has been standardized into protocols like CAN

● Most well-studied and understood portion of the self-driving technology stack, high feasibility relative to other parts of the “stack”

● Companies like Delphi and Bosch are large players in this space and have invested decades of time and research into vehicle controls

● Innovation in this space is much more iterative, positioning incumbents to dominate the controls hardware/software for AVs

The processing stage sends commands over a bus like CAN or similar architectures to the engine control unit/modules
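A minimal sketch of that processing-to-actuation handoff, assuming the python-can library and a Linux SocketCAN interface. The arbitration ID and payload encoding are hypothetical: real vehicle CAN message layouts are proprietary and vary by manufacturer.

```python
# Minimal sketch: sending a throttle command over CAN with python-can.
# The arbitration ID and one-byte payload encoding are hypothetical.

import can

def send_throttle_command(bus: can.Bus, throttle_pct: int) -> None:
    msg = can.Message(
        arbitration_id=0x120,               # hypothetical throttle-command ID
        data=[max(0, min(100, throttle_pct))],  # clamp to 0-100%
        is_extended_id=False,
    )
    bus.send(msg)

with can.Bus(interface="socketcan", channel="can0") as bus:
    send_throttle_command(bus, 25)          # request 25% throttle
```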

Page 39: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Up Next

Page 40: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

High-level trends, “Self-Driving Stack” trends, general comments

Sensing
● The cost of sensors is falling through the floor
● No “best sensor” yet; converging toward LIDAR and video cameras, dependent on processing approaches
● Accuracy limits, distance limits, and data-feed latency (LIDAR especially) are improving exponentially with cost

Processing
● Model-based vs. neural vs. mixed: no “best practice” yet
● Only local-compute implementations exist so far; processing will transition toward the “Cloud” the same way software did
● Mapping is important, but an AI vector bank is the new data network effect
● V2V and V2I communication cannot be relied upon

Actuation / Controls
● Actuation/controls is out in front of the rest of the tech; not a limiting factor
● Mission-critical safety and reliability need to be investigated more heavily, beyond “Six Sigma”
● Incumbents are well positioned
● Security has not been investigated thoroughly and will emerge as a large space later on

Page 41: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

My Thoughts

1. Data network effects for AI systems are the single most important factor for long-term success. Advantage: Uber and Tesla.

2. LIDAR and GPU companies will become important OEMs and provide hardware as a service to Big Auto; it is the only non-commodity hardware that matters to enable “AV”.

3. The inherently difficult problems are software-related, and Big Auto is not positioned to “win” at software. Defer to startups with ex-researchers.

Page 42: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Companies to pay attention to

- Otto: recently acquired by Uber for ~$600M
- Zoox: $200M fundraise without even a landing page; talk about stealthy! Team consists of “fathers of AVs”
- Comma.ai: attempting to offer autonomy enablement to vehicle manufacturers
- Drive.ai: software for AVs; not much info, but a rockstar team with a very deep background
- Peloton Tech: a more immediate use case for semi-autonomy with platooning; strategic investors, and UPS’s venture arm is a positive signal
- NuTonomy: released a functioning product in Singapore; great team

Page 43: Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence

Thank You