3-D Scene Analysis via Sequenced Predictions over Points and Regions
DESCRIPTION
I gave this talk in the Machine Vision seminar at Jacobs University. I presented the state of the art in 3D point cloud classification and described the approach of X. Xiong et al. from a paper published in 2010.
TRANSCRIPT
XUEHAN XIONG, DANIEL MUNOZ, J. ANDREW BAGNELL,
MARTIAL HEBERT
CARNEGIE MELLON UNIVERSITY
3-D Scene Analysis via Sequenced Predictions over Points and Regions
Presenter: Flavia Grosan
Jacobs University Bremen, 2011
flaviagrosan.com
Introduction
Range scanners as standard equipment
Scan segmentation = distribute the points into object classes
  Scene understanding
  Robot localization
Difficulties in 3D:
  No color information
  Often noisy and sparse data
  Handling of previously unseen object instances or configurations
Definition
3D point cloud classification
= assign one of the predefined class labels to each point of a cloud based on:
Local properties of the point
Global properties of the cloud
Segmentation algorithms
Exploit different features
  Automatic trade-off
Enforce spatial contiguity
  Adjacent points in the scan tend to have the same label
Adapt to the scanner used
  Different scanners produce qualitatively different outputs
Classification = Training + Validation
Data: labeled instances
  3D scan manually labeled
  Training set, validation set, test set
Training
  Estimate parameters on the training set
  Tune parameters on the validation set
  Report results on the test set
  Anything short of this yields over-optimistic claims
Evaluation
  Many different metrics
  Ideally, the criteria used to train the classifier should be closely related to those used to evaluate it
Statistical issues
  Want a classifier that does well on test data
  Overfitting: fitting the training data very closely, but not generalizing well
  Error bars: want realistic (conservative) estimates of accuracy
[Figure: data split into training, validation, and test sets]
Some State-of-the-art Classifiers
Support vector machine
Random forests (e.g., Apache Mahout)
Perceptron
Nearest neighbor – kNN
Bayesian classifiers
Logistic regression
Approach – Generative Model
Learn p(y) and p(x|y) – classification step
Use Bayes rule:
$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$$
Classify as:
$$\hat{y} = \arg\max_y \; p(y)\, p(x \mid y)$$
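As a minimal sketch of the generative recipe on this slide, the snippet below estimates p(y) and p(x|y) by counting and classifies with Bayes rule. The single discrete feature, the toy samples, and the function names (`fit_generative`, `posterior`, `classify`) are assumptions made for illustration and are not taken from the talk.

```python
from collections import Counter, defaultdict

def fit_generative(samples):
    """Estimate p(y) and p(x|y) by counting (x, y) pairs."""
    label_counts = Counter(y for _, y in samples)
    joint_counts = defaultdict(Counter)
    for x, y in samples:
        joint_counts[y][x] += 1
    n = len(samples)
    prior = {y: c / n for y, c in label_counts.items()}
    likelihood = {y: {x: c / label_counts[y] for x, c in xs.items()}
                  for y, xs in joint_counts.items()}
    return prior, likelihood

def posterior(x, prior, likelihood):
    """Bayes rule: p(y|x) = p(x|y) p(y) / sum over y' of p(x|y') p(y')."""
    unnorm = {y: likelihood[y].get(x, 0.0) * prior[y] for y in prior}
    evidence = sum(unnorm.values())
    if evidence == 0.0:
        return dict(prior)  # unseen x: fall back to the prior
    return {y: v / evidence for y, v in unnorm.items()}

def classify(x, prior, likelihood):
    """Pick the label with the largest posterior probability."""
    post = posterior(x, prior, likelihood)
    return max(post, key=post.get)

# Toy data: x is a coarse, made-up shape descriptor of a point.
samples = [("flat", "ground"), ("flat", "ground"), ("flat", "ground"),
           ("scattered", "vegetation"), ("scattered", "vegetation"),
           ("flat", "vegetation")]
prior, likelihood = fit_generative(samples)
print(classify("flat", prior, likelihood), posterior("flat", prior, likelihood))
```

Real point cloud features are continuous and high-dimensional, so p(x|y) would need a density model rather than a count table.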
Approach – Generative Model
Carnegie Mellon University, Artificial Intelligence, Fall 2010
State of the Art 3D Point Cloud Classifier
Markov Random Fields
  Scan points modeled as random variables = nodes
  Each random variable corresponds to the label of a point
  Proximity links between points = edges
  Defines a joint distribution
Pairwise Markov networks
  Nodes and edges associated with potentials
  Node potential = a point's 'individual' preference for different labels
  Edge potential = encodes interactions between labels of related points
Markov Random Fields
Conditional probability query: P(Y | X = xi) = ?
$$P(Y \mid X = x_i) = \frac{P(Y, x_i)}{P(x_i)}$$
Generate the joint distribution, then exhaustively sum out the joint.
Bad news: NP-hard
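To make the cost of exhaustive inference concrete, here is a hedged sketch that computes exact marginals of a tiny pairwise MRF by enumerating every labeling. The chain graph, the potential values, and the names `brute_force_marginals` and `potts` are illustrative assumptions; the point is only that the loop visits n_labels ** n_nodes labelings.

```python
import itertools
import math

def brute_force_marginals(n_nodes, n_labels, node_pot, edges, edge_pot):
    """Exact marginals of a pairwise MRF by enumerating all labelings.

    node_pot[i][k]   : log-potential of node i taking label k
    edge_pot(ki, kj) : log-potential of an edge whose ends take labels ki, kj
    """
    marginals = [[0.0] * n_labels for _ in range(n_nodes)]
    z = 0.0          # partition function
    n_states = 0     # number of labelings visited
    for labeling in itertools.product(range(n_labels), repeat=n_nodes):
        score = sum(node_pot[i][labeling[i]] for i in range(n_nodes))
        score += sum(edge_pot(labeling[i], labeling[j]) for i, j in edges)
        weight = math.exp(score)
        z += weight
        n_states += 1
        for i, k in enumerate(labeling):
            marginals[i][k] += weight
    marginals = [[m / z for m in row] for row in marginals]
    return marginals, n_states

def potts(a, b):
    """Reward neighboring nodes that share a label (illustrative value)."""
    return 1.0 if a == b else 0.0

# Chain of 4 points, 2 labels; only node 0 has a preference (for label 0).
node_pot = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 3)]
marginals, n_states = brute_force_marginals(4, 2, node_pot, edges, potts)
print(n_states, [round(row[0], 3) for row in marginals])
```

With 4 nodes and 2 labels the loop is trivial, but a scan with tens of thousands of points and 7 labels is far out of reach for this approach.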
Xiong et al. Approach
Explicit joint probability distribution model:
  Does not model P(y|x) directly
  Exact inference is hard
  Approximate inference leads to poor results
Instead, directly design and train an inference procedure via a sequence of predictions from simple machine learning modules
  Use a discriminative model
  Logistic regression
  Max-likelihood estimation problem
Overview
2-level hierarchy
  Top level: regions, mixed labels
  Bottom level: points
K-means++, with k = 1% of the number of points, to establish initial clusters
Predict label distribution per region
Update each region's intra-level context using the neighboring regions' predictions
Pass the predicted label distribution to the region's points as inter-level context
Overview
At point level, train 2 classifiers:
  1. Inter-level context + point cloud descriptors
  2. Neighboring points' predictions
Move up in the hierarchy:
  Average the predicted label distribution of points over a region
  Send the average as inter-level context to the region
  Validation set determines the number of up-down iterations
Base Classifier (LogR)
Assumption: log p(y|x) of each class is a linear function of x plus a normalization constant
  Ci – random variable for the class of a region
  xi – features
  yi – ground truth distribution of K labels
  w – parameters (to be learned)
$$Q_w(C_i = k \mid x_i) = \frac{e^{w_k^T x_i}}{\sum_{a=1}^{K} e^{w_a^T x_i}}$$
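The base classifier's prediction can be sketched in a few lines. This is an illustrative implementation of the softmax formula, assuming one weight vector per class stored as a plain list; the name `softmax_probs` and the example weights are made up.

```python
import math

def softmax_probs(weights, x):
    """Q_w(C = k | x) = exp(w_k . x) / sum over a of exp(w_a . x)."""
    scores = [sum(wj * xj for wj, xj in zip(w_k, x)) for w_k in weights]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three classes, two features; the third class has an all-zero weight vector.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
probs = softmax_probs(weights, [2.0, 0.0])
print(probs)
```

Subtracting the largest score before exponentiating leaves the result unchanged and avoids overflow for large w·x.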
Base Classifier
Max-likelihood estimation:
$$\arg\max_w \sum_i \sum_k y_i[k] \log Q_w(C_i = k \mid x_i)$$
Use regularization to avoid overfitting:
$$\arg\max_w \sum_i \sum_k y_i[k] \log Q_w(C_i = k \mid x_i) - \lambda \|w\|^2, \quad \lambda > 0$$
Concave problem, solved with stochastic gradient ascent (equivalently, gradient descent on the negated objective)
  Choose an initial guess for w
  Take a small step in the direction of the gradient of the objective (opposite the gradient of its negation)
  This gives a new configuration
  Iterate until the gradient is (close to) 0
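The training loop can be sketched as follows: a minimal stochastic gradient procedure for the regularized log-likelihood with soft (distribution-valued) labels. The step size, epoch count, the way the regularizer is split across samples, and the toy data are assumptions of this sketch, not settings from the paper.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(w, x):
    """Class distribution Q_w(. | x) for one feature vector."""
    return softmax([sum(wj * xj for wj, xj in zip(w_k, x)) for w_k in w])

def objective(w, X, Y, lam):
    """sum_i sum_k y_i[k] log Q_w(k | x_i) - lam * ||w||^2."""
    ll = sum(y_k * math.log(q_k)
             for x, y in zip(X, Y)
             for y_k, q_k in zip(y, predict(w, x)) if y_k > 0)
    return ll - lam * sum(v * v for w_k in w for v in w_k)

def train_sgd(X, Y, n_labels, lam=0.01, step=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    w = [[0.0] * dim for _ in range(n_labels)]       # initial guess
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            q = predict(w, X[i])
            for k in range(n_labels):
                for j in range(dim):
                    # per-sample gradient of the objective w.r.t. w[k][j]
                    grad = (Y[i][k] - q[k]) * X[i][j] - 2.0 * lam * w[k][j] / n
                    w[k][j] += step * grad           # move uphill on the concave objective
    return w

# Toy data: two features plus a constant bias term, two labels, soft targets.
X = [[1.0, 0.0, 1.0], [0.9, 0.1, 1.0], [0.0, 1.0, 1.0], [0.1, 0.9, 1.0]]
Y = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
w = train_sgd(X, Y, n_labels=2)
print(objective(w, X, Y, 0.01), predict(w, X[0]))
```

A fixed number of epochs stands in for the "iterate until the gradient is 0" stopping rule; a real implementation would monitor the gradient norm or validation performance.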
Contextual Features
Construct a sphere around the region centroid (O)
  12 meter radius
Divide the sphere into 3 slices along the vertical
  4 m radius
Average the points' label distributions within each slice
  A feature vector of length K per slice
Average the angles formed between the z-axis and the [O, Ni] vector
  Ni = neighboring point (not part of this region)
  Models the spatial configuration of neighboring points
3(K+1) contextual features
  Add them to xi (a region's features)
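The following sketch shows one way the 3(K+1) contextual features could be assembled. The slide does not fully specify how the three slices are bounded, so the split used here (below, level with, and above the centroid, controlled by the hypothetical `half_mid` parameter) is an assumption, as are the function name and the toy neighbors.

```python
import math

def contextual_features(centroid, neighbors, n_labels, radius=12.0, half_mid=2.0):
    """neighbors: list of ((x, y, z), label_distribution) outside the region.

    Returns 3 * (n_labels + 1) values: for the bottom, middle and top slice,
    the mean label distribution followed by the mean angle to the z-axis.
    The slice boundaries (half_mid) are an assumption of this sketch.
    """
    sums = [[0.0] * n_labels for _ in range(3)]
    angles = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for (x, y, z), dist in neighbors:
        dx, dy, dz = x - centroid[0], y - centroid[1], z - centroid[2]
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0.0 or norm > radius:
            continue                      # outside the sphere (or at O itself)
        s = 0 if dz < -half_mid else (2 if dz > half_mid else 1)
        counts[s] += 1
        angles[s] += math.acos(max(-1.0, min(1.0, dz / norm)))
        for k in range(n_labels):
            sums[s][k] += dist[k]
    features = []
    for s in range(3):
        c = counts[s] or 1                # empty slice -> zeros
        features.extend(v / c for v in sums[s])
        features.append(angles[s] / c)
    return features

# Labels: index 0 = vegetation, index 1 = ground (toy example).
neighbors = [((0.0, 1.0, 8.0), [1.0, 0.0]),     # vegetation above
             ((1.0, 0.0, 0.0), [0.0, 1.0]),     # ground below
             ((100.0, 0.0, 0.0), [0.0, 1.0])]   # too far away, ignored
features = contextual_features((0.0, 0.0, 3.0), neighbors, n_labels=2)
print(features)
```

For a region centered at a trunk, the top slice ends up dominated by vegetation and the bottom slice by ground, which is the kind of layout the stacked classifier can pick up.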
Multi-Round Stacking (MRS)
1. X = {xi} – training set
2. Y = {yi} – label distribution (ground truth)
3. w1 = T(X, Y) – first trained classifier
4. Y' = P(X, w1)
5. Use Y' to compute new contextual features for X → X'
6. w2 = T(X', Y) – train a second classifier
7. Repeat until no improvement is seen
Problem: on the training set, the predictions Y' of w1 are optimistically correct, so w2 is prone to overfitting
MRS – Avoid Overfitting
Generate multiple temporary classifiers
  Partition the training set into 5 disjoint sets:
$$X = \{X_i\}_{i=1}^{5}, \quad Y = \{Y_i\}_{i=1}^{5}$$
  Train a temporary classifier γ = T(X – Xi, Y – Yi)
  Use γ only on Xi to generate Y'i
  Discard γ afterwards
Perform one or more rounds of stacking
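The held-out prediction step can be sketched generically. In the snippet below, `train` and `predict` are placeholders for any learner (the talk uses logistic regression); the 1-nearest-neighbor memorizer, the hard labels, and the toy data are assumptions chosen only to make the optimism of training-set predictions visible.

```python
def held_out_predictions(X, Y, train, predict, n_folds=5):
    """Predict each sample with a temporary model that never saw it."""
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]  # disjoint parts
    Y_pred = [None] * n
    for held_out in folds:
        held = set(held_out)
        X_rest = [X[i] for i in range(n) if i not in held]
        Y_rest = [Y[i] for i in range(n) if i not in held]
        gamma = train(X_rest, Y_rest)         # temporary classifier
        for i in held_out:
            Y_pred[i] = predict(gamma, X[i])  # used only on the held-out part
        # gamma is not kept: it is overwritten in the next iteration
    return Y_pred

def train_1nn(X, Y):
    """A learner that simply memorizes its training data."""
    return list(zip(X, Y))

def predict_1nn(model, x):
    """Label of the closest memorized sample."""
    return min(model, key=lambda xy: abs(xy[0] - x))[1]

X = [float(i) for i in range(10)]
Y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]            # noisy toy labels
naive = [predict_1nn(train_1nn(X, Y), x) for x in X]   # model has seen every x
stacked = held_out_predictions(X, Y, train_1nn, predict_1nn)
print(naive == Y, stacked)
```

The naive predictions reproduce the labels exactly, which is the optimism a second-stage classifier would overfit to; the held-out predictions contain realistic errors instead.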
Examine the w parameters computed
A tree trunk region likely has:
  vegetation above, but not below
  car and ground below, but not on top
Stacked 3D Parsing Algorithm (S3DP)
Labeled point cloud
Construct 2-level hierarchy
  Top
  Bottom
Extract point cloud features
Create ground truth label distribution
  (Xt, Yt) – top
  (Xb, Yb) – bottom
Stacked 3D Parsing Algorithm (S3DP)
Parse UP the hierarchy:
  Apply N rounds of MRS on (Xb, Yb): N+1 classifiers
  Yb label prediction from the last round
  Extend each region's feature vector with the average of its children's probability distributions in Yb
  Apply N rounds of MRS on (Xt, Yt): N+1 classifiers
Save ft and fb for inference
$$f_b = \{w_n^{(b)}\}_{n=1}^{N+1}, \qquad f_t = \{w_n^{(t)}\}_{n=1}^{N+1}$$
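The step that passes bottom-level predictions up to the regions can be sketched as below. The data layout (a region index per point, predictions as lists) and the name `extend_region_features` are assumptions for illustration.

```python
def extend_region_features(region_features, point_region, point_preds, n_labels):
    """Append to each region's features the mean predicted distribution of its points.

    region_features[r] : feature list of region r
    point_region[p]    : index of the region that point p belongs to
    point_preds[p]     : predicted label distribution of point p
    """
    n_regions = len(region_features)
    sums = [[0.0] * n_labels for _ in range(n_regions)]
    counts = [0] * n_regions
    for r, pred in zip(point_region, point_preds):
        counts[r] += 1
        for k in range(n_labels):
            sums[r][k] += pred[k]
    extended = []
    for r in range(n_regions):
        c = counts[r] or 1                 # region without points -> zeros
        extended.append(list(region_features[r]) + [s / c for s in sums[r]])
    return extended

region_features = [[5.0], [7.0]]
point_region = [0, 0, 1]
point_preds = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.8]]
extended = extend_region_features(region_features, point_region, point_preds, 2)
print(extended)
```

The opposite direction (parsing down) would copy a region's predicted distribution into the feature vectors of its points in the same way.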
Stacked 3D Parsing Algorithm (S3DP)
Parse DOWN the hierarchy:
  Apply N rounds of MRS on (Xt, Yt): N+1 classifiers
  Yt label prediction from the last round
  Extend each point's feature vector with the average of its parents' probability distribution in Yt
  Apply N rounds of MRS on (Xb, Yb): N+1 classifiers
Save ft and fb for inference
$$f_b = \{w_n^{(b)}\}_{n=1}^{N+1}, \qquad f_t = \{w_n^{(t)}\}_{n=1}^{N+1}$$
Experimental Setup - Features
Bottom level
  Local neighborhood: 0.8 m / 2 m radius
  Compute the covariance matrix and its eigenvalues a1 > a2 > a3
    Scattered points: a1 ≅ a2 ≅ a3 (vegetation)
    Linear structures: a1 >> a2, a3 (wires)
    Solid surfaces: a1, a2 >> a3 (tree trunks)
  Scalar projection of local tangent and normal directions onto the z-axis
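A small sketch of the eigenvalue analysis, using the common convention that one dominant eigenvalue of the neighborhood covariance indicates a line-like neighborhood, two dominant eigenvalues a planar one, and three similar eigenvalues scattered points. The specific saliency combinations (a3, a1 - a2, a2 - a3) are a frequently used choice and an assumption here; the slide only lists the eigenvalue patterns. The eigenvalues come from the closed-form trigonometric solution for symmetric 3x3 matrices.

```python
import math

def covariance3(points):
    """3x3 covariance matrix of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[d] for p in points) / n for d in range(3)]
    return [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
             for b in range(3)] for a in range(3)]

def eigenvalues_sym3(A):
    """Eigenvalues of a symmetric 3x3 matrix, ordered a1 >= a2 >= a3."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    if p1 == 0.0:                                   # already diagonal
        return sorted((A[0][0], A[1][1], A[2][2]), reverse=True)
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    p2 = sum((A[d][d] - q) ** 2 for d in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    B = [[(A[a][b] - (q if a == b else 0.0)) / p for b in range(3)]
         for a in range(3)]
    det_b = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
             - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
             + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))            # guard against round-off
    phi = math.acos(r) / 3.0
    a1 = q + 2.0 * p * math.cos(phi)
    a3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    a2 = 3.0 * q - a1 - a3                          # trace is preserved
    return [a1, a2, a3]

def saliencies(points):
    a1, a2, a3 = eigenvalues_sym3(covariance3(points))
    return {"scatter": a3, "linear": a1 - a2, "surface": a2 - a3}

wire = [(float(t), float(t), 0.0) for t in range(10)]               # points on a line
plane = [(float(i), float(j), 0.0) for i in range(5) for j in range(5)]
cube = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
s_wire, s_plane, s_cube = saliencies(wire), saliencies(plane), saliencies(cube)
print(s_wire, s_plane, s_cube)
```

In practice the points would be the neighbors found within the 0.8 m or 2 m radius, and a numerical library's symmetric eigensolver would replace the hand-written one.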
Experimental Setup - Features
Bottom & top levels
  Bounding box enclosing the points
    Over local neighborhood at bottom level
    Over region itself at top level
  Relative elevations
    Take a horizontal cell of 10 m x 10 m, centered on the centroid
    Compute the min and max z-coordinates
    Compute 2 differences in elevation between the region's centroid elevation and its cell's 2 extrema
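The relative elevation features could be computed along these lines. The sign convention (height above the lowest point, depth below the highest point) and the function name are assumptions of this sketch.

```python
def relative_elevations(centroid, points, cell_size=10.0):
    """Elevation differences between the centroid and the extrema of its cell.

    Considers the points inside a cell_size x cell_size horizontal cell
    centered on the centroid and returns
    (centroid z - min z, max z - centroid z).
    """
    half = cell_size / 2.0
    zs = [z for x, y, z in points
          if abs(x - centroid[0]) <= half and abs(y - centroid[1]) <= half]
    if not zs:
        return 0.0, 0.0                    # empty cell: no information
    return centroid[2] - min(zs), max(zs) - centroid[2]

points = [(1.0, 1.0, 0.0), (-4.0, 3.0, 6.0), (20.0, 0.0, -50.0)]
above_min, below_max = relative_elevations((0.0, 0.0, 2.0), points)
print(above_min, below_max)
```

The third point lies outside the 10 m x 10 m cell, so its extreme elevation does not influence the result.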
Evaluation Metrics
Recall = TP / (TP + FN) = fraction of all objects correctly classified
Precision = TP / (TP + FP) = fraction of all questions correctly answered
For a class k:
$$F_1 = \frac{2 P_k R_k}{P_k + R_k}$$
[Figure: diagram relating questions answered, correct answers, misclassified objects, unclassified objects, and objects correctly classified (TP, FP)]
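As a worked example of the per-class metrics, the sketch below counts true positives, false positives, and false negatives for one label and derives precision, recall, and F1. The label names and the toy predictions are made up.

```python
def per_class_scores(truth, predicted, label):
    """Precision, recall and F1 for one class from two parallel label lists."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == label and p == label)
    fp = sum(1 for t, p in zip(truth, predicted) if t != label and p == label)
    fn = sum(1 for t, p in zip(truth, predicted) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

truth = ["car", "car", "car", "pole", "pole"]
predicted = ["car", "car", "pole", "car", "pole"]
precision, recall, f1 = per_class_scores(truth, predicted, "car")
print(precision, recall, f1)
```

Here the class "car" has 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 2/3.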
Experimental Results
A. VMR-Oakland-v2 Dataset
  CMU campus
  3.1 M points
  36 sets, each ~85,000 points
    6 training sets
    6 validation sets
    All remaining: test sets
  Labels: wire, pole, ground, vegetation, tree-trunk, building, car
  Comparison with the associative Max-Margin Markov Network (M3N) algorithm
A. VMR-Oakland-v2 Dataset
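M3N
Conditional Random Fields
  MRF trained discriminatively
  Pairwise model:
$$P(y \mid x) = \frac{1}{Z} \exp\Big[\sum_{i=1}^{N} \varphi(y_i, x) + \sum_{(ij) \in E} \varphi_{ij}(y_i, y_j, x)\Big]$$
  Associative (Potts) model: edge potentials are non-zero only when neighboring labels agree (yi = yj)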
A. VMR-Oakland-v2 Dataset
[Figures: results on VMR-Oakland-v2, including M3N; images not part of the transcript]
Experimental Results
B. GML-PCV Dataset
  2 aerial datasets, A and B
  Each dataset split into training and test, ~1 M points each
  Each training set split into learning and validation
  Labels: ground, roof/building, tree, low vegetation/shrub, car
  Comparison with Non-Associative Markov Network (NAMN)
    Pairwise Markov network constructed over segments
    Edge potentials non-zero for different labels
B. GML-PCV Dataset
Experimental Results
C. RSE-RSS Dataset
  10 scans, each ~65,000 points, Velodyne laser on the ground
  Most difficult set: noise, sparse measurements and ground truth
  Labels: ground, street signs, tree, building, fence, person, car, background
  Comparison with the approach from Lai and Fox:
    Use information from the World Wide Web (Google 3D Warehouse) to reduce the need for manually labeled training data
Final Comments
S3DP performs a series of simple predictions
Effective encoding of neighboring contexts
Learning of meaningful spatial layouts
  E.g.: tree trunks are below vegetation
Usable in many environments scanned with different sensors
S3DP requires about 42 seconds
References
1. X. Xiong, D. Munoz, J. A. Bagnell, M. Hebert, 3-D Scene Analysis via Sequenced Predictions over Points and Regions, ICRA 2011
2. D. Anguelov, B. Taskar, V. Chatalbashev, Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data, Computer Vision and Pattern Recognition, 2005
3. G. Obozinski, Practical Machine Learning CS 294, University of California, Berkeley, Multi-Class and Structured Classification, 2008
4. A. Kulesza, F. Pereira, Structured Learning with Approximate Inference, In Proceedings of NIPS 2007
5. K. Lai, D. Fox, 3D Laser Scan Classification Using Web Data and Domain Adaptation, In International Journal of Robotics Research, Special Issue on Robotics: Science & Systems 2009, July 2010
6. D. Munoz, J. A. Bagnell, M. Hebert, Stacked Hierarchical Labeling, Paper and Presentation, European Conference on Computer Vision, 2010
7. D. Munoz, J. A. Bagnell, N. Vandapel, M. Hebert, Contextual Classification with Functional Max-Margin Markov Networks, Paper and Presentation, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2009
8. R. Shapovalov, A. Velizhev, O. Barinova, Non-Associative Markov Networks for 3D Point Cloud Classification, PCV 2010
9. C. Sutton, An Introduction to Conditional Random Fields, Statistical Machine Learning Class, University of Edinburgh
10. M. Jordan, Machine Learning Class, University of California, Berkeley, Classification lecture
11. D. Munoz, D. Bagnell, N. Vandapel, M. Hebert, Contextual Classification with Functional Max-Margin Markov Networks, Paper Presentation, 2009
12. P. J. Flynn, A. K. Jain, Surface Classification: Hypothesis Testing and Parameter Estimation, CVPR, 1988
13. S. Lacoste-Julien, Combining SVM with graphical models for supervised classification: an introduction to Max-Margin Markov Networks, University of California, Berkeley, 2003
14. D. Koller, N. Friedman, L. Getoor, B. Taskar, Graphical Models in a Nutshell, In Introduction to Statistical Relational Learning, 2007