Computer Vision-Based Makerspace Access Control System
Problem Statement: The long-term goal of this project is to create an interactive resident A.I. that can visually
identify students and provide access to equipment based on individual permissions using sensors including a
Kinect on a custom pan/tilt assembly. Due to privacy concerns, only a single frontal face image will be provided
for each student. The system should be able to learn in real time without the use of a GPU.
Approach: Three local-feature-based classifiers were implemented and their performances compared with a
Convolutional Neural Network (CNN). The first two are the well-known Histogram of Oriented Gradients (HOG)
and Local Binary Pattern Histograms (LBPH). The third is based on Scale Invariant Feature Transform (SIFT). Since
all three of these classifiers work by comparing a sample with every template, they are “lazy classifiers,” doing
most of their work during prediction rather than training. This is in contrast to the CNN which is created and
trained with a fixed number of output nodes, each corresponding to one class. The CNN's runtime performance scales much better: as the number of classes grows, it still needs to be run only once on a given sample.
Unfortunately, this also means that it needs to be retrained every time a new class is added. The computational
cost of retraining a CNN makes it infeasible as a solution. Our use of it here is merely as a performance benchmark.
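Since all three feature-based methods share this template-matching structure, enrollment and prediction can be sketched in a few lines. The following is a hypothetical, simplified illustration (not the actual system code), using a bare-bones LBP histogram as the feature and omitting LBPH refinements such as grid cells, radius, and uniform patterns:

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour Local Binary Pattern histogram: each interior
    pixel is encoded by which of its 8 neighbours are >= it."""
    c = img[1:-1, 1:-1]
    codes = np.zeros(c.shape, dtype=np.int64)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (n >= c).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

class LazyLBPClassifier:
    """'Lazy' template matcher: training just stores one histogram per
    class; all real work happens at prediction time, so a new class can
    be enrolled instantly by appending one template."""
    def __init__(self):
        self.labels, self.templates = [], []

    def enroll(self, label, img):
        # One image per student, per the privacy constraint.
        self.labels.append(label)
        self.templates.append(lbp_histogram(img))

    def predict(self, img):
        h = lbp_histogram(img)
        # Chi-square distance to every stored template.
        d = [np.sum((h - t) ** 2 / (h + t + 1e-10)) for t in self.templates]
        return self.labels[int(np.argmin(d))]
```

Enrolling a new student is just appending a template, which is why such classifiers can learn in real time without retraining; the cost is prediction time that grows linearly with the number of enrolled classes.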
Dataset: All models were tested on a hand-picked subset of the Multi-Biometric Grand Challenge (MBGC) dataset
[1], with images cropped to include only the face region in 224 x 224 greyscale. The dataset consists of 100 classes,
with 1 sample per class used for training and 4 for testing. Images were chosen to be visually dissimilar yet still
classifiable by a human.
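Given such a split, one gallery template per class and four probes, rank-1 accuracy is simply the fraction of probes whose nearest gallery template carries the correct label. A minimal sketch, assuming images are NumPy arrays and a pluggable distance function (the function and argument names are our own, not from the original system):

```python
import numpy as np

def rank1_accuracy(distance, gallery, probes):
    """gallery: {label: template_image}, one enrollment image per class
    (the single frontal face allowed by the privacy constraint).
    probes: list of (true_label, image) test pairs.
    A probe counts as a hit when its nearest gallery template, under
    the supplied distance function, has the correct label."""
    labels = list(gallery)
    templates = [gallery[l] for l in labels]
    hits = 0
    for true_label, img in probes:
        d = [distance(img, t) for t in templates]
        hits += labels[int(np.argmin(d))] == true_label
    return hits / len(probes)
```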
Results: Rank-1 accuracies for HOG, SIFT, LBPH, and CNN were 67.75%, 52.5%, 52.5%, and 38.5% respectively,
with HOG as the clear winner. In previous work [2], using datasets having in excess of 10 training samples and much less visual obfuscation, we found the differences to be far less pronounced, if not entirely absent, and the CNN was the clear winner. That advantage appears to be nullified when training data is scarce, as CNNs are prone to overfitting. It is also worth noting that, in our experiments, HOG ran several times faster than the runner-up, LBPH.
Ongoing/Future Work: To find an optimum trade-off between true-accepts and false-accepts, rather than using
a raw cut-off, we prefer to use the confidence ratio between the first and second best predictions. In addition to
its simplicity, this ratio is also robust to changes in the number of classes. To use it, a naive approach is to take the
median confidence ratios for true-accepts vs. false-accepts during training, and use a ratio cutoff at the midpoint
between them. Doing this in combination with a simple voting ensemble involving all three feature-based classifiers, we achieve a true-accept rate of 54% and a false-accept rate of 8%. While we would like true-accepts to be in the 80%+ range, single-image accuracy limitations are often mitigated in practice by building a consensus over a number of consecutive video frames. Future work will likely explore the use of meta-learners and various hybridization techniques involving both feature- and decision-level fusion.
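The acceptance rule described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions (similarity scores where higher is better; all function names are our own), not the system's actual implementation:

```python
from collections import Counter
import numpy as np

def confidence_ratio(scores):
    """Ratio of the best to the second-best class score. A large ratio
    means the winner is well separated from the runner-up, and the
    measure is robust to changes in the number of classes."""
    top2 = np.sort(np.asarray(scores, dtype=float))[-2:]
    return top2[1] / top2[0]

def midpoint_threshold(true_accept_ratios, false_accept_ratios):
    """The naive cutoff from the text: midpoint between the median
    confidence ratios observed for true-accepts vs. false-accepts
    during training."""
    return (np.median(true_accept_ratios) + np.median(false_accept_ratios)) / 2

def accept(scores, threshold):
    """Accept the top prediction only if its confidence ratio clears the
    threshold; otherwise reject the sample as unknown."""
    return confidence_ratio(scores) >= threshold

def ensemble_vote(predictions):
    """Simple majority vote over per-classifier predictions
    (e.g. from HOG, SIFT, and LBPH)."""
    return Counter(predictions).most_common(1)[0][0]
```

Because the ratio compares only the top two candidates, the same threshold remains usable as students are enrolled or removed, unlike a raw confidence cutoff.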
Acknowledgements: This research is based upon work supported by the Army Research Office (Contract No. W911NF-15-1-0524).
References:
[1] P. J. Phillips, P. J. Flynn, J. R. Beveridge, W. T. Scruggs, A. J. O'Toole, D. Bolme, K. W. Bowyer, B. A. Draper, G. H. Givens, Y. M. Lui, H. Sahibzada, J. A. Scallan, and S. Weimer, "Overview of the Multiple Biometrics Grand Challenge," Advances in Biometrics, Lecture Notes in Computer Science, pp. 705-714, 2009.
[2] R. Dellana and K. Roy, "Data augmentation in CNN-based periocular authentication," International Conference on Information Communication and Management (ICICM), pp. 141-145, IEEE, 2016.
Computer Vision-Based Makerspace Access Control System
Ryan Dellana, North Carolina Agricultural and Technical State University

Problem Statement and Goals
Goal: Create an interactive resident A.I. that can visually identify students and provide access to equipment based on individual permissions. Sensors include a Kinect on a custom pan/tilt assembly. The system can regulate access by controlling electrical outlets.
Challenge: Due to privacy concerns, only a single frontal face image will be provided for each student. No additional training images may be collected. The system should be able to learn in real time without the use of a GPU.
[Figures: Kinect on custom pan/tilt assembly; motion tracking and face detection; dangerous lab equipment]

Approach
• Implement three local-feature-based classifiers and compare their performance with a Convolutional Neural Network (CNN).
• Histogram of Oriented Gradients (HOG), Local Binary Pattern Histograms (LBPH), and Scale Invariant Feature Transform (SIFT) are all explicit feature-based "lazy classifiers." They do most of their work during prediction rather than during training.
• The CNN internalizes the training samples and doesn't need to compare a sample to an ever-growing number of templates. Thus, its run-time performance scales better.
• The trade-off for this is that a CNN must be retrained whenever a new class is added, which makes it unsuitable for our application.
• Our dataset is a hand-picked subset of the Multi-Biometric Grand Challenge (MBGC) dataset [1], with images cropped to include only the face region in 224 x 224 greyscale. There are 100 classes, with 1 sample per class used for training and 4 for testing. Images were chosen to be visually dissimilar yet still classifiable by a human.
[Figures: sample template; SIFT keypoint matching; Histogram of Oriented Gradients; Local Binary Pattern Histograms; simplified 3D rendering of our Convolutional Neural Network topology (input, convolutional, max pooling, fully connected, and output layers)]

Results
• HOG is significantly more accurate than the other classifiers, with rank-1 accuracies for HOG, SIFT, LBPH, and CNN of 67.75%, 52.5%, 52.5%, and 38.5% respectively.
• Previous work [2] has shown the CNN to be more accurate, but this advantage appears to be nullified when training data is scarce, as the CNN is prone to overfitting.
• HOG ran several times faster than the runner-up, LBPH.
[Figure: bar charts comparing HOG, SIFT, LBPH, and CNN]
• To obtain the desired true-accept to false-accept ratio, rather than using the confidence value directly, we use the confidence ratio between the first and second best predictions. One nice feature of this approach is that it is robust to changes in the number of classes.
• Using as a threshold the midpoint between the median confidence ratios for true-accepts vs. false-accepts during training, we get a true-accept rate of 54% and a false-accept rate of 8%. In practice, this is good enough, since poor single-frame accuracy can be mitigated by building a consensus over a number of consecutive frames.
• Future work will likely explore the use of meta-learners and various hybridization techniques involving both feature- and decision-level fusion.