
Page 1:

Computer Vision-Based Makerspace Access Control System

Problem Statement: The long-term goal of this project is to create an interactive resident A.I. that can visually identify students and provide access to equipment based on individual permissions, using sensors including a Kinect on a custom pan/tilt assembly. Due to privacy concerns, only a single frontal face image will be provided for each student. The system should be able to learn in real time without the use of a GPU.

Approach: Three local-feature-based classifiers were implemented and their performance compared with a Convolutional Neural Network (CNN). The first two are the well-known Histogram of Oriented Gradients (HOG) and Local Binary Pattern Histograms (LBPH). The third is based on the Scale Invariant Feature Transform (SIFT). Since all three of these classifiers work by comparing a sample with every template, they are "lazy classifiers," doing most of their work during prediction rather than training. This is in contrast to the CNN, which is created and trained with a fixed number of output nodes, each corresponding to one class. The CNN's runtime performance scales much better, since, as the number of classes grows, it still only needs to be run once on a given sample. Unfortunately, this also means that it must be retrained every time a new class is added. The computational cost of retraining a CNN makes it infeasible as a solution; our use of it here is merely as a performance benchmark.
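As a concrete illustration of this template-matching scheme, the sketch below enrolls one HOG descriptor per student and predicts by comparing a probe against every stored template. The HOG parameters and the Euclidean distance metric are illustrative assumptions, not the exact configuration used in this project.

    # Minimal sketch of one-template-per-class "lazy" matching with HOG features.
    # Window/block/cell sizes and the distance metric are assumptions.
    import cv2
    import numpy as np

    # winSize, blockSize, blockStride, cellSize, nbins
    hog = cv2.HOGDescriptor((224, 224), (32, 32), (16, 16), (16, 16), 9)

    templates = {}  # student_id -> HOG feature vector ("training" is just storage)

    def describe(face_224_gray):
        # Compute a HOG feature vector for a 224x224 greyscale face crop.
        return hog.compute(face_224_gray).flatten()

    def enroll(student_id, face_img):
        templates[student_id] = describe(face_img)

    def predict(face_img):
        # Compare the probe against every enrolled template (the "lazy" step).
        probe = describe(face_img)
        distances = {sid: float(np.linalg.norm(probe - feat))
                     for sid, feat in templates.items()}
        best_id = min(distances, key=distances.get)
        return best_id, distances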

Dataset: All models were tested on a hand-picked subset of the Multi-Biometric Grand Challenge (MBGC) dataset [1], with images cropped to include only the face region in 224 x 224 greyscale. The dataset consists of 100 classes, with 1 sample per class used for training and 4 for testing. Images were chosen to be visually dissimilar yet still classifiable by a human.
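The cropping step described above could be reproduced along the following lines; the Haar-cascade face detector and file-loading details are assumptions for illustration, not necessarily the tools applied to the MBGC images.

    # Sketch of the preprocessing described above: detect the face, crop it,
    # and resize to a 224x224 greyscale image. The Haar-cascade detector is an
    # illustrative assumption; any frontal face detector could fill this role.
    import cv2

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def preprocess(path, size=(224, 224)):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None  # no face found; such an image would be skipped
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
        return cv2.resize(gray[y:y + h, x:x + w], size)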

Results: Rank-1 accuracies for HOG, SIFT, LBPH, and CNN were 67.75%, 52.5%, 52.5%, and 38.5% respectively, with HOG the clear winner. We have performed similar comparisons in previous work [2] using datasets with more than 10 training samples per class and much less visual obfuscation, and found the difference to be less pronounced, if not entirely absent. Previous work has also shown the CNN to be the clear winner, but this advantage appears to be nullified when training data is scarce, as CNNs are prone to overfitting. It is also worth noting that, in our experiments, HOG ran several times faster than the runner-up, LBPH.
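For reference, rank-1 accuracy here is simply the fraction of the 400 test images (100 classes x 4 samples) whose single best match is the correct class. A short sketch, assuming the hypothetical predict() from the template-matching example above:

    # Rank-1 accuracy: the share of test samples whose top-ranked match is correct.
    # test_samples is assumed to be an iterable of (face_img, true_id) pairs.
    def rank1_accuracy(test_samples, predict):
        hits, total = 0, 0
        for face_img, true_id in test_samples:
            best_id, _ = predict(face_img)
            hits += int(best_id == true_id)
            total += 1
        return hits / total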

Ongoing/Future Work: To find an optimum trade-off between true-accepts and false-accepts, rather than using a raw confidence cut-off, we prefer to use the confidence ratio between the first- and second-best predictions. In addition to its simplicity, this ratio is robust to changes in the number of classes. A naive way to set the threshold is to take the median confidence ratios for true-accepts and false-accepts during training and place the cut-off at the midpoint between them. Doing this in combination with a simple voting ensemble involving all three feature-based classifiers, we achieve a true-accept rate of 54% and a false-accept rate of 8%. While we would like true-accepts to be in the 80%+ range, accuracy limitations on a single image are often mitigated in practice by building a consensus over a number of consecutive frames. Future work will likely explore the use of meta-learners and various hybridization techniques involving both feature- and decision-level fusion.
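A minimal sketch of how the ratio test, the midpoint threshold, and the voting ensemble could fit together is shown below. It assumes distance-based scores (lower is better), and the specific voting and frame-consensus rules are illustrative assumptions; the abstract does not specify them.

    # Sketch of the confidence-ratio acceptance test, the midpoint threshold,
    # and a simple voting ensemble over the three feature-based classifiers.
    # Distance-based scores and the exact voting/consensus rules are assumptions.
    from collections import Counter
    import numpy as np

    def confidence_ratio(distances):
        # Ratio of best to second-best distance; values near 0 mean the top
        # prediction stands well apart from the runner-up.
        d = sorted(distances.values())
        return d[0] / d[1]

    def calibrate_threshold(true_accept_ratios, false_accept_ratios):
        # Midpoint between the median ratios of the two populations,
        # measured on the training data.
        return (np.median(true_accept_ratios) + np.median(false_accept_ratios)) / 2.0

    def ensemble_decision(votes):
        # votes: list of (predicted_id, ratio, threshold), one per classifier
        # (HOG, LBPH, SIFT). A classifier votes only if its ratio clears its
        # own threshold; accept the identity that wins a majority of votes.
        confident = [pid for pid, ratio, thresh in votes if ratio < thresh]
        if not confident:
            return None  # reject
        pid, count = Counter(confident).most_common(1)[0]
        return pid if count >= 2 else None

    def frame_consensus(frame_decisions, min_share=0.6):
        # Consensus over consecutive frames: accept only if one identity wins
        # a clear share of the per-frame decisions (None = rejected frame).
        tally = Counter(d for d in frame_decisions if d is not None)
        if not tally:
            return None
        pid, count = tally.most_common(1)[0]
        return pid if count / len(frame_decisions) >= min_share else None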

Acknowledgements: This research is based upon work supported by the Army Research Office (Contract No. W911NF-15-1-0524).

References:

[1] P. J. Phillips, P. J. Flynn, J. R. Beveridge, W. T. Scruggs, A. J. O'Toole, D. Bolme, K. W. Bowyer, B. A. Draper, G. H. Givens, Y. M. Lui, H. Sahibzada, J. A. Scallan, and S. Weimer, "Overview of the Multiple Biometrics Grand Challenge," Advances in Biometrics, Lecture Notes in Computer Science, pp. 705-714, 2009.

[2] R. Dellana and K. Roy, "Data augmentation in CNN-based periocular authentication," in International Conference on Information Communication and Management (ICICM), IEEE, 2016, pp. 141-145.

Page 2:

Computer Vision-Based Makerspace Access Control System
Ryan Dellana, North Carolina Agricultural and Technical State University

Problem Statement and Goals

Goal: Create an interactive resident A.I. that can visually identify students and provide access to equipment based on individual permissions. Sensors include a Kinect on a custom pan/tilt assembly. The system can regulate access by controlling electrical outlets.

Challenge: Due to privacy concerns, only a single frontal face image will be provided for each student. No additional training images may be collected. The system should be able to learn in real time without the use of a GPU.

Approach

• Implement three local-feature-based classifiers and compare their performance with a Convolutional Neural Network (CNN).

• Histogram of Oriented Gradients (HOG), Local Binary Pattern Histograms (LBPH), and Scale Invariant Feature Transform (SIFT) are all explicit feature-based "lazy classifiers." They do most of their work during prediction rather than during training.

• The CNN internalizes the training samples and does not need to compare a sample to an ever-growing number of templates, so its run-time performance scales better.

• The trade-off is that a CNN must be retrained whenever a new class is added, which makes it unsuitable for our application.

• Our dataset is a hand-picked subset of the Multi-Biometric Grand Challenge (MBGC) dataset [1], with images cropped to include only the face region in 224 x 224 greyscale. There are 100 classes, with 1 sample per class used for training and 4 for testing. Images were chosen to be visually dissimilar yet still classifiable by a human.

[Figures: charts comparing HOG, SIFT, LBPH, and CNN; a simplified 3D rendering of our Convolutional Neural Network topology (input, convolutional, max pooling, fully connected, and output layers); SIFT keypoint matching between sample and template; Histogram of Oriented Gradients; Local Binary Pattern Histograms; motion tracking and face detection; dangerous lab equipment; the Kinect on its custom pan/tilt assembly.]
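The poster figure names only the layer types in the network. As a rough illustration, a topology of that shape could be written in Keras as below; the filter counts, kernel sizes, and dense-layer width are assumptions, not the hyperparameters actually used.

    # Illustrative Keras sketch of a CNN with the layer types named in the figure
    # (input -> convolutional -> max pooling -> fully connected -> output).
    # All layer sizes are assumptions; the poster does not give them.
    from tensorflow.keras import layers, models

    def build_cnn(num_classes=100):
        model = models.Sequential([
            layers.Input(shape=(224, 224, 1)),                 # 224x224 greyscale input
            layers.Conv2D(32, (5, 5), activation="relu"),      # convolutional layer
            layers.MaxPooling2D((2, 2)),                       # max pooling layer
            layers.Flatten(),
            layers.Dense(128, activation="relu"),              # fully connected layer
            layers.Dense(num_classes, activation="softmax"),   # one output node per class
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model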

Results

• HOG is significantly more accurate than the other classifiers, with rank-1 accuracies for HOG, SIFT, LBPH, and CNN of 67.75%, 52.5%, 52.5%, and 38.5% respectively.

• Previous work [2] has shown the CNN to be more accurate, but this advantage appears to be nullified when training data is scarce, as the CNN is prone to overfitting.

• HOG ran several times faster than the runner-up, LBPH.

• To obtain the desired true-accept to false-accept ratio, rather than using the confidence value directly, we use the confidence ratio between the first- and second-best predictions. One nice feature of this approach is that it is robust to changes in the number of classes.

• By using as a threshold the midpoint between the median confidence ratios for true-accepts vs. false-accepts during training, we get a true-accept rate of 54% and a false-accept rate of 8%. In practice, this is good enough, since poor single-frame accuracy can be mitigated by building a consensus over a number of consecutive frames.

• Future work will likely explore the use of meta-learners and various hybridization techniques involving both feature- and decision-level fusion.
