
Page 1:

Computer Vision-Based Makerspace Access Control System

Problem Statement: The long-term goal of this project is to create an interactive resident A.I. that can visually identify students and provide access to equipment based on individual permissions, using sensors including a Kinect on a custom pan/tilt assembly. Due to privacy concerns, only a single frontal face image will be provided for each student. The system should be able to learn in real time without the use of a GPU.

Approach: Three local-feature-based classifiers were implemented and their performance compared with a Convolutional Neural Network (CNN). The first two are the well-known Histogram of Oriented Gradients (HOG) and Local Binary Pattern Histograms (LBPH). The third is based on the Scale Invariant Feature Transform (SIFT). Since all three of these classifiers work by comparing a sample with every template, they are "lazy classifiers," doing most of their work during prediction rather than training. This is in contrast to the CNN, which is created and trained with a fixed number of output nodes, each corresponding to one class. The CNN's runtime performance scales much better, since, as the number of classes grows, it still only needs to be run once on a given sample. Unfortunately, this also means that it must be retrained every time a new class is added. The computational cost of retraining a CNN makes it infeasible as a solution; our use of it here is merely as a performance benchmark.
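As a concrete illustration of this template-matching scheme, the sketch below enrolls one HOG descriptor per student and predicts by comparing a probe against every stored template. The HOG parameters and the Euclidean distance metric are illustrative assumptions, not the exact configuration used in this project.

    # Minimal sketch of one-template-per-class "lazy" matching with HOG features.
    # Window/block/cell sizes and the distance metric are assumptions.
    import cv2
    import numpy as np

    # winSize, blockSize, blockStride, cellSize, nbins
    hog = cv2.HOGDescriptor((224, 224), (32, 32), (16, 16), (16, 16), 9)

    templates = {}  # student_id -> HOG feature vector ("training" is just storage)

    def describe(face_224_gray):
        # Compute a HOG feature vector for a 224x224 greyscale face crop.
        return hog.compute(face_224_gray).flatten()

    def enroll(student_id, face_img):
        templates[student_id] = describe(face_img)

    def predict(face_img):
        # Compare the probe against every enrolled template (the "lazy" step).
        probe = describe(face_img)
        distances = {sid: float(np.linalg.norm(probe - feat))
                     for sid, feat in templates.items()}
        best_id = min(distances, key=distances.get)
        return best_id, distances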

Dataset: All models were tested on a hand-picked subset of the Multi-Biometric Grand Challenge (MBGC) dataset [1], with images cropped to include only the face region in 224 x 224 greyscale. The dataset consists of 100 classes, with 1 sample per class used for training and 4 for testing. Images were chosen to be visually dissimilar yet still classifiable by a human.
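The cropping step described above could be reproduced along the following lines; the Haar-cascade face detector and file-loading details are assumptions for illustration, not necessarily the tools applied to the MBGC images.

    # Sketch of the preprocessing described above: detect the face, crop it,
    # and resize to a 224x224 greyscale image. The Haar-cascade detector is an
    # illustrative assumption; any frontal face detector could fill this role.
    import cv2

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def preprocess(path, size=(224, 224)):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None  # no face found; such an image would be skipped
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
        return cv2.resize(gray[y:y + h, x:x + w], size)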

Results: Rank-1 accuracies for HOG, SIFT, LBPH, and CNN were 67.75%, 52.5%, 52.5%, and 38.5% respectively, with HOG the clear winner. We have performed similar comparisons in previous work [2] using datasets with more than 10 training samples per class and much less visual obfuscation, and found the difference to be less pronounced, if not entirely absent. Previous work has also shown the CNN to be the clear winner, but this advantage appears to be nullified when training data is scarce, as CNNs are prone to overfitting. It is also worth noting that, in our experiments, HOG ran several times faster than the runner-up, LBPH.
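For reference, rank-1 accuracy here is simply the fraction of the 400 test images (100 classes x 4 samples) whose single best match is the correct class. A short sketch, assuming the hypothetical predict() from the template-matching example above:

    # Rank-1 accuracy: the share of test samples whose top-ranked match is correct.
    # test_samples is assumed to be an iterable of (face_img, true_id) pairs.
    def rank1_accuracy(test_samples, predict):
        hits, total = 0, 0
        for face_img, true_id in test_samples:
            best_id, _ = predict(face_img)
            hits += int(best_id == true_id)
            total += 1
        return hits / total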

Ongoing/Future Work: To find an optimum trade-off between true-accepts and false-accepts, rather than using a raw confidence cut-off, we prefer to use the confidence ratio between the first- and second-best predictions. In addition to its simplicity, this ratio is robust to changes in the number of classes. A naive way to set the threshold is to take the median confidence ratios for true-accepts and false-accepts during training and place the cut-off at the midpoint between them. Doing this in combination with a simple voting ensemble involving all three feature-based classifiers, we achieve a true-accept rate of 54% and a false-accept rate of 8%. While we would like true-accepts to be in the 80%+ range, accuracy limitations on a single image are often mitigated in practice by building a consensus over a number of consecutive frames. Future work will likely explore the use of meta-learners and various hybridization techniques involving both feature- and decision-level fusion.
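A minimal sketch of how the ratio test, the midpoint threshold, and the voting ensemble could fit together is shown below. It assumes distance-based scores (lower is better), and the specific voting and frame-consensus rules are illustrative assumptions; the abstract does not specify them.

    # Sketch of the confidence-ratio acceptance test, the midpoint threshold,
    # and a simple voting ensemble over the three feature-based classifiers.
    # Distance-based scores and the exact voting/consensus rules are assumptions.
    from collections import Counter
    import numpy as np

    def confidence_ratio(distances):
        # Ratio of best to second-best distance; values near 0 mean the top
        # prediction stands well apart from the runner-up.
        d = sorted(distances.values())
        return d[0] / d[1]

    def calibrate_threshold(true_accept_ratios, false_accept_ratios):
        # Midpoint between the median ratios of the two populations,
        # measured on the training data.
        return (np.median(true_accept_ratios) + np.median(false_accept_ratios)) / 2.0

    def ensemble_decision(votes):
        # votes: list of (predicted_id, ratio, threshold), one per classifier
        # (HOG, LBPH, SIFT). A classifier votes only if its ratio clears its
        # own threshold; accept the identity that wins a majority of votes.
        confident = [pid for pid, ratio, thresh in votes if ratio < thresh]
        if not confident:
            return None  # reject
        pid, count = Counter(confident).most_common(1)[0]
        return pid if count >= 2 else None

    def frame_consensus(frame_decisions, min_share=0.6):
        # Consensus over consecutive frames: accept only if one identity wins
        # a clear share of the per-frame decisions (None = rejected frame).
        tally = Counter(d for d in frame_decisions if d is not None)
        if not tally:
            return None
        pid, count = tally.most_common(1)[0]
        return pid if count / len(frame_decisions) >= min_share else None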

Acknowledgements: This research is based upon work supported by the Army Research Office (Contract No. W911NF-15-1-0524).

References:

[1] P. J. Phillips, P. J. Flynn, J. R. Beveridge, W. T. Scruggs, A. J. O'Toole, D. Bolme, K. W. Bowyer, B. A. Draper, G. H. Givens, Y. M. Lui, H. Sahibzada, J. A. Scallan, and S. Weimer, "Overview of the Multiple Biometrics Grand Challenge," Advances in Biometrics, Lecture Notes in Computer Science, pp. 705-714, 2009.

[2] R. Dellana and K. Roy, "Data augmentation in CNN-based periocular authentication," in International Conference on Information Communication and Management (ICICM), IEEE, 2016, pp. 141-145.

Page 2:

Computer Vision-Based Makerspace Access Control System
Ryan Dellana, North Carolina Agricultural and Technical State University

Problem Statement and Goals

Goal: Create an interactive resident A.I. that can visually identify students and provide access to equipment based on individual permissions. Sensors include a Kinect on a custom pan/tilt assembly. The system can regulate access by controlling electrical outlets.

Challenge: Due to privacy concerns, only a single frontal face image will be provided for each student. No additional training images may be collected. The system should be able to learn in real time without the use of a GPU.

Approach

• Implement three local-feature-based classifiers and compare their performance with a Convolutional Neural Network (CNN).

• Histogram of Oriented Gradients (HOG), Local Binary Pattern Histograms (LBPH), and Scale Invariant Feature Transform (SIFT) are all explicit feature-based "lazy classifiers." They do most of their work during prediction rather than during training.

• The CNN internalizes the training samples and does not need to compare a sample to an ever-growing number of templates, so its run-time performance scales better.

• The trade-off is that a CNN must be retrained whenever a new class is added, which makes it unsuitable for our application.

• Our dataset is a hand-picked subset of the Multi-Biometric Grand Challenge (MBGC) dataset [1], with images cropped to include only the face region in 224 x 224 greyscale. There are 100 classes, with 1 sample per class used for training and 4 for testing. Images were chosen to be visually dissimilar yet still classifiable by a human.

[Figures: charts comparing HOG, SIFT, LBPH, and CNN; a simplified 3D rendering of our Convolutional Neural Network topology (input, convolutional, max pooling, fully connected, and output layers); SIFT keypoint matching between sample and template; Histogram of Oriented Gradients; Local Binary Pattern Histograms; motion tracking and face detection; dangerous lab equipment; the Kinect on its custom pan/tilt assembly.]
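The poster figure names only the layer types in the network. As a rough illustration, a topology of that shape could be written in Keras as below; the filter counts, kernel sizes, and dense-layer width are assumptions, not the hyperparameters actually used.

    # Illustrative Keras sketch of a CNN with the layer types named in the figure
    # (input -> convolutional -> max pooling -> fully connected -> output).
    # All layer sizes are assumptions; the poster does not give them.
    from tensorflow.keras import layers, models

    def build_cnn(num_classes=100):
        model = models.Sequential([
            layers.Input(shape=(224, 224, 1)),                 # 224x224 greyscale input
            layers.Conv2D(32, (5, 5), activation="relu"),      # convolutional layer
            layers.MaxPooling2D((2, 2)),                       # max pooling layer
            layers.Flatten(),
            layers.Dense(128, activation="relu"),              # fully connected layer
            layers.Dense(num_classes, activation="softmax"),   # one output node per class
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model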

Results

• HOG is significantly more accurate than the other classifiers, with rank-1 accuracies for HOG, SIFT, LBPH, and CNN of 67.75%, 52.5%, 52.5%, and 38.5% respectively.

• Previous work [2] has shown the CNN to be more accurate, but this advantage appears to be nullified when training data is scarce, as the CNN is prone to overfitting.

• HOG ran several times faster than the runner-up, LBPH.

• To obtain the desired true-accept to false-accept ratio, rather than using the confidence value directly, we use the confidence ratio between the first- and second-best predictions. One nice feature of this approach is that it is robust to changes in the number of classes.

• By using as a threshold the midpoint between the median confidence ratios for true-accepts vs. false-accepts during training, we get a true-accept rate of 54% and a false-accept rate of 8%. In practice, this is good enough, since poor single-frame accuracy can be mitigated by building a consensus over a number of consecutive frames.

• Future work will likely explore the use of meta-learners and various hybridization techniques involving both feature- and decision-level fusion.
