high detection-rate cascades for real-time object...

High Detection-rate Cascades for Real-Time Object Detection

Hamed Masnadi-Shirazi, Nuno VasconcelosDepartment of Electrical and Computer Engineering,University of California San Diego

San Diego, CA [email protected], [email protected]

Abstract

A new strategy is proposed for the design of cascadedobject detectors of high detection-rate. The problem ofjointly minimizing the false-positive rate and classificationcomplexity of a cascade, given a constraint on its detec-tion rate, is considered. It is shown that it reduces to theproblem of minimizing false-positive rate given detection-rate and is, therefore, an instance of the classic problem ofcost-sensitive learning. A cost-sensitive extension of boost-ing, denoted byasymmetric boosting, is introduced. Itmaintains a high detection-rate across the boosting itera-tions, and allows the design of cascaded detectors of highoverall detection-rate. Experimental evaluation shows that,when compared to previous cascade design algorithms, thecascades produced by asymmetric boosting achieve signif-icantly higher detection-rates, at the cost of a marginal in-crease in computation.

1. Introduction

Significant attention has been devoted to the problemof real-time object detection in recent years. The seminalcontribution was the cascaded object detector of Viola andJones (VJ) [9]. It achieves real time detection with perfor-mance comparable to that of previous state of the art meth-ods. Although its success is due to a combination of innov-ative ideas, such as using a large set of very simple features,efficient computation through integral images, and the re-course to weak classifiers and boosting, one of the great-est assets of the VJ detector is the adoption of a cascadedclassifier architecture[9]. In particular, the detector isim-plemented as a sequence of classifiers of high detection andlow false positive rate. Because most image locations caneasily be identified as not containing the target object, thisallows the rejection of non-targets with a very small num-ber of feature evaluations, enabling extreme computationalefficiency. In the original VJ algorithm, a standard imple-mentation of AdaBoost [3] is used to identify the set of dis-criminant features which, along with a threshold, make up

each stage of the cascaded detector. The overall goal is tomeet pre-specified detection and false-positive rates for thewhole cascade, while maximizing detection speed.

While cascades designed with the procedure of [9] areextremely fast, the lack of an optimal method for cascadedesign is problematic in two ways. First, it makes this de-sign dependent on manual supervision. Because the stagesof the cascade are thresholded combinations of weak learn-ers, there is a need to search for the configuration (numberof learners and threshold) of each stage. This usually re-quires trial and error and can be very time consuming, com-promising the viability of cascaded detectors as a practicalsolution for generic object detection. Second, the result-ing cascades are inevitably sub-optimal. This is particularlyproblematic because the objects to detect occur rarely in theimages where detection is performed, and the cost of loos-ing each occurrence is usually large. The problem is com-pounded by the fact that the missed detections of the variousstages accumulate, and it is very difficult to design cascadesof high overall detection rate.

The limitations of the original VJ design have motivateda number of enhancements in the recent years. Most ofthese contributions, while improving various aspects of theoriginal design - e.g. by proposing improved boosting algo-rithms (or other feature selection procedures) [5, 2], embed-ded cascade structures [11, 10], cost-sensitive extensionsfor the design of cascade stages [7, 4], or threshold post-processing [1, 6] - have not addressed the problem of howto optimally design the whole cascade. This problem hasonly recently started to receive some attention [2, 8], andinteresting theoretical formulations have been proposed forits solution, but their translation into practical algorithms re-quires assumptions or approximations that may not alwayshold. Perhaps due to this, these formulations have not yetproved effective at producing high detection-rate cascades.

In this work, we propose a new cascade design algorithmthat addresses this problem. We introduce a novel formula-tion for cascade design, which poses the problem as oneof jointly minimizing computational cost and false positiverate for a given detection rate. We then show that this re-

0 100 200 300 400 500 600 700 80091

92

93

94

95

96

97

98

# False Positives

Det

ectio

n%

No CascadeT=800EmbeddedT=800EmbeddedT=400EmbeddedT=200WaldBoostViola−JonesBoosting Chain

Figure 1. ROC of the various cascades discussed in the text, forface detection on the MIT-CMU test set.

duces to the problem of minimizing false positive given de-tection rate, i.e. the classic problem of cost-sensitive learn-ing. We next introduce a cost-sensitive extension of boost-ing that maintains a high detection rate across the boostingiterations, and allows the design of single-feature embed-ded cascades with high overall detection rate. Experimentalevaluation shows that, when compared to the state-of-the-art, these cascades do achieve significantly higher detectionrates, at the cost of a marginal increase in computation.

2. Evaluation

We performed various experiments to evaluate the con-tributions of this paper. Unless otherwise noted, all ex-periments followed the experimental protocol of [9], us-ing a face database of10K positive and350K negativeexamples, and weak learners combining decision stumpsand Haar wavelet features (selected from a feature pool of50, 000). Because asymmetric boosting maintains a high-detection rate throughout all iterations, it enables the designof cascades “a posteriori”. This consists of simply addingexit points (i.e. rejecting examples classified as negatives)at intermediate steps of the detector. In particular, all singlefeature cascades discussed in what follows were producedby running asymmetric boosting to obtain a non-cascadeddetector, and then introducing exit points after the evalua-tion of each weak learner.

Figure 1 presents a comparison between the performanceof a single feature embedded cascade and the cascades pro-duced by three state-of-the-art methods: VJ [9], Boostingchain [11] and WaldBoost [8]. The evaluation was con-ducted on the CMU-MIT face test set, and the detectionrate and number of total false positives are reported for a se-ries of points on the ROC curves of the different cascades.We compared the performance of three single-feature em-bedded cascades obtained from the same boosting run, by

considering800, 400 and200 features. Note that all threeachieve much higher detection rates than those produced bythe previous methods. In fact, the only method that achieveswithin 3% of their detection rate is WaldBoost, and only forsubstantially larger false positive rates. The previous meth-ods produce smaller false positives in the low detection-rate regime, but we have not tried to optimize the perfor-mance in this area: all embedded cascades were obtainedwith (C1 = 5,C2 = 1), other cost configurations wouldhave given greater preference to the low false-positive rateregion.

It is also worth emphasizing that the design of the em-bedded cascade only required350K random negative ex-amples, extracted from images not containing faces. Thesewere available in our lab from previous experiments withface detectors, but were not bootstrapped specifically for thedesign of this cascade. Although bootstrapping for more ex-amples could have been used to improve performance, it isnot a strict requirement for embedded cascades. This is incontrast to the millions of negative image patches used inthe design of other methods. It is also interesting to notethat all three embedded cascades achieve high detection-rates. In particular, note that the detection-rate does notseem to decrease from200 to 800 features. This is ev-idence that asymmetric boosting maintains high-detectionrates throughout the whole boosting run. It also challengesthe assumption of independent errors which, for single fea-ture embeddings, would imply an exponential decrease indetection rate. On the contrary,there appears to be nopenalty for the cascading operation. To test this premise,we have also measured the ROC curve of the non-cascadedboosted detector. As can be seen from Figure 1, there is vir-tually no difference between its ROC and that of the tremen-dously faster single feature cascade.

Table 1 provides a comparison of the number of features(#F) and average number of features used (Avg-Used) bydifferent methods. The non-cascaded detector ofn featureswould usen features on average. The embedded cascadedesign reduces this number to about15, a number compa-rable to that produced by the other methods. VJ and Wald-Boost have slightly faster evaluation times, but significantlyworse detection rates. Finally, the total training time wasof2 days for the200 feature embedded cascade, and consisteduniquely ofCPU time(no manual supervision), as opposedto the ”weeks” of trial and error reported by VJ.

References

[1] L. Bourdev and J. Brandt. Robust object detection via softcascade. InCVPR, 2005.

[2] S. C. Brubaker, M. D. Mullin, and J. M. Rehg. Towardsoptimal training of cascaded detectors. InECCV, 2006.

[3] Y. Freund and R. Schapire. A decision-theoretic generaliza-tion of on-line learning and an application to boosting.Com-

Method VJ Waldboost Embedded200 Embedded400 Boosting Chain#F 4297 400 200 400 700

Avg-Used 8 10.84 15.45 15.95 18.1Table 1. comparison of length and average number of features used .

puter and System Sciences, 1997.[4] X. Hou, C. Liu, and T. Tan. Online learning asymmetric

boosted classifiers for object detection. InCVPR, 2007.[5] S. Z. Li, Z. Zhang, H.-Y. Shum, and H. Zhang. Floatboost

learning and statistical face detection. InPAMI, 2005.[6] H. Luo. Optimization design of cascaded classifiers. In

CVPR, 2005.[7] Y. Ma and X. Ding. Robust real-time face detection based on

cost-sensitive adaboost method . InICME, 2003.[8] J. Sochman and J. Matas. Waldboost-learning for time con-

strained sequential detection. InCVPR, 2005.[9] P. Viola and M. Jones. Robust real-time object detection.

In Workshop on Statistical and Computational Theories ofVision, 2001.

[10] B. Wu, H. AI, C. Huang, and S. Lao. Fast rotation invariantmulti-view face detection based on real adaboost. InFGR,2004.

[11] R. Xiao, L. Zhu, and H.-J. Zhang. Boosting chain learningfor object detection. InICCV, 2003.

high detection-rate cascades for real-time object...

Documents