How Good Are Edge Boxes, Really? (sunw.csail.mit.edu/2015/papers/13_McMahon_SUNw.pdf)

How Good Are Edge Boxes, Really?

Sean McMahon, Niko Sünderhauf, Ben Upcroft, Michael Milford
Australian Centre for Robotic Vision (ACRV)

Queensland University of Technology, Brisbane, Australia
[email protected]

Abstract

Image region proposals are now commonly paired with object detection algorithms as an efficient technique for locating object regions. The image region proposal algorithm Edge Boxes [7] is evaluated on the NYU2 dataset [4], and its default parameters are compared with the optimal parameters found by a random parameter search. Edge Boxes' performance is shown on the cluttered, low-resolution images of the NYU2 dataset, which are typical of those encountered in robotics. While comparisons between region proposal methods have been performed, few have viewed them as techniques that can be optimised or fine-tuned. The two most influential variables to refine controlled the step size of the region search and the candidate box density. It was found that the default parameters were among the worst for object recall, while the refined parameters yielded increased recall without sacrificing computation time.

1. Introduction

The benefits of region proposals were first shown with R-CNN [2], and since then many systems have made use of similar techniques, but few have treated them as more than a black box, that is, as another technique to be fine-tuned. Region proposals must be more efficient than using a sliding window with a CNN, and preferably more efficient than other region proposal techniques. High recall is important to provide enough information for the classification algorithm to recognise objects; the more accurate the proposals, the fewer candidate boxes there are to evaluate. The most common region proposal algorithms are Edge Boxes [7] and Selective Search [6]. Edge Boxes has been shown through experimentation to be the state-of-the-art region proposal system [3]. Therefore it is the Edge Boxes system that is analysed here. Built on the Structured Edge Detector, Edge Boxes uses the number of edges enclosed by a box to find proposals, with the number of edges at the border of a box used for ranking.

Figure 1. Left: Candidate boxes extracted using Edge Boxes' default parameters in a cluttered scene do not cover the relevant object well. Right: After refining the parameters the quality significantly improves, with all picture frames and pillows located.

2. How Good is Edge Boxes for Real-World Object Detection?

Fig. 1 (left) illustrates how the default parameters of Edge Boxes do not achieve high recall, compared to the tuned parameters (right), in the cluttered and low-resolution scenes of the NYU2 dataset (RGB images) [4]. The extracted candidate boxes often do not align well with the relevant objects in the scene. This is problematic in real-world applications such as robotics, where these scenes commonly occur. Current region proposal methods have been tuned for typical datasets such as PASCAL, Caltech and ImageNet, which is problematic due to the dataset bias acknowledged in the Computer Vision community [5], [1]. An example of this bias can be seen in the car class, where ImageNet tends to favour racing cars, Caltech shows a strong preference for side views, and PASCAL has cars at non-canonical views [5].

3. Improving the Default Parameters of Edge Boxes

Improving the performance of Edge Boxes can be achieved by refining its parameters with a random parameter search. Images from the NYU2 dataset and the lab's "wandering robot" were used because of their low-resolution, cluttered scenes, typical of those encountered in the real world and in robotics.



Because of computation and time expenses, the parameter search was conducted on a selection of the scene types in the NYU2 dataset, including kitchen, office, classroom and lounge room.
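The random parameter search described above can be sketched as follows (a minimal Python illustration rather than the paper's MATLAB pipeline; the `evaluate` callable is a hypothetical stand-in for the performance measure of Eq. 1 computed over the tuning images):

```python
import random

def random_search(evaluate, n_trials=100, seed=0):
    """Randomly sample (alpha, beta) pairs and keep the best-scoring one.
    `evaluate(alpha, beta)` is assumed to return the performance measure
    (higher is better) over the selected tuning images."""
    rng = random.Random(seed)
    best = (None, None, float("-inf"))
    for _ in range(n_trials):
        alpha = rng.uniform(0.0, 1.0)  # step size of the sliding-window search
        beta = rng.uniform(0.0, 1.0)   # NMS threshold for candidate boxes
        score = evaluate(alpha, beta)
        if score > best[2]:
            best = (alpha, beta, score)
    return best  # (best alpha, best beta, best score)
```

A random search like this makes no smoothness assumptions about the parameter space, which suits the noisy recall landscape shown in Fig. 2.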

The open source MATLAB implementation of Edge Boxes was used; the parameters modified were Alpha and Beta. Alpha indicates the IoU (Intersection over Union) between neighbouring candidate boxes; the step sizes of the sliding windows are chosen such that one step results in an IoU of Alpha. Beta is the Non-Maximum Suppression threshold for a candidate box: when a neighbouring box has an IoU greater than Beta, the lower-ranked box is removed. It was empirically found that these two variables are the most influential in changing Edge Boxes' proposals. Other parameters, such as the minimum score, Gamma and Kappa [7], were included in the parameter search but did not affect the proposals significantly enough to justify the increased computational cost of a larger parameter search space.
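The role of Beta can be illustrated with a small sketch (in Python rather than the toolbox's MATLAB; `iou` and `nms` are illustrative helpers, not part of the Edge Boxes API):

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, beta):
    """Greedy non-maximum suppression: drop any box whose IoU with a
    higher-scored kept box exceeds beta (Edge Boxes' Beta parameter)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= beta for j in keep):
            keep.append(i)
    return keep
```

A larger Beta keeps more overlapping candidates (denser output); a smaller Beta prunes aggressively, which is one way the parameter trades recall against the number of boxes the classifier must evaluate.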

A pipeline was established to rank the performance of each parameter set. A performance measurement was defined using the maximum IoUs between the generated candidate boxes and the image objects (ground truth boxes). Originally, the average IoU of all Edge Boxes proposals per ground truth was used; however, this caused larger boxes to be selected, as they covered more objects (ground truths) without sufficiently reducing the search area for the classifier. The maximum IoU for each ground truth box was therefore chosen instead, as summarised in Eq. 1 below.

\[
\alpha^{*}, \beta^{*} = \operatorname*{arg\,max}_{\alpha, \beta} \; \frac{1}{m} \sum_{k=1}^{m} \frac{1}{n} \sum_{j=1}^{n} \max_{i} \left[ w_i \cdot \mathrm{IoU}\!\left(B^{k}_{i}(\alpha, \beta),\, O^{k}_{j}\right) \right] \tag{1}
\]

IoU computes the intersection of a proposed box \(B^{k}_{i}(\alpha, \beta)\) with a ground truth box \(O^{k}_{j}\), divided by their union area, to give an indication of how well aligned the two are.

The weight \(w_i\) is defined as \(w_i = \frac{\mu - i}{\mu}\), where \(\mu\) is the maximum possible number of extracted candidate boxes. This weight was added because CNNs are expensive to evaluate, so only a fraction of the 1000 proposed boxes can be used; a majority of objects should therefore be enclosed in the highest-scored candidate boxes, since analysing only the first 100 boxes would reduce computational cost. With the maximum IoU per ground truth weighted in this way, all values per image were averaged to indicate the performance of a parameter set on that image, and these per-image values were then averaged again for each parameter set, as in Eq. 1.

A performance check was done on the full NYU2 dataset, excluding the parameter search images, and on the robot images; the refined parameters still outperformed the defaults, with an overall performance value (Eq. 1) of 0.1046 (refined) versus 0.0706 (default).

Figure 2. Averaged Weighted IoU over the parameter space (Alpha versus Beta), as defined in Eq. 1. The default parameters (black) clearly score lower than the best parameters (white) found by a randomised search.

4. Results

Edge Boxes was evaluated on the NYU2 and wandering robot datasets with a wide variety of parameters, as shown in Fig. 2, which illustrates the values of the performance measure (Eq. 1) over the parameter space and visualises the default and optimised variables. The parameter search found that the default variables were among the worst for object recall on the NYU2 dataset, while the new parameters achieved increased recall with little difference in computational efficiency. The refined parameter set shows that the defaults are indeed biased towards the standard uncluttered datasets, and therefore ill-suited for the non-specific robotics applications of the real world.

References

[1] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, pages 98–136, 2014.

[2] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.

[3] J. Hosang, R. Benenson, and B. Schiele. How good are detection proposals, really? In BMVC, 2014.

[4] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor Segmentation and Support Inference from RGBD Images. In ECCV, 2012.

[5] A. Torralba and A. A. Efros. Unbiased look at dataset bias. In CVPR, pages 1521–1528, 2011.

[6] K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders. Segmentation as selective search for object recognition. In ICCV, pages 1879–1886, 2011.

[7] C. L. Zitnick and P. Dollár. Edge Boxes: Locating Object Proposals from Edges. In ECCV, pages 391–405, 2014.
