


Computer Vision and Image Understanding 142 (2016) 37–49


An efficient feature descriptor based on synthetic basis functions and

uniqueness matching strategy✩

Alok Desai a, Dah-Jye Lee a,∗, Dan Ventura b

a Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah 84602, USA
b Computer Science Department, Brigham Young University, Provo, Utah 84602, USA

✩ This paper has been recommended for acceptance by Michael M. Bronstein.
∗ Corresponding author. Fax: 801 422 0201.
E-mail addresses: [email protected] (A. Desai), [email protected] (D.-J. Lee), [email protected] (D. Ventura).
http://dx.doi.org/10.1016/j.cviu.2015.09.005

Article info

Article history:

Received 19 January 2015

Accepted 20 September 2015

Available online 30 September 2015

Keywords:

Feature detection

Feature descriptor

Synthetic basis functions

Feature matching

Abstract

Feature matching is an important step for many computer vision applications. This paper introduces the development of a new feature descriptor, called SYnthetic BAsis (SYBA), for feature point description and matching. SYBA is built on compressed sensing theory, which uses synthetic basis functions to encode or reconstruct a signal. It is a compact and efficient binary descriptor that performs a number of similarity tests between a feature image region and a selected number of synthetic basis images and uses the similarity test results as the feature descriptor. SYBA is compared with four well-known binary descriptors using three benchmarking datasets as well as a newly created dataset designed specifically for a more thorough statistical T-test. SYBA is less computationally complex and produces better feature matching results than other binary descriptors. It is hardware-friendly and suitable for embedded vision applications.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Computer vision applications often involve computationally intensive tasks such as target tracking [1–3], object identification [4,5], image rectification, localization, pose estimation [1,6–9], optical flow [10,11], and many more. The initial steps of all these applications are the detection, description, and matching of high-quality feature points, with feature description and matching being the most challenging and time-consuming processes. They focus on computing abstractions of image information that are associated with the points of interest detected by the feature detector.

There exist a large number of feature descriptors [12–28], but not every one of them is suitable for hardware implementation in real-time applications. An efficient feature descriptor for real-time applications should not be too computationally complex. To be hardware-friendly, it should not use many square root, division, or exponential operations, which require floating-point computations. A good feature descriptor is able to describe a feature point and measure its similarity to other feature points. It allows a feature point to be correctly identified and matched to a feature point in another image that has similar characteristics. A well-known feature descriptor based on the orientation and magnitude of intensity gradients is the Scale Invariant Feature Transform (SIFT) [12]. It works well on intensity images and provides descriptors that are invariant to rotation and scaling. However, its complexity and robustness come with increased computation and storage requirements, which make it unsuitable for many resource-limited platforms and real-time applications. Another well-known descriptor, Speeded-Up Robust Features (SURF), computes descriptors using integral images and the 2-D Haar wavelet transform [13,14]. A minor drawback is that it requires 256 bytes to encode 64 floating-point values.

Ke and Sukthankar [15] developed a descriptor by applying a dimensionality reduction technique, Principal Component Analysis (PCA), to the normalized image gradient patch. PCA-SIFT performs better than the SIFT descriptor on artificially generated data and has the added benefit of reducing high-frequency noise in the descriptors. The drawback is that it is not tuned to obtain a subspace that is discriminative for matching [16]. Another dimensionality reduction technique, Linear Discriminant Embedding (LDE), was developed by Hua et al. [16]. To perform well, LDE requires labeled training data, which are difficult to obtain. Both methods are suitable for feature description and require a smaller descriptor size than many well-known approaches.

Simonyan et al. [17] and Trzcinski et al. [18] developed descriptors based on training data. Both descriptors require complex operations and computational resources that are not suitable for hardware implementations. The accuracy of these descriptors may suffer when they are applied to applications that are completely different from the training dataset. Simonyan et al. proposed computing a floating-point descriptor and then converting it to binary, which clearly causes a loss of matching accuracy [17]. Similarly, other descriptors [19–22]

require time-consuming training processes and complex operations. Even though some descriptors [17–21] use a smaller descriptor size than SYBA, they are not suitable for low-resource platforms. In recent years, new feature descriptors such as Binary Robust Independent Elementary Features (BRIEF) [23,24], Binary Robust Invariant Scalable Keypoints (BRISK) [25], and Aggregated LOcal HAar (ALOHA) [26] have been reported. These feature descriptors are all based on intensity comparisons. ALOHA is based on a set of Haar-like pixel patterns defined within an image patch. It performs intensity difference tests to encode the image patch into a binary string. ALOHA requires a larger patch size and slightly fewer operations than the BRIEF descriptor; even with the larger patch size, its results are not robust to viewpoint and illumination changes. The BRIEF descriptor trades reliability and robustness for processing speed. It consists of a binary string that contains the results of simple image intensity comparisons at random pre-determined pixel locations. BRISK relies on configurable circular sampling patterns from which it computes brightness comparisons to form a binary descriptor. Overall, the BRISK descriptor requires significantly more computation and slightly more storage space than BRIEF. Both of these algorithms use faster feature detectors and smaller descriptor sizes than SIFT and SURF. As another alternative to SIFT and SURF, Rublee et al. introduced a new version of BRIEF called the rBRIEF descriptor [27]. It is based on a specific set of 256 learned pixel pairs selected to reduce correlation among the binary tests.

Feature description has been an active area of research in computer vision and machine learning. The main objective of this work is to develop a simple and hardware-friendly feature descriptor that reduces the computational power requirement and increases the speed and accuracy of feature matching. Our new descriptor is inspired by recent work in compressed sensing [29]. Compressed sensing theory is used to encode and decode a signal efficiently and reduces bandwidth and storage requirements. It is able to uniquely describe a signal with synthetic basis functions, which makes it well suited to feature description.

To understand the theory, consider the popular game of Battleship, in which the best result can be obtained by using an adaptive strategy of counting the number of hits in recursively subdivided half-planes. The major drawbacks of this adaptive strategy are that it requires memory space to record, and processing power to analyze, all previous guesses and guess results in order to determine the next guess. Anderson developed a new compressed sensing algorithm based on this adaptive strategy, using synthetic basis functions instead of subdivided half-planes to minimize the memory space requirement [29].

Using the Battleship game as an analogy, the basic idea of using synthetic basis functions for compressed sensing is to use a random pattern (as shown in Fig. 1(b–d)) as a guess. The biggest advantage of synthetic basis functions is that they do not require memory for storing previous guesses. This reduced memory requirement makes synthetic basis functions an excellent choice for feature description on resource-limited systems.

Fig. 1. (a) Example ship locations in the Battleship game. (b–d) Three 15 × 15 synthetic basis images with 113 ((15 × 15)/2) random black squares as the guessed locations. (e–g) Guess results using these three random patterns.

As an example, in Fig. 1(a) the orange squares represent battleship locations in a 15 × 15 area, and the black squares in Fig. 1(b–d) represent the guessed locations of the battleships. The maximum number of different random patterns (or turns) required to locate all battleships using this non-adaptive approach is, surprisingly, the same as for the original adaptive strategy (but with the benefit of significantly reduced memory space). It is given by Anderson [29] as

M = C(K ln(N/K)),  (1)

where N is the number of squares on the game board (n × n), K is the number of queried battleship locations, and C(·) rounds up to the nearest integer. M is the number of random patterns (turns) required to locate all ships and is smallest when K = N/2. This very small number of random patterns is sufficient to locate all battleships.
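As a quick check of Eq. (1), the following Python sketch (ours, not from the paper) computes M for the 15 × 15 board used in this example:

```python
import math

def num_patterns(N: int, K: int) -> int:
    """M = C(K ln(N/K)) from Eq. (1); C(.) rounds up to the nearest integer."""
    return math.ceil(K * math.log(N / K))

N = 15 * 15        # squares on the game board
K = N // 2         # queried locations per pattern (113), the optimal choice
print(num_patterns(N, K))   # -> 78, matching 113 ln(225/113) ~ 78
```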

Fig. 1(a) shows that there are seven battleships in a 15 × 15 area. As shown in Fig. 1(e), out of the seven battleships, six (squares in blue) are hit or guessed correctly because their locations coincide

with six of the black squares in the random pattern (or turn) shown in Fig. 1(b). One ship is missed (square in orange) using the same pattern because its location coincides with a white square. Similarly, six and five battleships (squares in blue) are hit or guessed correctly using the random patterns (or turns) shown in Fig. 1(c) and (d); their results are shown in Fig. 1(f) and (g), respectively. According to Eq. (1), the number of random patterns or turns required to locate all battleships in a guessing game of this size (15 × 15), using unique (non-repetitive) basis patterns (K = N/2), is 113 ln(225/113), or 78.

Inspired by this compressed sensing theory, we have developed a new descriptor algorithm using synthetic basis functions, called SYnthetic BAsis (SYBA). It uses a number of randomly generated synthetic basis images (SBIs) as the guesses in a "battleship game" to measure the similarity between a small image region surrounding a detected feature point, called a feature region image (FRI), and the SBIs. The similarities between an FRI and all SBIs are then used as the feature descriptor.

This work involves a unique way of measuring descriptor similarity in order to match similar features between two images; it is less complex than Mahalanobis- and Euclidean-distance methods. This work also includes a feature matching strategy with a two-pass search to enforce the uniqueness constraint and a global minimum requirement to determine the best matching feature pairs. The new descriptor, SYBA, is introduced in Section 2. Experimental results based on feature matching comparisons with two widely used binary descriptors, BRIEF-32 and rBRIEF, are presented in Section 3, which also includes statistical T-test experiments on a newly created dataset. Section 4 summarizes the paper with a discussion of the performance and ideas for future work.

2. SYBA descriptor algorithm

Well-known binary descriptors are often used for benchmarking feature description performance. The BRIEF descriptor compares the intensities of two randomly selected pixels and uses the intensity difference as a descriptor [23]. Rather than intensity differences, SYBA compares a feature region image with a number of synthetic basis images and uses the similarity measures as the feature descriptor. The creation of the synthetic basis images and the computation of the SYBA descriptor are the two major parts of this algorithm.

2.1. Synthetic basis images

Synthetic basis images are sparse images. They differ from the basis dictionary images created from natural or man-made objects in our previous work [30,31], which are not always sparse. There are two major differences between basis functions created from random numbers (i.e., synthetic) and basis images created from natural or man-made objects. First, creating basis images from natural and man-made objects requires several hours of computation, while creating synthetic basis images requires at most a few seconds. Second, the memory space required to create synthetic basis images is far less than for basis images created from natural or man-made objects. These are the reasons the synthetic bases are preferable.

Fig. 2. (a) A sample 30 × 30 synthetic basis image and (b) a 5 × 5 synthetic basis image (zoomed).

The number of synthetic basis images (M) represents the number of "turns" as in the Battleship game and is calculated according to Eq. (1). Naturally, a larger number of synthetic basis images is required for a larger pixel region surrounding the detected feature point, or feature region image (FRI). The maximum number of synthetic basis images required is 9 (when K = N/2) for a 5 × 5 FRI, whereas a 30 × 30 FRI requires 312 synthetic basis images.

Two examples of synthetic basis images are shown in Fig. 2: one is 30 × 30 and the other is 5 × 5. Synthetic basis images similar to these two are used for the SYBA descriptor calculation. The first step in creating a synthetic basis image is to determine its dimension (N = n × n). Once the dimension is determined, K (= N/2) normally distributed pseudo-random numbers are generated from [1, 2, …, N] to represent the black squares in the synthetic basis image. Note that even for small SBIs (e.g., 5 × 5) generated in this manner, all SBIs in one set (M) will be uniquely represented (with probability 0.99999893), and thus no specially designed patterns are needed.
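The following is a minimal sketch of this construction (ours, not the authors' code); plain sampling without replacement stands in for the paper's pseudo-random number generation:

```python
import numpy as np

def make_sbi(n: int, rng: np.random.Generator) -> np.ndarray:
    """Generate one n x n synthetic basis image; 1 marks a black square."""
    N = n * n
    K = (N + 1) // 2          # K = N/2, rounded up for odd N (13 for 5 x 5)
    sbi = np.zeros(N, dtype=np.uint8)
    # K distinct pixel indices stand in for the paper's pseudo-random draws.
    sbi[rng.choice(N, size=K, replace=False)] = 1
    return sbi.reshape(n, n)

rng = np.random.default_rng(0)                    # fixed seed: the SBI set must not change
sbis_5x5 = [make_sbi(5, rng) for _ in range(9)]   # the 9 SBIs used by SYBA5x5
```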

2.2. Descriptor calculation and complexity

The main function of the SYBA descriptor is to "describe" the FRI of an image feature point in a unique way so that feature points between two images can be matched. The SYBA descriptor does not require complex descriptor calculations and yet is able to provide good feature matching accuracy. The SYBA descriptor algorithm is illustrated in Fig. 3.

The first step of the SYBA algorithm is to detect feature points and generate a feature list. Any feature detector can be used for this purpose; this work uses SURF because it is several times faster than SIFT [13]. For each feature on the feature list, its feature region is cropped and saved as a 30 × 30 FRI. The second step of the algorithm is to calculate the average intensity g of the FRI as

g = (Σx,y I(x, y)) / p,  (2)

where p is the number of pixels in the image (900 in this case) and I(x, y) is the intensity value at pixel location (x, y). A binary FRI is then generated based on the average intensity g: if I(x, y) is brighter than g, the binary FRI at pixel location (x, y) is set to one; otherwise it is set to zero. The last step of the algorithm is to calculate the similarity between the binary FRI and each of the SBIs in order to generate a descriptor for each binary FRI on the feature list.
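The binarization step is a direct transcription of Eq. (2) and the thresholding rule; a sketch:

```python
import numpy as np

def binarize_fri(fri: np.ndarray) -> np.ndarray:
    """Binarize a feature region image at its average intensity (Eq. (2))."""
    g = fri.sum() / fri.size           # g = (sum over x,y of I(x, y)) / p
    return (fri > g).astype(np.uint8)  # 1 where brighter than g, else 0
```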

A unique SYBA similarity measure (SSM) is developed to measure the similarity between the FRI and a selected number of SBIs. The result of the SSM represents an accurate feature description because it takes into account the spatial and structural information of the feature region. The output of the SSM is then used to describe the feature point for feature matching, as shown in Fig. 3.

For the experiments, SYBA was implemented with two different sizes. One was computed with SBIs of size 5 × 5 and named SYBA5×5; the maximum number of SBIs required for SYBA5×5 is 9 when half of the pixels (N = 25 and K = 13) are black. Fig. 4(a) shows an example of nine 5 × 5 SBIs labeled from 1 to 9. The other size used for the experiments was 30 × 30, named SYBA30×30; the maximum number of SBIs required for SYBA30×30 is 312 when half of the pixels (N = 900 and K = 450) are black. Once the required SBIs

are generated, they should not be changed, so that the same patterns are used to test the next image.

Fig. 3. The flowchart of the SYBA descriptor algorithm.

Fig. 4. (a) Nine 5 × 5 synthetic basis images labeled 1–9, (b) a 30 × 30 feature region image (FRI) divided into 36 5 × 5 subregions, (c) the similarity measure between the highlighted 5 × 5 subregion and the first SBI, and (d) the similarity measure between the highlighted 5 × 5 subregion and the second SBI.

Fig. 4 shows an example of how the SSM is calculated between a 30 × 30 FRI and SYBA5×5; the SSM between a 30 × 30 binary FRI and SYBA30×30 is calculated in a similar manner. The first step of the SSM calculation is to divide the 30 × 30 binary FRI into 36 equal-sized 5 × 5 pixel subregions (as shown in Fig. 4(b)). The next step is to count how many pixels in each 5 × 5 subregion of the binary FRI are hit by each of the 9 SBIs in Fig. 4(a): each of the 36 subregions is compared with each of the nine 5 × 5 SBIs, and every location where both contain a black pixel is counted as a hit.

The maximum possible number of hits for comparing a 5 × 5 subregion with a 5 × 5 SBI is 13 because there are only 13 (K = 25/2) black pixels in each SBI. For example, the highlighted 5 × 5 subregion shown in Fig. 4(b) compared with SBI #1 has 5 hits (shown in Fig. 4(c)); the same subregion compared with SBI #2 has 4 hits (shown in Fig. 4(d)). After comparison with all 9 SBIs, each subregion yields 9 numbers ranging from 0 to 13, and these hit counts are stacked to form its part of the feature descriptor. Therefore, a 30 × 30 FRI with 36 5 × 5 subregions requires a feature descriptor size of 36 (5 × 5 subregions) × 9 (5 × 5 SBIs) × 4 bits (0–13) = 1,296 bits, or 162 bytes.
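A sketch of this counting procedure for SYBA5×5, assuming the binary FRI and the SBIs are 0/1 NumPy arrays as in the earlier sketches (the hit test treats a 1 in both arrays at the same location as a coincidence, simplifying the paper's black-pixel convention):

```python
import numpy as np

def syba5x5_descriptor(binary_fri: np.ndarray, sbis: list[np.ndarray]) -> np.ndarray:
    """36 subregions x 9 SBIs = 324 hit counts, each in 0..13 (4 bits)."""
    desc = []
    for r in range(0, 30, 5):
        for c in range(0, 30, 5):
            sub = binary_fri[r:r + 5, c:c + 5]   # one 5 x 5 subregion
            for sbi in sbis:
                # A hit: both the subregion and the SBI are marked at the
                # same pixel location.
                desc.append(int(np.logical_and(sub, sbi).sum()))
    return np.asarray(desc, dtype=np.uint8)
```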

For the SYBA30×30 implementation, the entire 30 × 30 FRI is compared with 312 30 × 30-pixel SBIs. The maximum number of hits between the FRI and each SBI is 450 because there are only 450 (K = 900/2) black pixels in the entire 30 × 30 SBI. The resulting feature descriptor size for SYBA30×30 is 1 (30 × 30 FRI region) × 312 (30 × 30 SBIs) × 9 bits (0–450) = 2,808 bits, or 351 bytes.

The SYBA descriptor size can be easily adjusted by changing the sizes of the SBI and FRI. A generalized description of the SYBA descriptor size is as follows. Choose the FRI dimension F first and then choose the SBI dimension S to be an integer factor Q of F so that S × Q = F. Note that M is a function of K and N (Eq. (1)), K is a function of N (K = N/2), N is divisible by S, and Q = F/S. These relationships allow a complete parameterization of SYBA in terms of just F (the dimension of an FRI) and S (the dimension of an SBI). The SYBA descriptor size is Q × Q (number of subregions) × M (number of SBIs) × log2 K bits.
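The two configurations above can be checked against this formula; a small sketch, under our reading that N = S × S and that each hit count takes ⌈log2 K⌉ whole bits:

```python
import math

def syba_descriptor_bits(F: int, S: int) -> int:
    """Descriptor size: Q*Q subregions x M SBIs x ceil(log2 K) bits."""
    N = S * S
    K = (N + 1) // 2                       # K = N/2, rounded up for odd N
    M = math.ceil(K * math.log(N / K))     # number of SBIs from Eq. (1)
    Q = F // S                             # subregions per side
    return Q * Q * M * math.ceil(math.log2(K))

print(syba_descriptor_bits(30, 5))    # 36 * 9 * 4  = 1296 bits (162 bytes)
print(syba_descriptor_bits(30, 30))   # 1 * 312 * 9 = 2808 bits (351 bytes)
```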

Although compressed sensing theory is able to uniquely represent a signal, the feature representation may not be unique due to the

way the descriptor is calculated, as shown in Fig. 4. We do lose some spatial uniqueness by only counting the number of "hits" rather than tracking where the hits are, as in the Battleship game. This trade-off, made to simplify the descriptor calculation, sacrifices (statistically) a little uniqueness.

2.3. Matching features

The SYBA descriptor is used to find the best matching feature points between two image frames. In this process, 324 (36 (5 × 5 subregions) × 9 (5 × 5 SBIs)) descriptor elements ranging from 0 to 13 are used as the feature descriptor for SYBA5×5, and 312 (1 (30 × 30 region) × 312 (30 × 30 SBIs)) descriptor elements ranging from 0 to 450 are used as the feature descriptor for SYBA30×30. To minimize computational complexity, we determine similarity using the L1 norm rather than other common comparison metrics such as the Euclidean or Mahalanobis distance, which require complex operations such as squares and square roots.

The L1 norm is computed as the sum of absolute differences:

d = Σi=1..n |xi − yi|,  (3)

where xi is the score for a region of the feature point in the first image, yi is the score for the corresponding region of the feature point in the second image, and n is the total number of regions used in the basis comparison (324 for SYBA5×5 and 312 for SYBA30×30). The similarity between two features is represented by d, and the smallest L1 norm (d) represents the best match of features between two images. Eq. (4) shows an example of this calculation between two SYBA descriptors, where each row represents a feature descriptor; the d value for the two example feature descriptors is 5.

(4)
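A one-line sketch of Eq. (3) over two SYBA descriptors stored as unsigned integer arrays (the cast avoids unsigned wrap-around):

```python
import numpy as np

def l1_distance(x: np.ndarray, y: np.ndarray) -> int:
    """Eq. (3): d = sum of |x_i - y_i| over all descriptor elements."""
    return int(np.abs(x.astype(np.int16) - y.astype(np.int16)).sum())
```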

To match the features, we first determine point-to-point correspondences using the similarity measure: we select each descriptor in the first image and compare it to all descriptors in the second image by calculating the d value as shown above. The remaining process is divided into two steps: (1) a two-pass search and (2) a global minimum requirement. First we use the two-pass search to find feature pairs that uniquely match each other; we then use the global minimum requirement to screen for possible good matching pairs among the remaining feature points.

(1) Two-pass search:

In this step, the first pass finds the minimum distance d between one feature in the first image and all features in the second image; the feature with the smallest distance in the second image is considered a match to the feature in the first image. The second pass confirms that the matched feature in the second image also has the shortest distance to its match in the first image. If the second pass fails to confirm that the shortest distance is reciprocal, the two features are not matched; they remain on the feature list to be tested in the second step. This two-pass search ensures a unique one-to-one match and eliminates any possible ambiguity. Because our aim is to find uniquely matching feature point pairs between two images, a feature that matches two or more features with the same shortest distance is not considered matched and remains on the feature list. After the completion of the two-pass search, the matched feature pairs are excluded from any further matching; the remaining feature points in both images are then tested in the second step.

Fig. 5 shows an example of the two-pass search. As shown in Fig. 5(a), there are 8 feature points in image-1 (vertical) and 7 feature points in image-2 (horizontal). The similarity between feature points in image-1 and feature points in image-2 is calculated using Eq. (3). The last (right) column shows the minimum d value of each row, obtained by comparing each feature point in image-1 with all of the feature points in image-2. For feature point-3 of image-1, there are two smallest distances of 3 in image-2, at feature point-2 and feature point-3; this is shown as (3, 3) in the last column. Likewise, for feature point-7 in image-1 there are two equal d values (2, 2), at feature point-1 and feature point-3 in image-2. The row minimum d values are highlighted by horizontal black lines in Fig. 5(b). The last (bottom) row shows the minimum d value of each column, found by comparing each feature point of image-2 with all feature points of image-1; the column minimum d values are highlighted by vertical black lines in Fig. 5(b). Feature points are considered uniquely matched if they have the mutually shortest distance. Four pairs of these mutual matches are highlighted with blue crossed lines in Fig. 5(b).

Because the aim in this step is to find uniquely matching feature point pairs between the two images, any matches with more than one smallest d value are not considered. Point 3 in image-1 and point 2 in image-2 are not considered a match because point 3 in image-1 and point 3 in image-2 also have a minimum distance of 3; only a unique smallest d value in the same row or column can form a matching pair. Three unique matching pairs, found by this two-pass search, are highlighted in blue in Fig. 5(c): feature points 1, 4, and 5 in image-1 match feature points 1, 5, and 3 in image-2, respectively. Since these feature points have been paired with their unique matches, they will not match any other points; the rows and columns of the matched points in both images are highlighted with 45-degree oblique black lines and removed from further searches. The remaining unmatched feature points (not highlighted in Fig. 5(c)) are sent to the second matching step.
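A compact sketch of the two-pass search over a precomputed distance matrix (rows index image-1 features, columns image-2 features; ties are rejected per the uniqueness rule above):

```python
import numpy as np

def two_pass_search(D: np.ndarray) -> list[tuple[int, int]]:
    """Keep (i, j) only if D[i, j] is the unique minimum of row i and the
    unique minimum of column j (a mutual, unambiguous match)."""
    matches = []
    for i in range(D.shape[0]):
        j = int(D[i].argmin())
        if (D[i] == D[i, j]).sum() > 1:        # row minimum not unique: skip
            continue
        if int(D[:, j].argmin()) == i and (D[:, j] == D[i, j]).sum() == 1:
            matches.append((i, j))             # second pass confirmed
    return matches
```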

(2) Global minimum requirement:

After the two-pass search is performed, the global minimum requirement is applied to the remaining feature points. In this step, we find the minimum d values among all remaining feature points. For one-to-one matches between the two images, the smallest unique d value is considered a match. This process repeats until all possible pairs are found; any feature points still remaining are left without a matching point. Fig. 6 illustrates the process of applying the global minimum requirement.

An example of applying the global minimum requirement to the feature points remaining after the two-pass search is shown in Fig. 6. Three global minima (d = 3) are found, as shown in Fig. 6(a). Feature point-3 in image-1 is uniquely matched to feature point-2 in image-2; the row and column of this feature point are highlighted with blue rectangles and are not considered in further searches. The remaining possible matches are shown in Fig. 6(b). The next lowest distance among the remaining points is 5. There are three possible matching pairs with a distance of 5, but only one is a unique match (point 7 in image-1 to point 7 in image-2). Again, the row and column of this matched point are highlighted with blue rectangles and removed from further search. The remaining possible matches are shown in Fig. 6(c). The next lowest distance among the remaining points is 6, which matches point 2 in image-1 to point 6 in image-2. After row 2 and column 6 are removed from the search, the only possible match left is row 8 and column 4, as shown in Fig. 6(d). The same process can be performed until no minimum can be found. As shown in Figs. 5 and 6, seven matches are found in total: 3 by the two-pass search and 4 by the global minimum requirement. Note that a global minimum threshold can be set to terminate the search at any stage. A smaller global minimum will return fewer but better matches, whereas a larger global minimum will

return more but lower-quality matches. The matching feature pairs of the example shown in Figs. 5 and 6 are listed in Table 1.

Fig. 5. (a) Possible feature matching pairs between feature points of image-1 and image-2. The last column shows the minimum distances from each feature point of image-1 to all feature points of image-2; the last row shows the minimum distances from each feature point of image-2 to all feature points of image-1. (b) The row minima are indicated by horizontal black lines and the column minima by vertical black lines; the mutual minima are highlighted with blue crossed lines. (c) The unique minimum of each row and column is highlighted with blue crossed lines, and its row and column are eliminated by black diagonal lines. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 1
Feature matching pairs between image-1 and image-2. Feature point-6 in image-1 remains unmatched.

Strategy                            Image-1 feature point   Image-2 feature point
Two-pass matching strategy          Point-1                 Point-1
                                    Point-4                 Point-5
                                    Point-5                 Point-3
Global minimum matching strategy    Point-2                 Point-6
                                    Point-3                 Point-2
                                    Point-7                 Point-7
                                    Point-8                 Point-4
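A sketch of the global minimum step, following our reading of the worked example in Fig. 6 (entries that tie with another minimum in the same row or column are ignored at that distance level):

```python
import numpy as np

def global_minimum_matching(D: np.ndarray) -> list[tuple[int, int]]:
    """Second matching step: repeatedly accept the smallest remaining d
    value when it is unique in its row and column among the current minima,
    then remove that row and column from further search."""
    D = D.astype(float).copy()   # rows/columns matched in pass one should
    matches = []                 # already be set to np.inf by the caller
    while np.isfinite(D).any():
        locs = np.argwhere(D == D.min())
        for i, j in locs:
            if not np.isfinite(D[i, j]):
                continue                       # wiped by an earlier accept
            row_tie = (locs[:, 0] == i).sum() > 1
            col_tie = (locs[:, 1] == j).sum() > 1
            if row_tie or col_tie:
                D[i, j] = np.inf               # ambiguous: ignore this entry
            else:
                matches.append((int(i), int(j)))
                D[i, :] = np.inf               # unique match: remove row
                D[:, j] = np.inf               # and column
    return matches
```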

3. Experiments

Four experiments were performed to validate SYBA's performance. First, the matching accuracy of two versions of the SYBA descriptor is compared with several common feature descriptors using our Idaho dataset [31]. The second experiment compares SYBA with two versions of BRIEF, with ALOHA (all binary descriptors), and with SURF to demonstrate the performance of the SYBA descriptor on the popular Oxford dataset. The third experiment was performed on a multi-view stereo correspondence dataset to show the descriptor's performance on patch data [22]. The last experiment was performed on the newly created BYU feature matching dataset to statistically analyze the descriptor's performance.

3.1. Experiment on the Idaho dataset

The dataset used for testing was the Idaho dataset [31], which contains a total of 597 images; Fig. 7 shows two examples. The Idaho dataset was created from real-world images taken by a downward-facing camera on an actual air flight, running at 30 frames per second with 640 × 480 pixel resolution. The test set features large blank areas of fields with few features, populated urban scenes, and natural features such as mountains and rivers. The images used for the dataset were taken from video frames one second apart, to allow noticeable camera movement.

To measure the performance of SYBA, we performed the same evaluation as that used for our previous TreeBASIS algorithm [31]: a homography was computed from the feature descriptors matched between two images, and feature descriptor performance is measured by the percentage of correct homography computations. Table 2 shows the memory usage and homography accuracy of SIFT, SURF, two versions of BRIEF, two versions of our BASIS, and two versions of the new SYBA. Only BRIEF-32 has results comparable to the proposed SYBA.

3.2. Experiment on the Oxford dataset

BRIEF is considered a well-known binary descriptor and has been shown to perform better than BRISK and many others in the literature; most publications on binary descriptors use BRIEF's performance as a benchmark. Its implementation is readily available for comparison, and it does not require off-line computation or training, so its performance does not depend on a training dataset, which allows us to perform a

more objective comparison. In addition, BRIEF and SYBA are both binary descriptors and both use randomly generated patterns. The BRIEF-32 descriptor [23] requires fewer comparisons than BRISK and has been shown to outperform several other existing fast descriptors such as SURF (except on the Graffiti sequence [23]) [13], U-SURF [13], and Compact Signature [32]. In this work we compared SYBA with BRIEF-32 and rBRIEF, with a newer binary descriptor called ALOHA [26], and with the popular SURF.

Fig. 6. The remaining feature points from the two-pass search (Fig. 5(c)) are input to the global minimum requirement search, which is used to find additional possible matches. (a) Three global minima are found (d = 3). A unique feature point is highlighted with a black and blue rectangle; the other two feature points are ignored because they do not have a unique match. (b) The next smallest global minimum value of 5 is found. There are three locations with the minimum value 5; feature point-7 in image-1 matches uniquely to feature point-7 in image-2, while the other two values do not have a unique match. (c) The next smallest global minimum value of 6 is located and one unique match is found. (d) The last unique match is found with the global minimum value of 5. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Sample images from the Idaho dataset.

Six commonly used image sequences [33] were tested for accuracy comparisons. These sequences were designed to test the robustness of feature descriptors to image perturbations such as blurring, lighting variation, viewpoint change, and image compression. The sequences include the following (example images are shown in Fig. 8):

• Image compression artifacts - UBC JPEG test sequence (Fig. 8(a)),
• Illumination change - Leuven Light test sequence (Fig. 8(b)),
• Image blurring - Bikes test sequence (Fig. 8(c)) and Trees test sequence (Fig. 8(d)),
• Viewpoint change - Wall test sequence (Fig. 8(e)) and Graffiti test sequence (Fig. 8(f)).

Each sequence consists of 6 images. For our experiments, the first image in the sequence was used as the reference image, and the subsequent 5 images were used as the test images for matching.

Fig. 8. Examples of images used for evaluation from the Oxford dataset: (a) UBC JPEG test sequence (image compression artifacts); (b) Leuven Light test sequence (illumination change); (c) Bikes and (d) Trees test sequences (image blurring); (e) Wall and (f) Graffiti test sequences (viewpoint change).

Table 2
Accuracy results and memory footprints for SYBA on the Idaho dataset. Memory usage assumes 1000 features per image are kept for each algorithm.

Algorithm    Average memory usage per image (kilobytes)   Homography accuracy
SIFT         1024.0                                        34.7%
SURF         512.0                                         73.5%
BASIS        288.0                                         75.5%
D-BRIEF      8.0                                           78.9%
TreeBASIS    2.1                                           79.6%
BASIS384     691.0                                         81.6%
BRIEF-32     32.0                                          83.5%
SYBA5×5      162.0                                         84.2%
SYBA30×30    351.0                                         85.1%

The image perturbations become more severe from one image to the next in each sequence; for example, matching feature points between the first and the third images is more challenging than matching feature points between the first and the second images. In this work, similar to the recognition rate in [23,26], the detection rate is defined as the ratio of the number of correct matches (Nc) to the total number of matches found (N).

Open source computer vision library (OpenCV) implementations of the BRIEF and rBRIEF (ORB descriptor [27]) descriptors were used to compare feature descriptor performance. In these implementations, the region size was fixed at 48 × 48 for BRIEF and 31 × 31 for rBRIEF; to calculate the mean intensity, a 9 × 9 region was used for BRIEF and a 5 × 5 region for rBRIEF. As explained previously, two versions of SYBA were compared against BRIEF, rBRIEF, ALOHA, and SURF. In both SYBA versions, the feature region image size was kept at 30 × 30, whereas two different synthetic basis image sizes, 5 × 5 and 30 × 30, were used for SYBA5×5 and SYBA30×30, respectively.

Both the BRIEF and rBRIEF descriptors use SURF to detect features, without any pyramidal analysis. The SURF feature detector was also used for SYBA in order to compare the descriptors on the same feature points. The number of detected features ranged from 500 to 1500 depending on the image sequence. Fig. 9 illustrates the performance of the two versions of SYBA and the other four methods. For this assessment the detection thresholds were set such that all outputs had a nearly equal number of correspondences. Both SYBA versions were more robust than BRIEF and rBRIEF for images corrupted by compression artifacts in the "UBC JPEG compression" dataset (Fig. 9(a)); SYBA30×30 outperformed BRIEF by more than 15% and ALOHA and SURF by more than 30% for image pair 1|6. For the "Leuven light" image dataset, which is corrupted by illumination

Fig. 9. Comparison of detection rates for the different feature descriptors on the Oxford sequences: (a) UBC JPEG, (b) Leuven, (c) Bikes, (d) Trees, (e) Wall, (f) Graffiti. In the graphs, blue: SYBA5×5, red: SYBA30×30, green: BRIEF-32, purple: rBRIEF, black: ALOHA, and dark blue: SURF. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 3
Percentage of incorrect matches when 95% of the true matches are found. These descriptors do not depend on training data.

Test         SYBA5×5   SYBA30×30   BRIEF     rBRIEF    SIFT
Liberty      29.24%    28.98%      34.15%    35.02%    26.27%
Notre Dame   24.52%    24.05%      29.57%    30.17%    24.09%
Yosemite     26.06%    25.12%      31.96%    32.46%    23.14%

change, the detection rate of SYBA30×30 is more than 3% higher than BRIEF and ALOHA and 25% higher than SURF for image pair 1|6 (Fig. 9(b)).

For the "Bikes" and "Trees" sequences, which are corrupted by image blurring, SYBA outperformed all four other algorithms. The accuracy difference was even more obvious for the strongest blurring conditions: for image pair 1|6, the differences were 7% for the "Bikes" sequence and 10% for the "Trees" sequence (Fig. 9(c)–(d)). For the "Wall" and "Graffiti" sequences, which are corrupted by viewpoint change, BRIEF performed slightly better than SYBA only for image pair 1|5 in the "Wall" sequence and for image pair 1|6 in the "Graffiti" sequence (Fig. 9(e)–(f)). SURF performed slightly better than the others in the "Graffiti" sequence except for image pair 1|5. As mentioned previously, the SURF feature detector was used for feature detection in this study for fair comparison because both BRIEF versions use it as well; this might have given SURF a slight advantage in matching features. SYBA performed better than the other algorithms for all other image pairs. It is noted that rBRIEF exhibits lower performance in all cases because rBRIEF has been optimized for use with orientation information delivered by the detector, which was not available in these experiments.

In order to better highlight the advantages of the SYBA descriptor over BRIEF and rBRIEF, a recall vs. precision curve was used to further evaluate the performance. We did not include ALOHA and SURF in this study because of their poor performance on the majority of the image sequences. Fig. 10 shows the recall vs. precision curve using threshold-based similarity matching (sliding the Hamming distance from minimum to maximum) on this dataset. Again, for this assessment the detection thresholds were set such that all outputs had a nearly equal number of correspondences, in the spirit of fairness. SYBA outperformed both BRIEF algorithms at high recall values: at 90% recall, SYBA precision exceeds 92%, while BRIEF fell to 75% and rBRIEF to 72%. SYBA demonstrated the best discrimination capability in this experimental setup. In order to better establish the merit of the SYBA descriptor statistically, a T-test is performed on the newly created BYU feature matching dataset; the experimental results are discussed in Section 3.4.
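For readers reproducing a curve like the one in Fig. 10, a generic sketch (ours) of threshold-based recall/precision computation; `dists` and `is_correct` are assumed arrays holding one descriptor distance and one ground-truth flag per candidate pair:

```python
import numpy as np

def recall_precision(dists: np.ndarray, is_correct: np.ndarray):
    """Sweep a distance threshold from minimum to maximum: pairs with
    distance <= threshold are accepted as matches."""
    order = np.argsort(dists)
    correct = is_correct[order].astype(float)
    tp = np.cumsum(correct)                   # true matches accepted so far
    accepted = np.arange(1, len(dists) + 1)   # all pairs accepted so far
    return tp / correct.sum(), tp / accepted  # recall, precision
```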

Fig. 10. Recall vs. precision curve using threshold-based similarity matching.

Different computing platforms have varying computational power, which makes it difficult to compare processing speed objectively, so we used the number of operations instead. SYBA5×5 requires 324 (9 × 36) comparisons between the SBIs and the feature region image and 324 summation operations to calculate the descriptor. SYBA30×30 requires 312 comparisons and 312 summations, but on a 30 × 30 sub-image. Both versions of BRIEF require a total of 1536 operations [26].

3.3. Experiment on the multi-view stereo correspondence dataset

In this experiment, we evaluated the performance of the SYBA descriptor using another publicly available dataset [22]. This dataset consists of three sets of patches sampled from the Statue of Liberty (New York), Notre Dame (Paris), and Half Dome (Yosemite). Each set contains over 400,000 scale- and rotation-normalized 64 × 64 patches, sampled around interest points detected using multi-scale Harris corners. Sample patches from the Liberty, Notre Dame, and Half Dome sets are shown in Fig. 11. This dataset also contains training data for descriptors like BinBoost and D-BRIEF that require it; training sets typically contain from 10,000 to 500,000 patch pairs depending on the application. SYBA does not require any kind of training data.

For descriptor evaluation we compared SYBA with the two versions of the BRIEF descriptor, since they do not require any training data either. In these implementations, patches are resized to 48 × 48 for BRIEF and 31 × 31 for rBRIEF, matching their fixed region sizes (ORB descriptor [27]). In both SYBA versions, the feature region image size was kept at 30 × 30, and two synthetic basis image sizes (5 × 5 and 30 × 30) were used for SYBA5×5 and SYBA30×30. In our experiments, we resized the patches to 30 × 30 and followed the same procedure to calculate the SYBA descriptor as explained in Section 2.

We performed the experiments following the online instructions [34]. Instead of matching one patch against the rest of the ~400,000 patches, we randomly selected 1000 patches for each matching process to reduce the computation time. One of these 1000 patches is from the match information provided online [34]; the remaining 999 patches were selected randomly. For each patch, the tested feature descriptor reported the best match and the non-match. As a result of this process, we created 50,000 pairs of matching patches and 50,000 pairs of non-matching patches for each set and submitted them to the website to evaluate the performance.

For the comparison of descriptors, we report the results in terms of the 95% error rate, as in [22]; the 95% error rate is the percentage of incorrect matches obtained when 95% of the true matches are found. For reference, we also provide results obtained with SIFT, using the publicly available Matlab implementation of

Vedaldi [35]. Table 3 clearly shows that SYBA5×5 provided up to a 5% improvement over BRIEF and up to a 5.5% improvement over rBRIEF at the 95% error rate. It also shows that SYBA provided accuracy comparable to the much larger and more computationally expensive SIFT descriptor.

Fig. 11. Some image patches from the Liberty set (a), Notre Dame set (b), and Half Dome set (c).

Fig. 12. Examples of images from the BYU feature matching dataset. Four image transformations are evaluated: JPEG compression, illumination change, image blur, and viewpoint change.

3.4. Statistical T-test

The Oxford dataset does not contain more than two sequences of images for blurring and viewpoint change, and it has only one sequence each for compression artifacts and illumination variation. It is not

sufficient for a thorough evaluation of descriptor performance, and the multi-view stereo correspondence dataset in Section 3.3 is not prepared for evaluating descriptor performance statistically. A new dataset, the BYU feature matching dataset [36], has therefore been created. It consists of 20 sets of images; each set includes four image sequences, and each sequence has six images that have gone through image transformations including blurring, compression, illumination variation, and viewpoint change. The first of the six images in each sequence is the original image, and the subsequent images have increasing levels of image transformation. Examples of original images from the BYU feature matching dataset are shown in Fig. 12.

Fig. 13. Comparison of detection rates for the different feature descriptors on the BYU feature matching dataset: (a) compression artifacts, (b) illumination change, (c) blurring, and (d) viewpoint change test sequences. In the graphs, blue: SYBA5×5, red: SYBA30×30, green: BRIEF-32, and purple: rBRIEF. ∗ indicates p-value < 0.05. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The aim is to measure descriptor performance statistically with this new dataset. A t-test is a statistical hypothesis test that compares the means of two samples for a statistically significant difference. The same test procedure discussed previously was followed for the BYU feature matching dataset: the average detection rate for each image pair (i.e., image pair 1|2, pair 1|3, and so on) was calculated, and then the differences were compared. The results of this test help in understanding descriptor performance on different sets of image pairs for each image perturbation.
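A sketch of the per-pair significance test. The detection-rate numbers below are placeholders, not values from the paper, and we assume an independent two-sample t-test, matching the p < 0.05 asterisks in Fig. 13:

```python
from scipy import stats

# Hypothetical detection rates for one image pair (e.g. pair 1|6), one value
# per image set in the BYU dataset, for two descriptors being compared.
rates_syba30 = [0.81, 0.78, 0.84, 0.80, 0.79]
rates_brief32 = [0.66, 0.70, 0.64, 0.69, 0.67]

t_stat, p_value = stats.ttest_ind(rates_syba30, rates_brief32)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # significant if p < 0.05
```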

Similar to Fig. 9, Fig. 13 illustrates the performance of the two versions of SYBA and the two versions of BRIEF. In this figure, pairs that are statistically significant (standardized p-value < 0.05) are denoted with an asterisk. For this assessment the detection thresholds were set such that all outputs had a nearly equal number of correspondences. Both versions of SYBA were more robust than BRIEF-32 and rBRIEF for images corrupted by compression artifacts in the new dataset (Fig. 13(a)); SYBA30×30 outperformed BRIEF-32 by more than 18% for image pair 1|6. For images corrupted by

S

ion change, the detection rate of SYBA30 × 30 is more than 9% higher

or image pair 1|6 (Fig. 13(b)).

For the image dataset that is corrupted by image blurring, SYBA

utperformed both versions of BRIEF algorithm. The accuracy dif-

erence is even more obvious for the strongest blurring conditions.

or image pair 1|6, the difference was 9% in this blurring sequence

Fig. 13(c)). For the image dataset that is corrupted by viewpoint

hange, BRIEF performed comparably with SYBA30×30 only for im-

ge pair 1|6 (Fig. 13(d)). SYBA outperformed both versions of BRIEF

lgorithms for all other image pairs. It is noted that rBRIEF exhibits

ower performances in all cases because rBRIEF has been optimized

or being used with orientation information delivered by the detector

which is not available in these experiments).

SYBA performed better for sequences with compression artifacts, illumination change, image blurring, and small viewpoint change. Its slightly lower accuracy (though still better than the others) for very large viewpoint change is of little practical consequence, because the viewpoint change is usually small in many embedded vision applications such as unmanned air vehicle pose estimation or unmanned ground vehicle autonomous navigation. SYBA30 × 30 performed better than SYBA5 × 5 but required a larger descriptor size; the size of the SBI can be easily adjusted for different application requirements. SYBA has proven to be a good candidate for embedded vision applications due to its computational simplicity and superior performance.

4. Conclusion

In this paper we have presented a new feature descriptor called SYBA. This unique approach was inspired by a new compressed sensing theory. SYBA has been compared favorably to BRIEF, which is currently arguably the best binary descriptor in the literature, and to other common descriptors such as SIFT, SURF, ALOHA, BASIS, and Tree BASIS. SYBA requires a slightly larger descriptor than BRIEF, but it provides better description and matching results. We have successfully applied SYBA to four different vision applications and obtained very accurate results: soccer game event annotation in broadcast video [37], tracking of multiple moving targets from an unmanned aerial vehicle [38], drift reduction for visual odometry [39], and motion analysis for advanced driving assistance systems. SYBA is an excellent candidate for hardware implementation due to its ability to create a feature descriptor without using complex computations that require floating-point operations. Future work will focus on applying SYBA to various computer vision applications that require accurate feature matching. Hardware implementation for the embedded vision sensor will also be explored.

References

[1] B. Tippetts, K. Lillywhite, S. Fowers, A. Dennis, D.-J. Lee, J. Archibald, A simple, inexpensive, and effective implementation of a vision guided autonomous robot, in: Proceedings of the SPIE Optics East, Intelligent Robots and Computer Vision XXIV: Algorithms, Techniques, and Active Vision, 6382, 2006, 63820P.
[2] B. Tippetts, S. Fowers, K. Lillywhite, D.-J. Lee, J. Archibald, FPGA implementation of a feature detection and tracking algorithm for real-time applications, in: Proceedings of the 3rd International Conference on Advances in Visual Computing - Volume Part I, ser. ISVC'07, Berlin, Heidelberg, Springer-Verlag, 2007, pp. 682–691.
[3] Z. Jia, A. Balasuriya, S. Challa, Vision based data fusion for autonomous vehicles target tracking using interacting multiple dynamic models, Comput. Vis. Image Underst. 109 (1) (Jan 2008) 1–21.
[4] H.C. Garcia, J.R. Villalobos, G.C. Runger, An automated feature selection method for visual inspection systems, IEEE Trans. Autom. Sci. Eng. 3 (4) (2006) 394–406.
[5] Y. Chi, M.K.H. Leung, A general shape context framework for object identification, Comput. Vis. Image Underst. 112 (3) (Dec 2008) 324–336.
[6] K. Lillywhite, D.-J. Lee, B. Tippetts, S. Fowers, A. Dennis, B. Nelson, J. Archibald, An embedded vision system for an unmanned four-rotor helicopter, in: Proceedings of the SPIE Optics East, Intelligent Robots and Computer Vision XXIV: Algorithms, Techniques, and Active Vision, 6384, 2006, 63840G.
[7] B.J. Tippetts, D.J. Lee, S.G. Fowers, J.K. Archibald, Real-time vision sensor for an autonomous hovering Micro-UAV, J. Aerosp. Comput. Inf. Commun. 6 (10) (2009) 570–584.
[8] V. Bonato, E. Marques, G.A. Constantinides, A parallel hardware architecture for scale and rotation invariant feature detection, IEEE Trans. Circuits Syst. Video Technol. 18 (12) (Dec 2008) 1703–1712.
[9] W.S. Fife, J.K. Archibald, Reconfigurable on-board vision processing for small autonomous vehicles, EURASIP J. Embedded Syst. 2007 (1) (Jan 2007) 33–46.
[10] Z. Wei, D. Lee, B.E. Nelson, A hardware-friendly adaptive tensor based optical flow algorithm, Lect. Notes Comput. Sci. 4842 (2007) 43.
[11] R. Fransens, C. Strecha, L. Van Gool, Optical flow based super-resolution: a probabilistic approach, Comput. Vis. Image Underst. 106 (1) (Apr 2007) 106–115.
[12] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (Nov 2004) 91–110.
[13] H. Bay, T. Tuytelaars, L. Van Gool, SURF: speeded up robust features, in: Computer Vision – ECCV 2006, 2006, pp. 404–417.
[14] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (SURF), Comput. Vis. Image Underst. 110 (3) (Jun 2008) 346–359.
[15] Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 2, 2004, pp. II-506–II-513.
[16] G. Hua, M. Brown, S. Winder, Discriminant embedding for local image descriptors, in: Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV 2007), 2007.
[17] K. Simonyan, A. Vedaldi, A. Zisserman, Learning local feature descriptors using convex optimisation, IEEE Trans. Pattern Anal. Mach. Intell. 36 (8) (2014) 1573–1585.
[18] T. Trzcinski, V. Lepetit, Efficient discriminative projections for compact binary descriptors, in: Proceedings of Computer Vision – ECCV 2012, Berlin Heidelberg, Springer, 2012, pp. 228–242.
[19] J. Masci, D. Migliore, M. Bronstein, J. Schmidhuber, Descriptor learning for omnidirectional image matching, in: Registration and Recognition in Images and Videos, Berlin Heidelberg, Springer, 2014, pp. 49–62.
[20] J. Masci, M. Bronstein, A. Bronstein, J. Schmidhuber, Multimodal similarity-preserving hashing, IEEE Trans. Pattern Anal. Mach. Intell. 36 (4) (2014) 824–830.
[21] C. Strecha, A. Bronstein, M. Bronstein, P. Fua, LDAHash: improved matching with smaller descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 34 (1) (2012) 66–78.
[22] M. Brown, G. Hua, S. Winder, Discriminative learning of local image descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 33 (1) (2011) 43–57.
[23] M. Calonder, V. Lepetit, C. Strecha, P. Fua, BRIEF: binary robust independent elementary features, in: Proceedings of the 11th European Conference on Computer Vision: Part IV, ser. ECCV'10, Berlin, Heidelberg, Springer-Verlag, 2010, pp. 778–792.
[24] M. Calonder, V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha, P. Fua, BRIEF: computing a local binary descriptor very fast, IEEE Trans. Pattern Anal. Mach. Intell. 34 (7) (July 2012) 1281–1298.
[25] S. Leutenegger, M. Chli, R. Siegwart, BRISK: binary robust invariant scalable keypoints, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV 2011), Nov. 2011, pp. 2548–2555.
[26] S. Saha, V. Demoulin, ALOHA: an efficient binary descriptor based on Haar features, in: Proceedings of the 19th IEEE International Conference on Image Processing (ICIP 2012), 2012, pp. 2345–2348.
[27] E. Rublee, V. Rabaud, K. Konolige, G. Bradski, ORB: an efficient alternative to SIFT or SURF, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV 2011), 2011, pp. 2564–2571.
[28] G. Carneiro, A. Jepson, Multi-scale phase-based local features, in: Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2003, pp. I-736–I-743.
[29] H. Anderson, Both lazy and efficient: compressed sensing and applications, Tech. Rep. 2013-7521P, Sandia National Laboratories, Albuquerque, NM, 2013.
[30] S.G. Fowers, D. Lee, D.A. Ventura, J.K. Archibald, The nature-inspired BASIS feature descriptor for UAV imagery and its hardware implementation, IEEE Trans. Circuits Syst. Video Technol. PP (99) (2012) 1.
[31] S.G. Fowers, A. Desai, D.J. Lee, D. Ventura, D.K. Wilde, Efficient tree-based feature descriptor and matching algorithm, AIAA J. Aerosp. Inf. Syst. 11 (9) (September 2014) 596–606.
[32] M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, P. Mihelich, Compact signatures for high-speed interest point description and matching, in: Proceedings of the IEEE 12th International Conference on Computer Vision, Sept. 2009, pp. 357–364.
[33] http://www.robots.ox.ac.uk/~vgg/research/affine/ (Accessed: 9 February 2014).
[34] http://www.cs.ubc.ca/~mbrown/patchdata/patchdata.html (Accessed: 15 April 2015).
[35] A. Vedaldi, B. Fulkerson, VLFeat: an open and portable library of computer vision algorithms, 2008. [Online]. Available: http://www.vlfeat.org/
[36] http://roboticvision.groups.et.byu.net/Robotic_Vision/Feature/BYUFeatureMatching.html (Accessed: 11 January 2015).
[37] A. Desai, D.J. Lee, C.N. Wilson, Determine absolute soccer ball location in broadcast video using SYBA descriptor, in: International Symposium on Visual Computing (ISVC), Part II, LNCS 8888, Las Vegas, NV, USA, December 8–10, 2014, pp. 588–597.
[38] A. Desai, D.J. Lee, M. Zhang, Using accurate feature matching for unmanned aerial vehicle ground object tracking, in: International Symposium on Visual Computing (ISVC), Part I, LNCS 8887, Las Vegas, NV, USA, December 8–10, 2014, pp. 435–444.
[39] A. Desai, D.J. Lee, Visual odometry drift reduction using SYBA descriptor and feature transformation, IEEE Trans. Intell. Transp. Syst. (Revised).