Computer Vision and Image Understanding 142 (2016) 37–49
Contents lists available at ScienceDirect
Computer Vision and Image Understanding
journal homepage: www.elsevier.com/locate/cviu
An efficient feature descriptor based on synthetic basis functions and
uniqueness matching strategy✩
Alok Desai a, Dah-Jye Lee a,∗, Dan Ventura b
a Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah 84602, USA
b Computer Science Department, Brigham Young University, Provo, Utah 84602, USA
Article info
Article history:
Received 19 January 2015
Accepted 20 September 2015
Available online 30 September 2015
Keywords:
Feature detection
Feature descriptor
Synthetic basis functions
Feature matching
Abstract
Feature matching is an important step for many computer vision applications. This paper introduces the
development of a new feature descriptor, called SYnthetic BAsis (SYBA), for feature point description and
matching. SYBA is built on the basis of the compressed sensing theory that uses synthetic basis functions
to encode or reconstruct a signal. It is a compact and efficient binary descriptor that performs a number of
similarity tests between a feature image region and a selected number of synthetic basis images and uses their
similarity test results as the feature descriptors. SYBA is compared with four well-known binary descriptors
using three benchmarking datasets as well as a newly created dataset that was designed specifically for a
more thorough statistical T-test. SYBA is less computationally complex and produces better feature matching
results than other binary descriptors. It is hardware-friendly and suitable for embedded vision applications.
© 2015 Elsevier Inc. All rights reserved.
1. Introduction

Computer vision applications often involve computationally-intensive tasks such as target tracking [1–3], object identification [4,5], image rectification, localization, pose estimation [1,6–9], optical flow [10,11], and many more. The initial steps of all these applications are the detection, description, and matching of high-quality feature points, with feature description and matching being the most challenging and time-consuming processes. They focus on computing abstractions of image information that are associated with the points of interest detected by the feature detector.

There exist a large number of feature descriptors [12–28], but not every one of them is suitable for hardware implementation for real-time applications. An efficient feature descriptor for real-time applications should not be too computationally complex. To be hardware-friendly, it should not use many square root, division, or exponential operations that require floating-point computations. A good feature descriptor is able to describe a feature point and measure its similarity to other feature points. It allows a feature point to be correctly identified and matched to a feature point in another image that has similar characteristics. A well-known orientation- and magnitude-of-intensity-gradient-based feature descriptor is the Scale Invariant Feature Transform (SIFT) [12]. It works well on intensity images and provides descriptors that are invariant to rotation and scaling. However, its increased complexity and robustness come with increased computation and storage requirements, which make it unsuitable for many resource-limited platforms and real-time applications. Another well-known descriptor, Speeded-Up Robust Features (SURF), computes descriptors using integral images and the 2-D Haar wavelet transform [13,14]. A minor drawback is that it requires 256 bytes to encode 64 floating-point values.

✩ This paper has been recommended for acceptance by Michael M. Bronstein.
∗ Corresponding author. Fax: 801 422 0201.
E-mail addresses: [email protected] (A. Desai), [email protected] (D.-J. Lee), [email protected] (D. Ventura).
http://dx.doi.org/10.1016/j.cviu.2015.09.005
1077-3142/© 2015 Elsevier Inc. All rights reserved.

Ke and Sukthankar [15] developed a descriptor by applying a dimensionality reduction technique, Principal Component Analysis (PCA), to the normalized image gradient patch. PCA-SIFT performs better than the SIFT descriptor on artificially generated data. At the same time, it has the benefit of reducing high-frequency noise in the descriptors. The drawback is that it is not tuned to obtain a sub-space that is discriminative for matching [16]. Another dimensionality reduction technique, Linear Discriminant Embedding (LDE), was developed by Hua et al. [16]. To perform well, LDE requires labeled training data, which are difficult to obtain. Both are suitable for feature description and require smaller descriptor sizes than many well-known approaches.

Simonyan et al. [17] and Trzcinski et al. [18] developed descriptors based on training data. Both descriptors require complex operations and computational resources that make them unsuitable for hardware implementation. The accuracy of these descriptors may be affected when they are applied to applications that are completely different from the training dataset. Simonyan et al. proposed to use floating point and then convert to binary, which clearly causes a loss of matching accuracy [17]. Similarly, other descriptors [19–22]
require a time-consuming training process and complex operations. Even though some descriptors [17–21] use a smaller descriptor size than SYBA, they are not suitable for low-resource platforms. In recent years, new feature descriptors such as Binary Robust Independent Elementary Features (BRIEF) [23,24], Binary Robust Invariant Scalable Keypoints (BRISK) [25], and Aggregated LOcal HAar (ALOHA) [26] have been reported. These feature descriptors are all based on intensity comparisons. ALOHA is based on a set of Haar-like pixel patterns defined within an image patch; it performs intensity difference tests to encode the image patch into a binary string. ALOHA requires a larger patch size and slightly fewer operations than the BRIEF descriptor, yet even with the larger patch size its results are not robust to viewpoint and illumination changes. The BRIEF descriptor trades reliability and robustness for processing speed: it consists of a binary string that contains the results of simple image intensity comparisons at randomly pre-determined pixel locations. BRISK relies on configurable circular sampling patterns from which it computes brightness comparisons to form a binary descriptor. Overall, the BRISK descriptor requires significantly more computation and slightly more storage space than BRIEF. Both of these algorithms use faster feature detectors and smaller descriptor sizes than SIFT and SURF. As another alternative to SIFT and SURF, Rublee et al. introduced a new version of BRIEF called the rBRIEF descriptor [27], based on a specific set of 256 learned pixel pairs selected to reduce correlation among the binary tests.

Feature description has been an active area of research in computer vision and machine learning. The main objective of this work is to develop a simple and hardware-friendly feature descriptor that reduces the computational power requirement and increases the speed and accuracy of feature matching. Our new descriptor is inspired by recent work in compressed sensing [29]. Compressed sensing theory is used to encode and decode a signal efficiently, reducing bandwidth and storage requirements. It is able to uniquely describe a signal with synthetic basis functions, which makes it a perfect theory for feature description.

Fig. 1. (a) Example ship locations in the Battleship game. (b–d) Three 15 × 15 synthetic basis images with 113 ((15 × 15)/2) random black squares as the guessed locations. (e–g) Guess results using these three random patterns.

To understand the theory, consider the popular game of Battleship, in which the best result can be obtained by using an adaptive strategy of counting the number of hits in recursively subdivided half-planes. The major drawbacks of this adaptive strategy are that it requires memory space to record, and processing power to analyze, all previous guesses and guess results in order to determine the next guess. Anderson developed a new compressed sensing algorithm based on this adaptive strategy, using synthetic basis functions instead of subdivided half-planes to minimize the memory space requirement [29].

Using the Battleship game as an analogy, the basic idea of using synthetic basis functions for compressed sensing is to use a random pattern (as shown in Fig. 1(b–d)) as a guess. The biggest advantage of using synthetic basis functions is that they do not require memory for storing previous guesses. This reduced memory requirement makes synthetic basis functions an excellent choice for feature description on resource-limited systems.

As an example, in Fig. 1(a) the orange squares represent battleship locations in a 15 × 15 area, and the black squares in Fig. 1(b–d) represent the guessed locations of the battleships. The maximum number of different random patterns (or turns) required to locate all battleships using this non-adaptive approach is, surprisingly, the same as for the original adaptive strategy (but with the benefit of significantly reduced memory space), and is given by Anderson [29] as

M = C(K ln(N/K)), (1)

where N is the number of squares on the game board (n × n), K is the number of queried battleship locations, and C(·) rounds its argument up to the nearest integer. M is the number of random patterns (turns) required to locate all ships and is smallest when K = N/2. This very small number of random patterns is sufficient to locate all battleships.
Fig. 1(a) shows that there are seven battleships in a 15 × 15 area. As shown in Fig. 1(e), out of the seven battleships, six (squares in blue) are hit or guessed correctly because their locations coincide
Fig. 2. (a) A sample 30 × 30 synthetic basis image and (b) a 5 × 5 synthetic basis image (zoomed).
with six of the black squares in the random pattern (or turn) shown in Fig. 1(b). One ship is missed (square in orange) using the same pattern (turn) shown in Fig. 1(b) because its location coincides with a white square. Similarly, six and five battleships (squares in blue) are hit or guessed correctly using the random patterns (or turns) shown in Fig. 1(c) and (d); their results are shown in Fig. 1(f) and (g), respectively. According to Eq. (1), the number of random patterns or turns required to locate all battleships in a guessing game of this size (15 × 15), using unique (non-repetitive) basis patterns (K = N/2), is 113 ln(225/113), or 78.

Inspired by this compressed sensing theory, we have developed a new descriptor algorithm using synthetic basis functions, called SYnthetic BAsis (SYBA). It uses a number of randomly generated synthetic basis images (SBIs) as the guesses in a "battleship game" to measure the similarity between a small image region surrounding a detected feature point, called a feature region image (FRI), and the SBIs. The similarities between an FRI in the image and all SBIs are then used as the feature descriptor.

This work involves a unique way of measuring descriptor similarity in order to match similar features between two images. This similarity measure is less complex than the Mahalanobis and Euclidean methods. This work also includes a feature matching strategy with a two-pass search to enforce a uniqueness constraint and a global minimum requirement to determine the best matching feature pairs. The new descriptor, SYBA, is introduced in Section 2. Experimental results based on feature matching comparisons with two widely used binary descriptors, BRIEF-32 and rBRIEF, are presented in Section 3, which also includes the statistical T-test experiments on a newly created dataset. Section 4 summarizes the paper with a discussion of the performance and ideas for future work.

2. SYBA descriptor algorithm

Well-known binary descriptors are often used for benchmarking feature description performance. The BRIEF descriptor compares the intensities of two randomly selected pixels and uses the intensity difference as a descriptor [23]. Rather than intensity differences, SYBA compares a feature region image with a number of synthetic basis images and uses the similarity measures as the feature descriptor. The creation of the synthetic basis images and the computation of the SYBA descriptor are the two major parts of this algorithm.

2.1. Synthetic basis images

Synthetic basis images are sparse images. They differ from the basis dictionary images created from natural or man-made objects in our previous work [30,31], which are not always sparse. There are two major differences between basis functions created from random numbers (i.e., synthetic) and basis images created from natural/man-made objects. Basis images created from natural and man-made objects require several hours of computation, while synthetic basis images require at most a few seconds. The memory space required to create synthetic basis images is also far less than for basis images created from natural/man-made objects. These are the reasons synthetic basis images are much better suited to this task.

The number of synthetic basis images (M) represents the number of "turns" as in the Battleship game and is calculated according to Eq. (1). Of course, a larger number of synthetic basis images is required for a larger pixel region surrounding the detected feature points, or feature region image (FRI). The maximum number of synthetic basis images required is 9 (when K = N/2) for a 5 × 5 FRI, whereas a 30 × 30 FRI requires 312 synthetic basis images.

Two examples of synthetic basis images are shown in Fig. 2; one is 30 × 30 and the other is 5 × 5. Synthetic basis images similar to these two are used for SYBA descriptor calculation. The first step in creating a synthetic basis image is to determine its dimension (N = n × n). Once the dimension of the synthetic basis image is determined, K (= N/2) normally distributed pseudo-random numbers are generated from [1, …, N] to represent the black squares in the synthetic basis image. Note that even for small SBIs (e.g., 5 × 5) generated in this manner, all SBIs in one set (M) will be uniquely represented (with a probability of 0.99999893), and thus no specially designed patterns are needed.
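The generation step above can be sketched in a few lines of Python. This is an illustrative stand-in, not the authors' implementation: where the paper draws normally distributed pseudo-random numbers, this sketch simply samples K distinct square positions without replacement, which preserves the property that matters here (K = N/2 black squares per SBI):

```python
import math
import random

def make_sbi(n: int, rng: random.Random) -> list[list[int]]:
    """One n x n synthetic basis image with K = N/2 (rounded up) black squares."""
    N = n * n
    K = math.ceil(N / 2)                  # 13 black squares for a 5 x 5 SBI
    black = set(rng.sample(range(N), K))  # K distinct square indices from [1, ..., N]
    return [[1 if r * n + c in black else 0 for c in range(n)] for r in range(n)]

rng = random.Random(42)                   # fixed seed: the SBI set must not change
sbis = [make_sbi(5, rng) for _ in range(9)]  # the 9 SBIs needed for a 5 x 5 SBI set
```

Fixing the seed reflects the requirement, stated below, that once a set of SBIs is generated it must be reused unchanged for every image.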
2.2. Descriptor calculation and complexity

The main function of the SYBA descriptor is to "describe" the FRI of an image feature point in a unique way so that feature points between two images can be matched. The SYBA descriptor does not require complex descriptor calculations and yet is able to provide good feature matching accuracy. The SYBA descriptor algorithm is illustrated in Fig. 3.

The first step of the SYBA algorithm is to detect feature points and generate a feature list. Any feature detector can be used for this purpose; this work uses SURF as the feature detector because it is several times faster than SIFT [13]. For each feature on the feature list, its feature region is cropped and saved as a 30 × 30 FRI. The second step of the algorithm is to calculate the average intensity (g) of the FRI as

g = (Σx,y I(x, y)) / p, (2)

where p is the number of pixels in the image (900 in this case) and I(x, y) is the intensity value at pixel location (x, y). A binary FRI is then generated based on the average intensity g: if I(x, y) is brighter than g, the binary FRI at pixel location (x, y) is set to one; otherwise the value is set to zero. The last step of the algorithm is to calculate the similarity between the binary FRI and each of the SBIs in order to generate a descriptor for each binary FRI on the feature list.
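The averaging and thresholding of Eq. (2) can be sketched as follows (function name ours; a plain list-of-lists stands in for the cropped FRI):

```python
def binarize_fri(fri: list[list[int]]) -> list[list[int]]:
    """Threshold an FRI at its average intensity g (Eq. (2)): 1 if brighter than g."""
    p = len(fri) * len(fri[0])    # number of pixels (900 for a 30 x 30 FRI)
    g = sum(map(sum, fri)) / p    # average intensity
    return [[1 if v > g else 0 for v in row] for row in fri]
```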
A unique SYBA similarity measure (SSM) was developed to measure the similarity between the FRI and a selected number of SBIs. The result of the SSM represents an accurate feature description because it takes into account the spatial and structural information of the feature region. The output of the SSM is then used to describe the feature point for feature matching, as shown in Fig. 3.

For the experiments, SYBA was implemented with two different sizes. One was computed with SBIs of size 5 × 5 and named SYBA5×5. The maximum number of SBIs required for SYBA5×5 is 9 when half of the pixels (N = 25 and K = 13) are black. Fig. 4(a) shows an example of nine 5 × 5 SBIs labeled from 1 to 9. The other size used for the experiments was 30 × 30, named SYBA30×30. The maximum number of SBIs required for SYBA30×30 is 312 when half of the pixels (N = 900 and K = 450) are black. Once the required SBIs
Fig. 3. The flowchart of the SYBA descriptor algorithm.
Fig. 4. (a) Nine 5 × 5 synthetic basis images labeled 1–9, (b) A 30 × 30 feature region
image (FRI) that is divided into 36 5 × 5 subregions, (c) Similarity measure between
the highlighted 5 × 5 subregion and the first SBI, and (d) Similarity measure between
the highlighted 5 × 5 subregion and the second SBI.
are generated, they should not be changed, so that the same patterns are used to test the next image.

Fig. 4 shows an example of how the SSM is calculated between a 30 × 30 FRI and SYBA5×5. The SSM between a 30 × 30 binary FRI and SYBA30×30 is calculated in a similar manner. The first step of the SSM calculation is to divide the 30 × 30 binary FRI into 36 equal-sized 5 × 5 pixel subregions (as shown in Fig. 4(b)). The next step is to count how many pixels in each 5 × 5 subregion of the binary FRI are hit by each of the 9 SBIs in Fig. 4(a). Each of these 36 5 × 5 subregions is compared with each of the nine 5 × 5 SBIs, and the number of times both contain a black pixel at the same location is counted as a hit.

The maximum possible number of hits for comparing a 5 × 5 subregion with a 5 × 5 SBI is 13 because there are only 13 (K = 25/2) black pixels in each SBI. For example, the highlighted 5 × 5 subregion shown in Fig. 4(b) compared with SBI #1 has 5 hits (shown in Fig. 4(c)). The same subregion compared with SBI #2 has 4 hits (shown in Fig. 4(d)). After comparing with all 9 SBIs, each subregion will yield 9 numbers ranging from 0 to 13. The number of hits in each subregion is stacked into the feature descriptor, so each subregion uses these 9 numbers as its feature descriptor. Therefore, a 30 × 30 FRI with 36 5 × 5 subregions requires a feature descriptor size of 36 (5 × 5 subregions) × 9 (5 × 5 SBIs) × 4 bits (0–13) = 1,296 bits, or 162 bytes.
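The hit counting and stacking just described can be sketched as follows. Function names are ours, and for illustration a "hit" is counted where both the binary FRI and the SBI hold the value 1 (the paper counts coincident black pixels; swap the convention to match your binarization):

```python
def count_hits(subregion, sbi):
    """Hits = positions where the binary FRI subregion and the SBI coincide (both 1)."""
    return sum(a & b for ra, rb in zip(subregion, sbi) for a, b in zip(ra, rb))

def syba5x5_descriptor(binary_fri, sbis):
    """Stack the 9 hit counts of each 5 x 5 subregion of a 30 x 30 binary FRI."""
    desc = []
    for r0 in range(0, 30, 5):
        for c0 in range(0, 30, 5):
            sub = [row[c0:c0 + 5] for row in binary_fri[r0:r0 + 5]]
            desc.extend(count_hits(sub, sbi) for sbi in sbis)
    return desc  # 36 subregions x 9 SBIs = 324 counts, each in [0, 13]

# Toy check: an all-ones FRI scores K = 13 hits against any SBI with 13 set pixels.
binary_fri = [[1] * 30 for _ in range(30)]
sbi = [[1 if r * 5 + c < 13 else 0 for c in range(5)] for r in range(5)]
desc = syba5x5_descriptor(binary_fri, [sbi] * 9)
```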
For the SYBA30×30 implementation, the entire 30 × 30 FRI is compared with 312 30 × 30-pixel SBIs. The maximum number of hits between the FRI and each SBI is 450 because there are only 450 (K = 900/2) black pixels in each 30 × 30 SBI. The resulting feature descriptor size for SYBA30×30 is 1 (30 × 30 FRI region) × 312 (30 × 30 SBIs) × 9 bits (0–450) = 2,808 bits, or 351 bytes.

The SYBA descriptor size can be easily adjusted by changing the sizes of the SBI and FRI. A generalized description of the SYBA descriptor size is as follows. Choose the FRI dimension F first, and then choose the SBI dimension S to be an integer factor Q of F so that S × Q = F. Note that M is a function of K and N (Eq. (1)), K is a function of N (K = N/2), N is divisible by S, and Q = F/S. These relationships allow a complete parameterization of SYBA in terms of just F (the dimension of an FRI) and S (the dimension of an SBI). The SYBA descriptor size is Q × Q (# of subregions) × M (# of SBIs) × log2 K bits.
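This parameterization can be checked numerically. A sketch (function name ours; the bit width is taken as the number of bits needed to store a hit count in 0..K, matching the 4-bit and 9-bit figures above):

```python
import math

def syba_descriptor_bits(F: int, S: int) -> int:
    """Descriptor size in bits: (F/S)^2 subregions x M SBIs x bits per hit count."""
    assert F % S == 0, "SBI dimension S must be an integer factor of FRI dimension F"
    Q = F // S
    N = S * S
    K = N / 2
    M = math.ceil(K * math.log(N / K))                        # Eq. (1)
    bits_per_count = math.ceil(math.log2(math.ceil(K) + 1))   # counts 0..K
    return Q * Q * M * bits_per_count

print(syba_descriptor_bits(30, 5))   # 1296 bits (162 bytes) for SYBA5x5
print(syba_descriptor_bits(30, 30))  # 2808 bits (351 bytes) for SYBA30x30
```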
Although the compressed sensing theory is able to uniquely repre-
sent a signal, the feature representation may not be unique due to the
way the descriptor is calculated, as shown in Fig. 4. We do lose some spatial uniqueness by only counting the number of "hits" and not tracking where the "hits" are, as in the Battleship game. The trade-off we make to simplify the descriptor calculation sacrifices (statistically) a little uniqueness.
2.3. Matching features

The SYBA descriptor is used to find the best matching feature points between two image frames. In this process, 324 (36 (5 × 5 subregions) × 9 (5 × 5 SBIs)) descriptor elements ranging from 0 to 13 are used as the feature descriptor for SYBA5×5, and 312 (1 (30 × 30 region) × 312 (30 × 30 SBIs)) descriptor elements ranging from 0 to 450 are used for SYBA30×30. To minimize computational complexity, we determine similarity using the L1 norm rather than other common comparison metrics such as the Euclidean or Mahalanobis distance, which require complex operations such as squares and square roots.

The L1 norm is computed as the sum of absolute differences:

d = Σi=1..n |xi − yi|, (3)

where xi is the score for a region of the feature point in the first image, yi is the score for the corresponding region of the feature point in the second image, and n is the total number of regions used in the basis comparison (324 for SYBA5×5 and 312 for SYBA30×30). The similarity between two features is represented by d, and the smallest L1 norm (d) represents the best match of features between two images. Eq. (4) shows an example of the SYBA descriptor calculation, in which each row represents a feature descriptor; the d value between the two example feature descriptors in Eq. (4) is 5.

(4)
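Eq. (3) and the nearest-descriptor lookup it supports can be sketched as (function names ours):

```python
def l1_distance(x, y):
    """Eq. (3): d = sum over i of |x_i - y_i| across descriptor elements."""
    return sum(abs(a - b) for a, b in zip(x, y))

def best_match(desc, candidates):
    """Index of the candidate descriptor with the smallest d to `desc`."""
    return min(range(len(candidates)), key=lambda j: l1_distance(desc, candidates[j]))
```

Because the descriptor elements are small non-negative counts, this involves only subtractions, absolute values, and additions, which is what makes the measure hardware-friendly.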
To match the features, we first determine point-to-point correspondences using the similarity measure. We select each descriptor in the first image and compare it to all descriptors in the second image by calculating the d value as shown above. The remaining process is divided into two steps: (1) a two-pass search, and (2) a global minimum requirement. First, we use the two-pass search to find feature pairs that uniquely match each other. We then use the global minimum requirement to screen for possible good matching feature pairs among the remaining feature points.

(1) Two-pass search:

In this step, the first pass finds the minimum distance d between one feature in the first image and all features in the second image. The feature with the smallest distance in the second image is considered a match to the feature in the first image. The second pass confirms that the matched feature in the second image also has the shortest distance to its match in the first image. If the second pass fails to confirm that the shortest distance between the two is reciprocal, then they are not matched; they remain on the feature list to be tested in the second step. This two-pass search ensures a unique one-to-one match and eliminates any possible ambiguity. Because our aim is to find uniquely matching feature point pairs between two images, a feature that matches two or more features with the same shortest distance is not considered a match and remains on the feature list. After the completion of the two-pass search, the matched feature pairs are excluded from any further matching, and the remaining feature points in both images are tested in the second step.
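The two-pass search above amounts to keeping only mutual, unambiguous minima of the distance matrix. A minimal sketch (function name ours; `D[i][j]` is the d value between feature i of image-1 and feature j of image-2):

```python
def two_pass_matches(D):
    """Keep (i, j) pairs that are each other's unique minimum in distance matrix D."""
    matches = []
    for i, row in enumerate(D):
        m = min(row)
        if row.count(m) != 1:          # ambiguous row minimum: leave on feature list
            continue
        j = row.index(m)               # pass 1: best candidate in image-2
        col = [D[k][j] for k in range(len(D))]
        # pass 2: candidate must point back to i, uniquely
        if col.count(min(col)) == 1 and min(col) == D[i][j]:
            matches.append((i, j))
    return matches
```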
Fig. 5 shows an example of the two-pass search. As shown in Fig. 5(a), there are 8 feature points in image-1 (vertical) and 7 feature points in image-2 (horizontal). The similarity between feature points in image-1 and feature points in image-2 is calculated using Eq. (3). The last (right) column shows the minimum d value of each row, obtained by comparing each feature point in image-1 with all of the feature points in image-2. For feature point-3 of image-1, there are two smallest distances of 3 in image-2: feature point-2 and feature point-3. This is shown as (3, 3) in the last column. Also, for feature point-7 in image-1, there are two equal d values (2, 2), for feature point-1 and feature point-3 in image-2. The row minimum d values are highlighted by horizontal black lines in Fig. 5(b). The last (bottom) row shows the minimum d value of each column, found by comparing each feature point of image-2 with all feature points of image-1. The column minimum d values are highlighted by vertical black lines in Fig. 5(b). Feature points are considered uniquely matched if they have the mutually shortest distance. Four pairs of these mutual matches are highlighted with blue crossed lines in Fig. 5(b).

Because the aim in this step is to find uniquely matching feature point pairs between the two images, any match with more than one smallest d value is not considered a match. Point 3 in image-1 and point 2 in image-2 are not considered a match because point 3 in image-1 and point 3 in image-2 also have a minimum distance of 3. Only a unique smallest d value in the same row or column can be called a matching pair. Three unique matching pairs between feature points in image-1 and image-2, found by this two-pass search, are highlighted in blue in Fig. 5(c): feature points 1, 4, and 5 in image-1 match feature points 1, 5, and 3 in image-2, respectively. Since these feature points have been paired with their unique matches, they will not match any other points. The rows and columns of these matched points in both images are highlighted with 45-degree oblique black lines and removed from further searches. The remaining unmatched feature points (not highlighted in Fig. 5(c)) are sent to the second matching step.
(2) Global minimum requirement:

After the two-pass search is performed, the global minimum requirement is applied to the remaining feature points. In this step, we find the minimum d values for all remaining feature points. For one-to-one matches between two images, the smallest unique d value is considered a match. This process repeats until all possible pairs are found; any feature points that then remain are without a matching point. Fig. 6 illustrates the process of applying the global minimum requirement.

An example of applying the global minimum requirement to the feature points remaining after the two-pass search is shown in Fig. 6. Three global minima are found, as shown in Fig. 6(a). Feature point-3 in image-1 is uniquely matched to feature point-2 in image-2. The row and column of this feature point are highlighted with blue rectangles and will not be considered in further searches. The remaining possible matches are shown in Fig. 6(b). The next lowest distance among the remaining points is 5. There are three possible matching pairs with a distance of 5, but only one is a unique match (point 7 in image-1 to point 7 in image-2). Again, the row and column of this matched point are highlighted with blue rectangles and removed from further search. The remaining possible matches are shown in Fig. 6(c). The next lowest distance among the remaining points is 6, which matches point 2 in image-1 to point 6 in image-2. After row 2 and column 6 are removed from the search, the only possible match left is row 8 and column 4, as shown in Fig. 6(d). The same process is performed until no minimum can be found. As shown in Figs. 5 and 6, seven matches are found in total: 3 by the two-pass search and 4 by the global minimum requirement. Note that a global minimum threshold can be adjusted to terminate the search at any stage. A smaller global minimum will return fewer but better matches, whereas a larger global minimum will
Fig. 5. (a) Possible feature matching pairs between feature points of image-1 and image-2 are shown. The last column shows the minimum distances from each feature point
of image-1 to all feature points of image-2. The last row shows minimum distances from each feature point of image-2 to all feature points of image-1. (b) The row minimum is
indicated by horizontal black lines. The column minimum is indicated by vertical black lines. The mutual minima are highlighted in blue cross lines. (c) The unique minimum of
each row and column is highlighted in blue cross lines, and its row and column eliminated by black diagonal lines. (For interpretation of the references to color in this figure legend,
the reader is referred to the web version of this article).
Table 1
Feature matching pairs between image-1 and image-2. Feature point-6 in image-1 remains unmatched.

Strategy                            Image-1 feature point   Image-2 feature point
Two-pass matching strategy          Point-1                 Point-1
                                    Point-4                 Point-5
                                    Point-5                 Point-3
Global minimum matching strategy    Point-2                 Point-6
                                    Point-3                 Point-2
                                    Point-7                 Point-7
                                    Point-8                 Point-4
return more but lower-quality matches. The matching feature pairs of the example shown in Figs. 5 and 6 are listed in Table 1.
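The global minimum pass can be sketched as a greedy loop over the rows and columns left after the two-pass search. This is our reading of the rule illustrated in Fig. 6 — at each round, match the remaining cell holding the smallest d value that is unique within both its row and its column (function name ours):

```python
def global_min_matches(D, row_ids, col_ids):
    """Greedy global-minimum matching over the remaining rows/columns of D."""
    rows, cols = list(row_ids), list(col_ids)
    matches = []
    while rows and cols:
        best = None
        for i in rows:
            for j in cols:
                row_vals = [D[i][c] for c in cols]
                col_vals = [D[r][j] for r in rows]
                unique = row_vals.count(D[i][j]) == 1 and col_vals.count(D[i][j]) == 1
                if unique and (best is None or D[i][j] < D[best[0]][best[1]]):
                    best = (i, j)
        if best is None:
            break                      # only ambiguous values remain
        matches.append(best)
        rows.remove(best[0])
        cols.remove(best[1])
    return matches
```

A distance cap can be added to the loop to realize the early-termination threshold described above (fewer but better matches for a small cap).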
3. Experiments

Four experiments were performed to validate SYBA's performance. First, the matching accuracy of two versions of the SYBA descriptor is compared with several common feature descriptors using our Idaho dataset [31]. The second experiment compares SYBA to two versions of BRIEF, SURF, and ALOHA descriptors (all binary descriptors) to demonstrate the performance of the SYBA descriptor on the popular Oxford dataset. The third experiment was performed on a multi-view stereo correspondence dataset to show the descriptor's performance on a patch dataset [22]. The last experiment was performed on a newly created BYU feature matching dataset to statistically analyze the descriptor's performance.

3.1. Experiment on the Idaho dataset

The dataset used for testing was the Idaho dataset [31], which contains a total of 597 images. Fig. 7 shows two example images from the Idaho dataset. Idaho was created from real-world images taken from a downward-facing camera on an actual air flight. The images in the Idaho dataset were taken from a camera running at 30 frames per second, with 640 × 480 pixel resolution. The Idaho test set features large blank areas of fields with few features, populated urban scenes, and natural features such as mountains and rivers. The images used for the dataset were obtained from video frames that were one second apart to allow noticeable camera movement.

To measure the performance of SYBA, we performed the same evaluation as that used for our previous Tree-BASIS algorithm [31]. That is, a homography was computed from feature descriptors matched between two images. Feature descriptor performance is measured by the percentage of correct homography computations. Table 2 shows the memory usage and homography accuracy of SIFT, SURF, two versions of BRIEF, two versions of our BASIS, and two versions of the new SYBA. Only BRIEF-32 has results comparable to the proposed SYBA.

3.2. Experiment on the Oxford dataset

BRIEF is considered a well-known binary descriptor and has been proven to perform better than BRISK and many others in the literature. Most publications on binary descriptors use BRIEF's performance as a benchmark, and its implementation is readily available for comparison. It does not require off-line computation and training, so its performance does not depend on a training dataset, which allows us to perform a
Fig. 6. The remaining feature points from the two-pass search (Fig. 5(c)) are input to the global minimum requirement search. The global minimum search is used to find additional
possible matches. (a) Three global minima are found (d = 3). A unique feature point is highlighted with a black and blue rectangle. The other two feature points are ignored because
they do not have a unique match. (b) The next smallest global minimum value of 5 is found. There are three different locations with the minimum value 5. Feature point-7 in
image-1 matches uniquely to feature point-7 in image-2, while the other two values do not have a unique match. (c) The next smallest global minimum value of 6 is located and
one unique match is found. (d) The last unique match is found with the global minimum value of 5. (For interpretation of the references to color in this figure legend, the reader is
referred to the web version of this article).
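The global-minimum uniqueness search described in Fig. 6 can be sketched as follows. This is a simplified stand-in under stated assumptions: the function name, the matrix representation of descriptor distances, and the handling of ambiguous ties are ours, not taken from the paper.

```python
import numpy as np

def unique_global_minimum_matches(dist):
    """Repeatedly locate the smallest remaining descriptor distance; accept a
    (feature-1, feature-2) pair only if it is the unique holder of that value
    in both its row and its column, then retire both feature points."""
    dist = np.asarray(dist, dtype=float)
    active = np.ones(dist.shape, dtype=bool)    # pairs still under consideration
    matches = []
    while active.any():
        m = dist[active].min()                  # current global minimum
        rows, cols = np.where(active & (dist == m))
        progressed = False
        for r, c in zip(rows, cols):
            # unique match: no other pair at this value shares its row or column
            if np.count_nonzero(rows == r) == 1 and np.count_nonzero(cols == c) == 1:
                matches.append((int(r), int(c)))
                active[r, :] = False            # retire both feature points
                active[:, c] = False
                progressed = True
        if not progressed:
            active[rows, cols] = False          # ambiguous ties: skip this value
    return matches
```

As in Fig. 6, non-unique minima are ignored and the search moves on to the next smallest global minimum.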
Fig. 7. Sample images from the Idaho dataset.
more objective comparison. In addition, BRIEF and SYBA are both binary descriptors and both use randomly generated patterns. The BRIEF-32 descriptor [23] requires fewer comparisons than BRISK and has been proven to outperform several other existing fast descriptors such as SURF (except on the Graffiti sequence [23]) [13], U-SURF [13], and Compact Signature [32]. We compared SYBA with BRIEF-32 and rBRIEF, a new binary descriptor called ALOHA [26], and the popular SURF in this work.
Six commonly used image sequences [33] were tested for accuracy comparisons. These six image sequences were designed to test the robustness of feature descriptors against image perturbations such as blurring, lighting variation, viewpoint change, and image compression. These sequences include the following (example images are shown in Fig. 8):
• Image compression artifacts - UBC JPEG test sequence (Fig. 8(a)),
• Illumination change - Leuven Light test sequence (Fig. 8(b)),
• Image blurring - Bikes test sequence (Fig. 8(c)) and Trees test sequence (Fig. 8(d)),
• Viewpoint change - Wall test sequence (Fig. 8(e)) and Graffiti test sequence (Fig. 8(f)).
Each sequence consists of six images. For our experiments, the first image in the sequence was used as the reference image. The subsequent five images were used as the test images for matching.
(a) UBC JPEG test sequence (Image compression artifacts) (b) Leuven Light test sequence (Illumination change)
(c) Bikes test sequence (Image blurring) (d) Trees test sequence (Image blurring)
(e) Wall test sequence (Viewpoint change) (f) Graffiti test sequence (Viewpoint change)
Fig. 8. Examples of images used for evaluation from the Oxford dataset. Four image transformations are evaluated: JPEG compression (a); illumination (b); image blur (c) and (d);
viewpoint change (e) and (f).
Table 2
Accuracy results and memory footprints for SYBA on the Idaho dataset. Memory usage assumes 1000 features per image are kept for each algorithm.

Algorithm    Average memory usage per image (kilobytes)    Homography accuracy
SIFT         1024.0    34.7%
SURF          512.0    73.5%
BASIS         288.0    75.5%
D-BRIEF         8.0    78.9%
TreeBASIS       2.1    79.6%
BASIS384      691.0    81.6%
BRIEF-32       32.0    83.5%
SYBA5×5       162.0    84.2%
SYBA30×30     351.0    85.1%
The image perturbations become more severe from one image to the
next in the sequence. For example, matching feature points between
the first and the third images is more challenging than matching fea-
ture points between the first and the second images. In this work,
similar to the recognition rate in [23,26], the detection rate is defined
as the ratio of the number of correct matches (Nc) to the total number
of matches found (N).
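The detection rate defined above can be computed directly. A minimal sketch, assuming matches and ground truth are represented as (feature-1, feature-2) index pairs; the representation is ours, not the paper's.

```python
def detection_rate(matches, ground_truth):
    """Detection rate as defined in the text: Nc / N, the number of correct
    matches over the total number of matches found."""
    if not matches:
        return 0.0
    n_correct = sum(1 for pair in matches if pair in ground_truth)
    return n_correct / len(matches)
```

In the Oxford evaluation, a match is "correct" when it is consistent with the known homography between the two images.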
Open source computer vision library (OpenCV) implementations of the BRIEF and rBRIEF (ORB descriptor [27]) descriptors were used to compare feature descriptor performance. In these implementations, the region size was fixed to 48 × 48 for BRIEF and 31 × 31 for rBRIEF. To calculate the mean intensity, a 9 × 9 region was used for BRIEF and a 5 × 5 region was used for rBRIEF. As explained previously, two versions of SYBA were compared against BRIEF, rBRIEF, ALOHA, and SURF. In both SYBA versions, the feature region image size was kept at 30 × 30, whereas two different synthetic basis image sizes, 5 × 5 and 30 × 30, were used for SYBA5×5 and SYBA30×30, respectively.
Both BRIEF and rBRIEF descriptors use SURF to detect features, without any pyramidal analysis. The SURF feature detector was also used for SYBA in order to use the same feature points to compare their performance. The number of detected features ranged from 500 to 1500 depending on the image sequence. Fig. 9 illustrates the performance of the two versions of SYBA and the other four methods. For this assessment the detection thresholds were set such that all outputs have a nearly equal number of correspondences. Both SYBA versions were more robust than BRIEF and rBRIEF for images that are corrupted by compression artifacts in the "UBC JPEG compression" dataset (Fig. 9(a)). SYBA30×30 outperformed BRIEF by more than 15% and ALOHA and SURF by more than 30% for image pair 1|6. For the "Leuven light" image dataset, which is corrupted by illumination
Fig. 9. Comparison of detection rates for the different feature descriptors on the Oxford dataset: (a) UBC JPEG, (b) Leuven, (c) Bikes, (d) Trees, (e) Wall, (f) Graffiti. In the graphs, blue: SYBA5 × 5, red: SYBA30 × 30, green: BRIEF-32, purple: rBRIEF, black: ALOHA, and dark blue: SURF. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).
Table 3
Percentage of incorrect matches when 95% of the true matches are found. These descriptors do not depend on training data.

Test          SYBA5×5    SYBA30×30    BRIEF     rBRIEF    SIFT
Liberty       29.24%     28.98%       34.15%    35.02%    26.27%
Notre Dame    24.52%     24.05%       29.57%    30.17%    24.09%
Yosemite      26.06%     25.12%       31.96%    32.46%    23.14%
change, the detection rate of SYBA30×30 is more than 3% higher than BRIEF and ALOHA and 25% higher than SURF for image pair 1|6 (Fig. 9(b)).
For the "Bikes" and "Trees" sequences that are corrupted by image blurring, SYBA outperformed all four other algorithms. The accuracy difference was even more obvious for the strongest blurring conditions. For image pair 1|6, the differences were 7% for the "Bikes" sequence and 10% for the "Trees" sequence (Fig. 9(c)–(d)). For the "Wall" and "Graffiti" sequences, which are corrupted by viewpoint change, BRIEF performed slightly better than SYBA only for image pair 1|5 in the "Wall" sequence and for image pair 1|6 in the "Graffiti" sequence (Fig. 9(e)–(f)). SURF performed slightly better than the others in the "Graffiti" sequence except for image pair 1|5. As mentioned previously, the SURF feature detector was used for feature detection in this study for a fair comparison because both BRIEF versions use it as well. This might have given SURF a slight advantage in matching features. SYBA performed better than the other algorithms for all other image pairs. It is noted that rBRIEF exhibits lower performance in all cases because rBRIEF has been optimized to be used with orientation information delivered by the detector (which was not available in these experiments).
In order to better highlight the advantages of the SYBA descriptor over BRIEF and rBRIEF, a recall vs. precision curve was used to further evaluate the performance. We did not include ALOHA and SURF in this study because of their poor performance on the majority of the image sequences. Fig. 10 shows the recall vs. precision curve using threshold-based similarity matching (sliding the Hamming distance threshold from minimum to maximum) on this dataset. Again, for this assessment the detection thresholds were set such that all outputs have a nearly equal number of correspondences in the spirit of fairness. SYBA outperformed both BRIEF algorithms for high recall values. At 90% recall, SYBA precision exceeded 92%, while BRIEF fell to 75% and rBRIEF fell to 72%. SYBA demonstrated the best discrimination capability in this experimental setup. In order to better demonstrate the merit of the SYBA descriptor statistically, a T-test was performed on the newly created BYU feature matching dataset. The experimental results are discussed in Section 3.4.
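The threshold-sweep evaluation behind a recall vs. precision curve can be sketched as follows; the function name and the array representation of pair distances and labels are illustrative assumptions.

```python
import numpy as np

def recall_precision_curve(dists, is_match):
    """Slide the descriptor-distance threshold from the minimum to the maximum
    observed value and record (recall, precision) for the matches accepted at
    each threshold, as in the Fig. 10 style of evaluation."""
    dists = np.asarray(dists, dtype=float)
    is_match = np.asarray(is_match, dtype=bool)
    total_true = np.count_nonzero(is_match)
    curve = []
    for t in np.unique(dists):
        accepted = dists <= t                       # pairs accepted at threshold t
        tp = np.count_nonzero(accepted & is_match)  # true matches among them
        curve.append((tp / total_true, tp / np.count_nonzero(accepted)))
    return curve
```

Raising the threshold trades precision for recall, which is exactly the trade-off the curve visualizes.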
Fig. 10. Recall vs. precision curve using threshold-based similarity matching.
Different computing platforms have varying computational power, which makes it difficult to compare processing speed objectively. We used the number of operations to compare processing speed instead. SYBA5×5 requires 324 (9 × 36) comparisons between SBIs and the feature region image and 324 summation operations to calculate the descriptor. SYBA30×30 requires 312 comparisons and 312 summations, but on a 30 × 30 sub-image. Both versions of BRIEF require a total of 1536 operations [26].
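A SYBA-style computation can be sketched to make the operation count concrete. This is a simplified stand-in, not the paper's exact method: the binarization rule and the random binary basis images below are our assumptions, whereas the actual SBI generation follows Section 2. With 36 sub-regions of 5 × 5 and 9 basis images each, the sketch performs the 9 × 36 = 324 similarity tests quoted above for SYBA5×5.

```python
import numpy as np

def syba_like_descriptor(region, n_basis=9, basis_size=5, seed=0):
    """Illustrative sketch: binarize a 30x30 feature region, split it into 36
    sub-regions of 5x5, and record the overlap (count of shared 'on' pixels)
    between each sub-region and each of n_basis random binary basis images."""
    assert region.shape == (30, 30)
    rng = np.random.default_rng(seed)              # fixed set of basis images
    bases = rng.integers(0, 2, size=(n_basis, basis_size, basis_size),
                         dtype=np.uint8)
    binary = (region > region.mean()).astype(np.uint8)   # simple binarization
    desc = []
    for i in range(0, 30, basis_size):
        for j in range(0, 30, basis_size):
            sub = binary[i:i + basis_size, j:j + basis_size]
            desc.extend(int(np.sum(sub & b)) for b in bases)  # similarity test
    return np.asarray(desc, dtype=np.uint16)
```

Because each descriptor entry is a small overlap count, the computation needs only comparisons and integer summations, with no floating-point operations.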
3.3. Experiment on the multi-view stereo correspondence dataset
In this experiment, we evaluated the performance of the SYBA descriptor using another publicly available dataset [22]. This dataset consists of three sets of patches sampled from the Statue of Liberty (New York), Notre Dame (Paris), and Half Dome (Yosemite). Each of them contains over 400,000 scale- and rotation-normalized 64 × 64 patches. These patches are sampled around interest points detected using multi-scale Harris corners. Sample patches from the Liberty, Notre Dame, and Half Dome sets are shown in Fig. 11. This dataset also contains training data for descriptors like BinBoost and D-BRIEF that require it. Training sets typically contain from 10,000 to 500,000 patch pairs depending on the application. SYBA does not require any kind of training data.
For descriptor evaluation we compared SYBA with the two versions of the BRIEF descriptor, as they do not require any training data either. In these implementations, patches were resized to 48 × 48 for BRIEF and 31 × 31 for rBRIEF (ORB descriptor [27]), matching their fixed region sizes. In both SYBA versions, the feature region image size was kept at 30 × 30. Two different synthetic basis image sizes (5 × 5 and 30 × 30) were used for SYBA5×5 and SYBA30×30. In our experiments, we resized patches to 30 × 30 and followed the same procedure to calculate the SYBA descriptor as explained in Section 2.
We performed the experiments following the online instructions [34]. Instead of matching one patch to the rest of the ∼400,000 patches, we randomly selected 1000 patches for each matching process to reduce the computation time. One of these 1000 patches is from the match information provided online [34] and the remaining 999 patches were selected randomly. For each patch, the tested feature descriptor reported the best match and the non-match. As a result of this process, we created 50,000 pairs of matching patches and 50,000 pairs of non-matching patches for each set and submitted them to the website to evaluate the performance.
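The 95% error rate metric used below can be sketched as follows; the function name and the threshold-selection rule (a quantile of the true-match distances) are our assumptions about how such a metric is computed, not the benchmark's exact implementation.

```python
import numpy as np

def error_rate_at_recall(dists, is_match, recall=0.95):
    """Pick the descriptor-distance threshold that accepts `recall` of the
    true matching pairs, then report the fraction of non-matching pairs
    wrongly accepted at that same threshold (the '95% error rate')."""
    dists = np.asarray(dists, dtype=float)
    is_match = np.asarray(is_match, dtype=bool)
    thresh = np.quantile(dists[is_match], recall)   # accepts ~95% of matches
    false_pos = np.count_nonzero((dists <= thresh) & ~is_match)
    return false_pos / np.count_nonzero(~is_match)
```

A lower value at fixed recall indicates a more discriminative descriptor.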
For the comparison of descriptors we reported the results in terms of the 95% error rate, the same as [22]. The 95% error rate represents the percentage of incorrect matches obtained when 95% of the true matches are found. For reference, we also provide results obtained with SIFT. For SIFT, we used the publicly available Matlab implementation of
Fig. 11. Some image patches from the Liberty set (a), Notre Dame set (b), and Half Dome set (c).
Fig. 12. Example images from the BYU feature matching dataset. Four image transformations are evaluated: JPEG compression, illumination change, image blur, and viewpoint change.
Vedaldi [35]. Table 3 clearly shows that SYBA5×5 provided up to a 5% improvement over BRIEF and up to a 5.5% improvement over rBRIEF at the 95% error rate. It also shows that SYBA provided accuracy comparable to that of the much larger and more computationally expensive SIFT descriptor.

3.4. Statistical T-test

The Oxford dataset does not contain more than two sequences of images for blurring and viewpoint change and has only one sequence of images for compression artifacts and illumination variation. It is not
Fig. 13. Comparison of detection rates for the different feature descriptors on the BYU feature matching dataset: (a) compression artifacts, (b) illumination change, (c) blurring, (d) viewpoint change. In the graphs, blue: SYBA5 × 5, red: SYBA30 × 30, green: BRIEF-32, and purple: rBRIEF. ∗ indicates p-value < 0.05. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).
sufficient for a thorough evaluation of descriptor performance. The multi-view stereo correspondence dataset in Section 3.3 is not prepared for evaluating descriptor performance statistically. A new dataset, called the BYU feature matching dataset [36], has been created. It consists of 20 sets of images. Each set includes four image sequences. Each image sequence has six images that have gone through image transformations that include blurring, compression, illumination variation, and viewpoint change. The first of the six images in each sequence is the original image and the subsequent images have increasing levels of image transformation. An example of the original images from the BYU feature matching dataset is shown in Fig. 12.
The aim is to measure descriptor performance statistically with this new dataset. A t-test is a statistical hypothesis test in which the difference between the means of two samples is tested for statistical significance. The same test procedure discussed previously was followed for the BYU feature matching dataset. The average detection rate for each image pair (i.e., image pair 1|2, pair 1|3, and so on) was calculated and then the differences were compared. The results of this test help in understanding descriptor performance on different sets of image pairs for each image perturbation.
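The two-sample comparison can be sketched with the Welch t statistic; this is one common variant, and the paper does not state exactly which t-test formulation it used, so treat the helper below as illustrative.

```python
import numpy as np

def welch_t(a, b):
    """Welch two-sample t statistic and degrees of freedom for comparing the
    mean detection rates of two descriptors across the 20 image sets."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)                 # per-sample variance of the mean
    vb = b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof
```

A p-value then follows from the t distribution with `dof` degrees of freedom (e.g., via `scipy.stats.t.sf`), and differences with p < 0.05 are marked with an asterisk in Fig. 13.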
Similar to Fig. 9, Fig. 13 illustrates the performance of the two versions of SYBA and the two versions of BRIEF. In this figure, pairs for which the difference is statistically significant (p-value < 0.05) are denoted with an asterisk. For this assessment the detection thresholds were set such that all outputs have a nearly equal number of correspondences. Both versions of SYBA were more robust than BRIEF-32 and rBRIEF for images that are corrupted by compression artifacts in the new dataset (Fig. 13(a)). SYBA30×30 outperformed BRIEF-32 by more than 18% for image pair 1|6. For images corrupted by illumination change, the detection rate of SYBA30×30 is more than 9% higher for image pair 1|6 (Fig. 13(b)).
For the image dataset that is corrupted by image blurring, SYBA outperformed both versions of the BRIEF algorithm. The accuracy difference is even more obvious for the strongest blurring conditions. For image pair 1|6, the difference was 9% in this blurring sequence (Fig. 13(c)). For the image dataset that is corrupted by viewpoint change, BRIEF performed comparably with SYBA30×30 only for image pair 1|6 (Fig. 13(d)). SYBA outperformed both versions of the BRIEF algorithm for all other image pairs. It is noted that rBRIEF exhibits lower performance in all cases because rBRIEF has been optimized to be used with orientation information delivered by the detector (which is not available in these experiments).
SYBA performed better for sequences with compression artifacts, illumination change, image blurring, and small viewpoint change. Slightly lower accuracy (but still better than the others) for very large viewpoint changes does not limit SYBA's usefulness because the viewpoint change is usually small for many embedded vision applications such as unmanned air vehicle pose estimation or unmanned ground vehicle autonomous navigation. SYBA30×30 performed better than SYBA5×5 but required a larger descriptor size. The size of the SBI can be easily adjusted for different application requirements. SYBA is proven to be a good candidate for embedded vision applications due to its computational simplicity and superior performance.

4. Conclusion

In this paper we have presented a new feature descriptor called SYBA. This unique approach was inspired by a new compressed
sensing theory. SYBA has been compared favorably to BRIEF, which is currently arguably the best binary descriptor in the literature, and to other more common descriptors such as SIFT, SURF, ALOHA, BASIS, and TreeBASIS. SYBA requires a slightly larger descriptor than BRIEF, but it provides better description and matching results. We have successfully applied SYBA to four different vision applications and seen very accurate results. These include soccer game event annotation in broadcast video [37], tracking of multiple moving targets from an unmanned aerial vehicle [38], drift reduction for visual odometry [39], and motion analysis for advanced driving assistance systems. SYBA is an excellent candidate for hardware implementation due to its ability to create a feature descriptor without using complex computations that require floating-point operations. Future work will focus on applying SYBA to various computer vision applications that require accurate feature matching. Hardware implementation for the embedded vision sensor will also be explored.

References
[1] B. Tippetts, K. Lillywhite, S. Fowers, A. Dennis, D.-J. Lee, J. Archibald, A simple, inexpensive, and effective implementation of a vision guided autonomous robot, in: Proceedings of SPIE Optics East, Intelligent Robots and Computer Vision XXIV: Algorithms, Techniques, and Active Vision, vol. 6382, 2006, 63820P.
[2] B. Tippetts, S. Fowers, K. Lillywhite, D.-J. Lee, J. Archibald, FPGA implementation of a feature detection and tracking algorithm for real-time applications, in: Proceedings of the 3rd International Conference on Advances in Visual Computing, Part I, ISVC'07, Springer-Verlag, Berlin, Heidelberg, 2007, pp. 682–691.
[3] Z. Jia, A. Balasuriya, S. Challa, Vision based data fusion for autonomous vehicles target tracking using interacting multiple dynamic models, Comput. Vis. Image Underst. 109 (1) (2008) 1–21.
[4] H.C. Garcia, J.R. Villalobos, G.C. Runger, An automated feature selection method for visual inspection systems, IEEE Trans. Autom. Sci. Eng. 3 (4) (2006) 394–406.
[5] Y. Chi, M.K.H. Leung, A general shape context framework for object identification, Comput. Vis. Image Underst. 112 (3) (2008) 324–336.
[6] K. Lillywhite, D.-J. Lee, B. Tippetts, S. Fowers, A. Dennis, B. Nelson, J. Archibald, An embedded vision system for an unmanned four-rotor helicopter, in: Proceedings of SPIE Optics East, Intelligent Robots and Computer Vision XXIV: Algorithms, Techniques, and Active Vision, vol. 6384, 2006, 63840G.
[7] B.J. Tippetts, D.J. Lee, S.G. Fowers, J.K. Archibald, Real-time vision sensor for an autonomous hovering micro-UAV, J. Aerosp. Comput. Inf. Commun. 6 (10) (2009) 570–584.
[8] V. Bonato, E. Marques, G.A. Constantinides, A parallel hardware architecture for scale and rotation invariant feature detection, IEEE Trans. Circuits Syst. Video Technol. 18 (12) (2008) 1703–1712.
[9] W.S. Fife, J.K. Archibald, Reconfigurable on-board vision processing for small autonomous vehicles, EURASIP J. Embedded Syst. 2007 (1) (2007) 33–46.
[10] Z. Wei, D. Lee, B.E. Nelson, A hardware-friendly adaptive tensor based optical flow algorithm, Lect. Notes Comput. Sci. 4842 (2007) 43.
[11] R. Fransens, C. Strecha, L. Van Gool, Optical flow based super-resolution: a probabilistic approach, Comput. Vis. Image Underst. 106 (1) (2007) 106–115.
[12] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110.
[13] H. Bay, T. Tuytelaars, L. Van Gool, SURF: speeded up robust features, in: Computer Vision – ECCV 2006, 2006, pp. 404–417.
[14] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (SURF), Comput. Vis. Image Underst. 110 (3) (2008) 346–359.
[15] Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 2, 2004, pp. II-506–II-513.
[16] G. Hua, M. Brown, S. Winder, Discriminant embedding for local image descriptors, in: Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV 2007), 2007.
[17] K. Simonyan, A. Vedaldi, A. Zisserman, Learning local feature descriptors using convex optimisation, IEEE Trans. Pattern Anal. Mach. Intell. 36 (8) (2014) 1573–1585.
[18] T. Trzcinski, V. Lepetit, Efficient discriminative projections for compact binary descriptors, in: Computer Vision – ECCV 2012, Springer, Berlin, Heidelberg, 2012, pp. 228–242.
[19] J. Masci, D. Migliore, M. Bronstein, J. Schmidhuber, Descriptor learning for omnidirectional image matching, in: Registration and Recognition in Images and Videos, Springer, Berlin, Heidelberg, 2014, pp. 49–62.
[20] J. Masci, M. Bronstein, A. Bronstein, J. Schmidhuber, Multimodal similarity-preserving hashing, IEEE Trans. Pattern Anal. Mach. Intell. 36 (4) (2014) 824–830.
[21] C. Strecha, A. Bronstein, M. Bronstein, P. Fua, LDAHash: improved matching with smaller descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 34 (1) (2012) 66–78.
[22] M. Brown, G. Hua, S. Winder, Discriminative learning of local image descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 33 (1) (2011) 43–57.
[23] M. Calonder, V. Lepetit, C. Strecha, P. Fua, BRIEF: binary robust independent elementary features, in: Proceedings of the 11th European Conference on Computer Vision, Part IV, ECCV'10, Springer-Verlag, Berlin, Heidelberg, 2010, pp. 778–792.
[24] M. Calonder, V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha, P. Fua, BRIEF: computing a local binary descriptor very fast, IEEE Trans. Pattern Anal. Mach. Intell. 34 (7) (2012) 1281–1298.
[25] S. Leutenegger, M. Chli, R. Siegwart, BRISK: binary robust invariant scalable keypoints, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV 2011), 2011, pp. 2548–2555.
[26] S. Saha, V. Demoulin, ALOHA: an efficient binary descriptor based on Haar features, in: Proceedings of the 19th IEEE International Conference on Image Processing (ICIP 2012), 2012, pp. 2345–2348.
[27] E. Rublee, V. Rabaud, K. Konolige, G. Bradski, ORB: an efficient alternative to SIFT or SURF, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV 2011), 2011, pp. 2564–2571.
[28] G. Carneiro, A. Jepson, Multi-scale phase-based local features, in: Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), vol. 1, 2003, pp. I-736–I-743.
[29] H. Anderson, Both Lazy and Efficient: Compressed Sensing and Applications, Tech. Rep. 2013-7521P, Sandia National Laboratories, Albuquerque, NM, 2013.
[30] S.G. Fowers, D. Lee, D.A. Ventura, J.K. Archibald, The nature-inspired BASIS feature descriptor for UAV imagery and its hardware implementation, IEEE Trans. Circuits Syst. Video Technol. PP (99) (2012) 1.
[31] S.G. Fowers, A. Desai, D.J. Lee, D. Ventura, D.K. Wilde, Efficient tree-based feature descriptor and matching algorithm, AIAA J. Aerosp. Inf. Syst. 11 (9) (2014) 596–606.
[32] M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, P. Mihelich, Compact signatures for high-speed interest point description and matching, in: Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV 2009), 2009, pp. 357–364.
[33] http://www.robots.ox.ac.uk/∼vgg/research/affine/ (Accessed: 9 February 2014).
[34] http://www.cs.ubc.ca/∼mbrown/patchdata/patchdata.html (Accessed: 15 April 2015).
[35] A. Vedaldi, B. Fulkerson, VLFeat: an open and portable library of computer vision algorithms, 2008. [Online]. Available: http://www.vlfeat.org/
[36] http://roboticvision.groups.et.byu.net/Robotic_Vision/Feature/BYUFeatureMatching.html (Accessed: 11 January 2015).
[37] A. Desai, D.J. Lee, C.N. Wilson, Determine absolute soccer ball location in broadcast video using SYBA descriptor, in: Proceedings of the International Symposium on Visual Computing (ISVC 2014), Part II, LNCS 8888, Las Vegas, NV, 2014, pp. 588–597.
[38] A. Desai, D.J. Lee, M. Zhang, Using accurate feature matching for unmanned aerial vehicle ground object tracking, in: Proceedings of the International Symposium on Visual Computing (ISVC 2014), Part I, LNCS 8887, Las Vegas, NV, 2014, pp. 435–444.
[39] A. Desai, D.J. Lee, Visual odometry drift reduction using SYBA descriptor and feature transformation, IEEE Trans. Intell. Transp. Syst. (Revised).