working(with(images:(cis519/fall2017/lectures/15...• fei-fei & perona, 2005 interest point...
TRANSCRIPT
![Page 1: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/1.jpg)
Working with Images: Bag of Visual Words
Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
![Page 2: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/2.jpg)
Bag-of-features models
![Page 3: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/3.jpg)
Origin 1: Texture recognition
• Texture is characterized by the repetition of basic elements or textons
• For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
![Page 4: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/4.jpg)
Origin 1: Texture recognition
Universal texton dictionary
histogram
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
![Page 5: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/5.jpg)
Origin 2: Bag-of-words models
• Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)
![Page 6: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/6.jpg)
Origin 2: Bag-of-words models
US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/
• Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)
![Page 7: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/7.jpg)
Origin 2: Bag-of-words models
US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/
• Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)
![Page 8: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/8.jpg)
Origin 2: Bag-of-words models
US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/
• Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)
![Page 9: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/9.jpg)
Bags of features for object recognition
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
face, flowers, building
• Works pretty well for image-level classification
![Page 10: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/10.jpg)
Bags of features for object recognition
Caltech6 dataset
bag of features bag of features Parts-and-shape model
![Page 11: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/11.jpg)
Bag of features: outline 1. Extract features
![Page 12: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/12.jpg)
Bag of features: outline 1. Extract features 2. Learn “visual vocabulary”
![Page 13: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/13.jpg)
Bag of features: outline 1. Extract features 2. Learn “visual vocabulary” 3. Quantize features using visual vocabulary
![Page 14: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/14.jpg)
Bag of features: outline 1. Extract features 2. Learn “visual vocabulary” 3. Quantize features using visual vocabulary 4. Represent images by frequencies of “visual words”
![Page 15: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/15.jpg)
Regular grid • Vogel & Schiele, 2003 • Fei-Fei & Perona, 2005
1. Feature extraction
![Page 16: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/16.jpg)
Regular grid • Vogel & Schiele, 2003 • Fei-Fei & Perona, 2005
Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005
1. Feature extraction
![Page 17: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/17.jpg)
Regular grid • Vogel & Schiele, 2003 • Fei-Fei & Perona, 2005
Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005
Other methods • Random sampling (Vidal-Naquet & Ullman, 2002) • Segmentation based patches (Barnard, Duygulu, Forsyth, de Freitas,
Blei, Jordan, 2003)
1. Feature extraction
![Page 18: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/18.jpg)
Normalize patch
Detect patches [Mikojaczyk and Schmid ’02]
[Mata, Chum, Urban & Pajdla, ’02]
[Sivic & Zisserman, ’03]
Compute SIFT
descriptor [Lowe’99]
Slide credit: Josef Sivic
1. Feature extraction
![Page 19: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/19.jpg)
…
1. Feature extraction
![Page 20: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/20.jpg)
2. Learning the visual vocabulary
…
![Page 21: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/21.jpg)
2. Learning the visual vocabulary
Clustering
…
Slide credit: Josef Sivic
![Page 22: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/22.jpg)
2. Learning the visual vocabulary
Clustering
…
Slide credit: Josef Sivic
Visual vocabulary
![Page 23: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/23.jpg)
From clustering to vector quantization • Clustering is a common method for learning a
visual vocabulary or codebook • Unsupervised learning process • Each cluster center produced by k-means becomes a
codevector • Codebook can be learned on separate training set • Provided the training set is sufficiently representative, the
codebook will be “universal”
• The codebook is used for quantizing features • A vector quantizer takes a feature vector and maps it to the
index of the nearest codevector in a codebook • Codebook = visual vocabulary • Codevector = visual word
![Page 24: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/24.jpg)
Example visual vocabulary
Fei-Fei et al. 2005
![Page 25: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/25.jpg)
Image patch examples of visual words
Sivic et al. 2005
![Page 26: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/26.jpg)
Visual vocabularies: Issues • How to choose vocabulary size?
• Too small: visual words not representative of all patches • Too large: quantization artifacts, overfitting
• Generative or discriminative learning? • Computational efficiency
• Vocabulary trees (Nister & Stewenius, 2006)
![Page 27: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/27.jpg)
3. Image representation
…..
frequ
ency
codewords
![Page 28: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/28.jpg)
Image classification • Given the bag-of-features representations of
images from different classes, how do we learn a model for distinguishing them?
![Page 29: Working(with(Images:(cis519/fall2017/lectures/15...• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other](https://reader036.vdocuments.site/reader036/viewer/2022080718/5f7874fafb5847600d02a664/html5/thumbnails/29.jpg)
No rigorous geometric information of the object components
It’s intuitive to most of us that objects are made of parts – no such information
Not extensively tested yet for • View point invariance • Scale invariance
Segmentation and localization unclear
Weakness of the model