Parsing Images with Context/Content Sensitive
Grammars
Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang
I. Structured Representation in Neural Systems
II. Vision is Hard
III. Why is Vision Hard?
IV. Hierarchies of Reusable Parts
V. Demonstration System: Reading License Plates
VI. Generalization: Face Detection
Artificial Intelligence
• Knowledge Engineering: engineer everything, learn nothing
• Learning Theory: engineer nothing, learn everything
• Both Lack Model
Natural Intelligence
• Strong Representation: simulation and semantics
• Hierarchy and Reusability: ventral visual pathway, linguistics, compositionality
II. Vision is Hard
Machines still can’t reliably read license plates
License plate images from Logan Airport
Machines can’t read fixed-font, fixed-scale characters as well as humans
Wafer IDs
Machines can’t find the bad guys at the Super Bowl
Super Bowl
III. Why is Vision Hard?
Vision is content sensitive
[Figure labels: Instantiation; same; twins; Empire style table]
Background is structured, and made of the same stuff!
“Clutter”
Human Interactive Proofs
IV. Hierarchies of Reusable Parts
Hierarchy of Reusable Parts
“Bricks”
e.g. discontinuities, gradient
e.g. linelets, curvelets, T-junctions
e.g. contours, intermediate objects
e.g. animals, trees, rocks
Hierarchy of Disjunctions of Conjunctions
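One way to read "hierarchy of disjunctions of conjunctions" is as a graph in which each brick selects one of several alternative sets of children (a disjunction), and the selected set requires all of its children (a conjunction). The Python sketch below illustrates that reading only. The `Brick` class, the `expand` helper, and the toy letter bricks are hypothetical names chosen for illustration and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Brick:
    """A reusable part: a disjunction (outer list) of conjunctions (inner lists)."""
    name: str
    # Each option is one conjunction: all listed children are selected together.
    options: List[List[str]] = field(default_factory=list)


def expand(bricks: Dict[str, Brick], choice: Dict[str, int], root: str) -> List[str]:
    """Return the bricks in the subgraph selected by `choice`, starting at `root`.

    `choice[name]` is the index of the conjunction picked for that brick.
    Terminal bricks (no options) contribute only themselves.
    """
    selected = [root]
    brick = bricks[root]
    if brick.options:
        for child in brick.options[choice[root]]:
            selected.extend(expand(bricks, choice, child))
    return selected


# Toy hierarchy (illustrative only): a string is "AB" or "BA",
# so the bricks "A" and "B" are reused by both alternatives.
bricks = {
    "string": Brick("string", [["A", "B"], ["B", "A"]]),
    "A": Brick("A", [["left_stroke", "right_stroke", "bar"]]),
    "B": Brick("B", [["stem", "upper_bowl", "lower_bowl"]]),
}
for terminal in ["left_stroke", "right_stroke", "bar",
                 "stem", "upper_bowl", "lower_bowl"]:
    bricks[terminal] = Brick(terminal)

# Pick the second alternative for "string" and the only alternative for each letter.
choice = {"string": 1, "A": 0, "B": 0}
interpretation = expand(bricks, choice, "string")
print(interpretation)
```

In this sketch an interpretation is simply the list of bricks reached by the chosen alternatives, which corresponds loosely to a "selected subgraph".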
Interpretations and Probabilities
Interpretation I: selected subgraph
P(I): GRAPHICAL MODEL (Markov) × LIKELIHOOD RATIO (non-Markov)
Generative (Bayesian) Model
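The slides pair P(I) with a Markov graphical-model factor and a non-Markov likelihood-ratio factor, but the transcript does not preserve the formula itself. As a rough sketch only, the Python below assumes the Markov factor is a product of per-brick selection probabilities and the non-Markov factor is a ratio of a "composed" density to a "null" density for some attribute (for example, the relative position of two parts). Every function name, density, and number here is hypothetical, and the score is left unnormalized.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def markov_factor(selection_probs):
    """Product of per-brick selection probabilities (assumed Markov part)."""
    p = 1.0
    for q in selection_probs:
        p *= q
    return p


def likelihood_ratio(attribute, composed=(0.0, 1.0), null=(0.0, 5.0)):
    """Ratio of composed to null density; both are made-up Gaussians (mean, std)."""
    return gaussian_pdf(attribute, *composed) / gaussian_pdf(attribute, *null)


def unnormalized_score(selection_probs, attributes):
    """Markov factor times one likelihood ratio per attribute."""
    score = markov_factor(selection_probs)
    for a in attributes:
        score *= likelihood_ratio(a)
    return score


# Same brick selections, different part alignment (attribute near 0 = well aligned).
aligned = unnormalized_score([0.5, 0.2], attributes=[0.1])
misaligned = unnormalized_score([0.5, 0.2], attributes=[4.0])
print(aligned > misaligned)
```

The point of the sketch is that two interpretations with identical Markov factors can still be ranked differently once the attribute-dependent ratio is included.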
V. Demonstration System: Reading License Plates
Test set: 385 images, mostly from Logan Airport
Courtesy of Visics Corporation
Architecture
license plates
license numbers (3 digits + 3 letters, 4 digits + 2 letters)
plate boundaries, strings (2 letters, 3 digits, 3 letters, 4 digits)
generic letter, generic number, L-junctions of sides
characters, plate sides
parts of characters, parts of plate sides
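The two license-number formats on the architecture slide are compositions of the four string types. The Python sketch below shows that composition on plain text rather than images. The dictionaries and function names are hypothetical, and the digits-before-letters order is assumed from the slide's wording.

```python
# String types from the slide, as (character test, length).
STRING_TYPES = {
    "2 letters": (str.isalpha, 2),
    "3 digits": (str.isdigit, 3),
    "3 letters": (str.isalpha, 3),
    "4 digits": (str.isdigit, 4),
}

# License-number formats from the slide: each is a conjunction of two strings.
LICENSE_FORMATS = {
    "3 digits + 3 letters": ["3 digits", "3 letters"],
    "4 digits + 2 letters": ["4 digits", "2 letters"],
}


def matches_string(text, string_type):
    """True if `text` has the right length and every character passes the test."""
    predicate, length = STRING_TYPES[string_type]
    return len(text) == length and all(predicate(ch) for ch in text)


def parse_license(text):
    """Return the name of the first format that `text` fits, or None."""
    for name, parts in LICENSE_FORMATS.items():
        pos, ok = 0, True
        for part in parts:
            length = STRING_TYPES[part][1]
            if not matches_string(text[pos:pos + length], part):
                ok = False
                break
            pos += length
        # The whole text must be consumed, with no trailing characters.
        if ok and pos == len(text):
            return name
    return None


print(parse_license("123ABC"))
print(parse_license("1234AB"))
print(parse_license("12345A"))
```

The outer loop plays the role of a disjunction over formats, and the inner loop a conjunction over the strings that make up one format.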
Image interpretation
[Figure panels: Original image; Top object; Top 10 objects; Top 25 objects]
Image interpretation
[Figure panels: Test image; Top objects]
Performance
• 385 images
• Six plates read with mistakes (>98% of plates read correctly)
• Approx. 99.5% of characters read correctly
• Zero false positives
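The ">98%" figure follows directly from the counts on the slide, assuming one plate per test image. A quick check in Python, with variable names chosen only for illustration:

```python
total_plates = 385          # test images, assuming one plate per image
plates_with_mistakes = 6    # plates read with at least one mistake

plate_accuracy = (total_plates - plates_with_mistakes) / total_plates
print(f"{plate_accuracy:.2%}")  # prints 98.44%
```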
Efficient discrimination: Markov versus content-sensitive distribution
[Figure panels: Original image; Zoomed license region; Top object under Markov distribution; Top object under content-sensitive distribution]
Efficient discrimination: testing objects against their parts
[Figure panels: Test image; 9 active “8” bricks under whole model; 1 active “8” brick under parts model]
Summary
Vision is Content Sensitive
Background is Structured, and Made of the Same Stuff
Non-Markovian probability models
Objects come equipped with their own background models
VI. Generalization: Face Detection
| Plates | Face Detection |
| --- | --- |
| Rigid | Deformable |
| “Black/White” Data Model | Intensity Model |
| Hand-Crafted Probabilities | Learned Probabilities |
Face Hierarchy
Sampling from Data Model 0.6 1
Sampling faces from the distribution
PATTERN SYNTHESIS = PATTERN RECOGNITION
Ulf Grenander