crowdsourced discovery of fine-grained attributessmaji/presentations/attributes-poster... · one...

1
Overview Goals Attribute discovery for fine-grained and detailed recognition of visual categories. Approach A novel annotation framework to collect descriptions of a collection of images A novel analysis of the text to discover the structure of parts and attributes Key idea A good set of attributes should enable easy discrimination between instances We also focus on the communication aspect which is important for human-computer interaction Sources of attributes Field guides Can provide an exhaustive list of attributes when available, but may not be task specific, or jargon free Captioned images on the web Descriptions are often not visual Detailed captions are available only for few categories Image from Berg et al., ECCV’10 We crowdsource attribute discovery Using a simple interface that we found to be very effective Our approach Annotation task Describe the visual differences between the two images The annotators list 5 differences between the images in short sentences separated by v.s. %&'() *'" )(+,() -). /0&0- *'" -1..)- !"#$ &'(&)'*)# !"# %-0%)&&)- %&'() !"# %'"")(+)- %&'() 0() )(+,() !"# 201- )(+,()" -). /0&0- !"# 3*,4) /0&0- -01(. -1..)- !"# %0,(45 -1..)- !"#$ +",)')-.)# Description vs. Discriminative description Annotators are forced to reveal detailed attributes which are useful for fine-grained discrimination by design Also provides us with sentence pairs that can be analyzed to discover the structure of attributes, e.g., parts, modifiers and their relations. Collecting annotations Interface on Amazon Mechanical Turk Users respond in free form text We also provide a few example annotations We collect annotations for random pairs of images If an attribute is present with probability p, then it is likely to be revealed in a pair with probability p(1-p), which is highest for p=0.5, i.e. the most discriminative attribute Example annotations collected propeller to the body propeller to the wing one rudder two rudders thin body fat body low wings high wings facing towards left side facing slightly towards" Analyzing the text Instance specific properties Different properties are revealed in different comparisons The frequency of an attribute is a measure of its discriminative ability [From Wikipedia] Florida jay: It has a strong black bill, blue head and nape without a crest, a whitish forehead and supercilium, blue bib, blue wings, grayish underparts, gray back, long blue tail, black legs and feet. Analyzing the sentence pairs Sentence pairs provide additional structure over the words used parts: words that repeat across a pair modifiers: words that are different across a pair Can also be obtained using part-of-speech taggers A sentence pair is more informative than two sentences in isolation Additional constraint: the modifiers belong from the same category In the example below, red and white must be two kinds of the same thing (color in this case). Example yellow black body orange brown body pointy beak shape beak short tail long tail black spot over head brown stripe over head short leg long leg" 5: blue wings 4: black beak 4: long tail 3: blue head 3: short beak 2: blue white body 2: blue white fur 2: sharp beak 2: short tail 1: big body 1: black leggs 1: black legs 1: black wings 1: blue body 1: blue color head 1: gray body 1: long bird 1: orange white body 1: short wings 1: slim body 1: white and blue fur A generative model of sentence pairs Discovering parts, modifiers and attributes The constraints described earlier can be captured using a generative bipartite-topic model. Parameter estimation using variational EM algorithm Experiments FGVC-Aircraft dataset (see our other poster) 200 random images, 1000 random pairs Caltech-UCSD birds 200 dataset 200 images (1 per category), 1600 pairs bird wings feather feathers tail beak like body leg legs eyes neck head in fur GLOBAL long short small large big v pointy pointed point bend bended pointly slightly sparrow duck sparow crow eagle dove pigeon humming kite parrot brown blue yellow gray red green spotted ash light black white orange fat slim silm lean sharp round flat normal shaped curved blunt little rounded ordinary oval PASCAL VOC 2011 person dataset 200 images from trainval subset, 1600 pairs hand hands hair facing snap snape the picture spectacles in glass bag watch glasses GLOBAL towards left right backwards forward sidewards man lady woman boy girl ladies baby adult child kid children adults shirt tshirt jacket dress coat tshirts wearing not smiling having asian caucasian africans asians latin western side back front backside frontal toward turn turned rear sideways upright us black white blue brown blonde red green gray yellow colored pink orange single couple 2 non double group couples many 3 dark light fair tee sky show bright lighter design medium thick fat slim normal thin lean fit average skinny female male full half of sleeve only fully sleeveless jeens close partial rain thinning torso waist indoor outdoor door homely stage inside outside out handed long short small shot women men womens young old older middle mature elderly two one three both somebody weight sitting standing walking riding cycling sleeping dancing driving no with without alone bald Reference Discovering a Lexicon of Parts and Attributes, S. Maji, Workshop on Parts and Attributes, ECCV, 2012 Crowdsourced Discovery of Fine-grained Attributes Subhransu Maji Toyota Technological Institute at Chicago Parts Modifiers (groups) Attributes Thickness of the edge is proportional to the frequency of the attribute sorted by frequency

Upload: truongduong

Post on 21-Apr-2018

216 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Crowdsourced Discovery of Fine-grained Attributessmaji/presentations/attributes-poster... · one rudder two rudders thin body fat body low wings high wings ... • The frequency of

OverviewGoalsAttribute discovery for fine-grained and detailed recognition of visual categories.Approach• A novel annotation framework to collect descriptions of a

collection of images

• A novel analysis of the text to discover the structure of parts and attributes

Key idea• A good set of attributes should enable easy discrimination

between instances

• We also focus on the communication aspect which is important for human-computer interaction

Sources of attributes Field guides• Can provide an exhaustive list of attributes when

available, but may not be task specific, or jargon free

Captioned images on the web• Descriptions are often not visual

• Detailed captions are available only for few categories

Image from Berg et al., ECCV’10

We crowdsource attribute discovery• Using a simple interface that we found to be very effective

Our approachAnnotation task• Describe the visual differences between the two images

• The annotators list 5 differences between the images in short sentences separated by v.s.

!"#$

•  %&'()$•  *'"$)(+,()$•  -).$/0&0-$•  *'"$-1..)-$

!"#$%&'(&)'*)#%$%-0%)&&)-$%&'()$!"#$%'"")(+)-$%&'()$

$$0()$)(+,()$!"#$201-$)(+,()"$$$$$-).$/0&0-$!"#$3*,4)$/0&0-$

-01(.$-1..)-$!"#$%0,(45$-1..)-$

!"#$%+",)')-.)#%

!"#$

•  %&'()$•  *'"$)(+,()$•  -).$/0&0-$•  *'"$-1..)-$

!"#$%&'(&)'*)#%$%-0%)&&)-$%&'()$!"#$%'"")(+)-$%&'()$

$$0()$)(+,()$!"#$201-$)(+,()"$$$$$-).$/0&0-$!"#$3*,4)$/0&0-$

-01(.$-1..)-$!"#$%0,(45$-1..)-$

!"#$%+",)')-.)#%

Description vs. Discriminative description• Annotators are forced to reveal detailed attributes which

are useful for fine-grained discrimination by design

• Also provides us with sentence pairs that can be analyzed to discover the structure of attributes, e.g., parts, modifiers and their relations.

Collecting annotations Interface on Amazon Mechanical Turk• Users respond in free form text

We also provide a few example annotations

We collect annotations for random pairs of images • If an attribute is present with probability p, then it is likely

to be revealed in a pair with probability p(1-p), which is highest for p=0.5, i.e. the most discriminative attribute

Example annotations collected

pair 65/999; 5 good

propeller to the body propeller to the wingone rudder two ruddersthin body fat bodylow wings high wingsfacing towards left side facing slightly towards"

Analyzing the textInstance specific properties• Different properties are revealed in different comparisons

• The frequency of an attribute is a measure of its discriminative ability

[From Wikipedia] Florida jay: It has a strong black bill, blue head and nape without a crest, a whitish forehead and supercilium, blue bib, blue wings, grayish underparts, gray back, long blue tail, black legs and feet.

Analyzing the sentence pairs• Sentence pairs provide additional structure over the

words used‣ parts: words that repeat across a pair‣ modifiers: words that are different across a pair‣ Can also be obtained using part-of-speech taggers

• A sentence pair is more informative than two sentences in isolation‣ Additional constraint: the modifiers belong from the

same category‣ In the example below, red and white must be two

kinds of the same thing (color in this case).

Example

pair 43/1600; 5 good

yellow black body orange brown bodypointy beak shape beakshort tail long tailblack spot over head brown stripe over headshort leg long leg"

image 66/199, 21 attributes

5: blue wings 4: black beak 4: long tail 3: blue head 3: short beak 2: blue white body 2: blue white fur 2: sharp beak 2: short tail 1: big body 1: black leggs 1: black legs 1: black wings 1: blue body 1: blue color head 1: gray body 1: long bird 1: orange white body 1: short wings 1: slim body 1: white and blue fur

A generative model of sentence pairsDiscovering parts, modifiers and attributes• The constraints described earlier can be captured using

a generative bipartite-topic model.

• Parameter estimation using variational EM algorithm

ExperimentsFGVC-Aircraft dataset (see our other poster)• 200 random images, 1000 random pairs

Caltech-UCSD birds 200 dataset• 200 images (1 per category), 1600 pairs

bird wingsfeatherfeathers tail beak like body leg legs eyes neck head in fur GLOBAL

longshortsmalllargebig

vpointypointedpointbendbendedpointlyslightly

sparrowducksparowcroweagledovepigeonhumming

kiteparrot

brownblueyellowgrayredgreenspottedashlight

blackwhiteorange

fatslimsilmlean

sharproundflat

normalshapedcurvedbluntlittle

roundedordinaryoval

PASCAL VOC 2011 person dataset• 200 images from trainval subset, 1600 pairs

handhands hair facing

snapsnape

thepicture spectacles in glass bag watch glasses GLOBAL

towardsleftright

backwardsforwardsidewards

manlady

womanboygirlladies

babyadultchildkid

childrenadults

shirttshirtjacketdresscoattshirts

wearingnot

smilinghaving

asiancaucasianafricansasianslatin

western

sidebackfront

backsidefrontaltowardturnturnedrear

sidewaysuprightus

blackwhitebluebrownblonderedgreengrayyellowcoloredpinkorange

singlecouple2non

doublegroupcouplesmany3

darklightfairteeskyshowbrightlighterdesignmediumthick

fatslim

normalthinleanfit

averageskinny

femalemale

fullhalfof

sleeveonlyfully

sleevelessjeensclosepartialrain

thinningtorsowaist

indooroutdoordoorhomelystage

insideoutsideout

handed

longshortsmallshot

womenmen

womens

youngoldoldermiddlematureelderly

twoonethreeboth

somebodyweight

sittingstandingwalkingridingcyclingsleepingdancingdriving

nowith

withoutalonebald

ReferenceDiscovering a Lexicon of Parts and Attributes, S. Maji, Workshop on Parts and Attributes, ECCV, 2012

Crowdsourced Discovery of Fine-grained AttributesSubhransu Maji

Toyota Technological Institute at Chicago

Parts

Modifiers (groups)

Attributes

Thickness of the edge is proportional to the frequency of the attribute

sorted by frequency