Roles of Appearance and Contextual Information

Devi Parikh, Larry Zitnick and Tsuhan Chen

When?

• High intra-class variance
  – Chairs

• Low inter-class variance
  – Lemon vs. tennis ball

• Occlusion

• Low resolution image
  – Image of a far away scene
  – Bad quality

[Rabinovich et al., ICCV 07]

[Bustos et al., CSUR 2005]

Popular motivations for context…

[Antonio Torralba]

However…

Let’s look at it more carefully…

Object appearance

Appearance + Context

Blind recognition

Low resolution

High resolution

Blind recognition

Appearance + Context

When?

• Humans?

• Machines?

Set-up

[Diagram: recognition by human and machine from appearance and context, at low resolution and at high resolution]

Low resolution: Appearance

Low resolution: Context

Low resolution: Appearance + Context

High resolution: Appearance

High resolution: Context

High resolution: Appearance + Context

11 subjects

2 sessions

3 scenarios

70 segments

Related work

[Torralba et al., Tech Report 2007]

Machines

$$P(c \mid S) = \frac{1}{Z} \prod_{i=1}^{N} \psi_i(c_i) \prod_{i,j=1}^{N} \phi_{ij}(c_i, c_j)$$

Appearance

• Texture + Shape [TextonBoost]*
• Color [GMM]
• Neural Network

Context

• Co-occurrence**
• Relative location
• Relative scale

Inference:

Belief Propagation

*[Shotton et al., ECCV 2006] **[Rabinovich et al., ICCV 2007]
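The slide describes a model with per-segment appearance terms and pairwise context terms, with inference by belief propagation. As a minimal sketch of how such a model can be evaluated, the code below runs sum-product belief propagation on a toy three-segment chain and compares it with brute-force enumeration. Everything here is an assumption for illustration: the potential values, the chain topology, and the names `unary`, `pairwise`, `brute_force_marginals` and `bp_chain_marginals` are hypothetical and are not the authors' implementation. On a chain the two methods agree exactly; on a densely connected graph belief propagation is only approximate.

```python
import itertools
import math

def normalize(v):
    """Scale a list of non-negative numbers so that it sums to 1."""
    s = sum(v)
    return [x / s for x in v]

def brute_force_marginals(unary, pairwise, edges):
    """Exact per-node marginals of P(c) proportional to
    prod_i unary[i][c_i] * prod_(i,j) pairwise[c_i][c_j], by enumeration."""
    n, k = len(unary), len(unary[0])
    marg = [[0.0] * k for _ in range(n)]
    for labels in itertools.product(range(k), repeat=n):
        score = math.prod(unary[i][labels[i]] for i in range(n))
        for i, j in edges:
            score *= pairwise[labels[i]][labels[j]]
        for i in range(n):
            marg[i][labels[i]] += score
    return [normalize(row) for row in marg]

def bp_chain_marginals(unary, pairwise):
    """Sum-product belief propagation on the chain 0-1-...-(n-1)."""
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]  # message into node i from node i-1
    bwd = [[1.0] * k for _ in range(n)]  # message into node i from node i+1
    for i in range(1, n):
        prev = [unary[i - 1][a] * fwd[i - 1][a] for a in range(k)]
        fwd[i] = normalize(
            [sum(prev[a] * pairwise[a][b] for a in range(k)) for b in range(k)]
        )
    for i in range(n - 2, -1, -1):
        nxt = [unary[i + 1][b] * bwd[i + 1][b] for b in range(k)]
        bwd[i] = normalize(
            [sum(pairwise[a][b] * nxt[b] for b in range(k)) for a in range(k)]
        )
    return [
        normalize([unary[i][a] * fwd[i][a] * bwd[i][a] for a in range(k)])
        for i in range(n)
    ]

# Toy example (assumed values): 3 segments, 3 candidate labels each.
unary = [[0.6, 0.3, 0.1],      # appearance scores per segment
         [0.2, 0.2, 0.6],
         [0.4, 0.4, 0.2]]
pairwise = [[1.0, 2.0, 0.5],   # context compatibility between label pairs
            [2.0, 1.0, 0.5],
            [0.5, 0.5, 3.0]]
edges = [(0, 1), (1, 2)]

exact = brute_force_marginals(unary, pairwise, edges)
approx = bp_chain_marginals(unary, pairwise)
for i, (e, b) in enumerate(zip(exact, approx)):
    print(i, [round(x, 4) for x in e], [round(x, 4) for x in b])
```

Setting every entry of `pairwise` to 1.0 removes the context term, so the marginals reduce to the normalized appearance scores; this is one simple way to compare "appearance alone" with "appearance and context" in a toy setting.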

Machines

Co-occurrence** Relative location Relative scale

**[Rabinovich et al., ICCV 2007]
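One common way to obtain a co-occurrence cue is to count how often pairs of labels appear in the same training image. The sketch below does this for a hypothetical list of per-image label sets. The function name `cooccurrence_probs`, the toy labels and the add-one smoothing are assumptions for illustration, not the statistics used in the talk.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_probs(image_labels, smoothing=1.0):
    """Estimate, for every unordered label pair, the smoothed fraction of
    images in which both labels appear."""
    classes = sorted(set().union(*image_labels))
    counts = Counter()
    for labels in image_labels:
        # Sort so each unordered pair is always stored as (smaller, larger).
        for a, b in combinations(sorted(labels), 2):
            counts[(a, b)] += 1
    n = len(image_labels)
    return {
        (a, b): (counts[(a, b)] + smoothing) / (n + 2 * smoothing)
        for a, b in combinations(classes, 2)
    }

# Toy training set (assumed): the labels present in each of four images.
images = [
    {"sky", "building", "road"},
    {"sky", "grass", "cow"},
    {"grass", "cow"},
    {"road", "car", "building"},
]
probs = cooccurrence_probs(images)
print(probs[("cow", "grass")])  # pair seen together in 2 of 4 images
print(probs[("cow", "road")])   # pair never seen together
```

Relative location and relative scale could be tabulated in the same spirit, by binning the offset or the size ratio between pairs of labeled segments instead of counting joint presence.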

Machines

• MSRC dataset
• Corel dataset

[Felzenszwalb and Huttenlocher, IJCV 2004]

Results

Results

A: Appearance alone

C: Context alone

A+C: Appearance and context

[Plot: A, C and A+C at low and high resolution]

High resolution images do not benefit from context

Results

A: Appearance alone

C: Context alone


[Plot: low vs. high resolution]

Low resolution images NEED context

Results

Results (Machine)

Context & Appearance help each other

Results (Machine)

Appearance hurts

Results (Machine)

Context very weak

Results (Machine)

Context makes no difference

Results (Machine)

Results (Machine)

Relative Location

Co-occurrence

Relative Scale

Results (Machine)

• Failure cases

Results (Machine)

                   MSRC     Corel
Existing (high)    75 [1]   81 [2]
Proposed (high)    91       93
Proposed (low)     83       86

[1] Yang et al., CVPR 2007
[2] He et al., ECCV 2006
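The gaps between the rows of this slide are easy to read off with a few lines of arithmetic. The sketch below only re-derives differences from the numbers reported on the slide; the dictionary layout and the names `results` and `gaps` are illustrative.

```python
# Numbers as reported on the slide (existing results are at high resolution).
results = {
    "MSRC":  {"existing_high": 75, "proposed_high": 91, "proposed_low": 83},
    "Corel": {"existing_high": 81, "proposed_high": 93, "proposed_low": 86},
}

gaps = {}
for dataset, r in results.items():
    gain = r["proposed_high"] - r["existing_high"]  # proposed vs. existing, high res
    drop = r["proposed_high"] - r["proposed_low"]   # cost of going to low res
    gaps[dataset] = (gain, drop)
    print(f"{dataset}: +{gain} over existing at high resolution, "
          f"-{drop} from high to low resolution")
```

On these numbers the proposed model at low resolution still scores above the existing high-resolution results on both datasets (83 vs. 75 on MSRC, 86 vs. 81 on Corel).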

Results (Machine)

Contributions

• Context is most useful when appearance information is weak

• Low resolution images are an appropriate venue for studying context

Discussion

• Improve appearance or context models to achieve human performance?

• Need to improve both appearance and context models

• In low resolution images, appearance information is similar for humans and machines
  – Hence, appropriate venue for studying context

• Achieving human performance need not be the ultimate goal

Need to improve context models

Need to improve appearance models

Follow up work (PAMI 2011)

Results

A: Appearance alone

C: Context alone


[Plot: low vs. high resolution]

Machines do not leverage contextual information as effectively as humans

Are machines missing a source of context?

Different Sources of Context

Different Sources of Context: None

Different Sources of Context: Cooc

Different Sources of Context: Rel-scale

Different Sources of Context: Rel-loc

Different Sources of Context: All

Different Sources of Context: Blind

Different Sources of Context: Image

High Resolution Appearance

PASCAL

“Natural” scenes

Bounding boxes

Common pixels

More void

Results

[Bar chart: MSRC and PASCAL, y-axis 50 to 95; bars: app, co-occ, rel-loc, rel-scale, all-exploded, all-no-void, blind, all]

Results

[Bar chart: MSRC and PASCAL, y-axis 50 to 95]


Co-occurrence information helps in both datasets. Relative location does not help in PASCAL.

MSRC Location Statistics

[Bar chart: y-axis 0 to 100; classes: Building, Grass, Tree, Cow, Sheep, Sky, Aeroplane, Water, Face, Car, Bicycle, Flower, Sign, Bird, Book, Chair, Road, Cat, Dog, Body, Boat]

PASCAL Location Statistics

[Bar chart: y-axis 0 to 100; classes: Aeroplane, Bicycle, Bird, Boat, Bottle, Bus, Car, Cat, Chair, Cow, Dining table, Dog, Horse, Motorbike, Person, Potted plant, Sheep, Sofa, Train, TV / monitor]

Results

[Bar chart: MSRC and PASCAL, y-axis 50 to 95]


Relative scale information does not help across the board.

Results

[Bar chart: MSRC and PASCAL, y-axis 50 to 95]


Our choice of visualization does not affect performance in MSRC.

Results

[Bar chart: MSRC and PASCAL, y-axis 50 to 95]


There is information in the “void” (unlabeled) pixels!

We leverage this cue for object detection:

Congcong Li, Devi Parikh and Tsuhan Chen. Extracting Adaptive Contextual Cues from Unlabeled Regions, ICCV 2011

MUCH more interesting analysis and findings in the PAMI 2011 paper:

Exploring Tiny Images: The Roles of Appearance and Contextual Information for Machine and Human Object Recognition
