TRANSCRIPT

Simultaneous Deep Transfer Across Domains and Tasks
Presentation by Alejandro Cartas
Eric Tzeng, Judy Hoffman, Trevor Darrell, Kate Saenko
Domain Adaptation: Train on source, adapt to target

backpack  chair  bike
Source Domain: lots of labeled data
D_S = {(x_i, y_i), ∀i ∈ {1, …, N}},  (x_i, y_i) ∼ P_S(X, Y)
Target Domain: unlabeled or limited labels ("bike"?)
D_T = {(z_j, ?), ∀j ∈ {1, …, M}},  z_j ∼ P_T(Z, H)
(Slide repeated, adding an "Adapt" arrow from the source domain to the target domain.)
Source Data

backpack  chair  bike
(Diagram: source data passes through conv1-conv5, fc6, fc7, fc8 to the classification loss.)
Adapting across domains: minimize discrepancy

L(x_S, y_S, x_T, y_T, θ_D; θ_repr, θ_C) = L_C(x_S, y_S, x_T, y_T; θ_repr, θ_C)
                                        + λ·L_conf(x_S, x_T, θ_D; θ_repr)
                                        + ν·L_soft(x_T, y_T; θ_repr, θ_C)

Eric Tzeng, et al., Simultaneous Deep Transfer Across Domains and Tasks, 2015
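A minimal sketch of how the three terms combine. The function and argument names, and the weights λ = ν = 0.1, are illustrative placeholders, not the authors' code or their reported hyperparameters:

```python
def total_loss(L_C, L_conf, L_soft, lam=0.1, nu=0.1):
    """Overall objective: the classification loss plus weighted
    domain-confusion and soft-label transfer terms.
    lam and nu are hyperparameters; 0.1 is an illustrative value."""
    return L_C + lam * L_conf + nu * L_soft

# Toy values standing in for the three computed losses.
print(total_loss(L_C=1.2, L_conf=0.7, L_soft=0.9))  # approximately 1.36
```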
(Slide repeated with the network diagram: source data and labeled target data pass through shared layers conv1-conv5, fc6, fc7, and fc8; the classification loss is computed on the labeled data.)
Adapting across domains: minimize discrepancy

Domain classifier loss:
L_D(x_S, x_T, θ_repr; θ_D) = −Σ_d 1[y_D = d]·log q_d

Domain confusion loss:
L_conf(x_S, x_T, θ_D; θ_repr) = −Σ_d (1/D)·log q_d

where q = softmax(θ_D^T f(x; θ_repr))

Adapted from J. Hoffman, Adapting Deep Networks Across Domains, Modalities, and Tasks, 2015
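A small numpy sketch of the two losses above, assuming D = 2 domains; the feature vector stands in for f(x; θ_repr), and all names and shapes are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

def domain_probs(feat, theta_D):
    # q = softmax(theta_D^T f(x; theta_repr)); feat plays the role of f(x).
    return softmax(theta_D.T @ feat)

def L_D(q, y_D):
    # Domain classifier loss: cross-entropy against the true domain label y_D.
    return -np.log(q[y_D])

def L_conf(q):
    # Domain confusion loss: cross-entropy against the uniform distribution
    # over the D domains; minimized (over the representation) when q is
    # uniform, where it equals log D.
    D = len(q)
    return -np.sum(np.log(q)) / D

# Sanity check: a perfectly confused classifier (uniform q over 2 domains)
# gives both losses the value log 2.
q = np.array([0.5, 0.5])
assert np.isclose(L_D(q, 0), np.log(2)) and np.isclose(L_conf(q), np.log(2))

# Toy features and a random domain classifier:
rng = np.random.default_rng(0)
q = domain_probs(rng.normal(size=4), rng.normal(size=(4, 2)))
print(L_D(q, y_D=0), L_conf(q))
```

The two losses are adversarial: θ_D is trained to minimize L_D, while θ_repr is trained to minimize L_conf, pushing q toward uniform so the domains become indistinguishable in feature space.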
(Slide build repeated several times: the diagram successively gains the object classifier θ_C, the measured discrepancy between domains, and the domain classifier θ_D.)
Adapting across domains: minimize discrepancy

L(x_S, y_S, x_T, y_T, θ_D; θ_repr, θ_C) = L_C(x_S, y_S, x_T, y_T; θ_repr, θ_C)
                                        + λ·L_conf(x_S, x_T, θ_D; θ_repr)
                                        + ν·L_soft(x_T, y_T; θ_repr, θ_C)

(Network diagram: source data and all target data pass through shared layers conv1-conv5, fc6, fc7; fc8 on the labeled data gives the classification loss, while a shared fcD layer gives the domain confusion loss and the domain classifier loss.)

Eric Tzeng, et al., Simultaneous Deep Transfer Across Domains and Tasks, 2015
Source soft labels

(Diagram: several source images pass through the Source CNN; each produces a high-temperature softmax over the classes Bottle, Mug, Chair, Laptop, Keyboard, and these outputs are averaged per class to form the soft label, e.g. l^(bottle).)

Eric Tzeng, et al., Simultaneous Deep Transfer Across Domains and Tasks, 2015
Source soft labels

(Diagram: a labeled target image ("Bottle") passes through the Adapt CNN; its high-temperature softmax output is matched, via a cross-entropy loss and backprop, to the per-class source activation, i.e. the source soft label.)

L_soft(x_T, y_T; θ_repr, θ_C) = −Σ_i l_i^(y_T)·log p_i
where p = softmax(θ_C^T f(x_T; θ_repr)/τ)

Eric Tzeng, et al., Simultaneous Deep Transfer Across Domains and Tasks, 2015
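A numpy sketch of both steps: averaging high-temperature softmax outputs over the source examples of each class to get the soft labels l^(k), then penalizing the target network's tempered output with the soft cross-entropy. The shapes and the temperature value τ = 2 are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def softmax(z, tau=1.0):
    # Temperature tau > 1 flattens the distribution, exposing the
    # relative activations of the non-maximal classes.
    z = np.asarray(z, dtype=float) / tau
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def source_soft_labels(logits_by_class, tau=2.0):
    """l^(k): average high-temperature softmax over source examples of class k.
    logits_by_class maps class k -> array of shape (num_examples, num_classes)."""
    return {k: softmax(logits, tau).mean(axis=0)
            for k, logits in logits_by_class.items()}

def L_soft(target_logits, y_T, soft_labels, tau=2.0):
    """-sum_i l_i^(y_T) log p_i  with  p = softmax(theta_C^T f(x_T)/tau)."""
    p = softmax(target_logits, tau)
    return -np.sum(soft_labels[y_T] * np.log(p))

# Toy example: 3 classes, 5 source logit vectors per class.
rng = np.random.default_rng(0)
logits_by_class = {k: rng.normal(size=(5, 3)) for k in range(3)}
l = source_soft_labels(logits_by_class)
print(L_soft(target_logits=np.array([2.0, 0.5, -1.0]), y_T=0, soft_labels=l))
```

Because the soft label preserves inter-class correlations from the source (e.g. "bottle" activating "mug" more than "keyboard"), this loss can transfer information even to target classes with no labeled examples.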
Class correlation transfer loss

L(x_S, y_S, x_T, y_T, θ_D; θ_repr, θ_C) = L_C(x_S, y_S, x_T, y_T; θ_repr, θ_C)
                                        + λ·L_conf(x_S, x_T, θ_D; θ_repr)
                                        + ν·L_soft(x_T, y_T; θ_repr, θ_C)

(Full network diagram: source data and all target data share conv1-conv5, fc6, fc7; fc8 gives the classification loss, fcD gives the domain confusion and domain classifier losses, and a high-temperature softmax on the labeled target data gives the soft label loss against the source soft labels.)

Eric Tzeng, et al., Simultaneous Deep Transfer Across Domains and Tasks, 2015
Office dataset Experiment

From "Adapting Visual Category Models to New Domains" [Saenko '10]:

(Fig. 4. New dataset for investigating domain shifts in visual category recognition tasks. Images of objects from 31 categories (keyboard, headphones, file cabinet, laptop, letter tray, ...) are downloaded from the web as well as captured by a high definition and a low definition camera, giving 3 domains: amazon, dSLR, webcam, with several instances per category.)
[...] popular way to acquire data, as it allows for easy access to large amounts of data that lends itself to learning category models. These images are of products shot at medium resolution, typically taken in an environment with studio lighting conditions. We collected two datasets: amazon contains 31 categories⁴ with an average of 90 images each. The images capture the large intra-class variation of these categories, but typically show the objects only from a canonical viewpoint. amazonINS contains 17 object instances (e.g. can of Taster's Choice instant coffee) with an average of two images each.
Images from a digital SLR camera: The second domain consists of images that are captured with a digital SLR camera in realistic environments with natural lighting conditions. The images have high resolution (4288×2848) and low noise. We have recorded two datasets: dslr has images of the 31 object categories [...]
⁴ The 31 categories in the database are: backpack, bike, bike helmet, bookcase, bottle, calculator, desk chair, desk lamp, computer, file cabinet, headphones, keyboard, laptop, letter tray, mobile phone, monitor, mouse, mug, notebook, pen, phone, printer, projector, puncher, ring binder, ruler, scissors, speaker, stapler, tape, and trash can.
• all classes have source labeled examples
• 15 classes have target labeled examples
• evaluate on remaining 16 classes
[Saenko '10]
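The semi-supervised protocol above can be sketched in a few lines. The category list comes from the Office dataset; the alphabetical 15/16 split shown here is only a placeholder for the standard split of [Saenko '10]:

```python
# The 31 Office categories, split into 15 with labeled target examples
# and 16 held out for evaluation. This alphabetical split is illustrative;
# the actual experiments follow the standard protocol of [Saenko '10].
categories = [
    "back_pack", "bike", "bike_helmet", "bookcase", "bottle",
    "calculator", "desk_chair", "desk_lamp", "desktop_computer",
    "file_cabinet", "headphones", "keyboard", "laptop_computer",
    "letter_tray", "mobile_phone", "monitor", "mouse", "mug",
    "paper_notebook", "pen", "phone", "printer", "projector",
    "punchers", "ring_binder", "ruler", "scissors", "speaker",
    "stapler", "tape_dispenser", "trash_can",
]
assert len(categories) == 31
labeled_target = categories[:15]   # target labels available at training time
held_out = categories[15:]         # evaluated with no target labels
print(len(labeled_target), len(held_out))  # 15 16
```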
ICCV 2015 Submission #937
Method                          | A→W        | A→D        | D→A        | D→W        | W→A        | W→D        | Average
DLID [7]                        | 51.9       | –          | –          | 78.2       | –          | 89.9       | –
DeCAF6 S+T [9]                  | 80.7 ± 2.3 | –          | –          | 94.8 ± 1.2 | –          | –          | –
DaNN [13]                       | 53.6 ± 0.2 | –          | –          | 71.2 ± 0.0 | –          | 83.5 ± 0.0 | –
Source CNN                      | 56.5 ± 0.3 | 64.6 ± 0.4 | 47.6 ± 0.1 | 92.4 ± 0.3 | 42.7 ± 0.1 | 93.6 ± 0.2 | 66.22
Target CNN                      | 80.5 ± 0.5 | 81.8 ± 1.0 | 59.9 ± 0.3 | 80.5 ± 0.5 | 59.9 ± 0.3 | 81.8 ± 1.0 | 74.05
Source+Target CNN               | 82.5 ± 0.9 | 85.2 ± 1.1 | 65.8 ± 0.5 | 93.9 ± 0.5 | 65.2 ± 0.7 | 96.3 ± 0.5 | 81.50
Ours: dom confusion only        | 82.8 ± 0.9 | 85.9 ± 1.1 | 66.2 ± 0.4 | 95.6 ± 0.4 | 64.9 ± 0.5 | 97.5 ± 0.2 | 82.13
Ours: soft labels only          | 82.7 ± 0.7 | 84.9 ± 1.2 | 66.0 ± 0.5 | 95.9 ± 0.6 | 65.2 ± 0.6 | 98.3 ± 0.3 | 82.17
Ours: dom confusion+soft labels | 82.7 ± 0.8 | 86.1 ± 1.2 | 66.2 ± 0.3 | 95.7 ± 0.5 | 65.0 ± 0.5 | 97.6 ± 0.2 | 82.22
Table 1. Multi-class accuracy evaluation on the standard supervised adaptation setting with the Office dataset. We evaluate on all 31 categories using the standard experimental protocol from [28]. Here, we compare against three state-of-the-art domain adaptation methods as well as a CNN trained using only source data, only target data, or both source and target data together.
Method                          | A→W        | A→D        | D→A        | D→W        | W→A        | W→D        | Average
MMDT [18]                       | –          | 44.6 ± 0.3 | –          | –          | –          | 58.3 ± 0.5 | –
Source CNN                      | 54.2 ± 0.6 | 63.2 ± 0.4 | 36.4 ± 0.1 | 89.3 ± 0.5 | 34.7 ± 0.1 | 94.5 ± 0.2 | 62.0
Ours: dom confusion only        | 55.2 ± 0.6 | 63.7 ± 0.9 | 41.2 ± 0.1 | 91.3 ± 0.4 | 41.1 ± 0.0 | 96.5 ± 0.1 | 64.8
Ours: soft labels only          | 56.8 ± 0.4 | 65.2 ± 0.9 | 41.7 ± 0.3 | 89.6 ± 0.1 | 38.8 ± 0.4 | 96.5 ± 0.2 | 64.8
Ours: dom confusion+soft labels | 59.3 ± 0.6 | 68.0 ± 0.5 | 43.1 ± 0.2 | 90.0 ± 0.2 | 40.5 ± 0.2 | 97.5 ± 0.1 | 66.4
Table 2. Multi-class accuracy evaluation on the standard semi-supervised adaptation setting with the Office dataset. We evaluate on 16 held-out categories for which we have no access to target labeled data. We show results on these unsupervised categories for the source-only model, our model trained using only soft labels for the 15 auxiliary categories, and finally using domain confusion together with soft labels on the 15 auxiliary categories.
[...] target domain. We report accuracies on the remaining unlabeled images, following the standard protocol introduced with the dataset [28]. In addition to a variety of baselines, we report numbers for both soft label fine-tuning alone as well as soft labels with domain confusion in Table 1. Because the Office dataset is imbalanced, we report multi-class accuracies, which are obtained by computing per-class accuracies independently, then averaging over all 31 categories.

We see that fine-tuning with soft labels or domain confusion provides a consistent improvement over hard label training in 5 of 6 shifts. Combining soft labels with domain confusion produces marginally higher performance on average. This result follows the intuitive notion that when enough target labeled examples are present, directly optimizing for the joint source and target classification objective (Source+Target CNN) is a strong baseline and so using either of our new losses adds enough regularization to improve performance.

Next, we experiment with the semi-supervised adaptation setting. We consider the case in which training data and labels are available for some, but not all of the categories in the target domain. We are interested in seeing whether we can transfer information learned from the labeled classes to the unlabeled classes.

To do this, we consider having 10 target labeled examples per category from only 15 of the 31 total categories, following the standard protocol introduced with the Office dataset [28]. We then evaluate our classification performance on the remaining 16 categories for which no data was available at training time.

In Table 2 we present multi-class accuracies over the 16 held-out categories and compare our method to a previous domain adaptation method [18] as well as a source-only trained CNN. Note that, since the performance here is computed over only a subset of the categories in the dataset, the numbers in this table should not be directly compared to the supervised setting in Table 1.

We find that all variations of our method (only soft label loss, only domain confusion, and both together) outperform the baselines. Contrary to the fully supervised case, here we note that both domain confusion and soft labels contribute significantly to the overall performance improvement of our method. This stems from the fact that we are now evaluating on categories which lack labeled target data, and thus the network can not implicitly enforce domain invariance through the classification objective alone. Separately, the fact that we get improvement from the soft label training on related tasks indicates that information is being effectively transferred between tasks.

In Figure 5, we show examples for the Amazon→Webcam shift where our method correctly classifies images from held-out object categories and the [...]
Office dataset Experiment

Multi-class accuracy over the 16 classes which lack target labels.

(Figure: per-category soft-label activation histograms over the 31 Office categories, with activations ranging from 0 to 0.1, comparing "Ours soft label" against "Baseline soft label" for target test images such as ring binder and monitor, shown alongside the source soft labels.)
Cross-dataset Experiment Setup

Source: ImageNet → Target: Caltech256, 40 categories.
Evaluate adaptation performance with 0, 1, 3, 5 target labeled examples per class.

[Tommasi '14]
ImageNet adapted to Caltech

(Plot: multi-class accuracy, roughly 72-78%, versus the number of labeled target examples per category (0, 1, 3, 5), comparing Source+Target CNN, Ours: soft labels only, and Ours: dom confusion+soft labels.)

[ICCV 2015]