[Page 1]
Asymmetric Tri-training
for Unsupervised Domain Adaptation
Kuniaki Saito1, Yoshitaka Ushiku1 and Tatsuya Harada1,2
1: The University of Tokyo, 2:RIKEN
ICML 2017 (8/6~8/11), Sydney
[Page 2]
Background: Domain Adaptation (DA)
[Figure: images labeled "rucksack", "keyboard", "bicycle", each shown in a source and a target domain.]
• Supervised learning needs a large number of labeled samples
– Collecting samples in every domain is costly
– Classifiers suffer when the domain changes
• The purpose of DA
– Train a classifier on the source domain that works well on the target domain
• Unsupervised Domain Adaptation
– Labeled source samples and unlabeled target samples
[Page 3]
Related Work
• Applications in computer vision
– Domain transfer + Generative Adversarial Networks
(Real faces to illustrations [Taigman+, ICLR 2017]; artificial images to real images [Bousmalis+, CVPR 2017])
– This paper: a novel approach without generative models
• Training a CNN for domain adaptation
– Matching hidden features of different domains [Long+, ICML 2015] [Ganin+, ICML 2015]
[Figure: feature distributions of source and target samples (classes A and B) before ("No Adapt") and after ("Adapted") adaptation.]
[Page 4]
Theoretical Insight
Theorem [Ben-David+, Machine Learning 2010]

  R_T(h) ≤ R_S(h) + (1/2) d_{HΔH}(S, T) + λ

R_S(h): error on the source domain; R_T(h): error on the target domain
d_{HΔH}(S, T): divergence between the domains
λ = min_h [R_S(h) + R_T(h)]: how discriminative the features are for both domains

• Distribution matching approaches aim to minimize d_{HΔH}(S, T)
– Related work regards λ as being sufficiently small
– But there is no guarantee that λ is small enough
• Proposed method: minimize λ by reducing the error on target samples
– In the absence of labeled target samples
→ We propose to give pseudo-labels to target samples
[Page 5]
Proposed Architecture
[Diagram: input X → shared network F → three branches F1, F2, Ft with outputs p1, p2, pt. F1 and F2 are fed S + Tl; Ft is fed Tl only.]
S: source samples; Tl: pseudo-labeled target samples
y: label for a source sample; ŷ: pseudo-label for a target sample
F1, F2: labeling networks; Ft: target-specific network; F: shared network
[Page 6]
Proposed Architecture
[Same diagram as the previous page.]
F is updated using gradients from F1, F2, and Ft.
[Page 7]
1. Initial training
[Diagram: the same architecture, with every branch fed source samples S.]
All networks are trained using only source samples.
[Page 8]
2. Labeling target samples
[Diagram: target samples T → F → F1 and F2, producing p1 and p2.]
T: target samples
If F1 and F2 agree on their predictions, and either predicted probability exceeds a threshold, the corresponding label is given to the target sample.
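The labeling rule above can be sketched as follows (a minimal NumPy sketch; the function name and the threshold value are illustrative, not taken from the paper):

```python
import numpy as np

def assign_pseudo_labels(p1, p2, threshold=0.9):
    """Pseudo-label target samples where F1 and F2 agree and either is confident.

    p1, p2: (N, C) arrays of class probabilities from the two labeling networks.
    Returns (indices, labels) for the samples that receive a pseudo-label.
    """
    pred1 = p1.argmax(axis=1)
    pred2 = p2.argmax(axis=1)
    agree = pred1 == pred2
    # Either network being confident enough is sufficient
    confident = (p1.max(axis=1) > threshold) | (p2.max(axis=1) > threshold)
    mask = agree & confident
    return np.where(mask)[0], pred1[mask]
```

Samples where the two networks disagree, or where neither is confident, are simply left unlabeled for this round.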
[Page 9]
3. Retraining the networks using pseudo-labeled target samples
[Diagram: the same architecture; F1 and F2 receive S + Tl, Ft receives Tl.]
F1, F2: trained on source and pseudo-labeled samples
Ft: trained on pseudo-labeled samples only
F: learns from the gradients of all three
[Page 10]
3. Retraining the networks using pseudo-labeled target samples
[Same diagram as the previous page.]
Repeat the 2nd and 3rd steps until convergence!
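The three steps can be summarized in a schematic driver loop (a sketch in which the hypothetical callables `train_on` and `label_targets` stand in for actual network training; the real method repeats until convergence and controls how many pseudo-labels are kept per round):

```python
def asymmetric_tri_training(train_on, label_targets, source, target, rounds=3):
    """Schematic three-step procedure.

    train_on(dataset)      -- one round of training for F, F1, F2 (and Ft)
    label_targets(samples) -- pseudo-labeled subset chosen via F1/F2 agreement
    """
    # Step 1: initial training on source samples only
    train_on(source)
    pseudo = []
    for _ in range(rounds):
        # Step 2: give pseudo-labels to target samples using F1 and F2
        pseudo = label_targets(target)
        # Step 3: retrain F1, F2 on source + pseudo-labeled samples
        # (Ft is trained on the pseudo-labeled samples only)
        train_on(source + pseudo)
    return pseudo
```

A fixed `rounds` count replaces the convergence check purely for brevity.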
[Page 11]
Overall Objective

  L = λ |W1ᵀ W2| + L1 + L2 + L3

[Diagram: the same architecture; W1 and W2 are the weight matrices of F1 and F2, and L1, L2, L3 are attached to the outputs p1, p2, pt.]
L1, L2, L3: cross-entropy losses of F1 (on S + Tl), F2 (on S + Tl), and Ft (on Tl)
The λ |W1ᵀ W2| term forces F1 and F2 to learn from different features.
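The weight-constraint term can be sketched directly (NumPy; `lam` is an illustrative coefficient — the paper's exact weighting may differ):

```python
import numpy as np

def weight_similarity_penalty(W1, W2, lam=1.0):
    """Penalty on |W1^T W2|: the sum of absolute entries of W1^T W2.

    W1, W2: (d, k) weight matrices of F1 and F2. Penalizing their overlap
    pushes the two labeling networks to rely on different features.
    """
    return lam * float(np.abs(W1.T @ W2).sum())
```

When the columns of W1 and W2 are orthogonal, the penalty is zero, so the two classifiers are encouraged to make independent labeling mistakes.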
[Page 12]
Experiments
• Four adaptation scenarios between digits datasets
– MNIST, MNIST-M, SVHN, SYN DIGITS (synthesized digits)
• One adaptation scenario between traffic-sign datasets
– GTSRB (real traffic signs), SYN SIGNS (synthesized signs)
• Other experiments are omitted due to the time limit…
– Adaptation on Amazon Reviews
[Example images from each dataset: MNIST, MNIST-M, SVHN, SYN DIGITS, SYN SIGNS, GTSRB.]
[Page 13]
Accuracy on Target Domain
• Our method outperformed the other methods.
– The effect of BN is obvious in some settings.
– The effect of the weight constraint is not obvious.

| Method \ Source → Target | MNIST → MNIST-M | MNIST → SVHN | SVHN → MNIST | SYN DIGITS → SVHN | SYN SIGNS → GTSRB |
| --- | --- | --- | --- | --- | --- |
| Source Only (w/o BN) | 59.1 | 37.2 | 68.1 | 84.1 | 79.2 |
| Source Only (with BN) | 57.1 | 34.9 | 70.1 | 85.5 | 75.7 |
| DANN [Ganin+, ICML 2015] | 81.5 | 35.7 | 71.1 | 90.3 | 88.7 |
| MMD [Long+, ICML 2015] | 76.9 | - | 71.1 | 88.0 | 91.1 |
| DSN [Bousmalis+, NIPS 2016] | 83.2 | - | 82.7 | 91.2 | 93.1 |
| k-NN Labeling [Sener+, NIPS 2016] | 86.7 | 40.3 | 78.8 | - | - |
| Ours (w/o BN) | 85.3 | 39.8 | 79.8 | 93.1 | 96.2 |
| Ours (w/o weight constraint) | 94.2 | 49.7 | 86.0 | 92.4 | 94.0 |
| Ours | 94.0 | 52.8 | 86.8 | 92.9 | 96.2 |
[Page 14]
Summary and Future Work
• Summary
– Problem formulation for domain adaptation
– Proposal of asymmetric tri-training
– Effectiveness shown in experiments
• Future work
– Evaluate our method when fine-tuning a pre-trained model
For more details, please refer to:
Kuniaki Saito, Yoshitaka Ushiku, and Tatsuya Harada. Asymmetric Tri-training for Unsupervised Domain Adaptation. International Conference on Machine Learning (ICML), 2017.
[Page 15]
Supplemental materials
[Page 16]
Relationship with Tri-training
• Tri-training [Zhou & Li, 2005]
– Uses the three classifiers symmetrically
• Two classifiers give labels to unlabeled samples
• The third classifier is trained on the newly labeled samples
• Repeat for every combination of the classifiers
• Our proposed method
– Uses the three classifiers asymmetrically
• Two fixed classifiers (F1, F2) give the labels
• One fixed classifier (Ft) is trained on the pseudo-labeled samples
[Page 17]
Accuracy during training
Blue: (correctly labeled samples) / (labeled samples). Initially this accuracy is high and gradually decreases.
Red: accuracy of the learned network; it gradually increases.
Green: the number of labeled samples.
[Page 18]
A-distance between domains
• A-distance
– Calculated from a domain classifier's error
• The proposed method does not make the divergence small.
– Minimizing the divergence is not the only way to achieve good adaptation!
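The proxy A-distance commonly reported in this literature is derived from the domain classifier's generalization error ε as d_A = 2(1 − 2ε); a one-line sketch:

```python
def proxy_a_distance(eps):
    """Proxy A-distance from a domain classifier's test error eps.

    eps = 0.5 (chance level)       -> 0.0: domains are indistinguishable.
    eps = 0.0 (perfect separation) -> 2.0: domains are maximally far apart.
    """
    return 2.0 * (1.0 - 2.0 * eps)
```

A small value therefore means the learned features of the two domains overlap heavily, which, as the slide notes, the proposed method does not require.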
[Pages 19–22]
Analysis by gradient stopping
[The architecture diagram is repeated across four slides, varying which gradients (from F1, F2, or Ft) are blocked from updating the shared network F.]