

Research Center for Ubiquitous Computing System, Institute of Computing Technology, Chinese Academy of Sciences

Distant Domain Transfer Learning*

Jindong Wang, [email protected]

May 2017

* This slide deck is based on the AAAI-17 paper: Ben Tan, Yu Zhang, Sinno Jialin Pan and Qiang Yang: Distant domain transfer learning.


Content

- Background of transfer learning
- Introduction: authors; DDTL (Distant Domain Transfer Learning)
- Selective Learning Algorithm for DDTL: instance selection via reconstruction error; incorporation of side information; learning algorithm
- Experiments and Analysis
- Related Work
- Conclusion

Jindong Wang | Distant Domain Transfer Learning


Background: What is transfer learning?

Problems:
- Building every model from scratch is time-consuming and expensive.
- There is a lot of existing knowledge. Can we reuse it?


Background: Brief

Common definition
- Wikipedia: the research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem [wik].

TL vs traditional ML

Traditional ML:
- Training and testing samples must follow the same feature distributions.
- Training samples must be sufficient.

Transfer learning:
- Source and target domains do not need to follow the same distributions.
- Fewer training samples are required, possibly none.


Background: Applications

Example: sentiment classification. DVD → Electronics: we only have sentiment labels for DVD reviews; how can we transfer them to electronics?

Proceedings
- Data mining: ACM SIGKDD, IEEE ICDM, PKDD
- ML & AI: ICML, NIPS, AAAI, IJCAI, ECML
- Applications: ACM TIST, ACM SIGIR, WWW, ACL, IEEE TKDE

Applications include image classification, natural language processing, activity recognition, and WiFi localization.


Introduction: Author Information

This paper has four authors:

Ben Tan
- Ph.D. candidate at HKUST

Yu Zhang
- Research associate at HKUST

Sinno Jialin Pan
- Assistant professor at NTU
- Google Scholar citations: 4,000+
- http://www.ntu.edu.sg/home/sinnopan/

Qiang Yang
- Head of CSE, HKUST
- Fellow of AAAI/IEEE/AAAS/IAPR
- Google Scholar citations: 30,000+
- http://www.cs.ust.hk/~qyang/


Introduction: Distant Domain Transfer Learning

Traditional TL: the source and target domains are close [PY10].
DDTL: the source and target domains can be totally different!

- Task 1: Cat → Tiger; traditional TL performs well.
- Task 2: Face → Airplane; traditional TL performs badly.

How can we conduct transfer learning when the source and target domains are totally different?


Introduction: Problem Definition

Distant domain transfer learning (DDTL): exploit the unlabeled data in the intermediate domains to build a bridge between the source and target domains.

Input:
- Labeled source domain S = {(x_S^1, y_S^1), ..., (x_S^{n_S}, y_S^{n_S})}
- Target domain T = {(x_T^1, y_T^1), ..., (x_T^{n_T}, y_T^{n_T})} (only a few instances are labeled)
- Mixture of unlabeled intermediate domains: I = {x_I^1, ..., x_I^{n_I}}

Output:
- Labels of the target domain

Constraints:
- p_T(x) ≠ p_S(x), p_T(x) ≠ p_I(x), and p_T(y|x) ≠ p_S(y|x)
- The similarity between S and T is very small


SLA Algorithm: Autoencoder

The Selective Learning Algorithm (SLA), based on the autoencoder, is proposed to solve the DDTL problem.

Autoencoder: an unsupervised feed-forward neural network with an input layer, a hidden layer, and an output layer.

- Encoding: h = f_e(x)
- Decoding: x̂ = f_d(h)
- Objective: min Σ_{i=1}^n ||x̂_i − x_i||²₂

To capture spatial information, a convolutional autoencoder is preferred.
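To make the three pieces above concrete, here is a minimal sketch of a linear autoencoder trained by gradient descent on toy data. This is not the paper's architecture; all names, shapes, and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 10-D that lie on a 3-D subspace,
# so a 3-unit bottleneck can reconstruct them almost perfectly.
basis = rng.normal(size=(3, 10))
X = rng.normal(size=(200, 3)) @ basis

d, k = 10, 3
W1 = rng.normal(scale=0.1, size=(d, k))   # encoder: h = f_e(x) = x W1
W2 = rng.normal(scale=0.1, size=(k, d))   # decoder: x_hat = f_d(h) = h W2

def loss(X):
    """Objective: mean squared reconstruction error ||x_hat - x||^2."""
    X_hat = (X @ W1) @ W2
    return np.mean(np.sum((X_hat - X) ** 2, axis=1))

initial = loss(X)
lr = 0.005
for _ in range(1000):
    H = X @ W1
    G = 2.0 * (H @ W2 - X) / len(X)   # gradient of loss w.r.t. X_hat
    gW2 = H.T @ G                     # gradient w.r.t. decoder weights
    gW1 = X.T @ (G @ W2.T)            # gradient w.r.t. encoder weights
    W2 -= lr * gW2
    W1 -= lr * gW1
```

Because the data lie exactly on a rank-3 subspace, the reconstruction error shrinks toward zero; points off that subspace would keep a large error, which is exactly the signal SLA later uses for instance selection.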

Jindong Wang | Distant Domain Transfer Learning

Page 11: Distant Domain Transfer Learningajd92.wang/assets/files/l09_ddtl.pdf · 2020. 7. 15. · 07]Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y Ng. Self-taught

9

SLA Algorithm: Instance Selection via Reconstruction Error

Motivation: if data from the source or intermediate domain are similar and useful to the target domain, then one should be able to find a pair of encoding and decoding functions with small reconstruction error on them.

Objective: learn a pair of encoding and decoding functions by minimizing the reconstruction errors on the source, intermediate, and target domains simultaneously:

J_1(f_e, f_d, v_S, v_I) = (1/n_S) Σ_{i=1}^{n_S} v_S^i ||x̂_S^i − x_S^i||²₂ + (1/n_I) Σ_{i=1}^{n_I} v_I^i ||x̂_I^i − x_I^i||²₂ + (1/n_T) Σ_{i=1}^{n_T} ||x̂_T^i − x_T^i||²₂ + R(v_S, v_I)    (1)

v_S and v_I are instance-selection indicators for the source and intermediate domains, and R(·, ·) is the regularization term:

R(v_S, v_I) = −(λ_S/n_S) Σ_{i=1}^{n_S} v_S^i − (λ_I/n_I) Σ_{i=1}^{n_I} v_I^i    (2)
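Eqs. (1) and (2) can be transcribed almost directly, assuming the per-instance squared reconstruction errors have already been computed. The function and variable names below are mine, not the paper's:

```python
import numpy as np

def j1(err_S, err_I, err_T, v_S, v_I, lam_S, lam_I):
    """Selection-weighted reconstruction objective, Eq. (1).

    err_*: per-instance squared reconstruction errors ||x_hat - x||^2.
    v_S, v_I: 0/1 selection indicators for source / intermediate instances.
    The regularizer R of Eq. (2) rewards selecting instances, so the
    trivial solution v = 0 is not optimal.
    """
    rec = (np.mean(v_S * err_S)      # source term, selected instances only
           + np.mean(v_I * err_I)    # intermediate term
           + np.mean(err_T))         # every target instance is always kept
    reg = -lam_S * np.mean(v_S) - lam_I * np.mean(v_I)   # R(v_S, v_I)
    return rec + reg
```

Dropping an instance removes its reconstruction error from the objective but pays a penalty through R, so only instances whose error exceeds the threshold set by λ are worth excluding.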


SLA Algorithm: Incorporation of Side Information

Learning with Eq. (1) is unsupervised, so we incorporate some side information:

J_2(f_c, f_e, f_d) = (1/n_S) Σ_{i=1}^{n_S} v_S^i L(y_S^i, f_c(h_S^i)) + (1/n_T) Σ_{i=1}^{n_T} L(y_T^i, f_c(h_T^i)) + (1/n_I) Σ_{i=1}^{n_I} v_I^i g(f_c(h_I^i))    (3)

f_c(·) is a classification function and g(·) is the entropy function: g(z) = −z ln z − (1 − z) ln(1 − z) for 0 ≤ z ≤ 1.

Overall objective function:

min_{Θ,v} J = J_1 + J_2,  s.t. v_S^i, v_I^i ∈ {0, 1}    (4)

where Θ denotes all the parameters of f_c(·), f_d(·), and f_e(·), and v = {v_S, v_I}.
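The entropy term is easy to check numerically. A small sketch (the endpoint clipping is my addition, to avoid log(0)):

```python
import numpy as np

def g(z):
    """Binary entropy g(z) = -z ln z - (1 - z) ln(1 - z) on [0, 1].
    Small g(f_c(h)) means the classifier is confident about the instance."""
    z = np.clip(z, 1e-12, 1.0 - 1e-12)   # guard the endpoints z = 0, 1
    return -z * np.log(z) - (1.0 - z) * np.log(1.0 - z)
```

g peaks at ln 2 for z = 0.5 and vanishes at z = 0 or 1, so minimizing the third term of Eq. (3) drives f_c toward confident predictions on the selected intermediate instances, even though those instances carry no labels.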


Learning Algorithm: Learning Strategy

Technique: Block Coordinate Descent (BCD), where in each iteration the variables in one block are optimized while the other variables are kept fixed:

- Fix v, update Θ using back-propagation.
- Fix Θ, obtain v in closed form:

v_S^i = 1 if L(y_S^i, f_c(f_e(x_S^i))) + ||x̂_S^i − x_S^i||²₂ < λ_S, and 0 otherwise    (5)

v_I^i = 1 if ||x̂_I^i − x_I^i||²₂ + g(f_c(f_e(x_I^i))) < λ_I, and 0 otherwise    (6)

According to these equations, only samples with low reconstruction error and high prediction confidence are selected and used.
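The closed-form selection step in Eqs. (5)-(6) is just a thresholding operation. A sketch, assuming the per-instance losses, errors, and entropies have been precomputed (names are mine):

```python
import numpy as np

def update_v(loss_S, err_S, err_I, ent_I, lam_S, lam_I):
    """Fix Theta, obtain v: Eqs. (5) and (6).

    loss_S: classification loss L(y_S^i, f_c(f_e(x_S^i))) per source instance.
    err_S, err_I: squared reconstruction errors per instance.
    ent_I: entropy g(f_c(f_e(x_I^i))) per intermediate instance.
    """
    v_S = (loss_S + err_S < lam_S).astype(int)   # Eq. (5)
    v_I = (err_I + ent_I < lam_I).astype(int)    # Eq. (6)
    return v_S, v_I
```

An instance survives only if the sum of its reconstruction error and its loss (or entropy) stays below the threshold, i.e. it is both well reconstructed and confidently classified.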


Learning Algorithm

The learning algorithm is as follows:
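Putting the two BCD steps together, here is a minimal runnable sketch of the alternation, simplified to the unsupervised part (reconstruction-only selection on intermediate data, linear autoencoder, toy data). Every name and constant is illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target data lie on a 2-D subspace of an 8-D space; half of the
# "intermediate" pool shares that subspace (useful), half is noise (junk).
basis = 2.0 * rng.normal(size=(2, 8))
X_T = rng.normal(size=(50, 2)) @ basis
X_I = np.vstack([rng.normal(size=(50, 2)) @ basis,   # useful half
                 3.0 * rng.normal(size=(50, 8))])    # junk half
lam_I = 6.0                                          # selection threshold

W1 = rng.normal(scale=0.1, size=(8, 2))              # encoder
W2 = rng.normal(scale=0.1, size=(2, 8))              # decoder

def errors(X):
    """Per-instance squared reconstruction error."""
    return np.sum(((X @ W1) @ W2 - X) ** 2, axis=1)

v_I = np.ones(len(X_I), dtype=int)                   # start by keeping all
for _ in range(30):
    # Block 1: fix v, update Theta by gradient descent on selected data.
    X_train = np.vstack([X_I[v_I == 1], X_T])        # target always kept
    for _ in range(100):
        H = X_train @ W1
        G = 2.0 * (H @ W2 - X_train) / len(X_train)
        gW2 = H.T @ G
        gW1 = X_train.T @ (G @ W2.T)
        W2 -= 0.005 * gW2
        W1 -= 0.005 * gW1
    # Block 2: fix Theta, re-select instances by reconstruction error.
    v_I = (errors(X_I) < lam_I).astype(int)
```

Over the iterations the junk half keeps a large reconstruction error and is dropped, while intermediate instances on the target subspace remain selected; the full SLA additionally updates the classifier and the source indicators v_S in the same alternation.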


Learning Algorithm: Deep Architecture

Adding convolutional layers to the network yields a generalized autoencoder, i.e., a convolutional autoencoder with side information.
- SAN: supervised autoencoder (fully connected layers only).
- SCAN: supervised convolutional autoencoder (with convolutional layers).


Experiment: Overview

Two datasets:
- Caltech-256: 30,607 images from 256 classes
- Animals with Attributes (AwA): 30,475 images from 50 classes

Three categories of baseline methods:
- Supervised learning: SVM and CNN
- Transfer learning: ASVM, GFK, LAN, DTL, and TTL
- Self-taught learning

Three experiments:
- Source and target domains are distant
- Visualizing some intermediate-domain data
- Evaluating the learning order of SLA


Experiment: Performance Comparison

Average accuracies of the different algorithms on Caltech-256 and AwA:

Conclusion: for distant domains, the SLA algorithm outperforms the other methods.


Experiment: Algorithm Visualization

Visualization of the selected intermediate data over the iterations of face-to-airplane and face-to-watch:

Conclusions:
- At the beginning, the selected intermediate data are similar to the source domain; in the end, they are more similar to the target domain.
- The number of selected positive examples in the source domain decreases, and the value of the objective function also decreases.


Experiment: Learning Orders

Comparison results with different learning orders on the Caltech-256 and AwA datasets (the order of the intermediate domains is changed):

Conclusion: all three alternative orders obtain worse results than SLA; the Category order comes closest because its selection strategy resembles SLA's.


Related Work

DDTL is a novel, difficult problem to which many state-of-the-art methods do not apply.

- Typical transfer learning approaches such as instance reweighting [DYXY07] and feature mapping [PTKY11] do not apply to this problem, as they assume the domains are close.
- Transitive transfer learning [TSZY15] manually selects one intermediate domain as the bridge; ours automatically selects from many domains.
- TLMS [MMR09]: all the source domains in TLMS are labeled and closely related to the target domain.
- Self-taught learning [RBL+07] uses all the domain data for learning; ours uses selected intermediate domains.
- Semi-supervised autoencoder [WRMC12] uses labeled and unlabeled data for learning; ours uses intermediate domains and convolutional layers.


Conclusion

Contributions of this paper:
- The first work to study the DDTL problem using a mixture of intermediate domains
- Proposes the SLA algorithm for the DDTL problem
- Extensive experiments on real-world datasets show the effectiveness of SLA

What we can learn from this paper:
- Good layout; consider it a template for an algorithm paper
- The introduction, tables, and figures are good
- Experiments: more datasets, more analysis


References

[DYXY07] Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning, pages 193-200. ACM, 2007.

[MMR09] Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation with multiple sources. In Advances in Neural Information Processing Systems, pages 1041-1048, 2009.

[PTKY11] Sinno Jialin Pan, Ivor W. Tsang, James T. Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199-210, 2011.

[PY10] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, 2010.

[RBL+07] Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y. Ng. Self-taught learning: transfer learning from unlabeled data. In Proceedings of the 24th International Conference on Machine Learning, pages 759-766. ACM, 2007.

[TSZY15] Ben Tan, Yangqiu Song, Erheng Zhong, and Qiang Yang. Transitive transfer learning. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1155-1164. ACM, 2015.

[wik] https://en.wikipedia.org/wiki/Inductive_transfer.

[WRMC12] Jason Weston, Frédéric Ratle, Hossein Mobahi, and Ronan Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639-655. Springer, 2012.


Q & A


Thank you for listening!