

Research Center for Ubiquitous Computing System, Institute of Computing Technology, Chinese Academy of Sciences

Distant Domain Transfer Learning*

Jindong Wang, [email protected]

May 2017

* This slide deck is based on the AAAI-17 paper: Ben Tan, Yu Zhang, Sinno Jialin Pan and Qiang Yang: Distant domain transfer learning.


Content

- Background of transfer learning
- Introduction: authors; DDTL (Distant Domain Transfer Learning)
- Selective Learning Algorithm for DDTL: instance selection via reconstruction error; incorporation of side information; learning algorithm
- Experiments and Analysis
- Related Work
- Conclusion

Jindong Wang | Distant Domain Transfer Learning


Background: What is transfer learning?

Problems:
- Building every model from scratch is time-consuming and expensive.
- There is a lot of existing knowledge. Can we reuse it?


Background: Brief

Common definition
- Wikipedia: the research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem [wik].

TL vs traditional ML

Traditional ML:
- Training and testing samples must follow the same feature distributions.
- Training samples must be sufficient.

Transfer learning:
- Source and target domains do not need to follow the same distributions.
- Fewer training samples are required, possibly none.


Background: Applications

Example: sentiment classification. DVD → Electronics: we only have sentiment labels for DVD reviews; how can we transfer them to electronics?

Proceedings
- Data mining: ACM SIGKDD, IEEE ICDM, PKDD
- ML & AI: ICML, NIPS, AAAI, IJCAI, ECML
- Applications: ACM TIST, ACM SIGIR, WWW, ACL, IEEE TKDE

Applications include image classification, natural language processing, activity recognition, and WiFi localization.


Introduction: Author Information

This paper has four authors:

Ben Tan
- Ph.D. candidate at HKUST

Yu Zhang
- Research associate at HKUST

Sinno Jialin Pan
- Assistant professor at NTU
- Google Scholar citations: 4,000+
- http://www.ntu.edu.sg/home/sinnopan/

Qiang Yang
- Head of CSE, HKUST
- Fellow of AAAI/IEEE/AAAS/IAPR
- Google Scholar citations: 30,000+
- http://www.cs.ust.hk/~qyang/


Introduction: Distant Domain Transfer Learning

Traditional TL: the source and target domains are close [PY10].
DDTL: the source and target domains can be totally different!

- Task 1: Cat → Tiger; traditional TL performs well.
- Task 2: Face → Airplane; traditional TL performs badly.

How can we conduct transfer learning when the source and target domains are totally different?


Introduction: Problem Definition

Distant domain transfer learning (DDTL): exploit the unlabeled data in the intermediate domains to build a bridge between the source and target domains.

Input:
- Labeled source domain S = {(x_S^1, y_S^1), ..., (x_S^{n_S}, y_S^{n_S})}
- Target domain T = {(x_T^1, y_T^1), ..., (x_T^{n_T}, y_T^{n_T})} (only a few instances are labeled)
- Mixture of unlabeled intermediate domains: I = {x_I^1, ..., x_I^{n_I}}

Output:
- Labels of the target domain

Constraints:
- p_T(x) ≠ p_S(x), p_T(x) ≠ p_I(x), and p_T(y|x) ≠ p_S(y|x)
- The similarity between S and T is very small


SLA Algorithm: Autoencoder

The Selective Learning Algorithm (SLA), based on the autoencoder, is proposed to solve the DDTL problem.

Autoencoder: an unsupervised feed-forward neural network with an input layer, a hidden layer, and an output layer.

- Encoding: h = f_e(x)
- Decoding: x̂ = f_d(h)
- Objective: min Σ_{i=1}^n ||x̂_i − x_i||²₂

To capture spatial information, a convolutional autoencoder is preferred.
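To make the three pieces above concrete, here is a minimal sketch of a linear autoencoder trained by gradient descent on toy data. This is not the paper's architecture; all names, shapes, and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 10-D that lie on a 3-D subspace,
# so a 3-unit bottleneck can reconstruct them almost perfectly.
basis = rng.normal(size=(3, 10))
X = rng.normal(size=(200, 3)) @ basis

d, k = 10, 3
W1 = rng.normal(scale=0.1, size=(d, k))   # encoder: h = f_e(x) = x W1
W2 = rng.normal(scale=0.1, size=(k, d))   # decoder: x_hat = f_d(h) = h W2

def loss(X):
    """Objective: mean squared reconstruction error ||x_hat - x||^2."""
    X_hat = (X @ W1) @ W2
    return np.mean(np.sum((X_hat - X) ** 2, axis=1))

initial = loss(X)
lr = 0.005
for _ in range(1000):
    H = X @ W1
    G = 2.0 * (H @ W2 - X) / len(X)   # gradient of loss w.r.t. X_hat
    gW2 = H.T @ G                     # gradient w.r.t. decoder weights
    gW1 = X.T @ (G @ W2.T)            # gradient w.r.t. encoder weights
    W2 -= lr * gW2
    W1 -= lr * gW1
```

Because the data lie exactly on a rank-3 subspace, the reconstruction error shrinks toward zero; points off that subspace would keep a large error, which is exactly the signal SLA later uses for instance selection.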

Jindong Wang | Distant Domain Transfer Learning

Page 11: Distant Domain Transfer Learningajd92.wang/assets/files/l09_ddtl.pdf · 2020. 7. 15. · 07]Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y Ng. Self-taught

9

SLA Algorithm: Instance Selection via Reconstruction Error

Motivation: if data from the source or intermediate domain are similar and useful to the target domain, then one should be able to find a pair of encoding and decoding functions with small reconstruction error on them.

Objective: learn a pair of encoding and decoding functions by minimizing the reconstruction errors on the source, intermediate, and target domains simultaneously:

J_1(f_e, f_d, v_S, v_I) = (1/n_S) Σ_{i=1}^{n_S} v_S^i ||x̂_S^i − x_S^i||²₂ + (1/n_I) Σ_{i=1}^{n_I} v_I^i ||x̂_I^i − x_I^i||²₂ + (1/n_T) Σ_{i=1}^{n_T} ||x̂_T^i − x_T^i||²₂ + R(v_S, v_I)    (1)

v_S and v_I are instance-selection indicators for the source and intermediate domains, and R(·, ·) is the regularization term:

R(v_S, v_I) = −(λ_S/n_S) Σ_{i=1}^{n_S} v_S^i − (λ_I/n_I) Σ_{i=1}^{n_I} v_I^i    (2)
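Eqs. (1) and (2) can be transcribed almost directly, assuming the per-instance squared reconstruction errors have already been computed. The function and variable names below are mine, not the paper's:

```python
import numpy as np

def j1(err_S, err_I, err_T, v_S, v_I, lam_S, lam_I):
    """Selection-weighted reconstruction objective, Eq. (1).

    err_*: per-instance squared reconstruction errors ||x_hat - x||^2.
    v_S, v_I: 0/1 selection indicators for source / intermediate instances.
    The regularizer R of Eq. (2) rewards selecting instances, so the
    trivial solution v = 0 is not optimal.
    """
    rec = (np.mean(v_S * err_S)      # source term, selected instances only
           + np.mean(v_I * err_I)    # intermediate term
           + np.mean(err_T))         # every target instance is always kept
    reg = -lam_S * np.mean(v_S) - lam_I * np.mean(v_I)   # R(v_S, v_I)
    return rec + reg
```

Dropping an instance removes its reconstruction error from the objective but pays a penalty through R, so only instances whose error exceeds the threshold set by λ are worth excluding.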


SLA Algorithm: Incorporation of Side Information

Learning with Eq. (1) is unsupervised, so we incorporate some side information:

J_2(f_c, f_e, f_d) = (1/n_S) Σ_{i=1}^{n_S} v_S^i L(y_S^i, f_c(h_S^i)) + (1/n_T) Σ_{i=1}^{n_T} L(y_T^i, f_c(h_T^i)) + (1/n_I) Σ_{i=1}^{n_I} v_I^i g(f_c(h_I^i))    (3)

f_c(·) is a classification function and g(·) is the entropy function: g(z) = −z ln z − (1 − z) ln(1 − z) for 0 ≤ z ≤ 1.

Overall objective function:

min_{Θ,v} J = J_1 + J_2,  s.t. v_S^i, v_I^i ∈ {0, 1}    (4)

where Θ denotes all the parameters of f_c(·), f_d(·), and f_e(·), and v = {v_S, v_I}.
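The entropy term is easy to check numerically. A small sketch (the endpoint clipping is my addition, to avoid log(0)):

```python
import numpy as np

def g(z):
    """Binary entropy g(z) = -z ln z - (1 - z) ln(1 - z) on [0, 1].
    Small g(f_c(h)) means the classifier is confident about the instance."""
    z = np.clip(z, 1e-12, 1.0 - 1e-12)   # guard the endpoints z = 0, 1
    return -z * np.log(z) - (1.0 - z) * np.log(1.0 - z)
```

g peaks at ln 2 for z = 0.5 and vanishes at z = 0 or 1, so minimizing the third term of Eq. (3) drives f_c toward confident predictions on the selected intermediate instances, even though those instances carry no labels.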


Learning Algorithm: Learning Strategy

Technique: Block Coordinate Descent (BCD), where in each iteration the variables in one block are optimized while the other variables are kept fixed:

- Fix v, update Θ using back-propagation.
- Fix Θ, obtain v in closed form:

v_S^i = 1 if L(y_S^i, f_c(f_e(x_S^i))) + ||x̂_S^i − x_S^i||²₂ < λ_S, and 0 otherwise    (5)

v_I^i = 1 if ||x̂_I^i − x_I^i||²₂ + g(f_c(f_e(x_I^i))) < λ_I, and 0 otherwise    (6)

According to these equations, only samples with low reconstruction error and high prediction confidence are selected and used.
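The closed-form selection step in Eqs. (5)-(6) is just a thresholding operation. A sketch, assuming the per-instance losses, errors, and entropies have been precomputed (names are mine):

```python
import numpy as np

def update_v(loss_S, err_S, err_I, ent_I, lam_S, lam_I):
    """Fix Theta, obtain v: Eqs. (5) and (6).

    loss_S: classification loss L(y_S^i, f_c(f_e(x_S^i))) per source instance.
    err_S, err_I: squared reconstruction errors per instance.
    ent_I: entropy g(f_c(f_e(x_I^i))) per intermediate instance.
    """
    v_S = (loss_S + err_S < lam_S).astype(int)   # Eq. (5)
    v_I = (err_I + ent_I < lam_I).astype(int)    # Eq. (6)
    return v_S, v_I
```

An instance survives only if the sum of its reconstruction error and its loss (or entropy) stays below the threshold, i.e. it is both well reconstructed and confidently classified.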


Learning Algorithm

The learning algorithm is as follows:
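Putting the two BCD steps together, here is a minimal runnable sketch of the alternation, simplified to the unsupervised part (reconstruction-only selection on intermediate data, linear autoencoder, toy data). Every name and constant is illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target data lie on a 2-D subspace of an 8-D space; half of the
# "intermediate" pool shares that subspace (useful), half is noise (junk).
basis = 2.0 * rng.normal(size=(2, 8))
X_T = rng.normal(size=(50, 2)) @ basis
X_I = np.vstack([rng.normal(size=(50, 2)) @ basis,   # useful half
                 3.0 * rng.normal(size=(50, 8))])    # junk half
lam_I = 6.0                                          # selection threshold

W1 = rng.normal(scale=0.1, size=(8, 2))              # encoder
W2 = rng.normal(scale=0.1, size=(2, 8))              # decoder

def errors(X):
    """Per-instance squared reconstruction error."""
    return np.sum(((X @ W1) @ W2 - X) ** 2, axis=1)

v_I = np.ones(len(X_I), dtype=int)                   # start by keeping all
for _ in range(30):
    # Block 1: fix v, update Theta by gradient descent on selected data.
    X_train = np.vstack([X_I[v_I == 1], X_T])        # target always kept
    for _ in range(100):
        H = X_train @ W1
        G = 2.0 * (H @ W2 - X_train) / len(X_train)
        gW2 = H.T @ G
        gW1 = X_train.T @ (G @ W2.T)
        W2 -= 0.005 * gW2
        W1 -= 0.005 * gW1
    # Block 2: fix Theta, re-select instances by reconstruction error.
    v_I = (errors(X_I) < lam_I).astype(int)
```

Over the iterations the junk half keeps a large reconstruction error and is dropped, while intermediate instances on the target subspace remain selected; the full SLA additionally updates the classifier and the source indicators v_S in the same alternation.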


Learning Algorithm: Deep Architecture

Adding convolutional layers to the network yields a generalized autoencoder, i.e., a convolutional autoencoder with side information.
- SAN: supervised autoencoder (fully connected layers only).
- SCAN: supervised convolutional autoencoder (with convolutional layers).


Experiment: Overview

Two datasets:
- Caltech-256: 30,607 images from 256 classes
- Animals with Attributes (AwA): 30,475 images from 50 classes

Three categories of baseline methods:
- Supervised learning: SVM and CNN
- Transfer learning: ASVM, GFK, LAN, DTL, and TTL
- Self-taught learning

Three experiments:
- Source and target domains are distant
- Visualizing some intermediate-domain data
- Evaluating the learning order of SLA


Experiment: Performance Comparison

Average accuracies of the different algorithms on Caltech-256 and AwA:

Conclusion: for distant domains, the SLA algorithm outperforms the other methods.


Experiment: Algorithm Visualization

Visualization of the selected intermediate data over the iterations of face-to-airplane and face-to-watch:

Conclusions:
- At the beginning, the selected intermediate data are similar to the source domain; in the end, they are more similar to the target domain.
- The number of selected positive examples in the source domain decreases, and the value of the objective function also decreases.


Experiment: Learning Orders

Comparison results with different learning orders on the Caltech-256 and AwA datasets (the order of the intermediate domains is changed):

Conclusion: all three alternative orders obtain worse results than SLA; the Category order comes closest because its selection strategy resembles SLA's.


Related Work

DDTL is a novel, difficult problem to which many state-of-the-art methods do not apply.

- Typical transfer learning approaches such as instance reweighting [DYXY07] and feature mapping [PTKY11] do not apply to this problem, as they assume the domains are close.
- Transitive transfer learning [TSZY15] manually selects one intermediate domain as the bridge; ours automatically selects from many domains.
- TLMS [MMR09]: all the source domains in TLMS are labeled and closely related to the target domain.
- Self-taught learning [RBL+07] uses all the domain data for learning; ours uses selected intermediate domains.
- Semi-supervised autoencoder [WRMC12] uses labeled and unlabeled data for learning; ours uses intermediate domains and convolutional layers.


Conclusion

Contributions of this paper:
- The first work to study the DDTL problem using a mixture of intermediate domains
- Proposes the SLA algorithm for the DDTL problem
- Extensive experiments on real-world datasets show the effectiveness of SLA

What we can learn from this paper:
- Good layout; consider it a template for an algorithm paper
- The introduction, tables, and figures are good
- Experiments: more datasets, more analysis


References

[DYXY07] Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning, pages 193-200. ACM, 2007.

[MMR09] Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation with multiple sources. In Advances in Neural Information Processing Systems, pages 1041-1048, 2009.

[PTKY11] Sinno Jialin Pan, Ivor W. Tsang, James T. Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199-210, 2011.

[PY10] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, 2010.

[RBL+07] Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y. Ng. Self-taught learning: transfer learning from unlabeled data. In Proceedings of the 24th International Conference on Machine Learning, pages 759-766. ACM, 2007.

[TSZY15] Ben Tan, Yangqiu Song, Erheng Zhong, and Qiang Yang. Transitive transfer learning. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1155-1164. ACM, 2015.

[wik] https://en.wikipedia.org/wiki/Inductive_transfer.

[WRMC12] Jason Weston, Frédéric Ratle, Hossein Mobahi, and Ronan Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639-655. Springer, 2012.


Q & A


Thank you for listening!