Feature Selection with Linked Data in Social Media · cse.msu.edu/~tangjili/publication/SDM12.pdf
Data Mining and Machine Learning Lab
Feature Selection with Linked Data
in Social Media
Jiliang Tang and Huan Liu
Computer Science and Engineering
Arizona State University
April 26-28, 2012 SDM2012
Social Media
• The explosion of social media generates massive
data at an unprecedented rate
- 200 million tweets per day
- 3,000 photos on Flickr per minute
- 153 million blogs posted per year
Social Media Data
• Massive, high-dimensional social media data
pose challenges to data mining tasks
- Scalability
- Curse of dimensionality
• Feature selection is an effective way to prepare
large-scale, high-dimensional data for
data mining
Feature Selection
• Traditional feature selection algorithms
work with "flat" data (attribute-value data)
- Assumed independent and identically distributed (i.i.d.)
• Social media data differs from
attribute-value data
- Inherently linked
An Example of Social Media Data
[Figure: an example network of four users (𝑢1 to 𝑢4) and eight posts (𝑝1 to 𝑝8); successive builds of the slide highlight the users, the posts, the user-post relations, and the user-user following relations.]
Representation for Attribute Value Data
[Figure: a data matrix with one row per post (𝑝1 to 𝑝8), feature columns 𝑓1 … 𝑓𝑚, and label columns 𝑐1 … 𝑐𝑘; successive builds of the slide highlight the posts, the features, and the labels.]
Representation for Social Media Data
[Figure: the attribute-value representation (posts 𝑝1 to 𝑝8, features 𝑓1 … 𝑓𝑚, labels 𝑐1 … 𝑐𝑘) extended with its social context: user-post relations linking users 𝑢1 to 𝑢4 to their posts, and a binary user-user relation matrix over 𝑢1 to 𝑢4. Successive builds of the slide highlight the user-post relations, the user-user relations, and the social context as a whole.]
Problem Statement
• Given labeled data X with its label indicator matrix Y, the
whole dataset F, and its social context (user-user following
relationships S and user-post relationships P), we aim to
select the K most relevant of the m features of F by using
both the labeled data and the social context S and P.
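To make the notation concrete, here is a minimal sketch of how the inputs might be laid out as arrays. The matrix orientations (features as rows, posts as columns), the toy sizes, the authorship assignment, and the following edges are all assumptions made for illustration; the slide names the symbols but not their shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

m, N, n, k = 6, 8, 4, 2   # features, posts, users, classes (toy sizes)
N_l = 5                   # number of labeled posts

# F: feature-by-post matrix for the whole dataset, one column per post.
F = rng.random((m, N))

# P: user-post relations, P[u, p] = 1 if user u wrote post p.
# Hypothetical authorship: u1 -> p1,p2; u2 -> p3,p4,p5; u3 -> p6,p7; u4 -> p8.
author = np.array([0, 0, 1, 1, 1, 2, 2, 3])
P = np.zeros((n, N), dtype=int)
P[author, np.arange(N)] = 1

# S: user-user following, S[i, j] = 1 if user i follows user j (hypothetical edges).
S = np.zeros((n, n), dtype=int)
S[0, 3] = S[2, 3] = 1     # u1 and u3 follow u4
S[3, 0] = S[3, 1] = 1     # u4 follows u1 and u2

# X, Y: the labeled subset of F and its one-hot label indicator matrix.
labeled = np.arange(N_l)
labels = rng.integers(0, k, size=N_l)
X = F[:, labeled]
Y = np.eye(k, dtype=int)[labels]

print(F.shape, X.shape, Y.shape, S.shape, P.shape)
```

The goal is then to pick K of the m rows of F, using S and P in addition to X and Y.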
Two Fundamental Problems
• Relation extraction
- Which distinctive relations can be extracted
from linked data?
• Mathematical representation
- How can these relations be used in the feature
selection formulation?
Relation Extraction
[Figure: the example network of users 𝑢1 to 𝑢4 and posts 𝑝1 to 𝑝8.]
coPost
• A user can have
multiple posts
coFollowing
• Two users follow a third user
[Figure: users 𝑢1, 𝑢3 and 𝑢4 with their posts 𝑝1, 𝑝2, 𝑝6, 𝑝7 and 𝑝8.]
coFollowed
• Two users are followed by a third user
[Figure: users 𝑢1, 𝑢2 and 𝑢4 with their posts 𝑝1 to 𝑝5 and 𝑝8.]
Following
• A user follows another user
[Figure: users 𝑢1 and 𝑢2 with their posts 𝑝1 to 𝑝5.]
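The four relation types (coPost, coFollowing, coFollowed, following) can be read off the two relation matrices. The sketch below is one possible way to do that; the function name `extract_relations`, the convention `S[i, j] = 1` when user i follows user j, and the toy edges are assumptions for illustration, not taken from the slides.

```python
import numpy as np
from itertools import combinations

def extract_relations(S, P):
    """Enumerate the four relation types from following matrix S and authorship matrix P.

    Assumed conventions: S[i, j] = 1 if user i follows user j (zero diagonal),
    P[u, p] = 1 if user u wrote post p.
    """
    n = S.shape[0]
    posts_of = [np.flatnonzero(P[u]).tolist() for u in range(n)]

    # coPost: pairs of posts written by the same user.
    co_post = [pair for posts in posts_of for pair in combinations(posts, 2)]

    # following: ordered pairs (i, j) where user i follows user j.
    following = [(i, j) for i in range(n) for j in range(n) if i != j and S[i, j]]

    # coFollowing: pairs of users who follow the same third user.
    co_following = sorted({
        pair
        for k in range(n)
        for pair in combinations(np.flatnonzero(S[:, k]).tolist(), 2)
    })

    # coFollowed: pairs of users who are followed by the same third user.
    co_followed = sorted({
        pair
        for k in range(n)
        for pair in combinations(np.flatnonzero(S[k]).tolist(), 2)
    })

    return {
        "coPost": co_post,
        "following": following,
        "coFollowing": co_following,
        "coFollowed": co_followed,
    }

# Toy network (hypothetical): four users, eight posts, zero-based indices.
author = np.array([0, 0, 1, 1, 1, 2, 2, 3])
P = np.zeros((4, 8), dtype=int)
P[author, np.arange(8)] = 1
S = np.zeros((4, 4), dtype=int)
S[0, 3] = S[2, 3] = 1   # users 0 and 2 both follow user 3
S[3, 0] = S[3, 1] = 1   # user 3 follows users 0 and 1

relations = extract_relations(S, P)
for name, pairs in relations.items():
    print(name, pairs)
```

User-level relations translate into post-level relations by pairing the posts of the users involved, which is what the hypotheses on the following slides rely on.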
Post-Post relations
• What do these relations suggest for posts?
Social Correlation Theories
• Homophily
- People with similar interests are more likely to be
linked
• Social influence
- People that are linked are more likely to have
similar interests
CoPost Hypothesis
• CoPost Hypothesis
- Posts by the same user are more likely to be on
similar topics
[Figure: user 𝑢2 with posts 𝑝3, 𝑝4 and 𝑝5.]
CoFollowing Hypothesis
• CoFollowing Hypothesis
- If two users follow the same user, their posts are
likely to be on similar topics
[Figure: users 𝑢1, 𝑢3 and 𝑢4 with their posts 𝑝1, 𝑝2, 𝑝6, 𝑝7 and 𝑝8.]
CoFollowed Hypothesis
• CoFollowed Hypothesis
- If two users are followed by the same user, their
posts are likely to be on similar topics
[Figure: users 𝑢1, 𝑢2 and 𝑢4 with their posts 𝑝1 to 𝑝5 and 𝑝8.]
Following Hypothesis
• Following Hypothesis
- If one user follows another, their posts are more
likely to be on similar topics
[Figure: users 𝑢1 and 𝑢2 with their posts 𝑝1 to 𝑝5.]
Modeling CoFollowing Relation
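• Users' topic interests
$$\hat{T}(u_k) = \frac{\sum_{f_i \in F_k} T(f_i)}{|F_k|} = \frac{\sum_{f_i \in F_k} W^T f_i}{|F_k|}$$
• Two co-following users have similar topics of interest
$$\min_W \; \|X^T W - Y\|_F^2 + \alpha \|W\|_{2,1} + \beta \sum_{u} \sum_{u_i, u_j \in N_u} \|\hat{T}(u_i) - \hat{T}(u_j)\|_2^2$$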
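A small numerical sketch may help read the co-following model: each user's topic interest is the mean of W^T f_i over that user's posts, and the penalty sums squared distances between users who follow the same user. Everything below is an illustration under assumptions: the function names are hypothetical, F is taken to be features-by-posts, `S[i, j] = 1` means user i follows user j, N_u is taken to be the followers of u, each pair is counted once per commonly followed user, and alpha and beta are trade-off weights.

```python
import numpy as np
from itertools import combinations

def user_topic_interests(W, F, P):
    """Return a k-by-n matrix whose column u is the mean of W^T f_i over posts by user u."""
    T = W.T @ F                    # k x N: topic scores of every post
    counts = P.sum(axis=1)         # number of posts per user (assumed > 0)
    return (T @ P.T) / counts      # average the posts of each user

def cofollowing_penalty(W, F, P, S):
    """Sum of squared distances between topic interests of co-following users."""
    T_hat = user_topic_interests(W, F, P)
    total = 0.0
    for k in range(S.shape[0]):
        followers = np.flatnonzero(S[:, k])       # users who follow user k
        for i, j in combinations(followers, 2):   # each unordered pair once
            total += np.sum((T_hat[:, i] - T_hat[:, j]) ** 2)
    return total

def objective(W, X, Y, F, P, S, alpha, beta):
    """Least-squares fit + alpha * l2,1 norm + beta * co-following penalty."""
    fit = np.linalg.norm(X.T @ W - Y, "fro") ** 2
    l21 = np.linalg.norm(W, axis=1).sum()         # sum of row norms of W
    return fit + alpha * l21 + beta * cofollowing_penalty(W, F, P, S)

# Tiny hypothetical example: three users with one post each, two features.
F = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
P = np.eye(3)                      # user u wrote post u
S = np.zeros((3, 3))
S[0, 2] = S[1, 2] = 1              # users 0 and 1 both follow user 2
W = np.eye(2)

penalty = cofollowing_penalty(W, F, P, S)
print(penalty)
```

The l2,1 term sums the Euclidean norms of the rows of W, so driving a row to zero discards the corresponding feature.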
A Reformulation of CoFollowing Relation
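• It is equivalent to
$$\min_W \; \mathrm{Tr}(W^T B W - 2 E W) + \alpha \|W\|_{2,1}$$
where
$$B = X X^T + \beta F H L_{FI} H^T F^T, \qquad E = Y^T X^T$$
$$H(i,j) = \begin{cases} \dfrac{1}{|F_j|} & \text{if } u_j \text{ is the author of } p_i \\ 0 & \text{otherwise} \end{cases}$$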
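The equivalence can be checked numerically on random data. The sketch below assumes that L_FI is the graph Laplacian of a co-following matrix FI, with FI(i, j) counting the users that both u_i and u_j follow, and that each co-following pair is counted once per commonly followed user; these are illustrative assumptions, as are the toy sizes and the value of beta. Under them, the pairwise penalty equals Tr(W^T F H L_FI H^T F^T W), and the least-squares term expands to Tr(W^T X X^T W - 2 Y^T X^T W) plus the constant Tr(Y^T Y), which does not depend on W and is dropped from the minimization.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
m, N, n, k, N_l = 5, 8, 4, 3, 6    # toy sizes (hypothetical)
beta = 0.5

F = rng.standard_normal((m, N))                 # features x posts
author = np.array([0, 0, 1, 1, 1, 2, 2, 3])     # hypothetical authorship
P = np.zeros((n, N))
P[author, np.arange(N)] = 1
S = np.zeros((n, n))                            # S[i, j] = 1: user i follows user j
S[0, 3] = S[1, 3] = S[2, 3] = 1
S[3, 0] = S[3, 1] = 1
X = F[:, :N_l]
Y = np.eye(k)[rng.integers(0, k, size=N_l)]
W = rng.standard_normal((m, k))

# H[i, j] = 1/|F_j| if user j is the author of post i, else 0.
H = P.T / P.sum(axis=1)

# Co-following matrix and its Laplacian (assumed construction).
FI = S @ S.T                  # FI[i, j] = number of users followed by both i and j
np.fill_diagonal(FI, 0)
L_FI = np.diag(FI.sum(axis=1)) - FI

B = X @ X.T + beta * (F @ H @ L_FI @ H.T @ F.T)
E = Y.T @ X.T
trace_form = np.trace(W.T @ B @ W - 2 * E @ W) + np.trace(Y.T @ Y)

# Direct form: least-squares fit plus the pairwise co-following penalty.
T_hat = W.T @ F @ H
pairwise = sum(
    np.sum((T_hat[:, i] - T_hat[:, j]) ** 2)
    for u in range(n)
    for i, j in combinations(np.flatnonzero(S[:, u]), 2)
)
direct_form = np.linalg.norm(X.T @ W - Y, "fro") ** 2 + beta * pairwise

print(trace_form, direct_form)
```

The l2,1 term is identical in both forms, so it is left out of the comparison.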
A Unique Problem for LinkedFS
• The LinkedFS framework is designed to solve
the following optimization problem
$$\min_W \; \mathrm{Tr}(W^T B W - 2 E W) + \alpha \|W\|_{2,1}$$
LinkedFS
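The algorithm shown on this slide did not survive in the transcript. As a rough sketch only, a problem of the form min_W Tr(W^T B W - 2EW) + alpha * ||W||_{2,1} is commonly handled with an iteratively reweighted scheme: fix a diagonal matrix D with D_ii = 1 / (2 ||W(i,:)||_2), solve (B + alpha * D) W = E^T, and repeat. Whether this matches the authors' procedure step for step is an assumption; the function names `solve_l21` and `top_k_features`, the iteration count, and the clipping constant `eps` are hypothetical.

```python
import numpy as np

def solve_l21(B, E, alpha, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver for min_W Tr(W^T B W - 2 E W) + alpha * ||W||_{2,1}.

    B is m x m and symmetric, E is k x m; returns W of shape m x k.
    """
    m = B.shape[0]
    D = np.eye(m)                                   # initial reweighting matrix
    for _ in range(n_iter):
        # Stationarity with D fixed: (B + alpha * D) W = E^T.
        W = np.linalg.solve(B + alpha * D, E.T)
        row_norms = np.linalg.norm(W, axis=1)
        # Clip tiny row norms so the weights stay finite.
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    return W

def top_k_features(W, K):
    """Rank features by the l2 norm of their row in W and keep the K largest."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores)[:K]

# Toy check (hypothetical data): only the first two features determine Y.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 50))    # features x labeled posts
Y = X[:2].T.copy()                  # targets depend on features 0 and 1 only
B = X @ X.T                         # beta = 0: no link term in this toy case
E = Y.T @ X.T

W = solve_l21(B, E, alpha=0.1)
selected = top_k_features(W, 2)
print(sorted(selected.tolist()))
```

Ranking rows of W by their norm is what turns the learned weight matrix into a selection of K features.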
Datasets
• BlogCatalog
- Undirected following
http://dmml.asu.edu/users/xufei/datasets.html
• Digg
- Directed Following
http://www.public.asu.edu/~ylin56/kdd09sup.html
Data Characteristics
Experiment Setting
• Metric
- Classification accuracy
- Classifier: LibSVM
• Baseline methods
- t-test (TT)
- Information Gain (IG)
- Fisher Score (FS)
- Joint ℓ2,1-Norms (RFS)
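For orientation, one of the baselines can be sketched in a few lines. The version below uses a common textbook definition of the Fisher score (between-class scatter over within-class scatter, per feature); the slides do not say which variant was used, so treat the formula, the function name `fisher_score`, and the features-by-samples orientation as assumptions.

```python
import numpy as np

def fisher_score(X, labels, eps=1e-12):
    """Per-feature Fisher score for X of shape (features, samples).

    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    """
    overall_mean = X.mean(axis=1)
    between = np.zeros(X.shape[0])
    within = np.zeros(X.shape[0])
    for c in np.unique(labels):
        Xc = X[:, labels == c]          # samples of class c
        n_c = Xc.shape[1]
        between += n_c * (Xc.mean(axis=1) - overall_mean) ** 2
        within += n_c * Xc.var(axis=1)
    return between / (within + eps)     # eps guards against zero variance

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = np.array([[0.0, 0.1, 1.0, 1.1],
              [0.0, 1.0, 0.0, 1.0]])
labels = np.array([0, 0, 1, 1])

scores = fisher_score(X, labels)
print(scores)
```

Unlike the ℓ2,1-based methods, this score evaluates each feature independently and ignores link information.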
Training and Testing
• Testing (50%) and training (50%)
• Subsample 5%, 25%, 50% from training
data to construct another three training sets
• Numbers of selected features
- (50, 100, 200, 300)
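The protocol above can be sketched as an index-splitting helper. The function name `make_splits`, the random seed, and the choice to draw the three subsamples independently of each other (rather than nested) are assumptions; the slide gives only the percentages and the feature counts.

```python
import numpy as np

def make_splits(n_posts, fractions=(0.05, 0.25, 0.50), seed=0):
    """Split post indices 50/50 into test and training sets, then subsample training."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_posts)
    half = n_posts // 2
    test_idx, train_idx = perm[:half], perm[half:]

    train_sets = {1.0: train_idx}                   # the full training set
    for frac in fractions:
        size = max(1, int(round(frac * len(train_idx))))
        train_sets[frac] = rng.choice(train_idx, size=size, replace=False)
    return test_idx, train_sets

feature_counts = (50, 100, 200, 300)                # numbers of selected features

test_idx, train_sets = make_splits(1000)
for frac, idx in sorted(train_sets.items()):
    print(frac, len(idx))
```

Each of the four training sets would then be paired with each value in `feature_counts`, and accuracy measured on the same held-out test half.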
Results on Digg
Results on Digg
Performance Improvement
Conclusions
• Investigate a new problem of feature selection for
social media data
• Provide a way to capture link information guided
by social correlation theories
• Propose an effective framework, LinkedFS, for
social media feature selection
Future Work
• More sophisticated ways to exploit social context
• Lack of label information (unsupervised)
• Noisy and incomplete social media data
• The strength of social ties (strong and weak ties
mixed)
Acknowledgments
This work is, in part, sponsored by the National Science
Foundation via a grant (#0812551). Comments and
suggestions from DMML members and reviewers are
greatly appreciated.
Questions