Post on 19-Jan-2016
Towards Social User Profiling: Unified and Discriminative Influence Model
for Inferring Home Locations
Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, Kevin Chen-chuan Chang
University of Illinois at Urbana-Champaign
User profiling infers users’ essential attributes and is important for many services, such as personalized search, targeted advertisement, and many others.
[Figure: a user profile (Richard; Job: Student; Location: Champaign) used by search engines and advertisers.]
This paper aims to profile Twitter users’ home locations from both tweets and the following network.
Profiling a User’s Home Location in the Context of the Twitter Network
Input: user-centric data (tweets) and social network data (the following network).
Output: the user’s home location (e.g., Location: Champaign).
A user’s home location is defined as the place where most of his activities happen. It is different from a real-time geo-position (e.g., the Starbucks on Green Street).
[Figure: example following network with the users Jessie, Rob, Lady Gaga, Cindy, Richard, and TechCrunch.]
The problem is difficult due to the scarce signal challenge.
Tweets: Only 6% of messages contain location-related terms!
Following Network: Only 16% of users have locations on their profiles!
[Figure: example following network of Jessie, Rob, Lady Gaga, Cindy, Richard, and TechCrunch; the locations shown include Champaign, New York, and San Francisco, and the remaining ones are Unknown.]
The problem is difficult due to the noisy signal challenge.
Tweets: A user tweets about locations different from his home location.
Following Network: A user follows friends who live in locations different from his home location.
[Figure: example following network of Jessie, Rob, Lady Gaga, Cindy, Richard, and TechCrunch; the locations shown include Champaign, New York, and San Francisco, and the remaining ones are Unknown.]
We propose a unified and discriminative probabilistic framework to address the scarce signal challenge and the noisy signal challenge.
Unify the two types of resources as a Twitter graph.
Model the likelihood of an edge between two nodes via a discriminative influence model.
Profile locations by maximizing the likelihood of observing the graph.
We unify two types of resources as a Directed Heterogeneous Graph
We unify the two types of resources as nodes on a heterogeneous graph.
We model it as a directed graph.
We associate locations with the nodes.
We aim to infer the locations of unlabeled nodes from the locations of labeled nodes.
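[Figure: example directed heterogeneous graph with nodes u1 to u6, v1, and v2. Labeled nodes carry locations (New York, Champaign, Beijing, San Francisco); unlabeled nodes are marked "?". Each edge joins a tail node and a head node.]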
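A minimal sketch of how such a graph could be held in memory. The class names, the `kind` tags, and the coordinate values are illustrative assumptions, not the authors' implementation; the only things taken from the slides are that edges are directed (tail to head), that nodes come from two types of resources, and that some nodes are labeled with a location while others are not.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Location = Tuple[float, float]  # assumed (latitude, longitude) pair


@dataclass
class Node:
    kind: str                            # hypothetical tag, e.g. "user" or "tweet_term"
    location: Optional[Location] = None  # None marks an unlabeled node


@dataclass
class TwitterGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (tail, head)

    def add_node(self, name: str, kind: str,
                 location: Optional[Location] = None) -> None:
        self.nodes[name] = Node(kind, location)

    def add_edge(self, tail: str, head: str) -> None:
        self.edges.append((tail, head))

    def unlabeled(self) -> List[str]:
        """Names of nodes whose location still has to be inferred."""
        return [n for n, node in self.nodes.items() if node.location is None]

    def labeled_neighbors(self, name: str) -> List[str]:
        """Labeled nodes joined to `name` by an edge in either direction."""
        found: List[str] = []
        for tail, head in self.edges:
            if name == tail:
                other = head
            elif name == head:
                other = tail
            else:
                continue
            if self.nodes[other].location is not None and other not in found:
                found.append(other)
        return found


# Tiny illustrative graph (approximate coordinates, chosen for the example).
graph = TwitterGraph()
graph.add_node("u1", "user")                         # unlabeled user
graph.add_node("u2", "user", (40.71, -74.01))        # New York
graph.add_node("v1", "tweet_term", (40.12, -88.24))  # Champaign
graph.add_edge("u1", "u2")
graph.add_edge("u1", "v1")
```

Here `graph.unlabeled()` lists the nodes to profile, and `graph.labeled_neighbors("u1")` collects the labeled evidence available for `u1`.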
We observe two key characteristics of the probability of an edge between two nodes.
Observation 1: The probability decreases as the distance between the nodes increases.
Observation 2: At the same distance, different head nodes (e.g., Chicago, Champaign) have different probabilities of attracting tail nodes.
[Figure: Spread of Word "Champaign"; axes: latitude, longitude, count.]
We propose a discriminative influence model to capture the two key characteristics.
The model describes how likely a tail node nj at L(nj) is to build an edge e<ni, nj> to a head node ni at L(ni).
Conceptual Level: Discriminative Influence Model θni. Influence probabilities decrease from the center. Different nodes have different influence scopes.
Mathematical Level: Gaussian Model
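$$P(e\langle n_i, n_j\rangle \mid \theta_{n_i}, L(n_j)) = \frac{1}{2\pi\sigma_{n_i}^{2}} \exp\left(-\frac{(x_{n_j}-x_{n_i})^{2} + (y_{n_j}-y_{n_i})^{2}}{2\sigma_{n_i}^{2}}\right)$$
where $(x_n, y_n)$ are the coordinates of $L(n)$ and $\sigma_{n_i}$ is the influence scope of the head node $n_i$.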
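The Gaussian model can be illustrated with a short sketch. It assumes an isotropic two-dimensional Gaussian centered on the head node, with a single spread parameter `sigma` standing for the head node's influence scope; the function name and the use of plain planar coordinates are assumptions made for the example.

```python
import math


def edge_probability(head_xy, tail_xy, sigma):
    """Density of a tail node at `tail_xy` building an edge to a head node
    at `head_xy`, under an isotropic 2-D Gaussian with spread `sigma`
    (assumed form of the influence model)."""
    dx = tail_xy[0] - head_xy[0]
    dy = tail_xy[1] - head_xy[1]
    squared_distance = dx * dx + dy * dy
    normalizer = 2.0 * math.pi * sigma ** 2
    return math.exp(-squared_distance / (2.0 * sigma ** 2)) / normalizer


# Observation 1: for a fixed head node, the value drops with distance.
p_near = edge_probability((0.0, 0.0), (1.0, 0.0), sigma=1.0)
p_far = edge_probability((0.0, 0.0), (3.0, 0.0), sigma=1.0)

# Observation 2: at the same distance, heads with different influence
# scopes give different values.
p_narrow = edge_probability((0.0, 0.0), (3.0, 0.0), sigma=1.0)
p_wide = edge_probability((0.0, 0.0), (3.0, 0.0), sigma=3.0)
```

A head node with a large `sigma` spreads its influence thinly over a wide area, while a small `sigma` concentrates it near the center, which is how one parameter per node can capture both observations.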
A local profiling algorithm profiles the location of a user via the edges from and to his labeled neighbors.
It is a simple but efficient closed-form solution.
[Figure: an unlabeled node "?" connected to its labeled neighbors among u1, u2, u4, u5, v1, and v2, with locations Champaign, New York, Beijing, and San Francisco.]
Influence Scope: Average Distance of a User’s Followers
User Location: Weighted Average of Different Resources
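A sketch of what such a closed-form local estimate could look like. Under an isotropic Gaussian influence model (an assumption here), maximizing the likelihood over the unknown location gives an average of the labeled neighbors' locations weighted by 1/sigma², so evidence from nodes with a small influence scope counts more. The function names and the planar Euclidean distance are illustrative choices, not the authors' code.

```python
import math


def influence_scope(head_xy, follower_xys):
    """Average distance from a node to its labeled followers."""
    distances = [math.dist(head_xy, f) for f in follower_xys]
    return sum(distances) / len(distances)


def local_profile(neighbors):
    """Closed-form location estimate from labeled neighbors.

    `neighbors` is a list of ((x, y), sigma) pairs, one per edge, where
    sigma is the influence scope that applies to that edge.
    """
    weights = [1.0 / sigma ** 2 for _, sigma in neighbors]
    total = sum(weights)
    x = sum(w * xy[0] for w, (xy, _) in zip(weights, neighbors)) / total
    y = sum(w * xy[1] for w, (xy, _) in zip(weights, neighbors)) / total
    return (x, y)


scope = influence_scope((0.0, 0.0), [(3.0, 4.0), (0.0, 1.0)])
equal = local_profile([((0.0, 0.0), 1.0), ((2.0, 0.0), 1.0)])
skewed = local_profile([((0.0, 0.0), 1.0), ((2.0, 0.0), 2.0)])
```

With equal scopes the estimate is the midpoint; when the second neighbor has a wider scope, the estimate moves toward the first neighbor.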
A global algorithm profiles all the users’ locations together via all the edges in the graph.
It is a complex but accurate iterative algorithm.
[Figure: the full example graph with nodes u1 to u6, v1, and v2; labeled nodes carry locations (New York, Champaign, Beijing, San Francisco) and unlabeled nodes are marked "?".]
The local algorithm only uses limited information.
Our global algorithm aims to use all information.
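One way to picture an iterative global scheme is coordinate ascent: each unlabeled node is repeatedly re-estimated from all of its neighbors, including neighbors whose locations are themselves current estimates. The sketch below is an assumption about how this could be done under an isotropic Gaussian influence model; the function name, the fixed iteration count, and the `sigma` dictionary keyed by head node are hypothetical.

```python
def global_profile(locations, edges, sigma, iterations=50):
    """Iteratively estimate locations for all unlabeled nodes.

    locations: dict node -> (x, y), or None for unlabeled nodes
    edges:     list of (tail, head) pairs
    sigma:     dict node -> influence scope used when the node is a head
    """
    estimates = {n: xy for n, xy in locations.items() if xy is not None}
    unlabeled = [n for n, xy in locations.items() if xy is None]
    for _ in range(iterations):
        for node in unlabeled:
            weighted_x = weighted_y = total = 0.0
            for tail, head in edges:
                if node == tail:
                    other = head
                elif node == head:
                    other = tail
                else:
                    continue
                if other not in estimates:
                    continue  # neighbor has no estimate yet
                weight = 1.0 / sigma[head] ** 2
                weighted_x += weight * estimates[other][0]
                weighted_y += weight * estimates[other][1]
                total += weight
            if total > 0.0:
                estimates[node] = (weighted_x / total, weighted_y / total)
    return estimates


# Chain a - b - c - d where only the end points are labeled.
chain = global_profile(
    locations={"a": (0.0, 0.0), "b": None, "c": None, "d": (3.0, 0.0)},
    edges=[("b", "a"), ("c", "b"), ("d", "c")],
    sigma={"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0},
)
```

In this chain, `b` has only one labeled neighbor and `c` has only one, so a purely local estimate would place each node on top of its labeled neighbor. The iterative version lets the two unlabeled nodes inform each other and settles near x = 1 and x = 2.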
We incorporate additional knowledge as constraints for maximizing the likelihood function.
Additional knowledge: e.g., users only live in cities or towns.
Constrained optimization: we maximize the likelihood in each method under these constraints.
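As an illustration of a constraint such as "users only live in cities or towns", the sketch below restricts the estimate to a finite candidate list and picks the candidate with the highest log-likelihood. The candidate dictionary, the function names, and the isotropic Gaussian likelihood are assumptions made for the example.

```python
import math


def log_likelihood(candidate_xy, neighbors):
    """Log-likelihood of the edges to labeled neighbors if the user lived
    at `candidate_xy`. `neighbors` is a list of ((x, y), sigma) pairs."""
    total = 0.0
    for (x, y), sigma in neighbors:
        squared_distance = (candidate_xy[0] - x) ** 2 + (candidate_xy[1] - y) ** 2
        total += -math.log(2.0 * math.pi * sigma ** 2)
        total += -squared_distance / (2.0 * sigma ** 2)
    return total


def constrained_profile(candidates, neighbors):
    """Return the name of the candidate city that maximizes the likelihood.

    candidates: dict city name -> (x, y)
    """
    return max(candidates, key=lambda city: log_likelihood(candidates[city], neighbors))


# Hypothetical cities on a line, with two neighbors near A and one near B.
cities = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
evidence = [((1.0, 0.0), 1.0), ((2.0, 0.0), 1.0), ((9.0, 0.0), 1.0)]
best_city = constrained_profile(cities, evidence)
```

An unconstrained weighted average of this evidence would fall at x = 4, which is not a city at all; searching over candidates returns a valid location instead.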
We compare our method with state-of-the-art methods on a large Twitter corpus.
Data Set: We crawled a subset of Twitter and used the users who have locations on their profiles. There are 139K users, 50 million tweets, and 2 million following relationships.
Methods: User-based Location Profiling, Content-based Location Profiling
Our algorithms are better than the baseline methods as we model edges discriminatively.
Our algorithms can take advantage of modeling two different types of resources.
The global profiling algorithm can further improve the local profiling algorithm.
Conclusion and Future Work
We explore both social network and user-centric data for profiling users’ locations in a unified approach.
We introduce a discriminative influence model.
We develop two effective profiling methods and extend the methods via modeling constraints.
The framework could be further extended to profiling other attributes.
Questions?