tutorial: mining latent entity structures
DESCRIPTION
SIGMOD 2014TRANSCRIPT
Mining Latent Entity Structures Jiawei Han and Chi Wang
Computer Science, University of I l l inois at Urbana -Champaign
June 25, 2014
1
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
2
Motivation of Mining Latent Entity Structures The prevalence of unstructured data
Structures are useful for knowledge discovery
3
Too expensive to be structured by human: Automated & scalable
Up to 85% of all
information is
unstructured
-- estimated by
industry analysts
Vast majority of the CEOs
expressed frustration over
their organization’s inability to
glean insights from available
data
-- IBM study with1500+ CEOs
Information Overload: A Critical Problem in Big Data Era
By 2020, information will double every 73 days
-- G. Starkweather (Microsoft), 1992
Unstructured or loosely structured data are prevalent
1700 1750 1800 1850 1900 1950 2000 2050
Information growth
4
Example: Research Publications Every year, hundreds of thousands papers are published
◦ Unstructured data: paper text
◦ Loosely structured entities: authors, venues
papers
venue
author
5
Example: News Articles Every day, >90,000 news articles are produced
◦ Unstructured data: news content
◦ Loosely structured entities: persons, locations, organizations, …
news
person
location organization
6
Example: Social Media Every second, >150K tweets are sent out
◦ Unstructured data: tweet content
◦ Loosely structured entities: twitters, hashtags, URLs, …
Darth Vader
The White House
#maythefourthbewithyou
tweets
hashtag
URL
7
A Semi-Structured Framework (or Model) for Unstructured and Loosely-Structured Data
news papers tweets
venue
author person
location organization
URL
hashtag
text
entity
8
Useful Latent Structures: Topics, Concepts, Relations Top 10 active politicians regarding healthcare issues?
Influential high-tech companies in Silicon Valley?
Top 10 researchers in data mining and their specializations?
Academic family on support vector machine?
Concepts (is-a)
Topics (hierarchical)
Relations text
entity
9
Mining Latent Structures: Topics, Concepts, Relations
10
Concepts
Topics
Relations
text
entity
Unstructured text & linked entities -> structured hierarchies
What Power Can We Gain if More Structures Can Be Discovered? Structured database queries
Information network analysis, …
11
Structures Facilitate Multi-Dimensional Analysis: An EventCube Experiment
12
Distribution along Multiple Dimensions Query ‘health care bill’ in news data
13
Entity Analysis and Profiling Topic distribution for “Stanford University” 14
15 AMETHYST [DANILEVSKY ET AL. 13]
Structures Facilitate Heterogeneous Information Network Analysis
Real-world data: Multiple object types and/or multiple link types
Venue Paper Author
DBLP Bibliographic Network The IMDB Movie Network
Actor
Movie
Director
Movie
Studio
The Facebook Network
16
What Can Be Mined in Structured Information Networks
Example: DBLP: A Computer Science bibliographic database
Knowledge hidden in DBLP Network Mining Functions
Who are the leading researchers on Web search? Ranking
Who are the peer researchers of Jure Leskovec? Similarity Search
Whom will Christos Faloutsos collaborate with? Relationship Prediction
Which types of relationships are most influential for an author to decide her topics? Relation Strength Learning
How was the field of Data Mining emerged or evolving? Network Evolution
Which authors are rather different from his/her peers in IR? Outlier/anomaly detection
17
Similarity Search: Find Similar Objects in Networks Guided by Meta-Paths Who are very similar to Christos Faloutsos?
Meta-Path: Meta-level description of a path between two objects
Christos’s students or close collaborators Similar reputation at similar venues
Meta-Path: Author-Paper-Author (APA) Meta-Path: Author-Paper-Venue-Paper-Author (APVPA)
Schema of the DBLP Network Different meta-paths lead to very
different results!
18
Similarity Search: PathSim Measure Helps Find Peer Objects in Long Tails
Anhai Doan
◦ CS, Wisconsin
◦ Database area
◦ PhD: 2002
Meta-Path: Author-Paper-Venue-Paper-Author (APVPA)
• Jignesh Patel
• CS, Wisconsin
• Database area
• PhD: 1998
• Amol Deshpande
• CS, Maryland
• Database area
• PhD: 2004
• Jun Yang
• CS, Duke
• Database area
• PhD: 2001
PathSim [Sun et al. 11]
19
PathPredict: Meta-Path Based Relationship Prediction Meta path-guided prediction of links and relationships
Insight: Meta path relationships among similar typed links share similar semantics and are comparable and inferable
Bibliographic network: Co-author prediction (A—P—A)
paper topic
venue
author
publish publish-1
mention-1
mention write
write-1
contain/contain-1 cite/cite-1
vs.
20
Meta-Path Based Co-authorship Prediction Co-authorship prediction: Whether two authors start to collaborate
Co-authorship encoded in meta-path: Author-Paper-Author
Topological features encoded in meta-paths
Meta-Path Semantic Meaning
21
The prediction power of each meta-path
Derived by logistic regression
Heterogeneous Network Helps Personalized Recommendation Users and items with limited feedback are connected by a variety of paths Different users may require different models: Relationship heterogeneity makes personalized recommendation models easier to define
Avatar Titanic Aliens Revolutionary Road
James Cameron
Kate Winslet
Leonardo Dicaprio
Zoe Saldana
Adventure Romance
Collaborative filtering methods suffer from the data sparsity issue
# of users or items
A small set of users & items have a large number of ratings
Most users and items have a small number of ratings
# o
f ra
tin
gs
Personalized recommendation with heterogeous networks [Yu et al. 14a]
22
Personalized Recommendation in Heterogeneous Networks Datasets: Methods to compare:
◦ Popularity: Recommend the most popular items to users
◦ Co-click: Conditional probabilities between items
◦ NMF: Non-negative matrix factorization on user feedback
◦ Hybrid-SVM: Use Rank-SVM to utilize both user feedback and information network
Winner: HeteRec personalized recommendation (HeteRec-p)
23
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
24
Mining Latent Topical Hierarchy
25
Topics
text
Unstructured text & linked entities -> structured hierarchies
entity
Topic Hierarchy: Summarize the Data with Multiple Granularity
Top 10 researchers in data mining? ◦ And their specializations?
Important research areas in SIGIR conference?
Computer Science
Information technology &
system
Database Information
retrieval
… …
Theory of computation
… …
…
papers
venue
author
26
Methodologies of Topic Mining
27
C. An integrated framework: CATHY
i) Recursive topic discovery ii) Phrase mining iii) Phrase and entity ranking
B. Extension of topic modeling
i) Flat -> hierarchical ii) Unigrams -> ngrams iii) Text -> text + entity
A. Traditional bag-of-words topic modeling
Methodologies of Topic Mining
28
C. An integrated framework: CATHY
i) Recursive topic discovery ii) Phrase mining iii) Phrase and entity ranking
B. Extension of topic modeling
i) Flat -> hierarchical ii) Unigrams -> ngrams iii) Text -> text + entity
A. Traditional bag-of-words topic modeling
A. Bag-of-Words Topic Modeling Widely studied technique for text analysis
◦ Summarize themes/aspects
◦ Facilitate navigation/browsing
◦ Retrieve documents
◦ Segment documents
◦ Many other text mining tasks
Represent each document as a bag of words: all the words within a document are exchangeable
Probabilistic approach
29
Topic: Multinomial Distribution over Words A document is modeled as a sample of mixed topics
How can we discover these topic word distributions from a corpus?
30
[ Criticism of government response to the
hurricane primarily consisted of criticism of its
response to the approach of the storm and its
aftermath, specifically in the delayed response ] to
the [ flooding of New Orleans. … 80% of the 1.3
million residents of the greater New Orleans
metropolitan area evacuated ] …[ Over seventy
countries pledged monetary donations or other
assistance]. …
Topic 1
Topic 3
Topic 2
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
help 0.02
...
city 0.2
new 0.1
orleans 0.05
...
EXAMPLE FROM CHENGXIANG ZHAI'S LECTURE NOTES
Routine of Generative Models Model design: assume the documents are generated by a certain process
Model Inference: Fit the model with observed documents to recover the unknown parameters Θ
31
Generative process with unknown parameters Θ
Criticism of government
response to the hurricane …
corpus
Two representative models: pLSA and LDA
Probabilistic Latent Semantic Analysis (PLSA) [Hofmann 99] 𝑘 topics: 𝑘 multinomial distributions over words
𝐷 documents: 𝐷 multinomial distributions over topics
32
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
.4 .3 .3 Doc 𝜃1
.2 .5 .3 Doc 𝜃𝐷
…
Generative process: we will generate each token in each document 𝑑 according to 𝜙, 𝜃
PLSA – Model Design 𝑘 topics: 𝑘 multinomial distributions over words
𝐷 documents: 𝐷 multinomial distributions over topics
33
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
.4 .3 .3 Doc 𝜃1
.2 .5 .3 Doc 𝜃𝐷
…
To generate a token in document 𝑑: 1. Sample a topic label 𝑧 according to 𝜃𝑑 (e.g. z=1) 2. Sample a word w according to 𝜙𝑧 (e.g. w=government)
.4 .3 .3
Topic 𝝓𝒛
PLSA – Model Inference
What parameters are most likely to generate the observed corpus?
34
To generate a token in document 𝑑: 1. Sample a topic label 𝑧 according to 𝜃𝑑 (e.g. z=1) 2. Sample a word w according to 𝜙𝑧 (e.g. w=government)
.4 .3 .3
Topic 𝝓𝒛
Criticism of government
response to the hurricane …
corpus Topic 𝝓𝟏 Topic 𝝓𝒌
…
.? .? .? Doc 𝜃1
.? .? .? Doc 𝜃𝐷
…
government ?
response ?
...
donate ?
relief ?
...
PLSA – Model Inference using Expectation-Maximization (EM)
35
E-step: Fix 𝜙, 𝜃, estimate topic labels 𝑧 for every token in every document M-step: Use estimated topic labels 𝑧 to estimate 𝜙, 𝜃 Guaranteed to converge to a stationary point, but not guaranteed optimal
Criticism of government
response to the hurricane …
corpus
Exact max likelihood is hard => approximate optimization with EM
Topic 𝝓𝟏 Topic 𝝓𝒌
…
.? .? .? Doc 𝜃1
.? .? .? Doc 𝜃𝐷
…
government ?
response ?
...
donate ?
relief ?
...
How the EM Algorithm Works
36
.4 .3 .3
Topic 𝝓𝟏 Topic 𝝓𝒌
…
Doc 𝜃1
Doc 𝜃𝐷
…
.2 .5 .3
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
response
criticism
government
hurricane
government
d1
dD Sum fractional
counts
response
M-step
…
E-step
Bayes rule
k
j wjjd
wjjd
k
jjzwpdjzp
jzwpdjzpwdjzp
1' ,'',
,,
1')'|()|'(
)|()|(),|(
Analysis of pLSA PROS
Simple, only one hyperparameter k
Easy to incorporate prior in the EM algorithm
CONS
High model complexity -> prone to overfitting
The EM solution is neither optimal nor unique
37
Latent Dirichlet Allocation (LDA) [Blei et al. 02] Impose Dirichlet prior to the model parameters -> Bayesian version of pLSA
38
Generative process: First generate 𝜙, 𝜃 with Dirichlet prior, then generate each token in each document 𝑑 according to 𝜙, 𝜃
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
.4 .3 .3 Doc 𝜃1
.2 .5 .3 Doc 𝜃𝐷
…
𝛽
𝛼
Same as pLSA
To mitigate overfitting
LDA – Model Inference MAXIMUM LIKELIHOOD
Aim to find parameters that maximize the likelihood
Exact inference is intractable
Approximate inference ◦ Variational EM [Blei et al. 03]
◦ Markov chain Monte Carlo (MCMC) – collapsed Gibbs sampler [Griffiths & Steyvers 04]
METHOD OF MOMENTS
Aim to find parameters that fit the moments (expectation of patterns)
Exact inference is tractable ◦ Tensor orthogonal decomposition
[Anandkumar et al. 12]
◦ Scalable tensor orthogonal decomposition [Wang et al. 14a]
39
MCMC – Collapsed Gibbs Sampler [Griffiths & Steyvers 04]
40
response
criticism
government
hurricane
government
d1
dD
response
…
Iter 1 Iter 2 Iter 1000 …
…
… Topic 𝝓𝟏 Topic 𝝓𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
kn
n
VN
NjzP
i
i
i
d
d
j
j
j
w
ii
)(
)(
)(
)(
),|( zwSample each zi conditioned on z-i
Estimated 𝜙𝑗,𝑤𝑖 Estimated 𝜃𝑑𝑖,𝑗
Method of Moments [Anandkumar et al. 12, Wang et al. 14a]
What parameters are most likely to generate the observed corpus? Criticism of
government response to the
hurricane …
corpus Topic 𝝓𝟏 Topic 𝝓𝒌
… government ?
response ?
...
donate ?
relief ?
...
criticism government response: 0.001
government response hurricane: 0.005
criticism response hurricane: 0.004
:
criticism: 0.03
response: 0.01
government: 0.04
:
criticism response: 0.001
criticism government: 0.002
government response: 0.003
:
Moments: expectation of patterns
What parameters fit the empirical moments?
length 1 length 2 (pair) length 3 (triple)
41
Guaranteed Topic Recovery Theorem. The patterns up to length 3 are sufficient for topic recovery
𝑀2 = 𝜆𝑗𝝓𝒋 ⊗𝝓𝒋
𝑘
𝑗=1
, 𝑀3 = 𝜆𝑗𝝓𝒋 ⊗𝝓𝒋 ⊗𝝓𝒋
𝑘
𝑗=1
42
criticism government response: 0.001
government response hurricane: 0.005
criticism response hurricane: 0.004
:
criticism: 0.03
response: 0.01
government: 0.04
:
criticism response: 0.001
criticism government: 0.002
government response: 0.003
:
length 1 length 2 (pair) length 3 (triple)
V
V: vocabulary size; k: topic number V
V
V V
Tensor Orthogonal Decomposition for LDA
43
A: 0.03 AB: 0.001 ABC: 0.001
B: 0.01 BC: 0.002 ABD: 0.005
C: 0.04 AC: 0.003 BCD: 0.004
: : :
Normalized pattern counts
𝑀2
𝑀3
V
V: vocabulary size k: topic number
V
V
V
V
k k
k
𝑇
Input corpus
Topic 𝝓𝟏
Topic 𝝓𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
[ANANDKUMAR ET AL. 12]
Tensor Orthogonal Decomposition for LDA – Not Scalable
44
A: 0.03 AB: 0.001 ABC: 0.001
B: 0.01 BC: 0.002 ABD: 0.005
C: 0.04 AC: 0.003 BCD: 0.004
: : :
Normalized pattern counts
𝑀2
𝑀3
V
V
V
V
V
k k
k
𝑇
Input corpus
Topic 𝝓𝟏
Topic 𝝓𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
Prohibitive to compute
Time: 𝑶 𝑽𝟑𝒌 + 𝑳𝒍𝟐
Space: 𝑶 𝑽𝟑
V: vocabulary size; k: topic number L: # tokens; l: average doc length
Scalable Tensor Orthogonal Decomposition
45
A: 0.03 AB: 0.001 ABC: 0.001
B: 0.01 BC: 0.002 ABD: 0.005
C: 0.04 AC: 0.003 BCD: 0.004
: : :
Normalized pattern counts
𝑀2
𝑀3
V
V
V
V
V
k k
k
𝑇
Input corpus
Topic 𝝓𝟏
Topic 𝝓𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
Sparse & low rank
Decomposable
1st scan
2nd scan
Time: 𝑶 𝑳𝒌𝟐 + 𝒌𝒎
Space: 𝑶 𝒎 # nonzero 𝒎≪ 𝑽𝟐
[WANG ET AL. 14A]
STOD is 20-3000 times faster
And the recovery error is low when the sample is large enough
Variance is almost 0
46
STOD – Scalable tensor orthogonal decomposition TOD – Tensor orthogonal decomposition
L=19M L=39M
Runtime
Recovery error
Summary of LDA Model Inference MAXIMUM LIKELIHOOD
Approximate inference ◦ slow, scan data thousands of times
◦ large variance, no theoretic guarantee
Numerous follow-up work ◦ further approximation [Porteous et al.
08, Yao et al. 09, Hoffman et al. 12] etc.
◦ parallelization [Newman et al. 09] etc.
◦ online learning [Hoffman et al. 13] etc.
METHOD OF MOMENTS
STOD [Wang et al. 14a] ◦ fast, scan data twice
◦ robust recovery with theoretic guarantee
New and promising!
47
Methodologies of Topic Mining
48
C. An integrated framework: CATHY
i) Recursive topic discovery ii) Phrase mining iii) Phrase and entity ranking
B. Extension of topic modeling
i) Flat -> hierarchical ii) Unigrams -> ngrams iii) Text -> text + entity
A. Traditional bag-of-words topic modeling
B. i) Flat Topics -> Hierarchical Topics In PLSA and LDA, a topic is selected from a flat pool of topics
In hierarchical topic models, a topic is selected from a hierarchy
49
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
To generate a token in document 𝑑: 1. Sample a topic label 𝑧 according to 𝜃𝑑 2. Sample a word w according to 𝜙𝑧
.4 .3 .3
Topic 𝝓𝒛
o
o/1 o/2
o/2/1 o/1/1 o/1/2 o/2/2
Information technology & system
DB IR
CS
Hierarchical Topic Models Topics form a tree structure
◦ nested Chinese Restaurant Process [Griffiths et al. 04]
◦ recursive Chinese Restaurant Process [Kim et al. 12a]
◦ LDA with Topic Tree [Wang et al. 14b]
Topics form a DAG structure ◦ Pachinko Allocation [Li & McCallum 06]
◦ hierarchical Pachinko Allocation [Mimno et al. 07]
◦ nested Chinese Restaurant Franchise [Ahmed et al. 13]
50
o
o/1 o/2
o/2/1 o/1/1 o/1/2 o/2/2
o
o/1 o/2
o/2/1 o/1/1 o/1/2 o/2/2
DAG: DIRECTED ACYCLIC GRAPH
Hierarchical Topic Model Inference MAXIMUM LIKELIHOOD
Exact inference is intractable
Approximate inference: variational inference or MCMC
Non recursive – all the topics are inferred at once
METHOD OF MOMENTS
Scalable Tensor Recursive Orthogonal Decomposition [Wang et al. 14b] ◦ fast and robust recovery with theoretic
guarantee
Recursive method - only for LDA with Topic Tree model
51
Most popular
Recursive Inference for LDA with Topic Tree A large tree subsumes a smaller tree with shared model parameters
52
Inference order
[WANG ET AL. 14B]
Flexible to decide when to terminate
Easy to revise the tree structure
Scalable Tensor Recursive Orthogonal Decomposition
53
A: 0.03 AB: 0.001 ABC: 0.001
B: 0.01 BC: 0.002 ABD: 0.005
C: 0.04 AC: 0.003 BCD: 0.004
: : :
Normalized pattern counts for t
k k
k
𝑇 (𝑡)
Input corpus
Topic 𝝓𝒕/𝟏
Topic 𝝓𝒕/𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
[WANG ET AL. 14B]
+ Topic t
B. ii) Unigrams -> N-Grams Motivation: unigrams can be difficult to interpret
54
learning
reinforcement
support
machine
vector
selection
feature
random
:
versus
learning
support vector machines
reinforcement learning
feature selection
conditional random fields
classification
decision trees
:
The topic that represents the area of Machine Learning
Various Strategies Strategy 1: generate bag-of-words -> generate sequence of tokens
◦ Bigram topical model [Wallach 06], topical n-gram model [Wang et al. 07], phrase discovering topic model [Lindsey et al. 12]
Strategy 2: post bag-of-words model inference, visualize topics with n-grams ◦ Label topic [Mei et al. 07], TurboTopic [Blei & Lafferty 09], KERT [Danilevsky et al. 14]
Strategy 3: prior bag-of-words model inference, mine phrases and impose to the bag-of-words model ◦ Frequent pattern-enriched topic model [Kim et al. 12b], ToPMine [El-kishky et al. 14]
55
Strategy 1 – Topical N-Gram Model & Phrase Discovering Topic Model
56
To generate a token in document 𝑑: 1. Sample a binary variable 𝑥 according to the previous token & topic label 2. Sample a topic label 𝑧 according to 𝜃𝑑 3. If 𝑥 = 0 (new phrase), sample a word w according to 𝜙𝑧; otherwise,
sample a word w according to 𝑧 and the previous token
[white
house]
[reports
[white]
[black d1 dD
color]
…
0
z x
0
1
0
0
1
z x
High model complexity - overfitting High inference cost - slow MCMC inference
[WANG ET AL. 07, LINDSEY ET AL. 12]
Strategy 2 – Topical Keyphrase Extraction & Ranking (KERT)
57
learning
support vector machines
reinforcement learning
feature selection
conditional random fields
classification
decision trees
:
Topical keyphrase
extraction & ranking
knowledge discovery using least squares support vector machine classifiers
support vectors for reinforcement learning
a hybrid approach to feature selection
pseudo conditional random fields
automatic web page classification in a dynamic and hierarchical way
inverse time dependency in convex regularized learning
postprocessing decision trees to extract actionable knowledge
variance minimization least squares support vector machines
…
Unigram topic assignment: Topic 1 & Topic 2
[DANILEVSKY ET AL. 14]
Framework of KERT 1. Run bag-of-words model inference, and assign topic label to each token
2. Extract candidate keyphrases within each topic
3. Rank the keyphrases in each topic
◦ Popularity: ‘information retrieval’ vs. ‘cross-language information retrieval’
◦ Discriminativeness: only frequent in documents about topic t
◦ Concordance: ‘active learning’ vs.‘learning classification’
◦ Completeness: ‘vector machine’ vs. ‘support vector machine’
58
Frequent pattern mining
Comparability property: directly compare phrases of mixed lengths
Comparison of phrase ranking methods
The topic that represents the area of Machine Learning 59
kpRel [Zhao et al. 11]
KERT (-popularity)
KERT (-discriminativeness)
KERT (-concordance)
KERT [Danilevsky et al. 14]
learning effective support vector machines learning learning
classification text feature selection classification support vector machines
selection probabilistic reinforcement learning selection reinforcement learning
models identification conditional random fields feature feature selection
algorithm mapping constraint satisfaction decision conditional random fields
features task decision trees bayesian classification
decision planning dimensionality reduction trees decision trees
: : : : :
Strategy 3 – Phrase Mining + Topic Model (ToPMine)
60
Phrase mining and
document segmentation
knowledge discovery using least squares support vector machine classifiers…
Knowledge discovery and support vector machine should have coherent topic labels
[EL-KISHKY ET AL. 14]
Strategy 2: the tokens in the same phrase may be assigned to different topics
Solution: switch the order of phrase mining and topic model inference
[knowledge discovery] using [least squares]
[support vector machine] [classifiers] …
[knowledge discovery] using [least squares]
[support vector machine] [classifiers] …
Topic model inference
with phrase constraints
More challenging than in strategy 2!
Phrase Mining: Frequent Pattern Mining + Statistical Analysis
61
Significance score [Church et al. 91] 𝛼(𝐴, 𝐵)
=|𝐴𝐵| − |𝐴||𝐵|/𝑛
𝐴𝐵
Good Phrases
Phrase Mining: Frequent Pattern Mining + Statistical Analysis
62
Raw freq
[support vector machine]: 90 80
[vector machine]: 95 0
[support vector]: 100 20
True freq
[Markov blanket] [feature selection] for [support
vector machines]
[knowledge discovery] using [least squares]
[support vector machine] [classifiers]
…[support vector] for [machine learning]…
Significance score [Church et al. 91] 𝛼(𝐴, 𝐵)
=|𝐴𝐵| − |𝐴||𝐵|/𝑛
𝐴𝐵
Example Topical Phrases
63
ToPMine [El-kishky et al. 14] – Strategy 3 (67 seconds)
information retrieval feature selection
social networks machine learning
web search semi supervised
search engine large scale
information extraction support vector machines
question answering active learning
web pages face recognition
: :
Topic 1 Topic 2
social networks information retrieval
web search text classification
time series machine learning
search engine support vector machines
management system information extraction
real time neural networks
decision trees text categorization
: :
Topic 1 Topic 2
PDLDA [Lindsey et al. 12] – Strategy 1 (3.72 hours)
ToPMine: Experiments on Associate Press News (1989) 64
ToPMine: Experiments on Yelp Reviews 65
66
Runtime
Strategy 3 1 2 1 2
fastest slow slow fast
Comparison of three strategies
strategy 3 > strategy 2 > strategy 1
Coherence of topics
Summary of Topical N-Gram Mining Strategy 1: generate bag-of-words -> generate sequence of tokens
◦ integrated complex model; phrase quality and topic inference rely on each other
◦ slow and overfitting
Strategy 2: post bag-of-words model inference, visualize topics with n-grams ◦ phrase quality relies on topic labels for unigrams
◦ can be fast
Strategy 3: prior bag-of-words model inference, mine phrases and impose to the bag-of-words model ◦ topic inference relies on correct partition of documents, but not sensitive
◦ can be fast
67
B. iii) Text Only -> Text + Entity
What should be the output?
How to use linked entity information?
68
text
Criticism of government
response to the hurricane …
Text-only corpus
entity
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
.4 .3 .3 Doc 𝜃1
.2 .5 .3 Doc 𝜃𝐷
…
Three Modeling Strategies RESEMBLE ENTITIES TO DOCUMENTS
An entity has a multinomial distribution over topics
RESEMBLE ENTITIES TO WORDS
A topic has a multinomial distribution over each type of entities
69
.3 .4 .3
.2 .5 .3 SIGMOD
Surajit Chaudhuri
… Topic 1
KDD 0.3
ICDM 0.2
...
Over venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
Over authors
RESEMBLE ENTITIES TO TOPICS
An entity has a multinomial distribution over words
SIGMOD database 0.3
system 0.2
...
Resemble Entities to Documents Regularization - Linked documents or entities have similar topic distributions
◦ iTopicModel [Sun et al. 09a]
◦ TMBP-Regu [Deng et al. 11]
Use entities as additional sources of topic choices for each token ◦ Contextual focused topic model [Chen et al. 12] etc.
Aggregate documents linked to a common entity as a pseudo document ◦ Co-regularization of inferred topics under multiple views [Tang et al. 13]
70
Resemble Entities to Documents Regularization - Linked documents or entities have similar topic distributions
71
iTopicModel [Sun et al. 09a] TMBP-Regu [Deng et al. 11]
Doc 𝜃2
Doc 𝜃3
Doc 𝜃1
𝜃2 should be similar to 𝜃1, 𝜃3 𝜃1d should be similar to 𝜃5
𝑢, 𝜃2𝑢, 𝜃2
𝑣
Resemble Entities to Documents Use entities as additional sources of topic choice for each token
◦ Contextual focused topic model [Chen et al. 12]
72
To generate a token in document 𝑑: 1. Sample a variable 𝑥 for the context type 2. Sample a topic label 𝑧 according to 𝜃 of the context type decided by 𝑥 3. Sample a word w according to 𝜙𝑧
.4 .3 .3 𝑥 = 1, sample 𝑧 from document’s topic distribution
𝑥 = 2, sample 𝑧 from author’s topic distribution .3 .4 .3
.2 .5 .3 𝑥 = 3, sample 𝑧 from venue’s topic distribution
On Random Sampling over Joins
Surajit Chaudhuri SIGMOD
Resemble Entities to Documents Aggregate documents linked to a common entity as a pseudo document
◦ Co-regularization of inferred topics under multiple views [Tang et al. 13]
73
Document view
A single paper Author view
All Surajit Chaudhuri’s
papers
Venue view
All SIGMOD papers
Topic 𝝓𝟏 Topic 𝝓𝒌 …
Three Modeling Strategies RESEMBLE ENTITIES TO DOCUMENTS
An entity has a multinomial distribution over topics
RESEMBLE ENTITIES TO WORDS
A topic has a multinomial distribution over each type of entities
74
.3 .4 .3
.2 .5 .3 SIGMOD
Surajit Chaudhuri
… Topic 1
KDD 0.3
ICDM 0.2
...
Over venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
Over authors
RESEMBLE ENTITIES TO TOPICS
An entity has a multinomial distribution over words
SIGMOD database 0.3
system 0.2
...
Resemble Entities to Topics Entity-Topic Model (ETM) [Kim et al. 12c]
75
Topic 𝝓𝟏
… data 0.3
mining 0.2
...
SIGMOD
database 0.3
system 0.2
...
…
Surajit Chaudhuri
database 0.1
query 0.1
...
…
text venue author
To generate a token in document 𝑑: 1. Sample an entity 𝑒 2. Sample a topic label 𝑧 according to 𝜃𝑑 3. Sample a word w according to 𝜙𝑧,𝑒
𝜙𝑧,𝑒~𝐷𝑖𝑟(𝑤1𝜙𝑧 +𝑤2𝜙𝑒)
Paper text Surajit Chaudhuri SIGMOD
Example topics learned by ETM
On a news dataset about Japan tsunami 2011
76
𝜙𝑧,𝑒 𝜙𝑧,𝑒 𝜙𝑧,𝑒 𝜙𝑧
𝜙𝑧,𝑒 𝜙𝑧,𝑒 𝜙𝑧,𝑒 𝜙e
Three Modeling Strategies RESEMBLE ENTITIES TO DOCUMENTS
An entity has a multinomial distribution over topics
RESEMBLE ENTITIES TO WORDS
A topic has a multinomial distribution over each type of entities
77
.3 .4 .3
.2 .5 .3 SIGMOD
Surajit Chaudhuri
… Topic 1
KDD 0.3
ICDM 0.2
...
Over venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
Over authors
RESEMBLE ENTITIES TO TOPICS
An entity has a multinomial distribution over words
SIGMOD database 0.3
system 0.2
...
Resemble Entities to Words Entities as additional elements to be generated for each doc
◦ Conditionally independent LDA [Cohn & Hofmann 01]
◦ CorrLDA1 [Blei & Jordan 03]
◦ SwitchLDA & CorrLDA2 [Newman et al. 06]
◦ NetClus [Sun et al. 09b]
78
To generate a token/entity in document 𝑑: 1. Sample a topic label 𝑧 according to 𝜃𝑑 2. Sample a token w / entity e according to 𝜙𝑧 or 𝜙𝑧
𝑒
Topic 1
KDD 0.3
ICDM 0.2
...
venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
authors
data 0.2
mining 0.1
...
words
Comparison of Three Modeling Strategies for Text + Entity RESEMBLE ENTITIES TO DOCUMENTS
Entities regularize textual topic discovery
RESEMBLE ENTITIES TO WORDS
Entities enrich and regularize the textual representation of topics
79
.3 .4 .3
.2 .5 .3 SIGMOD
Surajit Chaudhuri
… Topic 1
KDD 0.3
ICDM 0.2
...
Over venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
Over authors
RESEMBLE ENTITIES TO TOPICS
Each entity has its own profile SIGMOD database 0.3
system 0.2
... # params = k*E*V
# params = k*(E+V)
Methodologies of Topic Mining
80
C. An integrated framework: CATHY
i) Recursive topic discovery ii) Phrase mining iii) Phrase and entity ranking
B. Extension of topic modeling
i) Flat -> hierarchical ii) Unigrams -> ngrams iii) Text -> text + entity
A. Traditional bag-of-words topic modeling
Break
81
C. An Integrated Framework
How to choose & integrate?
82
Recursive Non recursive Hierarchy
Sequence of tokens generative model
• Strategy 1
Post inference, visualize topics with n-grams
• Strategy 2
Prior inference, mine phrases and impose to the bag-of-words model
• Strategy 3
P
h
r
a
s
e
E
n
t
i
t
y
Resemble entities to documents
• Modeling strategy 1
Resemble entities to topics
• Modeling strategy 2
Resemble entities to words
• Modeling strategy 3
C. An Integrated Framework
Compatible & effective
83
Recursive Non recursive Hierarchy
P
h
r
a
s
e
E
n
t
i
t
y
Resemble entities to documents
• Modeling strategy 1
Resemble entities to topics
• Modeling strategy 2
Resemble entities to words
• Modeling strategy 3
Sequence of tokens generative model
• Strategy 1
Post inference, visualize topics with n-grams
• Strategy 2
Prior model inference, mine phrases and impose to the bag-of-words model
• Strategy 3
C. An Integrated Framework: Construct A Topical HierarchY (CATHY)
Hierarchy + phrase + entity
84
i) Hierarchical topic discovery with entities
ii) Phrase mining
iii) Rank phrases & entities per topic
Output hierarchy with phrases & entities
Input collection
text
o
o/1
o/1/1 o/1/2
o/2
o/2/1
entity
Mining Framework – CATHY Construct A Topical HierarchY
85
i) Hierarchical topic discovery with entities
ii) Phrase mining
iii) Rank phrases & entities per topic
Output hierarchy with phrases & entities
Input collection
text
o
o/1
o/1/1 o/1/2
o/2
o/2/1
entity
Hierarchical Topic Discovery with Multi-Typed Entities [Wang et al. 13a,b] Resemble entities to words – every topic has a multinomial distribution over each type of entities
86
Topic 1
KDD 0.3
ICDM 0.2
...
Jiawei Han 0.1
Christos Faloustos 0.05
...
data 0.2
mining 0.1
...
Topic k
𝜙11 𝜙1
2 𝜙13
𝜙𝑘1 𝜙𝑘
2 𝜙𝑘3
SIGMOD 0.3
VLDB 0.3
...
Surajit Chaudhuri 0.1
Jeff Naughton 0.05
...
database 0.2
system 0.1
...
…
venues authors words
Hierarchical Topic Discovery: Link Patterns
87
Computing machinery and intelligence
intelligence
computing machinery
A.M. Turing
A.M. Turing
Hierarchical Topic Discovery: Link-Weighted Heterogeneous Network
88
word author venue
text
A.M. Turing intelligence
system database
SIGMOD
venue
author
Hierarchical Topic Discovery: Generative Model for Link Patterns A single link has a latent topic path z
89
o
o/1 o/2
o/2/1 o/1/1 o/1/2 o/2/2
Information technology & system
DB IR
To generate a link between type 𝑡1 and type 𝑡2: 1. Sample a topic label 𝑧 according to 𝜃
Suppose 𝑡1 = 𝑡2 = word
Hierarchical Topic Discovery: Generative Model for Link Patterns
90
database
To generate a link between type 𝑡1 and type 𝑡2: 1. Sample a topic label 𝑧 according to 𝜃
2. Sample the first end node 𝑢 according to 𝜙𝑧𝑡1
Suppose 𝑡1 = 𝑡2 = word
Topic o/1/2
database 0.2
system 0.1
...
Hierarchical Topic Discovery: Generative Model for Link Patterns
91
database system
To generate a link between type 𝑡1 and type 𝑡2: 1. Sample a topic label 𝑧 according to 𝜃
2. Sample the first end node 𝑢 according to 𝜙𝑧𝑡1
3. Sample the second end node 𝑣 according to 𝜙𝑧𝑡2
Suppose 𝑡1 = 𝑡2 = word
Topic o/1/2
database 0.2
system 0.1
...
Hierarchical Topic Discovery: Generative Model for Link Patterns
92
Equivalently, we can generate # links between u and v:
𝑒𝑢,𝑣 = 𝑒𝑢,𝑣1 + ⋯+ 𝑒𝑢,𝑣
𝑘 , 𝑒𝑢,𝑣𝑧 ~ 𝑃𝑜𝑖𝑠𝑠𝑜𝑛 (𝜃𝑧 𝜙𝑧,𝑢
𝑡1 𝜙𝑧,𝑣𝑡2 )
Suppose 𝑡1 = 𝑡2 = word
database system
0 1 2 3 4 5
0 1 2 3 4 5
𝑒𝑑𝑎𝑡𝑎𝑏𝑎𝑠𝑒,𝑠𝑦𝑠𝑡𝑒𝑚𝑜/1/2 𝐷𝐵
~
𝑒𝑑𝑎𝑡𝑎𝑏𝑎𝑠𝑒,𝑠𝑦𝑠𝑡𝑒𝑚𝑜/1/1(𝐼𝑅)
~
database system 5
4
1
Recursive Model Inference Using Expectation-Maximization (EM)
93
system database
system
database
system database
Topic o/1 Topic o/2 Topic o
100 95 5
+
Topic o/1
KDD 0.3
ICDM 0.2
...
Jiawei Han 0.1
Christos Faloustos 0.05
...
data 0.2
mining 0.1
...
𝜙𝑜/11 𝜙𝑜/1
2 𝜙𝑜/13
…
Topic o/k
... ... ...
𝜙𝑜/𝑘1 𝜙𝑜/𝑘
2 𝜙𝑜/𝑘3
Bayes rule Sum &
normalize counts
M-ste
p
E-ste
p
Top-Down Recursion
94
system database
system
database
system database
Topic o/1 Topic o/2 Topic o
100 95 5
+
system
database
Topic o/1
95 system
database
system database
65 30
+
Topic o/1/1 Topic o/1/2
Extension: Learn Link Type Importance Different link types may have different importance in topic discovery
Introduce a link type weight 𝜶𝒙,𝒚
◦ Original link weight 𝒆𝒊,𝒋𝒙,𝒚,𝒛
→ 𝜶𝒙,𝒚𝒆𝒊,𝒋𝒙,𝒚,𝒛
◦ 𝛼 > 1 – more important
◦ 0 < 𝛼 < 1 – less important
we can assume w.l.o.g 𝛼𝑥,𝑦𝑛𝑥,𝑦
𝑥,𝑦 = 1
The EM solution is invariant to a constant scaleup of all the link weights
rescale
95
Optimal Weight
Average link weight KL-divergence of prediction from observation
96
Learned Link Importance & Topic Coherence
97
Learned importance of different link types Level Word-word Word-author Author-author Word-venue Author-venue
1 .2451 .3360 .4707 5.7113 4.5160
2 .2548 .7175 .6226 2.9433 2.9852
-1
0
1
2
NetClus CATHY (equal importance) CATHY (learn importance)
Average pointwise mutual information (PMI) of each topic
Word-word Word-author Author-author Word-venue Author-venue Overall
Phrase Mining
Frequent pattern mining; no NLP parsing
Statistical analysis for filtering bad phrases
98
i) Hierarchical topic discovery with entities
ii) Phrase mining
iii) Rank phrases & entities per topic
Output hierarchy with phrases & entities
Input collection
text o
o/1
o/1/1 o/1/2
o/2
o/2/1
Examples of Mined Phrases News Computer science
99
information retrieval feature selection
social networks machine learning
web search semi supervised
search engine large scale
information extraction support vector machines
question answering active learning
web pages face recognition
: :
: :
energy department president bush
environmental protection agency white house
nuclear weapons bush administration
acid rain house and senate
nuclear power plant members of congress
hazardous waste defense secretary
savannah river capital gains tax
: :
: :
Phrase & Entity Ranking
Ranking criteria: popular, discriminative, concordant
100
1. Hierarchical topic discovery w/ entities
2. Phrase mining
3. Rank phrases & entities per topic
Output hierarchy w/ phrases & entities
Input collection
text o
o/1
o/1/1 o/1/2
o/2
o/2/1
entity
Phrase & Entity Ranking – Estimate Topical Frequency
E.g.
𝑝 𝑧 = 𝐷𝐵 𝑞𝑢𝑒𝑟𝑦 𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔 =𝑝 𝑧=𝐷𝐵 𝑝 𝑞𝑢𝑒𝑟𝑦 𝑧=𝐷𝐵 𝑝 𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔 𝑧=𝐷𝐵
𝑝 𝑧=𝑡 𝑝 𝑞𝑢𝑒𝑟𝑦 𝑧=𝑡 𝑝 𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔 𝑧=𝑡𝑡
=𝜃𝐷𝐵𝜙𝐷𝐵,𝑞𝑢𝑒𝑟𝑦𝜙𝐷𝐵,𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔
𝜃𝑡𝜙𝑡,𝑞𝑢𝑒𝑟𝑦𝜙𝑡,𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔𝑡
101
Pattern Total ML DB DM IR
support vector machines 85 85 0 0 0 query processing 252 0 212 27 12 Hui Xiong 72 0 0 66 6 SIGIR 2242 444 378 303 1117
Estimated by Bayes rule Frequent pattern mining
Phrase & Entity Ranking – Ranking Function ‘Popular’ indicator of phrase or entity 𝐴 in topic 𝑡: 𝑝 𝐴 𝑡
‘Discriminative’ indicator of phrase or entity 𝐴 in topic 𝑡: log𝑝 𝐴 𝑡
𝑝 𝐴 𝑇
‘Concordance’ indicator of phrase 𝐴: 𝛼(𝐴) =|𝐴|−𝐸( 𝐴 )
𝑠𝑡𝑑( 𝐴 )
𝑟𝑡 𝐴 = 𝑝 𝐴 𝑡 log𝑝 𝐴 𝑡
𝑝 𝐴 𝑇+ 𝜔𝛼(𝐴)
Pointwise KL-divergence
102
𝑇: topic for comparison
Significance score used for phrase mining
Example topics: database & information retrieval
103
database system
query processing
concurrency control…
Divesh Srivastava
Surajit Chaudhuri
Jeffrey F. Naughton…
ICDE
SIGMOD
VLDB…
text categorization
text classification
document clustering
multi-document summarization…
relevance feedback
query expansion
collaborative filtering
information filtering…
……
……
information retrieval
retrieval
question answering…
W. Bruce Croft
James Allan
Maarten de Rijke…
SIGIR
ECIR
CIKM…
…
Evaluation: Intrusion detection [Chang et al. 09]
Evaluation method - Intrusion detection
104
Phrase Intrusion
Question 1/80 Topic Intrusion
Parent topic database systems data management query processing management system data system
Child topic 1 web search search engine semantic web search results web pages
Child topic 2 data management data integration data sources data warehousing data applications
Child topic 3 query processing query optimization query databases relational databases query data
Child topic 4 database system database design expert system management system design system
Question 1/130 data mining association rules logic programs data streams
Question 2/130 natural language query optimization data management database systems
parent
child1 child2 intruder
Phrase 1
Phrase 2
Phrase 3
Phrase 4
…
Phrase 1
Phrase 2
Phrase 3
Phrase 4
…
normal intruder
which child topic does not belong to the given parent topic?
which phrase does not belong?
Ranked phrases + entities > unigrams
105
0%
20%
40%
60%
80%
100%
CS Topic Intrusion NEWS Topic Intrusion
% of the hierarchy interpreted by people
1. hPAM 2. NetClus 3. CATHY (unigram)
3 + phrase 3 + entity 3 + phrase + entity
65% 66%
Applications: entity, community profiling…
106
Christos Faloutsos in data mining
data mining / data streams / time series /
association rules / mining patterns
time series
nearest neighbor
association rules
mining patterns
data streams
high dimensional data
111.6
papers
21.0 35.6 33.3
data mining / data streams / nearest
neighbor / time series / mining patterns
selectivity estimation
sensor networks
nearest neighbor
time warping
large graphs
large datasets
67.8
papers
16.7 16.4 20.0
Eamonn J. Keogh
Jessica Lin
Michail Vlachos
Michael J. Passani
Matthias Renz
Divesh Srivasta
Surajit Chaudhuri
Nick Koudas
Jeffrey F. Naughton
Yannis Papakonstantinou
Jiawei Han
Ke Wang
Xifeng Yan
Bing Liu
Mohammed J. Zaki
Charu C. Aggarwal
Graham Cormode
S. Muthukrishnan
Philip S. Yu
Xiaolei Li
Philip S. Yu in data mining
Important research areas in SIGIR conference ?
107
260.0 583.0
support vector machines
collaborative filtering
text categorization
text classification
conditional random fields
information systems
artificial intelligence
distributed information
retrieval
query evaluation
event detection
large collections
similarity search
duplicate detection
large scale
information retrieval
question answering
web search
natural language
document retrieval
SIGIR (2,432 papers)
443.8 377.7 302.7 1,117.4
information retrieval
question answering
relevance feedback
document retrieval
ad hoc
web search
search engine
search results
world wide web
web search results
word sense disambiguation
named entity
named entity recognition
domain knowledge
dependency parsing
127.3 108.9
matrix factorization
hidden markov models
maximum entropy
link analysis
non-negative matrix
factorization
text categorization
text classification
document clustering
multi-document summarization
naïve bayes
160.3
DM IR DB ML
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
108
Discovery of Hidden Entity Relations
109
Relations
text
entity
Topics
Unstructured text & linked entities -> structured hierarchies
Example Hidden Relations Academic family from research publications
Social relationship from online social network
110
Alumni Colleague
Club friend
Jeff Ullman
Surajit Chaudhuri (1991)
Jeffrey Naughton (1987)
Joseph M. Hellerstein (1995)
Entity Relation Mining A. Representative mining paradigms
B. Patterns, rules and constraints for reasoning
C. Methodologies for dependency modeling
111
Mining Paradigms A. Representative mining paradigms
◦ Similarity search of relationships
◦ Classify or cluster entity relationships
◦ Slot filling
B. Patterns, rules and constraints for reasoning
C. Methodologies for dependency modeling
112
Similarity Search of Relationships Input: relation instance
Output: relation instances with similar semantics
113
(Jeff Ullman, Surajit Chaudhuri)
(Jeffrey Naughton, Joseph M. Hellerstein)
(Jiawei Han, Chi Wang)
…
Is advisor of
(Apple, iPad)
(Microsoft, Surface)
(Amazon, Kindle)
…
Produce tablet
Classify or Cluster Entity Relationships Input: relation instances with unknown relationship
Output: predicted relationship or clustered relationship
114
(Jeff Ullman, Surajit Chaudhuri)
Is advisor of
(Jeff Ullman, Hector Garcia)
Is colleague of
Alumni Colleague
Club friend
Slot Filling Input: relation instance with a missing element (slot)
Output: fill the slot
is advisor of (?, Surajit Chaudhuri) Jeff Ullman
produce tablet (Apple, ?) iPad
115
Model Brand
S80 ?
A10 ?
T1460 ?
Model Brand
S80 Nikon
A10 Canon
T1460 Benq
Patterns, Rules & Constraints for Reasoning
A. Representative mining paradigms ◦ Similarity search of relationships
◦ Classify or cluster entity relationships
◦ Slot filling
B. Patterns, rules and constraints for reasoning ◦ Text patterns
◦ Linkage patterns
◦ Dependency rules and constraints
C. Methodologies for dependency modeling
116
Text Patterns Syntactic patterns
◦ [Bunescu & Mooney 05b]
Dependency parse tree patterns ◦ [Zelenko et al. 03]
◦ [Culotta & Sorensen 04]
◦ [Bunescu & Mooney 05a]
Topical patterns ◦ [McCallum et al. 05] etc.
117
The headquarters of Google are situated in Mountain View
Jane says John heads XYZ Inc.
Emails between McCallum & Padhraic Smyth
Linkage Patterns - Topology Edge sign prediction [Leskovec et al. 10]
◦ Epinions: Trust/Distrust
◦ Wikipedia: Support/Oppose
◦ Slashdot: Friend/Foe
Triad counts: counts of signed triads edge u->v takes part in
118
Linkage Patterns - Traffic Manager-subordinate relationship from email exchange [Diehl et al. 07]
◦ # msgs
◦ quartiles for # recipients
119
Possible communication events for relationship (𝑛𝑎, 𝑛𝑏)
Linkage Patterns – Traffic Dynamics Advisor-advisee relationship from research publications [Wang et al. 10]
120
Smith
2000
2000
2001
2002
2003
1999
Ada
Ying
2004
A (papers by the
candidate advisor)
B 𝑘𝑢𝑙𝑐 𝐴, 𝐵 =
𝐴 ∩ 𝐵
2
1
𝐴+
1
𝐵
𝐼𝑅 𝐴, 𝐵 =𝐴 − |𝐵|
|𝐴 ∪ 𝐵|
Kulczinski
Imbalance ratio
0
0.5
1
2000 2001 2002 2003 2004
Coauthorship trend over time
Kulczinski Imbalance Ratio End time Start time
Dependency Rules & Constraints (Advisor-Advisee Relationship)
E.g., role transition - one cannot be advisor before graduation
121
2000
2000
2001
1999
Ada Bob
Ying
Ada
BobYing
Graduate in 2001Start in 2000
Graduate in 1998
Ada
Bob Ying
Graduate in 2001 Start in 2000
Graduate in 1998
Dependency Rules & Constraints (Social Relationship) ATTRIBUTE-RELATIONSHIP
Friends of the same relationship type share the same value for only certain attribute
CONNECTION-RELATIONSHIP
The friends having different relationships are loosely connected
122
Methodologies for Dependency Modeling Factor graph
◦ [Wang et al. 10, 11, 12]
◦ [Tang et al. 11]
Optimization framework ◦ [McAuley & Leskovec 12]
◦ [Li, Wang & Chang 14]
Graph-based ranking ◦ [Yakout et al. 12]
123
Factor Graph [Wang et al. 10, 11, 12]
124
Factor graph – model the joint likelihood by multiple factors
Each factor is a potential function defined on a few variables ◦ 𝑦𝑥 - 𝑎𝑥’s advisor. E.g., 𝑦2 ∈ 0,1 , 𝑦4 ∈ 0,3
◦ 𝑔𝑥 𝑦𝑥 - traffic dynamic; 𝑓𝑥 𝑦𝑥 , 𝐶𝑥 - role transition
Candidate children’s 𝑦
𝑃 =1
𝑍 𝑔𝑥(𝑦𝑥)
𝑎𝑢𝑡ℎ𝑜𝑟 𝑎𝑥
𝑓𝑥 𝑦𝑥, 𝐶𝑥
average of Kulczinski measure and IR measure
1 if all the constraints are satisfied; 0 otherwise
a1
a0
a2 a3
a4a5
f 0(y0,y1, y2,y3,y4,y5)
y1
y0
y2 y3
y4y5
f 1(y1, y2,y3)
g2(y2)
g5(y5)g4(y4)
f 3(y3, y4,y5)
g3(y3)
g1(y1)
g0(y0)
Factor Graph - Inference Traditional inference algorithm message passing [Koller & Friedman 09] ◦ Exact inference: sum-product + junction tree
◦ Approximate inference: loopy belief propagation
More optimized message passing algorithm [Wang et al. 10] ◦ Specialized schedule according to the DAG structure
◦ Eliminate function nodes and internal messages
◦ Reduce the message from a vector to a scalar
125
y1
y2 y3f 1
Standard message passing algorithm message is a marginal function (a vector)
a1
a2 a3
Our message passing algorithm message is a scalar
Not scalable. Fails to finish for thousands of variables
Flooding schedule; causing repetitive information flow
Our message passing algorithm
1
Phase 1
a5
a2 a3
a4
a0
a1
1
1
1
11
2
3
2
Phase 2
a5
a2 a3
a4
a0
a1
1
3
1
13
2
1
senty0
10
recv ?senty0
10
recv 1
senty2
u2,0
0
recv ?u2,1
1
?
senty5
u5,0
0
recv ?
senty1
u1,0
0
recv ?
senty4
u4,0
0
recv ?u4,3
3
?
u5,3
3
?
senty3
u3,0
0
recv ?u3,1
1
?
senty2
u2,0
0
recv v2,0
u2,1
1
v2,1
senty5
u5,0
0
recv v5,0
u5,3
3
v5,3
senty1
u1,0
0
recv v1,0
senty3
u3,0
0
recv v3,0
u3,1
1
v3,1
senty4
u4,0
0
recv v4,0
u4,3
3
v4,3
𝑟𝑖𝑗 = max 𝑃(𝒚| 𝑦𝑖 = 𝑗) = exp (𝑠𝑒𝑛𝑡𝑖𝑗 + 𝑟𝑒𝑐𝑣𝑖𝑗)
After the message passing, we collect the message to find the most likely advisor
how likely 𝑎𝑗 is 𝑎𝑖’s
advisor
126
Experiment Results DBLP data: 654,628 authors, 1,076,946 publications, publishing time provided
Labeled data: Math Genealogy; AI Genealogy; Faculty Homepage
Datasets RULE SVM Our method
TEST1 69.9% 73.4% 84.4%
TEST2 69.8% 74.6% 84.3%
TEST3 80.6% 86.7% 91.3%
Heuristic
rule
Support vector machine
(no role transition)
Accuracy of relationship prediction
127
The proposed algorithm is efficient and effective
Advisee Top Ranked Advisor Time Note
David M. Blei
1. Michael I. Jordan 01-03 PhD advisor, 2004 grad
2. John D. Lafferty 05-06 Postdoc, 2006
Hong Cheng
1. Qiang Yang 02-03 MS advisor, 2003
2. Jiawei Han 04-08 PhD advisor, 2008
Sergey Brin
1. Rajeev Motawani 97-98 “Unofficial advisor”
128
Other examples of hierarchical relationship Reply relationship in online discussion [Wang et al. 11]
129
Other examples of hierarchical relationship
Family tree
[Wang et al. 12]
130
Solution in the generic setting
Step 1: create a candidate DAG according to certain expectation about a partial order among the objects
Step 2: model the dependency of relations with a factor graph
Step 3: learn the importance of different expectations by Maximum A Posterior (MAP) inference
Step 4: maximize the joint likelihood and predict true relationship
A large number of factors may exist
Their importance is unknown
a1
a0
a2 a3
a4a5
f 0(y0,y1, y2,y3,y4,y5)
y1
y0
y2 y3
y4y5
f 1(y1, y2,y3)
g2(y2)
g5(y5)g4(y4)
f 3(y3, y4,y5)
g3(y3)
g1(y1)
g0(y0)
a1
a0
a2 a3
a4a5
L2 regularization L-BFGS
131
Categorization of potential functions See other types of potentials in [Wang et al. 12]
Type Cognitive description Potential definition
Homophile Parent and child are similar
Polarity Parent is superior to child
pairwise potentials
singleton potentials
e.g, Kulczinsky
Imbalance Ratio
Type Cognitive description Potential definition
Constraints Restrict certain patterns 𝑓 𝑦𝑖 , 𝑦𝑗 = −|𝐶𝑃 𝑣𝑖 , 𝑣𝑗 , 𝑣𝑦𝑖 , 𝑣𝑦𝑗 |
…
e.g., role transition
132
Optimization Framework [Li et al. 14] Relationship in online social networks
133
Employer: Yahoo! College: Stanford
Employer: ? College:?
Employer: Yahoo! College: Berkeley
Employer: Twitter College: Berkeley
Employer: ? College:?
Employer: Twitter College: UIUC
Employer: ? College:?
Employer: JP Morgan College: UIUC
Employer: ? College:?
Employer: Google College: UIUC
Some users provide attributes in their online profiles Some users’ attributes are missing
Optimization Framework to Model the Dual Dependency Design cost function to capture the dependences
Find the unknown variables that minimize the cost function
134
Unobserved Friends’ circles
Observed User Connections
Partially Observed User Attributes
Friends of the same relationship type share the same value for only certain attribute
The friends having different relationships are loosely connected
Co-Profiling of Relationship & Attributes
𝜆1 𝑤𝑖 𝑓𝑖 − 𝑓𝑗2
𝑒𝑖𝑗∈𝐸,𝑣𝑖,𝑣𝑗∈𝐶𝑖
𝐾
𝑡=1
+ 𝜆2 𝑤𝑖𝑓𝑖 − 1 2
𝑣𝑖∈𝐿∩𝐶𝑖
𝐾
𝑡=1
+ 𝜆3 1
𝐾
𝑒𝑖𝑗∈𝐸′,𝑥𝑖≠𝑥𝑗
135
f3 =<1, 0, 0, 0, 1, 0, 0.1>
Circle1: friends likely to share employee Circle 2: friends likely to share college
Employer: Yahoo! College: Stanford
Employer: ? College:?
Employer: Yahoo! College: Berkeley
Employer: ? College:?
Employer: Twitter College: UIUC
Employer: Google College: UIUC
Employer: Yahoo
College: UIUC
Attribute Vector f1 =<1, 0, 0, 1, 0, 0, 0.1>
Circle Assignment x1=1
x3=1
Association Vector w1 =<1, 1, 1, 0, 0, 0, 0> w2 =<0, 0, 0, 1, 1, 1, 0>
f4 =<0, 1, 0, 0, 0, 1, 0 0.1>
x4=2
Attribute-Relationship Dependence Connection-Relationship Dependence
Yahoo Twitter Google Berkeley Stanford UIUC …
Inference Algorithm
𝜆1 𝑤𝑖 𝑓𝑖 − 𝑓𝑗2
𝑒𝑖𝑗∈𝐸,𝑣𝑖,𝑣𝑗∈𝐶𝑖
𝐾
𝑡=1
+ 𝜆2 𝑤𝑖𝑓𝑖 − 1 2
𝑣𝑖∈𝐿∩𝐶𝑖
𝐾
𝑡=1
+ 𝜆3 1
𝐾
𝑒𝑖𝑗∈𝐸′,𝑥𝑖≠𝑥𝑗
1. Update User Attribute Vectors 𝑓𝑖 ◦ Only propagate values from friends in the same circles ◦ Only propagate the attribute value associated with the circle
2. Update User Circle Assignments 𝑥𝑖 , 𝐶𝑖 ◦ Consider both user’s attributes and connections
3. Update Circle Association Vectors 𝑤𝑖 ◦ Make association vector sparse
136
Coprofiling of relationship + attributes outperforms baselines for both
137
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7Accuracy for College
0.48
0.5
0.52
0.54
0.56
0.58
0.6
0.62
APw APi APc CP
Accuracy for Employer
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
RPa RPn RPan CP
Profiling college classmates
0
0.1
0.2
0.3
0.4
0.5
0.6
RPa RPn RPan CP
Profiling colleagues
Users Connections
19K 110K
LinkedIn data
8K connection are labeled
Methodologies for Dependency Modeling Factor graph
◦ [Wang et al. 10, 11, 12]
◦ [Tang et al. 11]
Optimization framework ◦ [McAuley & Leskovec 12]
◦ [Li, Wang & Chang 14]
Graph-based ranking ◦ [Yakout et al. 12]
◦ Suitable for discrete variables
◦ Probabilistic model with general inference algorithms
◦ Both discrete and real variables
◦ Special optimization algorithm needed
◦ Similar to PageRank
◦ Suitable when the problem can be modeled as ranking on graphs
138
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
139
Mining Entity Concepts for Typing
140
Concepts Relations
text
entity
Topics
Unstructured text & linked entities -> structured hierarchies
Concepts as Entity Types Top 10 active politicians regarding healthcare issues?
Influential high-tech companies in Silicon Valley?
141
Concept Entity Mention
politician Obama says more than 6M signed up for
health care…
high-tech company
Apple leads in list of Silicon Valley's most-valuable brands…
Source of Concepts, Entities & Mentions
142
Concept Entity Mention
politician Obama says more than 6M signed up for
health care…
high-tech company
Apple leads in list of Silicon Valley's most-valuable brands…
in taxonomy / ad-hoc in text / in web tables
Mining Entity Concepts for Typing Large scale taxonomies
◦ Built from human-curated knowledgebases
◦ Constructed from free text
Type entities in text
Type entities in web tables and lists
143
Large Scale Taxonomies Name Source # concepts # entities Hierarchy
Dbpedia (v3.9) Wikipedia infoboxes 529 3M Tree
YAGO2s Wiki, WordNet, GeoNames 350K 10M Tree
Freebase Miscellaneous 23K 23M Flat
Probase (MS.KB) Web text 2M 5M DAG
144
YAGO2s Freebase
Type Entities in Text Relying on knowledgebases – entity linking
◦ Context similarity: [Bunescu & Pascal 06] etc.
◦ Topical coherence: [Cucerzan 07] etc.
◦ Context similarity + entity popularity + topical coherence: Wikifier [Ratinov et al. 11]
◦ Jointly linking multiple mentions: AIDA [Hoffart et al. 11] etc.
◦ …
145
Limitation of Entity Linking Low recall of knowledgebases
Sparse concept descriptors
Can we type entities without relying on knowledgebases?
Yes! Exploit the redundancy in the corpus ◦ Not relying on knowledgebases: targeted disambiguation of ad-hoc, homogeneous
entities [Wang et al. 12]
◦ Partially relying on knowledgebases: mining additional evidence in the corpus for disambiguation [Li et al. 13]
146
82 of 900 shoe brands exist in Wiki
Michael Jordan won the best paper award
Targeted Disambiguation [Wang et al. 12]
147
Entity Id
Entity Name
e1 Microsoft
e2 Apple
e3 HP
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
Target entities
d1
d2
d3
d4
d5
Targeted Disambiguation
148
Entity Id
Entity Name
e1 Microsoft
e2 Apple
e3 HP
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
d1
d2
d4
d5
d3
Target entities
Insight – Context Similarity
149
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
Similar
Insight – Context Similarity
150
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
Dissimilar
Insight – Context Similarity
151
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
Dissimilar
Insight – Leverage Homogeneity
Hypothesis: the context between two true mentions is more similar than between two false mentions across two distinct entities, as well as between a true mention and a false mention.
Caveat: the context of false mentions can be similar among themselves within an entity
152
Sun
IT Corp.SundaySurnamenewspaper
Apple
IT Corp.
fruit
HP
IT Corp.
horsepower
others
Insight – Comention
153
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
High confidence
Insight – Leverage Homogeneity
154
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
True
True
Insight – Leverage Homogeneity
155
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
True
True
True
Insight – Leverage Homogeneity
156
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
True
True
True
False
False
MentionRank
157
Entity Id
Entity Name
e1 Microsoft
e2 Apple
𝑟23
𝑟21
𝑟11
𝑟22
d1
d2
d3
MentionRank
158
• 𝑟𝑖𝑗 ∈ [0,1] – confidence for
(ei,dj) to be a true mention.
• 𝜋𝑖𝑗 ∈ [0,1] – the prior
estimate of (ei,dj) to be a true mention (co-mention)
• 𝜇𝑖𝑗,𝑖′𝑗′ ∈ 0,1 – the weight
between (ei,dj) and (ei’,dj’) (context similarity)
Experiment results
159
(a) PL
Programming Languages Science Fiction Books
Java, Python, Ruby, Perl, R… A Pleasure to Burn, Golden Reflections, Pathfinder…
Type Entities in Web Tables & Lists [Limaye et al. 10]
160
Type Entities in Web Tables & Lists Collective typing - Annotate the entire column of a table or a list
161
Paper | Taxonomy | Method
[Limaye et al. 10] | YAGO | Conditional random field for joint annotation (supervised)
[Venetis et al. 11] | Google’s IsA DB | 1) Maximum likelihood (10%–50% of entities hit the DB): score a class $c$ by $\prod_e p(e|c)$, where $p(e|c) \propto p(c|e)/p(c)$ is smoothed by the frequency of $(e,c)$ in the DB; 2) Majority voting (>50% hit): score $c$ by $\sum_e 1/\mathrm{Rank}(c|e)$
[Song et al. 11] | MS.KB | Naïve Bayes: $p(c)\prod_e p(e|c)$, where $p(e|c)$ is smoothed by the frequency of $(e,c)$ in MS.KB
[Pimplikar & Sarawagi 12] | Ad-hoc column names | A dual problem: find the columns that best fit the query concepts; conditional random fields (supervised)
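As a toy illustration of the voting rule in the [Venetis et al. 11] row above, this sketch scores a column's candidate classes by summed reciprocal rank over a hypothetical isA database; the database contents and the hit-rate threshold are invented for the example.

```python
from collections import defaultdict

# Hypothetical isA database: entity -> classes ordered by association strength.
ISA_DB = {
    "java":   ["programming language", "island", "coffee"],
    "python": ["programming language", "snake"],
    "ruby":   ["programming language", "gemstone"],
    "perl":   ["programming language"],
}

def type_column(column, min_hit_rate=0.5):
    """Majority voting: score each class by sum of 1/Rank(c|e) over the column."""
    hits = [e for e in column if e in ISA_DB]
    if len(hits) <= min_hit_rate * len(column):
        return None  # too few entities found; fall back to another model
    scores = defaultdict(float)
    for entity in hits:
        for rank, cls in enumerate(ISA_DB[entity], start=1):
            scores[cls] += 1.0 / rank
    return max(scores, key=scores.get)

print(type_column(["java", "python", "ruby", "perl"]))  # programming language
```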
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
162
Open Problems on Mining Latent Entity Structures
What is the best way to organize information and interact with users?
163
Understand the Data
◦ System, architecture and database: how do we design a multi-layer organization system whose layers trade off coverage & volatility against utility?
◦ Information quality and security: how do we control information quality and resolve conflicts?
164
Understand the People
◦ NLP, ML, AI: understand & answer natural language questions
◦ HCI, crowdsourcing, Web search, domain experts: explore latent structures with user guidance
165
Mining Relations & Concepts from Multiple Sources
Sources: knowledge base, taxonomy, Web tables, Web pages, domain text, social media, social networks, …
(Diagram: knowledge bases such as Freebase and Satori annotate and guide the mining of the other sources, and are enriched by the mined results in return.)
166
Integration of NLP & Data Mining Approaches
◦ NLP: analyzing single sentences
◦ Data mining: analyzing big data
167
Application to the slot filling task [Yu et al. 14b]
168
(Chart “Enhance Individual SF Systems”: F-measure (%) per slot-filling system, before vs. after validation.)
(Chart “Truth finding efficiency”: #truths found vs. #total responses for six methods: 1 Baseline, 2 Voting, 3 Linguistic Indicator, 4 SVM, 5 MTM, 6 Oracle.)
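The “Voting” baseline in the second chart can be sketched in a few lines: accept a response as true when enough slot-filling systems agree on it. The system names and slot strings below are hypothetical, and the MTM model of [Yu et al. 14b] is substantially more sophisticated than this.

```python
from collections import Counter

def vote(responses, min_support=2):
    """Keep a (query, answer) response if at least `min_support` systems return it."""
    counts = Counter()
    for answers in responses.values():
        counts.update(answers)
    return {resp for resp, c in counts.items() if c >= min_support}

systems = {
    "sys_a": {("per:employee_of:Whitman", "HP"), ("org:top_members:HP", "Whitman")},
    "sys_b": {("per:employee_of:Whitman", "HP")},
    "sys_c": {("per:employee_of:Whitman", "Audi")},
}
print(vote(systems))  # only the answer two systems agree on survives
```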
Mining Latent Entity Structures from Massive Unstructured & Linked Data
169
Concepts
Topics
Relations
text
entity
Frequent pattern mining + probabilistic analysis
Biography
170
Chi Wang
(http://cs.uiuc.edu/homes/chiwang1),
Ph.D. candidate in Computer Science, UIUC.
His Ph.D. research has been focused on
mining hidden entity structures from
massive data and information networks,
with over 30 research publications in refereed journals
and conferences including KDD, WWW, SIGIR, ICDM,
SDM, CIKM, ECMLPKDD, IJCAI, and SIGMOD. He is the
winner of the prestigious Microsoft Research Graduate
Research Fellowship and the KDDCUP'13 runner-up award,
and a best paper award candidate at CIKM'13 and ICDM'13.
Jiawei Han (http://www.cs.uiuc.edu/hanj),
Abel Bliss Professor, Computer Science,
UIUC. He has been researching data
mining, database systems, and information
networks, with over 600 conference and
journal publications. He is Fellow of ACM
and Fellow of IEEE, the first author of the textbook “Data
Mining: Concepts and Techniques", 3rd ed. (Morgan
Kaufmann, 2011), and the Director of Information Network
Academic Research Center, supported by the U.S. Army
Research Lab since 2009. He has delivered over 20 tutorials
in major conferences including SIGMOD, VLDB, KDD,
WWW, WSDM, ICDE, ICDM, SDM, ASONAM, and CIKM.
References
1. [Wang et al. 14a] C. Wang, X. Liu, Y. Song, J. Han. Scalable Moment-based Inference for Latent Dirichlet Allocation, ECMLPKDD'14.
2. [Li et al. 14] R. Li, C. Wang, K. Chang. User Profiling in Ego Network: An Attribute and Relationship Type Co-profiling Approach, WWW’14.
3. [Danilevsky et al. 14] M. Danilevsky, C. Wang, N. Desai, X. Ren, J. Guo, J. Han. Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents, SDM'14.
4. [Wang et al. 13b] C. Wang, M. Danilevsky, J. Liu, N. Desai, H. Ji, J. Han. Constructing Topical Hierarchies in Heterogeneous Information Networks, ICDM’13.
5. [Wang et al. 13a] C. Wang, M. Danilevsky, N. Desai, Y. Zhang, P. Nguyen, T. Taula, and J. Han. A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy, KDD’13.
6. [Li et al. 13] Y. Li, C. Wang, F. Han, J. Han, D. Roth, and X. Yan. Mining Evidences for Named Entity Disambiguation, KDD’13.
171
References
7. [Wang et al. 12a] C. Wang, K. Chakrabarti, T. Cheng, S. Chaudhuri. Targeted Disambiguation of Ad-hoc, Homogeneous Sets of Named Entities, WWW'12.
8. [Wang et al. 12b] C. Wang, J. Han, Q. Li, X. Li, W. Lin and H. Ji. Learning Hierarchical Relationships among Partially Ordered Objects with Heterogeneous Attributes and Links, SDM’12.
9. [Wang et al. 11] H. Wang, C. Wang, C. Zhai and J. Han. Learning Online Discussion Structures by Conditional Random Fields, SIGIR’11.
10. [Wang et al. 10] C. Wang, J. Han, Y. Jia, J. Tang, D. Zhang, Y. Yu and J. Guo. Mining Advisor-advisee Relationship from Research Publication Networks, KDD’10.
11. [Danilevsky et al. 13] M. Danilevsky, C. Wang, F. Tao, S. Nguyen, G. Chen, N. Desai, J. Han. AMETHYST: A System for Mining and Exploring Topical Hierarchies in Information Networks, KDD’13.
172
References
12. [Sun et al. 11] Y. Sun, J. Han, X. Yan, P. S. Yu, T. Wu. PathSim: Meta path-based top-k similarity search in heterogeneous information networks, VLDB'11.
13. [Hofmann 99] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis, UAI’99.
14. [Blei et al. 03] D. M. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet allocation, Journal of Machine Learning Research, 2003.
15. [Griffiths & Steyvers 04] T. L. Griffiths, M. Steyvers. Finding scientific topics, Proc. of the National Academy of Sciences of USA, 2004.
16. [Anandkumar et al. 12] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, M. Telgarsky. Tensor decompositions for learning latent variable models, arXiv:1210.7559, 2012.
17. [Porteous et al. 08] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, M. Welling. Fast collapsed Gibbs sampling for latent Dirichlet allocation, KDD'08.
173
References
18. [Hoffman et al. 12] M. Hoffman, D. M. Blei, D. M. Mimno. Sparse stochastic inference for latent Dirichlet allocation, ICML'12.
19. [Yao et al. 09] L. Yao, D. Mimno, A. McCallum. Efficient methods for topic model inference on streaming document collections, KDD’09.
20. [Newman et al. 09] D. Newman, A. Asuncion, P. Smyth, M. Welling. Distributed algorithms for topic models, Journal of Machine Learning Research, 2009.
21. [Hoffman et al. 13] M. Hoffman, D. Blei, C. Wang, J. Paisley. Stochastic variational inference, Journal of Machine Learning Research, 2013.
22. [Griffiths et al. 04] T. Griffiths, M. Jordan, J. Tenenbaum, and D. M. Blei. Hierarchical topic models and the nested Chinese restaurant process, NIPS'04.
23. [Kim et al. 12a] J. H. Kim, D. Kim, S. Kim, and A. Oh. Modeling topic hierarchies with the recursive Chinese restaurant process, CIKM'12.
174
References
24. [Wang et al. 14b] C. Wang, X. Liu, Y. Song, J. Han. Scalable and Robust Construction of Topical Hierarchies, arXiv:1403.3460, 2014.
25. [Li & McCallum 06] W. Li, A. McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations, ICML'06.
26. [Mimno et al. 07] D. Mimno, W. Li, A. McCallum. Mixtures of hierarchical topics with pachinko allocation, ICML’07.
27. [Ahmed et al. 13] A. Ahmed, L. Hong, A. Smola. Nested Chinese restaurant franchise process: Applications to user tracking and document modeling, ICML'13.
28. [Wallach 06] H. M. Wallach. Topic modeling: beyond bag-of-words, ICML’06.
29. [Wang et al. 07] X. Wang, A. McCallum, X. Wei. Topical n-grams: Phrase and topic discovery, with an application to information retrieval, ICDM’07.
175
References
30. [Lindsey et al. 12] R. V. Lindsey, W. P. Headden, III, M. J. Stipicevic. A phrase-discovering topic model using hierarchical Pitman-Yor processes, EMNLP-CoNLL'12.
31. [Mei et al. 07] Q. Mei, X. Shen, C. Zhai. Automatic labeling of multinomial topic models, KDD’07.
32. [Blei & Lafferty 09] D. M. Blei, J. D. Lafferty. Visualizing Topics with Multi-Word Expressions, arXiv:0907.1013, 2009.
33. [Danilevsky et al. 14] M. Danilevsky, C. Wang, N. Desai, J. Guo, J. Han. Automatic construction and ranking of topical keyphrases on collections of short documents, SDM’14.
34. [Kim et al. 12b] H. D. Kim, D. H. Park, Y. Lu, C. Zhai. Enriching Text Representation with Frequent Pattern Mining for Probabilistic Topic Modeling, ASIST’12.
35. [El-Kishky et al. 14] A. El-Kishky, Y. Song, C. Wang, C. R. Voss, J. Han. Scalable Topical Phrase Mining from Large Text Corpora, arXiv:1406.6312, 2014.
176
References
36. [Zhao et al. 11] W. X. Zhao, J. Jiang, J. He, Y. Song, P. Achananuparp, E.-P. Lim, X. Li. Topical keyphrase extraction from Twitter, HLT'11.
37. [Church et al. 91] K. Church, W. Gale, P. Hanks, D. Hindle. Using statistics in lexical analysis (Chapter 6), 1991.
38. [Sun et al. 09a] Y. Sun, J. Han, J. Gao, Y. Yu. iTopicModel: Information network-integrated topic modeling, ICDM'09.
39. [Deng et al. 11] H. Deng, J. Han, B. Zhao, Y. Yu, C. X. Lin. Probabilistic topic models with biased propagation on heterogeneous information networks, KDD’11.
40. [Chen et al. 12] X. Chen, M. Zhou, L. Carin. The contextual focused topic model, KDD’12.
41. [Tang et al. 13] J. Tang, M. Zhang, Q. Mei. One theme in all views: modeling consensus topics in multiple contexts, KDD’13.
177
References
42. [Kim et al. 12c] H. Kim, Y. Sun, J. Hockenmaier, J. Han. ETM: Entity topic models for mining documents associated with entities, ICDM'12.
43. [Cohn & Hofmann 01] D. Cohn, T. Hofmann. The missing link: a probabilistic model of document content and hypertext connectivity, NIPS'01.
44. [Blei & Jordan 03] D. Blei, M. I. Jordan. Modeling annotated data, SIGIR’03.
45. [Newman et al. 06] D. Newman, C. Chemudugunta, P. Smyth, M. Steyvers. Statistical Entity-Topic Models, KDD’06.
46. [Sun et al. 09b] Y. Sun, Y. Yu, J. Han. Ranking-based clustering of heterogeneous information networks with star network schema, KDD’09.
47. [Chang et al. 09] J. Chang, J. Boyd-Graber, C. Wang, S. Gerrish, D.M. Blei. Reading tea leaves: How humans interpret topic models, NIPS’09.
178
References
48. [Bunescu & Mooney 05a] R. C. Bunescu, R. J. Mooney. A shortest path dependency kernel for relation extraction, HLT'05.
49. [Bunescu & Mooney 05b] R. C. Bunescu, R. J. Mooney. Subsequence kernels for relation extraction, NIPS’05.
50. [Zelenko et al. 03] D. Zelenko, C. Aone, A. Richardella. Kernel methods for relation extraction, Journal of Machine Learning Research, 2003.
51. [Culotta & Sorensen 04] A. Culotta, J. Sorensen. Dependency tree kernels for relation extraction, ACL’04.
52. [McCallum et al. 05] A. McCallum, A. Corrada-Emmanuel, X. Wang. Topic and role discovery in social networks, IJCAI’05.
53. [Leskovec et al. 10] J. Leskovec, D. Huttenlocher, J. Kleinberg. Predicting positive and negative links in online social networks, WWW’10.
179
References
54. [Diehl et al. 07] C. Diehl, G. Namata, L. Getoor. Relationship identification for social network discovery, AAAI'07.
55. [Tang et al. 11] W. Tang, H. Zhuang, J. Tang. Learning to infer social ties in large networks, ECMLPKDD’11.
56. [McAuley & Leskovec 12] J. McAuley, J. Leskovec. Learning to discover social circles in ego networks, NIPS’12.
57. [Yakout et al. 12] M. Yakout, K. Ganjam, K. Chakrabarti, S. Chaudhuri. InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables, SIGMOD’12.
58. [Koller & Friedman 09] D. Koller, N. Friedman. Probabilistic Graphical Models: Principles and Techniques, 2009.
59. [Bunescu & Pasca 06] R. Bunescu, M. Pasca. Using encyclopedic knowledge for named entity disambiguation, EACL'06.
180
References
60. [Cucerzan 07] S. Cucerzan. Large-scale named entity disambiguation based on Wikipedia data, EMNLP-CoNLL'07.
61. [Ratinov et al. 11] L. Ratinov, D. Roth, D. Downey, M. Anderson. Local and global algorithms for disambiguation to Wikipedia, ACL'11.
62. [Hoffart et al. 11] J. Hoffart, M. Yosef, I. Bordino, H. Fürstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, G. Weikum. Robust disambiguation of named entities in text, EMNLP'11.
63. [Limaye et al. 10] G. Limaye, S. Sarawagi, S. Chakrabarti. Annotating and searching web tables using entities, types and relationships, VLDB’10.
64. [Venetis et al. 11] P. Venetis, A. Halevy, J. Madhavan, M. Pasca, W. Shen, F. Wu, G. Miao, C. Wu. Recovering semantics of tables on the web, VLDB’11.
65. [Song et al. 11] Y. Song, H. Wang, Z. Wang, H. Li, W. Chen. Short Text Conceptualization using a Probabilistic Knowledgebase, IJCAI’11.
181
References
66. [Pimplikar & Sarawagi 12] R. Pimplikar, S. Sarawagi. Answering table queries on the web using column keywords, VLDB'12.
67. [Yu et al. 14a] X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, J. Han. Personalized Entity Recommendation: A Heterogeneous Information Network Approach, WSDM’14.
68. [Yu et al. 14b] D. Yu, H. Huang, T. Cassidy, H. Ji, C. Wang, S. Zhi, J. Han, C. Voss. The Wisdom of Minority: Unsupervised Slot Filling Validation based on Multi-dimensional Truth-Finding with Multi-layer Linguistic Indicators, COLING’14.
182