tutorial: mining latent entity structures
DESCRIPTION
SIGMOD 2014TRANSCRIPT
Mining Latent Entity Structures Jiawei Han and Chi Wang
Computer Science, University of I l l inois at Urbana -Champaign
June 25, 2014
1
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
2
Motivation of Mining Latent Entity Structures The prevalence of unstructured data
Structures are useful for knowledge discovery
3
Too expensive to be structured by human: Automated & scalable
Up to 85% of all
information is
unstructured
-- estimated by
industry analysts
Vast majority of the CEOs
expressed frustration over
their organization’s inability to
glean insights from available
data
-- IBM study with1500+ CEOs
Information Overload: A Critical Problem in Big Data Era
By 2020, information will double every 73 days
-- G. Starkweather (Microsoft), 1992
Unstructured or loosely structured data are prevalent
1700 1750 1800 1850 1900 1950 2000 2050
Information growth
4
Example: Research Publications Every year, hundreds of thousands papers are published
◦ Unstructured data: paper text
◦ Loosely structured entities: authors, venues
papers
venue
author
5
Example: News Articles Every day, >90,000 news articles are produced
◦ Unstructured data: news content
◦ Loosely structured entities: persons, locations, organizations, …
news
person
location organization
6
Example: Social Media Every second, >150K tweets are sent out
◦ Unstructured data: tweet content
◦ Loosely structured entities: twitters, hashtags, URLs, …
Darth Vader
The White House
#maythefourthbewithyou
tweets
hashtag
URL
7
A Semi-Structured Framework (or Model) for Unstructured and Loosely-Structured Data
news papers tweets
venue
author person
location organization
URL
hashtag
text
entity
8
Useful Latent Structures: Topics, Concepts, Relations Top 10 active politicians regarding healthcare issues?
Influential high-tech companies in Silicon Valley?
Top 10 researchers in data mining and their specializations?
Academic family on support vector machine?
Concepts (is-a)
Topics (hierarchical)
Relations text
entity
9
Mining Latent Structures: Topics, Concepts, Relations
10
Concepts
Topics
Relations
text
entity
Unstructured text & linked entities -> structured hierarchies
What Power Can We Gain if More Structures Can Be Discovered? Structured database queries
Information network analysis, …
11
Structures Facilitate Multi-Dimensional Analysis: An EventCube Experiment
12
Distribution along Multiple Dimensions Query ‘health care bill’ in news data
13
Entity Analysis and Profiling Topic distribution for “Stanford University” 14
15 AMETHYST [DANILEVSKY ET AL. 13]
Structures Facilitate Heterogeneous Information Network Analysis
Real-world data: Multiple object types and/or multiple link types
Venue Paper Author
DBLP Bibliographic Network The IMDB Movie Network
Actor
Movie
Director
Movie
Studio
The Facebook Network
16
What Can Be Mined in Structured Information Networks
Example: DBLP: A Computer Science bibliographic database
Knowledge hidden in DBLP Network Mining Functions
Who are the leading researchers on Web search? Ranking
Who are the peer researchers of Jure Leskovec? Similarity Search
Whom will Christos Faloutsos collaborate with? Relationship Prediction
Which types of relationships are most influential for an author to decide her topics? Relation Strength Learning
How was the field of Data Mining emerged or evolving? Network Evolution
Which authors are rather different from his/her peers in IR? Outlier/anomaly detection
17
Similarity Search: Find Similar Objects in Networks Guided by Meta-Paths Who are very similar to Christos Faloutsos?
Meta-Path: Meta-level description of a path between two objects
Christos’s students or close collaborators Similar reputation at similar venues
Meta-Path: Author-Paper-Author (APA) Meta-Path: Author-Paper-Venue-Paper-Author (APVPA)
Schema of the DBLP Network Different meta-paths lead to very
different results!
18
Similarity Search: PathSim Measure Helps Find Peer Objects in Long Tails
Anhai Doan
◦ CS, Wisconsin
◦ Database area
◦ PhD: 2002
Meta-Path: Author-Paper-Venue-Paper-Author (APVPA)
• Jignesh Patel
• CS, Wisconsin
• Database area
• PhD: 1998
• Amol Deshpande
• CS, Maryland
• Database area
• PhD: 2004
• Jun Yang
• CS, Duke
• Database area
• PhD: 2001
PathSim [Sun et al. 11]
19
PathPredict: Meta-Path Based Relationship Prediction Meta path-guided prediction of links and relationships
Insight: Meta path relationships among similar typed links share similar semantics and are comparable and inferable
Bibliographic network: Co-author prediction (A—P—A)
paper topic
venue
author
publish publish-1
mention-1
mention write
write-1
contain/contain-1 cite/cite-1
vs.
20
Meta-Path Based Co-authorship Prediction Co-authorship prediction: Whether two authors start to collaborate
Co-authorship encoded in meta-path: Author-Paper-Author
Topological features encoded in meta-paths
Meta-Path Semantic Meaning
21
The prediction power of each meta-path
Derived by logistic regression
Heterogeneous Network Helps Personalized Recommendation Users and items with limited feedback are connected by a variety of paths Different users may require different models: Relationship heterogeneity makes personalized recommendation models easier to define
Avatar Titanic Aliens Revolutionary Road
James Cameron
Kate Winslet
Leonardo Dicaprio
Zoe Saldana
Adventure Romance
Collaborative filtering methods suffer from the data sparsity issue
# of users or items
A small set of users & items have a large number of ratings
Most users and items have a small number of ratings
# o
f ra
tin
gs
Personalized recommendation with heterogeous networks [Yu et al. 14a]
22
Personalized Recommendation in Heterogeneous Networks Datasets: Methods to compare:
◦ Popularity: Recommend the most popular items to users
◦ Co-click: Conditional probabilities between items
◦ NMF: Non-negative matrix factorization on user feedback
◦ Hybrid-SVM: Use Rank-SVM to utilize both user feedback and information network
Winner: HeteRec personalized recommendation (HeteRec-p)
23
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
24
Mining Latent Topical Hierarchy
25
Topics
text
Unstructured text & linked entities -> structured hierarchies
entity
Topic Hierarchy: Summarize the Data with Multiple Granularity
Top 10 researchers in data mining? ◦ And their specializations?
Important research areas in SIGIR conference?
Computer Science
Information technology &
system
Database Information
retrieval
… …
Theory of computation
… …
…
papers
venue
author
26
Methodologies of Topic Mining
27
C. An integrated framework: CATHY
i) Recursive topic discovery ii) Phrase mining iii) Phrase and entity ranking
B. Extension of topic modeling
i) Flat -> hierarchical ii) Unigrams -> ngrams iii) Text -> text + entity
A. Traditional bag-of-words topic modeling
Methodologies of Topic Mining
28
C. An integrated framework: CATHY
i) Recursive topic discovery ii) Phrase mining iii) Phrase and entity ranking
B. Extension of topic modeling
i) Flat -> hierarchical ii) Unigrams -> ngrams iii) Text -> text + entity
A. Traditional bag-of-words topic modeling
A. Bag-of-Words Topic Modeling Widely studied technique for text analysis
◦ Summarize themes/aspects
◦ Facilitate navigation/browsing
◦ Retrieve documents
◦ Segment documents
◦ Many other text mining tasks
Represent each document as a bag of words: all the words within a document are exchangeable
Probabilistic approach
29
Topic: Multinomial Distribution over Words A document is modeled as a sample of mixed topics
How can we discover these topic word distributions from a corpus?
30
[ Criticism of government response to the
hurricane primarily consisted of criticism of its
response to the approach of the storm and its
aftermath, specifically in the delayed response ] to
the [ flooding of New Orleans. … 80% of the 1.3
million residents of the greater New Orleans
metropolitan area evacuated ] …[ Over seventy
countries pledged monetary donations or other
assistance]. …
Topic 1
Topic 3
Topic 2
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
help 0.02
...
city 0.2
new 0.1
orleans 0.05
...
EXAMPLE FROM CHENGXIANG ZHAI'S LECTURE NOTES
Routine of Generative Models Model design: assume the documents are generated by a certain process
Model Inference: Fit the model with observed documents to recover the unknown parameters Θ
31
Generative process with unknown parameters Θ
Criticism of government
response to the hurricane …
corpus
Two representative models: pLSA and LDA
Probabilistic Latent Semantic Analysis (PLSA) [Hofmann 99] 𝑘 topics: 𝑘 multinomial distributions over words
𝐷 documents: 𝐷 multinomial distributions over topics
32
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
.4 .3 .3 Doc 𝜃1
.2 .5 .3 Doc 𝜃𝐷
…
Generative process: we will generate each token in each document 𝑑 according to 𝜙, 𝜃
PLSA – Model Design 𝑘 topics: 𝑘 multinomial distributions over words
𝐷 documents: 𝐷 multinomial distributions over topics
33
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
.4 .3 .3 Doc 𝜃1
.2 .5 .3 Doc 𝜃𝐷
…
To generate a token in document 𝑑: 1. Sample a topic label 𝑧 according to 𝜃𝑑 (e.g. z=1) 2. Sample a word w according to 𝜙𝑧 (e.g. w=government)
.4 .3 .3
Topic 𝝓𝒛
PLSA – Model Inference
What parameters are most likely to generate the observed corpus?
34
To generate a token in document 𝑑: 1. Sample a topic label 𝑧 according to 𝜃𝑑 (e.g. z=1) 2. Sample a word w according to 𝜙𝑧 (e.g. w=government)
.4 .3 .3
Topic 𝝓𝒛
Criticism of government
response to the hurricane …
corpus Topic 𝝓𝟏 Topic 𝝓𝒌
…
.? .? .? Doc 𝜃1
.? .? .? Doc 𝜃𝐷
…
government ?
response ?
...
donate ?
relief ?
...
PLSA – Model Inference using Expectation-Maximization (EM)
35
E-step: Fix 𝜙, 𝜃, estimate topic labels 𝑧 for every token in every document M-step: Use estimated topic labels 𝑧 to estimate 𝜙, 𝜃 Guaranteed to converge to a stationary point, but not guaranteed optimal
Criticism of government
response to the hurricane …
corpus
Exact max likelihood is hard => approximate optimization with EM
Topic 𝝓𝟏 Topic 𝝓𝒌
…
.? .? .? Doc 𝜃1
.? .? .? Doc 𝜃𝐷
…
government ?
response ?
...
donate ?
relief ?
...
How the EM Algorithm Works
36
.4 .3 .3
Topic 𝝓𝟏 Topic 𝝓𝒌
…
Doc 𝜃1
Doc 𝜃𝐷
…
.2 .5 .3
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
response
criticism
government
hurricane
government
d1
dD Sum fractional
counts
response
M-step
…
E-step
Bayes rule
k
j wjjd
wjjd
k
jjzwpdjzp
jzwpdjzpwdjzp
1' ,'',
,,
1')'|()|'(
)|()|(),|(
Analysis of pLSA PROS
Simple, only one hyperparameter k
Easy to incorporate prior in the EM algorithm
CONS
High model complexity -> prone to overfitting
The EM solution is neither optimal nor unique
37
Latent Dirichlet Allocation (LDA) [Blei et al. 02] Impose Dirichlet prior to the model parameters -> Bayesian version of pLSA
38
Generative process: First generate 𝜙, 𝜃 with Dirichlet prior, then generate each token in each document 𝑑 according to 𝜙, 𝜃
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
.4 .3 .3 Doc 𝜃1
.2 .5 .3 Doc 𝜃𝐷
…
𝛽
𝛼
Same as pLSA
To mitigate overfitting
LDA – Model Inference MAXIMUM LIKELIHOOD
Aim to find parameters that maximize the likelihood
Exact inference is intractable
Approximate inference ◦ Variational EM [Blei et al. 03]
◦ Markov chain Monte Carlo (MCMC) – collapsed Gibbs sampler [Griffiths & Steyvers 04]
METHOD OF MOMENTS
Aim to find parameters that fit the moments (expectation of patterns)
Exact inference is tractable ◦ Tensor orthogonal decomposition
[Anandkumar et al. 12]
◦ Scalable tensor orthogonal decomposition [Wang et al. 14a]
39
MCMC – Collapsed Gibbs Sampler [Griffiths & Steyvers 04]
40
response
criticism
government
hurricane
government
d1
dD
response
…
Iter 1 Iter 2 Iter 1000 …
…
… Topic 𝝓𝟏 Topic 𝝓𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
kn
n
VN
NjzP
i
i
i
d
d
j
j
j
w
ii
)(
)(
)(
)(
),|( zwSample each zi conditioned on z-i
Estimated 𝜙𝑗,𝑤𝑖 Estimated 𝜃𝑑𝑖,𝑗
Method of Moments [Anandkumar et al. 12, Wang et al. 14a]
What parameters are most likely to generate the observed corpus? Criticism of
government response to the
hurricane …
corpus Topic 𝝓𝟏 Topic 𝝓𝒌
… government ?
response ?
...
donate ?
relief ?
...
criticism government response: 0.001
government response hurricane: 0.005
criticism response hurricane: 0.004
:
criticism: 0.03
response: 0.01
government: 0.04
:
criticism response: 0.001
criticism government: 0.002
government response: 0.003
:
Moments: expectation of patterns
What parameters fit the empirical moments?
length 1 length 2 (pair) length 3 (triple)
41
Guaranteed Topic Recovery Theorem. The patterns up to length 3 are sufficient for topic recovery
𝑀2 = 𝜆𝑗𝝓𝒋 ⊗𝝓𝒋
𝑘
𝑗=1
, 𝑀3 = 𝜆𝑗𝝓𝒋 ⊗𝝓𝒋 ⊗𝝓𝒋
𝑘
𝑗=1
42
criticism government response: 0.001
government response hurricane: 0.005
criticism response hurricane: 0.004
:
criticism: 0.03
response: 0.01
government: 0.04
:
criticism response: 0.001
criticism government: 0.002
government response: 0.003
:
length 1 length 2 (pair) length 3 (triple)
V
V: vocabulary size; k: topic number V
V
V V
Tensor Orthogonal Decomposition for LDA
43
A: 0.03 AB: 0.001 ABC: 0.001
B: 0.01 BC: 0.002 ABD: 0.005
C: 0.04 AC: 0.003 BCD: 0.004
: : :
Normalized pattern counts
𝑀2
𝑀3
V
V: vocabulary size k: topic number
V
V
V
V
k k
k
𝑇
Input corpus
Topic 𝝓𝟏
Topic 𝝓𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
[ANANDKUMAR ET AL. 12]
Tensor Orthogonal Decomposition for LDA – Not Scalable
44
A: 0.03 AB: 0.001 ABC: 0.001
B: 0.01 BC: 0.002 ABD: 0.005
C: 0.04 AC: 0.003 BCD: 0.004
: : :
Normalized pattern counts
𝑀2
𝑀3
V
V
V
V
V
k k
k
𝑇
Input corpus
Topic 𝝓𝟏
Topic 𝝓𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
Prohibitive to compute
Time: 𝑶 𝑽𝟑𝒌 + 𝑳𝒍𝟐
Space: 𝑶 𝑽𝟑
V: vocabulary size; k: topic number L: # tokens; l: average doc length
Scalable Tensor Orthogonal Decomposition
45
A: 0.03 AB: 0.001 ABC: 0.001
B: 0.01 BC: 0.002 ABD: 0.005
C: 0.04 AC: 0.003 BCD: 0.004
: : :
Normalized pattern counts
𝑀2
𝑀3
V
V
V
V
V
k k
k
𝑇
Input corpus
Topic 𝝓𝟏
Topic 𝝓𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
Sparse & low rank
Decomposable
1st scan
2nd scan
Time: 𝑶 𝑳𝒌𝟐 + 𝒌𝒎
Space: 𝑶 𝒎 # nonzero 𝒎≪ 𝑽𝟐
[WANG ET AL. 14A]
STOD is 20-3000 times faster
And the recovery error is low when the sample is large enough
Variance is almost 0
46
STOD – Scalable tensor orthogonal decomposition TOD – Tensor orthogonal decomposition
L=19M L=39M
Runtime
Recovery error
Summary of LDA Model Inference MAXIMUM LIKELIHOOD
Approximate inference ◦ slow, scan data thousands of times
◦ large variance, no theoretic guarantee
Numerous follow-up work ◦ further approximation [Porteous et al.
08, Yao et al. 09, Hoffman et al. 12] etc.
◦ parallelization [Newman et al. 09] etc.
◦ online learning [Hoffman et al. 13] etc.
METHOD OF MOMENTS
STOD [Wang et al. 14a] ◦ fast, scan data twice
◦ robust recovery with theoretic guarantee
New and promising!
47
Methodologies of Topic Mining
48
C. An integrated framework: CATHY
i) Recursive topic discovery ii) Phrase mining iii) Phrase and entity ranking
B. Extension of topic modeling
i) Flat -> hierarchical ii) Unigrams -> ngrams iii) Text -> text + entity
A. Traditional bag-of-words topic modeling
B. i) Flat Topics -> Hierarchical Topics In PLSA and LDA, a topic is selected from a flat pool of topics
In hierarchical topic models, a topic is selected from a hierarchy
49
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
To generate a token in document 𝑑: 1. Sample a topic label 𝑧 according to 𝜃𝑑 2. Sample a word w according to 𝜙𝑧
.4 .3 .3
Topic 𝝓𝒛
o
o/1 o/2
o/2/1 o/1/1 o/1/2 o/2/2
Information technology & system
DB IR
CS
Hierarchical Topic Models Topics form a tree structure
◦ nested Chinese Restaurant Process [Griffiths et al. 04]
◦ recursive Chinese Restaurant Process [Kim et al. 12a]
◦ LDA with Topic Tree [Wang et al. 14b]
Topics form a DAG structure ◦ Pachinko Allocation [Li & McCallum 06]
◦ hierarchical Pachinko Allocation [Mimno et al. 07]
◦ nested Chinese Restaurant Franchise [Ahmed et al. 13]
50
o
o/1 o/2
o/2/1 o/1/1 o/1/2 o/2/2
o
o/1 o/2
o/2/1 o/1/1 o/1/2 o/2/2
DAG: DIRECTED ACYCLIC GRAPH
Hierarchical Topic Model Inference MAXIMUM LIKELIHOOD
Exact inference is intractable
Approximate inference: variational inference or MCMC
Non recursive – all the topics are inferred at once
METHOD OF MOMENTS
Scalable Tensor Recursive Orthogonal Decomposition [Wang et al. 14b] ◦ fast and robust recovery with theoretic
guarantee
Recursive method - only for LDA with Topic Tree model
51
Most popular
Recursive Inference for LDA with Topic Tree A large tree subsumes a smaller tree with shared model parameters
52
Inference order
[WANG ET AL. 14B]
Flexible to decide when to terminate
Easy to revise the tree structure
Scalable Tensor Recursive Orthogonal Decomposition
53
A: 0.03 AB: 0.001 ABC: 0.001
B: 0.01 BC: 0.002 ABD: 0.005
C: 0.04 AC: 0.003 BCD: 0.004
: : :
Normalized pattern counts for t
k k
k
𝑇 (𝑡)
Input corpus
Topic 𝝓𝒕/𝟏
Topic 𝝓𝒕/𝒌
…
government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
[WANG ET AL. 14B]
+ Topic t
B. ii) Unigrams -> N-Grams Motivation: unigrams can be difficult to interpret
54
learning
reinforcement
support
machine
vector
selection
feature
random
:
versus
learning
support vector machines
reinforcement learning
feature selection
conditional random fields
classification
decision trees
:
The topic that represents the area of Machine Learning
Various Strategies Strategy 1: generate bag-of-words -> generate sequence of tokens
◦ Bigram topical model [Wallach 06], topical n-gram model [Wang et al. 07], phrase discovering topic model [Lindsey et al. 12]
Strategy 2: post bag-of-words model inference, visualize topics with n-grams ◦ Label topic [Mei et al. 07], TurboTopic [Blei & Lafferty 09], KERT [Danilevsky et al. 14]
Strategy 3: prior bag-of-words model inference, mine phrases and impose to the bag-of-words model ◦ Frequent pattern-enriched topic model [Kim et al. 12b], ToPMine [El-kishky et al. 14]
55
Strategy 1 – Topical N-Gram Model & Phrase Discovering Topic Model
56
To generate a token in document 𝑑: 1. Sample a binary variable 𝑥 according to the previous token & topic label 2. Sample a topic label 𝑧 according to 𝜃𝑑 3. If 𝑥 = 0 (new phrase), sample a word w according to 𝜙𝑧; otherwise,
sample a word w according to 𝑧 and the previous token
[white
house]
[reports
[white]
[black d1 dD
color]
…
0
z x
0
1
0
0
1
z x
High model complexity - overfitting High inference cost - slow MCMC inference
[WANG ET AL. 07, LINDSEY ET AL. 12]
Strategy 2 – Topical Keyphrase Extraction & Ranking (KERT)
57
learning
support vector machines
reinforcement learning
feature selection
conditional random fields
classification
decision trees
:
Topical keyphrase
extraction & ranking
knowledge discovery using least squares support vector machine classifiers
support vectors for reinforcement learning
a hybrid approach to feature selection
pseudo conditional random fields
automatic web page classification in a dynamic and hierarchical way
inverse time dependency in convex regularized learning
postprocessing decision trees to extract actionable knowledge
variance minimization least squares support vector machines
…
Unigram topic assignment: Topic 1 & Topic 2
[DANILEVSKY ET AL. 14]
Framework of KERT 1. Run bag-of-words model inference, and assign topic label to each token
2. Extract candidate keyphrases within each topic
3. Rank the keyphrases in each topic
◦ Popularity: ‘information retrieval’ vs. ‘cross-language information retrieval’
◦ Discriminativeness: only frequent in documents about topic t
◦ Concordance: ‘active learning’ vs.‘learning classification’
◦ Completeness: ‘vector machine’ vs. ‘support vector machine’
58
Frequent pattern mining
Comparability property: directly compare phrases of mixed lengths
Comparison of phrase ranking methods
The topic that represents the area of Machine Learning 59
kpRel [Zhao et al. 11]
KERT (-popularity)
KERT (-discriminativeness)
KERT (-concordance)
KERT [Danilevsky et al. 14]
learning effective support vector machines learning learning
classification text feature selection classification support vector machines
selection probabilistic reinforcement learning selection reinforcement learning
models identification conditional random fields feature feature selection
algorithm mapping constraint satisfaction decision conditional random fields
features task decision trees bayesian classification
decision planning dimensionality reduction trees decision trees
: : : : :
Strategy 3 – Phrase Mining + Topic Model (ToPMine)
60
Phrase mining and
document segmentation
knowledge discovery using least squares support vector machine classifiers…
Knowledge discovery and support vector machine should have coherent topic labels
[EL-KISHKY ET AL. 14]
Strategy 2: the tokens in the same phrase may be assigned to different topics
Solution: switch the order of phrase mining and topic model inference
[knowledge discovery] using [least squares]
[support vector machine] [classifiers] …
[knowledge discovery] using [least squares]
[support vector machine] [classifiers] …
Topic model inference
with phrase constraints
More challenging than in strategy 2!
Phrase Mining: Frequent Pattern Mining + Statistical Analysis
61
Significance score [Church et al. 91] 𝛼(𝐴, 𝐵)
=|𝐴𝐵| − |𝐴||𝐵|/𝑛
𝐴𝐵
Good Phrases
Phrase Mining: Frequent Pattern Mining + Statistical Analysis
62
Raw freq
[support vector machine]: 90 80
[vector machine]: 95 0
[support vector]: 100 20
True freq
[Markov blanket] [feature selection] for [support
vector machines]
[knowledge discovery] using [least squares]
[support vector machine] [classifiers]
…[support vector] for [machine learning]…
Significance score [Church et al. 91] 𝛼(𝐴, 𝐵)
=|𝐴𝐵| − |𝐴||𝐵|/𝑛
𝐴𝐵
Example Topical Phrases
63
ToPMine [El-kishky et al. 14] – Strategy 3 (67 seconds)
information retrieval feature selection
social networks machine learning
web search semi supervised
search engine large scale
information extraction support vector machines
question answering active learning
web pages face recognition
: :
Topic 1 Topic 2
social networks information retrieval
web search text classification
time series machine learning
search engine support vector machines
management system information extraction
real time neural networks
decision trees text categorization
: :
Topic 1 Topic 2
PDLDA [Lindsey et al. 12] – Strategy 1 (3.72 hours)
ToPMine: Experiments on Associate Press News (1989) 64
ToPMine: Experiments on Yelp Reviews 65
66
Runtime
Strategy 3 1 2 1 2
fastest slow slow fast
Comparison of three strategies
strategy 3 > strategy 2 > strategy 1
Coherence of topics
Summary of Topical N-Gram Mining Strategy 1: generate bag-of-words -> generate sequence of tokens
◦ integrated complex model; phrase quality and topic inference rely on each other
◦ slow and overfitting
Strategy 2: post bag-of-words model inference, visualize topics with n-grams ◦ phrase quality relies on topic labels for unigrams
◦ can be fast
Strategy 3: prior bag-of-words model inference, mine phrases and impose to the bag-of-words model ◦ topic inference relies on correct partition of documents, but not sensitive
◦ can be fast
67
B. iii) Text Only -> Text + Entity
What should be the output?
How to use linked entity information?
68
text
Criticism of government
response to the hurricane …
Text-only corpus
entity
Topic 𝝓𝟏 Topic 𝝓𝒌
… government 0.3
response 0.2
...
donate 0.1
relief 0.05
...
.4 .3 .3 Doc 𝜃1
.2 .5 .3 Doc 𝜃𝐷
…
Three Modeling Strategies RESEMBLE ENTITIES TO DOCUMENTS
An entity has a multinomial distribution over topics
RESEMBLE ENTITIES TO WORDS
A topic has a multinomial distribution over each type of entities
69
.3 .4 .3
.2 .5 .3 SIGMOD
Surajit Chaudhuri
… Topic 1
KDD 0.3
ICDM 0.2
...
Over venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
Over authors
RESEMBLE ENTITIES TO TOPICS
An entity has a multinomial distribution over words
SIGMOD database 0.3
system 0.2
...
Resemble Entities to Documents Regularization - Linked documents or entities have similar topic distributions
◦ iTopicModel [Sun et al. 09a]
◦ TMBP-Regu [Deng et al. 11]
Use entities as additional sources of topic choices for each token ◦ Contextual focused topic model [Chen et al. 12] etc.
Aggregate documents linked to a common entity as a pseudo document ◦ Co-regularization of inferred topics under multiple views [Tang et al. 13]
70
Resemble Entities to Documents Regularization - Linked documents or entities have similar topic distributions
71
iTopicModel [Sun et al. 09a] TMBP-Regu [Deng et al. 11]
Doc 𝜃2
Doc 𝜃3
Doc 𝜃1
𝜃2 should be similar to 𝜃1, 𝜃3 𝜃1d should be similar to 𝜃5
𝑢, 𝜃2𝑢, 𝜃2
𝑣
Resemble Entities to Documents Use entities as additional sources of topic choice for each token
◦ Contextual focused topic model [Chen et al. 12]
72
To generate a token in document 𝑑: 1. Sample a variable 𝑥 for the context type 2. Sample a topic label 𝑧 according to 𝜃 of the context type decided by 𝑥 3. Sample a word w according to 𝜙𝑧
.4 .3 .3 𝑥 = 1, sample 𝑧 from document’s topic distribution
𝑥 = 2, sample 𝑧 from author’s topic distribution .3 .4 .3
.2 .5 .3 𝑥 = 3, sample 𝑧 from venue’s topic distribution
On Random Sampling over Joins
Surajit Chaudhuri SIGMOD
Resemble Entities to Documents Aggregate documents linked to a common entity as a pseudo document
◦ Co-regularization of inferred topics under multiple views [Tang et al. 13]
73
Document view
A single paper Author view
All Surajit Chaudhuri’s
papers
Venue view
All SIGMOD papers
Topic 𝝓𝟏 Topic 𝝓𝒌 …
Three Modeling Strategies RESEMBLE ENTITIES TO DOCUMENTS
An entity has a multinomial distribution over topics
RESEMBLE ENTITIES TO WORDS
A topic has a multinomial distribution over each type of entities
74
.3 .4 .3
.2 .5 .3 SIGMOD
Surajit Chaudhuri
… Topic 1
KDD 0.3
ICDM 0.2
...
Over venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
Over authors
RESEMBLE ENTITIES TO TOPICS
An entity has a multinomial distribution over words
SIGMOD database 0.3
system 0.2
...
Resemble Entities to Topics Entity-Topic Model (ETM) [Kim et al. 12c]
75
Topic 𝝓𝟏
… data 0.3
mining 0.2
...
SIGMOD
database 0.3
system 0.2
...
…
Surajit Chaudhuri
database 0.1
query 0.1
...
…
text venue author
To generate a token in document 𝑑: 1. Sample an entity 𝑒 2. Sample a topic label 𝑧 according to 𝜃𝑑 3. Sample a word w according to 𝜙𝑧,𝑒
𝜙𝑧,𝑒~𝐷𝑖𝑟(𝑤1𝜙𝑧 +𝑤2𝜙𝑒)
Paper text Surajit Chaudhuri SIGMOD
Example topics learned by ETM
On a news dataset about Japan tsunami 2011
76
𝜙𝑧,𝑒 𝜙𝑧,𝑒 𝜙𝑧,𝑒 𝜙𝑧
𝜙𝑧,𝑒 𝜙𝑧,𝑒 𝜙𝑧,𝑒 𝜙e
Three Modeling Strategies RESEMBLE ENTITIES TO DOCUMENTS
An entity has a multinomial distribution over topics
RESEMBLE ENTITIES TO WORDS
A topic has a multinomial distribution over each type of entities
77
.3 .4 .3
.2 .5 .3 SIGMOD
Surajit Chaudhuri
… Topic 1
KDD 0.3
ICDM 0.2
...
Over venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
Over authors
RESEMBLE ENTITIES TO TOPICS
An entity has a multinomial distribution over words
SIGMOD database 0.3
system 0.2
...
Resemble Entities to Words Entities as additional elements to be generated for each doc
◦ Conditionally independent LDA [Cohn & Hofmann 01]
◦ CorrLDA1 [Blei & Jordan 03]
◦ SwitchLDA & CorrLDA2 [Newman et al. 06]
◦ NetClus [Sun et al. 09b]
78
To generate a token/entity in document 𝑑: 1. Sample a topic label 𝑧 according to 𝜃𝑑 2. Sample a token w / entity e according to 𝜙𝑧 or 𝜙𝑧
𝑒
Topic 1
KDD 0.3
ICDM 0.2
...
venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
authors
data 0.2
mining 0.1
...
words
Comparison of Three Modeling Strategies for Text + Entity RESEMBLE ENTITIES TO DOCUMENTS
Entities regularize textual topic discovery
RESEMBLE ENTITIES TO WORDS
Entities enrich and regularize the textual representation of topics
79
.3 .4 .3
.2 .5 .3 SIGMOD
Surajit Chaudhuri
… Topic 1
KDD 0.3
ICDM 0.2
...
Over venues
Jiawei Han 0.1
Christos Faloustos 0.05
...
Over authors
RESEMBLE ENTITIES TO TOPICS
Each entity has its own profile SIGMOD database 0.3
system 0.2
... # params = k*E*V
# params = k*(E+V)
Methodologies of Topic Mining
80
C. An integrated framework: CATHY
i) Recursive topic discovery ii) Phrase mining iii) Phrase and entity ranking
B. Extension of topic modeling
i) Flat -> hierarchical ii) Unigrams -> ngrams iii) Text -> text + entity
A. Traditional bag-of-words topic modeling
Break
81
C. An Integrated Framework
How to choose & integrate?
82
Recursive Non recursive Hierarchy
Sequence of tokens generative model
• Strategy 1
Post inference, visualize topics with n-grams
• Strategy 2
Prior inference, mine phrases and impose to the bag-of-words model
• Strategy 3
P
h
r
a
s
e
E
n
t
i
t
y
Resemble entities to documents
• Modeling strategy 1
Resemble entities to topics
• Modeling strategy 2
Resemble entities to words
• Modeling strategy 3
C. An Integrated Framework
Compatible & effective
83
Recursive Non recursive Hierarchy
P
h
r
a
s
e
E
n
t
i
t
y
Resemble entities to documents
• Modeling strategy 1
Resemble entities to topics
• Modeling strategy 2
Resemble entities to words
• Modeling strategy 3
Sequence of tokens generative model
• Strategy 1
Post inference, visualize topics with n-grams
• Strategy 2
Prior model inference, mine phrases and impose to the bag-of-words model
• Strategy 3
C. An Integrated Framework: Construct A Topical HierarchY (CATHY)
Hierarchy + phrase + entity
84
i) Hierarchical topic discovery with entities
ii) Phrase mining
iii) Rank phrases & entities per topic
Output hierarchy with phrases & entities
Input collection
text
o
o/1
o/1/1 o/1/2
o/2
o/2/1
entity
Mining Framework – CATHY Construct A Topical HierarchY
85
i) Hierarchical topic discovery with entities
ii) Phrase mining
iii) Rank phrases & entities per topic
Output hierarchy with phrases & entities
Input collection
text
o
o/1
o/1/1 o/1/2
o/2
o/2/1
entity
Hierarchical Topic Discovery with Multi-Typed Entities [Wang et al. 13a,b] Resemble entities to words – every topic has a multinomial distribution over each type of entities
86
Topic 1
KDD 0.3
ICDM 0.2
...
Jiawei Han 0.1
Christos Faloustos 0.05
...
data 0.2
mining 0.1
...
Topic k
𝜙11 𝜙1
2 𝜙13
𝜙𝑘1 𝜙𝑘
2 𝜙𝑘3
SIGMOD 0.3
VLDB 0.3
...
Surajit Chaudhuri 0.1
Jeff Naughton 0.05
...
database 0.2
system 0.1
...
…
venues authors words
Hierarchical Topic Discovery: Link Patterns
87
Computing machinery and intelligence
intelligence
computing machinery
A.M. Turing
A.M. Turing
Hierarchical Topic Discovery: Link-Weighted Heterogeneous Network
88
word author venue
text
A.M. Turing intelligence
system database
SIGMOD
venue
author
Hierarchical Topic Discovery: Generative Model for Link Patterns A single link has a latent topic path z
89
o
o/1 o/2
o/2/1 o/1/1 o/1/2 o/2/2
Information technology & system
DB IR
To generate a link between type 𝑡1 and type 𝑡2: 1. Sample a topic label 𝑧 according to 𝜃
Suppose 𝑡1 = 𝑡2 = word
Hierarchical Topic Discovery: Generative Model for Link Patterns
90
database
To generate a link between type 𝑡1 and type 𝑡2: 1. Sample a topic label 𝑧 according to 𝜃
2. Sample the first end node 𝑢 according to 𝜙𝑧𝑡1
Suppose 𝑡1 = 𝑡2 = word
Topic o/1/2
database 0.2
system 0.1
...
Hierarchical Topic Discovery: Generative Model for Link Patterns
91
database system
To generate a link between type 𝑡1 and type 𝑡2: 1. Sample a topic label 𝑧 according to 𝜃
2. Sample the first end node 𝑢 according to 𝜙𝑧𝑡1
3. Sample the second end node 𝑣 according to 𝜙𝑧𝑡2
Suppose 𝑡1 = 𝑡2 = word
Topic o/1/2
database 0.2
system 0.1
...
Hierarchical Topic Discovery: Generative Model for Link Patterns
92
Equivalently, we can generate # links between u and v:
𝑒𝑢,𝑣 = 𝑒𝑢,𝑣1 + ⋯+ 𝑒𝑢,𝑣
𝑘 , 𝑒𝑢,𝑣𝑧 ~ 𝑃𝑜𝑖𝑠𝑠𝑜𝑛 (𝜃𝑧 𝜙𝑧,𝑢
𝑡1 𝜙𝑧,𝑣𝑡2 )
Suppose 𝑡1 = 𝑡2 = word
database system
0 1 2 3 4 5
0 1 2 3 4 5
𝑒𝑑𝑎𝑡𝑎𝑏𝑎𝑠𝑒,𝑠𝑦𝑠𝑡𝑒𝑚𝑜/1/2 𝐷𝐵
~
𝑒𝑑𝑎𝑡𝑎𝑏𝑎𝑠𝑒,𝑠𝑦𝑠𝑡𝑒𝑚𝑜/1/1(𝐼𝑅)
~
database system 5
4
1
Recursive Model Inference Using Expectation-Maximization (EM)
93
system database
system
database
system database
Topic o/1 Topic o/2 Topic o
100 95 5
+
Topic o/1
KDD 0.3
ICDM 0.2
...
Jiawei Han 0.1
Christos Faloustos 0.05
...
data 0.2
mining 0.1
...
𝜙𝑜/11 𝜙𝑜/1
2 𝜙𝑜/13
…
Topic o/k
... ... ...
𝜙𝑜/𝑘1 𝜙𝑜/𝑘
2 𝜙𝑜/𝑘3
Bayes rule Sum &
normalize counts
M-ste
p
E-ste
p
Top-Down Recursion
94
system database
system
database
system database
Topic o/1 Topic o/2 Topic o
100 95 5
+
system
database
Topic o/1
95 system
database
system database
65 30
+
Topic o/1/1 Topic o/1/2
Extension: Learn Link Type Importance Different link types may have different importance in topic discovery
Introduce a link type weight 𝜶𝒙,𝒚
◦ Original link weight 𝒆𝒊,𝒋𝒙,𝒚,𝒛
→ 𝜶𝒙,𝒚𝒆𝒊,𝒋𝒙,𝒚,𝒛
◦ 𝛼 > 1 – more important
◦ 0 < 𝛼 < 1 – less important
we can assume w.l.o.g 𝛼𝑥,𝑦𝑛𝑥,𝑦
𝑥,𝑦 = 1
The EM solution is invariant to a constant scaleup of all the link weights
rescale
95
Optimal Weight
Average link weight KL-divergence of prediction from observation
96
Learned Link Importance & Topic Coherence
97
Learned importance of different link types Level Word-word Word-author Author-author Word-venue Author-venue
1 .2451 .3360 .4707 5.7113 4.5160
2 .2548 .7175 .6226 2.9433 2.9852
-1
0
1
2
NetClus CATHY (equal importance) CATHY (learn importance)
Average pointwise mutual information (PMI) of each topic
Word-word Word-author Author-author Word-venue Author-venue Overall
Phrase Mining
Frequent pattern mining; no NLP parsing
Statistical analysis for filtering bad phrases
98
i) Hierarchical topic discovery with entities
ii) Phrase mining
iii) Rank phrases & entities per topic
Output hierarchy with phrases & entities
Input collection
text o
o/1
o/1/1 o/1/2
o/2
o/2/1
Examples of Mined Phrases News Computer science
99
information retrieval feature selection
social networks machine learning
web search semi supervised
search engine large scale
information extraction support vector machines
question answering active learning
web pages face recognition
: :
: :
energy department president bush
environmental protection agency white house
nuclear weapons bush administration
acid rain house and senate
nuclear power plant members of congress
hazardous waste defense secretary
savannah river capital gains tax
: :
: :
Phrase & Entity Ranking
Ranking criteria: popular, discriminative, concordant
100
1. Hierarchical topic discovery w/ entities
2. Phrase mining
3. Rank phrases & entities per topic
Output hierarchy w/ phrases & entities
Input collection
text o
o/1
o/1/1 o/1/2
o/2
o/2/1
entity
Phrase & Entity Ranking – Estimate Topical Frequency
E.g.
𝑝 𝑧 = 𝐷𝐵 𝑞𝑢𝑒𝑟𝑦 𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔 =𝑝 𝑧=𝐷𝐵 𝑝 𝑞𝑢𝑒𝑟𝑦 𝑧=𝐷𝐵 𝑝 𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔 𝑧=𝐷𝐵
𝑝 𝑧=𝑡 𝑝 𝑞𝑢𝑒𝑟𝑦 𝑧=𝑡 𝑝 𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔 𝑧=𝑡𝑡
=𝜃𝐷𝐵𝜙𝐷𝐵,𝑞𝑢𝑒𝑟𝑦𝜙𝐷𝐵,𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔
𝜃𝑡𝜙𝑡,𝑞𝑢𝑒𝑟𝑦𝜙𝑡,𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑖𝑛𝑔𝑡
101
Pattern Total ML DB DM IR
support vector machines 85 85 0 0 0 query processing 252 0 212 27 12 Hui Xiong 72 0 0 66 6 SIGIR 2242 444 378 303 1117
Estimated by Bayes rule Frequent pattern mining
Phrase & Entity Ranking – Ranking Function ‘Popular’ indicator of phrase or entity 𝐴 in topic 𝑡: 𝑝 𝐴 𝑡
‘Discriminative’ indicator of phrase or entity 𝐴 in topic 𝑡: log𝑝 𝐴 𝑡
𝑝 𝐴 𝑇
‘Concordance’ indicator of phrase 𝐴: 𝛼(𝐴) =|𝐴|−𝐸( 𝐴 )
𝑠𝑡𝑑( 𝐴 )
𝑟𝑡 𝐴 = 𝑝 𝐴 𝑡 log𝑝 𝐴 𝑡
𝑝 𝐴 𝑇+ 𝜔𝛼(𝐴)
Pointwise KL-divergence
102
𝑇: topic for comparison
Significance score used for phrase mining
Example topics: database & information retrieval
103
database system
query processing
concurrency control…
Divesh Srivastava
Surajit Chaudhuri
Jeffrey F. Naughton…
ICDE
SIGMOD
VLDB…
text categorization
text classification
document clustering
multi-document summarization…
relevance feedback
query expansion
collaborative filtering
information filtering…
……
……
information retrieval
retrieval
question answering…
W. Bruce Croft
James Allan
Maarten de Rijke…
SIGIR
ECIR
CIKM…
…
Evaluation: Intrusion detection [Chang et al. 09]
Evaluation method - Intrusion detection
104
Phrase Intrusion
Question 1/80 Topic Intrusion
Parent topic database systems data management query processing management system data system
Child topic 1 web search search engine semantic web search results web pages
Child topic 2 data management data integration data sources data warehousing data applications
Child topic 3 query processing query optimization query databases relational databases query data
Child topic 4 database system database design expert system management system design system
Question 1/130 data mining association rules logic programs data streams
Question 2/130 natural language query optimization data management database systems
parent
child1 child2 intruder
Phrase 1
Phrase 2
Phrase 3
Phrase 4
…
Phrase 1
Phrase 2
Phrase 3
Phrase 4
…
normal intruder
which child topic does not belong to the given parent topic?
which phrase does not belong?
Ranked phrases + entities > unigrams
105
0%
20%
40%
60%
80%
100%
CS Topic Intrusion NEWS Topic Intrusion
% of the hierarchy interpreted by people
1. hPAM 2. NetClus 3. CATHY (unigram)
3 + phrase 3 + entity 3 + phrase + entity
65% 66%
Applications: entity, community profiling…
106
Christos Faloutsos in data mining
data mining / data streams / time series /
association rules / mining patterns
time series
nearest neighbor
association rules
mining patterns
data streams
high dimensional data
111.6
papers
21.0 35.6 33.3
data mining / data streams / nearest
neighbor / time series / mining patterns
selectivity estimation
sensor networks
nearest neighbor
time warping
large graphs
large datasets
67.8
papers
16.7 16.4 20.0
Eamonn J. Keogh
Jessica Lin
Michail Vlachos
Michael J. Passani
Matthias Renz
Divesh Srivasta
Surajit Chaudhuri
Nick Koudas
Jeffrey F. Naughton
Yannis Papakonstantinou
Jiawei Han
Ke Wang
Xifeng Yan
Bing Liu
Mohammed J. Zaki
Charu C. Aggarwal
Graham Cormode
S. Muthukrishnan
Philip S. Yu
Xiaolei Li
Philip S. Yu in data mining
Important research areas in SIGIR conference ?
107
260.0 583.0
support vector machines
collaborative filtering
text categorization
text classification
conditional random fields
information systems
artificial intelligence
distributed information
retrieval
query evaluation
event detection
large collections
similarity search
duplicate detection
large scale
information retrieval
question answering
web search
natural language
document retrieval
SIGIR (2,432 papers)
443.8 377.7 302.7 1,117.4
information retrieval
question answering
relevance feedback
document retrieval
ad hoc
web search
search engine
search results
world wide web
web search results
word sense disambiguation
named entity
named entity recognition
domain knowledge
dependency parsing
127.3 108.9
matrix factorization
hidden markov models
maximum entropy
link analysis
non-negative matrix
factorization
text categorization
text classification
document clustering
multi-document summarization
naïve bayes
160.3
DM IR DB ML
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
108
Discovery of Hidden Entity Relations
109
Relations
text
entity
Topics
Unstructured text & linked entities -> structured hierarchies
Example Hidden Relations Academic family from research publications
Social relationship from online social network
110
Alumni Colleague
Club friend
Jeff Ullman
Surajit Chaudhuri (1991)
Jeffrey Naughton (1987)
Joseph M. Hellerstein (1995)
Entity Relation Mining A. Representative mining paradigms
B. Patterns, rules and constraints for reasoning
C. Methodologies for dependency modeling
111
Mining Paradigms A. Representative mining paradigms
◦ Similarity search of relationships
◦ Classify or cluster entity relationships
◦ Slot filling
B. Patterns, rules and constraints for reasoning
C. Methodologies for dependency modeling
112
Similarity Search of Relationships Input: relation instance
Output: relation instances with similar semantics
113
(Jeff Ullman, Surajit Chaudhuri)
(Jeffrey Naughton, Joseph M. Hellerstein)
(Jiawei Han, Chi Wang)
…
Is advisor of
(Apple, iPad)
(Microsoft, Surface)
(Amazon, Kindle)
…
Produce tablet
Classify or Cluster Entity Relationships Input: relation instances with unknown relationship
Output: predicted relationship or clustered relationship
114
(Jeff Ullman, Surajit Chaudhuri)
Is advisor of
(Jeff Ullman, Hector Garcia)
Is colleague of
Alumni Colleague
Club friend
Slot Filling Input: relation instance with a missing element (slot)
Output: fill the slot
is advisor of (?, Surajit Chaudhuri) Jeff Ullman
produce tablet (Apple, ?) iPad
115
Model Brand
S80 ?
A10 ?
T1460 ?
Model Brand
S80 Nikon
A10 Canon
T1460 Benq
Patterns, Rules & Constraints for Reasoning
A. Representative mining paradigms ◦ Similarity search of relationships
◦ Classify or cluster entity relationships
◦ Slot filling
B. Patterns, rules and constraints for reasoning ◦ Text patterns
◦ Linkage patterns
◦ Dependency rules and constraints
C. Methodologies for dependency modeling
116
Text Patterns Syntactic patterns
◦ [Bunescu & Mooney 05b]
Dependency parse tree patterns ◦ [Zelenko et al. 03]
◦ [Culotta & Sorensen 04]
◦ [Bunescu & Mooney 05a]
Topical patterns ◦ [McCallum et al. 05] etc.
117
The headquarters of Google are situated in Mountain View
Jane says John heads XYZ Inc.
Emails between McCallum & Padhraic Smyth
Linkage Patterns - Topology Edge sign prediction [Leskovec et al. 10]
◦ Epinions: Trust/Distrust
◦ Wikipedia: Support/Oppose
◦ Slashdot: Friend/Foe
Triad counts: counts of signed triads edge u->v takes part in
118
Linkage Patterns - Traffic Manager-subordinate relationship from email exchange [Diehl et al. 07]
◦ # msgs
◦ quartiles for # recipients
119
Possible communication events for relationship (𝑛𝑎, 𝑛𝑏)
Linkage Patterns – Traffic Dynamics Advisor-advisee relationship from research publications [Wang et al. 10]
120
Smith
2000
2000
2001
2002
2003
1999
Ada
Ying
2004
A (papers by the
candidate advisor)
B 𝑘𝑢𝑙𝑐 𝐴, 𝐵 =
𝐴 ∩ 𝐵
2
1
𝐴+
1
𝐵
𝐼𝑅 𝐴, 𝐵 =𝐴 − |𝐵|
|𝐴 ∪ 𝐵|
Kulczinski
Imbalance ratio
0
0.5
1
2000 2001 2002 2003 2004
Coauthorship trend over time
Kulczinski Imbalance Ratio End time Start time
Dependency Rules & Constraints (Advisor-Advisee Relationship)
E.g., role transition - one cannot be advisor before graduation
121
2000
2000
2001
1999
Ada Bob
Ying
Ada
BobYing
Graduate in 2001Start in 2000
Graduate in 1998
Ada
Bob Ying
Graduate in 2001 Start in 2000
Graduate in 1998
Dependency Rules & Constraints (Social Relationship) ATTRIBUTE-RELATIONSHIP
Friends of the same relationship type share the same value for only certain attribute
CONNECTION-RELATIONSHIP
The friends having different relationships are loosely connected
122
Methodologies for Dependency Modeling Factor graph
◦ [Wang et al. 10, 11, 12]
◦ [Tang et al. 11]
Optimization framework ◦ [McAuley & Leskovec 12]
◦ [Li, Wang & Chang 14]
Graph-based ranking ◦ [Yakout et al. 12]
123
Factor Graph [Wang et al. 10, 11, 12]
124
Factor graph – model the joint likelihood by multiple factors
Each factor is a potential function defined on a few variables ◦ 𝑦𝑥 - 𝑎𝑥’s advisor. E.g., 𝑦2 ∈ 0,1 , 𝑦4 ∈ 0,3
◦ 𝑔𝑥 𝑦𝑥 - traffic dynamic; 𝑓𝑥 𝑦𝑥 , 𝐶𝑥 - role transition
Candidate children’s 𝑦
𝑃 =1
𝑍 𝑔𝑥(𝑦𝑥)
𝑎𝑢𝑡ℎ𝑜𝑟 𝑎𝑥
𝑓𝑥 𝑦𝑥, 𝐶𝑥
average of Kulczinski measure and IR measure
1 if all the constraints are satisfied; 0 otherwise
a1
a0
a2 a3
a4a5
f 0(y0,y1, y2,y3,y4,y5)
y1
y0
y2 y3
y4y5
f 1(y1, y2,y3)
g2(y2)
g5(y5)g4(y4)
f 3(y3, y4,y5)
g3(y3)
g1(y1)
g0(y0)
Factor Graph - Inference Traditional inference algorithm message passing [Koller & Friedman 09] ◦ Exact inference: sum-product + junction tree
◦ Approximate inference: loopy belief propagation
More optimized message passing algorithm [Wang et al. 10] ◦ Specialized schedule according to the DAG structure
◦ Eliminate function nodes and internal messages
◦ Reduce the message from a vector to a scalar
125
y1
y2 y3f 1
Standard message passing algorithm message is a marginal function (a vector)
a1
a2 a3
Our message passing algorithm message is a scalar
Not scalable. Fails to finish for thousands of variables
Flooding schedule; causing repetitive information flow
Our message passing algorithm
1
Phase 1
a5
a2 a3
a4
a0
a1
1
1
1
11
2
3
2
Phase 2
a5
a2 a3
a4
a0
a1
1
3
1
13
2
1
senty0
10
recv ?senty0
10
recv 1
senty2
u2,0
0
recv ?u2,1
1
?
senty5
u5,0
0
recv ?
senty1
u1,0
0
recv ?
senty4
u4,0
0
recv ?u4,3
3
?
u5,3
3
?
senty3
u3,0
0
recv ?u3,1
1
?
senty2
u2,0
0
recv v2,0
u2,1
1
v2,1
senty5
u5,0
0
recv v5,0
u5,3
3
v5,3
senty1
u1,0
0
recv v1,0
senty3
u3,0
0
recv v3,0
u3,1
1
v3,1
senty4
u4,0
0
recv v4,0
u4,3
3
v4,3
𝑟𝑖𝑗 = max 𝑃(𝒚| 𝑦𝑖 = 𝑗) = exp (𝑠𝑒𝑛𝑡𝑖𝑗 + 𝑟𝑒𝑐𝑣𝑖𝑗)
After the message passing, we collect the message to find the most likely advisor
how likely 𝑎𝑗 is 𝑎𝑖’s
advisor
126
Experiment Results DBLP data: 654,628 authors, 1,076,946 publications, publishing time provided
Labeled data: Math Genealogy; AI Genealogy; Faculty Homepage
Datasets RULE SVM Our method
TEST1 69.9% 73.4% 84.4%
TEST2 69.8% 74.6% 84.3%
TEST3 80.6% 86.7% 91.3%
Heuristic
rule
Support vector machine
(no role transition)
Accuracy of relationship prediction
127
The proposed algorithm is efficient and effective
Advisee Top Ranked Advisor Time Note
David M. Blei
1. Michael I. Jordan 01-03 PhD advisor, 2004 grad
2. John D. Lafferty 05-06 Postdoc, 2006
Hong Cheng
1. Qiang Yang 02-03 MS advisor, 2003
2. Jiawei Han 04-08 PhD advisor, 2008
Sergey Brin
1. Rajeev Motawani 97-98 “Unofficial advisor”
128
Other examples of hierarchical relationship Reply relationship in online discussion [Wang et al. 11]
129
Other examples of hierarchical relationship
Family tree
[Wang et al. 12]
130
Solution in the generic setting
Step 1: create a candidate DAG according to certain expectation about a partial order among the objects
Step 2: model the dependency of relations with a factor graph
Step 3: learn the importance of different expectations by Maximum A Posterior (MAP) inference
Step 4: maximize the joint likelihood and predict true relationship
A large number of factors may exist
Their importance is unknown
a1
a0
a2 a3
a4a5
f 0(y0,y1, y2,y3,y4,y5)
y1
y0
y2 y3
y4y5
f 1(y1, y2,y3)
g2(y2)
g5(y5)g4(y4)
f 3(y3, y4,y5)
g3(y3)
g1(y1)
g0(y0)
a1
a0
a2 a3
a4a5
L2 regularization L-BFGS
131
Categorization of potential functions See other types of potentials in [Wang et al. 12]
Type Cognitive description Potential definition
Homophile Parent and child are similar
Polarity Parent is superior to child
pairwise potentials
singleton potentials
e.g, Kulczinsky
Imbalance Ratio
Type Cognitive description Potential definition
Constraints Restrict certain patterns 𝑓 𝑦𝑖 , 𝑦𝑗 = −|𝐶𝑃 𝑣𝑖 , 𝑣𝑗 , 𝑣𝑦𝑖 , 𝑣𝑦𝑗 |
…
e.g., role transition
132
Optimization Framework [Li et al. 14] Relationship in online social networks
133
Employer: Yahoo! College: Stanford
Employer: ? College:?
Employer: Yahoo! College: Berkeley
Employer: Twitter College: Berkeley
Employer: ? College:?
Employer: Twitter College: UIUC
Employer: ? College:?
Employer: JP Morgan College: UIUC
Employer: ? College:?
Employer: Google College: UIUC
Some users provide attributes in their online profiles Some users’ attributes are missing
Optimization Framework to Model the Dual Dependency Design cost function to capture the dependences
Find the unknown variables that minimize the cost function
134
Unobserved Friends’ circles
Observed User Connections
Partially Observed User Attributes
Friends of the same relationship type share the same value for only certain attribute
The friends having different relationships are loosely connected
Co-Profiling of Relationship & Attributes
𝜆1 𝑤𝑖 𝑓𝑖 − 𝑓𝑗2
𝑒𝑖𝑗∈𝐸,𝑣𝑖,𝑣𝑗∈𝐶𝑖
𝐾
𝑡=1
+ 𝜆2 𝑤𝑖𝑓𝑖 − 1 2
𝑣𝑖∈𝐿∩𝐶𝑖
𝐾
𝑡=1
+ 𝜆3 1
𝐾
𝑒𝑖𝑗∈𝐸′,𝑥𝑖≠𝑥𝑗
135
f3 =<1, 0, 0, 0, 1, 0, 0.1>
Circle1: friends likely to share employee Circle 2: friends likely to share college
Employer: Yahoo! College: Stanford
Employer: ? College:?
Employer: Yahoo! College: Berkeley
Employer: ? College:?
Employer: Twitter College: UIUC
Employer: Google College: UIUC
Employer: Yahoo
College: UIUC
Attribute Vector f1 =<1, 0, 0, 1, 0, 0, 0.1>
Circle Assignment x1=1
x3=1
Association Vector w1 =<1, 1, 1, 0, 0, 0, 0> w2 =<0, 0, 0, 1, 1, 1, 0>
f4 =<0, 1, 0, 0, 0, 1, 0 0.1>
x4=2
Attribute-Relationship Dependence Connection-Relationship Dependence
Yahoo Twitter Google Berkeley Stanford UIUC …
Inference Algorithm
𝜆1 𝑤𝑖 𝑓𝑖 − 𝑓𝑗2
𝑒𝑖𝑗∈𝐸,𝑣𝑖,𝑣𝑗∈𝐶𝑖
𝐾
𝑡=1
+ 𝜆2 𝑤𝑖𝑓𝑖 − 1 2
𝑣𝑖∈𝐿∩𝐶𝑖
𝐾
𝑡=1
+ 𝜆3 1
𝐾
𝑒𝑖𝑗∈𝐸′,𝑥𝑖≠𝑥𝑗
1. Update User Attribute Vectors 𝑓𝑖 ◦ Only propagate values from friends in the same circles ◦ Only propagate the attribute value associated with the circle
2. Update User Circle Assignments 𝑥𝑖 , 𝐶𝑖 ◦ Consider both user’s attributes and connections
3. Update Circle Association Vectors 𝑤𝑖 ◦ Make association vector sparse
136
Coprofiling of relationship + attributes outperforms baselines for both
137
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7Accuracy for College
0.48
0.5
0.52
0.54
0.56
0.58
0.6
0.62
APw APi APc CP
Accuracy for Employer
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
RPa RPn RPan CP
Profiling college classmates
0
0.1
0.2
0.3
0.4
0.5
0.6
RPa RPn RPan CP
Profiling colleagues
Users Connections
19K 110K
LinkedIn data
8K connection are labeled
Methodologies for Dependency Modeling Factor graph
◦ [Wang et al. 10, 11, 12]
◦ [Tang et al. 11]
Optimization framework ◦ [McAuley & Leskovec 12]
◦ [Li, Wang & Chang 14]
Graph-based ranking ◦ [Yakout et al. 12]
◦ Suitable for discrete variables
◦ Probabilistic model with general inference algorithms
◦ Both discrete and real variables
◦ Special optimization algorithm needed
◦ Similar to PageRank
◦ Suitable when the problem can be modeled as ranking on graphs
138
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
139
Mining Entity Concepts for Typing
140
Concepts Relations
text
entity
Topics
Unstructured text & linked entities -> structured hierarchies
Concepts as Entity Types Top 10 active politicians regarding healthcare issues?
Influential high-tech companies in Silicon Valley?
141
Concept Entity Mention
politician Obama says more than 6M signed up for
health care…
high-tech company
Apple leads in list of Silicon Valley's most-valuable brands…
Source of Concepts, Entities & Mentions
142
Concept Entity Mention
politician Obama says more than 6M signed up for
health care…
high-tech company
Apple leads in list of Silicon Valley's most-valuable brands…
in taxonomy / ad-hoc in text / in web tables
Mining Entity Concepts for Typing Large scale taxonomies
◦ Built from human-curated knowledgebases
◦ Constructed from free text
Type entities in text
Type entities in web tables and lists
143
Large Scale Taxonomies Name Source # concepts # entities Hierarchy
Dbpedia (v3.9) Wikipedia infoboxes 529 3M Tree
YAGO2s Wiki, WordNet, GeoNames 350K 10M Tree
Freebase Miscellaneous 23K 23M Flat
Probase (MS.KB) Web text 2M 5M DAG
144
YAGO2s Freebase
Type Entities in Text Relying on knowledgebases – entity linking
◦ Context similarity: [Bunescu & Pascal 06] etc.
◦ Topical coherence: [Cucerzan 07] etc.
◦ Context similarity + entity popularity + topical coherence: Wikifier [Ratinov et al. 11]
◦ Jointly linking multiple mentions: AIDA [Hoffart et al. 11] etc.
◦ …
145
Limitation of Entity Linking Low recall of knowledgebases
Sparse concept descriptors
Can we type entities without relying on knowledgebases?
Yes! Exploit the redundancy in the corpus ◦ Not relying on knowledgebases: targeted disambiguation of ad-hoc, homogeneous
entities [Wang et al. 12]
◦ Partially relying on knowledgebases: mining additional evidence in the corpus for disambiguation [Li et al. 13]
146
82 of 900 shoe brands exist in Wiki
Michael Jordan won the best paper award
Targeted Disambiguation [Wang et al. 12]
147
Entity Id
Entity Name
e1 Microsoft
e2 Apple
e3 HP
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
Target entities
d1
d2
d3
d4
d5
Targeted Disambiguation
148
Entity Id
Entity Name
e1 Microsoft
e2 Apple
e3 HP
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
d1
d2
d4
d5
d3
Target entities
Insight – Context Similarity
149
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
Similar
Insight – Context Similarity
150
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
Dissimilar
Insight – Context Similarity
151
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
Dissimilar
Insight – Leverage Homogeneity
Hypothesis: the context between two true mentions is more similar than between two false mentions across two distinct entities, as well as between a true mention and a false mention.
Caveat: the context of false mentions can be similar among themselves within an entity
152
Sun
IT Corp.SundaySurnamenewspaper
Apple
IT Corp.
fruit
HP
IT Corp.
horsepower
others
Insight – Comention
153
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
High confidence
Insight – Leverage Homogeneity
154
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
True
True
Insight – Leverage Homogeneity
155
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
True
True
True
Insight – Leverage Homogeneity
156
Microsoft and Apple are the developers of three of the most popular operating systems
Apple trees take four to five years to produce their first fruit…
Microsoft’s new operating system, Windows 8, is a PC operating system for the tablet age …
CEO Meg Whitman said that HP is focusing on Windows 8 for its tablet strategy
Audi is offering a racing version of its hottest TT model: a 380 HP, front-wheel …
True
True
True
False
False
MentionRank
157
Entity Id
Entity Name
e1 Microsoft
e2 Apple
𝑟23
𝑟21
𝑟11
𝑟22
d1
d2
d3
MentionRank
158
• 𝑟𝑖𝑗 ∈ [0,1] – confidence for
(ei,dj) to be a true mention.
• 𝜋𝑖𝑗 ∈ [0,1] – the prior
estimate of (ei,dj) to be a true mention (co-mention)
• 𝜇𝑖𝑗,𝑖′𝑗′ ∈ 0,1 – the weight
between (ei,dj) and (ei’,dj’) (context similarity)
Experiment results
159
(a) PL
Programming Languages Science Fiction Books
Java, Python, Ruby, Perl, R… A Pleasure to Burn, Golden Reflections, Pathfinder…
Type Entities in Web Tables & Lists [Limaye et al. 10]
160
Type Entities in Web Tables & Lists Collective typing - Annotate the entire column of a table or a list
161
Paper | Taxonomy | Method
[Limaye et al. 10] | YAGO | Conditional random field for joint annotation (supervised)
[Venetis et al. 11] | Google’s IsA DB | 1) Maximum likelihood (10%–50% of entities hit the DB): score a class $c$ by $\prod_e p(e|c)$, where $p(e|c) \propto p(c|e)/p(c)$ is smoothed by the frequency of $(e,c)$ in the DB; 2) Majority voting (>50% hit): score $c$ by $\sum_e 1/\mathrm{Rank}(c|e)$
[Song et al. 11] | MS.KB | Naïve Bayes: $p(c)\prod_e p(e|c)$, where $p(e|c)$ is smoothed by the frequency of $(e,c)$ in MS.KB
[Pimplikar & Sarawagi 12] | Ad-hoc column names | A dual problem: find the columns that best fit the query concepts; conditional random fields (supervised)
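As a toy illustration of the voting rule in the [Venetis et al. 11] row above, this sketch scores a column's candidate classes by summed reciprocal rank over a hypothetical isA database; the database contents and the hit-rate threshold are invented for the example.

```python
from collections import defaultdict

# Hypothetical isA database: entity -> classes ordered by association strength.
ISA_DB = {
    "java":   ["programming language", "island", "coffee"],
    "python": ["programming language", "snake"],
    "ruby":   ["programming language", "gemstone"],
    "perl":   ["programming language"],
}

def type_column(column, min_hit_rate=0.5):
    """Majority voting: score each class by sum of 1/Rank(c|e) over the column."""
    hits = [e for e in column if e in ISA_DB]
    if len(hits) <= min_hit_rate * len(column):
        return None  # too few entities found; fall back to another model
    scores = defaultdict(float)
    for entity in hits:
        for rank, cls in enumerate(ISA_DB[entity], start=1):
            scores[cls] += 1.0 / rank
    return max(scores, key=scores.get)

print(type_column(["java", "python", "ruby", "perl"]))  # programming language
```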
Outline
1. Introduction to mining latent entity structures
2. Mining latent topic hierarchies
3. Mining latent entity relations
4. Mining latent entity concepts
5. Trends and research problems
162
Open Problems on Mining Latent Entity Structures
What is the best way to organize information and interact with users?
163
Understand the Data
◦ System, architecture and database: how do we design a multi-layer organization system whose layers trade off coverage & volatility against utility?
◦ Information quality and security: how do we control information quality and resolve conflicts?
164
Understand the People
◦ NLP, ML, AI: understand & answer natural language questions
◦ HCI, crowdsourcing, Web search, domain experts: explore latent structures with user guidance
165
Mining Relations & Concepts from Multiple Sources
Sources: knowledge base, taxonomy, Web tables, Web pages, domain text, social media, social networks, …
(Diagram: knowledge bases such as Freebase and Satori annotate and guide the mining of the other sources, and are enriched by the mined results in return.)
166
Integration of NLP & Data Mining Approaches
◦ NLP: analyzing single sentences
◦ Data mining: analyzing big data
167
Application to the slot filling task [Yu et al. 14b]
168
(Chart “Enhance Individual SF Systems”: F-measure (%) per slot-filling system, before vs. after validation.)
(Chart “Truth finding efficiency”: #truths found vs. #total responses for six methods: 1 Baseline, 2 Voting, 3 Linguistic Indicator, 4 SVM, 5 MTM, 6 Oracle.)
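The “Voting” baseline in the second chart can be sketched in a few lines: accept a response as true when enough slot-filling systems agree on it. The system names and slot strings below are hypothetical, and the MTM model of [Yu et al. 14b] is substantially more sophisticated than this.

```python
from collections import Counter

def vote(responses, min_support=2):
    """Keep a (query, answer) response if at least `min_support` systems return it."""
    counts = Counter()
    for answers in responses.values():
        counts.update(answers)
    return {resp for resp, c in counts.items() if c >= min_support}

systems = {
    "sys_a": {("per:employee_of:Whitman", "HP"), ("org:top_members:HP", "Whitman")},
    "sys_b": {("per:employee_of:Whitman", "HP")},
    "sys_c": {("per:employee_of:Whitman", "Audi")},
}
print(vote(systems))  # only the answer two systems agree on survives
```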
Mining Latent Entity Structures from Massive Unstructured & Linked Data
169
Concepts
Topics
Relations
text
entity
Frequent pattern mining + probabilistic analysis
Biography
170
Chi Wang
(http://cs.uiuc.edu/homes/chiwang1),
Ph.D. candidate in Computer Science, UIUC.
His Ph.D. research has been focused on
mining hidden entity structures from
massive data and information networks,
with over 30 research publications in refereed journals
and conferences including KDD, WWW, SIGIR, ICDM,
SDM, CIKM, ECMLPKDD, IJCAI, and SIGMOD. He is the
winner of the prestigious Microsoft Research Graduate
Research Fellowship and the KDDCUP'13 runner-up award,
and a best paper award candidate at CIKM'13 and ICDM'13.
Jiawei Han (http://www.cs.uiuc.edu/hanj),
Abel Bliss Professor, Computer Science,
UIUC. He has been researching data
mining, database systems, and information
networks, with over 600 conference and
journal publications. He is Fellow of ACM
and Fellow of IEEE, the first author of the textbook “Data
Mining: Concepts and Techniques", 3rd ed. (Morgan
Kaufmann, 2011), and the Director of Information Network
Academic Research Center, supported by the U.S. Army
Research Lab since 2009. He has delivered over 20 tutorials
in major conferences including SIGMOD, VLDB, KDD,
WWW, WSDM, ICDE, ICDM, SDM, ASONAM, and CIKM.
References
1. [Wang et al. 14a] C. Wang, X. Liu, Y. Song, J. Han. Scalable Moment-based Inference for Latent Dirichlet Allocation, ECMLPKDD'14.
2. [Li et al. 14] R. Li, C. Wang, K. Chang. User Profiling in Ego Network: An Attribute and Relationship Type Co-profiling Approach, WWW’14.
3. [Danilevsky et al. 14] M. Danilevsky, C. Wang, N. Desai, X. Ren, J. Guo, J. Han. Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents, SDM'14.
4. [Wang et al. 13b] C. Wang, M. Danilevsky, J. Liu, N. Desai, H. Ji, J. Han. Constructing Topical Hierarchies in Heterogeneous Information Networks, ICDM’13.
5. [Wang et al. 13a] C. Wang, M. Danilevsky, N. Desai, Y. Zhang, P. Nguyen, T. Taula, and J. Han. A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy, KDD’13.
6. [Li et al. 13] Y. Li, C. Wang, F. Han, J. Han, D. Roth, and X. Yan. Mining Evidences for Named Entity Disambiguation, KDD’13.
171
References
7. [Wang et al. 12a] C. Wang, K. Chakrabarti, T. Cheng, S. Chaudhuri. Targeted Disambiguation of Ad-hoc, Homogeneous Sets of Named Entities, WWW'12.
8. [Wang et al. 12b] C. Wang, J. Han, Q. Li, X. Li, W. Lin and H. Ji. Learning Hierarchical Relationships among Partially Ordered Objects with Heterogeneous Attributes and Links, SDM’12.
9. [Wang et al. 11] H. Wang, C. Wang, C. Zhai and J. Han. Learning Online Discussion Structures by Conditional Random Fields, SIGIR’11.
10. [Wang et al. 10] C. Wang, J. Han, Y. Jia, J. Tang, D. Zhang, Y. Yu and J. Guo. Mining Advisor-advisee Relationship from Research Publication Networks, KDD’10.
11. [Danilevsky et al. 13] M. Danilevsky, C. Wang, F. Tao, S. Nguyen, G. Chen, N. Desai, J. Han. AMETHYST: A System for Mining and Exploring Topical Hierarchies in Information Networks, KDD’13.
172
References
12. [Sun et al. 11] Y. Sun, J. Han, X. Yan, P. S. Yu, T. Wu. PathSim: Meta path-based top-k similarity search in heterogeneous information networks, VLDB'11.
13. [Hofmann 99] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis, UAI’99.
14. [Blei et al. 03] D. M. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet allocation, Journal of Machine Learning Research, 2003.
15. [Griffiths & Steyvers 04] T. L. Griffiths, M. Steyvers. Finding scientific topics, Proc. of the National Academy of Sciences of USA, 2004.
16. [Anandkumar et al. 12] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, M. Telgarsky. Tensor decompositions for learning latent variable models, arXiv:1210.7559, 2012.
17. [Porteous et al. 08] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, M. Welling. Fast collapsed Gibbs sampling for latent Dirichlet allocation, KDD'08.
173
References
18. [Hoffman et al. 12] M. Hoffman, D. M. Blei, D. M. Mimno. Sparse stochastic inference for latent Dirichlet allocation, ICML'12.
19. [Yao et al. 09] L. Yao, D. Mimno, A. McCallum. Efficient methods for topic model inference on streaming document collections, KDD’09.
20. [Newman et al. 09] D. Newman, A. Asuncion, P. Smyth, M. Welling. Distributed algorithms for topic models, Journal of Machine Learning Research, 2009.
21. [Hoffman et al. 13] M. Hoffman, D. Blei, C. Wang, J. Paisley. Stochastic variational inference, Journal of Machine Learning Research, 2013.
22. [Griffiths et al. 04] T. Griffiths, M. Jordan, J. Tenenbaum, and D. M. Blei. Hierarchical topic models and the nested Chinese restaurant process, NIPS'04.
23. [Kim et al. 12a] J. H. Kim, D. Kim, S. Kim, and A. Oh. Modeling topic hierarchies with the recursive Chinese restaurant process, CIKM'12.
174
References
24. [Wang et al. 14b] C. Wang, X. Liu, Y. Song, J. Han. Scalable and Robust Construction of Topical Hierarchies, arXiv:1403.3460, 2014.
25. [Li & McCallum 06] W. Li, A. McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations, ICML'06.
26. [Mimno et al. 07] D. Mimno, W. Li, A. McCallum. Mixtures of hierarchical topics with pachinko allocation, ICML’07.
27. [Ahmed et al. 13] A. Ahmed, L. Hong, A. Smola. Nested Chinese restaurant franchise process: Applications to user tracking and document modeling, ICML'13.
28. [Wallach 06] H. M. Wallach. Topic modeling: beyond bag-of-words, ICML’06.
29. [Wang et al. 07] X. Wang, A. McCallum, X. Wei. Topical n-grams: Phrase and topic discovery, with an application to information retrieval, ICDM’07.
175
References
30. [Lindsey et al. 12] R. V. Lindsey, W. P. Headden, III, M. J. Stipicevic. A phrase-discovering topic model using hierarchical Pitman-Yor processes, EMNLP-CoNLL'12.
31. [Mei et al. 07] Q. Mei, X. Shen, C. Zhai. Automatic labeling of multinomial topic models, KDD’07.
32. [Blei & Lafferty 09] D. M. Blei, J. D. Lafferty. Visualizing Topics with Multi-Word Expressions, arXiv:0907.1013, 2009.
33. [Danilevsky et al. 14] M. Danilevsky, C. Wang, N. Desai, J. Guo, J. Han. Automatic construction and ranking of topical keyphrases on collections of short documents, SDM’14.
34. [Kim et al. 12b] H. D. Kim, D. H. Park, Y. Lu, C. Zhai. Enriching Text Representation with Frequent Pattern Mining for Probabilistic Topic Modeling, ASIST’12.
35. [El-Kishky et al. 14] A. El-Kishky, Y. Song, C. Wang, C. R. Voss, J. Han. Scalable Topical Phrase Mining from Large Text Corpora, arXiv:1406.6312, 2014.
176
References
36. [Zhao et al. 11] W. X. Zhao, J. Jiang, J. He, Y. Song, P. Achananuparp, E.-P. Lim, X. Li. Topical keyphrase extraction from Twitter, HLT'11.
37. [Church et al. 91] K. Church, W. Gale, P. Hanks, D. Hindle. Using statistics in lexical analysis (Chapter 6), 1991.
38. [Sun et al. 09a] Y. Sun, J. Han, J. Gao, Y. Yu. iTopicModel: Information network-integrated topic modeling, ICDM'09.
39. [Deng et al. 11] H. Deng, J. Han, B. Zhao, Y. Yu, C. X. Lin. Probabilistic topic models with biased propagation on heterogeneous information networks, KDD’11.
40. [Chen et al. 12] X. Chen, M. Zhou, L. Carin. The contextual focused topic model, KDD’12.
41. [Tang et al. 13] J. Tang, M. Zhang, Q. Mei. One theme in all views: modeling consensus topics in multiple contexts, KDD’13.
177
References
42. [Kim et al. 12c] H. Kim, Y. Sun, J. Hockenmaier, J. Han. ETM: Entity topic models for mining documents associated with entities, ICDM'12.
43. [Cohn & Hofmann 01] D. Cohn, T. Hofmann. The missing link: a probabilistic model of document content and hypertext connectivity, NIPS'01.
44. [Blei & Jordan 03] D. Blei, M. I. Jordan. Modeling annotated data, SIGIR’03.
45. [Newman et al. 06] D. Newman, C. Chemudugunta, P. Smyth, M. Steyvers. Statistical Entity-Topic Models, KDD’06.
46. [Sun et al. 09b] Y. Sun, Y. Yu, J. Han. Ranking-based clustering of heterogeneous information networks with star network schema, KDD’09.
47. [Chang et al. 09] J. Chang, J. Boyd-Graber, C. Wang, S. Gerrish, D.M. Blei. Reading tea leaves: How humans interpret topic models, NIPS’09.
178
References
48. [Bunescu & Mooney 05a] R. C. Bunescu, R. J. Mooney. A shortest path dependency kernel for relation extraction, HLT'05.
49. [Bunescu & Mooney 05b] R. C. Bunescu, R. J. Mooney. Subsequence kernels for relation extraction, NIPS’05.
50. [Zelenko et al. 03] D. Zelenko, C. Aone, A. Richardella. Kernel methods for relation extraction, Journal of Machine Learning Research, 2003.
51. [Culotta & Sorensen 04] A. Culotta, J. Sorensen. Dependency tree kernels for relation extraction, ACL’04.
52. [McCallum et al. 05] A. McCallum, A. Corrada-Emmanuel, X. Wang. Topic and role discovery in social networks, IJCAI’05.
53. [Leskovec et al. 10] J. Leskovec, D. Huttenlocher, J. Kleinberg. Predicting positive and negative links in online social networks, WWW’10.
179
References
54. [Diehl et al. 07] C. Diehl, G. Namata, L. Getoor. Relationship identification for social network discovery, AAAI'07.
55. [Tang et al. 11] W. Tang, H. Zhuang, J. Tang. Learning to infer social ties in large networks, ECMLPKDD’11.
56. [McAuley & Leskovec 12] J. McAuley, J. Leskovec. Learning to discover social circles in ego networks, NIPS’12.
57. [Yakout et al. 12] M. Yakout, K. Ganjam, K. Chakrabarti, S. Chaudhuri. InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables, SIGMOD’12.
58. [Koller & Friedman 09] D. Koller, N. Friedman. Probabilistic Graphical Models: Principles and Techniques, 2009.
59. [Bunescu & Pasca 06] R. Bunescu, M. Pasca. Using encyclopedic knowledge for named entity disambiguation, EACL'06.
180
References
60. [Cucerzan 07] S. Cucerzan. Large-scale named entity disambiguation based on Wikipedia data, EMNLP-CoNLL'07.
61. [Ratinov et al. 11] L. Ratinov, D. Roth, D. Downey, M. Anderson. Local and global algorithms for disambiguation to Wikipedia, ACL'11.
62. [Hoffart et al. 11] J. Hoffart, M. Yosef, I. Bordino, H. Fürstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, G. Weikum. Robust disambiguation of named entities in text, EMNLP'11.
63. [Limaye et al. 10] G. Limaye, S. Sarawagi, S. Chakrabarti. Annotating and searching web tables using entities, types and relationships, VLDB’10.
64. [Venetis et al. 11] P. Venetis, A. Halevy, J. Madhavan, M. Pasca, W. Shen, F. Wu, G. Miao, C. Wu. Recovering semantics of tables on the web, VLDB’11.
65. [Song et al. 11] Y. Song, H. Wang, Z. Wang, H. Li, W. Chen. Short Text Conceptualization using a Probabilistic Knowledgebase, IJCAI’11.
181
References
66. [Pimplikar & Sarawagi 12] R. Pimplikar, S. Sarawagi. Answering table queries on the web using column keywords, VLDB'12.
67. [Yu et al. 14a] X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, J. Han. Personalized Entity Recommendation: A Heterogeneous Information Network Approach, WSDM’14.
68. [Yu et al. 14b] D. Yu, H. Huang, T. Cassidy, H. Ji, C. Wang, S. Zhi, J. Han, C. Voss. The Wisdom of Minority: Unsupervised Slot Filling Validation based on Multi-dimensional Truth-Finding with Multi-layer Linguistic Indicators, COLING’14.
182