A Word at a Time: Computing Word Relatedness using Temporal Semantic Analysis

Kira Radinsky, Eugene Agichtein, Evgeniy Gabrilovich, Shaul Markovitch

Introduction

• A rich source of information can be revealed by studying the patterns of word occurrence over time

• Example: “peace” and “war”

• Corpus: New York Times over 130 years

• Word <=> time series of its occurrence in NYT articles

• Hypothesis: correlation between two words' time series => semantic relation

• Proposed method: Temporal Semantic Analysis (TSA)

1. TSA

Temporal Semantic Analysis

• 3 main steps:

1. Represent words as concept vectors

2. Extract temporal dynamics for each concept

3. Extend static representation with temporal dynamics

1. Words as concept vectors

2. Temporal dynamics

• c : concept represented by a sequence of words wc1,…,wck

• d : a document

• ε : proximity relaxation parameter (ε = 20 in the experiments)

• c appears in d if its words appear in d with a distance of at most ε words between each pair wci, wcj

• Example: “Great Fire of London”
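The proximity test above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `concept_appears`, whitespace tokenization, case-insensitive matching, and measuring distance as the difference in token positions are all choices made here, not details given on the slides.

```python
from collections import Counter

def concept_appears(doc_tokens, concept_words, eps=20):
    """Return True if every concept word occurs in the document and one
    occurrence of each can be chosen so that all lie within eps tokens.

    Assumption: "distance" is the difference between token positions.
    Repeated words in the concept are treated as a single word.
    """
    needed = {w.lower() for w in concept_words}
    # Positions of every document token that belongs to the concept.
    hits = [(i, tok.lower()) for i, tok in enumerate(doc_tokens)
            if tok.lower() in needed]
    window = Counter()  # concept words currently inside the sliding window
    left = 0
    for pos, word in hits:
        window[word] += 1
        # Drop hits that are more than eps tokens behind the current one.
        while pos - hits[left][0] > eps:
            old = hits[left][1]
            window[old] -= 1
            if window[old] == 0:
                del window[old]
            left += 1
        if len(window) == len(needed):
            return True
    return False

doc = "the great fire of london destroyed much of the city".split()
print(concept_appears(doc, ["Great", "Fire", "of", "London"]))
```

If the window span is at most ε, every pair of chosen occurrences is also within ε of each other, which is why a single sliding window is enough here.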

2. Temporal dynamics

• t1,…,tn : a sequence of consecutive discrete time points (days)

• H = D1,…,Dn : history represented by a set of document collections, where Di is a collection of documents associated with time ti

• the dynamics of a concept c is the time series of its frequency of appearance in H
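As a rough illustration of these definitions, the sketch below turns a history H into a daily time series for one concept. The name `concept_dynamics`, the `appears` callback, and the normalization (fraction of that day's documents containing the concept) are assumptions made for this example; the slides only say "frequency of appearance".

```python
def concept_dynamics(history, appears):
    """Build the time series of a concept over a history H = D1..Dn.

    history: list of document collections, one per time point (day).
    appears: function(document) -> bool, the concept-occurrence test.
    Assumption: frequency is the fraction of the day's documents in
    which the concept appears; days with no documents get 0.0.
    """
    series = []
    for docs in history:
        if not docs:
            series.append(0.0)
            continue
        count = sum(1 for d in docs if appears(d))
        series.append(count / len(docs))
    return series

# Toy history of three days; the concept is the single word "peace".
history = [["war and peace", "peace talks"], ["stock market"], []]
print(concept_dynamics(history, lambda d: "peace" in d.split()))
```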

3. Extend static representation

2. Using TSA for computing Semantic Relatedness

Using TSA for computing Semantic Relatedness

• Compare by weighted distance between time series of concept vectors

• Combine it with the static semantic similarity measure

Algorithm

• t1, t2 : words

• C(t1) = {c1,…,cn} and C(t2) = {c1,…,cm} : sets of concepts of t1 and t2

• Q(c1,c2) : function that determines relatedness between two concepts c1 and c2 using their dynamics (time series)
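The slide that showed the algorithm itself did not survive in this transcript, so the following is only one plausible way to aggregate Q over the two concept sets: a greedy matching that repeatedly takes the most related remaining pair. The function name `relatedness`, the greedy strategy, and the unnormalized sum are assumptions for illustration and may differ from the authors' procedure.

```python
def relatedness(concepts1, concepts2, q):
    """Greedy aggregation sketch (an assumption, not the original slide).

    concepts1, concepts2: the concept sets C(t1) and C(t2).
    q: function(c1, c2) -> float, relatedness of two concepts' dynamics.
    Repeatedly picks the highest-scoring remaining pair, adds its score,
    and removes both concepts, until one set is exhausted.
    """
    remaining1 = list(concepts1)
    remaining2 = list(concepts2)
    total = 0.0
    while remaining1 and remaining2:
        score, i, j = max(
            ((q(a, b), i, j)
             for i, a in enumerate(remaining1)
             for j, b in enumerate(remaining2)),
            key=lambda t: t[0],
        )
        total += score
        remaining1.pop(i)
        remaining2.pop(j)
    return total

# Toy Q: identical concepts are fully related, all others unrelated.
toy_q = lambda a, b: 1.0 if a == b else 0.0
print(relatedness(["war", "army"], ["war", "peace", "army"], toy_q))
```

In practice Q would be a time-series measure such as cross correlation or dynamic time warping applied to the two concepts' dynamics.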

Cross Correlation

• Pearson's product-moment coefficient:

• A statistical method for measuring similarity of two random variables

• Example: “computer” and “radio”
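The coefficient's formula is missing from the transcript. The standard definition is r = cov(x, y) / (σx · σy), and a minimal sketch follows. The lagged variant `cross_correlation`, its `max_lag` parameter, and returning 0.0 for a constant series are assumptions added for illustration.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def cross_correlation(x, y, max_lag=0):
    """Best Pearson correlation over shifts of up to max_lag time points.

    Assumption: x and y have equal length; the overlapping parts of the
    shifted series are compared and the maximum coefficient is returned.
    """
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:len(x) + lag], y[-lag:]
        if len(a) >= 2:
            best = max(best, pearson(a, b))
    return best

# Two identical spikes, one day apart.
x = [0, 1, 0, 0, 0]
y = [0, 0, 1, 0, 0]
print(pearson(x, y), cross_correlation(x, y, max_lag=1))
```

Allowing a small lag lets two series that react to the same event a day apart still count as strongly correlated.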

Dynamic Time Warping

• Measures similarity between two time series that may differ in time scale but are similar in shape

• Used in speech recognition

• It defines a local cost matrix

• Temporal Weighting Function
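A minimal sketch of classic dynamic time warping may help make the local cost matrix concrete. It uses the textbook recurrence with an absolute-difference local cost, which is an assumption here. The temporal weighting function mentioned above is not modeled, because its definition does not appear in this transcript.

```python
def dtw_distance(x, y, cost=lambda a, b: abs(a - b)):
    """Classic DTW distance between two numeric series.

    cost: local cost between two points (absolute difference is an
    assumption). acc[i][j] holds the cheapest cumulative cost of
    aligning the first i points of x with the first j points of y.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cost(x[i - 1], y[j - 1])
            # Extend the cheapest of: step in x, step in y, or diagonal step.
            acc[i][j] = local + min(acc[i - 1][j],
                                    acc[i][j - 1],
                                    acc[i - 1][j - 1])
    return acc[n][m]

# Same shape, but the second series stays at 0 one step longer.
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))
```

A lower distance means more similar series, so a relatedness score would need to invert or normalize it; how that is done is not specified in the text above.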

3. Experimentations

Experimentations: Setup

• New York Times archive (1863 – 2004)

  o Each day: average of 50 article abstracts

  o 1.42 Gb of text

  o 565,540 distinct words

• A new algorithm to automatically benchmark word relatedness tasks

• Same vector representation for each method tested

• Comparison to human judgment (WS-353 and Amazon MTurk)

TSA vs. ESA

TSA vs. Temporal Word Similarity

Word Frequency Effects

Size of Temporal Concept Vector

Conclusion

• Two innovations:

  o Temporal Semantic Analysis

  o A new method for measuring semantic relatedness of terms

• Many advantages (robust, tunable, can be used to study language evolution over time)

• Significant improvements in computing word relatedness