automatically tracking events in the news · automatically tracking events in the news ... joint...

60
1 Automatically Tracking Events in the News Kathleen McKeown Department of Computer Science Columbia University

Upload: duongphuc

Post on 21-Aug-2018

225 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

1

Automatically Tracking Events in

the News

Kathleen McKeown

Department of Computer Science

Columbia University

Page 2: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

2

Vision

Generating presentations that connect

Events

Opinions

Personal accounts

Their impact on the world

Page 3: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

3

Two Tasks

Monitoring events over time

Predicting their impact on financial

markets

Joint work with Tony Jebara and David

Yao

Page 4: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

4

Machine learning framework

Data (often labeled)

Extraction of “features” from text data

Prediction of output

Page 5: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

5

Machine learning framework

Data (often labeled)

Extraction of “features” from text data

Prediction of output

What data is available for learning?

Page 6: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

6

Machine learning framework

Data (often labeled)

Extraction of “features” from text data

Prediction of output

What features yield good predictions?

Page 7: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

7

Two Tasks

Monitoring events over time

Predicting their impact on financial

markets

Joint work with Tony Jebara and David

Yao

Page 8: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

8

Monitor events over time

Input: streaming data

News, social media, web pages

At every hour, what’s new

Page 9: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

9

Page 10: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

10

Page 11: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

11

Text Compression

Page 12: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

12

Data

NIST evaluation on temporal

summarization

hourly web crawl

October 2011 - February 2013

16.1TB

Different categories of disaster

Climate, man-made, social unrest

Page 13: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

13

Page 14: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

14

Page 15: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

15

Page 16: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

16

Page 17: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

17

Page 18: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

18

Page 19: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

19

Page 20: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

20

Page 21: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

21

Page 22: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

22

Page 23: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

23

Page 24: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

24

Page 25: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

25

Page 26: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

26

Temporal Summarization Approach

At time t:

1. Predict salience for input sentences Disaster-specific features for predicting

salience

2. Remove redundant sentences

3. Cluster and select exemplar sentences

for t Incorporate salience prediction as a prior

Kedzie & al, Bloomberg Social Good

Workshop, KDD 2014

Kedzie & al, ACL 2015

Page 27: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

27

Page 28: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

28

Predicting Salience: Model Features

Basic sentence level features

sentence length

punctuation count

number of capitalized words

number of event type synonyms, hypernyms, and hyponyms

Page 29: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

29

Predicting Salience: Model Features

Basic sentence level features

sentence length

punctuation count

number of capitalized words

number of event type synonyms, hypernyms, and hyponyms

Why is the number of capitalized words

important?

Page 30: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

30

Predicting Salience: Model Features

Basic sentence level features

sentence length

punctuation count

number of capitalized words

number of event type synonyms, hypernyms, and hyponyms

High Salience

Nicaragua's disaster management said it had issued a local

tsunami alert.

Medium Salience

People streamed out of homes, schools and oce buildings as

far north as Mexico City.

Low Salience

Add to Digg Add to del.icio.us Add to Facebook Add to

Myspace

Page 31: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

31

Predicting Salience: Model Features

Basic sentence level features

sentence length

punctuation count

number of capitalized words

number of event type synonyms, hypernyms, and hyponyms

High Salience

Nicaragua's disaster management said it had issued a local

tsunami alert.

Medium Salience

People streamed out of homes, schools and oce buildings as

far north as Mexico City.

Low Salience

Add to Digg Add to del.icio.us Add to Facebook Add to

Myspace

Page 32: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

32

Predicting Salience: Model Features

Basic sentence level features

sentence length

punctuation count

number of capitalized words

number of event type synonyms, hypernyms, and hyponyms

Why are synonyms, hypernyms and

hyponyms important?

Page 33: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

33

Predicting Salience: Model Features

Basic sentence level features

sentence length

punctuation count

number of capitalized words

number of event type synonyms, hypernyms, and hyponyms

High Salience

Nicaragua's disaster management said it had issued a local

tsunami alert.

Medium Salience

People streamed out of homes, schools and oce buildings as

far north as Mexico City.

Low Salience

Add to Digg Add to del.icio.us Add to Facebook Add to

Myspace

Page 34: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

34

Predicting Salience: Model Features

Basic sentence level features

Language Models (5-gram Kneser-Ney model)

generic news corpus (10 years AP and NY Times articles)

domain specific corpus (disaster related Wikipedia

articles)

What does a generic language model

capture?

What does a domain specific language

model capture?

Page 35: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

35

Predicting Salience: Model Features

Basic sentence level features

Language Models (5-gram Kneser-Ney model)

generic news corpus (10 years AP and NY Times articles)

domain specific corpus (disaster related Wikipedia

articles)

High Salience

Nicaragua's disaster management said it had issued a local

tsunami alert.

Medium Salience

People streamed out of homes, schools and oce buildings as

far north as Mexico City.

Low Salience

Add to Digg Add to del.icio.us Add to Facebook Add to

Myspace

Page 36: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

36

Predicting Salience: Model Features

Basic sentence level features

Language Models (5-gram Kneser-Ney model)

generic news corpus (10 years AP and NY Times articles)

domain specific corpus (disaster related Wikipedia

articles)

High Salience

Nicaragua's disaster management said it had issued a local

tsunami alert.

Medium Salience

People streamed out of homes, schools and oce buildings as

far north as Mexico City.

Low Salience

Add to Digg Add to del.icio.us Add to Facebook Add to

Myspace

Page 37: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

37

Predicting Salience: Model Features

Basic sentence level features

Language Models (5-gram Kneser-Ney model)

generic news corpus (10 years AP and NY Times articles)

domain specific corpus (disaster related Wikipedia

articles)

High Salience

Nicaragua's disaster management said it had issued a local

tsunami alert.

Medium Salience

People streamed out of homes, schools and oce buildings as

far north as Mexico City.

Low Salience

Add to Digg Add to del.icio.us Add to Facebook Add to

Myspace

Page 38: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

38

Predicting Salience: Model Features

Basic sentence level features

Language Models (5-gram Kneser-Ney model)

Geographic Features

tag input with Named-Entity tagger

get coordinates for locations and mean distance to event

High Salience

Nicaragua's disaster management said it had issued a local

tsunami alert.

Medium Salience

People streamed out of homes, schools and oce buildings as

far north as Mexico City.

Low Salience

Add to Digg Add to del.icio.us Add to Facebook Add to

Myspace

Page 39: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

39

Predicting Salience: Model Features

Basic sentence level features

Language Models (5-gram Kneser-Ney model)

Geographic Features

tag input with Named-Entity tagger

get coordinates for locations and mean distance to event

High Salience

Nicaragua's disaster management said it had issued a local

tsunami alert.

Medium Salience

People streamed out of homes, schools and oce buildings as

far north as Mexico City.

Low Salience

Add to Digg Add to del.icio.us Add to Facebook Add to

Myspace

Page 40: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

40

Predicting Salience: Model Features

Basic sentence level features

Language Models (5-gram Kneser-Ney model)

Geographic Features

Temporal Features measuring burstiness of words

How might we measure the burstiness of

words?

Page 41: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

41

Determining Redundancy

Use a semantic similarity metric

Discard sentences with similarity to

previous sentences

Page 42: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

42

Page 43: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

43

Page 44: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

44

Page 45: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

45

Page 46: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

46

Page 47: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

47

SubEvent Identification

➔Decompose articles on a main event into

related sub-events:

Manhattan Blackout Breezy Point fire Public Transit

Outage

Hurricane

Sandy

Page 48: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

48

Page 49: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

49

Two Tasks

Monitoring events over time

Predicting their impact on financial

markets

Joint work with Tony Jebara and David

Yao

Page 50: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

50

COLUMBIA DATA SCIENCE INSTITUTE Can we predict the effect that a particular event – such as extreme weather or political activity -- would have on financial markets?

Machine Learning

Natural Language

Processing

Financial and Economic Indices

Data Science

Take market financial data and traditional news feeds.

Use Natural Language Processing to transform raw data into structured event streams.

Apply Machine Learning tools to uncover previously hidden relationships between events and market behavior.

Page 51: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

51

Fin

anci

al D

ata

New

s Feeds

Infe

rence

Bayesi

an N

etw

ork

Str

uct

ure

Dis

coveryE

ventt

Str

eam

s

Predicting Market Impact

Page 52: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

52

NLP – Event Features

Binary features

calculated from news

Event type:

11 possible categories

Event location:

US or World

Sampled daily and hourly

Page 53: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

53

Financials, Macroeconomics

Financial market indices (65)

• Stock market (15 broad + 27 sector)

• Bond market (8)

• Volatility index (6)

• Commodities (9)

Macroeconomic indicators (8)

• real GDP (2)

• CPI (1)

• PPI (1)

• Income (1)

• Consumption (1)

• Employment/Unemployment situation (2)

Index/indicators selection

Page 54: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

54

Transform and Binarize

Compute relative change in indices

Page 55: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

55

Equal frequency discretization (S&P 500)

Page 56: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

56

ML – Structure Learning

To learn structure we assume samples are drawn iid from

unknown pairwise binary graphical model (no hidden

variables)

Training data: 2009-2014 Newsblaster data and financial

indicators

Use method of Ravikumar, Wainwright, Lafferty [2010]

Asymptotically optimal given mild assumptions & regularizer l1

Page 57: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

57

SCI TECH ECONOMIC

DISASTER

RVX VXD VXO

RVX – Russell 2000 Volatility

VXD – Dow Volatility

VXO – S&P 100 Volatility

Page 58: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

58

Results

We can predict

relation between event

and market impact

with significantly higher

average test likelihood

for held-out test days

Daily observations are

orders of magnitude more

likely under our model…

Naïve Tree D-Tree Ours-53.7 -37.8 -34.4 -26.4

Page 59: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

59

Impact

Learn from past events to predict the

impact of new events on financial markets

Identify new events as they occur in news

feeds

Graph modeling enables identification of

structural relationships between detected

events and financial events

Page 60: Automatically Tracking Events in the News · Automatically Tracking Events in the News ... Joint work with Tony Jebara and David Yao. 4 ... Nicaragua's disaster management said it

60

Thank You!

The research presented here has been

supported in part by DARPA GRAPH,

and NSF.