1 sentiment polarity identification in financial news: a cohesion-based approach author:ann devitt...

24
1 Sentiment Polarity Identification in Financial News: A Cohesion-based Approach Author: Ann Devitt Khurshid Ahmad (School of Computer Science & Stat istics, Trinity College Dublin, Ireland) Source: ACL 2007

Upload: lucy-woods

Post on 01-Jan-2016

218 views

Category:

Documents


4 download

TRANSCRIPT

1

Sentiment Polarity Identification in Financial News:A Cohesion-based Approach

Author: Ann Devitt

Khurshid Ahmad

(School of Computer Science & Statistics,

Trinity College Dublin, Ireland)

Source: ACL 2007

2

Motivation & Goal

• Both the informational and affective aspects of news text affect the markets in profound ways

• The goal is to explore a computable metric of positive or negative polarity in financial news text

1. which is consistent with human judgments 2. can be used in a quantitative analysis of

news sentiment impact on financial markets

3

Why the issue is important?

• Prospect Theory• Herbert Simon (1978 Nobel Prize) and Daniel

Kahneman (2002 Prize) – shows that people in markets can behave irrationally and that th

is bounded rationality is inspired by hear from others about the conditions

• Robert Engle (2003 Prize) – positive news is typically related to large changes in prices but o

nly for a short time– conversely the effect of negative news on prices and volumes of

trading is longer lasting

4

Idea

• The contribution to an overall sentiment polarity and intensity metric of individual lexical items is determined by their connectivity and position within a representation of the text based on the principles of lexical cohesion

5

Related Work

• Cognitive Theories of Emotion – what constitutes emotion?

• Categorical – e.g. anger, fear, sadness, surprise…

• Or Dimensional – Evaluation: good–bad axis– Activation: strong-weak axis

• Sentiment Analysis – computational linguistics focused on examining

• what textual features (lexical, syntactic, punctuation, etc) contribute to affective content of text

• how these features derive a sentiment metric for a word, sentence or whole text.

6

Related Work

• Sentiment and News Impact Analysis– Niederhoffer (1971) analysed 20 years of New York Ti

mes headlines• classified into 19 semantic categories and on a good-bad rati

ng scale • to evaluate how the markets reacted to good and bed news • found that markets overreact to bad news

– Davis et al (2006) investigate the effects of optimistic or pessimistic language used in financial press releases on future firm performance

7

Cohesion-based Text Representation Algorithm

• builds a graph representation of text from part-of-speech tagged text

• without disambiguation

• using WordNet as a real world knowledge source

• to reduce information loss in the transition from text to text-based structure

8

Cohesive structureof a text

• The representation is designed within the theoretical framework of lexical cohesion (Halliday and Hasan, 1976)

• Graph representation combines:1. information derived from the text2. and WordNet semantic content

– Nodes: concepts in or derived from the text connected by relations between these concepts in WordNet• such as antonymy or hypernymy

– Topological features of the knowledge base provides the facility to manipulate or control how the WordNet semantic content information is interpreted

9

Graphs of subtopics

• Skorochod'ko (1972) suggested to view the cohesion in a document as a sequence of densely interrelated subtopics

10

Sentiment Polarity Overlay

• SentiWordNet was used as it provides positive and negative polarity value for a set of “affective” terms which conflates Osgood’s (1957) evaluative and activation dimensions

• The contribution of individual polarity nodes to the polarity metric of the text as a whole is determined with 1. respect to the textual information2. and WN semantic 3. and topological features encoded in the underlying g

raph representation of the text

11

Three polarity metrics

1. Basic Cohesion Metric2. Relation Type Metric 3. Node Specificity Metric

• Basic Cohesion Metric – based solely on frequency of sentiment-bear

ing nodes in or derived from the source text • i.e. the sum of polarity values for all nodes in the

graph

12

Three polarity metrics

• Relation Type Metric – with respect to the types of WordNet relations

in the text-derived graph – sentiment value of each node is the product of

its polarity value and a relation weight for each relation this node enters into in the graph structure

– Potentially affect-effecting relations have the strongest weighting: antonymy, hypernymy

13

Three polarity metrics

• Node Specificity Metric • with respect to a measure of node specificity calculated

on the basis of topographical features of WordNet – Highly specific nodes or concepts may carry more informational

• specificity measure: – factor out population sparseness or density in WordNet by eval

uating1. the contribution of each node relative to its depth in the hierarchy2. its connectivity (branching Factor)3. its siblings

14

Further specialised accordingto the following two boolean flags

• InText:– consider potentially noisy contributions from n

odes not in the source text – the metric is calculated based on

• 1) only those nodes representing terms in the source text, or

• 2) all nodes in the graph representation derived from the text

15

Further specialised accordingto the following two boolean flags

• Modifiers: – In SentiWN, it seems that modifiers have mor

e reliable values than nouns or verbs – the metric is calculated using

• 1) all open class POS or • 2) modifiers alone

16

Evaluation • The goal is to examine the relationship between financial markets a

nd financial news– to establishing reliability for a sentiment measure of news – the quantitative measure of polarity must be consistent with human judg

ments of polarity • Not all news items may express overt sentiment

– selected a news topic which was considered a priori to have emotive content

• Corpus – takeovers and mergers are usually seen as highly emotive contexts – the topic of a corpus of news texts spans 4 months in 2006 : aggressive

takeover bid of a low-cost airline (Ryanair) for the Irish flag-carrier airline (Aer Lingus)

– Both airlines have a strong (positive and negative) emotional attachment – Collect news through the “surprise” take-over bid announcement by Rya

nair, to the withdrawal of the bid by Ryanair in December 2006

17

Evaluation

• Gold Standard – A set of 30 texts selected from the corpus– annotated by 3 people – on a 7-point scale from very positive to very

negative – the respondents were asked also to rate the

semantic orientation of the texts with respect to the two takeover bid players

– the respondents were first asked to rate their personal attitude to the two airlines

18

Evaluation

• Performance Metrics – The performance of the polarity algorithm w

as evaluated on: 1. Polarity direction: the task of assigning a binary p

ositive/negative value to a text – Performance is reported using standard recall and preci

sion metrics

2. Polarity intensity: the task of assigning a value to indicate the strength of the negative/positive polarity in a text – Performance is reported as a correlation with average h

uman ratings

19

Evaluation

• Baseline – the baseline for comparison sums the SentiW

N polarity rating for only those lexical items present in the text, not exploiting any aspect of the graph representation of the text

20

Results and Discussion

• Binary Polarity Assignment

21

22

23

Polarity Intensity Values

24

Conclusions and Future Work

• This research is expected to contribute to sentiment analysis in finance

• To capture a psychologically plausible measure of affective content of text by exploiting available resources

• To broader evaluation of the measure relative to human judgments and existing metrics