1
Sentiment Polarity Identification in Financial News: A Cohesion-based Approach
Authors: Ann Devitt, Khurshid Ahmad
(School of Computer Science & Statistics, Trinity College Dublin, Ireland)
Source: ACL 2007
2
Motivation & Goal
• Both the informational and affective aspects of news text affect the markets in profound ways
• The goal is to explore a computable metric of positive or negative polarity in financial news text that
1. is consistent with human judgments
2. can be used in a quantitative analysis of the impact of news sentiment on financial markets
3
Why is the issue important?
• Prospect Theory
• Herbert Simon (1978 Nobel Prize) and Daniel Kahneman (2002 Nobel Prize)
– showed that people in markets can behave irrationally, and that this bounded rationality is inspired by what they hear from others about market conditions
• Robert Engle (2003 Nobel Prize)
– positive news is typically related to large changes in prices, but only for a short time
– conversely, the effect of negative news on prices and trading volumes is longer lasting
4
Idea
• The contribution to an overall sentiment polarity and intensity metric of individual lexical items is determined by their connectivity and position within a representation of the text based on the principles of lexical cohesion
5
Related Work
• Cognitive Theories of Emotion – what constitutes emotion?
– Categorical: e.g. anger, fear, sadness, surprise…
– or Dimensional: Evaluation (good–bad axis) and Activation (strong–weak axis)
• Sentiment Analysis – computational linguistics work focused on examining
– what textual features (lexical, syntactic, punctuation, etc.) contribute to the affective content of text
– how these features can be used to derive a sentiment metric for a word, sentence or whole text
6
Related Work
• Sentiment and News Impact Analysis
– Niederhoffer (1971) analysed 20 years of New York Times headlines
  • classified into 19 semantic categories and on a good-bad rating scale
  • to evaluate how the markets reacted to good and bad news
  • found that markets overreact to bad news
– Davis et al. (2006) investigated the effects of optimistic or pessimistic language used in financial press releases on future firm performance
7
Cohesion-based Text Representation Algorithm
• builds a graph representation of text from part-of-speech tagged text
• without disambiguation
• using WordNet as a real world knowledge source
• to reduce information loss in the transition from text to text-based structure
8
Cohesive structure of a text
• The representation is designed within the theoretical framework of lexical cohesion (Halliday and Hasan, 1976)
• The graph representation combines:
1. information derived from the text
2. WordNet semantic content
– Nodes: concepts in or derived from the text, connected by relations between these concepts in WordNet
  • such as antonymy or hypernymy
– Topological features of the knowledge base make it possible to control how the WordNet semantic content is interpreted
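The graph construction described on the last two slides can be sketched roughly as follows. This is only an illustration: `TOY_RELATIONS` is a hypothetical hand-written stand-in for WordNet, the tag names are assumed, and the function name `build_cohesion_graph` is not from the paper. As on the slide, no word-sense disambiguation is attempted.

```python
from collections import defaultdict

# Hypothetical stand-in for WordNet: (concept, relation, concept) triples.
TOY_RELATIONS = [
    ("profit", "antonym", "loss"),
    ("profit", "hypernym", "gain"),
    ("airline", "hypernym", "company"),
]

# Assumed coarse tag set for open-class words.
OPEN_CLASS_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def build_cohesion_graph(tagged_tokens, relations=TOY_RELATIONS):
    """Return (text_nodes, edges); edges maps node -> set of (relation, neighbour)."""
    text_nodes = {word.lower() for word, tag in tagged_tokens if tag in OPEN_CLASS_TAGS}
    edges = defaultdict(set)
    for source, relation, target in relations:
        # Keep a relation when at least one endpoint occurs in the text; the
        # other endpoint then becomes a node "derived from" the text.
        if source in text_nodes or target in text_nodes:
            # Simplification: the same label is stored in both directions.
            edges[source].add((relation, target))
            edges[target].add((relation, source))
    return text_nodes, dict(edges)


tagged = [("The", "DET"), ("airline", "NOUN"), ("reported", "VERB"),
          ("a", "DET"), ("loss", "NOUN")]
text_nodes, edges = build_cohesion_graph(tagged)
all_nodes = text_nodes | set(edges)
print(sorted(all_nodes))
```

Here "profit" and "company" enter the graph only through their relations to words in the text, which is the distinction the later InText flag relies on.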
9
Graphs of subtopics
• Skorochod'ko (1972) suggested viewing the cohesion in a document as a sequence of densely interrelated subtopics
10
Sentiment Polarity Overlay
• SentiWordNet was used because it provides positive and negative polarity values for a set of “affective” terms; these values conflate Osgood’s (1957) evaluative and activation dimensions
• The contribution of individual polarity nodes to the polarity metric of the text as a whole is determined with respect to the following, as encoded in the underlying graph representation of the text:
1. the textual information
2. WordNet (WN) semantic features
3. WN topological features
11
Three polarity metrics
1. Basic Cohesion Metric
2. Relation Type Metric
3. Node Specificity Metric
• Basic Cohesion Metric
– based solely on the frequency of sentiment-bearing nodes in or derived from the source text
  • i.e. the sum of polarity values for all nodes in the graph
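A minimal sketch of the basic cohesion metric, assuming each node carries a single signed polarity value (for example, positive score minus negative score). The numbers in `polarity` are made up for illustration and are not SentiWordNet values.

```python
def basic_cohesion_metric(graph_nodes, polarity):
    """Sum the signed polarity of every node in the graph.

    Nodes without an entry in the polarity lexicon contribute 0.
    """
    return sum(polarity.get(node, 0.0) for node in graph_nodes)


# Hypothetical signed polarity values, not taken from SentiWordNet.
polarity = {"loss": -0.75, "profit": 0.5}
graph_nodes = ["loss", "profit", "airline", "company"]

score = basic_cohesion_metric(graph_nodes, polarity)
print(score)
```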
12
Three polarity metrics
• Relation Type Metric
– takes into account the types of WordNet relations in the text-derived graph
– the sentiment value of each node is the product of its polarity value and a relation weight, for each relation this node enters into in the graph structure
– relations that can potentially affect sentiment have the strongest weighting: antonymy, hypernymy
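One possible reading of the relation type metric is sketched below: for every relation a node takes part in, its polarity is multiplied by that relation's weight, and the products are summed. Both the summation and the weights in `RELATION_WEIGHTS` are assumptions for illustration; the slides only state that antonymy and hypernymy are weighted most strongly.

```python
# Hypothetical weights; the slides give no numeric values.
RELATION_WEIGHTS = {"antonym": 1.0, "hypernym": 1.0, "also_see": 0.25}


def relation_type_metric(node_relations, polarity, weights=RELATION_WEIGHTS):
    """Sum polarity * relation weight over every relation each node enters into."""
    total = 0.0
    for node, relations in node_relations.items():
        for relation in relations:
            total += polarity.get(node, 0.0) * weights.get(relation, 0.0)
    return total


# Each node maps to the list of relation types it participates in.
node_relations = {
    "loss": ["antonym", "hypernym"],
    "profit": ["antonym", "also_see"],
}
polarity = {"loss": -0.75, "profit": 0.5}  # made-up signed polarity values

score = relation_type_metric(node_relations, polarity)
print(score)
```

Under this reading, a well-connected sentiment-bearing node counts more than an isolated one, and weakly informative relations contribute little.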
13
Three polarity metrics
• Node Specificity Metric
– takes into account a measure of node specificity calculated on the basis of topological features of WordNet
– highly specific nodes or concepts may carry more informational content
• Specificity measure:
– factors out population sparseness or density in WordNet by evaluating the contribution of each node relative to
1. its depth in the hierarchy
2. its connectivity (branching factor)
3. its siblings
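The slides do not give the specificity formula, so the weighting below is purely an illustrative assumption: deeper nodes count more, while nodes with many children or many siblings count less. The function names and all numbers are hypothetical.

```python
def specificity(depth, branching_factor, sibling_count, max_depth):
    """Assumed weighting: relative depth, damped by connectivity and siblings."""
    return (depth / max_depth) / (1 + branching_factor + sibling_count)


def node_specificity_metric(nodes, max_depth):
    """Sum polarity * specificity over all nodes.

    Each node is a tuple (polarity, depth, branching_factor, sibling_count).
    """
    return sum(
        pol * specificity(depth, branching, siblings, max_depth)
        for pol, depth, branching, siblings in nodes
    )


nodes = [
    (-0.5, 8, 1, 2),  # mid-depth node in a fairly dense neighbourhood
    (0.5, 4, 0, 0),   # shallow node with no children or siblings
]
score = node_specificity_metric(nodes, max_depth=16)
print(score)
```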
14
Further specialised according to the following two boolean flags
• InText:
– accounts for potentially noisy contributions from nodes not in the source text
– the metric is calculated based on
  • 1) only those nodes representing terms in the source text, or
  • 2) all nodes in the graph representation derived from the text
15
Further specialised according to the following two boolean flags
• Modifiers:
– in SentiWN, it seems that modifiers have more reliable values than nouns or verbs
– the metric is calculated using
  • 1) all open class POS, or
  • 2) modifiers alone
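The InText and Modifiers flags can be viewed as filters applied to the node set before any metric is computed. The sketch below applies them to a simple polarity sum. The `Node` class, the tag names, the treatment of adjectives and adverbs as "modifiers", and the polarity values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    word: str
    pos: str         # assumed coarse tag: "NOUN", "VERB", "ADJ" or "ADV"
    in_text: bool    # True if the term occurs in the source text
    polarity: float  # made-up signed polarity value


# Assumption: "modifiers" means adjectives and adverbs.
MODIFIER_TAGS = {"ADJ", "ADV"}


def polarity_metric(nodes, in_text_only=False, modifiers_only=False):
    """Sum polarity over the nodes that pass the two boolean filters."""
    total = 0.0
    for node in nodes:
        if in_text_only and not node.in_text:
            continue  # drop nodes that were only derived via WordNet
        if modifiers_only and node.pos not in MODIFIER_TAGS:
            continue  # drop nouns and verbs
        total += node.polarity
    return total


nodes = [
    Node("aggressive", "ADJ", True, -0.5),
    Node("bid", "NOUN", True, 0.25),
    Node("hostile", "ADJ", False, -0.75),
    Node("offer", "NOUN", False, 0.125),
]
all_graph = polarity_metric(nodes)
text_only = polarity_metric(nodes, in_text_only=True)
mods_only = polarity_metric(nodes, modifiers_only=True)
both = polarity_metric(nodes, in_text_only=True, modifiers_only=True)
print(all_graph, text_only, mods_only, both)
```

Treating the flags as independent filters means each of the three metrics can be computed in four variants without changing the metric itself.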
16
Evaluation
• The goal is to examine the relationship between financial markets and financial news
– this requires establishing the reliability of a sentiment measure of news
– the quantitative measure of polarity must be consistent with human judgments of polarity
• Not all news items may express overt sentiment
– a news topic was selected which was considered a priori to have emotive content
• Corpus
– takeovers and mergers are usually seen as highly emotive contexts
– the corpus of news texts spans 4 months in 2006 and covers the aggressive takeover bid by a low-cost airline (Ryanair) for the Irish flag-carrier airline (Aer Lingus)
– both airlines attract strong (positive and negative) emotional attachment
– news was collected from the “surprise” takeover bid announcement by Ryanair to the withdrawal of the bid by Ryanair in December 2006
17
Evaluation
• Gold Standard
– a set of 30 texts selected from the corpus
– annotated by 3 people
– on a 7-point scale from very positive to very negative
– the respondents were also asked to rate the semantic orientation of the texts with respect to the two takeover bid players
– the respondents were first asked to rate their personal attitude to the two airlines
18
Evaluation
• Performance Metrics
– The performance of the polarity algorithm was evaluated on:
1. Polarity direction: the task of assigning a binary positive/negative value to a text
– Performance is reported using standard recall and precision metrics
2. Polarity intensity: the task of assigning a value to indicate the strength of the negative/positive polarity in a text
– Performance is reported as a correlation with average human ratings
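The two evaluation measures can be sketched as follows. The slides say only "correlation", so the use of Pearson's coefficient here is an assumption, and the labels and ratings are invented toy data, not results from the study.

```python
import math


def precision_recall(gold, predicted, label):
    """Precision and recall for one polarity label (e.g. "pos")."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Polarity direction: toy gold and system labels for four texts.
gold = ["pos", "neg", "neg", "pos"]
predicted = ["pos", "pos", "neg", "neg"]
precision, recall = precision_recall(gold, predicted, "pos")

# Polarity intensity: toy average human ratings (7-point scale mapped
# to -3..3) against toy system scores.
human_avg = [-3.0, -1.0, 1.0, 3.0]
system = [-1.5, -0.5, 0.5, 1.5]
r = pearson(human_avg, system)
print(precision, recall, r)
```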
19
Evaluation
• Baseline
– the baseline for comparison sums the SentiWN polarity rating for only those lexical items present in the text, not exploiting any aspect of the graph representation of the text
24
Conclusions and Future Work
• This research is expected to contribute to sentiment analysis in finance
• To capture a psychologically plausible measure of affective content of text by exploiting available resources
• To broaden the evaluation of the measure relative to human judgments and existing metrics