A Differential Semantics Approach to the Annotation of Synsets in WordNet

Dan Tufiş, Dan Ştefănescu

{tufis, danstef}@racai.ro

Research Institute for Artificial Intelligence, Romanian Academy


Connotation Versus Subjectivity

• There are various definitions of these terms, and most of the time they are treated as synonyms (although they are not).

• The connotation of a word (as opposed to its denotation, i.e. its dictionary meaning) is intrinsically subjective: it refers to the emotional responses commonly associated with the word's referent (that to which it refers).

• Subjectivity refers to the capacity of words to be used in expressing opinions, particular feelings, beliefs, and desires.

• Connotation is more about word meaning, while subjectivity is more about phrase and sentence meaning; subjectivity sits on an upper layer and builds on the connotations of the constituents.

Connotation Versus Subjectivity

– the word home means “the place where one lives” but, by connotation, also suggests something good, e.g. security, family, love and comfort

– the word murder means “unlawful premeditated killing of a human being by a human being” but, by connotation, also suggests something bad, e.g. fear, disgust, repulsion

– the word criminal means “someone who has committed (or been legally convicted of) a crime” but, by connotation, also suggests a very bad person, e.g. fear, disgust, repulsion

Theory of Semantic Differentiation

• (Osgood et al., 1957) “The Measurement of Meaning”.

• The semantic differentiation technique uses pairs of antonyms (factors) as semantic dimensions along which most adjectives can be differentiated. Many subjects were asked to scale the meanings of words along several factors.

• Osgood and his colleagues showed that most of the variance in judgment could be explained by only three factors:

– <good-bad> (evaluative)

– <strong-weak> (potency)

– <active-passive> (activity)

Kamps and Marx Approach to PWN Adjectives Annotation

• Jaap Kamps and Maarten Marx, “Words with Attitude” (2004), is based on (Osgood et al., 1957) “The Measurement of Meaning”:

– they gave an algorithmic interpretation to the semantic differentiation technique, based on the synonymy relations as encoded in WordNet 1.7

– their approach deals only with adjectives in WordNet and ignores sense distinctions

Some Definitions

• Two words $w_\alpha$ and $w_\beta$ are related if there exists a sequence of words $(w_\alpha, w_1, w_2, \dots, w_i, \dots, w_\beta)$ such that any two adjacent words in the sequence belong to the same synset. If the length of such a sequence is $n+1$, one says that $w_\alpha$ and $w_\beta$ are n-related.

• good and proper are 2-related:

– synset 0161119-a: (good:14 right:13 ripe:3)

– synset 00140845-a: (right:6 proper:3 suitable:3)

Some Other Definitions

• Let $MPL(w_i, w_j)$ be the partial function:

– $MPL(w_i, w_j) = n$ if $n$ is the smallest number such that $w_i$ and $w_j$ are n-related

– $MPL(w_i, w_j)$ is undefined if $w_i$ and $w_j$ are not related

• $MPL(w_i, w_j)$ has the following properties:

– $MPL(w_i, w_j) = 0$ iff $w_i = w_j$

– $MPL(w_i, w_j) = MPL(w_j, w_i)$

– $MPL(w_i, w_j) + MPL(w_j, w_k) \geq MPL(w_i, w_k)$

• MPL is a distance measure that can be used as a metric for the semantic relatedness of two words.
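A minimal sketch of how MPL could be computed, assuming the synsets are given as plain sets of words and that a breadth-first search over the "shares a synset" relation is an acceptable way to find the shortest chain. The names `build_neighbours` and `mpl` are illustrative, and the toy data are the two synsets quoted in the good/proper example with sense numbers dropped.

```python
from collections import deque

# Toy data: the two synsets from the good/proper example (sense numbers dropped).
SYNSETS = [
    {"good", "right", "ripe"},
    {"right", "proper", "suitable"},
]

def build_neighbours(synsets):
    """Map each word to the set of words sharing at least one synset with it."""
    neighbours = {}
    for synset in synsets:
        for word in synset:
            neighbours.setdefault(word, set()).update(synset - {word})
    return neighbours

def mpl(neighbours, source, target):
    """Return the minimal path length between two words, or None if unrelated."""
    if source not in neighbours or target not in neighbours:
        return None
    distance = {source: 0}
    queue = deque([source])
    while queue:
        word = queue.popleft()
        if word == target:
            return distance[word]
        for nxt in neighbours[word]:
            if nxt not in distance:
                distance[nxt] = distance[word] + 1
                queue.append(nxt)
    return None  # target is not reachable: MPL is undefined

NEIGHBOURS = build_neighbours(SYNSETS)
print(mpl(NEIGHBOURS, "good", "proper"))  # prints 2
```

Returning `None` for unrelated words mirrors the fact that MPL is a partial function.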

Some More Definitions

• Let $\langle w_\alpha, w_\beta \rangle$ be a differential semantic factor.

• The partial function $TRI$ is:

– $TRI(w_i, w_\alpha, w_\beta) = \dfrac{MPL(w_i, w_\alpha) - MPL(w_i, w_\beta)}{MPL(w_\alpha, w_\beta)}$ when all MPL values are defined

– undefined otherwise

• $TRI(w_i, w_\alpha, w_\beta) \in [-1, 1]$ measures the closeness of $w_i$ to the factor poles:

– a negative value means $w_i$ is closer to $w_\alpha$, while a positive one indicates closeness to $w_\beta$
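The TRI definition can be sketched as a small function over three path lengths. This is an illustration under stated assumptions: the numerator is taken as the distance to the first pole minus the distance to the second pole (so negative values mean closeness to the first pole), `None` stands for an undefined MPL, and the name `tri` and its parameters are hypothetical.

```python
from typing import Optional

def tri(d_first: Optional[int],
        d_second: Optional[int],
        d_poles: Optional[int]) -> Optional[float]:
    """TRI from three path lengths; None stands for an undefined MPL.

    d_first  : MPL from the word to the first pole of the factor
    d_second : MPL from the word to the second pole of the factor
    d_poles  : MPL between the two poles
    """
    if d_first is None or d_second is None or d_poles is None:
        return None  # TRI is a partial function
    if d_poles == 0:
        return None  # assumption: a factor needs two distinct poles
    return (d_first - d_second) / d_poles

# A word one step from the first pole and three steps from the second,
# with the poles four steps apart, leans towards the first pole.
print(tri(1, 3, 4))  # prints -0.5
```

The result stays within [-1, 1] because path lengths satisfy the triangle inequality, so the difference of the two distances can never exceed the distance between the poles.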


Our Approach: Some Definitions

• Let $\langle w_\alpha, w_\beta \rangle$ be a differential semantic factor.

• Any word $w_i$ in WordNet that can be reached on a path from $w_\alpha$ to $w_\beta$ is given a score which is a function of the distances from $w_i$ to $w_\alpha$ and to $w_\beta$ (TRI).

• The set of these words defines the coverage of the $\langle w_\alpha, w_\beta \rangle$ factor: $COV(\langle w_\alpha, w_\beta \rangle)$.

• The set of all synsets containing these words defines the semantic coverage of the corresponding S-factor: $SCOV(\langle S_\alpha, S_\beta \rangle)$.
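A rough sketch of the two coverage notions, assuming that "reachable on a path between the poles" can be read as "in the same connected component as both poles" of the shares-a-synset graph. The function names and the toy synsets (abstract tokens, not WordNet data) are invented for illustration.

```python
def coverage(synsets, first_pole, second_pole):
    """Words connected, through shared synsets, to both poles of a factor."""
    neighbours = {}
    for synset in synsets:
        for word in synset:
            neighbours.setdefault(word, set()).update(synset - {word})
    if first_pole not in neighbours or second_pole not in neighbours:
        return set()
    # Depth-first walk collecting every word reachable from the first pole.
    seen, stack = {first_pole}, [first_pole]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # The factor is usable only if the second pole is reachable as well.
    return seen if second_pole in seen else set()

def semantic_coverage(synsets, covered_words):
    """Synsets containing at least one covered word."""
    return [synset for synset in synsets if synset & covered_words]

# Invented toy graph: a - x - y - b form one chain; z and q are isolated from it.
TOY = [{"a", "x"}, {"x", "y"}, {"y", "b"}, {"z", "q"}]
COVERED = coverage(TOY, "a", "b")
print(sorted(COVERED))
print(len(semantic_coverage(TOY, COVERED)))
```

Under this reading, any two factors whose poles lie in the same connected component have exactly the same coverage.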

[Figure: graph of words w linked through shared synsets, including the pole words of the factors <A, A> and <B, B>.]

COVERAGE of the factors <A, A>, <B, B>, … is the same.

Moving towards synsets

• S-factor = a pair of synsets $(S_\alpha, S_\beta)$ for which there exist $w_i^\alpha:s_i^\alpha \in S_\alpha$ and $w_i^\beta:s_i^\beta \in S_\beta$ such that $w_i^\alpha:s_i^\alpha$ and $w_i^\beta:s_i^\beta$ are antonyms and $MPL(w_i^\alpha, w_i^\beta)$ is defined. $S_\alpha$ and $S_\beta$ have opposite meanings, but only $w_i^\alpha$ and $w_i^\beta$ are antonyms.

• $MPLS(S_\alpha, S_\beta) = MPL(w_i^\alpha, w_i^\beta)$

• A synset, or a literal of a certain synset, is n-scoped relative to an S-factor $\langle S_\alpha, S_\beta \rangle$ if the synset's SUMO/MILO label L is a node in the tree-like structure whose root is the n-th parent of the lowest common ancestor of the SUMO/MILO labels corresponding to $S_\alpha$ and $S_\beta$.

• n defines the depth of the scope coverage $SCOV_n(\langle S_\alpha, S_\beta \rangle)$, and every synset in this coverage is n-scoped. If the root synset is $S_\gamma$, we will also use the notation $SCOV(\langle S_\alpha, S_\beta \rangle)_{S_\gamma}$.
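The n-scoping test can be sketched on a toy label hierarchy. Everything here is an assumption for illustration: the hierarchy is a single tree given as a child-to-parent map, the labels are invented placeholders rather than real SUMO/MILO concepts, a label counts as its own ancestor, and climbing past the root simply stops at the root.

```python
def ancestors(parent, label):
    """Return the chain label, parent(label), ..., root."""
    chain = [label]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def scope_root(parent, label_a, label_b, n):
    """n-th parent of the lowest common ancestor of two labels (capped at the root)."""
    chain_a = ancestors(parent, label_a)
    on_b = set(ancestors(parent, label_b))
    # First label on the way up from label_a that is also above label_b.
    lca_index = next(i for i, lab in enumerate(chain_a) if lab in on_b)
    return chain_a[min(lca_index + n, len(chain_a) - 1)]

def is_n_scoped(parent, label, label_a, label_b, n):
    """True if `label` lies in the subtree rooted at the scope root."""
    return scope_root(parent, label_a, label_b, n) in ancestors(parent, label)

# Invented toy tree:  A -> (B -> (D, E), C -> (F))
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}
print(is_n_scoped(PARENT, "F", "D", "E", 0))  # prints False
print(is_n_scoped(PARENT, "F", "D", "E", 1))  # prints True
```

Increasing n widens the scope: with n = 0 only labels under the lowest common ancestor qualify, while each further step up the tree admits more of the hierarchy.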


Scope

Moving towards synsets

• For an S-factor $\langle S_\alpha, S_\beta \rangle$, $TRIS(S_i, S_\alpha, S_\beta)$ is defined as the average of the TRI values associated with the literals making up the synset:

$$TRIS(S_i, S_\alpha, S_\beta) = \frac{1}{m} \sum_{j=1}^{m} TRI(w_j, w_k^\alpha, w_l^\beta)$$

• where $m$ is the number of literals in $S_i$, $w_j$ are the literals in $S_i$, and $\langle w_k^\alpha, w_l^\beta \rangle$ is the factor determining the $\langle S_\alpha, S_\beta \rangle$ S-factor

• TRIS has values in the [-1, 1] interval. If it is not defined, we assign TRIS a value outside this interval (2, for example).
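A minimal sketch of the synset-level score, assuming the per-literal TRI values have already been computed and that TRIS counts as undefined whenever any literal's TRI is undefined (the slides do not say how partially defined synsets are handled). The name `tris` and the `UNDEFINED` constant are illustrative.

```python
UNDEFINED = 2  # sentinel outside [-1, 1], following the "2 for example" remark

def tris(tri_values):
    """Average the TRI values of a synset's literals.

    tri_values holds one entry per literal; None marks an undefined TRI.
    Assumption: a single undefined literal makes the whole synset undefined.
    """
    if not tri_values or any(value is None for value in tri_values):
        return UNDEFINED
    return sum(tri_values) / len(tri_values)

print(tris([1.0, 0.5]))   # prints 0.75
print(tris([0.5, None]))  # prints 2
```

Using a numeric sentinel keeps every cell of the annotation vector a number, at the cost of having to filter the sentinel out before aggregating.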

Some examples: scoped S-factors

• ({fairness:1 …} <-> {unfairness:2 …}) NormativeAttribute

• ({comfort:1 …} <-> {discomfort:1 …}) StateOfMind

• ({trust:3 …} <-> {distrust:2 …}) TraitAttribute

• ({increase:3 …} <-> {decrease:2 …}) QuantityChange

• ({demand:1 …} <-> {supply:2 …}) Entity

• ({good:1 …} <-> {bad:1 …}) SubjectiveAssessmentAttribute

• ({strong:1 …} <-> {weak:1 …}) SubjectiveAssessmentAttribute

• ({active:1 …} <-> {inactive:2}) BiologicalAttribute

Factors & Scoped S-factors

| Word Class | Factors | Scoped S-Factors | Coverage (literals) | Coverage (synsets) |
|---|---|---|---|---|
| Adjectives | 335 | 332 | 5,307 (24.68%) | 5,291 (28.50%) |
| Adverbs (factors & scores imported from adjectives) | 335 | 332 | 1,943 (41.69%) | 1,571 (42.87%) |
| Nouns | 85 | 78 | 11,109 (9.59%) | 11,007 (13.81%) |
| Verbs | 254 | 247 | 6,467 (57.19%) | 8,589 (64.58%) |

The Annotations

• Each connotative synset is associated with a vector whose size depends on the POS of the synset:

– Noun synsets => a vector of 78 values

– Verb synsets => a vector of 247 values

– Adj and Adv synsets => a vector of 332 values

• Although all factors for a POS cover the same synsets, the vectors for different synsets of the same POS may be very different.

• The selection of the scoped S-factors is highly relevant to the domain for which a text analysis is performed.

S-Factors & Connotation Scores

A tool for connotation scoring

Some examples

• Assume we selected the following factors

– for nouns:

• ({comfort:1 …} – {discomfort:1 …}) StateOfMind

• ({pleasure:1 …} – {pain:2 …}) EmotionalState

• ({trust:3 …} – {distrust:2 …}) TraitAttribute

– for verbs:

• ({get well:1 …} – {get worse:1 …}) OrganismProcess

• ({enjoy:4 …} – {suffer:1 …}) AsymmetricRelation, IrreflexiveRelation

• ({believe:1 …} – {disbelieve:1 …}) Entity

“His lies will be dealt with in the court and his immorality will be proved.”

The mark-up of the example

His lies:1 <comfort:-0.11 pleasure:-0.23 trust:-0.11> will be dealt:2 <get well:0.42 enjoy:0 believe:0.62> with in the court:1 <comfort:-0.11 pleasure:-0.07 trust:-0.11> and his immorality:2 <comfort:-0.22 pleasure:-0.07 trust:-0.11> will be proved:3 <get well:0.14 enjoy:-0.2 believe:0.25>.

A rough analysis suggests that we have a subjectively loaded sentence which expresses:

– lack of comfort, i.e. discomfort (average score: -0.14)

– lack of pleasure, i.e. pain (average score: -0.12)

– lack of trust, i.e. distrust (average score: -0.11)

– getting well (average score: 0.28)

– not enjoying, i.e. suffering (average score: -0.1)

– believing (average score: 0.43)
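The per-factor averages quoted in the rough analysis can be recomputed from the mark-up. The sketch below assumes each average is a plain arithmetic mean over the annotated nouns (for the noun factors) or verbs (for the verb factors); the dictionary layout and the name `factor_averages` are illustrative.

```python
# Scores copied from the marked-up sentence.
NOUNS = {
    "lies:1":       {"comfort": -0.11, "pleasure": -0.23, "trust": -0.11},
    "court:1":      {"comfort": -0.11, "pleasure": -0.07, "trust": -0.11},
    "immorality:2": {"comfort": -0.22, "pleasure": -0.07, "trust": -0.11},
}
VERBS = {
    "dealt:2":  {"get well": 0.42, "enjoy": 0.0,  "believe": 0.62},
    "proved:3": {"get well": 0.14, "enjoy": -0.2, "believe": 0.25},
}

def factor_averages(annotated):
    """Mean score per factor over all annotated words of one POS."""
    factors = next(iter(annotated.values())).keys()
    return {
        factor: sum(vector[factor] for vector in annotated.values()) / len(annotated)
        for factor in factors
    }

for name, average in {**factor_averages(NOUNS), **factor_averages(VERBS)}.items():
    print(f"{name}: {average:+.3f}")
```

Computed this way, the means agree with the quoted averages to within 0.01; the small differences in the second decimal come from rounding.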

An alternative mark-up of the example

In SentiWordNet (Esuli & Sebastiani, 2006) each synset is annotated with a triplet <P:α N:β O:γ>, where P denotes the positive load of the synset, N its negative load and O the objectivity of the considered meaning:

His lies:1 <P:0 N:0 O:1> will be dealt:2 <P:0.125 N:0 O:0.875> with in the court:1 <P:0 N:0 O:1> and his immorality:2 <P:0.75 N:0 O:0.25> will be proved:3 <P:0 N:0 O:1>.

In terms of the <P, N, O> triad, one would obtain an almost objective statement (average score 0.825) with a significant load of positivity (average score 0.175) and no negativity at all.
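The averages for the SentiWordNet mark-up can be checked the same way. This sketch assumes a plain mean over the five annotated words; the triplets are copied from the marked-up sentence, and `average_triplet` is an illustrative name.

```python
# (P, N, O) triplets copied from the SentiWordNet mark-up of the sentence.
TRIPLETS = {
    "lies:1":       (0.0,   0.0, 1.0),
    "dealt:2":      (0.125, 0.0, 0.875),
    "court:1":      (0.0,   0.0, 1.0),
    "immorality:2": (0.75,  0.0, 0.25),
    "proved:3":     (0.0,   0.0, 1.0),
}

def average_triplet(triplets):
    """Component-wise mean of the (P, N, O) triplets."""
    count = len(triplets)
    sums = [sum(triplet[k] for triplet in triplets.values()) for k in range(3)]
    return tuple(total / count for total in sums)

positive, negative, objective = average_triplet(TRIPLETS)
print(f"P={positive:.3f} N={negative:.3f} O={objective:.3f}")
# prints P=0.175 N=0.000 O=0.825
```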

Using the S-factors to Compute Binary Judgments

$$Score(synset_K) = \frac{1}{|SF|} \sum_{i \in SF} Vector(i, synset_K)$$

– SF is the set of S-factors for the POS of synsetK, while Vector(i, synsetK) is the value of the i-th cell of the S-factor vector associated with synsetK

– A negative Score means that the aggregated connotational value for synsetK is closer to the connotations induced by the first words in the antonymic pairs making up the S-factors, while a positive value means closeness to the second words of the antonymic pairs.
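A small sketch of turning an S-factor vector into a binary judgment, assuming the aggregate score is the arithmetic mean of the vector cells and that all cells are defined. The function names and the sample vectors are hypothetical; the sign convention follows the description above (negative: first words of the pairs, positive: second words).

```python
def score(vector):
    """Mean of the S-factor values of one synset (assumed aggregation)."""
    if not vector:
        raise ValueError("empty S-factor vector")
    return sum(vector) / len(vector)

def binary_judgment(vector):
    """Map the aggregate score to the side of the antonymic pairs it leans to."""
    value = score(vector)
    if value < 0:
        return "first pole"
    if value > 0:
        return "second pole"
    return "neutral"

# Hypothetical vectors, not taken from the annotated WordNet.
print(score([-0.5, 0.25, -0.5]), binary_judgment([-0.5, 0.25, -0.5]))
# prints -0.25 first pole
```

A real vector may contain cells marked as undefined; how those should be treated before averaging is left open here.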

Data and tools: http://www.racai.ro/differentialsemantics/

0_0

• N: 38 / 78 (48.71%)

• V: 73 / 247 (29.55%)

• A: 252 / 332 (75.90%)
