An Attempt at Group Belief
Characterization and Detection
Danny Dunlavy
Computer Science and Informatics Department (1415)
Sandia National Laboratories
Nick Pattengale, Travis Bauer
July 23, 2008
SAND2008-5426P
Disclaimers
• We do not think our problem is well formed
• We are not sure whether our approach is sound
• We are not confident an answer is in our data
Problem Description
• Given
  – Set of beliefs / statements
  – Set of groups
  – Beliefs held by groups
  – Documents associated with groups
• Tasks
  – General: Detect / track / predict beliefs and/or changes
  – Specific 1: Detect change in belief at a given point in time
    • Dates: July 2005-July 2006; split date: January 2006
    • Data marked as “Before” and “After”
  – Specific 2: Differentiate between groups by belief
Could have been Jenny Holzerisms
1) Exceptional people deserve special concessions
2) Potential counts for nothing until it's realized
3) Reticence and secrecy are excellent pastimes
4) People won't behave if they have nothing to lose
5) Fake or real indifference is a powerful weapon
6) Guilt and self-laceration are indulgences
7) Myth can make reality more intelligible
8) To disagree presupposes moral integrity
9) It is heroic to try to stop time
10) It can be helpful to keep going no matter what
1) Hamas is a terrorist organization
2) Hamas should disarm
3) Hamas should take part in government
4) Hamas should take part in PNA elections
5) Israel is a state
6) Israel should be destroyed
7) Israel should occupy Palestine
8) Oslo Accords is a peace solution
9) Political law is Islamic law
10) There exists a two state solution
Beliefs
Groups
• Fatah (F)
• Islamic Jihad (IJ)
• Israel (I)
• Military Wing (MW)
• Muslim Brotherhood (MB)
• Palestinian Authority (PA)
• Political Bureau (PB)
• Quds Brigades (QB)
• Syria (S)
• United States (US)
Beliefs Held by Groups
1 2 3 4 5 6 7 8 9 10
F X X X X X X
IJ X
I X X X X X X X
MW X X X
MB X X X
PA
PB X X X X X X
QB X X X
S X
US X X X X X X
Rows: Group; columns: Belief
Beliefs Held by Groups
Rows: Group; columns: Belief
Group     1     2     3     4     5     6     7     8     9    10
F      -0.5   0.5   0.5   0.5   0.5     0   0.5     0  -0.5   0.5
IJ       -1    -1    -1    -1  -0.5   0.5    -1    -1     1     1
I         1     1   0.5   0.5   0.5  -0.5     1     1    -1    -1
MW     -0.5  -0.5  -0.5     0  -0.5     1  -0.5  -0.5   0.5   0.5
MB       -1    -1    -1    -1  -0.5   0.5    -1    -1     1     1
PA        0     0     0     0     0     0     0     0     0     0
PB     -0.5   0.5   0.5   0.5    -1     1   0.5  -0.5  -0.5   0.5
QB       -1    -1    -1    -1  -0.5   0.5    -1    -1     1     1
S         0     0     0     0  -0.5   0.5  -0.5     0     0     0
US        1     1     1     1     0  -0.5     1     1    -1  -0.5
Documents
[Figure: Number of Documents by Group (F, IJ, I, MW, MB, PA, PB, QB, S, US); series: Before, After; vertical axis 0 to 1200]
Documents
[Figure: Words Per Document by Group (F, IJ, I, MW, MB, PA, PB, QB, S, US); series: Before, After; vertical axis 0 to 1600]
Solution Approach
• Split data into two groups
  – Before (training) / After (testing)
• Create a weighted vector space model
  – STANLEY
  – Term space defined by “Before” split
• Create binary classifier models
  – Scenario 1: Model each group per belief
  – Scenario 2: Model all groups per belief
• Apply classifier models
  – Apply models for a group to that group’s documents
    • Do test documents align with the same beliefs in general?
  – Apply model for all groups to each group’s documents
    • Can we align beliefs and/or groups to specific documents?
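The "term space defined by the Before split" step can be sketched as follows. This is a minimal illustration only: the slides do not say how STANLEY weights terms, so TF-IDF is an assumption here, and the function names (`tokenize`, `fit_term_space`, `vectorize`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, not STANLEY's)."""
    return text.lower().split()


def fit_term_space(before_docs):
    """Build the vocabulary and IDF weights from the "Before" documents only."""
    n_docs = len(before_docs)
    doc_freq = Counter()
    for doc in before_docs:
        doc_freq.update(set(tokenize(doc)))
    vocab = {term: i for i, term in enumerate(sorted(doc_freq))}
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    return vocab, idf


def vectorize(doc, vocab, idf):
    """Map a document into the fixed term space; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in vocab)
    vec = [0.0] * len(vocab)
    for term, tf in counts.items():
        vec[vocab[term]] = tf * idf[term]
    return vec


# Toy documents, invented for illustration.
before_docs = ["elections talks elections", "talks ceasefire"]
after_docs = ["elections boycott"]

vocab, idf = fit_term_space(before_docs)
after_vecs = [vectorize(d, vocab, idf) for d in after_docs]
```

Because the vocabulary is frozen on the "Before" split, a term that first appears in the "After" documents ("boycott" above) contributes nothing to the test vectors, which is one reason a real change in vocabulary could go undetected.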
Identified Challenges / Issues / Problems
• Beliefs used as labels only
  – Semantics/meaning of beliefs not used in analysis
• Beliefs labeled by subject matter experts based on understanding of groups and beliefs
  – Data not considered in labeling process
• Groups are labeled by beliefs, not data
  – Documents labeled by group
  – Groups labeled by beliefs
• Data collected using keyword search related to groups only
  – Beliefs not taken into account
  – Data is about groups, not authored by groups
• Data not labeled for validation of problem we are solving
  – Detected changes cannot be validated
  – Method evaluation is difficult
Binary Classifier Methods
• Random Forest (D. Dunlavy)
  – Ensemble of decision tree base classifiers (200)
    • Data sampling with replacement to train each base classifier (10%)
    • Feature sampling at each node split in the trees (100)
    • Information gain (entropy) used to determine feature and split used
• Kernel Perceptron (T. Bauer [analysis], J. Basilico [code])
– Classification function:
– Linear kernel:
– Polynomial kernel:
– Radial Basis kernel:
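The Random Forest ingredients listed above (sampling with replacement, feature sampling at each node, entropy-based information gain) can be sketched in a few lines. This is not the authors' implementation: the helper names are hypothetical, labels are assumed to be 0/1, and only the split-selection step of a single tree is shown rather than a full 200-tree ensemble.

```python
import math
import random


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(rows, labels, feature, threshold):
    """Entropy reduction from splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_split(rows, labels, n_candidate_features, rng):
    """Pick the best (gain, feature, threshold) among randomly sampled features."""
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), min(n_candidate_features, n_features))
    best = None
    for f in candidates:
        for t in sorted({x[f] for x in rows}):
            gain = information_gain(rows, labels, f, t)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def bootstrap_sample(rows, labels, fraction, rng):
    """Draw a sample with replacement, sized as a fraction of the data."""
    k = max(1, int(fraction * len(rows)))
    idx = [rng.randrange(len(rows)) for _ in range(k)]
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy data, invented for illustration: two term weights per document.
rng = random.Random(0)
rows = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
labels = [1, 1, 0, 0]
gain, feature, threshold = best_split(rows, labels, n_candidate_features=2, rng=rng)
```

With the slide's settings, each base classifier would be trained on a `bootstrap_sample(..., fraction=0.10, ...)` draw and would call `best_split` with 100 candidate features at every node.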
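For the Kernel Perceptron, a minimal sketch using the common textbook forms of the three kernels is given below. The exact parameterization used in the talk is not recoverable from this transcript, so the `degree`, `offset`, and `gamma` parameters, their defaults, and all function names are assumptions for illustration. Labels are assumed to be in {-1, +1}.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, offset=1.0):
    """K(x, z) = (x . z + offset) ** degree (assumed parameterization)."""
    return (linear_kernel(x, z) + offset) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2) (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, xs, ys, alpha, kernel):
    """sum_i alpha_i * y_i * K(x_i, x); its sign is the predicted class."""
    return sum(a * yi * kernel(xi, x) for a, xi, yi in zip(alpha, xs, ys))


def train_kernel_perceptron(xs, ys, kernel, epochs=10):
    """Mistake-driven training: alpha[i] counts the mistakes made on example i."""
    alpha = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * decision_value(x, xs, ys, alpha, kernel) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return alpha


def predict(x, xs, ys, alpha, kernel):
    return 1 if decision_value(x, xs, ys, alpha, kernel) > 0 else -1


# XOR-style toy data: not linearly separable, but separable with an RBF kernel.
xs = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
ys = [-1, -1, 1, 1]
alpha = train_kernel_perceptron(xs, ys, rbf_kernel)
predictions = [predict(x, xs, ys, alpha, rbf_kernel) for x in xs]
```

The toy data is chosen to show why a nonlinear kernel can matter: a linear kernel cannot separate these four points, while the radial basis kernel fits them within a few passes.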
Evaluation
• Labeling statistics
  – Positive: has a belief; negative: does not have belief
    • TP: true positives (labeled +, predicted +)
    • TN: true negatives (labeled -, predicted -)
    • FP: false positives (labeled -, predicted +)
    • FN: false negatives (labeled +, predicted -)
• Performance Measures
  – Accuracy: (TP + TN) / (TP + TN + FP + FN)
  – Precision: TP / (TP + FP)
  – Recall: TP / (TP + FN)
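The labeling statistics and performance measures can be computed as in the sketch below, which uses the standard definitions of accuracy, precision, and recall. The function names and the example labels are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something stated in the slides.

```python
def confusion_counts(labels, predictions):
    """Count TP, TN, FP, FN for boolean labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    # Assumed convention: 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Assumed convention: 0.0 when there are no positive labels.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented example: True means "document's group holds the belief".
labels = [True, True, False, False, True]
predictions = [True, False, False, True, True]

tp, tn, fp, fn = confusion_counts(labels, predictions)
acc = accuracy(tp, tn, fp, fn)
prec = precision(tp, fp)
rec = recall(tp, fn)
```

Reporting precision and recall alongside accuracy matters for beliefs held by few groups, where a classifier that always predicts "negative" can still score high accuracy.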
Training Results
[Figure: Training (*Before) accuracy by belief (1-10); series: Random Forest, Polynomial Kernel Perceptron, Linear Kernel Perceptron, Radial Basis Kernel Perceptron]
Training Results
[Figure: Training (*Before) precision by belief (1-10); series: Random Forest, Polynomial Kernel Perceptron, Linear Kernel Perceptron, Radial Basis Kernel Perceptron]
Training Results
[Figure: Training (*Before) recall by belief (1-10); series: Random Forest, Polynomial Kernel Perceptron, Linear Kernel Perceptron, Radial Basis Kernel Perceptron]
Testing Data
[Figure: Accuracy by belief (1-10); one series per group: F, IJ, I, MW, MB, PA, PB, QB, S, US]
[Figure: Precision by belief (1-10); one series per group: F, IJ, I, MW, MB, PA, PB, QB, S, US]
[Figure: Recall by belief (1-10); one series per group: F, IJ, I, MW, MB, PA, PB, QB, S, US]
Polynomial Kernel Perceptron
[Figure: grid of groups (rows: F, IJ, I, MW, MB, PA, PB, QB, S, US) by beliefs (columns: 1-10); cells colored green or red]
Percentage Correct: 68.00%
Accuracy: Green indicates that the model chose the belief that the SME chose. Red indicates that the software chose differently.
Random Forest
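[Figure: grid of groups (rows: F, IJ, I, MW, MB, PA, PB, QB, S, US) by beliefs (columns: 1-10); cells colored green or red; X marks in rows MB (1), S (4), and US (1)]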
Percentage Correct: 72.00%
Accuracy: Green indicates that the model chose the belief that the SME chose. Red indicates that the software chose differently.
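One plausible reading of "Percentage Correct" is the share of (group, belief) cells in which the model's choice matches the SME's. The slides do not state how the figure was computed, so that reading is an assumption, as are the function name and the toy grids below.

```python
def percentage_correct(sme_grid, model_grid):
    """Percent of (group, belief) cells where model and SME agree."""
    cells = [
        (s, m)
        for s_row, m_row in zip(sme_grid, model_grid)
        for s, m in zip(s_row, m_row)
    ]
    agree = sum(1 for s, m in cells if s == m)
    return 100.0 * agree / len(cells)


# Toy 2-group x 2-belief grids, invented for illustration.
# True means "group holds the belief".
sme_grid = [[True, False], [False, True]]
model_grid = [[True, True], [False, True]]

pct = percentage_correct(sme_grid, model_grid)
```

Under this reading, a 10 x 10 grid has 100 cells, so each agreeing cell would contribute exactly one percentage point.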
General Thoughts / Questions
• What features are important / available?
  – We used terms
    • Problems: negation, lack of context, intent
  – Audience, purpose, goal, context of document
    • Would you say something different if different people were here?
• Are we modeling groups or individuals?
  – Outliers, subgroup detection
• Who/what is the source of data/documents?
  – Group members versus outsiders (reporters, etc.)
  – Level of intimacy with or knowledge of group
  – Can we incorporate / model perspective into analysis?
• Can we identify / define an ideology?
  – Do we need to in order to model changes in ideology?
• Is there a topology of ideologies?
  – Are relationships between ideologies important?
Thank You
An Attempt at Group Belief
Characterization and Detection
Danny Dunlavy
dmdunla@sandia.gov
http://www.cs.sandia.gov/~dmdunla