clare llewellyn lasiuk july 5th 2013
DESCRIPTION
Using argument analysis to structure user generated content.
TRANSCRIPT
Clare Llewellyn University of Edinburgh
Argumentation on the web - always vulgar and often convincing?
User Generated Content
Various Conversations
Main points of discussion:
RM is bad / old / Australian / has power over politicians / owns newspapers
RM does / doesn’t understand the internet
Free content is good / bad
The joke belongs to Tim Vine or Stuart Francis
Wider context discussion – PIPA / SOPA, Leveson Inquiry, phone hacking, TVShack
The Problem
Can we somehow structure this data so we can read it and add to it at the most relevant point?
Solutions?
Argumentation
A participant makes a claim that represents their position
The participant backs up that claim with evidence
A counter claim challenges the position
The composer of the original claim may evaluate their position.
Claim
Counter Claim
Evidence
Counter Evidence
Evaluation
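As a minimal sketch of how these five move types could be held in a tree, the following uses a hypothetical `Move` class. The class name, its fields, and the example texts are illustrative assumptions, not part of the original talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Move:
    """One contribution to a discussion (hypothetical structure)."""
    author: str
    text: str
    # One of: claim, counter_claim, evidence, counter_evidence, evaluation
    kind: str
    replies: List["Move"] = field(default_factory=list)

    def add(self, move: "Move") -> "Move":
        """Attach a reply to this move and return the reply."""
        self.replies.append(move)
        return move


def outline(move, depth=0):
    """Return the argument tree as indented lines, one per move."""
    lines = ["  " * depth + f"[{move.kind}] {move.author}: {move.text}"]
    for reply in move.replies:
        lines.extend(outline(reply, depth + 1))
    return lines


# Invented example texts, loosely based on the "free content" topic.
claim = Move("A", "Free content is good", "claim")
claim.add(Move("A", "It brings in more readers", "evidence"))
counter = claim.add(Move("B", "Free content is bad", "counter_claim"))
counter.add(Move("B", "Someone has to pay the writers", "counter_evidence"))
claim.add(Move("A", "Perhaps some content should be paid for", "evaluation"))

print("\n".join(outline(claim)))
```

A new comment can then be added at the most relevant point by calling `add` on the move it responds to.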
Macro / Micro Argumentation
Micro-level:
• Simple claim
• Qualified claim
• Grounded claim
• Grounded and qualified claim
• Non-argumentative moves
Macro-level:
• Argument
• Counter argument
• Integration (reply)
• Non-argumentative moves
Weinberger and Fischer (2006)
Methodology*
* Adapted from Bal & Saint-Dizier (2009) and Mochales & Moens (2009, 2011)
1. Identify discussions on different topics
2. Identify spans of text that represent the core points in the discussion
3. Classify into a structure so as to define the relationships between spans of text
4. Present this information to users
Data Sets
Hand annotated corpus of tweets from the London Riots (7729) www.analysingsocialmedia.org
Comments from the Guardian newspaper (partially hand annotated for topic)
Tweets with the #OR2012 hashtag (5416)
• Extract individual discussion
• Unsupervised clustering – very objective
• Selection of algorithm
Unigram / Bigram Frequency
Incremental Clustering
K-means
Topic modelling
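A minimal sketch of incremental clustering over unigrams, to show the general idea: each text joins the most similar existing cluster, or starts a new one. The Jaccard similarity measure, the `threshold` value, and the function names are assumptions for illustration; the talk does not specify them. The example texts are invented.

```python
import re


def tokens(text):
    """Lowercased word unigrams as a set (no stop list, no stemming)."""
    return set(re.findall(r"[a-z0-9#@']+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def incremental_cluster(texts, threshold=0.2):
    """Single pass: assign each text to its best cluster or open a new one."""
    clusters = []  # each cluster: {"vocab": set of tokens, "members": [indices]}
    for i, text in enumerate(texts):
        toks = tokens(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(toks, cluster["vocab"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["vocab"] |= toks
        else:
            clusters.append({"vocab": set(toks), "members": [i]})
    return [cluster["members"] for cluster in clusters]


comments = [
    "free content is good for readers",
    "free content is bad for newspapers",
    "that joke belongs to tim vine",
    "the joke belongs to stuart francis",
]
print(incremental_cluster(comments))
```

The result depends on the order of the input and on the threshold, which is one reason the choice of algorithm needs checking against human judgement.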
Possible tools
NLTK (nltk.org)
Weka (www.cs.waikato.ac.nz/ml/weka/)
Mallet (mallet.cs.umass.edu)
Twitter Workbench (www.analysingsocialmedia.org/projects)
1. Topic Identification
Example Clusters
Topic Modelling Incremental Clustering
Are you doing what a human would do?
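One way to ask whether the clusters match what a human would do is to compare them with the hand-annotated topics. The sketch below computes cluster purity; the choice of purity as the measure is an assumption, as the talk does not name the metric used, and the labels are invented.

```python
from collections import Counter


def purity(cluster_ids, human_labels):
    """Fraction of items whose human label is the majority label of their cluster."""
    by_cluster = {}
    for cluster_id, label in zip(cluster_ids, human_labels):
        by_cluster.setdefault(cluster_id, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(human_labels)


# Invented example: five comments, two automatic clusters, two human topics.
automatic = [0, 0, 0, 1, 1]
human = ["free content", "free content", "joke", "joke", "joke"]
print(purity(automatic, human))
```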
Results for comments data:
Evaluation
2. Text Span Identification
Define a set of rules that allows the extraction of macro level argumentation
With annotated text you can use machine learning
With non-annotated text you can define rules – is there something specific in the language that indicates a claim / counter claim?
Claim
Counter Claim
Rules production
Method:
• Rules are a generalisation from a large amount of data (14000 quotes)
• Use Words / POS / Negation / Symbols
• Use the rules to find these patterns where not explicitly mentioned in text
Examples:
– Before:
• @USERNAME:
– After:
• i don't
• i think you
• PRP VBP RB (Personal Pronoun, Verb non-3rd person singular present, Adverb)
– Both
• START X i 'm not
Tools: LT-TTT2 www.ltg.ed.ac.uk/software/
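A minimal sketch of how word-level rules like the examples above might be applied with regular expressions. The rule names, the regular expressions, and the reading of "START X i 'm not" as "one token at the start, then i'm not" are assumptions. The POS pattern (PRP VBP RB) is left out because it needs a tagger. This is not the LT-TTT2 rule format.

```python
import re

# Hypothetical rule set loosely modelled on the example patterns above.
RULES = [
    ("before: @USERNAME:", re.compile(r"@\w+:")),
    ("after: i don't", re.compile(r"\bi don't\b", re.IGNORECASE)),
    ("after: i think you", re.compile(r"\bi think you\b", re.IGNORECASE)),
    ("both: START X i'm not", re.compile(r"^\S+ i'm not\b", re.IGNORECASE)),
]


def matching_rules(text):
    """Return the names of all rules whose pattern occurs in the text."""
    return [name for name, pattern in RULES if pattern.search(text)]


# Invented example texts.
print(matching_rules("@bob: free content is good. i don't agree"))
print(matching_rules("Well i'm not convinced"))
print(matching_rules("Nice weather today"))
```

A text that matches one or more rules is a candidate for a response to an earlier claim, even where the original is not quoted explicitly.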
3. Classify into a structure
Method
Based on Rose et al. (2008)
Use supervised machine learning to classify tweets into an argument structure
Using TagHelper tool kit (based on Weka)
– www.cs.cmu.edu/~cprose/TagHelper.html
– LightSide lightsidelabs.com
– Decide on a machine learning algorithm
– Define feature sets
– Train and test
Data Set Tweets
Coded with the classification system:
1. Claim without evidence
2. Claim with evidence
3. Counter-claim without evidence
4. Counter-claim with evidence
5. Implicit request for verification
6. Explicit request for verification
7. Comment
8. Other
Classification – Feature Selection
Features:
Unigrams
+ line length
+ POS bigrams
+ bigrams
+ punctuation
+ stemming
+ no stemming
+ rare words
+ line length, punctuation and rare words
+ no stop list
Algorithms:
• Support Vector Machine
• Decision Tree
• Naive Bayes
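A minimal sketch of the train and test step, using a few of the feature sets listed above (unigrams, bigrams, punctuation, line length) with a small hand-written Naive Bayes classifier. This is a pure Python stand-in for illustration, not TagHelper, LightSide, or Weka; the feature names, the 70-character line-length cut-off, and the training tweets are all assumptions.

```python
import math
import re
import string
from collections import Counter, defaultdict


def features(text):
    """Count unigrams, bigrams, punctuation marks and a long-line flag."""
    words = re.findall(r"[a-z0-9@#']+", text.lower())
    feats = Counter("uni=" + w for w in words)
    feats.update("bi=" + a + "_" + b for a, b in zip(words, words[1:]))
    feats["punct"] = sum(ch in string.punctuation for ch in text)
    feats["long_line"] = int(len(text) > 70)  # assumed cut-off
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = features(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = features(text)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for feat, count in feats.items():
                if feat in self.vocab:  # ignore features never seen in training
                    seen = self.feat_counts[label][feat]
                    score += count * math.log((seen + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training tweets, coded with two of the eight classes.
train_texts = [
    "Free content is good",
    "Free content is bad, you are wrong",
    "The joke belongs to Tim Vine",
    "No, you are wrong, the joke belongs to Stuart Francis",
]
train_labels = [
    "Claim without evidence",
    "Counter-claim without evidence",
    "Claim without evidence",
    "Counter-claim without evidence",
]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("you are wrong"))
```

Swapping feature sets in and out of `features` and re-running the train and test step is the kind of comparison the feature list above describes.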