clare llewellyn lasiuk july 5th 2013

Clare Llewellyn University of Edinburgh

Argumentation on the web - always vulgar and often convincing?

User-Generated Content

Various Conversations

Main points of discussion:

RM (Rupert Murdoch) is bad / old / Australian / has power over politicians / owns newspapers

RM does / doesn’t understand the internet

Free content is good / bad

The joke belongs to Tim Vine or Stuart Francis

Wider context discussion – PIPA / SOPA, Leveson Inquiry, phone hacking, TVShack

The Problem

Can we somehow structure this data so we can read it and add to it at the most relevant point?

Solutions?

Argumentation

A participant makes a claim that represents their position

The participant backs up that claim with evidence

A counter claim challenges the position

The author of the original claim may then evaluate their position.

Claim

Counter Claim

Evidence

Counter Evidence

Evaluation
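The five components above can be modelled as typed moves linked by reply relations, which is what lets a reader jump to "the most relevant point" in a thread. A minimal sketch, assuming each move is one post that replies to at most one earlier move and that posts arrive in order; the names `Move`, `MoveType`, `replies_to` and `thread_for` are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MoveType(Enum):
    CLAIM = "claim"
    COUNTER_CLAIM = "counter claim"
    EVIDENCE = "evidence"
    COUNTER_EVIDENCE = "counter evidence"
    EVALUATION = "evaluation"


@dataclass
class Move:
    """One argumentative move, e.g. a single tweet or comment (hypothetical model)."""
    move_id: int
    author: str
    kind: MoveType
    text: str
    replies_to: Optional[int] = None  # id of the move this one responds to


def thread_for(moves: list[Move], claim_id: int) -> list[Move]:
    """Collect a claim and every move that (transitively) replies to it.

    Assumes moves are listed in posting order, so a reply always comes
    after the move it answers.
    """
    included = {claim_id}
    result = []
    for move in moves:
        if move.move_id == claim_id or move.replies_to in included:
            included.add(move.move_id)
            result.append(move)
    return result
```

Keeping the move type separate from the reply link means the same structure can hold both the micro-level label of a post and its macro-level position in the conversation.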

Macro / Micro Argumentation

Micro-level:
• Simple claim
• Qualified claim
• Grounded claim
• Grounded and qualified claim
• Non-argumentative moves

Macro-level:
• Argument
• Counter-argument
• Integration (reply)
• Non-argumentative moves

Weinberger and Fischer (2006)
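The micro-level categories combine two properties of a claim: whether it is backed by grounds and whether it is qualified (its validity limited in some way). A sketch of that mapping, assuming a coder or an upstream classifier has already decided the three boolean features; the enum and function names are hypothetical.

```python
from enum import Enum


class MicroLevel(Enum):
    SIMPLE_CLAIM = "simple claim"
    QUALIFIED_CLAIM = "qualified claim"
    GROUNDED_CLAIM = "grounded claim"
    GROUNDED_AND_QUALIFIED_CLAIM = "grounded and qualified claim"
    NON_ARGUMENTATIVE = "non-argumentative move"


def micro_level(is_claim: bool, has_grounds: bool, has_qualifier: bool) -> MicroLevel:
    """Map pre-coded features of one message to a micro-level category."""
    if not is_claim:
        return MicroLevel.NON_ARGUMENTATIVE
    if has_grounds and has_qualifier:
        return MicroLevel.GROUNDED_AND_QUALIFIED_CLAIM
    if has_grounds:
        return MicroLevel.GROUNDED_CLAIM
    if has_qualifier:
        return MicroLevel.QUALIFIED_CLAIM
    return MicroLevel.SIMPLE_CLAIM
```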

Methodology*

* Adapted from Bal & Saint-Dizier (2009) and Mochales & Moens (2009, 2011)

1. Identify discussions on different topics

2. Identify spans of text that represent the core points in the discussion

3. Classify into a structure so as to define the relationships between spans of text

4. Present this information to users

Data Sets

Hand-annotated corpus of tweets from the London Riots (7,729 tweets), www.analysingsocialmedia.org

Comments from the Guardian newspaper (partially hand-annotated for topic)

Tweets with the #OR2012 hashtag (5,416 tweets)

• Extract individual discussions

• Unsupervised clustering – very objective

• Selection of algorithm

Unigram / Bigram Frequency

Incremental Clustering

K-means

Topic modelling
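Of the algorithms listed, incremental clustering is the simplest to sketch. The version below is a single-pass ("leader") variant over unigram and bigram counts with cosine similarity; the tokeniser, the similarity measure and the 0.3 threshold are assumptions for illustration, not settings from the study.

```python
import math
import re
from collections import Counter


def ngram_features(text: str) -> Counter:
    """Unigram and bigram counts for one lower-cased message."""
    tokens = re.findall(r"[#@]?\w+(?:'\w+)?", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(texts: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Single pass over the stream: each message joins the most similar
    existing cluster if similarity >= threshold, else it starts a new one."""
    centroids: list[Counter] = []   # summed counts; cosine ignores scale, so this acts as the mean
    clusters: list[list[int]] = []  # message indices per cluster
    for i, text in enumerate(texts):
        feats = ngram_features(text)
        best, best_sim = None, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(feats, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
            centroids[best].update(feats)
        else:
            clusters.append([i])
            centroids.append(Counter(feats))
    return clusters
```

Because each message is seen only once, the result depends on arrival order and on the threshold, which would need tuning against the hand-annotated topics.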

Possible tools

NLTK (nltk.org)

Weka (www.cs.waikato.ac.nz/ml/weka/)

Mallet (mallet.cs.umass.edu)

Twitter Workbench (www.analysingsocialmedia.org/projects)

1. Topic Identification

Example Clusters

Topic Modelling / Incremental Clustering

Are you doing what a human would do?

Results for comments data:

Evaluation
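One way to ask whether the clusters match what a human would do is to compare them with the hand-annotated topics available for the London Riots tweets and part of the Guardian comments. The slides do not say which measure was used; the sketch below assumes purity and inverse purity, two standard choices, for illustration.

```python
from collections import Counter


def purity(clusters: list[list[int]], gold: list[str]) -> float:
    """Share of items that carry their cluster's majority human label."""
    total = sum(len(c) for c in clusters)
    matched = sum(Counter(gold[i] for i in c).most_common(1)[0][1] for c in clusters if c)
    return matched / total


def inverse_purity(clusters: list[list[int]], gold: list[str]) -> float:
    """For each human topic, the most of its items found in any single cluster,
    summed and divided by the total. Penalises splitting one topic over many clusters."""
    total = sum(len(c) for c in clusters)
    best: Counter = Counter()  # gold label -> largest count in any single cluster
    for c in clusters:
        for label, n in Counter(gold[i] for i in c).items():
            best[label] = max(best[label], n)
    return sum(best.values()) / total
```

Reporting both guards against trivial solutions: putting every message in its own cluster gives perfect purity but poor inverse purity.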

2. Text Span Identification

Define a set of rules that allows the extraction of macro-level argumentation

With annotated text, you can use machine learning

With non-annotated text, you can define rules: is there something specific in the language that indicates a claim or counter-claim?

Claim

Counter Claim

Rules production

Method:
• Rules are a generalisation from a large amount of data (14,000 quotes)
• Use words / POS / negation / symbols
• Use the rules to find these patterns where they are not explicitly mentioned in the text

Examples:
– Before:
  • @USERNAME:
– After:
  • i don't
  • i think you
  • PRP VBP RB (personal pronoun, verb non-3rd person singular present, adverb)
– Both:
  • START X i 'm not

Tools: LT-TTT2 (www.ltg.ed.ac.uk/software/)
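A sketch of how such rules might be applied, assuming tweets have already been tokenised and tagged with Penn Treebank POS tags. The cue lists are transcribed from the examples above; which label each cue should trigger, and how the `X` slot in the "Both" pattern is handled, are not specified on the slide, so this hypothetical `match_cues` function only reports which cues fire.

```python
import re

# Hypothetical cue patterns transcribed from the slide examples.
LEXICAL_CUES = [("i", "don't"), ("i", "think", "you"), ("i", "'m", "not")]
POS_CUES = [("PRP", "VBP", "RB")]
REPLY_PREFIX = re.compile(r"^@\w+:")  # "@USERNAME:" at the start of a tweet


def find_subsequence(seq: list[str], pattern: tuple[str, ...]) -> int:
    """Index of the first occurrence of pattern in seq, or -1."""
    n = len(pattern)
    for i in range(len(seq) - n + 1):
        if tuple(seq[i:i + n]) == pattern:
            return i
    return -1


def match_cues(raw_text: str, tagged: list[tuple[str, str]]) -> list[str]:
    """Return the names of the cues that fire for one tweet.

    raw_text is the original tweet; tagged is its (token, POS tag) sequence.
    """
    words = [w.lower() for w, _ in tagged]
    tags = [t for _, t in tagged]
    fired = []
    if REPLY_PREFIX.match(raw_text):
        fired.append("reply-prefix")
    for cue in LEXICAL_CUES:
        if find_subsequence(words, cue) >= 0:
            fired.append("lex:" + " ".join(cue))
    for cue in POS_CUES:
        if find_subsequence(tags, cue) >= 0:
            fired.append("pos:" + " ".join(cue))
    return fired
```

A Penn Treebank-style tokeniser splits "don't" into "do" + "n't", so the lexical cue as written only fires on unsplit tokens; the PRP VBP RB tag pattern still matches "I do n't".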

3. Classify into a structure

Method

Based on Rosé et al. (2008)

Use supervised machine learning to classify tweets into an argument structure

Using the TagHelper toolkit (based on Weka), www.cs.cmu.edu/~cprose/TagHelper.html, and LightSide, lightsidelabs.com:
– Decide on a machine learning algorithm
– Define feature sets
– Train and test
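The "train and test" step is often done with k-fold cross-validation when the annotated corpus is small. A stdlib sketch of the splitting, assuming k = 10 by default and a fixed random seed; the function name is hypothetical and the slides do not say which evaluation protocol was used.

```python
import random


def k_fold_splits(n_items: int, k: int = 10, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Shuffle item indices once, deal them into k folds, and return one
    (train, test) pair per fold; every item is tested exactly once."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # fixed seed for repeatable runs
    folds = [indices[i::k] for i in range(k)]  # round-robin dealing keeps fold sizes within 1
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits
```

Stratifying the folds by label would be a sensible refinement when some of the coding categories are rare.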

Data Set: Tweets

Coded with the classification system:

1. Claim without evidence
2. Claim with evidence
3. Counter-claim without evidence
4. Counter-claim with evidence
5. Implicit request for verification
6. Explicit request for verification
7. Comment
8. Other

Classification – Feature Selection

Features:
• Unigrams
• + line length
• + POS bigrams
• + bigrams
• + punctuation
• + stemming
• + no stemming
• + rare words
• + line length, punctuation and rare words
• + no stop list

Algorithms:
• Support Vector Machine
• Decision Tree
• Naive Bayes
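To make the feature-set and algorithm comparison concrete, here is a small stdlib sketch of one configuration: unigrams plus optional bigrams, punctuation, a line-length bucket and a stop list, fed to a multinomial Naive Bayes classifier with add-one smoothing. The tokeniser, the tiny stop list, the punctuation set and the 40-character length buckets are illustrative assumptions; stemming, POS bigrams and rare-word features are omitted. The slides name TagHelper and LightSide as the tools actually used, so this is not the study's implementation.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop list; a real run would use a standard list.
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to"}


def features(text: str, bigrams: bool = True, punctuation: bool = True,
             line_length: bool = True, stop_list: bool = True) -> Counter:
    """Feature counts for one tweet; each flag toggles one feature-set variant."""
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    kept = [t for t in tokens if not (stop_list and t in STOP_WORDS)]
    feats = Counter("uni=" + t for t in kept)
    if bigrams:  # bigrams over the unfiltered token sequence
        feats.update("bi=" + a + "_" + b for a, b in zip(tokens, tokens[1:]))
    if punctuation:
        feats.update("punct=" + ch for ch in text if ch in "?!\"")
    if line_length:  # character length in 40-character buckets, capped at 3
        feats["len=" + str(min(len(text) // 40, 3))] += 1
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[Counter], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.feature_counts: dict[str, Counter] = defaultdict(Counter)
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for feats in docs for f in feats}
        self.totals = {c: sum(fc.values()) for c, fc in self.feature_counts.items()}
        return self

    def log_score(self, feats: Counter, label: str) -> float:
        """log P(label) + sum of count * log P(feature | label) over known features."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.totals[label] + len(self.vocab)
        for f, count in feats.items():
            if f in self.vocab:  # features never seen in training are skipped
                score += count * math.log((self.feature_counts[label][f] + 1) / denom)
        return score

    def predict(self, feats: Counter) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(feats, label))
```

Swapping `NaiveBayes` for an SVM or decision tree, or toggling the `features` flags, corresponds to the algorithm and feature-set comparisons listed on the slide.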

QUESTIONS?

Clare Llewellyn University of Edinburgh

[email protected]
