
Page 1: Part I: A pipeline of natural language processing (NLP) techniques

Part I: A pipeline of natural language processing (NLP) techniques applied to argument mining

Katarzyna (Kasia) Budzynska¹ & Serena Villata²

¹ Centre for Argument Technology (ARG-tech)

Institute of Philosophy & Sociology, Polish Academy of Sciences & University of Dundee, UK

² Université Côte d'Azur, CNRS, Inria, I3S, France

Page 2: Part I: A pipeline of natural language processing (NLP) techniques

Who are we?

Kasia Budzynska: ARG-tech (Dundee & Warsaw), Philosophy & Linguistics, www.argdiap.pl/budzynska, http://arg.tech

Serena Villata: CNRS (Sophia), Computer Science, www.i3s.unice.fr/~villata/

Page 3: Part I: A pipeline of natural language processing (NLP) techniques

Where are we?

● Computational linguistics - statistical or rule-based modelling of natural language

● NLP - interactions between computers and human languages

● Data mining – automatic extraction of knowledge from data (often numerical)

● Text mining – automatic extraction of data from natural language texts

● Argument mining (ArgMin) - automatic extraction of arguments from natural language texts

Page 4: Part I: A pipeline of natural language processing (NLP) techniques

Why do it?

● We have a problem: a good thing, i.e. lots of information, creates the problem of too much information...

● The Big Data problem

● How much is lots (big)?

Page 5: Part I: A pipeline of natural language processing (NLP) techniques

Big Data

● We have a problem: a good thing, i.e. lots of information, creates the problem of too much information...

● How much is lots (big)? When data is produced faster than we can process it

Page 6: Part I: A pipeline of natural language processing (NLP) techniques

3Vs, 2001

● Volume (amount of data): big data doesn't sample; it just observes and tracks what happens

● Velocity (speed of data processing): big data is often available in real-time

● Variety (number of types of data): big data draws from text, images, audio, video

Page 7: Part I: A pipeline of natural language processing (NLP) techniques

Big Data and industry

● eBay.com uses two data warehouses at 7.5 petabytes and 40PB, as well as a 40PB Hadoop cluster for search, consumer recommendations, and merchandising (see “Inside eBay’s 90PB data warehouse”)

● Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 they had the world’s three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB

● Facebook handles 50 billion photos from its user base

● As of August 2012, Google was handling roughly 100 billion searches per month

● Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data – the equivalent of 167 times the information contained in all the books in the US Library of Congress

Page 8: Part I: A pipeline of natural language processing (NLP) techniques

Big Data and academia

● “Big Data creates numerous opportunities for driving research and development, increasing productivity and transforming public and private sector organisations. RCUK (Research Councils in UK) is capitalising on this with their innovative investment ‘Big Data and Energy Efficient Computing’. This recognises the enormous potential of Big Data for the UK economy with a total sum of £189 million to invest in its exploitation” (Research Councils on Big Data, http://www.rcuk.ac.uk/research/infrastructure/big-data/)

● “Worldwide Big Data technology and services are expected to grow from 2.4 billion EUR in 2010 to 12.7 billion EUR in 2015. The challenge is to strengthen Europe’s position as provider of products and services based on digital content and data.” (Council Decision Establishing the Specific Programme Implementing HORIZON 2020)

Page 9: Part I: A pipeline of natural language processing (NLP) techniques

How to process Big Data?

By mining information:

input → NLP techniques (e.g. machine learning algorithms) → output

Page 10: Part I: A pipeline of natural language processing (NLP) techniques

What can we mine?

Sentiment analysis

● mining attitudes towards something (positive, neutral, negative)

● e.g. the number of people liking the new Mercedes vs. the number of people not liking the new Mercedes

● application: stock market

Opinion mining

● mining opinions about something

● e.g. people thinking the new Mercedes is too expensive, people thinking the new Mercedes is reliable

● application: media analysis

Page 11: Part I: A pipeline of natural language processing (NLP) techniques

What can we mine?

Argument mining

● mining arguments pro- and con- people's opinions

● e.g. not only the information that people like the Mercedes but also why: people like the new Mercedes, because they think it is reliable

Page 12: Part I: A pipeline of natural language processing (NLP) techniques

When did ArgMin start?

● Before 2014: fewer than 10 papers

● After 2014:

1) From Real Data to Argument Mining, 12th ArgDiaP Conference, Polish Academy of Sciences, Warsaw, Poland, 23-24 May 2014 (22 papers)

2) 1st Workshop on Argumentation Mining, 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, US, June 26, 2014 (18 papers)

3) SICSA Workshop on Argument Mining: Perspectives from Information Extraction, Information Retrieval and Computational Linguistics, ARG-tech, Dundee, Scotland, 9-10 July 2014 (ca. 25 participants)

4) Frontiers and Connections between Argumentation Theory and Natural Language Processing, Bertinoro (Forlì-Cesena), Italy, 21-25 July 2014

5) 2nd Workshop on Argumentation Mining, 53rd Annual Meeting of the Association for Computational Linguistics, Denver, US, June 04, 2015 (16 papers)

6) Arguments in Natural Language: The Long Way to Analyze the Reasons to Believe and the Reasons to Act, 1st European Conference on Argumentation, Lisbon, 9-12 July 2015

7) Natural Language Argumentation: Mining, Processing, and Reasoning over Textual Arguments, Dagstuhl Seminar, April 17-22, 2016

8) 3rd Workshop on Argumentation Mining, 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Aug 12, 2016 (20 papers)

Page 13: Part I: A pipeline of natural language processing (NLP) techniques

What is an argument?

● Pro-argument (argumentation, Default Inference) - holds between two propositions when one proposition provides a reason to accept another proposition. In other words, for a given claim p, a supporting claim q can be used to reply to the question “Why p?” (“Because q”).

● Con-argument (attack, Default Conflict) - holds between two propositions which cannot both be true at the same time. Speakers use conflicting propositions to attack another speaker's claims by means of providing counter-claims.

Page 14: Part I: A pipeline of natural language processing (NLP) techniques

What is an argument?

Page 15: Part I: A pipeline of natural language processing (NLP) techniques

Mining arguments pro- & con-

IBM's Watson Debater:

● Watson, IBM's supercomputer made famous three years ago for beating the very best human opponents at a game of Jeopardy

● now comes with an impressive new feature: when asked to discuss any topic, it can autonomously scan its knowledge database for relevant content, "understand" the data, and argue both for and against that topic

http://arg.tech/ibmdebater

Page 16: Part I: A pipeline of natural language processing (NLP) techniques

NLP pipeline

NL texts (lots & lots)

Page 17: Part I: A pipeline of natural language processing (NLP) techniques

Texts in natural language

● legal texts (Palau and Moens 2009)

● online comments (Park and Cardie 2014; Villalba and Saint-Dizier 2012)

● philosophical texts (Lawrence et al. 2014)

● technical texts (Saint-Dizier 2014)

● scientific papers (Teufel et al. 2009)

● moral debate in radio (Budzynska et al. 2014)

Page 18: Part I: A pipeline of natural language processing (NLP) techniques

Texts in natural language: Moral Maze, BBC4 Radio Program

Michael Buerk: Michael Portillo?

Michael Portillo: I suppose it’s difficult for savers to take the high moral ground, because they are lenders. And if they're lenders, that implies there are borrowers.

Simon Rose: Oh yes, of course. I mean there should be both savers and borrowers, naturally. I mean what savers are doing, by delaying consumption, is providing the capital that one hopes will go to create growth in the economy.

Michael Portillo: But I wonder if it’s, as it were, intellectually honest to kind of play out the virtues of saving, as opposed to borrowing, when really, unless the two kind of balance out in an economy, there’s no point saving. If somebody’s not willing to reward you by borrowing your savings, there’s no point doing the saving.

Page 19: Part I: A pipeline of natural language processing (NLP) techniques

NLP pipeline

NL texts (lots & lots) → annotation scheme (theory)

Page 20: Part I: A pipeline of natural language processing (NLP) techniques

Theories & Annotation schemes

Theory: Basic argument structure

Scheme (Palau & Moens 2009):

● Premise
● Conclusion
● Argument

Page 21: Part I: A pipeline of natural language processing (NLP) techniques

Theories & Annotation schemes

Theory: Basic argument structure

Scheme (Palau & Moens 2009):

● Premise
● Conclusion
● Argument

Theory: (Swales 1990)

Scheme (Teufel et al. 2009):

● Aim
● Textual
● Own
● Background
● Contrast
● Basis
● Other

Page 22: Part I: A pipeline of natural language processing (NLP) techniques

Theories & Annotation schemes

Theory: Basic argument structure

Scheme (Palau & Moens 2009):

● Premise
● Conclusion
● Argument

Theory: (Swales 1990)

Scheme (Teufel et al. 2009):

● Aim
● Textual
● Own
● Background
● Contrast
● Basis
● Other

Theory: Inference Anchoring Theory, IAT (Budzynska & Reed 2011)

Scheme (Budzynska et al. 2014b):

● Inference
● Conflict
● Rephrasing
● Transition
● Illocutionary connections
– Assertion
– Pure question
– Assertive question
– Rhetorical question
– Pure challenge
– Assertive challenge
– Rhetorical challenge
– Popular concession
– Agreement
– Disagreement
– Argumentation

Page 23: Part I: A pipeline of natural language processing (NLP) techniques

NLP pipeline

NL texts (lots & lots) → annotation scheme (theory)

Page 24: Part I: A pipeline of natural language processing (NLP) techniques

Manual annotation: xml

Michael Buerk: Michael Portillo?

Michael Portillo: I suppose <argumentation><assertion>it’s difficult for savers to take the high moral ground</assertion>, because <assertion>they are lenders</assertion></argumentation>.

Simon Rose: <rephrasing><agreement>Oh yes, of course</agreement>. I mean there should be both savers and borrowers, naturally</rephrasing>.

Page 25: Part I: A pipeline of natural language processing (NLP) techniques

Manual annotation: OVA+ (http://ova.arg-tech.org/)

Page 26: Part I: A pipeline of natural language processing (NLP) techniques

Corpus: MM2012a (http://corpora.aifdb.org/)

Page 27: Part I: A pipeline of natural language processing (NLP) techniques

Argument corpora

● MM2012a (Budzynska et al. 2014b; http://corpora.aifdb.org/mm2012a)

– 4 transcripts of the BBC 4 radio program Moral Maze
– Analysed in OVA+
– 58,000 words

● 1502 asserting
● 297 questioning or challenging
● 581 arguing
● 204 disagreeing
● 102 agreeing

● AraucariaDB corpus (Reed et al. 2008; http://corpora.aifdb.org/araucaria)

– Newspaper editorials; Parliamentary Records; Judicial summaries; Discussion boards

– Analysed in Araucaria

– About 5 items from each of the 20 sources over a period of several weeks

– 700 analyses; almost 4,000 atomic propositions; 1,500 reconstructed premises; 80,000 words

Page 28: Part I: A pipeline of natural language processing (NLP) techniques

NLP pipeline

NL texts (lots & lots) → annotation scheme (theory) → kappa

Page 29: Part I: A pipeline of natural language processing (NLP) techniques

Inter-annotator agreement

Annotator 1 vs Annotator 2

Reasonably good agreement: > 70%

Page 30: Part I: A pipeline of natural language processing (NLP) techniques

Results evaluation

● Agreement - simple proportion of matches (the percentage of matches)

● Cohen's kappa - the agreement between two annotators who each classify N items into C mutually exclusive categories (Cohen, 1960); a code sketch follows below:

– kappa = (po – pe) / (1 – pe)

– po is the relative observed agreement among annotators

– pe is the probability of chance agreement (the probabilities of each annotator randomly saying each category)

– 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1 almost perfect agreement (Landis and Koch, 1977)
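To make the formula concrete, here is a minimal Python sketch of Cohen's kappa; the function name and the toy premise/conclusion labels are invented for illustration:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Cohen's kappa for two annotators labelling the same N items."""
    n = len(ann1)
    # po: relative observed agreement among the annotators
    p_o = sum(a == b for a, b in zip(ann1, ann2)) / n
    # pe: chance agreement, from each annotator's category frequencies
    c1, c2 = Counter(ann1), Counter(ann2)
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))
    return (p_o - p_e) / (1 - p_e)

# Toy example: two annotators labelling six text segments.
a1 = ["premise", "premise", "conclusion", "premise", "none", "conclusion"]
a2 = ["premise", "conclusion", "conclusion", "premise", "none", "conclusion"]
print(round(cohens_kappa(a1, a2), 2))  # 0.74 - 'substantial' on the Landis & Koch scale
```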

Page 31: Part I: A pipeline of natural language processing (NLP) techniques

Inter-annotator agreements

● eRulemaking corpus (Park & Cardie 2014) – 2 annotators

– 0.73

● Corpus MM2012a (Budzynska et al. 2014b) – 2 annotators

– Segmentation: 0.81

– Illocutionary forces: 0.90

– Conflict relations: 0.80

– Inference relations: 0.75

– Transitions: 0.77

– Indexical illocutionary forces: 0.78

Page 32: Part I: A pipeline of natural language processing (NLP) techniques

NLP pipeline

NL texts (lots & lots) → annotation scheme (theory) → kappa → grammars + classifiers

Page 33: Part I: A pipeline of natural language processing (NLP) techniques

NLP techniques (text mining)

Choose the approach:

● Structural approach - grammars (i.e. a hand-coded set of rules; see the sketch below)

● Statistical approach – machine learning (i.e. general learning algorithms)

– Supervised ML

– Unsupervised ML

Look at existing similar work:

● Sentiment analysis

● Opinion mining
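For the structural approach, here is a minimal rule-based sketch in Python; the two indicator lists are invented stand-ins for a real hand-coded grammar, which would be far larger:

```python
import re

# Hypothetical hand-coded rules: a tiny subset of discourse indicators
# that often signal premises or conclusions.
PREMISE_MARKERS = re.compile(r"\b(because|since|given that|as shown by)\b", re.I)
CONCLUSION_MARKERS = re.compile(r"\b(therefore|thus|consequently|hence)\b", re.I)

def classify_sentence(sentence):
    """Structural (rule-based) tagging of a single sentence."""
    if CONCLUSION_MARKERS.search(sentence):
        return "conclusion"
    if PREMISE_MARKERS.search(sentence):
        return "premise"
    return "none"

print(classify_sentence("People like the new Mercedes because it is reliable."))  # premise
```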

Page 34: Part I: A pipeline of natural language processing (NLP) techniques

Statistical NLP techniques (Moens et al. 2007)

Features (a few are sketched in code below):

● Unigrams – each word in the sentence.
● Bigrams – each pair of successive words.
● Trigrams – each three successive words.
● Adverbs – detected with a part-of-speech (POS) tagger (e.g. QTag).
● Verbs – detected with a POS tagger. Only the main verbs (excluding “to be”, “to do” and “to have”) are considered.
● Modal auxiliary – indicates if a modal auxiliary is present, using a POS tagger.
● Word couples – all possible combinations of two words in the sentence are considered.
● Text statistics – sentence length, average word length and number of punctuation marks.
● Punctuation – the sequence of punctuation marks present in the sentence is used as a feature (e.g. “:.”). When a punctuation mark occurs more than once in a row, it is considered the same pattern (e.g. two or more successive commas both result in “,+”).
● Key words – keywords refer to 286 words or word sequences obtained from a list of terms indicative of argumentation. Examples from the list are “but”, “consequently”, “because of”.
● Parse features – in the parse tree of each sentence we used the depth of the tree and the number of subclauses as features.

Classifiers:

– Naive Bayes classifier
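A minimal sketch of how a few of these feature types might be computed; the function, the feature names and the three keywords (standing in for the full 286-term list) are illustrative, not the original Moens et al. resources:

```python
import re
from itertools import combinations

# Three stand-ins for the 286-term indicator list described above.
ARG_KEYWORDS = ["but", "consequently", "because of"]

def extract_features(sentence):
    """Unigram, bigram, word-couple, statistics, punctuation and
    keyword features for a single sentence."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    words = [t for t in tokens if t.isalnum()]
    feats = {}
    for w in words:                                   # unigrams
        feats["uni=" + w] = True
    for a, b in zip(words, words[1:]):                # bigrams
        feats["bi=" + a + "_" + b] = True
    for a, b in combinations(sorted(set(words)), 2):  # word couples
        feats["couple=" + a + "|" + b] = True
    feats["len"] = len(words)                         # text statistics
    feats["avg_word_len"] = sum(map(len, words)) / max(len(words), 1)
    punct = "".join(t for t in tokens if not t.isalnum())
    feats["punct"] = re.sub(r"(.)\1+", r"\1+", punct)  # ",," -> ",+"
    for kw in ARG_KEYWORDS:                           # keyword indicators
        feats["kw=" + kw] = kw in sentence.lower()
    return feats

feats = extract_features("But savers are lenders, and that implies there are borrowers.")
print(feats["len"], feats["kw=but"])  # 10 True
```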

Page 35: Part I: A pipeline of natural language processing (NLP) techniques

Statistical NLP techniques

TRAINING PHASE: manually annotated NL text (input) → find features → classifiers

TESTING PHASE: unannotated NL text (input) → classifiers (learned features) → automatically annotated NL text (output)
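The two phases in code: a minimal sketch using scikit-learn's CountVectorizer and MultinomialNB rather than the original system; the four "manually annotated" training sentences are invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# TRAINING PHASE: a toy, invented 'manually annotated' corpus.
train_sentences = [
    "People like the new Mercedes because it is reliable.",
    "Therefore savers cannot take the high moral ground.",
    "The programme was broadcast on Wednesday evening.",
    "Since they are lenders, there must be borrowers.",
]
train_labels = ["argument", "argument", "none", "argument"]

# Unigram + bigram counts feed a Naive Bayes classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(train_sentences, train_labels)

# TESTING PHASE: unannotated text in, automatic annotation out.
print(model.predict(["Savers provide capital, because they delay consumption."]))
```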

Page 36: Part I: A pipeline of natural language processing (NLP) techniques

NLP pipeline

NL texts (lots & lots) → annotation scheme (theory) → kappa → grammars + classifiers → automatically annotated arguments

Page 37: Part I: A pipeline of natural language processing (NLP) techniques

Automatically annotated data

<utterance speaker="lj" illoc="standard assertion"><textunit nb="215">it was a ghastly aberration</textunit></utterance>

<utterance speaker="cl" illoc="RQ"><textunit nb="216">or was it in fact typical?</textunit></utterance>

<utterance speaker="cl" illoc="RQ-AQ"><textunit nb="217">was it the product of a policy that was unsustainable that could only be pursued by increasing repression?</textunit></utterance>
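Output in this form is straightforward to consume downstream. A minimal sketch with Python's standard xml.etree.ElementTree, assuming the utterances are wrapped in a hypothetical <dialogue> root so the fragment is well-formed:

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment in the style of the output above.
xml_snippet = """<dialogue>
  <utterance speaker="lj" illoc="standard assertion">
    <textunit nb="215">it was a ghastly aberration</textunit>
  </utterance>
  <utterance speaker="cl" illoc="RQ">
    <textunit nb="216">or was it in fact typical?</textunit>
  </utterance>
</dialogue>"""

root = ET.fromstring(xml_snippet)
for utt in root.iter("utterance"):
    textunit = utt.find("textunit")
    print(utt.get("speaker"), utt.get("illoc"), "->", textunit.text.strip())
```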

Page 38: Part I: A pipeline of natural language processing (NLP) techniques

NLP pipeline

NL texts (lots & lots) → annotation scheme (theory) → kappa → grammars + classifiers → automatically annotated arguments → precision & recall (automatic annotation cf. manual annotation)

Page 39: Part I: A pipeline of natural language processing (NLP) techniques

Results evaluation

● Accuracy - simple proportion of matches (the percentage of matches between manual vs. automatic annotation)

– Manual annotation = gold standard

Page 40: Part I: A pipeline of natural language processing (NLP) techniques

Results evaluation

● Recall - penalises the arguments the program 'missed out' (false negatives):

– R = tp/(tp+fn)

● Precision - penalises the 'ghost arguments' the program found (false positives):

– P = tp/(tp+fp)

Manual annotation | Automatic annotation | Interpretation | Precision     | Recall
argument          | argument             | TP             |               |
–                 | –                    | TN             |               |
argument          | –                    | FN             |               | 1/(1+1) = .5
–                 | argument             | FP             | 1/(1+1) = .5  | 1/(1+1) = .5
argument          | –                    | FN             | .5            | 1/(1+2) = .33
–                 | argument             | FP             | 1/(1+2) = .33 | .33

(Precision and recall are recomputed cumulatively as each annotation is added.)

Page 41: Part I: A pipeline of natural language processing (NLP) techniques

Results evaluation

● Recall - penalises the arguments the program 'missed out' (false negatives):

– R = tp/(tp+fn)

● Precision - penalises the 'ghost arguments' the program found (false positives):

– P = tp/(tp+fp)

● F1 score (F-score, F-measure) – the harmonic mean of precision & recall (computed in the sketch below):

– F1 = 2PR/(P+R)
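A minimal sketch tying the three measures together, applied to the counts from the worked table on the previous slide (1 TP, 2 FP, 2 FN):

```python
def precision_recall_f1(tp, fp, fn):
    """Evaluate automatic annotation against the manual gold standard."""
    p = tp / (tp + fp)   # precision: penalises 'ghost' arguments
    r = tp / (tp + fn)   # recall: penalises missed arguments
    f1 = 2 * p * r / (p + r)
    return p, r, f1

# Counts from the six-row worked example: 1 TP, 2 FP, 2 FN.
p, r, f1 = precision_recall_f1(tp=1, fp=2, fn=2)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.33 0.33 0.33
```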

Page 42: Part I: A pipeline of natural language processing (NLP) techniques

Results evaluation

● ECHR corpus (Palau & Moens 2009), f-score:

– 0.68 – classification of premise

– 0.74 – classification of conclusion

– 0.60 – determination of argument structure

● eRulemaking corpus (Park & Cardie 2014)

– For UnVerif vs All – Pre: 86.86, Rec: 83.05, f-score: 84.91

– For VERIFnon vs All – Pre: 49.88, Rec: 55.14, f-score: 52.37

● Argument Schemes corpus (Lawrence and Reed 2015)

– Discourse Indicators – Pre: 1.00, Rec: 0.08, f-score: 0.15

– Topic Similarity – Pre: 0.70, Rec: 0.54, f-score: 0.61

– Schematic Structure – Pre: 0.82, Rec: 0.69, f-score: 0.75

– Combined Methods – Pre: 0.91, Rec: 0.77, f-score: 0.83

Page 43: Part I: A pipeline of natural language processing (NLP) techniques

NLP pipeline

NL texts (lots & lots) → annotation scheme (theory) → kappa → grammars + classifiers → automatically annotated arguments → precision & recall (real arguments cf. automatically extracted arguments)

Page 44: Part I: A pipeline of natural language processing (NLP) techniques

Summary

● Text mining is a response to the Big Data problem

● Argument mining is the natural continuation of other areas of text mining which have proved to be highly valuable commercially

● The NLP pipeline consists of two general steps:

– linguistic step: development of large corpora with manually annotated argument structures (evaluated by inter-annotator agreement)

– computational step: development of either grammars (structural approach) or classifiers (statistical approach) to automatically annotate arguments (evaluated by accuracy or F1-score)

Page 45: Part I: A pipeline of natural language processing (NLP) techniques

References

● F. Boltuzic and J. Snajder. 2014. Back up your Stance: Recognizing Arguments in Online Discussions. Proc. of the First Workshop on Argumentation Mining. ACL, Baltimore, Maryland, 49–58.

● K. Budzynska, M. Janier, J. Kang, C. Reed, P. Saint-Dizier, M. Stede, and O. Yaskorska. 2014a. Towards Argument Mining from Dialogue. Proc. of the Fifth International Conference on Computational Models of Argument (COMMA 2014). 185–196.

● K. Budzynska, M. Janier, C. Reed, P. Saint-Dizier, M. Stede, and O. Yaskorska. 2014b. A Model for Processing Illocutionary Structures and Argumentation in Debates. Proc. of the 9th Language Resources and Evaluation Conference (LREC). 917–924.

● E. Cabrio and S. Villata. 2012. Generating Abstract Arguments: a Natural Language Approach. Proc. of the Fourth International Conference on Computational Models of Argument (COMMA 2012). IOS Press, 454–461.

● V. Wei Feng and G. Hirst. 2011. Classifying arguments by scheme. Proc. of the 49th Annual Meeting of ACL: Human Language Technologies - Volume 1. Association for Computational Linguistics, 987–996.

● J. Lawrence and C. Reed. 2015. Combining Argument Mining Techniques. Proc. of the 2nd Workshop on Argumentation Mining. Association for Computational Linguistics, Denver, CO, 127–136.

● J. Lawrence, C. Reed, C. Allen, S. McAlister, and A. Ravenscroft. 2014. Mining Arguments from 19th Century Philosophical Texts Using Topic Based Modelling. Proc. of the First Workshop on Argumentation Mining. ACL, 79–87.

● M.F. Moens, E. Boiy, R. M. Palau, and C. Reed. 2007. Automatic detection of arguments in legal texts. Proc. of the 11th International Conference on Artificial Intelligence and Law. ACM, 225–230.

● H. Nguyen and D. Litman. 2015. Extracting Argument and Domain Words for Identifying Argument Components in Texts. Proc. of the 2nd Workshop on Argumentation Mining. ACL, 22–28.

● N. Ong, D. Litman, and A. Brusilovsky. 2014. Ontology-Based Argument Mining and Automatic Essay Scoring. Proc. of the First Workshop on Argumentation Mining. ACL, 24–28.

● R. Palau and M.F. Moens. 2009. Argumentation mining: the detection, classification and structure of arguments in text. Proc. of the 12th International Conference on Artificial Intelligence and Law. ACM, 98–107.

● J. Park and C. Cardie. 2014. Identifying Appropriate Support for Propositions in Online User Comments. Proc. of the ACL Workshop on Argumentation Mining.

● A. Peldszus. 2014. Towards segment-based recognition of argumentation structure in short texts. Proc. of the First Workshop on Argumentation Mining. Association for Computational Linguistics, Baltimore, Maryland, 88–97.

● A. Peldszus and M. Stede. 2013a. From argument diagrams to argumentation mining in texts: a survey. International Journal of Cognitive Informatics and Natural Intelligence (IJCINI) 7, 1 (2013), 1–31.

● C. Reed, R. Palau, G. Rowe, and M.F. Moens. 2008. Language resources for studying argument. Proc. of the 6th Language Resources and Evaluation Conference (LREC), 91–100.

● P. Saint-Dizier. 2014. Challenges of Discourse Processing: the Case of Technical Documents. Cambridge SP.

● C. Stab and I. Gurevych. 2014. Identifying Argumentative Discourse Structures in Persuasive Essays. Proc. of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, Doha, Qatar, 46–56.

● M. Villalba and P. Saint-Dizier. 2012. Some Facets of Argument Mining for Opinion Analysis. Proc. of the Fourth International Conference on Computational Models of Argument (COMMA 2012). 23–34.

● A. Wyner, W. Peters, and D. Price. 2015. Argument Discovery and Extraction with the Argument Workbench. Proc. of the 2nd Workshop on Argumentation Mining. ACL, Denver, CO, 78–83.