claimskg: a knowledge graph and a model for controversial ... · models are still strongly...

45
Konstantin Todorov Katarina Boland, Stefan Dietze, Pavlos Fafalios, Andon Tchechmedjiev ClaimsKG: A Knowledge Graph and a Model for Controversial Claims Nov. 22, 2019 - Montpellier

Upload: others

Post on 24-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Konstantin Todorov

Katarina Boland, Stefan Dietze, Pavlos Fafalios, Andon Tchechmedjiev

ClaimsKG: A Knowledge Graph and a Model for Controversial Claims

Nov. 22, 2019 - Montpellier

Page 2: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Beyond Facts: Claims and Discourse on the Web

2

Page 3: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Beyond Facts: Claims and Discourse on the Web

3

Page 4: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

A map to this talk

4

Page 5: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: what is a claim?

5

Page 6: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: what is a claim?

6

statement or assertion that something is the case, typically without providing evidence or proof

Page 7: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: what is a claim?

7

statement or assertion that something is the case, typically without providing evidence or proof

statement supported by (a group of) people or organizations that appear newsworthy, significant and verifiable

Page 8: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: what is a claim?

8

statement or assertion that something is the case, typically without providing evidence or proof

statement supported by (a group of) people or organizations that appear newsworthy, significant and verifiable

the assertion the argumentaims to proveor the thesisto be justified

AM

Page 9: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: what is a claim?

statement or assertion that something is the case, typically without providing evidence or proof

statement supported by (a group of) people or organizations that appear newsworthy, significant and verifiable

the assertion the argumentaims to proveor the thesisto be justified

AM

a proposition, an idea which is eithertrue or false, put forward by somebody as true

Page 10: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: what is a claim?

10

statement or assertion that something is the case, typically without providing evidence or proof

statement supported by (a group of) people or organizations that appear newsworthy, significant and verifiable

the assertion the argumentaims to proveor the thesisto be justified

AM

a proposition, an idea which is eithertrue or false, put forward by somebody as true

general, concise statement that directly supports or contests the topic

Page 11: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: what is a claim?

11

statement or assertion that something is the case, typically without providing evidence or proof

statement supported by (a group of) people or organizations that appear newsworthy, significant and verifiable

the assertion the argumentaims to proveor the thesisto be justified

AM

a proposition, an idea which is eithertrue or false, put forward by somebody as true

general, concise statement that directly supports or contests the topic

statement formulating a problem together with a concrete solution

QA

Page 12: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: what is a claim?

12

statement or assertion that something is the case, typically without providing evidence or proof

statement supported by (a group of) people or organizations that appear newsworthy, significant and verifiable

the assertion the argumentaims to proveor the thesisto be justified

AM

a proposition, an idea which is eithertrue or false, put forward by somebody as true

general, concise statement that directly supports or contests the topic

statement formulating a problem together with a concrete solution

QA

parametrized query over a database

DB

Page 13: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: examplesAre these two the same?

- Mitt Romney says ‘Barack Obama could have gotten crippling sanctions against Iran. He did not.’

- Paul Ryan says ‘the Obama administration ‘watered down sanctions’ against Iran.’

And those two?- “Climate change is a hoax invented by the chinese.”, D. Trump- “Climate change is a hoax invented by the chinese.”, B. Obama

What is being assessed here?- Colin Kaepernick says Winston Churchill said, "A lie gets halfway around the

world before the truth has a chance to get its pants on."—> the claim is true, while the claim in the claim is false. 13

Page 14: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation: examplesWhat here is the claim to extract? (Example from evidence detection studies)

- Topic: Use of performance enhancing drugs (PEDs) in professional sportsClaim (conclusive part): PEDs can be harmful to athletes healthEvidence: A 2006 study examined 320 athletes for psychiatric side effects induced by anabolic steroid use. The study found a higher incidence of mood disorders in these athletes compared to a control group.

Annotating claims - what is the event to annotate here?- “The UN summit did not allow to make much progress on the topic of environment”,

said E. Macron, at a press conference at the Elysée.

14

Page 15: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation

A claim is inherently complex, where its interpretation usually is strongly dependent on its context, such as its source, timing, or location.

A claim carries a variety of intentional or unintended meanings: subtle changes in the wording or context can have significant effects on its validity

The used terminology and the underlying conceptual understandings / models are still strongly diverging across communities

Need clear definitions and structured knowledge about claims, their context and constituents to enable machine-interpretation, discoverability and reuse

15

Page 16: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Conceptual ModelThree main components:

1. proposition2. utterance3. context

16

Page 17: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Conceptual Model - Claim Propositionthe meaning of one or more semantically equivalent claim utterances expressed in different linguistic forms or contexts

17

Page 18: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Conceptual Model - Claim Utteranceact of expressing a claim proposition in a specific natural language and form

18

Page 19: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Conceptual Model - Claim Contextbackground information about the utterance needed to understand the proposition

19

● who● where● when● why

Page 20: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Conceptual Model - Instantiation Example

20

Page 21: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

RDF Implementation

21

i) relying on stable term identifiers and persistent hosting, ii) supported by a community, iii) extensible

Page 22: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

In practice - ClaimsKG: a Knowledge Graph of Fact-checked Claims

22

A. Tchechmedjiev, K. Boland, P. Fafalios, S. Dietze, K. Todorov et al., ClaimsKG – A Knowledge Graph of fact-checked Claims, ISWC2019

A. Tchechmedjiev, K. Boland, P. Fafalios, S. Dietze, K. Todorov et al., Exploring Fact-checked Claims and their Descriptive Statistics, ISWC2019 - Satellites

https://data.gesis.org/claimskg/site/

Page 23: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation

23

Page 24: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Motivation

24

Page 25: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Examples of Dedicated Resources

25

Web sources Crowdsourced / manual annotation

The “Liar” benchmark - 12.8K claims from Politifact

Clef “Check that!” challenge- 150 claims from Factcheck.org

The “Emergent” dataset- 300 claims from various

fact-checking portals

Limited in size, static, not regenerable, lack metadata/context, no shared data model

The Open Domain Deception Dataset

- 7.2K claims from 512 unique contributors

The SemEval challenge- 5.5K claims

The FEVER dataset- >185K claims

Page 26: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

ClaimsKG: A KG of Fact-checked Claims…and more

A dataset- Openly available dynamic KG of claims and associated metadata

A model- A schema for fact-checked claims based on established vocabularies

Tools for construction- An open source pipeline for crawling, extracting and lifting data

Applications for search and exploration- User-friendly open source apps for non-computer scientists

26

Page 27: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

What is a claim for ClaimsKG?

27

Claim: a statement reviewed by a fact checking organisation.

In the rest of this talk:

Page 28: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

A Fact-checked Claims Model

28

Page 29: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

A Fact-checked Claims Model: an Example

29

Page 30: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

ClaimsKG Construction Pipeline

30

Page 31: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

ClaimsKG Construction Pipeline

Snopes - https://www.snopes.com/ Politifact - http://www.politifact.com/ Africa Check - https://africacheck.org/ Truth Or Fiction - http://TruthOrFiction.com Check Your Fact - http://checkyourfact.com

FactsCan - http://factscan.ca/ Fact Check AFP - https://factcheck.afp.com/ Factuel AFP - https://factuel.afp.com/ Full Fact - https://fullfact.org/

→ Mainly related to politics, but country-specific and multilingual.31

Page 32: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

ClaimsKG Construction Pipeline

Website-specific extractors → a multi-sourced dataset (as a CSV file)

- the text of the claim;- its truth value (both a normalized value and the original one);- a link to the claim review from the fact-checking website;- the references used to verify the claim;- the entities from the claim and its review, their Wikipedia categories;- the author of the claim and of the claim review;- the date of publication of the claim and of the review;- the title of the review article;- a set of keywords that act like topics (e.g., “healthcare” or “abortion”)

https://github.com/claimskg/claimskg-extractor

32

Page 33: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

ClaimsKG Construction Pipeline

Entities annotation → names of persons, organisations, locations,…

- Entities extracted from the texts of the claims, their reviews and keywords - Links to Wikipedia pages and a DBpedia URI- Using the TagMe tool (https://tagme.d4science.org/tagme/)

https://github.com/claimskg/tagme

33

Page 34: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

ClaimsKG Construction Pipeline

Populating the KG following our data model

- Read the extracted information from the CSV file - Generate unique URI identifiers as UUIDs using key attributes as seeds- Handling simple claims co-references via exact matching

- Regular updates (bi-monthly)- Maintenance

https://github.com/claimskg/claimskg_generator

34

Page 35: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

ClaimsKG Construction Pipeline

Tools for access, information retrieval and exploration

→ public SPARQL endpoint: https://data.gesis.org/claimskg/sparql→ Open source user apps for exploring fact-checked claims and their descriptive statistics built on top of ClaimsKG and its SPARQL endpoint

- ClaimsKG Explorer: https://data.gesis.org/claimskg/explorer/home- ClaimsKG Statistical Observatory: https://data.gesis.org/claimskg/observatory

35

Page 36: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Explore more claims by that or other politicians or public figures - true or false, related to a particular time-frame or to a given topicAccess the claims’ references and journalistic reviewsLearn more about their statistics, e.g., when was the peak of false claims last year and on what topic?Discover trends, e.g., how have falsehoods about immigration been spreading over the past 7 years?Extract labeled data to train your fact checking tool!

Web apps for claims exploration and search

Page 37: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Web apps for claims exploration and search

37

https://data.gesis.org/claimskg/explorer/home

https://data.gesis.org/claimskg/observatory

Page 38: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Statistics per property Claim data coverage (as of October 2019)

38

Page 39: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Statistics per rating typeTruth values and claim exact matches

39

Page 40: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

What can we do with ClaimsKG - some examples- Advanced, entity-centric search, exploration and information discovery,

exploiting data from various sources via federated SPARQL queries- the top-3 journalists mentioned in claim reviews of 2018;- the number of claims per month mentioning Obamacare in 2016;- the top-5 persons mentioned in false claims together with “abortion”;- all false claims of 2017 by Donald Trump about climate change...

40

Page 41: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

What can we do with ClaimsKG - some examples- Advanced, entity-centric search, exploration and information discovery, by

exploiting data from various sources via federated SPARQL queries- the top-3 journalists mentioned in claim reviews of 2018.- the number of claims per month mentioning Obamacare in 2016.- the top-5 persons mentioned in false claims together with “abortion”.

- Support social science research and use-case studies on particular topics- Agenda-setting studies on the influence of mass media on the public’s focus of attention- Study the evolution of the political discourse about immigration over the last years

41

Page 42: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

What can we do with ClaimsKG - some examples- Advanced, entity-centric search, exploration and information discovery, by

exploiting data from various sources via federated SPARQL queries- Requesting the top-3 journalists mentioned in claim reviews of 2018.- Requesting the number of claims per month mentioning Donald Trump in 2016.- Requesting the top-5 persons mentioned in false claims together with “abortion”.

- Support social science research and use-case studies on particular topics- Agenda-setting studies on the influence of mass media on the public’s focus of attention- Study the evolution of the political discourse about immigration over the last years

- Support CS / AI research by providing readily balanced training datasets - Topic-wise filtering, customized set of attributes- Define classification tasks: [{true} vs. {false}], [{true, false} vs. {mixture}], etc.

42

Page 43: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Challenges / Ongoing Work

Enriching & curating ClaimsKG - Diversify fields/topics, sources and languages, discover missing links, etc.- Linking claims to web documents and to authors (when missing)

Extracting a topic structure from the ClaimsKG’s keywords; linking it to established vocabularies in political and social sciences

- Improve interoperability and claim discovery across fact-checking portals

Understand and analyse the biases of journalistic fact-checked data- Fact-checked claims widely used for automatic fact-checking algorithms and a variety of

use-cases in social sciences- What are the limitations of studies based on such data?

43

Page 44: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Challenges / Ongoing Work

Matching / clustering claims according to various criteria- Contexts, topics, sources, proposition, utterances,...

Identifying complex claim structures from certain fact-checking portals- Colin Kaepernick says Winston Churchill said, "A lie gets halfway around the world before the

truth has a chance to get its pants on.": the claim is true, while the subclaim is false.

Towards an extended and generalized claims model- What are the dimensions that define a claim, outside of the fact-checking context? - How do definitions from different fields generalize to a common understanding of that notion?- Go beyond facts, the current center of interest in traditional KGs

44

Page 45: ClaimsKG: A Knowledge Graph and a Model for Controversial ... · models are still strongly diverging across communities ... - An open source pipeline for crawling, extracting and

Thank you for listening.

https://data.gesis.org/claimskg/site/

45

We thank