Automated detection of criminal offences in social media postings

An AI-based case study focusing on 'Incitement to Hatred' (§ 130 StGB) in German law

Prof. Dr.-Ing. Torsten Zesch, Dr. Semire Yekta

Sprachtechnologie, Abteilung Informatik und Angew. Kognitionswissenschaft, Fakultät Ingenieurwissenschaften

Dr. iur. Frederike Zufall

Law, Science, Technology and Society Research Group, Free University Brussels (VUB)

Post on 01-Sep-2020





Interdisciplinary Team

Language Technology / AI

Inputs:

▪ Written / Spoken / Handwriting / Pictures

▪ especially for data from social media

Outputs:

▪ Deep semantic analysis

▪ Sentiment

▪ Argumentation

NLP/AI Framework DKPro (https://dkpro.github.io)

Legal expertise

▪ fully-qualified German lawyer (Volljuristin)

▪ EU law, IT law

▪ computational law

▪ foundational research on data-driven law

▪ interdisciplinary background

▪ Law Science Technology & Society Research Group (LSTS), Brussels

▪ Waseda University Institute for Advanced Study (WIAS), Tokyo


LTLab | Europäischer Polizeikongress - Forum: 2I Künstliche Intelligenz in der Polizeiarbeit

Hate Speech Expertise

Hate speech definitions

▪ B. Ross et al., Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis. In Proceedings of NLP4CMC III: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication (Michael Beißwenger, Michael Wojatzki, Torsten Zesch, eds.), 2016.

Implicitness

▪ Benikova, D., Wojatzki, M., & Zesch, T. (2017). What does this imply? Examining the Impact of Implicitness on the Perception of Hate Speech. In GSCL 2017, Berlin, Germany.

Hate speech towards women

▪ Gold, D., Wojatzki, M., Horsmann, T., & Zesch, T. (2018). Do Women Perceive Hate Differently: Examining the Relationship Between Hate Speech, Gender, and Agreement Judgments. In KONVENS.

Hate speech detection systems

▪ Zhang, H., Wojatzki, M., Horsmann, T., & Zesch, T. (2019). ltl.uni-due at SemEval-2019 Task 5: Simple but Effective Lexico-Semantic Features for Detecting Hate Speech in Twitter. In SemEval 2019.

▪ Aggarwal, P., Horsmann, T., Wojatzki, M., & Zesch, T. (2019). LTL-UDE at SemEval-2019 Task 6: BERT and Two-Vote Classification for Categorizing Offensiveness. In SemEval 2019.

Legal perspective (basis of this talk)

▪ Zufall, F., Horsmann, T., & Zesch, T. (2019). From Legal to Technical Concept: Towards an Automated Classification of German Political Twitter Postings as Criminal Offenses. In NAACL.


Hate Speech – Scientific Definition

[Figure: flow diagram with the labels "Social Media Comments", "AI" and "Scientist", and the decision "Hate speech?" (yes / no)]


Merkel Vasallen sind bei Twitter unerwünscht!

Merkel minions are not wanted on Twitter!

Deutsche Medien, Halbwahrheiten und einseitige Betrachtung, wie bei allen vom Staat finanzierten "billigen" Propagandainstitutionen 😜

German media, half-truths and one-sided consideration, as with all "cheap" propaganda institutions financed by the state 😜

Source: GermEval 2018. Annotation category: abusive comments (highest category)


Hate Speech – Facebook’s Definition

[Figure: flow diagram with the label "Social Media Comments" and the decision "illegal? not wanted?" (yes / no)]

• probably overblocking



Hate Speech – Legal Definition

[Figure: flow diagram with the label "Social Media Comments" and the decision "illegal?" (no)]

• relatively few decided cases


Deconstructing the Law

Is it a group?

Set of target groups

• LGBTQ+

• Jews

• Refugees

• Muslims

• Politicians

• Disabled

• ...

Which target act?


Example

Target group?

Akademiker sind alles Lügner.

Academics are all liars.

Target act?

• incitement of hatred

• call for violence

• call for arbitrary measures

• assault human dignity by

  • insult

  • maliciously maligning

  • defaming


More examples

Donald Trump is a liar

➔ no target group + no target act = legal

Muslims are nice

➔ target group + no target act = legal

Muslims are rapists

➔ target group + target act = illegal

Kill all liars

➔ no target group + target act = legal

Kill all Muslims

➔ target group + target act = illegal
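
The five examples above follow a simple conjunction: a posting is flagged only when both a target group and a target act are present. A minimal Python sketch of that rule follows. The keyword sets `TARGET_GROUPS` and `TARGET_ACT_MARKERS` and the names `Assessment` and `assess` are hypothetical and not taken from the slides. The sketch only illustrates the decision logic; it is not the detection method used in the study.

```python
from dataclasses import dataclass

# Hypothetical, illustrative lexicons (assumption: the slides name example
# groups and acts but do not publish any word list).
TARGET_GROUPS = {"muslims", "jews", "refugees"}
TARGET_ACT_MARKERS = {"kill", "rapists"}


@dataclass
class Assessment:
    has_target_group: bool
    has_target_act: bool

    @property
    def potentially_illegal(self) -> bool:
        # Both elements must be present, as in the slide's examples.
        return self.has_target_group and self.has_target_act


def assess(comment: str) -> Assessment:
    """Check a comment for a target group and a target act by keyword lookup."""
    tokens = {t.strip(".,!?").lower() for t in comment.split()}
    return Assessment(
        has_target_group=bool(tokens & TARGET_GROUPS),
        has_target_act=bool(tokens & TARGET_ACT_MARKERS),
    )


if __name__ == "__main__":
    for text in ["Donald Trump is a liar", "Muslims are nice",
                 "Muslims are rapists", "Kill all liars", "Kill all Muslims"]:
        print(text, "->", assess(text).potentially_illegal)
```

A keyword lookup like this fails on exactly the cases discussed under the challenges (spelling variants, paraphrases, implicitness, irony), so it should be read as a statement of the rule rather than as a classifier.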


Challenges – Target Group

Akademiker sind alles Lügner. / Academics are all liars.

Spelling

▪ Akdemiker ... / Akademics ...

➔ solution: spell checking

Synonyms / Dynamic language use

▪ Akademtischks ... / Academchiks ...

➔ solution: contextualized distributional replacement vectors

Implicit language use

▪ Diese universitären Typen ... / Those college guys ...

➔ ongoing work
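
As a rough illustration of the spell-checking step, the sketch below maps a misspelled token to the closest entry of a small group lexicon with Python's standard `difflib` module. The lexicon `GROUP_TERMS`, the function name `normalize_group_term` and the cutoff of 0.8 are assumptions made for illustration. The slides do not say which spell checker the system uses.

```python
import difflib
from typing import Optional

# Hypothetical mini-lexicon of group terms (assumption, not from the slides).
GROUP_TERMS = ["akademiker", "muslime", "flüchtlinge"]


def normalize_group_term(token: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known group term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(token.lower(), GROUP_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(normalize_group_term("Akdemiker"))      # simple misspelling
    print(normalize_group_term("Akademtischks"))  # creative variant
```

With this cutoff the simple misspelling `Akdemiker` is mapped to `akademiker`, while the creative form `Akademtischks` is not matched. That gap is why string similarity alone does not cover dynamic language use.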


Challenges – Target Act

▪ Akademiker sind alles Lügner. / Academics are all liars.

▪ Akademiker sind Helden / Academics are heroes

▪ Muslime sind Gläubige / Muslims are believers

▪ Muslime sind Vergewaltiger / Muslims are rapists

▪ Flüchtlinge sind Schmarotzer / Refugees are scroungers

▪ ...

X is Y
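
All of the sentences above share the surface form "X is Y", so the group (X) is easy to find and the difficulty lies in judging the predicate (Y). A minimal sketch of extracting the two slots with a regular expression follows. The pattern `COPULA` and the function `extract_x_is_y` are hypothetical helpers and are not described in the slides.

```python
import re
from typing import Optional, Tuple

# Matches simple copula sentences such as "X sind (alles) Y" or "X are (all) Y".
# The pattern is an illustrative assumption and covers only this one construction.
COPULA = re.compile(
    r"^\s*(?P<x>\w+)\s+(?:sind|are)\s+(?:(?:alles|all)\s+)?(?P<y>\w+)\s*[.!]?\s*$",
    re.IGNORECASE,
)


def extract_x_is_y(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (X, Y) for an 'X is Y' sentence, or None if the pattern does not apply."""
    m = COPULA.match(sentence)
    return (m.group("x"), m.group("y")) if m else None


if __name__ == "__main__":
    print(extract_x_is_y("Akademiker sind alles Lügner."))
    print(extract_x_is_y("Muslime sind Gläubige"))
    print(extract_x_is_y("Alle Akademiker lügen."))
```

The pattern returns the same structure for "Akademiker sind Helden" and "Akademiker sind alles Lügner.", so deciding whether Y constitutes a target act still needs a separate judgment. A paraphrase such as "Alle Akademiker lügen." is not matched at all.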


Assault human dignity by defaming

▪ Akademiker sind alles Lügner. / Academics are all liars.

▪ Alle Akademiker lügen. / All academics are lying.

▪ Akademiker sagen nie die Wahrheit. / Academics never tell the truth.

Call for violence

▪ Man sollte alle Akademiker an die Wand nageln. / All academics should be nailed to the wall.

▪ Ertränken das Akademikerpack. / Drown the pack of academics.

▪ Akademiker in die Tonne treten. / Kick the academics into the bin.


Call for arbitrary measures

▪ Einsperren allesamt die Herren in ihren Talaren. / Lock up all the men in their gowns.

Implicitness

▪ Eigentlich waren das doch alles Akademiker in Köln am Hbf. / Actually, they were all academics at Cologne Central Station.

▪ Akademikern sollte man jeden Tag einen Finger zuschicken. / Academics should be sent a finger every day.


General Challenge – Irony and Sarcasm

Akademiker sind alles solche Lügner 😜

Academics are all such liars 😜

Wie können es diese Rapefugees wagen ohne Berufsabschluss vor Krieg abzuhauen.

How dare those rapefugees flee war without a professional qualification.


Proof of Concept – Legal or not?


Possible Use Cases

[Figure: diagram with the labels "AI / NLP", "Target group", "Target act" and "?", pointing to possible use cases]

▪ Ranking comments

▪ Find similar comment

▪ ???
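
To make the "find similar comment" and "ranking comments" use cases concrete, here is a minimal sketch that ranks comments by bag-of-words cosine similarity to a query comment. This is a deliberately simple stand-in chosen for illustration. The slides do not describe how similarity would be computed, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: str, comments: List[str]) -> List[Tuple[float, str]]:
    """Return (score, comment) pairs, most similar to the query first."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(c)), c) for c in comments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    pool = ["Akademiker sind Helden", "Alle Akademiker lügen.", "Muslime sind Gläubige"]
    for score, comment in rank_by_similarity("Akademiker sind alles Lügner.", pool):
        print(f"{score:.2f}  {comment}")
```

On this toy pool the query "Akademiker sind alles Lügner." ranks "Akademiker sind Helden" above the paraphrase "Alle Akademiker lügen.", because surface word overlap is not the same as similarity in meaning. This is the kind of limitation that motivates going beyond keyword search.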


Beyond §130 StGB

§ 130 para. 1, para. 2 StGB → incitement to hatred

▪ this presentation, paper submitted

§§ 185, 186, 187 StGB → defamatory conduct

▪ Zufall, F., Horsmann, T., & Zesch, T. (2019). From Legal to Technical Concept: Towards an Automated Classification of German Political Twitter Postings as Criminal Offenses. In NAACL 2019.

§ 130 para. 3, para. 4 StGB → incitement to hatred with Nazi background

▪ future work


Beyond German

Artificial Intelligence / Natural Language Processing

▪ relatively language independent

Legal situation

▪ Different in most countries

▪ But EU Framework quite similar to German law


Summary

AI-based system

▪ State-of-the-art language technology → go beyond keyword search

▪ Operationalization of legal assessment in a working system

Limitations

▪ Not enough (annotated) comments for developing a truly robust system

▪ Implicitness still challenging (but manageable)

▪ Irony / Sarcasm really, really challenging

Thank You!