Automated detection of criminal offences in social media postings
An AI-based case study focusing on 'Incitement to Hatred' (§ 130 StGB) in German law
Prof. Dr.-Ing. Torsten Zesch
Dr. Semire Yekta
Language Technology, Department of Computer Science and Applied Cognitive Science, Faculty of Engineering
Dr. iur. Frederike Zufall
Law, Science, Technology and Society Research Group, Free University Brussels (VUB)
Interdisciplinary Team
Language Technology / AI
Inputs:
▪ Written / Spoken / Handwriting / Pictures
▪ especially for data from social media
Outputs:
▪ Deep semantic analysis
▪ Sentiment
▪ Argumentation
NLP/AI Framework DKPro
(https://dkpro.github.io)
Legal expertise
▪ fully-qualified German lawyer
(Volljuristin)
▪ EU law, IT law
▪ computational law
▪ foundational research on data-driven law
▪ interdisciplinary background
▪ Law Science Technology & Society
Research Group (LSTS), Brussels
▪ Waseda University Institute for Advanced
Study (WIAS), Tokyo
LTLab | Europäischer Polizeikongress - Forum: 2I Künstliche Intelligenz in der Polizeiarbeit
Hate Speech Expertise
Hate speech definitions
▪ B. Ross et al., Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee
Crisis. In Proceedings of NLP4CMC III: 3rd Workshop on Natural Language Processing for Computer-
Mediated Communication (Michael Beißwenger, Michael Wojatzki, Torsten Zesch, eds.), 2016.
Implicitness
▪ Benikova, D., Wojatzki, M., & Zesch, T. (2017). What does this imply? Examining the Impact of
Implicitness on the Perception of Hate Speech. In GSCL 2017, Berlin, Germany.
Hate speech towards women
▪ Gold, D., Wojatzki, M., Horsmann, T., & Zesch, T. (2018) Do Women Perceive Hate Differently:
Examining the Relationship Between Hate Speech, Gender, and Agreement Judgments. In KONVENS.
Hate speech detection systems
▪ Zhang, H., Wojatzki, M., Horsmann, T., & Zesch, T. (2019). ltl.uni-due at SemEval-2019 Task 5: Simple
but Effective Lexico-Semantic Features for Detecting Hate Speech in Twitter. In SemEval 2019.
▪ Aggarwal, P., Horsmann, T., Wojatzki, M., & Zesch, T. (2019). LTL-UDE at SemEval-2019 Task 6: BERT
and Two-Vote Classification for Categorizing Offensiveness. In SemEval 2019.
Legal perspective (basis of this talk)
▪ Zufall, F., Horsmann, T., & Zesch, T. (2019). From Legal to Technical Concept: Towards an Automated
Classification of German Political Twitter Postings as Criminal Offenses. In NAACL.
Hate Speech – Scientific Definition
[Diagram: social media comments → AI scientist → hate speech? yes / no]
Merkel Vasallen sind bei Twitter unerwünscht!
Merkel minions are not wanted on Twitter!
Deutsche Medien, Halbwahrheiten und einseitige Betrachtung, wie
bei allen vom Staat finanzierten "billigen" Propagandainstitutionen 😜
German media, half-truths and one-sided consideration, as with all
"cheap" propaganda institutions financed by the state 😜
Source: GermEval 2018. Annotation category: abusive comments (highest category)
Hate Speech – Facebook’s Definition
[Diagram: social media comments → illegal? / not wanted? → yes / no]
• probably overblocking
Hate Speech – Legal Definition
[Diagram: social media comments → illegal? → no]
• relatively few decided cases
Deconstructing the Law
Is it a group?
Set of target groups:
• LGBTQ+
• Jews
• Refugees
• Muslims
• Politicians
• Disabled
• ...
Which target act?
Example
Akademiker sind alles Lügner.
Academics are all liars.
Target group? Target act?
• incitement of hatred
• call for violence
• call for arbitrary measures
• assault on human dignity by
  • insulting
  • maliciously maligning
  • defaming
More examples
Donald Trump is a liar
➔ no target group + no target act = legal
Muslims are nice
➔ target group + no target act = legal
Muslims are rapists
➔ target group + target act = illegal
Kill all liars
➔ no target group + target act = legal
Kill all Muslims
➔ target group + target act = illegal
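The rule behind these five examples can be sketched in a few lines of Python. This is only an illustration of the conjunction shown on the slide, not the authors' system: the function name `is_potentially_illegal` is hypothetical, and the group/act labels are assigned by hand here rather than detected automatically.

```python
def is_potentially_illegal(has_target_group: bool, has_target_act: bool) -> bool:
    """Flag a comment only if it addresses a target group AND contains a target act."""
    return has_target_group and has_target_act


# The five slide examples with hand-assigned labels (assumed, taken from the slide).
# Each tuple is (text, has_target_group, has_target_act).
examples = [
    ("Donald Trump is a liar", False, False),
    ("Muslims are nice", True, False),
    ("Muslims are rapists", True, True),
    ("Kill all liars", False, True),
    ("Kill all Muslims", True, True),
]

verdicts = {
    text: "illegal" if is_potentially_illegal(group, act) else "legal"
    for text, group, act in examples
}

if __name__ == "__main__":
    for text, verdict in verdicts.items():
        print(f"{text!r}: {verdict}")
```

The hard part is producing the two boolean inputs, which is what the following challenge slides are about.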
Challenges – Target Group
Akademiker sind alles Lügner. / Academics are all liars.
Spelling
▪ Akdemiker ... / Akademics ...
➔ solution: spell checking
Synonyms / Dynamic language use
▪ Akademtischks ... / Academchiks ...
➔ solution: contextualized distributional replacement vectors
Implicit language use
▪ Diese universitären Typen ... / Those college guys ...
➔ ongoing work
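As a rough illustration of the spell-checking step, the sketch below maps a misspelled mention onto a small lexicon of target-group terms using fuzzy string matching from Python's standard library. The lexicon, the function name `normalize_group_mention`, and the cutoff of 0.8 are assumptions made for this example; the slides do not say which spell checker the authors use.

```python
import difflib

# Hypothetical lexicon of (lowercased) target-group terms; illustrative only.
TARGET_GROUP_LEXICON = ["akademiker", "muslime", "flüchtlinge", "juden", "politiker"]


def normalize_group_mention(token: str, cutoff: float = 0.8):
    """Return the closest lexicon entry for a possibly misspelled token, or None."""
    matches = difflib.get_close_matches(
        token.lower(), TARGET_GROUP_LEXICON, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


fixed = normalize_group_mention("Akdemiker")        # simple typo
unknown = normalize_group_mention("Akademtischks")  # creative respelling

if __name__ == "__main__":
    print(fixed, unknown)
```

With this cutoff the simple typo is mapped back to the lexicon entry, while the creative respelling is not matched. That is consistent with the slide: dynamic language use needs a different technique than spell checking.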
Challenges – Target Act
▪ Akademiker sind alles Lügner. / Academics are all liars.
▪ Akademiker sind Helden / Academics are heroes
▪ Muslime sind Gläubige / Muslims are believers
▪ Muslime sind Vergewaltiger / Muslims are rapists
▪ Flüchtlinge sind Schmarotzer / Refugees are scroungers
▪ ...
X is Y
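The sketch below shows why the "X is Y" surface pattern alone does not settle the question. A simple regular expression extracts the (X, Y) pair from each German example above, but all of them yield the same structure, whether harmless or not. The pattern and the helper `extract_x_is_y` are hypothetical and only cover the plain "X sind [alles] Y" form.

```python
import re

# Matches "<X> sind [alles] <Y>", i.e. "<X> are [all] <Y>".
X_IS_Y = re.compile(r"^(?P<x>\w+) sind (?:alles )?(?P<y>\w+)\b")


def extract_x_is_y(sentence: str):
    """Return the (X, Y) pair of an "X sind Y" sentence, or None if it does not match."""
    match = X_IS_Y.match(sentence.strip())
    return (match.group("x"), match.group("y")) if match else None


sentences = [
    "Akademiker sind alles Lügner.",
    "Akademiker sind Helden",
    "Muslime sind Gläubige",
    "Muslime sind Vergewaltiger",
    "Flüchtlinge sind Schmarotzer",
]

pairs = [extract_x_is_y(s) for s in sentences]

if __name__ == "__main__":
    for sentence, pair in zip(sentences, pairs):
        print(sentence, "->", pair)
```

Every sentence matches, so the decision has to come from what X and Y mean: whether X belongs to the set of target groups and whether Y amounts to a target act.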
Assault on human dignity by defaming
▪ Akademiker sind alles Lügner. / Academics are all liars.
▪ Alle Akademiker lügen. / All academics are lying.
▪ Akademiker sagen nie die Wahrheit. / Academics never tell the truth.
Call for violence
▪ Man sollte alle Akademiker an die Wand nageln. / All academics should be nailed to the wall.
▪ Ertränken das Akademikerpack. / Drown the pack of academics.
▪ Akademiker in die Tonne treten. / Kick the academics into the bin.
Call for arbitrary measures
▪ Einsperren allesamt die Herren in ihren Talaren. / Lock up all the men in their gowns.
Implicitness
▪ Eigentlich waren das doch alles Akademiker in Köln am Hbf. / Actually, they were all academics at Cologne Central Station.
▪ Akademikern sollte man jeden Tag einen Finger zuschicken. / Academics should be sent a finger every day.
General Challenge – Irony and Sarcasm
Akademiker sind alles solche Lügner 😜
Academics are all such liars 😜
Wie können es diese Rapefugees wagen ohne Berufsabschluss vor Krieg abzuhauen.
How dare those rapefugees flee war without a professional qualification.
Proof of Concept – Legal or not?
Possible Use Cases
[Diagram: AI / NLP → target group, target act → ?]
▪ Ranking comments
▪ Find similar comment
▪ ???
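To make the "find similar comment" use case concrete, here is a deliberately naive sketch that ranks comments by bag-of-words cosine similarity to a query comment. It is a keyword-level baseline, not the approach from the talk, which aims to go beyond keyword search. All function names and the sample comments (taken from the earlier slides) are chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_by_similarity(query: str, comments: list) -> list:
    """Sort comments from most to least similar to the query."""
    q = bag_of_words(query)
    return sorted(comments, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)


comments = ["Akademiker sind Helden", "Muslime sind Gläubige", "Alle Akademiker lügen"]
ranked = rank_by_similarity("Akademiker sind alles Lügner", comments)

if __name__ == "__main__":
    print(ranked)
```

On this toy data the top hit is "Akademiker sind Helden", which shares the most words with the query but says the opposite. Surface overlap is therefore a weak proxy for legal similarity.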
Beyond §130 StGB
§ 130 para. 1, para. 2 StGB → incitement to hatred
▪ this presentation, paper submitted
§§ 185, 186, 187 StGB → defamatory conduct
▪ Zufall, F., Horsmann, T., & Zesch, T. (2019). From Legal to Technical Concept:
Towards an Automated Classification of German Political Twitter Postings as
Criminal Offenses. In NAACL 2019.
§ 130 para. 3, para. 4 StGB → incitement to hatred with Nazi background
▪ future work
Beyond German
Artificial Intelligence / Natural Language Processing
▪ relatively language independent
Legal situation
▪ different in most countries
▪ but the EU framework is quite similar to German law
Summary
AI-based system
▪ State-of-the-art language technology → go beyond keyword search
▪ Operationalization of legal assessment in a working system
Limitations
▪ Not enough (annotated) comments for developing a truly robust system
▪ Implicitness still challenging (but manageable)
▪ Irony / Sarcasm really, really challenging
Thank You!