Detecting Cyberbullying with Morphosemantic Patterns

Michal Ptaszynski¹, Fumito Masui¹, Yoko Nakajima², Yasutomo Kimura³, Rafal Rzepka⁴ and Kenji Araki⁴

1. Kitami Institute of Technology
2. Kushiro National College of Technology
3. Otaru University of Commerce
4. Hokkaido University

Outline

1. Cyberbullying as a social problem
2. Previous research
3. Proposed method
4. Experiments
5. Future work

Introduction

Cyberbullying
- Slandering and humiliating people on the Internet.
- Recently noticed social problem.

INTERNET PATROL
• Internet monitoring by PTA.
• Request site admin to remove harmful entries.
• High cost of time and fatigue for net-patrol members.

HELP by ICT

Previous Research

Timeline: 2009–2015

Affect analysis of cyberbullying data
Michal Ptaszynski, P. Dybala, T. Matsuba, F. Masui, R. Rzepka and K. Araki. 2010. Machine Learning and Affect Analysis Against Cyber-Bullying. In Proceedings of AISB’10, 29th March – 1st April 2010.

SO-PMI-IR / phrases
T. Matsuba, F. Masui, A. Kawai, N. Isu. 2011. Study on the polarity classification model for the purpose of detecting harmful information on informal school sites (in Japanese), In Proceedings of NLP2011, pp. 388-391.

SVM / optimization
Michal Ptaszynski, P. Dybala, T. Matsuba, F. Masui, R. Rzepka, K. Araki, and Y. Momouchi. 2010. In the Service of Online Order: Tackling Cyber-Bullying with Machine Learning and Affect Analysis. International Journal of Computational Linguistics Research, Vol. 1, Issue 3, pp. 135-154, 2010.

Category Relevance Optimization
T. Nitta, F. Masui, M. Ptaszynski, Y. Kimura, R. Rzepka, K. Araki. 2013. Detecting Cyberbullying Entries on Informal School Websites Based on Category Relevance Maximization. In Proceedings of IJCNLP 2013, pp. 579-586.

2013 PATENT
Patent No. 2013-245813. Inventors: Fumito Masui, Michal Ptaszynski, Nitta Taisei.
Patent name: An Apparatus and Method for Detection of Harmful Entries on Internet

Language Combinatorics / Preprocessing
M. Ptaszynski, F. Masui, Y. Kimura, R. Rzepka, K. Araki. 2015. Extracting Patterns of Harmful Expressions for Cyberbullying Detection, 7th Language & Technology Conference (LTC'15), 2015.11.27-29.

Language Combinatorics
Michal Ptaszynski, F. Masui, Y. Kimura, R. Rzepka, K. Araki. 2015. Brute Force Works Best Against Bullying, IJCAI 2015 Workshop on Intelligent Personalization (IP 2015), Buenos Aires, 2015.07.25-31

Automatic acquisition of harmful words
S. Hatakeyama, F. Masui, M. Ptaszynski, K. Yamamoto. 2015. Improving Performance of Cyberbullying Detection Method with Double Filtered Point-wise Mutual Information. ACM Symposium on Cloud Computing 2015 (SoCC'15), August 2015.

Feature sophistication (simple → sophisticated):
words → bag-of-words → phrases → word patterns → syntactic patterns → semantic patterns

Proposed Method

Morphosemantics

Morphology + Semantics = Morpho-semantics
• Morphology: noun, verb, adjective, etc.
• Semantics: actor, action, object, patient, etc.

Effective for languages with strongly related morphology and semantics (e.g. Japanese).

Previously used for:
• analysis of Indonesian suffix in Wordnet [*1]
• analysis of Croatian lexis [*2]

[*1] Christiane Fellbaum, Anne Osherson, and Peter E. Clark. 2009. Putting semantics into WordNet’s “morphosemantic” links.
[*2] Ida Raffaelli. 2013. The model of morphosemantic patterns in the description of lexical architecture.

Morphological analysis

MeCab: standard morphological analysis tool for Japanese [*3]

“John killed Mary.” → “noun verb(past) noun”

[*3] http://mecab.sourceforge.net

Semantic role labelling

ASA (Argument Structure Analyzer): thesaurus-based predicate-argument structure analyzer for Japanese [*4]

“John killed Mary.” → “actor action patient”

[*4] http://cl.it.okayama-u.ac.jp/study/project/asa/asa-scala

Example of morphosemantic structure (MS)

Japanese: ニホンウナギが絶滅危惧種に指定され、完全養殖によるウナギの量産に期待が高まっている。

Transcription: Nihonunagi ga zetsumetsu kigushu ni shitei sare, kanzen yoshoku ni yoru unagi no ryousan ni kitai ga takamatte iru.

English: As the Japanese eel has been designated an endangered species, expectations are growing for mass production of eel through full aquaculture.

MS: [Object] [Agent] [State change] [Action] [Noun] [State change] [Object] [State change]
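The relation between the three preprocessing layers can be sketched in Python as follows. This is only an illustration under stated assumptions: the tags are written by hand rather than produced by MeCab or ASA, the function names are hypothetical, and the merge rule (use the semantic role when one exists, otherwise fall back to the part of speech) is an assumption suggested by the MS example above, where a plain [Noun] appears among the role labels.

```python
from typing import List, Optional, Tuple

# (surface form, POS tag, semantic role or None). All tags are hand-written
# stand-ins for what MeCab and ASA would produce on Japanese input.
Token = Tuple[str, str, Optional[str]]


def pos_layer(tokens: List[Token]) -> List[str]:
    """Morphology only: the sequence of POS tags."""
    return [pos for _, pos, _ in tokens]


def role_layer(tokens: List[Token]) -> List[str]:
    """Semantics only: the sequence of role labels, skipping unlabeled tokens."""
    return [role for _, _, role in tokens if role is not None]


def morphosemantic_layer(tokens: List[Token]) -> List[str]:
    """Assumed merge rule: role label if present, otherwise the POS tag."""
    return [role if role is not None else pos for _, pos, role in tokens]


sentence: List[Token] = [
    ("John", "noun", "actor"),
    ("killed", "verb(past)", "action"),
    ("Mary", "noun", "patient"),
]
# Hypothetical extra token with no semantic role, to show the fallback.
extended = sentence + [("yesterday", "adverb", None)]

print(pos_layer(sentence))
print(role_layer(sentence))
print(morphosemantic_layer(extended))
```

Each layer is then treated as a sequence of sentence elements from which patterns are extracted.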

Pattern Extraction

SPEC (Sentence Pattern Extraction arChitecture)

Sentence patterns = ordered non-repeated combinations of sentence elements.

For a sentence of n elements and each 1 ≤ k ≤ n, all possible k-long patterns are generated: $\binom{n}{k}$ patterns of length k, and $\sum_{k=1}^{n} \binom{n}{k} = 2^n - 1$ patterns in total.

Patterns are extracted from all sentences and their occurrences are counted.

Michal Ptaszynski, R. Rzepka, K. Araki and Y. Momouchi. 2011. Language combinatorics: A sentence pattern extraction architecture based on combinatorial explosion. Int. J. of Computational Linguistics (IJCL), Vol. 2, Issue 1, pp. 24-36.

Example: What a nice day !

5-element pattern (1): What a nice day !
4-element patterns (5): What a nice * ! / What a nice day / What a * day ! / ...
3-element patterns (10): a nice * ! / What a nice / What a * ! / ...
2-element patterns (10): What a / What * ! / nice * ! / ...
1-element patterns (5): What / a / nice / ...
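The enumeration in this example can be sketched in Python as follows. This is a minimal illustration, not the SPEC implementation: the function name `extract_patterns` and the rule of placing a single `*` wherever skipped elements leave a gap are assumptions read off the example patterns listed for "What a nice day !".

```python
from itertools import combinations
from typing import List


def extract_patterns(tokens: List[str], wildcard: str = "*") -> List[str]:
    """Return every ordered, non-repeated combination of sentence elements.

    Elements keep their original order; a wildcard marks each gap left by
    one or more skipped elements between two kept elements.
    """
    n = len(tokens)
    patterns = []
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            parts = [tokens[idx[0]]]
            for prev, cur in zip(idx, idx[1:]):
                if cur - prev > 1:  # something was skipped between them
                    parts.append(wildcard)
                parts.append(tokens[cur])
            patterns.append(" ".join(parts))
    return patterns


patterns = extract_patterns("What a nice day !".split())
print(len(patterns))
for p in patterns[:8]:
    print(p)
```

For a five-element sentence the function yields 5 + 10 + 10 + 5 + 1 = 31 patterns, matching the counts shown for each pattern length. The count doubles with every added element, which is why the cited paper speaks of combinatorial explosion.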

Normalized pattern weight

Score for one sentence

Classify new input with pattern list
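The formulas for the weight and the score are not preserved in this transcript, so the following Python sketch fills them in with an assumption: each pattern's weight is its share of occurrences in harmful entries, rescaled to the range [-1, +1], and a sentence's score is the sum of the weights of its patterns. The function names, the toy pattern lists, and the default threshold of 0 are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def pattern_weights(harmful: Iterable[List[str]],
                    non_harmful: Iterable[List[str]]) -> Dict[str, float]:
    """Assumed normalization: w = (O_pos / (O_pos + O_neg) - 0.5) * 2."""
    occ_pos = Counter(p for pats in harmful for p in pats)
    occ_neg = Counter(p for pats in non_harmful for p in pats)
    weights = {}
    for p in set(occ_pos) | set(occ_neg):
        pos, neg = occ_pos[p], occ_neg[p]
        weights[p] = (pos / (pos + neg) - 0.5) * 2
    return weights


def sentence_score(patterns: List[str], weights: Dict[str, float]) -> float:
    """Sum the weights of known patterns; unseen patterns contribute 0."""
    return sum(weights.get(p, 0.0) for p in patterns)


def classify(patterns: List[str], weights: Dict[str, float],
             threshold: float = 0.0) -> str:
    """Label an entry harmful when its score exceeds the threshold."""
    score = sentence_score(patterns, weights)
    return "harmful" if score > threshold else "non-harmful"


# Toy training data: each entry is already reduced to its list of patterns.
harmful_entries = [["A", "A B"], ["A"]]
non_harmful_entries = [["B"], ["A B", "B"]]

weights = pattern_weights(harmful_entries, non_harmful_entries)
print(weights)
print(classify(["A", "A B"], weights))
print(classify(["B"], weights))
```

Under this assumed weighting, a pattern seen only in harmful entries gets +1, one seen only in non-harmful entries gets -1, and one seen equally often in both gets 0.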

Dataset
• Actual data collected by Internet Patrol (annotated by experts)
• From unofficial school forums (BBS)
• Provided by the Human Rights Center in Japan (Mie Prefecture)
• According to the definition by the Japanese Ministry of Education (MEXT)
• 1,490 harmful and 1,508 non-harmful entries

Experiment setup

Preprocessing
1. Morphology
2. Semantics
3. Morphosemantics

Pattern List Modification
1. All patterns
2. Zero-patterns deleted
3. Ambiguous patterns deleted

All patterns vs. only n-grams

Weight Calculation Modifications
1. Normalized
2. Award length
3. Award length and occurrence

Automatic threshold optimization

10-fold Cross Validation

One experiment = 420 runs
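Automatic threshold optimization can be pictured as a sweep over candidate score thresholds that keeps the one with the best value of a chosen metric. The sketch below is a hedged illustration, not the authors' procedure: it optimizes for F-score only, takes the observed scores as candidate thresholds, and uses invented scores and labels.

```python
from typing import List, Tuple


def precision_recall_f1(scores: List[float], labels: List[bool],
                        threshold: float) -> Tuple[float, float, float]:
    """Evaluate the rule 'harmful if score > threshold' against gold labels."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s > threshold and y)
    fp = sum(1 for s, y in pairs if s > threshold and not y)
    fn = sum(1 for s, y in pairs if s <= threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def best_threshold(scores: List[float], labels: List[bool]) -> float:
    """Return the candidate threshold with the highest F-score."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda t: precision_recall_f1(scores, labels, t)[2])


# Invented sentence scores with gold labels (True = harmful).
scores = [0.9, 0.4, 0.2, -0.1, -0.5, -0.8]
labels = [True, True, False, True, False, False]

t = best_threshold(scores, labels)
print(t, precision_recall_f1(scores, labels, t))
```

Swapping the metric inside `best_threshold` (for example, selecting precision instead of F-score) gives the other optimization targets.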

Results

Optimized for   POS                      Semantic roles           Morphosemantics
                Pr    Re    F1    Acc    Pr    Re    F1    Acc    Pr    Re    F1    Acc
F-score         0.53  0.95  0.68  0.55   0.63  0.74  0.68  0.67   0.61  0.76  0.68  0.64
Precision       0.93  0.03  0.06  0.51   0.93  0.06  0.11  0.54   0.85  0.10  0.18  0.55
Accuracy        0.58  0.78  0.66  0.61   0.80  0.49  0.61  0.69   0.62  0.72  0.67  0.65
BEP             0.61                     0.67                     0.64
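As a reading aid for the table, the F1 column is the harmonic mean of precision and recall. The short Python check below recomputes it from the rounded Pr and Re values in the "optimized for F-score" row; the dictionary layout and names are illustrative only, and small differences from the reported figures are expected because the inputs are already rounded to two decimals.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (Pr, Re, reported F1) from the row optimized for F-score.
f_score_row = {
    "POS": (0.53, 0.95, 0.68),
    "Semantic roles": (0.63, 0.74, 0.68),
    "Morphosemantics": (0.61, 0.76, 0.68),
}

for name, (pr, re, reported) in f_score_row.items():
    print(f"{name}: recomputed {f1(pr, re):.3f}, reported {reported:.2f}")
```

The three F-scores coincide at 0.68 even though POS reaches it with much higher recall and lower precision than the other two.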

Best F-score: similar for all

Best Precision, Best Accuracy, Best BEP:
1. Only semantics
2. Morphosemantics
3. POS

Statistical significance

• Difference with POS always significant.
• Difference between Only semantics and Morphosemantics almost never significant.
• Semantics alone: usually more effective than full Morphosemantic structure.
• Uses less information: also more efficient.
• But the advantage over Morphosemantics could be a coincidence: need more data, further experiments.

Comparison with state-of-the-art

Proposed method:
• More efficient (user does almost nothing)
• Applicable to other languages
• Can point out non-harmful elements

Conclusions

• Presented research on cyberbullying detection.
• Proposed a novel method.
• Automatic extraction of sophisticated morphosemantic patterns.
• Used patterns in classification of cyberbullying.
• Tested on actual data obtained by Internet patrol.
• Outperformed previous methods.
• Requires minimal human effort.

Future work

• Apply different preprocessing and classifiers for further improvement.
• Test on new data.
• Obtain new data by applying in practice.
• Verify the actual amount of cyberbullying (CB) information on the Internet and reevaluate in more realistic conditions.

Thank you for your kind attention!

Michal Ptaszynski

Kitami Institute of Technology