Detecting Cyberbullying with Morphosemantic Patterns
Michal Ptaszynski (1), Fumito Masui (1), Yoko Nakajima (2), Yasutomo Kimura (3), Rafal Rzepka (4) and Kenji Araki (4)
1. Kitami Institute of Technology
2. Kushiro National College of Technology
3. Otaru University of Commerce
4. Hokkaido University
Outline
1. Cyberbullying as a social problem
2. Previous research
3. Proposed method
4. Experiments
5. Future work
Introduction
Cyberbullying
- Slandering and humiliating people on the Internet.
- Recently noticed social problem.
HELP by ICT
INTERNET PATROL
• Internet monitoring by PTA.
• Request site admin to remove harmful entries.
• High cost of time and fatigue for net-patrol members.
Previous Research
2009 2010 2011 2012 2013 2014 2015
Affect analysis of cyberbullying data
Michal Ptaszynski, P. Dybala, T. Matsuba, F. Masui, R. Rzepka and K. Araki. 2010. Machine Learning and Affect Analysis Against Cyber-Bullying. In Proceedings of AISB’10, 29th March – 1st April 2010.
SVM / optimization
Michal Ptaszynski, P. Dybala, T. Matsuba, F. Masui, R. Rzepka, K. Araki, and Y. Momouchi. 2010. In the Service of Online Order: Tackling Cyber-Bullying with Machine Learning and Affect Analysis. International Journal of Computational Linguistics Research, Vol. 1, Issue 3, pp. 135-154, 2010.
SO-PMI-IR / phrases
T. Matsuba, F. Masui, A. Kawai, N. Isu. 2011. Study on the polarity classification model for the purpose of detecting harmful information on informal school sites (in Japanese), In Proceedings of NLP2011, pp. 388-391.
Category Relevance Optimization
T. Nitta, F. Masui, M. Ptaszynski, Y. Kimura, R. Rzepka, K. Araki. 2013. Detecting Cyberbullying Entries on Informal School Websites Based on Category Relevance Maximization. In Proceedings of IJCNLP 2013, pp. 579-586.
2013 PATENT
Patent No. 2013-245813. Inventors: Fumito Masui, Michal Ptaszynski, Nitta Taisei.
Patent name: An Apparatus and Method for Detection of Harmful Entries on Internet
Language Combinatorics / Preprocessing
M. Ptaszynski, F. Masui, Y. Kimura, R. Rzepka, K. Araki. 2015. Extracting Patterns of Harmful Expressions for Cyberbullying Detection, 7th Language & Technology Conference (LTC'15), 2015.11.27-29.
Language Combinatorics
Michal Ptaszynski, F. Masui, Y. Kimura, R. Rzepka, K. Araki. 2015. Brute Force Works Best Against Bullying, IJCAI 2015 Workshop on Intelligent Personalization (IP 2015), Buenos Aires, 2015.07.25-31
Automatic acquisition of harmful words
S. Hatakeyama, F. Masui, M. Ptaszynski, K. Yamamoto. 2015. Improving Performance of Cyberbullying Detection Method with Double Filtered Point-wise Mutual Information. ACM Symposium on Cloud Computing 2015 (SoCC'15), August 2015.
Feature sophistication (simple → sophisticated):
words → bag-of-words → phrases → word patterns → syntactic patterns → semantic patterns
Proposed Method
Morphosemantics
Morphology + Semantics = Morpho-semantics
• Morphology: noun, verb, adjective, etc.
• Semantics: actor, action, object, patient, etc.
Effective for languages with strongly related morphology and semantics (e.g. Japanese)
Previously used for:
• analysis of Indonesian suffix in WordNet [*1]
• analysis of Croatian lexis [*2]
[*1] Christiane Fellbaum, Anne Osherson, and Peter E. Clark. 2009. Putting semantics into WordNet’s “morphosemantic” links.
[*2] Ida Raffaelli. 2013. The model of morphosemantic patterns in the description of lexical architecture.
Morphological analysis
“John killed Mary.”
“noun verb(past) noun”
MeCab: standard tool for morphological analysis of Japanese [*3]
[*3] http://mecab.sourceforge.net
Semantic role labelling
“John killed Mary.”
“actor action patient”
ASA (Argument Structure Analyzer): thesaurus-based predicate-argument structure analyzer for Japanese [*4]
[*4] http://cl.it.okayama-u.ac.jp/study/project/asa/asa-scala
Example of morphosemantic structure (MS)
Japanese : ニホンウナギが絶滅危惧種に指定され、完全養殖によるウナギの量産に期待が高まっている。
Transcription: Nihonunagi ga zetsumetsu kigushu ni shitei sare, kanzen yoshoku ni yoru unagi no ryousan ni kitai ga takamatte iru.
English : As Japanese eel has been specified as an endangered species, the expectations grow towards mass production of eel in full aquaculture.
MS : [Object] [Agent] [State change] [Action] [Noun]
[State change] [Object] [State change]
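The three representations (morphology, semantic roles, morphosemantics) can be illustrated on the English toy sentence from the earlier slides. This is only a sketch: the two lexicons are hypothetical stand-ins for MeCab and ASA, not their APIs, and the rule "use the semantic role where one exists, otherwise fall back to the part of speech" is an assumption read off the example structure, where a plain [Noun] appears among role labels.

```python
# Hypothetical toy lexicons standing in for a morphological analyzer
# and a semantic role labeller. They are NOT the MeCab or ASA APIs.
POS_LEXICON = {
    "John": "noun",
    "killed": "verb(past)",
    "Mary": "noun",
    "yesterday": "adverb",
}
ROLE_LEXICON = {
    "John": "actor",
    "killed": "action",
    "Mary": "patient",
}


def morphology(tokens):
    """Replace each token with its part-of-speech label."""
    return [POS_LEXICON.get(token, "unknown") for token in tokens]


def semantic_roles(tokens):
    """Replace each token with its semantic role (None if it has no role)."""
    return [ROLE_LEXICON.get(token) for token in tokens]


def morphosemantic(tokens):
    """Assumed combination rule: semantic role if available, else POS."""
    return [
        ROLE_LEXICON.get(token) or POS_LEXICON.get(token, "unknown")
        for token in tokens
    ]


tokens = "John killed Mary yesterday".split()
print(morphology(tokens))
print(semantic_roles(tokens))
print(morphosemantic(tokens))
```

In this toy setup the word "yesterday" has no role, so the combined sequence keeps its part of speech, mirroring how [Noun] sits next to [Object] and [Action] in the structure above.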
Pattern Extraction
SPEC – Sentence Pattern Extraction arChitecture
Sentence patterns = ordered non-repeated combinations of sentence elements.
For a sentence of n elements and 1 ≤ k ≤ n, there are $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ possible k-long patterns, and $\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1$ patterns in total.
Extract patterns from all sentences and calculate their occurrences.
Michal Ptaszynski, R. Rzepka, K. Araki and Y. Momouchi. 2011. Language combinatorics: A sentence pattern extraction architecture based on combinatorial explosion. Int. J. of Computational Linguistics (IJCL), Vol. 2, Issue 1, pp. 24-36.
Pattern Extraction
Example: What a nice day !
5-element pattern (1): What a nice day !
4-element patterns (5): What a nice * ! / What a nice day / What a * day ! / ...
3-element patterns (10): a nice * ! / What a nice / What a * ! / ...
2-element patterns (10): What a / What * ! / nice * ! / ...
1-element patterns (5): What / a / nice / ...
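The enumeration in this example can be reproduced with a short sketch. It assumes that a wildcard `*` is placed wherever two chosen elements are not adjacent in the sentence, which matches the patterns listed for "What a nice day !"; the function name `extract_patterns` is hypothetical and this is not the SPEC implementation itself.

```python
from itertools import combinations


def extract_patterns(tokens, wildcard="*"):
    """Return {k: [patterns]} for all ordered, non-repeated combinations.

    A wildcard is inserted between two chosen elements whenever at least
    one original element between them was skipped.
    """
    n = len(tokens)
    by_length = {}
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            parts = [tokens[idx[0]]]
            for prev, cur in zip(idx, idx[1:]):
                if cur - prev > 1:  # a gap: something was skipped
                    parts.append(wildcard)
                parts.append(tokens[cur])
            by_length.setdefault(k, []).append(" ".join(parts))
    return by_length


by_length = extract_patterns("What a nice day !".split())
counts = {k: len(v) for k, v in by_length.items()}
total = sum(counts.values())
print(counts)  # {1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
print(total)   # 31, i.e. 2**5 - 1
```

The number of patterns doubles with every added element, so a practical system would need to cap sentence length or pattern length; the slides do not say which limit, if any, was used.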
Pattern Extraction
• Normalized pattern weight
• Score for one sentence
• Classify new input with pattern list
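The formulas for the pattern weight and the sentence score did not survive in this transcript. The sketch below assumes a weight of the form w = (O_pos / (O_pos + O_neg) - 0.5) * 2, which maps the share of harmful-side occurrences into [-1, 1], and a sentence score equal to the sum of the weights of the patterns found in the sentence. Both should be checked against the cited SPEC paper. The toy pattern lists, function names and default threshold are hypothetical.

```python
from collections import Counter


def pattern_weights(harmful_patterns, nonharmful_patterns):
    """Assumed normalized weight in [-1, 1] for every observed pattern."""
    pos = Counter(harmful_patterns)
    neg = Counter(nonharmful_patterns)
    weights = {}
    for pattern in set(pos) | set(neg):
        o_pos, o_neg = pos[pattern], neg[pattern]
        weights[pattern] = (o_pos / (o_pos + o_neg) - 0.5) * 2
    return weights


def score(sentence_patterns, weights):
    """Assumed sentence score: sum of weights of the known patterns."""
    return sum(weights.get(p, 0.0) for p in sentence_patterns)


def classify(sentence_patterns, weights, threshold=0.0):
    """Label the input harmful when its score exceeds the threshold."""
    if score(sentence_patterns, weights) > threshold:
        return "harmful"
    return "non-harmful"


# Toy occurrences of patterns in harmful and non-harmful training entries.
harmful = ["actor action", "actor action", "action patient", "noun"]
nonharmful = ["noun", "noun verb", "action patient",
              "action patient", "action patient"]

weights = pattern_weights(harmful, nonharmful)
print(weights["actor action"])    # 1.0  (seen only on the harmful side)
print(weights["action patient"])  # -0.5 (1 of 4 occurrences is harmful)
print(classify(["actor action", "noun"], weights))
```

A pattern seen equally often on both sides gets weight 0 under this assumption, which is presumably what a "zero-pattern" refers to in the experiment setup.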
Dataset
• Actual data collected by Internet Patrol (annotated by experts)
• From unofficial school forums (BBS)
• Provided by Human Rights Center in Japan (Mie Prefecture)
• According to the definition by the Japanese Ministry of Education (MEXT)
• 1,490 harmful and 1,508 non-harmful entries
Experiment setup
Preprocessing
1. Morphology
2. Semantics
3. Morphosemantics
Pattern List Modification
1. All patterns
2. Zero-patterns deleted
3. Ambiguous patterns deleted
All patterns vs. only n-grams
Weight Calculation Modifications
1. Normalized
2. Award length
3. Award length and occurrence
Automatic threshold optimization
10-fold Cross Validation
One experiment = 420 runs
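The slide names automatic threshold optimization without describing the procedure. One plausible reading, sketched here as an assumption, is an exhaustive sweep: try every observed score as a cut-off, compute Precision, Recall, F1 and Accuracy for each, and keep the threshold that maximizes the chosen metric. The function names and toy scores are hypothetical.

```python
def evaluate(scores, labels, threshold):
    """Precision, recall, F1 and accuracy when score > threshold = harmful."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s > threshold and y == 1)
    fp = sum(1 for s, y in pairs if s > threshold and y == 0)
    fn = sum(1 for s, y in pairs if s <= threshold and y == 1)
    tn = sum(1 for s, y in pairs if s <= threshold and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def best_threshold(scores, labels, metric="f1"):
    """Sweep candidate thresholds and return the one maximizing `metric`."""
    # Include a value below every score so "label everything harmful"
    # is also considered as a candidate.
    candidates = [min(scores) - 1.0] + sorted(set(scores))
    return max(candidates, key=lambda t: evaluate(scores, labels, t)[metric])


# Toy sentence scores with gold labels (1 = harmful, 0 = non-harmful).
scores = [2.0, 1.5, 0.5, -0.5, -1.0, 0.2]
labels = [1, 1, 1, 0, 0, 0]

threshold = best_threshold(scores, labels, metric="f1")
print(threshold, evaluate(scores, labels, threshold))
```

In a cross-validated setting the threshold would be chosen on the training folds only and then applied unchanged to the held-out fold; optimizing for a different metric is a matter of passing another `metric` name.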
Results
Optimized for | POS                 | Semantic roles      | Morphosemantics
              | Pr   Re   F1   Acc  | Pr   Re   F1   Acc  | Pr   Re   F1   Acc
F-score       | 0.53 0.95 0.68 0.55 | 0.63 0.74 0.68 0.67 | 0.61 0.76 0.68 0.64
Precision     | 0.93 0.03 0.06 0.51 | 0.93 0.06 0.11 0.54 | 0.85 0.10 0.18 0.55
Accuracy      | 0.58 0.78 0.66 0.61 | 0.80 0.49 0.61 0.69 | 0.62 0.72 0.67 0.65
BEP           | 0.61                | 0.67                | 0.64
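As a sanity check on how the flattened results table is read, the F1 column can be recomputed from Precision and Recall using F1 = 2 * Pr * Re / (Pr + Re). The values below are copied from the table above. Small deviations are expected because the published figures are rounded to two decimals.

```python
# (Pr, Re, reported F1) per optimization target and preprocessing type,
# copied from the results table.
reported = {
    ("F-score", "POS"): (0.53, 0.95, 0.68),
    ("F-score", "Semantic roles"): (0.63, 0.74, 0.68),
    ("F-score", "Morphosemantics"): (0.61, 0.76, 0.68),
    ("Precision", "POS"): (0.93, 0.03, 0.06),
    ("Precision", "Semantic roles"): (0.93, 0.06, 0.11),
    ("Precision", "Morphosemantics"): (0.85, 0.10, 0.18),
    ("Accuracy", "POS"): (0.58, 0.78, 0.66),
    ("Accuracy", "Semantic roles"): (0.80, 0.49, 0.61),
    ("Accuracy", "Morphosemantics"): (0.62, 0.72, 0.67),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


deviations = {
    key: abs(f1(p, r) - reported_f1)
    for key, (p, r, reported_f1) in reported.items()
}
max_dev = max(deviations.values())
print(f"largest deviation from reported F1: {max_dev:.4f}")
```

Every recomputed value lands within 0.01 of the reported one, which supports this column assignment of the flattened numbers.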
Best F-score: similar for all
Best Precision:
Best Accuracy:
1. Only semantics
2. Morphosemantics
3. POS
Best BEP:
Results
Statistical significance
• Difference with POS always significant
• Difference between Only semantics and Morphosemantics almost never significant
• Semantics alone: usually more effective than full Morphosemantic structure
• Uses less information, so also more efficient
• But the advantage over Morphosemantics could be a coincidence: need more data, further experiments
Results
Comparison with state-of-the-art
Proposed method:
• More efficient (user does almost nothing)
• Applicable to other languages
• Can point out non-harmful elements
Conclusions
• Presented research on cyberbullying detection.
• Proposed novel method.
• Automatic extraction of sophisticated morphosemantic patterns.
• Used patterns in classification of cyberbullying.
• Tested on actual data obtained by Internet patrol.
• Outperformed previous methods.
• Requires minimal human effort.
Future work
• Apply different preprocessing and classifiers for further improvement.
• Test on new data.
• Obtain new data by applying the method in practice.
• Verify the actual amount of CB information on the Internet and reevaluate in more realistic conditions.