![Page 1: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/1.jpg)
DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis
Muhammad Abdul-Mageed¹,², Hassan AlHuzli¹, Duaa’ Abu Elhija¹, Mona Diab²
¹Indiana University, ²The George Washington University
Emotions
• Categories of emotion:
  – Ekman (e.g., 1992) proposes six basic emotions: anger, disgust, fear, happiness, sadness, and surprise.
  – Plutchik (1980, 1985, 1994) adds trust and anticipation.
• Emotion on three dimensions:
  – e.g., Francisco and Gervás (2006) mark the attributes of pleasantness, activation, and dominance in the genre of fairy tales.
• DINA focuses on the six Ekman emotions.
Motivations
• Opinion mining:
  – Provides an enriching component beyond the mere binary valence (i.e., positive vs. negative) of most sentiment analysis systems.
• Health & wellness:
  – Early detection of certain emotional disorders, such as depression.
  – Improving people’s well-being by exposing them to desired emotions (since emotion is contagious; Kramer et al., 2014).
• Education:
  – Integrating emotionally aware agents into intelligent computer-assisted language learning, for example, should prove useful and make the pedagogical experience more natural.
Motivations (cont.)
• Marketing:
  – Emotion-sensitive language generation, for example, can help with marketing (Heath et al., 2001; Tan et al., 2014), political campaigning, etc.
• Security:
  – Deflecting potential hazards and anticipating dangerous behaviors.
• Author profiling:
  – Useful for predicting age and gender (Meina et al., 2013; Flekova and Gurevych, 2013; Farias et al., 2013; Bamman et al., 2014; Forner et al., 2013) and personality (Mohammad and Kiritchenko, 2013).
Related Work
• SemEval-2007 Affective Text task (Strapparava and Mihalcea, 2007) [SEM07]:
  – Collection and classification of emotion and valence in news headlines.
• Aman and Szpakowicz (2007):
  – Annotation and detection of emotions in blogs.
• Qadir and Riloff (2014), Mohammad (2012), Wang et al. (2012):
  – Use hashtags as an approximation of emotion categories to collect emotion data.
Arabic: Motivations
• Morphologically rich language:
  – Highly inflected for person, number, gender, case, mood, aspect, and voice.
• Strategic language:
  – One of the six official UN languages, with ~300M speakers worldwide.
• Exponential Web growth:
  – More than 2,000% growth rate on the Web from 2010 onwards (www.internetworldstats.com).
![Page 7: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/7.jpg)
Arabic Dialects
Data Collection
• We crawled Twitter data using a seed set of fewer than 10 phrases for each of the six Ekman emotion types.
• Each phrase is composed of an emotion word (e.g., “happy”) and the first-person pronoun “I”.
• We collect only tweets in which a seed phrase occurs in the tweet body text.
• This approach does not depend on hashtags.
• We collect 500 tweets for each of the 6 emotion types, for a total of 3,000.
• The seeds capture various Arabic dialects.
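The collection step above can be sketched as a simple filter over tweet texts. This is a minimal illustration under stated assumptions, not the authors’ crawler: the English seed phrases (covering only three of the six emotions), the `collect` function, word-boundary matching, stripping #hashtags and @mentions so that only body text is searched, and assigning each tweet to the first matching emotion are all hypothetical choices. The real DINA seeds are Arabic phrases such as those in Table 1.

```python
import re
from collections import defaultdict

# Hypothetical English stand-ins; the real DINA seeds are Arabic phrases
# pairing an emotion word with the first-person pronoun "I".
EXAMPLE_SEEDS = {
    "anger": ["i am angry", "i feel angry"],
    "happiness": ["i am happy", "i feel happy"],
    "sadness": ["i am sad", "i feel sad"],
}


def strip_hashtags_and_mentions(text):
    # Assumption: drop #hashtags and @mentions so only body text is matched.
    return re.sub(r"[#@]\w+", " ", text)


def compile_seeds(seeds):
    # One case-insensitive pattern per emotion; \b avoids matching inside words.
    return {
        emotion: re.compile(
            r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
            re.IGNORECASE,
        )
        for emotion, phrases in seeds.items()
    }


def collect(tweets, seeds, per_class=500):
    """Bucket tweets by the first emotion whose seed phrase occurs in the body text."""
    patterns = compile_seeds(seeds)
    buckets = defaultdict(list)
    for text in tweets:
        body = strip_hashtags_and_mentions(text)
        for emotion, pattern in patterns.items():
            if pattern.search(body):
                if len(buckets[emotion]) < per_class:
                    buckets[emotion].append(text)
                break  # each tweet goes to at most one emotion
    return dict(buckets)


if __name__ == "__main__":
    sample = ["Today I am happy!", "#iamhappy lol", "I feel sad again", "so sad"]
    print(collect(sample, EXAMPLE_SEEDS))
    # prints {'happiness': ['Today I am happy!'], 'sadness': ['I feel sad again']}
```

With `per_class=500` and seeds for all six emotions, the cap mirrors the 500-tweets-per-type design, giving at most 3,000 tweets. Note that a tweet carrying only a hashtag such as `#iamhappy` is never collected, which matches the point that the approach does not depend on hashtags.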
![Page 9: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/9.jpg)
Seeds
Table 1. Example seeds
Annotation
• To verify the utility of this seed-based approach, two college-educated native speakers of Arabic labeled the data.
• Each tweet is labeled with one of four intensity tags: {“no-emotion/zero”, “weak-emotion”, “moderate/fair-emotion”, “strong-emotion”}.
• We measure inter-annotator agreement on these intensity labels using Cohen’s kappa.
• We also calculate the percentage of emotion-carrying tweets per category (i.e., tweets not assigned the label “no-emotion/zero”).
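The two measurements above can be computed as in the following sketch. It assumes each annotator’s labels are stored as a list of tag strings aligned by tweet; `cohens_kappa`, `pct_emotion`, and `avg_pct_emotion` are hypothetical helper names, and averaging the two annotators’ percentages is an assumption about how the “average percentage of emotion” in Table 3 is obtained. The kappa shown is the standard unweighted form, (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from each annotator’s label distribution.

```python
from collections import Counter

NO_EMOTION = "no-emotion/zero"


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label distribution.
    p_e = sum(ca[t] * cb[t] for t in set(ca) | set(cb)) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def pct_emotion(labels):
    """Percentage of tweets not labeled no-emotion/zero."""
    return 100 * sum(t != NO_EMOTION for t in labels) / len(labels)


def avg_pct_emotion(a, b):
    # Assumption: the per-category figure averages the two annotators.
    return (pct_emotion(a) + pct_emotion(b)) / 2


if __name__ == "__main__":
    ann1 = ["strong-emotion", "weak-emotion", NO_EMOTION, "strong-emotion"]
    ann2 = ["strong-emotion", "moderate/fair-emotion", NO_EMOTION, "strong-emotion"]
    print(cohens_kappa(ann1, ann2), avg_pct_emotion(ann1, ann2))
```

Because the four tags are ordered by intensity, a weighted kappa would give partial credit for near-misses such as weak vs. moderate; the slides state only Cohen’s kappa, so the unweighted form is shown here.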
![Page 11: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/11.jpg)
DINA: Agreement & % Emotion
Table 3. Agreement in fine-grained annotation and average percentage of emotion
![Page 12: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/12.jpg)
Gold Labels from Happiness Class
Table 2. Agreement in happiness annotation
![Page 13: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/13.jpg)
Examples: Anger
![Page 14: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/14.jpg)
Examples: Disgust
![Page 15: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/15.jpg)
Examples: Fear
![Page 16: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/16.jpg)
Examples: Happiness
![Page 17: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/17.jpg)
Examples: Sadness
![Page 18: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/18.jpg)
Examples: Surprise
Context of No- and Mixed Emotions
• Even with a list of well-crafted seeds, both annotators assign “no-emotion” to 7.5% of the data.
• This follows from emotion being a pragmatics-level phenomenon: a seed phrase in the text does not guarantee that the author is expressing that emotion.
• Contexts for “no-emotion” include:
  – Reported speech
  – Sarcasm
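The 7.5% figure above counts tweets on which both annotators chose “no-emotion/zero”, not tweets that either annotator marked that way. A minimal sketch of that count, assuming each annotator’s labels are a list of tag strings aligned by tweet; `both_no_emotion_rate` is a hypothetical helper name.

```python
NO_EMOTION = "no-emotion/zero"


def both_no_emotion_rate(a, b):
    """Fraction of tweets that both annotators labeled no-emotion/zero."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    both = sum(x == NO_EMOTION and y == NO_EMOTION for x, y in zip(a, b))
    return both / len(a)


if __name__ == "__main__":
    a = [NO_EMOTION, NO_EMOTION, "weak-emotion", NO_EMOTION]
    b = [NO_EMOTION, "strong-emotion", "weak-emotion", NO_EMOTION]
    print(both_no_emotion_rate(a, b))  # prints 0.5
```

If the rate is taken over all 3,000 tweets, 7.5% corresponds to 0.075 × 3,000 = 225 tweets.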
![Page 20: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/20.jpg)
Reported Speech
![Page 21: P05- DINA: A Multi-Dialect Dataset for Arabic Emotion Analysis](https://reader036.vdocuments.site/reader036/viewer/2022062306/58a3af411a28ab9e6a8b6571/html5/thumbnails/21.jpg)
Sarcasm
Conclusion
• Emotion behaves like other pragmatics-level phenomena; hence a seed-based collection approach is useful, but not perfect.
• Phenomena such as reported speech and sarcasm interact with our emotion data collection method.
• DINA is multidialectal, but the tweets do not carry exact dialect labels.
• DINA currently contains 3,000 tweets, and we plan to grow it.
• A full evaluation of DINA will only be possible once we build models that exploit these data, which we plan to do.