women in isis propaganda: a natural language processing
TRANSCRIPT
Women in ISIS Propaganda: A NaturalLanguage Processing Analysis of Topics andEmotions in a Comparison with Mainstream
Religious Group
Mojtaba Heidarysafa1, Kamran Kowsari1, Tolu Odukoya2, Philip Potter3,Laura E. Barnes1,3, and Donald E. Brown1,3
1 Department of Systems & Information Engineering, University of Virginia,Charlottesville, VA, USA
2 School of Data Science, University of Virginia, Charlottesville, VA, USA3 Department of Politics, University of Virginia, Charlottesville, VA, USA
∗ Co-corresponding authors: [email protected]
Abstract. Online propaganda is central to the recruitment strategies ofextremist groups and in recent years these efforts have increasingly ex-tended to women. To investigate ISIS’ approach to targeting women intheir online propaganda and uncover implications for counterterrorism,we rely on text mining and natural language processing (NLP). Specifi-cally, we extract articles published in Dabiq and Rumiyah (ISIS’s onlineEnglish language publications) to identify prominent topics. To iden-tify similarities or differences between these texts and those producedby non-violent religious groups, we extend the analysis to articles froma Catholic forum dedicated to women. We also perform an emotionalanalysis of both of these resources to better understand the emotionalcomponents of propaganda. We rely on Depechemood (a lexical-baseemotion analysis method) to detect emotions most likely to be evokedin readers of these materials. The findings indicate that the emotionalappeal of ISIS and Catholic materials are similar.
Keywords: ISIS propaganda, Topic modeling, Emotion detection, Nat-ural language processing
1 Introduction
Since its rise in 2013, the Islamic State of Iraq and Syria (ISIS) has utilized theInternet to spread its ideology, radicalize individuals, and recruit them to theircause. In comparison to other Islamic extremist groups, ISIS’ use of technologywas more sophisticated, voluminous, and targeted. For example, during ISIS’advance toward Mosul, ISIS related accounts tweeted some 40,000 tweets inone day [6].However, this heavy engagement forced social media platforms toinstitute policies to prevent unchecked dissemination of terrorist propaganda totheir users, forcing ISIS to adapt to other means to reach their target audience.
arX
iv:1
912.
0380
4v1
[cs
.CL
] 9
Dec
201
9
2 M. Heidarysafa et al.
One such approach was the publication of online magazines in different lan-guages including English. Although discontinued now, these online resources pro-vided a window into ISIS ideology, recruitment, and how they wanted the worldto perceive them. For example, after predominantly recruiting men, ISIS beganto also include articles in their magazines that specifically addressed women.ISIS encouraged women to join the group by either traveling to the caliphate orby carrying out domestic attacks on behalf of ISIS in their respective countries.This tactical change concerned both practitioners and researchers in the coun-terterrorism community. New advancements in data science can shed light onexactly how the targeting of women in extremist propaganda works and whetherit differs significantly from mainstream religious rhetoric.
We utilize natural language processing methods to answer three questions:
– What are the main topics in women-related articles in ISIS’ online maga-zines?
– What similarities and/or differences do these topics have with non-violent,non-Islamic religious material addressed specifically to women?
– What kind of emotions do these articles evoke in their readers and are theresimilarities in the emotions evoked from both ISIS and non-violent religiousmaterials?
As these questions suggest, to understand what, if anything, makes extremist ap-peals distinctive, we need a point of comparison in terms of the outreach effortsto women from a mainstream, non-violent religious group. For this purpose, werely on an online Catholic women’s forum. Comparison between Catholic mate-rial and the content of ISIS’ online magazines allows for novel insight into thedistinctiveness of extremist rhetoric when targeted towards the female popula-tion. To accomplish this task, we employ topic modeling and an unsupervisedemotion detection method.
The rest of the paper is organized as follows: in Section 2, we review relatedworks on ISIS propaganda and applications of natural language methods. Sec-tion 3 describes data collection and pre-processing. Section 4 describes in detailthe approach. Section 5 reports the results, and finally, Section 6 presents theconclusion.
2 Related Work
Soon after ISIS emerged and declared its caliphate, counterterrorism practi-tioners and political science researchers started to turn their attention towardsunderstanding how the group operated. Researchers investigated the origins ofISIS, its leadership, funding, and how they rose became a globally dominant non-state actor [12]. This interest in the organization’s distinctiveness immediatelyled to inquiries into ISIS’ rhetoric, particularly their use of social media andonline resources in recruitment and ideological dissemination. For example, Al-Tamimi examines how ISIS differentiated itself from other jihadist movementsby using social media with unprecedented efficiency to improve its image with
Women in ISIS Propaganda: A Natural Language Processing Analysis 3
locals [2]. One of ISIS’ most impressive applications of its online prowess wasin the recruitment process. The organization has used a variety of materials,especially videos, to recruit both foreign and local fighters. Research shows thatISIS propaganda is designed to portray the organization as a provider of justice,governance, and development in a fashion that resonates with young western-ers [7]. This propaganda machine has become a significant area of research, withscholars such as Winter identifying key themes in it such as brutality, mercy,victimhood, war, belonging and utopianism. [26]. However, there has been in-sufficient attention focused on how these approaches have particularly targetedand impacted women. This is significant given that scholars have identified thedistinctiveness of this population when it comes to nearly all facets of terrorism.
ISIS used different types of media to propagate its messages, such as videos,images, texts, and even music. Twitter was particularly effective and the ArabicTwitter app allowed ISIS to tweet extensively without triggering spam-detectionmechanisms the platform uses [6]. Scholars followed the resulting trove of dataand this became the preeminent way in which they assess ISIS messages. Forexample, in [4] they use both lexical analysis of tweets as well as social networkanalysis to examine ISIS support or opposition on Twitter. Other researchersused data mining techniques to detect pro-ISIS user divergence behavior at vari-ous points in time [18]. By looking at these works, the impact of using text min-ing and lexical analysis to address important questions becomes obvious. Properusage of these tools allows the research community to analyze big chunks of un-structured data. This approach, however, became less productive as the socialmedia networks began cracking down and ISIS recruiters moved off of them.
With their ability to operate freely on social media now curtailed, ISIS re-cruiters and propagandists increased their attentiveness to another longstandingtool–English language online magazines targeting western audiences. Al Hayat,the media wing of ISIS, published multiple online magazines in different lan-guages including English. The English online magazine of ISIS was named Dabiqand first appeared on the dark web on July 2014 and continued publishing for15 issues. This publication was followed by Rumiyah which produced 13 Englishlanguage issues through September 2017. The content of these magazines pro-vides a valuable but underutilized resource for understanding ISIS strategies andhow they appeal to recruits, specifically English-speaking audiences. They alsoprovide a way to compare ISIS’ approach with other radical groups. Ingram com-pared Dabiq contents with Inspire (Al Qaeda publication) and suggested that AlQaeda heavily emphasized identity-choice, while ISIS’ messages were more bal-anced between identity-choice and rational-choice [8]. In another research paper,Wignell et al. [25] compared Dabiq and Rumiah by examining their style andwhat both magazine messages emphasized. Despite the volume of research onthese magazines, only a few researchers used lexical analysis and mostly reliedon experts’ opinions. [23] is one exception to this approach where they used wordfrequency on 11 issues of Dabiq publications and compared attributes such asanger, anxiety, power, motive, etc.
4 M. Heidarysafa et al.
This paper seeks to establish how ISIS specifically tailored propaganda tar-geting western women, who became a particular target for the organization asthe “caliphate” expanded. Although the number of recruits is unknown, in 2015it was estimated that around 10 percent of all western recruits were female [15].Some researchers have attempted to understand how ISIS propaganda targetswomen. Kneip, for example, analyzed women’s desire to join as a form of eman-cipation [9]. We extend that line of inquiry by leveraging technology to answerkey outstanding questions about the targeting of women in ISIS propaganda.
To further assess how ISIS propaganda might affect women, we used emotiondetection methods on these texts. Emotion detection techniques are mostly di-vided into lexicon-base or machine learning-base methods. Lexicon-base methodsrely on several lexicons while machine learning (ML) methods use algorithm todetect the elation of texts as inputs and emotions as the target, usually trained ona large corpus. Unsupervised methods usually use Non-negative matrix factoriza-tion (NMF) and Latent Semantic Analysis (LSA) [5] approaches. An importantdistinction that should be made when using text for emotion detection is thatemotion detected in the text and the emotion evoked in the reader of that textmight differ. In the case of propaganda, it is more desirable to detect possibleemotions that will be evoked in a hypothetical reader. In the next section, wedescribe methods to analyze content and technique to find evoked emotions ina potential reader using available natural language processing tools.
3 Data Collection & Pre-Processing
3.1 Data collection
Finding useful collections of texts where ISIS targets women is a challenging task.Most of the available material are not reflecting ISIS’ official point of view orthey do not talk specifically about women. However, ISIS’ online magazines arevaluable resources for understanding how the organization attempts to appealto western audiences, particularly women. Looking through both Dabiq andRumiyah, many issues of the magazines contain articles specifically addressingwomen, usually with “ to our sisters ” incorporated into the title. Seven out offifteen Dabiq issues and all thirteen issues of Rumiyah contain articles targetingwomen, clearly suggesting an increase in attention to women over time.
We converted all the ISIS magazines to texts using pdf readers and all ar-ticles that addressed women in both magazines (20 articles) were selected forour analysis. To facilitate comparison with a mainstream, non-violent religiousgroup, we collected articles from catholicwomensforum.org, an online resourcecatering to Catholic women. We scrapped 132 articles from this domain. Whilethis number is large, the articles themselves are much shorter than those pub-lished by ISIS. These texts were pre-processed by tokenizing the sentences andeliminating non-word tokens and punctuation marks. Also, all words turned intolower case and numbers and English stop words such as “our, is, did, can, etc.” have been removed from the produced tokens. For the emotion analysis part,we used a spacy library as part of speech tagging to identify the exact role of
Women in ISIS Propaganda: A Natural Language Processing Analysis 5
words in the sentence. A word and its role have been used to look for emotionalvalues of that word in the same role in the sentence.
3.2 Pre-Processing
Text Cleaning and Pre-processing Most text and document datasets con-tain many unnecessary words such as stopwords, misspelling, slang, etc. In manyalgorithms, especially statistical and probabilistic learning algorithms, noise andunnecessary features can have adverse effects on system performance. In thissection, we briefly explain some techniques and methods for text cleaning andpre-processing text datasets [10].
Tokenization Tokenization is a pre-processing method which breaks a streamof text into words, phrases, symbols, or other meaningful elements called to-kens [24]. The main goal of this step is to investigate the words in a sentence [24].Both text classification and text mining requires a parser which processes thetokenization of the documents; for example:sentence [1] :
After sleeping for four hours, he decided to sleep for another four.In this case, the tokens are as follows:
{“After” “sleeping” “for” “four” “hours” “he” “decided” “to” “sleep” “for”“another” “four”}.
Stop words Text and document classification includes many words which donot hold important significance to be used in classification algorithms such as{“a”, “about”, “above”, “across”, “after”, “afterwards”, “again”,. . .}. The mostcommon technique to deal with these words is to remove them from the textsand documents [19].
Term Frequency-Inverse Document Frequency K Sparck Jones [20] pro-posed inverse document frequency (IDF) as a method to be used in conjunctionwith term frequency in order to lessen the effect of implicitly common words inthe corpus. IDF assigns a higher weight to words with either high frequency orlow frequency term in the document. This combination of TF and IDF is wellknown as term frequency-inverse document frequency (tf-idf). The mathemat-ical representation of the weight of a term in a document by tf-idf is given inEquation 1.
W (d, t) = TF (d, t) ∗ log( N
df(t) ) (1)
Here N is the number of documents and df(t) is the number of documentscontaining the term t in the corpus. The first term in Equation 1 improves therecall while the second term improves the precision of the word embedding [22].Although tf-idf tries to overcome the problem of common terms in the document,
6 M. Heidarysafa et al.
it still suffers from some other descriptive limitations. Namely, tf-idf cannotaccount for the similarity between the words in the document since each wordis independently presented as an index. However, with the development of morecomplex models in recent years, new methods, such as word embedding, havebeen presented that can incorporate concepts such as similarity of words andpart of speech tagging.
4 Method
In this section, we describe our methods used for comparing topics and evokedemotions in both ISIS and non-violent religious materials.
4.1 Content Analysis
The key task in comparing ISIS material with that of a non-violent group involvesanalyzing the content of these two corpora to identify the topics. For our analysis,we considered a simple uni-gram model where each word is considered as a singleunit. Understanding what words appear most frequently provides a simple metricfor comparison. To do so we normalized the count of words with the numberof words in each corpora to account for the size of each corpus. It should benoted, however, that a drawback of word frequencies is that there might be somedominant words that will overcome all the other contents without conveyingmuch information.
Topic modeling methods are the more powerful technique for understandingthe contents of a corpus. These methods try to discover abstract topics in acorpus and reveal hidden semantic structures in a collection of documents. Themost popular topic modeling methods use probabilistic approaches such as prob-abilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA).LDA is a generalization of pLSA where documents are considered as a mixtureof topics and the distribution of topics is governed by a Dirichlet prior (α). Fig-ure 1 shows plate notation of general LDA structure where β represents prior ofword distribution per topic and θ refers to topics distribution for documents [3].Since LDA is among the most widely utilized algorithms for topic modeling, weapplied it to our data. However, the coherence of the topics produced by LDAis poorer than expected.
To address this lack of coherence, we applied non-negative matrix factoriza-tion (NMF). This method decomposes the term-document matrix into two non-negative matrices as shown in Figure 2. The resulting non-negative matrices aresuch that their product closely approximate the original data. Mathematicallyspeaking, given an input matrix of document-terms V , NMF finds two matricesby solving the following equation [13]:
minW,H‖V −WH‖F s.t H ≥ 0,W ≥ 0.
Where W is topic-word matrix and H represents topic-document matrix.
Women in ISIS Propaganda: A Natural Language Processing Analysis 7
Fig. 1: Plate notation of LDA model
NMF appears to provide more coherent topic on specific corpora. O’Callaghanet al. compared LDA with NMF and concluded that NMF performs better incorporas with specific and non-mainstream areas [14]. Our findings align withthis assessment and thus our comparison of topics is based on NMF.
4.2 Emotion detection
Propaganda effectiveness hinges on the emotions that it elicits. But detectingemotion in text requires that two essential challenges are overcome.
First, emotions are generally complex and emotional representation mod-els are correspondingly contested. Despite this, some models proposed by psy-chologists have gained wide-spread usage that extends to text-emotion analysis.Robert Plutchik presented a model that arranged emotions from basic to com-plex in a circumplex as shown in Figure 3. The model categorizes emotions into8 main subsets and with addition of intensity and interactions it will classifyemotions into 24 classes [17]. Other models have been developed to capture allemotions by defining a 3-dimensional model of pleasure, arousal, and dominance.
Fig. 2: NMF decomposition of document-term matrix [11]
8 M. Heidarysafa et al.
The second challenge lies in using text for detecting emotion evoked in apotential reader. Common approaches use either lexicon-base methods (such askeyword-based or ontology-based model) or machine learning-base models (usu-ally using large corpus with labeled emotions) [5]. These methods are suited toaddressing the emotion that exist in the text, but in the case of propaganda weare more interested in emotions that are elicited in the reader of such materi-als. The closest analogy to this problem can be found in research that seek tomodel feelings of people after reading a news article. One solution for this typeof problem is to use an approach called Depechemood.
Depechemood is a lexicon-based emotion detection method gathered fromcrowd-annotated news [21]. Drawing on approximately 23.5K documents withaverage of 500 words per document from rappler.com, researchers asked sub-jects to report their emotions after reading each article. They then multipliedthe document-emotion matrix and word-document matrix to derive emotion-word matrix for these words. Due to limitations of their experiment setup, theemotion categories that they present does not exactly match the emotions fromthe Plutchik wheel categories. However, they still provide a good sense of thegeneral feeling of an individual after reading an article. The emotion categories ofDepechemood are: AFRAID, AMUSED, ANGRY, ANNOYED, DON’T CARE,HAPPY, INSPIRED, SAD. Depechemood simply creates dictionaries of wordswhere each word has scores between 0 and 1 for all of these 8 emotion categories.We present our finding using this approach in the result section.
Fig. 3: 2D representation of Plutchik wheel of emotions [16]
Women in ISIS Propaganda: A Natural Language Processing Analysis 9
Table 1: NMF Topics of women in ISIS
earl
yis
lam
wom
enIs
lam
/kh
ilafa
hm
arri
age
isla
mic
pray
ing
wom
en’s
life
hijr
ahis
lam
icdi
vorc
em
othe
rhoo
dsp
ousa
lre
lati
onsh
ipT
opic
0T
opic
1T
opic
2T
opic
3T
opic
4T
opic
5T
opic
6T
opic
7T
opic
8T
opic
91
said
wa
husban
dmasjid
worldly
islamic
wala
mou
rning
said
spou
ses
2kh
adija
hhijrah
wom
anprayer
spen
dstate
said
idda
hibn
said
3mun
afiqin
salla
msharah
said
regards
khila
fah
bara
widow
jihad
husban
d4
iman
alay
hisaid
masajid
wearin
gab
usake
home
repo
rted
backbitin
g5
abu
salla
llhu
wives
home
mat
brothe
rsen
mity
husban
dchild
ren
listening
6ibn
khilfah
wife
going
men
struation
arriv
edka
bibn
charity
wife
7du
nya
abmarry
wom
anprop
het
knew
salam
said
child
abu
8mothe
rsisters
ibn
ibn
life
mujah
idin
remaine
dpe
rfum
eab
udivorce
9pe
ople
radiyallh
umarrie
dprop
het
clothing
previous
prayer
wom
anwealth
word
10asma
ibn
jihd
hadith
small
soon
shariah
away
flock
prob
lems
11hu
sban
dqu
rndu
nyrepo
rted
lived
child
ren
muslim
snigh
tmuslim
instead
12ba
krislamic
say
prevent
time
later
affectio
nwear
cause
backbite
13be
lievers
praise
perm
itted
men
family
inform
edislam
widow
salbu
khari
relatio
nship
14he
arts
state
lawful
default
aishah
hijrah
anger
house
prop
het
tong
ue15
kille
dsaid
proh
ibite
dmuslim
garm
ent
shariah
aqidah
sleep
policy
wom
an16
know
slavegirl
lord
leav
ing
despite
dua
good
marrie
dshep
herd
person
17steadfast
land
islam
abu
follo
wjourne
yrelig
ion
clothing
soul
brothe
r18
sumay
yah
firm
fear
stay
concern
returned
return
abwaging
home
19sisters
peop
leman
wives
living
umm
state
pregna
ntlord
hind
20spread
ing
land
ssister
shariah
food
leave
relativ
eshu
sban
dsmothe
rna
rrated
10 M. Heidarysafa et al.
Table 2: NMF Topics of women in catholic forum
fem
inis
mla
wge
nder
iden
tity
mar
rage
/di
vorc
ech
urch
mot
herh
ood
birt
hco
ntro
llif
ese
xual
ity
pare
ntin
g
Top
ic0
Top
ic1
Top
ic2
Top
ic3
Top
ic4
Top
ic5
Top
ic6
Top
ic7
Top
ic8
Top
ic9
1wom
enab
ortio
ngend
ermarria
gechurch
mom
human
aehu
man
sex
parents
2ab
ortio
ncourt
lgbt
divo
rce
catholic
god
pill
social
sexu
alscho
ol3
pro
constit
utiona
lidentit
yfamily
bishop
smary
vitae
ecology
men
child
ren
4men
circuit
tran
sgen
der
child
ren
pope
chris
tcontraception
work
regn
erus
scho
ols
5life
fede
ral
ideology
marita
lsyno
dmothe
rcontrol
revo
lutio
ncheap
fede
ralist
6feminist
decision
tran
smarria
ges
priests
jesus
birth
peop
leconsent
education
7march
abortio
nssex
divorced
fran
cis
like
wom
enfamily
wom
enstud
ents
8feminists
state
sexu
alspou
ses
wom
enba
bycontraceptives
moral
desire
tran
sgen
der
9wom
anka
nsas
male
annu
lment
rome
love
health
political
porn
child
10feminism
suprem
ereality
families
letter
mothe
rhoo
deff
ects
politics
weinstein
read
ing
11equa
lity
law
female
spou
sechris
tchild
man
date
man
market
kids
12female
med
icaid
activ
ists
marrie
dmccarric
kworld
fertility
dign
itymarria
geyouth
13choice
roe
catholic
aban
done
dvatic
anlife
iud
person
mating
girl
14male
constit
ution
biological
love
bishop
charlie
catholic
ecological
power
family
15movem
ent
judg
edy
spho
riacatholics
holy
child
ren
contraceptive
society
pornograph
ygend
er16
vulnerab
lecase
person
church
god
light
sang
erna
ture
flyconfused
17sex
plan
ned
agen
dafid
elity
faith
ful
son
med
ical
world
good
continue
18toda
ypa
rentho
odpe
ople
olde
rfaith
little
prevent
liberty
like
boy
19tim
erig
htorientation
situations
priestho
odgot
sexu
alself
risk
public
20hu
man
government
repo
rthu
sban
dau
thority
day
free
middle
kind
news
Women in ISIS Propaganda: A Natural Language Processing Analysis 11
Fig. 4: Word frequency of most common words in catholic corpora
Fig. 5: Word frequency of most common words in dabiq corpora
5 Results
In this section, we present the results of our analysis based on the contents of ISISpropaganda materials as compared to articles from the Catholic women forum.We then present the results of emotion analysis conducted on both corpora.
5.1 Content Analysis
After pre-processing the text, both corpora were analyzed for word frequencies.These word frequencies have been normalized by the number of words in eachcorpus. Figure 5 shows the most common words in each of these corpora.
A comparison of common words suggests that those related to marital rela-tionships ( husband, wife, etc.) appear in both corpora, but the religious themeof ISIS material appears to be stronger. A stronger comparison can be made
12 M. Heidarysafa et al.
using topic modeling techniques to discover main topics of these documents. Al-though we used LDA, our results by using NMF outperform LDA topics, dueto the nature of these corpora. Also, fewer numbers of ISIS documents mightcontribute to the comparatively worse performance. Therefore, we present onlyNMF results. Based on their coherence, we selected 10 topics for analyzing withinboth corporas. Table 1 and Table 2 show the most important words in each topicwith a general label that we assigned to the topic manually. Based on the NMFoutput, ISIS articles that address women include topics mainly about Islam,women’s role in early Islam, hijrah (moving to another land), spousal relations,marriage, and motherhood.
The topics generated from the Catholic women forum are clearly quite differ-ent. Some, however, exist in both contexts. More specifically, marriage/divorce,motherhood, and to some extent spousal relations appeared in both generatedtopics. This suggests that when addressing women in a religious context, thesemay be very broadly effective and appeal to the feminine audience. More impor-tantly, suitable topic modeling methods will be able to identify these similaritiesno matter the size of the corpus we are working with. Although, finding the sim-ilarities/differences between topics in these two groups of articles might providesome new insights, we turn to emotional analysis to also compare the emotionsevoked in the audience.
5.2 Emotion Analysis
We rely on Depechemood dictionaries to analyze emotions in both corpora. Thesedictionaries are freely available and come in multiple arrangements. We used aversion that includes words with their part of speech (POS) tags. Only words thatexist in the Depechemood dictionary with the same POS tag are considered forour analysis. We aggregated the score for each word and normalized each articleby emotions. To better compare the result, we added a baseline of 100 randomarticles from a Reuters news dataset as a non-religious general resource whichis available in an NLTK python library. Figure 6 shows the aggregated score fordifferent feelings in our corpora.
Both Catholic and ISIS related materials score the highest in “inspired” cat-egory. Furthermore, in both cases, being afraid has the lowest score. However,this is not the case for random news material such as the Reuters corpus, whichare not that inspiring and, according to this method, seems to cause more fearin their audience. We investigate these results further by looking at the mostinspiring words detected in these two corpora. Table 3 presents 10 words thatare among the most inspiring in both corpora. The comparison of the two listsindicate that the method picks very different words in each corpus to reach tothe same conclusion. Also, we looked at separate articles in each of the issues ofISIS material addressing women. Figure 7 shows emotion scores in each of the20 issues of ISIS propaganda. As demonstrated, in every separate article, thismethod gives the highest score to evoking inspirations in the reader. Also, inmost of these issues the method scored “being afraid” as the lowest score in eachissue.
Women in ISIS Propaganda: A Natural Language Processing Analysis 13
Fig. 6: Comparison of emotions of our both corpora along with Reuters news
6 Conclusion and Future Work
In this paper, we have applied natural language processing methods to ISISpropaganda materials in an attempt to understand these materials using avail-able technologies. We also compared these texts with a non-violent religiousgroups’ (both focusing on women related articles) to examine possible similari-ties or differences in their approaches. To compare the contents, we used wordfrequency and topic modeling with NMF. Also, our results showed that NMFoutperforms LDA due to the niche domain and relatively small number of doc-uments.
The results suggest that certain topics play a particularly important roles inISIS propaganda targeting women. These relate to the role of women in earlyIslam, Islamic ideology, marriage/divorce, motherhood, spousal relationships,and hijrah (moving to a new land).
Comparing these topics with those that appeared on a Catholic women fo-rum, it seems that both ISIS and non-violent groups use topics about moth-erhood, spousal relationship, and marriage/divorce when they address women.Moreover, we used Depechemood methods to analyze the emotions that thesematerials are likely to elicit in readers. The result of our emotion analysis sug-gests that both corpuses used words that aim to inspire readers while avoidingfear. However, the actual words that lead to these effects are very different inthe two contexts. Overall, our findings indicate that, using proper methods, au-tomated analysis of large bodies of textual data can provide novel insight insightinto extremist propaganda that can assist the counterterrorism community.
14 M. Heidarysafa et al.
Fig. 7: Feeling detected in ISIS magazines (first 7 issues belong to Dabiq and last13 belong to Rumiyah)
Table 3: Words with highest inspiring scores
Words
GroupCatholic ISIS
W ord1 avarice uprightnessW ord2 perceptive memorizationW ord3 educationally mercifulW ord4 stereotypically afflictionW ord5 distrustful gentlenessW ord6 reverence masjidW ord7 unbounded verilyW ord8 antichrist sublimityW ord9 loneliness recompenseW ord10 feelings fierceness
References
1. Aggarwal, C.C.: Machine learning for text. Springer (2018)2. Al-Tamimi, A.J.: The dawn of the islamic state of iraq and ash-sham. Current
Trends in Islamist Ideology 16, 5 (2014)3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. Journal of machine
Learning research 3(Jan), 993–1022 (2003)4. Bodine-Baron, E., Helmus, T.C., Magnuson, M., Winkelman, Z.: Examining isis
support and opposition networks on twitter. Tech. rep., RAND Corporation SantaMonica United States (2016)
5. Canales, L., Martínez-Barco, P.: Emotion detection from text: A survey. In: Pro-ceedings of the Workshop on Natural Language Processing in the 5th InformationSystems Research Working Days (JISIC), pp. 37–43 (2014)
6. Farwell, J.P.: The media strategy of isis. Survival 56(6), 49–55 (2014)
Women in ISIS Propaganda: A Natural Language Processing Analysis 15
7. Gates, S., Podder, S.: Social media, recruitment, allegiance and the islamic state.Perspectives on Terrorism 9(4) (2015)
8. Ingram, H.J.: An analysis of inspire and dabiq: Lessons from aqap and islamicstate’s propaganda war. Studies in Conflict & Terrorism 40(5), 357–375 (2017)
9. Kneip, K.: Female jihad–women in the isis. politikon 29 (2016)10. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., Brown,
D.: Text classification algorithms: A survey. Information 10(4), 150 (2019)11. Kuang, D., Brantingham, P.J., Bertozzi, A.L.: Crime topic modeling. Crime Sci-
ence 6(1), 12 (2017)12. Laub, Z., Masters, J.: Islamic state in iraq and greater syria. The Council on
Foreign Relations. June 12 (2014)13. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix fac-
torization. Nature 401(6755), 788 (1999)14. O’callaghan, D., Greene, D., Carthy, J., Cunningham, P.: An analysis of the coher-
ence of descriptors in topic modeling. Expert Systems with Applications 42(13),5645–5657 (2015)
15. Perešin, A.: Fatal attraction: Western muslimas and isis. Perspectives on Terrorism9(3) (2015)
16. Plutchik, R.: A general psychoevolutionary theory of emotion. In: Theories ofemotion, pp. 3–33. Elsevier (1980)
17. Plutchik, R.: The nature of emotions: Human emotions have deep evolutionaryroots, a fact that may explain their complexity and provide tools for clinical prac-tice. American scientist 89(4), 344–350 (2001)
18. Rowe, M., Saif, H.: Mining pro-isis radicalisation signals from social media users.In: Tenth International AAAI Conference on Web and Social Media (2016)
19. Saif, H., Fernández, M., He, Y., Alani, H.: On stopwords, filtering and data sparsityfor sentiment analysis of twitter (2014)
20. Sparck Jones, K.: A statistical interpretation of term specificity and its applicationin retrieval. Journal of documentation 28(1), 11–21 (1972)
21. Staiano, J., Guerini, M.: Depechemood: a lexicon for emotion analysis from crowd-annotated news. arXiv preprint arXiv:1405.1605 (2014)
22. Tokunaga, T., Makoto, I.: Text categorization based on weighted inverse documentfrequency. In: Special Interest Groups and Information Process Society of Japan(SIG-IPSJ. Citeseer (1994)
23. Vergani, M., Bliuc, A.M.: The evolution of the isis’language: a quantitative anal-ysis of the language of the first year of dabiq magazine. Sicurezza, Terrorismo eSocietà= Security, Terrorism and Society 2(2), 7–20 (2015)
24. Verma, T., Renu, R., Gaur, D.: Tokenization and filtering process in rapidminer.International Journal of Applied Information Systems 7(2), 16–18 (2014)
25. Wignell, P., Tan, S., O’Halloran, K., Lange, R.: A mixed methods empirical ex-amination of changes in emphasis and style in the extremist magazines dabiq andrumiyah. Perspectives on Terrorism 11(2), 2–20 (2017)
26. Winter, C.: The Virtual’Caliphate’: Understanding Islamic State’s PropagandaStrategy, vol. 25. Quilliam London (2015)