

Page 1: DETERRENT: Knowledge Guided Graph Attention Network for ...pike.psu.edu/publications/kdd20-deterrent.pdf · Limeng Cui, Haeseung Seo, Maryam Tabar, Fenglong Ma, Suhang Wang, and Dongwon

DETERRENT: Knowledge Guided Graph Attention Network for Detecting Healthcare Misinformation

Limeng Cui, Haeseung Seo, Maryam Tabar, Fenglong Ma, Suhang Wang, and Dongwon Lee
The Pennsylvania State University, PA, USA

{lzc334,hxs378,mfg5544,fenglong,szw494,dongwon}@psu.edu

ABSTRACT
To provide accurate and explainable misinformation detection, it is often useful to take an auxiliary source (e.g., social context or a knowledge base) into consideration. Existing methods use social contexts such as users' engagements as complementary information to improve detection performance and derive explanations. However, due to the lack of sufficient professional knowledge, users seldom respond to healthcare information, which makes these methods less applicable. In this work, to address these shortcomings, we propose a novel knowledge guided graph attention network for better detecting health misinformation. Our proposal, named DETERRENT, leverages the additional information from a medical knowledge graph: it incorporates a Medical Knowledge Graph and an Article-Entity Bipartite Graph, propagates node embeddings through Knowledge Paths, and thereby spreads information along the network. In addition, an attention mechanism is applied to calculate the importance of entities to each article, and the knowledge guided article embeddings are used for misinformation detection. DETERRENT addresses the limitation of social contexts in the healthcare domain and is capable of providing useful explanations for its detection results. Empirical validation using two real-world datasets demonstrates the effectiveness of DETERRENT. Compared with the best results of eight competing methods, in terms of F1 Score, DETERRENT outperforms all methods by at least 4.78% on the diabetes dataset and 12.79% on the cancer dataset. We release the source code of DETERRENT at: https://github.com/cuilimeng/DETERRENT.

KEYWORDS
Healthcare misinformation, fake news, graph neural network, medical knowledge graph

ACM Reference Format:
Limeng Cui, Haeseung Seo, Maryam Tabar, Fenglong Ma, Suhang Wang, and Dongwon Lee. 2020. DETERRENT: Knowledge Guided Graph Attention Network for Detecting Healthcare Misinformation. In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining USB Stick (KDD '20), August 23–27, 2020, Virtual Event, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3394486.3403092

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD '20, August 23–27, 2020, Virtual Event, USA
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7998-4/20/08...$15.00
https://doi.org/10.1145/3394486.3403092

1 INTRODUCTION

[Figure 1 shows three healthcare article excerpts with related triples from a medical knowledge graph:

Fact — "Lower body mass index (BMI) is consistently associated with reduced type II diabetes risk, among people with varied family history, genetic risk factors and weight, according to a new study." Related triples: (BMI, Diagnoses, Diabetes); (Family History, Causes, Diabetes); (Weight Gain, CreatesRiskFor, Diabetes).

Misinformation — "Besides chemicals, cancer loves sugar. A study at the University of Melbourne, Australia discovered a strong correlation between sugary soft drinks and 11 different kinds of cancer, including pancreatic, liver, kidney, and colorectal." Related triples: (Sugar, CreatesRiskFor, Nonalcoholic Fatty Liver Disease); (Liver Diseases, CreatesRiskFor, Liver Cancer).

Misinformation — "Herbal supplement found to be more effective at managing diabetes than metformin drug." Related triples: (Herbal Supplement, DoesNotHeal, Diabetes); (Metformin, Heals, Diabetes).]

Figure 1: Healthcare article examples and related triples from a medical knowledge graph (KG). The triples can either strengthen or weaken the arguments in the articles.

The popularity of online social networks has promoted the growth of various applications and information, which also enables users to browse and publish such information more freely. In the healthcare domain, patients often browse the Internet looking for information about illnesses and symptoms. For example, nearly 65% of Internet users use the Internet to search for healthcare-related topics [25]. However, the quality of online healthcare information is questionable. Many studies [12, 32] have confirmed the existence and the spread of healthcare misinformation. For example, a study of three health social networking websites found that 54% of posts contained medical claims that were inaccurate or incomplete [38].

Healthcare misinformation has detrimental societal effects. First, the community's trust in and support for public health agencies is undermined by misinformation, which could hinder public health control. For example, the rapid spread of misinformation is undermining trust in vaccines crucial to public health¹. Second, health rumors that circulate on social media can directly threaten public health. During the 2014 Ebola outbreak, the World Health Organization (WHO) noted that misinformation on social media about certain products that could supposedly prevent or cure the Ebola virus disease had led to deaths². Thus, detecting healthcare misinformation is critically important.

Though misinformation detection in other domains such as politics and gossip has been extensively studied [1, 26, 29], healthcare misinformation detection has its unique properties and challenges. First, as non-health professionals tend to rely on whatever health information they are given, it is difficult for them to discern information correctly, especially when the misinformation was intentionally

¹ https://www.nature.com/articles/d41586-018-07034-4
² https://www.who.int/mediacentre/news/ebola/15-august-2014/en/


made to target such people. Existing misinformation detection for domains such as politics and gossip usually adopts social contexts such as user comments to provide auxiliary information for detection [8, 13, 16, 36, 39]. However, in the healthcare domain, social context information is not always available and may not be useful, because users without professional knowledge seldom respond to healthcare information and cannot give accurate comments. Second, despite the good performance of existing misinformation detection methods [42], the majority of them cannot explain why a piece of information is classified as misinformation. Without a proper explanation, users who have no health expertise might not accept the result of the detection. To convince them, it is necessary to offer an understandable explanation of why certain information is unreliable. Therefore, we need auxiliary information that can (1) help detect healthcare misinformation; and (2) provide easy-to-understand professional knowledge for an explanation.

A medical knowledge graph, which is constructed from research papers and reports, can serve as an effective auxiliary source for healthcare misinformation detection: it reveals the inherent relations between entities in texts, which improves detection performance and provides explanations. In particular, we take the article-entity bipartite graph and the medical knowledge graph into consideration as complementary information to facilitate a detection model (see Figure 1). First, article contents contain linguistic features that could be used to verify the truthfulness of an article. Misinformation (including hoaxes, rumors and fake news) is intentionally written to mislead readers through verbal exaggeration and sensationalization.

For example, we can infer from a medical knowledge graph that Sugar is not directly linked to Liver Cancer; however, the misinformation claims that there is a "strong correlation" between the two entities. Second, the relation triples from a medical knowledge graph can strengthen or weaken the credibility of certain information and provide explanations for the detection results. For example, in Figure 1, the triple (BMI, Diagnoses, Diabetes) and two more triples can directly verify that the article is real, while the triple (Herbal Supplement, DoesNotHeal, Diabetes) can show that the claim in the article is wrong. Overall, it is beneficial to exploit the medical knowledge graph for healthcare misinformation detection, and to the best of our knowledge, there is no prior attempt to detect healthcare misinformation by exploiting a knowledge graph.

Therefore, in this paper, we study the novel problem of explainable healthcare misinformation detection by leveraging a medical knowledge graph. Modeling the medical knowledge graph together with healthcare articles is a non-trivial task. On the one hand, healthcare texts and a medical knowledge graph cannot be directly integrated, as they have different data structures. On the other hand, conventional social network analysis techniques are not applicable to the medical knowledge graph. For example, recommendation systems would recommend movies to users who watched a similar set of movies; in the healthcare domain, however, two medications are not necessarily related even if they can heal the same disease.

To address the above two issues, we propose a knowledge guided graph attention network that can better capture the crucial entities in news articles and guide the article embedding. We incorporate the Article-Entity Bipartite Graph and a Medical Knowledge Graph into a unified relational graph and compute node embeddings along the graph. We use Node-level Attention and the BPR loss [30] to tackle the positive and negative relations in the graph. The main contributions of the paper include:

• We study a novel problem of explainable healthcare misinformation detection by leveraging a medical knowledge graph to better capture the high-order relations between entities;
• We propose a novel method, DETERRENT (knowleDgE guided graph aTtention nEtwoRks foR hEalthcare misiNformation deTection), which characterizes multiple positive and negative relations in the medical knowledge graph under a relational graph attention network; and
• We manually build two healthcare misinformation datasets on diabetes and cancer. Extensive experiments have demonstrated the effectiveness of DETERRENT. The reported results show that DETERRENT achieves relative improvements of 1.05% and 4.78% on the Diabetes dataset and 6.30% and 12.79% on the Cancer dataset compared with the best baseline results, in terms of Accuracy and F1 Score, respectively. The case study shows the interpretability of DETERRENT.

2 RELATED WORK
In this section, we briefly review two related topics: misinformation detection and graph neural networks.

Misinformation Detection. Misinformation detection methods generally focus on using article contents and external sources. Article contents contain linguistic clues and visual factors that can differentiate fake and real information. Linguistic-feature-based methods check the consistency between headlines and contents [4], or capture the specific writing styles and sensational headlines that commonly occur in fake content [28]. Visual-based features can work with linguistic features to identify fake images [42], and help to detect misinformation collectively [9, 13].

For external-source-based approaches, the features are mainly context-based. Context-based features represent the information of users' engagements on online social media. Users' responses in terms of credibility [31], viewpoints [36] and emotional signals [9] are beneficial for detecting misinformation. The diffusion network constructed from users' posts can reveal the differences in the spread of truth and falsity [41]. However, users' engagements are not always available when a news article has just been released, or users may lack professional knowledge of relevant fields such as medicine. A knowledge graph (KG) can address these disadvantages of methods that rely on social context, and can derive explanations for the detection results. Some researchers use knowledge graph based methods to decide and explain whether a (Subject, Predicate, Object) triple is fake or not [7, 15, 17]. These methods use a score function to measure the relevance of the subject and object embeddings with respect to the embedding of the predicate. For example, KG-Miner exploits frequent predicate paths between a pair of entities [35]. Other researchers use news streams to update the knowledge graph [37].

Hence, in this paper, we study the novel problem of knowledge guided misinformation detection, aiming to improve misinformation detection performance in healthcare while simultaneously providing a possible interpretation of the detection result.

Graph Neural Networks. Graph Neural Networks (GNNs) refer to neural network models that can be applied to graph-structured


data. Several extensions to GNNs [33] have been proposed to enhance their capability and efficiency [43]. GCN [20] attempts to learn node embeddings in a semi-supervised fashion using a different neighborhood aggregation method. GAT [40] extends GNNs by incorporating the attention mechanism. R-GCN [34] is a variant of GCN suitable for relational data. RGAT [6] takes advantage of both the attention mechanism and R-GCN, extending the attention mechanism to relational graph-structured data. Signed networks are variants of GNNs applicable to signed graphs with negative and positive edges [10, 22, 23].

However, existing methods are not suitable for modeling the positive and negative relations in the medical knowledge graph, as mentioned in the introduction. In this work, we model the medical knowledge graph under a relational graph attention network, and use the BPR loss to capture positive and negative relations.

3 PROBLEM FORMULATION
In this section, we describe the notations and formulate the medical knowledge graph guided misinformation detection problem. The medical knowledge graph describes the entities collected from the medical literature, as well as positive/negative relations (e.g., Heals/DoesNotHeal) among entities. For example, (Calcium Chloride, Heals, Hypocalcemia) contains a positive relationship, but (Actonel, DoesNotHeal, Hypocalcemia) has a negative relationship.

Definition 1 (Medical Knowledge Graph). Let $\mathcal{G}_m = \{\mathcal{E}, \mathcal{R}, \mathcal{R}^-, \mathcal{T}, \mathcal{T}^-\}$ be a knowledge graph, where $\mathcal{E}$, $\mathcal{R}$, $\mathcal{R}^-$, $\mathcal{T}$ and $\mathcal{T}^-$ are the entity set, positive relation set, negative relation set, positive subject-relation-object triple set and negative triple set, respectively. The positive triples are represented as $\{(e_i, r, e_j) \mid e_i, e_j \in \mathcal{E}, r \in \mathcal{R}\}$, where each triple describes a relationship $r$ from the head node $e_i$ to the tail node $e_j$. Similarly, negative triples are represented as $\{(e_i, r, e_j) \mid e_i, e_j \in \mathcal{E}, r \in \mathcal{R}^-\}$.

We denote $\mathcal{D}$ as the health-related article set. Each article $S \in \mathcal{D}$ contains $|S|$ words, $S = \{w_1, w_2, \ldots, w_{|S|}\}$. We perform entity linking to build the word-entity alignment set $\{(w, e) \mid w \in \mathcal{V}, e \in \mathcal{E}\}$, where $(w, e)$ means that word $w$ in the vocabulary $\mathcal{V}$ can be linked to an entity $e$ in the entity set. To capture the relationships between articles and entities in a medical knowledge graph, we define the article-entity bipartite graph as follows.

Definition 2 (Article-Entity Bipartite Graph). The article-entity bipartite graph is denoted as $\mathcal{G}_{ae} = (\mathcal{D} \cup \mathcal{E}, \mathcal{L})$, where $\mathcal{L}$ is the set of links. A link is denoted as $\{(S, Has, e) \mid S \in \mathcal{D}, e \in \mathcal{E}\}$. If an article $S$ contains a word that can be linked to entity $e$, there is a link "$Has$" between them; otherwise there is none.

Exploiting the knowledge paths between entities is of great importance. Here we formally define the knowledge path.

Definition 3 (Knowledge Path). A knowledge path between entity $e_1$ and $e_k$ is denoted as $e_1, r_1, e_2, r_2, \ldots, r_{k-1}, e_k$, where $e_k \in \mathcal{E}$, $r_k \in \mathcal{R}$ and $(e_{k-1}, r_{k-1}, e_k) \in \mathcal{T}$.

Consider such a knowledge path: $e_1, r_1, e_2, r_2, e_3$, whose two relations are (Diabetes, CreatesRiskFor, Kidney Disease) and (Kidney Disease, Causes, Edema). The two relations build a path between "diabetes" and "edema", which implies a potential link between the two disorders. Such a knowledge path can add credibility to an article mentioning these two disorders. Conversely, if two words are not reachable in a knowledge graph, the two words are largely irrelevant, which reduces the credibility of related articles. For example, although "bipolar disorder" and "fenofibrate" may both be causes of "diabetes", there is no strong connection between the two entities themselves from a medical perspective. However, existing text classification methods regard "bipolar disorder" and "fenofibrate" as related because they both frequently co-occur with "diabetes". Hence, we argue that considering knowledge paths between words through a knowledge graph can provide medical evidence for healthcare misinformation detection, which yields higher detection accuracy.
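As an illustration of Definitions 1-3, the sketch below builds a toy positive-triple set, derives "Has" links for the article-entity bipartite graph with naive substring matching (real entity linking is far more involved), and finds a knowledge path by breadth-first search. All entity names and helper functions here are hypothetical examples, not the paper's actual data or code.

```python
from collections import defaultdict, deque

# Toy positive triples (Definition 1); illustrative only.
positive_triples = [
    ("Diabetes", "CreatesRiskFor", "Kidney Disease"),
    ("Kidney Disease", "Causes", "Edema"),
    ("Metformin", "Heals", "Diabetes"),
]

def build_has_links(articles, entities):
    """Article-Entity Bipartite Graph links (Definition 2): connect an
    article to every entity whose name appears in its text."""
    links = []
    for aid, text in articles.items():
        for e in entities:
            if e.lower() in text.lower():
                links.append((aid, "Has", e))
    return links

def knowledge_path(triples, src, dst):
    """Knowledge Path (Definition 3): BFS over positive triples, returning
    the alternating entity/relation sequence e1, r1, e2, ..., ek."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((r, t))
    queue, seen = deque([(src, [src])]), {src}
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        for r, t in adj[node]:
            if t not in seen:
                seen.add(t)
                queue.append((t, path + [r, t]))
    return None  # unreachable: the entities are likely unrelated
```

For instance, `knowledge_path(positive_triples, "Diabetes", "Edema")` recovers the Diabetes-Kidney Disease-Edema path discussed above, while an unreachable pair returns `None`, mirroring the "largely irrelevant" case.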

With the above notations and definitions, we formulate the knowledge guided misinformation detection task as follows:

Problem 1 (Medical Knowledge Graph Guided Misinformation Detection). Given a set of healthcare articles $\mathcal{D}$, their corresponding label set $\mathcal{Y}$, and the medical knowledge graph $\mathcal{G}$, the goal is to learn a prediction function $f$ to distinguish whether a news article is fake.

4 METHODOLOGY
Our proposed framework consists of three components, as shown in Figure 2: 1) an information propagation net, which propagates the knowledge between articles and nodes while preserving the structure of the KG; 2) knowledge aware attention, which learns the weights of a node's neighbors in the KG and aggregates the information from the neighbors and an article's contextual information to update its representation; and 3) a prediction layer, which takes an article's representation as input and outputs a predicted label. Next, we introduce the details of each component.

4.1 Information Propagation Net
The medical knowledge graph can provide medical evidence for healthcare misinformation detection. To fully utilize it, motivated by previous work [34, 37], we leverage the inherent directional structure of the medical database to learn the entity embeddings. To propagate information from the knowledge graph to the articles, we incorporate the Article-Entity Bipartite Graph and the Medical Knowledge Graph into a unified relational graph, and add a set of self-loops (edge type 0) denoted as $\mathcal{A} = \{(e_i, 0, e_i) \mid e_i \in \mathcal{E}\}$, which allows the state of a node to be kept. Hence, the new graph is defined as $\mathcal{G} = \{\mathcal{E}', \mathcal{R}', \mathcal{R}^-, \mathcal{T}', \mathcal{T}^-\}$, where $\mathcal{E}' = \mathcal{E} \cup \mathcal{D}$, $\mathcal{R}' = \mathcal{R} \cup \mathcal{R}^- \cup \{Has, 0\}$ and $\mathcal{T}' = \mathcal{T} \cup \mathcal{T}^- \cup \mathcal{L} \cup \mathcal{A}$.

Information Propagation: As there are multiple relations in the graph, we use R-GCN [34] to model the relational data, as it is very effective for multi-relational graphs. In R-GCN, each node $i$ is assigned an initial representation $\mathbf{h}_i^{(0)}$. The layer-wise propagation rule updates the node representation in the $(l+1)$-th layer using the representations of its neighbors in the graph, yielding the representation $\mathbf{h}_i^{(l+1)}$ as follows:

$$\mathbf{h}_i^{(l+1)} = \sigma\Bigg( \sum_{r \in \mathcal{R}'} \sum_{(j,r,i) \in \mathcal{T}'} \frac{1}{c_{i,r}} \mathbf{W}_r \mathbf{h}_j^{(l)} \Bigg) \quad (1)$$
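A minimal numpy sketch of the propagation rule in Eq. (1), assuming dense per-relation weight matrices and an edge list per relation (a real implementation would use sparse operations; the function name and data layout are illustrative):

```python
import numpy as np

def rgcn_layer(h, triples_by_rel, W, leaky=0.01):
    """One R-GCN propagation step in the spirit of Eq. (1).

    h:              (n, d) node representations h^{(l)}
    triples_by_rel: {relation: list of (j, i) edges, messages flow j -> i}
    W:              {relation: (d, d) weight matrix W_r}
    """
    n, d = h.shape
    out = np.zeros((n, d))
    for r, edges in triples_by_rel.items():
        # c_{i,r}: number of in-neighbors of i under relation r
        counts = np.zeros(n)
        for j, i in edges:
            counts[i] += 1
        # normalized, relation-specific message passing
        for j, i in edges:
            out[i] += (W[r] @ h[j]) / counts[i]
    # LeakyReLU activation, as used in the paper
    return np.where(out > 0, out, leaky * out)
```

Self-loops (edge type 0) are handled by simply including them as one more relation in `triples_by_rel`, matching the unified graph construction above.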


[Figure 2 shows a Text Encoder (a BiGRU over word embeddings $\mathbf{v}_t$ with hidden states $\mathbf{s}_t$ and attention weights $\beta_t$ producing the article vector $\mathbf{c}$), the Knowledge Guided Embedding Layers producing the prediction $\hat{y}$, and the Information Propagation Net over the unified Article-Entity Bipartite Graph and Medical Knowledge Graph, with relations Has, Causes, CreatesRiskFor, Heals, and DoesNotHeal, and example entities such as Diabetes, Kidney Diseases, Obesity, Edema, Herbal Supplement, and Menopausal Symptoms.]

Figure 2: Illustration of the proposed DETERRENT model. The left subfigure shows the Knowledge Guided Embedding Layers of DETERRENT, and the right subfigure presents the Information Propagation Net of DETERRENT. The Information Propagation Net is performed on the unified graph of the Article-Entity Bipartite Graph and the Medical Knowledge Graph, which has positive (in black) and negative (in red) relations.

where $c_{i,r}$ is a normalization factor, usually set to the number of neighbors of node $i \in \mathcal{E}'$ under relation $r \in \mathcal{R}'$, $\mathbf{W}_r$ is a learnable edge-type-dependent weight matrix, and $\sigma(\cdot)$ denotes an activation function (we use LeakyReLU in this paper).

Node-level Attention: Each entity has relations with multiple entities, but not all relations are equally important for the healthcare misinformation detection problem: each neighbor contributes differently to the node representation. Thus, we introduce an attention mechanism into the information propagation of Eq. (1) to assign more weight to important neighboring nodes, and the node representation is computed as the weighted sum of its neighbors':

$$\mathbf{h}_i^{(l+1)} = \sigma\Bigg( \sum_{r \in \mathcal{R}'} \sum_{(j,r,i) \in \mathcal{T}'} \alpha_{ij}^{r} \mathbf{W}_r \mathbf{h}_j^{(l)} \Bigg) \quad (2)$$

where $\alpha_{ij}^{r}$ measures the importance of neighbor $j$ to node $i$, and is calculated as follows:

$$\mathbf{u}_{ij}^{r} = \mathbf{W}_r \big( \mathbf{h}_i^{(l)} \,\|\, \mathbf{h}_j^{(l)} \big), \qquad \alpha_{ij}^{r} = \frac{\exp\big(\mathbf{a}_r \mathbf{u}_{ij}^{r}\big)}{\sum_{(k,r,i) \in \mathcal{T}'} \exp\big(\mathbf{a}_r \mathbf{u}_{ik}^{r}\big)} \quad (3)$$

where $\mathbf{a}_r$ is a learnable parameter that weighs the different feature dimensions of the node representation.
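The attention coefficients of Eq. (3) for a single target node under one relation can be sketched as follows. This is a simplification: in the model the projection is shared with the propagation step and trained end-to-end, whereas here `W_att` and `a_r` are illustrative standalone parameters.

```python
import numpy as np

def attention_weights(h, i, neighbors, W_att, a_r):
    """Softmax-normalized scores alpha_{ij}^r over the neighbors j of a
    target node i under one relation r, in the spirit of Eq. (3).

    h:         (n, d) node representations
    neighbors: indices j with (j, r, i) in T'
    W_att:     (d, 2d) weight applied to the concatenation h_i || h_j
    a_r:       (d,) learnable scoring vector
    """
    scores = np.array([a_r @ (W_att @ np.concatenate([h[i], h[j]]))
                       for j in neighbors])      # a_r u_{ij}^r
    e = np.exp(scores - scores.max())            # numerically stable softmax
    return e / e.sum()
```

The resulting weights replace the uniform factor $1/c_{i,r}$ of Eq. (1) when aggregating neighbor messages in Eq. (2).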

An issue with Eq. (2) is that, with an increasing number of relation types, the model quickly becomes over-parameterized. To alleviate this problem, we apply Basis Decomposition [34] for regularization. This approach decomposes each weight matrix into a linear combination of several basis matrices, which largely decreases the number of model parameters.

Modeling Negative Relations: Since negative relations have different effects on the target node compared with positive relations, they should be treated separately. For example, consider the following three positive triples among four entities in a medical knowledge graph: 1) Calcitriol can heal Calcium Deficiency; 2) Actonel can heal Calcium Deficiency; and 3) Calcitriol can alleviate Hypocalcemia. Intuitively, we might infer that Actonel is a potential treatment for Hypocalcemia. However, a negative triple in the medical knowledge graph indicates that Actonel does not heal Hypocalcemia. Although this fact overrides our guess, it is explainable medically: both Calcitriol and Actonel can treat Calcium Deficiency, but their active ingredients are Vitamin D and Risedronate, respectively, and the Vitamin D in Calcitriol can alleviate Hypocalcemia while Risedronate cannot. Thus, when modeling the graph, we want the discrepancy between the two entities in a negative triple to be larger than that in a positive triple. To achieve this goal, we choose the BPR loss [30], which is commonly used in recommendation systems to maximize the difference between the scores of positive and negative samples. Hence, we first take the inner product of entity representations as the matching score:

$$m_{ij} = \mathbf{h}_j^{\top} \big( \mathbf{W}_r \mathbf{h}_i \big) \quad (4)$$

where $\mathbf{h}_i$ and $\mathbf{h}_j$ are the representations of entities $e_i$ and $e_j$ under relation $r$ in each layer. Then we use the BPR loss to penalize the scores of two entities in a negative triple:

$$\mathcal{L}_k = \sum_{\substack{(e_j, r, e_i) \in \mathcal{T}' \\ (e_k, r, e_i) \in \mathcal{T}^-}} -\ln \sigma\big( m_{ij} - m_{ik} \big) \quad (5)$$

where $\sigma(\cdot)$ is the sigmoid function.

It is worth noting that signed GCNs [10] use balance theory [14] from social psychology to deal with negative relations in GCNs. Balance theory suggests a positive relationship between two nodes if there exists a knowledge path between them with an even number of negative relations (e.g., "the enemy of my enemy is my friend"). However, these methods cannot be used to model the medical knowledge graph due to the complexity of the entities (medications and diagnoses). Distinct from existing methods, our model makes a soft assumption about negative relations, which does not require the graph to be balanced.
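The matching score of Eq. (4) and the pairwise BPR objective of Eq. (5) can be sketched as below; for simplicity the loss takes precomputed score pairs for positive and negative triples that share the target entity (the function names are illustrative):

```python
import numpy as np

def match_score(h_i, h_j, W_r):
    """Matching score m_{ij} = h_j^T (W_r h_i) of Eq. (4)."""
    return h_j @ (W_r @ h_i)

def bpr_loss(score_pairs):
    """BPR loss in the spirit of Eq. (5): each pair (m_pos, m_neg) holds
    the matching scores of a positive triple (e_j, r, e_i) and a negative
    triple (e_k, r, e_i); the loss pushes m_pos above m_neg."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return float(sum(-np.log(sigmoid(mp - mn)) for mp, mn in score_pairs))
```

A wider positive-negative margin yields a smaller loss, so training enlarges the discrepancy between the entities of a negative triple relative to a positive one, which is exactly the soft assumption described above.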

4.2 Knowledge Guided Embedding Layers
After going through the Information Propagation Net, we obtain the neighboring attention weights of nodes (including articles). In this section, we propose Knowledge Guided Embedding Layers, which use the relevance scores of entities to an article to guide the embedding of the article.

Text Encoder: To fully capture the contextual information of an article, we use a BiGRU [3] to encode the word sequence from both directions. To be specific, given the word embeddings


$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{|S|}\}$ of an article $S$, the hidden states are computed as below:

$$\overrightarrow{\mathbf{s}}_t = \mathrm{GRU}(\overrightarrow{\mathbf{s}}_{t-1}, \mathbf{v}_t), \qquad \overleftarrow{\mathbf{s}}_t = \mathrm{GRU}(\overleftarrow{\mathbf{s}}_{t-1}, \mathbf{v}_t) \quad (6)$$

We concatenate the forward hidden state $\overrightarrow{\mathbf{s}}_t$ and the backward hidden state $\overleftarrow{\mathbf{s}}_t$ as $\mathbf{s}_t = [\overrightarrow{\mathbf{s}}_t, \overleftarrow{\mathbf{s}}_t]$, which captures the contextual information of the article centered around word $\mathbf{v}_t$.

Since not all words equally contribute to the semantic represen-tation of the article, we leverage the attention mechanism to learnthe weights to measure the importance of each word, and computethe article representation vector as follows:

c = Σ_{t=1}^{|S|} β_t s_t    (7)

where β_t measures the importance of the t-th word for the article and is calculated as follows:

u_t = tanh(W_c s_t + b_c),    β_t = exp(u_tᵀ g) / Σ_{k=1}^{|S|} exp(u_kᵀ g)    (8)

where u_t is a hidden representation obtained by feeding the hidden state s_t to a fully connected layer, and g is a trainable parameter that guides the extraction of context.

Knowledge Guided Attention: To incorporate knowledge guidance into the textual information, we replace g in Eq. 8 with g′ to obtain the final attention function:

g′ = 𝛾g + (1 − 𝛾)W𝑘h𝑠 (9)

where h_s is the node embedding of article S obtained from the Information Propagation Net, W_k is a learnable transformation matrix, and γ ∈ [0, 1] is a trade-off parameter that controls the relative importance of the two terms. If we set γ = 1, then g′ degenerates to g and our framework degenerates to a text classifier without information from the medical knowledge graph; this makes it easy to pre-train the model to obtain good word embeddings for misinformation detection. The updated context vector g′ takes both the linguistic features from the BiGRU and the knowledge guidance into consideration, as the Information Propagation Net propagates more information among similar entities and articles through the knowledge paths. We then use the attention scores β_t to compute the article representation vector c via Eq. 7.
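The attention pooling of Eqs. 7–9 can be sketched as follows. This is a minimal numpy illustration under stated assumptions, not the Keras implementation: the BiGRU of Eq. 6 is abstracted away (the hidden states `S` are taken as given), and the weight names (`Wc`, `bc`, `g`, `Wk`) simply mirror the symbols in the equations above.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def article_embedding(S, Wc, bc, g, h_s, Wk, gamma):
    """Knowledge guided attention pooling (Eqs. 7-9).

    S:   (|S|, 2d) matrix of BiGRU hidden states s_t (Eq. 6, precomputed)
    h_s: node embedding of the article from the Information Propagation Net
    gamma=1 recovers the plain text attention of Eq. 8 (no knowledge guidance).
    """
    g_prime = gamma * g + (1.0 - gamma) * Wk @ h_s   # Eq. (9)
    u = np.tanh(S @ Wc.T + bc)                       # Eq. (8): hidden reps u_t
    beta = softmax(u @ g_prime)                      # attention over words
    return beta @ S, beta                            # Eq. (7): c = sum_t beta_t s_t
```

Setting `gamma=1` makes the article embedding independent of `h_s`, which matches the pre-training regime described above.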

4.3 Model Prediction

We have described how to encode article contents with knowledge guidance. We feed the resulting embeddings into a softmax layer for misinformation classification as follows:

ŷ = Softmax(W_f c + b_f)    (10)

where ŷ is the predicted probability of the article being fake. For each article, our goal is to minimize the cross-entropy loss:

L_d = −y log ŷ − (1 − y) log(1 − ŷ)    (11)

where y ∈ {0, 1} is the ground-truth label: 0 for fact and 1 for misinformation.

Table 1: Statistics of datasets

Disease            Diabetes   Cancer

# Misinformation      608      1,476
# Fact              1,661      4,623
# Entities          1,932      2,873
# Relations        22,685     28,391

4.4 Training and Inference with DETERRENT

Finally, we combine the detection objective with the BPR loss to form the final objective function:

L_final = L_k + L_d + η‖Θ‖²₂    (12)

where Θ denotes the model parameters and η is a regularization factor. During training, we optimize L_k and L_d alternately, using Adam [19] for both the embedding loss and the prediction loss. Adam is a widely used optimizer that computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
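The alternating scheme can be sketched as follows. This is a simplified stand-in, not the paper's training code: plain gradient descent replaces Adam, the parameters are collapsed to a single array, and `grad_k`/`grad_d` are hypothetical callables standing in for the gradients of L_k and L_d.

```python
def train_alternating(theta, grad_k, grad_d, lr=0.01, eta=0.05, steps=100):
    """Alternately step on L_k and L_d from Eq. (12), each with L2 regularization.

    grad_k / grad_d: callables returning the gradient of L_k / L_d at theta.
    eta is the regularization factor; Adam is replaced by plain SGD for brevity.
    """
    for _ in range(steps):
        theta = theta - lr * (grad_k(theta) + 2 * eta * theta)  # L_k step
        theta = theta - lr * (grad_d(theta) + 2 * eta * theta)  # L_d step
    return theta
```

In the paper's setting the two steps update shared embeddings, so each loss regularizes the other rather than fully converging on its own.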

5 EXPERIMENTS

In this section, we present experiments to evaluate the effectiveness of DETERRENT. Specifically, we aim to answer the following evaluation questions:
• RQ1: Is DETERRENT able to improve misinformation classification performance by incorporating the medical knowledge graph?
• RQ2: How effective are the knowledge graph and the knowledge-aware attention, respectively, in improving the misinformation detection performance of DETERRENT?
• RQ3: Can DETERRENT provide reasonable explanations of its misinformation detection results?

Next, we first introduce the datasets and baselines, followed by experiments to answer these questions.

5.1 Datasets

As the medical knowledge graph, we use the public medical knowledge graph KnowLife³ [11], which contains 25,334 entity names and 591,171 triples. We extract six positive relations (Causes, Heals, CreatesRiskFor, ReducesRiskFor, Alleviates, Aggravates) and four negative relations (DoesNotCause, DoesNotHeal, DoesNotCreateRiskFor, DoesNotReduceRiskFor).
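The split into positive and negative relations amounts to a simple filter over KnowLife triples. A sketch (`sign_of` is an illustrative helper, not part of the released code; relation names are as listed above):

```python
# The six positive and four negative KnowLife relations kept by DETERRENT.
POSITIVE = {"Causes", "Heals", "CreatesRiskFor", "ReducesRiskFor",
            "Alleviates", "Aggravates"}
NEGATIVE = {"DoesNotCause", "DoesNotHeal", "DoesNotCreateRiskFor",
            "DoesNotReduceRiskFor"}

def sign_of(triple):
    """Return +1 / -1 for kept relations, 0 for relations discarded
    when building the signed knowledge graph."""
    _, relation, _ = triple
    if relation in POSITIVE:
        return 1
    if relation in NEGATIVE:
        return -1
    return 0
```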

To evaluate the performance of DETERRENT, we need a reasonably sized labeled collection of health-related articles covering several diseases. Unfortunately, no available dataset is of adequate size, so we collected a health-related article dataset spanning the years 2014 to 2019.

To gather real articles, we crawled 7 media outlets that have been cross-checked as reliable, e.g., Healthline, ScienceDaily, NIH (National Institutes of Health), MNT (Medical News Today), Mayo Clinic, Cleveland Clinic, and WebMD. For misinformation, we crawled verified health misinformation from Snopes.com and the Hoaxy API, a popular hoax-debunking site and web tool, respectively. The detailed statistics of the datasets are shown in Table 1.

³http://knowlife.mpi-inf.mpg.de/


Table 2: Performance comparison on the Diabetes and Cancer datasets. DETERRENT outperforms all state-of-the-art baselines, including knowledge graph-based and article content-based methods.

Datasets  Metric     KG-Miner  TransE  text-CNN  CSI\c   dEFEND\c  GUpdater  HGAT    DETERRENT

Diabetes  Accuracy   0.7601    0.7671  0.7566    0.8359  0.9101    0.9012    0.8888  0.9206
          Precision  0.5398    0.5963  0.5563    0.6847  0.9793    0.9687    0.7730  0.8445
          Recall     0.6333    0.4248  0.4836    0.7826  0.6597    0.6369    0.8289  0.8503
          F1 Score   0.5828    0.4961  0.5174    0.7304  0.7883    0.7685    0.7996  0.8474

Cancer    Accuracy   0.8051    0.8536  0.8812    0.8982  0.8969    0.9022    0.8608  0.9652
          Precision  0.5790    0.6455  0.8531    0.7900  0.8847    0.7868    0.7226  0.9469
          Recall     0.7365    0.8125  0.5988    0.8165  0.6538    0.8147    0.7338  0.9153
          F1 Score   0.6485    0.7195  0.7037    0.8030  0.7519    0.8005    0.7282  0.9309

5.2 Baselines

We compare DETERRENT with representative and state-of-the-art misinformation detection algorithms, listed as follows:

• KG-Miner [35]: KG-Miner is a fast discriminative path mining algorithm that can predict the truthfulness of a statement. We first use OpenIE [2] to extract a relation triple from each sentence in the article. We then compute a score for each triple whose subject, predicate, and object are all in the KG, and average the scores to produce the output label.
• TransE [5]: TransE is a knowledge graph embedding method that embeds entities and relations into latent vectors and completes KGs based on these vectors. We apply TransE to the unified relational graph, and the article embeddings are used for misinformation detection.
• text-CNN [18]: text-CNN is a text classification model that uses convolutional neural networks to model article contents; it captures text features at different granularities with multiple convolution filters.
• CSI\c [31]: CSI is a hybrid deep learning-based misinformation detection model that uses information from article contents and user responses. The article representation is modeled via an LSTM with Doc2Vec [21] article embeddings and user responses. As our datasets do not have user comments, the corresponding part of the model is omitted; we term this variant CSI\c.
• dEFEND\c [36]: dEFEND uses a hierarchical attention neural network on article contents and a co-attention mechanism between article contents and user comments for misinformation detection. As our datasets do not have user comments, the corresponding part of the model is omitted; we term this variant dEFEND\c.
• HGAT [24]: HGAT is a flexible heterogeneous information network framework for classifying short texts, which can integrate any type of additional information. We add Semantic Groups (e.g., Procedures and Disorders) to the entities as side information.
• GUpdater [37]: GUpdater updates KGs using news. It is built upon GNNs with a text-based attention mechanism that guides the updating messages passed through the KG structure. Similar to TransE, we use the article embeddings for misinformation detection.

Note that, for a fair comparison, the above competing methods cover features from the following aspects: (1) only the knowledge graph, such as TransE and KG-Miner; (2) only article contents, such as text-CNN, CSI\c, and dEFEND\c; and (3) both the knowledge graph and article contents, such as HGAT and GUpdater. For the knowledge graph methods, we feed the output article embeddings into several traditional machine learning methods (Logistic Regression, Multilayer Perceptron, and Random Forest) and report the one that achieves the best performance. We run these methods using scikit-learn [27] with default parameter settings.

5.3 Experimental Setup

5.3.1 Metrics. To evaluate the performance of misinformation detection algorithms, we use the following metrics, which are commonly used to evaluate classifiers in related areas: Accuracy, Precision, Recall, and F1 score.
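For reference, with misinformation as the positive class the four metrics reduce to simple counts over the confusion matrix (an illustrative helper, not part of the evaluation code):

```python
def metrics(y_true, y_pred):
    """Accuracy, Precision, Recall, and F1 with label 1 = misinformation."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    acc = (tp + tn) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1
```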

5.3.2 Implementation Details. We implement all models with Keras. We randomly use 75% of the labeled news pieces for training and predict the remaining 25%. We set the hidden dimension of our model and the other neural models to 128, and the dimension of the word embeddings to 100. For DETERRENT, the entity and relation embeddings are pre-trained using the Information Propagation Net. We tested network depths L ∈ {1, 2, 3, 4} and learning rates lr ∈ {10⁻², 10⁻³, 10⁻⁴}, and set η = 0.05. For the other methods, we follow the network architectures described in their papers. For all models, we use Adam with a minibatch of 50 articles on the Diabetes dataset and 100 on the Cancer dataset, and the number of training epochs is set to 10. For a fair comparison, we use the cross-entropy loss.

5.4 Misinformation Detection (RQ1)

To answer RQ1, we first compare DETERRENT with the representative misinformation detection algorithms introduced in Section 5.2, and then investigate the performance of DETERRENT when dealing with different types of articles.

5.4.1 Overall Comparison. Table 2 summarizes the detection performance of all competing methods (reporting the average of 5 runs). From the table, we make the following observations:
• The knowledge graph-based methods, TransE and KG-Miner, perform less satisfactorily. They are designed for KG triple checking and do not incorporate the linguistic features of news content; TransE can only capture article-entity relations to differentiate fake and real news. When detecting fake articles, KG-Miner depends on OpenIE to extract relation triples from the contents, and the performance of OpenIE tends to decrease as sentences get longer.


• Article content-based methods, text-CNN, CSI\c, and dEFEND\c, perform better than the methods based purely on a knowledge graph, indicating that they can exploit semantic and syntactic clues in the text. dEFEND\c better captures the important words and sentences that contribute to the prediction through its hierarchical attention structure.
• Methods using both article contents and the knowledge graph (DETERRENT, GUpdater, and HGAT) perform comparably to or better than methods using either source alone. This indicates that the knowledge graph provides information complementary to the linguistic features, thereby improving the detection results.
• DETERRENT consistently outperforms the other methods in terms of Accuracy and F1 Score on both datasets. Compared against the best competing results, DETERRENT achieves relative improvements of 1.05% (Accuracy) and 4.78% (F1 Score) on the Diabetes dataset, and 6.30% and 12.79% on the Cancer dataset.
• It is worth pointing out that dEFEND\c and CSI\c have relatively high Precision and low Recall, which indicates that they wrongly predict positive samples (misinformation) as negative (fact). This shows the necessity of modeling the relations between entities, as linguistic information alone is not enough to distinguish fake from real information.

5.4.2 Performance Comparison w.r.t. Article Types. Besides fake articles, misinformation also includes shorter formats, such as clickbait and fake posts, which can easily be posted and quickly go viral on social media. An important motivation of misinformation detection is to build a general framework that detects various types of misinformation.

Hence, we investigate the performance of DETERRENT when dealing with different types of articles by evaluating it on articles' titles and abstracts, respectively. The results in terms of F1 score on both datasets are shown in Figure 3; the bars show the word lengths of the different article types in log base 10. From the results, we observe that:

• DETERRENT consistently outperforms the other models, demonstrating its effectiveness on different types of misinformation regardless of length and again verifying the significance of the knowledge graph and knowledge guided text embedding.
• Article content-based methods such as CSI\c and dEFEND\c do not perform well when the text is short, suggesting that they rely on the linguistic features of the content and cannot avoid the disadvantages of limited data. Although DETERRENT also leverages article contents, it additionally exploits entity information to address this issue; its performance only slightly decreases when dealing with titles (the shortest text).
• The performance of the knowledge graph-based methods, KG-Miner and TransE, is relatively stable across all types of information on the two datasets.

[Figure 3 appears here: F1 score (lines) and log₁₀ word length (bars) over information types Title, Abstract, and Content, for (a) Diabetes and (b) Cancer; legend: TransE, KG-Miner, text-CNN, CSI\c, dEFEND\c, GUpdater, HGAT, DETERRENT.]

Figure 3: Performance comparison over the length of article types on two datasets. The background histograms indicate article lengths; the lines show performance w.r.t. F1 score.

Table 3: Effects of the network depth

Datasets  Metric     L=1     L=2     L=3

Diabetes  Accuracy   0.8853  0.9171  0.9206
          Precision  0.7500  0.9217  0.8445
          Recall     0.8543  0.7361  0.8503
          F1 Score   0.7987  0.8185  0.8474

Cancer    Accuracy   0.9580  0.9599  0.9652
          Precision  0.9108  0.9507  0.9469
          Recall     0.9157  0.8817  0.9153
          F1 Score   0.9132  0.9149  0.9309

5.5 Ablation Analysis (RQ2)

To answer RQ2, we explore each component of DETERRENT. We first investigate the number of layers in the model; we then examine the knowledge graph embedding and attention components by deriving several variants.

5.5.1 Effects of Network Depth. We vary the depth L of DETERRENT to investigate the effect of using multiple embedding propagation layers over the knowledge graph; a larger L allows information to propagate further through the information propagation layers. In particular, we search the layer number in the set {1, 2, 3, 4}. For L > 3, we did not get satisfying results on either dataset, which suggests that fourth- and higher-order knowledge paths contribute little information. The results are summarized in Table 3, from which we make the following observations:
• Increasing the depth of DETERRENT improves its performance, which demonstrates the effectiveness of modeling high-order knowledge paths.
• By comparing Table 2 and Table 3, we can see that DETERRENT is slightly better than the article content-based methods, which indicates the effectiveness of leveraging the relations.
• Besides first-order knowledge paths, high-order paths can discover inherent relations overlooked by traditional methods.

5.5.2 Effects of Attention Mechanisms and Negative Relations. In addition to article contents, we also integrate knowledge graph information via the knowledge guided attention. We further investigate the effects of these components by defining three variants of DETERRENT:


Table 4: Ablation study of DETERRENT, demonstrating the advantage of the attention mechanism and of modeling both positive and negative relations.

Datasets  Metric     w/o Rel  w/o K-Att  w/o Neg

Diabetes  Accuracy   0.8412   0.9012     0.9118
          Precision  0.7164   0.8870     0.9565
          Recall     0.7988   0.7236     0.7096
          F1 Score   0.7554   0.7971     0.8148

Cancer    Accuracy   0.9022   0.9291     0.9586
          Precision  0.9291   0.9385     0.9462
          Recall     0.6569   0.7651     0.8756
          F1 Score   0.7697   0.8430     0.9096

• w/o Rel: a variant of DETERRENT that does not consider the relations in the medical knowledge graph; the Information Propagation Net is replaced by a plain GNN model.
• w/o K-Att: a variant of DETERRENT that excludes the knowledge guided attention module. Each article is represented by the concatenation of the text embedding from the text encoder and the node embedding from the Information Propagation Net, which is fed into the prediction module.
• w/o Neg: a variant of DETERRENT that does not specifically model the negative relations in the medical knowledge graph; the BPR loss is excluded.

When one removes the medical knowledge graph entirely, leaving only a BiGRU text encoder, the results are far from satisfactory and thus are omitted. We summarize the experimental results in Table 4 and have the following findings:
• When we use the medical knowledge graph without considering relations, the performance of DETERRENT degrades considerably, which shows the necessity of modeling relations.
• Removing the knowledge guided attention degrades performance, since the attention mechanism then assigns importance weights to words based only on semantic clues for differentiating misinformation from fact, without considering knowledge paths.
• When we do not specifically model negative relations, some entities may wrongly be embedded close together under a relation through information propagation. As a result, some misinformation (label 1) may be predicted as fact (label 0), which leads to relatively high Precision and low Recall.

Through this ablation study of DETERRENT, we conclude that (1) knowledge guided article embedding contributes to misinformation detection performance, and (2) both positive and negative relations are necessary for effective misinformation detection.

5.6 Case Study (RQ3)

To illustrate the importance of the knowledge graph for explaining healthcare misinformation detection results, we use an example to show the triples captured by DETERRENT in Figure 4 and the corresponding attention weights in Figure 5.

In Figure 5, Diabetes has higher attention weights to the texts. The related triples (PancreaticIslet, ReducesRiskFor, Diabetes) and (Insulin, DoesNotHeal, Diabetes) can explain why the information is false, as the text exaggerates the effects of PancreaticIslet and Insulin. In contrast, Glucose has a smaller attention weight than the above two entities. We can see that DETERRENT not only detects the given information as fake but also yields explanations for the detection results.

Chaste tree berry may have a potential in improved pancreatic islet regeneration and hepatic insulin sensitivity for cure of Type I and II diabetes, some scientists suggested. Diabetes are condition caused by insufficient insulin entering the bloodstream in regulation of glucose.

Misinformation

(Pancreatic Islet, ReducesRiskFor, Diabetes)

Triples from KG

(Insulin, DoesNotHeal, Diabetes)

(Glucose, Causes, Diabetes Mellitus)

(Glucose, ReducesRiskFor, Insulin Resistance)

Figure 4: The explainable triples captured by DETERRENT.

[Figure 5 appears here: the article text linked by attention weights to the entities Pancreatic Islet, Insulin, Diabetes, Glucose, Insulin Resistance, and Diabetes Mellitus, connected via the relations DoesNotHeal, ReducesRiskFor, and Causes.]

Figure 5: The visualization with attention weights.

[Figure 6 appears here: bar charts of average attention weights of positive and negative relations for fact vs. misinformation, on (a) Diabetes and (b) Cancer.]

Figure 6: The attention weight analysis indicates that positive relations contribute more to fact, and negative relations contribute more to misinformation.

We calculate the average attention weights of positive and negative relations for both misinformation and fact on the two datasets. The results are shown in Figure 6. Note that positive relations have higher attention weights for fact than for misinformation, while negative relations have higher attention weights for misinformation than for fact. This indicates that positive relations contribute more to fact, and negative relations contribute more to misinformation.

6 CONCLUSION

In this paper, we proposed DETERRENT, a knowledge guided graph attention network for misinformation detection in healthcare. DETERRENT leverages additional information from a medical knowledge graph to guide the article embedding with a graph attention network. The network captures both positive and negative relations and automatically assigns more weight to the relations important for differentiating misinformation from fact; the resulting node embeddings guide the text encoder. Experiments on two real-world datasets demonstrate the strong performance of DETERRENT.


DETERRENT has two limitations: it leverages only a knowledge graph rather than other complementary information, and it does not consider the publishing time of an article. In the future, first, we can incorporate data from medical forums to find questionable user comments. Second, other complementary information, such as doctors' remarks, can be considered. Third, time intervals between posts can be considered to model misinformation diffusion.

7 ACKNOWLEDGMENTS

We thank Patrick Ernst and Gerhard Weikum for sharing the KnowLife data with us, and Jason (Jiasheng) Zhang for his valuable feedback.

This work was in part supported by NSF awards #1742702, #1820609, #1909702, #1915801, and #1934782.

REFERENCES

[1] Hunt Allcott and Matthew Gentzkow. 2017. Social media and fake news in the 2016 election. Journal of Economic Perspectives 31, 2 (2017), 211–36.
[2] Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D Manning. 2015. Leveraging linguistic structure for open domain information extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 344–354.
[3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[4] Gaurav Bhatt, Aman Sharma, Shivam Sharma, Ankush Nagpal, Balasubramanian Raman, and Ankush Mittal. 2018. Combining neural, statistical and external features for fake news stance identification. In Companion Proceedings of The Web Conference 2018. 1353–1357.
[5] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems. 2787–2795.
[6] Dan Busbridge, Dane Sherburn, Pietro Cavallo, and Nils Y Hammerla. 2019. Relational Graph Attention Networks. arXiv preprint arXiv:1904.05811 (2019).
[7] Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M Rocha, Johan Bollen, Filippo Menczer, and Alessandro Flammini. 2015. Computational fact checking from knowledge networks. PLoS ONE 10, 6 (2015).
[8] Limeng Cui, Kai Shu, Suhang Wang, Dongwon Lee, and Huan Liu. 2019. dEFEND: A System for Explainable Fake News Detection. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2961–2964.
[9] Limeng Cui, Suhang Wang, and Dongwon Lee. 2019. SAME: sentiment-aware multi-modal embedding for detecting fake news. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 41–48.
[10] Tyler Derr, Yao Ma, and Jiliang Tang. 2018. Signed graph convolutional networks. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 929–934.
[11] Patrick Ernst, Amy Siu, and Gerhard Weikum. 2015. KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences. BMC Bioinformatics 16, 1 (2015), 157.
[12] Gunther Eysenbach, John Powell, Oliver Kuss, and Eun-Ryoung Sa. 2002. Empirical studies assessing the quality of health information for consumers on the world wide web: a systematic review. JAMA 287, 20 (2002), 2691–2700.
[13] Han Guo, Juan Cao, Yazi Zhang, Junbo Guo, and Jintao Li. 2018. Rumor detection with hierarchical social attention network. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 943–951.
[14] Fritz Heider. 1946. Attitudes and cognitive organization. The Journal of Psychology 21, 1 (1946), 107–112.
[15] Viet-Phi Huynh and Paolo Papotti. 2019. A Benchmark for Fact Checking Algorithms Built on Knowledge Bases. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 689–698.
[16] Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. 2016. News verification by exploiting conflicting social viewpoints in microblogs. In Thirtieth AAAI Conference on Artificial Intelligence.
[17] Georgios Karagiannis, Immanuel Trummer, Saehan Jo, Shubham Khandelwal, Xuezhi Wang, and Cong Yu. 2019. Mining an "anti-knowledge base" from Wikipedia updates with applications to fact checking and beyond. Proceedings of the VLDB Endowment 13, 4 (2019), 561–573.
[18] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1746–1751.
[19] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[20] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[21] Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International Conference on Machine Learning. 1188–1196.
[22] Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Predicting positive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web. 641–650.
[23] Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Signed networks in social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1361–1370.
[24] Hu Linmei, Tianchi Yang, Chuan Shi, Houye Ji, and Xiaoli Li. 2019. Heterogeneous graph attention networks for semi-supervised short text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 4823–4832.
[25] Amy Nguyen, Sasan Mosadeghi, and Christopher V Almario. 2017. Persistent digital divide in access to and use of the Internet as a resource for health information: Results from a California population-based study. International Journal of Medical Informatics 103 (2017), 49–54.
[26] Shivam B Parikh and Pradeep K Atrey. 2018. Media-rich fake news detection: A survey. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE, 436–441.
[27] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011), 2825–2830.
[28] Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2017. A stylometric inquiry into hyperpartisan and fake news. arXiv preprint arXiv:1702.05638 (2017).
[29] Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017. Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2931–2937.
[30] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 452–461.
[31] Natali Ruchansky, Sungyong Seo, and Yan Liu. 2017. CSI: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 797–806.
[32] Daniel Scanfeld, Vanessa Scanfeld, and Elaine L Larson. 2010. Dissemination of health information through social networks: Twitter and antibiotics. American Journal of Infection Control 38, 3 (2010), 182–188.
[33] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The Graph Neural Network Model. IEEE Transactions on Neural Networks 20, 1 (2009), 61–80.
[34] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference. Springer, 593–607.
[35] Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-Based Systems 104 (2016), 123–133.
[36] Kai Shu, Limeng Cui, Suhang Wang, Dongwon Lee, and Huan Liu. 2019. dEFEND: Explainable fake news detection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 395–405.
[37] Jizhi Tang, Yansong Feng, and Dongyan Zhao. 2019. Learning to Update Knowledge Graphs by Reading News. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2632–2641.
[38] Christopher C Tsai, SH Tsai, Q Zeng-Treitler, and BA Liang. 2007. Patient-centered consumer health social network websites: a pilot study of quality of user-generated health information. In AMIA Annual Symposium Proceedings, Vol. 1137.
[39] Sebastian Tschiatschek, Adish Singla, Manuel Gomez Rodriguez, Arpit Merchant, and Andreas Krause. 2018. Fake news detection in social networks via crowd signals. In Companion Proceedings of The Web Conference 2018. 517–524.
[40] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. International Conference on Learning Representations (2018).
[41] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151.
[42] Yaqing Wang, Fenglong Ma, Zhiwei Jin, Ye Yuan, Guangxu Xun, Kishlay Jha, Lu Su, and Jing Gao. 2018. EANN: Event adversarial neural networks for multi-modal fake news detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 849–857.
[43] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. 2018. Graph neural networks: A review of methods and applications. arXiv preprint arXiv:1812.08434 (2018).


A APPENDIX ON REPRODUCIBILITY

All the code that we have implemented is available under the folder "Healthcare misinformation detection" at: https://github.com/cuilimeng/DETERRENT.

A.1 Healthcare Misinformation Detection

In this section, we provide more details of the experimental setting and configuration to enable the reproducibility of our work.

We compared the proposed framework, DETERRENT, with 7 baseline methods discussed in Section 5.2: KG-Miner, TransE, text-CNN, CSI, dEFEND, HGAT, and GUpdater. The baselines were obtained as follows:

• KG-Miner: We used the implementation by the authors of [15], which is available at: https://github.com/huynhvp/Benchmark_Fact_Checking.
• TransE: We used the implementation by the authors of [15], which is available at: https://github.com/huynhvp/Benchmark_Fact_Checking.
• text-CNN: We used the publicly available implementation at: https://github.com/dennybritz/cnn-text-classification-tf.
• CSI: We used the implementation available at: https://github.com/sungyongs/CSI-Code.
• dEFEND: We used the implementation provided by the authors, available at: https://tinyurl.com/ybl6gqrm.
• HGAT: We implemented the code ourselves.
• GUpdater: We used the implementation available at: https://github.com/esddse/GUpdater.

For the health-related article dataset, we manually created a healthcare dataset ourselves; it is available under the folder "Dataset" at: https://github.com/cuilimeng/DETERRENT.

For the parameter settings of DETERRENT, the details of the major parameters are shown in Table 5. Brief descriptions of the major parameters are as follows:

• Text Max Length: the threshold controlling the maximum length of news content
• Word Embedding: the word embedding package used to initialize the word vectors
• Embedding Dimension: the dimension of the embedding layer
• 𝑑: the size of the hidden states of the BiGRU

A.2 Medical Knowledge Graph

For the medical knowledge graph used in this paper, we use partial data from KnowLife, which is a well-known knowledge base in biomedical science. The data we used were provided by the authors of [11]. KnowLife is constructed, using information extraction (IE) techniques, from textual Web sources found in specialized portals and discussion forums, such as PubMed Medline and PubMed Central. The sources include both scientific publications and posts in health portals. Overall, it consists of 214k canonical entities and 78k facts for 14 relations. Example triples in KnowLife are listed in Table 7.

The left pattern phrase and right pattern phrase are entities, corresponding to the head node and the tail node in the medical knowledge graph, respectively. The relation indicates a directed edge from the head node to the tail node. Together, these three form a triple in the medical knowledge graph.
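This head-relation-tail structure can be sketched in a few lines of Python, using triples from Table 7 (the names `Triple` and `adjacency` are our own illustrations, not identifiers from the released code):

```python
from collections import namedtuple

# A knowledge-graph triple: (head entity, relation, tail entity),
# i.e., (left pattern phrase, relation, right pattern phrase).
Triple = namedtuple("Triple", ["head", "relation", "tail"])

triples = [
    Triple("Corticosteroid", "Causes", "Diabetes"),
    Triple("Anemia", "CreatesRiskFor", "Diabetes"),
]

# A directed adjacency view: head -> [(relation, tail), ...]
adjacency = {}
for t in triples:
    adjacency.setdefault(t.head, []).append((t.relation, t.tail))

print(adjacency["Anemia"])  # [('CreatesRiskFor', 'Diabetes')]
```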

Table 5: The details of the parameters of DETERRENT

Parameter              Diabetes   Cancer
Text Max Length        500        120
Embedding Dimension    100        100
Dropout Rate           0.5        0.5
Learning Rate          10⁻⁴       10⁻³
# Epochs               10         10
Minibatch Size         50         100
𝑑                      128        128
Adam Parameter (𝛽1)    0.9        0.9
Adam Parameter (𝛽2)    0.999      0.999
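The settings in Table 5 can be collected into plain Python dictionaries for reuse (a sketch; the key names are our own shorthand, not identifiers from the released code):

```python
# Hyper-parameter settings of DETERRENT, per dataset (values from Table 5).
PARAMS = {
    "diabetes": {
        "text_max_length": 500,
        "embedding_dim": 100,
        "dropout_rate": 0.5,
        "learning_rate": 1e-4,
        "epochs": 10,
        "minibatch_size": 50,
        "d": 128,            # BiGRU hidden-state size
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
    },
    "cancer": {
        "text_max_length": 120,
        "embedding_dim": 100,
        "dropout_rate": 0.5,
        "learning_rate": 1e-3,
        "epochs": 10,
        "minibatch_size": 100,
        "d": 128,
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
    },
}
```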

Table 6: An example of the entity name consistency

Left Fact Entity   Left Pattern Phrase
C0271650           Prediabetes
C0271650           Glucose Intolerance
C0271650           Impaired Glucose Tolerance

As there may exist multiple names for a disease/symptom, KnowLife assigns the same entity ID to all names with the same semantics to maintain entity-name consistency, as shown in Table 6. The left pattern phrase indicates an entity name, and the left fact entity indicates the corresponding entity ID. For instance, "Prediabetes", "Glucose Intolerance", and "Impaired Glucose Tolerance" are several phrases that indicate the same disorder, which is characterized by the inability to properly metabolize glucose. As the example shows, they therefore share the same left fact entity, "C0271650".
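This many-names-to-one-ID mapping amounts to a simple lookup table, shown here with the entries of Table 6 (the `canonical_id` helper is our own sketch, not part of KnowLife or the released code):

```python
# Surface phrases mapped to their canonical KnowLife entity ID (Table 6).
PHRASE_TO_ENTITY = {
    "prediabetes": "C0271650",
    "glucose intolerance": "C0271650",
    "impaired glucose tolerance": "C0271650",
}

def canonical_id(phrase):
    """Resolve a surface phrase to its entity ID (case-insensitive)."""
    return PHRASE_TO_ENTITY.get(phrase.strip().lower())

print(canonical_id("Prediabetes"))  # C0271650
```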

A.3 Querying Examples of DETERRENT

DETERRENT can not only predict the truthfulness of a given article, but also provide related entities and triples. Hence, to show the input and output of DETERRENT more clearly, we provide more examples in this section. In Table 8, we show two fake and two real snippets of information, along with the detection results of DETERRENT. The related triples can help people better understand why certain information is fake (or not).
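The shape of such a query result can be sketched as follows, using one row of Table 8. Both the class and its field names are hypothetical; the released code may structure its output differently:

```python
from dataclasses import dataclass, field

@dataclass
class DetectionResult:
    article: str                 # input news snippet
    prediction: int              # 1 = misinformation, 0 = real
    # supporting (head, relation, tail) triples from the knowledge graph
    related_triples: list = field(default_factory=list)

result = DetectionResult(
    article="Researchers say vitamin D may make the body more "
            "resistant to breast cancer.",
    prediction=0,
    related_triples=[("Vitamin D", "ReducesRiskFor", "Breast Cancer")],
)
```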


Table 7: Example triples extracted from specialized portals

Source: DrugsDotCom
Sentence: "Although rare, the corticosteroid in this medicine may cause higher blood and urine sugar levels, especially if you have severe diabetes and are using large amounts of this medicine."
(Left Pattern Phrase, Relation, Right Pattern Phrase): (Corticosteroid, Causes, Diabetes)

Source: Wikipedia
Sentence: "One of the more serious complications of choledocholithiasis is acute pancreatitis, which may result in significant permanent pancreatic damage and brittle diabetes."
(Left Pattern Phrase, Relation, Right Pattern Phrase): (Acute Pancreatitis, Causes, Brittle Diabetes)

Source: pub_med_medline
Sentence: "Anemia is associated with an increased risk of cardiovascular and renal events among patients with type 2 diabetes and chronic kidney disease (CKD)."
(Left Pattern Phrase, Relation, Right Pattern Phrase): (Anemia, CreatesRiskFor, Diabetes)

Source: MedlinePlus
Sentence: "Diseases such as diabetes, obesity, kidney failure or alcoholism can cause high triglycerides."
(Left Pattern Phrase, Relation, Right Pattern Phrase): (Alcoholism, Causes, High Triglycerides)

Table 8: Querying Examples of DETERRENT

Article: "This is a detailed exploration of BME's anti-obesity effect, facilitating the rational use of this herbal plant to address this increasingly severe issue, obesity."
Ground Truth: 1   Prediction: 1
Related Triples: (Herbal Plant, DoesNotHeal, Obesity)

Article: "They contain essential fats like ALA, antioxidants like vitamin E... Urolithin can bind to estrogen receptors, making it a strong candidate for the prevention of breast cancer. An animal study also reported that walnuts can reduce the growth of prostate cancer cells."
Ground Truth: 1   Prediction: 1
Related Triples: (Vitamin E, DoesNotReduceRiskFor, Prostate Cancer); (Estrogens, DoesNotReduceRiskFor, Breast Cancer)

Article: "A study published this month found that the Mediterranean diet led to significantly lower risk of gestational diabetes and a reduction in excess weight gain during pregnancy."
Ground Truth: 0   Prediction: 0
Related Triples: (Mediterranean Diet, ReducesRiskFor, Diabetes)

Article: "Researchers say vitamin D may make the body more resistant to breast cancer."
Ground Truth: 0   Prediction: 0
Related Triples: (Vitamin D, ReducesRiskFor, Breast Cancer)