Inferring Hidden Relationships from Biological Literature with Multi-level Context Terms

Upload: keren

Post on 23-Feb-2016

Page 1

Inferring Hidden Relationships from Biological Literature with Multi-level Context Terms

Page 2

Introduction
• Literature Based Discovery (LBD)
• Swanson’s ABC model
• Drug repositioning

[Figure: graph with the nodes Alzheimer, Insulin, PKC1, CATS and SOS2; the numeric labels (35, 2, 89, 4) cannot be reliably assigned to edges from the extracted text]

Page 3

Literature-based discovery (LBD)? The very idea.
1. It means deriving, from the public record of science, new solutions to scientific problems.
2. The possibility arises, for example, when two articles considered together for the first time suggest new information of scientific interest not apparent from either article alone.

Page 4

Venn Diagram: ABC Model

[Figure: Venn diagram of three sets A, B and C. The AB overlap holds articles about an AB relationship; the BC overlap holds articles about a BC relationship.]

AB and BC are complementary but disjoint: they can reveal an implicit relationship between A and C in the absence of any explicit relation.

Page 5

An ABC example based on title words in Medline
• Magnesium-deficient rat as a model of epilepsy. Lab Animal Sci 28:680-5, 1978
• The relation of migraine and epilepsy. Brain 92:285-300, 1969

[Figure: Venn diagram of sets of Medline records; A and C are disjoint.]
A: magnesium (8011)
B: epilepsy
C: migraine (2756)
Overlap counts shown in the diagram: 22 and 45
An unintended link
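The ABC idea on the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `infer_abc` and the pair-list input format are hypothetical, and the single A-B and B-C pair are the magnesium/epilepsy/migraine title-word example from the slide, not real Medline counts.

```python
from collections import defaultdict


def infer_abc(ab_pairs, bc_pairs, known_ac=frozenset()):
    """Return {(a, c): set of linking B terms} for A-C pairs with no known direct link.

    ab_pairs, bc_pairs: iterables of (a, b) and (b, c) tuples (assumed input format).
    known_ac: A-C pairs that already co-occur directly and are therefore skipped.
    """
    b_to_a = defaultdict(set)
    for a, b in ab_pairs:
        b_to_a[b].add(a)

    inferred = defaultdict(set)
    for b, c in bc_pairs:
        for a in b_to_a.get(b, ()):
            if (a, c) not in known_ac:
                inferred[(a, c)].add(b)
    return dict(inferred)


# Toy input taken from the slide's title-word example.
ab = [("magnesium", "epilepsy")]
bc = [("epilepsy", "migraine")]
links = infer_abc(ab, bc)
print(links)
```

Passing `known_ac={("magnesium", "migraine")}` would suppress the candidate, which mirrors the requirement that A and C be disjoint in the literature.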

Page 6

Related work
• CTD
  • A manually curated database.
  • Inferring chemical-gene-disease relations using ABC models

Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (2008, NAR)

Page 7

Related work
• CoPub discovery
  • Co-occurrence score based ABC models
  • Inferring disease, gene and drug relations

Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases (2010, PLoS Computational Biology)

Page 8

Page 9

MeSH Terms

Page 10

Page 11

Objective
• Objective
  • Infer hidden drug-disease relations accurately from the literature
• Limitations of previous models
  • They generate a large volume of false-positive candidate relations
  • They are semi-automatic, labor-intensive techniques that require human experts’ input
• Solution strategy
  • Incorporate context information into relation inference

• Solution strategy• Incorporate context information into relation inference

Page 12

Suggested approaches
• Our approach
• Key idea
  • Infer drug-disease relations based on context term similarity
    • Drug - gene relation
    • Disease - gene relation
• Our hypothesis
  • The similarity of context terms between the Drug-Gene and Gene-Disease models makes it possible to infer more meaningful Drug-Disease relations.

[Figure: Drug - Gene - Disease chain, with one context attached to each of the two links and a similarity computed between the two contexts]

Page 13

Suggested approaches
• Context vector
  • Set of biomedical terms in paper abstracts

[Figure: the biomedical terms found in each abstract that mentions an interaction (Abstract 1, Abstract 2) are averaged into a context vector of that interaction]
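A minimal sketch of the averaging step may help here. It assumes that each abstract is already reduced to a list of recognized biomedical terms and that term counts (rather than binary presence) are averaged; the slide does not say which, so both the weighting and the names `context_vector` and `vocabulary` are assumptions.

```python
from collections import Counter


def context_vector(abstract_terms, vocabulary):
    """Average per-abstract term counts over all abstracts mentioning one interaction.

    abstract_terms: list of term lists, one list per abstract (assumed input format).
    vocabulary: ordered list of context terms; fixes the vector dimensions.
    """
    vocab_set = set(vocabulary)
    totals = Counter()
    for terms in abstract_terms:
        totals.update(t for t in terms if t in vocab_set)

    n = len(abstract_terms)
    if n == 0:
        return [0.0] * len(vocabulary)
    return [totals[t] / n for t in vocabulary]


# Toy data, invented for illustration only.
vocabulary = ["amyloid", "glucose", "memory"]
abstracts = [
    ["amyloid", "memory", "memory"],  # terms found in abstract 1
    ["glucose", "memory"],            # terms found in abstract 2
]
vec = context_vector(abstracts, vocabulary)
print(vec)  # [0.5, 0.5, 1.5]
```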

Page 14

Suggested approaches
• Similarity measures and score comparison

[Figure: Alzheimer and Insulin linked through PCK1. The two links carry the context vectors (1 1 1 1 1 0 …) and (0 0 2 1 1 1 …), which feed a similarity measure and then scoring.]

Similarity measures
- Cosine similarity
- Spearman correlation

Score comparison
- All frequencies vs.
- Context similarity based filtered frequencies

Page 15

Suggested approaches
• Experiment overview

[Figure: experiment pipeline. PubMed Abstracts → Entity Tagging (using the Entity Dictionary) → Interaction Extraction → Context Vector Extraction → Drug-Disease Inference → Scored Result → Evaluation. The evaluation compares the previous model with our model against an answer set of known disease-drug interactions from CTD (336,693) and PharmGKB (1,992), followed by performance analysis and literature analysis.]

CTD: Comparative Toxicogenomics Database
UMLS: Unified Medical Language System

Page 16

Dataset

UMLS
• 96,031 disease, 45,527 gene, 6,132 symptom synonyms

PharmGKB
• 25,693 disease, 28,091 drug, 258,840 gene synonyms

CTD
• 68,211 disease, 384,141 chemical, 679,701 gene synonyms

PubMed
• 77,711 Alzheimer's disease related abstracts

Page 17

Multi-level entity recognition
• Dictionary based entity recognition from the abstracts.
  • We import data from three external databases to generate the multi-level entity dictionaries: PharmGKB, CTD, and UMLS.
  • We divide the dictionary entries into four entity levels: gene, drug, disease, and symptom.
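One way to picture the merged dictionary is a mapping from each synonym to the set of levels it can denote. The sketch below is an assumption about the data layout, not the authors' schema: the `(synonym, level)` input format, the function name, and the choice to fold CTD's "chemical" type into the "drug" level are all hypothetical, and the entries are toy examples.

```python
from collections import defaultdict

LEVELS = {"gene", "drug", "disease", "symptom"}
# Assumption: CTD lists chemicals, which are treated here as the "drug" level.
LEVEL_ALIASES = {"chemical": "drug"}


def build_multilevel_dictionary(sources):
    """Merge (synonym, level) entries from several databases.

    sources: {database name: list of (synonym, level)} (assumed input format).
    Returns {lower-cased synonym: set of levels}.
    """
    merged = defaultdict(set)
    for entries in sources.values():
        for synonym, level in entries:
            level = LEVEL_ALIASES.get(level, level)
            if level not in LEVELS:
                continue  # ignore entity types outside the four levels
            merged[synonym.lower()].add(level)
    return dict(merged)


# Toy entries, invented for illustration.
sources = {
    "PharmGKB": [("Insulin", "drug"), ("PCK1", "gene")],
    "CTD": [("insulin", "chemical"), ("Alzheimer Disease", "disease")],
    "UMLS": [("Memory Loss", "symptom")],
}
dictionary = build_multilevel_dictionary(sources)
print(dictionary["insulin"])  # {'drug'}
```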

Page 18

Multi-level entity recognition
• From the 77,711 Alzheimer’s disease related abstracts:
  • Split the text into sentences using the Conditional Random Field (CRF) based sentence detector
  • Extract biomedical entities using LingPipe
  • Match the extracted entities against the PharmGKB and CTD entity dictionaries to extract interaction data
  • Map the extracted entities to the UMLS entity dictionary to extract the members of the context vector
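The steps above can be approximated with a much simpler stand-in. The sketch below deliberately swaps in a regular-expression sentence splitter for the CRF-based detector and a greedy longest-match dictionary lookup for LingPipe, so it illustrates the data flow only; all function names, the `max_len` parameter, and the toy sentences are assumptions made for this example.

```python
import re


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (stand-in for a CRF detector)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_entities(sentence, dictionary, max_len=3):
    """Greedy longest-match lookup of token n-grams.

    dictionary: {lower-cased synonym: level}; max_len: longest n-gram tried.
    Returns a list of (synonym, level) in order of appearance.
    """
    tokens = re.findall(r"[A-Za-z0-9'-]+", sentence)
    tagged, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in dictionary:
                tagged.append((phrase, dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return tagged


# Toy dictionary and text, invented for illustration.
toy_dictionary = {"alzheimer's disease": "disease", "insulin": "drug", "pck1": "gene"}
text = "Insulin regulates PCK1. PCK1 is altered in Alzheimer's disease."
tagged_sentences = [tag_entities(s, toy_dictionary) for s in split_sentences(text)]
print(tagged_sentences)
```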

Page 19

Suggested approaches
• Experiment overview

[Figure: the experiment pipeline diagram from Page 15, repeated]

Page 20

Interaction Extraction
• To extract biologically meaningful interactions, we restricted extraction to the ‘drug - gene’ and ‘gene - disease’ patterns among the recognized entities.
• We generated entity dictionaries from the PharmGKB and CTD databases. PharmGKB and CTD contain different numbers of terms, so their tagging results differ from each other.
• We tagged biological entities in PubMed records. After tagging, we extracted a candidate interaction whenever two entities of different types co-occurred within a sentence.
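Sentence-level co-occurrence restricted to the two allowed patterns can be sketched as below. The input format (one list of `(name, type)` tuples per sentence) and the function name are assumptions, and the tagged sentences are toy data.

```python
from collections import Counter
from itertools import product

# Only these ordered type patterns are kept, as stated on the slide.
ALLOWED_PATTERNS = {("drug", "gene"), ("gene", "disease")}


def extract_interactions(tagged_sentences):
    """Count sentence-level co-occurrences for the allowed type patterns.

    tagged_sentences: list of sentences, each a list of (name, type) tuples.
    Returns Counter {(name1, name2): number of sentences containing both}.
    """
    counts = Counter()
    for entities in tagged_sentences:
        unique = set(entities)  # count each pair at most once per sentence
        for (n1, t1), (n2, t2) in product(unique, repeat=2):
            if (t1, t2) in ALLOWED_PATTERNS:
                counts[(n1, n2)] += 1
    return counts


# Toy tagged sentences, invented for illustration.
tagged = [
    [("insulin", "drug"), ("pck1", "gene")],
    [("pck1", "gene"), ("alzheimer disease", "disease")],
    [("insulin", "drug"), ("alzheimer disease", "disease")],  # drug-disease: skipped
]
interactions = extract_interactions(tagged)
print(interactions)
```

The third toy sentence shows why the restriction matters: a direct drug-disease co-occurrence is not collected as an interaction, since that relation is what the method tries to infer.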

Page 21

Suggested approaches
• Experiment overview

[Figure: the experiment pipeline diagram from Page 15, repeated]

Page 22

Evaluation method
• We compare our method to the ABC model that is based on entity frequency in Alzheimer’s disease related abstracts.
• The comparison was made for the top 100 and top 500 results.
• Literature analysis of the top 10 ranked interactions.

[Figure: the ABC model and our model are each compared against two answer sets, PharmGKB (1,992) and CTD (336,693)]
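A top-k comparison against an answer set can be expressed as precision at k. The slides do not spell out the exact metric formula, so treating it as the fraction of top-k candidates found in the answer set is an assumption, as are the function name and the toy pairs below.

```python
def precision_at_k(ranked_pairs, answer_set, k):
    """Fraction of the top-k ranked pairs that appear in the answer set."""
    top = ranked_pairs[:k]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in answer_set)
    return hits / len(top)


# Toy ranking and answer set, invented for illustration.
ranked = [("dis1", "drugA"), ("dis2", "drugA"), ("dis3", "drugB"), ("dis4", "drugB")]
answers = {("dis1", "drugA"), ("dis3", "drugB")}

print(precision_at_k(ranked, answers, 1))  # 1.0
print(precision_at_k(ranked, answers, 2))  # 0.5
```

Running the same function on the baseline ranking and on the context-filtered ranking, with k set to 100 and 500, gives the kind of side-by-side comparison the slide describes.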

Page 23

Results

Entity Tagging
• From 77,711 abstracts related to “Alzheimer”
  • 1,640,761 biomedical entities
    • 295,419 were tagged by the PharmGKB entity dictionary
    • 438,987 were tagged by the CTD entity dictionary
    • 260,291 were tagged by the UMLS entity dictionary

Interaction Extraction
• PharmGKB tagged entities
  • From 60,415 interactions
  • We inferred 14,481 new disease-drug interactions
• CTD tagged entities
  • From 119,464 interactions
  • We inferred 136,570 interactions
• Size of context vector
  • 1,641 terms

Page 24

Results

PharmGKB
• The PharmGKB case does not achieve outstanding performance (between 0% and 1%).
  • The weak performance is attributed to the fact that PharmGKB has only 1,992 drug-disease interactions.
  • Furthermore, our dataset did not cover all PubMed abstracts, only the Alzheimer’s disease related ones.

Page 25

Results

CTD
• The context based approach is superior to the baseline in all cases (Top 100, 500).
  • Filtering the inferred interactions by context term based similarity improved performance over using frequency alone.
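The contrast between the frequency-only baseline and the similarity-filtered score can be sketched as follows. How the frequency of a drug-gene-disease path is computed is not stated on the slides, so the 5-tuple input format, the summing of frequencies per drug-disease pair, and the threshold value are assumptions made for this illustration.

```python
from collections import defaultdict


def score_candidates(paths, threshold=None):
    """Rank drug-disease candidates by summed path frequency.

    paths: iterable of (drug, gene, disease, frequency, similarity), where
           similarity compares the drug-gene and gene-disease context vectors.
    threshold: if given, only paths with similarity >= threshold contribute
               (context-filtered score); if None, all paths count (baseline).
    """
    scores = defaultdict(float)
    for drug, _gene, disease, freq, sim in paths:
        if threshold is None or sim >= threshold:
            scores[(drug, disease)] += freq
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy paths, invented for illustration.
paths = [
    ("drugA", "g1", "dis1", 5, 0.95),
    ("drugA", "g2", "dis1", 3, 0.28),
    ("drugB", "g1", "dis1", 7, 0.30),
]
baseline = score_candidates(paths)
filtered = score_candidates(paths, threshold=0.9)
print(baseline)  # [(('drugA', 'dis1'), 8.0), (('drugB', 'dis1'), 7.0)]
print(filtered)  # [(('drugA', 'dis1'), 5.0)]
```

In the toy data the frequent but contextually dissimilar drugB path drops out after filtering, which is the kind of false-positive reduction the slide reports.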

Page 26

Results
Top 10 ranked interactions (CTD based)

CTD-Hybrid 0.95 | Disease | Chemical | PMID | Only Frequency model
D058225:D016229 | Plaque, Amyloid | Amyloid beta-Peptides | 21575663 | o
D000544:D020932 | Alzheimer Disease | Nerve Growth Factor | 20965859 | o
D005182:D000544 | Alzheimer Disease | Flavin-Adenine Dinucleotide | 12127087 | o
D015850:D000544 | Alzheimer Disease | Interleukin-6 | 20667498 | o
D000544:D014409 | Alzheimer Disease | Tumor Necrosis Factor-alpha | 21327054 | o
D000544:D015415 | Alzheimer Disease | Biological Markers | | o
D005182:D002311 | Cardiomyopathy, Dilated | Flavin-Adenine Dinucleotide | | o
D000544:D016229 | Alzheimer Disease | Amyloid beta-Peptides | 21726674 | o
D002311:D016229 | Cardiomyopathy, Dilated | Amyloid beta-Peptides | | x
D000544:D007328 | Alzheimer Disease | Insulin | 21525299 | x

Page 27

Results
• Alzheimer’s disease - Insulin
  • A low score case (0.28): Alzheimer - CATS - Insulin
  • A relatively high score case (0.95): Alzheimer - CYC-1 - Insulin

Page 28

Conclusion
• We suggested context vectors to infer unknown relationships based on biologically meaningful terms.
• We constructed a multi-level entity dictionary to recognize multi-level entities from the literature.
• We used our context vectors to discover putative drug-disease relationships.
• We evaluated the results against drug-disease relations curated from the literature (PharmGKB, CTD).
• On the 77,711 Alzheimer’s disease papers, we found that our context vector based hybrid approach has better precision than the previous frequency based ABC model.

Page 29

Future Study: Different Approach to Context Terms
• Based on interaction words (verb terms), define possible direct interactions among entities, and assume that interactions among the remaining entities are context.

[Figure: three example sentences (Sentence 1 to Sentence 3), each marked with one interaction verb (I-verb), two interacting entities (I-Ent1, I-Ent2) and three context entities (C-Ent) in varying positions]
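Since this is a proposed direction rather than a finished method, the following is only one possible reading of it. It assumes that the two interacting entities are the nearest entities on either side of the first interaction verb; that rule, the verb list, and the function name are all hypothetical and are not taken from the slides.

```python
# Hypothetical verb list; the slides do not enumerate interaction verbs.
INTERACTION_VERBS = {"inhibits", "activates", "regulates"}


def split_interacting_and_context(tokens, entity_idx):
    """Split the entities of one sentence into interacting and context entities.

    tokens: list of token strings; entity_idx: sorted indices of entity tokens.
    Assumed rule: the nearest entity on each side of the first interaction verb
    is interacting (I-Ent); every other entity is context (C-Ent).
    """
    verb_idx = next(
        (i for i, t in enumerate(tokens) if t.lower() in INTERACTION_VERBS), None
    )
    if verb_idx is None:
        return [], [tokens[i] for i in entity_idx]

    left = [i for i in entity_idx if i < verb_idx]
    right = [i for i in entity_idx if i > verb_idx]
    interacting = ([left[-1]] if left else []) + ([right[0]] if right else [])
    context = [i for i in entity_idx if i not in interacting]
    return [tokens[i] for i in interacting], [tokens[i] for i in context]


# Toy sentence with placeholder entity names E1..E4.
tokens = ["E1", "inhibits", "E2", "in", "E3", "and", "E4"]
i_ents, c_ents = split_interacting_and_context(tokens, [0, 2, 4, 6])
print(i_ents, c_ents)  # ['E1', 'E2'] ['E3', 'E4']
```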

Page 30

Future Study

Page 31

Questions?
• Thank you!