Semantic Relations for Interpreting DNA Microarray Data and for Novel Hypotheses Generation

Post on 26-Jan-2016

Page 1: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Semantic Relations for Interpreting DNA Microarray Data

and for Novel Hypotheses Generation

Dimitar Hristovski,1 PhD, Andrej Kastrin,2 Borut Peterlin,2 MD PhD, Thomas C Rindflesch,3 PhD

1Institute of Biomedical Informatics, Medical Faculty, University of Ljubljana, Slovenia

2Institute of Medical Genetics, University Medical Centre, Ljubljana, Slovenia

3National Library of Medicine, National Institutes of Health, Bethesda, MD, U.S.A.

e-mail: [email protected]

Page 2: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Introduction

Microarray experiments:

• great potential to support progress in biomedical research,

• results NOT EASY to interpret,

• information about functions and relations of relevant genes needs to be extracted from the vast biomedical literature

Page 3: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Related Work

• Text mining and microarray analysis

• Literature-based Discovery

Page 4: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Proposed Solution

• Computerized text analysis system

• Extract semantic relations from literature

– SemRep

• Integrate with microarray experiments

• Develop tools for:

– Interpretation

– Novel hypotheses generation

Page 5: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Overall Design

[Diagram: Medline → SemRep (semantic relations extraction) → semantic relations; GEO → R Bioconductor scripts → microarrays; both feed the Integrated Database (semantic relations + microarrays), which feeds the Interpretation & Discovery Tools.]
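One way to picture the integrated database is as two tables joined on gene symbol. The following is a minimal sketch using an in-memory SQLite database; the table names, column names, and log fold-change values are assumptions made for illustration and do not reflect the actual schema or data of the system.

```python
import sqlite3

# Hypothetical two-table layout: literature relations and microarray results.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relation (subject TEXT, predicate TEXT, object TEXT);
CREATE TABLE expression (gene TEXT, log_fc REAL);
""")

# Relations taken from the results slides of this deck.
conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", [
    ("Paclitaxel", "INHIBITS", "HSPB1"),
    ("pramipexole", "STIMULATES", "NR4A2"),
])

# Fold-change values are illustrative only (positive = upregulated).
conn.executemany("INSERT INTO expression VALUES (?, ?)", [
    ("HSPB1", 1.5),
    ("NR4A2", -1.2),
])

# Join each literature relation to the expression change of its target gene.
rows = conn.execute("""
SELECT r.subject, r.predicate, r.object, e.log_fc
FROM relation AS r
JOIN expression AS e ON e.gene = r.object
ORDER BY r.object
""").fetchall()

for row in rows:
    print(row)
```

Joining on the relation's object is only one possible design; a fuller schema would presumably also index relations by subject and keep the supporting sentences.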

Page 6: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

SemRep

• Extracts semantic relations from biomedical text (implemented in Prolog)

• Based on UMLS Metathesaurus and Semantic Network

– <MetaConc> SEMNET RELATION <MetaConc>

• Database of relations extracted from MEDLINE

– 6.7M citations (01/01/1999 through 03/31/2009)

– 43M sentences

– 21M relation instances

– 7M relation types
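The `<MetaConc> SEMNET RELATION <MetaConc>` form can be pictured as a subject-predicate-object triple. The sketch below is a minimal Python illustration, not SemRep's own representation (SemRep is implemented in Prolog); the class name `Relation`, its fields, and the sentence identifiers are assumptions made for this example. It also shows the difference between relation instances (one per extracting sentence) and relation types (distinct triples).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One subject-PREDICATE-object triple (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str


# Each extracted instance pairs a sentence id with a triple.
# The sentence ids are made up for illustration.
instances = [
    ("s1", Relation("MBD1", "INTERACTS_WITH", "HTR2C")),
    ("s2", Relation("MBD1", "INTERACTS_WITH", "HTR2C")),
    ("s3", Relation("MBD1", "CAUSES", "Autistic Disorder")),
]

# Relation types are the distinct triples; count instances per type.
type_counts = Counter(rel for _, rel in instances)

print(len(instances), "instances,", len(type_counts), "types")
```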

Page 7: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Semantic Relations Extracted

• Wide range of relations in:

– Clinical medicine

– Molecular genetics

– Pharmacogenomics

• Genetic Etiology: associated_with, predisposes, causes

• Substance Relations: interacts_with, inhibits, stimulates

• Pharmacological Effects: affects, disrupts, augments

• Clinical Actions: administered_to, manifestation_of, treats

• Organism Characteristics: location_of, part_of, process_of

• Co-existence: co-exists_with

Page 8: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Examples

• “… the loss of Mbd1 could lead to autism-like behavioral phenotypes …”

• Relation: MBD1 causes Autistic Disorder

• “… Mbd1 can directly regulate the expression of Htr2c, one of the serotonin receptors, …”

• Relation: MBD1 interacts_with HTR2C

Page 9: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation
Page 10: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Interpretation of Microarrays

Find known facts from the literature:

• Disease related:

– Associated genes

– Current treatments

– …

• Microarray Genes:

– Relations between genes (INHIBITS, STIMULATES, …)

– Relations between the genes and anything else
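The two kinds of lookup listed above can be sketched as simple filters over a list of triples. This is an illustration only: the function names and the gene list are hypothetical, and the real tool queries a database rather than an in-memory list.

```python
# Triples taken from examples elsewhere in this deck.
relations = [
    ("MBD1", "INTERACTS_WITH", "HTR2C"),
    ("NR4A2", "ASSOCIATED_WITH", "Parkinson Disease"),
    ("Quercetin", "INHIBITS", "HSPB1"),
]

# Hypothetical set of genes measured on a microarray.
microarray_genes = {"MBD1", "HTR2C", "NR4A2", "HSPB1"}


def with_argument(rels, term):
    """Relations whose subject or object contains `term` (case-insensitive)."""
    t = term.lower()
    return [r for r in rels if t in r[0].lower() or t in r[2].lower()]


def between_genes(rels, genes):
    """Relations whose subject and object are both microarray genes."""
    return [r for r in rels if r[0] in genes and r[2] in genes]


def involving_genes(rels, genes):
    """Relations in which at least one argument is a microarray gene."""
    return [r for r in rels if r[0] in genes or r[2] in genes]


print(with_argument(relations, "parkinson"))
print(between_genes(relations, microarray_genes))
print(len(involving_genes(relations, microarray_genes)))
```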

Page 11: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Relations with “Parkinson” as Argument?

Page 12: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

What Treats Parkinson?

Page 13: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

What (causes, associated_with) Parkinson?

Page 14: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Sentences from which Relations are Extracted

Page 15: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Genes from the Microarray Related to Anything?

Page 16: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Novel Hypotheses Generation

• Based on discovery patterns

• Discovery patterns:

– search templates that have a higher likelihood of returning a new discovery

• Specific discovery patterns for specific discovery tasks

Page 17: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Discovery Patterns

• Inhibit the upregulated:

– Search for substances, genes, ... which, according to the literature, inhibit the top N (e.g. 300) genes that are upregulated on a given microarray

– Such substances, genes, … might be used to regulate the upregulated genes

• Stimulate the downregulated:

– Search for substances, genes, ... which, according to the literature, stimulate the top N (e.g. 300) genes that are downregulated on a given microarray

– Such substances, genes, … might be used to regulate the downregulated genes
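Both patterns reduce to the same two steps: pick the top N genes in one direction, then collect the agents that the literature links to them with the opposite effect. The sketch below assumes expression is given as log fold change per gene; the function names, the fold-change values, and the placeholder `GENE_A` are hypothetical, while the three relations come from the results slides of this deck.

```python
def top_genes(expression, n, up=True):
    """Return the n most up- (or down-) regulated genes by log fold change."""
    sign = 1 if up else -1
    changed = [g for g, fc in expression.items() if sign * fc > 0]
    changed.sort(key=lambda g: sign * expression[g], reverse=True)
    return set(changed[:n])


def discover(relations, genes, predicate):
    """Map each target gene to the agents linked to it by `predicate`."""
    hits = {}
    for subj, pred, obj in relations:
        if pred == predicate and obj in genes:
            hits.setdefault(obj, set()).add(subj)
    return hits


# Illustrative fold changes; GENE_A is a made-up placeholder gene.
expression = {"HSPB1": 1.5, "NR4A2": -1.2, "GENE_A": 0.1}

relations = [
    ("Paclitaxel", "INHIBITS", "HSPB1"),
    ("Quercetin", "INHIBITS", "HSPB1"),
    ("pramipexole", "STIMULATES", "NR4A2"),
]

# Pattern 1: inhibit the upregulated. Pattern 2: stimulate the downregulated.
inhibit_up = discover(relations, top_genes(expression, 300, up=True), "INHIBITS")
stimulate_down = discover(relations, top_genes(expression, 300, up=False), "STIMULATES")

print(inhibit_up)
print(stimulate_down)
```

Keeping the direction and the predicate as parameters means further patterns could be added without new code, although whether the actual tool is organized this way is not stated in the slides.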

Page 18: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Discovery Patterns – Graphical View

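[Diagram: Disease X has upregulated Genes Y1 and downregulated Genes Y2 (from the microarray). Drug (or substance) Z1 Inhibits Genes Y1, and Drug (or substance) Z2 Stimulates Genes Y2 (from the literature). Hypothesized links Maybe_Treats1? and Maybe_Treats2? connect the drugs to Disease X.]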

Page 19: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Results – Inhibit the Upregulated

• Parkinson microarray GSE8397

• HSP27 (HSPB1) gene is upregulated on the microarray

• We identified paclitaxel and quercetin as substances that inhibit the expression of this gene

Paclitaxel INHIBITS HSPB1|HSPB1 protein

Paclitaxel completely inhibited the expression of HSP27 (PMID: 15304155)

Quercetin INHIBITS HSPB1|HSPB1 gene

Quercetin …, inhibited the expression of both HSP70 and HSP27 (PMID: 12926076)

Page 20: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Inhibit the Upregulated

Page 21: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Results – Stimulate the Downregulated

• NR4A2 (NURR1) downregulated on the microarray

• We found out that:

– Pramipexole stimulates expression of NR4A2

– NR4A2 is associated with Parkinson disease

pramipexole STIMULATES NR4A2

… the increase of Nurr1 gene expression induced by PRX, ... (PMID: 15740846)

… the induction of Nurr1 gene expression by PRX ... (PMID: 15740846)

NR4A2 ASSOCIATED_WITH Parkinson Disease

… lower levels of NURR1 gene expression were associated with significantly increased risk for PD (PMID: 18684475)
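The reasoning on this slide chains two literature relations with one microarray observation: a substance stimulates a gene, the gene is downregulated on the microarray, and the gene is associated with the disease. A minimal sketch of that chaining follows; the function name `maybe_treats` is hypothetical (borrowed from the Maybe_Treats label in the graphical view), and the output is a candidate hypothesis, not an established treatment.

```python
def maybe_treats(relations, downregulated, disease):
    """Return (substance, gene) pairs where the substance stimulates a
    downregulated gene that is associated with `disease`."""
    associated = {
        s for s, p, o in relations
        if p == "ASSOCIATED_WITH" and o == disease
    }
    return sorted(
        (s, o) for s, p, o in relations
        if p == "STIMULATES" and o in downregulated and o in associated
    )


# Relations and the downregulated gene are taken from this slide.
relations = [
    ("pramipexole", "STIMULATES", "NR4A2"),
    ("NR4A2", "ASSOCIATED_WITH", "Parkinson Disease"),
]
downregulated = {"NR4A2"}

candidates = maybe_treats(relations, downregulated, "Parkinson Disease")
print(candidates)
```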

Page 22: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Explaining a Relation - Closed Discovery

Page 23: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Closed Discovery – Aligned Relations

Page 24: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Evaluation

• Estimate – based on [Masseroli, BMC Bioinformatics 2006]:

• Extract known facts – baseline precision (P) on 2,042 extracted relations:

– Gene – Disease (causes, assoc_with, …) P=74.2%

– Gene – Gene (inhibits, stimulates, …) P=41.95%

• Propose Argument-Predicate distance for filtering (Gene-Gene), with precision (P) and recall (R):

– At distance no more than 1: P=70.75%; R=43.6%

– At distance no more than 2: P=55.88%; R=66.28%

• We use Argument-Predicate distance to rank semantic relations, showing the relations more likely to be correct first.
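The trade-off between the two thresholds can be illustrated with a small sketch of distance-based filtering and ranking. The slides do not define how Argument-Predicate distance is computed, so it is treated here as a given integer per extracted relation; the field names, the toy data, and the choice to measure recall against the correct relations in the unfiltered set are all assumptions.

```python
def rank_by_distance(extractions):
    """Order extractions so that smaller distances come first."""
    return sorted(extractions, key=lambda e: e["distance"])


def precision_recall(extractions, max_distance):
    """Precision and recall after keeping extractions within max_distance.

    Recall is measured against the correct extractions in the unfiltered
    set, which is an assumption about the evaluation setup.
    """
    kept = [e for e in extractions if e["distance"] <= max_distance]
    true_pos = sum(e["correct"] for e in kept)
    all_correct = sum(e["correct"] for e in extractions)
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / all_correct if all_correct else 0.0
    return precision, recall


# Toy data, not taken from the evaluation reported above.
extractions = [
    {"id": "a", "distance": 3, "correct": True},
    {"id": "b", "distance": 1, "correct": True},
    {"id": "c", "distance": 2, "correct": False},
    {"id": "d", "distance": 1, "correct": True},
]

ranked_ids = [e["id"] for e in rank_by_distance(extractions)]
p1, r1 = precision_recall(extractions, 1)
p2, r2 = precision_recall(extractions, 2)
print(ranked_ids, p1, r1, p2, r2)
```

In this toy set, raising the threshold from 1 to 2 lowers precision without improving recall; in the reported Gene-Gene evaluation the looser threshold lowers precision but raises recall.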

Page 25: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Conclusion

• A new bioinformatics tool for interpretation and novel hypotheses generation

• Based on integration of semantic relations extracted from literature with microarrays

• Available at: http://sembt.mf.uni-lj.si

Page 26: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Syntactic Processing

Mbd1 can directly regulate the expression of Htr2c

• MedPost tagger and shallow parser

[ NP[head([… inputmatch(mbd1),tag(noun)])], ...

[verb([inputmatch(regulate),lexmatch(regulate),tag(verb)])],...

NP[… head([… inputmatch(htr2c),tag(noun)])] ]

Page 27: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Semantic Processing

• Identify concepts: MetaMap and ABGene

[ NP[head([… semtype(gngm),entrez(MBD1,4152)])], ...

[verb([inputmatch(regulate),lexmatch(regulate),tag(verb)])],...

NP[… head([… semtype(gngm),entrez(HTR2C,3358)])] ]

Page 34: Semantic Relations  for Interpreting DNA  Microarray Data  and  for Novel Hypotheses Generation

Semantic Processing

• Identify concepts: MetaMap and ABGene

[ NP[head([… semtype(gngm),entrez(MBD1,4152)])], ...

[verb([inputmatch(regulate),lexmatch(regulate),tag(verb)])],...

NP[… head([… semtype(gngm),entrez(HTR2C,3358)])] ]

• Match semantic type patterns to ontology:

<gngm> INTERACTS_WITH <gngm>

• Apply indicator rule: Verb(regulate) INTERACTS_WITH

• Substitute concepts for semantic types:

MBD1 INTERACTS_WITH HTR2C
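The last three steps (match the semantic type pattern, apply the indicator rule, substitute concepts) can be sketched in a few lines. This is a simplified Python illustration, not SemRep's Prolog implementation; the dictionaries `ONTOLOGY` and `INDICATOR_RULES` and the function `interpret` are hypothetical miniatures holding only the single pattern and rule shown on this slide.

```python
# Hypothetical miniature of the ontology: allowed semantic type patterns.
ONTOLOGY = {("gngm", "INTERACTS_WITH", "gngm")}

# Hypothetical indicator rules: verb -> predicate.
INDICATOR_RULES = {"regulate": "INTERACTS_WITH"}


def interpret(subject_np, verb, object_np):
    """Return a (subject, PREDICATE, object) triple, or None if no rule fits."""
    # Apply the indicator rule to the verb.
    predicate = INDICATOR_RULES.get(verb)
    if predicate is None:
        return None
    # Check the semantic type pattern against the ontology.
    pattern = (subject_np["semtype"], predicate, object_np["semtype"])
    if pattern not in ONTOLOGY:
        return None
    # Substitute the concepts for the semantic types.
    return (subject_np["concept"], predicate, object_np["concept"])


# Noun phrases as identified in the concept identification step above.
mbd1 = {"concept": "MBD1", "semtype": "gngm", "entrez": 4152}
htr2c = {"concept": "HTR2C", "semtype": "gngm", "entrez": 3358}

result = interpret(mbd1, "regulate", htr2c)
print(result)  # ('MBD1', 'INTERACTS_WITH', 'HTR2C')
```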
