Ontology-based Annotation & Query of TMA data


Post on 16-Jan-2016

Page 1: Ontology-based Annotation & Query of TMA data

Ontology-based Annotation & Query of TMA data

Nigam Shah

Stanford Medical Informatics ([email protected])

Page 2: Ontology-based Annotation & Query of TMA data

Tissue Microarrays

www.nature.com/clinicalpractice/onc

Page 3: Ontology-based Annotation & Query of TMA data

Stanford tissue microarray database

http://tma.stanford.edu/tma_portal/

Page 4: Ontology-based Annotation & Query of TMA data

Key analysis issue

Tissue microarrays query a large number of samples/patients for one protein.

The key query dimension in TMA data is a tissue sample.

Because there is no commonly used ontology to describe the diagnosis [or annotations] for a given TMA sample in TMAD, it is not easy to perform such a query.

Page 5: Ontology-based Annotation & Query of TMA data

Ontologies considered

The NCI Thesaurus, version 05.09g

The SNOMED-CT, from UMLS 2005 AA

Page 6: Ontology-based Annotation & Query of TMA data

Available annotations for a block

Each donor block in the TMA has semi-structured text associated with it.

ID    Organ        Diagnosis  Subclass 1         Subclass 2            Subclass 3      Subclass 4
2334  Ovary        MMMT
3335  Prostate     Carcinoma  Adeno              intraductal
7022  Bladder      Carcinoma  Transitional cell  In situ
7288  Testis       teratoma   immature           Embryonal carcinoma
8060  Liver        Carcinoma  hepatocellular     No vascular invasion  HepC cirrhosis
6662  Soft tissue  Sarcoma    Leiomyo            epithelioid
6663  lung         Sarcoma    Leiomyo            epithelioid
4713  stomach      carcinoma  unknown

Page 7: Ontology-based Annotation & Query of TMA data

Map text to ontology terms

Make all possible permutations of the annotation words, with rules to weed out bad permutations.

Check for an exact match with NCI and SNOMED-CT terms (and/or synonyms), with rules to weed out bad matches.

Example: "Prostate Carcinoma Adeno intraductal" gives 24 permutations, including:

Prostate Carcinoma Adeno intraductal
Carcinoma Prostate intraductal Adeno
Adeno Carcinoma intraductal Prostate
Prostate intraductal Adeno Carcinoma

Matched term: Prostate_Ductal_Adenocarcinoma
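A minimal sketch of the permute-then-match idea, assuming a toy lexicon. The slide does not state the weeding rules or how word fragments such as "Adeno" and "Carcinoma" are combined, so the fusing of adjacent words, the `LEXICON` entries, the `min_size` parameter, and the function names `candidates` and `map_to_terms` are illustrative assumptions, not the method used for TMAD.

```python
from itertools import permutations

# Hypothetical lexicon: lower-cased names/synonyms -> ontology term.
# A real run would load names and synonyms from NCI Thesaurus and SNOMED-CT.
LEXICON = {
    "prostate intraductal adenocarcinoma": "Prostate_Ductal_Adenocarcinoma",
    "prostate ductal adenocarcinoma": "Prostate_Ductal_Adenocarcinoma",
    "bladder carcinoma": "Bladder_Carcinoma",
}


def candidates(words, min_size=None):
    """Yield candidate strings built from orderings of the annotation words.

    By default only full-length orderings are used (4 words -> 24 orderings).
    Passing a smaller min_size also tries orderings of subsets of the words.
    Each ordering is emitted space-joined, and again with one adjacent pair
    fused (an assumed way to turn "adeno" + "carcinoma" into "adenocarcinoma").
    """
    lowered = [w.lower() for w in words]
    smallest = len(lowered) if min_size is None else min_size
    for size in range(smallest, len(lowered) + 1):
        for perm in permutations(lowered, size):
            yield " ".join(perm)
            for i in range(len(perm) - 1):
                fused = perm[:i] + (perm[i] + perm[i + 1],) + perm[i + 2:]
                yield " ".join(fused)


def map_to_terms(words, min_size=None, lexicon=LEXICON):
    """Return the sorted ontology terms whose name or synonym matches exactly."""
    return sorted({lexicon[c] for c in candidates(words, min_size) if c in lexicon})


print(map_to_terms(["Prostate", "Carcinoma", "Adeno", "intraductal"]))
print(map_to_terms(["Bladder", "Carcinoma", "Transitional cell", "In situ"], min_size=2))
```

Exact lookup in a dictionary keeps the matching step cheap even though the number of orderings grows factorially with the number of words, which is one reason rules that prune permutations early would matter in practice.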

Page 8: Ontology-based Annotation & Query of TMA data

Sample matches (from NCI-T)

ID    Organ        Diagnosis  Subclass 1         Subclass 2            Subclass 3      Ontology Terms
2334  Ovary        MMMT                                                                Malignant_Mixed_Mesodermal_Mullerian_Tumor
3335  Prostate     Carcinoma  Adeno              intraductal                           Prostate_Ductal_Adenocarcinoma
7022  Bladder      Carcinoma  Transitional cell  In situ                               Stage_0_Transitional_Cell_Carcinoma
                                                                                       Transitional_Cell_Carcinoma
                                                                                       Bladder_Carcinoma
                                                                                       Carcinoma_in_situ
7288  Testis       teratoma   immature           Embryonal carcinoma                   Immature|Teratoma
                                                                                       Testicular_Embryonal_Carcinoma
                                                                                       Immature_Teratoma
8060  Liver        Carcinoma  hepatocellular     No vascular invasion  HepC cirrhosis  Hepatocellular_Carcinoma
6662  Soft tissue  Sarcoma    Leiomyo            epithelioid                           Soft_Tissue_Sarcoma
                                                                                       Leiomyosarcoma
                                                                                       Epithelioid_Sarcoma
6663  lung         Sarcoma    Leiomyo            epithelioid                           Lung_Sarcoma
                                                                                       Leiomyosarcoma
                                                                                       Epithelioid_Sarcoma
4713  stomach      carcinoma  unknown                                                  Gastric_carcinoma

Page 9: Ontology-based Annotation & Query of TMA data

Results and validation

Mapped the term-sets for 8495 records, which correspond to 783 distinct term-sets.

577 term-sets (6614 records) matched to the NCI Thesaurus.

365 term-sets (3465 records) matched to SNOMED-CT.

In total, mapped 6871 records (80% of the annotated records in TMAD; 641 distinct term-sets) to one or more ontology terms.

Validation

             NCI                         SNOMED-CT
             Appropriate  Inappropriate  Appropriate  Inappropriate
Set-1        41           9              41           9
Set-2        42           8              43           7
Set-3        46           4              38           12
Total        129          21             122          28
Average (%)  43.0 (86%)   7.0 (14%)      40.66 (81%)  9.33 (19%)
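The totals, averages and percentages in the validation table can be recomputed from the per-set counts. The sketch below uses only the counts shown on this slide; the names `VALIDATION` and `summarize` are illustrative.

```python
# (appropriate, inappropriate) counts per validation set, copied from the slide.
VALIDATION = {
    "NCI": [(41, 9), (42, 8), (46, 4)],
    "SNOMED-CT": [(41, 9), (43, 7), (38, 12)],
}


def summarize(sets):
    """Total the counts and derive the per-set mean and overall fraction."""
    appropriate = sum(a for a, _ in sets)
    inappropriate = sum(i for _, i in sets)
    total = appropriate + inappropriate
    return {
        "appropriate": appropriate,
        "inappropriate": inappropriate,
        "mean_appropriate": appropriate / len(sets),
        "fraction_appropriate": appropriate / total,
    }


for name, sets in VALIDATION.items():
    s = summarize(sets)
    print(
        f"{name}: {s['appropriate']} appropriate, {s['inappropriate']} inappropriate, "
        f"mean per set {s['mean_appropriate']:.2f}, "
        f"fraction appropriate {s['fraction_appropriate']:.0%}"
    )
```

Each set contains 50 evaluated mappings, so the overall fraction is taken over 150 mappings per ontology.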

Page 10: Ontology-based Annotation & Query of TMA data

Browsing interface

Page 11: Ontology-based Annotation & Query of TMA data

Parent and sibling nodes with data (burlywood)

Child nodes with data (yellow)

Child nodes with no data (grey)

Page 12: Ontology-based Annotation & Query of TMA data

Click on the “anchor” link to get data

Page 13: Ontology-based Annotation & Query of TMA data

Updates since February

                                    2/17/2006  9/23/2006
Donor blocks to match               8495       8518
Donor blocks with NCI match         6614       7162
Donor blocks with SNOMEDCT match    3465       6959
Donor blocks with any match         6871       7399
Donor blocks with both match        3208       6722

                                    2/17/2006  9/23/2006
Distinct Terms                      783        791
Distinct Terms with NCI match       577        610
Distinct Terms with SNOMEDCT match  365        610
Distinct Terms with any match       641        651
Distinct Terms with both match      295        569

[Figures: two bar charts of the tables above (donor blocks and distinct terms), comparing 2/17/2006 with 9/23/2006]
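To make the change between the two dates easier to read, the donor-block counts on this slide can be turned into coverage percentages. This is a small sketch using only the slide's numbers; the dictionary keys and the function names `coverage` and `is_consistent` are illustrative.

```python
# Donor-block counts from the slide, one snapshot per date.
SNAPSHOTS = {
    "2/17/2006": {"to_match": 8495, "nci": 6614, "snomed": 3465, "any": 6871, "both": 3208},
    "9/23/2006": {"to_match": 8518, "nci": 7162, "snomed": 6959, "any": 7399, "both": 6722},
}


def coverage(snapshot):
    """Express each match count as a percentage of the blocks to match."""
    total = snapshot["to_match"]
    return {key: 100 * n / total for key, n in snapshot.items() if key != "to_match"}


def is_consistent(snapshot):
    """Inclusion-exclusion check: any = NCI + SNOMED-CT - both."""
    return snapshot["any"] == snapshot["nci"] + snapshot["snomed"] - snapshot["both"]


for date, snapshot in SNAPSHOTS.items():
    percentages = ", ".join(f"{k}: {v:.1f}%" for k, v in coverage(snapshot).items())
    print(f"{date}  {percentages}  consistent={is_consistent(snapshot)}")
```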

Page 14: Ontology-based Annotation & Query of TMA data

How does ontology-based annotation help?

Better search: we can retrieve samples of all the retroperitoneal tumors or malignant uterine neoplasms, for example.

Better integration of data: we can correlate gene expression with protein expression across multiple tumor types.

Tissue microarray data from TMAD; gene expression data from GEO
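The "better search" point relies on the ontology hierarchy: a query for a general term should also return samples annotated with any of its more specific terms. A minimal sketch follows. The block-to-term assignments are taken from the sample matches shown earlier, but the is-a edges in `IS_A` are assumed for illustration and are not copied from NCI Thesaurus; the function names are likewise hypothetical.

```python
from collections import defaultdict

# Assumed (child, parent) is-a edges, for illustration only.
IS_A = [
    ("Bladder_Carcinoma", "Carcinoma"),
    ("Transitional_Cell_Carcinoma", "Carcinoma"),
    ("Stage_0_Transitional_Cell_Carcinoma", "Transitional_Cell_Carcinoma"),
    ("Hepatocellular_Carcinoma", "Carcinoma"),
    ("Leiomyosarcoma", "Sarcoma"),
]

# Donor block ID -> ontology terms it was mapped to (subset of the sample matches).
BLOCK_TERMS = {
    7022: {"Stage_0_Transitional_Cell_Carcinoma", "Bladder_Carcinoma"},
    8060: {"Hepatocellular_Carcinoma"},
    6662: {"Leiomyosarcoma"},
}

CHILDREN = defaultdict(set)
for child, parent in IS_A:
    CHILDREN[parent].add(child)


def descendants(term):
    """Return the term together with everything below it in the hierarchy."""
    seen, stack = {term}, [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def blocks_annotated_with(term):
    """Return IDs of blocks annotated with the term or any of its descendants."""
    wanted = descendants(term)
    return sorted(block for block, terms in BLOCK_TERMS.items() if terms & wanted)


print(blocks_annotated_with("Carcinoma"))
print(blocks_annotated_with("Sarcoma"))
```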

Page 15: Ontology-based Annotation & Query of TMA data

Integrating mRNA and protein expression

[Figure: a proteins × samples matrix alongside a genes × samples matrix]
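One way to picture the integration: once samples in both data sets carry ontology terms, the shared terms act as a join key between protein and mRNA measurements. The sketch below is only an illustration of that join. All expression values are made up, the per-term summaries are assumed to have been computed beforehand, and the names `protein_by_term`, `mrna_by_term` and `pearson` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up per-term summaries for one gene/protein pair (illustrative values only).
protein_by_term = {  # e.g. fraction of positively stained TMA cores
    "Bladder_Carcinoma": 0.8,
    "Hepatocellular_Carcinoma": 0.2,
    "Leiomyosarcoma": 0.5,
}
mrna_by_term = {  # e.g. mean expression level from array data
    "Bladder_Carcinoma": 7.1,
    "Hepatocellular_Carcinoma": 2.3,
    "Leiomyosarcoma": 4.9,
    "Lung_Sarcoma": 3.0,
}

# Only tumor types present in both data sets can be compared.
shared = sorted(protein_by_term.keys() & mrna_by_term.keys())
r = pearson(
    [protein_by_term[t] for t in shared],
    [mrna_by_term[t] for t in shared],
)
print(shared, round(r, 3))
```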

Page 16: Ontology-based Annotation & Query of TMA data

Partial alignment of NCI-T and SNOMED-CT as a “bonus”

Page 17: Ontology-based Annotation & Query of TMA data

Steps in Alignment

Anchor identification: identify similar class labels in the ontologies to be aligned. This is usually done by string matching.

Ontology structure: use the “similar” classes as anchors and examine the local [graph] structure around them to inform the “similarity” metric.

[Figure: two example ontology trees to be aligned, one with nodes Root and Term-1 to Term-5, the other with nodes R and t1 to t7]
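Anchor identification by string matching can be sketched with the standard library. The slide does not say which string measure is used, so `difflib.SequenceMatcher`, the normalization, the 0.9 threshold and the example labels below are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a class label and treat underscores as spaces."""
    return label.replace("_", " ").lower().strip()


def similarity(a, b):
    """String similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_anchors(labels_a, labels_b, threshold=0.9):
    """Pair each label in A with its most similar label in B, if similar enough."""
    anchors = []
    for a in labels_a:
        best = max(labels_b, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        if score >= threshold:
            anchors.append((a, best, round(score, 2)))
    return anchors


# Hypothetical class labels from two ontologies.
ontology_a = ["Hepatocellular_Carcinoma", "Leiomyosarcoma", "Gastric_carcinoma"]
ontology_b = ["Hepatocellular carcinoma", "Leiomyosarcoma", "Teratoma"]

for a, b, score in find_anchors(ontology_a, ontology_b):
    print(a, "<->", b, score)
```

The anchors found this way would then seed the structure-based step, which compares the neighbourhoods of the paired classes.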

Page 18: Ontology-based Annotation & Query of TMA data

We might improve alignment …

Ontology [graph] structure based step

Provide Anchors from annotated data

[Figure: the two example ontology trees (Root with Term-1 to Term-5; R with t1 to t7) with anchor pairs Term-2/t1 and Term-5/t5; sample S2 is linked to both Term-5 and t5]
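The idea of providing anchors from annotated data can be sketched as follows: when the same sample is mapped to a term in each ontology, that pair of terms becomes a candidate anchor, and pairs seen on several samples are more trustworthy. The support threshold, the toy annotations and the function name `anchors_from_annotations` are assumptions for illustration; the slide does not describe how candidate pairs are filtered.

```python
from collections import Counter
from itertools import product


def anchors_from_annotations(terms_a_by_sample, terms_b_by_sample, min_support=2):
    """Return term pairs that co-annotate at least min_support samples."""
    counts = Counter()
    # Only samples annotated in both ontologies can contribute pairs.
    for sample in terms_a_by_sample.keys() & terms_b_by_sample.keys():
        counts.update(product(terms_a_by_sample[sample], terms_b_by_sample[sample]))
    return sorted(pair for pair, n in counts.items() if n >= min_support)


# Toy annotations using the node names from the slide's diagram.
terms_a = {"S1": {"Term-5"}, "S2": {"Term-5"}, "S3": {"Term-2"}}
terms_b = {"S1": {"t5"}, "S2": {"t5"}, "S3": {"t1"}, "S4": {"t3"}}

print(anchors_from_annotations(terms_a, terms_b))
print(anchors_from_annotations(terms_a, terms_b, min_support=1))
```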

Page 19: Ontology-based Annotation & Query of TMA data

Better Text-mapping, Better Alignment

                           2/17  7/23
Distinct Terms             783   791
Terms with NCI match       577   620
Terms with SNOMEDCT match  365   610
Terms with any match       641   654
Terms with both match      295   576

[Figure: bar chart of the table above, comparing 2/17/2006 with 7/23/2006]

Page 20: Ontology-based Annotation & Query of TMA data

Summary

Ability to map word-groups to ontology terms

[Figures repeated from earlier slides: the proteins × samples and genes × samples matrices, and the two ontology trees with anchors provided from annotated data]

Page 21: Ontology-based Annotation & Query of TMA data

Credits and acknowledgements

Pathology: Robert Marinelli, Matt van de Rijn

Medical Informatics: Kaustubh Supekar, Daniel Rubin, Mark Musen

Funding: NIH