
Posted on 16-Jan-2016


BIOSEMANTICS: Semantic Support Technology

For on-line Knowledge Tracking and Discovery

Second Order Semantic Enrichment

…CD40 ligand and tumor necrosis factor alpha, the cells acquire a mature phenotype of dendritic cells that is characterized by up-regulation of human leukocyte antigen (CD80, CD86, CD40 and CD54 and appearance of CD83. These…

Too much to read. Major trends foreseen:

• From Reading to Consulting

• From Reading to Meta Analysis

• From Writing to Knowledge Representations

• To Central AND Distributed Annotation

Text mining

Writing = ambiguity

Future (hope)

Papyrus

What do we do?

• Disambiguate text and tag/link concepts:
– Pre-done for ‘own’ content
– On the fly for selected web environments

• Meta-analyse at concept level

• Provide meta-analysed information

• Support Information Based Knowledge Discovery (especially new associations)

Ambiguity 1: Synonyms

• Facilitating networks of information. van Mulligen EM, Diwersy M, Schmidt M, Buurman H, Mons B. Proceedings of the AMIA Symposium 2000, 868-72

Ambiguity 2: Homonyms

PSA:
• Prostate Specific Antigen
• PSoriatic Arthritis
• alpha-2,8-PolySialic Acid
• PolySubstance Abuse
• Picryl Sulfonic Acid
• Polymeric Silicic Acid
• Partial Sensory Agnosia
• Poultry Science Association

• Distribution of information in biomedical abstracts and full-text publications. Schuemie MJ, Weeber M, Schijvenaars BJ, van Mulligen EM, van der Eijk CC, Jelier R, Mons B, Kors JA. Bioinformatics 2004 Nov 1, 20:2597-604

But… we have nomenclature committees now…

Symbol | Name | Aliases
DEFB4 | defensin, beta 4 | SAP1, HBD-2, DEFB-2, DEFB102, DEFB2
ELK4 | ELK4, ETS-domain protein (SRF accessory protein 1) | SAP1
PSAP | prosaposin (variant Gaucher disease and variant metachromatic leukodystrophy) | SAP1, GLBA
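The SAP1 problem in the table above can be found mechanically by inverting the thesaurus: index every symbol and alias, then flag any term that points at more than one gene. A minimal Python sketch, assuming the three rows above are the whole thesaurus; `GENES`, `build_term_index` and `homonyms` are hypothetical names, not part of any Biosemantics software.

```python
from collections import defaultdict

# Symbols, names and aliases as listed on the slide (assumed to be the whole thesaurus).
GENES = {
    "DEFB4": ("defensin, beta 4", ["SAP1", "HBD-2", "DEFB-2", "DEFB102", "DEFB2"]),
    "ELK4": ("ELK4, ETS-domain protein (SRF accessory protein 1)", ["SAP1"]),
    "PSAP": ("prosaposin (variant Gaucher disease and variant metachromatic leukodystrophy)",
             ["SAP1", "GLBA"]),
}


def build_term_index(genes):
    """Map every term (approved symbol or alias) to the set of symbols it can denote."""
    index = defaultdict(set)
    for symbol, (_name, aliases) in genes.items():
        index[symbol].add(symbol)
        for alias in aliases:
            index[alias].add(symbol)
    return index


def homonyms(index):
    """Return only the ambiguous terms, i.e. terms that denote more than one gene."""
    return {term: sorted(symbols) for term, symbols in index.items() if len(symbols) > 1}


if __name__ == "__main__":
    print(homonyms(build_term_index(GENES)))
```

Run as a script, this reports SAP1 as the only ambiguous term, shared by DEFB4, ELK4 and PSAP. Synonyms fall out of the same index: all terms that map to one symbol are synonyms of each other.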

PRESENT

• Contextual annotation of web pages for interactive browsing. van Mulligen E, Diwersy M, Schijvenaars B, Weeber M, van der Eijk CC, Jelier R, Schuemie M, Kors J, Mons B. Medinfo 2004, 11:94-8
• Which gene did you mean? Mons B. BMC Bioinformatics 2005 Jun 7, 6:142

First order semantic enrichment

The Knowlet

2nd order S.E.

Second Order Semantic Enrichment

1. Creating Reference Knowlets

PSA (Prostate Specific Antigen): Reference Knowlet

PSA (Psoriatic Arthritis): Reference Knowlet

2. Context matching

New text: PSA ??

• Prostate Specific Antigen: Reference Knowlet
• Psoriatic Arthritis: Reference Knowlet

93% correct in the ‘worst case scenario’, 98% overall.

• Thesaurus-based disambiguation of gene symbols. Schijvenaars BJ, Mons B, Weeber M, Schuemie MJ, van Mulligen EM, Wain HM, Kors JA. BMC Bioinformatics 2005 Jun 16, 6:149
• Word sense disambiguation in the biomedical domain: an overview. Schuemie MJ, Kors JA, Mons B. Journal of Computational Biology 2005 Jun, 12:554-65
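Context matching compares the words around an ambiguous term with the Reference Knowlet of each candidate sense and picks the best match. The slides do not say how the match is scored, so the Python sketch below uses plain cosine similarity as a stand-in; the cited papers may use a different measure. The reference Knowlets and their weights are hypothetical toy values, not real Knowlets.

```python
import math
from collections import Counter

# Hypothetical Reference Knowlets: sense -> {associated word: weight}.
REFERENCE_KNOWLETS = {
    "Prostate Specific Antigen": {"prostate": 0.9, "cancer": 0.7, "serum": 0.6, "biopsy": 0.5},
    "Psoriatic Arthritis": {"psoriasis": 0.9, "arthritis": 0.8, "joint": 0.7, "skin": 0.5},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context_words, references):
    """Return the sense whose Reference Knowlet best matches the context, plus all scores."""
    context = Counter(word.lower() for word in context_words)
    scores = {sense: cosine(context, knowlet) for sense, knowlet in references.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    text = "Serum PSA was elevated and a prostate biopsy was scheduled"
    sense, scores = disambiguate(text.split(), REFERENCE_KNOWLETS)
    print(sense, scores)
```

Here "serum", "prostate" and "biopsy" overlap only with the first Knowlet, so PSA resolves to Prostate Specific Antigen. A real system would compare concept IDs rather than raw words, and would need a fallback when all scores are zero.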


Text (free or structured) → Resolving ambiguities (contextual reference concepts) → Concept tagging and inserting appropriate links
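The final step of this pipeline, concept tagging with inserted links, can be sketched as a greedy longest-match lookup against a thesaurus. This Python sketch leaves out ambiguity resolution. The concept IDs and the `https://example.org/concept/...` link pattern are hypothetical placeholders, not real identifiers or endpoints.

```python
import re

# Hypothetical thesaurus: lower-case surface term -> concept ID.
THESAURUS = {
    "tumor necrosis factor alpha": "C0004",
    "tumor necrosis factor": "C0005",
    "dendritic cells": "C0003",
    "prostate specific antigen": "C0001",
    "psoriatic arthritis": "C0002",
}
URL_TEMPLATE = "https://example.org/concept/{}"  # placeholder, not a real service


def tag_concepts(text, thesaurus=THESAURUS, url_template=URL_TEMPLATE):
    """Wrap each recognised term in an HTML link to its concept ID.

    Alternatives are sorted longest first, so at any position the regex prefers
    "tumor necrosis factor alpha" over the shorter "tumor necrosis factor".
    """
    terms = sorted(thesaurus, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)

    def to_link(match):
        concept_id = thesaurus[match.group(0).lower()]
        return f'<a href="{url_template.format(concept_id)}">{match.group(0)}</a>'

    return pattern.sub(to_link, text)


if __name__ == "__main__":
    print(tag_concepts("CD40 ligand and tumor necrosis factor alpha activate dendritic cells"))
```

Longest-match-first is a common design choice for dictionary taggers because nested terms are frequent in biomedical names. It only works when the text uses the same spacing and word order as the thesaurus entry.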

Basic methodology: concept tagging, and the creation and systematic aggregation of Knowlets

• Text Knowlet
• Object Knowlets (people, diseases, drugs, genes)
• Collection Knowlets (category, pathway, micro-array gene set)

Aggregation of Object Knowlets

Aggregation of Text Knowlets

[Figure: example Object Knowlets (Object 1, Object 2, Object 3) for entity types person/organisation, gene, disease and drug]

> 15 million Knowlets from PubMed etc.
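Aggregating Text Knowlets into Object Knowlets can be pictured as summing, over all texts, which concepts are tagged together with a given concept. The slides do not specify how Knowlet weights are computed. This Python sketch assumes raw co-occurrence counts, and the tagged texts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical input: each text reduced to the set of concepts tagged in it (a Text Knowlet).
TEXTS = [
    {"PSA", "prostate cancer", "biopsy"},
    {"PSA", "prostate cancer", "screening"},
    {"psoriatic arthritis", "methotrexate"},
]


def object_knowlets(tagged_texts):
    """Aggregate per-text co-occurrences into one Knowlet per concept.

    Returns concept -> Counter(related concept -> number of texts in which both occur).
    """
    knowlets = defaultdict(Counter)
    for concepts in tagged_texts:
        for a, b in permutations(concepts, 2):  # every ordered pair of distinct concepts
            knowlets[a][b] += 1
    return knowlets


if __name__ == "__main__":
    for concept, related in object_knowlets(TEXTS).items():
        print(concept, dict(related))
```

Because aggregation is a simple sum, new texts can be folded into existing Knowlets incrementally. At a scale of millions of Knowlets, raw counts would normally be replaced by a weighting that corrects for concept frequency.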

3. Building an association matrix of large data sources

1
0.16   1
0.30   0.03   1
0.28   0.35   0.20   1
0.188  0.004  0.15   0.13   1

A matrix of associative distances
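One way to fill such a matrix is to compare every pair of Knowlets as vectors. Because the diagonal is 1, the values behave like similarities, and a distance can be taken as 1 − s. The slides do not say which measure produced the numbers shown. This Python sketch assumes cosine similarity and uses hypothetical Knowlets, printing the lower triangle in the same layout as the slide.

```python
import math

# Hypothetical Object Knowlets: concept -> {related concept: weight}.
KNOWLETS = {
    "PSA": {"prostate cancer": 3.0, "biopsy": 1.0, "serum": 2.0},
    "prostate cancer": {"PSA": 3.0, "biopsy": 2.0, "serum": 1.0},
    "psoriatic arthritis": {"psoriasis": 3.0, "methotrexate": 1.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def association_matrix(knowlets):
    """Return (names, rows), where rows[i][j] for j <= i is the similarity of names[i] and names[j]."""
    names = sorted(knowlets)
    rows = [[cosine(knowlets[r], knowlets[c]) for c in names[: i + 1]]
            for i, r in enumerate(names)]
    return names, rows


if __name__ == "__main__":
    names, rows = association_matrix(KNOWLETS)
    for name, row in zip(names, rows):
        print(f"{name:>20}", " ".join(f"{v:.2f}" for v in row))
```

Only the lower triangle is stored because the measure is symmetric. For n concepts this still means n(n−1)/2 comparisons, which is why large collections need sparse or approximate methods.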

meta-analysis

Hierarchical clustering, ACS, MDS, etc.
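As an illustration of the first meta-analysis option, the Python sketch below runs average-linkage hierarchical clustering on the 5×5 matrix from the slide. Two assumptions apply: the fifth row was read as 0.188, 0.004, 0.15, 0.13, 1 from the extracted text, and the values are treated as similarities, with distance = 1 − s. The slide does not name the five concepts, so they appear as indices 0 to 4. ACS and MDS are not shown.

```python
# Lower-triangular values as read from the slide (diagonal = 1, treated as similarities).
LOWER = [
    [1.0],
    [0.16, 1.0],
    [0.30, 0.03, 1.0],
    [0.28, 0.35, 0.20, 1.0],
    [0.188, 0.004, 0.15, 0.13, 1.0],
]


def full_distance(lower):
    """Expand the lower triangle into a symmetric distance matrix d = 1 - s."""
    n = len(lower)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            d[i][j] = d[j][i] = 1.0 - lower[i][j]
    return d


def average_linkage(d):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(d))]
    merges = []

    def dist(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = ((dist(a, b), a, b) for k, a in enumerate(clusters) for b in clusters[k + 1:])
        best, a, b = min(pairs)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, round(best, 3)))
    return merges


if __name__ == "__main__":
    for a, b, distance in average_linkage(full_distance(LOWER)):
        print(a, "+", b, "at", distance)
```

With these values, concepts 1 and 3 (s = 0.35) merge first, then 0 and 2. Swapping in single or complete linkage only changes `dist`; libraries such as SciPy provide these methods ready-made.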

4. Meta-analysis method 1: ACS

• Constructing an Associative Concept Space for Literature-based Discovery. van der Eijk CC, van Mulligen EM, Kors JA, Mons B, van den Berg J. Journal of the American Society for Information Science and Technology 2004, 55(5):436-444
• Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes. Jelier R, Jenster G, Dorssers LC, van der Eijk CC, van Mulligen EM, Mons B, Kors JA. Bioinformatics 2005 May 1, 21:2049-58

[Figure legend: Function unknown, Chaperones, Chromatin structure, Fibrous proteins, mRNA metabolism, Others, Ribosomal proteins, Ribosome biogenesis, Translation; labelled proteins: SRP, PARN]

• Assignment of protein function and discovery of new nucleolar proteins based on automatic analysis of MEDLINE. Schuemie M, Chichester C, Lisacek F, Coute Y, Roes PJ, Sanchez JC, Mons B. To be submitted.

Fingerprints

• Knowlet
• Association matrix
• Meta-analysis
• Expert challenge
• WikiZ/P expert comments
• Peer-to-peer review
• Final approval

U.W. Fingerprint

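Update from the literature

[Figure: the fingerprint of Protein A is updated as new publications or annotations arrive; example association values 0.1, 0.4 and 0.9; states Solid, Liquid and Gas]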

1st order semantic enrichment

Reduction: false positives

Discussion: voting in Wiki

Meta-analysis: proximity measures

Proposals to databases?

Central Annotation

• REGISTRATION (1X)

• Unique author ID
• E-mail address
• PHP/userpage
• People Knowlets

• Unique concept ID
• Language variants
• Homonyms
• Definitions (brief)
• Object Knowlets

Science wikis
• UID from WiktionaryZ
• Research information
• Talk page
• Liquid Threads
• Object Knowlets

• UID from WiktionaryZ
• Articles about UIDs
• Encyclopaedic / NPOV
• Anonymous allowed

[Screenshots: Wiki-Authors page of Dr. Johan den Dunnen; Wiki-Proteins page for DMD (Hs); other labels: OMIM, NPOV, MEI, AOI]

Freely provided by

• Original source
• WZ definition (choice)
• Related concepts (Knowlet)
• Experts (Wiki-Authors)
• More about (Wiki-Proteins)
• Vote (Wiki-Proteins)
• Wikipedia article
• Browse (Knowlet Browser)
• Google (e-vamp)
