![Page 1: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/1.jpg)
Biosemantics group
Martijn Schuemie
![Page 2: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/2.jpg)
Overview
The biosemantics group
Ontology assembly
Concept tagging
Homonym disambiguation
Concept profile creation
Nucleolus
![Page 3: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/3.jpg)
Biosemantics group
ErasmusMC University Medical Center Rotterdam
Department of Medical Informatics
Biosemantics group
Jan Kors
Barend Mons
Erik van Mulligen
Martijn Schuemie
Rob Jelier
Kristina Hettne
Antoinne van Veldhoven
![Page 4: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/4.jpg)
Biosemantics group
Biosemantics
Molecular Biology
High-throughput experiment data (genomics and proteomics)
Gene and protein databases, MEDLINE, Gene Ontology
Biosemantics
Concept-based text-mining
Interpretation of experiment data
Knowledge discovery
![Page 5: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/5.jpg)
Ontology assembly
Entrez Gene Swiss-Prot HUGO
Combination
Add spelling variations
ABC1 -> ABC-1
DEF3 -> DEF-III
Remove highly ambiguous terms
CO2, membrane-bound, obesity, open reading frame
P=37%, R=76%
P=50%, R=75%
![Page 6: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/6.jpg)
Concept tagging
MEDLINE text: Malaria fever is a disease. It is spread by mosquitos.
Sentence splitting: [Malaria fever is a disease.] [It is spread by mosquitos.]
Tokenization: [Malaria] [fever] [is] [a] [disease]
Word normalisation: [malaria] [fever] [be] [a] [disease]
Concept mapping: [malaria fever] C24530 [disease] C12634
Homonym disambiguation: PSA -> Prostate Specific Antigen or Poultry Science Association?
Concept profile of text
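The tagging steps above can be sketched in a few lines of Python. This is a minimal illustration, not the group's implementation: the `LEMMAS` and `THESAURUS` dictionaries are hypothetical toy resources (only the two concept IDs come from the slide), and the greedy longest-match lookup is an assumed matching strategy.

```python
import re

# Hypothetical toy resources; the real thesaurus and normaliser are not shown in the slides.
LEMMAS = {"is": "be", "are": "be", "mosquitos": "mosquito"}
THESAURUS = {
    ("malaria", "fever"): "C24530",  # concept IDs taken from the slide
    ("disease",): "C12634",
}
MAX_TERM_LENGTH = max(len(term) for term in THESAURUS)


def split_sentences(text):
    """Split on sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Keep runs of letters, digits and hyphens; drop punctuation."""
    return re.findall(r"[A-Za-z0-9-]+", sentence)


def normalize(token):
    """Lower-case the token and look up a base form, if one is known."""
    lowered = token.lower()
    return LEMMAS.get(lowered, lowered)


def map_concepts(tokens):
    """Greedy longest-match lookup of token sequences in the thesaurus."""
    concepts, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_TERM_LENGTH, len(tokens) - i), 0, -1):
            concept_id = THESAURUS.get(tuple(tokens[i:i + n]))
            if concept_id:
                concepts.append((" ".join(tokens[i:i + n]), concept_id))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return concepts


def tag(text):
    """Run the full pipeline and return (term, concept ID) pairs."""
    tagged = []
    for sentence in split_sentences(text):
        tokens = [normalize(t) for t in tokenize(sentence)]
        tagged.extend(map_concepts(tokens))
    return tagged
```

Calling `tag("Malaria fever is a disease. It is spread by mosquitos.")` with these toy dictionaries yields the two concepts from the slide; homonym disambiguation would be applied to the mapped concepts afterwards.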
![Page 7: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/7.jpg)
Homonym disambiguation
Some simple rules:
• Is it likely that a term has multiple meanings?
  - 3-letter acronym (e.g. PSA): highly likely
  - long forms (e.g. Prostate Specific Antigen): highly unlikely
  - terms that refer to several concepts by definition
• Is a synonym found? (e.g. “KLK3 (PSA)”)
• Is a keyword found? (e.g. “PSA is secreted by the prostate”)
These simple rules change performance from P=50%, R=75% to P=71%, R=71%.
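One way to read these rules is as a filter on tagged terms. The sketch below is an assumption about how they might be combined, not the group's actual code: the function names, the "three characters or fewer, all capitals" test, and the single-token synonym and keyword lists are all hypothetical.

```python
import re


def likely_ambiguous(term, known_homonyms=frozenset()):
    """Rule 1: short acronyms and terms known to map to several concepts
    are likely homonyms; long forms are not."""
    if term in known_homonyms:
        return True
    return len(term) <= 3 and term.isupper()


def accept_mention(term, text, synonyms, keywords, known_homonyms=frozenset()):
    """Accept a tagged term if it is unlikely to be ambiguous, or if a synonym
    or keyword of the intended concept occurs in the same text.
    Synonyms and keywords are matched as single lower-cased tokens."""
    if not likely_ambiguous(term, known_homonyms):
        return True
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    has_synonym = any(s.lower() in words for s in synonyms)  # rule 2
    has_keyword = any(k.lower() in words for k in keywords)  # rule 3
    return has_synonym or has_keyword
```

For example, `accept_mention("PSA", "PSA is secreted by the prostate", ["KLK3"], ["prostate"])` accepts the mention because the keyword is present, while a text about an annual association meeting would be rejected.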
![Page 8: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/8.jpg)
Homonym disambiguation
Concept profile of text containing PSA
Concept profile of Prostate Specific Antigen
Concept profile of Phosphoserine Aminotransferase
Unknown meaning
Similarity?
Previous tests showed an overall accuracy of 93%
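The comparison on this slide can be illustrated as follows. The slide does not name the similarity measure, so cosine similarity is an assumption here, and the threshold value, function names and example weights are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse profiles (concept -> weight)."""
    dot = sum(w * q.get(c, 0.0) for c, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def disambiguate(text_profile, meaning_profiles, threshold=0.1):
    """Return the meaning whose profile is most similar to the text profile,
    or None ("unknown meaning") if no profile reaches the threshold."""
    best_meaning, best_score = None, 0.0
    for meaning, profile in meaning_profiles.items():
        score = cosine(text_profile, profile)
        if score > best_score:
            best_meaning, best_score = meaning, score
    return best_meaning if best_score >= threshold else None


# Toy profiles with made-up weights, for illustration only.
MEANINGS = {
    "Prostate Specific Antigen": {"prostate": 0.9, "KLK3": 0.4, "neoplasm": 0.3},
    "Phosphoserine Aminotransferase": {"serine": 0.9, "biosynthesis": 0.5},
}
```

With these toy profiles, a text profile dominated by `prostate` is assigned to Prostate Specific Antigen, and a text sharing no concepts with either profile falls through to the unknown-meaning case.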
![Page 9: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/9.jpg)
Concept profile creation
Text (x3) -> Concept profile of text (x3) -> Concept profile of concept
Texts associated with the concept:
- From databases
- By concept mapping
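As a rough illustration of how text profiles could be combined into the profile of a concept, the sketch below averages binary text profiles. The averaging step and the function names are assumptions, not the method stated in the slides, and the example texts are invented.

```python
from collections import defaultdict


def text_profile(concept_ids):
    """Binary profile: weight 1.0 for every concept tagged in the text."""
    return {c: 1.0 for c in concept_ids}


def concept_profile(text_profiles):
    """Average the profiles of all texts associated with a concept."""
    totals = defaultdict(float)
    for profile in text_profiles:
        for concept, weight in profile.items():
            totals[concept] += weight
    n = len(text_profiles)
    return {c: total / n for c, total in totals.items()} if n else {}
```

For two invented texts tagged with `["breast neoplasm", "estrogen"]` and `["breast neoplasm", "BRCA1"]`, the averaged profile gives `breast neoplasm` a weight of 1.0 and the other two concepts 0.5 each.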
![Page 10: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/10.jpg)
Concept profile creation
Binary
Log likelihood
X IDF
Uncertainty coefficient
![Page 11: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/11.jpg)
Concept profile creation
Profile of gene ESR1 (estrogen receptor 1):
breast neoplasm 0.5
BRCA1 0.34
PGR 0.30
Estrogen 0.28
BRCA2 0.25
TP53 0.15
gene suppressor tumor 0.12
genetics polymorphism 0.12
genetic predisposition to disease 0.10
female 0.05
![Page 12: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/12.jpg)
Concept profile comparison
![Page 13: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/13.jpg)
Concept profile comparison
| Concept Name | Weight | RAB27B | MYRIP | MLPH | RAB27A |
|---|---|---|---|---|---|
| RAB27A | 52.17 | 0.61 | 0.74 | 0.73 | 1 |
| MLPH | 11.16 | - | 0.44 | 1 | 0.29 |
| Myosin Type V | 7.22 | 0.04 | 0.68 | 0.4 | 0.22 |
| Melanosomes | 6.7 | 0.12 | 0.3 | 0.47 | 0.27 |
| RAB27B | 4.06 | 1 | 0.14 | - | 0.11 |
| MYRIP | 2.98 | 0.07 | 1 | 0.09 | 0.06 |
| Melanocytes | 2.73 | 0.13 | 0.14 | 0.28 | 0.17 |
| Myosins | 2.33 | 0.04 | 0.38 | 0.22 | 0.12 |
| Myosin Heavy Chains | 1.72 | - | 0.46 | 0.18 | 0.09 |
| GTP Phosphohydrolases | 1.31 | 0.17 | 0.23 | 0.04 | 0.08 |
| Actins | 1.17 | 0.05 | 0.32 | 0.12 | 0.06 |
| Exocytosis | 0.87 | 0.08 | 0.12 | 0.08 | 0.12 |
| Secretory Vesicles | 0.68 | 0.07 | 0.16 | 0.06 | 0.09 |
| Carrier Proteins | 0.59 | - | 0.11 | 0.17 | 0.09 |
| Organelles | 0.54 | 0.11 | - | 0.12 | 0.09 |
| rab GTP-Binding Proteins | 0.52 | 0.16 | - | 0.04 | 0.12 |
![Page 14: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/14.jpg)
Nucleolus
• main function: ribosome biogenesis
• over 700 proteins identified and classified into 8 main categories
![Page 15: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/15.jpg)
Nucleolus – Concept profiles
MEDLINE article (x3) -> Concept profile of text (x3) -> Concept profile of protein
Articles associated with the protein:
- From databases
![Page 16: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/16.jpg)
Nucleolus – Concept profiles
BLAST (Basic Local Alignment Search Tool)
Query: nucleolar protein
Results: homologs in
• human
• mouse
• fruitfly
• yeast
![Page 17: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/17.jpg)
Nucleolus – Concept profiles
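| | Minimum | Maximum | Mean |
|---|---|---|---|
| Homologs used: Human | 0 | 9 | 1.66 |
| Homologs used: Mouse | 0 | 10 | 1.37 |
| Homologs used: Fruitfly | 0 | 5 | 0.7 |
| Homologs used: Yeast | 0 | 8 | 1.21 |
| Articles used | 1 | 1046 | 91.31 |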
![Page 18: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/18.jpg)
Nucleolus – fun with protein profiles
• 2D visualization of high-dimensional space
• Automatic functional annotation of proteins
• Finding similar proteins
![Page 19: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/19.jpg)
Nucleolus - visualisation
Multi-Dimensional Scaling
Legend: Function unknown, Chaperones, Chromatin structure, Fibrous proteins, mRNA metabolism, Others, Ribosomal proteins, Ribosome biogenesis, Translation
Labelled proteins: SRP, PARN, Exosome comp. 10, O43390, P98179, Q8N220
![Page 20: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/20.jpg)
Nucleolus – Assigning GO terms
MEDLINE article (x3) -> Concept profile of text (x3) -> Concept profile of GO term
Articles associated with the GO term:
- From GO
![Page 21: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/21.jpg)
Nucleolus – Assigning GO terms
AuC: Area under Curve

| Category | AuC | p |
|---|---|---|
| Chaperones | 1.00 | <.001 |
| Chromatin Structure | 0.98 | <.001 |
| Fibrous proteins | 0.97 | <.001 |
| mRNA metabolism | 0.72 | <.001 |
| Others | 0.81 | <.001 |
| Ribosomal proteins | 0.97 | <.001 |
| Ribosome biogenesis | 0.69 | <.001 |
| Translation | 0.88 | <.001 |
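For readers unfamiliar with the measure, the sketch below shows one common way to compute an area under the curve, assuming the ROC curve is meant (the slide does not say which curve). It uses the rank-based formulation: the fraction of positive/negative pairs in which the positive example scores higher, with ties counted as half. The function name and inputs are hypothetical; in this setting the scores would be profile similarities and the labels the manual category assignments.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative one."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

A value of 1.00 means every protein in the category is ranked above every protein outside it, while 0.5 corresponds to a random ranking.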
![Page 22: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/22.jpg)
Nucleolus – Assigning GO terms
‘Mistakes’ in automatic annotation
1. Manual assignment to one category only
e.g. SFRS protein kinase 1 plays a role in splicing, but is also a kinase
2. Assumptions do not always hold
• Sequence homology ≠ function homology
• Concept co-occurrence ≠ functional relationship
3. Homonyms
![Page 23: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/23.jpg)
Nucleolus – Finding new proteins
Concept profile of nucleolar protein
Concept profile of human protein (x3)
![Page 24: Biosemantics group Martijn Schuemie. Overview The biosemantics group Ontology assembly Concept tagging Homonym disambiguation Concept profile](https://reader034.vdocuments.site/reader034/viewer/2022051316/56649ec15503460f94bcc7bc/html5/thumbnails/24.jpg)
Nucleolus – Finding new proteins
60S ribosomal protein L3-like
Probable ATP-dependent RNA helicase DDX4
ATP-dependent RNA helicase DDX3Y
Guanine nucleotide binding protein-like 3
Importin-11 (importin beta family)
Putative Brix domain containing protein 1P
Probable ATP-dependent RNA helicase DDX20 (Gemin 3)
60S acidic ribosomal protein P0
Helicase SKI2W
ATP-dependent RNA helicase DDX39
40S ribosomal protein S20
Probable ATP-dependent RNA helicase DDX6
Probable ATP-dependent RNA helicase DDX23
Double-stranded RNA-binding protein Staufen homolog 1
ATP-dependent RNA helicase DDX25
Probable nucleolar complex protein 14
Eukaryotic initiation factor 4A-II
ATP-dependent RNA helicase DDX19B
40S ribosomal protein S3
Ribosomal protein
DEAD-box
DEAD-box
Found in nucleolus
Associated with nucleolar p.
DEAD-box
DEAD-box
DEAD-box
Found in nucleolus
DEAD-box
Ribosomal protein
DEAD-box
DEAD-box
Indirect evidence
DEAD-box
Nucleolar
DEAD-box
DEAD-box
Ribosomal protein