![Page 1: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/1.jpg)
1
Data Integration and Extraction over Molecular Biological Data
Cui Tao
supported by NSF
![Page 2: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/2.jpg)
2
Motivation
Online biological data:
- Highly diverse in granularity and variety
- Various formats
- Different terminologies, ID systems, units
![Page 3: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/3.jpg)
3
How to Build a Gene Extraction Ontology?
- Concepts
- Relationship sets
- Constraints
- Data frames
![Page 4: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/4.jpg)
4
How to Build a Gene Extraction Ontology?
(G*A*U*C*)*
(G*A*T*C*)*
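The two expressions above accept any string over the RNA bases (G, A, U, C) and the DNA bases (G, A, T, C), respectively. A minimal sketch of how such recognizers might be applied, assuming Python's `re` module; the function name `classify_sequence` is hypothetical and not taken from the slides. The character classes used here accept the same strings as the slide's expressions, except that at least one base is required, so the empty string is rejected.

```python
import re

# Same language as (G*A*T*C*)* and (G*A*U*C*)* on the slide, but written
# as character classes and requiring at least one base.
DNA_PATTERN = re.compile(r"[GATC]+")
RNA_PATTERN = re.compile(r"[GAUC]+")


def classify_sequence(text):
    """Return 'DNA', 'RNA', or None for a candidate sequence string.

    Hypothetical helper for illustration only. A string made solely of
    G, A, and C fits both patterns; it is reported as 'DNA' because the
    DNA pattern is checked first.
    """
    candidate = text.strip().upper()
    if DNA_PATTERN.fullmatch(candidate):
        return "DNA"
    if RNA_PATTERN.fullmatch(candidate):
        return "RNA"
    return None


if __name__ == "__main__":
    for sample in ["GATTACA", "GAUUACA", "GATUX"]:
        print(sample, classify_sequence(sample))
```

A recognizer this permissive will also accept short ordinary tokens such as "A" or "CAT", so in practice it would likely need a minimum length or surrounding context to be useful.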
![Page 5: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/5.jpg)
5
Knowledge Sources
- Gene Ontology (Molecular Function, Biological Process, Cellular Component): thousands of terms
- All Species Toolkit: 1,231,935 species names
- Protein databases: thousands of protein names
![Page 6: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/6.jpg)
6
Extraction Rules
- Statistical NLP
- Machine learning
  - Naïve Bayes
  - Hidden Markov Models
  - Decision Trees
![Page 7: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/7.jpg)
7
Integration
![Page 8: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/8.jpg)
8
![Page 9: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/9.jpg)
9
![Page 10: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/10.jpg)
10
![Page 11: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/11.jpg)
11
![Page 12: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/12.jpg)
12
![Page 13: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/13.jpg)
13
Integration
Information Hidden behind Links
![Page 14: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/14.jpg)
14
![Page 15: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/15.jpg)
15
![Page 16: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/16.jpg)
16
![Page 17: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/17.jpg)
17
Query-based Extraction
- Query the gene extraction ontology
- Find applicable resources
- Fill out forms
- Extract information
![Page 18: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/18.jpg)
18
Query-based Extraction
Example: “Find the alfR gene, its sequence, its protein's function, and any mutant that inhibits this gene.”
Diagram labels: Gene Name, Gene Sequence, Gene, Mutant, Protein Function, Mutant Function
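One way to picture how the example query could be related to ontology concepts is a set of keyword recognizers, one per concept. The sketch below is an illustration only: the patterns, the `CONCEPT_PATTERNS` table, and the `match_concepts` function are hypothetical and are not the recognizers used in the presented system.

```python
import re

# Hypothetical keyword recognizers, one per concept named on the slide.
CONCEPT_PATTERNS = {
    "Gene Name": re.compile(r"\b(\w+)\s+gene\b", re.IGNORECASE),
    "Gene Sequence": re.compile(r"\bsequence\b", re.IGNORECASE),
    "Protein Function": re.compile(r"\bprotein['’]?s?\s+function\b", re.IGNORECASE),
    "Mutant": re.compile(r"\bmutants?\b", re.IGNORECASE),
    "Mutant Function": re.compile(r"\binhibits?\b", re.IGNORECASE),
}


def match_concepts(query):
    """Map each concept to the first piece of query text that triggers it."""
    found = {}
    for concept, pattern in CONCEPT_PATTERNS.items():
        match = pattern.search(query)
        if match:
            # Use the captured group when the pattern defines one
            # (the gene name), otherwise the whole matched keyword.
            found[concept] = match.group(1) if pattern.groups else match.group(0)
    return found


if __name__ == "__main__":
    query = ("Find the alfR gene, its sequence, its protein's function, "
             "and any mutant that inhibits this gene.")
    for concept, text in match_concepts(query).items():
        print(f"{concept}: {text}")
```

The matched concepts would then indicate which parts of the ontology the query touches, and therefore which resources and form fields are relevant.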
![Page 19: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/19.jpg)
19
![Page 20: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/20.jpg)
20
![Page 21: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/21.jpg)
21
![Page 22: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/22.jpg)
22
![Page 23: Data Integration and Extraction over Molecular Biological Data](https://reader036.vdocuments.site/reader036/viewer/2022062516/56812bd6550346895d904233/html5/thumbnails/23.jpg)
23
Contribution
- Provides a way to automatically integrate online biological data from different sources
- Provides an approach that can find suitable online resources, fill out online forms, and extract data according to the user's query