A graph oriented database implemented in DEX for protein structural information.
Larriba-Pey JL (2), Valdés-Jiménez A (1), Reyes JA (1), Dominguez-Sal D (2), Arenas-Salinas MA (1)
1. Centro de Bioinformática y Simulación Molecular, Facultad de Ingeniería, Universidad de Talca.
2. DAMA-UPC, Data Management, Universitat Politecnica de Catalunya.
Introduction
• Proteins are essential macromolecules of great interest for the study of diseases and the development of novel drug targets.
• Analyzing and understanding the three-dimensional (3D) structure of these proteins can help us comprehend the molecular mechanisms at work in living organisms.
• This work gives examples of how the information stored in this graph-oriented database can be used to run queries and find structural patterns among different proteins.
Yearly growth of total structures*
[*] http://www.pdb.org/pdb/static.do?p=general_information/pdb_statistics/index.html
PDB file format
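For orientation, here is a minimal Python sketch of reading one coordinate record from a PDB file. It relies on the fixed column layout of ATOM/HETATM records (record name in columns 1-6, serial in 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54, element in 77-78). The `AtomRecord` class, the `parse_atom_line` helper, and the sample zinc record are illustrative assumptions, not the authors' import code.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    record: str      # "ATOM" or "HETATM"
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one fixed-column ATOM/HETATM line (hypothetical helper)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    line = line.rstrip("\n").ljust(80)  # pad so short lines slice safely
    return AtomRecord(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),
    )


# Made-up HETATM record, assembled field by field to show the layout.
SAMPLE_LINE = (
    "HETATM"       # columns 1-6: record name
    " 1234"        # columns 7-11: atom serial number
    " "            # column 12: blank
    "ZN  "         # columns 13-16: atom name
    " "            # column 17: alternate location
    " ZN"          # columns 18-20: residue name
    " "            # column 21: blank
    "A"            # column 22: chain identifier
    " 201"         # columns 23-26: residue sequence number
    "    "         # columns 27-30: insertion code and blanks
    "  12.345"     # columns 31-38: x
    "  -6.780"     # columns 39-46: y
    "   9.012"     # columns 47-54: z
    "  1.00"       # columns 55-60: occupancy
    " 25.30"       # columns 61-66: temperature factor
    "          "   # columns 67-76: blanks
    "ZN"           # columns 77-78: element symbol
)

rec = parse_atom_line(SAMPLE_LINE)
print(rec)
```

Each parsed record would map naturally onto an atom node with attributes, but how the authors' schema actually does this is not stated here.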
Schema implemented
Steps to populate the database
Statistics
• PDB files processed: 74,208 (73 GB)
• DEX database size: 106,730 MB
• Nodes: 481,888,415
• Edges: 480,317,207 (without distance calculation)
• Total (nodes + edges): 962,205,622
• Data preparation: 4 days.
• Data import: 7 days.
Test Queries
• Show protein information.
• Search for a zinc finger motif (class C2H2) given a hetatm.
• Search for a subsequence over all sequences of all proteins (POSIX regular expression).
• Search for the atoms neighboring a hetatm (by distance in angstroms).
• Calculate the AFAL (Amino acid Frequency Around Ligand) of ZN.
• Database statistics.
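The neighbor and AFAL queries above can be sketched in plain Python. This is not the DEX query code: it is a brute-force scan over an in-memory list, and it assumes that AFAL counts each amino acid residue once when at least one of its atoms lies within the cutoff distance of a ligand atom. The `Atom` tuple, the function names, the coordinates, and the 3.5 Å cutoff are all made up for illustration.

```python
import math
from collections import Counter
from typing import NamedTuple


class Atom(NamedTuple):
    res_name: str
    chain_id: str
    res_seq: int
    is_hetatm: bool
    xyz: tuple  # (x, y, z) in angstroms


def neighbors(center, atoms, cutoff):
    """Return every other atom within `cutoff` angstroms of `center`."""
    return [
        a for a in atoms
        if a is not center and math.dist(center.xyz, a.xyz) <= cutoff
    ]


def afal(ligand_name, atoms, cutoff):
    """Count amino acid types around each atom of the named ligand.

    Assumption: a residue is counted once per ligand atom, however many
    of its atoms fall inside the cutoff.
    """
    counts = Counter()
    for lig in atoms:
        if lig.is_hetatm and lig.res_name == ligand_name:
            residues = {
                (a.chain_id, a.res_seq, a.res_name)
                for a in neighbors(lig, atoms, cutoff)
                if not a.is_hetatm
            }
            counts.update(name for _, _, name in residues)
    return counts


# Toy coordinates (not taken from any real PDB entry).
zn = Atom("ZN", "A", 201, True, (0.0, 0.0, 0.0))
atoms = [
    zn,
    Atom("CYS", "A", 10, False, (2.3, 0.0, 0.0)),
    Atom("CYS", "A", 10, False, (3.0, 1.0, 0.0)),   # second atom, same residue
    Atom("CYS", "A", 13, False, (0.0, 2.3, 0.0)),
    Atom("HIS", "A", 30, False, (0.0, 0.0, 2.1)),
    Atom("HIS", "A", 34, False, (-2.1, 0.0, 0.0)),
    Atom("LEU", "A", 50, False, (10.0, 0.0, 0.0)),  # too far away
]

near = neighbors(zn, atoms, 3.5)
freq = afal("ZN", atoms, 3.5)
print(len(near), dict(freq))
```

A linear scan like this would not scale to hundreds of millions of nodes; it only pins down what the two queries are meant to return.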
Example: Zinc Finger (C2H2 motif)
'Zinc finger' domains are nucleic acid-binding protein structures that have been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino acid residues. Two cysteine or histidine residues sit at each extremity of the domain and are involved in the tetrahedral coordination of a zinc atom. It has been proposed that such a domain interacts with about five nucleotides. A schematic representation of a zinc finger domain is shown below:
[Figure: schematic of a zinc finger domain with a sequence alignment; labeled residues: His (H), Cys (C)]
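At the sequence level, the Cys/His arrangement described above can be approximated with a regular expression of the kind used in the subsequence query. The pattern in this sketch is modeled on the PROSITE C2H2 signature (C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H) and should be verified against PROSITE before being relied on; the `find_c2h2` helper and the test sequence are synthetic and hypothetical.

```python
import re

# Assumed pattern: two Cys, a hydrophobic position, then two His,
# with the spacing of the PROSITE-style C2H2 signature.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2(sequence):
    """Return (start index, matched subsequence) for each non-overlapping hit."""
    return [
        (m.start(), m.group())
        for m in C2H2_PATTERN.finditer(sequence.upper())
    ]


# Synthetic sequence built to contain exactly one match.
core = "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H"
sequence = "MK" + core + "GG"

hits = find_c2h2(sequence)
print(hits)
```

A sequence match only suggests a candidate; confirming the motif structurally still means checking that the Cys and His side chains actually surround a zinc atom, as in the hetatm-based query.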
Result of the Zinc Finger motif (C2H2) search
[Figure labels: CYS, HIS]
Working ...
1. Search for structural patterns
2. Integration with other biological databases
3. Incorporation of new attributes
4. Benchmark