
13 April 2012 [email protected]

Bioinformatics centers and databases

Željko Jeričević, Ph.D.

www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html

[email protected]


What Is Bioinformatics?

“Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. …”

from NCBI web site


What is bioinformatics?

“Bioinformatics is a multidisciplinary science in which biology, computer science and information technology merge into a single discipline. The ultimate goal is to enable the discovery of new biological insights and to create a global perspective from which unifying biological principles can be discerned.”

(translation of the definition from the NCBI web site)


Overview

1. Why databases, and what they are

2. Navigating the avalanche of medical and biological information on the internet

3. On the accessibility and processing of information


Why biological databases

• The development of molecular biology and genomic technologies has led to explosive growth of biological data

• This flood of biological data necessarily requires computer databases and methods for storing, retrieving, organizing, processing and visualizing the data


What Is a Biological Database?

“A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record associated with a nucleotide sequence database typically contains information such as contact name, the input sequence with a description of the type of molecule, the scientific name of the source organism from which it was isolated, and often, literature citations associated with the sequence.”

from NCBI web site


What Is a Biological Database?

“For researchers to benefit from the data stored in a database, two additional requirements must be met:

• easy access to the information

• a method for extracting only that information needed to answer a specific biological question…”

from NCBI web site
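The "single file containing many records" and the "method for extracting only that information needed" can both be sketched in a few lines of Python. This is a minimal illustration, not a real database format: the tab-separated layout, the field names `accession`, `organism`, `molecule` and `sequence`, and the sample records are all hypothetical. Real flat-file formats such as GenBank are far richer.

```python
import csv
import io
from typing import Iterator

# Hypothetical flat-file layout: one record per line, and every record
# carries the same tab-separated fields (a "flat" database).
FIELDS = ["accession", "organism", "molecule", "sequence"]

FLAT_FILE = """\
REC001\tHomo sapiens\tmRNA\tATGGCCATTGTAATG
REC002\tMus musculus\tgenomic DNA\tATGAAACGCATTAGC
REC003\tHomo sapiens\tgenomic DNA\tATGCGTACGTTAGCA
"""


def read_records(text: str) -> Iterator[dict]:
    """Parse every line of the flat file into a dict keyed by FIELDS."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter="\t")
    yield from reader


def query(text: str, field: str, value: str, wanted: list[str]) -> list[dict]:
    """Return only the `wanted` fields of records whose `field` equals `value`.

    A flat file has no index, so every query scans all records.
    """
    return [
        {name: rec[name] for name in wanted}
        for rec in read_records(text)
        if rec[field] == value
    ]


if __name__ == "__main__":
    for hit in query(FLAT_FILE, "organism", "Homo sapiens", ["accession", "molecule"]):
        print(hit)
```

Because every query reads every record, this layout gets slow as a collection grows. That is one practical reason larger collections are kept in relational or object-oriented databases.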


1. What databases are

An organized, searchable collection of related data stored on a computer

• Flat-file databases

• Relational databases

• Object-oriented databases


Relational databases

• Commercial DBMSs: $$$

Oracle, DB2, SQL Server, FileMaker

• Free DBMSs: PiO

PostgreSQL, MySQL


Relational databases

• Edgar F. Codd

• Tables

• Redundancy

• Keys

• Codd's 12 rules


Library example

| # | Title | Borrower | Phone | Author |
|---|---|---|---|---|
| 13 | Organic Chemistry | Jeričević, Ž. | 651-156 | Carey |
| 12 | Benzodiazepini | Jeričević, Ž. | 651-192 | Jakovljević, … |
| 11 | Načela dizajniranja lijekova | Jeričević, Ž. | 651-192 | Mintas, … |

The loan table above is not a relational database (redundancy and conflicting information: the same borrower is listed with two different phone numbers).

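Library example

Books

| B# | Title | A# |
|---|---|---|
| 13 | Organic Chemistry | 39 |
| 12 | Benzodiazepini | 34, 35 |
| 11 | Načela dizajniranja lijekova | 31, 32, 33 |

Borrowers

| K# | Borrower | Phone |
|---|---|---|
| 123 | Jeričević, Željko | 651-156 |

Authors

| A# | Author |
|---|---|
| 35 | Lacković, Zdravko |
| 32 | Raić-Malić, Silvana |
| 33 | Raos, Nenad |
| 39 | Carey, Francis A. |
| 34 | Jakovljević, Miro |
| 31 | Mintas, Mladen |

Intersection table =>

| K# | B# | P# |
|---|---|---|
| 123 | 13 | 23 |
| 123 | 12 | 22 |
| 123 | 11 | 21 |

The tables shown can be elements of a relational database.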
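The normalized library tables can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table and column names are hypothetical English renderings of the slide's `K#`, `B#`, `A#` and `P#` columns, and `P#` is read as a loan number. Because the slide packs several author numbers into one `A#` cell, the sketch adds a second intersection table, `book_author`, so that every cell holds a single value. A join then rebuilds the original loan listing while the borrower's phone number is stored only once.

```python
import sqlite3

# Hypothetical schema mirroring the slide: borrowers (K#), books (B#),
# authors (A#) and loans (P#), the last being the intersection table.
SCHEMA = """
CREATE TABLE borrower (k_id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE book     (b_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE author   (a_id INTEGER PRIMARY KEY, name TEXT);
-- Assumption: one row per (book, author) pair instead of "34, 35" in one cell.
CREATE TABLE book_author (b_id INTEGER REFERENCES book,
                          a_id INTEGER REFERENCES author,
                          PRIMARY KEY (b_id, a_id));
CREATE TABLE loan (p_id INTEGER PRIMARY KEY,
                   k_id INTEGER REFERENCES borrower,
                   b_id INTEGER REFERENCES book);
"""


def build_library() -> sqlite3.Connection:
    """Create an in-memory database filled with the slide's rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO borrower VALUES (123, 'Jeričević, Željko', '651-156')")
    con.executemany("INSERT INTO book VALUES (?, ?)", [
        (11, "Načela dizajniranja lijekova"), (12, "Benzodiazepini"), (13, "Organic Chemistry"),
    ])
    con.executemany("INSERT INTO author VALUES (?, ?)", [
        (31, "Mintas, Mladen"), (32, "Raić-Malić, Silvana"), (33, "Raos, Nenad"),
        (34, "Jakovljević, Miro"), (35, "Lacković, Zdravko"), (39, "Carey, Francis A."),
    ])
    con.executemany("INSERT INTO book_author VALUES (?, ?)", [
        (11, 31), (11, 32), (11, 33), (12, 34), (12, 35), (13, 39),
    ])
    con.executemany("INSERT INTO loan VALUES (?, ?, ?)", [
        (21, 123, 11), (22, 123, 12), (23, 123, 13),
    ])
    return con


def loans_by(con: sqlite3.Connection, borrower: str) -> list[tuple]:
    """Rebuild the flat loan listing with joins; the phone comes from one row."""
    return con.execute(
        """
        SELECT l.p_id, bk.title, br.name, br.phone
        FROM loan l
        JOIN borrower br ON br.k_id = l.k_id
        JOIN book bk     ON bk.b_id = l.b_id
        WHERE br.name = ?
        ORDER BY l.p_id
        """,
        (borrower,),
    ).fetchall()


if __name__ == "__main__":
    for row in loans_by(build_library(), "Jeričević, Željko"):
        print(row)
```

Because the phone number lives in a single `borrower` row, the conflicting 651-156 / 651-192 entries of the unnormalized loan table cannot arise.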


2. Navigation: starting points

• Nucleic Acids Research Database & Web Server Issues

• NCBI: Bookshelf, PubMed, PubChem, …

• EBI

• KEGG


NAR DB Issue

This year's issue: Volume 40, Database Issue, January 1, 2012

http://nar.oxfordjournals.org/


NAR DB Issue 2012 editorial

“The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. … The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and now lists 1380 databases. Brief machine-readable descriptions of the databases featured in this issue, according to the BioDBcore standards, will be provided at the http://biosharing.org/biodbcore web site. The full content of the Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).”


NAR Web Server Issue

Latest issue: Volume 39, suppl. 2, July 1, 2011

http://nar.oxfordjournals.org


NAR WS Issue 2011 editorial

“The 2011 Web Server Issue of Nucleic Acids Research is the ninth in a series of annual special issues dedicated to web-based software resources for analysis and visualization of molecular biology data. It is freely available online under NAR's open access policy. The present issue reports on 92 web servers.”

From the editorial


NCBI

http://www.ncbi.nlm.nih.gov/


EBI

http://www.ebi.ac.uk


EBI services


KEGG

http://www.genome.jp/kegg/


KEGG


About NCBI

A good informative page if you are not yet familiar with NCBI


Why NCBI Entrez

Current “for-profit” systems of publishing and information access are inadequate for the modern pace of biological data generation and for translation (bench to bedside) in medicine.

NIH/NLM has developed an open-access system for disseminating information over the internet.


NCBI Bookshelf


NCBI Pocket Bookshelf

Handheld computer versions ready for downloading:

• Blood Groups and Red Cell Antigens

• Clinical Methods

• Genes and Disease

• Health Services/Technology Assessment Text (HSTAT)

• Inflammatory Atherosclerosis: Characteristics of the Injurious Agent

• Medical Microbiology

AHRQ is the Agency for Healthcare Research and Quality


NCBI PubMed

PubMed: citations and abstracts for more than 11 million publications

PubMed Central: full text for ~2 million publications

Open Access


Programming needed for basic literature searching:

Logical operators AND, OR, NOT

| AND | 0 | 1 |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |

| OR | 0 | 1 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 1 |

0 = FALSE, 1 = TRUE

NOT 1 = 0, NOT 0 = 1
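Applied to sets of matching documents, the same three operators are what a literature search does with a query. AND keeps documents matched by both terms (set intersection). OR keeps documents matched by either term (union). NOT removes documents matched by a term (set difference). The sketch below assumes a tiny hypothetical index from terms to document IDs. It illustrates the logic only and is not how PubMed is implemented.

```python
from itertools import product
from typing import Callable

# Hypothetical inverted index: term -> IDs of documents indexed with that term.
INDEX: dict[str, set[int]] = {
    "bird": {1, 2, 4},
    "flu": {2, 3, 4},
    "vaccine": {3, 4},
}
ALL_DOCS = {1, 2, 3, 4, 5}  # every document in the hypothetical collection


def docs(term: str) -> set[int]:
    """Documents containing the term (empty set if the term is not indexed)."""
    return INDEX.get(term, set())


def q_and(a: set[int], b: set[int]) -> set[int]:
    return a & b  # intersection: documents matching both operands


def q_or(a: set[int], b: set[int]) -> set[int]:
    return a | b  # union: documents matching at least one operand


def q_not(a: set[int]) -> set[int]:
    return ALL_DOCS - a  # complement: documents NOT matching the operand


def truth_table(op: Callable[[bool, bool], bool]) -> dict[tuple[int, int], int]:
    """Evaluate a two-argument logical operator on the values 0 and 1."""
    return {(x, y): int(op(bool(x), bool(y))) for x, y in product((0, 1), repeat=2)}


if __name__ == "__main__":
    print(truth_table(lambda x, y: x and y))           # the AND table
    print(truth_table(lambda x, y: x or y))            # the OR table
    print(q_and(docs("bird"), docs("flu")))            # bird AND flu
    print(q_and(docs("flu"), q_not(docs("vaccine"))))  # flu NOT vaccine
```

A search engine's binary "flu NOT vaccine" corresponds to `q_and(flu, q_not(vaccine))` here. The unary `q_not` needs the whole collection as its reference set.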


MeSH (Medical Subject Headings)

U.S. National Library of Medicine

MeSH is an auxiliary database containing a controlled vocabulary of terms used for indexing publications in MEDLINE/PubMed. MeSH terminology makes it possible to find information sources that use different names for the same concept.


MeSH (Medical Subject Headings)

Example: bird flu → bird AND flu

• Influenza in Bird

• Avian Flu

• Flu, Avian

• Avian Influenza

• Fowl Plague

• Plague, Fowl

• Influenza, Avian

• Avian Influenzas

• Influenzas, Avian

Drugs: oseltamivir (Tamiflu) and zanamivir (Relenza)
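MeSH maps many entry terms to one heading, which is what lets a search under any of the names above reach the same literature. Below is a minimal sketch of that mapping. It assumes a hand-written dictionary holding only the entry terms listed on the slide, and it treats "Influenza in Bird" as the preferred heading. The real MeSH vocabulary is far larger and is distributed by NLM in its own formats. Each user term is normalized and looked up. The query is then expanded to the heading plus all of its synonyms, joined with OR.

```python
from typing import Optional

# Hypothetical excerpt of a controlled vocabulary: one preferred heading
# mapped to the entry terms shown on the slide. Using "Influenza in Bird"
# as the preferred heading is an assumption made for this sketch.
VOCABULARY: dict[str, list[str]] = {
    "Influenza in Bird": [
        "Avian Flu", "Flu, Avian", "Avian Influenza", "Fowl Plague",
        "Plague, Fowl", "Influenza, Avian", "Avian Influenzas", "Influenzas, Avian",
    ],
}


def normalize(term: str) -> str:
    """Case-fold and collapse whitespace so 'avian  FLU' matches 'Avian Flu'."""
    return " ".join(term.casefold().split())


# Reverse lookup: every normalized name (heading or synonym) -> its heading.
ENTRY_TO_HEADING = {
    normalize(name): heading
    for heading, synonyms in VOCABULARY.items()
    for name in [heading, *synonyms]
}


def map_term(term: str) -> Optional[str]:
    """Return the preferred heading for a user term, or None if unknown."""
    return ENTRY_TO_HEADING.get(normalize(term))


def expand(term: str) -> str:
    """Expand a term into an OR query over its heading and all synonyms."""
    heading = map_term(term)
    if heading is None:
        return f'"{term}"'  # not in the vocabulary: search the literal text
    names = [heading, *VOCABULARY[heading]]
    return " OR ".join(f'"{n}"' for n in names)


if __name__ == "__main__":
    print(map_term("fowl plague"))
    print(expand("Avian flu"))
```

PubMed performs a comparable step, called Automatic Term Mapping, behind its search box. The sketch shows only the idea of mapping many names to one concept.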


NCBI PubMed documentation


Automated literature searching using the Entrez programming modules

Entrez programming modules: Perl scripts that call the E-Utilities

E-Utilities
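The E-Utilities are reached over HTTPS, so any language that can fetch a URL can script them, not only Perl. Below is a minimal Python sketch. It assumes NCBI's documented ESearch endpoint and its `db`, `term`, `retmax` and `retmode=json` parameters; the helper names are hypothetical. The sketch builds the request URL and extracts the hit count and record IDs from the JSON reply. For more than occasional queries, NCBI asks users to follow its usage guidelines (request rate, API key, `tool` and `email` parameters).

```python
import json
import urllib.parse
import urllib.request

# NCBI E-Utilities base URL; ESearch returns the IDs of records matching a query.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Build an ESearch URL that asks for a JSON reply."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def parse_esearch(payload: str) -> tuple[int, list[str]]:
    """Return (total hit count, list of IDs) from an ESearch JSON reply."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), list(result["idlist"])


def esearch(term: str, db: str = "pubmed", retmax: int = 20) -> tuple[int, list[str]]:
    """Run the search over the network (requires internet access)."""
    with urllib.request.urlopen(esearch_url(term, db, retmax), timeout=30) as resp:
        return parse_esearch(resp.read().decode("utf-8"))
```

For example, `esearch("bird AND flu AND oseltamivir", retmax=5)` returns the total count and up to five PubMed IDs. Keeping URL building and JSON parsing separate from the network call lets both be checked offline.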


NCBI PubChem

Data on more than 10 million chemicals


tamiflu


tamiflu 78000


tamiflu 65028


tamiflu 449381

related
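The name lookup shown on these slides can also be scripted through PubChem's PUG REST interface. Under that interface, a URL of the form `.../rest/pug/compound/name/<name>/cids/JSON` returns the compound identifiers (CIDs) matching a name. This is a sketch under that assumption, and the helper names are hypothetical. Which CIDs come back for "tamiflu" depends on PubChem's current synonym data, so no particular result is claimed here.

```python
import json
import urllib.parse
import urllib.request

# PubChem PUG REST base URL for the compound domain.
PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"


def cids_by_name_url(name: str) -> str:
    """URL asking PubChem for the CIDs of compounds matching a name."""
    # Quote everything, including "/", so the name stays one path segment.
    return PUG + "name/" + urllib.parse.quote(name, safe="") + "/cids/JSON"


def parse_cids(payload: str) -> list[int]:
    """Extract the CID list from a PUG REST IdentifierList reply."""
    return [int(c) for c in json.loads(payload)["IdentifierList"]["CID"]]


def cids_by_name(name: str) -> list[int]:
    """Run the lookup over the network (requires internet access)."""
    with urllib.request.urlopen(cids_by_name_url(name), timeout=30) as resp:
        return parse_cids(resp.read().decode("utf-8"))
```

Usage: `cids_by_name("tamiflu")` returns a list of integer CIDs that can be compared with the identifiers shown on the slides.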


Structures similar to 449381
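Similar-structure searches compare molecular fingerprints, which are bit vectors recording which structural features a molecule contains. PubChem's 2D similarity search is based on the Tanimoto coefficient computed on its substructure fingerprints. The sketch below computes that coefficient, T = |A ∩ B| / |A ∪ B|, for fingerprints represented as sets of "on" bit positions. The fingerprints and the 0.9 threshold are made-up placeholders, not real PubChem data.

```python
# Fingerprints as sets of "on" bit positions. These values are made-up
# placeholders, not real PubChem fingerprints.
FINGERPRINTS: dict[str, set[int]] = {
    "query": set(range(0, 20, 2)),            # 10 bits on
    "cand_a": set(range(0, 20, 2)) | {21},    # query bits plus one extra
    "cand_b": {0, 2, 4, 31, 33},              # shares only 3 bits with query
}


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits / bits on in either fingerprint."""
    union = len(a | b)
    # Assumption for this sketch: two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0


def similar(query: set[int], library: dict[str, set[int]],
            threshold: float = 0.9) -> list[tuple[str, float]]:
    """Return (name, score) for entries scoring at least `threshold`, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    q = FINGERPRINTS["query"]
    others = {k: v for k, v in FINGERPRINTS.items() if k != "query"}
    for name, score in similar(q, others):
        print(f"{name}: {score:.3f}")
```

The score depends only on shared and total on-bits, so it ranges from 0 (no shared features) to 1 (identical fingerprints).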


3. On the accessibility and processing of information

• Can information be property?

• What can we learn from history (a comparison of chemistry and biology)?

• Can a business make a profit without property rights over information?


The Federal Research Public Access Act of 2006 (Cornyn-Lieberman)

>>the policy would require that agencies with research budgets of more than $100 million enact policy to ensure that articles generated through research funded by that agency are made available online within 6 months of publication.<< (from article by Robin Peek, Newsbreaks, October 31, 2006)


The Federal Research Public Access Act of 2006

>>“Public access to research expands shared knowledge across scientific fields and is the best path for accelerating multi-disciplinary breakthroughs in research,” said Richard J. Roberts, a Nobel Prize laureate and research director at New England Biolabs. “As a scientist and a taxpayer, I support this bill because it lifts barriers that hinder, delay, or block the spread of scientific knowledge supported by federal tax dollars.”<< (from article by Robin Peek, Newsbreaks, October 31, 2006)


The Federal Research Public Access Act of 2006

>>In an article in The Washington Post, Patricia S. Schroeder, president and chief executive of the Association of American Publishers, promised a fight (not surprisingly). “It is frustrating that we can’t seem to get across to people how expensive it is to do the peer review, edit these articles, and put them into a form everyone can understand” Schroeder said.<<

(from article by Robin Peek, Newsbreaks, October 31, 2006)


Open Access

Welcome to the Directory of Open Access Journals.

This service covers free, full text, quality controlled scientific and scholarly journals. We aim to cover all subjects and languages. There are now 3881 journals in the directory. Currently 5999 journals are searchable at article level. As of today 259860 articles are included in the DOAJ service.

http://www.doaj.org/


Ultimate Open Source & Open Access Application: Wikipedia

“Since Wikipedia was launched online in 2001 as "the free encyclopedia that anyone can edit," it has blossomed to more than a billion words spread over 10 million articles in 250 languages, including 2.5 million articles in English, according to Wikipedia cofounder Wales”

http://www.wikipedia.org/


Open Source

• OpenCola

• Medicine: TDI (Tropical Diseases Initiative)

• MIT OpenCourseWare (MIT OCW) 2007

• Connexions Project @ Rice University

• Project Gutenberg

• OpenDocument Format (ODF)

• OpenOffice

• Linux, a free operating system


Parents and offspring

Richard M. Stallman, Linus B. Torvalds


Computing skills needed for bioinformatics

“Developing Bioinformatics Computer Skills”, C. Gibas & P. Jambeck, O’Reilly, Sebastopol, CA, 2001, 427 pp.

I Introduction

II The Bioinformatics Workstation

3. Setting Up Your Workstation

4. Files and Directories in Unix

5. Working on a Unix System

III Tools for Bioinformatics

IV Databases and Visualization


Open Source

Eric S. Raymond

“The Cathedral and the Bazaar”

“The Art of Unix Programming”

Larry Wall

PERL (Practical Extraction and Report Language)


Elements of the free flow of information

The free flow of biological information is owed largely to people who have (almost) nothing to do with biology.

The importance of computer science for the development of modern biology can hardly be overstated.


Comparing cheminformatics and bioinformatics

• For historical and economic reasons, access to chemical information differs fundamentally from access to biological information

• Their scope and focus differ

• The problems chemists and biologists work on are different

• The methods and tools are not the same, although they overlap


Scope and focus of information

• Chemical information

- Problems at the molecular and atomic level

- Understanding a mechanism involves understanding all the atoms in a molecule and their electronic structure

• Biological information

- Problems at the molecular, cellular and higher levels

- Understanding a mechanism involves understanding the role of all the molecules and possibly their regulation, cell types and tissues


Chemical and biological information

• Access to chemical information has a longer tradition

- Mostly commercialized

- Chemical Abstracts has a long tradition (since 1907)

- Methods for processing chemical information were developed in the pre-computer era and programmed gradually

• Access to biological information is more advanced

- Mostly free

- Information is distributed via the web (www)

- Methods for processing large amounts of data are relatively new (Human Genome Project, HGP) and were programmed quickly and on a massive scale


Examples of a chemical and a biological journal

• Journal of Chemical Information and Modeling

- 1961 Journal of Chemical Documentation

- 1975 Journal of Chemical Information and Computer Sciences

- 2005 Journal of Chemical Information and Modeling

• Bioinformatics

- 1985 Computer Applications in the Biosciences (CABIOS)

- 1999 Bioinformatics


Closing remarks


The future of bioinformatics?

Building the information infrastructure

Open Access, Open Source

The UNIX operating system


Exercises

• Visit the NAR web site http://nar.oxfordjournals.org/

• Visit the web sites for NCBI, EBI & KEGG

• Visit http://www.ncbi.nlm.nih.gov/About/

• Visit the NCBI Bookshelf

• Visit PubChem

• Visit PubMed

• Visit the MIT OCW site at http://ocw.mit.edu/index.htm


Thank you for your attention