bio-medical informatics

Bio-Medical Informatics Instructor : Hanif Yaghoobi Website: site444703.44.webydo.com E-mail : [email protected] My personal Mail: [email protected]


Post on 05-Jan-2016


Page 1: Bio-Medical Informatics

Bio-Medical Informatics

Instructor: Hanif Yaghoobi
Website: site444703.44.webydo.com

E-mail: [email protected]
My personal Mail: [email protected]

Page 2: Bio-Medical Informatics

About this Course

• Activities during the semester (5 points): 1) Homework 2) MATLAB exercises

• Your final project (3 points)

• Final exam (12 points)

Page 3: Bio-Medical Informatics
Page 4: Bio-Medical Informatics

Shortliffe

“Medical informatics is the rapidly developing scientific field that deals with resources, devices and formalized methods for optimizing the storage, retrieval and management of biomedical information for problem solving and decision making.”

Edward Shortliffe, MD, PhD

1995

Page 5: Bio-Medical Informatics
Page 6: Bio-Medical Informatics
Page 7: Bio-Medical Informatics
Page 8: Bio-Medical Informatics
Page 9: Bio-Medical Informatics
Page 10: Bio-Medical Informatics
Page 11: Bio-Medical Informatics
Page 12: Bio-Medical Informatics
Page 13: Bio-Medical Informatics
Page 14: Bio-Medical Informatics
Page 15: Bio-Medical Informatics

Organisms

• Classified into two types:

• Eukaryotes: contain a membrane-bound nucleus and organelles (plants, animals, fungi,…)

• Prokaryotes: lack a true membrane-bound nucleus and organelles (single-celled, includes bacteria)

• Not all single-celled organisms are prokaryotes!

Page 16: Bio-Medical Informatics

Cells

• Complex system enclosed in a membrane

• Organisms are unicellular (bacteria, baker’s yeast) or multicellular

• Humans:
– about 60 trillion cells (estimates vary)
– about 320 cell types

Example animal cell (image source: www.ebi.ac.uk/microarray/biology_intro.htm)

Page 17: Bio-Medical Informatics

DNA Basics – cont.

• DNA in Eukaryotes is organized in chromosomes.


Page 18: Bio-Medical Informatics

Chromosomes

• In eukaryotes, the nucleus contains one or several double-stranded DNA molecules organized as chromosomes

• Humans:
– 22 pairs of autosomes
– 1 pair of sex chromosomes

Human karyotype (image source: http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html)

Page 19: Bio-Medical Informatics

Image source: www.biotec.or.th/Genome/whatGenome.html

Page 20: Bio-Medical Informatics

What is DNA?

• DNA: Deoxyribonucleic Acid

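• Each strand is a polynucleotide (a short one is an oligonucleotide): a chain of nucleotides. Two complementary strands pair to form the double-stranded molecule.

• 4 different nucleotides, named here by their bases:
– Adenine (A)
– Cytosine (C)
– Guanine (G)
– Thymine (T)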

Page 21: Bio-Medical Informatics

Nucleotide Bases

• Purines (A and G)

• Pyrimidines (C and T)

• Difference is in base structure: purines have two fused rings, pyrimidines have one

Image source: www.ebi.ac.uk/microarray/biology_intro.htm

Page 22: Bio-Medical Informatics

DNA


Page 23: Bio-Medical Informatics


Page 24: Bio-Medical Informatics
Page 25: Bio-Medical Informatics

The Central Dogma: Protein Synthesis

[Diagram: Genome -> (transcription) -> Transcriptome -> (translation) -> Proteome -> cell function. The transcriptome stage is labelled "gene expression level".]

Page 26: Bio-Medical Informatics
Page 27: Bio-Medical Informatics
Page 28: Bio-Medical Informatics

Genome

• The chromosomal DNA of an organism

• The number of chromosomes and the genome size vary significantly from one organism to another

• Genome size and number of genes do not necessarily determine organism complexity

Page 29: Bio-Medical Informatics

Genome Comparison

ORGANISM                              CHROMOSOMES   GENOME SIZE (bp)   GENES
Homo sapiens (Humans)                 23            3,200,000,000      ~30,000
Mus musculus (Mouse)                  20            2,600,000,000      ~30,000
Drosophila melanogaster (Fruit Fly)   4             180,000,000        ~18,000
Saccharomyces cerevisiae (Yeast)      16            14,000,000         ~6,000
Zea mays (Corn)                       10            2,400,000,000      ???

Page 30: Bio-Medical Informatics


Page 31: Bio-Medical Informatics

DNA Basics – cont.

• The DNA in each chromosome can be read as a discrete signal over the alphabet {a, t, c, g}, for example: atgatcccaaatggaca…
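One common way to treat a sequence as a discrete signal is to turn it into four binary indicator sequences, one per base. The sketch below assumes that representation; the function name `indicator_sequences` is illustrative and not part of the course material.

```python
def indicator_sequences(dna):
    """Map a DNA string to four binary signals, one per base."""
    dna = dna.lower()
    # signal[base][n] is 1 when position n holds that base, else 0
    return {base: [1 if ch == base else 0 for ch in dna] for base in "atcg"}


signals = indicator_sequences("atgatcccaaatggaca")
for base, signal in signals.items():
    print(base, signal)
```

At every position exactly one of the four signals is 1, so the four sequences together carry the same information as the original string.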

Page 32: Bio-Medical Informatics

DNA Basics – cont.

• In genes (protein-coding regions), the nucleotides (letters) are read as triplets (codons) while the protein is built from amino acids. Each codon specifies one amino acid for protein synthesis, or signals a stop. There are 20 amino acids, so most of them are encoded by more than one of the 64 possible codons.

Page 33: Bio-Medical Informatics

DNA Basics – cont.

• There are 6 ways of translating a DNA signal into a codon signal, called the reading frames: 3 starting offsets × 2 strand directions.

…CATTGCCAGT…
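As a minimal sketch, the six reading frames of the fragment CATTGCCAGT can be listed as follows. The helper names are illustrative, and trailing bases that do not fill a whole codon are simply dropped, which is an assumption of this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def reading_frames(dna):
    """Return the codons of all six reading frames (3 offsets x 2 strands)."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            # step through the strand three letters at a time, whole codons only
            codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
            frames[(strand, offset)] = codons
    return frames


frames = reading_frames("CATTGCCAGT")
for key, codons in frames.items():
    print(key, codons)
```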

Page 34: Bio-Medical Informatics

DNA Basics – Cont.

…CATTGCCAGT…

Start: ATG

Stop: TAA, TGA, TAG

[Diagram: a gene within the sequence, made up of exons separated by introns]

Page 35: Bio-Medical Informatics

Understanding Genome Sequences

~3,289,000,000 characters:

aattgtgctctgcaaattatgatagtgatctgtatttactacgtgcatat attttgggccagtgaatttttttctaagctaatatagttatttggacttt tgacatgactttgtgtttaattaaaacaaaaaaagaaattgcagaagtgt tgtaagcttgtaaaaaaattcaaacaatgcagacaaatgtgtctcgcagt cttccactcagtatcatttttgtttgtaccttatcagaaatgtttctatg tacaagtctttaaaatcatttcgaacttgctttgtccactgagtatatta tggacatcttttcatggcaggacatatagatgtgttaatggcattaaaaa taaaacaaaaaactgattcggccgggtacggtggctcacgcctgtaatcc cagcactttgggagatcgaggagggaggatcacctgaggtcaggagttac agacatggagaaaccccgtctctactaaaaatacaaaattagcctggcgt ggtggcgcatgcctgtaatcccagctactcgggaggctgaggcaggagaa tcgcttgaacccgggagcggaggttgcggtgagccgagatcgcaccgttg cactccagcctgggcgacagagcgaaactgtctcaaacaaacaaacaaaa aaacctgatacatggtatgggaagtacattgtttaaacaatgcatggaga tttaggttgtttccagtttttactggcacagatacggcaatgaatataat tttatgtatacattcatacaaatatatcggtggaaaattcctagaagtgg aatggctgggtcagtgggcattcatattgagaaattggaaggatgttgtc aaactctgcaaatcagagtattttagtcttaacctctcttcttcacaccc ttttccttggaagaaagctaaatttagacttttaaacacaaaactccatt ttgagacccctgaaaatctgggttcaaagtgtttgaaaattaaagcagag gctttaatttgtacttatttaggtataatttgtactttaaagttgttcca

. . .

Goal: Identify components encoded in the DNA sequence

Page 36: Bio-Medical Informatics

Open Reading Frame

• A protein-encoding DNA sequence consists of a sequence of 3-letter codons

• Starts with the START codon (ATG)

• Ends with a STOP codon (TAA, TAG, or TGA)

ATG CTC AGC GTG ACC TCA . . . CAG CGT TAA

 M   L   S   V   T   S  . . .  Q   R  STP
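The codon-to-amino-acid mapping in the example can be reproduced with the standard genetic code. The sketch below builds the 64-entry table from the usual compact TCAG ordering; `translate` is an illustrative helper that reads from position 0 and stops at the first stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGCTCAGCGTGACCTCA"))  # the first six codons of the example
print(translate("CAGCGTTAA"))           # the last three codons, ending at the stop
```

The table has 61 amino-acid codons and 3 stop codons, which is why several codons map to the same one of the 20 amino acids.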

Page 37: Bio-Medical Informatics

Finding Open Reading Frames

Try all possible starting points
• 3 possible offsets
• 2 possible strands

A simple algorithm finds all ORFs in a genome
• Many of these are spurious (not real genes)
• How do we focus on the real ones?

ATG CTC AGC GTG ACC TCA . . . CAG CGT TAA

 M   L   S   V   T   S  . . .  Q   R  STP
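A simple ORF scan of this kind might look like the sketch below. It is one possible reading of "try all offsets on both strands", not necessarily the algorithm used in the course; `find_orfs`, `min_codons` and the test sequence (the start of the slide's ORF with a stop codon and flanking bases added) are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(dna, min_codons=2):
    """Return (strand, start, orf) for each ATG...stop stretch in the six frames."""
    orfs = []
    for strand, seq in (("+", dna), ("-", dna.translate(COMPLEMENT)[::-1])):
        for offset in range(3):
            start = None  # index of the ATG that opened the current ORF
            for i in range(offset, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, seq[start:i + 3]))
                    start = None
    return orfs


orfs = find_orfs("CCATGCTCAGCGTGACCTCATAAGG")
for orf in orfs:
    print(orf)
```

On this short sequence the scan reports the intended ORF on the forward strand and also a second, short ORF on the reverse strand, which illustrates why many candidates are spurious.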

Page 38: Bio-Medical Informatics

Using Additional Genomes

Basic premise: “What is important is conserved”

Evolution = Variation + Selection
– Variation is random
– Selection reflects function

Idea:
• Instead of studying a single genome, compare related genomes
• A real open reading frame will be conserved

Page 39: Bio-Medical Informatics

Phylogenetic Tree of Yeasts

Kellis et al., Nature 2003

S. cerevisiae

S. paradoxus

S. mikatae

S. bayanus

C. glabrata

S. castellii

K. lactis

A. gossypii

K. waltii

D. hansenii

C. albicans

Y. lipolytica

N. crassa

M. graminearum

M. grisea

A. nidulans

S. pombe

~10M years

Page 40: Bio-Medical Informatics

Evolution of Open Reading Frame

S. cerevisiae   ATGCTCAGCGTGACCTCA . . .
S. paradoxus    ATGCTCAGCGTGACATCA . . .
S. mikatae      ATGCTCAGGGTGACA--A . . .
S. bayanus      ATGCTCAGG---ACA--A . . .

Annotations: conserved positions; variable positions; a deletion; a frame shift, which changes the interpretation of the downstream sequence
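Whether a gap disturbs the reading frame depends on its length: a gap whose length is a multiple of 3 removes whole codons, while any other length shifts every downstream codon. The sketch below applies that rule to the four aligned sequences; pairing each sequence with a species in the order listed is an assumption, and `classify_gaps` is an illustrative name.

```python
import re


def classify_gaps(aligned):
    """List (position, length, effect) for each run of '-' in an aligned sequence."""
    result = []
    for match in re.finditer(r"-+", aligned):
        length = match.end() - match.start()
        effect = "in-frame" if length % 3 == 0 else "frame shift"
        result.append((match.start(), length, effect))
    return result


alignment = {
    "S. cerevisiae": "ATGCTCAGCGTGACCTCA",
    "S. paradoxus": "ATGCTCAGCGTGACATCA",
    "S. mikatae": "ATGCTCAGGGTGACA--A",
    "S. bayanus": "ATGCTCAGG---ACA--A",
}
for species, seq in alignment.items():
    print(species, classify_gaps(seq))
```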

Page 41: Bio-Medical Informatics

Examples

[Figure: alignments of a spurious ORF and a confirmed ORF, with conserved and variable positions marked; annotations: frame shift, sequencing error, ATG not conserved]

[Kellis et al., Nature 2003]

A greedy algorithm for finding conserved ORFs is surprisingly effective (> 99% accuracy) on verified yeast data

Page 42: Bio-Medical Informatics

Defining Conservation

Naïve approach
• Consensus between all species

Problem:
• Rough grained
• Ignores distances between species
• Ignores the tree topology

Goal:
• More sensitive and robust methods

[Figure: four alignment columns across nine species (AAAAAAAAA, AAAACCCCC, ACAGTCGGT, CCCACAAAC), marked as conserved or variable, with percent conservation labels of 100%, 33%, 55% and 55%]
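The naïve consensus score can be sketched as the share of species that agree with the most common base in a column. The four column strings below are read off the figure on this slide and should be treated as an assumption; `percent_conserved` is an illustrative name.

```python
from collections import Counter


def percent_conserved(column):
    """Percent of species sharing the most common base in one alignment column."""
    most_common_count = Counter(column).most_common(1)[0][1]
    return 100 * most_common_count / len(column)


# One string per alignment column, one letter per species (nine species).
columns = ["AAAAAAAAA", "AAAACCCCC", "ACAGTCGGT", "CCCACAAAC"]
for column in columns:
    print(column, int(percent_conserved(column)))
```

The score treats all species as equally informative, which is exactly the weakness listed above: it ignores both the distances between species and the tree topology.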

Page 43: Bio-Medical Informatics

Bioinformatics – an area of emerging knowledge

• Each cell of the body contains the whole DNA of the individual (about 40,000 genes in the human genome by this 2003 estimate, each comprising from 50 to a million base pairs: A, T, C or G)

• The central dogma of molecular biology: DNA -> RNA -> proteins

• Transcription: DNA (about 5%) -> mRNA
– DNA -> pre-RNA -> splicing -> mRNA (only the exons)

• Translation: mRNA -> proteins
– Proteins make cells alive and specialised (e.g. blue eyes)
– Genome -> proteome

N.Kasabov, 2003

Page 44: Bio-Medical Informatics

Bioinformatics

• The area of Science that is concerned with the development and applications of methods, tools and systems for storing and processing of biological information to facilitate knowledge discovery.

• Interdisciplinary: Information and computer science, Molecular Biology, Biochemistry, Genetics, Physics, Chemistry, Health and Medicine, Mathematics and Statistics, Engineering, Social Sciences.

• Biology, Medicine -- Information Science --> IT, Clinics, Pharmacy

• Links to Health informatics, Clinical DSS, Pharmaceutical Industry

N.Kasabov, 2003

Page 45: Bio-Medical Informatics

N.Kasabov, 2003

Bioinformatics: challenging problems for computer and information sciences

• Discovering patterns (features) in DNA and RNA sequences (e.g. genes, promoters, ribosome binding sites (RBS), splice junctions)

• Analysis of gene expression data and predicting protein abundance

• Discovery of gene networks: genes that are co-regulated over time

• Protein discovery and protein function analysis

• Predicting the development of an organism from its DNA code (?)

• Modeling the full development (metabolic processes) of a cell (?)

• Implications: health; social,…

Page 46: Bio-Medical Informatics

N.Kasabov, 2003

Problems in Computational Modeling for Bioinformatics

• An abundance of genome data, RNA data, protein data and metabolic pathway data is now available (see http://www.ncbi.nlm.nih.gov), and this is just the beginning of computational modeling in Bioinformatics

• Complex interactions:
– between proteins, genes, DNA code
– between the genome and the environment
– much yet to be discovered

• Stability and repetitiveness: genes are relatively stable carriers of information.

• Many sources of uncertainty:
– Alternative splicing
– Mutation in genes caused by: ionising radiation (e.g. X-rays), chemical contamination, replication errors, viruses that insert genes into host cells, aging processes, etc.
– Mutated genes express differently and cause the production of different proteins

• It is extremely difficult to model dynamic, evolving processes

Page 47: Bio-Medical Informatics

Bioinformatics: Important Challenges

Transcription Translation

Gene Prediction

Gene Function

Protein Function

Protein 3D Structure

Page 48: Bio-Medical Informatics

Public Databases

[Diagram: DNA sequence {A,T,C,G} -> (transcription) -> gene expression level (microarray) -> (translation) -> protein sequence (e.g. KMLSLLMARTYW)]

Page 49: Bio-Medical Informatics

Gene Expression


Page 50: Bio-Medical Informatics

Microarray
• What can it be used for?
• How does it work?
• What are the advantages?

An Example Application

Page 51: Bio-Medical Informatics

Microarrays can be used for: comparison of transcription levels between two cells

Examples of comparisons:
• Cells from a young mouse vs cells from an old mouse
• Drug efficacy: treated cells vs untreated cells

Page 52: Bio-Medical Informatics

How it works: based on hybridization

[Figure: DNA probes fixed to the chip (ACTTGACC, ACTTAACC) hybridize to the mRNAs with the complementary sequences (UGAACUGG, UGAAUUGG). A pairs with U or T through two hydrogen bonds (=), and C pairs with G through three (≡).]
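The pairing rule behind hybridization can be checked position by position. The sketch below follows the slide's side-by-side layout and ignores strand orientation, which is a simplification; the helper names are illustrative.

```python
# Base on the DNA probe -> base it pairs with on the mRNA
PAIRING = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_mrna(probe):
    """Return the mRNA bases that pair with a DNA probe, position by position."""
    return "".join(PAIRING[base] for base in probe)


def mismatches(probe, mrna):
    """Count positions where the mRNA base does not pair with the probe base."""
    return sum(PAIRING[p] != m for p, m in zip(probe, mrna))


print(complementary_mrna("ACTTGACC"))
print(mismatches("ACTTGACC", "UGAACUGG"))
print(mismatches("ACTTGACC", "UGAAUUGG"))
```

A probe with no mismatches binds its target strongly, while the second mRNA differs from the perfect partner at one position and binds this probe less well.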

Page 53: Bio-Medical Informatics

Probes and the printing process

[Figure labels: microtiter plates, print head, slides (100)]

Page 54: Bio-Medical Informatics

Print Head Pins

Page 55: Bio-Medical Informatics
Page 56: Bio-Medical Informatics
Page 57: Bio-Medical Informatics
Page 58: Bio-Medical Informatics

Print Head with Pins

Page 59: Bio-Medical Informatics
Page 60: Bio-Medical Informatics

23/2/2008 60

Microarray Technology

Page 61: Bio-Medical Informatics

probe (on chip)

sample (labelled)

pseudo-colour image

[image from Jeremy Buhler]

Page 62: Bio-Medical Informatics

Experimental design
• Track what’s on the chip
– which spot corresponds to which gene
• Duplicate experimental spots
– reproducibility
• Controls
– DNAs spotted on glass: positive probe (induced or repressed), negative probe (bacterial genes on human chip)
– oligos on glass or synthesised on chip (Affymetrix): point mutants (hybridisation plus/minus)

Page 63: Bio-Medical Informatics

Images from scanner
• Resolution
– standard 10 µm [currently, max 5 µm]
– 100 µm spot on chip = 10 pixels in diameter
• Image format
– TIFF (tagged image file format), 16 bit (65,536 levels of grey)
– 1 cm x 1 cm image at 16 bit = 2 MB (uncompressed)
– other formats exist, e.g. SCN (used at Stanford University)
• Separate image for each fluorescent sample
– channel 1, channel 2, etc.
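The figures on this slide follow from simple arithmetic, sketched here under the assumption of a square scan with one 16-bit sample per pixel; `image_size_bytes` is an illustrative helper.

```python
MICROMETRES_PER_CM = 10_000


def image_size_bytes(side_cm, pixel_um, bits_per_pixel):
    """Uncompressed size of a square scan with one sample per pixel."""
    pixels_per_side = side_cm * MICROMETRES_PER_CM / pixel_um
    return pixels_per_side ** 2 * bits_per_pixel / 8


size = image_size_bytes(1, 10, 16)  # 1 cm x 1 cm at 10 µm resolution, 16 bit
spot_pixels = 100 / 10              # pixels across a 100 µm spot
grey_levels = 2 ** 16               # distinct values in a 16-bit image
print(size, spot_pixels, grey_levels)
```

A 1 cm side at 10 µm per pixel gives 1,000 pixels per side, so 1,000,000 pixels at 2 bytes each, or about 2 MB per channel.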

Page 64: Bio-Medical Informatics

Images in analysis software
• The two 16-bit images (cy3, cy5) are compressed into 8-bit images
• Goal: display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image
• RGB image:
– Blue values (B) are set to 0
– Red values (R) are used for cy5 intensities
– Green values (G) are used for cy3 intensities
• Qualitative representation of results
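The overlay described above can be sketched with plain nested lists. Compressing 16 bits to 8 by dropping the low byte is an assumption made here for simplicity; analysis software may use a different scaling. The function names and the tiny 2 x 2 images are illustrative.

```python
def to_8bit(value16):
    """Compress a 16-bit intensity (0..65535) to 8 bits (0..255) by dropping the low byte."""
    return value16 >> 8


def overlay(cy3_image, cy5_image):
    """Build a 24-bit RGB image: R from cy5, G from cy3, B fixed at 0."""
    return [
        [(to_8bit(red), to_8bit(green), 0) for green, red in zip(row3, row5)]
        for row3, row5 in zip(cy3_image, cy5_image)
    ]


cy3 = [[65535, 0], [65535, 1000]]
cy5 = [[65535, 65535], [0, 1000]]
rgb = overlay(cy3, cy5)
print(rgb)
```

Equal strong signals in both channels give yellow, a cy5-only signal gives red, a cy3-only signal gives green, and two weak signals give a dark pixel.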

Page 65: Bio-Medical Informatics

Images: examples

[Images: cy3 channel, cy5 channel, pseudo-color overlay]

Spot color   Signal strength       Gene expression
yellow       Control = perturbed   unchanged
red          Control < perturbed   induced
green        Control > perturbed   repressed
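The colour table can be turned into a numeric rule by comparing the two channel intensities on a log scale. The sketch below assumes the perturbed sample is the red (cy5) channel and the control is the green (cy3) channel, as the table implies, and uses a two-fold change as the cut-off; that threshold is an assumption, not a value from the slides.

```python
import math


def classify_spot(control, perturbed, threshold=1.0):
    """Label a spot from log2(perturbed / control); threshold=1.0 is a two-fold change."""
    log_ratio = math.log2(perturbed / control)
    if log_ratio >= threshold:
        return "induced"    # red spot: control < perturbed
    if log_ratio <= -threshold:
        return "repressed"  # green spot: control > perturbed
    return "unchanged"      # yellow spot: control and perturbed are similar


print(classify_spot(100, 400))
print(classify_spot(400, 100))
print(classify_spot(100, 120))
```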

Page 66: Bio-Medical Informatics

Data : DNA Microarray

[Figure: expression of gene 1, gene 2 and gene 3 against time (min), 0 to 60; labels: gene 1, gene 2, gene 3, assay]

Page 67: Bio-Medical Informatics

Data Required: Gene Expression Matrix

     t1  t2  t3  t4
g1   0   1   2   1
g2   1   2   1   0
g3   0   1   1   1
g4   1   2   1   0

Page 68: Bio-Medical Informatics

Data Required: Gene Expression Matrix

Snapshot:

     a1  a2  a3  a4
g1   0   3   1   1
g2   1   2   1   0
g3   0   1   1   1
g4   1   2   1   0

Time series:

     t1  t2  t3  t4
g1   0   1   2   1
g2   1   2   1   0
g3   0   1   1   1
g4   1   2   1   0
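A gene expression matrix like the time series above can be stored as one row per gene, and rows can then be compared to look for genes that change together. The sketch below uses Pearson correlation for that comparison, which is one common choice and an assumption here; the `pearson` helper is written out so the example needs only the standard library.

```python
from itertools import combinations
from statistics import mean

# Rows are genes, columns are the time points t1..t4 from the slide.
expression = {
    "g1": [0, 1, 2, 1],
    "g2": [1, 2, 1, 0],
    "g3": [0, 1, 1, 1],
    "g4": [1, 2, 1, 0],
}


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


for a, b in combinations(expression, 2):
    print(a, b, round(pearson(expression[a], expression[b]), 2))
```

In this small matrix g2 and g4 have identical profiles, so they are perfectly correlated and would be candidates for co-regulation, while g1 and g2 are uncorrelated.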

Page 69: Bio-Medical Informatics
Page 70: Bio-Medical Informatics
Page 71: Bio-Medical Informatics
Page 72: Bio-Medical Informatics
Page 73: Bio-Medical Informatics
Page 74: Bio-Medical Informatics
Page 75: Bio-Medical Informatics
Page 76: Bio-Medical Informatics
Page 77: Bio-Medical Informatics
Page 78: Bio-Medical Informatics

• World Health Organization

Page 79: Bio-Medical Informatics
Page 80: Bio-Medical Informatics
Page 81: Bio-Medical Informatics
Page 82: Bio-Medical Informatics
Page 83: Bio-Medical Informatics
Page 84: Bio-Medical Informatics
Page 85: Bio-Medical Informatics
Page 86: Bio-Medical Informatics
Page 87: Bio-Medical Informatics
Page 88: Bio-Medical Informatics
Page 89: Bio-Medical Informatics
Page 90: Bio-Medical Informatics