
Post on 15-Jan-2016

Genetics and Molecular Biology Tutorial II -- Computational Perspective

The goal is to introduce some topics to individuals with a minimal background in genetics/biology, and yet provide some examples of topics to maintain the interest of individuals with extensive biological/genetics backgrounds.

Outline

Gene structure
– genomic structure vs mRNA structure
– coding and noncoding exons
– introns
– primary transcript processing
    aside -- nonsense mediated mRNA degradation
– alternative splicing and differential polyadenylation
– evolutionary conservation of coding and noncoding sequences

Outline…

Genomic structure
– repetitive sequences
    LINEs and SINEs
    example -- Y chromosome palindromes
– C value paradox
– genomes of model organisms
    example -- yeast genome and gene-chip
    single/double knockouts
– cross-species sequence similarities for putative function identification
    example -- “chaperonin”

Fundamental Genetics and Probability Concepts

meiosis and sampling
patterns of inheritance
monogenic and complex inheritance
– phenocopy
– reduced penetrance
DNA variation
– polymorphisms, SNPs, and mutations
positional cloning

Gene Structure

Transcript Processing

DNA -> pre-mRNA -> mRNA -> protein

Nonsense mediated mRNA degradation

– unknown mechanism
– more rapidly degrades mRNA containing premature stop (nonsense) codons
– Lykke-Andersen, “mRNA quality control: Marking the message for life or death.” Current Biology, 11, 2001.

Nonsense Mediated mRNA Degradation

Genome Structure -- repeat classes

Class (block size)              Size of repeat   Chr locations
Megasatellite (100s of kb)      several kb       various locations
  RS447                         4.7 kb           ~50-70 copies on 4, several on 8
  untitled                      2.5 kb           ~400 copies on 4 and 19
  untitled                      3.0 kb           ~50 copies on X
Satellite (100 kb to Mbs)       5-171 bp         centromeric
  alphoid                       171 bp           centromeric heterochromatin, all chrs
  Sau3A family                  68 bp            centromeric heterochromatin, chrs 1 9 13 14 15 21 22 6
  satellite 1 (AT rich)         25-48 bp         centromeric heterochromatin, most chrs
  satellites 2 and 3            5 bp             most chrs
Minisatellite (0.1-20 kb)       6-64 bp          at or close to telomeres
  telomeric family              6 bp             all telomeres
  hypervariable family          9-64 bp          all chrs, often near telomeres
Microsatellite (<150 bp)        1-4 bp           dispersed through all chromosomes

C-Value Paradox

Hartl, “Molecular melodies in high and low C,” Nat. Rev. Genetics, Nov 2000

refers to the massive, counterintuitive and seemingly arbitrary differences in genome size observed in eukaryotic organisms
– Drosophila melanogaster 180 Mb
– Podisma pedestris 18,000 Mb
– difference is difficult to explain in view of apparently similar levels of evolutionary, developmental, and behavioral complexity

Alternative Splicing

Every conceivable pattern of alternative splicing is found in nature. Exons can have multiple 5’ or 3’ splice sites that are alternatively used (a, b). Single cassette exons can reside between 2 constitutive exons such that the alternative exon is either included or skipped (c). Multiple cassette exons can reside between 2 constitutive exons such that the splicing machinery must choose between them (d). Finally, introns can be retained in the mRNA and become translated.

Graveley, “Alternative splicing: increasing diversity in the proteomic world.” Trends in Genetics, Feb., 2001.

Classic View of Gene No Longer Valid -- Strachan pg 185

Mechanism                        Frequency/Examples
multigenic transcription units   rare. 18S, 28S, and 5.8S rRNA, mitochondria
alternative promoters            common. dystrophin gene (8)
alternative splicing             very frequent. slo gene (8 cassettes), >500 mRNAs
alternative polyadenylation      common. calcitonin gene (2)
RNA editing                      extremely rare. apolipoprotein B gene (tissue specific editing – codon changed)
post-translational cleavage      rare. may generate functionally related polypeptides – hormones, insulin

Alternative Splicing Example -- Graveley 2001

Alternative PolyAdenylation

common in human RNA (Edwards-Gilbert 1997)
in many genes, 2 or more poly-A signals in 3’ UTR
– alternative transcripts can show tissue specificity
alternative poly-A signals may be brought into play following alternative splicing

Edwards-Gilbert. Nucleic Acids Res, 13, 1997

Evolution of the mitochondrial genome and origin of eukaryotic cells

Evolutionary Conservation of Coding and Noncoding Sequences

Sequencing of H. sapiens and model organisms is the basis for comparative genomics
Generally, functional solutions (encoded as genes) are shared across organisms, which allows us to compare gene sequences and infer function
protein functional/structural region == “domains”
Intergenic regions are generally not conserved (always exceptions)

Example - MKKS (UniGene Clusters)

human   rat   87.4 %
human   mouse 84.9 %
human   cow   87.1 %
mouse   rat   97.8 %
rat     cow   91.0 %
mouse   cow   85.1 %
frog    rat   62.5 %
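
The slide does not say how these pairwise percentages were computed. One common reading is percent identity over a pairwise alignment, and the sketch below illustrates that calculation only. The function name `percent_identity` and the two toy sequences are hypothetical and are not MKKS data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both sequences agree.

    Assumes the two strings are already aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up aligned sequences (one mismatch, one gap column).
toy_a = "ATGGC-TTAG"
toy_b = "ATGACGTTAG"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions exist (for example, counting gap columns as mismatches), so a value from this sketch is only comparable to another value computed the same way.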

Example - MKKS

Computational Approach to Using Conserved Regions

Problem -- want to screen genes for mutations

Conventional approach -- screen all exons of a single gene

Alternative -- identify domains within multiple genes, and screen domains first, to optimize screening time and resources

Cross-Species Similarities

yeast
– gene chip for hybridization/expression
– complete genome (first eukaryote)
– single knockouts and double knockouts

Fundamental Genetics

meiosis
– Hs (H. sapiens) are diploid
– meiosis produces haploid gametes
– mechanism for transmission of genetic material to offspring
– recombination by cross-over (Holliday structure) or by independent segregation of homologous pairs

Fundamental Genetics (Background for Linkage Analysis)

Rule of Segregation
– offspring receive ONE allele (genetic material) from the pair of alleles possessed by EACH parent
Rule of Independent Assortment
– alleles of one gene can segregate independently of alleles of other genes
– (Linkage Analysis relies on the violation of Independent Assortment Rule)

Genetic Marker … Prelude to LA
– A genetic marker allows for the observation of the genetic state at a particular genomic location (locus). A genotype is the measured state of a genetic marker. May never be feasible to sequence cases directly.
– An “informative” marker is often “heterozygous,” or “polymorphic” and enables the observation of the inheritance of genetic material.

Monogenic and Polygenic Diseases
– monogenic (Mendelian) -- one gene
    “simple” (dominant and recessive) Mendelian inheritance
    direct correspondence between one gene mutation and one disorder
    majority of disease genes found are monogenic
– polygenic -- (complex) multiple genes
    heterogeneity and epistasis
    combinatorics
    no longer have direct correspondence between one gene and disorder
    majority of disorders are probably polygenic
– complexity of organisms and observed pathways

...Monogenic and Polygenic Diseases

phenocopy
reduced penetrance
– Example -- sickle cell anemia
    “classic” recessive disorder
    defect in red blood cells (hemoglobin)
    but… fetal (infant) hemoglobin gene can “leak”
    wide range of phenotypes

Examples

Examples

Example

BBS4 Pedigree

Hardy-Weinberg Equilibrium

Rule that relates allelic and genotypic frequencies in a population of diploid, sexually reproducing individuals if that population has random mating, large size, no mutation or migration, and no selection

Under those assumptions
– allelic frequencies will not change in a population from one generation to the next
– genotypic frequencies are determined in a predictable way by allelic frequencies
– the equilibrium is neutral -- if perturbed, it will reestablish within one generation of random mating at the new allelic frequency

H-W

f(AA) = p^2
f(Aa) = 2pq
f(aa) = q^2

(p + q)^2 = p^2 + 2pq + q^2

(p + q + r)^2 = p^2 + q^2 + r^2 + 2pq + 2pr + 2qr
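
The expansions above generalize to any number of alleles: each homozygote has frequency equal to the squared allele frequency, and each heterozygote has twice the product of its two allele frequencies. A minimal sketch follows; the function name `hardy_weinberg` and the example allele frequencies are illustrative assumptions, not values from the slides.

```python
from itertools import combinations_with_replacement


def hardy_weinberg(allele_freqs: dict[str, float]) -> dict[str, float]:
    """Expected genotype frequencies under Hardy-Weinberg equilibrium."""
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    # Unordered allele pairs: homozygotes get f^2, heterozygotes get 2*f1*f2.
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        fa, fb = allele_freqs[a], allele_freqs[b]
        genotypes[a + b] = fa * fb if a == b else 2.0 * fa * fb
    return genotypes


# Two alleles: keys are "AA", "Aa", "aa".
print(hardy_weinberg({"A": 0.7, "a": 0.3}))
# Three alleles: six genotype classes, matching the (p + q + r)^2 expansion.
print(hardy_weinberg({"p": 0.5, "q": 0.3, "r": 0.2}))
```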

Dominant and Recessive Penetrance Modeled

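penetrance = P(pt | gt), the probability of the phenotype given the genotype

                                 DD     Dd     dd
dominant                         1      1      0
dominant, reduced penetrance     0.9    0.9    0.0
recessive                        0      0      1
recessive, reduced penetrance    0      0      0.8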
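
A penetrance vector can be combined with Hardy-Weinberg genotype frequencies to give the expected fraction of affected individuals, P(affected) = sum over genotypes of P(gt) * P(pt | gt). The sketch below assumes the four penetrance vectors on this slide are, in order, dominant, dominant with reduced penetrance, recessive, and recessive with reduced penetrance. The model names, function names, and the allele frequency used in the example are hypothetical.

```python
# Penetrance vectors from the slide; the model labels are an assumption.
MODELS = {
    "dominant": {"DD": 1.0, "Dd": 1.0, "dd": 0.0},
    "dominant_reduced": {"DD": 0.9, "Dd": 0.9, "dd": 0.0},
    "recessive": {"DD": 0.0, "Dd": 0.0, "dd": 1.0},
    "recessive_reduced": {"DD": 0.0, "Dd": 0.0, "dd": 0.8},
}


def genotype_freqs(freq_D: float) -> dict[str, float]:
    """Hardy-Weinberg genotype frequencies given the frequency of allele D."""
    freq_d = 1.0 - freq_D
    return {"DD": freq_D ** 2, "Dd": 2.0 * freq_D * freq_d, "dd": freq_d ** 2}


def prevalence(freq_D: float, penetrance: dict[str, float]) -> float:
    """P(affected) = sum over genotypes of P(gt) * P(pt | gt)."""
    freqs = genotype_freqs(freq_D)
    return sum(freqs[gt] * penetrance[gt] for gt in freqs)


# Hypothetical allele frequency: D is rare (1%).
for name, model in MODELS.items():
    print(name, prevalence(0.01, model))
```

Note that in the two recessive vectors the affected genotype is dd, so with a rare D allele almost everyone carries dd; a rare recessive disease would instead use a high frequency for D.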

D-R Heterogeneous, DD Epistatic

      AA   Aa   aa
BB    1    1    0
Bb    1    1    0
bb    1    1    1

reduced penetrance 3, 9, 27, 81, 243… 3^n

      AA   Aa   aa
BB    1    1    0
Bb    1    1    0
bb    0    0    0
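
The two tables can be read as logical rules over the two loci. The sketch below assumes the first table is the dominant-recessive heterogeneity model (affected if locus A acts dominantly OR locus B acts recessively) and the second is the dominant-dominant epistatic model (affected only if both loci act dominantly). It also counts the 3^n joint genotypes for n biallelic loci. All function names are hypothetical.

```python
from itertools import product

GENOS_A = ("AA", "Aa", "aa")
GENOS_B = ("BB", "Bb", "bb")


def heterogeneity(ga: str, gb: str) -> int:
    """Affected if A is dominant-acting OR B is recessive-acting."""
    return 1 if (ga in ("AA", "Aa") or gb == "bb") else 0


def epistasis(ga: str, gb: str) -> int:
    """Affected only if A AND B both carry a dominant-acting allele."""
    return 1 if (ga in ("AA", "Aa") and gb in ("BB", "Bb")) else 0


def penetrance_table(model) -> list[list[int]]:
    """Rows follow GENOS_B (BB, Bb, bb); columns follow GENOS_A (AA, Aa, aa)."""
    return [[model(ga, gb) for ga in GENOS_A] for gb in GENOS_B]


def n_joint_genotypes(n_loci: int) -> int:
    """Count joint genotypes by enumeration: 3 genotypes per biallelic locus."""
    return sum(1 for _ in product(range(3), repeat=n_loci))


print(penetrance_table(heterogeneity))
print(penetrance_table(epistasis))
print([n_joint_genotypes(n) for n in range(1, 6)])
```

The growth in the last line is the combinatorial cost the slide alludes to: each added locus triples the number of penetrance parameters a fully general model would need.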

Dom-Rec Heterozygous

Screen genes A, B?, b

Uninformative Marker

Informative Marker

Given the following observations: family structure, affection status, genotypes, and disease allele frequencies. Assuming a model for the disease, can we calculate the probability that these observations “fit” the assumed model?

Linkage

Linkage Analysis

Goal: find a marker “linked” to a disease gene.
LOD score = log of likelihood ratio
LR[θ; data] == k P[data; θ]
theta = estimate of genetic distance (recombination fraction) between marker and disease
      = proportion of recombinant gametes / total gametes
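
For the simplest case the LOD score can be computed by hand. The sketch below assumes phase-known, fully informative meioses in which recombinants can be counted directly, so the likelihood is proportional to theta^r * (1 - theta)^(n - r) and the null hypothesis of no linkage is theta = 0.5. The function name `lod_score` and the counts (1 recombinant in 10 meioses) are hypothetical; real pedigree programs handle unknown phase and missing data, which this does not.

```python
import math


def lod_score(recombinants: int, total: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for counted recombinant gametes."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must be between 0 and total")
    non_recombinants = total - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = total * math.log10(0.5)  # unlinked: theta = 0.5
    return log_l_theta - log_l_null


# Scan a grid of theta values and keep the one with the highest LOD.
thetas = [i / 100 for i in range(1, 51)]
best_theta = max(thetas, key=lambda t: lod_score(1, 10, t))
print(best_theta, round(lod_score(1, 10, best_theta), 2))
```

The maximizing theta is the observed proportion of recombinant gametes, which matches the definition of theta given above.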

…Linkage Analysis

Linkage analysis calculates the likelihood that the inheritance pattern of the phenotype (disease) is supported by the observed inheritance patterns (genotypes) in a pedigree.
– few monogenic models, easy to test
– more difficult to find models explaining inheritance in polygenic models
– parameter maximization

Linkage Analysis Programs

FASTLINK - 2 point
– O(n^2), where n = number of markers
GeneHunter - multipoint, 2 point
– O(n^2), where n = number of people

Allele Sharing

tries to show that affected family members inherit the same chromosomal regions more often than expected by chance

Allele Sharing Example

Needs at least sibs.
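
One way to make "more often than expected by chance" concrete for sib pairs: at a locus unrelated to the disease, two siblings share 0, 1, or 2 alleles identical by descent (IBD) with probabilities 1/4, 1/2, and 1/4. The sketch below compares observed counts in affected sib pairs against that expectation with a chi-square goodness-of-fit statistic. This is one standard form of allele-sharing test, not necessarily the one used in the slide's example; the function names and the counts are made up.

```python
def sib_pair_chi_square(n_ibd0: int, n_ibd1: int, n_ibd2: int) -> float:
    """Chi-square statistic (2 degrees of freedom) against 1/4 : 1/2 : 1/4."""
    observed = (n_ibd0, n_ibd1, n_ibd2)
    total = sum(observed)
    if total == 0:
        raise ValueError("need at least one sib pair")
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def mean_sharing(n_ibd0: int, n_ibd1: int, n_ibd2: int) -> float:
    """Average proportion of alleles shared IBD; 0.5 is expected by chance."""
    total = n_ibd0 + n_ibd1 + n_ibd2
    return (0.5 * n_ibd1 + 1.0 * n_ibd2) / total


# Hypothetical counts for 100 affected sib pairs at one marker.
print(sib_pair_chi_square(10, 45, 45), mean_sharing(10, 45, 45))
```

A statistic well above the 5% critical value for 2 degrees of freedom (about 5.99), together with mean sharing above 0.5, would suggest excess sharing at that marker.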

Association Studies

“Allelic association studies provide the most powerful method for locating genes of small effect contributing to complex diseases and traits.” Daniels, Am J Hum Genet 62:1189-1197, 1998.

Linkage analysis
– genome wide screen, 400 markers ~ 10 cM (10 Mb), association needs 4000+ polymorphic markers
– generally need nuclear family or larger
Association finds “linkage disequilibrium”

Association Studies

“Association is simply a statistical statement about the co-occurrence of alleles or phenotypes. Allele A is associated with disease D if people who have D also have A more (or maybe less) often than would be predicted from the individual frequencies of D and A in the population.” Pg. 286 Human Molecular Genetics 2, Tom Strachan

Examples

HLA-DR4 (antigen marker)
– 36% in UK
– 78% with rheumatoid arthritis

CF (RFLP markers XV2.c (X1, X2), KM19 (K1, K2))

Marker Alleles    CF (case)    Normal (control)
X1, K1            3            49
X1, K2            147          19
X2, K1            8            70
X2, K2            8            25

– CF associated with X1, K2 in ‘89 (Strachan)
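
The CF marker counts above can be tested for association directly. The sketch below computes a Pearson chi-square statistic for the 4 x 2 table of haplotype by case/control status, and an odds ratio for carrying X1, K2 versus any other haplotype. The counts are the ones on the slide; the function names are hypothetical, and the choice of these two statistics is an assumption about how one might analyze the table, not a statement of how the original study did.

```python
# (cases, controls) per marker haplotype, as listed on the slide.
CF_TABLE = {
    ("X1", "K1"): (3, 49),
    ("X1", "K2"): (147, 19),
    ("X2", "K1"): (8, 70),
    ("X2", "K2"): (8, 25),
}


def chi_square(table: dict) -> float:
    """Pearson chi-square for a rows x 2 contingency table."""
    rows = list(table.values())
    col_totals = [sum(row[j] for row in rows) for j in range(2)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / grand_total
            stat += (row[j] - expected) ** 2 / expected
    return stat


def odds_ratio(table: dict, haplotype: tuple) -> float:
    """Odds of the haplotype in cases divided by its odds in controls."""
    cases_with, controls_with = table[haplotype]
    cases_without = sum(v[0] for k, v in table.items() if k != haplotype)
    controls_without = sum(v[1] for k, v in table.items() if k != haplotype)
    return (cases_with * controls_without) / (controls_with * cases_without)


print(chi_square(CF_TABLE))                 # 3 degrees of freedom
print(odds_ratio(CF_TABLE, ("X1", "K2")))
```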

Linkage Disequilibrium

linkage equilibrium (aka Hardy-Weinberg) is true if
– P(gt1,gt1’; gt2,gt2’) = P(gt1,gt1’) * P(gt2,gt2’) where [P(haplotype)]
case vs controls
TDT (heterozygous marker transmitted), HRR (untransmitted alleles as control)
allelic associations (outbred populations) maintained at only <= 1 cM
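
The independence condition above is usually checked at the haplotype level: under linkage equilibrium the frequency of haplotype AB equals the product of the allele frequencies, and the departure D = P(AB) - P(A) * P(B) measures disequilibrium. The sketch below computes D and the normalized r^2 from haplotype counts. Both measures are standard, but the function name, the allele labels, and the counts are hypothetical.

```python
def ld_measures(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci from haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("need at least one haplotype")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2
    d = p_AB - p_A * p_B
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


print(ld_measures(25, 25, 25, 25))  # equilibrium: D is 0
print(ld_measures(40, 10, 10, 40))  # AB and ab over-represented
```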

Equilibrium

“SNPs” Single-Nucleotide Polymorphisms

1 every 1000 bp (estimated)
2,972,052 SNPs submitted to dbSNP
– dbSNP summary link
– 50% of all SNPs are in question
– 10% of UTRs have SNPs
100,000 - 500,000 SNPs needed
Why don’t we do this?
– $$$

Homozygosity Mapping

Positional Cloning

Disease Gene Identification

SSCP -- single strand conformational polymorphism
PCR -- polymerase chain reaction
– primers amplify template sequence
direct sequencing
BBS2 (Bardet-Biedl Syndrome)

BBS2 genetic mapping

C16 1 2 3 4 5 6 7 8 9 10 11 12

BBS2 genetic mapping

C16 1 2 3 4 5 6 7 8 9 10 11 12

unaffected   affected

BBS4 Gene (Direct Sequencing)(Hs.26471)

BBS4 Deletion (by PCR)

exons 3 4

BBS4 Mutations (direct sequencing)

(R295P)

Summary

Disease Gene Identification
– challenges
– interval localization
    genotyping and genetic markers, linkage analysis, allele sharing, association studies (“SNiPs”), homozygosity mapping
– disease gene identification techniques
Take home
– A complex disorder (with interacting genes) has yet to be characterized

Demo -- installing a database

A database organizes data
Most common
– relational database (Oracle, Sybase)
– perceived as a collection of tables,
– where table is an unordered collection of rows
– each row has a fixed number of fields, and each field can store a predefined type of data value (date, integer, string, etc.)
simplest
– flat file
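
The relational model described above can be tried without installing a server. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the Oracle or Sybase systems named on the slide; that substitution is an assumption made for convenience. The `genotypes` table layout, the marker names, and the rows are all hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table is a set of rows; each row has a fixed number of typed fields.
conn.execute(
    """
    CREATE TABLE genotypes (
        individual_id TEXT    NOT NULL,
        marker        TEXT    NOT NULL,
        allele_1      INTEGER NOT NULL,
        allele_2      INTEGER NOT NULL,
        typed_on      TEXT
    )
    """
)

rows = [
    ("ind1", "MARKER_1", 1, 2, "2001-01-15"),
    ("ind2", "MARKER_1", 2, 2, "2001-01-15"),
    ("ind3", "MARKER_1", 1, 3, "2001-01-16"),
    ("ind1", "MARKER_2", 4, 4, "2001-01-16"),
]
conn.executemany("INSERT INTO genotypes VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Rows are unordered, so any required order must be requested explicitly.
hets = conn.execute(
    "SELECT individual_id FROM genotypes "
    "WHERE marker = ? AND allele_1 <> allele_2 "
    "ORDER BY individual_id",
    ("MARKER_1",),
).fetchall()
print(hets)
```

The same data in a flat file would need custom parsing code for every such question, which is the practical argument for the relational form.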

Databases

NCBI
BLAST
Amazon
Yahoo
Several of our own
– genotypes
– rat ESTs
– eye clones from differential display
– micro-array data
