introduction to genomics · aims of genomics ii describing every gene: • function/expression...

22
Genomics An introduction

Upload: others

Post on 12-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Genomics

An introduction

Page 2: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Aims of genomics I

Establishing integrated databases –being far from merely a storage

Linking genomic and expressed gene sequences

cDNA

Page 3: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Aims of genomics II

Describing every gene:• function/expression

data/relationships/phenotype• 3-d structure and features (introns/exons,

domains, repeats)• similarities to other genes

Characterize sequence diversity in population

Page 4: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Genomics can be:

Structural – where it is?

Functional – what it does?– DNA microarrays:

Comparative– finding important fragments

Page 5: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Mapping genomesPast– Genetic maps

Distance between simple markers expressed in units of recombination

– Cytological mapsStained chromosomes, observable under microscope

Present– Physical maps

Distance between nucleotides expressed in bases

– Comparative mapCorresponding genes detection; Regulatory sequence detection;

Page 6: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Genome sizes

4 4004.5 MbEscherichia coli

18 000120 MbDrosophilamelanogaster

6 20012 MbSaccharomyces cerevisiae

3200 Mb

97 Mb

3 Mb in 4-10 copies!

0.5 Mb

DNA length

32 000Homo sapiens

22 000Caenorhabditiselegans

3 200Deinococcus radiodurans

470Mycoplasma genitalium

GenesOrganism

Page 7: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Genetic differences among humans

Goals– Genetic diseases– Identifying criminals

Methods– Genetic markers (fingerprints) and DNA

sequence. Repeats:• Microsatellites (repeats of 1-12 nucleotides)• Minisatellites (> 12)

– Other types of variation• Genome rearrangements• Single nucleotide mutations

Page 8: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Microsatellites and diseaseHuntington’s disease

– Huntingtin gene of unknown (!) function– Repeats #: 6-35: normal; 36-120: disease

• Friedrich ataxia disease– GAA repeat in non-coding (intron) region– Repeats #: 7-34: normal; 35 up: disease– Repeat expansion reduces expression of frataxin gene

Page 9: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

SNP - Single Nucleotide Polymorphism

Definition– SNP and phenotype

Occurrence in genome– Rarity of most SNPs (agrees with

neutral molecular evolutionary theory)– SNPs in human population:

• High variance in genome!

Detection of SNPs: Hybridization

Every 1430bpEvery 1400bpCoding regionsInter-genic regions

Page 10: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Sickle cell anemia

SNP on Beta Globin gene, which is recessive:

• 2 faulty copies: red blood cells change shape under stress -anemia

• 1 faulty copy: red blood cells change shape under heavy stress –but gives resistance to malaria parasite

Sickle looks like this:

Page 11: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

SNPs and haplotypes

Passengers and their evolutionary vehicles

Page 12: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

SNP - Phase inferenceIn the data from sequencing the genome the origin of SNP is scrambled G

T...CT AC GT...

GA

Possibility 1 Possibility 2

...CTGACGGT... ...CTGACAGT...chromosome

...CTTACAGT... ...CTTACGGT...chromosome

Which SNPs are on the same chromosome (are in phase)?

Page 13: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

SNP – phase inferenceDetermining the parent of origin for each SNP

CT

...CT AC GT...AA

GA

...CT AC GT...CG

GT

...CT AC GT...GA

In this case:TAGG

Phase inference – the reason why many SNPs sequencing is done for child and two parents.

Page 14: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Linkage Disequilibrium, introHow hard is it to break a chromosome

An allele/trait/SNP A and a are on the same position in genome (locus), thus on a single chromosome an individual can have either of them – but not both– fA - frequency of occurrences of trait A in population– fa = 1- fA– fB, fb = 1 - fB are frequency occurrences of B and b

Probabilities of occurences of both traits on the same chromosome:fABfAbfaBfab

LD and genomic recombination

A BA ba Ba b

Page 15: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Linkage Disequilibrium, calculation

When these alleles are not correlated we expect them to occur together by chance alone:

fAB = fA fBfAb = fA fbfaB = fa fBfab = fa fb

But if A and B are occurring together more often (disequilibriumstate), we can write

fAB = fA fB + DfAb = fA fb - DfaB = fa fB - Dfab = fa fb + D

where D is called the measure of disequlibriumOf course from definitions above we have D = fAB - fA fB

Page 16: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

How can we use it?

Phase inference tells us how SNPs are organized on chromosomeLinkage disequilibrium measures the correlation between SNPs

Page 17: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Back to SNPsDaly et al (2001), Figure 1

Page 18: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Haplotypes - vehicles for SNPsDaly et al (2001) were able to infer offspring haplotypes largely from parents. They say that “it became evident that the region could be largely decomposed into discrete haplotype blocks, each with a striking lack of diversity“The haplotype blocks:– Up to 100kb– 5 or more SNPs

For example, this block shows just two distinct haplotypes accounting for 95% of the observed chromosomes

Page 19: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Haplotypes on the genome fragment

a) Observed haplotypes with dotted lines wherever probability of switching to another line is > 2%b) Percent of explanation by haplotypesc) Contribution of specific haplotypes

Page 20: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Another genetic testDoes haplotypes exist?

- Each row represents an SNP- Blue dot = major- yellow = minor

- Each column represents a single chromosome

- The 147 SNPs are divided into 18 blocks defined by blacklines.

- The expanded box on the rightis an SNP block of 26 SNPsover 19kb of genomic DNA. The 4 most common of 7 different haplotypes include 80% of the chromosomes, and can be distinguished with 2 SNPs

Page 21: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

How much SNPs we can ignore?…and still predict haplotypes with high accuracy?

Page 22: Introduction to genomics · Aims of genomics II Describing every gene: • function/expression data/relationships/phenotype • 3-d structure and features (introns/exons, domains,

Literature

Gibson, Muse „A Primer of Genome Science”N Patil et al . Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21 Science 294 2001:1719-1723.M J Daly et al . High-resolution haplotype structure in the human genome Nat. Genet. 29 2001: 229-232.