introduction to genomics · aims of genomics ii describing every gene: • function/expression...

Genomics

An introduction

Aims of genomics I

Establishing integrated databases –being far from merely a storage

Linking genomic and expressed gene sequences

cDNA

Aims of genomics II

Describing every gene:• function/expression

data/relationships/phenotype• 3-d structure and features (introns/exons,

domains, repeats)• similarities to other genes

Characterize sequence diversity in population

Genomics can be:

Structural – where it is?

Functional – what it does?– DNA microarrays:

Comparative– finding important fragments

Mapping genomesPast– Genetic maps

Distance between simple markers expressed in units of recombination

– Cytological mapsStained chromosomes, observable under microscope

Present– Physical maps

Distance between nucleotides expressed in bases

– Comparative mapCorresponding genes detection; Regulatory sequence detection;

Genome sizes

4 4004.5 MbEscherichia coli

18 000120 MbDrosophilamelanogaster

6 20012 MbSaccharomyces cerevisiae

3200 Mb

97 Mb

3 Mb in 4-10 copies!

0.5 Mb

DNA length

32 000Homo sapiens

22 000Caenorhabditiselegans

3 200Deinococcus radiodurans

470Mycoplasma genitalium

GenesOrganism

Genetic differences among humans

Goals– Genetic diseases– Identifying criminals

Methods– Genetic markers (fingerprints) and DNA

sequence. Repeats:• Microsatellites (repeats of 1-12 nucleotides)• Minisatellites (> 12)

– Other types of variation• Genome rearrangements• Single nucleotide mutations

Microsatellites and diseaseHuntington’s disease

– Huntingtin gene of unknown (!) function– Repeats #: 6-35: normal; 36-120: disease

• Friedrich ataxia disease– GAA repeat in non-coding (intron) region– Repeats #: 7-34: normal; 35 up: disease– Repeat expansion reduces expression of frataxin gene

SNP - Single Nucleotide Polymorphism

Definition– SNP and phenotype

Occurrence in genome– Rarity of most SNPs (agrees with

neutral molecular evolutionary theory)– SNPs in human population:

• High variance in genome!

Detection of SNPs: Hybridization

Every 1430bpEvery 1400bpCoding regionsInter-genic regions

Sickle cell anemia

SNP on Beta Globin gene, which is recessive:

• 2 faulty copies: red blood cells change shape under stress -anemia

• 1 faulty copy: red blood cells change shape under heavy stress –but gives resistance to malaria parasite

Sickle looks like this:

SNPs and haplotypes

Passengers and their evolutionary vehicles

SNP - Phase inferenceIn the data from sequencing the genome the origin of SNP is scrambled G

T...CT AC GT...

GA

Possibility 1 Possibility 2

...CTGACGGT... ...CTGACAGT...chromosome

...CTTACAGT... ...CTTACGGT...chromosome

Which SNPs are on the same chromosome (are in phase)?

SNP – phase inferenceDetermining the parent of origin for each SNP

CT

...CT AC GT...AA

GA

...CT AC GT...CG

GT

...CT AC GT...GA

In this case:TAGG

Phase inference – the reason why many SNPs sequencing is done for child and two parents.

Linkage Disequilibrium, introHow hard is it to break a chromosome

An allele/trait/SNP A and a are on the same position in genome (locus), thus on a single chromosome an individual can have either of them – but not both– fA - frequency of occurrences of trait A in population– fa = 1- fA– fB, fb = 1 - fB are frequency occurrences of B and b

Probabilities of occurences of both traits on the same chromosome:fABfAbfaBfab

LD and genomic recombination

A BA ba Ba b

Linkage Disequilibrium, calculation

When these alleles are not correlated we expect them to occur together by chance alone:

fAB = fA fBfAb = fA fbfaB = fa fBfab = fa fb

But if A and B are occurring together more often (disequilibriumstate), we can write

fAB = fA fB + DfAb = fA fb - DfaB = fa fB - Dfab = fa fb + D

where D is called the measure of disequlibriumOf course from definitions above we have D = fAB - fA fB

How can we use it?

Phase inference tells us how SNPs are organized on chromosomeLinkage disequilibrium measures the correlation between SNPs

Back to SNPsDaly et al (2001), Figure 1

Haplotypes - vehicles for SNPsDaly et al (2001) were able to infer offspring haplotypes largely from parents. They say that “it became evident that the region could be largely decomposed into discrete haplotype blocks, each with a striking lack of diversity“The haplotype blocks:– Up to 100kb– 5 or more SNPs

For example, this block shows just two distinct haplotypes accounting for 95% of the observed chromosomes

Haplotypes on the genome fragment

a) Observed haplotypes with dotted lines wherever probability of switching to another line is > 2%b) Percent of explanation by haplotypesc) Contribution of specific haplotypes

Another genetic testDoes haplotypes exist?

- Each row represents an SNP- Blue dot = major- yellow = minor

- Each column represents a single chromosome

- The 147 SNPs are divided into 18 blocks defined by blacklines.

- The expanded box on the rightis an SNP block of 26 SNPsover 19kb of genomic DNA. The 4 most common of 7 different haplotypes include 80% of the chromosomes, and can be distinguished with 2 SNPs

How much SNPs we can ignore?…and still predict haplotypes with high accuracy?

Literature

Gibson, Muse „A Primer of Genome Science”N Patil et al . Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21 Science 294 2001:1719-1723.M J Daly et al . High-resolution haplotype structure in the human genome Nat. Genet. 29 2001: 229-232.

introduction to genomics · aims of genomics ii describing every gene: • function/expression...

Documents