introduction to genomics · aims of genomics ii describing every gene: • function/expression...
TRANSCRIPT
Genomics
An introduction
Aims of genomics I
Establishing integrated databases –being far from merely a storage
Linking genomic and expressed gene sequences
cDNA
Aims of genomics II
Describing every gene:• function/expression
data/relationships/phenotype• 3-d structure and features (introns/exons,
domains, repeats)• similarities to other genes
Characterize sequence diversity in population
Genomics can be:
Structural – where it is?
Functional – what it does?– DNA microarrays:
Comparative– finding important fragments
Mapping genomesPast– Genetic maps
Distance between simple markers expressed in units of recombination
– Cytological mapsStained chromosomes, observable under microscope
Present– Physical maps
Distance between nucleotides expressed in bases
– Comparative mapCorresponding genes detection; Regulatory sequence detection;
Genome sizes
4 4004.5 MbEscherichia coli
18 000120 MbDrosophilamelanogaster
6 20012 MbSaccharomyces cerevisiae
3200 Mb
97 Mb
3 Mb in 4-10 copies!
0.5 Mb
DNA length
32 000Homo sapiens
22 000Caenorhabditiselegans
3 200Deinococcus radiodurans
470Mycoplasma genitalium
GenesOrganism
Genetic differences among humans
Goals– Genetic diseases– Identifying criminals
Methods– Genetic markers (fingerprints) and DNA
sequence. Repeats:• Microsatellites (repeats of 1-12 nucleotides)• Minisatellites (> 12)
– Other types of variation• Genome rearrangements• Single nucleotide mutations
Microsatellites and diseaseHuntington’s disease
– Huntingtin gene of unknown (!) function– Repeats #: 6-35: normal; 36-120: disease
• Friedrich ataxia disease– GAA repeat in non-coding (intron) region– Repeats #: 7-34: normal; 35 up: disease– Repeat expansion reduces expression of frataxin gene
SNP - Single Nucleotide Polymorphism
Definition– SNP and phenotype
Occurrence in genome– Rarity of most SNPs (agrees with
neutral molecular evolutionary theory)– SNPs in human population:
• High variance in genome!
Detection of SNPs: Hybridization
Every 1430bpEvery 1400bpCoding regionsInter-genic regions
Sickle cell anemia
SNP on Beta Globin gene, which is recessive:
• 2 faulty copies: red blood cells change shape under stress -anemia
• 1 faulty copy: red blood cells change shape under heavy stress –but gives resistance to malaria parasite
Sickle looks like this:
SNPs and haplotypes
Passengers and their evolutionary vehicles
SNP - Phase inferenceIn the data from sequencing the genome the origin of SNP is scrambled G
T...CT AC GT...
GA
Possibility 1 Possibility 2
...CTGACGGT... ...CTGACAGT...chromosome
...CTTACAGT... ...CTTACGGT...chromosome
Which SNPs are on the same chromosome (are in phase)?
SNP – phase inferenceDetermining the parent of origin for each SNP
CT
...CT AC GT...AA
GA
...CT AC GT...CG
GT
...CT AC GT...GA
In this case:TAGG
Phase inference – the reason why many SNPs sequencing is done for child and two parents.
Linkage Disequilibrium, introHow hard is it to break a chromosome
An allele/trait/SNP A and a are on the same position in genome (locus), thus on a single chromosome an individual can have either of them – but not both– fA - frequency of occurrences of trait A in population– fa = 1- fA– fB, fb = 1 - fB are frequency occurrences of B and b
Probabilities of occurences of both traits on the same chromosome:fABfAbfaBfab
LD and genomic recombination
A BA ba Ba b
Linkage Disequilibrium, calculation
When these alleles are not correlated we expect them to occur together by chance alone:
fAB = fA fBfAb = fA fbfaB = fa fBfab = fa fb
But if A and B are occurring together more often (disequilibriumstate), we can write
fAB = fA fB + DfAb = fA fb - DfaB = fa fB - Dfab = fa fb + D
where D is called the measure of disequlibriumOf course from definitions above we have D = fAB - fA fB
How can we use it?
Phase inference tells us how SNPs are organized on chromosomeLinkage disequilibrium measures the correlation between SNPs
Back to SNPsDaly et al (2001), Figure 1
Haplotypes - vehicles for SNPsDaly et al (2001) were able to infer offspring haplotypes largely from parents. They say that “it became evident that the region could be largely decomposed into discrete haplotype blocks, each with a striking lack of diversity“The haplotype blocks:– Up to 100kb– 5 or more SNPs
For example, this block shows just two distinct haplotypes accounting for 95% of the observed chromosomes
Haplotypes on the genome fragment
a) Observed haplotypes with dotted lines wherever probability of switching to another line is > 2%b) Percent of explanation by haplotypesc) Contribution of specific haplotypes
Another genetic testDoes haplotypes exist?
- Each row represents an SNP- Blue dot = major- yellow = minor
- Each column represents a single chromosome
- The 147 SNPs are divided into 18 blocks defined by blacklines.
- The expanded box on the rightis an SNP block of 26 SNPsover 19kb of genomic DNA. The 4 most common of 7 different haplotypes include 80% of the chromosomes, and can be distinguished with 2 SNPs
How much SNPs we can ignore?…and still predict haplotypes with high accuracy?
Literature
Gibson, Muse „A Primer of Genome Science”N Patil et al . Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21 Science 294 2001:1719-1723.M J Daly et al . High-resolution haplotype structure in the human genome Nat. Genet. 29 2001: 229-232.