gene variations single nucleotides polymorphism & copy number variation 刘戈飞...

Post on 27-Dec-2015


Gene Variations: Single Nucleotide Polymorphism & Copy Number Variation

刘戈飞, Department of Genetics and Cell Biology, Shantou University Medical College

gfliu@stu.edu.cn

075488900497

13502932022

Genetic Variations

Chromosome numbers

Segmental duplications, Copy number variation

Translocations

Inversion

Sequence Repeats

Transposable Elements

Short deletions and insertions

Tandem Repeats

Nucleotide Insertions and Deletions (Indels)

Single Nucleotide Polymorphisms (SNPs)

Mutations

Scale: from sizable (structural) variations down to minor (sequence) variations

Genetic Markers

• Morphological markers

• Cytological markers

• Biochemical and physiological markers

• Molecular markers

• 1980, RFLPs (restriction fragment length polymorphisms)

• 1985, STRs (short tandem repeats, micro-satellites)

• 1990s, SNPs (single nucleotide polymorphisms)

• 2000s, CNV (copy number variation)

Individual 1
A C G T G T C [G] G T C T T A A A   Maternal chromosome
A C G T G T C [C] G T C T T A A A   Paternal chromosome

Individual 2
A C G T G T C [G] G T C T T A A A   Maternal chromosome
A C G T G T C [G] G T C T T A A A   Paternal chromosome

Individual 3
A C G T G T C [C] G T C T T A A A   Maternal chromosome
A C G T G T C [C] T A C T T A A A   Paternal chromosome

The position of the SNP is indicated by the box (shown here in brackets). Individual 1 is heterozygous, while individuals 2 and 3 are homozygous.
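As a minimal sketch of how the genotypes in this figure can be read off, the Python snippet below compares the maternal and paternal base at the boxed site. The function name `zygosity`, the dictionary layout, and the 0-based index 7 for the boxed position are assumptions made for illustration; the sequences are copied from the slide with spaces removed.

```python
# Maternal and paternal sequences from the figure, spaces removed.
individuals = {
    "Individual 1": ("ACGTGTCGGTCTTAAA", "ACGTGTCCGTCTTAAA"),
    "Individual 2": ("ACGTGTCGGTCTTAAA", "ACGTGTCGGTCTTAAA"),
    "Individual 3": ("ACGTGTCCGTCTTAAA", "ACGTGTCCTACTTAAA"),
}

SNP_INDEX = 7  # assumed 0-based position of the boxed base


def zygosity(maternal, paternal, index):
    """Return (label, genotype) for one site on a chromosome pair."""
    a, b = maternal[index], paternal[index]
    label = "homozygous" if a == b else "heterozygous"
    return label, f"{a}/{b}"


for name, (mat, pat) in individuals.items():
    print(name, *zygosity(mat, pat, SNP_INDEX))
```

The loop prints one line per individual, so the heterozygous call for Individual 1 and the homozygous calls for Individuals 2 and 3 can be checked against the caption.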

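Single nucleotide polymorphism (SNP)

A single nucleotide polymorphism is a difference in a single base of the DNA sequence between the genomes of different individuals.

• About 1 in 1,000 bases is estimated to differ between any two individuals (about 3 million sites).
• About 10 million SNPs exist across all human populations.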
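The "3 million" figure follows from simple arithmetic. The sketch below assumes a haploid genome size of about 3 billion base pairs, which the slide does not state explicitly; the function name is illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumption: roughly 3 Gb haploid genome
ONE_DIFFERENCE_PER = 1000       # from the slide: 1 difference per 1,000 bases


def expected_differences(genome_size_bp, one_per):
    """Expected number of differing sites between two individuals."""
    return genome_size_bp // one_per


print(f"{expected_differences(GENOME_SIZE_BP, ONE_DIFFERENCE_PER):,}")
```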


SNP Effects

• SNPs in genes
– In coding regions (possible protein structure changes)
  • Synonymous substitutions
  • Missense substitutions
  • Nonsense substitutions (premature stop)
– In coding and non-coding regions
  • Change of gene expression (by altering the binding of various factors): yield, timing
  • Alternative splicing

• SNPs in regulatory regions
– Change of gene expression

• SNPs in non-regulatory intergenic regions
– Can be used as genetic markers
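The three coding-region categories can be illustrated with a short Python sketch that translates a reference codon and a variant codon using the standard genetic code. The function name `classify_substitution` is hypothetical, and the logic is deliberately simplified: it ignores stop-loss changes, splicing and expression effects.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a single-codon change in a coding region."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid
    if alt_aa == "*":
        return "nonsense"     # new stop codon
    return "missense"         # different amino acid


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```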


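HapMap: the International HapMap Project

Towards genome variations

• About 10 million SNP sites exist across all human populations, counting sites whose rarer allele has a frequency of at least 1%.

• Alleles of neighbouring SNPs tend to be inherited together as a unit. A set of associated SNP alleles in one region of a chromosome is called a haplotype.

• Most chromosome regions have only a few common haplotypes (each with a frequency of at least 5%), which account for most of the person-to-person variation in a population. A chromosome region may contain many SNPs, but a few tag SNPs are enough to capture most of the pattern of genetic variation in that region.

Constructing the HapMap takes three steps: (a) SNPs are identified in DNA samples from multiple individuals; (b) adjacent SNPs that are inherited together, with frequencies above 1% in the population, are combined into haplotypes; (c) tag SNPs that identify these haplotypes are found within them. By genotyping the three tag SNPs shown in the figure, researchers can determine which of the four haplotypes shown each individual carries.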
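Step (c) can be sketched as a small search problem: find the fewest SNP positions whose alleles still tell every haplotype apart. The four haplotypes below are hypothetical (they are not the ones from the HapMap figure), and the brute-force search is only practical for a handful of sites; it is an illustration of the idea, not the method used by the project.

```python
from itertools import combinations

# Hypothetical haplotypes over six SNP sites (illustrative data only).
haplotypes = ["ACTGAC", "ATTGGC", "GCCGAT", "GCTAAC"]


def find_tag_snps(haps):
    """Return the smallest set of site indices that distinguishes all haplotypes."""
    n_sites = len(haps[0])
    for k in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), k):
            signatures = {tuple(h[i] for i in sites) for h in haps}
            if len(signatures) == len(haps):
                return sites
    return None  # haplotypes are not all distinct


def identify_haplotype(haps, sites, alleles):
    """Look up which haplotype matches the genotyped tag-SNP alleles."""
    for h in haps:
        if tuple(h[i] for i in sites) == tuple(alleles):
            return h
    return None


tags = find_tag_snps(haplotypes)
print(tags)
print(identify_haplotype(haplotypes, tags, "GCC"))
```

With this toy data no pair of sites separates all four haplotypes, so three tag SNPs are needed, which mirrors the "three tag SNPs, four haplotypes" situation described above.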

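We are so young!

• Human evolution passed through a major bottleneck about 60,000 to 150,000 years ago.
• The ancestral population that passed through the bottleneck was small (only about ten thousand people), so we have a limited number of ancestors.
• Modern humans span only a few thousand generations (about 3,000 to 5,600).
• The number of recombination events is therefore limited.

The human genome is composed of "blocks"

The origin of haplotypes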

Methods and technologies in SNP studies

• Discovery (find SNPs)
• Validation (is the SNP common or rare?)
• Genotyping (frequency in the population)

Considerations:
– Call rates
– Flexibility
– Throughput
– Cost
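Of these considerations, the call rate is the easiest to quantify: it is the fraction of attempted genotypes that produced a usable call. A minimal sketch, assuming missing calls are stored as `None` (a representation chosen here for illustration):

```python
def call_rate(genotypes, missing=None):
    """Fraction of samples with a successful genotype call at one SNP."""
    called = sum(1 for g in genotypes if g != missing)
    return called / len(genotypes)


calls = ["AA", "AG", None, "GG"]  # hypothetical calls for four samples
print(call_rate(calls))
```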

Fundamental approaches

• Large-scale sequencing based: genomic alignment (GA), reduced representation shotgun (RRS)
• PCR based: common PCR
• Hybridization based: DNA chips

Sources of overlapping sequence:
• Genomic DNA, BAC library: BAC overlap
• Genomic DNA, RRS (reduced representation shotgun) library or sampling: shotgun overlap
• mRNA, cDNA library: EST overlap

Sequence overlap leads to SNP discovery:

GTTTAAATAATACTGATCA
GTTTAAATAATACTGATCA
GTTTAAATAGTACTGATCA
GTTTAAATAGTACTGATCA

How to discover SNPs
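The overlap idea can be sketched in a few lines of Python: once reads covering the same region are aligned, any column in which the reads disagree is a candidate SNP. The reads are the four overlapping sequences from the slide; the function name and the absence of any quality filtering are simplifications made for illustration.

```python
from collections import Counter

# The four overlapping reads shown on the slide (already aligned).
reads = [
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAGTACTGATCA",
    "GTTTAAATAGTACTGATCA",
]


def candidate_snps(aligned_reads):
    """Return (0-based column, base counts) for columns where reads disagree."""
    hits = []
    for i, column in enumerate(zip(*aligned_reads)):
        counts = Counter(column)
        if len(counts) > 1:
            hits.append((i, dict(counts)))
    return hits


print(candidate_snps(reads))
```

Real pipelines also weigh base quality before trusting a mismatch, so that sequencing errors are not reported as polymorphisms.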

Discovering SNPs by Sequencing

Amplify DNA (5’ to 3’) and sequence, then:
• Phred: base-calling, quality determination
• Phrap: contig assembly
• PolyPhred: polymorphism detection
• Consed: sequence viewing, polymorphism tagging
• Analysis: polymorphism reporting, individual genotyping, phylogenetic analysis
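The "quality determination" step relies on Phred quality scores, which are defined as Q = -10 log10(P), where P is the estimated probability that a base call is wrong. A small sketch of the conversion (the function names are illustrative):

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an estimated error probability p."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

A score of 20 therefore corresponds to one expected error in 100 base calls, and a score of 30 to one in 1,000.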

Homozygotes: ATAGACG / ATAGACG and ATACACG / ATACACG

Heterozygote: ATAGACG / ATACACG

SNP Genotyping

Goals: sensitive, accurate, simple, high-throughput, low-cost

Platforms, grouped by throughput:
• Invader (Third Wave), SNPlex (ABI), ParAllele, BeadArray (Illumina)
• Fluorescence Polarization (PE), MassArray (Sequenom)
• SNaPshot (ABI), SNuPe (GE), TaqMan (ABI), Pyrosequencing

SNP screening of certain genes

Target regions: regulatory region (-1000 to -1), 5’ UTR, exons, 3’ UTR

Genes, Samples, Phenotypes

Primer design and PCR

Direct DNA sequencing

Statistical Analysis

• SNPs raise the resolution of genetic analysis

• Pharmacogenomics
• Personalized medicine

Nature Reviews Genetics, January 2007, Volume 8 (www.nature.com/reviews/genetics)

Science, 23 July 2004, 305:525

• Forty-three authors used the DNA from 270 individuals from the 4 HapMap populations.

• Overall, the authors found 1,447 discrete, heterogeneously distributed, copy number variable regions (CNVRs), which cover 12% of the human genome. They found that 24% of CNVRs are associated with segmental duplications.

• CNVRs contain different classes of functional elements.
– Many CNVs preferentially lie outside genes.
– Genes involved in cell-adhesion functions, sensory perception of smell and response to chemical stimuli are enriched within CNVs.
– Conversely, cell signalling and proliferation, as well as kinase- and phosphorylation-related categories, are underrepresented among CNVs.
– Interestingly, ultraconserved elements are strongly excluded from these regions.

• CNV affects SNP genotype patterns, and SNPs can be used to identify linked CNVs.

• Both types of variation will need to be collected and analysed systematically if we are to understand the genetic basis of human disease.

• The authors call for standard assessment of CNV in all studies of the genetic basis of phenotypic variation, and for an international effort to continue to characterize and catalogue structural genomic variation.

26,628 clones; 534,500 SNPs

Effects of CNV

• Phenotype: modify drug response
• Predispose to or cause disease
• Polymorphism: population genetics
• Genome-wide variation in gene regulation

Methods to identify CNV

• Genome-wide
– Array-based
  • array-CGH: clone-based (1 Mb), oligonucleotide-based (30 kb)
  • SNP array (signal intensity, genotyping)
– Sequence-assembly comparison

• Targeted
– PCR-based
  • MAPH, MLPA, QMPSF: multiplex, up to 40 regions at a time
  • real-time qPCR
– Hybridization-based
  • FISH, Southern blotting

• Computational approaches

Methods to identify CNV: array-CGH

• Array-based CGH
• Representational oligonucleotide microarray analysis (ROMA)
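Array-CGH compares the hybridization signal of a test DNA against a reference DNA at each probe, usually expressed as a log2 ratio. The sketch below shows the basic idea for a single probe; the function name and the threshold of 0.3 are assumptions chosen for illustration, not values from the slides or from any particular platform.

```python
import math


def call_copy_number(test_signal, ref_signal, threshold=0.3):
    """Classify one probe from its test/reference log2 intensity ratio."""
    ratio = math.log2(test_signal / ref_signal)
    if ratio > threshold:
        return "gain", ratio
    if ratio < -threshold:
        return "loss", ratio
    return "normal", ratio


# Hypothetical intensities: 3 copies vs 2, 1 copy vs 2, and no change.
for test, ref in [(300, 200), (100, 200), (210, 200)]:
    print(call_copy_number(test, ref))
```

In practice calls are made on runs of neighbouring probes rather than on single probes, because individual ratios are noisy.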

Methods to identify CNV: targeted PCR-based

• Multiplex amplifiable probe hybridization (MAPH)
• Multiplex ligation-dependent probe amplification (MLPA)
• Quantitative multiplex PCR of short fluorescent fragments (QMPSF)

Methods to identify CNV: computational

Validation of CNV

• Mass spectrometry: MALDI-TOF
• Real-time quantitative PCR
• Southern blotting
• FISH

Scale: submicroscopic, microscopic

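MassArray (1)

• DNA extraction
• Amplification of the target sequence (primer)
• First purification and SNP-site extension reaction
• Spotting
• MALDI-TOF mass spectrometry
• Automated sequence analysis and retrieval of allele information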

MassArray (2)

Extension reaction: + enzyme, + ddATP, + dCTP/dGTP/dTTP

Allele 1
Unlabeled primer (23-mer), template: T C T
Extended primer (24-mer), added: A

Allele 2
Unlabeled primer (23-mer), template: A C T
Extended primer (26-mer), added: T G A

Mass spectrum peaks: Allele 1, Allele 2
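The different product lengths come from chain termination: with ddATP plus ordinary dCTP, dGTP and dTTP in the mix, extension stops as soon as an A is incorporated. The sketch below reproduces the 24-mer and 26-mer products, reading the slide's "T C T" and "A C T" as the template bases just downstream of the 23-mer primer. That reading is an interpretation, since the original figure is lost, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_primer(primer_length, template_downstream, terminator="A"):
    """Extend along the template until the terminating ddNTP is incorporated."""
    added = ""
    for base in template_downstream:
        nucleotide = COMPLEMENT[base]
        added += nucleotide
        if nucleotide == terminator:  # ddATP incorporated, extension stops
            break
    return added, primer_length + len(added)


print(extend_primer(23, "TCT"))  # allele 1
print(extend_primer(23, "ACT"))  # allele 2
```

The two products differ in length by two nucleotides, so they differ in mass, which is what MALDI-TOF mass spectrometry resolves into separate allele peaks.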

Identify CNV of certain genes or regions

Southern (small samples)

FISH (optional)

real-time qPCR, QMPSF, MAPH, MLPA (large samples)
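For the real-time qPCR route, copy number is commonly estimated with the comparative Ct (2^-ΔΔCt) calculation. The sketch below assumes roughly 100% amplification efficiency, a reference gene with stable copy number, and a calibrator sample known to carry two copies of the target; these assumptions, the function name and the Ct values are illustrative and do not come from the slides.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator,
                         calibrator_copies=2):
    """Estimate target copy number with the comparative Ct method."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_sample - delta_calibrator
    return calibrator_copies * 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold one cycle later
# in the sample than in the two-copy calibrator.
print(relative_copy_number(26.0, 25.0, 25.0, 25.0))
```

A target that amplifies one cycle later than in the calibrator corresponds to half the template, that is, one copy instead of two.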
