Single Nucleotide Polymorphisms Jennifer Lyon Eskind Biomedical Library May 1, 2009 CRC Workshop Series


Page 1

Single Nucleotide Polymorphisms

Jennifer Lyon

Eskind Biomedical Library

May 1, 2009

CRC Workshop Series

Page 2

Types of Genetic Variations

• Single Nucleotide Polymorphisms (SNP)

– Single base pair changes

GTCATTCGATT

GTCAGTCGATT

• Indels

– Small insertion/deletions

CTT---GATC

CTTACGGATC

• Small variable repeats – microsatellites

– ACGACGACGACGACGACG (6 copies)

– ACGACGACGACGACGACGACG (7 copies)

• Variable Long tandem repeats (can be dozens to hundreds to thousands)

• Chromosomal Aberrations: Translocations, Inversions, etc.
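The SNP and microsatellite examples on this slide can be checked with a few lines of Python. This is only a sketch: the function names `find_snps` and `count_tandem_copies` are made up for illustration, and the comparison assumes the two sequences are already aligned and of equal length.

```python
def find_snps(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def count_tandem_copies(seq, unit):
    """Count how many times `unit` repeats back-to-back from the start of `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    copies = 0
    while seq.startswith(unit, copies * len(unit)):
        copies += 1
    return copies


# The two sequences from the slide differ at a single position.
print(find_snps("GTCATTCGATT", "GTCAGTCGATT"))
# The two microsatellite alleles from the slide differ by one ACG copy.
print(count_tandem_copies("ACGACGACGACGACGACG", "ACG"))
print(count_tandem_copies("ACGACGACGACGACGACGACG", "ACG"))
```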

Page 3

Focusing on SNPs

• Types of SNPs

• SNP nomenclature

• Resources for SNPs

• Examples and Challenges in Finding SNPs

http://learn.genetics.utah.edu/content/health/pharma/snips/

Page 4

SNP Types

• SNPs can be categorized in a number of ways; the most common are by location and by function (relative to a gene)

• Intragenic SNPs are often categorized by function: are they in a coding region, in an intron, part of the mRNA, or outside the mRNA but still in the gene locus (e.g., in the promoter)?

• Extragenic SNPs may be considered simply ‘genomic’ or might be labeled relative to the nearest gene, i.e., 5’ or 3’ to a gene

An ‘extragenic’ SNP may affect regulatory regions important in gene expression or other DNA functions such as DNA replication.

Page 5

SNP Functional Categories

• coding nonsynonymous

– Missense, nonsense, frame shift

• coding synonymous

• Intronic

– splice site

• mRNA UTR

– 5' UTR or 3' UTR

• (gene) locus region (5’ or 3’ to the gene)

– ‘near gene’ usually means within ~2000bp of gene

• genomic/extragenic (distant from any gene)
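As a rough illustration of the location-based labels, the sketch below classifies a position relative to a single gene using the ~2000 bp "near gene" window mentioned above. The function name, the coordinates, and the simplified labels are all hypothetical; it assumes the SNP and the gene are numbered on the same reference sequence and it does not look inside the gene (exon, intron, UTR).

```python
def classify_location(pos, gene_start, gene_end, strand="+", flank=2000):
    """Label a SNP position relative to one gene on the same reference sequence."""
    if gene_start <= pos <= gene_end:
        return "in gene"
    upstream = pos < gene_start  # lower coordinate than the gene
    distance = gene_start - pos if upstream else pos - gene_end
    if distance > flank:
        return "extragenic"
    # On the minus strand the gene's 5' end sits at the higher coordinate.
    five_prime = upstream if strand == "+" else not upstream
    return "near gene 5'" if five_prime else "near gene 3'"


# Hypothetical gene spanning positions 1000-2000.
print(classify_location(900, 1000, 2000))
print(classify_location(900, 1000, 2000, strand="-"))
print(classify_location(5000, 1000, 2000))
```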

Page 6

Coding Nonsynonymous SNPs

Missense – changes an amino acid (aa)

http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/Variation/powerpoint/variation_files/frame.html

Page 7

Coding Nonsynonymous SNPs

• Nonsense

– Changes an aa to a stop codon

– Results in a shortened protein

• Frame Shift

– Are really single-base indels

– Drop or add one base and the triplet reading frame is shifted, altering all downstream aa’s and usually resulting in an earlier stop codon
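The difference between synonymous, missense, nonsense, and frame-shift changes can be seen by translating codons with the standard genetic code. The following is a minimal sketch: the helper names and the short test sequence are invented for illustration, and a change that removes a stop codon is not treated as a separate category.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-codon substitution by its effect on the amino acid."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "coding synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_coding_snp("CTT", "CTC"))  # Leu -> Leu
print(classify_coding_snp("ACC", "ATC"))  # Thr -> Ile
print(classify_coding_snp("TGG", "TGA"))  # Trp -> stop

# Frame shift: deleting one base changes every downstream codon.
cds = "ATGGCTGAACGTTACTAA"        # hypothetical coding sequence
shifted = cds[:3] + cds[4:]       # drop the fourth base
print(translate(cds), translate(shifted))
```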

Page 8

SNP Nomenclature

• The Human Genome Variation Society (http://www.hgvs.org/mutnomen/recs.html) has proposed some guidelines for SNP nomenclature, but at the moment, there is minimal consistency.

• Different sources will refer to the same SNP in different ways

• While dbSNP identifiers (rs#12345678) are becoming common, they are not required of publishing authors and not used in all cases.

Page 9

SNPs at Base-Pair Level

• The base-pair change is given in various forms:

A/C T→G C>T 432G>C T73C

The HGVS nomenclature recommendations:

"c." for a coding DNA sequence (like c.76A>T)

"g." for a genomic sequence (like g.476A>T)

"m." for a mitochondrial sequence (like m.8993T>C)

"r." for an RNA sequence (like r.76a>u)
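A small parser makes the structure of these HGVS-style names explicit. This sketch handles only simple substitutions at a plain numeric position, optionally preceded by a reference sequence accession; the full HGVS recommendations cover many more cases (intronic offsets, deletions, and so on) that are deliberately left out. The pattern and function names are illustrative.

```python
import re

# Optional "accession:" prefix, sequence type letter, position, ref>alt.
HGVS_SUBSTITUTION = re.compile(
    r"^(?:(?P<accession>[^:\s]+):)?"
    r"(?P<kind>[cgmr])\."
    r"(?P<position>\d+)"
    r"(?P<ref>[ACGTacgu])>(?P<alt>[ACGTacgu])$"
)
SEQUENCE_KINDS = {"c": "coding DNA", "g": "genomic", "m": "mitochondrial", "r": "RNA"}


def parse_substitution(name):
    """Return the parts of a simple substitution name, or None if it does not fit."""
    match = HGVS_SUBSTITUTION.match(name.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["position"] = int(fields["position"])
    fields["kind"] = SEQUENCE_KINDS[fields["kind"]]
    return fields


for name in ["c.76A>T", "m.8993T>C", "r.76a>u", "T73C"]:
    print(name, parse_substitution(name))
```

Informal forms such as T73C return None because they do not say which kind of sequence the position refers to, which is exactly the ambiguity the prefixes are meant to remove.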

Page 10

Position, position, position!

• The big issue with SNPs is identifying their location (numerically).

• Position can be specified:

– Number location within a specific sequence

– Relative to another genetic landmark

    • Start site for a coding region of a gene

    • Start or end of an exon or intron

    • Relative to a marker

• Published articles are not always clear on this!!!

• Different resources may use different landmarks/numbering

• Numbering is always relative to the chosen sequence
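To see why the same base can carry different numbers, consider converting a position counted along a sequence into a position counted from the start codon. The sketch below follows the HGVS convention that the A of the ATG start codon is position 1 and the base just before it is -1 (there is no position 0). It assumes both positions are on the same strand of the same sequence and ignores introns; the function name and the coordinates are hypothetical.

```python
def to_coding_position(seq_pos, start_codon_pos):
    """Convert a sequence position to a position relative to the start codon's A."""
    offset = seq_pos - start_codon_pos
    # Positions at or after the start are 1, 2, 3...; upstream ones are -1, -2...
    return offset + 1 if offset >= 0 else offset


# Hypothetical sequence in which the start codon begins at position 100.
print(to_coding_position(100, 100))  # the A of ATG
print(to_coding_position(99, 100))   # the base just upstream
print(to_coding_position(175, 100))
```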

Page 11

Coding SNPs

• These are easier because they can be identified by the amino acid position rather than the base-pair position

• Most common nomenclature uses either 3-letter or single-letter amino acid codes:

Asn332Asp OR A95V

• The HGVS recommendation is similar:

"p." for a protein sequence (like p.Lys76Asn)

• Amino acid (protein) coding sequence positions are becoming more consistent, but are not yet always consistent
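Because the same amino acid change may be written with one-letter codes, three-letter codes, or in upper case, it can help to normalize every form to one spelling before comparing sources. The following sketch rewrites simple substitutions in the "p." three-letter style; the function name is illustrative and only the 20 standard amino acids are covered.

```python
import re

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
KNOWN_THREE_LETTER = {name.upper() for name in THREE_LETTER.values()}
PROTEIN_CHANGE = re.compile(r"^(?:p\.)?([A-Za-z]+)(\d+)([A-Za-z]+)$")


def _to_three_letter(code):
    """Accept a one- or three-letter amino acid code in any case."""
    if len(code) == 1 and code.upper() in THREE_LETTER:
        return THREE_LETTER[code.upper()]
    if len(code) == 3 and code.upper() in KNOWN_THREE_LETTER:
        return code.capitalize()
    raise ValueError(f"unrecognized amino acid code: {code!r}")


def to_hgvs_protein(name):
    """Rewrite e.g. 'A95V' or 'ASN332ASP' as 'p.Ala95Val' / 'p.Asn332Asp'."""
    match = PROTEIN_CHANGE.match(name.strip())
    if match is None:
        raise ValueError(f"not a simple amino acid substitution: {name!r}")
    ref, position, alt = match.groups()
    return f"p.{_to_three_letter(ref)}{position}{_to_three_letter(alt)}"


print(to_hgvs_protein("A95V"))
print(to_hgvs_protein("Asn332Asp"))
```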

Page 12

Database of SNPs (dbSNP)

dbSNP

• is the international central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms

• accepts data submissions from scientists

• is integrated with the NCBI’s Entrez system

Page 13

dbSNP Content

The SNP database has two major classes of content:

• Submitted data, i.e., original observations of sequence variation: Submitted SNPs (SS) with ss# (ss 5586300)

• Computed/curated data: Reference SNP Clusters (Ref SNP) with rs# (rs 4986582)

Page 14

Reference SNP Clusters

• Ref SNP clusters are computer-generated and curated by NCBI staff

• Ref SNP Clusters define a non-redundant set of SNPs

• All individual SNPs submitted by a researcher are given a submitter SNP number (ss#) and then redundant (repetitive) submitter SNPs are combined into a RefSNP cluster record, with a unique rs#

• Ref SNP clusters may contain multiple submitted SNPs
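The idea of collapsing redundant submissions into one cluster can be pictured as grouping records by where they map. This is a toy model only: it assumes each submission already has a chromosome and position, whereas the actual dbSNP build process is more involved and is not described on the slide. The ss# values and coordinates below are made up.

```python
from collections import defaultdict


def cluster_submissions(submissions):
    """Group submitted SNP ids (ss#) that map to the same location."""
    clusters = defaultdict(list)
    for ss_id, chrom, pos in submissions:
        clusters[(chrom, pos)].append(ss_id)
    return dict(clusters)


# Hypothetical submissions: two labs report the same site on chromosome 5.
submissions = [
    ("ss0000001", "5", 148206885),
    ("ss0000002", "7", 99354604),
    ("ss0000003", "5", 148206885),
]
for location, ss_ids in cluster_submissions(submissions).items():
    print(location, ss_ids)
```

Each resulting group corresponds to what would receive a single rs# in this simplified picture.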

Page 15

Searching dbSNP

• dbSNP is searched like any other Entrez db

• Specialized fields include:

Field                  Tag          Notes
Allele                 [Allele]     Uses IUPAC codes for bases
Chromosomal Location   [CHRPOS]     Uses chromosomal base-pair locations
Contig Position        [ctpos]      Uses contig base-pair locations
Function Class         [Func]       Includes coding synonymous, missense, nonsense, intron, utr, etc.
SNP Class              [SNP_Class]  Includes snp, indel, mixed

Page 16

SNP Limits Page

Page 17

Creating a Complex Search

Retrieve all synonymous coding reference SNPs for the human norepinephrine transporter gene (SLC6A2) from dbSNP

Search Strategy:

human[orgn] AND SLC6A2[gene] AND "coding synonymous"[FUNC]

Note: To use the [gene] (gene name) field, it is necessary to have the official gene name or gene symbol as assigned by the HUGO Gene Nomenclature Committee (HGNC). Entrez Gene can be used to find these.
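The same search strategy can be assembled programmatically and sent to NCBI's E-utilities `esearch` endpoint. The sketch below only builds the query string and URL; it makes no network request. The helper names are illustrative, and the field tags are the ones shown in this 2009 workshop, so it is an assumption that the current dbSNP search still accepts them unchanged.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(*clauses):
    """Join (value, field) pairs into an Entrez query combined with AND."""
    parts = []
    for value, field in clauses:
        # Multi-word values are quoted so they are searched as a phrase.
        text = f'"{value}"' if " " in value else value
        parts.append(f"{text}[{field}]")
    return " AND ".join(parts)


def build_esearch_url(term, db="snp", retmax=20):
    """Return an esearch URL for the given database and query term."""
    return f"{ESEARCH_URL}?{urlencode({'db': db, 'term': term, 'retmax': retmax})}"


term = build_term(("human", "orgn"), ("SLC6A2", "gene"), ("coding synonymous", "FUNC"))
print(term)
print(build_esearch_url(term))
```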

Page 18

dbSNP Output – Graphical Display

Page 19

dbSNP - Live

• Let’s look at a dbSNP reference SNP page:

• http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=3743788

Page 20

Finding SNPs - Challenges

• If rs# is available – start with it

• Not all rs#s have information in all databases

• Another database of interest is the Online Mendelian Inheritance in Man (OMIM)

• OMIM doesn’t always provide rs#s even when there is one

• dbSNP records may link to OMIM or may not, even if the SNP is in an OMIM record

Page 21

Example 1

• rs1800888

• (C>T) → Thr164Ile in ADRB2 gene

• HGVS nomenclature

– NP_000015.1:p.T164I

To Find in OMIM

• Search with rs1800888 – yields nothing

• Search with ADRB2[gene] – find record

• Look at allelic variants: .0003 BETA-2-ADRENORECEPTOR AGONIST, REDUCED RESPONSE TO [ADRB2, THR164ILE]

• It is a match

Page 22

Example 2

• rs2740574

• A/G SNP located 5’ to CYP3A4

• HGVS nomenclature:

– NT_007933.14:g.24616372C>T

To find in OMIM

• Search with rs2740574 yields nothing

• Search with gene name CYP3A4 – find record

• Find list of allelic variants - .0001 CYP3A4 PROMOTER POLYMORPHISM [CYP3A4, a-g PROMOTER]

• Compare info in dbSNP to info in OMIM (look at sequence)
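When comparing two records, one possible reason for seeing different bases (such as A/G in one place and C>T in another) is that the alleles are reported on opposite DNA strands. Whether that explains this particular example is an assumption here, not something the slide states; checking the flanking sequence is still needed. The sketch below simply tests whether two allele pairs agree directly or after complementing, using illustrative function names.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement_alleles(alleles):
    """Return the set of alleles as they would read on the opposite strand."""
    return {COMPLEMENT[base] for base in alleles}


def same_alleles_either_strand(alleles_a, alleles_b):
    """True if the two allele sets match directly or after complementing one."""
    a = {base.upper() for base in alleles_a}
    b = {base.upper() for base in alleles_b}
    return a == b or complement_alleles(a) == b


print(same_alleles_either_strand({"A", "G"}, {"C", "T"}))
print(same_alleles_either_strand({"A", "G"}, {"A", "C"}))
```

A match here is only consistent with the two names describing the same SNP; it does not prove it, since many unrelated SNPs share the same allele pair.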

Page 23

Other Databases

• OMIM – NCBI

• HapMap – International HapMap Project

• ALFRED – Allele Frequency Database

• HGVbaseG2P – Human Genome Variation database of Genotype-to-Phenotype information

• PharmGKB – Pharmacogenomics Knowledgebase

• F-SNP – Functional SNPs