bioinformatics medical genomics and...
Post on 16-Sep-2020
Bioinformatics
Handling and analysis of data obtained from current biomedical / gene technology methods
Interdisciplinary science
• Biology
• Mathematics
• Computer science
Medical genomics and bioinformatics
Biological sequences: DNA -> mRNA -> protein
Information resources in biomedicine
Sequence analysis
Sequence alignments
Database searches for sequence similarity
Finding genes in genomes
Finding disease genes
Linkage analysis
Microarray data analysis: gene expression - mRNA abundance
Molecular genetic and cytogenetic analysis in the clinic
RNA bioinformatics: microRNAs and prediction of target mRNAs
Proteomics: large scale analysis of protein content
Molecular phylogeny
Sequences in virology and microbiology
Introduction to bioinformatics
• Information resources
Tore Samuelsson Nov 2009
Flow of genetic information
DNA
RNA transcript
splicing
mature mRNA
protein
protein structure -> biological function
56,000 protein structures
8,000,000 protein sequences
100 x 10^6 sequences corresponding to partial mRNAs
~ 250 x 10^9 nt
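The DNA -> mRNA -> protein flow can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and ignoring splicing; the function names are hypothetical. The input is the first 30 nucleotides of the coding sequence (positions 109..138) of the L. ivanovii sod entry used as the EMBL format example in these slides.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    (b1 + b2 + b3).replace("T", "U"): AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA that corresponds to the coding (sense) strand."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Start of the annotated CDS (109..717) in the sod entry.
cds_start = "atgacttacgaattaccaaaattaccttat"
mrna = transcribe(cds_start)
protein = translate(mrna)
print(mrna)
print(protein)
```

The printed peptide can be compared with the first ten residues of the /translation qualifier in that entry.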
Nature Nov 6 2008
Archon X prize
$10 million to the first Team ... to sequence 100 human genomes within 10 days or less ...at a cost of no more than $10,000 per genome.
Margaret Dayhoff
The early days of sequence databases
Genome sequencing using a shotgun approach
DNA sequence databases
GenBank (NCBI, NIH, US; www.ncbi.nlm.nih.gov)
EMBL (European Molecular Biology Laboratory; EBI, UK; www.ebi.ac.uk)
DDBJ (Japan)
EMBL and Genbank formats
EMBL format
ID   LISOD standard; DNA; PRO; 756 BP.
XX
AC   X64011; S78972;
XX
SV   X64011.1
XX
DT   28-APR-1992 (Rel. 31, Created)
DT   30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
DE   L.ivanovii sod gene for superoxide dismutase
XX
KW   sod gene; superoxide dismutase.
XX
OS   Listeria ivanovii
OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
OC   Bacillus/Staphylococcus group; Listeria.
XX
RN   [1]
RX   MEDLINE; 92140371.
RA   Haas A., Goebel W.;
RT   "Cloning of a superoxide dismutase gene from Listeria ivanovii by
RT   functional complementation in Escherichia coli and characterization of the
RT   gene product.";
RL   Mol. Gen. Genet. 231:313-322(1992).
XX
RN   [2]
RP   1-756
RA   Kreft J.;
RT   ;
RL   Submitted (21-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am
RL   Hubland, 8700 Wuerzburg, FRG
XX
DR   SWISS-PROT; P28763; SODM_LISIV.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..756
FT                   /db_xref="taxon:1638"
FT                   /organism="Listeria ivanovii"
FT                   /strain="ATCC 19119"
FT   RBS             95..100
FT                   /gene="sod"
FT   terminator      723..746
FT                   /gene="sod"
FT   CDS             109..717
FT                   /db_xref="SWISS-PROT:P28763"
FT                   /transl_table=11
FT                   /gene="sod"
FT                   /EC_number="1.15.1.1"
FT                   /product="superoxide dismutase"
FT                   /protein_id="CAA45406.1"
FT                   /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG
FT                   HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA
FT                   IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL
FT                   DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
XX
SQ   Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
     cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat        60
     gtaatttctt ttcacataaa taataaacaa tccgaggagg aatttttaat gacttacgaa       120
     ttaccaaaat taccttatac ttatgatgct ttggagccga attttgataa agaaacaatg       180
     gaaattcact atacaaagca ccacaatatt tatgtaacaa aactaaatga agcagtctca       240
     ggacacgcag aacttgcaag taaacctggg gaagaattag ttgctaatct agatagcgtt       300
Examples of typical feature table elements
* to represent a coding sequence that is constructed from a range of exons:
CDS join(1886..1922,2272..2319,3563..3675,4750..4878)
* to represent a coding sequence on the complementary strand of DNA:
CDS complement(1159..2577)
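Location strings such as the join(...) and complement(...) examples can be turned into coordinates with a small parser. The sketch below is an assumption-laden illustration, not a full implementation of the feature table syntax: it handles only plain ranges, join(...) of ranges, and complement(...) wrapped around either, and the function names are hypothetical.

```python
import re


def parse_location(location: str):
    """Parse a simple feature location into (strand, [(start, end), ...]).

    Supported forms: "a..b", "join(a..b,c..d,...)" and "complement(...)"
    around either of those. Anything else raises ValueError.
    """
    strand = "+"
    text = location.strip()
    if text.startswith("complement(") and text.endswith(")"):
        strand = "-"
        text = text[len("complement("):-1]
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    segments = []
    for part in text.split(","):
        match = re.fullmatch(r"(\d+)\.\.(\d+)", part.strip())
        if match is None:
            raise ValueError(f"unsupported location: {location!r}")
        segments.append((int(match.group(1)), int(match.group(2))))
    return strand, segments


def spliced_length(segments):
    """Total number of nucleotides covered by the segments (1-based, inclusive)."""
    return sum(end - start + 1 for start, end in segments)


exon_strand, exons = parse_location(
    "join(1886..1922,2272..2319,3563..3675,4750..4878)"
)
rev_strand, rev_segments = parse_location("complement(1159..2577)")
print(exon_strand, exons, spliced_length(exons))
print(rev_strand, rev_segments, spliced_length(rev_segments))
```

Both example coding sequences have a spliced length that is a multiple of three, as expected for a complete CDS.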
Common sequence formats
1. EMBL release format
2. GenBank (ASN.1)
3. FASTA format:
>X12345 Y098TR gene
CGTATCTTACGAGCTACTACGAGGTCTTATCGGACGAGCGACT...
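FASTA is simple enough to read without a library. The following is a minimal sketch, assuming that each record starts with a ">" header line whose first word is the identifier and that the sequence may be wrapped over several lines; the function names are hypothetical. The example record reuses only the visible part of the sequence shown above (the slide truncates it).

```python
import io


def read_fasta(handle):
    """Yield (identifier, description, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def _make_record(header, chunks):
    # The identifier is the first word of the header; the rest is free text.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


example = io.StringIO(
    ">X12345 Y098TR gene\n"
    "CGTATCTTACGAGCTACTACGAGG\n"
    "TCTTATCGGACGAGCGACT\n"
)
records = list(read_fasta(example))
for identifier, description, sequence in records:
    print(identifier, description, len(sequence))
```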
Two major types of DNA / nucleotide / base sequences are found in databases such as GenBank and EMBL:
* Genomic, arising from sequencing of DNA material isolated from cells
* ESTs, arising from projects to determine what mRNAs are produced in a certain organism or in a certain type of cell within a multicellular organism.
DNA
mRNA
EST (Expressed Sequence Tag)
Expressed Sequence Tags (ESTs) correspond to partial mRNA sequences; they are sequences of cDNA that has been reverse-transcribed from mRNA
Short sequences (~500-1000 bases); each is the result of a single sequencing experiment -> high frequency of errors
Applications:
1) Used to answer questions like: What genes are expressed in a specific cell or tissue?
2) Identification of coding regions in genomic sequences
3) Discovery of new genes
Redundancy at GenBank => RefSeq
Many sequences are represented more than once in GenBank
2003 RefSeq collection: curated secondary database, non-redundant, selected organisms
• Genome DNA (assemblies)
• Transcripts (RNA)
• Protein
RefSeq vs GenBank

RefSeq                                                 GenBank
Curated                                                Not curated
NCBI creates from existing data                        Author submits
NCBI revises as new data emerge                        Only author can revise
Single records for each molecule of major organisms    Multiple records from same loci common
-                                                      Records can contradict each other
Limited to model organisms                             No limit to species included
Exclusive NCBI database                                Data exchange among INSDC members
Akin to review articles                                Akin to primary literature
Proteins and transcripts identified and linked         Proteins identified and linked
Access via Nucleotide and Protein databases            Access via NCBI Nucleotide database
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook
Trace Archive
2001 NCBI and EMBL/ENSEMBL
Purpose: collect raw data at sequencing centers worldwide
PERMANENT repository of single-pass reads
Traces: pieces of a puzzle, between 300 and 1,000 nucleotides
Vital in the hunt for polymorphisms in gene sequences: linked to disease (human DNA), linked to virulence (viral DNA)
dbSNP: detailed info on > 25 million SNPs
Insights into the impact of genetic variation on health
http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?
2009: 2 x 10^9 single-pass reads
First genomes to be sequenced
1995, TIGR (www.tigr.org)
Haemophilus influenzae 1.83 MB
Mycoplasma genitalium 0.58 MB
Genome projects
Sequenced eukaryotic genomes

Organism                          Size (MB)    Genes
Bacteria                          0.6 - 7.5    500-7,000
S. cerevisiae                     12           6,000
S. pombe                          13           6,000
Worm, Caenorhabditis elegans      97           20,000
Fly, Drosophila melanogaster      120          14,000
Plant, Arabidopsis thaliana       110          26,000
Fish, Fugu rubripes               365          22,000
Mus musculus                      3000         24,000
H. sapiens                        3200         23,000
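One way to read the table is to compute gene density (genes per MB), which shows that gene number does not scale with genome size. The snippet below is only a small worked calculation on the rounded figures from the table; the variable and function names are hypothetical.

```python
# (organism, genome size in MB, approximate gene count), copied from the table
GENOMES = [
    ("S. cerevisiae", 12, 6000),
    ("S. pombe", 13, 6000),
    ("C. elegans", 97, 20000),
    ("D. melanogaster", 120, 14000),
    ("A. thaliana", 110, 26000),
    ("F. rubripes", 365, 22000),
    ("M. musculus", 3000, 24000),
    ("H. sapiens", 3200, 23000),
]


def genes_per_mb(size_mb, genes):
    """Average number of genes per megabase of genome."""
    return genes / size_mb


for name, size_mb, genes in GENOMES:
    print(f"{name:<16} {genes_per_mb(size_mb, genes):8.1f} genes/MB")
```

With these figures, yeast carries several hundred genes per MB while the mammalian genomes carry fewer than ten.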
Why are genome sequences and comparative genomics useful?
• Many non-human organisms are important model systems
• Comparative genomics useful in gene identification, identification of regulatory elements etc.
• Evolution of genes, proteins and organisms
Variation between individuals
2007 Craig Venter
2008 James Watson
Cancer patient, normal and cancer tissue
Yoruba, Ibadan, Nigeria
Han Chinese
SNPs ~3 x 10^6
Insertion/deletion polymorphisms 10^5-10^6
Structural variants/copy number variation 10^3-10^4
The SWISS-PROT Protein Sequence Data Bank (www.ebi.ac.uk ) is a database of protein sequences produced collaboratively by Amos Bairoch (University of Geneva) and the EBI. It contains high-quality annotation, is non-redundant, and cross-referenced to many other databases.
SWISS-PROT is accompanied by TrEMBL, a computer-annotated supplement to SWISS-PROT. TrEMBL contains the translations of all coding sequences (CDS) present in the EMBL Nucleotide Sequence Database not yet integrated into SWISS-PROT.
UniProt: Swiss-Prot + TrEMBL
Sequence entries in Feb 2009
UniProt 7,568,118
Swiss-Prot 410,518
TrEMBL 7,157,600
GenBank NCBI protein db 24,133,189
Protein sequence databases
ID   PRIO_HUMAN     STANDARD;      PRT;   253 AA.
AC   P04156;
DT   01-NOV-1986 (REL. 03, CREATED)
DT   01-NOV-1986 (REL. 03, LAST SEQUENCE UPDATE)
DT   01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE   MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR).
GN   PRNP.
OS   HOMO SAPIENS (HUMAN).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 86300093.
RA   KRETZSCHMAR H.A., STOWRING L.E., WESTAWAY D., STUBBLEBINE W.H.,
RA   PRUSINER S.B., DEARMOND S.J.;
RL   DNA 5:315-324(1986).
RN   [2]
RP   SEQUENCE OF 8-253 FROM N.A.
RX   MEDLINE; 86261778.
RA   LIAO Y.-C.J., LEBO R.V., CLAWSON G.A., SMUCKLER E.A.;
RL   SCIENCE 233:364-367(1986).
RN   [3]
RP   VARIANT AMYLOID GSS, SEQUENCE OF 58-85 AND 111-150.
RX   MEDLINE; 91160504.
RA   TAGLIAVINI F., PRELLI F., GHISO J., BUGIANI O., SERBAN D.,
RA   PRUSINER S.B., FARLOW M.R., GHETTI B., FRANGIONE B.;
RL   EMBO J. 10:513-519(1991).
RN   [4]
RP   REVIEW ON VARIANTS.
RX   MEDLINE; 93372867.
RA   PALMER M.S., COLLINGE J.;
RL   HUM. MUTAT. 2:168-173(1993).
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE
CC       HOST GENOME AND IS EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC   -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED
CC       "RODS".
CC   -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC   -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE BRAIN OF HUMANS AND
CC       ANIMALS INFECTED WITH NEURODEGENERATIVE DISEASES KNOWN AS
CC       TRANSMISSIBLE SPONGIFORM ENCEPHALOPATHIES OR PRION DISEASES, LIKE:
CC       CREUTZFELDT-JACOB DISEASE (CJD), GERSTMANN-STRAUSSLER SYNDROME
CC       (GSS), FATAL FAMILIAL INSOMNIA (FFI) AND KURU IN HUMANS; SCRAPIE
CC       IN SHEEP AND GOAT; BOVINE SPONGIFORM ENCEPHALOPATHY (BSE) IN
CC       CATTLE; TRANSMISSIBLE MINK ENCEPHALOPATHY (TME); CHRONIC WASTING
CC       DISEASE (CWD) OF MULE DEER AND ELK; FELINE SPONGIFORM
CC       ENCEPHALOPATHY (FSE) IN CATS AND EXOTIC UNGULATE ENCEPHALOPATHY
CC       (EUE) IN NYALA AND GREATER KUDU. THE PRION DISEASES ILLUSTRATE
CC       THREE MANIFESTATIONS OF CNS DEGENERATION: (1) INFECTIOUS (2)
CC       SPORADIC AND (3) DOMINANTLY INHERITED FORMS. TME, CWD, BSE, FSE,
CC       EUE ARE ALL THOUGHT TO OCCUR AFTER CONSUMPTION OF PRION-INFECTED
CC       FOODSTUFFS.
CC   -!- DISEASE: CJD OCCURS PRIMARILY AS A SPORADIC DISORDER (1 PER
CC       MILLION), WHILE 10-15% ARE FAMILIAL. ACCIDENTAL TRANSMISSION OF
CC       CJD TO HUMANS APPEARS TO BE IATROGENIC (CONTAMINATED HUMAN GROWTH
CC       HORMONE (HGH), CORNEAL TRANSPLANTATION, ELECTROENCEPHALOGRAPHIC
CC       ELECTRODE IMPLANTATION. . .). EPIDEMIOLOGIC STUDIES HAVE FAILED TO
CC       IMPLICATE THE INGESTION OF INFECTED ANNIMAL MEAT IN THE
CC       PATHOGENESIS OF CJD IN HUMAN. THE TRIAD OF MICROSCOPIC FEATURES
CC       THAT CHARACTERIZE THE PRION DISEASES CONSISTS OF (1) SPONGIFORM
CC       DEGENERATION OF NEURONS, (2) SEVERE ASTROCYTIC GLIOSIS THAT OFTEN
CC       APPEARS TO BE OUT OF PROPORTION TO THE DEGREE OF NERF CELL LOSS,
CC       AND (3) AMYLOID PLAQUE FORMATION. CJD IS CHARACTERIZED BY
CC       PROGRESSIVE DEMENTIA AND MYOCLONIC SEIZURES, AFFECTING ADULTS IN
CC       MID-LIFE. SOME PATIENTS PRESENT SLEEP DISORDERS, ABNORMALITIES OF
CC       HIGH CORTICAL FUNCTION, CEREBELLAR AND CORTICOSPINAL DISTURBANCES.
CC       THE DISEASE ENDS IN DEATH AFTER A 3-12 MONTHS ILLNESS.
CC   -!- DISEASE: GSS IS A HETEROGENEOUS DISORDER AND WAS DEFINED AS A
CC       "SPINOCEREBELLAR ATAXIA WITH DEMENTIA AND PLAQUELIKE DEPOSITS".
CC       GSS INCIDENCE IS LESS THAN 2 PER 100 MILLION.
CC   -!- DISEASE: KURU IS TRANSMITTED DURING RITUALISTIC CANNIBALISM, AMONG
CC       NATIVES OF THE NEW GUINEA HIGHLANDS. PATIENTS EXHIBIT VARIOUS
CC       MOVEMENT DISORDERS LIKE CEREBELLAR ABNORMALITIES, RIGIDITY OF THE
CC       LIMBS, AND CLONUS. EMOTIONNAL LABILITY IS PRESENT, AND DEMENTIA IS
CC       CONSPICUOUSLY ABSENT. DEATH USUALLY OCCURS FROM 3 TO 12 MONTH
CC       AFTER ONSET.
CC   -!- SIMILARITY: TO OTHER PRP.
CC   -!- DATABASE: NAME=HotMolecBase; NOTE=PrP entry;
CC       WWW="http://bioinformatics.weizmann.ac.il/hotmolecbase/entries/prp.htm".
FT   SIGNAL        1     22
FT   CHAIN        23    230       MAJOR PRION PROTEIN.
FT   PROPEP      231    253       REMOVED IN MATURE FORM (BY SIMILARITY).
FT   LIPID       230    230       GPI-ANCHOR (BY SIMILARITY).
FT   CARBOHYD    181    181       PROBABLE.
FT   CARBOHYD    197    197       PROBABLE.
FT   DISULFID    179    214       BY SIMILARITY.
FT   DOMAIN       51     91       5 X 8 AA TANDEM REPEATS OF P-H-G-G-G-W-G-
FT                                Q.
FT   REPEAT       51     59       1.
FT   REPEAT       60     67       2.
FT   REPEAT       68     75       3.
FT   REPEAT       76     83       4.
FT   REPEAT       84     91       5.
FT   VARIANT     102    102       P -> L (IN GSS).
FT   VARIANT     105    105       P -> L (IN GSS).
FT   VARIANT     117    117       A -> V (LINKED TO DEVELOPMENT OF
FT                                DEMENTING GSS).
FT   VARIANT     129    129       M -> V (DETERMINES THE DISEASE PHENOTYPE
FT                                IN PATIENTS WHO HAVE A PRP MUTATION AT
FT                                CODON 178: PATIENTS WITH MET DEVELOP FFI,
FT                                THOSE WITH VAL DEVELOP CJD).
FT   VARIANT     178    178       D -> N (IN FFI AND CJD).
FT   VARIANT     180    180       V -> I (IN CJD).
FT   VARIANT     198    198       F -> S (IN A ATYPICAL FORM OF GSS WITH
FT                                NEUROFIBRILLARY TANGLES).
FT   VARIANT     200    200       E -> K (IN CJD).
FT   VARIANT     210    210       V -> I (IN CJD).
FT   VARIANT     217    217       Q -> R (IN GSS WITH NEUROFIBRILLARY
FT                                TANGLES).
FT   VARIANT     232    232       M -> R (IN CJD).
FT   CONFLICT    118    118       MISSING (IN REF. 2).
SQ   SEQUENCE   253 AA;  27661 MW;  FD5373AD CRC32;
     MANLGCWMLV LFVATWSDLG LCKKRPKPGG WNTGGSRYPG QGSPGGNRYP PQGGGGWGQP
     HGGGWGQPHG GGWGQPHGGG WGQPHGGGWG QGGGTHSQWN KPSKPKTNMK HMAGAAAAGA
     VVGGLGGYML GSAMSRPIIH FGSDYEDRYY RENMHRYPNQ VYYRPMDEYS NQNNFVHDCV
     NITIKQHTVT TTTKGENFTE TDVKMMERVV EQMCITQYER ESQAYYQRGS SMVLFSSPPV
     ILLISFLIFL IVG
//
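Because every line of such an entry starts with a two-letter code, annotation is easy to extract programmatically. The sketch below is an illustration under assumptions: it reads only single-residue VARIANT lines in the old feature-table layout shown in this entry, ignores continuation lines, and uses hypothetical names. It then checks each annotated wild-type residue against the sequence of the same entry.

```python
import re

# Matches e.g. "FT   VARIANT     102    102       P -> L (IN GSS)."
VARIANT = re.compile(r"^FT\s+VARIANT\s+(\d+)\s+\d+\s+([A-Z]) -> ([A-Z])")

# A few feature lines copied from the PRIO_HUMAN entry.
FEATURE_LINES = [
    "FT   VARIANT     102    102       P -> L (IN GSS).",
    "FT   VARIANT     129    129       M -> V (DETERMINES THE DISEASE PHENOTYPE",
    "FT                                IN PATIENTS WHO HAVE A PRP MUTATION AT",
    "FT   VARIANT     178    178       D -> N (IN FFI AND CJD).",
    "FT   VARIANT     200    200       E -> K (IN CJD).",
    "FT   CONFLICT    118    118       MISSING (IN REF. 2).",
]

# Sequence block of the same entry, with the spacing removed.
SEQUENCE = "".join("""
MANLGCWMLV LFVATWSDLG LCKKRPKPGG WNTGGSRYPG QGSPGGNRYP PQGGGGWGQP
HGGGWGQPHG GGWGQPHGGG WGQPHGGGWG QGGGTHSQWN KPSKPKTNMK HMAGAAAAGA
VVGGLGGYML GSAMSRPIIH FGSDYEDRYY RENMHRYPNQ VYYRPMDEYS NQNNFVHDCV
NITIKQHTVT TTTKGENFTE TDVKMMERVV EQMCITQYER ESQAYYQRGS SMVLFSSPPV
ILLISFLIFL IVG
""".split())


def parse_variants(lines):
    """Return (position, wild_type, variant) for each VARIANT feature line."""
    variants = []
    for line in lines:
        match = VARIANT.match(line)
        if match:
            variants.append((int(match.group(1)), match.group(2), match.group(3)))
    return variants


variants = parse_variants(FEATURE_LINES)
for position, wild_type, variant in variants:
    # Positions in the feature table are 1-based.
    observed = SEQUENCE[position - 1]
    status = "ok" if observed == wild_type else "MISMATCH"
    print(f"{wild_type}{position}{variant}: sequence has {observed} ({status})")
```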
Protein sequence databases
Protein sequence databases can be accessed through:
• Uniprot (www.ebi.uniprot.org/)
• Entrez
UniProt - record
Three-dimensional structures of proteins, DNA and RNA are collected in the Protein Data Bank (PDB)
They are all the result of experimental work
* X-ray crystallography
* NMR
Example of PDB entry
HEADER HORMONE 30-OCT-92 1BPH 1BPH 2
COMPND INSULIN (CUBIC) IN 0.1M SODIUM SALT SOLUTION AT PH9 1BPH 3
SOURCE BOVINE (BOS $TAURUS) PANCREAS 1BPH 4
AUTHOR O.GURSKY,J.BADGER,Y.LI,D.L.D.CASPAR 1BPH 5
REVDAT 2 31-OCT-93 1BPHA 1 REMARK HET FORMUL 1BPHA 1
REVDAT 1 15-JAN-93 1BPH 0 1BPH 6
JRNL AUTH O.GURSKY,J.BADGER,Y.LI,D.L.D.CASPAR 1BPH 7
JRNL TITL CONFORMATIONAL CHANGES IN CUBIC INSULIN CRYSTALS 1BPH 8
JRNL TITL 2 IN THE PH RANGE 7-11 1BPH 9
JRNL REF BIOPHYS.J. V. 63 1210 1992 1BPH 10
JRNL REFN ASTM BIOJAU US ISSN 0006-3495 030 1BPH 11
REMARK 1 1BPH 12
REMARK 1 REFERENCE 1
ATOM 1 N GLY A 1 13.994 47.196 31.798 1.00 35.87 1BPH 129
ATOM 2 CA GLY A 1 14.277 46.226 30.708 1.00 38.67 1BPH 130
ATOM 3 C GLY A 1 15.574 45.507 31.085 1.00 31.18 1BPH 131
ATOM 4 O GLY A 1 16.078 45.660 32.217 1.00 22.60 1BPH 132
ATOM 5 N ILE A 2 16.088 44.766 30.126 1.00 28.39 1BPH 133
ATOM 6 CA ILE A 2 17.342 44.034 30.404 1.00 23.76 1BPH 134
ATOM 7 C ILE A 2 18.526 44.939 30.686 1.00 25.29 1BPH 135
ATOM 8 O ILE A 2 19.425 44.457 31.392 1.00 18.74 1BPH 136
ATOM 9 CB ILE A 2 17.571 43.072 29.158 1.00 27.36 1BPH 137
ATOM 10 CG1 ILE A 2 18.638 42.049 29.605 1.00 18.03 1BPH 138
ATOM 11 CG2 ILE A 2 17.859 43.936 27.903 1.00 25.54 1BPH 139
ATOM 12 CD1 ILE A 2 18.914 40.930 28.590 1.00 17.07 1BPH 140
ATOM 13 N VAL A 3 18.619 46.195 30.192 1.00 24.42 1BPH 141
ATOM 14 CA VAL A 3 19.774 47.080 30.436 1.00 30.26 1BPH 142
ATOM 15 C VAL A 3 19.952 47.453 31.895 1.00 19.08 1BPH 143
ATOM 16 O VAL A 3 21.018 47.421 32.561 1.00 28.15 1BPH 144
ATOM 17 CB VAL A 3 19.719 48.274 29.462 1.00 33.87 1BPH 145
ATOM 18 CG1 VAL A 3 20.847 49.225 29.754 1.00 30.40 1BPH 146
ATOM 19 CG2 VAL A 3 19.868 47.724 28.044 1.00 24.51
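The ATOM records carry the x, y, z coordinates (in Angstrom) that a 3D viewer draws. As a minimal sketch, the code below parses three C-alpha records from the entry and computes distances between consecutive C-alpha atoms. Splitting on whitespace is an assumption that works for these particular lines; real PDB files are fixed-column, and the function names here are hypothetical.

```python
import math

# C-alpha records of residues 1-3, copied from the 1BPH entry.
ATOM_LINES = [
    "ATOM 2 CA GLY A 1 14.277 46.226 30.708 1.00 38.67 1BPH 130",
    "ATOM 6 CA ILE A 2 17.342 44.034 30.404 1.00 23.76 1BPH 134",
    "ATOM 14 CA VAL A 3 19.774 47.080 30.436 1.00 30.26 1BPH 142",
]


def parse_atom(line):
    """Parse one whitespace-separated ATOM line into a dictionary."""
    fields = line.split()
    return {
        "serial": int(fields[1]),
        "name": fields[2],
        "residue": fields[3],
        "chain": fields[4],
        "residue_number": int(fields[5]),
        "xyz": tuple(float(value) for value in fields[6:9]),
        "occupancy": float(fields[9]),
        "b_factor": float(fields[10]),
    }


def distance(atom_a, atom_b):
    """Euclidean distance between two atoms, in the units of the file."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])


atoms = [parse_atom(line) for line in ATOM_LINES]
for first, second in zip(atoms, atoms[1:]):
    print(
        f"{first['residue']}{first['residue_number']} - "
        f"{second['residue']}{second['residue_number']}: "
        f"{distance(first, second):.2f}"
    )
```

Both distances come out close to 3.8 Angstrom, the usual spacing of neighbouring C-alpha atoms in a polypeptide chain.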
3D viewers
Several free programs for viewing protein and nucleic acid 3D structures:
Cn3D www.ncbi.nlm.nih.gov/Entrez
UCSF Chimera www.cgl.ucsf.edu/chimera/
DS Visualizer www.accelrys.com/products/downloads/ds_visualizer/
Rasmol & Protein explorer www.umass.edu/microbio/rasmol/
Chime www.umass.edu/microbio/chime/getchime.htm
DS Visualizer
Accessing molecular biology information
* Entrez
* Genome browsers-Santa Cruz
NCBI is the most heavily used site in biomedicine.
[Figure: NCBI Web Traffic, 1997-2006; vertical axis ticks from 100,000 to 700,000, horizontal axis January 1998 to January 2006]
722,000 Unique IPs a Day
91 Million Web Hits a Day
3200 Peak Web Hits a Second
1.5 Terabytes FTP a Day
1.8 Million Unique Users a Day
Title
Added title words "gene" and "complete"
26 exons
NCBI Cn3D viewer
OMIM - Online Mendelian Inheritance in Man
database of human genes and genetic disorders
NCBI - Taxonomy browser
Accessing molecular biology information
* Entrez
* Genome browsers-Santa Cruz
Santa Cruz browser - genome.ucsc.edu - chromosome 18
Zoom in on a particular gene/locus
Example : beta-globin
A subunit of hemoglobin. Hemoglobin is composed of 2 alpha- and 2 beta-subunits
Chromosome 11 - HBB locus (text search : “beta globin”)
Beta-globin gene
Zoomed in on HBB
Protein coding region
untranslated region
intron
Arrows show polarity
HBB beta
HBD delta (minor form of hemoglobin)
HBG1 A-gamma (fetal hemoglobin)
HBG2 G-gamma (fetal hemoglobin)
HBE1 epsilon (embryonic hemoglobin)
Configuration of ‘tracks’
Chromosome 11 - HBB locus (text search : “beta globin”)
LINEs
SINEs
Comparative analysis - similarity to other species
Beta-globin genes
Finding a region of interest in the genome
* Text search (“beta globin”)
* BLAT search (based on sequence similarity)
Jim Kent
BLAT with the HBB amino acid sequence
One of the BLAT hits seems to be a pseudogene
Podocin - a kidney specific protein
Examples of information available at UCSC browser
Protein genes
Pseudogenes
Repetitive elements - SINEs/LINEs
CpG islands
Variation between individuals - SNPs
Gene expression data
ENSEMBL www.ensembl.org
ENSEMBL www.ensembl.org
Chromosome 18
Sequencing methods
1977 Walter Gilbert and A. Maxam (chemical modification)
Sequencing by enzymatic synthesis
1975 F. Sanger (chain termination)
1984 Ligation based (SOLiD, Applied Biosystems)
1988 Pyrosequencing (454, Roche)
1994 Reversible dye terminators (Solexa, Illumina)
454 - pyrosequencing (Roche)
Detects the activity of DNA polymerase with a chemiluminescent enzyme while the complementary strand is synthesized.
Schematic representation of the progress of the enzyme reaction in solid-phase pyrosequencing
Ronaghi M Genome Res. 2001;11:3-11
©2001 by Cold Spring Harbor Laboratory Press
Pyrogram of the raw data obtained from liquid-phase pyrosequencing
Ronaghi M Genome Res. 2001;11:3-11
©2001 by Cold Spring Harbor Laboratory Press
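A pyrogram can be imitated with a toy simulation. The sketch below is idealized and rests on stated assumptions: nucleotides are dispensed one at a time in a fixed cyclic order (here A, C, G, T, an arbitrary choice), the light signal is exactly proportional to the number of nucleotides incorporated in that flow, and the template is given in the order in which the polymerase reads it. Function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrogram(template, flow_order="ACGT"):
    """Return a list of (dispensed nucleotide, signal) pairs.

    The signal of a flow is the number of consecutive positions at which
    the dispensed nucleotide is incorporated into the growing strand.
    """
    if set(flow_order) != set("ACGT"):
        raise ValueError("flow_order must contain each of A, C, G and T")
    synthesized = [COMPLEMENT[base] for base in template.upper()]
    flows = []
    position = 0
    while position < len(synthesized):
        for nucleotide in flow_order:
            run = 0
            while position < len(synthesized) and synthesized[position] == nucleotide:
                run += 1
                position += 1
            flows.append((nucleotide, run))
            if position == len(synthesized):
                break
    return flows


def call_bases(flows):
    """Reconstruct the synthesized strand from the flow signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = pyrogram("GGAT")
print(flows)
print(call_bases(flows))
```

A homopolymer run shows up as a single flow with a higher signal, which is why runs of identical bases are harder to call accurately on real, noisy data.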
Solexa (Illumina) reversible terminator sequencing
ss DNA
Enzymatically synthesize its complementary strand
Detect fluorescence of one nucleotide at a time
Remove the blocking group (reversible terminator)
Polymerization of another nucleotide
[Figure: reversible terminator sequencing of the template GCAGCTATTACGGCTATCTGACCGTCGATAAT with terminator dNTPs]
Sequencing by ligation (SOLiD - Applied Biosystems)
The method:
It is based on sequential ligation of dye-labeled oligonucleotide probes, whereby each probe queries two base positions at a time
DNA ligase rather than polymerase
The system uses 4 fluorescent dyes to encode the 16 possible two-base combinations
Multiple cycles of probe hybridization, ligation, imaging and analysis are performed
The resulting product is then removed
The process is repeated for 5 more extension rounds with primers hybridized to positions n-1, n-2, etc. in the adaptor.
http://www3.appliedbiosystems.com/AB_Home/applicationstechnologies/SOLiDSystemSequencing/index.htm
2-base color encoding data
1 dye = 4 possible dinucleotides
2 bases are interrogated in each ligation reaction providing increased specificity
Primer round 1
Primer round 2
Total of 5 primer rounds
Each sequence is interrogated twice in different reactions, which improves the signal-to-noise ratio
Decoding
Color space
Possible dinucleotides
Base zero Decoded sequence
Base space sequence
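Decoding from color space to base space can be sketched as follows. The slides state only that 4 dyes encode the 16 dinucleotides; the concrete mapping used here (bases numbered A=0, C=1, G=2, T=3 and the color taken as the XOR of the two base codes) is the scheme commonly described for two-base encoding and is an assumption, as are the numeric color labels 0-3 and the function names.

```python
BASES = "ACGT"


def encode_colors(base_zero, sequence):
    """Encode a base sequence as colors, one per adjacent pair of bases.

    base_zero is the known base that precedes the read, so the first color
    describes the step from base_zero to the first base of the sequence.
    """
    codes = [BASES.index(base) for base in base_zero + sequence]
    return [first ^ second for first, second in zip(codes, codes[1:])]


def decode_colors(base_zero, colors):
    """Decode colors back to bases, starting from the known base zero."""
    current = BASES.index(base_zero)
    decoded = []
    for color in colors:
        current ^= color
        decoded.append(BASES[current])
    return "".join(decoded)


colors = encode_colors("T", "ATGGC")
decoded = decode_colors("T", colors)
print(colors)
print(decoded)

# A single mis-read color changes every base decoded after it.
miscalled = list(colors)
miscalled[1] = 2
print(decode_colors("T", miscalled))
```

Under this mapping each color stands for exactly four dinucleotides, which matches "1 dye = 4 possible dinucleotides", and the last lines show why a color-calling error is best detected before conversion to base space.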