Computational Analysis of Transcript Identification Using GenBank
Post on 21-Dec-2015
![Page 1: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/1.jpg)
Computational Analysis of Transcript Identification Using GenBank
![Page 2: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/2.jpg)
Differentiation of hematopoietic cells
Pluripotent stem cell
Myeloid: Erythrocyte, Platelet, Monocyte, Neutrophil, Eosinophil, Basophil
Lymphoid: B cell, T cell
![Page 3: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/3.jpg)
![Page 4: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/4.jpg)
![Page 5: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/5.jpg)
![Page 6: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/6.jpg)
Genome-wide gene expression
Level of expression: number of expressed genes
< 5 mRNA / cell: 9,000
5-50 mRNA / cell: 900
> 500 mRNA / cell: 100
![Page 7: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/7.jpg)
![Page 8: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/8.jpg)
![Page 9: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/9.jpg)
![Page 10: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/10.jpg)
![Page 11: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/11.jpg)
![Page 12: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/12.jpg)
![Page 13: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/13.jpg)
![Page 14: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/14.jpg)
![Page 15: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/15.jpg)
![Page 16: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/16.jpg)
![Page 17: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/17.jpg)
![Page 18: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/18.jpg)
SAGE (Serial Analysis of Gene Expression)
mRNA/cDNA (drawn with poly(A) tails)
isolate SAGE tags
link tags together & sequence
gene identification
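The "isolate SAGE tags" step can be sketched in silico. This is a minimal sketch, assuming the standard SAGE convention that the tag is the 10 bases immediately 3' of the last CATG (NlaIII) site in a transcript; the slides do not name the anchoring enzyme, and the function names `extract_sage_tag` and `count_tags` are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: NlaIII anchoring site, not stated in the slides
TAG_LENGTH = 10       # the slides work with 10-base tags


def extract_sage_tag(cdna, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag directly 3' of the last anchor site, or None."""
    seq = cdna.upper()
    pos = seq.rfind(anchor)  # 3'-most occurrence of the anchor site
    if pos == -1:
        return None  # no anchor site, so no tag
    start = pos + len(anchor)
    tag = seq[start:start + tag_length]
    # A tag shorter than tag_length means the transcript ended too early.
    return tag if len(tag) == tag_length else None


def count_tags(cdnas):
    """Count how often each tag is observed across a list of sequences."""
    tags = (extract_sage_tag(c) for c in cdnas)
    return Counter(t for t in tags if t is not None)


# Toy transcript (made up for illustration) with two CATG sites.
toy = "GGCATGAAACCCGGGTTTCATGCCTGTAATCCAAAAAAAA"
print(extract_sage_tag(toy))
print(count_tags([toy, toy, "ACGT"]))
```

Counting tags this way corresponds to the quantitative side of SAGE: the number of times a tag is seen stands in for the abundance of its transcript.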
![Page 19: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/19.jpg)
![Page 20: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/20.jpg)
![Page 21: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/21.jpg)
SAGE & GLGI Overview
mRNA
SPGI: collect cDNA clones; identify most of the expressed genes
SAGE: quantitative analysis of expressed genes by collecting tags
GenBank: multi-match, single-match, or no match
GLGI: extend tags into longer 3' cDNAs; match
Gene identification
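The three outcomes of matching a tag against a reference (multi-match, single-match, no match) can be sketched as a simple lookup. This is an illustration only: the `reference` dictionary and the function `classify_tags` are hypothetical, the two UniGene entries are taken from the tag table in this deck, and the all-A tag is invented to show the no-match case.

```python
def classify_tags(tags, reference):
    """Sort tags into the three match classes named on the overview slide."""
    groups = {"multi-match": [], "single-match": [], "no match": []}
    for tag in tags:
        # Distinct clusters only, so a repeated ID does not look like a multi-match.
        hits = set(reference.get(tag, ()))
        if not hits:
            groups["no match"].append(tag)
        elif len(hits) == 1:
            groups["single-match"].append(tag)
        else:
            groups["multi-match"].append(tag)
    return groups


# Tiny reference: tag -> UniGene cluster IDs.
reference = {
    "CCCTGGGTTC": ["Hs.52891", "Hs.111334"],
    "GTGGCCACGG": ["Hs.112405"],
}
observed = ["CCCTGGGTTC", "GTGGCCACGG", "AAAAAAAAAA"]
print(classify_tags(observed, reference))
```

In the workflow on the slide, the multi-match and no-match tags are the ones that would go on to GLGI for extension.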
![Page 22: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/22.jpg)
SAGE tags match to many genes (Tags from Hashimoto S, et al. Blood 94:837, 1999)

| Tag | Number of matched genes | Matched genes (up to 10 shown) |
| --- | --- | --- |
| CCTGTAATCC | 405 | Hs.267557, Hs.240615, Hs.231705, Hs.283045, Hs.236713, Hs.232277, Hs.181553, Hs.262716, Hs.181392, Hs.220696 |
| GTGAAACCCC | 305 | Hs.282868, Hs.170225, Hs.184220, Hs.194021, Hs.231625, Hs.171830, Hs.270571, Hs.270572, Hs.272193, Hs.283921 |
| CCACTGCACT | 174 | Hs.118778, Hs.256868, Hs.96023, Hs.31575, Hs.47517, Hs.200451, Hs.271222, Hs.253240, Hs.270018, Hs.270415 |
| ACTTTTTCAA | 44 | Hs.16426, Hs.10669, Hs.75155, Hs.28166, Hs.13975, Hs.79136, Hs.111334, Hs.133430, Hs.79356, Hs.239100 |
| TTGGGGTTTC | 9 | Hs.231375, Hs.273127, Hs.275603, Hs.175173, Hs.276612, Hs.224773, Hs.62954, Hs.182771, Hs.276326 |
| TGCACGTTTT | 8 | Hs.199160, Hs.279943, Hs.36927, Hs.5338, Hs.169793, Hs.83450, Hs.173902, Hs.183506 |
| TGTGTTGAGA | 5 | Hs.284136, Hs.275865, Hs.275221, Hs.274466, Hs.181165 |
| CCCGTCCGGA | 5 | Hs.276353, Hs.277498, Hs.277573, Hs.276350, Hs.180842 |
| TTGGTCCTCT | 4 | Hs.12328, Hs.108124, Hs.9739, Hs.112845 |
| CTGACCTGTG | 3 | Hs.277477, Hs.181244, Hs.77961 |
| TACCTGCAGA | 3 | Hs.100000, Hs.256957, Hs.253884 |
| AGGCTACGGA | 3 | Hs.119122, Hs.211582, Hs.183297 |
| GGGCTGGGGT | 3 | Hs.183698, Hs.118757, Hs.90436 |
| CCCTGGGTTC | 2 | Hs.52891, Hs.111334 |
| CACAAACGGT | 2 | Hs.2043, Hs.195453 |
| GTGAAGGCAG | 2 | Hs.4221, Hs.77039 |
| GGGCATCTCT | 2 | Hs.75061, Hs.76807 |
| ATGGCTGGTA | 2 | Hs.254246, Hs.182426 |
| CGCCGCCGGC | 2 | Hs.182825, Hs.132753 |
| AGGGCTTCCA | 2 | Hs.29797, Hs.276544 |
| TTGGTGAAGG | 2 | Hs.278674, Hs.75968 |
| GTGGCCACGG | 1 | Hs.112405 |
| GTTCACATTA | 1 | Hs.84298 |
| TGGTGTTGAG | 1 | Hs.275865 |
| CCCATCGTCC | 1 | Hs.151604 |
| GTTGTGGTTA | 1 | Hs.75415 |
| TTGTAATCGT | 1 | Hs.125078 |
| CCCACAACCT | 1 | Hs.252136 |
| GAGGGAGTTT | 1 | Hs.76064 |
| CCAGAACAGA | 1 | Hs.111222 |
![Page 23: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/23.jpg)
Tag Frequency Groups for 10-base Tag Set Containing 878,938 Tags for UniGene Human
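One way to form tag frequency groups is to count how often each distinct tag occurs and then count how many tags share each frequency. This is a sketch under that assumption: the group boundaries used on the slide are not recoverable from the transcript, and `tag_frequency_groups` is a hypothetical name.

```python
from collections import Counter


def tag_frequency_groups(tags):
    """Map each frequency to the number of distinct tags seen that often."""
    per_tag = Counter(tags)           # tag -> number of occurrences
    return Counter(per_tag.values())  # occurrences -> number of distinct tags


# Toy tag set: one tag seen three times, two tags seen once.
tags = ["CCTGTAATCC"] * 3 + ["GTGAAACCCC", "CCACTGCACT"]
groups = tag_frequency_groups(tags)
for frequency in sorted(groups):
    print(frequency, groups[frequency])
```

The group with frequency 1 holds the unique tags, which is the quantity plotted on the "Unique Tags" slides.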
![Page 24: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/24.jpg)
Unique Tags among 878,938 EST Derived Tags
![Page 25: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/25.jpg)
Unique Tags among 32,851 Gene Derived Tags
![Page 26: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/26.jpg)
Converting tag into longer 3' sequence
[Figure: a transcript drawn from 5' end to 3' end, with the SAGE tag near the 3' end and the 3' longer sequence that contains it.]
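Why a longer 3' sequence helps can be illustrated by counting unique tags at increasing lengths. This is a minimal sketch with made-up toy sequences; `records` pairs each sequence with the position where its tag starts, and both function names are hypothetical rather than taken from the slides.

```python
from collections import Counter


def tag_at_length(seq, tag_start, length):
    """Return the `length`-base tag starting at `tag_start`, or None if too short."""
    tag = seq[tag_start:tag_start + length]
    return tag if len(tag) == length else None


def unique_tag_count(records, length):
    """Count tags of the given length that occur in exactly one record."""
    tags = (tag_at_length(seq, start, length) for seq, start in records)
    counts = Counter(t for t in tags if t is not None)
    return sum(1 for n in counts.values() if n == 1)


# Toy records: the first two share their first 10 bases and differ afterwards.
records = [
    ("CCTGTAATCCAGCACTTT", 0),
    ("CCTGTAATCCCAGCTACT", 0),
    ("GTGAAACCCCGTCTCTAC", 0),
]
for length in (10, 12):
    print(length, unique_tag_count(records, length))
```

At 10 bases two of the three toy sequences share a tag; two extra bases are enough to separate them.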
![Page 27: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/27.jpg)
Generation of Longer 3' cDNA for Gene Identification (GLGI)
[Figure: GLGI scheme. Labels: SAGE tag (NNNNNNNNNN, 10 bases); sense extension; antisense extension (TGAGCGGCCGCTT); hundred bases. Sequence shown: TAAAAAAAAAAACTCGCCGGCGAANNNNNNNNNNATTTTTTTTTTTGAGCGGCCGCTT, followed by repeated fragments flanked by TAAAAAAAAAAACTCGCCGGCGAA and TGAGCGGCCGCTT.]
![Page 28: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/28.jpg)
UniGene Human 3’ Part Length Distribution
![Page 29: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/29.jpg)
![Page 30: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/30.jpg)
Number of Tags Which Move from k to k+25
![Page 31: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/31.jpg)
Unique Tags among 878,938 EST Derived Tags
![Page 32: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/32.jpg)
Unique Tags among 32,851 Gene Derived Tags
![Page 33: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/33.jpg)
Idealized Construction
![Page 34: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/34.jpg)
Random Model
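The transcript preserves only the title of this slide. Assuming it refers to a baseline in which each tag is drawn independently and uniformly from all 4^k sequences of length k, the expected number of unique tags among N draws is N(1 - 4^-k)^(N-1). The sketch below computes that quantity; the uniform model and the name `expected_unique_tags` are assumptions, not the authors' stated method.

```python
def expected_unique_tags(n_tags, tag_length):
    """Expected number of tags occurring exactly once among n_tags uniform draws."""
    n_possible = 4 ** tag_length  # number of distinct k-mers over A, C, G, T
    # A given draw is unique when none of the other n_tags - 1 draws equals it.
    p_unique = (1.0 - 1.0 / n_possible) ** (n_tags - 1)
    return n_tags * p_unique


# Tag set size from the slides; the lengths are illustrative.
for k in (10, 12, 15):
    print(k, round(expected_unique_tags(878_938, k)))
```

Under this assumption the expected number of unique tags rises toward N as k grows, which gives a reference curve to compare against counts from real sequence data.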
![Page 35: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/35.jpg)
Ideal Case Tag Count Progression
![Page 36: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/36.jpg)
Myeloid Tag Matches with UniGene Human SAGE Tag Reference Database
![Page 37: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/37.jpg)
SAGE Tag Processing with GIST
![Page 38: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/38.jpg)
k-mer tree
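A k-mer tree can be read as a trie over the alphabet A, C, G, T in which each root-to-node path spells a tag. The sketch below shows that general data structure only; how GIST actually builds or stores its tree is not described in the transcript, and the class `KmerTree` and its methods are hypothetical.

```python
class KmerTree:
    """Trie keyed by base; the node reached by a k-mer stores gene identifiers."""

    _IDS = "ids"  # key for the identifier set; cannot collide with one-base keys

    def __init__(self):
        self.root = {}

    def insert(self, kmer, gene_id):
        """Walk or create one node per base, then record the identifier."""
        node = self.root
        for base in kmer:
            node = node.setdefault(base, {})
        node.setdefault(self._IDS, set()).add(gene_id)

    def lookup(self, kmer):
        """Return the identifiers stored for kmer (empty set if absent)."""
        node = self.root
        for base in kmer:
            node = node.get(base)
            if node is None:
                return set()
        return set(node.get(self._IDS, ()))


# Entries taken from the tag table in this deck.
tree = KmerTree()
tree.insert("CCCTGGGTTC", "Hs.52891")
tree.insert("CCCTGGGTTC", "Hs.111334")
tree.insert("GTGGCCACGG", "Hs.112405")
print(sorted(tree.lookup("CCCTGGGTTC")))
print(tree.lookup("AAAAAAAAAA"))
```

Lookup cost depends on the tag length rather than on the number of stored tags, and tags with a common prefix share nodes, so an extended tag reuses the path of its shorter form.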
![Page 39: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/39.jpg)
![Page 40: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/40.jpg)
GIST Performance with Improved IO
![Page 41: Computational Analysis of Transcript Identification Using GenBank](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649d605503460f94a41bb9/html5/thumbnails/41.jpg)
Conspirators
Sanggyu Lee, Janet D. Rowley, San Ming Wang
Terry Clark, Andrew Huntwork, Josef Jurek, L. Ridgway Scott