![Page 1: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/1.jpg)
Carlo Colantuoni &
Rafael Irizarry
April 19, 2006
Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor
![Page 2: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/2.jpg)
Biological Setup
Every cell in the human body contains the entire human genome: 3.3 Gb or ~30K genes.
The investigation of gene expression is meaningful because different cells, in different environments, doing different jobs express different genes.
Tasks necessary for gene expression analysis:
Define what a gene is.
Identify genes in a sea of genomic DNA where <3% of DNA is contained in genes.
Design and implement probes that will effectively assay expression of ALL (most? many?) genes simultaneously. Cross-reference these probes.
![Page 3: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/3.jpg)
Cellular Biology, Gene Expression, and Microarray Analysis
DNA
RNA
Protein
![Page 4: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/4.jpg)
AAAAA
Gene: Protein coding unit of genomic DNA with an mRNA intermediate.
START STOP
protein coding
5’ UTR 3’ UTR
mRNA
Genomic DNA 3.3 Gb
DNA Probe
~30K genes
Sequence is a Necessity
![Page 5: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/5.jpg)
From Genomic DNA to mRNA Transcripts
EXONS INTRONS
RNA editing & SNPs
Alternative splicing
Alternative start & stop sites in same RNA molecule
~30K
>30K
Transcript coverage Homology to other transcripts
Hybridization dynamics 3’ bias
Protein-coding genes are not easy to find - gene density is low, and exons are interrupted by introns.
![Page 6: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/6.jpg)
Sequence Quality!
Redundancy!
Completeness?
Unsurpassed as source of expressed sequence
Chaos?!?
![Page 7: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/7.jpg)
From Genomic DNA to mRNA Transcripts
~30K
>30K
>>30K
![Page 8: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/8.jpg)
Transcript-Based Gene-Centered Information
![Page 9: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/9.jpg)
Possible mis-referencing:
Genomic GenBank Acc.#’s
Referenced ID has more NT’s than probe
Old DB builds
DB or table errors – copying and pasting 30K rows in Excel …
Using RefSeqs can help.
Design of Gene Expression Probes
Content: UniGene, Incyte, Celera Expressed vs. Genomic
Source: cDNA libraries, clone collections, oligos
Cross-referencing of array probes (across platforms):
Sequence <> GenBank <> UniGene <> HomoloGene
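The cross-referencing chain above can be sketched as a series of table lookups. This is only an illustration, assuming each mapping is available as a dictionary; the function names and all identifiers (`probe_A`, `ACC_0001`, `CLUSTER_1`, `HOMOLOG_1`) are made-up placeholders, not real database entries.

```python
def cross_reference(probe_to_genbank, genbank_to_unigene, unigene_to_homologene):
    """Follow probe -> GenBank -> UniGene -> HomoloGene, keeping None for broken links."""
    table = {}
    for probe, accession in probe_to_genbank.items():
        cluster = genbank_to_unigene.get(accession)
        group = unigene_to_homologene.get(cluster) if cluster is not None else None
        table[probe] = {"genbank": accession, "unigene": cluster, "homologene": group}
    return table


def match_platforms(table_a, table_b, key="unigene"):
    """Pair probes from two platforms that resolve to the same cluster."""
    pairs = []
    for probe_a, ids_a in table_a.items():
        for probe_b, ids_b in table_b.items():
            if ids_a[key] is not None and ids_a[key] == ids_b[key]:
                pairs.append((probe_a, probe_b))
    return sorted(pairs)


# Hypothetical mappings shared by both platforms.
genbank_to_unigene = {"ACC_0001": "CLUSTER_1", "ACC_0002": "CLUSTER_1"}
unigene_to_homologene = {"CLUSTER_1": "HOMOLOG_1"}

platform_a = cross_reference({"probe_A": "ACC_0001"}, genbank_to_unigene, unigene_to_homologene)
platform_b = cross_reference(
    {"probe_X": "ACC_0002", "probe_Y": "ACC_9999"},  # ACC_9999 has no UniGene entry
    genbank_to_unigene,
    unigene_to_homologene,
)
matched = match_platforms(platform_a, platform_b)
```

Probes whose accession is missing from a table keep `None` in that column, so broken links stay visible instead of silently dropping out.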
![Page 10: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/10.jpg)
From Genomic DNA to mRNA Transcripts
![Page 11: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/11.jpg)
![Page 12: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/12.jpg)
![Page 13: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/13.jpg)
From Genomic DNA to mRNA Transcripts
![Page 14: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/14.jpg)
![Page 15: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/15.jpg)
http://www.ncbi.nlm.nih.gov/Entrez/
![Page 16: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/16.jpg)
Functional Annotation of Lists of Genes
KEGG
PFAM
SWISS-PROT
GO
DRAGON
DAVID
BioConductor
![Page 17: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/17.jpg)
Analysis of Functional Gene Groups
![Page 18: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/18.jpg)
![Page 19: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/19.jpg)
![Page 20: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/20.jpg)
![Page 21: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/21.jpg)
![Page 22: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/22.jpg)
![Page 23: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/23.jpg)
![Page 24: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/24.jpg)
• One of the largest challenges in analyzing genomic data is associating the experimental data with the available metadata, e.g. sequence, gene annotation, chromosomal maps, literature.
• The annotate and AnnBuilder packages provide some tools for carrying this out.
• Using AnnBuilder, it is possible to build associations with specific gene lists, e.g. the hgu95a package for Affymetrix HGU95A GeneChips.
• The annotate package maps to GenBank accession number, LocusLink LocusID, gene symbol, gene name, UniGene cluster, chromosome, cytoband, physical distance (bp), orientation, Gene Ontology Consortium (GO), PubMed PMID.
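The idea of associating experimental data with metadata can be illustrated with a small join. This sketch is in Python rather than R and does not use the annotate or AnnBuilder API; the function name, the field names, and every value are hypothetical placeholders.

```python
def annotate_results(expression, annotation):
    """Attach annotation fields to each probe; collect probes with no annotation."""
    annotated, unannotated = [], []
    for probe_id, log_ratio in expression.items():
        record = annotation.get(probe_id)
        if record is None:
            unannotated.append(probe_id)
        else:
            annotated.append({"probe": probe_id, "log_ratio": log_ratio, **record})
    return annotated, unannotated


# Hypothetical experimental results (probe ID -> log ratio).
expression = {"probe_1": 1.8, "probe_2": -0.2, "probe_3": 0.9}

# Hypothetical annotation table for the array.
annotation = {
    "probe_1": {"symbol": "GENE_A", "chromosome": "1", "go": ["GO_TERM_1"]},
    "probe_2": {"symbol": "GENE_B", "chromosome": "X", "go": []},
}

annotated, unannotated = annotate_results(expression, annotation)
```

Reporting the unannotated probes separately makes gaps in the annotation explicit rather than hiding them.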
![Page 25: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/25.jpg)
Analysis of Functional Gene Groups
![Page 26: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/26.jpg)
Functional Gene/Protein Networks
DIP
BIND
MINT
HPRD
PubGene
Predicted Protein Interactions
![Page 27: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/27.jpg)
Analysis of Gene Networks
![Page 28: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/28.jpg)
![Page 29: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/29.jpg)
9606 is the Taxonomy ID for Homo sapiens
![Page 30: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/30.jpg)
![Page 31: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/31.jpg)
![Page 32: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/32.jpg)
![Page 33: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/33.jpg)
Predicted Human Protein Interactions
![Page 34: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/34.jpg)
Predicted Human Protein Interactions
Used high-throughput protein interaction experiments from fly, worm, and yeast to predict human protein interactions.
A human protein interaction is predicted if both proteins in an interaction pair from another organism have high sequence homology to human proteins.
>70K Hs interactions predicted
>6K Hs genes
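The prediction rule above can be sketched as follows. This is a minimal illustration, assuming homology has already been reduced to a lookup from each model-organism protein to its human homologs; the function name and all protein names are hypothetical, and skipping self-pairs is a choice made here, not something the slides specify.

```python
def predict_human_interactions(model_interactions, human_homologs):
    """Transfer an interaction to human only if BOTH partners have a human homolog."""
    predicted = set()
    for protein_a, protein_b in model_interactions:
        for human_a in human_homologs.get(protein_a, ()):
            for human_b in human_homologs.get(protein_b, ()):
                if human_a != human_b:  # assumption: self-pairs are not reported
                    predicted.add(tuple(sorted((human_a, human_b))))
    return predicted


# Hypothetical interactions observed in fly, worm, or yeast.
model_interactions = [("fly_P1", "fly_P2"), ("yeast_P3", "yeast_P4")]

# Hypothetical high-homology matches; yeast_P4 has no human homolog.
human_homologs = {
    "fly_P1": ["HS_A"],
    "fly_P2": ["HS_B", "HS_C"],
    "yeast_P3": ["HS_D"],
}

predicted = predict_human_interactions(model_interactions, human_homologs)
```

The yeast pair yields no prediction because only one of its two proteins maps to a human protein.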
![Page 35: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/35.jpg)
Analysis of Gene Networks
![Page 36: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/36.jpg)
Carlo Colantuoni
Clinical Brain Disorders Branch, NIMH, NIH
Dept. Biostatistics, ccolantu@jhsph.edu
Thanks to …
Rafael Irizarry
Scott Zeger
Jonathan Pevsner
![Page 37: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/37.jpg)
![Page 38: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/38.jpg)
NCBI Web Links
http://www.ncbi.nlm.nih.gov
http://www.ncbi.nlm.nih.gov/Entrez/
http://www.ncbi.nih.gov/Genbank/
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide
http://www.ncbi.nlm.nih.gov/dbEST/
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
http://www.ncbi.nlm.nih.gov/LocusLink/
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
http://www.ncbi.nlm.nih.gov/PubMed/
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd
http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp
http://www.ncbi.nlm.nih.gov/SNP/
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
http://www.ncbi.nlm.nih.gov/geo/
http://www.ncbi.nlm.nih.gov/RefSeq/
FTP:
ftp://ftp.ncbi.nlm.nih.gov/
ftp://ftp.ncbi.nlm.nih.gov/repository/UniGene
ftp://ftp.ncbi.nih.gov/pub/HomoloGene/
![Page 39: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/39.jpg)
NUCLEOTIDE:
http://genome.ucsc.edu/
http://www.embl-heidelberg.de/
http://www.ensembl.org/
http://www.ebi.ac.uk/
http://www.gdb.org/
http://bioinfo.weizmann.ac.il/cards/index.html
http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl
PATHWAYS and NETWORKS:
http://www.genome.ad.jp/kegg/
ftp://ftp.genome.ad.jp/pub/kegg/ (http://www.genome.ad.jp/anonftp/)
http://dip.doe-mbi.ucla.edu
http://dip.doe-mbi.ucla.edu/dip/Download.cgi
http://www.blueprint.org/bind/
http://www.blueprint.org/bind/bind_downloads.html
http://160.80.34.4/mint/index.php
http://160.80.34.4/mint/release/main.php
http://www.hprd.org/
http://www.hprd.org/FAQ?selectedtab=DOWNLOAD+REQUESTS
http://www.pubgene.org/ (also .com)
PROTEIN:
http://us.expasy.org/
ftp://us.expasy.org/
http://www.sanger.ac.uk/Software/Pfam/
http://www.sanger.ac.uk/Software/Pfam/ftp.shtml
http://smart.embl-heidelberg.de/
http://www.ebi.ac.uk/interpro/
http://us.expasy.org/prosite/
ftp://us.expasy.org/databases/prosite/
More Web Links
http://www.bioconductor.org/
http://apps1.niaid.nih.gov/david/
http://www.geneontology.org/
http://discover.nci.nih.gov/gominer/index.jsp
http://pubmatrix.grc.nia.nih.gov/
http://pevsnerlab.kennedykrieger.org/dragon.htm
![Page 40: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/40.jpg)
SAVAGE:
Detection of More Subtle Functionally Related Groups
of Gene Expression Changes
![Page 41: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/41.jpg)
EXP#1
Swiss-Prot
30K
PFAM
KEGG
~3K
10K
~40K annotations
DRAGON SAVAGE
Differential Expression of Functional Gene Groups within One Experiment
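One way to ask whether a functional gene group is differentially expressed within an experiment is a permutation test on the group's mean log ratio. The slides do not state which statistic SAVAGE uses, so the sketch below is a generic stand-in, not the SAVAGE method; the function name, gene names, and values are all hypothetical.

```python
import random
import statistics


def group_shift_p_value(log_ratios, group, n_permutations=1000, seed=0):
    """Permutation p-value: is the group's mean |log ratio| larger than for random gene sets?"""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    genes = sorted(log_ratios)
    observed = abs(statistics.fmean(log_ratios[g] for g in group))
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(genes, len(group))
        if abs(statistics.fmean(log_ratios[g] for g in sample)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical experiment: 100 genes with small log ratios in [-0.5, 0.5] ...
log_ratios = {f"gene_{i}": ((i * 37) % 21 - 10) / 20 for i in range(100)}
# ... except one functional group of five genes that is strongly up-regulated.
shifted_group = [f"gene_{i}" for i in range(5)]
for gene in shifted_group:
    log_ratios[gene] = 2.0
background_group = [f"gene_{i}" for i in range(50, 55)]

p_shifted = group_shift_p_value(log_ratios, shifted_group)
p_background = group_shift_p_value(log_ratios, background_group)
```

Comparing against random gene sets of the same size accounts for group size, which matters when annotation groups range from a handful of genes to hundreds.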
![Page 42: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/42.jpg)
EXP#4
EXP#3
EXP#2
EXP#1
BioDB
Differential Expression of a Single Functional Gene Group Across Multiple Experiments
DRAGON
SAVAGE
![Page 43: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/43.jpg)
Similar Differential Expression Patterns Across Multiple Experiments
The distribution of gene expression values for each gene group in each sample is plotted as a single point in low-dimensional space. This is achieved using Principal Components Analysis along with Non-Metric Multi-Dimensional Scaling.
[Figure: gene groups from EXP#1 and EXP#2 plotted as points; legend: p value (0.0, <0.1, ALL); point labels: 1-5, CN, X]
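The reduction of each distribution to a single point can be sketched in two steps: summarize each distribution as a short vector, then project the vectors onto a principal component. This covers only the PCA half; Non-Metric Multi-Dimensional Scaling is omitted. Summarizing by quantiles, the power-iteration solver, and all names and values are assumptions made for this illustration, not details from the slides.

```python
import statistics


def summarize(values, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Describe a distribution by a few approximate quantiles (assumed summary)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(p * last)] for p in probs]


def first_principal_component(rows, iterations=200):
    """Scores of each row on the first principal component, via power iteration."""
    n_cols = len(rows[0])
    means = [statistics.fmean(r[j] for r in rows) for j in range(n_cols)]
    centered = [[r[j] - means[j] for j in range(n_cols)] for r in rows]
    # Sample covariance matrix of the summary vectors.
    cov = [[sum(c[a] * c[b] for c in centered) / (len(rows) - 1)
            for b in range(n_cols)] for a in range(n_cols)]
    axis = [1.0] * n_cols
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * axis[b] for b in range(n_cols)) for a in range(n_cols)]
        norm = sum(x * x for x in nxt) ** 0.5
        axis = [x / norm for x in nxt]
    return [sum(c[j] * axis[j] for j in range(n_cols)) for c in centered]


# Hypothetical log ratios for three gene-group/sample combinations.
distributions = {
    "group_1_sample_1": [-0.3, -0.1, 0.0, 0.1, 0.2, 0.3],
    "group_2_sample_1": [-0.2, -0.1, 0.0, 0.1, 0.1, 0.3],
    "group_3_sample_1": [1.7, 1.9, 2.0, 2.1, 2.2, 2.3],  # shifted upward
}
summaries = [summarize(v) for v in distributions.values()]
scores = first_principal_component(summaries)
```

The two similar distributions receive nearly identical scores, while the shifted one lands far away along the first component. The sign of the scores is arbitrary, so only distances between points are meaningful.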
![Page 44: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/44.jpg)
PING:
Detection of Differential Expression in Functional
Networks of Proteins
![Page 45: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/45.jpg)
Interaction Networks in Gene Expression Data
![Page 46: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/46.jpg)
Large Protein Interaction Network
Network Regulated in Sample #1
![Page 47: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/47.jpg)
Network Regulated in Sample #1
Network Regulated in Sample #2
Large Protein Interaction Network
![Page 48: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/48.jpg)
Network Regulated in Sample #1
Network Regulated in Sample #2
Network Regulated in Sample #3
Large Protein Interaction Network
![Page 49: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/49.jpg)
Network of Interest
Network Regulated in Sample #1
Network Regulated in Sample #2
Network Regulated in Sample #3
Large Protein Interaction Network
PING
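One plausible reading of the build-up above is that the network of interest is the part of the large interaction network that is regulated in every sample. The slides do not spell out how PING defines it, so the following is only a sketch under that assumption; the function names, the rule that an edge counts as regulated when both of its proteins change, and all protein names are hypothetical.

```python
def regulated_subnetwork(network, changed_genes):
    """Keep interactions whose two partners both changed in one sample (assumed rule)."""
    return {edge for edge in network if edge[0] in changed_genes and edge[1] in changed_genes}


def network_of_interest(network, changed_per_sample):
    """Interactions regulated in every sample (assumed definition)."""
    subnetworks = [regulated_subnetwork(network, changed) for changed in changed_per_sample]
    return set.intersection(*subnetworks) if subnetworks else set()


# Hypothetical large protein interaction network as a set of undirected edges.
network = {("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")}

# Hypothetical sets of differentially expressed genes in three samples.
changed_per_sample = [
    {"P1", "P2", "P3"},
    {"P2", "P3", "P4"},
    {"P2", "P3", "P5"},
]

shared = network_of_interest(network, changed_per_sample)
```

Here only the P2-P3 interaction is regulated in all three samples.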
![Page 50: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/50.jpg)
[Figure: NT's in GenBank (millions), 1 to 100000 on a log scale, for 1984, 1994, 2004]
![Page 51: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/51.jpg)
Genomic DNA Content
1. Interspersed repeats (~1/2 Hs. genome)
2. (Processed) pseudogenes
3. Simple sequence repeats
4. Segmental duplications (~5% Hs. genome)
5. Blocks of tandem repeats (can be very large)
6. Genes: Promoters - Exons – Introns <3%
defining what a gene is - protein coding unit of genomic DNA with an mRNA intermediate
identifying genes within genomic DNA
protein-coding genes (mRNA)
functional RNA genes - tRNA, rRNA, snoRNA, snRNA, miRNA
prokaryotes eukaryotes
![Page 52: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/52.jpg)
AAAAA
Gene: Protein coding unit of genomic DNA with an mRNA intermediate.
START STOP
protein coding
5’ UTR 3’ UTR
mRNA
Genomic DNA 3.3 Gb
Protein
![Page 53: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/53.jpg)
AAAAA
Gene: Protein coding unit of genomic DNA with an mRNA intermediate.
START STOP
protein coding
5’ UTR 3’ UTR
mRNA
Genomic DNA 3.3 Gb
Protein
~30K genes
Sequence is a Necessity
![Page 54: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/54.jpg)
How is a gene defined in “wet” biology and in silico?
Seq. from mRNA sample
Seq. on array
Array probe design:
Source – cDNA libraries, oligos, clone collections
Content – UniGene, Celera, Incyte
Transcript coverage
Homology to other transcripts
Hybridization dynamics – hyper-multiplex hyb rxn
Empirical validation
3’ bias
Alt. splicing - known and not
Alt. start / stop site in same RNA molecule
Less important: RNA editing, SNPs
Cross-referencing of array probes:
GenBank <> UniGene <> HomoloGene
Possible mis-referencing:
Genomic GenBank Acc.#’s
Referenced ID has more NT’s than probe
Old DB builds
DB or table errors
![Page 55: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/55.jpg)
Finding genes in eukaryotic DNA
ORF identification – Three Letter Genetic Code (codons): 4*4*4 = 64 possible codons. It is possible to translate any stretch of genomic DNA into protein, but that doesn’t mean we have identified a protein coding gene!
There are several kinds of exons:
-- non-coding
-- initial coding exons
-- internal exons
-- terminal exons
-- some single-exon genes are intronless
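The ORF idea can be made concrete with a naive scan for an in-frame START codon followed by a STOP codon. This is a minimal sketch, assuming the forward strand only and ignoring introns, which is exactly why such a scan is not sufficient for eukaryotic gene finding; the function name, the `min_codons` threshold, and the example sequence are made up for illustration.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Four bases in three positions give 4*4*4 possible codons.
ALL_CODONS = ["".join(bases) for bases in product("ACGT", repeat=3)]


def find_orfs(dna, min_codons=3):
    """Return (start, end) of forward-strand ORFs: ATG ... first in-frame stop codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop, including the ATG.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


sequence = "CCATGAAACCCGGGTAGCC"  # made-up sequence with one short ORF
orfs = find_orfs(sequence)
orf_sequences = [sequence[start:end] for start, end in orfs]
```

Any long enough stretch of DNA will contain such ORFs by chance, so a hit from this scan is a candidate at best, not an identified protein-coding gene.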
![Page 56: Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor](https://reader036.vdocuments.site/reader036/viewer/2022062515/56649cc45503460f9498d9d6/html5/thumbnails/56.jpg)
What We Are Going To Cover
Cells, Genes, Transcripts –> Genomics Experiments
Sequence Knowledge Behind Genomics Experiments
Annotation of Genes in Genomics Experiments