Functional Annotation: Gene Function Prediction
Haibao Tang (唐海宝)
Center for Genomics and Biotechnology
November 23, 2013
Functional Annotation
?
Name that protein
?
• C2H2 zinc finger proteins
• Calmodulin and calmodulin-related calcium sensor proteins
• Cellulose Synthase Gene Family
• Cysteine Rich Peptides
• Cytochrome P450
• Early Auxin-responsive Aux/IAA Gene Family
• F-Box Proteins
• Glycosyl Hydrolase
• MADS-box family
• Serine Proteases
• WRKY family
• ……
Erythropoietin
Myostatin (muscle growth-limiting factor)
Outline
• Basic Searches to Run
• Advanced Assignments
• Protein Families
• Naming Genes
1. Basic Searches to Run
• BLAST (nucleotide or protein homology)
  – Non-redundant protein sequences (nr)
  – UniRef (UniProt: Swiss-Prot, TrEMBL)
  – Trusted genomes (TAIR)
• CDD (NCBI’s Conserved Domain Database)
• InterPro (protein families, domains and functional sites)
• HMMER or SAM (searches using statistical descriptions)
  – Pfam (database of protein families and HMMs)
  – TIGRFAMs (protein family based HMMs)
  – SCOP (structural domains)
  – TMHMM (transmembrane domains)
• SignalP (signal peptide cleavage sites)
• TargetP (subcellular location)
• Many others
Web BLAST
• NCBI BLAST: http://www.ncbi.nlm.nih.gov/blast/
• WU-BLAST: http://genome.wustl.edu/tools/blast/
• UniProt/Swiss-Prot BLAST: http://www.uniprot.org/
• Phytozome: http://www.phytozome.net/search.php
• The Gene Indices: http://compbio.dfci.harvard.edu/tgi/
• Sanger projects: http://www.sanger.ac.uk/DataSearch/
• TAIR: http://www.arabidopsis.org/Blast/index.jsp
CDD
• Collection of multiple sequence alignments
• Contains protein domain models curated at NCBI, together with models imported from outside sources such as Pfam, SMART, COGs (Clusters of Orthologous Groups of proteins), and PRK (PRotein Klusters).
InterPro
• Database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
Hidden Markov Model
• Databases of HMM domains to search:
  – Pfam: http://www.sanger.ac.uk/Software/Pfam/
  – TIGRFAMs: http://www.jcvi.org/cms/research/projects/tigrfams/overview/
  – SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
  – TMHMM: http://www.cbs.dtu.dk/services/TMHMM/
• Tools to use:
  – HMMER, HMMPFAM: http://hmmer.janelia.org/
Pfam
• For each family in Pfam you can:
  – Look at multiple alignments
  – View protein domain architectures
  – Examine species distribution
  – Follow links to other databases
  – View known protein structures
TMHMM
• Predicts transmembrane helices in integral membrane proteins using HMMs
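As a rough illustration of how an HMM can label residues, the sketch below decodes a toy two-state model (membrane "M" versus other "O") with the Viterbi algorithm. It is not the TMHMM model, which uses a much richer state architecture. The two states, the hydrophobic residue set and every probability here are assumptions chosen only for illustration.

```python
import math

# Toy parameters (assumptions for illustration, not taken from TMHMM).
HYDROPHOBIC = set("AILMFVW")
STATES = ("O", "M")  # O = outside a helix, M = membrane helix
START = {"O": 0.9, "M": 0.1}
TRANS = {"O": {"O": 0.95, "M": 0.05}, "M": {"O": 0.05, "M": 0.95}}
P_HYDROPHOBIC = {"O": 0.3, "M": 0.8}  # chance of emitting a hydrophobic residue


def emission(state: str, residue: str) -> float:
    """Probability that `state` emits a hydrophobic or non-hydrophobic residue."""
    p = P_HYDROPHOBIC[state]
    return p if residue in HYDROPHOBIC else 1.0 - p


def viterbi(sequence: str) -> str:
    """Return the most probable state path (one letter per residue)."""
    if not sequence:
        return ""
    # Log-probabilities avoid numerical underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(emission(s, sequence[0])) for s in STATES}
    backpointers = []
    for residue in sequence[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(emission(s, residue))
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

Calling `viterbi` on a sequence with a long hydrophobic stretch flanked by charged residues labels the stretch "M" under these toy parameters; real predictors also model helix length, caps and loop orientation.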
SignalP
• Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms.
• Based on a combination of artificial neural networks and HMMs.
TargetP
• TargetP predicts the subcellular location of eukaryotic proteins.
• The location assignment is based on the predicted presence of any of the N-terminal presequences:
  – chloroplast transit peptide (cTP)
  – mitochondrial targeting peptide (mTP)
  – secretory pathway signal peptide (SP)
Gene function evidence
2. Advanced Assignments
• Enzyme Commission (EC) Number: http://www.chem.qmul.ac.uk/iubmb/enzyme/
• Gene Ontology (GO) Terms
• Pathways
  – KEGG
  – MetaCyc
  – Pathway Tools
Assigning EC Number
• The EC classification scheme is a hierarchical numerical classification based on the chemical reactions that enzymes catalyze.
• Every enzyme code consists of four numbers separated by periods, e.g., EC 1.1.1.1 (alcohol dehydrogenase).
• EC numbers may be assigned computationally.
• There are many available tools and methods for predicting EC numbers and pathways.
• Common problems:
  – A computational method may not be specific enough to assign a complete EC number to an enzyme.
  – It may place a gene in an enzyme family more reliably than it identifies the specific enzyme.
  – As a result, the fourth number is often left blank (e.g., EC 1.1.1.-).
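The hierarchy in an EC number, including partially assigned codes, can be handled with a few lines of code. The sketch below is only an illustration: the function names `parse_ec` and `is_within` are hypothetical, and it assumes purely numeric levels with "-" marking an unassigned level.

```python
from typing import Optional, Tuple


def parse_ec(ec: str) -> Tuple[Optional[int], ...]:
    """Parse 'EC 1.1.1.1' or '1.1.1.-' into four levels; '-' becomes None."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    levels = tuple(None if p == "-" else int(p) for p in parts)
    # Once a level is unassigned, every deeper level must be unassigned too.
    seen_blank = False
    for level in levels:
        if level is None:
            seen_blank = True
        elif seen_blank:
            raise ValueError(f"assigned level after a blank level in {ec!r}")
    return levels


def is_within(specific: str, general: str) -> bool:
    """True if `specific` falls inside the (possibly partial) class `general`."""
    s, g = parse_ec(specific), parse_ec(general)
    return all(gl is None or gl == sl for sl, gl in zip(s, g))
```

For example, `is_within("EC 1.1.1.1", "1.1.1.-")` is true, which is the check an annotation pipeline might use when a prediction only resolves the first three levels.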
GO Terms
• Gene Ontology (Gene Ontology Consortium™) structures biological knowledge using a dynamic controlled vocabulary that applies across organisms.
• Molecular function (MF)
  – What the gene product does
  – Think ‘activity’
  – Example: ion channel activity
• Biological process (BP)
  – A biological objective
  – Examples: ion transport, transmembrane transport
• Cellular component (CC)
  – Location in the cell (or smaller unit)
  – Or part of a complex
  – Examples: membrane, plasma membrane
• You can obtain GO terms for any sequence using tools like:
  – Blast2GO
  – InterPro2GO
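One simple way to hold GO assignments is to keep the three aspects separate for each gene. The sketch below is an assumption about how such records could be organized, not the output format of any particular tool; the `GoAnnotation` type, the function name and the gene identifier in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple

# The three GO aspects, keyed by the abbreviations used on the slide.
ASPECTS = {
    "MF": "molecular function",
    "BP": "biological process",
    "CC": "cellular component",
}


class GoAnnotation(NamedTuple):
    gene: str    # gene or protein identifier
    aspect: str  # "MF", "BP" or "CC"
    term: str    # GO term name


def group_by_aspect(annotations: Iterable[GoAnnotation]) -> Dict[str, Dict[str, List[str]]]:
    """Return {gene: {aspect: [term names]}} with all three aspects present."""
    grouped: Dict[str, Dict[str, List[str]]] = {}
    for ann in annotations:
        if ann.aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {ann.aspect!r}")
        per_gene = grouped.setdefault(ann.gene, {code: [] for code in ASPECTS})
        per_gene[ann.aspect].append(ann.term)
    return grouped
```

Feeding it the slide's examples for a hypothetical gene `geneA` (ion channel activity as MF, ion transport as BP, plasma membrane as CC) yields one entry per aspect, which makes it easy to see which aspects still lack evidence.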
View Pathways
• Graphical interface for users to visualize the substrates, final products and steps in a completed pathway catalyzed by an enzyme (gene).
  – KEGG: http://www.genome.jp/kegg/tool/search_pathway.html
  – MetaCyc: http://metacyc.org
  – Pathway Tools: http://bioinformatics.ai.sri.com/ptools
Pathway Tools
3. Protein Families
Why Compute Protein Families?
• To group proteins by probable function
• To identify possible gene structure problems
• To identify evolutionary relationships between protein families
• Gene naming and Transposable Element assignment
Domain Based Protein Families (Paralogous families)
1. Protein sequences
2. Identify Pfam and all vs all BLASTP based domains
3. Families grouped based on type and number of domains
• Example: family para_246, 9 family members contain:
  – PF00027 - Cyclic nucleotide-binding domain
  – PF00520 - Ion transport protein
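Grouping by type and number of domains can be sketched as a dictionary keyed on each protein's domain composition. This is a minimal illustration that covers only Pfam accessions (not the all vs all BLASTP based domains), and the function names and protein identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Architecture = Tuple[Tuple[str, int], ...]


def architecture(domains: List[str]) -> Architecture:
    """Key a protein by which domains it has and how many copies of each."""
    return tuple(sorted(Counter(domains).items()))


def group_by_architecture(protein_domains: Dict[str, List[str]]) -> Dict[Architecture, List[str]]:
    """Place proteins with the same domain types and counts in one family."""
    families: Dict[Architecture, List[str]] = defaultdict(list)
    for protein, domains in protein_domains.items():
        families[architecture(domains)].append(protein)
    return dict(families)
```

With hypothetical proteins `p1` and `p2` each carrying one PF00027 and one PF00520 hit, and `p3` carrying only PF00520, the first two fall into one family and `p3` into another. Domain order is ignored here; keeping the order would be a stricter definition of architecture.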
OrthoMCL/TribeMCL Protein Clustering
• Markov clustering method for grouping proteins into families
• http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/TribeMCL.html
• Nucleic Acids Res. 2002 April 1; 30(7): 1575–1584.
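The core of Markov clustering is a loop that alternates expansion (squaring the matrix) with inflation (raising entries to a power and renormalizing columns). The sketch below is a small pure-Python illustration of that loop, not the OrthoMCL or TribeMCL code. The function names, the default inflation of 2.0, the added self-loops and the thresholds are assumptions, and the input is taken to be a symmetric matrix of non-negative similarity scores.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def multiply(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, adequate for small examples."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(similarity: Matrix, inflation: float = 2.0,
                   max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Cluster node indices by alternating expansion and inflation."""
    n = len(similarity)
    if n == 0:
        return []
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize_columns(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = multiply(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that retains flow lists the members of one cluster.
    clusters = {frozenset(j for j, v in enumerate(row) if v > 1e-6) for row in m}
    return sorted(sorted(c) for c in clusters if c)
```

In a protein family setting the similarity matrix would typically come from all vs all BLAST scores; higher inflation values tend to produce smaller, tighter clusters.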
4. Naming Genes
Functional Assignments
• Name: descriptive common name for the protein, with as much specificity as the evidence supports; gene symbol.
• Role: describe what the protein is doing in the cell and why.
• Associated information:
  – Supporting evidence: domains and motifs
  – EC number if protein is an enzyme
  – Paralogous family membership
Naming convention
Methods to name gene products
1. Top BLAST hit to database of choice
2. Manually aggregate evidence from multiple sources
3. Automated Assignment of Human Readable Descriptions (AHRD) https://github.com/groupschoof/AHRD
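The first method, naming by top BLAST hit, can be sketched as follows. This is a deliberately simplified illustration and not the AHRD algorithm. It assumes BLAST tabular output with the 12 default columns (query ID first, subject ID second, bit score last) and a separate, hypothetical `descriptions` mapping from subject ID to description line. The list of uninformative words and the fallback name are also assumptions.

```python
from typing import Dict, List, Tuple

# Descriptions containing these words are skipped (illustrative list).
UNINFORMATIVE = ("hypothetical", "uncharacterized", "unknown", "predicted protein")


def name_from_blast(tabular: str, descriptions: Dict[str, str],
                    default: str = "hypothetical protein") -> Dict[str, str]:
    """Name each query after its best-scoring hit with an informative description."""
    hits: Dict[str, List[Tuple[float, str]]] = {}
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        hits.setdefault(query, []).append((bitscore, subject))

    names: Dict[str, str] = {}
    for query, scored in hits.items():
        names[query] = default
        # Walk hits from highest to lowest bit score.
        for _bitscore, subject in sorted(scored, reverse=True):
            description = descriptions.get(subject, "")
            if description and not any(w in description.lower() for w in UNINFORMATIVE):
                names[query] = description
                break
    return names
```

Skipping uninformative descriptions is the main reason not to take the single top hit blindly: the best-scoring subject is often itself an unnamed protein from another automated annotation.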
Automated Human Readable Description (AHRD)
https://github.com/groupschoof/AHRD
Exercises
• Given a protein sequence, name the protein
• Use online tools to find structural and functional domains