Developing novel web-based Bioinformatics analysis tools for Comparative Genomics
Kashi Vishwanath Revanna, Capstone Presentation, May 1, 2009
Primary Advisor: Dr. Qunfeng Dong, The Center for Genomics and Bioinformatics (CGB)



Introduction
• Comparative genomics
  ▫ The analysis and comparison of genomes from different species.
• Used to identify
  ▫ gene duplications
  ▫ gene inversions
  ▫ gene translocations
  ▫ gene clusters
  ▫ orthologs and paralogs

Overview
• BLAST Output Visualization (BOV) Tool
  ▫ visual representation of BLAST output
  ▫ Perl scripts from Rajesh Gollapudi, CGB
• Comparative Genome Cluster Viewer (CGCV)
  ▫ gene clusters across multiple genomes
  ▫ database developed by Vivek Krishnakumar, CGB
• Multiple Genome Browser (MGB)
  ▫ synteny regions between genomes

BOV: BLAST Output Visualization Tool

Motivation
• A commonly used tool for comparative genomics is the Basic Local Alignment Search Tool (BLAST)*.
  ▫ Runs as a web service at NCBI or as a standalone local installation.
  ▫ Input: nucleotide or protein sequence(s).
  ▫ Database: nucleotide sequences of genes or genomes, or protein sequences.
  ▫ Output: text format.
• BLAST output consists of High-scoring Segment Pairs (HSPs), each corresponding to a matching region between the query and a database hit sequence.
• Interpreting these regions manually can be difficult.

*Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
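To make the HSP definition concrete, here is a minimal Python sketch of one HSP record. It is not BOV's parser (BOV uses Perl and BioPerl on the standard pairwise report); it assumes instead the 12-column tabular format (`-m 8` in legacy `blastall`, `-outfmt 6` in BLAST+), and the `HSP` class and `parse_tabular_line` function are hypothetical names.

```python
from dataclasses import dataclass

# Column order of BLAST tabular output (-m 8 in legacy blastall, -outfmt 6 in BLAST+).
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class HSP:
    """One High-scoring Segment Pair (hypothetical record type)."""
    query: str
    hit: str
    identity: float  # percent identity
    length: int      # alignment length
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bitscore: float

    @property
    def strand(self) -> str:
        # A reversed subject interval (start > end) marks a minus-strand match.
        return "-" if self.s_start > self.s_end else "+"


def parse_tabular_line(line: str) -> HSP:
    """Turn one tab-separated BLAST result line into an HSP record."""
    fields = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
    return HSP(
        query=fields["qseqid"], hit=fields["sseqid"],
        identity=float(fields["pident"]), length=int(fields["length"]),
        q_start=int(fields["qstart"]), q_end=int(fields["qend"]),
        s_start=int(fields["sstart"]), s_end=int(fields["send"]),
        evalue=float(fields["evalue"]), bitscore=float(fields["bitscore"]),
    )
```

For nucleotide searches, a subject interval reported in reverse (start greater than end) indicates a match on the minus strand, which the `strand` property exposes.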

Requirement
• Post-processing of BLAST output.
• Existing programs can
  - flexibly select BLAST matching regions (e.g., MuSeqBox, BioParser);
  - parse the output into a database to facilitate keyword search (e.g., the NuclearBLAST program, the PLAN web server).

Need
• A tool that graphically represents the HSPs extracted from BLAST output and provides options to select and analyze them interactively.
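Selecting matching regions, as the programs listed above allow, amounts to filtering HSPs by user-chosen thresholds. A minimal Python sketch, assuming a simplified hypothetical `HSP` record and an arbitrary default e-value cutoff of 1e-5 (not a value taken from BOV):

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class HSP:
    """Simplified, hypothetical HSP record."""
    query: str
    hit: str
    identity: float  # percent identity
    length: int      # alignment length
    evalue: float


def select_hsps(hsps: Iterable[HSP], max_evalue: float = 1e-5,
                min_identity: float = 0.0, min_length: int = 0,
                hit_name: Optional[str] = None) -> List[HSP]:
    """Keep only the HSPs that pass every user-chosen threshold."""
    return [h for h in hsps
            if h.evalue <= max_evalue
            and h.identity >= min_identity
            and h.length >= min_length
            and (hit_name is None or h.hit == hit_name)]
```

In an interactive tool each keyword argument would map to a form control, so re-running the filter with new values redraws only the HSPs the user still wants to see.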

Specifications
• Develop a tool to
  ▫ parse uploaded BLAST output;
  ▫ extract HSP coordinates;
  ▫ store the information in a database;
  ▫ provide a summary of query sequences and their corresponding hit sequences;
  ▫ generate a visual representation of HSPs;
  ▫ allow users to manipulate the HSPs.
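The per-query summary in the specification can be sketched as a grouping step: for each query, list its hit sequences with the number of HSPs on each. This is a hypothetical Python illustration operating on (query, hit) pairs, not BOV's implementation:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Map each query to its hit sequences and the number of HSPs per hit."""
    summary: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for query, hit in pairs:
        summary[query][hit] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {query: dict(hits) for query, hits in summary.items()}
```

For example, `summarize([("q1", "chrA"), ("q1", "chrA"), ("q1", "chrB")])` groups two HSPs under `chrA` and one under `chrB` for query `q1`.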

Implementation
• CGB server (Perl 5, Linux platform)
• Web interface (DHTML, Perl, CGI)
• BLAST output (BLASTN/P/X, TBLASTN/X)
• Perl scripts (BioPerl modules)
• MySQL (HSPs, projects, ..)
• Email
• Summary
• Create image (Perl GD library)
• Visualization (JavaScript)
• Download (sequences, HSPs, image, ..)
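Drawing HSPs (done in BOV with the Perl GD library) starts by mapping sequence coordinates onto pixel positions. The sketch below shows only that arithmetic in Python; the function name `to_pixels`, the 1-based inclusive coordinates, and the fixed 10-pixel margin are assumptions, and no drawing library is used:

```python
from typing import Tuple


def to_pixels(start: int, end: int, seq_length: int,
              image_width: int, margin: int = 10) -> Tuple[int, int]:
    """Map a 1-based, inclusive sequence interval onto horizontal pixel positions.

    Reversed intervals (start > end, e.g. minus-strand hits) are normalized so
    the left edge is always the smaller coordinate.
    """
    if seq_length <= 0 or image_width <= 2 * margin:
        raise ValueError("sequence length and drawable width must be positive")
    lo, hi = sorted((start, end))
    scale = (image_width - 2 * margin) / seq_length  # pixels per residue
    x1 = margin + round((lo - 1) * scale)
    x2 = margin + round(hi * scale)
    return x1, x2
```

For a 1,000-residue query drawn 520 pixels wide, the scale is 0.5 pixels per residue, so positions 1 to 500 map to pixels 10 to 260.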

Screenshots
  ▫ BLAST output submission
  ▫ Query information


Program Release
• BOV version 1.0.7 is live and hosted at
  ▫ http://bioportal.cgb.indiana.edu/bov
• Web pages provide
  ▫ an in-depth tutorial on using the tool;
  ▫ a download and installation manual.

Publication
• Rajesh Gollapudi*, Kashi Vishwanath Revanna*, Chris Hemmerich, Sarah Schaack, and Qunfeng Dong (2008). BOV - A Web-based BLAST Output Visualization Tool. BMC Genomics. 2008 Sep 15;9(1):414.

* contributed equally

CGCV: Comparative Genome Cluster Viewer

Motivation
• Standard practice in comparative genomics:
  ▫ identification of conserved gene clusters across multiple genomes.
• Existing tools rely on pre-computation strategies and algorithms that are genome-wide and computationally intensive.
• Genome-wide orthologs for all gene families are typically identified as reciprocal best BLAST hits.
• Limitations:
  ▫ no single set of BLAST parameters is optimal for all gene families;
  ▫ distinguishing orthologs from paralogs on a genome-wide scale is difficult;
  ▫ updates are time-consuming when new organisms become available.

Requirement
• An up-to-date database.
• A tool that considers only a selected set of genes, performs a dynamic search against selected genomes, and interactively visualizes gene cluster conservation across those genomes.
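The reciprocal best BLAST hit approach that existing tools pre-compute can be sketched as follows: gene a in genome A and gene b in genome B are paired only if each is the other's highest-scoring hit. This is a simplified Python illustration (ties keep the first hit seen; inputs are hypothetical (query, subject, bit score) triples), not code from CGCV:

```python
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[str, str, float]  # (query, subject, bit score)


def best_hits(rows: Iterable[Row]) -> Dict[str, str]:
    """For each query keep the subject with the highest bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, bitscore in rows:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Row],
                         b_vs_a: Iterable[Row]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}
```

If gene a2's best hit b2 itself scores higher against a1, a2 gets no partner at all; this is one way genome-wide pairing struggles with paralogs.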

Specification
• Develop a web-based tool that
  ▫ maintains a database of prokaryotic and eukaryotic sequences and annotated gene information;
  ▫ keeps the database in sync with NCBI and Ensembl;
  ▫ uses the BLAST program to search uploaded query sequences;
  ▫ lets the user select the BLAST database and parameters;
  ▫ generates a phylogenetic profiling table, i.e., the count of HSPs against each genome for each query sequence;
  ▫ provides interactive tools to manipulate the visual representation of gene clusters across genomes.
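The phylogenetic profiling table defined above is a count matrix indexed by query and genome. A minimal Python sketch, assuming each HSP has already been reduced to a hypothetical (query, genome) pair; `profiling_table` and `format_table` are illustrative names:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def profiling_table(hsps: Iterable[Tuple[str, str]], queries: List[str],
                    genomes: List[str]) -> Dict[str, Dict[str, int]]:
    """Rows are query sequences, columns are genomes, cells are HSP counts."""
    counts = Counter(hsps)  # missing (query, genome) keys count as 0
    return {q: {g: counts[(q, g)] for g in genomes} for q in queries}


def format_table(table: Dict[str, Dict[str, int]], genomes: List[str]) -> str:
    """Render the table as tab-separated text with a header row."""
    lines = ["query\t" + "\t".join(genomes)]
    for query, row in table.items():
        lines.append(query + "\t" + "\t".join(str(row[g]) for g in genomes))
    return "\n".join(lines)
```

A zero cell marks a genome with no detectable match for that query, which is what makes the table useful for spotting where a gene cluster is conserved.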

Implementation
• CGB server (Perl 5, Linux platform)
• Web interface (DHTML, Perl, CGI, Ajax)
  - Select genomes
  - Query sequences
• BLAST program
• Perl scripts (BioPerl modules)
• Email
• Phylogenetic profiling table
• Create image (Perl, GD library)
• Visualization (JavaScript)
• NCBI
• Ensembl
• Perl scripts (download, daily updates)
• GFF format files
• MySQL (sequences, GFF, GTF)
• Database (CGB)
• Download (BLAST output, ..)
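Annotation fetched from NCBI and Ensembl is stored in GFF/GTF form. As an illustration of how gene order along a chromosome can be recovered from such files, here is a minimal GFF3 reader in Python; it ignores GTF and GFF3 escaping rules beyond percent-decoding attribute values, and the `Feature`, `parse_gff3`, and `gene_order` names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List
from urllib.parse import unquote


@dataclass
class Feature:
    seqid: str
    source: str
    feature_type: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str] = field(default_factory=dict)


def parse_gff3(lines: Iterable[str]) -> List[Feature]:
    """Read the nine tab-separated GFF3 columns of each feature line."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, ## directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. embedded FASTA)
        attrs = {}
        for item in cols[8].split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                attrs[key] = unquote(value)
        features.append(Feature(cols[0], cols[1], cols[2], int(cols[3]),
                                int(cols[4]), cols[6], attrs))
    return features


def gene_order(features: List[Feature]) -> List[Feature]:
    """Genes sorted by sequence ID and start coordinate."""
    genes = [f for f in features if f.feature_type == "gene"]
    return sorted(genes, key=lambda f: (f.seqid, f.start))
```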

Screenshots

Program Release
• CGCV version 1.0.5 is live and hosted at
  ▫ http://cgcv.cgb.indiana.edu/
• Web pages also provide
  ▫ an in-depth tutorial on using the tool;
  ▫ a step-by-step procedure for local installation;
  ▫ information on database updates.

Publication
• Kashi Vishwanath Revanna, Vivek Krishnakumar & Qunfeng Dong (2009). A web-based software system for dynamic gene cluster comparison across multiple genomes. Bioinformatics, 25(7):956-957.

MGB: Multiple Genome Browser

Motivation
• Comparative genomics involves determining the synteny regions between two or more genomes.
• Synteny is the preserved order of genes between related species.
• Currently available tools such as SynBrowse* visualize synteny between genomes, but they require pre-computed alignments.

* Pan X, Stein L, Brendel V: SynBrowse: a synteny browser for comparative sequence analysis. Bioinformatics 2005, 21(17):3461-3468.
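Synteny as defined above can be checked once genes are ordered along each genome and paired with their counterparts. The Python sketch below uses a deliberately strict, hypothetical definition: a block is a run of at least `min_genes` genes that are adjacent in both genomes, in the same or the reversed order (an inversion). Real synteny tools tolerate gaps and rearrangements, and MGB is meant to let users choose their own comparison method, so this is not MGB's algorithm:

```python
from typing import Dict, List, Tuple

Block = List[Tuple[str, str]]


def syntenic_blocks(order_a: List[str], order_b: List[str],
                    orthologs: Dict[str, str], min_genes: int = 2) -> List[Block]:
    """Return runs of gene pairs that are adjacent in both genomes.

    order_a, order_b: gene IDs sorted by position along one chromosome each.
    orthologs: maps a gene in A to its counterpart in B.
    A change of direction ends a block; the next block starts at the current gene.
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks: List[Block] = []
    current: Block = []
    step = 0  # +1 for same order, -1 for reversed order, 0 while undecided
    for gene in order_a:
        partner = orthologs.get(gene)
        if partner not in pos_b:
            # A gene without a counterpart in B breaks the current run.
            if len(current) >= min_genes:
                blocks.append(current)
            current, step = [], 0
            continue
        if current:
            diff = pos_b[partner] - pos_b[current[-1][1]]
            if diff in (1, -1) and (step == 0 or diff == step):
                current.append((gene, partner))
                step = diff
                continue
            if len(current) >= min_genes:
                blocks.append(current)
        current, step = [(gene, partner)], 0
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks
```

For example, with genome A ordered a1 to a5 and genome B ordered b3, b2, b1, bx, b4, b5, the function returns an inverted block (a1/b1, a2/b2, a3/b3) followed by a same-order block (a4/b4, a5/b5).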

Specification
• Develop a web-based tool for visualizing synteny across multiple genomes.
• Allow users to determine synteny with their choice of sequence comparison methods/tools.
• Keep the tool portable, with a simple installation procedure.

Progress
• This tool is currently under development.
• Expected completion: end of June.

Conclusion
• Web-based tools were built to assist biologists in comparative genomics.
• The work covered design, implementation, testing, maintenance, and user support.
• The tools balance usability, functionality, and portability.
• Future work
  ▫ further development;
  ▫ incorporation of these tools into biologists' workflows.

References
• Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
• Dong Q, Lawrence CJ, Schlueter SD, Wilkerson MD, Kurtz S, Lushbough C, Brendel V: Comparative plant genomics resources at PlantGDB. Plant Physiol 2005, 139(2):610-618.
• Xing L, Brendel V: Multi-query sequence BLAST output examination with MuSeqBox. Bioinformatics 2001, 17(8):744-745.
• Catanho M, Mascarenhas D, Degrave W, de Miranda AB: BioParser: a tool for processing of sequence similarity analysis reports. Appl Bioinformatics 2006, 5(1):49-53.
• Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehvaslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E: The Bioperl toolkit: Perl modules for the life sciences. Genome Res 2002, 12(10):1611-1618.
• Pan X, Stein L, Brendel V: SynBrowse: a synteny browser for comparative sequence analysis. Bioinformatics 2005, 21(17):3461-3468.
• Wang H, Su Y, Mackey AJ, Kraemer ET, Kissinger JC: SynView: a GBrowse-compatible approach to visualizing comparative genome data. Bioinformatics 2006, 22(18):2308-2309.
• Fong C, et al.: PSAT: a web tool to compare genomic neighborhoods of multiple prokaryotic genomes. BMC Bioinformatics 2008, 9:170.
• Koski LB, Golding GB: The closest BLAST hit is often not the nearest neighbor. J Mol Evol 2001, 52:540-542.
• Markowitz VM, et al.: The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Res 2008, 36:D528-D533.
• Uchiyama I, et al.: CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes. BMC Bioinformatics 2006, 7:472.

Acknowledgment
• Dr. Qunfeng Dong
  ▫ Bioinformatics Director, The Center for Genomics and Bioinformatics (CGB)
• Bioinformatics faculty and staff, School of Informatics.
• Friends and colleagues at CGB for their support and resources.
• Special thanks to my family.

Thank You.
