MSI All Staff Galaxy-P July 2013



Post on 29-Jul-2020


Page 1: MSI All Staff Galaxy-P July 2013galaxyp.org/wp-content/uploads/2017/08/MSI-Galaxy-P-July... · 2017-08-31 · Title: ABI Development -Galaxy-P: A new community-based informatics paradigm

Galaxy-P Update: Project-based approach to developing resources for proteomics research.

• Brief History of Galaxy.
• Tools and WorkFlows.
• NSF Grant: Highlights.
• Galaxy-P Team.
• Project-based Approach.
• Proteogenomics Workflow.
• ASMS Posters and Workshop.
• Public Server at ASMS.
• ASMS Workshop.
• Benefits to the University.
• Galaxy-P Year Two: Challenges and Opportunities.
• MSI’s Role.
• Summary.
• Acknowledgements.


Brief History of Galaxy.

What is Galaxy? A web-based bioinformatics data analysis platform.

Originally designed to address issues in genomic informatics including:
• Software accessibility and usability
• Analytical transparency
• Reproducibility
• Scalability
• Share-ability

Goecks J, Nekrutenko A, Taylor J; Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11(8):R86.


GALAXY-P https://galaxyp.msi.umn.edu/

Galaxy-P 101 Page: Building up and using a proteomics workflow


Tools and WorkFlows


NSF Grant: Highlights

Title: ABI Development - Galaxy-P: A new community-based informatics paradigm for MS-based proteomics
• Funded via the NSF Advances in Biological Informatics program
• 3 years of funding; effective July 15, 2012-June 30, 2015

Grant objective in a nutshell: We propose to extend the Galaxy framework for genomics by deploying and integrating a series of key software programs for MS-based proteomics data analysis, thus creating Galaxy Tool Modules for Proteomics which we refer to as Galaxy-P.


NSF Grant: Specific Aims

• Aim 1: Develop Galaxy-P specifications and analytical modules to address informatics needs and practices of MS-based proteomics
  - Implement high-value tools for MS-based proteomics workflow development
• Aim 2: Develop integrated workflow solutions in Galaxy-P for emerging applications in proteogenomics and metaproteomics
  - Implement tools and develop workflows addressing under-served challenges of these applications
• Aim 3: Develop workflow recommendations in Galaxy-P
  - Put tools into action, test and establish default workflow parameters
• Aim 4: Promote usage of Galaxy-P by the greater research community and develop a local area network for training undergraduates in computational systems biology

Broader Impact goals: activities for promoting Galaxy-P usage by community; summer internships for undergraduates.


NSF Grant: Proposed Plans


Galaxy-P Team

Participant (unit, effort): Role
• Tim Griffin (BMBB, 17%): PI
• Pratik Jagtap (MSI, 17%): Senior personnel; management, planning, evaluation; analyst
• Ben Lynch (MSI, 17%): Senior personnel; management, planning, evaluation
• John Chilton (MSI, 50%): Lead developer
• James Johnson (MSI, 33%): Developer (lead developer for ongoing Galaxy work at UofM)
• Getiria Onsongo (RISS/MSI, 25%): Analyst / Developer, emphasis on integrating genomic and proteomic workflows in Galaxy-P
• Shuxia Zhang (MSI, 3%): Parallelization support
• Ebbing de Jong (BMBB, 30%): Wet-bench/instrumental data generation and end-user
• Anne-Françoise Lamblin (RISS): Collaborator/consultant; RISS manager; leader in Galaxy deployment and development at UofM
• Anton Nekrutenko (Penn St U, 10%): Lead PI on Galaxy development; connectivity to larger Galaxy effort
• Nate Coraor (Penn St U, 25%): Lead developer of Galaxy; connectivity to newest Galaxy releases; assist in modifying Galaxy framework to better accommodate proteomics data
• Center for Mass Spectrometry and Proteomics (CMSP) staff (LeeAnn Higgins)


Project-based Approach

Strategy for Galaxy-P development:
• “Shotgun” approach (i.e. add as many useful tools as possible)
or
• Project-based approach (science-driven, user-analyst-developer team model)

Our choice: Project-based approach
• User (e.g. de Jong, CMSP staff … non-expert biologist)
• Analyst (e.g. Jagtap, Onsongo)
• Developer (e.g. Chilton, Johnson)


GALAXY-P PROJECTS https://galaxyp.msi.umn.edu/

PROJECT: USERS
• PROTEOGENOMICS: GRIFFIN, HERING.
• iTRAQ QUANTITATION: HERING, WENDT, BHARGAVA.
• DUAL LABELING QUANTITATION: ARRIAGA.

TEAM OF DEVELOPERS AND ANALYSTS: CHILTON, JAGTAP, JOHNSON, LYNCH, ONSONGO.


ITRAQ QUANTITATION (PIG ISLET DATASET) (Dr. Bernhard Hering)

Quantitative proteomic comparison of high quality pig islets for transplantation for type 1 diabetes treatment with low quality islets that are unsuitable for transplantation.

GOALS
• Create a resource for iTRAQ quantitation at protein and peptide levels (including differential expression of post-translational modifications; amino-acid substitutions and novel isoform analysis).

Workflow steps (seven numbered boxes on the slide):

Generating Databases
• Organism database (UniProt)

Generating peaklists from RAW data
• msconvert
• MGF converter

Searching datasets
• ProteinPilot.
• X!Tandem, MyriMatch, TagRecon, OMSSA. *

QC Analysis *
• Metrics about input spectra, iTRAQ labeling etc.

Quantitative Analysis
• Protein-level Analysis (iQuant and ProteinPilot)
• Peptide-level Analysis (iQuant)

Differential expression analysis *
• Use text manipulation and statistical tools in Galaxy framework.

Pathway Analysis *
• Biological pathway analysis tools such as DAVID and STRING


DUAL LABELING PROJECT (Arriaga Lab)

A resource to quantitate dual-labeled SILAC and iTRAQ datasets.

GOALS
• Create a resource for quantitation at protein level using SILAC and iTRAQ labeling.
• Test datasets with known ratios.
• Generate results demonstrating use on real datasets.

Workflow steps (eight numbered boxes on the slide):

Generating Databases
• Organism db (UniProt).

Generating peaklists from RAW data
• msconvert

Searching datasets (SILAC)
• MaxQuant.
• Command-line interface, Open-source software for SILAC. *

Searching datasets (iTRAQ)
• iQuant.

QC Analysis *
• Basic metrics about input spectra, iTRAQ and SILAC labeling efficiency, average intensity and ratios.

Quantitative Analysis
• Protein-level iTRAQ Analysis (iQuant). Compare Heavy-iTRAQ and Light-iTRAQ.
• Protein-level SILAC Analysis.

Differential expression analysis *
• Use text manipulation and statistical tools in Galaxy framework.

Pathway Analysis *
• Biological pathway analysis tools such as DAVID and STRING
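The goal of testing datasets with known ratios can be illustrated with a small check. This is only a sketch under stated assumptions, not the method used by iQuant or MaxQuant: the function names, the (heavy, light) intensity pairs and the log2 tolerance of 0.25 are all hypothetical choices made for illustration.

```python
import math
from statistics import median
from typing import List, Tuple


def median_log2_ratio(pairs: List[Tuple[float, float]]) -> float:
    """Median log2(heavy / light) over proteins with positive intensities."""
    logs = [math.log2(heavy / light) for heavy, light in pairs
            if heavy > 0 and light > 0]
    if not logs:
        raise ValueError("no protein has both intensities above zero")
    return median(logs)


def matches_known_ratio(pairs: List[Tuple[float, float]],
                        expected_ratio: float,
                        tol_log2: float = 0.25) -> bool:
    """True if the observed median ratio is within tol_log2 of the expected one.

    tol_log2 is an illustrative tolerance, not a recommended value.
    """
    observed = median_log2_ratio(pairs)
    return abs(observed - math.log2(expected_ratio)) <= tol_log2


if __name__ == "__main__":
    # Toy (heavy, light) intensities for three proteins mixed at roughly 2:1.
    toy = [(200.0, 100.0), (410.0, 200.0), (95.0, 50.0)]
    print(median_log2_ratio(toy))
    print(matches_known_ratio(toy, expected_ratio=2.0))
```

Working in log2 space keeps a 2:1 and a 1:2 mixture symmetric around zero, and the median limits the influence of a few outlier proteins.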


PROTEOGENOMICS PROJECT

Proteogenomics study identifies protein sequences that are not yet annotated in the predicted proteome of the organism under study. It provides information for gene annotation and genomic structure - thus enhancing our understanding of genomes.

GOALS
• Create a resource for proteogenomics that includes all steps from creating db to genomic context analysis.
• Show results using Galaxy-P for salivary dataset, oral exudate dataset and pig islet dataset (Dr. Bernhard Hering).
• Create validated metaproteome and proteogenome datasets for future spectral library searches.

Workflow steps (nine numbered boxes on the slide):

Generating Databases
• 3-frame EST database.
• AUGUSTUS database.
• Junctions database. *

Generating peaklists from RAW data
• msconvert

Searching datasets
• ProteinPilot.
• X!Tandem, MyriMatch, MS-GF+, OMSSA *

One Step
• Search with target-decoy db.
• Peptide identification at 5% local FDR

Two Step
• Search with forward database.
• Search with target-decoy version of subset db.
• Peptide identification at 5% local FDR

BLAST Search against NCBI human nr db
• Select peptides based on BLAST output thresholds.

Spectra for peptides of interest
• Spectral characteristics and coordinates.

Validation
• Spectral filtering.
• De novo analysis.
• Spectral visualization.
• Re-searching data.

Genomic context analysis *
• Overlay annotated sequences over genomic sequences.
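The filtering step that follows a target-decoy search can be sketched in a few lines. Note that this is not the local FDR calculation named on the slide: it swaps in the simpler global estimate, in which the FDR at a score cutoff is the number of decoy hits divided by the number of target hits at or above that cutoff. The function name `fdr_threshold`, the (score, is_decoy) input shape and the toy scores are assumptions made for illustration.

```python
from typing import List, Tuple


def fdr_threshold(psms: List[Tuple[float, bool]], max_fdr: float = 0.05) -> float:
    """Return the lowest score cutoff whose decoy-based FDR estimate is <= max_fdr.

    psms holds (score, is_decoy) pairs, where a higher score is a better match.
    Returns float("inf") when no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = float("inf")
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate a cutoff only once every PSM sharing this score is counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    # Toy peptide-spectrum matches: (score, is_decoy).
    toy = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
    cutoff = fdr_threshold(toy, max_fdr=0.05)
    accepted = [s for s, is_decoy in toy if s >= cutoff and not is_decoy]
    print(cutoff, accepted)
```

In the two-step approach described above, the same kind of filter would be applied to the second search, which runs against the target-decoy version of the subset database.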


Proteogenomics Workflow


Galaxy-P provides an integrated platform for every step of proteogenomic analysis.

• Build target database – download and translate EST databases or perform gene prediction with Augustus.

• Numerous tools for identification and text manipulation.

• Workflow utilizing BLAST to identify novel peptides.

• Tool to assess peptide-spectrum matches and visualize spectra.

• Visualize identified peptides on the genome.

150 steps: Seamless, integrated proteogenomic workflow. http://z.umn.edu/pgasms2013
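The "translate EST databases" step can be pictured with a minimal three-frame translation sketch. It assumes the standard genetic code and the three forward reading frames only; the function names are hypothetical, and this is not the Galaxy-P tool itself. Codons containing characters other than A, C, G or T are written as X.

```python
from itertools import product
from typing import List

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int) -> str:
    """Translate one forward frame (0, 1 or 2); '*' marks a stop codon."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq: str) -> List[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frame_translation("ATGGCCTAA")):
        print(frame, protein)
```

A database built for searching would typically also split each translation at stop codons and discard very short fragments; that step is left out here to keep the sketch short.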


ASMS Posters and Workshop

• Quantitative Analysis of the C2C12 and Mouse Skeletal Muscle Proteomes using a Multiplexing Strategy. Michelle Henderson, John Chilton, Getiria Onsongo, Pratik Jagtap, Edgar Arriaga.

• Automated Quantification and Analysis of SILAC-iTRAQ Dual-labeled Data. Getiria Onsongo, John Chilton, Michelle Henderson, Timothy J. Griffin, Pratik Jagtap, Edgar Arriaga.

• A comprehensive characterization of the pig islet proteome: PTMs, amino acid substitutions and novel isoforms. Ebbing de Jong, Bernhard Hering, Pratik Jagtap, John Chilton, Getiria Onsongo, Timothy J. Griffin.

• Galaxy-P: Transforming MS-based proteomic informatics via innovative workflow development, dissemination, standardization and transparency. Griffin, T.J., Chilton, J., Johnson, J., de Jong, E.P., Onsongo, G., Jagtap, P.

• Building Proteomic Application Platforms for Cloud Computing Environments with CloudBioLinux. Chilton, J., Zenka, R., Jagtap, P., Lynch, B., Bergen, H.R., Griffin, T.J.

• Reproducible Proteomic Workflows using Extensions to the Galaxy Framework. Johnson, J., Chilton, J., Jagtap, P., Lynch, B., Griffin, T.J.

• Bronchoalveolar Lavage Fluid Protein Profiling In ARDS: Early Differences Between Survivors And Non-Survivors. M Bhargava, T Becker, LA Higgins, P Jagtap, S Day, M Steinbach, B Wu, V Kumar, PB Bitterman, DH Ingbar, CH Wendt.

• An integrated Systems Biology Platform for complete Proteogenomic Analysis. Pratik Jagtap, John Chilton, Ebbing de Jong, James Johnson, Joel Kooren, Getiria Onsongo, Sri Bandhakavi, Timothy Griffin.

• WORKSHOP: The Galaxy framework as a solution for MS-based informatics.


Public Server at ASMS


ASMS Workshop

• An overview was presented by UMN and MSI at a workshop entitled “The Galaxy framework as a solution for MS-based informatics” at the ASMS conference.

• This involved researchers at UMN leading the talks along with speakers from Europe and Australia.


Benefits to the University

iTRAQ Analysis platform: Users benefitted: Griffin, Hering, Arriaga, Bhargava, Wendt. Developed workflows for protein quantification using iTRAQ. Will be beneficial for most CMSP customers.

Resource for proteogenomic analysis: Users benefitted: Griffin, Hering, Wendt and others. Developed a novel two-step method which has been noted for its sensitivity. This method will be beneficial for current and future proteogenomic studies.

Resource for metaproteomic analysis: Users benefitted: Rudney and Griffin. Developed a resource for the proteomics community that makes metaproteomics analysis simpler by use of tools and workflows.

Dual labeling project: Users benefitted: Arriaga. Developed a resource / workflow for the user and the proteomics community in general.

Metabolic labeling: Users benefitted: Hegeman and Libourel. Implemented tools and are developing workflows for metabolic labeling and flux analysis.

Other benefits: Tutorials and screencasts; access to HPC resources; Q-Exactive analysis; development of analytical platforms; collaboration opportunities.


Galaxy-P Year Two: Challenges and Opportunities

• Expanding usage (within and outside UMN).
  - Screencasts and tutorials.
  - Collaboration opportunities.
  - A system for routine projects, but also the ability to pursue ambitious / risky projects.
  - Sharing workflows through manuscripts.
  - Stable system with tested software and adequate documentation.

• Integration with HPC resources.
  - So that “seemingly impossible” projects can be addressed.

• Resource for Systems Biology.


Potential Collaborations

Smith Lab Chemistry University of Wisconsin (http://smith.chem.wisc.edu/)

Contact person: Gloria Sheynkman.

Opportunity: Build on the database generation and genomic context visualization portions of the workflow.

Status: Galaxy-P Team already in touch with Smith Lab.

CHORUS (https://chorusproject.org/pages/about.html)

Contact person: Michael MacCoss. (University of Washington)

Opportunity: Analytical portion of the Chorus Project.

Status: Tim and John have already discussed this with Michael and Andrey.


Manuscripts

Integrated proteogenomic analysis.
o Integrated, seamless workflow with flexible components.
o Page that describes the method.
o Share workflow on the website.
o Describe underlying tools.

Simultaneous metaproteomic and proteogenomic analysis of OPML datasets.
o Simultaneous analysis.
o Page that describes the method.
o Share workflow on the website.

Galaxy workflows for generating proteogenomic databases.
o RNA Seq (Sheynkman)
o ORFome / Micropeptidome.
o 6-frame.
o 3-frame.
o Commentary on different kinds of databases and need to search with microbial databases.

PSM Evaluation.

Multifile analysis.

LWR (Light Weight Runner) to run Windows applications.

JGalaxy.


Summer Undergraduate Interns

Cody Wang (Carleton College) and Fredrik Sadler (St. Olaf College) are working at MSI as summer interns.

Fredrik is working on documentation for various tools within Galaxy-P. He will also assist with tutorials and workflow documentation, and he is evaluating thresholds for the PSM Evaluation tool.

Cody is working with the App/Dev group to upgrade and develop tools within Galaxy-P. He is also developing a graphical user interface for the PSM Evaluation tool.


MSI’s Role

• Close interaction with PI and users for conceptualization, implementation and promotion of the project.

• Involvement of all groups – App/Dev, USS, HOPs.

• Project-driven approach to prioritize software implementation in Galaxy-P.

• Underlying these implementations are several modifications to the Galaxy framework enabling proteomic data analysis:
  - a server application and job runner for remote execution on a Windows server for Windows-based programs;
  - batch submission and processing of multiple data files;
  - extension of the open-source CloudBioLinux platform to work with the complex web-based Galaxy-P framework, enabling deployment in cloud environments.


SUMMARY

• Over the past year, we have followed a project-driven approach to prioritize software implementation in Galaxy-P. The focus has been on applications to challenging protein identification projects requiring innovative workflows.

• Underlying these implementations are several modifications to the Galaxy framework enabling proteomic data analysis.

• Developed specialized software implementations and innovations, resulting in a fully automated workflow for metaproteomic / proteogenomic analysis.

• This effort across groups (App/Dev, RISS, HOPs and users) resulted in 9 posters (http://z.umn.edu/asms2013) that were presented at the American Society for Mass Spectrometry (ASMS) Conference held in June 2013 (www.asms.org). Moreover, this work and related projects were presented at the Galaxy Conference in Oslo and other European conferences in July 2013.

• An overview was presented by UMN and MSI at a workshop entitled “The Galaxy framework as a solution for MS-based informatics” at the ASMS conference.

• NSF funding for the second year has been approved, and plans are being pursued for manuscripts, collaborations and expanding usage of the resource.


Acknowledgements

John Chilton. Ben Lynch. Jim Johnson. Getiria Onsongo. Ebbing de Jong. Joel Kooren. LeeAnn Higgins. Tim Griffin. Sean Seymour.

NSF Grant 1147079.

Minnesota Supercomputing Institute. Center for Mass Spectrometry and Proteomics.
