iPlant/USDA-ARS Big Data workshop on RNAseq Workflows and Standards

Dec 7-9, 2014 at CSHL

Organizers: Jason Williams, Kapeel Chougule, Dewayne Shoemaker, and Doreen Ware

Goals

• Foster competency in using iPlant, including knowing the general features of the platforms, how they work together, how to select and use platforms and tools, and the platforms' expectations and limitations

• Provide trainers with domain knowledge of RNA-Seq data analysis at the most basic level

• Cultivate support for, and commitment to, a sustainable network that provides access to shared training materials, help, and recommendations for future training and workshops.

Agenda

• Two pre-workshop webinars: forming teams and getting started with RNA-Seq assembly

• At the workshop:

– working with data and metadata in iPlant (iDrop, iCommands, DE, sharing, searching)

– working with the Discovery Environment (apps/workflows/analyses) and Atmosphere (image launching, visualization, and downstream analysis)

– an introduction to XSEDE resources and iPlant support mechanisms

Total participants: 23

Data

• Insect

• Fish

• Parasite

• Plant

• Animal

Tools Utilized

• Data Pre-Processing
– FastX suite: trimmer, quality trimming, quality filter
– HTProcessPipeline: includes Trimmomatic for quality trimming

• Digital Normalization
– Trinity normalization
– khmer normalization suite (based on DigiNorm)

• de novo Transcriptome Assembly
– Trinity
– SOAPdenovo-Trans

• Assembly Quality
– CEGMA
– Contig statistics

• Conserved Domains
– TransDecoder

• BLAST
– Create a BLAST database
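Digital normalization in the DigiNorm style streams through the reads and keeps a read only if the median abundance of its k-mers, counted over the reads kept so far, is below a coverage cutoff C; reads from regions that are already well covered are dropped. The sketch below illustrates that idea under simplifying assumptions: exact counting in a Python `Counter` (khmer itself uses a memory-efficient probabilistic counting table), canonical k-mers so both strands are counted together, and illustrative values k=20 and C=20. The function names are hypothetical; this is not khmer's code.

```python
from __future__ import annotations

from collections import Counter
from statistics import median
from typing import Iterable, Iterator

# Complement table for the reverse-complement step (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int) -> list[str]:
    """All canonical k-mers of seq; empty if the sequence is shorter than k."""
    return [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def normalize_by_median(reads: Iterable[str], k: int = 20, cutoff: int = 20) -> Iterator[str]:
    """Yield reads whose median k-mer count (over reads kept so far) is below cutoff."""
    counts: Counter[str] = Counter()
    for read in reads:
        read = read.upper()
        read_kmers = kmers(read, k)
        if not read_kmers:
            # Assumption of this sketch: reads shorter than k are passed through.
            yield read
            continue
        if median([counts[km] for km in read_kmers]) < cutoff:
            # Keep the read and add its k-mers to the running counts.
            counts.update(read_kmers)
            yield read
        # Otherwise the read adds little new coverage and is discarded.
```

Because only kept reads update the counts, coverage of each region is capped near C while rare k-mers (low-coverage regions) are retained.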

Tools Utilized (continued)

• KmerGenie

• RStudio

• Trinotate: BLAST, SignalP, TMHMM, HMMER, RNAMMER

• Mapping Reads to de novo Assembly
– Bowtie (trimmed reads against the SOAPdenovo-Trans assembly)
– Bowtie Build and Map
– SAM → sorted BAM → indexed BAM
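The final mapping step (SAM → sorted BAM → indexed BAM) is commonly done with samtools. The sketch below only builds, and optionally runs, the two commands; it assumes samtools 1.x is on the PATH (older 0.1.x releases used a different `sort` syntax and required BAM input) and uses a hypothetical `<name>.sorted.bam` naming convention. The helper names are illustrative.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def sam_to_indexed_bam_commands(sam_path: str) -> list[list[str]]:
    """Build samtools commands that turn a SAM file into a sorted, indexed BAM.

    Assumes samtools 1.x, where `sort` reads SAM directly and `-o` names the output.
    """
    sam = Path(sam_path)
    sorted_bam = sam.parent / (sam.stem + ".sorted.bam")  # hypothetical naming convention
    return [
        ["samtools", "sort", "-o", str(sorted_bam), str(sam)],  # coordinate-sort, write BAM
        ["samtools", "index", str(sorted_bam)],  # writes <sorted_bam>.bai next to it
    ]


def run_commands(commands: list[list[str]]) -> None:
    """Run each command in order, raising CalledProcessError if a step fails."""
    for cmd in commands:
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands for a hypothetical alignment file.
    for cmd in sam_to_indexed_bam_commands("trimmed_vs_assembly.sam"):
        print(shlex.join(cmd))
```

Keeping command construction separate from execution makes the steps easy to inspect (or to paste into a Discovery Environment or Atmosphere shell) before anything runs.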

Survey done after the workshop

[Chart: Before the workshop, how would you rate your level of bioinformatics skills? Responses: Beginner, Intermediate, Advanced (n=20)]

[Chart: How helpful was it to be working in teams? Scale: Not helpful at all, Slightly helpful, Neutral/No opinion, Helpful, Very helpful (n=20)]

[Chart: How helpful were the webinars that preceded the workshop? Scale: Not helpful at all, Slightly helpful, Neutral/No opinion, Helpful, Very helpful]

[Chart: How did the workshop impact your ability to perform bioinformatics analyses? Responses: Had no impact, Improved my ability slightly, Improved my ability immensely]

[Chart: How prepared are you to help others use iPlant in the following ways? (n=20) Scale: Unprepared, Somewhat prepared, Prepared, Very prepared.]

Tasks rated:

• Upload data (iDrop/DE)
• Upload data (iCommands)
• Share data within iPlant
• Run analyses in iPlant
• Input and manage metadata
• Run an analysis in the Discovery Environment
• Add/modify an app in the DE
• Create an app in the DE
• Create a workflow in the DE
• Launch an Atmosphere image
• Connect to an Atmosphere image
• Move data into/out of Atmosphere
• Create an Atmosphere image

[Chart: Barriers to using iPlant (indicate how much you agree with each statement) (n=19). Scale: N/A, Strongly agree, Agree, Neutral, Disagree, Strongly disagree.]

Statements rated:

• iPlant tools are not user friendly
• I find it difficult to use iPlant tools
• iPlant services are not reliable
• iPlant documentation and manuals are not helpful
• iPlant support staff are not reliable/quick
• I can't get publishable results using iPlant resources

Outcomes from the workshop

Groups

• Group 1 (Tools and Workflows): Brenda Oppert, Anna Bennett, Jamie Strange, Brian Rector, Neil Sanscrainte, and George Yocum

• Group 2 (Integration of New Tools): Christopher Childers, Guangtu Gao, Geoff Waldbieser, Monica Poelchau, and Zaid Abdo

• Group 3 (Metadata Standards): Michelle Graham, Joe Hull, Pia Olafson, Lucy Stewart, Judy Chen, Deven See, and Brad Coates (lead)

• Group 4 (Adoption: training and organizing webinars): Anja Baldo, Stephen White, Linda Ballard, Brenda Oppert, Kristina Friesen, and Pia Olafson

Group 1: Prioritize Tools and Workflows

1. Upload and assemble an RNA and a genomic data set.

2. Process data through annotation and post-assembly quality control.

3. Downstream: report to the “integration” team.

4. Develop a mechanism for other ARS researchers and collaborators to suggest improvements.

Brenda Oppert, Anna Bennett, Jamie Strange, Brian Rector, Neil Sanscrainte, and George Yocum

Group 2: Tool Integration (‘Install me!’)

1. Communication with the ‘Application’ group: create a template with the required metadata for requested applications (program name, version, URL, application, justification, test input files).

2. Develop a workflow for program installation so that other developers can be trained easily. This includes pushing finished apps to the Tester group and providing sufficient documentation/READMEs for each resulting app.

Christopher Childers, Guangtu Gao, Geoff Waldbieser, Monica Poelchau, and Zaid Abdo
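Group 2's request template lists specific fields. A minimal sketch of such a template as a Python dataclass, with a check for fields left empty, might look like the following; the class and method names are hypothetical, and treating every field as required is an assumption based on the phrase "required metadata".

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class AppRequest:
    """Hypothetical 'Install me!' request; fields follow the Group 2 template."""

    program_name: str = ""
    version: str = ""
    url: str = ""
    application: str = ""
    justification: str = ""
    test_input_files: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of template fields left empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

A request could be forwarded for installation only when `missing_fields()` returns an empty list, which gives the installing developer everything needed to build and test the app.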

Group 3: Metadata Standards

Group members: Michelle Graham, Joe Hull, Pia Olafson, Lucy Stewart, Judy Chen, Deven See, and Brad Coates (lead)

• Emphasis was on following NSF standards and NCBI annotation descriptors.

• Across project areas (insect, plant, animal), collaborate with iPlant and Big Data centers to implement standard associations with data uploads and DOIs.

• Integrate metadata, sequence, and assembly information into a searchable database to ease retrieval, find and foster collaborations, and highlight ARS outputs.

Group 4: Adoption (Training and Organizing Webinars)

1. Identify holes in existing material, including differences between standard iPlant and USDA practices.

2. Webinars could go on the USDA YouTube channel.

3. Announce releases of tools, images, and workflows via webinars (as other widely successful tools do).

4. Tie trainings into IDPs.

5. Provide downloadable materials (tutorials, videos, etc.) on the ARS website, SharePoint, or an iPlant location.

6. Assess adoption:

   i. Track training material downloads

   ii. Track iPlant sign-ups

   iii. Briefly ask participants at sign-up what they hope to get out of the training

   iv. Ask after a few months whether expectations were met

   v. Track USDA tags in the user forum

7. Pre-workshop homework (a successful component of the current gathering):

   i. Needs to be clear and easy to complete

   ii. Form sub-groups of attendees to foster participation in pre-course materials

8. What about locations with poor connections, and other barriers to adoption?

Anja Baldo, Stephen White, Linda Ballard, Brenda Oppert, Kristina Friesen, and Pia Olafson