Using BioGrids for RNA-Seq on AWS and Your Laptop
TMEC 304, July 25, 2019
James Vincent
BioGrids Consortium, Harvard Medical School
Today we will...
Install software with BioGrids
Run an RNA-Seq workflow
Replicate the above on AWS and a laptop
https://www.biostars.org/p/189261/: This seems to be a bug when installing fastqc using apt-get install fastqc
STARmanual.pdf: ...which creates problems for STAR compilation. One option to avoid this problem is to install gcc...
http://github.gersteinlab.org/exceRpt/ManualInstallation: ...generally not recommended...<snip>...instructions on how to install exceRpt and its various dependencies will [one day] be listed toward the bottom of this page.
Avoid Time Sinks
Reproducible Research
$ STAR --sbapp:d

Capsule: STAR using star version 2.5.3a
Version information for: /programs/i386-mac/star

Default version: 2.5.3a
In-use version: 2.5.3a
Other available versions: none
Overrides use this shell variable: STAR_M
Self Documenting
Include in workflow:
STAR --sbapp:d
samtools --sbapp:d
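One way to make the version report part of a workflow log is to parse it into named fields. The following is a minimal Python sketch, not part of BioGrids itself. It assumes the report is laid out one "label: value" pair per line, as in the STAR example in these slides; the function name parse_version_report is hypothetical.

```python
# Sample report copied from the `STAR --sbapp:d` output shown in the slides.
# Assumption: each field sits on its own line as "label: value".
SAMPLE_REPORT = """\
Capsule: STAR using star version 2.5.3a
Version information for: /programs/i386-mac/star
Default version: 2.5.3a
In-use version: 2.5.3a
Other available versions: none
Overrides use this shell variable: STAR_M
"""


def parse_version_report(text):
    """Return a dict of label/value pairs found in a version report."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields[label.strip()] = value.strip()
    return fields


report = parse_version_report(SAMPLE_REPORT)
print(report["In-use version"])
```

In a real workflow the text would come from running the tool with --sbapp:d and capturing its output; the resulting dict can then be written next to the analysis results.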
Config File
[installer]
site = biogrid-production
key = 70rYFBTDnmCr93VUklfbf1s3M4jdyC9bFVYHew==
user = jvincent1

[packages]
[email protected] = i386-mac
[email protected] = i386-mac
[email protected] = i386-mac
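Because the config file follows the familiar INI layout, it can be inspected with Python's standard configparser module, for example to check which packages a saved setup lists. This is a sketch under assumptions: the key, user, and package names below are made-up placeholders (the real package entries are not legible in the slide), and read_setup is a hypothetical helper, not a BioGrids command.

```python
import configparser

# Hypothetical config text. The section names and the `site` value mirror
# the slide; the key, user and package names are made-up placeholders.
EXAMPLE_CONFIG = """\
[installer]
site = biogrid-production
key = EXAMPLE-KEY
user = example-user

[packages]
example-aligner = i386-mac
example-counter = i386-mac
"""


def read_setup(text):
    """Return (installer settings, package -> architecture) as two dicts."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["installer"]), dict(parser["packages"])


installer, packages = read_setup(EXAMPLE_CONFIG)
print(installer["site"])
for name, arch in sorted(packages.items()):
    print(name, arch)
```

Note that configparser lowercases option names by default and rejects duplicate keys within a section, so each package needs a distinct entry.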
This Is Handy
biogrids save mysetup.txt    biogrids reactivate mysetup.txt
faithful laptop    new workstation
BioGrids is Portable
faithful laptop
laboratory workstation
HMS O2 compute cluster
BCH compute cluster
BioGrids Benefits
save time, reduce headaches
scale and share workflows
part of reproducible research
BioGrids Consortium
Compute Infrastructure
Personnel    SBGrid    BioGrids
Funding: HMS Tools and Technologies Committee
Why BioGrids?
You    Cover of Nature
compile software, compile libraries, manage dependencies, manage versions, manage paths, change versions...
learn to use software, optimize workflow, get science done
RNA-Seq Overview
Harvard Chan Bioinformatics Core (HBC)
http://bioinformatics.sph.harvard.edu/training
RNA-Seq Overview
hbctraining.github.io/Intro-to-rnaseq-hpc-O2
Biological samples / Library prep
sequence reads
quality check
adapter/quality trimming
splice-aware mapping to genome
count reads associated with genes
statistical analysis: identify differentially expressed genes
RNA Prep
Sequencing
RNA-Seq Overview: BioGrids Apps
hbctraining.github.io/Intro-to-rnaseq-hpc-O2
quality check: FastQC
adapter/quality trimming: (trimmomatic)
splice-aware mapping to genome: STAR
count reads associated with genes: subRead
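The quality check, mapping, and counting steps can be sketched as an ordered list of command lines. The Python below only builds the argument lists; it does not run anything. File names, the output prefix, and the function name build_commands are hypothetical, the optional trimming step is left out, and featureCounts is the read-counting program from the Subread package. Treat the flag choices as one reasonable starting point rather than the workshop's exact commands.

```python
def build_commands(fastq, genome_dir, gtf, out_dir, threads=4):
    """Return argument lists for the QC, mapping and counting steps, in order."""
    prefix = f"{out_dir}/sample_"  # hypothetical output prefix
    # STAR appends this suffix when asked for a coordinate-sorted BAM.
    bam = prefix + "Aligned.sortedByCoord.out.bam"
    return [
        # Quality check of the raw reads.
        ["fastqc", "-o", out_dir, fastq],
        # Splice-aware mapping against a prebuilt STAR genome index.
        ["STAR", "--runThreadN", str(threads),
         "--genomeDir", genome_dir,
         "--readFilesIn", fastq,
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", prefix],
        # Count reads associated with the genes in the annotation.
        ["featureCounts", "-T", str(threads),
         "-a", gtf,
         "-o", f"{out_dir}/counts.txt",
         bam],
    ]


commands = build_commands("reads.fq", "star_index", "genes.gtf", "results")
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) once the tools are on the PATH, which keeps the same script usable on a laptop, a cluster, or an AWS instance.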
Mapped Reads
Check Results
IGV
1: Genomes / Load Genome from File... (chr1_MOV10.fa)
2: File / Load from file... (.gtf file)
3: File / Load from file... (.bam file)
workflow    software stack    compute resources
DevOps with BioGrids
bioinformatics    BioGrids    laptop, HMS O2, AWS
AWS Hands On
https://sbgrid.signin.aws.amazon.com/console
username: workshop21
password: Biogrids_Workshop1
AWS - Amazon Web Services
BioGrids is funded by the Harvard Medical School
Tools and Technologies Committee
Additional Resources
ENCODE data files can be found here for CalTech RNA-Seq:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/
Use this bam file: wgEncodeCaltechRnaSeqK562R1x75dAlignsRep1V2
Region of MOV10 gene: chr1:113,214,934-113,243,900
How to download whole genome:
- UCSC ftp site: hgdownload.cse.ucsc.edu
- UCSC website: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
- UCSC recommends using an ftp client for large file downloads
- chr1 is only 70M
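To cut a large BAM file down to the MOV10 region, the browser-style coordinates first need their thousands separators removed. The Python sketch below parses the region string and builds a samtools view command for it. The BAM file names and the function name parse_region are hypothetical, the region length assumes 1-based inclusive coordinates, and samtools needs an index for the input BAM before a region query will work.

```python
def parse_region(region):
    """Split 'chr:start-end' (commas allowed) into (chrom, start, end)."""
    chrom, _, span = region.partition(":")
    start_text, _, end_text = span.replace(",", "").partition("-")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"start is after end in region {region!r}")
    return chrom, start, end


chrom, start, end = parse_region("chr1:113,214,934-113,243,900")

# Region length, assuming 1-based inclusive coordinates.
length = end - start + 1

# Hypothetical file names; the input BAM must be coordinate-sorted and indexed.
samtools_cmd = ["samtools", "view", "-b", "-o", "MOV10_region.bam",
                "input.bam", f"{chrom}:{start}-{end}"]

print(chrom, start, end, length)
print(" ".join(samtools_cmd))
```

The resulting small BAM file, together with the chr1 sequence and a .gtf annotation, is enough to follow the IGV steps without downloading the whole genome.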
References
TRAINING: hbctraining.github.io/Intro-to-rnaseq-hpc-O2
AWS: https://aws.amazon.com/ec2/getting-started
ENCODE: https://www.encodeproject.org
IMAGES: https://www.diagenode.com/en/categories/Library-preparation-for-RNA-seq
https://rnaseq.uoregon.edu