RNA-Seq in Galaxy: Tuxedo protocol
Igor Makunin, UQ RCC, QCIF
Acknowledgments
Genomics Virtual Lab: gvl.org.au
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy Australia: galaxy-aust.genome.edu.au
Contributors and participants:
Plan for today
• Galaxy
• Data types used in RNA-Seq analysis
• RNA-Seq practical
• Galaxy workflow
High-throughput sequencing
Big-scale sequencing
• 100,000,000s of sequences, or reads, per experiment
• sequencing of a (random) library
• low cost per nucleotide
Popular technologies:
• Illumina
• Ion/Proton
• PacBio
Emerging technologies:
• Oxford Nanopore MinION
Analysis of NGS data
• Big datasets
• Computationally intensive
• Dedicated tools and data types
• Extensive use of public data
[Diagram labels: Storage, Computational resources, Public data, Knowledge and skills, Tools]
Galaxy: what does it look like?
[Screenshot of the Galaxy interface; labels: Working window, Upload, History menu]
Galaxy history system
[Screenshot of the history panel; labels: History menu, Refresh]
Source: http://galaxyproject.github.io/training-material/topics/introduction/tutorials/galaxy-intro-history/tutorial.html
Public Galaxy servers
Advantages of registration:
• access to histories over a long time
• multiple histories
• ability to use Galaxy from different devices
• bigger quotas (on some servers)
• FTP

• Independent registration on every Galaxy server
• Different tools, different user policies
• Data can be moved between Galaxy servers

Galaxy servers:
• usegalaxy.org
• usegalaxy.eu
• galaxy-tut.genome.edu.au
• galaxy-aust.genome.edu.au
Galaxy Australia
galaxy-aust.genome.edu.au
Designed for genome-scale research
• >1,600 registered users
• Worker nodes: 16 CPUs, 64 GB RAM
• 49 TB volume storage (user data)
Limits:
• up to 16 CPUs and 60 GB RAM per job
• up to 12 concurrent jobs per user
• up to 1 TB per user
[Chart: jobs per day; fewer jobs on weekends]
Tuxedo protocol
GVL Basic RNA-Seq Galaxy tutorial
Trapnell et al. (2012) Nature Protocols
[Workflow diagram; labels: FASTQ, FASTA, GFF, BAM, Genome browser, Visualise alignment with IGV]
FASTQ format
Terminology: a read is a sequence with quality score values produced by a sequencing machine.
A single FASTQ file holds multiple reads. Each read is described by four lines:
@SRR3145.19 ILLUMINA-C32_FC:3:1:80:12/1
TAGCAGCACATCATGGTTTACATCGTATGC
+
IIHIDIIIIIIIIIIIIIHIHIIIIIDGIB
• Line 1: name; always starts with @
• Line 2: sequence
• Line 3: always starts with +; may have the name
• Line 4: encoded Phred quality score
Reads can be single-end or paired-end.
Common output format: FASTQ compressed with gzip, e.g. SRR3145_1.fq.gz
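The four-line record structure described above can be read with a few lines of Python. This is a minimal sketch, not a replacement for a dedicated parser: it assumes uncompressed text input with exactly four lines per read (no wrapped sequences), and the names `FastqRead` and `parse_fastq` are made up for this illustration. The space inside the read name of the example record is also an assumption, since the slide text lost its spacing.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRead(NamedTuple):
    name: str       # header line without the leading '@'
    sequence: str   # nucleotide sequence
    quality: str    # encoded Phred scores, one character per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRead]:
    """Yield reads from a four-line-per-record FASTQ text stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or an empty line) stops the iteration
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRead(header[1:], sequence, quality)


# The example record from the slide.
EXAMPLE = (
    "@SRR3145.19 ILLUMINA-C32_FC:3:1:80:12/1\n"
    "TAGCAGCACATCATGGTTTACATCGTATGC\n"
    "+\n"
    "IIHIDIIIIIIIIIIIIIHIHIIIIIDGIB\n"
)

for read in parse_fastq(io.StringIO(EXAMPLE)):
    print(read.name, len(read.sequence))
```

For a gzip-compressed file such as SRR3145_1.fq.gz, the same function can be given a handle opened with `gzip.open(path, "rt")` from the standard library.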
FASTQ Phred quality score
A Phred quality score is a measure of the quality of the identification of each nucleotide.
Range: ~0 to ~40
• Phred 10: accuracy 90%
• Phred 20: accuracy 99%
• Phred 30: accuracy 99.9%
• Phred 40: accuracy 99.99%
Values are encoded by characters: ASCII code = Quality + Offset
Example: 39 + 33 = 72; ASCII(72): H
Advantage: a single character is used instead of a two-digit number
@S391 ILLUMINA_FC:3:80:12/1
TAGCAGCACATCATGGTTTAC
+
IIHIDIIIIIIIIIIIIIHIH
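The encoding arithmetic above (Quality + Offset gives an ASCII code) and the accuracy table can be reproduced in Python. This sketch assumes the standard Phred definition, in which the error probability is 10^(-Q/10); the function names are illustrative only.

```python
def char_from_phred(quality: int, offset: int = 33) -> str:
    """Encode a Phred score as one character: ASCII code = quality + offset."""
    return chr(quality + offset)


def phred_from_char(char: str, offset: int = 33) -> int:
    """Decode one quality character back to a Phred score."""
    return ord(char) - offset


def error_probability(quality: int) -> float:
    """Probability that the base call is wrong: 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


# Worked example from the slide: 39 + 33 = 72, and ASCII 72 is 'H'.
print(char_from_phred(39))

# Decode the quality line of the example read (offset 33).
scores = [phred_from_char(c) for c in "IIHIDIIIIIIIIIIIIIHIH"]
print(scores)

# Accuracy for the Phred values listed above.
for q in (10, 20, 30, 40):
    print(q, f"{(1 - error_probability(q)) * 100:.2f}%")
```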
ASCII table
Phred quality score encoding
• Offset 33: Sanger
• Offset 64: old Illumina
Example: Qual. = 40, Offset = 33; 40 + 33 = 73; ASCII(73): I
Source: https://en.wikipedia.org/wiki/FASTQ_format
FASTQ quality score in Galaxy
Many old Illumina datasets have a proprietary quality encoding (offset 64).
Currently most NGS datasets use the Sanger encoding (offset 33).
Galaxy
By default Galaxy assigns the 'fastq' datatype to uploaded FASTQ files. In this case the offset is not specified, and many tools do not recognize the data.
• fastqillumina: old Illumina quality score encoding (offset 64, Illumina 1.3+)
• fastqsanger: new Illumina 1.8+ / Sanger quality score encoding
Some tools in Galaxy now work only with the fastqsanger datatype.
Solution:
- specify the fastqsanger or fastqillumina datatype during upload
- change the format via Attributes > Datatype
- use the NGS: QC and manipulation > FASTQ Groomer tool
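To make the difference between the two encodings concrete, the sketch below re-encodes an offset-64 quality string as offset 33 and applies a simple heuristic to guess the offset of a dataset. It is an illustration of the arithmetic only, not the FASTQ Groomer implementation; both function names and the thresholds in `guess_offset` are assumptions chosen for this example.

```python
from typing import Iterable, Optional


def illumina64_to_sanger33(quality: str) -> str:
    """Re-encode an offset-64 quality string as offset 33 (Sanger)."""
    converted = []
    for char in quality:
        score = ord(char) - 64
        if score < 0:
            raise ValueError(f"{char!r} is not a valid offset-64 quality character")
        converted.append(chr(score + 33))
    return "".join(converted)


def guess_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Heuristic guess of the offset; returns None when the data are ambiguous."""
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return None
    if min(codes) < 64:
        return 33  # Illumina 1.3+ (Phred+64) cannot produce characters below '@'
    if max(codes) > 74:
        return 64  # above 'J' would mean Phred > 41 under offset 33, which is unusual
    return None


print(illumina64_to_sanger33("hhgh"))
print(guess_offset(["II5I"]), guess_offset(["hhgh"]))
```

A short or uniformly high-quality sample can be ambiguous, which is one reason to set the datatype explicitly rather than rely on detection.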
Tuxedo protocol
GVL Basic RNA-Seq Galaxy tutorial
Trapnell et al. (2012) Nature Protocols
[Workflow diagram; labels: FASTQ, FASTA, GFF, BAM, Genome browser, Visualise alignment with IGV]
Reference genomes
Genome Reference Consortium: … a consensus representation of the genome.
FASTA format
The human reference sequence GRCh37 (hg19) contains the mitochondrial genome, 22 autosomes, chrX, chrY, 9 haplotype chromosomes, 39 unplaced contigs, and 20 unlocalized contigs.
• Genomes are big. GRCh38.p10 total non-N bases: 3,080,585,178
• Genomes may have many assembly versions (releases, builds): mm9, mm10
• Use the same assembly version for the reference sequence and gene annotations.
• The order of sequences/contigs might be important for some tools.
• “chr1” and “1” are not identical for some tools.
Source: http://hgdownload.cse.ucsc.edu/gbdb/hg19/html/description.html
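The naming caveat ("chr1" versus "1") can be checked before a tool is run. The sketch below compares the sequence names in a FASTA reference with the names used in the first column of a GFF/GTF annotation and reports any that the reference lacks. The function names and the tiny inputs are hypothetical and exist only for this illustration.

```python
from typing import Iterable, List, Set


def fasta_sequence_names(lines: Iterable[str]) -> List[str]:
    """Return sequence names (first word after '>') in file order."""
    names = []
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            names.append(line[1:].split()[0])
    return names


def annotation_sequence_names(lines: Iterable[str]) -> Set[str]:
    """Return the set of names in column 1 of a tab-separated annotation."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_reference(fasta_lines: Iterable[str],
                           annotation_lines: Iterable[str]) -> List[str]:
    """Annotation sequence names that do not occur in the reference."""
    reference = set(fasta_sequence_names(fasta_lines))
    return sorted(annotation_sequence_names(annotation_lines) - reference)


# Hypothetical inputs: the reference says "chr1", the annotation says "1".
fasta = [">chr1 example", "ACGT", ">chr2", "GGCC"]
gff = [
    "##gff-version 3",
    "1\t.\tgene\t1\t4\t.\t+\t.\tID=g1",
    "chr2\t.\tgene\t1\t4\t.\t+\t.\tID=g2",
]
print(missing_from_reference(fasta, gff))
```

`fasta_sequence_names` also preserves file order, which helps when a tool is sensitive to the order of contigs.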
Gene annotations
Coordinate-based: linked to a particular genome assembly, e.g., hg19
The GFF (General Feature Format) format consists of one line per feature, each containing 9 tab-separated columns of data, plus optional track definition lines. Popular versions: GTF (= GFF2), GFF3.
The first line must be a comment that identifies the version:
##gff-version 3
ctg123  .  mRNA  1300  9000  .  +  .  ID=mrna0001;Name=sonichedgehog
ctg123  .  exon  1300  1500  .  +  .  ID=exon00001;Parent=mrna0001
ctg123  .  exon  1050  1500  .  +  .  ID=exon00002;Parent=mrna0001
ctg123  .  exon  3000  3902  .  +  .  ID=exon00003;Parent=mrna0001
ctg123  .  exon  5000  5500  .  +  .  ID=exon00004;Parent=mrna0001
ctg123  .  exon  7000  9000  .  +  .  ID=exon00005;Parent=mrna0001
Columns:
1. seqid
2. source
3. type
4. start
5. end
6. score
7. strand
8. phase ('0', '1' or '2')
9. attributes
Start and end are both 1-based.
http://asia.ensembl.org/info/website/upload/gff3.html
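The nine-column layout can be made explicit by splitting one feature line into named fields. This is a minimal sketch: it assumes tab-separated input, ignores the percent-encoding rules of GFF3, and treats the attributes column as simple key=value pairs. `GffFeature` and `parse_gff3_line` are names invented for the example.

```python
from typing import Dict, NamedTuple


class GffFeature(NamedTuple):
    seqid: str
    source: str
    feature_type: str
    start: int        # 1-based
    end: int          # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> GffFeature:
    """Split one tab-separated GFF3 feature line into its 9 columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    attributes = {}
    for pair in fields[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return GffFeature(fields[0], fields[1], fields[2], int(fields[3]),
                      int(fields[4]), fields[5], fields[6], fields[7],
                      attributes)


# The mRNA line from the example above, written with explicit tabs.
line = "ctg123\t.\tmRNA\t1300\t9000\t.\t+\t.\tID=mrna0001;Name=sonichedgehog"
feature = parse_gff3_line(line)
print(feature.feature_type, feature.start, feature.end, feature.attributes["Name"])
```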
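Intervals
Coordinate-based: linked to a particular genome assembly, e.g., hg19
BED format: up to 12 tab-separated columns of data (UCSC Table Browser), plus optional track header lines.
GFF3 (1-based):
##gff-version 3
ctg123  .  mRNA  1300  9000  .  +  .  ID=mrna0001;Name=sonichedgehog
BED (0-based):
ctg123  1299  9000  sonichedgehog  .  +
BED columns shown: chrom, chromStart, chromEnd, name, score, strand
The BED start is 0-based and the BED end is exclusive, so the GFF3 interval 1300-9000 becomes 1299-9000 in BED.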
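The coordinate shift between the two formats is a one-line calculation: subtract 1 from the GFF3 start and keep the end unchanged. The sketch below converts a GFF3 feature line to a six-column BED line; the function name is illustrative, and taking the BED name from the `Name` attribute is an assumption that matches the example above.

```python
def gff3_to_bed(line: str) -> str:
    """Convert one tab-separated GFF3 feature line to a 6-column BED line."""
    fields = line.rstrip("\r\n").split("\t")
    seqid, score, strand = fields[0], fields[5], fields[6]
    start, end = int(fields[3]), int(fields[4])
    name = "."
    for pair in fields[8].split(";"):
        key, _, value = pair.partition("=")
        if key == "Name":
            name = value
    # GFF3: 1-based, inclusive end. BED: 0-based start, exclusive end.
    return "\t".join([seqid, str(start - 1), str(end), name, score, strand])


gff_line = "ctg123\t.\tmRNA\t1300\t9000\t.\t+\t.\tID=mrna0001;Name=sonichedgehog"
print(gff3_to_bed(gff_line))
```

The feature length is the same in both systems: 9000 - 1300 + 1 in GFF3 and 9000 - 1299 in BED, both 7701 bases.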
Aligners
Aligners map reads to a reference sequence.
Aligners use their own, tool-specific index files for mapping, e.g.
bwa index hg19.fa
An index built this way works only with BWA and only for hg19.
Galaxy-qld provides indices for several genome assemblies (hg19, hg38, mm9, mm10, etc.)
Galaxy users can also use a custom reference sequence for alignment. In this situation the aligner creates indices in a temporary working directory for every job.
Contact the Galaxy-qld admins if you plan to run many alignment jobs with a custom genome. We can add genome indices to the server.
[Figure: gapped alignment]
Alignments: SAM and BAM
50x coverage of the human genome with a read length of 100 bp: ~1,500,000,000 reads.
The compressed size of such an alignment can be >100 GB.
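The read count quoted above follows from reads = coverage × genome size / read length. The short sketch below reproduces the arithmetic; the round genome size of 3,000,000,000 bp is an assumption used to match the approximate figure, and `reads_needed` is a name made up for this example.

```python
def reads_needed(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Number of reads required for the given mean coverage (rounded up)."""
    total_bases = genome_size_bp * coverage
    return -(-total_bases // read_length_bp)  # integer ceiling division


# Assumed round human genome size of 3 Gbp, 50x coverage, 100 bp reads.
print(f"{reads_needed(3_000_000_000, 50, 100):,}")

# Same calculation with the GRCh38.p10 non-N base count.
print(f"{reads_needed(3_080_585_178, 50, 100):,}")
```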
SAM: Sequence Alignment/Map. Plain text format.
BAM: binary (compressed) version of the alignment format.
SAM coordinates are 1-based; BAM coordinates are 0-based.
BAMs are indexed for rapid access. Useful for alignment visualization.
It is always good to have a header!
@HD VN:1.0 SO:queryname
@RG ID:igGroup SM:igSmpl LB:igL1 PL:ILLUMINA
@SQ SN:chr2L LN:23011544
@PG ID:TopHat VN:2.0.14 CL:/mnt/galaxy/tools/tophat/2.0.14/iuc/package_tophat_2_0_14/536f7bb5616d/bin/tophat --num-threads 5 …
Read groups (@RG): can handle multiple samples in one alignment
SAM format
11 mandatory columns and optional fields with the TAG:TYPE:VALUE format
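The split between the 11 mandatory columns and the optional TAG:TYPE:VALUE fields can be shown with a small parser. This is a sketch for illustration rather than a substitute for a SAM/BAM library: the column names follow the SAM specification, but the alignment line below is a hypothetical example, not data from the tutorial, and only the integer tag type is converted.

```python
from typing import Any, Dict

SAM_COLUMNS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INTEGER_COLUMNS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Parse one tab-separated SAM alignment line (not a header line)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 columns, found {len(fields)}")
    record: Dict[str, Any] = dict(zip(SAM_COLUMNS, fields[:11]))
    for column in INTEGER_COLUMNS:
        record[column] = int(record[column])
    tags: Dict[str, Any] = {}
    for field in fields[11:]:
        tag, value_type, value = field.split(":", 2)
        tags[tag] = int(value) if value_type == "i" else value
    record["TAGS"] = tags
    return record


# Hypothetical alignment line, made up for this example.
sam_line = "read1\t0\tchr4\t100\t50\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0\tNH:i:1"
record = parse_sam_line(sam_line)
print(record["RNAME"], record["POS"], record["TAGS"])

# SAM POS is 1-based; the equivalent 0-based position, as stored in BAM:
print(record["POS"] - 1)
```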
Visualization of BAMs
[Screenshot: alignment in IGV]
Galaxy servers can act as a track hub
It is possible to add multiple tracks: BAMs, gene annotations, known variants…
Genome browsers
Integrative Genomics Viewer (IGV)
• Efficient genome viewer developed by the Broad Institute
• Installable on personal computers
• Can add a custom genome
UCSC Genome Browser
• A big server in the US
• Table Browser for data analysis (intersection)
• Supports data export to Galaxy
• Custom sessions (can save your tracks)
• liftOver tool
• Public track hubs
RNA-Seq with the Cufflinks package
GVL Basic RNA-Seq Galaxy tutorial
Trapnell et al. (2012) Nature Protocols
[Workflow diagram; labels: Data manipulation, Visualise alignments]
Dataset: D. melanogaster, two conditions, three replicates, data for chr4
Setup for the workshop
GVL website: gvl.org.au
Register on Galaxy-tut: galaxy-tut.genome.edu.au
Basic Galaxy tutorial
Galaxy is a workflow engine
A Galaxy workflow is a series of tools and dataset actions that run in sequence as a batch operation.
[Screenshot of the workflow editor; labels: Toolbox, Input, Noodle, Select tool or input dataset, Add name, comments, Email notification]
Galaxy workflow
Create a Galaxy workflow
• From history
• From scratch
Exercise
We will create a Galaxy workflow for RNA-Seq analysis without replicates: TopHat2 > Cuffdiff > Filter
Acknowledgments
Genomics Virtual Lab: gvl.org.au
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy Australia: galaxy-aust.genome.edu.au
Contributors and participants: