NGS Bioinformatics Workshop 2.1 Tutorial – Next Generation Sequencing and Sequence Assembly Algorithms
May 3rd, 2012, IRMACS 10900
Facilitator: Richard Bruskiewich, Adjunct Professor, MBB
Agenda
- Data format review (and some associated tools)
- Revisit Galaxy
- Revisit data visualization
FASTQ
FASTQ is FASTA “with an attitude” (embedded quality scores). Originally developed at the Sanger Institute to couple (Phred) quality data with sequence, it is now the common format for raw read output from NGS machines.
Various flavors: fastq-sanger, fastq-illumina, fastq-solexa.
They differ in the format of the sequence identifier and in the encoding and valid range of quality scores. See:
http://en.wikipedia.org/wiki/FASTQ_format
http://maq.sourceforge.net/fastq.shtml
http://nar.oxfordjournals.org/content/early/2009/12/16/nar.gkp1137.full
“…the Sanger version of the FASTQ format has found the broadest acceptance, supported by many assembly and read mapping tools …Therefore, most users will do this conversion very early in their workflows…”
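Since the flavors differ mainly in their ASCII offset and quality scale, converting to fastq-sanger comes down to re-encoding each quality character. The Python sketch below assumes the conventions described in the NAR paper linked above (Phred+33 for fastq-sanger, Phred+64 for fastq-illumina from the Illumina 1.3 pipeline on, Solexa+64 for fastq-solexa). The function names are hypothetical, and for real data a maintained converter such as Galaxy's FASTQ Groomer is the safer choice.

```python
import math

# ASCII offset assumed for each FASTQ flavor
OFFSETS = {"fastq-sanger": 33, "fastq-illumina": 64, "fastq-solexa": 64}


def solexa_to_phred(q_solexa: int) -> int:
    """Convert a Solexa-scaled score to the nearest Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def to_sanger(quality: str, flavor: str) -> str:
    """Re-encode a quality string from `flavor` as fastq-sanger (Phred+33)."""
    offset = OFFSETS[flavor]
    encoded = []
    for ch in quality:
        q = ord(ch) - offset
        if flavor == "fastq-solexa":
            q = solexa_to_phred(q)  # Solexa odds-based scale -> Phred scale
        # fastq-sanger can represent Phred 0..93 (ASCII 33..126)
        encoded.append(chr(min(max(q, 0), 93) + 33))
    return "".join(encoded)


print(to_sanger("h@", "fastq-illumina"))  # prints "I!" (Phred 40 and 0)
```

The Solexa step matters only at low qualities: Solexa and Phred scores converge above roughly Q20, but a Solexa score of -5 maps to Phred 1.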
@EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
+EAS54_6_R1_2_1_443_348
*-+*''))**55CCF>>>>>>CCCC
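Each FASTQ record spans four lines: an identifier starting with @, the sequence, a separator starting with + (optionally repeating the identifier), and one quality character per base. A minimal Python reader, assuming unwrapped four-line records. The example record must be Phred+33 (fastq-sanger), because characters such as * and - fall below ASCII 59 and would give impossible scores under a +64 offset. Quality Q relates to error probability P by Q = -10 log10(P).

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # decoded Phred scores, one per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


def error_probability(q: int) -> float:
    """Phred scale: Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# The example record from the slide, decoded as fastq-sanger (Phred+33).
example = (
    "@EAS54_6_R1_2_1_443_348\n"
    "GTTGCTTCTGGCGTGGGTGGGGGGG\n"
    "+EAS54_6_R1_2_1_443_348\n"
    "*-+*''))**55CCF>>>>>>CCCC\n"
)
record = next(read_fastq(io.StringIO(example)))
mean_q = sum(record.qualities) / len(record.qualities)
print(record.name, f"mean Q = {mean_q:.2f}")
```

Per-read or per-position summaries like this mean quality are the raw material for the "assess quality" step; a Q20 base, for instance, has an estimated 1% chance of being wrong.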
SAM/BAM
SAM – a tab-delimited text file that contains a compact and indexable representation of nucleotide sequence alignments
http://samtools.sourceforge.net/SAM1.pdf
http://samtools.sourceforge.net/
BAM – binary version of SAM (preferred by IGV). SAM/BAM is the I/O format of several NGS tools; see:
http://samtools.sourceforge.net/swlist.shtml
See also: Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9.
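Each SAM alignment line carries 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional TAG:TYPE:VALUE fields, as defined in the SAM specification linked above. The Python sketch below splits one line into those fields and tests two FLAG bits (0x4, segment unmapped; 0x10, sequence reverse-complemented). It skips header lines (those starting with @) and keeps tag values as strings. The read names and values in the example are invented, and for real work samtools or a library such as pysam is the better tool.

```python
from typing import Dict, NamedTuple


class SamAlignment(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position; 0 if unavailable
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: Dict[str, str]  # optional fields, TAG -> VALUE (type code dropped)


def parse_sam_line(line: str) -> SamAlignment:
    """Parse one SAM alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    tags = {}
    for optional in fields[11:]:
        tag, _type, value = optional.split(":", 2)
        tags[tag] = value
    return SamAlignment(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


def is_unmapped(aln: SamAlignment) -> bool:
    return bool(aln.flag & 0x4)


def is_reverse_strand(aln: SamAlignment) -> bool:
    return bool(aln.flag & 0x10)


# Hypothetical line: read r001 aligned to the reverse strand of chr1 at position 100.
aln = parse_sam_line("r001\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0")
print(aln.rname, aln.pos, aln.cigar, "reverse" if is_reverse_strand(aln) else "forward")
```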
http://picard.sourceforge.net/command-line-overview.shtml
http://picard.sourceforge.net/
The Picard command-line tools are packaged as executable jar files. They require Java 1.6. They can be invoked as follows:
java jvm-args -jar PicardCommand.jar OPTION1=value1 OPTION2=value2...
Most of the commands are designed to run in 2 GB of JVM heap, so the JVM argument -Xmx2g is recommended.
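Because every Picard tool follows the same java -Xmx2g -jar Tool.jar KEY=value pattern, a pipeline script can assemble the argument list in one place. A Python sketch under these assumptions: SortSam.jar with its INPUT, OUTPUT and SORT_ORDER options is used only as an example command, the /opt/picard path and file names are hypothetical, and option names for any tool should be checked against the Picard command-line overview linked above.

```python
import shlex
import subprocess
from typing import Dict, List


def picard_command(jar: str, options: Dict[str, str], heap: str = "2g") -> List[str]:
    """Build ['java', '-Xmx<heap>', '-jar', jar, 'KEY=value', ...]."""
    return ["java", f"-Xmx{heap}", "-jar", jar] + [f"{k}={v}" for k, v in options.items()]


def run_picard(jar: str, options: Dict[str, str], heap: str = "2g") -> None:
    """Run a Picard jar, echoing the command; raises CalledProcessError on failure."""
    cmd = picard_command(jar, options, heap)
    print("running:", " ".join(shlex.quote(arg) for arg in cmd))
    subprocess.run(cmd, check=True)


# Hypothetical jar location and file names; SortSam is just an example tool.
cmd = picard_command(
    "/opt/picard/SortSam.jar",
    {"INPUT": "aligned.sam", "OUTPUT": "aligned.sorted.bam", "SORT_ORDER": "coordinate"},
)
print(" ".join(cmd))
```

Calling run_picard with the same arguments would actually launch the tool, assuming java and the jar are present; passing a list to subprocess.run avoids shell-quoting problems with file names.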
Getting & Running Picard…
- Obtain archive using project “Download” link
- Extract zip file to sensible location
- Ensure that you have Java 6 on your machine
- Run from command shell as indicated
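The Java check can be scripted before any Picard jar is launched. This Python sketch assumes the usual `java -version` banner, which is printed on stderr and uses 1.x numbering up to Java 8 (e.g. "1.6.0_31" for Java 6) and plain numbers from Java 9 onward. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def java_major_version(version_text: str) -> Optional[int]:
    """Extract the major version from `java -version` output, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if not match:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.6.0_31" means Java 6
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> Optional[int]:
    """Major version of the `java` on PATH, or None if there is none."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its banner to stderr, not stdout
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

A setup script could then stop with a clear message when installed_java_major() returns None or a value below 6.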
http://hannonlab.cshl.edu/fastx_toolkit/
Linux, MacOSX or Unix only
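Among the FASTX-Toolkit's command-line tools is fastq_quality_filter, whose -q and -p options keep a read only when at least p percent of its bases reach quality q. The rule itself is short. This Python sketch assumes Phred+33 (fastq-sanger) quality strings and is meant to show the criterion, not to reproduce the tool's exact behaviour or output.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality string)


def passes_quality(quality: str, min_q: int, min_percent: float, offset: int = 33) -> bool:
    """True if at least `min_percent`% of bases have Phred quality >= `min_q`."""
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_q)
    return 100.0 * good / len(quality) >= min_percent


def quality_filter(records: Iterable[Record], min_q: int, min_percent: float) -> Iterator[Record]:
    """Yield only the reads that pass `passes_quality`."""
    for name, seq, qual in records:
        if passes_quality(qual, min_q, min_percent):
            yield name, seq, qual


# The example FASTQ record: 15 of its 25 bases are Q20 or better (60%).
reads = [
    ("EAS54_6_R1_2_1_443_348", "GTTGCTTCTGGCGTGGGTGGGGGGG", "*-+*''))**55CCF>>>>>>CCCC"),
]
kept = list(quality_filter(reads, min_q=20, min_percent=50))
print(len(kept), "of", len(reads), "reads kept")
```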
Visualization of NGS Data – Standalone
http://www.broadinstitute.org/igv/
Visualization of NGS Data – Web Site
http://gmod.org/wiki/GBrowse_NGS_Tutorial
GALAXY REVISITED
2.1 Next Generation Sequencing and Sequence Assembly Algorithms
Learning about Galaxy
Extensive web resources are available at http://wiki.g2.bx.psu.edu/Learn/:
- Getting started: “Galaxy 101”
- Other screencasts
- Information pages about dataset management, tool usage and data visualization
- Published pages/protocols: https://main.g2.bx.psu.edu/page/list_published
Logging into Galaxy @ WestGrid
https://joffre.westgrid.ca/galaxy/
- Accessing the WestGrid Galaxy instance: use your WestGrid ID (your email name without the @ part) to log into Joffre. For example, if your email is ‘rbruskie@…’, your server access ID is ‘rbruskie’. Use your WestGrid password.
- Logging into the Galaxy instance: once in Galaxy, you need to register (initially) or log in (if already registered) using your username (your full email, e.g. ‘rbruskie@…’) and (important!) use your WestGrid password as the Galaxy password.
Small issue for access through IE?
We will run through “Galaxy 101”
https://main.g2.bx.psu.edu/galaxy101
Try it! Ask questions along the way.
Some sensible steps for processing NGS data
- Obtain the data (i.e. upload to Galaxy)
- Assess quality of read data
- Convert reads to convenient form (fastq?)
- Filter out questionable data: low quality, vector
- Process to integrate:
  - de novo assembly: Allpaths, ABySS, Velvet, SOAPdenovo, etc., or…
  - Map onto reference: SAM, Bowtie, MAQ, etc.
- Clean up and visualize
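Velvet, ABySS and SOAPdenovo are de Bruijn graph assemblers: reads are cut into overlapping k-mers, and in one common formulation each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, so that contigs correspond to unbranched paths through the graph. The Python toy below illustrates only that idea, assuming error-free reads from a single strand, a hand-picked k, and no reverse complements, error correction or repeat resolution. It is not how any of these assemblers is actually implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read's k-mer."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: Dict[str, Set[str]]) -> List[str]:
    """Spell out maximal non-branching paths (isolated cycles are ignored)."""
    indegree: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1

    def is_one_in_one_out(node: str) -> bool:
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs: List[str] = []
    for node in list(graph):
        if is_one_in_one_out(node):
            continue  # interior node: reached while walking from a path start
        for nxt in graph[node]:
            contig = node + nxt[-1]
            while is_one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


reads = ["ATGGCGT", "GCGTGCA"]  # toy, error-free, overlapping reads
print(unitigs(de_bruijn(reads, k=4)))
```

Here the two reads share the 4-mer GCGT, so the graph is a single unbranched path and the sketch recovers ATGGCGTGCA. Real assemblers spend most of their effort on what this toy ignores: sequencing errors (tips and bubbles), repeats longer than k, and both strands.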