© 2008 Illumina, Inc. Illumina, Making Sense Out of Life, Sentrix, GoldenGate, DASL, Oligator, Infinium, BeadArray, Array of Arrays, BeadXpress, VeraCode, IntelliHyb, iSelect, CSPro, and Solexa are registered trademarks or trademarks of Illumina Inc.
COMPANY CONFIDENTIAL – INTERNAL USE ONLY
On-instrument Real Time Analysis Overview
May, 2009
Content
Real Time Analysis (RTA) – high level overview
Input and Outputs
Image analysis
Base Calling
Real time metrics
Data transfer and Error handling
Analysis of RTA data using Illumina Pipeline
Considerations
FAQ
Real Time Analysis (RTA) is a new feature in Sequencing Control Software v2.4
RTA runs completely and exclusively on the instrument computer (currently a Dell 690)
Take advantage of increased system throughput using reduced computer hardware.
[Diagram: workflow before and after RTA. Before: SCS, IPAR, Pipeline Server. After: SCS, Pipeline Server. Data passed between stages: Images; Image Intensities; Base Calls and Quality Scores. Processing labels: Base Calling, Quality Scoring, Secondary Analysis (Alignment, Assembly, Counting).]
Overview - High level features
Performs primary analysis for data generated on GA, including
– Image analysis
– Base calling and filtering
– Quality scoring
Transfers data results off the instrument PC
– Results
– Images
Provides real time metrics (Status.htm)
– Tile processing status
– Cluster density
– Cycle intensities
– Focus quality
Runs on the instrument PC
– Works in the background during the sequencing run (cmd window)
– Multi-threaded application: each thread processes a subset of tiles
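A minimal sketch of the threading model described above, where each thread processes a subset of tiles. The round-robin split and the `process_tile` placeholder are assumptions for illustration; the slides do not say how RTA assigns tiles to threads.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_tiles(tiles, n_threads):
    """Split tiles into n_threads subsets, round-robin (an assumed scheme)."""
    return [tiles[i::n_threads] for i in range(n_threads)]


def process_tile(tile):
    # Hypothetical placeholder for the per-tile analysis work.
    return f"tile {tile} done"


def process_subset(subset):
    # Each worker thread handles its own subset of tiles, in order.
    return [process_tile(tile) for tile in subset]


def run_analysis(tiles, n_threads=4):
    subsets = partition_tiles(tiles, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(process_subset, subsets))
    # Flatten the per-thread results into one list.
    return [r for subset in results for r in subset]
```

For example, `partition_tiles(list(range(1, 9)), 3)` splits eight tiles into three subsets, one per thread.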
Data Analysis Workflow
GA PC
Image Analysis (RTA)
For each tile:
• Cluster intensities
• Cluster noise
• Cluster position
Base Calling (RTA)
For each tile:
• Cluster sequence
• Calibrated base quality scores
• Quality filtering
Data Transfer
Pipeline
Alignment (GERALD)
For all data:
• Sequence alignment
• Run statistics visualization
Input
Images
Auxiliary files
– Instrument offsets (DefaultOffsets.txt)
  Location: C:\Documents and Settings\All Users\Application Data\Illumina\Illumina RTA\InstrumentName
  RTA will generate DefaultOffsets.txt file for first analysis with the application IPAR offset file? Not transferred.
– Run meta-data: RunInfo.xml (generated by SCS at run start, based on the run recipe)
  Instrument name
  Run type (SR or PE, indexing)
  Run size
  Tiles to image during read prep and actual run
Output
Output directory structure mimics the PL output directory structure
– Image analysis: RunName\Data\Intensities\(L001 … L008)\CN.1
– Base calling: RunName\Data\Intensities\BaseCalls
Image analysis output
– CIF and CNF binary files (intensity and noise), per tile, per cycle (NEW)
– *pos.txt file: cluster coordinates, one per tile (same as PL)
Base calling output
– QSEQ files: one per tile (same as PL)
– Auxiliary files: same as files in the PL Bustard analysis directory (BustardSummary.xml, IVC.htm, All.htm)
Output listed above is copied by default to the network location specified at the start of the run
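A small sketch of how the output paths above can be assembled. It assumes that N in CN.1 is the cycle number and that lanes are numbered 1 to 8 (L001 to L008); the function names are illustrative only.

```python
from pathlib import PureWindowsPath


def lane_dir(lane):
    """Lane folder name, e.g. lane 1 -> 'L001'."""
    return f"L{lane:03d}"


def cycle_dir(run_folder, lane, cycle):
    """Image analysis output folder for one lane and cycle (CN.1 pattern)."""
    # Assumption: N in CN.1 stands for the cycle number.
    return PureWindowsPath(
        run_folder, "Data", "Intensities", lane_dir(lane), f"C{cycle}.1"
    )


def basecalls_dir(run_folder):
    """Base calling output folder."""
    return PureWindowsPath(run_folder, "Data", "Intensities", "BaseCalls")
```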
Output (cont.) - tracking analysis progress
Analysis status (Status.htm):
– Location: \\RunName\Data\Status.htm
– Contains a summary of analysis progress and data visualization; periodically updated throughout the run, available off-line
Log files:
– Location: \\RunName\Data\
– Log.txt: analysis progress
– CopyLog.txt: copy progress
– ErrorLog.txt: generated only in case of analysis error
Run completion: files generated at the end of analysis and data transfer (for PE runs, files are generated for each read):
– ImageAnalysis_Netcopy_complete.txt
– Basecalling_Netcopy_complete.txt
– Image_Netcopy_complete.txt
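One way to check for the completion files listed above is sketched here. The folder that holds these files and the per-read naming for PE runs are not given in the slides, so the caller supplies the folder, and the helper names are hypothetical.

```python
import os

# Marker file names as listed above.
COMPLETION_MARKERS = (
    "ImageAnalysis_Netcopy_complete.txt",
    "Basecalling_Netcopy_complete.txt",
    "Image_Netcopy_complete.txt",
)


def missing_markers(present_names, expected=COMPLETION_MARKERS):
    """Return the expected marker files that are not in present_names."""
    present = set(present_names)
    return [name for name in expected if name not in present]


def transfer_is_complete(folder, expected=COMPLETION_MARKERS):
    """True if every expected marker file exists in folder (assumed location)."""
    return not missing_markers(os.listdir(folder), expected)
```

If images are not copied, the image marker may never appear; in that case pass a shorter `expected` tuple (an assumption, not stated in the slides).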
Estimated final output folder
Estimates based on cluster density of ~440K clusters/mm2 (220k/tile)
RTA output (Gb) | 37 cycles | 51 cycles | 76 cycles | 37 cycles with images
Total SR run output, GA2 | 155 | 205 | 295 | 1000
Total SR run output, GA2x | 185 | 250 | 355 | 1200
Total PE run output, GA2 | 305 | 405 | 585 | 1900
Total PE run output, GA2x | 370 | 485 | 700 | 2300
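The estimates above can be turned into a simple pre-run space check. This is a sketch only: the numbers are copied from the table, the units are whatever the table uses, and the function names are illustrative.

```python
# Estimated output folder sizes, copied from the table above.
SIZES = {
    ("SR", "GA2"): {37: 155, 51: 205, 76: 295},
    ("SR", "GA2x"): {37: 185, 51: 250, 76: 355},
    ("PE", "GA2"): {37: 305, 51: 405, 76: 585},
    ("PE", "GA2x"): {37: 370, 51: 485, 76: 700},
}
# The table gives a "with images" estimate for 37 cycles only.
SIZES_WITH_IMAGES_37 = {
    ("SR", "GA2"): 1000,
    ("SR", "GA2x"): 1200,
    ("PE", "GA2"): 1900,
    ("PE", "GA2x"): 2300,
}


def estimated_output(run_type, instrument, cycles, with_images=False):
    """Look up the estimated output size for a run configuration."""
    key = (run_type, instrument)
    if with_images:
        if cycles != 37:
            raise ValueError("with-images estimate is listed for 37 cycles only")
        return SIZES_WITH_IMAGES_37[key]
    return SIZES[key][cycles]


def fits(free_space, run_type, instrument, cycles, with_images=False):
    """True if free_space (same units as the table) covers the estimate."""
    return free_space >= estimated_output(run_type, instrument, cycles, with_images)
```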
Run Folders Structure
[Diagram: run folder structure on the Genome Analyzer computer and on the Pipeline side; data transfer from Queued File by RTA.]
Image analysis steps
RTA uses same analysis algorithms as PL 1.4
Image pre-processing
Cluster detection (template generation)
– Finds positions for each cluster
– Uses first two cycles of images
– Re-analyzes cycles 1 and 2 once the template is generated
– RTA falls behind image acquisition for the first cycles but catches up around cycle 5
Image registration
– Aligns the template of cluster positions to an image
Intensity extraction
– Determines intensity for each cluster
Base calling steps
Uses same base calling and filtering algorithms as PL 1.4
Matrix generation
– Calculated from cycle 2 intensities for each tile
– Run matrix is the median of all tile matrices
Phasing/pre-phasing estimation
– Calculated from matrix-corrected intensities of cycles 2-12 for each tile
– Run phasing (pre-phasing) is the median of all tile phasing (pre-phasing) values
Base calling
– Called base (A, C, G or T) is the one with the maximum corrected intensity for a given cycle
– Corrected intensity is the value obtained after applying matrix and phasing/pre-phasing corrections to the raw intensity
Filtering
– Chastity >= 0.6 for all but one of the first 25 cycles
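A minimal sketch of the base calling and filtering rules above. The slides do not define chastity; the definition used here (brightest intensity divided by the sum of the two brightest) is the commonly documented one and should be read as an assumption, as should reading "all but one" as "at most one failing cycle".

```python
BASES = "ACGT"


def call_base(corrected):
    """Return the base with the maximum corrected intensity.

    corrected holds four values in A, C, G, T order.
    """
    return BASES[max(range(4), key=lambda i: corrected[i])]


def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest (assumed)."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cycles, threshold=0.6, n_cycles=25, allowed_failures=1):
    """True if chastity >= threshold in all but at most one of the first cycles."""
    failures = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failures <= allowed_failures
```

Here `cycles` is a list of per-cycle intensity quadruplets for one cluster; only the first 25 entries are examined.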
Base calling Assumptions
Sample nucleotide base composition is unbiased
When using unbalanced samples
– Use control lane
– Turn base calling off (optional, for run efficiency)
– Start pipeline analysis with intensities
Examples
– Bisulfite sequencing
– ChIP seq
– small RNA
– others
Quality scoring details
Uses a set of predictors and look-up tables
– Some predictors require data from multiple cycles ahead of the current one
– Therefore quality scoring will lag behind image analysis and base calling
This is the last step to complete
RTA will continue to process after imaging is complete on the last cycle
– <4 hours post run (2 x 75bp)
Real time Metrics
Status.htm - generated as soon as analysis starts
Can be viewed on instrument PC (during run) and remotely from server
Visualization of analysis progress
– Image analysis: color coding on schematic flow cell/lane/tile display
– Base calling and quality scoring: cycle numbers on schematic
[Figure: flow cell schematic showing Base call and Quality Score cycle numbers.]
Real time Metrics
Box-plot graphs
– Cluster density per lane
– Intensity per cycle per channel (90th percentile)
– Focus quality per cycle per channel
– Box-plot deciphered
  Red line: median
  Box: interquartile range (middle 50% of the data)
  Error bars: min and max for the metric
  Outliers: more than 1.5 x IQR (interquartile range) below/above the box
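The box-plot elements above can be computed as in the following sketch. It assumes the usual convention that outliers lie more than 1.5 x IQR beyond the quartiles and that the error bars span the remaining (non-outlier) values; the quartile method is also an assumption, since Status.htm's exact calculation is not described.

```python
import statistics


def box_plot_stats(values):
    """Median, quartiles, whisker ends and outliers for one box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }
```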
Note: As the run progresses Status.htm will display Intensity and FQ for every N-th cycle (instead of every cycle)
Data transfer and error handling
Data transfer
– RTA has built-in data transfer functionality
– Data transfer does not interfere with data analysis
  Runs on a separate thread with lowest priority
  Speed and efficiency of data transfer are highly dependent on network performance (may lag behind due to a slow network)
  We recommend a 1Gb connection (no change from previous configuration)
Error handling
– In case of failure, RTA
  Will auto-restart on the next cycle of imaging
  When RTA restarts, it will continue from where it left off
– In case of an exception preventing analysis completion, RTA will retry
  An unsuccessful retry will not stop analysis; it will generate the output for the problematic step with blank values
– All errors are logged in ErrorLog.txt, saved in the Data folder
  Generated only if errors occur
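The retry behaviour described above (retry on exception, fall back to blank output, log every error) follows a common pattern, sketched here. This is an illustration of the pattern, not RTA's code; `run_step`, the retry count and the in-memory log are all assumptions.

```python
def run_step(step, retries=1, blank=None, log=None):
    """Run step(); retry on exception; return blank if every attempt fails."""
    errors = log if log is not None else []
    for attempt in range(1 + retries):
        try:
            return step()
        except Exception as exc:
            # Every failure is recorded, in the spirit of ErrorLog.txt.
            errors.append(f"attempt {attempt + 1}: {exc}")
    # An unsuccessful retry does not stop the analysis: emit blank output.
    return blank
```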
RTA Configuration
Run Parameters for RTA:
Configured in Run Parameters window in SCS
Saved in XML configuration file (RTA.exe.config.xml) as key-value pairs
Default values for Run Parameters:
Copy Images flag (defaults to false)
Call Bases (defaults to true)
RunBrowser file generation, .bro files (defaults to false)
– Needed for run troubleshooting
– “On” increases time to data analysis completion
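A sketch of reading key-value pairs from an XML configuration file. The layout and the key names below (`CopyImages`, `CallBases`, `CreateRunBrowserFiles`) are hypothetical, chosen only to mirror the three defaults listed above; check the real RTA.exe.config.xml for the actual element and key names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's layout and key names may differ.
SAMPLE = """<configuration>
  <appSettings>
    <add key="CopyImages" value="false" />
    <add key="CallBases" value="true" />
    <add key="CreateRunBrowserFiles" value="false" />
  </appSettings>
</configuration>"""


def read_settings(xml_text):
    """Collect key/value attributes from every <add> element."""
    root = ET.fromstring(xml_text)
    return {e.get("key"): e.get("value") for e in root.iter("add")}
```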
RTA Practical considerations
RTA should complete processing <4 hrs post run
– Not linear: 2 x 75 takes longest
– While RTA finishes processing, only wash recipes can be performed
RTA and PE handling of Read 1 and 2
– For Read 1: ideally let RTA complete, then start Read 2
– Or kill RTA after chemistry incorporation; it will resume after Read 2 is complete
Sample nucleotide composition
– Phasing/pre-phasing analyzes from tile to tile
– Control lane specification not possible in RTA currently
First run
– Offsets not set upon first run
– Therefore recommend using <150k clusters/tile on first run
RTA Practical considerations
First run: offsets location
– c:\Documents and Settings\All Users\Application Data\Illumina\Illumina RTA\GANAME
PL analysis of RTA data
Pipeline versions 1.4 and later are compatible with RTA output
Alignment of RTA base calls using Illumina PL (User Guide Ch 5)
– Generate the GERALD makefile by invoking GERALD.pl:
  /<PL1.4path>/bin/GERALD.pl --EXPT_DIR=/<RunFolder>/Data/Intensities/BaseCalls/ config.txt --FORCE
– Execute the makefile: run make all OR make -jN all from the GERALD analysis directory
Base calling RTA image analysis data using Illumina PL (User Guide Ch 4)
– Generate base calling makefiles by invoking the bustard.py script (must use the CIF switch):
  /<PL1.4path>/bin/bustard.py --CIF /<RunFolder>/Data/Intensities/ --GERALD=config.txt --make
– Execute the makefile in the Bustard directory by running make all (for base calling only) OR make recursive OR make -jN recursive
All standard pipeline parameters are available for use
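The two invocations above can be assembled programmatically, for example when scripting analysis of several run folders. This sketch only builds argument lists that mirror the commands as written on this slide; the function names and example paths are illustrative, and nothing is executed.

```python
def gerald_command(pipeline_path, run_folder, config="config.txt"):
    """Argument list for generating the GERALD makefile."""
    return [
        f"{pipeline_path}/bin/GERALD.pl",
        f"--EXPT_DIR={run_folder}/Data/Intensities/BaseCalls/",
        config,
        "--FORCE",
    ]


def bustard_command(pipeline_path, run_folder, config="config.txt"):
    """Argument list for generating base calling makefiles from CIF files."""
    return [
        f"{pipeline_path}/bin/bustard.py",
        "--CIF",
        f"{run_folder}/Data/Intensities/",
        f"--GERALD={config}",
        "--make",
    ]


def make_command(target="all", jobs=None):
    """Argument list for make; jobs=N adds the -jN parallel option."""
    args = ["make"]
    if jobs is not None:
        args.append(f"-j{jobs}")
    args.append(target)
    return args
```

The lists can be passed to `subprocess.run(..., cwd=...)` from the appropriate analysis directory once the paths have been checked.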
What can I do with my IPAR hardware?
Convert it into a Pipeline/CASAVA Analysis Server
– IPAR has very similar specifications to the HP Pipeline Analysis Server that Illumina resells
Use it for off-network/post-run alignment
– IPAR’s system specifications are perfect for memory intensive processes
– Minimize network traffic by moving and archiving only aligned data
Use it for something completely different
– IPAR is a general purpose, high performance server
  8 x 3.16 GHz cores
  16GB RAM
  3.4 TB usable storage (in RAID 6 configuration)
  Windows XP installed
Additional information and details
RTA Theory Of Operations
Genome Analyzer User Guide
Pipeline and CASAVA User Guide
Pipeline and CASAVA Quick Reference
Questions?
Appendix
FAQ
Networking
What if my network goes down?
– Temporary network outages will not cause a run to fail.
– The run is only affected when disk space becomes limiting (<1Gb). If this happens, the flow cell will be placed in a safe state and the run will stop until disk space is available.
Have the specifications for network connections changed?
– No, we recommend that the GA have a 1Gb connection to the network folder destination. Slower connections will increase network transfer times. Depending upon the length of the sequencing read, this may or may not affect data processing.
Changing Defaults
Can I turn base calling off?
– Yes, base calling can be turned off by changing the SetPhoenixArgs.bat file in the Event Scripts folder.
FAQ
Running RTA
Can I start another run while RTA is finishing its processing?
– No, RTA will not allow a second instance to run. However, maintenance recipes such as post wash may be run.
What happens if I accidentally close RTA during a run?
– During the next cycle of imaging, RTA will automatically resume. Do not manually restart RTA from the Start menu.
Data analysis
I like to analyze the data in Pipeline before a run is complete; is this still possible?
– Yes, however you will have to verify that the files you wish to analyze have been transferred to the Pipeline server.
FAQ
Setting up a run: disk space required
What are the folder output sizes?
RTA output (Gb) | 37 cycles | 51 cycles | 76 cycles | 37 cycles with images
Total SR run output, GA2 | 155 | 205 | 295 | 1000
Total SR run output, GA2x | 185 | 250 | 355 | 1200
Total PE run output, GA2 | 305 | 405 | 585 | 1900
Total PE run output, GA2x | 370 | 485 | 700 | 2300
Auxiliary outputs
Remain on the instrument; have to be deleted prior to starting the next run
Located in \\RunName\Processed\(L001...L008)\CN.1
– *.bcl: binary output from cycle base calling
– *.dif: binary file containing cycle matrix and phasing corrected intensities
Located in \\RunName\Data:
– RunBrowser\*bro.xml files: for post-run analysis using the RunBrowser application
– This folder is transferred to the network by default
Auxiliary outputs
File | Extension | Location | Contains | Specificity | Created | Deleted
Intensity file | .cif | cycle directory | Raw intensity for each cluster in all four channels | tile-cycle | Every cycle at extraction | After base calling
Final output file | .qseq | lane directory | Final output of RTA. A text file that includes base call and quality score | tile | At the end of processing | End of processing
Log | Log.txt | Data directory | Log | flowcell | Throughout | End of processing
Error log | ErrorLog.txt | Data directory | Errors | flowcell | Whenever an error is thrown | End of processing
Status Summary | Status.html | Data directory | Processing status and real time metric charts | flowcell | Updated throughout | End of processing
Offsets summary | Offsets.txt | Data\Intensities\Offsets | Offsets for every tile, every channel, every cycle | flowcell | Updated throughout | End of processing
Phasing | *_phasing.txt | Data\Intensities\BaseCalls\Phasing | Phasing by tile, aggregated by lane, or aggregated across the flowcell | tile or lane or flowcell | Cycle 12 | End of processing
Phasing Correction | *_phasingCorrection.xml | Data\Intensities\BaseCalls\Phasing | File indicating the size of the phase window | tile | Throughout | End of processing
Color Matrix | *_matrix.txt | Data\Intensities\BaseCalls\Matrix | Matrix by tile, aggregated by lane, or aggregated across the flowcell | tile or lane or flowcell | Cycle 2 | End of processing
Positions file | *_pos.txt | Data\Intensities | Same as a .locs file, only in text format. Used for off-line analysis | tile | Cycle 2 | As soon as transferred (cycle 2)
Real-time metric file | .bro | Data\goldcrest\RunBrowser | An xml file with real time metric statistics at the image level | lane-cycle | (optional) At intensity extraction | End of processing
Result files transferred
Auxiliary outputs
File | Extension | Location | Contains | Specificity | Created | Deleted
Image | .tif | cycle directory | Image data | tile-cycle-channel | By SCS at scan time | After registration and extraction
Temporary Reference | .tempref | lane directory | Template cluster positions | tile | Cycle 1 | After cycle 2
Temporary intensity file | .tempints | lane directory | Intensities from each spot in each channel in cycle 1 | tile-channel (cycle 1) | Cycle 1 | After cycle 2
Temporary noise file | .tempnoise | lane directory | Noise for each spot in each channel in cycle 1 | tile-channel (cycle 1) | Cycle 1 | After cycle 2
Temporary spot locs | .templocs | lane directory | Spot locations for each channel in cycle 1 | tile-channel (cycle 1) | Cycle 1 | After cycle 2
Template locations file | .locs | lane directory | Cluster positions for every cluster for the tile, in Cycle 1's A image coordinates | tile | Cycle 2 | End of processing
Noise file | .cnf | cycle directory | Noise value for each cluster in all four channels | tile-cycle | Every cycle at extraction | After base calling
Base call and quality score file | .bcl | cycle directory | Base call and quality score for each cluster, with the quality score encoded in the higher-order 6 bits of each byte | tile-cycle | Every cycle at base calling and then resaved at quality scoring | After generation of qseq file
Corrected intensity file | .dif | cycle directory | Intensity for each cluster after matrix and phasing correction | tile-cycle | Every cycle at base calling | After quality scoring
Quality scoring flag file | .qms | lane directory | Empty file indicating quality scoring has occurred for this cycle | tile-cycle | Every cycle at quality scoring | End of processing
Quality metrics file | .ctr | lane directory | Intermediate file that caches quality metrics used for calculation of quality score | tile | At quality scoring, every 5th cycle | End of processing
Quality metrics file | .fctr | lane directory | Intermediate file that caches quality metrics used for calculation of quality score | tile | Every cycle at quality scoring | End of processing
Cycle 2 offsets file | .offsets | lane directory | Text file storing the offsets for cycle 2. This is used to determine if the DefaultOffsets are valid or need to be replaced | tile | After cycle 2 template building | End of processing
Transfer Request | .trans | Queued directory | Path to the file to be transferred by the background copy thread | file-specific | Throughout | When transfer is complete
Files not transferred
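The .bcl entry above says the quality score sits in the higher-order 6 bits of each byte. A sketch of unpacking one byte follows. The use of the lower 2 bits for the base, the A, C, G, T order, and the treatment of a zero byte as a no-call are not stated in the table; they follow the generally documented BCL layout and should be read as assumptions.

```python
BASES = "ACGT"


def decode_bcl_byte(byte):
    """Split one .bcl byte into (base, quality)."""
    if byte == 0:
        # Assumption: an all-zero byte marks a no-call.
        return ("N", 0)
    quality = byte >> 2        # higher-order 6 bits
    base = BASES[byte & 0b11]  # lower-order 2 bits (assumed A, C, G, T order)
    return (base, quality)
```

For example, a byte built as `(30 << 2) | 2` decodes to base G with quality 30 under these assumptions.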