Phil Ewels [email protected] http://phil.ewels.co.uk
Quality control at scale: MultiQC - Intelligent analysis reporting
Phil Ewels, Måns Magnusson, Max Käller
National Genomics Infrastructure
Fig 1. A typical MultiQC report. The General Statistics table ties together results from multiple programs, allowing you to see how different analysis steps affect the parameters of your data.
multiqc.org
MultiQC Report
A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report generated on 2015-09-24, 09:09 based on data in /Users/philewels/Desktop/MultiQC_testing/star/data
Report location: /Users/philewels/Desktop/MultiQC_testing/star/multiqc_report/multiqc_report.html
Report version: v0.3.0dev
Report sections: General Stats, featureCounts, STAR, Cutadapt, FastQ Screen, FastQC (Sequence Quality Histograms, Per Sequence GC Content, Per Base Sequence Content, Adapter Content)
General Statistics
Sample Name    % Assigned  M Assigned  % Mapped  M Mapped  Trimmed  % Dups  % GC  Length  M Seqs
SRR1067503_1   2.4%        0.9         63.2%     19.3      2.1%     12.9%   44%   35      30.5
SRR1067505_1   7.4%        1.5         79.1%     14.2      3.5%     7.8%    47%   35      18.0
SRR1067510_1   1.1%        0.6         50.6%     17.4      2.0%     11.4%   40%   35      34.3
SRR1067514_1   5.7%        1.9         70.2%     23.6      3.1%     6.6%    44%   35      33.6
SRR1067519_1   3.2%        0.9         81.1%     19.9      2.3%     5.8%    42%   35      24.6
SRR1067522_1   1.4%        0.7         61.8%     22.0      1.5%     13.3%   40%   36      35.7
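The way the General Statistics table ties per-tool results together can be sketched in Python. This is only an illustration under stated assumptions, not MultiQC's internal code: the dictionaries `star_stats` and `fastqc_stats` and the helper `merge_by_sample` are hypothetical names, the assignment of columns to tools is assumed, and the numbers are copied from the first two rows of the report table.

```python
# Hypothetical per-tool results keyed by sample name.
# Which tool supplies which column is an assumption made for this sketch.
star_stats = {
    "SRR1067503_1": {"% Mapped": 63.2, "M Mapped": 19.3},
    "SRR1067505_1": {"% Mapped": 79.1, "M Mapped": 14.2},
}
fastqc_stats = {
    "SRR1067503_1": {"% Dups": 12.9, "% GC": 44, "M Seqs": 30.5},
    "SRR1067505_1": {"% Dups": 7.8, "% GC": 47, "M Seqs": 18.0},
}


def merge_by_sample(*tool_results):
    """Combine several {sample: {metric: value}} dicts into one row per sample."""
    merged = {}
    for results in tool_results:
        for sample, metrics in results.items():
            # Create the row on first sight, then add this tool's columns.
            merged.setdefault(sample, {}).update(metrics)
    return merged


general_stats = merge_by_sample(star_stats, fastqc_stats)
for sample, row in sorted(general_stats.items()):
    print(sample, row)
```

Having the columns side by side makes cross-checks easy: for SRR1067503_1, 63.2% of 30.5 M sequences is about 19.3 M, which agrees with the M Mapped column.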
[Plot: Preseq complexity curve. x-axis: Total Molecules (including duplicates), 0M to 25M; y-axis: Unique Molecules, 0M to 10M.]
Fig 4. Click to zoom on interactive plots. Export publication-ready graphics.
Fig 3. Quickly visualise analysis results produced from a range of supported analysis tools.
[Plot: STAR Alignment Scores. x-axis: # Reads; categories: Uniquely mapped, Mapped to multiple loci, Mapped to too many loci, Unmapped: too short, Unmapped: other; samples: SRR1067503_1, SRR1067505_1, SRR1067510_1, SRR1067514_1, SRR1067519_1, SRR1067522_1.]
Fig 2. Overlay data from multiple sources, allowing direct comparison between samples.
[Plot: Per Sequence GC Content. x-axis: %GC, 0 to 100; y-axis: Count, 0M to 5M; tooltip shown: SRR106503_1_trimmed, 58% GC: 1 462 329.]
Fig 5. Highlight samples using pattern matching, with regex support. Rename and hide samples and save your config in the toolbox.
MultiQC Toolbox
Highlight Samples (custom patterns, regex mode on):
2301.*_R1$
2301.*_R2$
2851.*_R1$
2851.*_R2$
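To show what the four highlight patterns select, here is a minimal Python sketch. It is not the toolbox's own code (the report runs in the browser), and it assumes the patterns are matched anywhere in the sample name, which `re.search` models. The sample names and the helper `highlight_groups` are invented for illustration.

```python
import re

# Patterns exactly as shown in the toolbox (regex mode on).
patterns = [r"2301.*_R1$", r"2301.*_R2$", r"2851.*_R1$", r"2851.*_R2$"]

# Hypothetical sample names, invented for this example.
samples = ["2301_A_R1", "2301_A_R2", "2851_B_R1", "2851_B_R2", "9999_C_R1"]


def highlight_groups(names, regexes):
    """Map each pattern to the list of sample names it matches."""
    return {rx: [n for n in names if re.search(rx, n)] for rx in regexes}


groups = highlight_groups(samples, patterns)
for rx, matched in groups.items():
    print(rx, "->", matched)
```

Each pattern picks out one read direction (`_R1` or `_R2`) for one sample prefix, so four patterns give four separately coloured groups, and a name such as `9999_C_R1` stays unhighlighted.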
[Plot: Genome Fraction Coverage. x-axis: Coverage (X), 0 to 50; y-axis: Fraction of reference (%), 0 to 100.]
Installation & Usage »
pip install multiqc
cd my_data
multiqc .
Compatible with Python 2 and 3.
MultiQC recursively searches files within target directories and builds a report from any log files that it recognises.
The report opens in any web browser and has lots of tools to help interpretation.
MultiQC: Aggregate results from bioinformatics analysis across many samples into a single report.
More Info & Examples » http://multiqc.info
MultiQC is built using a plugin framework that makes customisation and extension simple.
Typical Use Cases »
Routine QC after running any analysis
Final step in processing pipelines
Comparing different tools / parameters
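The recursive search described under Installation & Usage can be illustrated with a short Python sketch. It is a simplified stand-in, not MultiQC's implementation: the poster does not show how files are recognised, so the `KNOWN_LOG_PATTERNS` table and the `find_logs` helper are assumptions made for this example.

```python
import fnmatch
import os

# Hypothetical filename patterns per tool, assumed for illustration only.
KNOWN_LOG_PATTERNS = {
    "STAR": "*Log.final.out",
    "FastQC": "*fastqc_data.txt",
}


def find_logs(root, patterns=KNOWN_LOG_PATTERNS):
    """Walk `root` recursively and group recognised files by tool name."""
    found = {tool: [] for tool in patterns}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            for tool, pattern in patterns.items():
                if fnmatch.fnmatch(name, pattern):
                    found[tool].append(os.path.join(dirpath, name))
    return found


if __name__ == "__main__":
    # Scan the current directory, mirroring `multiqc .`
    for tool, paths in find_logs(".").items():
        print(tool, len(paths), "file(s)")
```

Files that match no pattern are skipped, which is why the search can be pointed at a whole project directory without any per-sample configuration.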