quality control at scale: multiqc - intelligent analysis

1
Phil Ewels [email protected] http://phil.ewels.co.uk Quality control at scale: MultiQC - Intelligent analysis reporting Phil Ewels, Måns Magnusson, Max Käller INFR S RU URE A T CT ENOMI S G C ATCA GT N ION L GA CTAC AT A Fig 1. A typical MultiQC report. The General Statistics table ties together results from multiple programs, allowing you to see how different analysis steps affect the parameters of your data. multiqc.org multiqc.org A modular tool to aggregate results from bioinformatics analyses across many samples into a single report. Report generated on 2015-09-24, 09:09 based on data in /Users/philewels/Desktop/MultiQC_testing/star/data Report location: /Users/philewels/Desktop/MultiQC_testing/star/multiqc_report/multiqc_report.html General Statistics Show Key Sample Name % Assigned M Assigned % Mapped M Mapped Trimmed % Dups % GC Length M Seqs || SRR1067503_1 || SRR1067505_1 || SRR1067510_1 || SRR1067514_1 || SRR1067519_1 || SRR1067522_1 v0.3.0dev General Stats featureCounts STAR Cutadapt FastQ Screen FastQC Sequence Quality Histograms Per Sequence GC Content Per Base Sequence Content Adapter Content 2.4% 0.9 63.2% 19.3 2.1% 12.9% 44% 35 30.5 7.4% 1.5 79.1% 14.2 3.5% 7.8% 47% 35 18.0 1.1% 0.6 50.6% 17.4 2.0% 11.4% 40% 35 34.3 5.7% 1.9 70.2% 23.6 3.1% 6.6% 44% 35 33.6 3.2% 0.9 81.1% 19.9 2.3% 5.8% 42% 35 24.6 1.4% 0.7 61.8% 22.0 1.5% 13.3% 40% 36 35.7 Toolbox Total Molecules (including duplicates) Unique Molecules Preseq complexity curve 0M 2.5M 5M 7.5M 10M 12.5M 15M 17.5M 20M 22.5M 25M 0M 2M 4M 6M 8M 10M Created with MultiQC Reset zoom Fig 4. Click to zoom on interactive plots Export publication-ready graphics. Fig 3. Quickly visualise analysis results produced from a range of supported analysis tools. # Reads STAR Alignment Scores Uniquely mapped Mapped to multiple loci Mapped to too many loci Unmapped: too short Unmapped: other SRR1067503_1 SRR1067505_1 SRR1067510_1 SRR1067514_1 SRR1067519_1 SRR1067522_1 0M 2.5M 5M 7.5M 10M 12.5M 15M 17.5M 20M 22.5M 25M 27.5M 30M 32.5M 35M 37.… Created with MultiQC Fig 2. Overlay data from multiple sources, allowing direct comparison between samples. %GC Count Per Sequence GC Content 0 10 20 30 40 50 60 70 80 90 100 0M 1M 2M 3M 4M 5M Created with MultiQC SRR106503_1_trimmed 58% GC: 1 462 329 Fig 5. Highlight samples using pattern matching, with regex support Rename and hide samples and save your config in the toolbox. MultiQC Toolbox Highlight Samples Custom Pattern + Regex mode on 2301.*_R1$ 2301.*_R2$ 2851.*_R1$ 2851.*_R2$ Toolbox Created with MultiQC Coverage (X) Fraction of reference (%) Genome Fraction Coverage 0 10 20 30 40 50 0 20 40 60 80 100 pip install multiqc cd my_data multiqc . Py2/3 MultiQC recursively searches files within target directories and builds a report from any log files that it recognises. The report opens in any web browser and has lots of tools to help interpretation. Typical Use Cases » Installation & Usage » MultiQC: Aggregate results from bioinformatics analysis across many samples into a single report. More Info & Examples » http://multiqc.info MultiQC is built using a plugin framework that makes customisation and extension simple. Routine QC after running any analysis Final step in processing pipelines Comapring different tools / parameters

Upload: others

Post on 29-Oct-2021

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Quality control at scale: MultiQC - Intelligent analysis

Phil Ewels [email protected] http://phil.ewels.co.uk

Quality control at scale:MultiQC - Intelligent analysis reporting

Phil Ewels, Måns Magnusson, Max Käller INFR S RU URE A T CT

ENOMI SG CATCA GTN ION LGA CTAC AT A

Fig 1. A typical MultiQC report. The General Statistics table ties together results from multiple programs, allowing you to see how different analysis steps affect the parameters of your data.

multiqc.orgmultiqc.org

MultiQC ReportMultiQC Report

A modular tool to aggregate results from bioinformatics analyses across many samples into a singlereport.

Report generated on 2015-09-24, 09:09 based on data in /Users/philewels/Desktop/MultiQC_testing/star/data

Report location: /Users/philewels/Desktop/MultiQC_testing/star/multiqc_report/multiqc_report.html

General Statistics Show Key

Sample Name % Assigned M Assigned % Mapped M Mapped Trimmed % Dups % GC Length M Seqs

|| SRR1067503_1

|| SRR1067505_1

|| SRR1067510_1

|| SRR1067514_1

|| SRR1067519_1

|| SRR1067522_1

v0.3.0dev

General Stats

featureCounts

STAR

Cutadapt

FastQ Screen

FastQC

Sequence Quality Histograms

Per Sequence GC Content

Per Base Sequence Content

Adapter Content

2.4% 0.9 63.2% 19.3 2.1% 12.9% 44% 35 30.5

7.4% 1.5 79.1% 14.2 3.5% 7.8% 47% 35 18.0

1.1% 0.6 50.6% 17.4 2.0% 11.4% 40% 35 34.3

5.7% 1.9 70.2% 23.6 3.1% 6.6% 44% 35 33.6

3.2% 0.9 81.1% 19.9 2.3% 5.8% 42% 35 24.6

1.4% 0.7 61.8% 22.0 1.5% 13.3% 40% 36 35.7

Tool

box

Total Molecules (including duplicates)

Uni

que

Mol

ecul

es

Preseq complexity curve

0M 2.5M 5M 7.5M 10M 12.5M 15M 17.5M 20M 22.5M 25M0M

2M

4M

6M

8M

10M

Created with MultiQC

Reset zoom

Fig 4. Click to zoom on interactive plotsExport publication-ready graphics.

Fig 3. Quickly visualise analysis results produced from a range of supported analysis tools.

# Reads

STAR Alignment Scores

Uniquely mapped Mapped to multiple loci Mapped to too many loci Unmapped: too short Unmapped: other

SRR1067503_1

SRR1067505_1

SRR1067510_1

SRR1067514_1

SRR1067519_1

SRR1067522_1

0M 2.5M 5M 7.5M 10M 12.5M 15M 17.5M 20M 22.5M 25M 27.5M 30M 32.5M 35M 37.…

Created with MultiQC

Fig 2. Overlay data from multiple sources, allowing direct comparison between samples.

%GC

Cou

nt

Per Sequence GC Content

0 10 20 30 40 50 60 70 80 90 1000M

1M

2M

3M

4M

5M

Created with MultiQC

SRR106503_1_trimmed58% GC: 1 462 329

Fig 5. Highlight samples using pattern matching, with regex supportRename and hide samples and save your config in the toolbox.

MultiQC Toolbox

Highlight SamplesCustom Pattern +

Regex mode on

2301.*_R1$

2301.*_R2$

2851.*_R1$

2851.*_R2$

Tool

box

Created with MultiQC

Coverage (X)

Frac

tion

of re

fere

nce

(%)

Genome Fraction Coverage

0 10 20 30 40 500

20

40

60

80

100

pip install multiqccd my_datamultiqc .

Py2/3

MultiQC recursively searches files within target directories and builds a report from any log files that it recognises.

The report opens in any web browser and has lots of tools to help interpretation.

Typical Use Cases»Installation & Usage»

MultiQC: Aggregate results from bioinformatics analysis across many samples into a single report.

More Info & Examples»http://multiqc.info

MultiQC is built using a plugin framework that makes customisation and extension simple.

Routine QC after running any analysis

Final step in processing pipelines

Comapring different tools / parameters