Era7 Bioinformatics full-length 16S taxonomic profiling with PacBio CCS reads and MG7 - DB7

28
August 2016 www.era7bioinformatics.com Era7 Bioinformatics’ Full-length 16S taxonomic profiling with PacBio

Upload: era7-information-technologies-slu

Post on 14-Jan-2017



Sequencing 16S ribosomal RNA variable regions to study bacterial diversity

• 16S ribosomal RNA (16S rRNA) is a component of the 30S small subunit of prokaryotic ribosomes. The genes coding for it are referred to as 16S rRNA genes and are used for taxonomic classification and for reconstructing phylogenies.

• NGS-based 16S rRNA sequencing is a culture-free technique for profiling the whole microbial community within a sample.

16S studies are widely applicable, but both the experimental assay and the bioinformatics analysis are complex. To get better results, the project should be designed as a whole, taking every aspect into account.


Important points in 16S profiling: The sequencing coverage

Detecting even low-abundance bacteria requires sufficient sequencing depth (coverage). NGS technologies make this kind of analysis possible because they provide higher throughput at lower cost.
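As a rough illustration of why depth matters, the sketch below estimates the chance of seeing at least one read from a taxon at a given relative abundance. It assumes reads are drawn independently and at random from the community, which real libraries only approximate (amplification bias, 16S copy-number variation). The function names are ours for illustration and are not part of any Era7 tool.

```python
import math


def detection_probability(relative_abundance: float, n_reads: int) -> float:
    """Chance of sampling at least one read from a taxon, assuming independent random draws."""
    return 1.0 - (1.0 - relative_abundance) ** n_reads


def reads_needed(relative_abundance: float, confidence: float) -> int:
    """Smallest read count giving at least `confidence` chance of one or more reads from the taxon."""
    if not (0.0 < relative_abundance < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - relative_abundance))


if __name__ == "__main__":
    # Hypothetical target: a taxon at 0.1% relative abundance, seen with 95% probability.
    print(reads_needed(0.001, 0.95))
```

Seeing a single read is a lower bound; a confident assignment or a stable abundance estimate would need correspondingly more reads per sample.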


Important points in 16S profiling: The read length

The longer the read, the more precise the taxonomic assignment. Assignment at the species level requires species-specific sequences that unequivocally identify each species. Longer sequences allow assignment to more specific taxonomic ranks.


Important points in 16S profiling: The error rate of the sequences

Sequence variation in the 16S variable regions is subtle, so sequencing errors can cause misassignments.


Full-length 16S taxonomic profiling with PacBio

The bacterial 16S rRNA gene is a complex gene with 9 variable regions. The usual NGS approaches to 16S analysis sequence one or two of these variable regions using short-read technologies. Hence, with short reads only 1 or 2 of the 9 variable regions are screened to distinguish the taxa present in a sample.


100% of the hypervariable regions are analyzed using PacBio reads

With PacBio long reads, each read contains the full 16S gene sequence. This gives significantly higher specificity and resolution for taxonomic assignments, because they are based on differences across the full gene sequence.

• Long reads: with PacBio, read length reaches the maximum needed, because each read covers the full 16S gene.

• Multiplexing: thanks to PacBio multiplexing capabilities, you can choose the coverage that fits your objective.

• High-quality sequences using CCS: 16S analysis with PacBio is based on Circular Consensus Sequences (CCS), which give a final sequence quality of around 99.9%.
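To put the 99.9% figure in more familiar terms, the short sketch below converts a per-base accuracy into a Phred quality score and into an expected number of errors per read. The 1,500 bp gene length is our own round-number assumption for a full-length 16S read; it is not a value taken from the slides.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Convert a per-base accuracy (between 0 and 1) to a Phred quality score."""
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(accuracy: float, read_length: int) -> float:
    """Expected number of erroneous bases in a read of the given length."""
    return (1.0 - accuracy) * read_length


if __name__ == "__main__":
    ASSUMED_16S_LENGTH = 1500  # approximate length in bp; an assumption for illustration
    print(round(phred_quality(0.999)))
    print(expected_errors(0.999, ASSUMED_16S_LENGTH))
```

Under these assumptions, 99.9% accuracy corresponds to roughly Q30 and to between one and two expected errors across a full-length read.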


SERVICE MG7 for full-length 16S taxonomic profiling with PacBio

by Era7 Bioinformatics INC


MG7 Bioinformatics analysis for 16S PacBio sequences

MG7 is a complete analysis tool developed by Era7 Bioinformatics that provides taxonomic assignment results for large sets of sequences. The MG7 analysis pipelines are continuously updated with the newest approaches.


Our rRNA reference database DB7

We have built our reference database DB7 of 16S and 18S sequences from the complete RNAcentral release 5. RNAcentral is a general database for all types of non-coding RNA, maintained by the RNAcentral Consortium.


Our taxonomic assignment algorithm is exhaustive

We compare each read against all the sequences in our DB7 database. The taxonomic assignment for each read is based on the BLASTN results of that read against DB7.
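A minimal sketch of how per-read BLASTN results might be collected before assignment. It assumes BLASTN was run with the default 12-column tabular output (`-outfmt 6`); the slides do not say which output format MG7 uses, and the `Hit` type and function name below are ours.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    read_id: str
    reference_id: str
    percent_identity: float
    bitscore: float


def group_hits_by_read(lines: Iterable[str]) -> Dict[str, List[Hit]]:
    """Parse default 12-column BLAST tabular lines and collect every hit under its read."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hit = Hit(fields[0], fields[1], float(fields[2]), float(fields[11]))
        hits[hit.read_id].append(hit)
    return dict(hits)


if __name__ == "__main__":
    # Made-up example lines; the identifiers are placeholders, not real DB7 entries.
    example = [
        "read1\trefA\t99.5\t1450\t7\t0\t1\t1450\t1\t1450\t0.0\t2600",
        "read1\trefB\t99.3\t1450\t10\t0\t1\t1450\t1\t1450\t0.0\t2590",
    ]
    for read_id, read_hits in group_hits_by_read(example).items():
        print(read_id, [h.reference_id for h in read_hits])
```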


Our algorithm for taxonomic assignment provides results for two different assignment approaches:

• Best Blast Hit (BBH)

• Lowest Common Ancestor (LCA)
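The difference between the two approaches can be sketched as follows. This is a simplified illustration, not MG7's implementation: the lineages are a hypothetical, shortened toy taxonomy, every hit is treated as equally admissible for LCA (a real pipeline would likely filter hits by score or identity first), and the function names are ours.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (reference sequence id, BLAST bitscore)

# Hypothetical, simplified lineages (root first); real ones would come from a taxonomy database.
LINEAGES: Dict[str, Tuple[str, ...]] = {
    "refA": ("Bacteria", "Firmicutes", "Lactobacillus", "Lactobacillus casei"),
    "refB": ("Bacteria", "Firmicutes", "Lactobacillus", "Lactobacillus paracasei"),
    "refC": ("Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"),
}


def best_blast_hit(hits: List[Hit]) -> str:
    """Assign the read to the taxon of its single highest-scoring hit."""
    reference_id, _ = max(hits, key=lambda hit: hit[1])
    return LINEAGES[reference_id][-1]


def lowest_common_ancestor(hits: List[Hit]) -> str:
    """Assign the read to the deepest taxon shared by the lineages of all its hits."""
    lineages = [LINEAGES[reference_id] for reference_id, _ in hits]
    shared = "root"
    for taxa_at_depth in zip(*lineages):
        if len(set(taxa_at_depth)) != 1:
            break  # lineages diverge at this depth
        shared = taxa_at_depth[0]
    return shared


if __name__ == "__main__":
    hits = [("refA", 2600.0), ("refB", 2590.0)]
    print(best_blast_hit(hits))
    print(lowest_common_ancestor(hits))
```

With two near-identical hits to different species of the same genus, BBH reports the species of the top hit, while LCA backs off to the genus they share.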


DELIVERABLES of MG7 Bioinformatics analysis for 16S PacBio sequences:

MG7 provides a rich set of deliverables, with 4 types of abundance values for each of the 2 approaches (BBH and LCA), to evaluate the frequencies of bacterial and archaeal organisms:

• direct and cumulative abundance values
• absolute counts and abundance percentages

Best BLAST Hit (BBH):
• Direct Assignment, Absolute Values
• Direct Assignment, Percentage Values
• Cumulative Assignment, Absolute Values
• Cumulative Assignment, Percentage Values

Lowest Common Ancestor Algorithm (LCA):
• Direct Assignment, Absolute Values
• Direct Assignment, Percentage Values
• Cumulative Assignment, Absolute Values
• Cumulative Assignment, Percentage Values
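The four value types can be illustrated with a small sketch. We read "direct" as the reads assigned exactly to a taxon and "cumulative" as those reads plus the reads assigned to any of its descendants; that reading, the toy taxonomy, and the function names are our assumptions rather than MG7's documented definitions.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical toy taxonomy: child -> parent (None marks the top of the tree).
PARENT: Dict[str, Optional[str]] = {
    "Bacteria": None,
    "Firmicutes": "Bacteria",
    "Lactobacillus": "Firmicutes",
    "Lactobacillus casei": "Lactobacillus",
}


def direct_counts(assignments: Iterable[str]) -> Counter:
    """Absolute number of reads assigned exactly to each taxon."""
    return Counter(assignments)


def cumulative_counts(direct: Counter) -> Counter:
    """Absolute number of reads assigned to each taxon or to any of its descendants."""
    cumulative: Counter = Counter()
    for taxon, n_reads in direct.items():
        node: Optional[str] = taxon
        while node is not None:  # add the reads to the taxon and every ancestor
            cumulative[node] += n_reads
            node = PARENT[node]
    return cumulative


def percentages(counts: Counter, total_reads: int) -> Dict[str, float]:
    """Express absolute counts as a percentage of all assigned reads."""
    return {taxon: 100.0 * n / total_reads for taxon, n in counts.items()}


if __name__ == "__main__":
    reads = ["Lactobacillus casei"] * 3 + ["Lactobacillus", "Firmicutes"]
    direct = direct_counts(reads)
    cumulative = cumulative_counts(direct)
    print(direct, cumulative, percentages(cumulative, len(reads)))
```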


MG7 provides a rich set of deliverables, including tables per sample, per group of samples, global, and per rank:

• Abundance tables per sample
• All the ranks in a complete table
• Abundances for each rank
• Abundance tables for each defined group of samples
• Abundance tables for all the samples together


MG7 provides a rich set of deliverables, including analysis of diversity indexes

The Shannon-Wiener and Simpson’s diversity indexes are calculated for each sample.
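For reference, both indexes can be computed from a sample's per-taxon read counts as in the sketch below. The slides do not say which variants MG7 reports, so the choices here are assumptions: natural logarithm for Shannon-Wiener, and the 1 - sum(p_i^2) form for Simpson.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts: Sequence[int]) -> float:
    """Simpson's diversity in its 1 - sum(p_i^2) form (higher means more diverse)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    sample_counts = [50, 30, 15, 5]  # made-up read counts for four taxa
    print(shannon_index(sample_counts), simpson_index(sample_counts))
```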


Comparison of groups of samples

We provide statistical analysis of the differences between groups of samples, using open-source tools based on R packages from CRAN (The Comprehensive R Archive Network). In each case we apply the most appropriate approach.


Some types of statistical analysis provided for the comparison of groups of samples:

• Univariate statistics (fold change analysis, t-tests, volcano plots, one-way ANOVA, correlation analysis)
• Multivariate statistics (principal component analysis, partial least squares discriminant analysis)
• Clustering (dendrograms, heatmaps, K-means clustering, self-organizing feature maps)
• Supervised classification (random forests, support vector machines)
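As a small illustration of the univariate end of this list, the sketch below computes a log2 fold change and Welch's t statistic for one taxon's abundance in two groups of samples. The service itself is described as using R tools; this Python version is only a standard-library sketch with made-up numbers, and the function names are ours.

```python
import math
import statistics
from typing import Sequence


def log2_fold_change(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """log2 of mean(group_b) / mean(group_a); both means must be positive."""
    return math.log2(statistics.mean(group_b) / statistics.mean(group_a))


def welch_t_statistic(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic (unequal variances) for the difference mean(b) - mean(a)."""
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error


if __name__ == "__main__":
    # Made-up abundance percentages of one taxon in two groups of three samples.
    control = [10.0, 12.0, 14.0]
    treated = [20.0, 24.0, 28.0]
    print(log2_fold_change(control, treated), welch_t_statistic(control, treated))
```

A volcano plot combines these two quantities per taxon, plotting fold change against the significance derived from the test.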


Charts and Reports that MG7 provides

• Different types of charts, with the option of interactive visualizations (see our research project BIOGRAPHIKA on interactive visualizations)
• Complete results in compliant formats
• Technical reports ready for scientific publication


MG7 workflow for 16S PacBio

• Our reference database DB7
• Exhaustive taxonomic assignment for each read
• Two different taxonomic assignment approaches, Best Blast Hit (BBH) and Lowest Common Ancestor (LCA)
• A complete set of deliverables


Full-length 16S taxonomic profiling with PacBio and Era7 Bioinformatics MG7

• 100% of the hypervariable regions
• Obtaining 16S full gene sequences
• More specific taxonomic assignments


You can order now! Full-length 16S taxonomic profiling with PacBio and MG7 Bioinformatics Analysis Service

[email protected] www.era7bioinformatics.com
