Era7 Bioinformatics: full-length 16S taxonomic profiling with PacBio CCS reads and MG7 / DB7
August 2016 www.era7bioinformatics.com
Era7 Bioinformatics’ Full-length 16S taxonomic profiling with PacBio
Sequencing 16S ribosomal RNA variable regions to study bacterial diversity
• 16S ribosomal RNA (16S rRNA) is a component of the 30S small subunit of prokaryotic ribosomes. The genes coding for it, referred to as 16S rRNA genes, are used for taxonomic classification and for reconstructing phylogenies
• NGS-based 16S rRNA sequencing is a culture-free technique for profiling the microbial community within a sample
16S studies are highly useful and widely applicable, but both the experimental assay and the bioinformatics analysis are complex. To get better results, all of these aspects need to be considered together when designing the project.
Important points in 16S profiling: The sequencing coverage
To detect even minority bacteria, the sequencing must reach sufficient resolution, or coverage. NGS technologies make this kind of analysis possible because they provide higher throughput at lower cost.
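To make the coverage point concrete, here is a minimal sketch that assumes reads are independent random draws from the community (simple binomial sampling, ignoring PCR and primer bias). Under that assumption, a taxon at relative abundance p appears at least once among N reads with probability 1 - (1 - p)^N. The function names are hypothetical and are not part of MG7.

```python
import math


def detection_probability(rel_abundance: float, n_reads: int) -> float:
    """Probability of observing at least one read from a taxon present at
    relative abundance `rel_abundance` among `n_reads` sequenced reads,
    assuming independent sampling (no PCR or primer bias)."""
    if not 0.0 < rel_abundance < 1.0:
        raise ValueError("rel_abundance must be strictly between 0 and 1")
    return 1.0 - (1.0 - rel_abundance) ** n_reads


def reads_needed(rel_abundance: float, confidence: float = 0.95) -> int:
    """Smallest N with detection_probability >= confidence, obtained by
    solving 1 - (1 - p)^N >= confidence for N."""
    if not 0.0 < rel_abundance < 1.0:
        raise ValueError("rel_abundance must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rel_abundance))


if __name__ == "__main__":
    for p in (0.01, 0.001):
        print(f"abundance {p:.1%}: {reads_needed(p)} reads for 95% detection")
```

Under these assumptions a taxon at 0.1% abundance needs 2,995 reads for a 95% chance of being seen at least once; real requirements are typically higher once amplification bias and minimum-count thresholds are taken into account.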
Important points in 16S profiling: The read length
The longer the reads, the more precise the taxonomic assignment. To assign taxonomy at the species level, we need to find unique, species-specific sequences that unequivocally identify the presence of each species. Longer sequences allow taxonomic assignment at more specific taxonomic ranks.
Important points in 16S profiling: The error rate of the sequences
Sequence variations in the 16S variable regions are subtle, so sequencing errors can cause taxonomic misassignments.
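One common way to keep sequencing errors from driving misassignments is to filter reads on their expected number of errors, computed from the per-base Phred quality scores as the sum of 10^(-Q/10). The sketch below assumes FASTQ qualities in the standard Phred+33 encoding; the `max_ee` threshold is a hypothetical choice, and the slides do not say that MG7 uses this particular filter.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Expected number of errors in a read: the sum over bases of the Phred
    error probability 10**(-Q/10), with Q decoded from ASCII (Phred+33)."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_ee_filter(quality: str, max_ee: float = 1.0) -> bool:
    """Keep a read only if its expected errors do not exceed `max_ee`
    (hypothetical default threshold)."""
    return expected_errors(quality) <= max_ee


if __name__ == "__main__":
    q40_read = "I" * 1500  # 'I' encodes Q40 in Phred+33
    q10_read = "+" * 20    # '+' encodes Q10 in Phred+33
    print(round(expected_errors(q40_read), 3), passes_ee_filter(q40_read))
    print(round(expected_errors(q10_read), 3), passes_ee_filter(q10_read))
```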
Full-length 16S taxonomic profiling with PacBio
The bacterial 16S rRNA gene is a complex gene with 9 variable regions (V1 to V9). The usual NGS approaches to 16S analysis are based on sequencing one or two variable regions of the 16S rRNA gene with short-read technologies. Hence, with short reads, at most 2 of the 9 variable regions are screened to distinguish the taxa present in a sample.
100% of the hypervariable regions are analyzed using PacBio reads
Using PacBio long reads, each read contains the sequence of the full 16S gene, which gives significantly higher specificity and resolution for taxonomic assignments based on differences across the full 16S gene sequence.
• Long reads: with PacBio, read length reaches the maximum needed, because each read covers the full 16S gene
• Multiplexing: thanks to PacBio multiplexing capabilities, you can choose the coverage per sample that fits your objectives
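When samples are pooled with barcodes, the reads from one run are shared among them, so the coverage per sample is roughly the run yield divided by the number of samples. The sketch below assumes an even split across barcodes; the run yield and the fraction of reads that survive demultiplexing are hypothetical inputs to be replaced with real figures, not PacBio specifications.

```python
def reads_per_sample(total_reads: int, n_samples: int,
                     usable_fraction: float = 1.0) -> int:
    """Expected reads per sample if `total_reads` CCS reads are split evenly
    among `n_samples` barcoded samples, after keeping only `usable_fraction`
    of them (e.g. reads with a recognized barcode)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return int(total_reads * usable_fraction) // n_samples


def max_samples(total_reads: int, min_reads_per_sample: int,
                usable_fraction: float = 1.0) -> int:
    """Largest number of samples that can share a run while each still gets
    at least `min_reads_per_sample` reads under the even-split assumption."""
    if min_reads_per_sample <= 0:
        raise ValueError("min_reads_per_sample must be positive")
    return int(total_reads * usable_fraction) // min_reads_per_sample


if __name__ == "__main__":
    run_yield = 48_000  # hypothetical number of CCS reads from one run
    print(reads_per_sample(run_yield, 16))     # reads per sample with 16 samples
    print(max_samples(run_yield, 3_000, 0.9))  # samples at >= 3,000 reads each
```

In practice barcode representation is uneven, so it is prudent to plan with a margin above the minimum reads per sample.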
• High-quality sequences using CCS: 16S analysis with PacBio is based on Circular Consensus Sequences (CCS), so the final sequence accuracy is around 99.9%
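The quoted accuracy can be translated into the Phred scale, Q = -10 log10(1 - accuracy), and into an expected number of errors per read. The sketch below assumes errors are spread independently along the read and uses a read length of about 1,500 bp, roughly the length of the 16S gene; both are illustrative assumptions.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Phred quality Q = -10 * log10(error rate), with error rate = 1 - accuracy."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors_per_read(accuracy: float, read_length: int) -> float:
    """Mean number of erroneous bases in a read of `read_length` bases,
    assuming the same per-base error rate at every position."""
    return (1.0 - accuracy) * read_length


if __name__ == "__main__":
    acc = 0.999  # ~99.9% CCS accuracy quoted above
    print(f"Q{phred_from_accuracy(acc):.0f}")
    print(f"{expected_errors_per_read(acc, 1500):.1f} errors per ~1,500 bp read")
```

So 99.9% corresponds to about Q30, or roughly 1.5 erroneous bases per full-length read on average.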
SERVICE MG7 for full-length 16S taxonomic profiling with PacBio
by Era7 Bioinformatics INC
MG7 Bioinformatics analysis for 16S PacBio sequences
MG7 is a complete analysis tool developed by Era7 Bioinformatics to provide taxonomic assignments for large sets of sequences. The MG7 analysis pipelines are continuously updated with the newest approaches.
Our rRNA reference database DB7
We have built our reference database DB7 of 16S and 18S sequences from the complete RNAcentral release 5. RNAcentral is a general database for all types of non-coding RNA, maintained by the RNAcentral Consortium.
Our taxonomic assignment algorithm is exhaustive
We compare each read against all the sequences in our DB7 database. The taxonomic assignment of each read is based on the results of a BLASTN search of that read against DB7.
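The slides do not say how MG7 runs BLASTN or which output it parses. As a minimal sketch, assuming BLASTN is run with the default tabular output (`-outfmt 6`, whose 12 standard columns are listed in the code), the hits can be grouped per read before any assignment rule is applied. The file and database names in the comment are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

# Example invocation (hypothetical file and database names):
#   blastn -query reads.fasta -db db7 -outfmt 6 -out hits.tsv
# Default column order of BLAST+ tabular output (-outfmt 6):
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Dict[str, List[Hit]]:
    """Group tabular BLAST hits by query (read) ID, skipping blank and comment lines."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        hits[rec["qseqid"]].append(Hit(
            subject=rec["sseqid"],
            pident=float(rec["pident"]),
            length=int(rec["length"]),
            evalue=float(rec["evalue"]),
            bitscore=float(rec["bitscore"]),
        ))
    return dict(hits)


if __name__ == "__main__":
    example = [  # toy hit lines for one read
        "read1\tsubjA\t99.6\t1480\t6\t0\t1\t1480\t1\t1480\t0.0\t2700\n",
        "read1\tsubjB\t98.9\t1478\t16\t0\t1\t1478\t3\t1480\t0.0\t2630\n",
    ]
    for read_id, read_hits in parse_outfmt6(example).items():
        print(read_id, [h.subject for h in read_hits])
```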
Our algorithm for taxonomic assignment provides results for two different assignment approaches:
• Best BLAST Hit (BBH)
• Lowest Common Ancestor (LCA)
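The slides name the two approaches but not their exact rules. The sketch below shows one common formulation, assuming each read's hits are (subject ID, bit score) pairs and each reference sequence maps to a root-to-leaf lineage: BBH takes the taxon of the single highest-scoring hit, while LCA keeps every hit scoring within a fraction of the best (the `min_fraction` cut-off is a hypothetical parameter) and reports the deepest taxon shared by all their lineages.

```python
from typing import Dict, List, Sequence, Tuple

Lineage = Tuple[str, ...]  # taxon names from root (domain) to leaf (species)


def best_blast_hit(hits: Sequence[Tuple[str, float]],
                   taxonomy: Dict[str, Lineage]) -> str:
    """BBH: the leaf taxon of the hit with the highest bit score."""
    subject, _ = max(hits, key=lambda hit: hit[1])
    return taxonomy[subject][-1]


def common_prefix(lineages: List[Lineage]) -> Lineage:
    """Longest root-to-leaf prefix shared by all lineages."""
    shared = []
    for names in zip(*lineages):
        if any(name != names[0] for name in names):
            break
        shared.append(names[0])
    return tuple(shared)


def lca_assignment(hits: Sequence[Tuple[str, float]],
                   taxonomy: Dict[str, Lineage],
                   min_fraction: float = 0.98) -> str:
    """LCA: deepest taxon shared by all hits scoring >= min_fraction * best."""
    best = max(score for _, score in hits)
    kept = [taxonomy[s] for s, score in hits if score >= min_fraction * best]
    shared = common_prefix(kept)
    return shared[-1] if shared else "unassigned"


if __name__ == "__main__":
    # Toy taxonomy and hits for a single read (illustrative values only).
    taxonomy = {
        "ref1": ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
                 "Lactobacillaceae", "Lactobacillus", "Lactobacillus acidophilus"),
        "ref2": ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
                 "Lactobacillaceae", "Lactobacillus", "Lactobacillus helveticus"),
    }
    hits = [("ref1", 2650.0), ("ref2", 2640.0)]
    print(best_blast_hit(hits, taxonomy))  # Lactobacillus acidophilus
    print(lca_assignment(hits, taxonomy))  # Lactobacillus
```

The trade-off: BBH always yields the most specific name but can over-commit when near-identical references from different species score almost equally, while LCA is more conservative and may stop at genus or family.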
DELIVERABLES of MG7 Bioinformatics analysis for 16S PacBio sequences:
MG7 provides a rich set of deliverables with 4 different types of abundance values (the combinations of the two options below) for each of the 2 approaches (BBH and LCA) to evaluate the frequencies of bacterial and archaeal organisms:
• direct and cumulative abundance values
• absolute counts and abundance percentages
Best BLAST Hit (BBH):
• Direct Assignment, Absolute Values
• Direct Assignment, Percentage Values
• Cumulative Assignment, Absolute Values
• Cumulative Assignment, Percentage Values
Lowest Common Ancestor Algorithm (LCA):
• Direct Assignment, Absolute Values
• Direct Assignment, Percentage Values
• Cumulative Assignment, Absolute Values
• Cumulative Assignment, Percentage Values
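The four abundance types combine two choices: direct versus cumulative, and absolute counts versus percentages. A minimal sketch, assuming "direct" counts the reads assigned exactly to a taxon, "cumulative" adds the reads assigned to all of its descendants, and percentages are taken over the total number of assigned reads (the slides do not give MG7's exact definitions). The taxonomy is passed as a hypothetical child-to-parent map.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def direct_counts(assignments: Iterable[str]) -> Counter:
    """Reads assigned exactly to each taxon (one taxon name per read)."""
    return Counter(assignments)


def cumulative_counts(direct: Counter,
                      parent: Dict[str, Optional[str]]) -> Counter:
    """Add each taxon's direct count to itself and to every ancestor,
    walking the child -> parent map up to the root (parent None or missing)."""
    cumulative: Counter = Counter()
    for taxon, n in direct.items():
        node: Optional[str] = taxon
        while node is not None:
            cumulative[node] += n
            node = parent.get(node)
    return cumulative


def to_percentages(counts: Counter, total: int) -> Dict[str, float]:
    """Convert absolute counts to percentages of `total` reads."""
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    parent = {"Bacteria": None, "Firmicutes": "Bacteria",
              "Proteobacteria": "Bacteria", "Lactobacillus": "Firmicutes",
              "Escherichia": "Proteobacteria"}
    # LCA can leave some reads at higher ranks, e.g. at phylum level.
    reads = ["Lactobacillus"] * 3 + ["Firmicutes"] + ["Escherichia"] * 2
    direct = direct_counts(reads)
    cumulative = cumulative_counts(direct, parent)
    total = sum(direct.values())
    for name, counts in (("direct", direct), ("cumulative", cumulative)):
        print(name, dict(counts), to_percentages(counts, total))
```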
MG7 provides a rich set of deliverables, including tables per sample, per group of samples, per rank, and global tables:
• Abundance tables per sample
• All the ranks in a complete table
• Abundances for each rank
• Abundance tables for each defined group of samples
• Abundance tables for all the samples together
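These table types can all be derived from per-sample counts. A minimal sketch, assuming each sample's counts are held as a `Counter` of taxon names, that groups are defined as lists of sample IDs, and that a hypothetical `rank_of` map gives each taxon's rank; it builds one taxon-by-sample table, per-rank slices of it, and per-group totals.

```python
from collections import Counter
from typing import Dict, List

Table = Dict[str, Dict[str, int]]  # taxon -> {column (sample or group) -> count}


def merge_samples(per_sample: Dict[str, Counter]) -> Table:
    """One table for all samples, with 0 where a taxon was not observed."""
    taxa = sorted(set().union(*per_sample.values()))
    return {t: {s: per_sample[s].get(t, 0) for s in per_sample} for t in taxa}


def table_for_rank(table: Table, rank_of: Dict[str, str], rank: str) -> Table:
    """Rows of the combined table whose taxon sits at the given rank."""
    return {t: row for t, row in table.items() if rank_of.get(t) == rank}


def group_totals(table: Table, groups: Dict[str, List[str]]) -> Table:
    """Sum each taxon's counts over the samples in each defined group."""
    return {t: {g: sum(row[s] for s in members) for g, members in groups.items()}
            for t, row in table.items()}


if __name__ == "__main__":
    per_sample = {"S1": Counter({"Lactobacillus": 5, "Firmicutes": 1}),
                  "S2": Counter({"Lactobacillus": 2, "Escherichia": 4}),
                  "S3": Counter({"Escherichia": 3})}
    rank_of = {"Lactobacillus": "genus", "Escherichia": "genus",
               "Firmicutes": "phylum"}
    table = merge_samples(per_sample)
    print(table_for_rank(table, rank_of, "genus"))
    print(group_totals(table, {"control": ["S1"], "treated": ["S2", "S3"]}))
```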
MG7 provides a rich set of deliverables, including analysis of diversity indexes:
The Shannon-Wiener and Simpson’s diversity indexes are calculated for each sample.
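For reference, the Shannon-Wiener index is H' = -Σ p_i ln p_i and Simpson's index is D = Σ p_i², where p_i is the proportion of a sample's reads in taxon i; Simpson's index is often reported as 1 - D (Gini-Simpson) or 1/D (inverse Simpson). The slides do not say which variant, logarithm base, or taxonomic rank MG7 uses, so the sketch below computes the common forms from one sample's abundance counts.

```python
import math
from typing import Iterable, List


def _proportions(counts: Iterable[int]) -> List[float]:
    """Relative abundances of the taxa with non-zero counts."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("at least one count must be positive")
    return [c / total for c in positive]


def shannon_wiener(counts: Iterable[int]) -> float:
    """H' = -sum(p_i * ln p_i), using the natural logarithm."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def simpson_d(counts: Iterable[int]) -> float:
    """Simpson's D = sum(p_i ** 2): the probability that two reads drawn
    with replacement belong to the same taxon."""
    return sum(p * p for p in _proportions(counts))


def gini_simpson(counts: Iterable[int]) -> float:
    """1 - D: higher values mean higher diversity."""
    return 1.0 - simpson_d(counts)


def inverse_simpson(counts: Iterable[int]) -> float:
    """1 / D: the effective number of equally abundant taxa."""
    return 1.0 / simpson_d(counts)


if __name__ == "__main__":
    sample = [120, 80, 40, 10]  # toy read counts for four taxa in one sample
    print(round(shannon_wiener(sample), 4), round(gini_simpson(sample), 4),
          round(inverse_simpson(sample), 4))
```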
Comparison of groups of samples
We provide statistical analysis of the differences between groups of samples, using open tools based on R software from CRAN (the Comprehensive R Archive Network). In each case we apply the most appropriate approaches.
Some types of statistical analysis provided for the comparison of groups of samples:
• Univariate statistics (fold change analysis, t-tests, volcano plots, one-way ANOVA, correlation analysis)
• Multivariate statistics (principal component analysis, partial least squares discriminant analysis)
• Clustering (dendrograms, heatmaps, K-means clustering, self-organizing feature maps)
• Supervised classification (random forests, support vector machines)
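MG7 runs these comparisons with R tools from CRAN; purely to illustrate what two of the univariate statistics compute, here is a minimal Python sketch of a log2 fold change and of Welch's t statistic with its Welch-Satterthwaite degrees of freedom, for one taxon's abundances in two groups. The pseudocount is a hypothetical choice to avoid taking the log of zero, and turning t into a p-value needs the t distribution (for example R's `t.test`, which performs Welch's test by default).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def log2_fold_change(group_a: Sequence[float], group_b: Sequence[float],
                     pseudocount: float = 1.0) -> float:
    """log2 of (mean(B) + pseudocount) / (mean(A) + pseudocount).
    Positive values mean higher abundance in group B."""
    return math.log2((mean(group_b) + pseudocount) / (mean(group_a) + pseudocount))


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic (B minus A) and Welch-Satterthwaite degrees of freedom.
    Needs at least two values per group and non-zero combined variance."""
    se2_a = variance(group_a) / len(group_a)  # squared standard error of mean A
    se2_b = variance(group_b) / len(group_b)  # squared standard error of mean B
    se2 = se2_a + se2_b
    t = (mean(group_b) - mean(group_a)) / math.sqrt(se2)
    df = se2 ** 2 / (se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1))
    return t, df


if __name__ == "__main__":
    # Toy percentages of one genus in two hypothetical groups of samples.
    control = [2.1, 3.4, 2.8, 3.0]
    treated = [5.9, 6.4, 4.8, 7.1]
    print(round(log2_fold_change(control, treated), 3), welch_t(control, treated))
```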
Charts and reports provided by MG7
• Different types of charts, with the option of interactive visualizations (see our research project BIOGRAPHIKA on interactive visualizations)
• Complete results in compliant formats
• Technical reports ready for scientific publication
MG7 workflow for 16S PacBio
• Our reference database DB7
• Exhaustive taxonomic assignment for each read
• Two different taxonomic assignment approaches, Best BLAST Hit (BBH) and Lowest Common Ancestor (LCA)
• A complete set of deliverables
Full-length 16S taxonomic profiling with PacBio and Era7 Bioinformatics MG7
• 100% of the hypervariable regions
• Obtaining full 16S gene sequences
• More specific taxonomic assignments
You can order now! Full-length 16S taxonomic profiling with PacBio and MG7 Bioinformatics Analysis Service
www.era7bioinformatics.com