Statistical Tool for Identifying Sequence Variations That Correlate with Virus Phenotypic Characteristics in the Virus Pathogen Resource (ViPR) July 22, 2013 Meta-CATS



Statistical Tool for Identifying Sequence Variations That Correlate with Virus Phenotypic Characteristics in the Virus Pathogen Resource (ViPR)

July 22, 2013

Meta-CATS


Overview

• Overview of the Meta-CATS algorithm
• Metadata grouping
• Statistical testing
• Two similar integrated web toolkits:
  – The Virus Pathogen Resource (ViPR – viprbrc.org)
  – The Influenza Research Database (IRD – fludb.org)
• Review results from two use cases


The Meta-CATS Algorithm

A flexible web-based tool with a few basic steps:

1. Collect a set of virus strains (search the database or upload a file)

2. Group strains by a metadata attribute, or upload a spreadsheet that defines the groups

3. Perform multiple sequence alignment

4. Automatically identify residue positions where there are statistically significant differences between the groups

5. Report results


Grouping based on Metadata

Examples of metadata that may be of interest:
• Host of isolation
• Severity of disease
• Drug resistance
• Geographical location
• Date of isolation
• Phylogenetic clade assignment
• Other taxonomic assignments (serotype, genotype, etc.)
• Or any user-defined attribute in a spreadsheet
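
To make the grouping step concrete, here is a minimal Python sketch of splitting strains into groups by one metadata attribute. It is not the ViPR implementation: the record layout, the field names (`name`, `host`, `country`), and the helper `group_by_attribute` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical strain records; field names and values are illustrative only.
strains = [
    {"name": "strain_A", "host": "Human", "country": "China"},
    {"name": "strain_B", "host": "Civet", "country": "China"},
    {"name": "strain_C", "host": "Human", "country": "Canada"},
]


def group_by_attribute(records, attribute):
    """Map each value of `attribute` to the names of the strains that carry it."""
    groups = defaultdict(list)
    for record in records:
        # Strains lacking the attribute are skipped rather than guessed at.
        value = record.get(attribute)
        if value is not None:
            groups[value].append(record["name"])
    return dict(groups)


host_groups = group_by_attribute(strains, "host")
print(host_groups)
```

The same helper could group on any other column, which mirrors the idea of a user-defined attribute supplied in a spreadsheet.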


The Meta-CATS Computation

• Multiple sequence alignment of all strains
• At each residue position (nucleotide or AA), perform a chi-square test of independence
• When there are more than 2 groups, at each position identified, perform a chi-square test to determine which pairs of groups contribute to the significant result
• Computed results can be viewed directly or downloaded as a CSV file
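
The per-position test can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Meta-CATS code: the toy alignment, the group labels, the 0.05 cutoff, and the helper names `chi2_sf` and `column_test` are all made up here. It uses the plain Pearson statistic with no continuity correction and ignores practical concerns such as gap handling and small expected counts.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed form, integer df >= 1)."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-z) * sum_{i=0}^{df/2 - 1} z^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= z / i
            total += term
        return math.exp(-z) * total
    # Odd df: erfc(sqrt(z)) + exp(-z) * sum_{i=1}^{(df-1)/2} z^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(z))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-z) * z ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def column_test(column, labels):
    """Chi-square test of independence between residue and group at one position."""
    groups = sorted(set(labels))
    residues = sorted(set(column))
    df = (len(groups) - 1) * (len(residues) - 1)
    if df == 0:
        return 0.0, 1.0  # invariant column (or a single group): nothing to test
    n = len(column)
    observed = Counter(zip(labels, column))  # (group, residue) -> count
    row_totals = Counter(labels)
    col_totals = Counter(column)
    stat = 0.0
    for g in groups:
        for r in residues:
            expected = row_totals[g] * col_totals[r] / n
            stat += (observed[(g, r)] - expected) ** 2 / expected
    return stat, chi2_sf(stat, df)


# Toy aligned sequences (hypothetical), one group label per sequence.
sequences = ["ACGT", "ACGT", "ACGT", "ACGT", "ATGT", "ATGT", "ATGT", "ATGT"]
labels = ["Human"] * 4 + ["Civet"] * 4

results = []
for pos, column in enumerate(zip(*sequences), start=1):
    stat, p = column_test(column, labels)
    results.append((pos, stat, p))

# 0.05 is an assumed cutoff for this sketch, not a documented Meta-CATS setting.
significant = [pos for pos, _, p in results if p < 0.05]
print(significant)
```

In the toy data only position 2 separates the two groups, so it is the only position flagged. With more than two groups, the same `column_test` could be rerun on each pair of groups at a flagged position to see which pairs drive the result.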


The ViPR / IRD Toolkits

Location of new Meta-CATS Algorithm


Workbench and Metadata Attributes


First Use Case: SARS Coronavirus

The “Host” metadata field was used to group the strains and find the positions that differ between human and civet isolates.

The Meta-CATS algorithm identified 117 nucleotide positions that differed significantly between the civet and human isolates. The raw p-values ranged from 2.49 × 10^-2 to 4.33 × 10^-12.

“Virus Pathogen Database and Analysis Resource (ViPR): A Comprehensive Bioinformatics Database and Analysis Resource for the Coronavirus Research Community.” Pickett et al., Viruses. 2012 Nov 19;4(11):3209-26


Second Use Case: Dengue Virus

• The “Geographic Location” metadata was used to identify 61 significant differences in the polyprotein between strains of Dengue-3 virus isolated from the Eastern Hemisphere and the Western Hemisphere.
  – Further inspection of the group-specific amino acid composition found a clade of “outlier” sequences, likely due to an international transmission event.

• A separate analysis identified distinct NS1 amino acid residue variations correlating with DENV serotypes.
  – The Meta-CATS algorithm identified 19 positions where the 4 serotypes differed. At 3 of these positions, which lie within experimentally determined antibody epitopes, the p-values were less than 7.07 × 10^-193.

“Metadata-driven Comparative Analysis Tool for Sequences (meta-CATS): an Automated Process for Identifying Significant Sequence Variations Dependent on Differences in Viral Metadata.” Pickett et al., J. of Virology. (submitted)


Summary

Through the readily accessible search interface and integrated comparative genomics tools such as Meta-CATS, researchers can easily generate hypotheses that can then be tested in the lab and applied to the development of therapeutics and vaccines.

ViPR: www.viprbrc.org
IRD: www.fludb.org


Acknowledgements

J Craig Venter Institute
• Richard H. Scheuermann
• Brett E. Pickett
• Brian Aevermann
• Yun Zhang
• Rick Stanton

SMU
• Mengya Liu
• Eva Sadat
• Monnie McGee

Northrop Grumman Health Solutions
• Edward B. Klem
• Sherry He
• Sam Zaremba
• Sanjeev Kumar
• Liwei Zhou
• Wei Jen

Vecna
• Christopher N. Larsen