

Post on 20-Jan-2016


Map-based Exploration of Population Biology Data in VectorBase

What is VectorBase?

We are a consortium of institutions that hosts the genomes of invertebrate vectors of human pathogens, along with a variety of other ‘omics and population biology data. Tools available include BioMart, BLAST, ClustalW, Galaxy, Hmmer, Web Apollo and web-based browsers for genomic, expression and population biology (PopBio) data.

Ioannis Kirmitzoglou (1), Robert M. MacCallum (1), George K. Christophides (1) and members of the VectorBase consortium (1, 2, 3)

(1) Imperial College London, (2) University of Notre Dame, (3) European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI)

SEARCH

• Single search box
• Auto-completing suggestions
• Taxonomy- and ontology-aware (search with higher-level concepts, such as “aquatic environment catch” or “pyrethroid”)
• Simple logic: it does what you expect when you search for several species or insecticides, for example
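As a rough illustration of what taxonomy- and ontology-aware search could mean, the sketch below expands each record's term to its higher-level concepts and matches a query against any of them. The toy ontology, the record layout, the function names and the choice to combine several query terms with OR are all assumptions made for this example; the poster does not describe how the PopBio search is implemented.

```python
# Hypothetical toy ontology: child term -> parent term (not VectorBase's real ontology).
PARENT = {
    "permethrin": "pyrethroid",
    "deltamethrin": "pyrethroid",
    "pyrethroid": "insecticide",
    "DDT": "organochlorine",
    "organochlorine": "insecticide",
}


def ancestors(term):
    """Return the term itself followed by every higher-level concept above it."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def search(records, queries):
    """Return records whose term falls under any queried concept (assumed OR logic)."""
    wanted = set(queries)
    return [r for r in records if wanted & set(ancestors(r["insecticide"]))]


# Made-up records for illustration only.
records = [
    {"id": "a1", "insecticide": "permethrin"},
    {"id": "a2", "insecticide": "DDT"},
    {"id": "a3", "insecticide": "deltamethrin"},
]

pyrethroid_hits = [r["id"] for r in search(records, ["pyrethroid"])]
combined_hits = [r["id"] for r in search(records, ["pyrethroid", "DDT"])]
```

Here a query for “pyrethroid” picks up the permethrin and deltamethrin records even though neither is annotated with that word directly.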

HELP?

Questions? Comments? Ideas? Got data? Ask us! (Giannis and Bob)

VIEW MODES

• Samples view: basic collection metadata for all population biology data in VectorBase
• IR phenotypes view: one data point for each measured insecticide resistance phenotype

More view modes are planned, including genotypes and other phenotypes.

CONTROLS

All the usual map controls are available, including pan and zoom with mouse drag and scroll wheel.

Climate, vegetation and epidemiological map layers are coming soon.

KEY

In Samples view mode, the map markers are colored by species. A planned feature will let the user color markers by other data categories, such as collection method and sample type.

ZOOM

As you zoom in and out, map markers split and coalesce. Single samples are shown as pins.
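One common way to get markers that split and coalesce with zoom is to bin samples into a grid whose cells shrink as the zoom level increases. The sketch below shows that idea only; the grid scheme, the cell size formula and the function name are assumptions, and the poster does not say which clustering method the PopBio map uses.

```python
import math
from collections import defaultdict


def cluster_markers(samples, zoom):
    """Group (lat, lon) points into grid cells whose width halves with each zoom level."""
    cell = 360.0 / (2 ** zoom)  # cell width in degrees (assumed scheme)
    cells = defaultdict(list)
    for lat, lon in samples:
        key = (math.floor(lat / cell), math.floor(lon / cell))
        cells[key].append((lat, lon))

    markers = []
    for points in cells.values():
        markers.append({
            # A cell holding a single sample is drawn as a pin.
            "kind": "pin" if len(points) == 1 else "cluster",
            "count": len(points),
            # Place the marker at the mean position of its samples.
            "lat": sum(p[0] for p in points) / len(points),
            "lon": sum(p[1] for p in points) / len(points),
        })
    return markers


# Made-up coordinates for illustration only.
samples = [(12.60, -8.00), (12.65, -7.95), (-1.30, 36.80)]

zoomed_out = cluster_markers(samples, zoom=2)
zoomed_in = cluster_markers(samples, zoom=12)
```

With these made-up points, the two nearby samples share one cluster marker at the low zoom level and separate into individual pins at the high one.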

INFO “DRAWER”

This white area on the left-hand side expands and contracts when you click on these icons. It shows various plots and info tables for the samples belonging to the clicked map marker.

Which population biology data?

Our database is flexible and can store many types of population data. Once a significant amount of one kind of data has accumulated in VectorBase, we can provide specialized visualization tools, such as the PopBio map’s Insecticide Resistance View shown below.

Essential metadata for field-collected samples includes latitude, longitude, date and protocols for collection and species identification.
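To make the list of essential metadata concrete, here is a minimal sketch of a record type holding those fields with basic coordinate checks. The class name, field names and example values are hypothetical and do not reflect VectorBase's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FieldSample:
    """Hypothetical record for one field-collected sample."""
    sample_id: str
    latitude: float            # decimal degrees, -90 to 90
    longitude: float           # decimal degrees, -180 to 180
    collection_date: date
    collection_protocol: str
    species_id_protocol: str

    def __post_init__(self):
        # Reject coordinates that cannot be placed on a map.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


# Made-up values for illustration only.
sample = FieldSample(
    sample_id="example-001",
    latitude=12.6,
    longitude=-8.0,
    collection_date=date(2010, 7, 15),
    collection_protocol="aquatic environment catch",
    species_id_protocol="morphological identification",
)
```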

Phenotypes, genotypes and the assays that determined them are also stored in our databases. They are available for browsing and searching and, ultimately, for meta-analysis with tools yet to be developed.

Current data includes:

• 13k Anopheles gambiae complex samples with inversion and microsatellite genotypes from Taylor, Lanzaro and colleagues
• 30k species distribution observations assimilated by the Malaria Atlas Project
• >200 individuals with high-throughput sequencing-based genotypes (shown in genome and PopBio browsers)
• >2000 insecticide resistance assays

PLOTS

Dynamically generated plots show all comparable data for the clicked marker (“selection”) vs. a user-selectable background dataset.

Comparable data are, for example, all LC50 assay results measured in parts per million or all mortality data reported as a percentage. Note: non-standard insecticide concentrations can limit comparability. For now, concentrations are displayed within mouse-over popups for individual data points. Ultimately, we will provide filtering by insecticide concentration range via the main search box.

The default background dataset is all comparable data from assays that match the user query (“DDT” in our example). Other options include only the comparable data visible in the current map window, and all comparable data ignoring the user query.
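The three background options can be sketched as filters over a pool of comparable assays. Everything in this example is an assumption for illustration: the record layout, the function and mode names, the made-up assay values, and in particular the reading that the map-window option still applies the user query.

```python
def select_background(assays, measure, unit, mode="query", query=None, bounds=None):
    """Pick the background dataset for a plot from comparable assays.

    mode "query":   comparable assays matching the user query (the default)
    mode "visible": as "query", restricted to the current map window (assumed)
    mode "all":     all comparable assays, ignoring the user query
    """
    # Comparable data share the same measure and unit, e.g. LC50 in ppm.
    pool = [a for a in assays if a["measure"] == measure and a["unit"] == unit]
    if mode == "all":
        return pool
    matching = [a for a in pool if a["insecticide"] == query]
    if mode == "query":
        return matching
    if mode == "visible":
        south, west, north, east = bounds
        return [a for a in matching
                if south <= a["lat"] <= north and west <= a["lon"] <= east]
    raise ValueError(f"unknown mode: {mode}")


# Made-up assay records for illustration only.
assays = [
    {"id": "x1", "insecticide": "DDT", "measure": "mortality", "unit": "percent",
     "value": 95.0, "lat": 12.6, "lon": -8.0},
    {"id": "x2", "insecticide": "DDT", "measure": "mortality", "unit": "percent",
     "value": 40.0, "lat": -1.3, "lon": 36.8},
    {"id": "x3", "insecticide": "permethrin", "measure": "mortality", "unit": "percent",
     "value": 80.0, "lat": 12.7, "lon": -7.9},
    {"id": "x4", "insecticide": "DDT", "measure": "LC50", "unit": "ppm",
     "value": 0.5, "lat": 12.6, "lon": -8.0},
]

by_query = select_background(assays, "mortality", "percent", "query", query="DDT")
in_window = select_background(assays, "mortality", "percent", "visible",
                              query="DDT", bounds=(10.0, -10.0, 15.0, -5.0))
everything = select_background(assays, "mortality", "percent", "all")
```

Note that the LC50 record never appears in a percent-mortality background, whichever option is chosen, because it is not comparable.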

RESISTANCE INDICATORS

VectorBase’s PopBio resource contains insecticide resistance data from a range of assay protocols, reported in a variety of measures and units, such as percent mortality, lethal concentration (e.g. LC50) and lethal time (e.g. LT95). Not all of these protocols and measures have WHO-recognized thresholds for categorizing results into “susceptible” or “resistant” classes.

To aid the user in discovering geographical regions of resistance we have rescaled all comparable data (see PLOTS box) between 0 (susceptible) and 1 (resistant) after discarding data outside the 2nd and 98th percentiles, and inverting value ranges where appropriate. These rescaled values are used to color the map markers (from blue to red).
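The rescaling described above can be sketched as follows, under stated assumptions: percentiles are computed by linear interpolation, discarded values are returned as None, and the blue-to-red ramp is a simple linear RGB blend. None of these details are given on the poster, and the function names are hypothetical. Inversion applies to measures such as percent mortality, where a higher value indicates a more susceptible population.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list (p in 0..100)."""
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)


def rescale(values, higher_is_resistant, low_pct=2.0, high_pct=98.0):
    """Map values to 0 (susceptible) .. 1 (resistant); outliers become None."""
    ordered = sorted(values)
    lo = percentile(ordered, low_pct)
    hi = percentile(ordered, high_pct)
    scores = []
    for v in values:
        if v < lo or v > hi:
            scores.append(None)  # outside the 2nd-98th percentile range: discarded
            continue
        s = 0.5 if hi == lo else (v - lo) / (hi - lo)
        # Invert measures where a high value means susceptible (e.g. mortality).
        scores.append(s if higher_is_resistant else 1.0 - s)
    return scores


def marker_color(score):
    """Blend from blue (0, susceptible) to red (1, resistant) as an (R, G, B) tuple."""
    return (round(255 * score), 0, round(255 * (1.0 - score)))


# Synthetic percent-mortality values 0..100, for illustration only.
mortality = list(range(0, 101))
scores = rescale(mortality, higher_is_resistant=False)
```

A lethal concentration such as LC50 would be passed with `higher_is_resistant=True`, since a higher concentration needed to kill half the sample indicates more resistance.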

Want to see a demo?

Try it yourself at funcgen.vectorbase.org/popbio-map-preview