
Post on 29-May-2022


The Dark Knight - Timothy Quek, Ryan Kim, Peter Wang, Isaac Law

Predicting Bat Coronavirus Positivity

Background

● Bats comprise ~20% of mammal species (> 1,400 species)

● Serve as reservoir hosts of many deadly viruses (e.g. Ebola, Hendra, Nipah)

● The SARS-CoV-2 virus that led to the current COVID-19 pandemic likely originated from an Asian bat species

● Scientists do research on bats worldwide to study relationships between bats and viruses

Problem Statement

Predict factors that make it more likely for a particular bat species to be a potential coronavirus reservoir host

- Geographical and Environmental Characteristics
- Morphological and Other Biological Traits
- Phylogenetic Group

Data Analysis

Datasets

01. Bat CoV Positivity: Dataset manually collected from 100+ published papers, recording coronavirus positivity rates among samples from bats

02. PanTHERIA: Global species-level dataset of mammalian life-history, ecological and geographical traits

03. EltonTraits 1.0: Global species-level dataset of the foraging attributes of mammals

04. Bat Ecology / Viral Diversity: Bat-specific dataset from a Canadian study on viral diversity and reservoir status

05. Zoonotic Infectious Diseases: Dataset used in a study on zoonotic emerging infectious diseases, including geographical / environmental features

Main Features Included: 13 features selected out of 72

Weather
- Weather cluster (Precipitation + Temperature)
- Actual / Potential Evapotranspiration Rate

Location Cluster (approximately corresponds to continent)
- Geographical cluster

Phylogenetic Cluster (56 million years ago)
- Factorized cluster

Land Use and Environment
- Land-Barren
- Evergreen Broadleaf
- Managed Vegetation
- Crop Change

Species Diversity / Livestock
- Mammalian Diversity
- # of Poultry (log)
- # of mammal livestock (log)

Human Population
- Human Population Density change (for each grid cell)

Weather- Precipitation and Temperature

● Each dot represents one particular bat species
● High prevalence: ≥5% coronavirus positivity (red dots); Low prevalence: <5% positivity (blue dots)

[Figure panels: Low Prevalence, High Prevalence]
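The 5% cut-off above can be written as a small labelling helper. This is only a sketch: the function name, the species keys and the sample counts are hypothetical and are not taken from the project's dataset.

```python
def prevalence_label(n_positive, n_tested, threshold=0.05):
    """Return (positivity rate, 'high' or 'low') for one bat species."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    rate = n_positive / n_tested
    # The slides define high prevalence as >= 5% positivity
    return rate, ("high" if rate >= threshold else "low")

# Hypothetical (positive, tested) counts, for illustration only
samples = {"species_a": (12, 150), "species_b": (3, 200), "species_c": (5, 100)}
labels = {name: prevalence_label(pos, n)[1] for name, (pos, n) in samples.items()}
print(labels)
```

A species sitting exactly on the boundary (5 positives out of 100) is labelled high, matching the "≥5%" wording.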

Effect of Temperature and Humidity on Coronavirus Infectivity

Adv Virol 2011;2011:734690. doi: 10.1155/2011/734690. Epub 2011 Oct 1.

Land Use and Environment

Land Use and Environment
- Proportion of Barren land
- Broadleaf Evergreen Forest
- Managed Vegetation
- Cropland Change

● Higher proportion of barren land in the geographical distribution of bat species associated with higher coronavirus prevalence

Phylogenetic Cluster (56 million years ago)

● PhyloClust56: phylogenetic clusters based on evolutionary relationships between bats 56 million years ago

● “PC3” showed a lower coronavirus positivity compared to the other phylogenetic clusters on univariate analysis

Ecology of Mammals

Ecology of Mammals
- Mammalian Diversity
- # of Poultry (log)
- # of mammal livestock (log)

● (1) Mammalian diversity and (2) poultry / mammalian livestock headcounts show statistically significant relationships with bat coronavirus prevalence

Mammalian Diversity and Emerging Infectious Diseases

● High mammalian biodiversity is associated with lower prevalence of bat coronavirus positivity

● Previous research: biodiversity loss increases disease transmission
● Mechanism is unclear; one speculation:

○ Species better at buffering disease transmission are affected more with biodiversity reduction

○ Conversely, species with higher rates of reproduction (which spend fewer resources on host immunity) may survive longer during reductions in biodiversity

Nature. 2010 Dec 2;468(7324):647-52.

Geographical Location

Location Cluster - Bat species found in the location cluster corresponding to Africa have a higher coronavirus positivity (after correction for other factors)

※ Red dots: ≥5% positivity
※ Blue dots: <5% positivity

Modeling and Results

Two Main Analyses

● Prevalence Rate Modeling: Poisson Regression
● High vs Low Coronavirus Prevalence Classification: Generalized Boosted Model

Feature Correlation & Feature Engineering

Poisson Regression - Count Response Modeling

● Modelled outcome: number of positive bats (out of 100 bats)
● Stepwise forward inclusion based on AIC
● RMSE ~ 5.5
● Reasonable fit, except under-fitting at both extremes
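As a rough illustration of the count model, the sketch below fits a one-predictor Poisson regression by Newton-Raphson, then compares its AIC with an intercept-only model (one step of forward selection) and reports the RMSE. It uses only the standard library. The data, the variable names and the choice of mammalian diversity as the single predictor are assumptions for illustration; they are not the team's data or code.

```python
import math

def fit_poisson(x, y, n_iter=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # 2x2 information matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_lik(y, mu):
    """Poisson log-likelihood of counts y given fitted means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def aic(y, mu, n_params):
    return 2 * n_params - 2 * log_lik(y, mu)

def rmse(y, mu):
    return math.sqrt(sum((yi - mi) ** 2 for yi, mi in zip(y, mu)) / len(y))

# Hypothetical data: a diversity index and positives per 100 bats sampled
diversity = [0.2, 0.5, 0.8, 1.1, 1.4, 1.7, 2.0, 2.3]
positives = [14, 11, 9, 6, 5, 4, 2, 2]

b0, b1 = fit_poisson(diversity, positives)
mu_full = [math.exp(b0 + b1 * d) for d in diversity]
mu_null = [sum(positives) / len(positives)] * len(positives)  # intercept-only fit

aic_null = aic(positives, mu_null, 1)
aic_full = aic(positives, mu_full, 2)
print(f"slope={b1:.3f}  AIC null={aic_null:.1f}  AIC full={aic_full:.1f}  "
      f"RMSE={rmse(positives, mu_full):.2f}")
```

In forward selection, a candidate predictor is kept only when adding it lowers the AIC, which is the comparison made between `aic_null` and `aic_full` here.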

Generalized Boosted Model - Binary Classification

● Model accuracy: ~ 74%
● GBM does not provide p-values or coefficients, but ranks variables by relative influence
● Mammal & poultry ecological variables have heavy influence on bat coronavirus positivity (mammal biodiversity, mammal livestock / poultry headcount)
● Consistent with previous studies
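To make "relative influence" concrete, here is a deliberately simplified boosted-stump classifier written with the standard library only. It is not the GBM implementation used in the project: leaf values are plain residual means rather than Newton steps, and the two features and all data are hypothetical. Influence is taken as each feature's share of the total squared-error reduction across boosting rounds.

```python
import math

def fit_stump(X, r):
    """Least-squares regression stump on residuals r.
    Returns (sse_gain, feature, threshold, left_value, right_value)."""
    n, total = len(r), sum(r)
    base_sse = sum(v * v for v in r) - total * total / n
    best = (0.0, 0, float("inf"), total / n, total / n)
    for j in range(len(X[0])):
        # Candidate thresholds: every unique value except the largest
        for t in sorted({row[j] for row in X})[:-1]:
            left = [v for row, v in zip(X, r) if row[j] <= t]
            right = [v for row, v in zip(X, r) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if base_sse - sse > best[0]:
                best = (base_sse - sse, j, t, lm, rm)
    return best

def fit_gbm(X, y, n_trees=50, lr=0.1):
    """Boost stumps on the logistic-loss gradient; assumes both classes occur in y."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))
    F = [f0] * len(y)
    stumps, influence = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        resid = [yi - 1 / (1 + math.exp(-fi)) for yi, fi in zip(y, F)]
        gain, j, t, lv, rv = fit_stump(X, resid)
        stumps.append((j, t, lv, rv))
        influence[j] += gain
        F = [fi + lr * (lv if row[j] <= t else rv) for fi, row in zip(F, X)]
    total = sum(influence)
    return (f0, lr, stumps), [100 * v / total for v in influence]

def predict_proba(model, row):
    """Predicted probability of the high-prevalence class for one row."""
    f0, lr, stumps = model
    score = f0 + lr * sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
    return 1 / (1 + math.exp(-score))

# Hypothetical rows: [mammal diversity index, mean temperature]; 1 = high prevalence
X = [[0.1, 20], [0.2, 25], [0.3, 18], [0.4, 27],
     [0.6, 21], [0.7, 26], [0.8, 19], [0.9, 24]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model, influence = fit_gbm(X, y)
accuracy = sum((predict_proba(model, row) >= 0.5) == bool(label)
               for row, label in zip(X, y)) / len(y)
print("relative influence (%):", influence, "training accuracy:", accuracy)
```

Because the toy labels depend only on the first feature, nearly all of the influence lands on it. The accuracy printed here is on the training rows and is not comparable to the ~74% reported above.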

Model Inference: Regression and Classification

● Mammalian biodiversity plays an important role in both models
● Bats in geographical ranges with HIGHER mammal biodiversity => lower CoV prevalence
● Weather, land use and ecological factors come after mammalian diversity

Predictions

● Prediction Process:Construct 95% C.I. with Poisson Regression, cross-check with GBM model

● Bat species flagged as “high CoV risk” when both models converge

● Attempted to predict the coronavirus risk in Rhinolophus bats- thought to be a major reservoir of SARS related coronaviruses
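One way to read the two-model cross-check is sketched below. The slides do not state the exact agreement rule, so everything here is an assumption: the 95% interval is a Wald interval built on the log scale, a species counts as high risk only when the interval's lower bound reaches 5 positives per 100 bats and the classifier's probability of "high" is at least 0.5, and all input numbers are hypothetical.

```python
import math

def poisson_ci(log_mean, se, z=1.96):
    """Wald 95% interval for a Poisson mean, built on the log scale."""
    return math.exp(log_mean - z * se), math.exp(log_mean + z * se)

def high_risk(log_mean, se, gbm_prob_high, threshold=5.0, cutoff=0.5):
    """Flag a species only when both models point to high prevalence.
    threshold (positives per 100 bats) and cutoff are assumed values."""
    lower, _ = poisson_ci(log_mean, se)
    return lower >= threshold and gbm_prob_high >= cutoff

# Hypothetical predictions: (log of predicted positives per 100, std. error, GBM probability)
candidates = {
    "species_x": (math.log(9.0), 0.20, 0.80),  # both models agree on high
    "species_y": (math.log(6.0), 0.30, 0.80),  # interval dips below 5 per 100
    "species_z": (math.log(9.0), 0.20, 0.30),  # classifier disagrees
}
flags = {name: high_risk(*vals) for name, vals in candidates.items()}
print(flags)
```

Requiring agreement trades sensitivity for fewer false alarms, since a species is flagged only when neither model contradicts the other.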

Predictions

Model Predictions

Rhinolophus inops

Rhinolophus subrufus

Findings

● Factors increasing the risk of high bat coronavirus prevalence include reduced mammalian diversity and low temperature / humidity

● Weather, land use and ecological factors have higher explanatory power than bat characteristics

● Our models predict that 5 species of Rhinolophus bats from the Philippines likely have a high coronavirus prevalence

● Deforestation and destruction of animal habitats likely contribute to the higher incidence of emerging infectious diseases

● The importance of the loss of mammalian diversity to predict the outcome likely reflects this point specifically

● “...global changes in the mode and the intensity of land use are creating expanding hazardous interfaces between people, livestock and wildlife reservoirs of zoonotic disease.”

Footnote

Nature. 2020 Aug; 584(7821): 398-402.

ACKNOWLEDGEMENTS

● Professor Maria Cristina Rulli, Politecnico di Milano
● Professor Paolo D’Odorico, University of California, Berkeley
● Dr. Amanda Adams, Bat Conservation International
● Dr. Natasha Spottiswoode, University of California, San Francisco
● Authors of all the papers that we used in this Capstone project
● Dr. Fred Nugen, University of California, Berkeley
● Dr. Alberto Todeschini, University of California, Berkeley
● Our wonderful section mates
● Our families
● The bats

CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, and infographics & images by Freepik

THANKS!

http://bat-cov-positivity.org/home

Main Features to Include

Weather
- Mean Precipitation
- Mean Temperature (squared)

● Using K-means clustering, put bat species into 2 clusters based on temperature and precipitation
● Monthly mean precipitation / temperature of species habitat: low temperature and precipitation associated with higher bat coronavirus prevalence
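The weather clustering step can be sketched as follows, using a small hand-written K-means so the example needs no third-party library. The habitat values, the standardisation step and the deterministic initialisation are assumptions for illustration; the slides only state that K-means with 2 clusters was applied to temperature and precipitation.

```python
import math

def standardize(points):
    """Scale each column to zero mean and unit variance so both variables weigh equally."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, sds)) for p in points]

def kmeans(points, k=2, n_iter=100):
    """Plain Lloyd's algorithm; the first k points serve as starting centres."""
    centers = list(points[:k])
    for _ in range(n_iter):
        # Assign each point to its nearest centre
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centre if a cluster ends up empty
            new_centers.append(
                tuple(sum(v) / len(members) for v in zip(*members)) if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical habitat means per species: (temperature in deg C, precipitation in mm/month)
habitats = [(8, 30), (26, 210), (10, 40), (27, 190), (9, 35), (25, 220)]
labels, centers = kmeans(standardize(habitats), k=2)
print(labels)
```

The resulting cluster label (cool and dry versus warm and wet in this toy data) is the kind of single categorical "weather cluster" feature listed among the main features.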

Contributions By Team Members

All of the team members contributed actively in the following areas of the project:

- Development and refinement of concept
- Literature Review
- Coordination with Professors
- Data collection
- Data cleaning and missing data imputation
- Visualizations
- Machine Learning Algorithms
- Website Design
- Writing of Paper

EDA of Selected Features in Final Merged Dataset

Change in Human Population Density

Human Population- Rate of Change in Human Population Density between 1990 and 1995

● “HuPopDen_Chg” shows the rate of change of human population density between 1990 and 1995

● Interestingly, a lower change in human population density (between 1990 and 1995) tends to be associated with a higher bat coronavirus prevalence

Land Use and Environment

Land Use and Environment
- Land-Barren
- Evergreen Broadleaf
- Managed Vegetation
- Crop Change

● Change in land use for cropland (grid cell, 1900-2000) and the proportion of area covered by barren land / evergreen forest / cultivated vegetation show statistical significance

Possible Future Studies (and Capstone Projects?)

● Using species distribution and land use data, predict potential intermediate hosts that may result in coronavirus spillover infections from bats to humans

● Choosing a specific bat-related zoonosis with well-mapped index cases, aim to predict areas with a high likelihood of future cases
