coronavirus positivity predicting bat

The Dark Knight - Timothy Quek, Ryan Kim, Peter Wang, Isaac Law Predicting Bat Coronavirus Positivity


Post on 29-May-2022


TRANSCRIPT

Page 1: Coronavirus Positivity Predicting Bat

The Dark Knight - Timothy Quek, Ryan Kim, Peter Wang, Isaac Law

Predicting Bat Coronavirus Positivity

Page 2: Coronavirus Positivity Predicting Bat

Background

● Bats comprise ~20% of mammal species (> 1,400 species)

● Serve as reservoir hosts of many deadly viruses (e.g. Ebola, Hendra, Nipah)

● The SARS-CoV-2 virus that led to the current COVID-19 pandemic likely originated from an Asian bat species

● Scientists do research on bats worldwide to study relationships between bats and viruses

Page 3: Coronavirus Positivity Predicting Bat

Problem Statement

Predict factors that make it more likely for a particular bat species to be a potential coronavirus reservoir host

- Geographical and Environmental Characteristics
- Morphological and Other Biological Traits
- Phylogenetic Group

Page 4: Coronavirus Positivity Predicting Bat

Data Analysis

Page 5: Coronavirus Positivity Predicting Bat

Datasets

01. Bat CoV Positivity
Dataset manually collected from 100+ published papers, recording coronavirus positivity rates among samples from bats

02. PanTHERIA
Global species-level dataset of mammalian life-history, ecological and geographical traits

03. EltonTraits 1.0
Dataset on global species-level foraging attributes of mammals

04. Bat Ecology / Viral Diversity
Bat-specific dataset used in a Canadian study on viral diversity and reservoir status

05. Zoonotic Infectious Diseases
Dataset used in a study on zoonotic emerging infectious diseases, including geographical / environmental features

Page 6: Coronavirus Positivity Predicting Bat

Main Features Included: 13 features selected out of 72

Weather
- Weather cluster (Precipitation + Temperature)
- Actual / Potential Evapotranspiration Rate

Location Cluster (approximately corresponds to continent)
- Geographical cluster

Phylogenetic Cluster (56 million years ago)
- Factorized cluster

Land Use and Environment
- Land-Barren
- Evergreen Broadleaf
- Managed Vegetation
- Crop Change

Species Diversity / Livestock
- Mammalian Diversity
- # of Poultry (log)
- # of mammalian livestock (log)

Human Population
- Human Population Density change (for each grid cell)

Page 7: Coronavirus Positivity Predicting Bat

Weather
- Precipitation and Temperature

● Each dot represents one particular bat species

● High prevalence: ≥5% coronavirus positivity (red dots); Low prevalence: <5% positivity (blue dots)

[Figure legend: Low Prevalence, High Prevalence]
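The 5% cut-off used on this slide can be written as a small labeling function. This is a minimal sketch: the function names and the sample counts are hypothetical and are not taken from the project's data.

```python
def positivity_rate(positives: int, tested: int) -> float:
    """Percentage of sampled bats that tested positive for coronavirus."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * positives / tested


def prevalence_class(positives: int, tested: int, threshold_pct: float = 5.0) -> str:
    """Label a species 'high' at or above the threshold and 'low' below it."""
    return "high" if positivity_rate(positives, tested) >= threshold_pct else "low"


# Hypothetical (positives, tested) counts, for illustration only.
samples = {"species_a": (12, 150), "species_b": (3, 200), "species_c": (5, 100)}
labels = {name: prevalence_class(p, n) for name, (p, n) in samples.items()}
print(labels)
```

A species sitting exactly at 5% is labeled high, matching the "≥5%" definition.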

Page 8: Coronavirus Positivity Predicting Bat

Effect of Temperature and Humidity on Coronavirus Infectivity

Adv Virol 2011;2011:734690. doi: 10.1155/2011/734690. Epub 2011 Oct 1.

Page 9: Coronavirus Positivity Predicting Bat

Land Use and Environment

Land Use and Environment
- Proportion of Barren land
- Broadleaf Evergreen Forest
- Managed Vegetation
- Cropland Change

● Higher proportion of barren land in the geographical distribution of bat species associated with higher coronavirus prevalence

Page 10: Coronavirus Positivity Predicting Bat

Phylogenetic Cluster (56 million years ago)

● PhyloClust56: phylogenetic clusters based on evolutionary relationships between bats 56 million years ago

● “PC3” showed a lower coronavirus positivity compared to the other phylogenetic clusters on univariate analysis

Page 11: Coronavirus Positivity Predicting Bat

Ecology of Mammals

Ecology of Mammals
- Mammalian Diversity
- # of Poultry (log)
- # of mammalian livestock (log)

● (1) Mammalian diversity and (2) poultry / mammalian livestock headcounts show statistically significant relationships with bat coronavirus prevalence

Page 12: Coronavirus Positivity Predicting Bat

Mammalian Diversity and Emerging Infectious Diseases

● High mammalian biodiversity is associated with lower prevalence of bat coronavirus positivity

● Previous research: biodiversity loss increases disease transmission

● Mechanism is unclear. One speculation:

○ Species that are better at buffering disease transmission are affected more by biodiversity reduction

○ Conversely, species with higher rates of reproduction (which spend fewer resources on host immunity) may survive longer during reductions in biodiversity

Nature. 2010 Dec 2;468(7324):647-52.

Page 13: Coronavirus Positivity Predicting Bat

Geographical Location

Location Cluster
- Bat species found in the location cluster corresponding to Africa have a higher coronavirus positivity (after correction for other factors)

※ Red dots: ≥5% positivity
※ Blue dots: <5% positivity

Page 14: Coronavirus Positivity Predicting Bat

Modeling and Results

Page 15: Coronavirus Positivity Predicting Bat

Two Main Analyses

● Prevalence Rate Modeling
- Poisson Regression

● High vs Low Coronavirus Prevalence Classification
- Generalized Boosted Model

Page 16: Coronavirus Positivity Predicting Bat

Feature Correlation & Feature Engineering

Page 17: Coronavirus Positivity Predicting Bat

Poisson Regression - Count Response Modeling

● Modelled outcome: number of positive bats (out of 100 bats)

● Stepwise forward inclusion based on AIC

● RMSE ~ 5.5

● Reasonable fit, except under-fitting at both extremes
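The count model on this slide can be sketched in a few lines. The example below fits a Poisson regression with a single predictor by Newton-Raphson, then applies one forward-inclusion step by comparing AIC against the intercept-only model, and reports RMSE. It is an illustration under stated assumptions, not the team's implementation: the data are hypothetical, only one candidate predictor is considered, and the helper names are made up for this sketch.

```python
import math


def fit_poisson(x, y, iterations=50):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


def poisson_aic(y, mu, n_params):
    """AIC = 2k - 2 * log-likelihood for Poisson counts y with means mu."""
    loglik = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))
    return 2 * n_params - 2 * loglik


def rmse(y, mu):
    return math.sqrt(sum((yi - mi) ** 2 for yi, mi in zip(y, mu)) / len(y))


# Hypothetical data: standardized mammalian diversity and positives per 100 bats.
diversity = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
positives = [14, 9, 8, 5, 4, 2, 2, 1]

b0, b1 = fit_poisson(diversity, positives)
fitted = [math.exp(b0 + b1 * xi) for xi in diversity]
null_fit = [sum(positives) / len(positives)] * len(positives)

aic_null = poisson_aic(positives, null_fit, 1)
aic_full = poisson_aic(positives, fitted, 2)
keep_predictor = aic_full < aic_null  # forward step: include only if AIC drops
print(f"b1={b1:.3f} AIC null={aic_null:.1f} full={aic_full:.1f} RMSE={rmse(positives, fitted):.2f}")
```

With more candidate features, the same comparison would be repeated at each step, adding whichever predictor lowers AIC the most and stopping when none does.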

Page 18: Coronavirus Positivity Predicting Bat

Generalized Boosted Model - Binary Classification

● Model accuracy: ~ 74%

● GBM does not provide p-values or coefficients, but ranks variables by relative influence

● Mammal & poultry ecological variables have heavy influence on bat coronavirus positivity (mammal biodiversity, mammal livestock / poultry headcount)

● Consistent with previous studies

Page 19: Coronavirus Positivity Predicting Bat

Model Inference

● Mammalian biodiversity plays an important role in both models

● Bats in geographical ranges with HIGHER mammal biodiversity => lower CoV prevalence

● Weather, land use and ecological factors come after mammalian diversity

[Figure panels: Regression, Classification]

Page 20: Coronavirus Positivity Predicting Bat

Predictions

Page 21: Coronavirus Positivity Predicting Bat

Predictions

● Prediction Process: construct 95% C.I. with Poisson Regression, cross-check with GBM model

● Bat species flagged as “high CoV risk” when both models converge

● Attempted to predict the coronavirus risk in Rhinolophus bats, thought to be a major reservoir of SARS-related coronaviruses
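One way to read the two-model cross-check is sketched below. The slides do not state the exact decision rule, so everything here is an assumption: a species is flagged when the lower bound of the 95% confidence interval for the Poisson mean is at least 5 positives per 100 bats and the classifier's probability of the high-prevalence class is at least 0.5. The class, field names and input values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeciesPrediction:
    name: str
    eta: float            # Poisson linear predictor: log(expected positives per 100 bats)
    se: float             # standard error of eta
    gbm_prob_high: float  # classifier probability of the high-prevalence class


def mean_ci(eta: float, se: float, z: float = 1.96):
    """Approximate 95% CI for the Poisson mean, built on the log scale."""
    return math.exp(eta - z * se), math.exp(eta + z * se)


def high_risk(pred: SpeciesPrediction, threshold: float = 5.0, prob_cutoff: float = 0.5) -> bool:
    """Assumed rule: flag only when both models point to high prevalence."""
    lower, _ = mean_ci(pred.eta, pred.se)
    return lower >= threshold and pred.gbm_prob_high >= prob_cutoff


# Hypothetical model outputs, for illustration only.
predictions = [
    SpeciesPrediction("species_x", math.log(12), 0.3, 0.8),  # both models agree
    SpeciesPrediction("species_y", math.log(7), 0.4, 0.7),   # CI lower bound below 5
    SpeciesPrediction("species_z", math.log(15), 0.2, 0.3),  # classifier disagrees
]
flags = {p.name: high_risk(p) for p in predictions}
print(flags)
```

Requiring agreement trades sensitivity for fewer false alarms, since a species is flagged only when neither model leaves room for a low-prevalence reading.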

Page 22: Coronavirus Positivity Predicting Bat

Model Predictions

Rhinolophus inops

Rhinolophus subrufus

Page 23: Coronavirus Positivity Predicting Bat

Findings

● Factors increasing the risk of high bat coronavirus prevalence include reduced mammalian diversity and low temperature / humidity

● Weather, land use and ecological factors have higher explanatory power than bat characteristics

● Our models predict that 5 species of Rhinolophus bats from the Philippines likely have a high coronavirus prevalence

Page 24: Coronavirus Positivity Predicting Bat

Footnote

● Deforestation and destruction of animal habitats likely contribute to the higher incidence of emerging infectious diseases

● The importance of mammalian diversity loss as a predictor of the outcome likely reflects this point

● “...global changes in the mode and the intensity of land use are creating expanding hazardous interfaces between people, livestock and wildlife reservoirs of zoonotic disease.”

Nature. 2020 Aug; 584(7821): 398-402.

Page 25: Coronavirus Positivity Predicting Bat

ACKNOWLEDGEMENTS

● Professor Maria Cristina Rulli, Politecnico di Milano
● Professor Paolo D’Odorico, University of California, Berkeley
● Dr. Amanda Adams, Bat Conservation International
● Dr. Natasha Spottiswoode, University of California, San Francisco
● Authors of all the papers that we used in this Capstone project

● Dr. Fred Nugen, University of California, Berkeley
● Dr. Alberto Todeschini, University of California, Berkeley
● Our wonderful section mates
● Our families
● The bats

Page 26: Coronavirus Positivity Predicting Bat

CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, and infographics & images by Freepik

THANKS!

http://bat-cov-positivity.org/home

Page 27: Coronavirus Positivity Predicting Bat

Main Features to Include

Weather
- Mean Precipitation
- Mean Temperature (squared)

● Using K-means clustering, put bat species into 2 clusters based on temperature and precipitation

● Monthly mean precipitation / temperature of species habitat: low temperature and precipitation associated with higher bat coronavirus prevalence
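The two-cluster weather grouping can be illustrated with a plain K-means loop. This is a minimal sketch, assuming the features are standardized first and the centers start from two chosen points. The climate values are hypothetical, and the slides do not say which K-means implementation or initialization was used.

```python
import math


def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]


def kmeans(points, centers, iterations=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return labels, centers


# Hypothetical (mean temperature in °C, mean monthly precipitation in mm) per species.
climate = [(12.0, 40.0), (10.5, 55.0), (13.0, 35.0),
           (26.0, 180.0), (27.5, 210.0), (25.0, 160.0)]
scaled = standardize(climate)
labels, centers = kmeans(scaled, [scaled[0], scaled[3]])
print(labels)
```

Standardizing matters here because precipitation in millimetres would otherwise dominate the distance over temperature in degrees.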

Page 28: Coronavirus Positivity Predicting Bat

Contributions By Team Members

All of the team members contributed actively in the following areas of the project:

- Development and refinement of concept
- Literature Review
- Coordination with Professors
- Data collection
- Data cleaning and missing data imputation
- Visualizations
- Machine Learning Algorithms
- Website Design
- Writing of Paper

Page 29: Coronavirus Positivity Predicting Bat

EDA of Selected Features in Final Merged Dataset

Page 30: Coronavirus Positivity Predicting Bat
Page 31: Coronavirus Positivity Predicting Bat

Change in Human Population Density

Human Population
- Rate of Change in Human Population Density between 1990 and 1995

● “HuPopDen_Chg” shows the rate of change of human population density between 1990 and 1995

● Interestingly, a lower change in human population density (between 1990 and 1995) tends to be associated with a higher bat coronavirus prevalence

Page 32: Coronavirus Positivity Predicting Bat

Land Use and Environment

Land Use and Environment
- Land-Barren
- Evergreen Broadleaf
- Managed Vegetation
- Crop Change

● Change in land use for cropland (grid cell, 1900-2000) and the proportion of area covered by barren land / evergreen broadleaf forest / managed vegetation show statistical significance

Page 33: Coronavirus Positivity Predicting Bat

Possible Future Studies (and Capstone Projects?)

● Using species distribution and land use data, predict potential intermediate hosts that may result in coronavirus spillover infections from bats to humans

● Choosing a specific bat-related zoonosis with well-mapped index cases, aim to predict areas with a high likelihood of future cases