
Review 1: "Optimizing Predictive Models to Prioritize Viral Discovery in Zoonotic Reservoirs"

Reviewers find this a rigorous and well-supported approach to target future studies of betacoronaviruses in bats, and raise a few questions (about the model and the data used) for further clarification.

Published on Sep 29, 2021
This Pub is a Review of
Optimizing predictive models to prioritize viral discovery in zoonotic reservoirs

Abstract: Despite global investment in One Health disease surveillance, it remains difficult—and often very costly—to identify and monitor the wildlife reservoirs of novel zoonotic viruses. Statistical models can be used to guide sampling prioritization, but predictions from any given model may be highly uncertain; moreover, systematic model validation is rare, and the drivers of model performance are consequently under-documented. Here, we use bat hosts of betacoronaviruses as a case study for the data-driven process of comparing and validating predictive models of likely reservoir hosts. In the first quarter of 2020, we generated an ensemble of eight statistical models that predict host-virus associations and developed priority sampling recommendations for potential bat reservoirs and potential bridge hosts for SARS-CoV-2. Over more than a year, we tracked the discovery of 40 new bat hosts of betacoronaviruses, validated initial predictions, and dynamically updated our analytic pipeline. We find that ecological trait-based models perform extremely well at predicting these novel hosts, whereas network methods consistently perform roughly as well or worse than expected at random. These findings illustrate the importance of ensembling as a buffer against variation in model quality and highlight the value of including host ecology in predictive models. Our revised models show improved performance and predict over 400 bat species globally that could be undetected hosts of betacoronaviruses. Although 20 species of horseshoe bats (Rhinolophus spp.) are known to be the primary reservoir of SARS-like viruses, we find at least three-fourths of plausible betacoronavirus reservoirs in this bat genus might still be undetected.
Our study is the first to demonstrate through systematic validation that machine learning models can help optimize wildlife sampling for undiscovered viruses and illustrates how such approaches are best implemented through a dynamic process of prediction, data collection, validation, and updating.

RR:C19 Evidence Scale rating by reviewer:

  • Strong. The main study claims are very well-justified by the data and analytic methods used. There is little room for doubt that the study produced has very similar results and conclusions as compared with the hypothetical ideal study. The study’s main claims should be considered conclusive and actionable without reservation.



The study of viruses in wildlife hosts is an essential part of understanding the factors that lead to spillover, and it is an immense task. This valuable study uses machine learning models to predict which bat species are likely hosts of betacoronaviruses, which can inform future bat sampling efforts and increase their efficacy. The research presented has many strengths that contribute to the robustness of the analysis and the validity of the results. The authors performed a thorough review of available databases and the literature to create their dataset. They evaluated several modelling techniques and combined them into an ensemble model. Most importantly, the authors validated their initial models, which were created in early 2020, against data collected over the following year. Their models were then updated using the compiled dataset. The authors identified which models performed well and which did not, and discussed the pros and cons of the different modelling approaches. The compiled data and code are also publicly available. The main limitation (which is not the fault of the authors) is that the data used for the analyses are sparse, often collected in unstandardised ways, and often contain significant sampling bias. The authors addressed this by using several different modelling approaches as well as a method to correct for sampling bias. However, we currently do not have methods for assessing the effect of these biases on the results or how well they have been controlled. Overall, the manuscript thoroughly describes a robust analysis and provides important results that should be used to target future studies of betacoronaviruses in bats.
