RR:C19 Evidence Scale rating by reviewer:
***************************************
Review:
1. RR:C19 Strength of Evidence: Claims are very well supported by the data and methods used. Decision makers should consider the claims in this study actionable with limitations, as described by the authors.
2. Comments: At a global scale, Becker et al.’s ensemble predictions of likely bat hosts of betacoronavirus appear to be the best available. The authors also provide a means of updating those predictions with new data, along with several caveats on their use. Because individual model predictions varied substantially, they advise against relying too heavily on any one model of host-pathogen associations to prioritize sampling efforts. Instead, their work supports the use of hybrid models, ensembling predictions across models, or both. They offer a new metric of individual model performance to inform decisions about which models to use and/or how to weight model predictions when ensembling. Finally, they demonstrate the value of host ecological traits in predictive models of host-pathogen interactions.

The manuscript confirms previous work in trait-based disease ecology and offers some important lessons about relying too heavily on any one model to prioritize reservoir host surveillance and discovery. The evidence, methods, and arguments support advancement of COVID-19 understanding within society. The work is well positioned within the current literature, and the authors discuss several important limitations of their conclusions and recommendations. Their presentation of recommended actions, in terms of future modelling efforts and the use of model predictions, is clear and well structured. I would recommend this manuscript for publication.

The only point I was unsure about was the discrepancy between the Network-1 model’s correct in-sample predictions (20/21, or 95%) and its poor performance on the AUC-TPTSC score. The conclusion seems to be that its success rate was a random fluke, but the discrepancy also made me wonder about the test data used to calculate the AUC-TPTSC scores.
Can the authors confirm that the test data used for the network models was in-sample only (i.e., in-sample for the network models and both in- and out-of-sample for the trait-based models)? And perhaps add this information to the methods? Otherwise, the network models would be penalized in their AUC-TPTSC scores for observations they couldn’t possibly predict.

As a reader interested in trait-based models, I’d also appreciate a few additional details about those models. For example, did all of the trait-based models use the same set of traits? Could the authors describe those traits in the text (e.g., in the aggregate, by the types of traits included) or in a table (or perhaps reprint the list or description from Han et al. (2016), if this work used the same traits)? In the trait-based models, were there specific types of traits that seemed to be especially informative (e.g., foraging vs. distribution vs. life history)?

Equity in the context of disease surveillance is largely beyond the scope of this paper. However, relative to potential risk (in terms of likely reservoir host hotspots), the discussion does seem to focus slightly disproportionately on North America (e.g., in reference to One Health and the potential for ‘spillback’). The authors might expand this section to include examples from sub-Saharan Africa and/or Southeast Asia, so that the need for more surveillance in what are comparatively low-risk (and presumably already well-sampled) regions isn’t implied as strongly. They might also briefly consider equity in the context of surveillance, for example, by suggesting that their predictions be used alongside global data on the social determinants of health to prioritize surveillance in regions where both spillover and subsequent human-human transmission may be especially likely, or where the impact of outbreaks, when they do occur, is disproportionately high.
For example, surveillance might be prioritized in predicted reservoir host hotspots that also have high levels of poverty or inequality, poor infrastructure, low surveillance capacity, or high population density or mobility, or that are experiencing conflict or political instability.