RR:C19 Evidence Scale rating by reviewer:
Reliable. The main study claims are generally justified by its methods and data. The results and conclusions are likely to be similar to the hypothetical ideal study. There are some minor caveats or limitations, but they would/do not change the major claims of the study. The study provides sufficient strength of evidence on its own that its main claims should be considered actionable, with some room for future revision.
This paper is an overall strong analysis that links the NPIs put into place in different countries to the reductions in transmission observed in each, with the aim of quantifying how effective each NPI was. With any analysis like this, one of the first limitations that comes to mind is that different models can allocate efficacy to different NPIs in wildly different ways, in no small part because there are not very many data points to fit to (each NPI is essentially only used once, and a multitude of other things may be going on, while the models are ultimately fit to case data over time). The authors have done a good job of accounting for this by comparing their model and results with other similar studies.
My primary comments concern some of the assumptions made in the modelling, and some potential limitations that I think need more discussion than they currently receive.
Because the models were fit during the first wave of the pandemic, there are significant differences in testing between countries and over time. The authors have dealt with this to some degree by modelling new cases as a function of new infections, but I believe there is a spatiotemporal component to changes in the relationship between actual infections, observable infections, and cases that is not currently accounted for in the model. Because there is a general trend in the order in which countries implemented NPIs, it is worth discussing how this could influence the apparent efficacy of earlier or later NPIs.
Beyond this, was the relationship between cases and new infections fit in a country-specific way? This could be important with the differences in testing, and with potential differences in diagnosis. This is alluded to in lines 277-279, but the authors simply state “we can expect these effects cancel out.” I think this needs more explanation.
Was there any sensitivity analysis done on the choice of having NPI effectiveness increase over 3 days as compared to some longer or shorter period?
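To make the request concrete, the kind of sensitivity check I have in mind could be sketched as follows. This is a minimal illustration only: it assumes a linear ramp, which may not match the authors' implementation, and the function name `ramp_fraction` and the alternative durations (1 and 7 days) are hypothetical choices of mine. In a real check, the model would be refit under each ramp duration and the resulting NPI effect estimates compared.

```python
def ramp_fraction(days_since_start, ramp_days):
    """Fraction of an NPI's full effect in force on a given day.

    Assumes a linear ramp (an assumption of this sketch): zero effect
    before the NPI starts, full effect once `ramp_days` days have elapsed.
    """
    if days_since_start < 0:
        return 0.0
    if ramp_days <= 0:
        return 1.0  # no ramp: the effect is immediate
    return min(1.0, days_since_start / ramp_days)


# Hypothetical grid of ramp durations: 3 days is the paper's choice,
# 1 and 7 days are illustrative alternatives.
ramp_grid = (1, 3, 7)
days = range(-1, 8)

# For each candidate duration, the day-by-day fraction of full effect.
schedules = {d: [ramp_fraction(t, d) for t in days] for d in ramp_grid}

for d, schedule in schedules.items():
    print(d, [round(x, 2) for x in schedule])
```

If the estimated NPI effects are stable across such a grid, the 3-day choice is unlikely to be driving the conclusions.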
I wonder about country-specific differences in how NPIs manifested, and how much specific countries like the USA are biasing the results. I mention the USA because I know that NPI uptake was very spatiotemporally heterogeneous across the country, with certain regions undergoing stay-at-home measures (either officially or informally) much earlier than others. One way to account for this would be to refit the model multiple times, leaving out a different country each time, and to observe how the results vary depending on which country is left out.
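The leave-one-country-out check suggested above could be structured roughly as in the sketch below. Everything here is an assumption made for illustration: `fit_npi_effects` is a hypothetical stand-in for the authors' full model fit (it simply averages per-country values), the country labels and NPI names are placeholders, and the numbers are toy values, not results from the paper.

```python
from statistics import mean


def fit_npi_effects(data):
    """Hypothetical stand-in for the full model fit.

    Averages each NPI's value over the countries supplied; a real
    analysis would refit the actual model here instead.
    """
    npis = sorted({npi for effects in data.values() for npi in effects})
    return {
        npi: mean(e[npi] for e in data.values() if npi in e)
        for npi in npis
    }


def leave_one_country_out(data, fit=fit_npi_effects):
    """Refit once per country, each time excluding that country."""
    results = {}
    for held_out in data:
        subset = {c: v for c, v in data.items() if c != held_out}
        results[held_out] = fit(subset)
    return results


def effect_ranges(results):
    """For each NPI, the (min, max) estimate across all refits."""
    npis = sorted({npi for est in results.values() for npi in est})
    return {
        npi: (
            min(r[npi] for r in results.values() if npi in r),
            max(r[npi] for r in results.values() if npi in r),
        )
        for npi in npis
    }


# Placeholder numbers for illustration only, not values from the paper.
toy = {
    "A": {"school_closure": 0.30, "stay_at_home": 0.10},
    "B": {"school_closure": 0.40, "stay_at_home": 0.20},
    "C": {"school_closure": 0.50, "stay_at_home": 0.60},
}

loo = leave_one_country_out(toy)
ranges = effect_ranges(loo)
for npi, (lo, hi) in ranges.items():
    print(f"{npi}: {lo:.2f} to {hi:.2f}")
```

A wide range for a given NPI, or a large shift when one particular country is held out, would indicate that the estimate depends heavily on that country.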