
Review 1: "Sample Size Calculations for Variant Surveillance in the Presence of Biological and Systematic Biases"

Published on Apr 25, 2023
This Pub is a Review of
Sample Size Calculations for Variant Surveillance in the Presence of Biological and Systematic Biases

SUMMARY As demonstrated during the SARS-CoV-2 pandemic, detecting and tracking the emergence and spread of pathogen variants is an important component of monitoring infectious disease outbreaks. Pathogen genome sequencing has emerged as the primary tool for variant characterization, so it is important to consider the number of sequences needed when designing surveillance programs or studies, both to ensure accurate conclusions and to optimize use of limited resources. However, current approaches to calculating sample size for variant monitoring often do not account for the biological and logistical processes that can bias which infections are detected and which samples are ultimately selected for sequencing. In this manuscript, we introduce a framework that models the full process, including potential sources of bias, from infection detection to variant characterization, and we demonstrate how to use this framework to calculate appropriate sample sizes for sequencing-based surveillance studies. We consider both cross-sectional and continuous sampling, and we have implemented our method in a publicly available tool that allows users to estimate necessary sample sizes given a specific aim (e.g., variant detection or measuring variant prevalence) and sampling method. Our framework is designed to be easy to use, while also flexible enough to be adapted to other pathogens and surveillance scenarios.

RR:C19 Evidence Scale rating by reviewer:

  • Potentially informative. The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.



As the title suggests, the objective of this paper is to describe a method for calculating the sample size needed to (i) detect a variant and (ii) estimate its prevalence when the sample is biased for various reasons, such as heterogeneity in disease severity. The method is extended to the setting where genomic surveillance is an ongoing process. The paper addresses an important question, but the proposed method rests on many assumptions, which the authors acknowledge as limitations. These assumptions are so strong that they could drastically restrict the practical application of the method. Certain other concerns also cast doubt on the results of the study.

Of the several assumptions the method makes, homogeneous and representative sampling is understandable. However, the model used to calculate sample size requires several parameters, such as the proportion of the variant in the population, the sensitivity of the test for that variant, and the probability that a detected infection meets the quality threshold for genomic study. These parameters are difficult to estimate in most practical settings. When all of them are known or can be estimated, the exercise reduces to multiplying them together appropriately; the study calls the resulting product the “coefficient of detection”. The paper states that only the ratio of the variant coefficients, and not their raw values, is needed for sample size calculations, but the rationale for the relevant equation is not fully explained. A fuller and clearer explanation would have helped the reader understand and use the method. The purpose of this equation is to calculate the actual prevalence from the observed prevalence, but the example given calculates the observed prevalence from the actual prevalence. It is not clear why the observed prevalence is needed when the actual prevalence is known.
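To make the role of the coefficient ratio concrete, the following is a minimal sketch rather than the paper's own equation. It assumes a simple two-variant model in which each infection's chance of being detected and sequenced is proportional to its variant's coefficient of detection, and it treats that coefficient as a plain product of per-variant probabilities. The function names and the example values are hypothetical.

```python
from math import prod


def detection_coefficient(*factors):
    """Product of per-variant probabilities (assumed form of the coefficient).

    Each factor is a probability in (0, 1], e.g. test sensitivity or the
    probability that a detected infection meets the sequencing quality
    threshold. Which factors belong here is an assumption of this sketch.
    """
    for f in factors:
        if not 0 < f <= 1:
            raise ValueError("each factor must be a probability in (0, 1]")
    return prod(factors)


def observed_prevalence(true_prev, coef_ratio):
    """Prevalence expected among sequenced samples in a two-variant model.

    coef_ratio is (coefficient of the variant of interest) divided by
    (coefficient of the background variant).
    """
    weighted = true_prev * coef_ratio
    return weighted / (weighted + (1 - true_prev))


def true_prevalence(obs_prev, coef_ratio):
    """Invert observed_prevalence to recover the population prevalence."""
    return obs_prev / (obs_prev + coef_ratio * (1 - obs_prev))


# Hypothetical values: the variant is detected and sequenced more readily.
c_variant = detection_coefficient(0.95, 0.80)
c_background = detection_coefficient(0.90, 0.60)
ratio = c_variant / c_background

biased = observed_prevalence(0.05, ratio)
recovered = true_prevalence(biased, ratio)
```

In this model, multiplying both coefficients by the same constant leaves `ratio`, and therefore both conversions, unchanged, which is one way to see why only the ratio would be needed. The two functions are inverses, so the direction of the calculation depends on which prevalence is taken as known.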

In addition to the concerns mentioned above, some other aspects are not fully explained. First, the paper gives a method to calculate the sample size needed to detect ‘at least one’ case of the variant of concern (VoC), whereas the sample size should be for detecting the first case (and not at least one case), because the first case is enough to detect a variant. Second, the sample size formula for estimating prevalence is a Gaussian approximation to the binomial, whereas for an extremely low prevalence of a variant in this setting, the Poisson distribution may be a better approximation. Although the Poisson distribution also approaches the Gaussian for sufficiently large n, without this intermediary step the formula is less credible, because of the approximations involved and the extremely large sample required.
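The calculations discussed here can be illustrated with standard textbook formulas. The sketch below is not taken from the paper: it assumes independent sampling with a fixed observed prevalence `p`, uses the usual binomial expression for detecting at least one case, its Poisson counterpart, and the Gaussian (Wald) formula for estimating prevalence to within a chosen half-width. All function and parameter names are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def n_detect_binomial(p, prob):
    """Smallest n with P(at least one variant among n samples) >= prob."""
    return ceil(log(1 - prob) / log(1 - p))


def n_detect_poisson(p, prob):
    """Same target, with the variant count approximated as Poisson(n * p)."""
    return ceil(-log(1 - prob) / p)


def n_prevalence_wald(p, half_width, confidence=0.95):
    """Gaussian-approximation sample size for estimating prevalence p
    to within +/- half_width at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z ** 2 * p * (1 - p) / half_width ** 2)


# Example: a variant at 1% observed prevalence.
n_bin = n_detect_binomial(0.01, 0.95)          # 299
n_poi = n_detect_poisson(0.01, 0.95)           # 300
n_est = n_prevalence_wald(0.01, 0.005, 0.95)   # 1522
```

At low prevalence the binomial and Poisson detection sizes nearly coincide, while estimating prevalence with useful precision requires a much larger sample, which is where the choice of approximation matters most.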

Overall, the authors have raised an important issue regarding the sample size needed to detect a variant and to estimate its prevalence, but the formulas they advocate need more clarity to be of practical utility.
