**RR:C19 Evidence Scale rating by reviewer:**

**Potentially informative.** The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.

***************************************

**Review:**

As the title suggests, the paper describes a method for calculating the sample size needed to (i) detect a variant and (ii) estimate its prevalence when the sample is biased, for example by heterogeneity in disease severity. The method is extended to settings where genomic surveillance is an ongoing process. The paper addresses an important question, but the proposed method rests on many limiting assumptions, as the authors acknowledge. These assumptions are so strong that they could drastically restrict the practical application of the method. Certain other concerns also cast doubt on the results of the study.

Among the several assumptions the method makes, that of homogeneous and representative sampling is understandable. However, the model used for calculating the sample size requires several parameters, such as the proportion of the variant in the population, the sensitivity of the test for that variant, and the probability that a detected infection meets the quality threshold for genomic study. These parameters are difficult to estimate in most practical settings. When all of them are known or can be estimated, the exercise simply reduces to multiplying them appropriately; the study calls this product the “coefficient of detection”. The paper states that only the ratio of the variant coefficients, not their raw values, is necessary for sample size calculations, but the rationale for the relevant equation is not fully explained. A fuller and clearer explanation would have helped the reader understand and use the method. The purpose of this equation is to calculate the actual prevalence from the observed prevalence, yet the example given calculates the observed prevalence from the actual prevalence. It is not clear why the observed prevalence is needed when the actual prevalence is known.
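To make the direction of this calculation concrete, the following is a minimal sketch of how a ratio of detection coefficients could link observed and actual prevalence. It assumes a simple two-group setup (the variant versus everything else) and is not taken from the paper; the function names and the `coef_ratio` parameter are hypothetical.

```python
def observed_prevalence(actual, coef_ratio):
    """Share of the variant among sequenced samples.

    actual     -- true prevalence of the variant, between 0 and 1
    coef_ratio -- detection coefficient of the variant divided by that of
                  all other infections (assumed two-group setup)
    """
    weighted_variant = actual * coef_ratio
    return weighted_variant / (weighted_variant + (1.0 - actual))


def actual_prevalence(observed, coef_ratio):
    """Invert observed_prevalence: recover the true prevalence."""
    return observed / (observed + coef_ratio * (1.0 - observed))


if __name__ == "__main__":
    # Illustrative values only: 2% true prevalence, variant 1.5 times
    # as likely to be detected and successfully sequenced.
    obs = observed_prevalence(0.02, 1.5)
    print(obs, actual_prevalence(obs, 1.5))
```

Under these assumptions only the ratio enters either function, and the second function is the direction the equation is said to serve, which is the one the worked example would be expected to illustrate.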

In addition to the concerns mentioned above, some other aspects are not fully explained. First, the paper gives a method for calculating the sample size needed to detect ‘at least one’ case of the variant of concern (VoC), whereas the sample size should be for detecting the first case (and not at least one case), because the first case is enough to detect a variant. Second, the sample size formula for estimating the prevalence is a Gaussian approximation to the binomial, whereas for the extremely low prevalence of a variant in this setting, the Poisson may be a better approximation. Although the Poisson distribution also approaches the Gaussian for sufficiently large n, without this intermediate step the formula is less credible, because of the approximations involved and the extremely large sample required.
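The approximations discussed above can be compared numerically. The sketch below uses standard textbook formulas, not the formulas of the paper: the exact binomial sample size for observing at least one case, its Poisson counterpart, and the Gaussian (Wald) sample size for estimating a prevalence to within an absolute margin. The function and parameter names are hypothetical, and `q` stands for the observed prevalence among sequenced samples.

```python
import math
from statistics import NormalDist


def n_detect_binomial(q, prob):
    """Smallest n with 1 - (1 - q)**n >= prob (at least one case seen)."""
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - q))


def n_detect_poisson(q, prob):
    """Same target with a Poisson count: 1 - exp(-n * q) >= prob."""
    return math.ceil(-math.log(1.0 - prob) / q)


def n_estimate_gaussian(q, margin, confidence):
    """Wald sample size: half-width `margin` at two-sided `confidence`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(z * z * q * (1.0 - q) / (margin * margin))


if __name__ == "__main__":
    # Illustrative values only: 1% observed prevalence.
    print(n_detect_binomial(0.01, 0.95))
    print(n_detect_poisson(0.01, 0.95))
    print(n_estimate_gaussian(0.01, 0.005, 0.95))
```

For detection, the binomial and Poisson answers are close at low prevalence; for estimation, the Wald formula is the one whose accuracy is most doubtful when the prevalence is near zero.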

Overall, the authors have raised an important issue regarding the sample size needed to detect a variant and to estimate its prevalence, but the formulas they advocate need greater clarity to be of practical utility.