
Review 1: "Estimating the Reproduction Number and Transmission Heterogeneity from the Size Distribution of Clusters of Identical Pathogen Sequences"

Reviewers find the proposed method novel and validated with synthetic and historical epidemic data. However, they express concerns about how well the magnitude of the estimation bias can be quantified and about the validity of the method during an ongoing outbreak.

Published on Mar 13, 2024
This Pub is a Review of:
Estimating the reproduction number and transmission heterogeneity from the size distribution of clusters of identical pathogen sequences
Description

Abstract: Quantifying transmission intensity and heterogeneity is crucial to ascertain the threat posed by infectious diseases and inform the design of interventions. Methods that jointly estimate the reproduction number R and the dispersion parameter k have however mainly remained limited to the analysis of epidemiological clusters or contact tracing data, whose collection often proves difficult. Here, we show that clusters of identical sequences are imprinted by the pathogen offspring distribution, and we derive an analytical formula for the distribution of the size of these clusters. We develop and evaluate a novel inference framework to jointly estimate the reproduction number and the dispersion parameter from the size distribution of clusters of identical sequences. We then illustrate its application across a range of epidemiological situations. Finally, we develop a hypothesis testing framework relying on clusters of identical sequences to determine whether a given pathogen genetic subpopulation is associated with increased or reduced transmissibility. Our work provides new tools to estimate the reproduction number and transmission heterogeneity from pathogen sequences without building a phylogenetic tree, thus making it easily scalable to large pathogen genome datasets.

Significance statement: For many infectious diseases, a small fraction of individuals has been documented to disproportionately contribute to onward spread. Characterizing the extent of superspreading is a crucial step towards the implementation of efficient interventions. Despite its epidemiological relevance, it remains difficult to quantify transmission heterogeneity. Here, we present a novel inference framework harnessing the size of clusters of identical pathogen sequences to estimate the reproduction number and the dispersion parameter. We also show that the size of these clusters can be used to estimate the transmission advantage of a pathogen genetic variant. This work provides crucial new tools to better characterize the spread of pathogens and evaluate their control.

RR:C19 Evidence Scale rating by reviewer:

  • Potentially informative. The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.

***************************************

Review: In this study, the authors propose the novel idea of approximating epidemiological clusters with clusters of identical sequences. Based on this approximation, the authors derive the size distribution of identical-sequence clusters and, from it, a new tool for estimating the transmission dynamics of infectious diseases from pathogen genome data. This estimation method avoids the heavy computational burden of constructing phylogenetic trees and therefore scales easily to large pathogen genome datasets. Additionally, the authors propose a method for determining the transmission advantage of a pathogen variant.

Although the new method was validated with synthetic and historical epidemic data, several concerns remain about its application in real-world situations.

  1. When estimating the transmission dynamics, the estimation bias was shown to be uncontrollable when the true R0 > 1/p, where p is the probability that a transmission event occurs before a substitution event. Since the true values of both R0 and p are unknown in real situations, it is difficult to measure the estimation bias accurately in practice.
    Although the authors demonstrated that the estimation bias was relatively small for several simulated datasets, it needs to be quantified more accurately using additional datasets. In particular, the basic reproduction number R0 varies considerably across areas, times, and pathogen variants. Quantifying the uncertainty in R0 is very important for the public health response and for decision-making.

  2. Defining clusters of identical sequences still requires additional information. GISAID contains identical sequences that were reported from different countries or in different months. It is obviously unrealistic to assume that these sequences come from the same epidemiological cluster.

  3. This method relies on cluster size information, which is relatively accurate once transmission has ceased. For an ongoing outbreak, however, cluster sizes are not yet final, which may bias the estimates of transmission dynamics. In addition, the authors assumed that transmission dynamics were constant during the study period, ignoring possible variation over time, so the estimates may not provide a timely and accurate reference for the public health response.
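
The threshold raised in concern 1 can be illustrated with a toy simulation. The sketch below is not the authors' inference framework. It assumes a negative binomial offspring distribution with mean R and dispersion k, and it assumes that each offspring independently keeps the parent's sequence with probability p. Under those assumptions, a cluster of identical sequences is a branching process with mean offspring R·p, so its expected size is 1/(1 − R·p) when R·p < 1, while some clusters grow without bound when R > 1/p. All function names and parameter values are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def offspring(R, k, rng):
    """Assumed offspring model: negative binomial with mean R and dispersion k,
    drawn as a gamma-Poisson mixture (gamma shape k, scale R / k)."""
    return poisson(rng.gammavariate(k, R / k), rng)


def identical_cluster_size(R, k, p, rng, cap=10_000):
    """Simulate one cluster of identical sequences, truncated at `cap`.

    Assumption: each offspring keeps the parent's sequence with probability p,
    independently of the others; offspring that mutate leave the cluster.
    """
    size, active = 1, 1
    while active and size < cap:
        new = 0
        for _ in range(active):
            n = offspring(R, k, rng)
            new += sum(rng.random() < p for _ in range(n))
        size += new
        active = new
    return min(size, cap)


def mean_cluster_size(R, k, p, n_clusters, seed=0, cap=10_000):
    """Average size over `n_clusters` simulated clusters."""
    rng = random.Random(seed)
    total = sum(identical_cluster_size(R, k, p, rng, cap) for _ in range(n_clusters))
    return total / n_clusters


if __name__ == "__main__":
    # Hypothetical values: R * p = 0.72 < 1, so clusters die out on their own.
    print("simulated mean:", mean_cluster_size(R=1.2, k=0.5, p=0.6, n_clusters=2000))
    print("theoretical mean:", 1 / (1 - 1.2 * 0.6))
```

With R > 1/p (for example R = 2.5 and p = 0.6), a share of simulated clusters reaches the cap instead of going extinct, which is the regime in which observed cluster sizes depend on when the observation stops.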

This study highlights the importance of using pathogen genomic data to reveal the transmission potential of infectious pathogens. Over the past decade, we have collected extremely large genomic datasets containing a wealth of information about important infectious diseases such as HIV, Ebola, and COVID-19. More research efforts are still needed to unearth valuable information to guide our response to the next epidemic.
