Review 2: "Parallel Trends in an Unparalleled Pandemic: Difference-in-Differences for Infectious Disease Policy Evaluation"

Reviewers commend the study for addressing the limitations of standard DiD methods and proposing robust alternatives. They suggest further elaboration on the differences between the new methods and traditional estimates, as well as comparisons to other modeling approaches.

Published on May 20, 2024
This Pub is a Review of
Parallel Trends in an Unparalleled Pandemic: Difference-in-differences for infectious disease policy evaluation

Researchers frequently employ difference-in-differences (DiD) to study the impact of public health interventions on infectious disease outcomes. DiD assumes that treatment and non-experimental comparison groups would have moved in parallel in expectation, absent the intervention (“parallel trends assumption”). However, the plausibility of the parallel trends assumption in the context of infectious disease transmission is not well-understood. Our work bridges this gap by formalizing epidemiological assumptions required for common DiD specifications, positing an underlying Susceptible-Infectious-Recovered (SIR) data-generating process. We demonstrate that popular specifications can encode strict epidemiological assumptions. For example, DiD modeling incident case numbers or rates as outcomes will produce biased treatment effect estimates unless untreated potential outcomes for treatment and comparison groups come from a data-generating process with the same initial infection and equal transmission rates at each time step. Applying a log transformation or modeling log growth allows for different initial infection rates under an “infinite susceptible population” assumption, but invokes conditions on transmission parameters. We then propose alternative DiD specifications based on epidemiological parameters – the effective reproduction number and the effective contact rate – that are both more robust to differences between treatment and comparison groups and can be extended to complex transmission dynamics. With minimal power difference between incidence and log incidence models, we recommend a default of the more robust log specification. Our alternative specifications have lower power than incidence or log incidence models, but have higher power than log growth models. We illustrate implications of our work by re-analyzing published studies of COVID-19 mask policies.

Significance Statement: Difference-in-differences is a popular observational study design for policy evaluation. However, it may not perform well when modeling infectious disease outcomes. Although many COVID-19 DiD studies in the medical literature have used incident case numbers or rates as the outcome variable, we demonstrate that this and other common model specifications may encode strict epidemiological assumptions as a result of non-linear infectious disease transmission. We unpack the assumptions embedded in popular DiD specifications assuming a Susceptible-Infected-Recovered data-generating process and propose more robust alternatives, modeling the effective reproduction number and effective contact rate.

RR:C19 Evidence Scale rating by reviewer:

  • Reliable. The main study claims are generally justified by its methods and data. The results and conclusions are likely to be similar to the hypothetical ideal study. There are some minor caveats or limitations, but they would/do not change the major claims of the study. The study provides sufficient strength of evidence on its own that its main claims should be considered actionable, with some room for future revision.


Review: Feng and Bilinski [1] address a crucial question for both retrospective studies of COVID-19 policies during the pandemic period and future infectious disease outbreak studies: is difference-in-differences (DiD) an appropriate method for causal inference? The authors focus on the parallel trends assumption needed for DiD and investigate several model specifications: incidence, log incidence, log growth, log effective reproduction number, and log effective contact rate. For each, they find the identifying conditions on the susceptible-infectious-recovered (SIR) model parameters that would yield parallel trends. They note that parallel trends are unlikely to hold in most settings under the popular linear specification on incidence, and that log specifications are more robust to between-unit differences.
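To make the contrast between the linear and log specifications concrete, the following is a minimal sketch, not the authors' simulation code. It assumes a deterministic discrete-time SIR process and hypothetical parameter values (`N`, `BETA`, `GAMMA`, the initial infection counts, and the pre/post split are all illustrative choices). Two groups share the same transmission rate but start with different numbers of infections, and neither receives an intervention, so any nonzero DiD estimate reflects a parallel trends violation rather than a treatment effect.

```python
import math


def simulate_incidence(i0, beta, gamma, n, steps):
    """Deterministic discrete-time SIR; returns new infections at each step."""
    s, i = n - i0, i0
    incidence = []
    for _ in range(steps):
        new_infections = beta * s * i / n
        s -= new_infections
        i += new_infections - gamma * i
        incidence.append(new_infections)
    return incidence


def placebo_did(treated, comparison, split, transform=lambda x: x):
    """DiD of pre/post means when neither group receives any intervention."""
    def change(series):
        values = [transform(x) for x in series]
        pre, post = values[:split], values[split:]
        return sum(post) / len(post) - sum(pre) / len(pre)
    return change(treated) - change(comparison)


# Hypothetical values chosen for illustration only. A very large population
# approximates the "infinite susceptible population" condition.
N = 1e9
BETA, GAMMA = 0.3, 0.1
STEPS, SPLIT = 20, 10

# Same transmission rate, different initial infections (10 vs. 100).
comparison = simulate_incidence(10, BETA, GAMMA, N, STEPS)
treated = simulate_incidence(100, BETA, GAMMA, N, STEPS)

did_linear = placebo_did(treated, comparison, SPLIT)
did_log = placebo_did(treated, comparison, SPLIT, transform=math.log)

print(f"placebo DiD, incidence:     {did_linear:.2f}")
print(f"placebo DiD, log incidence: {did_log:.6f}")
```

Under these assumptions the placebo estimate on the incidence scale is far from zero, while the estimate on the log incidence scale is close to zero, because a difference in initial infections scales incidence multiplicatively when susceptible depletion is negligible. Giving the two groups different values of `BETA` would break parallel trends on the log scale as well.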

These results are convincing and provide both formal mathematical justification of the suggested models and an intuitive understanding of the parallel trends claim in infectious disease contexts. The manuscript offers a useful word of caution against fitting such models without careful consideration of the functional form, deepening existing discussion of their value, and it also supplies useful alternatives and spells out their assumptions.

One limitation of the present manuscript is its focus on the parallel trends assumption to the near-exclusion of the other assumptions necessary for valid estimation and inference from DiD models. This includes no-spillover and no-anticipation assumptions, which are both non-trivial for the masking policies considered in the analysis examples and infectious disease interventions more broadly. The issue of targeted estimands is given somewhat short shrift here as well, despite the fact that infectious disease interventions are often highly contingent on time and location. Issues of appropriate time period durations are raised briefly, but more consideration is needed given that time scales need to be sufficient to account for reporting delays and to avoid unusual data artifacts, and the methods that rely on generation interval-based time periods are not fully justified in practice. Naturally, these cannot all be covered in depth in a single article, and so these results and considerations should be taken in concert with work that highlights those other considerations, as in [2–6], among other work both cited by these authors and beyond. But it is hard to fully rely on this work to choose an outcome and functional form without addressing those other considerations.

On the parallel trends question, specifically, the results compellingly compare the proposed models to one another, justifying the advice Feng and Bilinski give on selecting among them. This advice would be more fully justified, however, if there were also comparisons to other estimation approaches beyond DiD. The broader implication for infectious disease research depends, for example, on how these approaches compare to covariate adjustment or full parametric modeling. In particular, given the assumptions and transformation to model parameters needed for their proposed specifications using reproduction number or contact rate, comparison to a fully parameterized SIR model that uses maximum likelihood to estimate the treatment effect would be highly compelling. Simulations where the data are generated by a more complex process than the one used to derive the results would also show how sensitive or robust these methods are to some misspecification and lend credence to the conclusions.

All in all, this manuscript provides well-substantiated evidence of the authors’ proposed parallel trends conditions and the value of the new specifications they provide. While additional simulations and comparisons would provide a greater understanding of when to use these methods, and the paper must of course be read in concert with additional literature on the role of quasi-experimental studies in outbreaks, it provides meaningful and actionable results for investigators and for policy-makers interpreting evidence based on such studies.

  1. Feng S, Bilinski A. Parallel trends in an unparalleled pandemic: Difference-in-differences for infectious disease policy evaluation. medRxiv Preprint. 2024.

  2. Lopez Bernal JA, Andrews N, Amirthalingam G. The use of quasi-experimental designs for vaccine evaluation. Clin Infect Dis. 2019;68(10):1769-1776.

  3. Goodman-Bacon A, Marcus J. Using difference-in-differences to identify causal effects of COVID-19 policies. Surv Res Methods. 2020;14(2):153-158.

  4. Haber NA, Clarke-Deelder E, Salomon JA, Feller A, Stuart EA. Impact evaluation of coronavirus disease 2019 policy: A guide to common design issues. Am J Epidemiol. 2021;190(11):2474-2486.

  5. Callaway B, Li T. Policy evaluation during a pandemic. J Econom. 2023;236(1):105454.

  6. Kennedy-Shaffer L. Quasi-experimental methods for pharmacoepidemiology: Difference-in-differences and synthetic control methods with case studies for vaccine evaluation. Am J Epidemiol. 2024: ePub ahead of print.
