
Review 2: "Variants in SARS-CoV-2 Associated with Mild or Severe Outcome"

This preprint reports that viral variants can improve classification of COVID-19 outcomes compared with models using only age and region, with some individual variants associated with disease severity. Reviewers suggest major revisions to improve and clarify the data analysis.

Published on Jul 05, 2021
This Pub is a Review of
Variants in SARS-CoV-2 Associated with Mild or Severe Outcome

Abstract

Introduction: The coronavirus disease 2019 (COVID-19) pandemic is a global public health emergency causing a disparate burden of death and disability around the world. The molecular characteristics of the virus that predict better or worse outcome are largely still being discovered.

Methods: We downloaded 155,958 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from GISAID and evaluated whether variants improved prediction of reported severity beyond age and region. We also evaluated specific variants to determine the magnitude of association with severity and the frequency of these variants among the genomes.

Results: Logistic regression models that included viral genomic variants outperformed other models (AUC=0.91 as compared with 0.68 for age and gender alone; p<0.001). Among individual variants, we found 17 single nucleotide variants in SARS-CoV-2 have more than two-fold greater odds of being associated with higher severity and 67 variants associated with ≤ 0.5 times the odds of severity. The median frequency of associated variants was 0.15% (interquartile range 0.09%-0.45%). Altogether 85% of genomes had at least one variant associated with patient outcome.

Conclusion: Numerous SARS-CoV-2 variants have two-fold or greater association with odds of mild or severe outcome and collectively, these variants are common. In addition to comprehensive mitigation efforts, public health measures should be prioritized to control the more severe manifestations of COVID-19 and the transmission chains linked to these severe cases.

RR:C19 Evidence Scale rating by reviewer:

  • Potentially informative. The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.



In this manuscript, the authors attempted to identify SARS-CoV-2 mutations associated with severe outcomes by analyzing SARS-CoV-2 sequences deposited in GISAID between January and October 2020. This is important for understanding the variability in disease outcomes among COVID-19 patients. The authors identified certain mutations that were associated with severe cases. However, the results should be adjusted for patients’ age, gender, and comorbidities (if available). In other words, how can the authors be sure that the severity of symptoms is due to viral mutations and not to host-related factors such as age, gender, and comorbidities? Therefore, the 2,870 severe cases should be further stratified by other risk factors (age and underlying comorbidities). The analysis should compare age- and gender-matched mild and severe cases to exclude the effect of host-related factors. Severe cases should also be sub-grouped by comorbidity.
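As a minimal sketch of the kind of adjustment requested above, assuming each sequence carries a severity label, a variant-presence flag, and an age group and gender, the Mantel-Haenszel estimator pools per-stratum 2×2 tables into a single odds ratio adjusted for those host factors. The `Stratum` class, the function name, and all counts are hypothetical; a multivariable logistic regression with age, gender, and comorbidity covariates would be an equally reasonable alternative.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """2x2 counts within one host-factor stratum (e.g. one age group x gender).

    a: severe cases carrying the variant
    b: severe cases without the variant
    c: mild cases carrying the variant
    d: mild cases without the variant
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def mantel_haenszel_or(strata: list[Stratum]) -> float:
    """Pooled (stratum-adjusted) odds ratio of severity given the variant."""
    num = 0.0
    den = 0.0
    for s in strata:
        if s.n == 0:
            continue  # an empty stratum contributes nothing to either sum
        num += s.a * s.d / s.n
        den += s.b * s.c / s.n
    if den == 0:
        raise ValueError("odds ratio undefined: sum of b*c/n is zero")
    return num / den


if __name__ == "__main__":
    # Hypothetical counts for two age-by-gender strata.
    strata = [Stratum(10, 90, 20, 380), Stratum(30, 70, 15, 85)]
    # The same data collapsed into one table gives the crude (unadjusted) OR.
    crude = Stratum(40, 160, 35, 465)
    print(f"adjusted OR = {mantel_haenszel_or(strata):.2f}")
    print(f"crude OR    = {mantel_haenszel_or([crude]):.2f}")
```

Comparing the pooled estimate with the crude odds ratio from the collapsed table shows how much of an apparent variant-severity association is accounted for by the host-factor strata.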

Overall, the focus of the study is clear, but extensive editing of the English language and style is required, particularly in the Introduction and Results. The approach to sample inclusion/exclusion and to sequence and mutation analysis is reasonable. The results, on the other hand, are not sufficiently described: more detail is needed to support the conclusion(s), and further (albeit simple) analyses may be needed. Detailed information on mutation type (synonymous vs non-synonymous), genomic position (gene), and prevalence (severe vs mild cases) should be added to the Results section. The genomic position of severity-associated mutations may help explain their possible effect. Moreover, the clades and mutations used in the prediction analysis should be clearly indicated in the text.

The CDC has already identified clades of concern (variants of concern) based on several factors, including increased transmissibility and pathogenicity. Therefore, the prevalence of these clades should be identified and compared between severe and mild cases before proceeding to the prediction analysis. The prediction analysis should then be repeated with one variant or clade per analysis to confirm the findings. This will further help to identify clades and/or variants associated with severe cases. The discussion is generally well written and explains the findings in light of published data. The authors are also aware of the limitations of their study.
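A minimal sketch of the first step suggested here, comparing how often a given clade (for example, one designated a variant of concern) occurs among severe versus mild cases, is a 2×2 comparison of prevalence with an odds ratio and a Woolf (log-scale) 95% confidence interval. The function name and the counts are hypothetical, not taken from the manuscript.

```python
import math


def clade_prevalence_comparison(severe_with: int, severe_total: int,
                                mild_with: int, mild_total: int,
                                z: float = 1.96) -> dict:
    """Compare how often a clade occurs in severe vs mild cases.

    Returns per-group prevalence and the odds ratio (severe vs mild) with a
    Woolf log-scale confidence interval. All four 2x2 cells must be > 0.
    """
    a = severe_with                  # severe, clade present
    b = severe_total - severe_with   # severe, clade absent
    c = mild_with                    # mild, clade present
    d = mild_total - mild_with       # mild, clade absent
    if min(a, b, c, d) <= 0:
        raise ValueError("all 2x2 cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return {
        "prevalence_severe": a / severe_total,
        "prevalence_mild": c / mild_total,
        "odds_ratio": odds_ratio,
        "ci_low": math.exp(log_or - z * se_log_or),
        "ci_high": math.exp(log_or + z * se_log_or),
    }


if __name__ == "__main__":
    # Hypothetical: clade seen in 120 of 400 severe and 60 of 400 mild genomes.
    result = clade_prevalence_comparison(120, 400, 60, 400)
    for key, value in result.items():
        print(f"{key}: {value:.3f}")
```

For a single binary predictor, this sample odds ratio equals exp(β) from a one-predictor logistic regression, so the same table also previews the "one variant/clade per analysis" model.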

There are also some minor points that the authors should address:

- The number of sequences indicated in the abstract is misleading. The authors should indicate the actual number of sequences analyzed (n = 3,637).

- Line 217: More relevant references/examples could be cited here. Similar scenarios were seen in other RNA viruses (e.g., following the 1968 H3N2 and 2009 H1N1 influenza pandemics).

- Some figures (Figures 1A and 1B) are not cited in the text.

- Line 186: Can you clarify why this number of variants (4,499) was selected for model testing?

- The authors mention that they used SnpEff to annotate mutations. They could further use this tool to predict the functional impact of these mutations.
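To illustrate how SnpEff output could supply the mutation type, gene, and predicted impact requested above, here is a minimal sketch that reads the `ANN` INFO sub-field from a VCF line and labels each annotation as synonymous or non-synonymous. The field order follows the SnpEff `ANN` specification; the set of effect terms treated as non-synonymous is a simplifying assumption, and the example record (spike D614G, A23403G on NC_045512.2) is hand-written for illustration, not taken from the authors' data.

```python
# Field order of the SnpEff ANN sub-field (16 pipe-separated values).
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos", "distance", "messages",
]

# Assumption: effect terms counted as "non-synonymous" for this summary.
NONSYNONYMOUS_TERMS = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
    "disruptive_inframe_insertion", "disruptive_inframe_deletion",
}


def parse_ann(info: str) -> list[dict]:
    """Return one dict per ANN entry found in a VCF INFO column."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            entries = item[len("ANN="):].split(",")
            break
    else:
        return []  # no ANN key in this INFO column
    records = []
    for entry in entries:
        record = dict(zip(ANN_FIELDS, entry.split("|")))
        # One annotation may combine several effect terms joined by '&'.
        record["effects"] = record.get("annotation", "").split("&")
        records.append(record)
    return records


def mutation_class(record: dict) -> str:
    """Label an ANN record as non-synonymous, synonymous, or other."""
    effects = set(record["effects"])
    if effects & NONSYNONYMOUS_TERMS:
        return "non-synonymous"
    if "synonymous_variant" in effects:
        return "synonymous"
    return "other"


if __name__ == "__main__":
    # Hand-written illustrative VCF line; INFO is the 8th tab-separated column.
    line = ("NC_045512.2\t23403\t.\tA\tG\t.\t.\t"
            "ANN=G|missense_variant|MODERATE|S|GU280_gp02|transcript|GU280_gp02|"
            "protein_coding|1/1|c.1841A>G|p.Asp614Gly|1841/3822|1841/3822|614/1273||")
    cols = line.split("\t")
    for rec in parse_ann(cols[7]):
        print(cols[1], rec["gene_name"], rec["hgvs_p"], rec["impact"], mutation_class(rec))
```

Tabulating these labels and genes separately for severe and mild genomes would give the per-mutation breakdown the review asks for in the Results section.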
