RR:C19 Evidence Scale rating by reviewer:
Potentially informative. The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.
***************************************
Review: The authors present a condensed version of a manuscript that outlines the key points of a study investigating possible blood transcriptomic markers in a group of 60 individuals (48 Long COVID patients, 12 controls). Based on transcriptomic data, they performed differential gene expression analysis, followed by group-wise comparisons and multivariable logistic regression. They proposed a 2-gene signature to distinguish Long COVID patients from individuals without Long COVID. Moreover, they suggest some mechanistic insight by reporting an association of “viral load status” (a constructed binary label) with the number of comorbidities (it remains unclear which comorbidities were counted and how many there were) and the number of vaccine doses (it is unclear which vaccine was used).
While the overall study is of interest, the manuscript says little about how it fits in with the existing literature. Important details regarding the analysis methods are also missing, which limits the interpretation of the results. Most importantly, no validation (external or otherwise) of the proposed signature is presented, and all of the available data were used to derive the reported performance metric.
Detailed comments:
The introduction is lacking a brief summary of the current literature on the topic of long COVID biomarkers. The driving hypothesis of the study is not clearly presented.
A reference to the WHO criteria used for diagnosis would be helpful.
A lot of quantitative information is missing in the methods: how many patients were included (and based on which criteria)? What were the median and range of follow-up times? Where was the study performed, and was it approved (and if so, how)?
How were the controls recruited? (Presumably these came to the practice for reasons other than Long COVID?)
How was the acute COVID-19 infection confirmed at the time?
“Total blood viral load was determined as the sum of all individual SARSCoV-2 transcripts.” – a list of these should be provided in the supplementary material.
What was the ROC curve analysis assessing, i.e. what were the relevant input variables? Important factors such as severity, age, and time since acute infection are mentioned throughout, but it is unclear in which analyses these were treated as confounding variables and in which they were not. It was previously suggested that acute COVID disease severity and BMI are associated with a higher risk of Long COVID. The authors should account for these confounders in their multivariable logistic regression; age and sex alone may be insufficient. It was also unclear which variables were used in each specific analysis (i.e. only the two proposed genes, or more?). Why were these 2 genes selected for the ROC analysis to begin with, given that only FYN was among the top associated genes?
It is not clear how the “cutoffs” were chosen for classification tasks (e.g. high vs low viral load groups).
Why is FYN not reported in Figure 2A?
Throughout, it is unclear for which tests the authors performed a multiple testing correction and for which they did not; a stringent multiple testing correction is essential in omics analyses. For example, was a correction applied in Figure 1C? What about other pathways (it seems likely that the authors assessed more than these two)?
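To illustrate the kind of correction meant here, the following is a minimal sketch of the Benjamini-Hochberg false discovery rate adjustment. It is one common choice in omics work, not necessarily the method the authors used or should use; a Bonferroni correction would be stricter. The function name `benjamini_hochberg` and the p-values are hypothetical and are not taken from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values standing in for, e.g., five pathway-level tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
q_values = benjamini_hochberg(p_values)
for p, q in zip(p_values, q_values):
    print(f"p = {p:.3f}  adjusted = {q:.5f}")
```

In this made-up example, two tests with raw p-values just below 0.05 no longer pass a 0.05 threshold after adjustment, which is why it matters to state for each figure whether a correction was applied.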
As the authors point out, the study lacks additional validation. In particular, all data were used for model fitting (i.e. for the ROC AUC calculation). Reporting results from a cross-validation procedure (e.g. leave-one-out cross-validation) would have strengthened the study, particularly given the lack of independent validation.
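As an illustration of the suggested procedure, the sketch below estimates a ROC AUC from leave-one-out held-out scores and compares it with the in-sample AUC. Everything here is an assumption made for illustration: the data are synthetic (48 cases, 12 controls, two simulated "gene" features), the logistic regression is a small gradient-descent stand-in rather than the authors' model, and pooling the held-out scores into a single AUC is only one of several conventions.

```python
import math
import random


def fit_logistic(X, y, lr=0.5, epochs=200, l2=0.01):
    """Fit a small L2-penalised logistic regression by gradient descent."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def linear_score(w, b, x):
    """Linear predictor; monotone in the predicted probability."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def auc(scores, labels):
    """ROC AUC as the fraction of case/control pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loocv_scores(X, y):
    """Score each sample with a model fitted on all other samples."""
    held_out = []
    for i in range(len(X)):
        w, b = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        held_out.append(linear_score(w, b, X[i]))
    return held_out


# Synthetic stand-in data: 48 cases shifted upward, 12 controls.
random.seed(0)
X = [[random.gauss(0.8, 1.0), random.gauss(0.8, 1.0)] for _ in range(48)]
X += [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(12)]
y = [1] * 48 + [0] * 12

w_all, b_all = fit_logistic(X, y)
in_sample_auc = auc([linear_score(w_all, b_all, x) for x in X], y)
held_out = loocv_scores(X, y)
loocv_auc = auc(held_out, y)
print(f"in-sample AUC: {in_sample_auc:.3f}")
print(f"LOOCV AUC:     {loocv_auc:.3f}")
```

For the estimate to be honest, any step that uses the outcome labels, including the selection of the genes themselves, would need to be repeated inside each fold; the sketch omits gene selection because the features are simulated.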
There is very limited discussion of the findings in the context of the literature.