RR:C19 Evidence Scale rating by reviewer:
Reliable: The main study claims are generally justified by its methods and data. The results and conclusions are likely to be similar to the hypothetical ideal study. There are some minor caveats or limitations, but they would/do not change the major claims of the study. The study provides sufficient strength of evidence on its own that its main claims should be considered actionable, with some room for future revision.
We reviewed “Improved COVID-19 Serology Test Performance by Integrating Multiple Lateral Flow Assays using Machine Learning” by Mowery et al. Patient testing has been one of the greatest problems faced by healthcare systems worldwide during the COVID-19 pandemic. Several emergent difficulties include cost, reliability, time-to-result, scalability, availability of reagents, specialised laboratory equipment, and personnel. The requirement for reliable anti-SARS-CoV-2 antibody detection to provide important epidemiological information to model viral spread and inform non-pharmaceutical interventions, including physical distancing, contact tracing, and population lockdown exit strategies, is becoming increasingly urgent.
Mowery et al. re-analyse existing data to present a proof-of-principle machine learning framework that they suggest may be taken forward to inform the pairing of lateral flow assays (LFAs), i.e., rapid serology tests, to achieve superior classification performance. LFAs represent a cheap and scalable method for widespread testing for anti-SARS-CoV-2 antibodies. However, published results of LFAs used in the context of COVID-19 have been mixed, and Mowery et al. hypothesise that testing with pairs of SARS-CoV-2 LFAs may classify specimen serostatus more accurately than single LFAs.
This study is somewhat limited by its small sample size (the true-positive rate is calculated from 79 specimens and the false-positive rate is estimated from 31 specimens) and is based on re-analysed data described in a manuscript by Whitman et al. (initially posted as a preprint on 7 May 2020 and not yet published in a peer-reviewed journal). Furthermore, no explanation is given for the selection of immunoassays for inclusion, which introduces a potential selection bias. Additionally, the 10 commercial SARS-CoV-2 immunoassays in the original study were developed early in the pandemic, and some poor-performing assays were included. Mowery et al. report that assays that perform well individually perform marginally better in combination (82% combined vs. 79% & 79% individually). In contrast, pairing two lower-performing assays can yield significant gains over either LFA alone (78% combined vs. 61% & 61% individually). As such, a single better-performing LFA outperforms some paired LFAs; given the rapid progress being made in LFA development and improvement, the inclusion of poor-performing assays in an already small dataset is somewhat redundant. Another omission in this manuscript is a description of the clinical cohort from which the original dataset was derived – no details are given regarding age, ethnicity, comorbidities, etc.
The absence of a gold-standard SARS-CoV-2 immunoassay (or antigen test, for that matter) is the most significant concern when comparing diagnostic assays or combinations thereof. The authors acknowledge this and use RT-PCR to classify positive and negative specimens, while subsetting to specimens collected 10 or more days after symptom onset. However, 10 days may not be enough elapsed time to select accurately for seropositivity, and we would suggest a minimum of 21 days between symptom onset and testing.
An advantage of a machine learning classifier, such as the one presented in this manuscript, is its ability to account for geographic variability, allowing the target false-positive rate to be tuned according to the population in which the LFAs are being deployed.
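To make this tuning step concrete, the following is a minimal sketch (not the authors' code) of how a decision threshold on a combined score from two LFAs might be chosen to respect a target false-positive rate. The scoring rule, the 0–3 band-intensity scale, and all specimen values here are invented for illustration only.

```python
# Illustrative sketch, not the method from Mowery et al.: pick the lowest
# decision threshold on a combined two-LFA score whose false-positive rate
# (FPR) on known-negative specimens does not exceed a chosen target.

def combined_score(lfa_a, lfa_b):
    """Combine two hypothetical semi-quantitative band intensities (0-3 each)
    into a single score by simple summation (an assumption for illustration)."""
    return lfa_a + lfa_b

def threshold_for_target_fpr(neg_scores, target_fpr):
    """Return the smallest threshold t such that the fraction of negative
    specimens scoring >= t is at most target_fpr."""
    candidates = sorted(set(neg_scores)) + [max(neg_scores) + 1]
    for t in candidates:
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr <= target_fpr:
            return t
    return candidates[-1]

# Hypothetical combined scores for RT-PCR-negative and -positive specimens.
neg_scores = [0, 0, 1, 1, 1, 2, 2, 3, 0, 1]
pos_scores = [4, 5, 6, 3, 5, 2, 6, 4, 5, 6]

# Tune the threshold to a 10% target FPR, then measure the resulting TPR.
thr = threshold_for_target_fpr(neg_scores, target_fpr=0.10)
tpr = sum(s >= thr for s in pos_scores) / len(pos_scores)
```

A different deployment population simply supplies its own `target_fpr` (and its own negative specimens), which is the flexibility the paragraph above describes; a real implementation would use a fitted classifier's scores rather than a raw sum.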
As a minor note, the references do not appear in the text in numerical order.
Mowery et al. demonstrate a proof-of-concept machine-learning approach that combines the information of semi-quantitative readouts from both IgM and IgG tests to control the false-positive rate at a targeted level while achieving higher true-positive rates than individual LFAs. In short: merging IgM and IgG results is good, but combining LFAs with machine learning is better. This represents novel work that allows other researchers to apply machine learning to larger datasets and progress global efforts toward reliable and scalable testing for anti-SARS-CoV-2 antibodies. However, the authors do not discuss the limitations of their study in sufficient detail. We feel it would be improved by a larger sample of well-described individuals for whom at least 21 days have elapsed between symptom onset and testing, to ensure accurate testing of seropositivity. Though LFA technology for anti-SARS-CoV-2 antibody detection is rapidly improving, and the performance of excellent LFAs is only very marginally improved by pairing with another LFA, this framework provides an alternative deployment strategy that is of interest. We look forward to seeing the published manuscript.