
Review 1: "Diagnostic Accuracy of Chest X-Ray Computer Aided Detection Software and Blood Biomarkers for Detection of Prevalent and Incident Tuberculosis in Household Contacts Followed up for 5 Years"

Reviewer critiques emphasize the need for a clearer definition of the study's gold standard for the AUC measurements and question why blood biomarkers were compared, given their limited predictive value.

Published on Nov 07, 2024
This Pub is a Review of
Diagnostic accuracy of Chest X-Ray Computer Aided Detection software and blood biomarkers for detection of prevalent and incident tuberculosis in household contacts followed up for 5 years
Description

Abstract

Background: WHO Tuberculosis (TB) screening guidelines recommend computer-aided detection (CAD) software for chest radiograph (CXR) interpretation. However, studies evaluating their diagnostic and prognostic accuracy are limited.

Methods: We conducted a prospective cohort study of household TB contacts in South Africa. Participants all underwent baseline CXR and sputum investigation (routine [single spontaneous] and enhanced [additionally 2-3 induced] sputum investigation) and passive and active follow-up for incident TB. CXR were processed comparing 3 CAD softwares (CAD4TBv7.0, qXRv3.0.0, and Lunit INSIGHT CXR 3.1.4.111). We evaluated their performance to detect routine and enhanced prevalent, and incident TB, comparing the performance to blood-based biomarkers (Xpert MTB host-response, Erythrocyte Sedimentation Rate, C-Reactive Protein, QuantiFERON) in a subgroup.

Findings: 483 participants were followed-up for 4.6 years (median). There were 23 prevalent (7 routinely diagnosed) and 38 incident TB cases. The AUC ROC to identify prevalent TB for CAD4TB, qXR and Lunit INSIGHT CXR were 0.87 (95% CI 0.77-0.96), 0.88 (95% CI 0.79-0.97) and 0.91 (95% CI 0.83-0.99) respectively. >30% with scores above recommended CAD thresholds who were bacteriologically negative on routine baseline sputum were subsequently diagnosed by enhanced baseline sputum investigation or during follow-up. The AUC performance of baseline CAD to identify incident cases ranged between 0.60-0.65. The diagnostic performance of CAD for prevalent TB was superior to blood-based biomarkers.

Interpretation: Our findings suggest that the potential of CAD-CXR screening for TB is not maximised as a high proportion of those above current thresholds but with a negative routine confirmatory sputum have true TB disease that may benefit intervention.

Funding: UKRI-MRC

Summary: We found that the diagnostic accuracy of CAD-CXR to identify prevalent TB cases in household TB contacts was high but >30% with scores above recommended CAD thresholds who were bacteriologically negative on routine testing baseline were subsequently diagnosed suggest that the potential of CAD-CXR screening is not maximised.

RR\ID Evidence Scale rating by reviewer:

  • Potentially informative. The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.

***************************************

Review: This is a prospective cohort study in which 483 participants in South Africa were followed up for a median of 4.6 years. All participants underwent baseline chest X-ray (CXR) and sputum investigation (routine [single spontaneous] and enhanced [additionally 2-3 induced] sputum investigation), followed by passive and active follow-up for incident TB. CXRs were processed with 3 CAD software systems, and the study evaluated CAD performance in detecting routine and enhanced prevalent TB and incident TB, comparing that performance to blood-based biomarkers (Xpert MTB host-response, Erythrocyte Sedimentation Rate, C-Reactive Protein, QuantiFERON) in a subgroup.

There were 23 prevalent (7 routinely diagnosed) and 38 incident TB cases. The AUCs to identify prevalent TB for CAD4TB, qXR and Lunit INSIGHT CXR were reported as 0.87 (95% CI 0.77-0.96), 0.88 (95% CI 0.79-0.97) and 0.91 (95% CI 0.83-0.99) respectively. The authors noted that >30% of those with scores above recommended CAD thresholds who were bacteriologically negative on routine baseline sputum were subsequently diagnosed by enhanced baseline sputum investigation or during follow-up. The AUC of baseline CAD for identifying incident cases ranged between 0.60 and 0.65. The diagnostic performance of CAD for prevalent TB was superior to that of blood-based biomarkers. This is not surprising, and one wonders why the comparison was included in the study at all, as there was little basis to assume that these blood markers would be of higher value than the other indicators.

This could be an important article that adds a useful dimension to the growing literature on CAD for TB. However, it needs clarification in a number of ways in order to be more useful. I had to read it several times to understand what the gold standard was for the AUC evaluations of the findings listed above. Presumably it was the “routine + enhanced microbiological ascertainment of TB”, but this should be made clearer. If instead CAD is being compared to routine testing plus “enhanced baseline sputum investigation or during follow-up”, this would be quite problematic, as there are better explanations for why a household contact of someone with TB who does not have TB at one point in time develops TB during the subsequent 5 years. I conclude that incident cases were not included in the prevalent TB analysis, because the discussion states: “We show that routine confirmatory testing with single spontaneous Xpert MTB/RIF to diagnose PTB captures only 30% (7/23) of total prevalent (my emphasis) disease. CXR screening with CAD interpretation has excellent diagnostic performance to detect all those with prevalent disease (diagnosed routinely and with enhanced investigation) with no significant differences between the three software solutions we assessed, AUC 0.87-0.91”. The fact that an AUC is also given for incident cases is confusing: is this comparing the original CAD findings with microbiological testing that could have taken place many years later?

In essence, what I take from this study is that a routine Xpert evaluation is insufficient for detecting true prevalent TB, as many of the participants who were CAD positive but Xpert negative on routine assessment turned out to be positive on enhanced sputum testing. This is as important as the CAD results, if not more so, as no one would ever suggest that treatment be based on CAD results alone. An implication may be that if the CAD result is positive, enhanced sputum testing may be warranted, rather than assuming that the CAD finding was a false positive.
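To make the 7/23 figure quoted from the discussion concrete, the short sketch below computes the proportion of prevalent cases captured by routine testing together with a Wilson 95% interval. It is only an illustration and not part of the authors' analysis: the function and variable names are hypothetical, and the choice of the Wilson method is an assumption made here.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


# Counts taken from the manuscript as quoted in this review:
# 7 of 23 prevalent TB cases were found by routine single-sputum testing.
routine_detected = 7
all_prevalent = 23

sensitivity = routine_detected / all_prevalent
low, high = wilson_interval(routine_detected, all_prevalent)

print(f"Routine testing captured {sensitivity:.1%} of prevalent cases")
print(f"Wilson 95% interval: {low:.1%} to {high:.1%}")
```

With only 23 prevalent cases the interval is wide, which is worth keeping in mind when reading the headline 30% figure.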

The authors note that those with CXR abnormalities without sputum confirmation are at high risk of disease progression, and that individuals with CXR changes suggestive of TB but with negative sputum bacteriology subsequently have a 10% risk per year of being diagnosed with bacteriologically positive TB disease. The authors note that they analyzed several pre-specified subgroups “chosen because of known associations with TB risk, clinical presentation, and plausibility that they might affect CXR findings: people living with HIV (PLHIV), participants with a history of previous TB, and smokers”. One group not mentioned, and a substantial one in South Africa, is silica-exposed workers (ex-miners in particular). Silica exposure is known not only to increase lifetime risk of TB but also to produce changes that can mimic TB on CXR. People with a CXR suggestive of TB but with negative sputum may actually have silicosis.

Comments on Figures:

Figure 2: There are a total of 8 routine prevalent cases (blue dots) in the “Abnormal CXR consistent with TB” screening strategy, whereas Figure 1 states that there are only 7 routine prevalent cases. There seems to be an error in the plot, so the plot and the sensitivity/specificity reported in the accompanying table disagree. Please verify the plots in Figure 2.

Lines 242-245 & Figure 3’s Incident TB plot: The authors are using a baseline image to predict future TB. Again, I am not sure why a baseline CXR would have any bearing on incident TB detected 1-3 years after the baseline image was taken. Given the relatively high levels of community TB in South Africa, additional exposures during follow-up seem likely to confound this.

Again, though, perhaps this is also a non-result, in that we are verifying that a CXR cannot predict future TB.

Figure 4: It is unclear what the 4 TB outcome groups are in each boxplot. Only 2 of the groups appear to be labeled, and I believe the “Incident” group is not labeled correctly. Please correct the axis labels on Figure 4.

In conclusion, this may be an important study, but it is currently presented in a way that is too confusing to be useful. If the authors are using this study to suggest that CAD is better than routine microbiological testing and could replace it, I would say that this is not established by the study and would be a dangerous implication. I suggest that the authors give some thought to the concerns raised above and then resubmit.