
Review 2: "Accelerating Cough-Based Algorithms for Pulmonary Tuberculosis Screening: Results from the CODA TB DREAM Challenge"

Overall, the study was recognized as a valuable contribution with potential benefits for TB patients, healthcare providers, and healthcare systems.

Published on Aug 08, 2024
This Pub is a Review of
Accelerating cough-based algorithms for pulmonary tuberculosis screening: Results from the CODA TB DREAM Challenge
Description

Abstract

Importance: Open-access data challenges have the potential to accelerate innovation in artificial-intelligence (AI)-based tools for global health. A specimen-free rapid triage method for TB is a global health priority.

Objective: To develop and validate cough sound-based AI algorithms for tuberculosis (TB) through the Cough Diagnostic Algorithm for Tuberculosis (CODA TB) DREAM challenge.

Design: In this diagnostic study, participating teams were provided cough-sound and clinical and demographic data. They were asked to develop AI models over a four-month period, and then submit the algorithms for independent validation.

Setting: Data was collected using smartphones from outpatient clinics in India, Madagascar, the Philippines, South Africa, Tanzania, Uganda, and Vietnam.

Participants: We included data from 2,143 adults who were consecutively enrolled with at least two weeks of cough. Data were randomly split evenly into training and test partitions.

Exposures: Standard TB evaluation was completed, including Xpert MTB/RIF Ultra and culture. At least three solicited coughs were recorded using the Hyfe Research app.

Main Outcomes and Measures: We invited teams to develop models using 1) cough sound features only and/or 2) cough sound features with routinely available clinical data to classify microbiologically confirmed TB disease. Models were ranked by area under the receiver operating characteristic curve (AUROC) and partial AUROC (pAUROC) to achieve at least 80% sensitivity and 60% specificity.

Results: Eleven cough models were submitted, as well as six cough-plus-clinical models. AUROCs for cough models ranged from 0.69-0.74, and the highest performing model achieved 55.5% specificity (95% CI 47.7-64.2) at 80% sensitivity. The addition of clinical data improved AUROCs (range 0.78-0.83), five of the six submitted models reached the target pAUROC, and highest performing model had 73.8% (95% CI 60.8-80.0) specificity at 80% sensitivity. In post-challenge subgroup analyses, AUROCs varied by country, and was higher among males and HIV-negative individuals. The probability of TB classification correlated with Xpert Ultra semi-quantitative levels.

Conclusions and Relevance: In a short period, new and independently validated cough-based TB algorithms were developed through an open-source and transparent process. Open-access data challenges can rapidly advance and improve AI-based tools for global health.

Key Points

Question: Can an open-access data challenge support the rapid development of cough-based artificial intelligence (AI) algorithms to screen for tuberculosis (TB)?

Findings: In this diagnostic study, teams were provided well-characterized cough sound data from seven countries, and developed and submitted AI models for independent validation. Multiple models that combined clinical and cough data achieved the target accuracy of at least 80% sensitivity and 60% specificity to classify microbiologically-confirmed TB.

Meaning: Cough-based AI models have promise to support point-of-care TB screening, and open-access data challenges can accelerate the development of AI-based tools for global health.

RR:C19 Evidence Scale rating by reviewer:

  • Potentially informative. The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.

***************************************

Review: The authors’ main claim is that they developed models that detect TB from cough sounds, using either acoustic cough data alone or cough data combined with clinical and demographic information. They collected a novel dataset from 7 countries struggling with TB.

Undoubtedly, this is an impactful attempt to serve TB patients, caregivers, and the health system by working toward a smartphone-based screening tool built on acoustic sensing. I highly appreciate the authors’ hard work in conducting a study across 7 countries, which is extremely difficult and challenging, and they deserve credit for it. Yet a few points are not quite clear in this manuscript.

  • Which modeling scheme was used, i.e., unary (one-class) or binary classification?

  • If not unary, which non-cough classes were used in the models?

  • While the major focus seems to be on TB coughs, how would the models behave with other types of cough, e.g., COVID-19?

  • The training data seem relatively small, e.g., n=1,105 patients * 3 coughs/patient * 0.5 sec/cough ~ 30 minutes. Given how diverse the data are, this is probably not enough to develop a reasonably good model.

  • There should be a detailed description of the datasets, including sample counts across the different subgroups, to help readers better assess the findings in the figures.

  • Why does performance vary across sex and models? Similarly, why does performance vary across countries and HIV status?

  • It seems the sensitivity and specificity baselines set by WHO need to be met together, not just one of them. In that case, none of the models meets the criteria, not even the 2nd model with cough and clinical data. The cough and clinical data-driven model achieves specificity lower than 60% at >=90% sensitivity.

  • If the focus is on conducting such a study, this needs to be clearly articulated, rather than presenting the models as the major novelty or innovation, since the outsourced models have more limitations than the study.
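The training-audio estimate in the list above can be reproduced with a short calculation. This is only a back-of-envelope sketch using the reviewer's own figures (1,105 patients, 3 coughs per patient, 0.5 seconds per cough); the variable names are illustrative, and because the abstract states "at least three" coughs per participant, the result is best read as a rough lower bound.

```python
# Back-of-envelope estimate of total training audio.
# All three inputs are the figures quoted in the review, not measured values.
n_patients = 1105          # training-set size as quoted by the reviewer
coughs_per_patient = 3     # minimum number of solicited coughs per participant
seconds_per_cough = 0.5    # cough duration assumed by the reviewer

total_seconds = n_patients * coughs_per_patient * seconds_per_cough
total_minutes = total_seconds / 60

print(f"{total_seconds:.1f} s = {total_minutes:.1f} min")  # 1657.5 s = 27.6 min
```

The result, about 27.6 minutes, is consistent with the "~ 30 minutes" quoted in the review.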

Based on the above-mentioned points, it’s hard to judge whether the results and findings support the claims.
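The point about meeting sensitivity and specificity targets together can be made concrete with a small sketch. The code below checks whether any single decision threshold satisfies both targets at once. The scores and labels are made-up toy values, not data from the study, and the function names are hypothetical. The 80%/60% pair is the challenge target stated in the abstract; the 90%/70% pair is included only as an illustrative stricter requirement.

```python
def operating_points(scores, labels):
    """Yield (threshold, sensitivity, specificity) for each distinct score.

    A sample is called positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        yield t, tp / positives, tn / negatives


def meets_joint_target(scores, labels, min_sens=0.80, min_spec=0.60):
    """True if at least one threshold meets BOTH targets simultaneously."""
    return any(
        sens >= min_sens and spec >= min_spec
        for _, sens, spec in operating_points(scores, labels)
    )


# Toy example (synthetic values for illustration only): 5 TB-positive, 5 TB-negative.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.65, 0.5, 0.45, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(meets_joint_target(scores, labels))              # 80% / 60% target
print(meets_joint_target(scores, labels, 0.90, 0.70))  # stricter illustrative target
```

In this toy data the first check passes and the second fails, which shows how the same model can satisfy one pair of joint targets but not a stricter one. Whether a submitted model meets a given standard therefore depends on stating which sensitivity and specificity pair is required and evaluating both at the same threshold.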
