RR\ID Evidence Scale rating by reviewer:

Potentially informative. The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.

Review:

Overall assessment:

The claims are not strongly supported by the data and methods used, but may yield some insight.

Decision-makers should consider the claims in this study actionable with limitations based on the methods and data.

The intended aim of this paper was an investigation into coupled dynamics of global influenza (sub)types through the framework of Compositional Data Analysis (CoDA). The need for (sub)type level surveillance and co-circulation were well-presented, and the (sub)types were well-chosen. The proposed methodology involved clear data preprocessing and clustering, forecasting models across these clusters and their evaluation. Efforts have been made to interpret the results, with a focus on model performance comparison across the different (sub)types. (Sub)type abundances and mixing trends were comparatively well-explained. In the discussion, a strong case was made for the importance of (sub)type composition in epidemic forecasting.

Overall, the evidence presented in this paper is undermined by a limited discussion of the utility and complexities of using CoDA, a weakly motivated and explained methodology with respect to the forecasting models and their evaluation, and a rough initial presentation of results alongside an attempted discussion. Further elaboration would strengthen the author’s argument for CoDA and for the findings. In its current form, the work leaves room to improve the interpretation of the results and the substantiation of the claims made. Further positioning the model results in the context of the coupled dynamics is potentially warranted.

A breakdown of comments for the main sections of Results, Discussion and Methods can be found below.

Results:

Fig. 1 is not particularly informative – if this observation is actually also the main takeaway, it should be stressed.

Models M1-M4 appear to be basic models for comparison against the only higher-order model, M5. If there is a convincing rationale behind selecting M1-M5, it should be spelled out.

The meaning of accuracy needs to be worked out: what does “34%” mean? In the Discussion, these results are labeled ‘surprisingly accurate’; can this be substantiated?

Lacks interpretation of (the very small) reported bootstrap errors.
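As an illustration of why small bootstrap errors need interpreting, the sketch below assumes the bootstrap resamples individual predictions; the paper's actual resampling unit is not known to this reviewer. The function name `bootstrap_se_accuracy` and the toy data (34% correct, chosen only to echo the reported figure) are hypothetical. Under this assumption the bootstrap standard error shrinks roughly with the square root of the sample size, so a very small error may reflect a large sample rather than a reliable model.

```python
import random
import statistics


def bootstrap_se_accuracy(correct, n_boot=200, seed=0):
    """Bootstrap standard error of accuracy.

    `correct` is a list of 0/1 flags (1 = prediction was right).
    Assumption: individual predictions are the resampling unit.
    """
    rng = random.Random(seed)
    n = len(correct)
    accuracies = []
    for _ in range(n_boot):
        # Resample n predictions with replacement and recompute accuracy.
        resample = rng.choices(correct, k=n)
        accuracies.append(sum(resample) / n)
    return statistics.stdev(accuracies)


# Toy data (hypothetical): the same 34% accuracy at two sample sizes.
small = [1] * 34 + [0] * 66   # n = 100
large = small * 25            # n = 2500, identical accuracy

se_small = bootstrap_se_accuracy(small)
se_large = bootstrap_se_accuracy(large)
```

In this toy setting the accuracy is identical in both samples, yet the larger sample yields a much smaller bootstrap error. Reporting the sample size and the resampling unit next to the errors would let readers judge what the errors convey.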

The influence of the 10% cutoff for negligible cases on the reported AUROC is undiscussed – how robust is the model to different cutoff percentages?
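A robustness check of this kind could be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the cutoff excludes samples in which the (sub)type share falls below the threshold, which may differ from how the paper applies it. The names `auroc`, `auroc_by_cutoff`, `shares` and the toy values are all hypothetical.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney statistic; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return None  # undefined when only one class remains
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_cutoff(scores, labels, shares, cutoffs):
    """Recompute AUROC after dropping samples whose share is below each cutoff.

    Assumption: the cutoff acts as an exclusion filter on the (sub)type share.
    """
    results = {}
    for cutoff in cutoffs:
        kept = [(s, y) for s, y, sh in zip(scores, labels, shares) if sh >= cutoff]
        results[cutoff] = auroc([s for s, _ in kept], [y for _, y in kept])
    return results


# Toy data (hypothetical): predicted scores, true dominance labels, (sub)type shares.
scores = [0.9, 0.4, 0.3, 0.6, 0.2, 0.7]
labels = [1, 1, 0, 0, 0, 1]
shares = [0.50, 0.30, 0.05, 0.12, 0.08, 0.20]

results = auroc_by_cutoff(scores, labels, shares, cutoffs=[0.05, 0.10, 0.15])
```

Reporting AUROC across a small grid of cutoffs, together with the number of samples retained at each, would show whether the headline value depends on the 10% choice. In the toy data the highest cutoff leaves a single class, so the AUROC is undefined there; such cases would need to be flagged as well.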

Discussion:

Lacks a detailed evaluation of the CoDA framework with regard to its limitations when applied to epidemiology.

Performance discrepancy in predicting dominance (Tab. 2) between the different subtypes not discussed.

Why was the model performance for A/H1N1 across Group I worse than a random guess?

More solid substantiation required for how CoDA enabled models to capture the required information.

Materials & Methods: