RR:C19 Evidence Scale rating by reviewer:
Potentially informative. The main claims made are not strongly justified by the methods and data, but may yield some insight. The results and conclusions of the study may resemble those from the hypothetical ideal study, but there is substantial room for doubt. Decision-makers should consider this evidence only with a thorough understanding of its weaknesses, alongside other evidence and theory. Decision-makers should not consider this actionable, unless the weaknesses are clearly understood and there is other theory and evidence to further support it.
***************************************
Review:
Sueki and Ueda set out to examine whether levels of suicidal ideation in the general population of Japan changed owing to the COVID-19 pandemic, tracking individuals with an internet survey between January and April 2020. Contemporaneous national statistics showing reductions in suicides in the same population add to the importance of using questionnaire survey data of this kind to understand such effects among the living. Suicide rates have often been observed to fall in wartime and to return to pre-conflict levels once the conflict is over. This topic is important. Strengths of this study are that the authors collected data prospectively, both before COVID-19 cases began to appear in the population and afterwards, and that they had control over the design and measurement methods (using a validated questionnaire that they administered to participants), in contrast to the growing reliance on analyzing secondary data usually collected for other purposes. Understanding and, as far as possible, addressing the limitations of the study in terms of study design, population representativeness, and the validity of the measure of suicidal ideation could lead to valuable improvements to the paper. This single-author review concentrates on these areas, which fall within the reviewer’s expertise; other areas of expertise that could complement the views expressed here, including statistical modelling and survey design, are referred to where relevant.
The choice of suicidality as the topic, rather than commoner mental health symptoms, is noteworthy and invites further consideration of the context of the study. Suicidality may be a sign of more severe mental illness, but not everyone who ends their life deliberately is mentally ill. Suicidal thoughts are not required for a diagnosis of depression or anxiety under the DSM or WHO ICD classification criteria, although their presence may be more specific to (closer to pathognomonic of) clinically significant depression. In contrast, a high proportion of people who end their own lives have no history of identified or treated mental illness. The authors mention that some other information was collected, including on clinical treatment. Was anything else of relevance measured, such as a general or specific mental health questionnaire? If so, this should be declared and, where available, brought into the analysis as part of secondary analyses.
The representativeness of the population sample is largely unknown. Because this is a study of a possible association, this may be less of a problem than in a study estimating, for example, rates or proportions. However, the possibility always remains that the study participants belong to a population subgroup that differs from others (stratified medicine is based on such subgroup differences in associations). Non-representativeness is suggested, for example, by the extremely low initial response and participation rate, as in any internet survey sample, and there is no way of knowing whether participants differed from non-participants. At the analysis stage, survey weights could be calculated from the differences between the sample and the adult population recorded in the census of Japan, taking account of response levels by gender, key age groups, and income subgroups, and then incorporated into the analyses. Weighting would not tell us whether the participants are atypical, but it would mean that the precision of the regression model estimates takes account of under- or over-sampling of subgroups relative to the population of Japan. For example, if the findings rely heavily on older female participants, weighting would guard against spuriously precise estimates for groups that are poorly represented (e.g. younger males). This method is available within the statistical software the authors used. Advice on survey methods should be sought in considering this; such advice is available from survey teams in national census bureaus and from specialist survey organizations, both outside and sometimes within academic population research centers.
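As a rough illustration of the weighting idea, the sketch below computes simple post-stratification (cell) weights. Each respondent's weight is the population share of their cell divided by that cell's share of the sample, so that weighted cell proportions match census proportions. It also reports Kish's approximate effective sample size, (Σw)²/Σw², which shows how uneven weights reduce the effective precision of estimates. The function names, cell labels, and population shares are hypothetical placeholders and are not figures from the census of Japan. A real analysis would use the survey procedures in the authors' statistical software and would more likely rake over several margins (gender, age, income) than use fully cross-classified cells.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Weight each respondent so weighted cell shares match population shares.

    sample_cells: one cell label per respondent, e.g. ("female", "60+").
    population_shares: maps every cell label to its (assumed) share of the
        target population; shares are expected to sum to 1.
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    missing = set(population_shares) - set(counts)
    if missing:
        # A population cell with no respondents cannot be weighted up; in
        # practice such cells are collapsed with neighbours before weighting.
        raise ValueError(f"no respondents in cells: {sorted(missing)}")
    return [population_shares[cell] / (counts[cell] / n) for cell in sample_cells]


def kish_effective_n(weights):
    """Kish's approximate effective sample size, (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical sample: older women over-represented, younger men under-represented.
sample = [("female", "60+")] * 6 + [("male", "20-39")] * 2
shares = {("female", "60+"): 0.5, ("male", "20-39"): 0.5}  # assumed shares
w = poststratification_weights(sample, shares)
print([round(x, 3) for x in w])
print(round(kish_effective_n(w), 2))
```

In this toy example, each of the six older women receives a weight of about 0.67 and each of the two younger men a weight of 2. The effective sample size falls from 8 to 6, which is the loss of precision that unweighted estimates would conceal.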
Regarding the validity of the self-harm questionnaire, the authors cite and briefly describe one study by one of the authors (Sueki, 2019), which gives limited information on its validity. They should provide more information on what validation work has been completed. Readers need to know whether cognitive interviewing methods (to establish what participants understand by each question and, where a question is unclear, to test improved wordings) were used during the development and validation of the questionnaire items, because the meanings of the questions may have changed from pre- to post-COVID-19. Brief questionnaires such as this one (only six items) should be viewed only as screening instruments, not as definitive measures of suicidality as understood in clinical practice. In a clinical (or systematic semi-structured) interview, there would be cross-questioning to separate respondents with pervasive suicidal thoughts from those with transient thoughts that have little if any clinical significance and that may be present only temporarily during a period of increased (or decreased) environmental stress. Accordingly, the questionnaire evaluation could now be repeated, given that its meaning to respondents may have changed, and perhaps repeated again once COVID-19 has passed or been contained and is no longer a significant societal threat.
Although the aim is mentioned in the study abstract, no explicit a priori hypothesis is stated in the paper; one is implied in the phrase '... to examine changes in suicidal ideation between the pre COVID-19 (T1) and COVID-19 period (T2)...'. The regression models (table 2) are incompletely described and explained. Any model of future symptom scores, such as self-harm symptoms, should incorporate socio-demographic factors, because they play a major part in predicting such outcomes, including the primary outcome here (level of suicidality). Income and part-time employment were individual predictors of T2 suicidality after controlling, it seems, only for T1 suicidality; having child care responsibility was protective. Regression analyses (table 2) were conducted separately with data from all participants, the non-suicidal group only (T1 suicidal ideation = 0), and the suicidal group only (T1 suicidal ideation > 0), and T1 suicidal ideation was used as a predictor in the regression models. However, it is not clear that a multivariable model was built in which multiple predictors were selected and taken into account together; as already noted, we are told only that adjustments were made for T1 suicidality. So we do not know whether adjustment for socio-demographic factors would change these simpler models. The phrase ‘than the reference group’ is also inadequately explained: what does it mean? A crucial predictor missing from the prediction of the outcome T2 suicidality score is the binary variable for the time of assessment (T1 or T2, that is, pre-COVID-19 or during it), which is used only in simple univariate t-test comparisons that showed a crude reduction in suicidality in this population.
Accordingly, it is not clear why the authors appear not to have examined whether their striking finding of a drop in suicidal ideation between the two time periods holds up after adjustment for the baseline predictors (socio-economic, prior clinical, and other variables set out in table 2). Nor is there any discussion of the possible effect of unmeasured covariates.
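To make the request for a multivariable model in the paragraphs above concrete, the sketch below fits an ordinary least squares model for the T2 score. The T1 score and several socio-demographic indicators are entered together, so each coefficient is adjusted for all the others. The predictor names (low income, part-time employment, child care responsibility) echo variables mentioned in the paper. The data, however, are simulated from hypothetical coefficients purely to check the arithmetic and are not the study's data. A real analysis would use a standard package (for example `statsmodels` in Python, or the authors' own software), which also reports standard errors.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one list of predictor values per participant; an intercept column
    of 1s is prepended here. Returns [intercept, b1, b2, ...].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; model is not identified")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Simulated participants: [t1_score, low_income, part_time, childcare]
rows = [
    [0, 0, 0, 0], [2, 1, 0, 0], [4, 0, 1, 0], [1, 0, 0, 1],
    [3, 1, 1, 0], [5, 0, 1, 1], [0, 1, 0, 1], [2, 0, 0, 0],
]
true_b = [1.0, 0.6, 0.8, 0.5, -0.7]  # hypothetical coefficients
t2 = [true_b[0] + sum(b * x for b, x in zip(true_b[1:], r)) for r in rows]

estimates = ols(rows, t2)
names = ["intercept", "t1_score", "low_income", "part_time", "childcare"]
for name, est in zip(names, estimates):
    print(f"{name:>10}: {est:+.3f}")
```

The simulated T2 scores are an exact linear function of the predictors, so the fitted coefficients reproduce the assumed values. With real data, comparing these jointly adjusted estimates with those adjusted only for T1 would show whether socio-demographic adjustment changes the picture.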
In addition to the continuous outcome used, the authors could consider a binary outcome (a logistic model using a cut point on the self-harm score at T2 chosen in advance, such as the division between a score of 0 and a higher score that was used for suicidality at T1) or an ordinal outcome. The latter could add statistical power to the analyses. Involving an independent medical statistician could strengthen the research and data analysis process.
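The binary-outcome analysis suggested above can be sketched as follows. The T2 score is dichotomized at a cut point fixed in advance (here 0 versus any higher score, mirroring the T1 split), and a logistic regression is then fitted by Newton-Raphson. With a single binary predictor, exp(b1) should reproduce the cross-product odds ratio of the 2×2 table, which provides a convenient check. The function names, the scores, and the choice of part-time employment as the predictor are invented for illustration. A real analysis would use a standard package that also reports standard errors, and an ordinal outcome would call for a proportional-odds model instead.

```python
import math


def dichotomize(scores, cut=0):
    """Code 1 if a score exceeds the cut point (e.g. any ideation), else 0."""
    return [1 if s > cut else 0 for s in scores]


def solve(A, v):
    """Solve A x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    M = [list(row) + [vi] for row, vi in zip(A, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def logistic_fit(rows, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns [intercept, b1, ...]."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    b = [0.0] * p
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(b, x))))
                 for x in X]
        # Score vector X'(y - p) and information matrix X'WX, W = p(1 - p).
        score = [sum((yi - pi) * x[j] for x, yi, pi in zip(X, y, probs))
                 for j in range(p)]
        info = [[sum(pi * (1 - pi) * x[i] * x[j] for x, pi in zip(X, probs))
                 for j in range(p)] for i in range(p)]
        step = solve(info, score)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


# Invented T2 scores; the predictor is a 0/1 indicator (e.g. part-time employment).
t2_scores = [2, 0, 5, 1, 0, 0, 0, 3, 0, 0]
part_time = [[1]] * 5 + [[0]] * 5
any_ideation = dichotomize(t2_scores)

b0, b1 = logistic_fit(part_time, any_ideation)
print(f"odds ratio for part-time employment: {math.exp(b1):.2f}")
```

In these invented data, 3 of the 5 part-time participants and 1 of the 5 others score above the cut point. The cross-product odds ratio is therefore (3 × 4)/(2 × 1) = 6, and exp(b1) should match it.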
The discussion of the role of social integration in society is welcome. Apart from the criticisms set out here, the discussion of strengths and limitations is balanced.
The authors should seek funding to repeat the same data collection, following up both the same participants (the current cohort, if still contactable) and fresh cohorts on multiple occasions during and after COVID-19, as this would enable statistically more precise and representative estimates of trends over time. Ideally they would also have collected such data on at least a second occasion before COVID-19.
From a research ethics standpoint, it is good that respondents were signposted to professional support resources. It is surprising and disappointing that the authors did not, or could not, apply for research ethics approval through a national, local-area, or public health review board; in many countries, internal institutional review boards are being used less often in favour of more independent, population-based advisory mechanisms.