Accepted for/Published in: JMIR AI

Date Submitted: Jul 23, 2022
Date Accepted: Jan 15, 2023

The final, peer-reviewed published version of this preprint can be found here:

Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation

Owen D, Antypas D, Hassoulas A, Pardiñas AF, Espinosa-Anke L, Collados JC

JMIR AI 2023;2:e41205

DOI: 10.2196/41205

PMID: 37525646

PMCID: 7614849

Detecting Depression in Users of Online Forums: Enabling Early Healthcare Intervention using Language Models

  • David Owen; 
  • Dimosthenis Antypas; 
  • Athanasios Hassoulas; 
  • Antonio Fernández Pardiñas; 
  • Luis Espinosa-Anke; 
  • Jose Camacho Collados

ABSTRACT

Background:

Major Depressive Disorder (MDD) is a common mental disorder that affects 5% of adults worldwide. Early contact with healthcare services is critical for achieving an accurate diagnosis and improving patient outcomes. Key symptoms of MDD (hereafter, depression), such as cognitive distortions, are observed in verbal communication and can also manifest in the structure of written language. Thus, the automatic analysis of written text may provide opportunities for early intervention in settings where written communication is rich and regular, such as social media and online forums.

Objective:

The objective was twofold. First, we sought to gauge the effectiveness of different machine learning approaches in identifying users of the large online forum Reddit who eventually disclose a diagnosis of depression. Second, we aimed to determine whether the time between a forum post and the depression diagnosis date is a relevant factor in performing this detection.
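To make the second objective concrete, the following is a minimal sketch of restricting a user's post history to a fixed window before the estimated diagnosis date. The `(post_date, text)` tuple format and the `posts_in_window` helper are hypothetical illustrations under assumed inputs, not the authors' implementation.

```python
from datetime import date, timedelta

# Hypothetical helper, not the authors' code: keeps only the posts published
# within `weeks` weeks before the estimated diagnosis date.
def posts_in_window(posts, diagnosis_date, weeks):
    """posts: list of (post_date, text) tuples (assumed format)."""
    start = diagnosis_date - timedelta(weeks=weeks)
    # Posts on or after the diagnosis date are excluded.
    return [(d, t) for d, t in posts if start <= d < diagnosis_date]

# Example with made-up posts:
posts = [
    (date(2020, 5, 25), "post A"),  # 1 week before diagnosis
    (date(2020, 3, 1), "post B"),   # about 13 weeks before diagnosis
    (date(2020, 6, 5), "post C"),   # after diagnosis
]
recent = posts_in_window(posts, date(2020, 6, 1), weeks=12)  # keeps only post A
```

Varying `weeks` is one way to compare classifier inputs drawn from different stretches of a user's history, which is how the relevance of the post-to-diagnosis interval could be probed.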

Methods:

Two Reddit datasets containing posts from users with and without a history of depression diagnosis were obtained. Intersecting these datasets yielded a set of users with an estimated date of depression diagnosis. This derived dataset was used as input to several machine learning classifiers, including Transformer-based language models.
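The abstract does not specify the dataset schemas, so this is only a sketch under assumed formats: one hypothetical mapping from user ID to posts, and another from user ID to an estimated diagnosis date. Joining them on user ID keeps only the users present in both, mirroring the intersection step described above; `attach_diagnosis_dates` is an illustrative name, not the authors' pipeline.

```python
from datetime import date

# Hypothetical join, not the authors' pipeline: keep users found in both
# datasets and pair each with their estimated diagnosis date.
def attach_diagnosis_dates(user_posts, diagnosis_dates):
    """
    user_posts: dict user_id -> list of (post_date, text)    (assumed format)
    diagnosis_dates: dict user_id -> estimated diagnosis date (assumed format)
    Returns dict user_id -> (diagnosis_date, posts) for the shared users.
    """
    shared = user_posts.keys() & diagnosis_dates.keys()  # set intersection
    return {u: (diagnosis_dates[u], user_posts[u]) for u in sorted(shared)}

# Example with made-up users:
user_posts = {
    "u1": [(date(2020, 5, 1), "post")],
    "u2": [(date(2020, 4, 2), "another post")],
}
diagnosis_dates = {"u2": date(2020, 6, 1), "u3": date(2021, 1, 10)}
joined = attach_diagnosis_dates(user_posts, diagnosis_dates)  # only "u2" remains
```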

Results:

The BERT (Bidirectional Encoder Representations from Transformers) language model proved most effective at distinguishing forum users with a known depression diagnosis from those without, obtaining a mean F1 score of 0.64 across the experimental setups used for binary classification. The final 12 weeks (about 3 months) of posts before a depressed user’s estimated diagnosis date were also found to be the most indicative of their illness. Furthermore, posts made 4 to 8 weeks before a user’s estimated diagnosis date exhibited more negative sentiment than those from any other 4-week period in their post history.
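The 4-week sentiment comparison can be illustrated with a small sketch. It assumes each post already carries a numeric sentiment score (higher meaning more positive); the abstract does not name the sentiment model, so the scores and the `sentiment_by_period` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical helper: groups posts into consecutive periods counted back from
# the estimated diagnosis date and averages their sentiment scores.
def sentiment_by_period(posts, diagnosis_date, period_weeks=4):
    """
    posts: list of (post_date, sentiment) pairs (assumed format).
    Period 0 holds posts less than `period_weeks` weeks before diagnosis,
    period 1 holds posts `period_weeks` to 2 * `period_weeks` weeks before, etc.
    """
    buckets = defaultdict(list)
    for post_date, score in posts:
        days_before = (diagnosis_date - post_date).days
        if days_before <= 0:
            continue  # ignore posts made on or after the diagnosis date
        buckets[days_before // (7 * period_weeks)].append(score)
    return {period: mean(scores) for period, scores in sorted(buckets.items())}

# Example with made-up scores:
posts = [
    (date(2020, 5, 25), 0.2),   # 7 days before -> period 0
    (date(2020, 4, 25), -0.4),  # 37 days before -> period 1
    (date(2020, 4, 20), -0.6),  # 42 days before -> period 1
    (date(2020, 3, 1), 0.1),    # 92 days before -> period 3
    (date(2020, 6, 3), -0.9),   # after diagnosis -> ignored
]
by_period = sentiment_by_period(posts, date(2020, 6, 1))
most_negative = min(by_period, key=by_period.get)  # period with lowest mean
```

Period index 1 in this sketch corresponds to the 4-to-8-week window; with real scores, comparing the per-period means across users is one way such a finding could be checked.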

Conclusions:

Transformer-based language models may be applied to data from online social media forums to identify users at risk of psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental healthcare interventions to support those at risk of this condition.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.