Key Concepts
Investigating task-agnostic representations through self-supervised learning for detecting major depressive disorder (MDD) and post-traumatic stress disorder (PTSD).
Abstract
This study explores whether the self-supervised learning (SSL) models PASE and AALBERT can generate task-agnostic representations for detecting MDD and PTSD. The experiments use audio and video data recorded during interactive sessions. The study focuses on adjusting hyperparameters to improve detection performance for these mental disorders.
I. Introduction
Prior studies have focused on automatically detecting mental disorders from recorded interactions.
A key challenge is finding an appropriate feature representation for audio and video data.
Deep-learning architectures have been explored as a way to generate suitable latent representations.
II. SSL Models
A. Multi-target prediction:
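The PASE architecture generates a task-agnostic representation directly from raw speech.
The modified list of workers includes eGeMAPS (extended Geneva Minimalistic Acoustic Parameter Set) features, MFB (mel filter bank) energies, and LPS (log power spectrum).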
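The multi-target idea can be sketched as follows: a shared encoder's frame embeddings feed several worker heads, each regressing its own target (for example LPS or MFB frames), and the per-worker losses are summed into one training objective. This is a minimal sketch under stated assumptions, not PASE's implementation: the bias-free linear heads, the mean-squared-error loss, the equal worker weights, the precomputed targets, and the names `linear_head` and `multi_target_loss` are all hypothetical simplifications.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Head = Callable[[Vector], Vector]


def mse(pred: Sequence[Vector], target: Sequence[Vector]) -> float:
    # Mean squared error averaged over every frame and every dimension.
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty prediction/target")
    return total / count


def linear_head(weights: Sequence[Vector]) -> Head:
    # Hypothetical worker head: a bias-free linear map (one row per output dim).
    def apply(z: Vector) -> Vector:
        return [sum(w * x for w, x in zip(row, z)) for row in weights]
    return apply


def multi_target_loss(embeddings: Sequence[Vector],
                      heads: Dict[str, Head],
                      targets: Dict[str, Sequence[Vector]]) -> float:
    # Each worker predicts its own target from the SAME shared embeddings;
    # the per-worker losses are summed with equal weight (an assumption).
    total = 0.0
    for name, head in heads.items():
        preds = [head(z) for z in embeddings]
        total += mse(preds, targets[name])
    return total


if __name__ == "__main__":
    emb = [[1.0, 2.0], [3.0, 4.0]]  # two frames of a 2-dim shared embedding
    heads = {
        "lps": linear_head([[1.0, 0.0], [0.0, 1.0]]),
        "mfb": linear_head([[1.0, 1.0]]),
        "egemaps": linear_head([[0.5, 0.5]]),
    }
    targets = {
        "lps": [[1.0, 2.0], [3.0, 4.0]],
        "mfb": [[2.0], [7.0]],
        "egemaps": [[1.5], [4.5]],
    }
    print(multi_target_loss(emb, heads, targets))  # → 1.0
```

Because all workers share one encoder, minimizing the summed loss pushes the representation to retain the information each target needs rather than specializing to a single task.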
B. Masked prediction:
The AALBERT architecture uses transformer layers to predict masked input frames from their unmasked context.
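The masked-prediction objective can be illustrated with two small pieces: a function that hides a random subset of frames, and a loss computed only on the hidden positions. This is a hedged sketch, not AALBERT's code: zeroing the masked frames, masking each frame independently with probability `mask_prob`, the L1 reconstruction loss, and the names `mask_frames` and `masked_l1_loss` are assumptions for illustration, and the transformer that would produce the predictions is omitted.

```python
import random
from typing import List, Sequence, Tuple

Frame = List[float]


def mask_frames(frames: Sequence[Frame], mask_prob: float,
                rng: random.Random) -> Tuple[List[Frame], List[int]]:
    # Each frame is hidden independently with probability mask_prob (assumption).
    masked_idx = [i for i in range(len(frames)) if rng.random() < mask_prob]
    hidden = set(masked_idx)
    # Hidden frames become zero vectors; visible frames are copied unchanged.
    corrupted = [[0.0] * len(f) if i in hidden else list(f)
                 for i, f in enumerate(frames)]
    return corrupted, masked_idx


def masked_l1_loss(pred: Sequence[Frame], target: Sequence[Frame],
                   masked_idx: Sequence[int]) -> float:
    # Mean absolute error over masked frames only; visible frames add nothing.
    if not masked_idx:
        return 0.0
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(pred[i], target[i]):
            total += abs(p - t)
            count += 1
    return total / count


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # mask_prob=1.0 hides every frame, so a "model" that just echoes its
    # zeroed input is scored against all original values.
    corrupted, idx = mask_frames(frames, 1.0, random.Random(0))
    print(masked_l1_loss(corrupted, frames, idx))  # → 3.5
```

Restricting the loss to masked positions means the model gains nothing by copying visible inputs; it has to infer the hidden frames from the surrounding context.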
III. Experimental Details
A. Datasets:
The DAIC-WOZ dataset is used to develop the MDD/PTSD detectors.
B. Encoders and Detectors:
The PASE and PASE-mod encoders are trained on the LibriSpeech, DAIC-WOZ, and IEMOCAP datasets.
The AALBERT encoder is trained on the video modality with several different input segment lengths.
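Training with different input segment lengths implies cutting each session's frame sequence into fixed-length chunks before it reaches the encoder. The sketch below shows one way to do that; zero-padding the final partial segment and the helper name `split_into_segments` are assumptions, since the section does not say how incomplete segments are handled.

```python
from typing import List, Sequence

Frame = List[float]


def split_into_segments(frames: Sequence[Frame], seg_len: int) -> List[List[Frame]]:
    # Cut a frame sequence into non-overlapping segments of seg_len frames.
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    if not frames:
        return []
    dim = len(frames[0])
    segments: List[List[Frame]] = []
    for start in range(0, len(frames), seg_len):
        seg = [list(f) for f in frames[start:start + seg_len]]
        # Zero-pad the last segment so every segment has exactly seg_len frames
        # (an assumption; dropping the remainder would be another option).
        seg.extend([[0.0] * dim for _ in range(seg_len - len(seg))])
        segments.append(seg)
    return segments


if __name__ == "__main__":
    frames = [[float(i)] for i in range(5)]  # five 1-dim frames
    print([len(split_into_segments(frames, n)) for n in (2, 5, 10)])  # → [3, 1, 1]
```

Shorter segments give the encoder more training examples but less temporal context per example; longer segments do the reverse.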
C. Baselines:
LSTM models are evaluated as baselines for comparison.
IV. Results
A. Audio Modality:
MDD and PTSD detection performance using the PASE and PASE-mod encoders.
B. Video Modality:
MDD and PTSD detection performance using the AALBERT encoder.
V. Conclusions
The study investigates the task-agnostic properties of SSL representations for detecting the correlated mental disorders MDD and PTSD in the audio and video modalities, and reports promising improvements in detection performance over supervised learning models.