Multimodal Machine Learning Techniques for Depression Detection: A Systematic Literature Review
Abstract
Depression is a widespread mental health disorder that poses serious social, economic, and healthcare challenges worldwide. Conventional diagnostic approaches rely primarily on self‑reported questionnaires and clinical interviews, which are subjective and may lead to delayed or inaccurate diagnosis. In recent years, machine learning and deep learning technologies have emerged as useful tools for automatic depression detection. Among these, multimodal machine learning methods have been widely studied because they can combine heterogeneous data modalities, including speech, text, facial expressions, and physiological signals. This systematic literature review analyses current studies on multimodal machine learning methods for depression detection. In accordance with a predefined review protocol, relevant studies were identified, screened, and evaluated against a set of inclusion and exclusion criteria. Following the PRISMA guidelines, a total of 32 studies were included in the synthesis to ensure methodological clarity and reproducibility. The review summarises findings related to data modalities, modelling methods, fusion approaches, datasets, and evaluation metrics. In most reviewed studies, multimodal models improved F1-score or accuracy over unimodal baselines, particularly on DAIC-WOZ and AVEC; however, gains were smaller or inconsistent on small or imbalanced datasets. Nevertheless, data scarcity, ethical concerns, and limited generalisability remain notable challenges. The review provides an in-depth synthesis of the state of the art and highlights major research gaps to inform future developments in automated depression detection systems.
This work is licensed under a Creative Commons Attribution 4.0 International License.