eprintid: 6779
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/00/67/79
datestamp: 2023-11-09 16:18:34
lastmod: 2023-11-09 16:18:34
status_changed: 2023-11-09 16:07:38
type: conference_item
metadata_visibility: show
creators_name: Pampouchidou, A.
creators_name: Simantiraki, O.
creators_name: Fazlollahi, A.
creators_name: Pediaditis, M.
creators_name: Manousos, D.
creators_name: Roniotis, A.
creators_name: Giannakakis, G.
creators_name: Meriaudeau, F.
creators_name: Simos, P.
creators_name: Marias, K.
creators_name: Yang, F.
creators_name: Tsiknakis, M.
title: Depression assessment by fusing high and low level features from audio, video, and text
ispublished: pub
keywords: Image processing; Pattern recognition; Semantics; Speech processing, Affective Computing; AVEC 2016; Classification accuracy; Classification algorithm; Classification results; Dynamic characteristics; Multi-modal fusion; Statistical descriptors, Speech recognition
note: cited By 67; Conference of 6th International Workshop on Audio/Visual Emotion Challenge, AVEC 2016; Conference Date: 16 October 2016; Conference Code: 124322
abstract: Depression is a major cause of disability worldwide. The present paper reports on the results of our participation in the depression sub-challenge of the sixth Audio/Visual Emotion Challenge (AVEC 2016), which was designed to compare feature modalities (audio, visual, interview transcript-based) in gender-based and gender-independent modes using a variety of classification algorithms. In our approach, both high and low level features were assessed in each modality. Audio features were extracted from the low-level descriptors provided by the challenge organizers. Several visual features were extracted and assessed, including dynamic characteristics of facial elements (using Landmark Motion History Histograms and Landmark Motion Magnitude), global head motion, and eye blinks. These features were combined with statistically derived features from pre-extracted features (emotions, action units, gaze, and pose). Both speech rate and word-level semantic content were also evaluated. Classification results are reported using four different classification schemes: i) gender-based models for each individual modality, ii) the feature fusion model, iii) the decision fusion model, and iv) the posterior probability classification model. Among the proposed approaches, the one utilizing statistical descriptors of low-level audio features outperformed the reference classification accuracy. It achieved F1-scores of 0.59 for identifying depressed and 0.87 for identifying not-depressed individuals on the development set, and 0.52 and 0.81, respectively, on the test set. © 2016 Copyright held by the owner/author(s).
date: 2016
publisher: Association for Computing Machinery, Inc
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84995610978&doi=10.1145%2f2988257.2988266&partnerID=40&md5=f04293be7215c843ab60cbedf5acc0a4
id_number: 10.1145/2988257.2988266
full_text_status: none
publication: AVEC 2016 - Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, co-located with ACM Multimedia 2016
pagerange: 27-34
refereed: TRUE
isbn: 9781450345163
citation: Pampouchidou, A. and Simantiraki, O. and Fazlollahi, A. and Pediaditis, M. and Manousos, D. and Roniotis, A. and Giannakakis, G. and Meriaudeau, F. and Simos, P. and Marias, K. and Yang, F. and Tsiknakis, M. (2016) Depression assessment by fusing high and low level features from audio, video, and text.
In: AVEC 2016 - Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, co-located with ACM Multimedia 2016, pp. 27-34.
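For readers unfamiliar with the per-class F1-scores quoted in the abstract (0.59 for depressed, 0.87 for not-depressed on the development set), the following is a minimal sketch of how such class-wise scores are conventionally computed from binary predictions. It is not the authors' code; the labels and predictions are invented for illustration, and scikit-learn's f1_score is assumed as the metric implementation.

```python
# Minimal sketch (illustrative only, not the paper's code): per-class F1
# computation for a binary depressed / not-depressed classifier.
from sklearn.metrics import f1_score

# Hypothetical ground truth and classifier output: 1 = depressed, 0 = not depressed.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

# average=None returns one F1-score per class, in the order given by labels.
f1_not_depressed, f1_depressed = f1_score(y_true, y_pred, labels=[0, 1], average=None)
print(f"F1 (depressed):     {f1_depressed:.2f}")       # 0.75 on this toy data
print(f"F1 (not depressed): {f1_not_depressed:.2f}")   # 0.83 on this toy data
```

Reporting the two classes separately, as the abstract does, is common in depression screening because the classes are imbalanced: a single accuracy figure can look strong while the depressed class is poorly detected.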