Multimodal and Multi-view Models for Emotion Recognition

Gustavo Aguilar; Viktor Rozgic; Weiran Wang; Chao Wang

doi:10.18653/v1/P19-1095

Conference proceeding

Open access

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp.991-1002

Florence, Italy, 07/2019

2019

Files and links (1)

https://doi.org/10.18653/v1/P19-1095

Published (Version of record) Open Access

Abstract

Studies on emotion recognition (ER) show that combining lexical and acoustic information results in more robust and accurate models. The majority of the studies focus on settings where both modalities are available in training and evaluation. However, in practice, this is not always the case; getting ASR output may represent a bottleneck in a deployment pipeline due to computational complexity or privacy-related constraints. To address this challenge, we study the problem of efficiently combining acoustic and lexical modalities during training while still providing a deployable acoustic model that does not require lexical inputs. We first experiment with multimodal models and two attention mechanisms to assess the extent of the benefits that lexical information can provide. Then, we frame the task as a multi-view learning problem to induce semantic information from a multimodal model into our acoustic-only network using a contrastive loss function. Our multimodal model outperforms the previous state of the art on the USC-IEMOCAP dataset reported on lexical and acoustic information. Additionally, our multi-view-trained acoustic network significantly surpasses models that have been exclusively trained with acoustic features.

Computer Science

Social Sciences

Technology

Computer Science, Artificial Intelligence

Computer Science, Interdisciplinary Applications

Linguistics

Science & Technology

Details

Title: Multimodal and Multi-view Models for Emotion Recognition
Creators: Gustavo Aguilar - University of Houston
Viktor Rozgic - Amazon.com, Seattle, WA 98108, USA
Weiran Wang - Amazon.com, Seattle, WA 98108, USA
Chao Wang - Amazon.com, Seattle, WA 98108, USA
Contributors: A Korhonen (Editor)
D Traum (Editor)
L Marquez (Editor)
Resource Type: Conference proceeding
Publication Details: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp.991-1002
Conference: Florence, Italy, 07/2019
DOI: 10.18653/v1/P19-1095
Publisher: Association for Computational Linguistics
Number of pages: 12
Language: English
Date published: 2019
Academic Unit: Computer Science
Record Identifier: 9984696584902771

Metrics

22 Times Cited - Web of Science