Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis: SITUATED INTERACTION

Qingming Tang; Weiran Wang; Karen Livescu

doi:10.21437/Interspeech.2017-1581

Conference proceeding

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6, pp.1656-1660

Interspeech

01/01/2017

Abstract

We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time. We use deep variational canonical correlation analysis (VCCA), a recently proposed deep generative method for multi-view representation learning. We also extend VCCA with improved latent variable priors and with adversarial learning. Compared to other techniques for multi-view feature learning, VCCA's advantages include an intuitive latent variable interpretation and a variational lower bound objective that can be trained end-to-end efficiently. We compare VCCA and its extensions with previous feature learning methods on the University of Wisconsin X-ray Microbeam Database, and show that VCCA-based feature learning improves over previous methods for speaker-independent phonetic recognition.

Computer Science

Engineering

Technology

Computer Science, Artificial Intelligence

Engineering, Electrical & Electronic

Science & Technology

Details

Title: Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis: SITUATED INTERACTION
Creators: Qingming Tang - Toyota Technol Inst, Chicago, IL 60637 USA
Weiran Wang - Toyota Technol Inst, Chicago, IL 60637 USA
Karen Livescu - Toyota Technol Inst, Chicago, IL 60637 USA
Resource Type: Conference proceeding
Publication Details: 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6, pp.1656-1660
Publisher: Isca-Int Speech Communication Assoc
Series: Interspeech
DOI: 10.21437/Interspeech.2017-1581
ISSN: 2308-457X
Number of pages: 5
Grant note: IIS-1321015 / NSF; National Science Foundation (NSF)
Language: English
Date published: 01/01/2017
Academic Unit: Computer Science
Record Identifier: 9984696581202771
