Conference proceeding
Triphone State-tying via Deep Canonical Correlation Analysis: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5, pp.3444-3448
Interspeech
01/01/2016
DOI: 10.21437/Interspeech.2016-1300
Abstract
Context-dependent phone models are used in modern speech recognition systems to account for co-articulation effects. Due to the vast number of possible context-dependent phones, state tying is typically used to reduce the number of target classes for acoustic modeling. We propose a novel approach for state-tying which is completely data dependent and requires no domain knowledge. Our method first learns low-dimensional embeddings of context-dependent phones using deep canonical correlation analysis. The learned embeddings capture similarity between triphones and are highly predictable from the acoustics. We then cluster the embeddings and use cluster IDs as tied states. The bottleneck features of a DNN predicting the tied states achieve competitive recognition accuracy on TIMIT.
Index Terms: context-dependent phone embeddings, deep canonical correlation analysis, state-tying
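The clustering step described in the abstract (cluster the embeddings, then use cluster IDs as tied states) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so plain k-means is an assumption here, and the embedding-learning step with deep canonical correlation analysis is not reproduced. The triphone labels, the 2-D embedding values, and the function names `kmeans` and `tie_states` are all hypothetical.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over a list of equal-length float tuples.

    Assumption: the abstract does not say which clustering method is used;
    k-means is only a stand-in for illustration.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    assign = None
    for _step in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_assign = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def tie_states(embeddings, k, seed=0):
    """Map each context-dependent phone label to a cluster ID (its tied state)."""
    labels = sorted(embeddings)
    assign = kmeans([tuple(embeddings[lab]) for lab in labels], k, seed=seed)
    return dict(zip(labels, assign))


# Hypothetical 2-D embeddings; real embeddings would come from the learned model.
toy_embeddings = {
    "k-ae+t": (0.00, 0.10),
    "g-ae+t": (0.10, 0.00),
    "k-ae+d": (0.05, 0.05),
    "s-iy+n": (5.00, 5.10),
    "z-iy+n": (5.10, 5.00),
}

tied = tie_states(toy_embeddings, k=2)
for label, state in sorted(tied.items()):
    print(label, "-> tied state", state)
```

In this toy setup the three "ae" contexts end up sharing one cluster ID and the two "iy" contexts share the other; the numeric IDs themselves are arbitrary. A classifier predicting tied states would then use these IDs as its target classes.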
Details
- Title: Subtitle
- Triphone State-tying via Deep Canonical Correlation Analysis: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES
- Creators
- Weiran Wang - Toyota Technol Inst, Chicago, IL 60637 USA
- Hao Tang - Toyota Technol Inst, Chicago, IL 60637 USA
- Karen Livescu - Toyota Technol Inst, Chicago, IL 60637 USA
- Resource Type
- Conference proceeding
- Publication Details
- 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5, pp.3444-3448
- Publisher
- Isca-Int Speech Communication Assoc
- Series
- Interspeech
- DOI
- 10.21437/Interspeech.2016-1300
- ISSN
- 2308-457X
- Number of pages
- 5
- Grant note
- IIS-1321015 / NSF grant; National Science Foundation (NSF)
- Language
- English
- Date published
- 01/01/2016
- Academic Unit
- Computer Science
- Record Identifier
- 9984696571202771