Conference proceeding
Semi-supervised ASR by End-to-end Self-training
INTERSPEECH 2020, Vol.2020-, pp.2787-2791
Interspeech
01/01/2020
DOI: 10.21437/Interspeech.2020-1280
Abstract
While deep learning based end-to-end automatic speech recognition (ASR) systems have greatly simplified modeling pipelines, they suffer from data sparsity. In this work, we propose a self-training method with an end-to-end system for semi-supervised ASR. Starting from a Connectionist Temporal Classification (CTC) system trained on the supervised data, we iteratively generate pseudo-labels on a mini-batch of unsupervised utterances with the current model, and use the pseudo-labels to augment the supervised data for an immediate model update. Our method retains the simplicity of end-to-end ASR systems and can be seen as performing alternating optimization over a well-defined learning objective. We also empirically investigate the effects of data augmentation, the decoding beam size used for pseudo-label generation, and the freshness of pseudo-labels. On a commonly used semi-supervised ASR setting with the Wall Street Journal (WSJ) corpus, our method gives a 14.4% relative WER improvement over a carefully trained base system with data augmentation, reducing the performance gap between the base system and the oracle system by 46%.
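The mini-batch self-training loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a hypothetical `ToyModel` (binary logistic regression on toy features) stands in for the CTC acoustic model, greedy argmax decoding stands in for beam search, and the names `ToyModel`, `self_train`, `steps`, and `batch_size` are illustrative only. Each step first labels a fresh unsupervised mini-batch with the current model (labels optimized, model fixed), then immediately takes one gradient step on the supervised batch plus those pseudo-labels (model optimized, labels fixed), which loosely mirrors the alternating optimization the abstract mentions.

```python
import math
import random
from typing import List, Sequence, Tuple

Features = List[float]
Example = Tuple[Features, int]


class ToyModel:
    """Hypothetical stand-in for a CTC model: binary logistic regression."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob(self, x: Features) -> float:
        # Probability of label 1 under the current parameters.
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def decode(self, x: Features) -> int:
        # Assumed stand-in for beam-search decoding: pick the most likely label.
        return 1 if self.prob(x) >= 0.5 else 0

    def update(self, batch: Sequence[Example]) -> None:
        # One SGD step on the mean log loss over the batch.
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in batch:
            err = self.prob(x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(batch)
        self.w = [wi - self.lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b -= self.lr * grad_b / n


def self_train(model: ToyModel, supervised: List[Example],
               unlabeled: List[Features], steps: int, batch_size: int,
               seed: int = 0) -> ToyModel:
    rng = random.Random(seed)
    for _ in range(steps):
        sup = rng.sample(supervised, min(batch_size, len(supervised)))
        unsup = rng.sample(unlabeled, min(batch_size, len(unlabeled)))
        # Label step (model fixed): pseudo-label with the *current* model,
        # so the pseudo-labels are always fresh.
        pseudo = [(x, model.decode(x)) for x in unsup]
        # Model step (labels fixed): update immediately on the union.
        model.update(sup + pseudo)
    return model


# Usage on hypothetical 1-D toy data.
sup = [([-2.0], 0), ([2.0], 1)]
unl = [[-1.8], [-1.2], [1.2], [1.8]]
m = ToyModel(dim=1)
for _ in range(20):
    m.update(sup)  # base model trained on supervised data only
self_train(m, sup, unl, steps=50, batch_size=2)
print([m.decode(x) for x in unl])
```

In this sketch, pseudo-labels are regenerated at every step from the latest parameters rather than from a frozen teacher; adapting it to real ASR would mainly mean replacing `decode` with beam search over label sequences and `update` with a CTC-loss gradient step.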
Details
- Title: Subtitle
- Semi-supervised ASR by End-to-end Self-training
- Creators
- Yang Chen - Georgia Institute of Technology; Weiran Wang - Salesforce; Chao Wang - Amazon Alexa, Seattle, WA, USA
- Resource Type
- Conference proceeding
- Publication Details
- INTERSPEECH 2020, Vol.2020-, pp.2787-2791
- Publisher
- ISCA (International Speech Communication Association)
- Series
- Interspeech
- DOI
- 10.21437/Interspeech.2020-1280
- ISSN
- 2308-457X
- eISSN
- 1990-9772
- Number of pages
- 5
- Language
- English
- Date published
- 01/01/2020
- Academic Unit
- Computer Science
- Record Identifier
- 9984696572902771