Semi-supervised ASR by End-to-end Self-training

Yang Chen; Weiran Wang; Chao Wang

doi:10.21437/Interspeech.2020-1280

Conference proceeding

INTERSPEECH 2020, Vol.2020-, pp.2787-2791

Interspeech

01/01/2020

Abstract

While deep-learning-based end-to-end automatic speech recognition (ASR) systems have greatly simplified modeling pipelines, they suffer from data sparsity. In this work, we propose a self-training method with an end-to-end system for semi-supervised ASR. Starting from a Connectionist Temporal Classification (CTC) system trained on the supervised data, we iteratively generate pseudo-labels on a mini-batch of unsupervised utterances with the current model and use the pseudo-labels to augment the supervised data for an immediate model update. Our method retains the simplicity of end-to-end ASR systems and can be seen as performing alternating optimization over a well-defined learning objective. We also empirically investigate our method with respect to the effect of data augmentation, the decoding beam size used for pseudo-label generation, and the freshness of pseudo-labels. On a commonly used semi-supervised ASR setting with the Wall Street Journal (WSJ) corpus, our method gives a 14.4% relative word error rate (WER) improvement over a carefully trained base system with data augmentation, reducing the performance gap between the base system and the oracle system by 46%.
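The self-training loop summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and several parts are assumptions made so that it runs with only the Python standard library. The CTC acoustic model is replaced by a toy 2-D logistic-regression classifier. Pseudo-labels are taken from the model's argmax, which plays the role of decoding with beam size 1; the paper's beam search and data augmentation are not reproduced. The helper ctc_greedy_decode shows how best-path decoding would turn CTC frame posteriors into a pseudo-label sequence, assuming the blank symbol is index 0. All names (BLANK, ctc_greedy_decode, predict, sgd_step, self_train) and hyperparameters are hypothetical. Each step alternates two moves: with parameters fixed, pick the loss-minimizing labels for a fresh unlabeled mini-batch, then with labels fixed, take one gradient step on the supervised plus pseudo-labeled batch.

```python
import math
import random

BLANK = 0  # assumption: index of the CTC blank symbol


def ctc_greedy_decode(frame_posteriors):
    """Best-path CTC decoding (the beam-size-1 case).

    Takes the argmax symbol per frame, merges consecutive repeats,
    then drops blanks, so a blank between two identical symbols keeps both.
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_posteriors]
    labels, prev = [], None
    for sym in best_path:
        if sym != prev and sym != BLANK:
            labels.append(sym)
        prev = sym
    return labels


def predict(w, b, x):
    """Toy stand-in for the ASR model: logistic regression P(y=1 | x)."""
    z = max(-30.0, min(30.0, w[0] * x[0] + w[1] * x[1] + b))  # clip to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(w, b, batch, lr):
    """One gradient step on the mean log-loss over (x, y) pairs."""
    gw0 = gw1 = gb = 0.0
    for x, y in batch:
        err = predict(w, b, x) - y  # derivative of log-loss w.r.t. the logit
        gw0 += err * x[0]
        gw1 += err * x[1]
        gb += err
    n = len(batch)
    return [w[0] - lr * gw0 / n, w[1] - lr * gw1 / n], b - lr * gb / n


def self_train(labeled, unlabeled, steps=200, batch_size=8, lr=0.5, seed=0):
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    # Base system: train on supervised data only.
    for _ in range(steps):
        w, b = sgd_step(w, b, rng.sample(labeled, min(batch_size, len(labeled))), lr)
    # Self-training: fresh pseudo-labels on every unlabeled mini-batch.
    for _ in range(steps):
        sup = rng.sample(labeled, min(batch_size, len(labeled)))
        unsup = rng.sample(unlabeled, min(batch_size, len(unlabeled)))
        # Label step: with (w, b) fixed, the label minimizing log-loss is the argmax.
        pseudo = [(x, 1 if predict(w, b, x) >= 0.5 else 0) for x in unsup]
        # Parameter step: immediate update on supervised + pseudo-labeled data.
        w, b = sgd_step(w, b, sup + pseudo, lr)
    return w, b
```

Because the labels are regenerated from the current parameters at every step, they are always "fresh". Decoding a whole unlabeled set once and training on it for many epochs would instead give stale pseudo-labels, which is one of the factors the abstract says was investigated.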

Computer Science

Technology

Audiology & Speech-Language Pathology

Computer Science, Artificial Intelligence

Computer Science, Software Engineering

Life Sciences & Biomedicine

Science & Technology

Details

Title: Semi-supervised ASR by End-to-end Self-training
Creators: Yang Chen - Georgia Institute of Technology
Weiran Wang - Salesforce
Chao Wang - Amazon Alexa, Seattle, WA USA
Resource Type: Conference proceeding
Publication Details: INTERSPEECH 2020, Vol.2020-, pp.2787-2791
Publisher: International Speech Communication Association (ISCA)
Series: Interspeech
DOI: 10.21437/Interspeech.2020-1280
ISSN: 2308-457X
eISSN: 1990-9772
Number of pages: 5
Language: English
Date published: 01/01/2020
Academic Unit: Computer Science
Record Identifier: 9984696572902771

Metrics

1 Record View

11 Times Cited - Web of Science