Conference proceeding
Improving Deliberation by Text-Only and Semi-Supervised Training
INTERSPEECH 2022, Vol.2022-, pp.4940-4944
Interspeech
01/01/2022
DOI: 10.21437/Interspeech.2022-243
Abstract
Text-only and semi-supervised training based on audio-only data have gained popularity recently due to the wide availability of unlabeled text and speech data. In this work, we propose incorporating text-only and semi-supervised training into an attention-based deliberation model. By incorporating text-only data in training a Bidirectional Encoder Representations from Transformers (BERT) model for the deliberation text encoder, and by using large-scale text-to-speech and audio-only utterances through a joint acoustic and text decoder (JATD) and semi-supervised training, we achieve a 4%-12% word error rate (WER) reduction on various tasks compared to the baseline deliberation model. Compared to a state-of-the-art language model (LM) rescoring method, the deliberation model reduces the Google Voice Search WER by 11% relative. We show that the deliberation model also achieves a positive human side-by-side evaluation compared to the state-of-the-art LM rescorer with reasonable endpointer latencies.
Details
- Title: Subtitle
- Improving Deliberation by Text-Only and Semi-Supervised Training
- Creators
- Ke Hu - Google LLC, Mountain View, CA 94043 USA
- Tara N. Sainath - Google
- Yanzhang He - Google
- Rohit Prabhavalkar - Google
- Trevor Strohman - Google
- Sepand Mavandadi - Google LLC, Mountain View, CA 94043 USA
- Weiran Wang - Google
- Resource Type
- Conference proceeding
- Publication Details
- INTERSPEECH 2022, Vol.2022-, pp.4940-4944
- Publisher
- ISCA (International Speech Communication Association)
- Series
- Interspeech
- DOI
- 10.21437/Interspeech.2022-243
- ISSN
- 2308-457X
- eISSN
- 1990-9772
- Number of pages
- 5
- Language
- English
- Date published
- 01/01/2022
- Academic Unit
- Computer Science
- Record Identifier
- 9984696858102771