Improving Deliberation by Text-Only and Semi-Supervised Training

Ke Hu; Tara N. Sainath; Yanzhang He; Rohit Prabhavalkar; Trevor Strohman; Sepand Mavandadi; Weiran Wang

doi:10.21437/Interspeech.2022-243

Conference proceeding

INTERSPEECH 2022, Vol.2022-, pp.4940-4944

Interspeech

01/01/2022

Abstract

Text-only training and semi-supervised training on audio-only data have gained popularity recently due to the wide availability of unlabeled text and speech data. In this work, we propose incorporating text-only and semi-supervised training into an attention-based deliberation model. By using text-only data to train a Bidirectional Encoder Representations from Transformers (BERT) model for the deliberation text encoder, and by incorporating large-scale text-to-speech and audio-only utterances through a joint acoustic and text decoder (JATD) and semi-supervised training, we achieve a 4%-12% word error rate (WER) reduction on various tasks compared to the baseline deliberation model. Compared to a state-of-the-art language model (LM) rescoring method, the deliberation model reduces the Google Voice Search WER by 11% relative. We also show that the deliberation model achieves a positive human side-by-side evaluation against the state-of-the-art LM rescorer, with reasonable endpointer latencies.
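
The abstract reports its gains as WER reductions, one of them explicitly relative. As a reading aid, the following minimal sketch shows how WER and a relative WER reduction are conventionally computed. It is not the authors' evaluation code; the function names, the example sentences, and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that the new system removes."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical usage: one substitution and one deletion against 4 reference words.
example_wer = word_error_rate("play some jazz music", "play sum jazz")
# Hypothetical WERs (in percent) chosen only to illustrate an 11% relative reduction.
example_reduction = relative_wer_reduction(10.0, 8.9)
```

Under this convention, a drop from a hypothetical 10.0% WER to 8.9% WER is an 11% relative reduction, even though the absolute difference is only 1.1 percentage points.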

Acoustics

Computer Science

Engineering

Technology

Audiology & Speech-Language Pathology

Computer Science, Artificial Intelligence

Engineering, Electrical & Electronic

Life Sciences & Biomedicine

Science & Technology

Details

Title: Improving Deliberation by Text-Only and Semi-Supervised Training
Creators: Ke Hu - Google LLC, Mountain View, CA 94043 USA
Tara N. Sainath - Google
Yanzhang He - Google
Rohit Prabhavalkar - Google
Trevor Strohman - Google
Sepand Mavandadi - Google LLC, Mountain View, CA 94043 USA
Weiran Wang - Google
Resource Type: Conference proceeding
Publication Details: INTERSPEECH 2022, Vol.2022-, pp.4940-4944
Publisher: International Speech Communication Association (ISCA)
Series: Interspeech
DOI: 10.21437/Interspeech.2022-243
ISSN: 2308-457X
eISSN: 1990-9772
Number of pages: 5
Language: English
Date published: 01/01/2022
Academic Unit: Computer Science
Record Identifier: 9984696858102771