JOIST: A Joint Speech and Text Streaming Model for ASR

Tara N. Sainath; Rohit Prabhavalkar; Ankur Bapna; Yu Zhang; Zhouyuan Huo; Zhehuai Chen; Bo Li; Weiran Wang; Trevor Strohman

doi:10.1109/SLT54892.2023.10022774

Conference proceeding

2022 IEEE Spoken Language Technology Workshop (SLT), pp.52-59

01/09/2023

Abstract

We present JOIST, an algorithm to train a streaming, cascaded-encoder end-to-end (E2E) model with both paired speech-text inputs and unpaired text-only inputs. Unlike previous works, we train jointly on both modalities rather than pre-training and fine-tuning. In addition, we apply JOIST to a streaming E2E model trained on an order of magnitude more data, both of which are also novel compared to previous works. Through a series of ablation studies, we explore different types of text modeling, including how to model the length of the text sequence and which text subword unit representation is appropriate. We find that the best text representation for JOIST improves WER by 4-14% relative across a variety of search and rare-word test sets, compared to a model not trained with text. In addition, we quantitatively show that JOIST maintains streaming capabilities, which is important for a good user experience.
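A relative WER improvement is measured against the baseline model's own WER, not in absolute percentage points: relative reduction = (WER_base - WER_new) / WER_base. The short Python sketch that follows illustrates this arithmetic. The helper name `relative_wer_reduction` is hypothetical, and the WER values are made up purely for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer over baseline_wer, in percent.

    A positive value means the new model has a lower (better) WER.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (NOT taken from the paper), chosen only to show what
# a 4% and a 14% relative improvement look like in absolute terms.
examples = {
    "search-like set": (5.0, 4.8),
    "rare-word-like set": (20.0, 17.2),
}

for name, (base, new) in examples.items():
    rel = relative_wer_reduction(base, new)
    print(f"{name}: {base:.1f}% -> {new:.1f}% WER = {rel:.1f}% relative")
```

Under these illustrative numbers, the same kind of relative gain corresponds to a 0.2-point absolute drop on a low-WER set but a 2.8-point drop on a high-WER set, which is why relative figures are the usual way to compare improvements across test sets of very different difficulty.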

Conferences

Data models

end-to-end ASR

long-tail

Training

Details

Title: JOIST: A Joint Speech and Text Streaming Model for ASR
Creators: Tara N. Sainath - Google
Rohit Prabhavalkar - Google
Ankur Bapna - Google
Yu Zhang - Google
Zhouyuan Huo - Google
Zhehuai Chen - Google
Bo Li - Google
Weiran Wang - Google
Trevor Strohman - Google
Resource Type: Conference proceeding
Publication Details: 2022 IEEE Spoken Language Technology Workshop (SLT), pp.52-59
Publisher: IEEE
DOI: 10.1109/SLT54892.2023.10022774
ISSN: 2639-5479
Language: English
Date published: 01/09/2023
Academic Unit: Computer Science
Record Identifier: 9984696578502771
