A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

Shaojin Ding; Weiran Wang; Ding Zhao; Tara N. Sainath; Yanzhang He; Robert David; Rami Botros; Xin Wang; Rina Panigrahy; Qiao Liang; Dongseong Hwang; Ian McGraw; Rohit Prabhavalkar; Trevor Strohman

doi:10.21437/Interspeech.2022-10791

Conference proceeding

INTERSPEECH 2022, Vol.2022-, pp.1706-1710

Interspeech

01/01/2022

Abstract

In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model that unifies models for different deployment scenarios and significantly reduces model size and power consumption without loss of quality. With this model, we explore three techniques to maximize the performance of each model size: 1) using separate decoders for each sub-model while sharing the encoders; 2) using funnel-pooling to improve encoder efficiency; and 3) balancing the sizes of the causal and non-causal encoders to improve quality and fit deployment constraints. Overall, the proposed large-medium model is 30% smaller and reduces power consumption by 33% compared to the baseline cascaded encoder model. The triple-size model, which unifies the large, medium, and small models, achieves a 37% total size reduction with minimal quality loss while substantially reducing the engineering effort of maintaining separate models.
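The structure the abstract describes (shared encoders, one decoder per sub-model, and pooling along time) can be illustrated with a toy sketch. This is not the authors' implementation. The encoder bodies are arithmetic placeholders, plain average pooling stands in for funnel-pooling, the placement of the pooling step before the causal encoder is an assumption, and all names (`funnel_pool`, `encode`, `recognize`, the `"small"`/`"large"` labels) are hypothetical.

```python
from typing import Callable, Dict, List

Frame = List[float]  # one acoustic feature vector


def funnel_pool(frames: List[Frame], stride: int = 2) -> List[Frame]:
    """Average each group of `stride` consecutive frames (stand-in for funnel-pooling)."""
    pooled = []
    for start in range(0, len(frames), stride):
        group = frames[start:start + stride]
        dim = len(group[0])
        pooled.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return pooled


def causal_encoder(frames: List[Frame]) -> List[Frame]:
    """Placeholder: transforms each frame independently, so it needs no future context."""
    return [[2.0 * x for x in f] for f in frames]


def non_causal_encoder(frames: List[Frame]) -> List[Frame]:
    """Placeholder: adds the mean over the whole sequence, so it uses future context."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[x + m for x, m in zip(f, mean)] for f in frames]


def encode(frames: List[Frame], size: str) -> List[Frame]:
    """Run the shared causal encoder; cascade the non-causal encoder only for 'large'."""
    hidden = causal_encoder(funnel_pool(frames))
    if size == "small":
        return hidden
    if size == "large":
        return non_causal_encoder(hidden)
    raise ValueError(f"unknown sub-model size: {size}")


# One decoder per sub-model (placeholders that only report what they received).
DECODERS: Dict[str, Callable[[List[Frame]], str]] = {
    "small": lambda h: f"small decoder: {len(h)} frames",
    "large": lambda h: f"large decoder: {len(h)} frames",
}


def recognize(frames: List[Frame], size: str) -> str:
    """Pick the sub-model at call time; the encoders are shared across sizes."""
    return DECODERS[size](encode(frames, size))


if __name__ == "__main__":
    audio = [[1.0], [3.0], [5.0], [7.0]]
    print(recognize(audio, "small"))
    print(recognize(audio, "large"))
```

In this sketch both sub-models reuse the output of the same causal encoder, so selecting a size only decides whether the non-causal encoder runs and which decoder reads the result.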

Acoustics

Computer Science

Engineering

Technology

Audiology & Speech-Language Pathology

Computer Science, Artificial Intelligence

Engineering, Electrical & Electronic

Life Sciences & Biomedicine

Science & Technology

Details

Title: A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes
Creators: Shaojin Ding - Google LLC, Mountain View, CA 94043 USA
Weiran Wang - Google LLC, Mountain View, CA 94043 USA
Ding Zhao - Google LLC, Mountain View, CA 94043 USA
Tara N. Sainath - Google
Yanzhang He - Google
Robert David - Google LLC, Mountain View, CA 94043 USA
Rami Botros - Google LLC, Mountain View, CA 94043 USA
Xin Wang - Google LLC, Mountain View, CA 94043 USA
Rina Panigrahy - Google LLC, Mountain View, CA 94043 USA
Qiao Liang - Google LLC, Mountain View, CA 94043 USA
Dongseong Hwang - Google LLC, Mountain View, CA 94043 USA
Ian McGraw - Google
Rohit Prabhavalkar - Google LLC, Mountain View, CA 94043 USA
Trevor Strohman - Google LLC, Mountain View, CA 94043 USA
Resource Type: Conference proceeding
Publication Details: INTERSPEECH 2022, Vol.2022-, pp.1706-1710
Publisher: Isca-Int Speech Communication Assoc
Series: Interspeech
DOI: 10.21437/Interspeech.2022-10791
ISSN: 2308-457X
eISSN: 1990-9772
Number of pages: 5
Language: English
Date published: 01/01/2022
Academic Unit: Computer Science
Record Identifier: 9984696723502771

Metrics

2 Times Cited - Web of Science