Conference proceeding
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes
INTERSPEECH 2022, Vol.2022-, pp.1706-1710
Interspeech
01/01/2022
DOI: 10.21437/Interspeech.2022-10791
Abstract
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of quality. Namely, with the dynamic cascaded encoder model, we explore three techniques to maximally boost the performance of each model size: 1) Use separate decoders for each sub-model while sharing the encoders; 2) Use funnel-pooling to improve the encoder efficiency; 3) Balance the size of causal and non-causal encoders to improve quality and fit deployment constraints. Overall, the proposed large-medium model has 30% smaller size and reduces power consumption by 33%, compared to the baseline cascaded encoder model. The triple-size model that unifies the large, medium, and small models achieves 37% total size reduction with minimal quality loss, while substantially reducing the engineering efforts of having separate models.
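As a reading aid, the sharing pattern described in the abstract can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the authors' implementation: the encoder and decoder functions are hypothetical stand-ins for neural networks, only two sizes are shown, the placement of pooling between the causal and non-causal encoders is assumed, and `funnel_pool` models only the time-reduction step (averaging neighbouring frames), not a full funnel attention layer.

```python
from typing import Callable, Dict, List

Frame = List[float]
Frames = List[Frame]


def funnel_pool(frames: Frames, stride: int = 2) -> Frames:
    """Average every `stride` consecutive frames, shortening the sequence."""
    pooled: Frames = []
    for start in range(0, len(frames), stride):
        group = frames[start:start + stride]
        dim = len(group[0])
        pooled.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return pooled


def causal_encoder(frames: Frames) -> Frames:
    """Toy stand-in: each output is the running mean of the frames seen so far."""
    out: Frames = []
    totals = [0.0] * len(frames[0])
    for t, frame in enumerate(frames, start=1):
        totals = [s + x for s, x in zip(totals, frame)]
        out.append([s / t for s in totals])
    return out


def non_causal_encoder(frames: Frames) -> Frames:
    """Toy stand-in: subtract the whole-sequence mean (needs future frames)."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[x - m for x, m in zip(f, mean)] for f in frames]


def make_decoder(name: str) -> Callable[[Frames], str]:
    """Toy stand-in: a decoder that only reports how many frames it received."""
    return lambda frames: f"{name}:{len(frames)}"


# One decoder per sub-model (hypothetical names); the encoders are shared.
DECODERS: Dict[str, Callable[[Frames], str]] = {
    "small": make_decoder("small"),
    "large": make_decoder("large"),
}


def recognize(frames: Frames, size: str) -> str:
    shared = causal_encoder(frames)  # computed once, reused by every size
    if size == "small":
        return DECODERS["small"](shared)
    if size == "large":
        # Assumed placement: pool first so the non-causal encoder sees fewer frames.
        refined = non_causal_encoder(funnel_pool(shared))
        return DECODERS["large"](refined)
    raise ValueError(f"unknown size: {size}")
```

In this sketch, calling `recognize(frames, "large")` reuses the same causal encoder output as `"small"`, and the pooled path hands the second encoder half as many frames, which is the intuition behind the efficiency claim.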
Details
- Creators
- Shaojin Ding - Google LLC, Mountain View, CA 94043 USA; Weiran Wang - Google LLC, Mountain View, CA 94043 USA; Ding Zhao - Google LLC, Mountain View, CA 94043 USA; Tara N. Sainath - Google; Yanzhang He - Google; Robert David - Google LLC, Mountain View, CA 94043 USA; Rami Botros - Google LLC, Mountain View, CA 94043 USA; Xin Wang - Google LLC, Mountain View, CA 94043 USA; Rina Panigrahy - Google LLC, Mountain View, CA 94043 USA; Qiao Liang - Google LLC, Mountain View, CA 94043 USA; Dongseong Hwang - Google LLC, Mountain View, CA 94043 USA; Ian McGraw - Google; Rohit Prabhavalkar - Google LLC, Mountain View, CA 94043 USA; Trevor Strohman - Google LLC, Mountain View, CA 94043 USA
- Resource Type
- Conference proceeding
- Publication Details
- INTERSPEECH 2022, Vol.2022-, pp.1706-1710
- Publisher
- International Speech Communication Association (ISCA)
- Series
- Interspeech
- DOI
- 10.21437/Interspeech.2022-10791
- ISSN
- 2308-457X
- eISSN
- 1990-9772
- Number of pages
- 5
- Language
- English
- Date published
- 01/01/2022
- Academic Unit
- Computer Science
- Record Identifier
- 9984696723502771