Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

Rami Botros; Anmol Gulati; Tara N Sainath; Krzysztof Choromanski; Ruoming Pang; Trevor Strohman; Weiran Wang; Jiahui Yu

doi:10.48550/arxiv.2304.00171

Preprint

Open access

arXiv (Cornell University)

03/31/2023

Files and links (1)

https://doi.org/10.48550/arXiv.2304.00171

Preprint (Author's original). This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles. Open Access.

Abstract

Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers. With limited memory bandwidth, reading these from memory at each inference step can slow down inference. In this paper, we design an optimized conformer that is small enough to meet on-device restrictions and has fast inference on TPUs. We explore various ideas to improve the execution speed, including replacing lower conformer blocks with convolution-only blocks, strategically downsizing the architecture, and utilizing an RNNAttention-Performer. Our optimized conformer can be readily incorporated into a cascaded-encoder setting, allowing a second-pass decoder to operate on its output and improve the accuracy whenever more resources are available. Altogether, we find that these optimizations can reduce latency by a factor of 6.8, at a reasonable trade-off in quality. With the cascaded second pass, we show that the recognition accuracy is completely recoverable. Thus, our proposed encoder can double as a strong standalone on-device encoder and as the first part of a high-performance ASR pipeline.
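The abstract's point about attention state can be illustrated with a small sketch. This record does not describe the paper's RNNAttention-Performer, so the code below is not the authors' method. It is generic causal linear attention written in recurrent form, which is the general idea behind running attention as an RNN: the per-step state is a fixed-size pair of running sums instead of a cache of all past keys and values. The class name, function names, and the elu(x) + 1 feature map are assumptions chosen for illustration (a Performer would use a random-feature map instead).

```python
import math


def feature_map(x):
    # Hypothetical positive feature map, elu(x) + 1. This is a stand-in;
    # it is NOT the random-feature map used by a Performer.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class RecurrentLinearAttention:
    """Causal linear attention, one frame per call, with fixed-size state."""

    def __init__(self, d_k, d_v):
        # S accumulates phi(k) v^T; z accumulates phi(k).
        # Their sizes do not grow with the number of frames seen.
        self.S = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def step(self, q, k, v):
        phi_q, phi_k = feature_map(q), feature_map(k)
        for i, pk in enumerate(phi_k):
            self.z[i] += pk
            for j, vj in enumerate(v):
                self.S[i][j] += pk * vj
        denom = sum(pq * zi for pq, zi in zip(phi_q, self.z))
        return [
            sum(phi_q[i] * self.S[i][j] for i in range(len(phi_q))) / denom
            for j in range(len(v))
        ]


def quadratic_reference(qs, ks, vs):
    """Same attention, but re-reading every past key/value at each step."""
    outs = []
    for t, q in enumerate(qs):
        phi_q = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(phi_q, feature_map(ks[s])))
            for s in range(t + 1)
        ]
        total = sum(weights)
        outs.append([
            sum(w * vs[s][j] for s, w in enumerate(weights)) / total
            for j in range(len(vs[0]))
        ])
    return outs
```

Feeding the same frames to `RecurrentLinearAttention.step` and to `quadratic_reference` gives matching outputs. The recurrent version reads only `S` and `z` at each step, while the reference reads a history that grows with every frame, which is the memory-bandwidth cost the abstract refers to.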

Computer Science - Computation and Language

Computer Science - Sound

Details

Title: Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Creators: Rami Botros; Anmol Gulati; Tara N Sainath; Krzysztof Choromanski; Ruoming Pang; Trevor Strohman; Weiran Wang; Jiahui Yu
Resource Type: Preprint
Publication Details: arXiv (Cornell University)
DOI: 10.48550/arxiv.2304.00171
eISSN: 2331-8422
Language: English
Date posted: 03/31/2023
Academic Unit: Computer Science
Record Identifier: 9984696875102771
