Augmenting Conformers With Structured State-Space Sequence Models For Online Speech Recognition

Haozhe Shan; Albert Gu; Zhong Meng; Weiran Wang; Krzysztof Choromanski; Tara Sainath

doi:10.1109/ICASSP48485.2024.10445950

Conference proceeding

Open access

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.12221-12225

04/14/2024

Files and links (1)

url

https://doi.org/10.1109/ICASSP48485.2024.10445950

Published (Version of record) Open Access

Abstract

Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems. In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4), a family of models that provide a parameter-efficient way of accessing arbitrarily long left context. We perform systematic ablation studies to compare variants of S4 models and propose two novel approaches that combine them with convolutions. We find that the most effective design is to stack a small S4 using real-valued recurrent weights with a local convolution, allowing the two to work complementarily. Our best model achieves WERs of 4.01%/8.53% on test sets from LibriSpeech, outperforming Conformers with extensively tuned convolution.

Acoustics

causal model

Conformer

Context modeling

Convolution

Online ASR

Speech processing

Speech recognition

state-space model

Systematics

Details

Title: Augmenting Conformers With Structured State-Space Sequence Models For Online Speech Recognition
Creators: Haozhe Shan - Harvard University
Albert Gu - Carnegie Mellon University
Zhong Meng - Google (United States)
Weiran Wang - Google (United States)
Krzysztof Choromanski - Google (United States)
Tara Sainath - Google (United States)
Resource Type: Conference proceeding
Publication Details: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.12221-12225
DOI: 10.1109/ICASSP48485.2024.10445950
eISSN: 2379-190X
Publisher: IEEE
Language: English
Date published: 04/14/2024
Academic Unit: Computer Science
Record Identifier: 9984696583802771
