Decoupling speech processing from time

John B. Muegge; Hyoju Kim; Bob McMurray

doi:10.1016/j.tics.2025.05.017

Speech processing has long been defined by unifying principles that suggest that the dynamics of word recognition are closely coupled to the unfolding signal. These principles hold that candidates are activated immediately and updated incrementally as the signal unfolds, information decays to make room for more input, and internal representations are defined by temporal order. However, converging results from several domains suggest that these principles are less ubiquitous than has long been assumed. Perceptual information does not decay rapidly, buffers in the system may block incremental processing, words are not strictly ordered, and listeners in many real-world circumstances adopt a profile of word recognition that is not incremental. This has strong implications for the current evaluation of computational models and requires new frameworks for understanding the basic systems that make up speech processing.

Accurate processing of speech requires that listeners map temporally unfolding input to words. A long-held set of principles describes this process: lexical items are activated immediately and incrementally as speech arrives, perceptual and lexical representations then rapidly decay to make room for new information, and lexical entries are temporally structured. In this framework, speech processing is tightly coupled to the temporally unfolding input. However, recent work challenges this: low-level auditory and higher-level lexical representations do not decay and are instead retained over long durations, speech perception may require encapsulated memory buffers, lexical representations are not strictly temporally structured, and listeners can substantially delay lexical access in some circumstances. These findings suggest that current theories and models of word recognition need to be reconceptualized.
