Preprint
SRVAU-R1: Enhancing Video Anomaly Understanding via Reflection-Aware Learning
ArXiv.org
Cornell University
02/01/2026
DOI: 10.48550/arxiv.2602.01004
Abstract
Multi-modal large language models (MLLMs) have made significant progress in reasoning and have shown promising effectiveness in video anomaly understanding (VAU) tasks. However, existing MLLM-based approaches remain largely focused on surface-level descriptions of anomalies and lack deeper reasoning mechanisms over abnormal behaviors, such as explicit self-reflection and self-correction. To address this gap, we propose Self-Reflection-Enhanced Reasoning for Video Anomaly Understanding (SRVAU-R1), a reflection-aware learning framework that incorporates reflection into MLLM reasoning. Specifically, SRVAU-R1 introduces the first reflection-oriented Chain-of-Thought dataset tailored for VAU, providing structured supervision in three parts: initial reasoning, self-reflection, and revised reasoning. Building on this dataset, it includes a novel reflection-aware learning paradigm that combines supervised fine-tuning and reinforcement fine-tuning to enhance multi-modal reasoning for VAU. Extensive experiments on multiple video anomaly benchmarks demonstrate that SRVAU-R1 consistently outperforms existing methods, achieving significant improvements in both temporal anomaly localization accuracy and reasoning quality.
Details
- Title: Subtitle
- SRVAU-R1: Enhancing Video Anomaly Understanding via Reflection-Aware Learning
- Creators
- Zihao Zhao - University of Iowa
- Shengting Cao - Knox College
- Muchao Ye - University of Iowa
- Resource Type
- Preprint
- Publication Details
- ArXiv.org
- DOI
- 10.48550/arxiv.2602.01004
- ISSN
- 2331-8422
- Publisher
- Cornell University; Ithaca, New York
- Language
- English
- Date posted
- 02/01/2026
- Academic Unit
- Computer Science
- Record Identifier
- 9985139312902771