Journal article
An Online Algorithm for Bayesian Variable Selection in Logistic Regression Models With Streaming Data
Sankhyā. Series B (2008)
10/08/2025
DOI: 10.1007/s13571-025-00391-x
Appears in UI Libraries Support Open Access
Abstract
In several modern applications, data are generated continuously over time, for example on virtual learning platforms. We assume data are collected and analyzed sequentially, in batches. Since traditional or offline methods can be extremely slow, an online method for Bayesian model averaging (BMA) has recently been proposed in the literature. Inspired by the literature on renewable estimation, that work developed an online Bayesian method for generalized linear models (GLMs) that dramatically reduces storage and computational demands compared to traditional methods for BMA. The method works very well when the number of models is small. It can also work reasonably well in moderately large model spaces. For the latter case, the method relies on a screening stage to identify important models in the first several batches via offline methods. Thereafter, the model space remains fixed in all subsequent batches. In the post-screening stage, online updates are made to the model-specific parameters for the models selected in the screening stage. For larger model spaces, missing important models in the screening stage is more likely. This necessitates the development of a method that permits the model space to be updated as new batches of data arrive. In this article, we develop an online Bayesian model selection method for logistic regression, where the selected models can potentially change throughout the data collection process. We use simulation studies to show that our new method can outperform the previous method. Furthermore, we describe scenarios under which the gain from our new method is expected to be small. We revisit the traffic crash data analyzed in the previous work and illustrate that our new model selection method can have better performance for variable selection.
Details
- Title: Subtitle
- An Online Algorithm for Bayesian Variable Selection in Logistic Regression Models With Streaming Data
- Creators
- Shamriddha De - University of Iowa
- Payel Ghosal - University of Wisconsin–Madison
- Joyee Ghosh - University of Iowa
- Resource Type
- Journal article
- Publication Details
- Sankhyā. Series B (2008)
- DOI
- 10.1007/s13571-025-00391-x
- ISSN
- 0976-8386
- eISSN
- 0976-8394
- Publisher
- Springer Nature
- Number of pages
- 46
- Language
- English
- Electronic publication date
- 10/08/2025
- Academic Unit
- Statistics and Actuarial Science
- Record Identifier
- 9985016016202771