Journal article
Asynchronous and Distributed Data Augmentation for Massive Data Settings
Journal of Computational and Graphical Statistics, Vol. 32(3), pp. 895-907
09/2023
DOI: 10.1080/10618600.2022.2130928
Abstract
Data augmentation (DA) algorithms are slow in massive data settings due to multiple passes through the entire data. We address this problem by developing a DA extension that exploits asynchronous and distributed computing. The extended DA algorithm is called Asynchronous and Distributed (AD) DA with the original DA as its parent. Any ADDA is indexed by a parameter r ∈ (0, 1) and starts by dividing the entire data into k disjoint subsets and storing them on k processes. Every iteration of ADDA augments only an r-fraction of the k data subsets with some positive probability and leaves the remaining (1 - r)-fraction of the augmented data unchanged. The parameter draws are obtained using the r-fraction of new and (1 - r)-fraction of old augmented data. We show that the ADDA Markov chain is Harris ergodic with the desired stationary distribution under mild conditions on the parent DA algorithm. We demonstrate that ADDA is significantly faster than its parent for many (k, r) choices in three representative models. We also establish the geometric ergodicity of the ADDA Markov chain for all three models, which yields asymptotically valid standard errors for estimates of desired posterior quantities.
Supplementary materials for this article are available online.
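The update scheme described in the abstract can be illustrated with a minimal single-process sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the toy model (Student-t data written as a normal scale mixture, with a flat prior on the location theta), the function name `adda_t_location`, and the rule of refreshing a uniformly random set of ceil(r * k) subsets per iteration, which stands in for the asynchronous return of workers. Setting r = 1 refreshes every subset and so recovers the parent DA sampler for this toy model.

```python
import math
import random


def adda_t_location(y, k=4, r=0.5, nu=4.0, n_iter=2000, seed=0):
    """Toy ADDA-style sampler for the location theta of Student-t data.

    Assumed model (not from the article): y_i | w_i ~ N(theta, 1 / w_i),
    w_i ~ Gamma(nu / 2, rate = nu / 2), flat prior on theta.
    """
    rng = random.Random(seed)
    # Divide the data into k disjoint subsets, one per simulated worker.
    subsets = [y[j::k] for j in range(k)]
    # Number of subsets re-augmented in each iteration.
    n_refresh = max(1, math.ceil(r * k))

    def augment(subset, theta):
        # Worker step: draw the latent weights w_i given theta and return
        # the subset's sufficient statistics (sum w_i, sum w_i * y_i).
        sw = swy = 0.0
        for yi in subset:
            rate = 0.5 * (nu + (yi - theta) ** 2)
            # gammavariate takes (shape, scale), so scale = 1 / rate.
            wi = rng.gammavariate(0.5 * (nu + 1.0), 1.0 / rate)
            sw += wi
            swy += wi * yi
        return sw, swy

    theta = sum(y) / len(y)
    # Initialise the augmented data of every subset once.
    stats = [augment(s, theta) for s in subsets]
    draws = []
    for _ in range(n_iter):
        # Refresh only a random r-fraction of the subsets; the remaining
        # subsets keep their old augmented data.
        for j in rng.sample(range(k), n_refresh):
            stats[j] = augment(subsets[j], theta)
        # Manager step: draw theta using new and old statistics together.
        sw = sum(s[0] for s in stats)
        swy = sum(s[1] for s in stats)
        theta = rng.gauss(swy / sw, 1.0 / math.sqrt(sw))
        draws.append(theta)
    return draws
```

In a real distributed run the saving would come from the manager not waiting for the slowest workers; this sketch only mimics which augmented data are refreshed and does not measure speed.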
Details
- Title: Asynchronous and Distributed Data Augmentation for Massive Data Settings
- Creators: Jiayuan Zhou - University of Florida Health; Kshitij Khare - University of Florida Health; Sanvesh Srivastava - University of Iowa
- Resource Type: Journal article
- Publication Details: Journal of Computational and Graphical Statistics, Vol. 32(3), pp. 895-907
- DOI: 10.1080/10618600.2022.2130928
- ISSN: 1061-8600
- eISSN: 1537-2715
- Publisher: Taylor & Francis
- Grant note: DOI: 10.13039/100000006, name: Office of Naval Research, award: ONR-BAA N000141812741; DOI: 10.13039/100000001, name: National Science Foundation, award: DMS-1854667/1854662
- Language: English
- Electronic publication date: 11/08/2022
- Date published: 09/2023
- Academic Unit: Statistics and Actuarial Science
- Record Identifier: 9984313159802771