Journal article
Efficient parallel processing of range queries through replicated declustering
Distributed and parallel databases : an international journal, Vol.20(2), pp.117-147
09/2006
DOI: 10.1007/s10619-006-9362-5
Abstract
A common technique for minimizing I/O in data-intensive applications is data declustering over parallel servers. This technique distributes data among several disks so as to parallelize query retrieval and thus improve performance. We focus on optimizing access to large spatial data and on the most common type of query on such data, the range query. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single-copy declustering schemes are non-optimal for range queries. In this paper, we integrate replication with parallel disk declustering for efficient processing of range queries. We note that replication is widely used in database applications for purposes such as load balancing, fault tolerance, and availability of data. We develop theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering that uses a limited amount of replication, and we provide extensions for applying it to real data, including arbitrary grids and large numbers of disks. Our framework also provides an effective indexing scheme that enables fast identification of the data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme with other techniques for both single and multiple queries, on synthetic and real data sets.
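The notion of an optimal declustering scheme described in the abstract can be illustrated with a small sketch. It is not taken from the article: it assumes a two-dimensional grid of buckets, a single copy of each bucket, and a periodic allocation of the form f(i, j) = (a*i + b*j + c) mod N, which is one common way such allocations are written. The function names (`periodic_allocation`, `query_cost`, `worst_additive_error`) and the parameters `a`, `b`, `c` are hypothetical. Replication itself is not modeled here; the sketch only measures how far a single-copy allocation is from the ideal cost of ceil(buckets / N).

```python
from itertools import product
from math import ceil


def periodic_allocation(n_disks, a, b, c=0):
    """Return a function mapping bucket (i, j) to disk (a*i + b*j + c) mod n_disks.

    The linear form is an assumption made for illustration.
    """
    def disk_of(i, j):
        return (a * i + b * j + c) % n_disks
    return disk_of


def query_cost(disk_of, n_disks, top, left, rows, cols):
    """Largest number of buckets that any single disk must read for the
    rows x cols range query whose upper-left bucket is (top, left)."""
    load = [0] * n_disks
    for i, j in product(range(top, top + rows), range(left, left + cols)):
        load[disk_of(i, j)] += 1
    return max(load)


def optimal_cost(n_disks, rows, cols):
    """Ideal cost: the buckets of the query spread evenly over all disks."""
    return ceil(rows * cols / n_disks)


def worst_additive_error(disk_of, n_disks, grid):
    """Maximum of (actual cost - optimal cost) over every range query
    that fits inside a grid x grid array of buckets."""
    worst = 0
    for rows, cols in product(range(1, grid + 1), repeat=2):
        for top in range(grid - rows + 1):
            for left in range(grid - cols + 1):
                cost = query_cost(disk_of, n_disks, top, left, rows, cols)
                worst = max(worst, cost - optimal_cost(n_disks, rows, cols))
    return worst


if __name__ == "__main__":
    # Five disks with f(i, j) = (i + 2j) mod 5 on a 5 x 5 grid.
    print(worst_additive_error(periodic_allocation(5, 1, 2), 5, 5))
    # Four disks with f(i, j) = (i + j) mod 4 on a 4 x 4 grid.
    print(worst_additive_error(periodic_allocation(4, 1, 1), 4, 4))
```

Under these assumptions, the five-disk allocation has an additive error of zero on the 5 x 5 grid, meaning every range query is perfectly balanced, while the four-disk allocation already reads two buckets from one disk for a 2 x 2 query whose ideal cost is one. Gaps of this kind are what the replicated schemes described in the abstract aim to close.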
Details
- Title: Subtitle
- Efficient parallel processing of range queries through replicated declustering
- Creators
- Hakan Ferhatosmanoglu - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210
- Ali Tosun - Department of Computer Science, University of Texas, San Antonio, TX 78249
- Guadalupe Canahuate - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210
- Aravind Ramachandran - Microsoft Corporation, Redmond, WA 98052
- Resource Type
- Journal article
- Publication Details
- Distributed and parallel databases : an international journal, Vol.20(2), pp.117-147
- Publisher
- Kluwer Academic Publishers
- DOI
- 10.1007/s10619-006-9362-5
- ISSN
- 0926-8782
- eISSN
- 1573-7578
- Language
- English
- Date published
- 09/2006
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9984083299402771