Book chapter
Slicing the Dimensionality: Top-k Query Processing for High-Dimensional Spaces
Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV, pp.26-50
Lecture Notes in Computer Science, Springer Berlin Heidelberg
11/21/2014
DOI: 10.1007/978-3-662-45714-6_2
Abstract
Top-k (preference) queries are used in several domains to retrieve the set of \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$k$$\end{document} tuples that more closely match a given query. For high-dimensional spaces, evaluation of top-k queries is expensive, as data and space partitioning indices perform worse than sequential scan. An alternative approach is the use of sorted lists to speed up query evaluation. This approach extends performance gains when compared to sequential scan to about ten dimensions. However, data-sets for which preference queries are considered, often are high-dimensional. In this paper, we explore the the use of bit-sliced indices (BSI) to encode the attributes or score lists and perform top-k queries over high-dimensional data using bit-wise operations. Our approach does not require sorting or random access to the index. Additionally, bit-sliced indices require less space than other type of indices. The size of the bit-sliced index (without using compression) for a normalized data-set with 3 decimals is 60 times smaller than the size of sorted lists. Furthermore, our experimental evaluation shows that the use of BSI for top-k query processing is more efficient than Sequential Scan for high-dimensional data. When compared to Sequential Top-k Algorithm (STA), BSI is one order of magnitude faster.
Details
- Title: Subtitle
- Slicing the Dimensionality: Top-k Query Processing for High-Dimensional Spaces
- Creators
- Gheorghi Guzun - Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, USAJoel Tosado - Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, USAGuadalupe Canahuate - Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, USA
- Resource Type
- Book chapter
- Publication Details
- Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV, pp.26-50
- Publisher
- Springer Berlin Heidelberg; Berlin, Heidelberg
- Series
- Lecture Notes in Computer Science
- DOI
- 10.1007/978-3-662-45714-6_2
- eISSN
- 1611-3349
- ISSN
- 0302-9743
- Language
- English
- Date published
- 11/21/2014
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9984083834402771
Metrics
8 Record Views