Supporting Dynamic Quantization for High-Dimensional Data Analytics
Conference proceeding

Gheorghi Guzun and Guadalupe Canahuate
Proceedings of the ExploreDB'17. International Workshop on Exploratory Search in Databases and the Web (4th : 2017 : Chicago, Ill.), Vol.2017, pp.1-6
05/2017
DOI: 10.1145/3077331.3077336
PMCID: PMC5947868
PMID: 29757329
URL: https://www.ncbi.nlm.nih.gov/pmc/articles/5947868
Open Access

Abstract

Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions consider only the dimensions in which a data point is close to the query when computing the distance/similarity. Furthermore, in order to enable interactive exploration of high-dimensional data, indexing support for ad-hoc queries is needed. In this work we set out to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big data. We also propose a novel dynamic quantization called Query dependent Equi-Depth (QED) quantization and show its effectiveness in characterizing high-dimensional similarity. When applying QED we observe improvements in kNN classification accuracy over traditional distance functions.

Gheorghi Guzun and Guadalupe Canahuate. 2017. Supporting Dynamic Quantization for High-Dimensional Data Analytics. In Proceedings of ExploreDB'17, Chicago, IL, USA, May 14-19, 2017, 6 pages. https://doi.org/10.1145/3077331.3077336
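The idea of a localized, query-dependent similarity mentioned in the abstract can be illustrated with a small sketch. This is not the paper's QED definition, which the abstract does not spell out; it is a simplified illustration under stated assumptions. Here each dimension gets an equal-population ("equi-depth") bin made of the `depth` rows closest to the query in that dimension, and a row's score is the number of dimensions in which it falls inside that bin. The names `localized_scores`, `knn`, and `depth`, and the toy data, are hypothetical.

```python
def localized_scores(data, query, depth):
    """Count, for each row, the dimensions in which it is among the `depth`
    rows closest to the query (an assumed query-centred equi-depth bin)."""
    n = len(data)
    scores = [0] * n
    for j, qj in enumerate(query):
        # Rank rows by their distance to the query in dimension j only.
        order = sorted(range(n), key=lambda i: abs(data[i][j] - qj))
        # Only rows inside the query's bin contribute in this dimension.
        for i in order[:depth]:
            scores[i] += 1
    return scores


def knn(data, query, k, depth):
    """Return the indices of the k highest-scoring rows (ties: lower index first)."""
    scores = localized_scores(data, query, depth)
    return sorted(range(len(data)), key=lambda i: (-scores[i], i))[:k]


# Toy data (hypothetical): row 0 matches the query in two dimensions
# but has one extreme value in the third.
data = [
    [1.0, 1.0, 50.0],
    [1.2, 0.9, 1.0],
    [9.0, 9.0, 1.1],
    [5.0, 5.0, 5.0],
]
query = [1.0, 1.0, 1.0]

print(localized_scores(data, query, depth=2))
print(knn(data, query, k=2, depth=2))
```

On this toy data, plain Euclidean distance would rank row 0 furthest from the query because of its single outlying value, whereas the localized score ignores dimensions in which a row falls outside the query's bin. Because the bin boundaries depend on the query, they have to be recomputed per query, which is one reason index support for ad-hoc queries matters.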
