Scalable preference queries for high-dimensional data using map-reduce

Gheorghi Guzun; Joel E Tosado; Guadalupe Canahuate

doi:10.1109/BigData.2015.7364013

Conference proceeding

2015 IEEE International Conference on Big Data (Big Data), pp.2243-2252

10/2015

Abstract

Preference (top-k) queries play a key role in modern data analytics tasks. Top-k techniques rely on ranking functions to determine an overall score for each object across all the relevant attributes being examined. This ranking function is provided by the user at query time, or generated for a particular user by a personalized search engine, which prevents the pre-computation of the global scores. Executing this type of query is particularly challenging for high-dimensional data. Recently, bit-sliced indices (BSI) were proposed to answer these preference queries efficiently in a non-distributed environment for data with hundreds of dimensions. As MapReduce and key-value stores proliferate as the preferred methods for analyzing big data, we set out to evaluate the performance of BSI in a distributed environment, in terms of index size, network traffic, and execution time of preference (top-k) queries, over data with thousands of dimensions. Indexing is implemented on top of Apache Spark for both column and row stores and is shown to outperform Hive running on either MapReduce or Tez for top-k (preference) queries.
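
To make the terms in the abstract concrete, the following is a minimal single-machine sketch, not the authors' distributed implementation. It assumes non-negative integer attribute values and a weighted-sum ranking function supplied at query time. The function names (`build_bsi`, `bsi_value`, `top_k`) are hypothetical, and scores are computed by decoding each value from its bit slices rather than by the slice-level bitwise arithmetic a real BSI engine would use.

```python
import heapq


def build_bsi(values):
    """Build a bit-sliced index for one column of non-negative ints.

    Slice j is a Python int used as a bitmap: bit i of slice j is set
    when row i has bit j set in its value.
    """
    n_slices = max(values).bit_length() if values else 0
    slices = [0] * n_slices
    for row, v in enumerate(values):
        for j in range(n_slices):
            if (v >> j) & 1:
                slices[j] |= 1 << row
    return slices


def bsi_value(slices, row):
    """Decode the value stored for one row from its bit slices."""
    return sum(((s >> row) & 1) << j for j, s in enumerate(slices))


def top_k(columns, weights, k):
    """Return the row ids of the k highest weighted-sum scores.

    The weights arrive at query time, so scores cannot be precomputed;
    only the per-attribute indexes are built ahead of the query.
    """
    n_rows = len(columns[0])
    indexes = [build_bsi(col) for col in columns]
    scores = [
        sum(w * bsi_value(idx, row) for idx, w in zip(indexes, weights))
        for row in range(n_rows)
    ]
    return heapq.nlargest(k, range(n_rows), key=lambda r: scores[r])
```

For example, `top_k([[3, 1, 2], [0, 5, 1]], [1, 2], 2)` scores the three rows as 3, 11 and 4 and returns `[1, 2]`.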

Big data

Cities and towns

Computers

Encoding

Indexing

Sparks

Details

Title: Scalable preference queries for high-dimensional data using map-reduce
Creators: Gheorghi Guzun - University of Iowa
Joel E Tosado - University of Iowa
Guadalupe Canahuate - University of Iowa
Resource Type: Conference proceeding
Publication Details: 2015 IEEE International Conference on Big Data (Big Data), pp.2243-2252
DOI: 10.1109/BigData.2015.7364013
Publisher: IEEE
Language: English
Date published: 10/2015
Academic Unit: Electrical and Computer Engineering
Record Identifier: 9984197324102771