Conference proceeding
Finding Maximal Fully-Correlated Itemsets in Large Databases
2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, pp.770-775
IEEE International Conference on Data Mining
01/01/2009
DOI: 10.1109/ICDM.2009.89
Abstract
Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. Much previous research focuses on finding correlated pairs instead of correlated itemsets in which all items are correlated with each other. When designing gift sets, store shelf arrangements, or website product categories, we are more interested in correlated itemsets than in correlated pairs. We solve this problem by finding maximal fully-correlated itemsets (MFCIs), in which all subsets are closely related to all other subsets. Putting the items in an MFCI together can promote sales within this itemset. Although some existing methods find highly correlated itemsets, they suffer from both efficiency and effectiveness problems on large datasets. In this paper, we explore high-dimensional correlation in two ways. First, we expand the set of desirable properties for correlation measures and study the advantages and disadvantages of various measures. Second, we propose an MFCI framework to decouple the correlation measure from the need for efficient search. By wrapping the best measure in our MFCI framework, we take advantage of the likelihood ratio's superiority in evaluating itemsets, make use of the properties of MFCI to eliminate itemsets with irrelevant items, and still achieve good computational performance.
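The abstract names two ingredients without spelling them out: a likelihood-ratio correlation measure and the "fully-correlated" property. The Python sketch below illustrates both, under stated assumptions. It assumes the measure is the standard log-likelihood ratio (G) statistic of an itemset's 2^k presence/absence table against a full-independence model. It also assumes an itemset is fully-correlated when every subset with at least two items scores above a threshold. The names `g_statistic`, `is_fully_correlated`, and `maximal_fully_correlated`, and the threshold value, are hypothetical. This is not the authors' MFCI framework, and its exhaustive level-wise search makes no attempt at the efficiency the paper targets.

```python
import math
from itertools import combinations


def g_statistic(transactions, itemset):
    """Log-likelihood ratio (G) statistic for `itemset` (assumed measure).

    Compares the observed count of every presence/absence pattern of the
    items with the count expected if the items occurred independently.
    """
    n = len(transactions)
    items = sorted(itemset)
    # Marginal probability (support) of each item.
    p = {i: sum(1 for t in transactions if i in t) / n for i in items}
    # Observed count of each presence/absence pattern, e.g. (True, False).
    observed = {}
    for t in transactions:
        key = tuple(i in t for i in items)
        observed[key] = observed.get(key, 0) + 1
    g = 0.0
    for key, o in observed.items():
        # Expected count of this pattern under full independence.
        e = n
        for i, present in zip(items, key):
            e *= p[i] if present else (1 - p[i])
        # Unobserved patterns (o == 0) contribute 0, so they are skipped.
        g += o * math.log(o / e)
    return 2 * g


def is_fully_correlated(transactions, itemset, threshold):
    """Assumed definition: every subset with >= 2 items scores above threshold."""
    items = sorted(itemset)
    return all(
        g_statistic(transactions, sub) > threshold
        for k in range(2, len(items) + 1)
        for sub in combinations(items, k)
    )


def maximal_fully_correlated(transactions, threshold):
    """Toy level-wise search for maximal fully-correlated itemsets."""
    items = sorted(set().union(*transactions))
    # Level 2: every pair whose score exceeds the threshold.
    level = {frozenset(pair) for pair in combinations(items, 2)
             if g_statistic(transactions, pair) > threshold}
    found = set(level)
    while level:
        next_level = set()
        for a in level:
            for b in level:
                cand = a | b
                if len(cand) != len(a) + 1 or cand in next_level:
                    continue
                # Prune: all k-item subsets must already be fully-correlated.
                subsets_ok = all(frozenset(s) in level
                                 for s in combinations(cand, len(a)))
                if subsets_ok and g_statistic(transactions, cand) > threshold:
                    next_level.add(cand)
        found |= next_level
        level = next_level
    # Keep only itemsets with no fully-correlated proper superset.
    return [s for s in found if not any(s < t for t in found)]
```

For example, in a toy database where items a, b, and c always appear together while d appears independently of them, `maximal_fully_correlated(data, 1.0)` should return the single itemset {a, b, c}. The level-wise growth relies on the assumed definition being downward closed: every subset of a fully-correlated itemset is itself fully-correlated, so a (k+1)-item candidate only needs scoring when all of its k-item subsets have already passed.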
Details
- Title: Subtitle
- Finding Maximal Fully-Correlated Itemsets in Large Databases
- Creators
- Lian Duan - University of Iowa; W. Nick Street - University of Iowa
- Contributors
- W Wang (Editor); H Kargupta (Editor); S Ranka (Editor); P S Yu (Editor); X D Wu (Editor)
- Resource Type
- Conference proceeding
- Publication Details
- 2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, pp.770-775
- Publisher
- IEEE
- Series
- IEEE International Conference on Data Mining
- DOI
- 10.1109/ICDM.2009.89
- ISSN
- 1550-4786
- eISSN
- 2374-8486
- Number of pages
- 6
- Language
- English
- Date published
- 01/01/2009
- Academic Unit
- Bus Admin College; Nursing; Computer Science; Business Analytics
- Record Identifier
- 9984380538102771