Preprint
Large-Scale Approximate Kernel Canonical Correlation Analysis
arXiv (Cornell University)
02/29/2016
DOI: 10.48550/arxiv.1511.04773
Abstract
Kernel canonical correlation analysis (KCCA) is a nonlinear multi-view representation learning technique with broad applicability in statistics and machine learning. Although there is a closed-form solution for the KCCA objective, it involves solving an N×N eigenvalue system where N is the training set size, making its computational requirements in both memory and time prohibitive for large-scale problems. Various approximation techniques have been developed for KCCA. A commonly used approach is to first transform the original inputs to an M-dimensional random feature space so that inner products in the feature space approximate kernel evaluations, and then apply linear CCA to the transformed inputs. In many applications, however, the dimensionality M of the random feature space may need to be very large in order to obtain a sufficiently good approximation; it then becomes challenging to perform the linear CCA step on the resulting very high-dimensional data matrices. We show how to use a stochastic optimization algorithm, recently proposed for linear CCA and its neural-network extension, to further alleviate the computational requirements of approximate KCCA. This approach allows us to run approximate KCCA on a speech dataset with 1.4 million training samples and a random feature space of dimensionality M=100000 on a typical workstation.
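The two-step approximation described above (map each view to a random feature space, then run linear CCA on the features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel approximated with random Fourier features, and it solves the linear CCA step exactly with a regularized whitening-plus-SVD solver rather than with the stochastic optimization algorithm the paper uses. The function names (`sample_rff_params`, `rff_features`, `linear_cca`), the toy data, and the values of `M`, `sigma`, and `reg` are hypothetical choices made for illustration.

```python
import numpy as np


def sample_rff_params(D, M, sigma, rng):
    """Draw random Fourier feature parameters for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))  (Rahimi & Recht, 2007)."""
    W = rng.normal(scale=1.0 / sigma, size=(D, M))  # frequencies w ~ N(0, sigma^-2 I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)        # phases b ~ U[0, 2*pi)
    return W, b


def rff_features(X, W, b):
    """Map X (N x D) to N x M features whose inner products approximate k."""
    M = W.shape[1]
    return np.sqrt(2.0 / M) * np.cos(X @ W + b)


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    evals, evecs = np.linalg.eigh(C)
    return (evecs / np.sqrt(evals)) @ evecs.T  # V diag(lambda^-1/2) V^T


def linear_cca(F, G, k, reg=1e-4):
    """Exact (batch) regularized linear CCA on F (N x p) and G (N x q).
    Hypothetical helper; the paper replaces this step with a stochastic solver.
    Returns projections A (p x k), B (q x k) and the top-k canonical correlations."""
    N = F.shape[0]
    F = F - F.mean(axis=0)
    G = G - G.mean(axis=0)
    Cff = F.T @ F / (N - 1) + reg * np.eye(F.shape[1])  # p x p, costly for large p
    Cgg = G.T @ G / (N - 1) + reg * np.eye(G.shape[1])
    Cfg = F.T @ G / (N - 1)
    Kf, Kg = inv_sqrt(Cff), inv_sqrt(Cgg)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kf @ Cfg @ Kg, full_matrices=False)
    return Kf @ U[:, :k], Kg @ Vt[:k].T, s[:k]


# Toy two-view data: view 2 depends on view 1 only through z**2, so the
# linear correlation between the raw views is close to zero.
rng = np.random.default_rng(0)
N = 2000
z = rng.uniform(-1.0, 1.0, size=(N, 1))
X = z + 0.05 * rng.normal(size=(N, 1))
Y = z**2 + 0.05 * rng.normal(size=(N, 1))

_, _, rho_linear = linear_cca(X, Y, k=1)

M, sigma = 300, 0.5  # illustrative values, not taken from the paper
Wx, bx = sample_rff_params(1, M, sigma, rng)  # separate random features per view
Wy, by = sample_rff_params(1, M, sigma, rng)
Fx, Gy = rff_features(X, Wx, bx), rff_features(Y, Wy, by)
A, B, rho_kcca = linear_cca(Fx, Gy, k=2)
print("linear CCA:", rho_linear, "approximate KCCA:", rho_kcca)
```

On this toy data, linear CCA on the raw inputs finds almost no correlation, while linear CCA on the random features recovers the nonlinear dependence. Note that the exact solver forms dense M×M covariance matrices; at the scale reported in the abstract (M = 100000), one such matrix in float64 has 10^10 entries, about 80 GB, which is why this step becomes the bottleneck that a stochastic algorithm is meant to avoid.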
Details
- Title
- Large-Scale Approximate Kernel Canonical Correlation Analysis
- Creators
- Weiran Wang - Toyota Technological Institute at Chicago
- Karen Livescu - Toyota Technological Institute at Chicago
- Resource Type
- Preprint
- Publication Details
- arXiv (Cornell University)
- DOI
- 10.48550/arxiv.1511.04773
- eISSN
- 2331-8422
- Language
- English
- Date posted
- 02/29/2016
- Academic Unit
- Computer Science
- Record Identifier
- 9984696564702771