Large-Scale Approximate Kernel Canonical Correlation Analysis

Weiran Wang; Karen Livescu

doi:10.48550/arxiv.1511.04773

Preprint

Open access

arXiv (Cornell University)

02/29/2016

Files and links (1)

url

https://doi.org/10.48550/arXiv.1511.04773

Preprint (Author's original)

This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles.

Open Access

Abstract

Kernel canonical correlation analysis (KCCA) is a nonlinear multi-view representation learning technique with broad applicability in statistics and machine learning. Although there is a closed-form solution for the KCCA objective, it involves solving an N×N eigenvalue system where N is the training set size, making its computational requirements in both memory and time prohibitive for large-scale problems. Various approximation techniques have been developed for KCCA. A commonly used approach is to first transform the original inputs to an M-dimensional random feature space so that inner products in the feature space approximate kernel evaluations, and then apply linear CCA to the transformed inputs. In many applications, however, the dimensionality M of the random feature space may need to be very large in order to obtain a sufficiently good approximation; it then becomes challenging to perform the linear CCA step on the resulting very high-dimensional data matrices. We show how to use a stochastic optimization algorithm, recently proposed for linear CCA and its neural-network extension, to further alleviate the computation requirements of approximate KCCA. This approach allows us to run approximate KCCA on a speech dataset with 1.4 million training samples and a random feature space of dimensionality M=100000 on a typical workstation.
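
The two-step baseline described in the abstract (random features, then linear CCA) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a Gaussian kernel approximated by random Fourier features, solves the linear CCA step exactly with an SVD rather than with the stochastic optimizer the paper proposes, and runs on synthetic toy data. The function names, the bandwidth `sigma`, and the regularizer `reg` are hypothetical choices made for illustration.

```python
import numpy as np


def random_fourier_features(X, M, sigma, rng):
    """Map X (N x d) to M random features whose inner products
    approximate the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, M))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)       # random phases
    return np.sqrt(2.0 / M) * np.cos(X @ W + b)


def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs / np.sqrt(vals)) @ vecs.T


def linear_cca(Zx, Zy, k, reg=1e-3):
    """Exact regularized linear CCA on two feature matrices.
    Returns projection matrices for each view and the top-k correlations."""
    N = Zx.shape[0]
    Zx = Zx - Zx.mean(axis=0)
    Zy = Zy - Zy.mean(axis=0)
    Sxx = Zx.T @ Zx / N + reg * np.eye(Zx.shape[1])
    Syy = Zy.T @ Zy / N + reg * np.eye(Zy.shape[1])
    Sxy = Zx.T @ Zy / N
    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # (regularized) canonical correlations.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    return Sxx_is @ U[:, :k], Syy_is @ Vt[:k].T, s[:k]


# Toy two-view data (an assumption for illustration): both views are
# nonlinear functions of a shared latent variable t, plus noise.
rng = np.random.default_rng(0)
N, M = 500, 200
t = rng.uniform(-1.0, 1.0, size=(N, 1))
X = np.hstack([np.cos(3 * t), np.sin(3 * t)]) + 0.05 * rng.normal(size=(N, 2))
Y = np.hstack([t ** 2, t]) + 0.05 * rng.normal(size=(N, 2))

Zx = random_fourier_features(X, M, sigma=1.0, rng=rng)
Zy = random_fourier_features(Y, M, sigma=1.0, rng=rng)
A, B, corrs = linear_cca(Zx, Zy, k=2)
print("top canonical correlations:", corrs)
```

The exact solve above forms and factorizes M×M covariance matrices, which is the step that becomes costly when M is very large; that cost is what motivates the stochastic approach described in the abstract.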

Computer Science - Learning

Details

Title: Large-Scale Approximate Kernel Canonical Correlation Analysis
Creators: Weiran Wang - Toyota Technological Institute at Chicago
Karen Livescu - Toyota Technological Institute at Chicago
Resource Type: Preprint
Publication Details: arXiv (Cornell University)
DOI: 10.48550/arxiv.1511.04773
eISSN: 2331-8422
Language: English
Date posted: 02/29/2016
Academic Unit: Computer Science
Record Identifier: 9984696564702771
