Leveraging kindred projections for dimensionality reduction and improved classification
Details
- Creators
- Diego Castaneda
- Contributors
- Guadalupe M Canahuate (Advisor)
- Hans J Johnson (Committee Member)
- Thomas L Casavant (Committee Member)
- Resource Type
- Thesis
- Degree Awarded
- Master of Science (MS), University of Iowa
- Degree in
- Electrical and Computer Engineering
- Degree date
- Spring 2020
- DOI
- 10.17077/etd.005336
- Publisher
- University of Iowa
- Number of pages
- viii, 32 pages
- Copyright
- Copyright 2020 Diego Castaneda
- Language
- English
- Illustrations
- color illustrations
- Bibliographic note
- Includes bibliographical references (pages 31-32).
- Public Abstract (ETD)
In classification tasks, it is common to deal with data that contains hundreds or thousands of attributes. It may seem that having more attributes makes it possible to build more accurate and robust models, but this is often a misconception. In high dimensions, models usually struggle to find useful patterns because those patterns are obscured by noise in the data. Tree-based ensemble models have proven effective in these situations: they build several models, each over a random subset of the features, and then reach a consensus on the label of each data point through a voting process.
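To make the voting idea concrete, here is a minimal sketch of a random-subspace ensemble: each base model is a decision stump (a depth-one tree) trained on its own random subset of the features, and the stumps are combined by majority vote. It assumes binary labels coded as 0/1, and the names `fit_stump`, `fit_random_subspace` and `predict_vote` are hypothetical; it illustrates the general technique, not any model used in the thesis.

```python
import numpy as np

# Illustrative sketch; all function names and defaults are hypothetical.


def fit_stump(X, y):
    """Fit a depth-1 decision tree (a stump) by exhaustive search over every
    feature and threshold, minimizing training misclassifications.
    Assumes binary labels coded as 0/1."""
    best = (np.inf, 0, 0.0, 0, 1)  # (errors, feature, threshold, left_label, right_label)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            for lo, hi in ((0, 1), (1, 0)):
                errors = np.sum(np.where(left, lo, hi) != y)
                if errors < best[0]:
                    best = (errors, j, t, lo, hi)
    return best[1:]


def fit_random_subspace(X, y, n_models=25, subset_size=5, seed=0):
    """Train each stump on its own random subset of the feature columns."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        feats = rng.choice(X.shape[1], size=subset_size, replace=False)
        j, t, lo, hi = fit_stump(X[:, feats], y)
        # Map the stump's local column index back to the original feature index.
        models.append((feats[j], t, lo, hi))
    return models


def predict_vote(models, X):
    """Combine the stumps by majority vote; ties go to label 0,
    so an odd n_models avoids them entirely."""
    votes = np.array([np.where(X[:, j] <= t, lo, hi) for j, t, lo, hi in models])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

Because each stump sees only a few columns, the stumps that happen to draw only noisy columns are outvoted by those that draw an informative one.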
This work proposes a method inspired by models that use subsets of features. Instead of defining random subsets, it groups attributes by how related they are to each other, specifically by autological reasoning. Unsupervised methods then summarize each of these disjoint subsets of associated features into a single categorical variable. For each subgroup, a learning step uses information-theoretic measures to select the clustering that best corresponds to the outcome. The resulting categorizations are then used as factors for fitting a highly explainable classification model. This work demonstrates that the original data can be effectively reduced to meaningful categories and that these groups can be leveraged to improve performance over models that use all the features.
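The pipeline described above can be sketched as follows. Every concrete choice here is an assumption, because the abstract does not specify them: features are grouped by thresholding their absolute Pearson correlation (a stand-in for the thesis's own grouping criterion), each group is summarized with k-means, and the number of clusters is chosen by normalized mutual information (NMI) with the class labels as the information-theoretic measure. All function names and parameters (`group_features`, `kmeans`, `nmi`, `summarize_groups`, `threshold`, `k_values`) are hypothetical. The final explainable classifier is omitted; the returned matrix of per-group categories would be its input.

```python
import numpy as np

# Illustrative sketch; the grouping rule, clustering method, selection
# measure, and all names below are assumptions, not the thesis's method.


def group_features(X, threshold=0.7):
    """Hypothetical grouping rule: link two features when their absolute
    Pearson correlation exceeds `threshold`; each connected component of
    the resulting graph is one disjoint group (found with union-find)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = X.shape[1]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-first initialization; returns labels."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def _entropy(counts):
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))


def nmi(a, b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B)), in [0, 1]."""
    h_a = _entropy(np.unique(a, return_counts=True)[1])
    h_b = _entropy(np.unique(b, return_counts=True)[1])
    h_ab = _entropy(np.unique(np.column_stack([a, b]), axis=0, return_counts=True)[1])
    mi = h_a + h_b - h_ab  # I(A;B) = H(A) + H(B) - H(A,B)
    return 2 * mi / (h_a + h_b) if h_a + h_b > 0 else 0.0


def summarize_groups(X, y, k_values=(2, 3, 4)):
    """Replace each feature group by the k-means labeling (over the candidate
    k values) that has the highest NMI with the class labels y."""
    X = np.asarray(X, dtype=float)
    factors = []
    for group in group_features(X):
        candidates = [kmeans(X[:, group], k) for k in k_values]
        factors.append(max(candidates, key=lambda labels: nmi(labels, y)))
    return np.column_stack(factors)  # one categorical column per group
```

The sketch normalizes the mutual information rather than using it raw because raw mutual information with the labels tends to grow as clusters are added, which would bias the selection toward the largest k.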
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9983956194502771