Exploring Feature Definition and Selection for Sentiment Classifiers

Yelena Mejova; Padmini Srinivasan

doi:10.1609/icwsm.v5i1.14163

Conference proceeding

Open access

Proceedings of the ... International AAAI Conference on Weblogs and Social Media, Vol.5(1), pp.546-549

08/03/2021

Files and links (1)

https://doi.org/10.1609/icwsm.v5i1.14163

Published (Version of record) Open Access

Abstract

In this paper, we systematically explore feature definition and selection strategies for sentiment polarity classification. We begin with basic questions, such as whether to use stemming, term frequency versus binary weighting, negation-enriched features, and n-grams or phrases. We then move on to more complex aspects, including feature selection using frequency-based vocabulary trimming, part-of-speech and lexicon selection (three types of lexicons), and expected Mutual Information (MI). Using three product and movie review datasets of various sizes, we show, for example, that some techniques are more beneficial for larger datasets than for smaller ones. A classifier trained on only a few features ranked highly by MI outperformed one trained on all features on the large datasets, yet this did not hold for the small dataset. Finally, we perform a space and computation cost analysis to further understand the merits of the various feature types.
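
As an illustration of the MI-based selection mentioned in the abstract, the following is a minimal sketch that ranks terms by expected mutual information between binary term presence and the class label. It uses the standard textbook definition of MI; this record does not give the paper's exact formulation, so the details are assumptions, and the function names `expected_mutual_information` and `top_k_terms` are hypothetical.

```python
import math
from collections import Counter


def expected_mutual_information(docs, labels):
    """Score each term by MI (in bits) between its presence and the class.

    docs   -- list of token lists, one per document
    labels -- list of class labels, aligned with docs
    Assumption: binary presence/absence per document, textbook MI.
    """
    n = len(docs)
    class_counts = Counter(labels)
    term_counts = Counter()   # documents containing the term
    joint = Counter()         # documents of a class containing the term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_counts[term] += 1
            joint[(term, label)] += 1

    scores = {}
    for term, n_t in term_counts.items():
        mi = 0.0
        for label, n_c in class_counts.items():
            n_tc = joint[(term, label)]
            # Two cells per class: term present, term absent.
            for n_cell, n_row in ((n_tc, n_t), (n_c - n_tc, n - n_t)):
                if n_cell > 0:
                    mi += (n_cell / n) * math.log2(n_cell * n / (n_row * n_c))
        scores[term] = mi
    return scores


def top_k_terms(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper's datasets.
    docs = [["good", "movie"], ["great", "movie"],
            ["bad", "movie"], ["awful", "movie"]]
    labels = ["pos", "pos", "neg", "neg"]
    scores = expected_mutual_information(docs, labels)
    print(top_k_terms(scores, 2))
```

In this toy example a term that appears in every document, such as "movie", carries no information about the class and scores zero, while terms concentrated in one class score higher. A classifier would then be trained on only the top-ranked terms.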

Details

Title: Exploring Feature Definition and Selection for Sentiment Classifiers
Creators: Yelena Mejova; Padmini Srinivasan
Resource Type: Conference proceeding
Publication Details: Proceedings of the ... International AAAI Conference on Weblogs and Social Media, Vol.5(1), pp.546-549
DOI: 10.1609/icwsm.v5i1.14163
ISSN: 2162-3449
eISSN: 2334-0770
Language: English
Date published: 08/03/2021
Academic Unit: Nursing; Computer Science; Business Analytics
Record Identifier: 9984339313602771
