Conference proceeding
Intelligent GP fusion from multiple sources for text classification
Proceedings of the 14th ACM international conference on information and knowledge management, pp.477-484
CIKM '05
10/31/2005
DOI: 10.1145/1099554.1099688
Abstract
This paper shows how citation-based information and structural content (e.g., title, abstract) can be combined to improve classification of text documents into predefined categories. We evaluate different measures of similarity -- five derived from the citation information of the collection, and three derived from the structural content -- and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments with the ACM Computing Classification Scheme, using documents from the ACM Digital Library, indicate that GP can discover similarity functions superior to those based solely on a single type of evidence. Effectiveness of the similarity functions discovered through simple majority voting is better than that of content-based as well as combination-based Support Vector Machine classifiers. Experiments also were conducted to compare the performance between GP techniques and other fusion techniques such as Genetic Algorithms (GA) and linear fusion. Empirical results show that GP was able to discover better similarity functions than GA or other fusion techniques.
Details
- Title: Subtitle
- Intelligent GP fusion from multiple sources for text classification
- Creators
- Baoping Zhang - Virginia Tech
- Yuxin Chen - Virginia Tech
- Weiguo Fan - Virginia Tech
- Edward Fox - Virginia Tech
- Marcos Gonçalves - Universidade Federal de Minas Gerais
- Marco Cristo - Universidade Federal de Minas Gerais
- Pável Calado - Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
- Resource Type
- Conference proceeding
- Publication Details
- Proceedings of the 14th ACM international conference on information and knowledge management, pp.477-484
- Publisher
- ACM
- Series
- CIKM '05
- DOI
- 10.1145/1099554.1099688
- Language
- English
- Date published
- 10/31/2005
- Academic Unit
- Business Analytics
- Record Identifier
- 9984380492102771