Conference proceeding
NATURAL CLUSTERING USING PYTHON
BIOMAT 2009, p.289
2010
DOI: 10.1142/9789814304900_0020
Abstract
Clustering is the task of dividing data into homogeneous clusters so that items in the same cluster are as similar as possible and items in different clusters are as dissimilar as possible. The Fuzzy C-Means (FCM) algorithm is one of the most widely used fuzzy clustering algorithms. By combining fuzzy clustering, resampling (bootstrapping) and cluster stability analysis over all possible numbers of clusters in the dataset, it is possible to obtain the correct number of clusters. Real datasets contain samples in which some attribute values are inconsistent with the rest of their cluster. Including these samples can introduce errors that degrade the quality of the classification. This can be solved by modifying the FCM algorithm to accept a degree of reliability for each attribute of each sample. Adapting this method to datasets with a large number of samples is computationally intensive. We use Python to implement the proposed method. Python is a dynamic object-oriented programming language that offers strong support for integration with other languages and comes with extensive standard libraries. By calling MPI parallel routines from Python, we developed a classification method based on FCM and resampling that has excellent computing performance and greatly reduced implementation costs.
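For readers unfamiliar with FCM, the following is a minimal sketch of the standard (unmodified) algorithm in pure Python. It is not the authors' implementation: the function name `fuzzy_c_means`, its parameters and the toy data are assumptions made for illustration, and the per-attribute reliability weights, bootstrap resampling, stability analysis and MPI parallelism described in the abstract are not covered.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM on a list of equal-length tuples of floats.

    Hypothetical helper for illustration only. Returns (centers, u), where
    u[k][i] is the membership of point k in cluster i; each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by u^m.
        centers = []
        for i in range(n_clusters):
            w = [u[k][i] ** m for k in range(n)]
            wsum = sum(w)
            centers.append(tuple(
                sum(w[k] * points[k][d] for k in range(n)) / wsum
                for d in range(dim)
            ))

        # Membership update: u_ik is proportional to (1 / d_ik^2)^(1/(m-1)).
        max_change = 0.0
        for k in range(n):
            d2 = [sum((points[k][d] - c[d]) ** 2 for d in range(dim))
                  for c in centers]
            # Clamp tiny distances to avoid division by zero.
            inv = [(1.0 / max(x, 1e-12)) ** (1.0 / (m - 1.0)) for x in d2]
            total = sum(inv)
            new_row = [v / total for v in inv]
            max_change = max(
                max_change,
                max(abs(a - b) for a, b in zip(new_row, u[k])),
            )
            u[k] = new_row

        # Stop once memberships no longer change appreciably.
        if max_change < tol:
            break

    return centers, u


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2-D points.
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
            (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    centers, memberships = fuzzy_c_means(data, n_clusters=2)
    for point, row in zip(data, memberships):
        print(point, [round(v, 3) for v in row])
```

Unlike hard clustering, each sample receives a graded membership in every cluster, which is the quantity that a reliability-weighted variant or a stability analysis over bootstrap replicates would build on.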
Details
- Title: Subtitle
- NATURAL CLUSTERING USING PYTHON
- Creators
- D. E. Razera - Federal Institute of São Paulo; C. D. Maciel; J. C. Pereira; S. P. Oliveira - University of Iowa
- Contributors
- R P Mondaini (Editor)
- Resource Type
- Conference proceeding
- Publication Details
- BIOMAT 2009, p.289
- Publisher
- World Scientific
- DOI
- 10.1142/9789814304900_0020
- Number of pages
- 4
- Language
- English
- Date published
- 2010
- Academic Unit
- Mathematics; Computer Science
- Record Identifier
- 9984410850402771