Conference proceeding
Experiments in text-based mining and analysis of biological information from MEDLINE on functionally-related genes
18th International Conference on Systems Engineering (ICSEng'05), Vol.2005, pp.326-331
2005
DOI: 10.1109/ICSENG.2005.41
Abstract
Technological advancements such as microarrays have enabled biologists to generate unprecedented quantities of data about biological entities. This has led to the development of a large number of algorithms for processing and analyzing biological data. Challenges remain, however; for instance, genes that function cooperatively need not have similar expression patterns. This suggests using non-numerical sources of information to explore the underlying biology. We experimentally study various factors that are inherent in algorithmic methodologies for text analysis. The proposed method accesses MEDLINE dynamically so that the latest research is taken into account, and uses the available literature on the genes being analyzed to build lists of keywords. Natural language processing (NLP) techniques such as stop-word filtering and stemming are then applied to these lists, and keyword frequencies are weighted using the term frequency-inverse document frequency (TFIDF) scheme. The results are input to a hierarchical clustering algorithm to derive groupings of genes by functionality. The process is repeated using z-score weighting and latent semantic analysis (LSA) to determine which yields the most accurate clustering. The study examines the importance of each of these steps and their influence on the overall efficacy of the system. We believe that the analysis conducted as part of this research is invaluable to the development and fine-tuning of text mining methodologies for biological literature.
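The abstract does not give implementation details, so the following is only a minimal Python sketch of a pipeline of the kind it describes: stop-word filtering, stemming, TFIDF weighting, and hierarchical clustering of genes. It assumes the literature for each gene has already been retrieved from MEDLINE as plain text. The gene names and toy abstracts, the small stop-word list, the suffix-stripping stemmer (a crude stand-in for a real stemmer such as Porter's), the cosine distance, and the average-linkage rule are all hypothetical choices for illustration, not the authors' method. MEDLINE retrieval, z-score weighting, and LSA are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real systems use far larger ones).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "to", "for", "by", "with", "on"}

# Hypothetical toy "literature" per gene; not real MEDLINE data.
TOY_ABSTRACTS = {
    "GENE_A": "kinase phosphorylation signaling cascade kinase",
    "GENE_B": "kinase signaling phosphorylation of receptor",
    "GENE_C": "ribosome translation of mRNA ribosomal protein",
    "GENE_D": "ribosomal protein translation initiation ribosome",
}


def crude_stem(word):
    """Rough suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ation", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on non-letters, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Sparse TFIDF vector per gene: tf(t, d) * log(N / df(t))."""
    counts = {gene: Counter(tokenize(text)) for gene, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    return {
        gene: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
        for gene, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering, average linkage on cosine distance (1 - similarity)."""
    clusters = [[g] for g in sorted(vectors)]

    def dist(a, b):
        total = sum(1 - cosine(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    print(average_linkage(tfidf_vectors(TOY_ABSTRACTS), n_clusters=2))
```

On this toy input the two "signaling" genes and the two "translation" genes share weighted terms only within their own group, so cutting the hierarchy at two clusters should separate them. Average linkage is used here only because it is a common default for document clustering; the choice of linkage, distance, and cut level would all be factors worth varying in experiments like those the abstract describes.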
Details
- Title
- Experiments in text-based mining and analysis of biological information from MEDLINE on functionally-related genes
- Creators
- N. Moon - San Francisco State University
- R. Singh - San Francisco State University
- Resource Type
- Conference proceeding
- Publication Details
- 18th International Conference on Systems Engineering (ICSEng'05), Vol.2005, pp.326-331
- Publisher
- IEEE
- DOI
- 10.1109/ICSENG.2005.41
- Language
- English
- Date published
- 2005
- Academic Unit
- Computer Science
- Record Identifier
- 9984446407202771