Mining biological databases for candidate disease genes

Terry A Braun; Todd Scheetz; Gregg L Webster; Thomas L Casavant

doi:10.1117/12.434869

Conference proceeding

Proceedings of SPIE, Vol.4528(1), pp.169-180

Commercial Applications for High-Performance Computing

07/27/2001

Abstract

The publicly funded effort to determine the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has so far produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary 'draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of these data identified approximately 30,000 to 40,000 human 'genes.' However, the bulk of the effort still remains: to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome with genes. A fortuitous consequence of the HGP is the existence of hundreds of databases of biological information that may hold data relevant to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high-speed cluster of Linux workstations is used to analyze sequences and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl Syndrome (BBS).

Details

Title: Mining biological databases for candidate disease genes
Creators: Terry A Braun - University of Iowa
Todd Scheetz - University of Iowa
Gregg L Webster - University of Iowa
Thomas L Casavant - University of Iowa
Resource Type: Conference proceeding
Publication Details: Proceedings of SPIE, Vol.4528(1), pp.169-180
Conference: Commercial Applications for High-Performance Computing
DOI: 10.1117/12.434869
ISSN: 0277-786X
Language: English
Date published: 07/27/2001
Academic Unit: Electrical and Computer Engineering; Roy J. Carver Department of Biomedical Engineering; Ophthalmology and Visual Sciences
Record Identifier: 9984197121202771
