Journal article
Towards HydroLLM: approaches for building a domain-specific language model for hydrology
Journal of hydroinformatics, Vol.27(10), pp.1652-1666
10/01/2025
DOI: 10.2166/hydro.2025.100
Abstract
As large language models (LLMs) continue to expand, their effective adaptation to specialized fields remains a critical challenge. This work presents an initial step toward the development of HydroLLM, a domain-specific LLM for hydrology. We construct a dataset of approximately 8,800 hydrology-focused question–answer pairs, each with a supporting context passage drawn from textbooks and scientific articles. The dataset includes four instructional formats: multiple-choice, true/false, fill-in-the-blank, and open-ended. Using this corpus, we fine-tune several LLMs of varying type and scale, from compact (1.5B) to large (32B) parameter counts, using parameter-efficient LoRA (low-rank adaptation) methods. Our methodology compares the fine-tuned models and evaluates performance using accuracy and cosine similarity metrics across task types. Results show that the 8B-DeepSeek-Llama variant achieved the strongest overall performance, while the 32B model overfitted and the 1.5B model underperformed, demonstrating that larger size is not always advantageous and highlighting the need to match model capacity to dataset size. This work demonstrates that effective domain adaptation requires careful consideration of architecture, parameter count, and task complexity. By establishing baseline performance and identifying the limits of current fine-tuning approaches, we take a concrete step toward building HydroLLM as a robust, domain-specific language model for hydrological analysis and decision support.
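The abstract names accuracy and cosine similarity as the evaluation metrics but does not say how either was computed. The following is a minimal sketch of what such scoring could look like, not the authors' implementation. The function names, the exact-match normalization, and the bag-of-words text representation are all assumptions made for illustration; a study like this may well have used sentence embeddings instead.

```python
import math
from collections import Counter


def accuracy(predictions, references):
    """Fraction of answers that match exactly after trimming and lowercasing.

    Suited to closed-form tasks (multiple-choice, true/false, fill-in-the-blank).
    The normalization here is an assumption, not taken from the article.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two texts.

    Assumption: a bag-of-words representation stands in for whatever
    text representation the authors actually used for open-ended answers.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

Under these assumptions, closed-form answers would be scored with `accuracy` over a whole task split, while each open-ended answer would be compared to its reference with `cosine_similarity` and the scores averaged per task type.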
Details
- Title: Subtitle
- Towards HydroLLM: approaches for building a domain-specific language model for hydrology
- Creators
- Dilara Kizilkaya - University of Iowa; Yusuf Sermet - Tulane University; Ibrahim Demir - Tulane University
- Resource Type
- Journal article
- Publication Details
- Journal of hydroinformatics, Vol.27(10), pp.1652-1666
- DOI
- 10.2166/hydro.2025.100
- ISSN
- 1464-7141
- eISSN
- 1465-1734
- Publisher
- IWA PUBLISHING
- Grant note
- National Oceanic and Atmospheric Administration (NOAA); University of Alabama: NA22NWS4320003; NSF: NAIRR240072
This project was funded by the National Oceanic and Atmospheric Administration (NOAA) via a cooperative agreement with the University of Alabama (NA22NWS4320003) awarded to the Cooperative Institute for Research to Operations in Hydrology (CIROH). We also acknowledge NSF grant NAIRR240072 for research computing on multimodal language models in hydrology.
- Language
- English
- Electronic publication date
- 09/18/2025
- Date published
- 10/01/2025
- Academic Unit
- Electrical and Computer Engineering; Civil and Environmental Engineering; IIHR--Hydroscience and Engineering; Injury Prevention Research Center
- Record Identifier
- 9984966334602771