Journal article
Toward HydroLLM: a benchmark dataset for hydrology-specific knowledge assessment for large language models
Environmental Data Science, Vol.4, e31
01/01/2025
DOI: 10.1017/eds.2025.10006
Appears in UI Libraries Support Open Access
Abstract
The rapid advancement of large language models (LLMs) has enabled their integration into a wide range of scientific disciplines. This article introduces a comprehensive benchmark dataset specifically designed for testing recent LLMs in the hydrology domain. Leveraging a collection of research articles and hydrology textbooks, we generated a wide array of hydrology-specific questions in various formats, including true/false, multiple-choice, open-ended, and fill-in-the-blank. These questions serve as a robust foundation for evaluating the performance of state-of-the-art LLMs, including GPT-4o-mini, Llama3:8B, and Llama3.1:70B, in addressing domain-specific queries. Our evaluation framework employs accuracy metrics for objective question types and cosine similarity measures for subjective responses, ensuring a thorough assessment of the models’ proficiency in understanding and responding to hydrological content. The results underscore both the capabilities and limitations of artificial intelligence (AI)-driven tools within this specialized field, providing valuable insights for future research and the development of educational resources. By introducing HydroLLM-Benchmark, this study contributes a vital resource to the growing body of work on domain-specific AI applications, demonstrating the potential of LLMs to support complex, field-specific tasks in hydrology.
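The abstract names two scoring rules: accuracy for objective question types and cosine similarity for subjective (open-ended) responses. The following is a minimal sketch of how those two rules could be combined. It is not the authors' evaluation code, and several parts are assumptions. The `Item` record and its `qtype` labels are hypothetical. Objective answers are scored by case-insensitive exact match. The cosine is computed over simple bag-of-words term counts, which stand in for whatever text embeddings the study actually used.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical labels: "true_false", "multiple_choice", "fill_in_blank", "open_ended"
    qtype: str
    reference: str  # gold answer from the benchmark
    response: str   # answer produced by the model under test


def accuracy(items):
    """Fraction of objective items whose response matches the reference
    after trimming whitespace and lowercasing (an assumed normalization)."""
    objective = [it for it in items if it.qtype != "open_ended"]
    if not objective:
        return 0.0
    correct = sum(
        it.response.strip().lower() == it.reference.strip().lower() for it in objective
    )
    return correct / len(objective)


def cosine_similarity(a, b):
    """Cosine of the angle between term-count vectors of two texts.
    Bag-of-words counts are an assumed stand-in for model embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def mean_open_ended_similarity(items):
    """Average cosine similarity over the open-ended items only."""
    scores = [
        cosine_similarity(it.reference, it.response)
        for it in items
        if it.qtype == "open_ended"
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative items only; these are not taken from HydroLLM-Benchmark.
items = [
    Item("true_false", "True", "true"),
    Item("multiple_choice", "B", "C"),
    Item("fill_in_blank", "infiltration", "Infiltration"),
    Item("open_ended", "runoff increases after heavy rainfall", "heavy rainfall increases runoff"),
]

print(f"accuracy = {accuracy(items):.3f}")  # 2 of 3 objective items match
print(f"open-ended cosine = {mean_open_ended_similarity(items):.3f}")
```

Keeping the two scores separate mirrors the abstract's split between objective and subjective formats. A similarity score near 1 indicates heavy lexical or semantic overlap with the reference, but it is not a correctness judgment in the way accuracy is.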
Details
- Title: Subtitle
- Toward HydroLLM: a benchmark dataset for hydrology-specific knowledge assessment for large language models
- Creators
- Dilara Kizilkaya - University of Iowa; Ramteja Sajja - University of Iowa; Yusuf Sermet - University of Iowa; Ibrahim Demir - University of Iowa
- Resource Type
- Journal article
- Publication Details
- Environmental Data Science, Vol.4, e31
- DOI
- 10.1017/eds.2025.10006
- eISSN
- 2634-4602
- Publisher
- Cambridge University Press
- Grant note
- National Oceanic and Atmospheric Administration (NOAA) / University of Alabama: NA22NWS4320003; NSF: NAIRR240072
This project was funded by the National Oceanic and Atmospheric Administration (NOAA) via a cooperative agreement with the University of Alabama (NA22NWS4320003) awarded to the Cooperative Institute for Research to Operations in Hydrology (CIROH). We also acknowledge NSF grant NAIRR240072 for research computing on multimodal language models in hydrology.
- Language
- English
- Date published
- 01/01/2025
- Academic Unit
- Electrical and Computer Engineering; Civil and Environmental Engineering; IIHR--Hydroscience and Engineering; Injury Prevention Research Center
- Record Identifier
- 9984827331502771