Toward HydroLLM: a benchmark dataset for hydrology-specific knowledge assessment for large language models

Dilara Kizilkaya; Ramteja Sajja; Yusuf Sermet; Ibrahim Demir

doi:10.1017/eds.2025.10006

Journal article

Open access

Peer reviewed

Environmental Data Science, Vol.4, e31

01/01/2025

Appears in UI Libraries Support Open Access

Files and links (1)

url

https://doi.org/10.1017/eds.2025.10006

Published (Version of record) Open Access

Abstract

The rapid advancement of large language models (LLMs) has enabled their integration into a wide range of scientific disciplines. This article introduces a comprehensive benchmark dataset specifically designed for testing recent LLMs in the hydrology domain. Leveraging a collection of research articles and hydrology textbooks, we generated a wide array of hydrology-specific questions in various formats, including true/false, multiple-choice, open-ended, and fill-in-the-blank. These questions serve as a robust foundation for evaluating the performance of state-of-the-art LLMs, including GPT-4o-mini, Llama3:8B, and Llama3.1:70B, in addressing domain-specific queries. Our evaluation framework employs accuracy metrics for objective question types and cosine similarity measures for subjective responses, ensuring a thorough assessment of the models’ proficiency in understanding and responding to hydrological content. The results underscore both the capabilities and limitations of artificial intelligence (AI)-driven tools within this specialized field, providing valuable insights for future research and the development of educational resources. By introducing HydroLLM-Benchmark, this study contributes a vital resource to the growing body of work on domain-specific AI applications, demonstrating the potential of LLMs to support complex, field-specific tasks in hydrology.
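
The two scoring approaches named in the abstract (accuracy for objective question types, cosine similarity for open-ended responses) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the case-insensitive exact-match rule, and the bag-of-words vectors are all assumptions made for illustration, and the record does not say which text representation the study used for cosine similarity.

```python
import math
from collections import Counter


def accuracy(predictions, references):
    """Fraction of answers that match the reference exactly.

    Assumption: matching ignores case and surrounding whitespace,
    which suits true/false and multiple-choice answers.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts.

    Assumption: each text is represented as a bag-of-words count vector
    over whitespace-separated tokens. An embedding model could be
    substituted without changing the formula.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the tokens the two texts share.
    dot = sum(vec_a[tok] * vec_b[tok] for tok in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

In this sketch, `accuracy` would score the true/false and multiple-choice items, while `cosine_similarity` would compare a model's open-ended answer with a reference answer, giving a value between 0 (no shared tokens) and 1 (identical token distribution).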

Hydrology

benchmark dataset

domain-specific AI

large language models (LLMs)

natural language processing (NLP)

question generation

UIOWA OA Agreement

Details

Title: Toward HydroLLM: a benchmark dataset for hydrology-specific knowledge assessment for large language models
Creators: Dilara Kizilkaya - University of Iowa
Ramteja Sajja - University of Iowa
Yusuf Sermet - University of Iowa
Ibrahim Demir - University of Iowa
Resource Type: Journal article
Publication Details: Environmental Data Science, Vol.4, e31
DOI: 10.1017/eds.2025.10006
eISSN: 2634-4602
Publisher: Cambridge University Press
Grant note: National Oceanic and Atmospheric Administration (NOAA) / University of Alabama: NA22NWS4320003; NSF: NAIRR240072
This project was funded by the National Oceanic and Atmospheric Administration (NOAA) via a cooperative agreement with the University of Alabama (NA22NWS4320003) awarded to the Cooperative Institute for Research to Operations in Hydrology (CIROH). We also acknowledge NSF grant NAIRR240072 for research computing on multimodal language models in hydrology.
Language: English
Date published: 01/01/2025
Academic Unit: Electrical and Computer Engineering; Civil and Environmental Engineering; IIHR--Hydroscience and Engineering; Injury Prevention Research Center
Record Identifier: 9984827331502771
