How Well Do DeepSeek, ChatGPT, and Gemini Respond to Water Science Questions?

Access rights

openAccess
CC BY
publishedVersion

A1 Original article in a scientific journal

Language

en

Pages

17

Series

Environmental Modelling & Software, Volume 196

Abstract

This study evaluates the performance of three prominent large language models (LLMs), DeepSeek R1, ChatGPT-4o, and Gemini 2, in addressing key questions within four core fields of hydrology and water science: machine learning and optimization, remote sensing, flood modeling, and sediment transport. The LLMs’ responses are systematically compared to benchmark responses derived from review articles in the respective fields. To assess the LLMs’ efficiency, the study introduces a novel evaluation rubric incorporating four key criteria: relevancy, accuracy, authenticity, and novelty. The findings revealed that each model can address the core aspects of the benchmark questions. DeepSeek R1 achieved the highest overall scores in machine learning and optimization, flood modeling, and sediment transport, while ChatGPT-4o demonstrated superior performance in remote sensing. Notably, DeepSeek R1 and Gemini 2 exhibited the lowest response similarity in 95% of the evaluated questions, whereas ChatGPT-4o and Gemini 2 showed the highest similarity in 70% of cases.

Citation

Hosseini, S H & Pourzangbar, A 2026, 'How Well Do DeepSeek, ChatGPT, and Gemini Respond to Water Science Questions?', Environmental Modelling & Software, vol. 196, 106772. https://doi.org/10.1016/j.envsoft.2025.106772