Assessing ChatGPT's ability to emulate human reviewers in scientific research: A descriptive and qualitative approach

Aiman Suleiman; Dario von Wedel; Ricardo Munoz-Acuna; Simone Redaelli; Abeer Santarisi; Eva-Lotte Seibold; Nikolai Ratajczak; Shinichiro Kato; Nader Said; Eswar Sundar; Valerie Goodspeed; Maximilian S. Schaefer

doi:10.1016/j.cmpb.2024.108313

Journal article

Peer reviewed

Computer methods and programs in biomedicine, Vol.254, p.108313

09/2024

DOI: 10.1016/j.cmpb.2024.108313

PMID: 38954915

Abstract

• ChatGPT aims to replicate human behavior in scientific research but its ability to emulate real-world human reviewers is yet to be discovered.
• This study presents the first methodological qualitative assessment of ChatGPT's capability to imitate real human reviewers in scientific research.
• Analysis of 720 comments by human reviewers, independently reviewed by researchers showing strong consensus, revealed that the majority of human reviewers’ comments (78.5 %) lacked equivalents in ChatGPT's comments.
• Comments on context and methodology exhibited lower levels of complete and partial agreement compared to general comments.
• ChatGPT version GPT-4 has a limited ability to emulate human reviewers within the peer review process of scientific research.

ChatGPT is an AI platform whose relevance in the peer review of scientific articles is steadily growing. Nonetheless, it has sparked debates over its potential biases and inaccuracies. This study aims to assess ChatGPT's ability to qualitatively emulate human reviewers in scientific research.

We included the first submitted version of the latest twenty original research articles published by the 3rd of July 2023, in a high-profile medical journal. Each article underwent evaluation by a minimum of three human reviewers during the initial review stage. Subsequently, three researchers with medical backgrounds and expertise in manuscript revision, independently and qualitatively assessed the agreement between the peer reviews generated by ChatGPT version GPT-4 and the comments provided by human reviewers for these articles. The level of agreement was categorized into complete, partial, none, or contradictory.

720 human reviewers’ comments were assessed. There was a good agreement between the three assessors (Overall kappa >0.6). ChatGPT's comments demonstrated complete agreement in terms of quality and substance with 48 (6.7 %) human reviewers’ comments, partially agreed with 92 (12.8 %), identifying issues necessitating further elaboration or recommending supplementary steps to address concerns, had no agreement with a significant 565 (78.5 %), and contradicted 15 (2.1 %). ChatGPT comments on methods had the lowest proportion of complete agreement (13 comments, 3.6 %), while general comments on the manuscript displayed the highest proportion of complete agreement (17 comments, 22.1 %).

ChatGPT version GPT-4 has a limited ability to emulate human reviewers within the peer review process of scientific research.
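
The reported percentages follow directly from the four category counts, which sum to the 720 assessed comments. The snippet below is a minimal sketch of that arithmetic only; the variable and function names are illustrative and are not taken from the article, and it does not reproduce the authors' analysis or the kappa calculation.

```python
# Counts reported in the abstract (agreement category -> number of human comments).
counts = {
    "complete": 48,
    "partial": 92,
    "none": 565,
    "contradictory": 15,
}


def percentages(category_counts):
    """Return each category's share of the total, in percent, rounded to one decimal."""
    total = sum(category_counts.values())
    return {name: round(100 * n / total, 1) for name, n in category_counts.items()}


total = sum(counts.values())   # total number of assessed comments
shares = percentages(counts)   # hypothetical helper defined above

for name, share in shares.items():
    print(f"{name}: {counts[name]} of {total} ({share} %)")
```

Rounded to one decimal place, the shares match the figures quoted in the abstract (6.7 %, 12.8 %, 78.5 % and 2.1 %).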

Artificial Intelligence

Bias

ChatGPT

Peer review

Quality agreement

Details

Title: Assessing ChatGPT's ability to emulate human reviewers in scientific research: A descriptive and qualitative approach
Creators: Aiman Suleiman - Albert Einstein College of Medicine
Dario von Wedel - Beth Israel Deaconess Medical Center
Ricardo Munoz-Acuna - Beth Israel Deaconess Medical Center
Simone Redaelli - Beth Israel Deaconess Medical Center
Abeer Santarisi - Beth Israel Deaconess Medical Center
Eva-Lotte Seibold - Beth Israel Deaconess Medical Center
Nikolai Ratajczak - Beth Israel Deaconess Medical Center
Shinichiro Kato - Beth Israel Deaconess Medical Center
Nader Said - Higher Colleges of Technology
Eswar Sundar - Beth Israel Deaconess Medical Center
Valerie Goodspeed - Beth Israel Deaconess Medical Center
Maximilian S. Schaefer - Heinrich Heine University Düsseldorf
Resource Type: Journal article
Publication Details: Computer methods and programs in biomedicine, Vol.254, p.108313
DOI: 10.1016/j.cmpb.2024.108313
PMID: 38954915
NLM abbreviation: Comput Methods Programs Biomed
ISSN: 0169-2607
eISSN: 1872-7565
Publisher: Elsevier B.V.
Language: English
Date published: 09/2024
Academic Unit: Anesthesia
Record Identifier: 9985090698302771
