"Crash Test Dummies" for AI-Enabled Clinical Assessment: Validating Virtual Patient Scenarios with Virtual Learners

Brian Gin; Ahreum Lim; Flávia Silva e Oliveira; Kuan Xing; Xiaomei Song; Gayana Amiyangoda; Thilanka Seneviratne; Alison F Doubleday; Ananya Gangopadhyaya; Bob Kiser; Lukas Shum-Tim; Dhruva Patel; Kosala Marambe; Lauren Maggio; Ara Tekian; Yoon Soo Park

doi:10.48550/arxiv.2601.18085

Preprint

Open access

ArXiv.org

Cornell University

01/26/2026

URL: https://doi.org/10.48550/arxiv.2601.18085

Preprint (Author's original). This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles. Open Access.

Abstract

Background: In medical and health professions education (HPE), AI is increasingly used to assess clinical competencies, including via virtual standardized patients. However, most evaluations rely on AI-human interrater reliability and lack a measurement framework for how cases, learners, and raters jointly shape scores. This leaves robustness uncertain and can expose learners to misguidance from unvalidated systems. We address this by using AI "simulated learners" to stress-test and psychometrically characterize assessment pipelines before human use.

Objective: Develop an open-source AI virtual patient platform and measurement model for robust competency evaluation across cases and rating conditions.

Methods: We built a platform with virtual patients, virtual learners with tunable ACGME-aligned competency profiles, and multiple independent AI raters scoring encounters with structured Key-Features items. Transcripts were analyzed with a Bayesian HRM-SDT model that treats ratings as decisions under uncertainty and separates learner ability, case performance, and rater behavior; parameters were estimated with MCMC.

Results: The model recovered simulated learners' competencies, with significant correlations to the generating competencies across all ACGME domains despite a non-deterministic pipeline. It estimated case difficulty by competency and showed stable rater detection (sensitivity) and criteria (severity/leniency thresholds) across AI raters using identical models/prompts but different seeds. We also propose a staged "safety blueprint" for deploying AI tools with learners, tied to entrustment-based validation milestones.

Conclusions: Combining a purpose-built virtual patient platform with a principled psychometric model enables robust, interpretable, generalizable competency estimates and supports validation of AI-assisted assessment prior to use with human learners.
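To make the rater terms in the abstract concrete, the following is a minimal sketch of the signal-detection layer in a generic HRM-SDT formulation, not the authors' implementation. It assumes a logistic link and the common form P(Y ≤ k | η) = F(c_k − d·η), where η is the latent "true" category of a response, d is a rater's detection (sensitivity), and c_k are the rater's criteria (severity/leniency thresholds). The function names and all numeric values are hypothetical and chosen only for illustration; the abstract does not specify the link function or the item scale.

```python
import math


def logistic_cdf(x):
    """Standard logistic CDF, used here as the assumed link F."""
    return 1.0 / (1.0 + math.exp(-x))


def rating_probabilities(eta, detection, criteria):
    """Return P(Y = k | eta) for k = 0..K-1 under an SDT rater model.

    Assumed form (generic HRM-SDT, not taken from the preprint):
        P(Y <= k | eta) = F(c_k - detection * eta)
    eta       : latent true category of the response (e.g. 0 or 1)
    detection : rater sensitivity d; larger means sharper discrimination
    criteria  : increasing thresholds c_1 < ... < c_{K-1}
    """
    if list(criteria) != sorted(criteria):
        raise ValueError("criteria must be in increasing order")
    # Cumulative probabilities at each threshold, closed off with 1.0.
    cumulative = [logistic_cdf(c - detection * eta) for c in criteria] + [1.0]
    probs, previous = [], 0.0
    for p in cumulative:
        probs.append(p - previous)
        previous = p
    return probs


# Hypothetical binary Key-Features item where the true category is 1.
sharp = rating_probabilities(eta=1, detection=4.0, criteria=[2.0])
blunt = rating_probabilities(eta=1, detection=1.0, criteria=[2.0])
severe = rating_probabilities(eta=1, detection=4.0, criteria=[3.0])
print(sharp, blunt, severe)
```

In this sketch, raising `detection` makes the rater more likely to assign the true category, while raising the criterion makes the same rater more severe (less likely to award the higher score). A full hierarchical model would additionally link η to learner ability and case difficulty and estimate all parameters jointly, for example with MCMC as the abstract describes.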

Computer Science - Artificial Intelligence

Computer Science - Human-Computer Interaction

Statistics - Applications

Details

Title: "Crash Test Dummies" for AI-Enabled Clinical Assessment: Validating Virtual Patient Scenarios with Virtual Learners
Creators: Brian Gin
Ahreum Lim
Flávia Silva e Oliveira
Kuan Xing
Xiaomei Song
Gayana Amiyangoda
Thilanka Seneviratne
Alison F Doubleday
Ananya Gangopadhyaya
Bob Kiser
Lukas Shum-Tim
Dhruva Patel
Kosala Marambe
Lauren Maggio
Ara Tekian
Yoon Soo Park
Resource Type: Preprint
Publication Details: ArXiv.org
DOI: 10.48550/arxiv.2601.18085
ISSN: 2331-8422
Publisher: Cornell University; Ithaca, New York
Language: English
Date posted: 01/26/2026
Academic Unit: Family and Community Medicine; Office of Consultation and Research in Medical Education
Record Identifier: 9985132079702771