Thesis
Using diagnostic classification models to tease apart overlapping skills: a case study of the ECPE Cloze Task
University of Iowa
Master of Arts (MA), University of Iowa
Spring 2025
DOI: 10.25820/etd.007810
Abstract
Language proficiency assessments can have complex designs, with highly correlated latent constructs and items that test multiple constructs simultaneously. An appropriate statistical framework is crucial for analyzing such complex assessment structures and for providing test users with meaningful, easily interpretable reports. While item response theory (IRT) is a staple of assessment design and analysis, this framework primarily focuses on the items in an assessment and how they relate to one another. The diagnostic classification model (DCM) framework, by contrast, not only estimates a test taker’s ability score(s) for each latent ability but also classifies each test taker as a master or non-master of each ability and groups test takers by their skill mastery profiles.
This paper presents several IRT models and DCMs, estimated in R on the same response data from a twenty-eight-item multiple-choice cloze task from the Examination for the Certification of Proficiency in English (ECPE). The items on this task measure several linguistic skills, with some items measuring multiple skills simultaneously. The results of these models show the limitations of multidimensional IRT models when the constructs in each dimension are strongly correlated. They also illustrate how DCMs can be used to estimate the ability score needed to show mastery of a given ability, divide test takers into skill mastery classes based on similar response patterns, and provide threshold parameters for each item and each skill mastery class.
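To make the classification step concrete, the following is a minimal, hypothetical sketch (not the thesis's R code, model, or estimates) of how a DCM turns a response pattern into skill mastery classifications once item parameters are known. It assumes the log-linear cognitive diagnosis model (LCDM) form as one common DCM, a toy Q-matrix of 4 items and 2 skills in which one item measures both skills, invented item parameters and class proportions, and a 0.5 cutoff for calling a test taker a master. Parameter estimation itself is not shown.

```python
import itertools
import math

# Hypothetical Q-matrix: 4 items x 2 skills (1 = the item measures that skill).
# Item 4 measures both skills at once, like the overlapping items described above.
Q = [
    [1, 0],
    [0, 1],
    [1, 0],
    [1, 1],
]

# Hypothetical LCDM item parameters on the logit scale: an intercept,
# a main effect per skill, and an interaction used only by two-skill items.
INTERCEPTS = [-1.5, -1.0, -2.0, -2.5]
MAIN_EFFECTS = [[3.0, 0.0], [0.0, 2.5], [3.5, 0.0], [1.5, 1.5]]
INTERACTIONS = [0.0, 0.0, 0.0, 1.0]

# All 2^K skill mastery profiles, e.g. (1, 0) = masters skill 1 only.
PROFILES = list(itertools.product([0, 1], repeat=2))
# Hypothetical proportions of test takers in each mastery class.
PRIOR = [0.25, 0.15, 0.20, 0.40]


def p_correct(item, profile):
    """LCDM probability that a test taker in `profile` answers `item` correctly."""
    logit = INTERCEPTS[item]
    for k, mastered in enumerate(profile):
        logit += MAIN_EFFECTS[item][k] * Q[item][k] * mastered
    # Interaction applies only if the item measures both skills and both are mastered.
    logit += INTERACTIONS[item] * Q[item][0] * Q[item][1] * profile[0] * profile[1]
    return 1.0 / (1.0 + math.exp(-logit))


def posterior(responses):
    """Posterior probability of each mastery class given a 0/1 response vector."""
    weights = []
    for prior, profile in zip(PRIOR, PROFILES):
        likelihood = prior
        for item, x in enumerate(responses):
            p = p_correct(item, profile)
            likelihood *= p if x == 1 else 1.0 - p
        weights.append(likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def mastery_probabilities(responses):
    """Marginal probability of mastering each skill (sum over profiles with that skill)."""
    post = posterior(responses)
    return [sum(p for p, prof in zip(post, PROFILES) if prof[k] == 1) for k in range(2)]


if __name__ == "__main__":
    responses = [1, 0, 1, 0]  # hypothetical test taker
    post = posterior(responses)
    best_profile = PROFILES[post.index(max(post))]
    probs = mastery_probabilities(responses)
    masters = [p >= 0.5 for p in probs]  # assumed 0.5 mastery cutoff
    print("most likely profile:", best_profile)
    print("mastery probabilities:", [round(p, 3) for p in probs])
    print("classified as master:", masters)
```

The class-specific values returned by `p_correct` play the role of per-item, per-class response probabilities; in a real analysis they would come from a model estimated on the full ECPE data rather than from the invented numbers used here.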
Details
- Title: Subtitle
- Using diagnostic classification models to tease apart overlapping skills: a case study of the ECPE Cloze Task
- Creators
- Kezia Walker-Cecil
- Contributors
- Jonathan Templin (Advisor)
- Lesa Hoffman (Committee Member)
- Pamela Wesely (Committee Member)
- Resource Type
- Thesis
- Degree Awarded
- Master of Arts (MA), University of Iowa
- Degree in
- Psychological and Quantitative Foundations (Educational Measurement and Statistics)
- Date degree season
- Spring 2025
- DOI
- 10.25820/etd.007810
- Publisher
- University of Iowa
- Number of pages
- viii, 59 pages
- Copyright
- Copyright 2025 Kezia Walker-Cecil
- Language
- English
- Date submitted
- 04/29/2025
- Description illustrations
- illustrations, tables, graphs
- Description bibliographic
- Includes bibliographical references (pages 57-59).
- Public Abstract (ETD)
- Reporting test scores for language proficiency assessments is not easy because these assessments often measure multiple skills that overlap. Test developers must choose statistical models for analyzing test data that can appropriately handle a variety of challenges, including (a) test items that assess more than one skill at a time, (b) results that include ability scores for each skill measured by the test, and (c) score reports that allow test users to interpret test takers’ abilities without additional analysis. Many test developers use the statistical framework of item response theory (IRT), but this framework provides limited analysis of test taker ability and sometimes cannot complete model estimation when the skill dimensions overlap too much. In contrast, the diagnostic classification model (DCM) framework allows for closely correlated skill dimensions while also providing test-taker-specific reports, including an estimate of whether the test taker has mastered each skill on the test and which skill mastery profile group the test taker belongs to. This paper illustrates the uses of DCMs relative to IRT models by comparing two IRT models with two DCMs, all analyzed in R on the same set of language assessment response data. The response data come from a twenty-eight-item multiple-choice task from the Examination for the Certification of Proficiency in English (ECPE). The items on this task measure several linguistic skills, with some items measuring multiple skills simultaneously. The models estimated in this paper show that different statistical frameworks can help answer different questions about the same data.
- Academic Unit
- Psychological and Quantitative Foundations
- Record Identifier
- 9984830726102771