A comparison of calibration methods and proficiency estimators for creating IRT vertical scales

Jungnam Kim

doi:10.17077/etd.vg7okxxc

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Summer 2007

Abstract

The main purpose of this study was to construct vertical scales from various combinations of calibration methods and proficiency estimators, and to investigate how these choices affect three properties of the resulting scales: grade-to-grade growth, grade-to-grade variability, and the separation of grade distributions. The calibration methods investigated were concurrent calibration, separate calibration, and fixed a, b, and c item parameters for common items with simple prior updates (FSPU). The proficiency estimators investigated were the Maximum Likelihood Estimator (MLE) with pattern scores, Expected A Posteriori (EAP) with pattern scores, pseudo-MLE with summed scores, pseudo-EAP with summed scores, and Quadrature Distribution (QD). The study used datasets from the Iowa Tests of Basic Skills (ITBS) Vocabulary, Reading Comprehension (RC), Math Problem Solving and Data Interpretation (MPD), and Science tests for grades 3 through 8.

The following conclusions were drawn for each research question. Comparing the three calibration methods: for the RC and Science tests, concurrent calibration, compared to FSPU and separate calibration, showed less growth and more slowly decreasing growth in the lower grades, less decrease in variability over grades, and less separation in the lower grades in terms of horizontal distances. For the Vocabulary and MPD tests, differences in both grade-to-grade growth and the separation of grade distributions were trivial. Comparing the five proficiency estimators: for all content areas, within-grade SDs followed the trend pseudo-MLE ≥ MLE > QD > EAP ≥ pseudo-EAP, and effect sizes followed the trend pseudo-EAP ≥ EAP > QD > MLE ≥ pseudo-MLE. However, the degree of decrease in variability over grades was similar across proficiency estimators.

Comparing the four content areas: for the Vocabulary and MPD tests, compared to the RC and Science tests, growth was less but somewhat steady, and the decrease in variability over grades was less. For the separation of grade distributions, the large growth suggested by larger mean differences for the RC and Science tests was reduced when effect sizes were used to standardize the differences.
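
The abstract reports opposite orderings for within-grade SDs and effect sizes across estimators. A minimal sketch of why that can happen follows, assuming the effect size is a standardized mean difference between adjacent grades with an equally weighted pooled SD. That formula, the function name `effect_size`, and all numeric values are illustrative assumptions, not taken from the dissertation.

```python
import math


def effect_size(mean_lower, sd_lower, mean_upper, sd_upper):
    """Standardized mean difference between two adjacent grades.

    Assumption: the pooled SD is the square root of the average of the
    two within-grade variances. The dissertation may define it differently.
    """
    pooled_sd = math.sqrt((sd_lower ** 2 + sd_upper ** 2) / 2.0)
    return (mean_upper - mean_lower) / pooled_sd


# Hypothetical numbers: the same mean growth of 0.6 between two grades,
# scored by an estimator with larger within-grade SDs and by one with
# smaller within-grade SDs.
wide_sd_es = effect_size(0.0, 1.2, 0.6, 1.2)
narrow_sd_es = effect_size(0.0, 0.8, 0.6, 0.8)

print(f"larger SDs:  effect size = {wide_sd_es:.2f}")
print(f"smaller SDs: effect size = {narrow_sd_es:.2f}")
```

Under these assumptions, for a fixed mean difference, an estimator that yields larger within-grade SDs yields a smaller effect size, which is consistent with the reversed orderings reported above.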

Education

Vertical scale

linking

equating

IRT

calibration

proficiency estimator

Details

Title: A comparison of calibration methods and proficiency estimators for creating IRT vertical scales
Creators: Jungnam Kim - University of Iowa
Contributors: David A. Frisbie (Advisor)
Michael J. Kolen (Advisor)
Robert L. Brennan (Committee Member)
Won-Chan Lee (Committee Member)
Richard L. Dykstra (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Psychological and Quantitative Foundations
Date degree season: Summer 2007
DOI: 10.17077/etd.vg7okxxc
Publisher: University of Iowa
Number of pages: viii, 183 pages
Language: English
Date copyrighted: 2007
Description bibliographic: Includes bibliographical references (pages 116-120).
Academic Unit: Psychological and Quantitative Foundations
Record Identifier: 9983777044502771
