Dissertation
A comparison of calibration methods and proficiency estimators for creating IRT vertical scales
University of Iowa
Doctor of Philosophy (PhD), University of Iowa
Summer 2007
DOI: 10.17077/etd.vg7okxxc
Abstract
<p>The main purpose of this study was to construct vertical scales from various combinations of calibration methods and proficiency estimators, and to investigate how these choices affect three properties of the resulting scales: grade-to-grade growth, grade-to-grade variability, and the separation of grade distributions. The calibration methods investigated were concurrent calibration, separate calibration, and fixed a, b, and c item parameters for common items with simple prior updates (FSPU). The proficiency estimators investigated were the Maximum Likelihood Estimator (MLE) with pattern scores, Expected A Posteriori (EAP) with pattern scores, pseudo-MLE with summed scores, pseudo-EAP with summed scores, and Quadrature Distribution (QD). The study used datasets from the Iowa Tests of Basic Skills (ITBS) Vocabulary, Reading Comprehension (RC), Math Problem Solving and Data Interpretation (MPD), and Science tests for grades 3 through 8.</p>
<p>The following conclusions were drawn for each research question. In the comparison of the three calibration methods, for the RC and Science tests, concurrent calibration, relative to FSPU and separate calibration, showed less growth and more slowly decreasing growth in the lower grades, less decrease in variability over grades, and less separation in the lower grades in terms of horizontal distances. For the Vocabulary and MPD tests, differences in both grade-to-grade growth and the separation of grade distributions were trivial. In the comparison of the five proficiency estimators, for all content areas, within-grade SDs followed the trend pseudo-MLE ≥ MLE > QD > EAP ≥ pseudo-EAP, and effect sizes followed the trend pseudo-EAP ≥ EAP > QD > MLE ≥ pseudo-MLE. However, the degree of decrease in variability over grades was similar across proficiency estimators. In the comparison of the four content areas, growth for the Vocabulary and MPD tests was smaller than for the RC and Science tests but somewhat steady, and the decrease in variability over grades was smaller. For the separation of grade distributions, the large growth suggested by the larger mean differences for the RC and Science tests was reduced when effect sizes were used to standardize the differences.</p>
Details
- Title: A comparison of calibration methods and proficiency estimators for creating IRT vertical scales
- Creators: Jungnam Kim - University of Iowa
- Contributors: David A Frisbie (Advisor); Michael J. Kolen (Advisor); Robert L. Brennan (Committee Member); Won-Chan Lee (Committee Member); Richard L. Dykstra (Committee Member)
- Resource Type: Dissertation
- Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
- Degree in: Psychological and Quantitative Foundations
- Date degree season: Summer 2007
- DOI: 10.17077/etd.vg7okxxc
- Publisher: University of Iowa
- Number of pages: viii, 183 pages
- Copyright: Copyright 2007 Jungnam Kim
- Language: English
- Date copyrighted: 2007
- Description bibliographic: Includes bibliographical references (pages 116-120).
- Academic Unit: Psychological and Quantitative Foundations
- Record Identifier: 9983777044502771