# An empirical study of generalizability coefficients for matched and unmatched data

## Abstract

The present study was concerned with certain problems in estimating the reliability of tests constructed by random sampling or stratified-random sampling from a population of test items. The major focus was on the situation in which unmatched data are collected, that is, the case where k test items are randomly selected for each person independently of the items selected for every other person. A computer simulation model was employed to study empirically certain estimators of test reliability for the unmatched data case, in the hope of clarifying the extent to which stratification must be taken into account in the choice of reliability formulas.

The theory of generalizability provided the theoretical framework for the investigation. According to this theory, an individual's test score is never regarded as meaningful in its own right. Rather, it is regarded as a sample from a universe of test scores that the individual would have received if he had been tested over all possible random samples of k items from a content item population. Any conclusions an investigator draws concerning a particular test score will be generalized to the universe of comparable test scores. In order to judge the dependability of such a generalization, he must determine how closely a particular test score agrees with the result to be expected from testing the individual over all possible samples of k items in the content item population.

The specific estimators of reliability studied were the alpha, alpha-stratified, gamma, and gamma-stratified coefficients. These coefficients are all estimators of the correlation between true score and test score. The alpha coefficients pertain to the matched data case; the gamma coefficients, to the unmatched data case. The primary focus of the study was on the discrepancy between the gamma coefficients and the correlation between true scores and test scores when the test is, in fact, stratified. A secondary goal was to investigate the standard errors of the alpha and gamma coefficients for various sample sizes.

The alpha coefficients had previously been studied by means of a computer simulation model that did not simulate the sampling of persons. The simulation model developed in this study was considered more flexible than previous models because it permitted the sampling of a particular number of individuals from a subject population. The model also simulated item content and item difficulty level as well as the individual taking the test. With the computer model it was possible to simulate tests stratified on item difficulty, on item content, or on both item difficulty and item content. All results were based on an 18-item test with 3 strata, each of which contained 6 items.

Concerning the extent to which stratification must be taken into account in the choice of reliability formulas for the unmatched data case, the results indicated that, when the test is in fact stratified, the gamma-stratified coefficient was superior to the gamma coefficient for all stratification cases. Also, the gamma-stratified coefficients were more efficient than the unstratified gamma coefficients over all sample sizes. Concerning the standard errors of the alpha coefficients for all stratification cases, the conclusion was that, in general, both coefficients are equally efficient for all sample sizes. The general implication of these findings for test construction is that, for the unmatched data case, a stratified test will be more reliable than an unstratified test.
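
To make the matched data case concrete, the following is a minimal sketch of the two alpha coefficients on simulated data with the same layout as the study (18 items in 3 strata of 6). It is not the study's simulation model: the data generator `simulate`, its normally distributed ability and error components, and all function names are assumptions made for illustration. The formulas used are the standard textbook forms of coefficient alpha and stratified alpha. The gamma coefficients are not shown because the abstract does not give their formulas.

```python
import random
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for matched data.

    scores: list of persons, each a list of k item scores,
    where every person answered the same k items.
    """
    k = len(scores[0])
    item_vars = [variance([person[i] for person in scores]) for i in range(k)]
    total_var = variance([sum(person) for person in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def stratified_alpha(scores, strata):
    """Stratified alpha.

    strata: list of lists of item indices, one list per stratum.
    Each stratum contributes its subscore variance times (1 - its own alpha).
    """
    total_var = variance([sum(person) for person in scores])
    error_var = 0.0
    for idx in strata:
        sub = [[person[i] for i in idx] for person in scores]
        sub_var = variance([sum(row) for row in sub])
        error_var += sub_var * (1 - cronbach_alpha(sub))
    return 1 - error_var / total_var


def simulate(n_persons=200, strata_sizes=(6, 6, 6), seed=0):
    """Hypothetical generator: general ability + stratum ability + item error."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_persons):
        general = rng.gauss(0, 1)
        row = []
        for size in strata_sizes:
            stratum_ability = general + rng.gauss(0, 1)
            row.extend(stratum_ability + rng.gauss(0, 1) for _ in range(size))
        scores.append(row)
    return scores


scores = simulate()
strata = [list(range(0, 6)), list(range(6, 12)), list(range(12, 18))]
alpha = cronbach_alpha(scores)
alpha_strat = stratified_alpha(scores, strata)
print(f"alpha = {alpha:.3f}, alpha-stratified = {alpha_strat:.3f}")
```

When the whole test is treated as a single stratum, `stratified_alpha` reduces algebraically to `cronbach_alpha`, which gives a quick sanity check on the implementation. Repeating the calls over many seeds and several values of `n_persons` would give a rough picture of how the sampling variability of each coefficient changes with sample size, in the spirit of the study's secondary goal.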