Rasch Analysis of Beta Test

Introduction

The following discussion is paraphrased from "A Procedure for Sample-Free Item Analysis," by Benjamin Wright and Nargis Panchapakesan (Educational and Psychological Measurement, 1969, Vol 29, pp. 23-26).

The Rasch model (Rasch, 1960) is one of the latent trait models proposed for person measurement. Rasch proposed that the observed response a_ni of person n to item i is governed by a binomial probability function of person ability Z_n and item easiness E_i. The probability of a right response is:

    P{a_ni = 1} = Z_n E_i / (1 + Z_n E_i)

Through logarithmic transformation, Rasch arrived at the following formula:

    P{a_ni = 1} = exp(b_n + d_i) / (1 + exp(b_n + d_i))

where b_n = log Z_n and d_i = log E_i.
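
The two forms of the model can be checked numerically. The sketch below (plain Python, with hypothetical values for Z_n and E_i chosen only for illustration) evaluates the probability of a right response in both the easiness form and the log-transformed form:

```python
import math

def p_correct(z, e):
    """Probability of a right response in easiness form: Z*E / (1 + Z*E)."""
    return z * e / (1 + z * e)

def p_correct_log(b, d):
    """Same probability after the log transform, with b = log Z and d = log E."""
    return math.exp(b + d) / (1 + math.exp(b + d))

z, e = 2.0, 0.5                      # hypothetical person ability and item easiness
b, d = math.log(z), math.log(e)      # the log-transformed parameters
# With Z*E = 1, both forms give a probability of exactly 0.5.
```

The two functions agree for any positive Z and E, since the second is just the first rewritten in the transformed parameters.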

An important consequence of this model is that the number of correct responses to a given set of items is a sufficient statistic for estimating person ability. The score is the only information needed from the data to make the ability estimate. Therefore, we need only estimate an ability for each possible score: any person who gets a certain score is estimated to have the ability associated with that score, and all persons who get the same score are estimated to have the same ability. Stated another way, this model produces item statistics independent of examinee samples and person statistics independent of the particular set of items administered.
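
Because the raw score is sufficient, estimating one ability per possible score is enough. The sketch below illustrates the idea by solving sum_i P_i(b) = r for b with Newton-Raphson, assuming the item parameters d_i are already known; the item values are hypothetical and this is not the authors' exact estimation procedure:

```python
import math

def ability_for_score(r, d, tol=1e-8):
    """Ability estimate b for raw score r, given item log-easiness values d.

    Solves sum_i P_i(b) = r by Newton-Raphson; valid for 0 < r < len(d),
    since perfect and zero scores have no finite estimate."""
    b = math.log(r / (len(d) - r))  # rough starting value from the score odds
    for _ in range(100):
        p = [math.exp(b + di) / (1 + math.exp(b + di)) for di in d]
        f = sum(p) - r                            # expected minus observed score
        fprime = sum(pi * (1 - pi) for pi in p)   # derivative of expected score
        step = f / fprime
        b -= step
        if abs(step) < tol:
            break
    return b

# Hypothetical item parameters; every person with the same raw score
# receives the same ability estimate, regardless of which items were right.
d = [-1.0, -0.5, 0.0, 0.5, 1.0]
b3 = ability_for_score(3, d)
```

Higher scores map to strictly higher ability estimates, so the score-to-ability table fully characterizes measurement on the test.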

Item fit to the model

In typical item analysis, the desirable characteristics of a test are high reliability and validity; items with low indices of reliability or validity are therefore dropped. For the Rasch model, the essential criterion is instead the compatibility of the items with the model.

The failure of an item to fit the model can be traced to two main sources. One is that the model is too simple. It takes account of only one item characteristic -- item easiness. Other item parameters like item discrimination and guessing are neglected. Parameters for discrimination and guessing can be included in a more general model. However, their inclusion makes the application of the model to actual measurement very complicated. This model assumes that all items have the same discrimination, and that the effect of guessing is negligible.

The other source of lack of fit of an item lies in the content of the item. The model assumes that all the items used measure the same trait. Items in a test may not fit together if the test is composed of items which measure different abilities. This includes the situation in which an item is so badly constructed or so mis-scored that what it measures is irrelevant to the rest of the test.

If a given set of items fits the model, this is evidence that the items refer to a unidimensional ability, that they form a conformable set. Fit to the model also implies that item discriminations are uniform and substantial, that there are no errors in item scoring, and that guessing has had a negligible effect. Thus the criterion of fit to the model enables us to identify and delete bad items.
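
One common way to quantify how well an item's responses agree with the model is the mean squared standardized residual across persons. This is an illustrative statistic only, not necessarily the exact fit measure used in this analysis:

```python
import math

def item_fit_mean_square(responses, abilities, d_i):
    """Mean squared standardized residual for one item.

    responses[n] is the 0/1 response of person n, whose ability is abilities[n];
    d_i is the item's log-easiness. Values near 1 suggest fit; large values
    flag misfit (e.g. mis-scoring, guessing, or an item measuring a different
    trait)."""
    total = 0.0
    for x, b in zip(responses, abilities):
        p = math.exp(b + d_i) / (1 + math.exp(b + d_i))
        z = (x - p) / math.sqrt(p * (1 - p))   # standardized residual
        total += z * z
    return total / len(responses)
```

An item whose observed responses track the model probabilities yields residuals of roughly unit variance, so the mean square hovers near 1.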

Precision of measurement

In the procedure used here, the "reliability" of a test, a concept which depends upon the ability distribution of the sample, is replaced by the precision of measurement. The standard error of the ability estimate is a measure of the precision attained. This standard error depends primarily upon the number of items used. The range of item easiness with respect to the ability level being measured also affects the standard error of the ability estimate. But in practice this effect is minor compared to the effect of test length.
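
Under the model, the standard error of an ability estimate is 1/sqrt(test information), where the information is sum_i P_i(1 - P_i). The sketch below, using hypothetical item sets, shows why test length dominates: doubling the number of comparable items shrinks the standard error by a factor of about 1/sqrt(2).

```python
import math

def se_of_ability(b, d):
    """Standard error of the ability estimate b for items with log-easiness d:
    1 / sqrt(test information), where information = sum_i P_i * (1 - P_i)."""
    info = 0.0
    for di in d:
        p = math.exp(b + di) / (1 + math.exp(b + di))
        info += p * (1 - p)
    return 1 / math.sqrt(info)

# Two hypothetical tests of well-targeted items (d_i = 0 at ability b = 0):
se_20 = se_of_ability(0.0, [0.0] * 20)   # 20-item test
se_40 = se_of_ability(0.0, [0.0] * 40)   # 40-item test, SE smaller by ~1/sqrt(2)
```

Spreading the same items over a wider easiness range changes each P_i(1 - P_i) term only modestly, which is why targeting matters less than length in practice.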

Notes on analysis:

• Removed person with tracking number 6601534439402250, who answered 41 problems correctly but answered item number 1 incorrectly. This greatly improved the model fit for item number 1. It also improved the characteristics of the very top end of the graph, which had shown a decrease in ability estimate when going from a total non-adjusted score of 40 to a total non-adjusted score of 41.
• Removed persons with tracking numbers bmg0000000000147, bmg0000000000130, and bmg0000000000139, who answered 3 problems correctly. Removing these persons significantly improved the model fit for items number 10, 11, and 17.

The total score used in the graph below is not adjusted for the effects of guessing.