The following discussion is paraphrased from "A Procedure for Sample-Free Item Analysis," by Benjamin Wright and Nargis Panchapakesan (Educational and Psychological Measurement, 1969, Vol 29, pp. 23-26).
The Rasch model (Rasch, 1960) is one of the latent trait models proposed for person measurement. Rasch proposed that the observed response ani of person n to item i is governed by a binomial probability function of person ability Zn and item easiness Ei. The probability of a right response is:

P(ani = 1) = ZnEi / (1 + ZnEi)
Through logarithmic transformation, Rasch arrived at the following formula:

log[P / (1 - P)] = bn + di

where bn = log Zn
and di = log Ei
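As a minimal sketch (not from the original article), the model can be written as a function of the log parameters bn and di:

```python
import math

def rasch_prob(b_n, d_i):
    """Probability of a right response under the Rasch model, in log form:
    exp(bn + di) / (1 + exp(bn + di)), where bn = log Zn and di = log Ei.
    Algebraically equal to ZnEi / (1 + ZnEi)."""
    return math.exp(b_n + d_i) / (1.0 + math.exp(b_n + d_i))
```

When ability exactly matches item difficulty (bn + di = 0), the probability of a right response is 0.5; it rises toward 1 as ability exceeds difficulty.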
An important consequence of this model is that the number of correct responses to a given set of items is a sufficient statistic for estimating person ability: the score is the only information needed from the data to make the ability estimate. Therefore we need only estimate an ability for each possible score; every person who earns a given score is estimated to have the ability associated with that score, so all persons with the same score receive the same ability estimate. Put another way, the model produces item statistics independent of the examinee sample and person statistics independent of the particular set of items administered.
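Because the raw score is sufficient, one ability estimate can be computed per score by solving a single equation: choose bn so that the expected score, the sum of the item probabilities, equals the observed score r. The following Newton-iteration sketch illustrates this (the item easiness values di are assumed known; this is an illustration, not the article's estimation procedure):

```python
import math

def rasch_prob(b, d):
    # Probability of a right response: exp(b + d) / (1 + exp(b + d))
    return 1.0 / (1.0 + math.exp(-(b + d)))

def ability_for_score(r, d_items, tol=1e-10):
    """Solve sum_i P(b, d_i) = r for the ability b by Newton's method.
    Valid for scores strictly between 0 and len(d_items)."""
    b = 0.0
    for _ in range(100):
        p = [rasch_prob(b, d) for d in d_items]
        f = sum(p) - r                            # expected minus observed score
        fprime = sum(pi * (1.0 - pi) for pi in p)  # slope of expected score in b
        step = f / fprime
        b -= step
        if abs(step) < tol:
            break
    return b
```

Note that the estimate depends only on the score r and the item parameters, so two persons with identical scores necessarily receive identical ability estimates.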
Item fit to the model
In typical item analysis, the desirable characteristics of a test are high reliability and validity, so items with low indices of reliability or validity are dropped. For the Rasch model, the essential criterion is instead the compatibility of the items with the model.
The failure of an item to fit the model can be traced to two main sources. One is that the model is too simple. It takes account of only one item characteristic -- item easiness. Other item parameters like item discrimination and guessing are neglected. Parameters for discrimination and guessing can be included in a more general model. However, their inclusion makes the application of the model to actual measurement very complicated. This model assumes that all items have the same discrimination, and that the effect of guessing is negligible.
The other source of lack of fit of an item lies in the content of the item. The model assumes that all the items used are measuring the same trait. Items in a test may not fit together if the test is composed of items which measure different abilities. This includes the situation in which an item is so badly constructed or so mis-scored that what it measures is irrelevant to the rest of the test.
If a given set of items fits the model, this is evidence that the items refer to a unidimensional ability and form a conformable set. Fit to the model also implies that item discriminations are uniform and substantial, that there are no errors in item scoring, and that guessing has had a negligible effect. Thus the criterion of fit to the model enables us to identify and delete bad items.
Precision of measurement
In the procedure used here, the "reliability" of a test, a concept which depends upon the ability distribution of the sample, is replaced by the precision of measurement. The standard error of the ability estimate is a measure of the precision attained. This standard error depends primarily upon the number of items used. The range of item easiness with respect to the ability level being measured also affects the standard error of the ability estimate. But in practice this effect is minor compared to the effect of test length.
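The dependence of the standard error on test length can be illustrated with the standard Rasch result that the error variance of an ability estimate is the reciprocal of the test information, the sum of Pi(1 - Pi) over the items. A minimal sketch (assumed easiness values; not the article's notation):

```python
import math

def rasch_prob(b, d):
    # Probability of a right response: exp(b + d) / (1 + exp(b + d))
    return 1.0 / (1.0 + math.exp(-(b + d)))

def ability_se(b, d_items):
    """Standard error of the ability estimate b: the reciprocal square root
    of the test information sum_i P_i (1 - P_i)."""
    info = sum(p * (1.0 - p) for p in (rasch_prob(b, d) for d in d_items))
    return 1.0 / math.sqrt(info)
```

For items of comparable easiness, quadrupling the number of items halves the standard error, which is why test length dominates the precision of measurement.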
Notes on analysis:
The total score used in the graph below is not adjusted for the effects of guessing.
Adjustment for guessing
For multiple-choice problems, 0.25 points were subtracted for each incorrect answer (as distinguished from an answer that was left blank). Since each multiple-choice problem had five choices, random guessing on five such problems would produce, on average, one correct answer and four incorrect answers, for a total adjusted score of 0. The effect of this scoring method is shown in the graph below. For example, a person who answered 30 total problems correctly also, on average, answered 4 multiple-choice problems incorrectly, for an adjusted score of 29.
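The scoring rule described above can be sketched as a one-line function (the function name is illustrative):

```python
def adjusted_score(num_right, num_wrong, choices=5):
    """Formula-scoring adjustment: subtract 1/(choices - 1) point for each
    wrong answer (blanks are not penalized), so that random guessing on a
    block of items averages out to an adjusted score of 0."""
    return num_right - num_wrong / (choices - 1)
```

With five choices the penalty is 1/4 = 0.25 point per wrong answer, matching the text: one lucky guess and four misses yield 1 - 4(0.25) = 0, and 30 right with 4 wrong yields 29.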
We can use the guessing adjustment curve above to bias the Rasch ability curve as shown in the following graph. The usable range of scores (adjusted for guessing) is about 4 to 39.
The following graph shows the item easiness as determined by the Rasch analysis. Higher values mean easier problems.