**Introduction**

The following discussion is paraphrased from "A Procedure for Sample-Free
Item Analysis," by Benjamin Wright and Nargis Panchapakesan (*Educational
and Psychological Measurement*, 1969, Vol 29, pp. 23-26).

The Rasch model (Rasch, 1960) is one of the latent trait models proposed
for person measurement. Rasch proposed that the observed response
*a*_{ni} of person *n* to item *i* is
governed by a binomial probability function of person ability
*Z*_{n} and item easiness
*E*_{i}. The probability of a *right* response
is:

P{*a*_{ni} = 1} = *Z*_{n}*E*_{i} / (1 + *Z*_{n}*E*_{i})

Through logarithmic transformation, Rasch arrived at the following formula:

P{*a*_{ni} = 1} = exp(*b*_{n} + *d*_{i}) / (1 + exp(*b*_{n} + *d*_{i}))

where *b*_{n} = log *Z*_{n} and *d*_{i} = log *E*_{i}.
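The transformed formula is just the logistic function applied to the sum of ability and easiness. A minimal sketch (the function name is illustrative, not from the paper):

```python
import math

def rasch_probability(b_n: float, d_i: float) -> float:
    """Probability of a right response under the Rasch model, given
    person ability b_n = log Z_n and item easiness d_i = log E_i."""
    return math.exp(b_n + d_i) / (1.0 + math.exp(b_n + d_i))

# A person whose ability exactly offsets the item's easiness (b_n = -d_i)
# has a 50% chance of a right response.
print(rasch_probability(1.0, -1.0))  # 0.5
```

Note that the probability depends on ability and easiness only through their sum, which is what makes the raw score a sufficient statistic, as discussed next.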

An important consequence of this model is that the number of correct
responses to a given set of items is a sufficient statistic for estimating
person ability. The score is the *only* information needed from the
data to make the ability estimate. Therefore, we need only estimate an ability
for each possible score. Any person who gets a certain score will be estimated
to have the ability associated with that score. All persons who get the same
score will be estimated to have the same ability. To restate in a different
way, this model produces item statistics independent of examinee samples and
person statistics independent of the particular set of items administered.
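Because the score is sufficient, one ability estimate can be computed per possible score. As a hedged sketch of the idea (not the paper's exact estimation procedure): given the items' log-easiness values, solve for the ability *b* at which the expected score equals the observed score, e.g. by Newton-Raphson.

```python
import math

def ability_for_score(score: float, easiness_logs: list) -> float:
    """Estimate the ability b associated with a raw score, given the
    log-easiness d_i of each item, by solving sum_i P_i(b) = score.
    Valid for scores strictly between 0 and the number of items
    (zero and perfect scores have no finite estimate)."""
    b = 0.0
    for _ in range(100):
        probs = [math.exp(b + d) / (1.0 + math.exp(b + d))
                 for d in easiness_logs]
        expected = sum(probs)               # expected score at ability b
        info = sum(p * (1.0 - p) for p in probs)  # slope of expected score
        step = (score - expected) / info
        b += step
        if abs(step) < 1e-10:
            break
    return b
```

Every person with the same score on the same items receives the same estimate, regardless of which particular items they answered correctly.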

**Item fit to the model**

In typical item analysis, the desirable characteristics of a test are high reliability and validity, so items with low indices of reliability or validity are dropped. For the Rasch model, the essential criterion is instead the compatibility of the items with the model.

The failure of an item to fit the model can be traced to two main sources. One is that the model is too simple. It takes account of only one item characteristic -- item easiness. Other item parameters like item discrimination and guessing are neglected. Parameters for discrimination and guessing can be included in a more general model. However, their inclusion makes the application of the model to actual measurement very complicated. This model assumes that all items have the same discrimination, and that the effect of guessing is negligible.

The other source of lack of fit of an item lies in the content of the item. The model assumes that all the items used are measuring the same trait. Items in a test may not fit together if the test is composed of items which measure different abilities. This includes the situation in which an item is so badly constructed or so mis-scored that what it measures is irrelevant to the rest of the test.

If a given set of items fits the model, this is evidence that the items refer to a unidimensional ability -- that they form a conformable set. Fit to the model also implies that item discriminations are uniform and substantial, that there are no errors in item scoring, and that guessing has had a negligible effect. Thus the criterion of fit to the model enables us to identify and delete bad items.
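One common way to quantify an item's fit -- a sketch of the general idea, not necessarily the exact statistic used in this analysis -- is to sum squared standardized residuals over persons, which is approximately chi-square distributed when the item fits:

```python
import math

def item_fit_chisq(responses: list, abilities: list, d_i: float) -> float:
    """Approximate chi-square fit statistic for one item: the sum over
    persons of (a_ni - P_ni)^2 / (P_ni * (1 - P_ni)), where a_ni is the
    observed 0/1 response and P_ni the model probability."""
    chisq = 0.0
    for a, b in zip(responses, abilities):
        p = math.exp(b + d_i) / (1.0 + math.exp(b + d_i))
        chisq += (a - p) ** 2 / (p * (1.0 - p))
    return chisq  # roughly chi-square with len(responses) d.f. if the item fits
```

Items with statistics far above the expected value are candidates for deletion, as described above.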

**Precision of measurement**

In the procedure used here, the "reliability" of a test, a concept which depends upon the ability distribution of the sample, is replaced by the precision of measurement. The standard error of the ability estimate is a measure of the precision attained. This standard error depends primarily upon the number of items used. The range of item easiness with respect to the ability level being measured also affects the standard error of the ability estimate. But in practice this effect is minor compared to the effect of test length.
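The standard error in question is the reciprocal square root of the test information, where each item contributes *P*(1 - *P*) at the ability level being measured. A minimal sketch (the helper name is illustrative):

```python
import math

def ability_standard_error(b: float, easiness_logs: list) -> float:
    """Standard error of an ability estimate at ability b:
    1 / sqrt(test information), where each item contributes
    information p * (1 - p) under the Rasch model."""
    info = 0.0
    for d in easiness_logs:
        p = math.exp(b + d) / (1.0 + math.exp(b + d))
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)

# 25 items targeted exactly at the person's ability level:
print(ability_standard_error(0.0, [0.0] * 25))  # 0.4
```

Since each item's contribution is at most 0.25, the standard error falls roughly as one over the square root of test length, which is why length dominates the range-of-easiness effect mentioned above.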

Notes on analysis:

- Removed person with tracking number 6601534439402250, who answered 41
  problems correctly but answered item number 1 incorrectly. This greatly
  improved the model fit for item number 1. It also improved the
  characteristics of the very top end of the graph, which had shown a
  *decrease* in ability estimate when going from a total non-adjusted score
  of 40 to a total non-adjusted score of 41.
- Removed persons with tracking numbers bmg0000000000147, bmg0000000000130,
  and bmg0000000000139, who answered 3 problems correctly. Removing these
  persons significantly improved the model fit for items number 10, 11,
  and 17.

The total score used in the graph below is not adjusted for the effects of guessing.

**Adjustment for guessing**

For multiple-choice problems, 0.25 points were subtracted for each incorrect answer (as distinguished from an answer that was left blank). Since each multiple-choice problem had five choices, random guessing on five such problems would produce, on average, one correct answer and four incorrect answers, for a total adjusted score of 0. The effect of this scoring method is shown in the graph below. For example, a person who answered 30 total problems correctly also, on average, answered 4 multiple-choice problems incorrectly.
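The scoring rule described above is a one-line computation; the arithmetic for the two cases in the text works out as follows:

```python
def adjusted_score(n_correct: int, n_incorrect: int, penalty: float = 0.25) -> float:
    """Score adjusted for guessing on five-choice items: subtract 1/4 point
    for each incorrect answer; blank answers cost nothing."""
    return n_correct - penalty * n_incorrect

# Random guessing on five 5-choice problems averages 1 right and 4 wrong,
# for an adjusted score of zero:
print(adjusted_score(1, 4))   # 0.0
# The example from the text -- 30 right, 4 multiple-choice wrong:
print(adjusted_score(30, 4))  # 29.0
```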

We can use the guessing adjustment curve above to bias the Rasch ability curve as shown in the following graph. The usable range of scores (adjusted for guessing) is about 4 to 39.

**Item Easiness**

The following graph shows the item easiness as determined by the Rasch analysis. Higher values mean easier problems.