Paul Maxim's "Renorming Ron Hoeflin's Mega Test"

Published with permission from the author

1. Ron Hoeflin's "Sixth Norming of the Mega Test."
2. Fred Vaughan's "Intelligence Filters." (appears in the online version of Gift of Fire)
3. Kevin Langdon's Reply to Paul Maxim. (Gift of Fire no. 81)
4. Darryl Miyaguchi's Comments on Paul Maxim's renorming.
5. Darryl Miyaguchi's Generic I.Q. Chart
6. SAT percentile ranks table


                     Copyright (C) 1996  by PAUL MAXIM

Reference Data Used:
   1. Data sheets provided by Dr. Hoeflin, showing Mega test scores for 531
testees, plus prior test scores they reported, including the following: 220 on
SAT, 106 on GRE, 76 on LAIT, 80 on Cattell, 75 on CTMM, 46 on Stanford-Binet,
34 on WAIS, 28 on AGCT, and 28 on MAT.

   2. Score breakdowns for 5,157,642 SAT testees during the period 1984-1989.

   3. GRE norming data from ETS.

   4. The IQ scale chart ("Selectivity by IQ") ascribed to Kjeld Hvatum.

   5. A description by Dr. Hoeflin of his Mega test norming process, as con-
tained in his letter to Mr. Maxim of December 19, 1995.  My renorming does not
attempt to theoretically analyze that norming process, but rather replicates
it, using the same data Dr. Hoeflin used; hence, it constitutes an audit of
his results.
              NOTE: This article has not been reviewed by, or approved by, Dr.
              Hoeflin.

Description.  The method Dr. Hoeflin used to norm his Mega test, based on 693
reports he received of prior test scores, can be illustrated in terms of his
largest prior score sample, consisting of 220 SAT scores.  (All statements
herein pertain to SAT before its recent recentering.)

   a) In order to norm the Mega test at the "4-sigma" level (IQ 164), it is
first necessary to determine which score level on SAT corresponds to "4-sigma"
in the general population, equivalent to the top .00333% of all individuals.

   b) Since SAT testees are slightly more intelligent than the population as
a whole, by a factor of about 4/3, their "4-sigma" percentile is raised to
.00444%, representing the top 1/22,500th of all SAT testees.

   c) This percentage is then applied to the total number of scores for the
period tabulated, indicating that 229 scores were at the "4-sigma" level or
above.

   d) Beginning with the top SAT score level of 1600 on Math plus Verbal (a
"perfect" score), one can then count downward 229 scores until the "4-sigma"
level is reached for this data aggregate.

   e) Dr. Hoeflin's norming assumed that 1550 on SAT represented its "4-sigma"
level.  However, his own data reveals the following numbers:

        SAT Score Range    No. of Scores
          1591-1600              35
          1581-1590               8
          1571-1580             149
          1561-1570              71

From this data, it can be interpolated that a score of 1565 on SAT represents
the "4-sigma" level; that is, all scores at 1565 or above number .00444% of
the total.

   f) It is now necessary, from inspection of the prior SAT
scores reported by Mega testees, to determine how many of them were at 1565 or
above.  Dr. Hoeflin presumed that his top ten SAT scorers were in the "4-sigma"
category; their scores were as follows: 1595, 1586, 1582, 1580, 1570, 1565,
1560, 1560, 1556, and 1555.  However, only six of these scores are at or above
1565, and so it is they which represent the true "4-sigma" group.

   g) Once it is determined that the top six SAT scores represent the "4-sigma"
category on this test, the next step is to pick out, among those 220 testees
reporting SAT scores, their top six Mega scores: these are 44, 44, 43, 43, 40,
and 39.  In other words, an implicit assumption is made that the SAT-reporting
group will attain the same number of "4-sigma" scores on the Mega test as they
did on SAT; hence, the lowest of those six top Mega scores represents the
assumed "4-sigma" threshold on Mega, for the SAT-reporting group.  This score
was 39.

   h) This putative "4-sigma" level on Mega will now have to be combined and
correlated with the "4-sigma" level derived from a similar procedure applied
to testees reporting GRE scores, to testees reporting CTMM scores, etc.  Then,
when all applicable "4-sigma" levels have been determined, a weighted average
can integrate them in terms of sample size, so as to arrive at one "4-sigma"
score level for the Mega test as a whole.
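
The SAT steps (a) through (g) above can be sketched in a few lines of Python.
This is only an illustration of the counting and interpolation described, not
a reproduction of Dr. Hoeflin's worksheet.  The function and variable names
are invented for the example, and the interpolation assumes that scores are
spread evenly within the ten-point band where the count runs out.

```python
# Illustrative sketch of steps (a)-(g); all names here are hypothetical.

TOTAL_SAT_TESTEES = 5_157_642      # SAT score reports, 1984-1989
FOUR_SIGMA_GENERAL = 1 / 30_000    # top .00333% of the general population
SAT_SELECTIVITY = 4 / 3            # factor from step (b)

# Top score bands from step (e): (low, high, number of scores), highest first.
TOP_BANDS = [(1591, 1600, 35), (1581, 1590, 8),
             (1571, 1580, 149), (1561, 1570, 71)]


def four_sigma_count(total, base_rate, selectivity):
    """Number of scores expected at or above 4-sigma among SAT testees."""
    return round(total * base_rate * selectivity)


def interpolate_threshold(bands, target):
    """Count down from the top band until `target` scores are reached.

    Assumes an even spread of scores inside the band where the count ends.
    """
    remaining = target
    for low, high, n in bands:
        if remaining <= n:
            width = high - low + 1
            return high - width * remaining / n
        remaining -= n
    raise ValueError("target exceeds the scores tabulated")


def kth_highest(scores, k):
    """The k-th highest value in `scores` (k = 1 is the maximum)."""
    return sorted(scores, reverse=True)[k - 1]


count = four_sigma_count(TOTAL_SAT_TESTEES, FOUR_SIGMA_GENERAL, SAT_SELECTIVITY)
threshold = interpolate_threshold(TOP_BANDS, count)

# Step (f): how many of the top ten reported SAT scores reach the threshold?
top_ten_sat = [1595, 1586, 1582, 1580, 1570, 1565, 1560, 1560, 1556, 1555]
qualifying = sum(1 for s in top_ten_sat if s >= round(threshold))

# Step (g): the same number of top Mega scores sets the Mega threshold.
top_six_mega = [44, 44, 43, 43, 40, 39]
mega_threshold = kth_highest(top_six_mega, qualifying)

print(count, round(threshold), qualifying, mega_threshold)
```

Run as written, this prints 229 1565 6 39, matching the figures in steps (c),
(e), (f), and (g).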

GRE Sample.  Dr. Hoeflin did not report to me the GRE (combined Math plus Ver-
bal) score level he equated with "4-sigma," but data emanating from ETS indi-
cates that this should be "1620."  The problem in attempting to norm a high-
IQ test on GRE scores at the 4-sigma level is that, around the late 1970's,
ETS reduced the ceiling on each GRE from 900 to 800, making it impossible for
any testee, from that point on, to record a "4-sigma" score.
                                                              A number of Dr.
Hoeflin's Mega testees apparently took the GRE before its ceiling restrictions
were imposed, since three of them reported scores of 1620 or above.  However,
since the Mega test was normed around 1986, it is possible that one or more of
the 106 testees reporting GRE scores was prevented, by the ceiling, from at-
taining a "4-sigma" score.  In order to "adjust" for this possibility, I have
increased the number of GRE scores at the "4-sigma" level from three to four.

The next step (as in the case of the SAT sample) is to count off the top four
Mega test scores attained by testees in the "GRE group"; these were as fol-
lows: 43, 43, 41, and 40.  This indicates that a raw score of 40 on the Mega
test corresponds to "4-sigma" (1620) in the GRE sample.

LAIT Sample.  This represented the next most numerous group, accounting for
11% of all prior score reports used by Dr. Hoeflin.  In his letter to me
dated December 19, 1995, Dr. Hoeflin wrote: "Of 77 (Mega testees) who reported
LAIT scores, 16 scored IQ 164 or higher on LAIT.  For this same sample of 77
people, the top 16 scored 33 or higher on my Mega test.  This puts the four
sigma level (for the Mega test at) 33..."
                                           But there is a problem here, that
Dr. Hoeflin has failed to examine critically.  To begin with, the incidence of
putative "4-sigma" scores in the LAIT sample (16 of 77) is far higher than
that of any other sample, and in percentage terms (20.78%) is over six thou-
sand times higher than the incidence of "4-sigma" in the general population
(.00333%).  This stems directly from the fact that LAIT is an inflationary
test, and generated numerous invalid "4-sigma" scores.
                                                        This distortion may
also be noted in the fact that, in order for 20.78% of any sample to score at
the "4-sigma" level or above, the entire sample would require a mean IQ over
3.2 sigma -- that is, over IQ 152, which is about ten points higher than the
general profile for Mega testees, and for LAIT testees as well.
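
The size of the mean IQ that such a sample would require can be estimated with
a short calculation.  The sketch below assumes that the sample is normally
distributed with the same standard deviation (16) as the general population;
that assumption, and the function name, are mine and are used for illustration
only.

```python
from statistics import NormalDist

SD = 16                        # assumed standard deviation of the sample
FOUR_SIGMA_IQ = 100 + 4 * SD   # IQ 164


def required_mean(threshold_iq, fraction_above, sd=SD):
    """Mean IQ a normal sample needs so that `fraction_above` of it
    scores at or above `threshold_iq`."""
    z = NormalDist().inv_cdf(1 - fraction_above)
    return threshold_iq - z * sd


lait_fraction = 16 / 77        # putative 4-sigma scores in the LAIT sample
mean_needed = required_mean(FOUR_SIGMA_IQ, lait_fraction)
sigma_needed = (mean_needed - 100) / SD

print(round(mean_needed, 1), round(sigma_needed, 2))
```

Under these assumptions the required mean comes out near IQ 151, or roughly
3.2 sigma, in line with the figure cited above.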

Dr. Hoeflin's Mega test data indicates that LAIT scores were, on average, 4
IQ points higher than Mega scores, which in turn were about one point higher
than CTMM scores, even though the CTMM was taken, on average, about a decade
earlier than Mega.  This means that LAIT scores were at least five points
higher, on average, than scores emanating from a professional test such as
CTMM, which was normed on about 60,000 members of the general population, not
on prior, anecdotally-reported test scores.  Furthermore, Dr. Hoeflin's data
indicates that 17 LAIT scores at the 4-sigma level and above were, on average,
eight IQ points higher than Mega test scores attained by the same testees,
with a mean time interval between tests of about six years.

                                             Now, when scores from an inflated
test (LAIT) are used uncritically to norm a subsequent test (Mega), the second
test tends to acquire some of the inflationary characteristics of its predeces-
sor, producing what might be called the "house of cards" syndrome -- that is,
one inaccuracy piled atop another.  To compensate for this factor, I deducted
four points from each reported LAIT IQ score used by Dr. Hoeflin, which re-
duced the number of putative "4-sigma" LAIT scores from 16 to six.  (This is
probably still too high, since six 4-sigma scores out of 77 is still 2300 times
greater than the incidence of 4-sigma in the population as a whole.)

Following this adjustment, the six highest Mega scores attained by the LAIT
testees
were as follows: 44, 43, 42, 41, 40, and 39, which accords very well with the
"4-sigma" level on the Mega test suggested by the SAT and GRE samples.  Hence,
the 4-sigma level for the LAIT sample on the Mega test is pegged at "39."
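
The incidence ratios quoted for the LAIT sample, before and after the
adjustment, can be checked as follows.  This is a minimal sketch that takes
the general-population incidence of 4-sigma as exactly 1 in 30,000; the
function name is hypothetical.

```python
GENERAL_RATE = 1 / 30_000   # 4-sigma incidence in the general population
LAIT_SAMPLE = 77            # Mega testees reporting LAIT scores


def times_population_rate(n_four_sigma, sample_size, base_rate=GENERAL_RATE):
    """How many times the sample's 4-sigma incidence exceeds the base rate."""
    return (n_four_sigma / sample_size) / base_rate


before = times_population_rate(16, LAIT_SAMPLE)   # as used by Dr. Hoeflin
after = times_population_rate(6, LAIT_SAMPLE)     # after deducting four points

print(round(before), round(after))
```

The first ratio is a little over six thousand and the second a little over
2,300, so even the adjusted sample remains far richer in "4-sigma" scores than
the general population.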

CTMM Sample.  Dr. Hoeflin's data notes only two Mega testees who reported
prior CTMM scores over "4-sigma," which on this test is 164 IQ; these scores
were 179 and 174.  Similarly, among the 75 Mega testees who reported CTMM
scores, the two highest Mega scores were both "43," indicating that this score
should be presumed to represent the "4-sigma" level on Mega.  But for some
reason unknown to me, Dr. Hoeflin did not use this figure as it stood; he
instead "adjusted" it downward to "40," which seems to have no statistical
justification.  Hence, I have used the original CTMM-generated Mega score of 43
in my renorming of Dr. Hoeflin's test.

Stanford-Binet Sample.  46 Mega testees reported prior scores on the Stanford-
Binet, including a hefty score of "230" attained by Marilyn vos Savant, who
also scored "46" on Mega.  For some reason, Dr. Hoeflin did not use Marilyn's
score in his Mega test norming -- perhaps because it is identifiably a youth-
ful score, but I have included it in my renorming.  With Marilyn's score in-
cluded, the S-B testees manifested a mean IQ of 146.9, about 7 IQ points high-
er than the CTMM testees, which suggests that there may have been other youth-
ful scores (aside from Marilyn's) included in this sample.
                                                            There are five
Stanford-Binet scores over 4-sigma, and the corresponding top five Mega test
scores (for the S-B group) are 46, 40, 34, 32, and 29.  This points toward 29
as the raw score on Mega which best corresponds to "4-sigma" in the norming
sample, but is ten points below the other indications.

WAIS Sample.  34 Mega testees reported prior WAIS scores, representing 4.9%
of the total scores used by Dr. Hoeflin.  Two of these scores were "4-sigma"
(164 and 162, since the standard deviation on WAIS is 15, which places
"4-sigma" at IQ 160), and the two top Mega scores attained by the WAIS testees
were 34 and 33.  This puts the "4-sigma" level on Mega at 33 for the WAIS
sample.

Cattell, AGCT, and MAT Samples.  Since there is no indication that any of the
testees in these samples reported a "4-sigma" score, they cannot be used for
norming the Mega test at the 4-sigma level.

Combining Sample Results.  The various "4-sigma" levels (on the Mega test) gen-
erated by each prior test sample cited above may now be combined into one, by
means of a "weighting" technique, so that the effect of each sample will be
proportional to its size; this is done by computing a weighted average, as fol-
lows:

     Prior Test    Per Cent of the   4-Sigma Level on Mega
     Utilized      Norming Sample    Test for this Sample    Product
      SAT                 31.7%            39                1,236.3
      GRE                 15.3             40                  612.0
      LAIT                11.0             39                  429.0
      CTMM                10.8             43                  464.4
      Stanford-B.          6.7             29                  194.3
      WAIS                 4.9             33                  161.7
                          80.4%                              3,097.7

The next step is to divide the sum of products (3,097.7) by the sum of weights
(80.4), so as to arrive at a quotient of 38.53, representing the putative "4-
sigma" level on the Mega test for this group of 531 testees; another group
might yield slightly different results.  It would also be helpful to repeat the
above norming process at the 3-sigma and 2-sigma levels, and then draw a smooth
curve through all points; however, it should be fairly obvious that it is the
norming at the 4-sigma level which most directly affects societies such as Pro-
metheus and Mega.
                   At this point, before equating "38.53" raw points on Mega
with 164 IQ on Stanford-Binet, a slight adjustment must be made to account for
"regression to the mean."  Since we are dealing with the very top scores, and
since a decade or more elapsed, on average, between taking the Mega test and
its predecessors, this group of testees should expect to score slightly lower
on the Mega than on their prior tests, all other factors being equal.  This
means that the "4-sigma" level we computed for the Mega test corresponds to
slightly less than "164 IQ" on the Stanford-Binet, and the simplest way to in-
corporate this adjustment is to set the "4-sigma" level on the Mega test at 39.
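
The weighting technique described in this section amounts to the following
computation.  It is a sketch using the sample weights and "4-sigma" levels
tabulated above; the names are illustrative, and the final rounding merely
stands in for the regression-to-the-mean adjustment, which the text applies by
judgment rather than by formula.

```python
# (prior test, per cent of norming sample, 4-sigma level on Mega for the sample)
SAMPLES = [
    ("SAT", 31.7, 39),
    ("GRE", 15.3, 40),
    ("LAIT", 11.0, 39),
    ("CTMM", 10.8, 43),
    ("Stanford-B.", 6.7, 29),
    ("WAIS", 4.9, 33),
]


def weighted_level(samples):
    """Weighted average of the per-sample 4-sigma levels.

    Returns (average, sum of weights, sum of products).
    """
    total_weight = sum(weight for _, weight, _ in samples)
    total_product = sum(weight * level for _, weight, level in samples)
    return total_product / total_weight, total_weight, total_product


level, weights, products = weighted_level(SAMPLES)

print(round(weights, 1), round(products, 1), round(level, 2), round(level))
```

The quotient comes out at roughly 38.5 raw points, which rounds to the level of
39 adopted above.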

If the "4-sigma" level on Mega is moved upwards by three raw points, from Ron
Hoeflin's value of "36" to "39," this also implies that the raw score corres-
ponding to the 4.75 sigma level must be moved up by a similar amount, from its
former setting of "43," to a new value of "46."

      Another important factor to be considered in appraising the accuracy of
the Mega test norming is whether or not it is generating too many IQ's at the
4.75 sigma level.  In 1992, the Wall Street Journal reported that 12 such
scores had resulted from 4,000 Mega test administrations, for a percentage in-
cidence of .3% -- but this is 3,000 times greater than the incidence of 4.75
sigma in the general population.

It can be shown that, in order for .3% of the sample to score at a mean (not a
threshold) IQ of "176," the entire sample would need a mean IQ of about 155.
However, Hoeflin's norming sample of 531 testees (82% of which consisted of
OMNI readers, and 18% of high-IQ society members) had a mean IQ somewhere in
the low 140's, while his overall sample of 4,000 (over 90% of whom were OMNI
readers) probably had a mean IQ below 140.  At a 141 IQ level (5 in 1,000),
five thousand tests would have to be administered to yield the expectation of
only one "176 IQ" score; this again leads me to believe that only one, or pos-
sibly two, valid "Mega-level" IQ's have ever been identified through the Mega
testing program.
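
The arithmetic behind the "3,000 times" and "five thousand tests" figures can
be sketched as follows.  This is one reading of the calculation, not a
definitive model: it assumes a normal distribution with a standard deviation
of 16, takes the 4.75-sigma incidence as one in a million, and treats every
testee as drawn from the portion of the population at or above IQ 141.  The
variable names are invented for the example.

```python
from statistics import NormalDist

SD = 16                       # assumed standard deviation
ONE_IN_A_MILLION = 1e-6       # 4.75-sigma incidence in the general population

# Reported incidence: 12 such scores from 4,000 administrations.
reported_rate = 12 / 4000
excess = reported_rate / ONE_IN_A_MILLION

# Share of the general population at or above IQ 141 (about 5 in 1,000).
share_at_141 = 1 - NormalDist(mu=100, sigma=SD).cdf(141)

# If all testees come from that group, the chance that any one of them is
# at the one-in-a-million level, and the tests needed to expect one such score.
chance_per_test = ONE_IN_A_MILLION / share_at_141
tests_per_expected_score = 1 / chance_per_test

print(round(excess), round(share_at_141, 4), round(tests_per_expected_score))
```

Under these assumptions the reported incidence is about 3,000 times the
population rate, and on the order of five thousand tests are needed to expect
a single score at that level.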

Here is a tabular summary showing (by sample type) the incidence of "4-sigma"
scores used in norming -- and in renorming -- the Mega test:

   Name of     Size of Norm-  ---Incidence of "4-Sigma" Scores---
   Prior Test   ing Sample    As Used by Hoeflin  -As Corrected--
                               No.    Per Cent    No.    Per Cent
   SAT            220          10        4.5%       6      2.7%
   GRE            106           ?         ?         4      3.8
   LAIT            77          16       20.8        6      7.8
   CTMM            75           2        2.7        2      2.7
   Stanford-B.     46           4        8.7        5     10.9
   WAIS            34           2        5.9        2      5.9
   Cattell      )
   AGCT         )
   MAT          )    These tests were not used for norming at the 4-sigma
   Miscellaneous)     level, since no "4-sigma" scores were reported.
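
The percentage columns in this table follow directly from the counts and
sample sizes.  The sketch below recomputes them; the data layout and names are
chosen for illustration, and the GRE count used by Dr. Hoeflin is left
unknown, as in the table.

```python
# (prior test, sample size, 4-sigma count as used by Hoeflin, as corrected)
ROWS = [
    ("SAT", 220, 10, 6),
    ("GRE", 106, None, 4),      # Hoeflin's GRE count was not reported
    ("LAIT", 77, 16, 6),
    ("CTMM", 75, 2, 2),
    ("Stanford-B.", 46, 4, 5),
    ("WAIS", 34, 2, 2),
]


def pct(count, size):
    """Incidence as a percentage to one decimal place, or None if unknown."""
    return None if count is None else round(100 * count / size, 1)


table = {name: (pct(used, n), pct(corrected, n))
         for name, n, used, corrected in ROWS}

for name, (used, corrected) in table.items():
    print(f"{name:<12} {'?' if used is None else used:>6} {corrected:>6}")
```

The LAIT row stands out in either column, which is the anomaly taken up next.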

In the above Table, the one glaring anomaly is the extremely high incidence of
"4-sigma" scores (20.8%) attributed to the LAIT norming sample.
But in his letter to me of December 19, 1995, Dr. Hoeflin stated as follows:

  "Of 25,000 (testees) who took the LAIT, let's say 625 scored at or above the
  four sigma level.  Of 4,000 who took my Mega test, almost exactly 100 scored
  at or above the four sigma level.  If we compare these ratios, we find that
  they are identical, namely: 625/25,000 = 100/4,000 = 1/40." (This equals 2.5%)

Herein lies another conundrum: If Dr. Hoeflin believes it reasonable for 2.5% of
all LAIT testees to score "4-sigma," why did he uncritically use (for purposes
of norming the Mega test) a LAIT sample whose incidence of "4-sigma" scores was
eight times greater, and did he not realize that this would inflate his Mega
test norming?

In the March 1996 issue of Dr. Hoeflin's magazine, The Puzzler (p. 16),
Mr. Langdon noted that only one out of three of the LAIT testees he originally
"qualified" for admission into the Four Sigma Society had true 164 IQ's.  In ad-
dition, Dr. Hoeflin noted, "Langdon's second norming actually lowered everyone's
IQ at the upper levels by 5 IQ points compared with his first norming."  In
other words, both Langdon and Hoeflin were well aware of the inflationary ef-
fects of LAIT on testee IQ assessments.
                                         Going back even further in time (1986),
Dr. Hoeflin stated, in Gift of Fire: "I do not trust the norming of the Langdon
test, and would prefer that we (i.e., the Prometheus Society) adopt a more strin-
gent norming procedure...Inflated IQ standards that are not in harmony with real-
world facts strike me as dishonest...The Mega Society likewise has far more mem-
bers than its purported one-in-a-million standard warrants."  But if Dr. Hoeflin
advocated "more stringent norming procedures" for the LAIT in 1986, one wonders
why he didn't use "more stringent procedures" to norm his own Mega test around
the same time, particularly as regards his casual acceptance of anomalous "4-
sigma" levels in his LAIT-derived sample?
                                           To answer my own question, the only
reason I can see why Dr. Hoeflin did not utilize the "more stringent" norming
standards he recommended to Mr. Langdon is that this would have reduced the
membership in Dr. Hoeflin's Mega Society to two persons.  In other words, Dr.
Hoeflin's 1986 statement in Gift of Fire was completely correct -- too bad he
didn't stick to it!
