Stanford-Binet Intelligence Scale Review

By Jacqueline Chinappi

The Stanford-Binet Intelligence Scale is an intelligence test that measures cognitive abilities in children and adults. It has undergone many changes over the years. This review looks at its advantages, disadvantages, reliability, and comparisons with other intelligence tests.

Critical Review of Stanford-Binet Intelligence Scale

The Stanford-Binet Intelligence Scale is a test that measures and evaluates intelligence in children and adults. It has gone through many revisions over a period of almost 100 years, and the most recent edition is the Stanford-Binet Intelligence Scales, Fifth Edition (SB5). Advantages of the Fifth Edition are that it is “more game like than earlier versions with colorful artwork, toys, and manipulative; matches norms to 2000 U.S. Census; contains nonverbal as well as verbal routing test; contains both a general composite score and several factor scores; shares items to maintain continuity with earlier versions; covers age range of 2 through 85+; change-sensitive scores allow for evaluation of extreme performance; has easel format with directions, scoring criteria, and stimuli, for easy administration; has equal balance of verbal and nonverbal content in all factors; contains Nonverbal IQ; has standard deviation of 15 for composite scores, allowing easy comparison with other tests; M = 10, SD = 3 for subtests; uses adaptive testing (routing) to economize on administration time and reduce examinee frustration; uses explicit theoretical framework as guide for item development and alignment of subtests within modeled hierarchy; extends low-end items, allowing earlier identification of individuals with delays or cognitive difficulties; extends high-end items to measure gifted adolescents and adults.” (Becker, 2003)

Reliability & Comparisons

Methods used to estimate reliability included the split-half method, test-retest reliability, and interscorer agreement. Subtest reliability coefficients were computed with the split-half method and then corrected with the Spearman-Brown formula. Average reliability coefficients ranged from .85 to .89 for the nonverbal subtests and from .84 to .89 for the verbal subtests, both strong results. (Johnson & D’Amato, 2004)

“Reliability for the IQ and Factor Index scores was computed using ‘the formula for a reliability of a sum of multiple tests’ (technical manual, p.63).” (Johnson & D’Amato, 2004) The average reliability coefficients were .98 for the Full Scale IQ, .95 for the Nonverbal IQ, .96 for the Verbal IQ, and .91 for the Abbreviated Battery IQ. For the Factor Index scores, the coefficients were .90 for Fluid Reasoning, .92 for Knowledge, .92 for Quantitative Reasoning, .92 for Visual-Spatial Processing, and .91 for Working Memory. (Johnson & D’Amato, 2004)
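The technical manual's exact formula is not reproduced here, so the Python sketch below assumes one common form of the reliability of a sum of tests: one minus the summed subtest error variances divided by the variance of the composite. The function name and all input values are hypothetical, chosen only to show how the calculation works.

```python
def composite_reliability(
    sds: list[float],
    reliabilities: list[float],
    intercorr: list[list[float]],
) -> float:
    """Reliability of an unweighted sum of subtest scores (assumed formula).

    sds[i]: standard deviation of subtest i
    reliabilities[i]: reliability coefficient of subtest i
    intercorr[i][j]: correlation between subtests i and j (diagonal ignored)
    Assumes measurement errors are uncorrelated across subtests.
    """
    k = len(sds)
    # Composite variance = sum of every entry of the subtest covariance matrix.
    composite_var = sum(
        sds[i] ** 2 if i == j else intercorr[i][j] * sds[i] * sds[j]
        for i in range(k)
        for j in range(k)
    )
    # Each subtest contributes error variance var_i * (1 - r_ii).
    error_var = sum(sd ** 2 * (1 - r) for sd, r in zip(sds, reliabilities))
    return 1 - error_var / composite_var


# Hypothetical example: two subtests on the SB5 subtest metric (SD = 3)
# with reliabilities .85 and .89 and an assumed intercorrelation of .60.
print(composite_reliability([3, 3], [0.85, 0.89], [[1, 0.6], [0.6, 1]]))
```

With these invented numbers the composite reliability is about .92, higher than either subtest's. Composite variance grows with the subtest intercorrelations while error variance does not, which helps explain why the IQ and Factor Index coefficients above exceed the subtest coefficients.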

Across all age levels, the standard errors of measurement (SEMs) were 2.30 for the Full Scale IQ, 3.26 for the Nonverbal IQ, and 3.05 for the Verbal IQ. Interscorer agreement correlations ranged from .74 to .97, indicating sufficient agreement between scorers. (Kush, 2004)
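The standard error of measurement follows from a score's standard deviation and reliability through the classical formula SEM = SD × √(1 − r). The Python sketch below applies it to the average reliabilities reported above, on the composite metric with SD = 15. The results (about 2.12, 3.35, and 3.00) are close to, but not identical with, the reported SEMs; since the reported figures were aggregated across age levels, exact agreement should not be expected. The function names and the examinee score of 100 are illustrative assumptions.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def confidence_band(observed: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Symmetric band of observed +/- z * SEM (z = 1.96 for about 95%)."""
    return observed - z * sem, observed + z * sem


# SEMs implied by the average reliabilities reported above (IQ metric, SD = 15).
for label, r in [("Full Scale IQ", 0.98), ("Nonverbal IQ", 0.95), ("Verbal IQ", 0.96)]:
    print(f"{label}: SEM = {standard_error_of_measurement(15, r):.2f}")

# Hypothetical examinee with a Full Scale IQ of 100, using the reported SEM of 2.30.
low, high = confidence_band(100, 2.30)
print(f"95% band: {low:.1f} to {high:.1f}")
```

For this hypothetical examinee the band runs from about 95.5 to 104.5. Test manuals sometimes center such bands on an estimated true score rather than the observed score; the symmetric band here is the simplest version.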

The Stanford-Binet technical manual describes a study in which 104 participants took both the Fifth Edition and the Fourth Edition. Full Scale scores on the two editions correlated at .90, indicating strong criterion-related validity. The mean SB5 Full Scale IQ (107.9) was 3.5 points lower than the mean SB4 Composite score (111.4). This difference is consistent with the Flynn effect, the rise in measured intelligence over time that causes tests with older norms to yield higher scores. (Kush, 2004)

Studies comparing the Stanford-Binet Fifth Edition with the Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R) (r = .83), the Wechsler Intelligence Scale for Children-Third Edition (WISC-III) (r = .84), the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) (r = .82), and the Woodcock-Johnson III Tests of Cognitive Abilities (r = .78) provided additional evidence of criterion-related validity. (Kush, 2004)
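One standard way to read these validity coefficients is to square them. The squared correlation gives the proportion of score variance the two tests share. The Python sketch below applies that conversion to the correlations reported above; the dictionary labels and the function name are illustrative.

```python
# Correlations between the SB5 and other tests, as reported by Kush (2004).
criterion_correlations = {
    "WPPSI-R": 0.83,
    "WISC-III": 0.84,
    "WAIS-III": 0.82,
    "WJ III Cognitive": 0.78,
}


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share (coefficient of determination)."""
    return r ** 2


for test_name, r in criterion_correlations.items():
    print(f"{test_name}: r = {r:.2f}, shared variance = {shared_variance(r):.0%}")
```

The shared variance runs from roughly 61 percent (Woodcock-Johnson III) to 71 percent (WISC-III). The tests overlap substantially, but they are not interchangeable.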

