Donald Hayes - Lexical Analysis

     This website describes D. P. Hayes’ continuing program of research on natural texts, which began in 1980. The focus has been on the relative ‘accessibility’ of any English-language text, as measured by the LEX statistic. Underlying LEX is a model based on patterns of word choice; the statistic, determined by software (QANALYSIS), is the departure of an edited text from the model’s lognormal statistical distribution. Several experimental and comparative validation studies are described. Samples representing the full spectrum of natural texts, from pre-primers and talk with animals to technical scientific reports, are contained in Cornell Corpus 2000 (N = 5000+ texts). Newspapers have remained close to LEX = 0.0 since 1665, making them a familiar reference level for interpreting any text’s accessibility. LEX meets the standards required of natural science measures: validity, reliability, robustness, stability over centuries, and precision. Empirically, natural texts range from LEX = -85 to +58. A number of substantive studies, described briefly, show the range of LEX’s applications, both applied and theoretical.

See also: Lexical Demand Levels of Schoolbooks: A Corpus

This page was last updated Jun 26, 2006.