Web Survey Bibliography

Title Statistical Estimation of Word Acquisition With Application to Readability Prediction
Year 2011
Access date 27.06.2013

Models of language learning play a central role in a wide range of applications: from psycholinguistic theories of how people acquire new word knowledge, to information systems that can automatically match content to users’ reading ability. Traditional methods for estimating word acquisition ages or content readability are typically based on linear regression over a small number of summary features derived from time-consuming user studies or costly expert judgments. With the increasing amounts of content available from the web and other sources, however, new statistical approaches are possible that can exploit this easily acquired data to learn more flexible, fine-grained models of language usage. We present a novel statistical model for document readability that is based on the logistic Rasch model and the quantiles of word acquisition age distributions. We use this model to estimate the distributions of word acquisition ages from empirical readability data collected from the web. We then demonstrate that the estimated acquisition distributions are very effective in predicting both global and local document readability. We also compare the estimated distributions with word acquisition data from existing oral studies, revealing interesting historical trends as well as differences between oral and written word acquisition grade levels.
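As a rough illustration of the ideas named in the abstract, the following minimal sketch combines the standard Rasch (one-parameter logistic) model with quantiles of word acquisition grades. It is not the paper's actual formulation: the function names, the per-word `difficulty` parameter, and the two quantile thresholds are assumptions introduced here for illustration only.

```python
import math


def rasch_probability(ability, difficulty):
    """Standard Rasch model: chance that a reader of the given ability
    (here, a grade level) knows a word of the given difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def acquisition_grade(difficulty, quantile=0.5):
    """Grade at which a fraction `quantile` of readers knows the word.
    This is the inverse of the logistic curve above (an assumed reading
    of 'quantile of the acquisition age distribution')."""
    return difficulty + math.log(quantile / (1.0 - quantile))


def document_grade(word_difficulties, word_quantile=0.8, doc_quantile=0.7):
    """Hypothetical document-level rule: the lowest grade at which at least
    `doc_quantile` of the document's words have been acquired by
    `word_quantile` of readers."""
    grades = sorted(acquisition_grade(d, word_quantile) for d in word_difficulties)
    index = max(0, math.ceil(doc_quantile * len(grades)) - 1)
    return grades[index]
```

For example, `document_grade([1.0, 2.0, 3.0, 4.0], word_quantile=0.5, doc_quantile=0.5)` picks the second-lowest word acquisition grade, since half of the four words must be known. In this sketch a harder text is simply one whose words are acquired later, and the thresholds control how strict the notion of "readable" is.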

Access/Direct link

Journal Homepage (abstract) / (full text)

Year of publication 2011
Bibliographic type Journal article
