Web Survey Bibliography

Title Objectivity, Reliability, and Validity of Search Engine Count Estimates
Year 2008
Access date 03.12.2011

Count estimates ("hits") provided by Web search engines have received much attention as a yardstick for measuring a variety of phenomena as diverse as language statistics, the popularity of authors, or similarity between words. Common to these activities is the intention to use Web search engines not only for search but also for ad hoc measurement. Using search engine count estimates (SECEs) in this way means that a phenomenon of interest, e.g., the popularity of an author, is conceived of as a measurand, and SECEs are taken to be its quantitative measures. However, the data quality of SECEs has not yet been studied systematically, and concerns have been raised against the use of this kind of data. This article examines the data quality of SECEs, focusing on the classical goodness criteria of objectivity, reliability, and validity. The results of a series of studies indicate that, with the exception of Boolean queries that use disjunction or negation, objectivity as well as test-retest reliability and parallel-test reliability of SECEs are good for most types of browsers and search engines examined. Estimating validity required model development (all-subsets regression), which yielded satisfactory results using an exploratory approach to feature selection. The findings are discussed in the light of previous objections, and perspectives for using Web search count estimates are delineated.

Access/Direct link

Journal Homepage (abstract) / (full text)

Year of publication 2008
Bibliographic type Journal article