Web Survey Bibliography
Title Predictive inference for non-probability samples: a simulation study
Author Buelens, B.; Burger, J.; van den Brakel, J.
Source Statistics Netherlands (2015)
Year 2016
Access date 07.02.2016
Full text pdf (2.3 MB)
Abstract Non-probability samples provide a challenging source of information for official statistics, because the data generating mechanism is unknown. Making inference from such samples therefore requires a novel approach compared with the classic approach of survey sampling. Design-based inference is a powerful technique for random samples obtained via a known survey design, but cannot legitimately be applied to non-probability samples such as big data and voluntary opt-in panels. We propose a framework for such non-probability samples based on predictive inference. Three classes of methods are discussed. Pseudo-design-based methods are the simplest and apply traditional design-based estimation despite the absence of a survey design; model-based methods specify an explicit model and use that for prediction; algorithmic methods from the field of machine learning produce predictions in a non-linear fashion through computational techniques. We conduct a simulation study with a real-world data set containing annual mileages driven by cars for which a number of auxiliary characteristics are known. A number of data generating mechanisms are simulated, and, in the absence of a survey design, a range of methods for inference are applied and compared to the known population values. The first main conclusion from the simulation study is that unbiased inference from a selective non-probability sample is possible, but access to the variables explaining the selection mechanism underlying the data generating process is crucial. Second, exclusively relying on familiar pseudo-design-based methods is often too limited. Model-based and algorithmic methods of inference are more powerful in situations where data are highly selective.
Thus, when considering the use of big data or other non-probability samples for official statistics, the statistician must attempt to obtain auxiliary variables or features that could explain the data generating mechanism, and in addition must consider the use of a wider variety of methods for predictive inference than those in typical use at statistical agencies today.
Access/Direct link Statistics Netherlands (Abstract) / (Full text)
Year of publication 2015
Bibliographic type Reports, seminars
Web survey bibliography - Statistics Netherlands (11)
- Establishing the accuracy of online panels for survey research; 2016; Bruggen, E.; van den Brakel, J.; Krosnick, J. A.
- Predictive inference for non-probability samples: a simulation study; 2016; Buelens, B.; Burger, J.; van den Brakel, J.
- The use of within-subject experiments for estimating measurement effects in mixed-mode surveys; 2014; Klausch, L. T.; Schouten, B.; Hox, J.
- Measuring well-being: An analysis of different response scales; 2014; van Beuningen, J.; van der Houwen, K.; Moonen, L.
- The impact of contact effort and interviewer performance on mode-specific nonresponse and measurement...; 2014; Schouten, B.; Cobben, F.; van der Laan, J.; Arends, J.
- Adaptive survey designs to minimize survey mode effects. A case study on the Dutch Labour Force Survey...; 2013; Calinescu, M.; Schouten, B.
- Using response probabilities for assessing representativity; 2012; Bethlehem, J.
- Disentangling Mode-Specific Selection and Measurement Bias in Social Surveys; 2012; Buelens, B.; van der Laan, J.; Schouten, B.; Klausch, L. T.; van den Brakel, J.; Burger, J.
- Inference in surveys with sequential mixed-mode data collection; 2011; Buelens, B.; van den Brakel, J.
- The rise of survey sampling; 2009; Bethlehem, J.
- How accurate are self-selection web surveys?; 2008; Bethlehem, J.