
Web Survey Bibliography

Title Mapping the Field of Automated Data Collection in the Web. Data Types, Collection Approaches and their Research Logic.
Year 2017
Access date 07.04.2017

Relevance & Research Question: Online communication makes the interaction of individuals, organizations and companies visible, because this interaction leaves data trails or even consists of data itself. It is no surprise, therefore, that social scientists work intensively on collecting online data. How the techniques used can be located methodologically and epistemologically, however, is still unclear. This uncertainty is also reflected in the variety of proposed terms, such as Computational Social Science (Lazer et al., 2009), Web Mining (Thelwall, 2009), or Digital Methods (Rogers, 2010). Furthermore, data collection methodology forms a diverse landscape of data types, collection methods and data providers (Keyling/Jünger 2016). The paper asks which methodological challenges and opportunities result from different types of data and collection methods.

Methods & Data: Based on experience with collecting online data in political communication research, the paper discusses three approaches to automated data collection and illustrates them with examples: raw data, application programming interfaces and user interfaces. Each approach is analyzed along seven methodological dimensions: research object, analysis perspective, data level, abstraction, reactivity, structuring and availability.

Results: The analysis yields a classification scheme that helps identify specific methodological opportunities and challenges. Comparing the approaches makes clear that the data never speak for themselves. However, standards appear to be lacking, for example regarding the reliability and validity of the database or the description of the procedure, and this gap is sometimes brushed aside by pointing to the sheer size of the datasets. In contrast, working with smaller datasets may be more valuable under certain conditions.

Added Value: The paper contributes to the ongoing discussion by systematically mapping the landscape of automated data collection methods on the web. It highlights the need to address quality criteria in Computational Social Science.

Year of publication 2017
Bibliographic type Conferences, workshops, tutorials, presentations
