Web Survey Bibliography

Title Balancing Twitter data with survey information to predict electoral outcomes
Year 2017
Access date 11.04.2017

Relevance and research question: In recent years, social networks have increasingly been used to study political opinion formation, monitor electoral campaigns and predict electoral outcomes. The main problem with these studies is that data from social networks are usually not a representative sample of the whole population; for example, people using social media are generally young. In this paper we contribute to overcoming sample biases by balancing Twitter data with information from traditional surveys, with the aim of nowcasting and predicting the outcome of a constitutional referendum that recently took place in Italy.

Methods & data: The data used in this research are collected from two different sources. First, using the Twitter API, we collect tweets expressing voting intentions during the four weeks before the referendum, obtaining approximately one million tweets. Second, we use data from a traditional survey containing people’s voting intentions and demographic information such as gender and year of birth. On the first set of data we perform a sentiment analysis as proposed by Hopkins and King (2010) to study the voting intentions of Twitter users. Then, to improve the social media forecast, we derive an appropriate set of weights based on the survey’s information, providing an efficient approach to balance the sample’s characteristics and adjust the forecast.
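The abstract does not say how the weights are derived. As a purely illustrative sketch of one common approach, the Python snippet below applies post-stratification: each demographic cell is weighted by the ratio of its share in the survey to its share in the Twitter sample. The function names, the two age cells and the toy data are all hypothetical, and the sketch assumes that a demographic cell can be assigned to each Twitter user, which the abstract does not state.

```python
from collections import Counter


def poststratification_weights(sample_cells, target_shares):
    """Return one weight per cell: target share divided by sample share.

    sample_cells: list with the demographic cell of each sampled unit.
    target_shares: dict mapping cell -> share of that cell in the reference
    survey (hypothetical values here, not taken from the study).
    """
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return {
        cell: target_shares.get(cell, 0.0) / (count / n)
        for cell, count in counts.items()
    }


def weighted_yes_share(records, weights):
    """Weighted share of 'yes' among (cell, vote) records."""
    total = sum(weights[cell] for cell, _ in records)
    yes = sum(weights[cell] for cell, vote in records if vote == "yes")
    return yes / total


# Hypothetical toy data: young users are over-represented on Twitter.
twitter = (
    [("young", "yes")] * 6
    + [("young", "no")] * 2
    + [("old", "yes")] * 1
    + [("old", "no")] * 1
)
survey_shares = {"young": 0.4, "old": 0.6}  # assumed survey composition

weights = poststratification_weights([cell for cell, _ in twitter], survey_shares)
raw_yes = sum(1 for _, vote in twitter if vote == "yes") / len(twitter)
adjusted_yes = weighted_yes_share(twitter, weights)

print(f"raw yes share: {raw_yes:.2f}, adjusted yes share: {adjusted_yes:.2f}")
```

In this toy example the unweighted "yes" share is pulled towards the over-represented young cell, and reweighting moves it towards the value implied by the assumed survey composition.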

Moreover, we perform a topic modelling analysis using a Latent Dirichlet Allocation (LDA) model to extract frequent topics and keywords from the Twitter data.

Results: The results show that we are able to correctly predict the outcome of the referendum, and that the combined approach compares favourably with predictions obtained by using data from social media and traditional surveys separately.

Moreover, we find that voting “yes” in the referendum is associated with positive words such as future and change, while voting “no” is associated with words such as fear and risk.

Added value: The comparative advantage of our study is that combining data from social media with traditional survey data allows us to net out most of the problems related to sample selection bias that commonly arise when analysing only online sources.

Year of publication 2017
Bibliographic type Conferences, workshops, tutorials, presentations
