Translating surveys to surveillance on social media: methodological challenges & solutions
Conference proceeding

Chao Yang and Padmini Srinivasan
Proceedings of the 2014 ACM conference on web science, pp.4-12
WebSci '14
06/23/2014
DOI: 10.1145/2615569.2615696

Abstract

Passive surveillance of preferences, opinions and behaviors on social media is becoming increasingly common. The general goal is to make inferences from observations collected from the numerous posts publicly available in blogs, microblogs, and other social forums. A traditional approach to collecting observations is to query a random (or convenience) sample of individuals with surveys. A wide variety of well-respected survey instruments have been developed over many decades, especially in the social sciences. The question addressed here is: how does one 'translate' a survey of interest into surveillance strategies on social media? Specifically, how does one find the posts that could be interpreted as valid responses to the survey? Developing a general methodology for translating a survey into social media surveillance might further the inclusion of social media research in traditional social science research. We propose a translation methodology using a well-reputed survey (the Satisfaction with Life Scale) as an example. A second methodological contribution, which goes beyond the survey translation focus, is a crowdsourcing approach that, we claim with reasonable confidence, finds close to all the relevant items in a dataset. This differs from the standard approach of asking workers to annotate all items in a small dataset. Our method supports more accurate evaluations (i.e., more precise recall calculations) as well as the development of larger training datasets. Finally, the resulting surveillance method derived from the life satisfaction survey achieves recall, precision and F scores between 0.59 and 0.65. This is considerably better than standard methods using lexicons (precision around 0.16) or classifiers (precision, recall and F scores between 0.32 and 0.38).
crowdsourcing; information retrieval; life satisfaction
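
The abstract reports precision, recall and F scores for retrieving posts that count as valid survey responses. As a minimal sketch of how such scores are conventionally computed, the snippet below compares a set of retrieved post IDs against a set of relevant post IDs. It assumes the F score is the balanced F1 measure, which the abstract does not state; the function name and the toy post IDs are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for a retrieved set vs. a relevant set.

    Assumption: F is the balanced F1 (harmonic mean of precision and recall).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)

    # Precision: fraction of retrieved posts that are actually relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant posts that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0

    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical toy data: post IDs returned by a surveillance method,
# and post IDs judged relevant by annotators.
retrieved_posts = {1, 2, 3, 4, 5}
relevant_posts = {2, 3, 5, 8}

p, r, f = precision_recall_f1(retrieved_posts, relevant_posts)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Recall depends on knowing the full set of relevant items, which is why an annotation approach that finds close to all relevant items in a dataset makes the recall figure more trustworthy.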
