Translating surveys to surveillance on social media: methodological challenges & solutions

Chao Yang; Padmini Srinivasan

doi:10.1145/2615569.2615696

Conference proceeding

Proceedings of the 2014 ACM conference on web science, pp.4-12

WebSci '14

06/23/2014

Abstract

Passive surveillance of preferences, opinions and behaviors on social media is becoming increasingly common. The general goal is to make inferences from observations collected from the numerous posts publicly available in blogs, microblogs, and other social forums. A traditional approach to collecting observations is to query a random (or convenience) sample of individuals with surveys. A wide variety of well-respected survey instruments have been developed over many decades, especially in the social sciences. The question addressed here is: how does one 'translate' a survey of interest into surveillance strategies on social media? Specifically, how does one find the posts that could be interpreted as valid responses to the survey? Developing a general methodology for translating a survey into social media surveillance might further the inclusion of social media research in traditional social science research. We propose a translation methodology using a well-reputed survey (the Satisfaction with Life Scale) as an example. A second methodological contribution, which goes beyond the survey translation focus, is a crowdsourcing approach which, we claim with reasonable confidence, finds close to all the relevant items in a dataset. This is different from the standard approach of asking workers to annotate all items in a small dataset. Our method supports more accurate evaluations (i.e., more precise recall calculations) as well as the development of larger training datasets. Finally, the resulting surveillance method derived from the life satisfaction survey achieves recall, precision and F scores between 0.59 and 0.65. This is considerably better than standard methods using lexicons (precision around 0.16) or classifiers (precision, recall and F scores between 0.32 and 0.38).

crowdsourcing

information retrieval

life satisfaction

Details

Title: Translating surveys to surveillance on social media: methodological challenges & solutions
Creators: Chao Yang; Padmini Srinivasan
Resource Type: Conference proceeding
Publication Details: Proceedings of the 2014 ACM conference on web science, pp.4-12
Series: WebSci '14
DOI: 10.1145/2615569.2615696
Publisher: ACM
Grant note: name: Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI / NBC), award: D12PC00285
Language: English
Date published: 06/23/2014
Academic Unit: Nursing; Computer Science; Business Analytics
Record Identifier: 9984003189402771
