Journal article
Statistical Decision Making for Optimal Budget Allocation in Crowd Labeling
Journal of machine learning research, Vol.16, pp.1-46
01/01/2015
Abstract
It has become increasingly popular to obtain machine learning labels through commercial crowdsourcing services. The crowdsourcing workers, or annotators, are paid for each label they provide, but the task requester usually has only a limited budget. Since the data instances have different levels of labeling difficulty and the workers have different reliability for the labeling task, it is desirable to allocate the budget wisely among all the instances and workers so that the overall labeling quality is maximized. In this paper, we formulate the budget allocation problem as a Bayesian Markov decision process (MDP), which simultaneously conducts learning and decision making. The optimal allocation policy can be obtained by using the dynamic programming (DP) recurrence. However, DP quickly becomes computationally intractable as the size of the problem increases. To address this challenge, we propose a computationally efficient approximate policy called the optimistic knowledge gradient. Our method applies to both pull crowdsourcing marketplaces with homogeneous workers and push marketplaces with heterogeneous workers. It can also incorporate contextual information about the instances when it is available. Experiments on both simulated and real data show that our policy achieves a higher labeling quality than existing policies at the same budget level.
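The abstract names the approach without spelling it out, so the following is only an illustrative sketch of what a knowledge-gradient-style allocation loop might look like, not the authors' algorithm as published. It assumes binary labels, homogeneous and perfectly reliable workers, and an independent Beta(1, 1) prior on each instance's probability of receiving a positive label. The function names (`prob_positive`, `expected_accuracy`, `optimistic_gain`, `allocate`), the deterministic tie-breaking, and the simulated labels are all hypothetical choices made for this example.

```python
import math
import random


def prob_positive(a: int, b: int) -> float:
    """P(theta >= 0.5) for theta ~ Beta(a, b), with positive integer a and b."""
    n = a + b - 1
    # For integer parameters the Beta tail at 0.5 reduces to a binomial sum.
    return sum(math.comb(n, j) for j in range(a)) / 2 ** n


def expected_accuracy(a: int, b: int) -> float:
    """Posterior probability that the more likely class is the correct one."""
    p = prob_positive(a, b)
    return max(p, 1.0 - p)


def optimistic_gain(a: int, b: int) -> float:
    """Best-case one-step gain in expected accuracy from one more label."""
    now = expected_accuracy(a, b)
    gain_if_positive = expected_accuracy(a + 1, b) - now
    gain_if_negative = expected_accuracy(a, b + 1) - now
    return max(gain_if_positive, gain_if_negative)


def allocate(true_thetas, budget, seed=0):
    """Spend `budget` labels one at a time; return Beta posteriors [a, b]."""
    rng = random.Random(seed)
    posteriors = [[1, 1] for _ in true_thetas]  # assumed uniform prior
    for _ in range(budget):
        # Request a label for the instance with the largest optimistic gain
        # (ties go to the lowest index, an assumption of this sketch).
        i = max(range(len(posteriors)),
                key=lambda k: optimistic_gain(*posteriors[k]))
        # Simulated worker response, drawn from the hypothetical true theta.
        if rng.random() < true_thetas[i]:
            posteriors[i][0] += 1
        else:
            posteriors[i][1] += 1
    return posteriors


if __name__ == "__main__":
    thetas = [0.9, 0.55, 0.1, 0.5]  # made-up instance difficulties
    for theta, (a, b) in zip(thetas, allocate(thetas, budget=20)):
        print(theta, a + b - 2, "labels, predict", 1 if a >= b else 0)
```

Under these assumptions, each step costs only a pass over the instances, in contrast to solving the full DP recurrence over all posterior states.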
Details
- Title: Subtitle
- Statistical Decision Making for Optimal Budget Allocation in Crowd Labeling
- Creators
- Xi Chen - NYU, Stern Sch Business, New York, NY 10012 USA
- Qihang Lin - Univ Iowa, Tippie Coll Business, Iowa City, IA 52242 USA
- Dengyong Zhou - Microsoft Res, Redmond, WA 98052 USA
- Resource Type
- Journal article
- Publication Details
- Journal of machine learning research, Vol.16, pp.1-46
- Publisher
- Microtome Publ
- ISSN
- 1532-4435
- eISSN
- 1533-7928
- Number of pages
- 46
- Language
- English
- Date published
- 01/01/2015
- Academic Unit
- Business Analytics
- Record Identifier
- 9984380412602771