Conference proceeding
Inferring Passengers' Interactive Choices on Public Transits via MA-AL: Multi-Agent Apprenticeship Learning
WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), pp.1637-1647
01/01/2020
DOI: 10.1145/3366423.3380235
Abstract
Public transports, such as subway lines and buses, offer affordable ride-sharing services and reduce the road network traffic. Extracting passengers' preferences from their public transit choices is important to city planners but technically non-trivial. When traveling by taking public transits, passengers make sequences of transit choices, and their rewards are usually influenced by other passengers' choices. This process can be modeled as a Markov Game (MG). In this paper, we make the first effort to model travelers' preferences of making transit choices using MGs. Based on the discovery that passengers usually do not change their policies, we propose novel algorithms to extract reward functions from the observed deterministic equilibrium joint policy of all agents in a general-sum MG to infer travelers' preferences. First, we assume we have the access to the entire joint policy. We characterize the set of all reward functions for which the given joint policy is a Nash equilibrium policy. In order to remove the degeneracy of the solution, we then attempt to pick reward functions so as to maximize the sum of the deviation between the the observed policy and the sub-optimal policy of each agent. This results in a skillfully solvable linear programming algorithm for the multi-agent inverse reinforcement learning (MA-IRL) problem. Then, we deal with the case where we have access to the equilibrium joint policy through a set of actual trajectories. We propose an iterative algorithm inspired by single-agent apprenticeship learning algorithms and the cyclic coordinate descent approach. We evaluate the proposed algorithms on both a simple Grid Game and a unique real-world dataset (from Shenzhen, China). Results show that when we have access to the full policy, our algorithm can efficiently recover most of the reward structure, especially the interaction of agents. In the case where we only have access to a set of sampled expert trajectories, our algorithm can provide an explanation of the expert trajectories. Measured with respect to the experts' unknown reward function, the performance of the policy output by our algorithm is close to that of the expert policy.
Details
- Title: Subtitle
- Inferring Passengers' Interactive Choices on Public Transits via MA-AL: Multi-Agent Apprenticeship Learning
- Creators
- Mingzhou Yang - Xi'an Jiaotong UniversityYanhua Li - Worcester Polytechnic InstituteXun Zhou - University of IowaHui Lu - Guangzhou UniversityZhihong Tian - Guangzhou UniversityJun Luo - Lenovo Research Hong Kong
- Resource Type
- Conference proceeding
- Publication Details
- WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), pp.1637-1647
- DOI
- 10.1145/3366423.3380235
- Publisher
- Assoc Computing Machinery
- Number of pages
- 11
- Grant note
- CNS-1657350; CMMI-1831140 / NSF; National Science Foundation (NSF) Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019) 2019B010137004 / Guangdong Province Key Area R&D Program of China U1636215; 61972108; 61871140 / National Natural Science Foundation of China; National Natural Science Foundation of China (NSFC) DiDi Chuxing Inc. 2018YFB0803504 / National Key research and Development Plan
- Language
- English
- Date published
- 01/01/2020
- Academic Unit
- Business Analytics
- Record Identifier
- 9984380391502771
Metrics
8 Record Views