Inferring Passengers' Interactive Choices on Public Transits via MA-AL: Multi-Agent Apprenticeship Learning

Mingzhou Yang; Yanhua Li; Xun Zhou; Hui Lu; Zhihong Tian; Jun Luo

doi:10.1145/3366423.3380235

Back

Conference proceeding

Inferring Passengers' Interactive Choices on Public Transits via MA-AL: Multi-Agent Apprenticeship Learning

Mingzhou Yang, Yanhua Li, Xun Zhou, Hui Lu, Zhihong Tian and Jun Luo

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), pp.1637-1647

01/01/2020

DOI: 10.1145/3366423.3380235

View Online

Abstract

Public transports, such as subway lines and buses, offer affordable ride-sharing services and reduce the road network traffic. Extracting passengers' preferences from their public transit choices is important to city planners but technically non-trivial. When traveling by taking public transits, passengers make sequences of transit choices, and their rewards are usually influenced by other passengers' choices. This process can be modeled as a Markov Game (MG). In this paper, we make the first effort to model travelers' preferences of making transit choices using MGs. Based on the discovery that passengers usually do not change their policies, we propose novel algorithms to extract reward functions from the observed deterministic equilibrium joint policy of all agents in a general-sum MG to infer travelers' preferences. First, we assume we have the access to the entire joint policy. We characterize the set of all reward functions for which the given joint policy is a Nash equilibrium policy. In order to remove the degeneracy of the solution, we then attempt to pick reward functions so as to maximize the sum of the deviation between the the observed policy and the sub-optimal policy of each agent. This results in a skillfully solvable linear programming algorithm for the multi-agent inverse reinforcement learning (MA-IRL) problem. Then, we deal with the case where we have access to the equilibrium joint policy through a set of actual trajectories. We propose an iterative algorithm inspired by single-agent apprenticeship learning algorithms and the cyclic coordinate descent approach. We evaluate the proposed algorithms on both a simple Grid Game and a unique real-world dataset (from Shenzhen, China). Results show that when we have access to the full policy, our algorithm can efficiently recover most of the reward structure, especially the interaction of agents. In the case where we only have access to a set of sampled expert trajectories, our algorithm can provide an explanation of the expert trajectories. Measured with respect to the experts' unknown reward function, the performance of the policy output by our algorithm is close to that of the expert policy.

Computer Science

Technology

Telecommunications

Computer Science, Information Systems

Science & Technology

Details

Title: Subtitle: Inferring Passengers' Interactive Choices on Public Transits via MA-AL: Multi-Agent Apprenticeship Learning
Creators: Mingzhou Yang - Xi'an Jiaotong University
Yanhua Li - Worcester Polytechnic Institute
Xun Zhou - University of Iowa
Hui Lu - Guangzhou University
Zhihong Tian - Guangzhou University
Jun Luo - Lenovo Research Hong Kong
Resource Type: Conference proceeding
Publication Details: WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), pp.1637-1647
DOI: 10.1145/3366423.3380235
Publisher: Assoc Computing Machinery
Number of pages: 11
Grant note: CNS-1657350; CMMI-1831140 / NSF; National Science Foundation (NSF) Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019) 2019B010137004 / Guangdong Province Key Area R&D Program of China U1636215; 61972108; 61871140 / National Natural Science Foundation of China; National Natural Science Foundation of China (NSFC) DiDi Chuxing Inc. 2018YFB0803504 / National Key research and Development Plan
Language: English
Date published: 01/01/2020
Academic Unit: Business Analytics
Record Identifier: 9984380391502771

Metrics

8 Record Views

4 Times Cited - Web of Science