Albus JS (1971) A theory of cerebellar function. Math. Biosci. 10(1–2):25–61.
Alonso-Mora J, Samaranayake S, Wallar A, Frazzoli E, Rus D (2017) On-demand high-capacity ride-sharing via dynamic trip-vehicle assignment. Proc. Natl. Acad. Sci. USA 114(3):462–467.
Bailey WA Jr, Clark TD Jr (1987) A simulation analysis of demand and fleet size effects on taxicab service rates. Thesen A, Grant H, Kelton WD, eds. Proc. 19th Conf. Winter Simulation (Association for Computing Machinery, New York), 838–844.
Baird LC III (1993) Advantage updating. Technical report, Wright Laboratory, Wright-Patterson Air Force Base, Dayton, OH.
Bello I, Pham H, Le QV, Norouzi M, Bengio S (2017) Neural combinatorial optimization with reinforcement learning. ICLR 2017 Workshop Track. Accessed May 19, 2020, https://openreview.net/pdf?id=Bk9mxlSFx.
Boutilier C, Cohen A, Hassidim A, Mansour Y, Meshi O, Mladenov M, Schuurmans D (2018) Planning and learning with stochastic action sets. Lang J, ed. Proc. 27th Internat. Joint Conf. Artificial Intelligence (International Joint Conferences on Artificial Intelligence, Stockholm), 4674–4682.
Brodsky I (2018) H3: Uber’s hexagonal hierarchical spatial index. Accessed June 26, 2019, https://eng.uber.com/h3/.
Hales TC (2001) The honeycomb conjecture. Discrete Comput. Geometry 25(1):1–22.
Holler J, Vuorio R, Qin Z, Tang X, Jiao Y, Jin T, Singh S, Wang C, Ye J (2019) Deep reinforcement learning for multi-driver vehicle dispatching and repositioning problem. Wang J, Shim K, Wu X, eds. 2019 IEEE Internat. Conf. Data Mining (ICDM) (Institute of Electrical and Electronics Engineers, Washington, DC), 1090–1095.
Jindal I, Qin ZT, Chen X, Nokleby M, Ye J (2018) Optimizing taxi carpool policies via reinforcement learning and spatio-temporal mining. Abe N, Liu H, Pu C, Hu X, Ahmed N, Qiao M, Song Y, et al., eds. 2018 IEEE Internat. Conf. Big Data (Big Data) (Institute of Electrical and Electronics Engineers, Washington, DC), 1417–1426.
Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res. Logist. Quart. 2(1–2):83–97.
Kümmel M, Busch F, Wang DZ (2016) Taxi dispatching and stable marriage. Procedia Comput. Sci. 83(December):163–170.
Li M, Qin Z, Jiao Y, Yang Y, Wang J, Wang C, Wu G, Ye J (2019) Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning. Liu L, White R, eds. WWW’19 World Wide Web Conf. (Association for Computing Machinery, New York), 983–994.
Lopes PA, Yadav SS, Ilic A, Patra SK (2019) Fast block distributed CUDA implementation of the Hungarian algorithm. J. Parallel Distributed Comput. 130(August):50–62.
Miao F, Han S, Lin S, Stankovic JA, Zhang D, Munir S, Huang H, He T, Pappas GJ (2016) Taxi dispatch with real-time sensing data in metropolitan areas: A receding horizon control approach. IEEE Trans. Automation Sci. Engrg. 13(2):463–478.
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
National Bureau of Statistics of China (2019) National economic performance maintained within an appropriate range in 2018 with main development goals achieved. Report, National Bureau of Statistics of China, Beijing.
Nazari M, Oroojlooy A, Snyder L, Takác M (2018) Reinforcement learning for solving the vehicle routing problem. Adv. Neural Inform. Processing Systems 31:9839–9849.
Oda T, Joe-Wong C (2018) MOVI: A model-free approach to dynamic fleet management. IEEE INFOCOM 2018-IEEE Conf. Comput. Commun. (Institute of Electrical and Electronics Engineers, Washington, DC), 2708–2716.
Özkan E, Ward AR (2020) Dynamic matching for real-time ride sharing. Stochastic Systems 10(1):29–70.
Sahr K (2011) Hexagonal discrete global grid systems for geospatial computing. Arch. Photogrammetry Cartography Remote Sensing 22(January):363–376.
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. Preprint, submitted August 28, https://arxiv.org/abs/1707.06347.
Shou Z, Di X, Ye J, Zhu H, Zhang H, Hampshire R (2020) Optimal passenger-seeking policies on e-hailing platforms using Markov decision process and imitation learning. Transportation Res. Part C: Emerging Tech. 111(February):91–113.
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, et al. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
Sutton RS (1988) Learning to predict by the methods of temporal differences. Machine Learn. 3(1):9–44.
Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
Sutton RS, Precup D, Singh S (1999) Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1–2):181–211.
Tang X, Qin Z, Zhang F, Wang Z, Xu Z, Ma Y, Zhu H, Ye J (2019) A deep value-network based approach for multi-driver order dispatching. Proc. 25th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1780–1790.
Tsitsiklis JN, Van Roy B (1996) Feature-based methods for large scale dynamic programming. Machine Learn. 22(1–3):59–94.
Uher V, Gajdoš P, Snášel V, Lai YC, Radeckỳ M (2019) Hierarchical hexagonal clustering and indexing. Symmetry 11(6):731.
University of Michigan Center for Sustainable Studies (2019) US cities factsheet. Report, Center for Sustainable System, University of Michigan, Ann Arbor. Accessed April 21, 2020, http://css.umich.edu/sites/default/files/US%20Cities_CSS09-06_e2019.pdf.
Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. 30th AAAI Conf. Artificial Intelligence (Association for the Advancement of Artificial Intelligence, Menlo Park, CA), 2094–2100.
Verma T, Varakantham P, Kraus S, Lau HC (2017) Augmenting decisions of taxi drivers through reinforcement learning for improving revenues. 27th Internat. Conf. Automated Planning Scheduling (Association for the Advancement of Artificial Intelligence, Menlo Park, CA), 409–417.
Vinyals O, Fortunato M, Jaitly N (2015) Pointer networks. Adv. Neural Inform. Processing Systems 28:2692–2700.
Wang Z, Qin Z, Tang X, Ye J, Zhu H (2018) Deep reinforcement learning with knowledge transfer for online rides order dispatching. 2018 IEEE Internat. Conf. Data Mining (ICDM) (Institute of Electrical and Electronics Engineers, Washington, DC), 617–626.
Xu Z, Li Z, Guan Q, Zhang D, Li Q, Nan J, Liu C, Bian W, Ye J (2018) Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 905–913.
Yan C, Zhu H, Korolko N, Woodard D (2019) Dynamic pricing and matching in ride-hailing platforms. Naval Res. Logist., ePub ahead of print November 15, https://doi.org/10.1002/nav.21872.
Zhang L, Hu T, Min Y, Wu G, Zhang J, Feng P, Gong P, Ye J (2017) A taxi order dispatch model based on combinatorial optimization. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 2151–2159.

Volume 50, Issue 5

September-October 2020

Pages 269-341, C3, ii

Accepted: June 03, 2020
Published Online: September 24, 2020

Cite as

Zhiwei (Tony) Qin, Xiaocheng Tang, Yan Jiao, Fan Zhang, Zhe Xu, Hongtu Zhu, Jieping Ye (2020) Ride-Hailing Order Dispatching at DiDi via Reinforcement Learning. INFORMS Journal on Applied Analytics 50(5):272-286.

https://doi.org/10.1287/inte.2020.1047
