Adan I, Weiss G (2012) Exact FCFS matching rates for two infinite multitype sequences. Oper. Res. 60(2):475–489.Link, Google Scholar
Adan I, Weiss G (2014) A skill-based parallel service system under FCFS-ALIS—Steady state, overloads, and abandonments. Stochastic Systems 4(1):250–299.Link, Google Scholar
Agarwal A, Foster DP, Hsu D, Kakade SM, Rakhlin A (2013) Stochastic convex optimization with bandit feedback. SIAM J. Optim. 23(1):213–240.Google Scholar
Agrawal S, Wang Z, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.Link, Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Google Scholar
Bertsimas D (1995) The achievable region method in the optimal control of queueing systems; formulations, bounds and policies. Queueing Systems 21(3–4):337–389.Google Scholar
Bertsimas D, Tsitsiklis JN (1997) Introduction to Linear Optimization (Athena Scientific, Belmont, MA).Google Scholar
Besbes O, Zeevi A (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723–739.Link, Google Scholar
Bimpikis K, Markakis MG (2019) Learning and hierarchies in service systems. Management Sci. 65(3):1268–1285.Link, Google Scholar
Burnetas AN, Kanavetas O (2012) Adaptive policies for sequential sampling under incomplete information and a cost constraint. Daras N, ed. Applications of Mathematics and Informatics in Military Science, Springer Optimization and its Applications, vol. 71 (Springer, New York), 97–112. Google Scholar
Burnetas AN, Katehakis MN (1996) Optimal adaptive policies for sequential allocation problems. Adv. Appl. Math. 17(2):122–142.Google Scholar
Burnetas AN, Katehakis MN (1997) Optimal adaptive policies for Markov Decision Processes. Math. Oper. Res. 22(1):222–255.Link, Google Scholar
Burnetas AN, Kanavetas O, Katehakis MN (2017) Asymptotically optimal multi-armed bandit policies under a cost constraint. Probab. Engrg. Infom. Sci. 31(3):284–310.Google Scholar
Chen J, Dong J, Shi P (2020) A survey on skill-based routing with applications to service operations management. Queueing Systems 96(1–2):53–82.Google Scholar
Choudhury T, Joshi G, Wang W, Shakkottai S (2021) Job dispatching policies for queueing systems with unknown service rates. MobiHoc’21 Proc. Twenty-Second Internat. Sympos. Theory Algorithmic Foundations Protocol Design Mobile Networks Mobile Comput. (Association for Computing Machinery, New York), 181–190.Google Scholar
Çinlar E, Agnew R (1968) On the superposition of point processes. J. Roy. Statist. Soc. Ser. B Methodological 30:576–581.Google Scholar
Cohen JW (1982) The Single Server Queue. 2nd ed. (Elsevier, Amsterdam).Google Scholar
Comte C, Jonckheere M, Sanders J, Senen-Cerda A (2023) Score-aware policy-gradient methods and performance guarantees using local Lyapunov conditions: Applications to product-form stochastic networks and queueing systems. Preprint, submitted December 5, https://arxiv.org/abs/2312.02804v1.Google Scholar
Dacre M, Glazebrook K, Niño-Mora J (1999) The achievable region approach to the optimal control of stochastic systems. J. Roy. Statist. Soc. Ser. B Statist. Methodology 61(4):747–791.Google Scholar
Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. 21st Annual Conf. Learn. Theory COLT 2008 (Omnipress, Madison, WI), 355–366.Google Scholar
Desrosiers J, Lübbecke ME (2005) A primer in column generation. Desaulniers G, Desrosiers J, Solomon MM, eds. Column Generation (Springer, Boston), 1–32.Google Scholar
Fu X, Modiano E (2021) Learning-NUM: Network Utility Maximization with unknown utility functions and queueing delay. Proc. Internat. Sympos. Mobile Ad Hoc Networking Comput. MobiHoc (Association for Computing Machinery, New York), 21–30.Google Scholar
Fu X, Modiano E (2022) Joint learning and control in stochastic queueing networks with unknown utilities. Proc. ACM Measurement Anal. Comput. Systems 6(3):58.Google Scholar
García-Muñoz F, Dávila S, Quezada F (2023) A Benders decomposition approach for solving a two-stage local energy market problem under uncertainty. Appl. Energy 329:120226.Google Scholar
Gurvich I, Whitt W (2010) Service-level differentiation in many-server service systems via queue-ratio routing. Oper. Res. 58(2):316–328.Link, Google Scholar
Harchol-Balter M (2013) Performance Modeling and Design of Computer Systems: Queueing Theory in Action (Cambridge University Press, Cambridge, UK).Google Scholar
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58(301):13–30.Google Scholar
Hsu WK, Xu J, Lin X, Bell MR (2022) Integrated online learning and adaptive control in queueing systems with uncertain payoffs. Oper. Res. 70(2):1166–1181.Link, Google Scholar
Jia H, Shi C, Shen S (2024) Online learning and pricing for service systems with reusable resources. Oper. Res. 72(3):1203–1241.Link, Google Scholar
Jiang L, Walrand J (2010) A distributed CSMA algorithm for throughput and utility maximization in wireless networks. IEEE/ACM Trans. Networking 18(3):960–972.Google Scholar
Johari R, Kamble V, Kanoria Y (2021) Matching while learning. Oper. Res. 69(2):655–681.Link, Google Scholar
Kim J-h, Vojnovic M (2021) Scheduling servers with stochastic bilinear rewards. Preprint, submitted December 13, https://arxiv.org/abs/2112.06362v1.Google Scholar
Koole GM, Mandelbaum A (2002) Queueing models of call centers: An introduction. Ann. Oper. Res. 113(1–4):41–59.Google Scholar
Krishnasamy S, Arapostathis A, Johari R, Shakkottai S (2018) On learning the cμ rule in single and parallel server networks. 2018 56th Annual Allerton Conf. Commun. Control Comput. Allerton 2018 (IEEE, Piscataway, NJ), 153–154.Google Scholar
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Google Scholar
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).Google Scholar
Lee D, Vojnovic M (2021) Scheduling jobs with stochastic holding costs. Adv. Neural Inform. Processing Systems 23:19375–19384.Google Scholar
Liu B, Xie Q, Modiano E (2019) RL-QN: A reinforcement learning framework for optimal control of queueing systems. 2019 57th Annual Allerton Conf. Commun. Control Comput. Allerton 2019 (IEEE, Piscataway, NJ), 663–670.Google Scholar
Liu X, Li B, Shi P, Ying L (2020) POND: Pessimistic–Optimistic oNline Dispatching. Preprint, submitted October 20, https://arxiv.org/abs/2010.09995v1.Google Scholar
Long Z, Zhang H, Zhang J, Zhang ZG (2024) The generalized c/μ rule for queues with heterogeneous server pools. Oper. Res. 72(6):2488–2506.Link, Google Scholar
Sanders J, Borst SC, Van Leeuwaarden JS (2016) Online network optimization using product-form Markov processes. IEEE Trans. Automatic Control 61(7):1838–1853.Google Scholar
Shah V, Gulikers L, Massoulié L, Vojnović M (2020) Adaptive matching for expert systems with uncertain task types. Oper. Res. 68(5):1403–1424.Link, Google Scholar
Shapley LS, Shubik M (1971) The assignment game I: The core. Internat. J. Game Theory 1(1):111–130.Google Scholar
Smith WE (1956) Various optimizers for single‐stage production. Naval Res. Logist. Quart. 3:59–66.Google Scholar
Steiger J, Li B, Lu N (2022) Learning from delayed semi-bandit feedback under strong fairness guarantees. IEEE INFOCOM 2022 IEEE Conf. Comput. Commun. (IEEE, Piscataway, NJ), 1379–1388.Google Scholar
Sun X, Zhao J, Chai S (2023) Congestion-aware matching and learning for service platforms. Preprint, submitted March 14, https://doi.org/10.2139/ssrn.5258944.Google Scholar
Tan B, Srikant R (2012) Online advertisement, optimization and stochastic networks. IEEE Trans. Automatic Control 57(11):2854–2868.Google Scholar
Tan X, Sun B, Leon-Garcia A, Wu Y, Tsang DH (2020) Mechanism design for online resource allocation: A unified approach. SIGMETRICS’20 Abstracts 2020 SIGMETRICS/Performance Joint Internat. Conf. Measurement Modeling Comput. Systems (Association for Computing Machinery, New York), 11–12.Google Scholar
Vera A, Banerjee S (2021) The Bayesian prophet: A low-regret framework for online decision making. Management Sci. 67(3):1368–1391.Link, Google Scholar
Visschers J, Adan I, Weiss G (2012) A product form solution to a system with multi-type jobs and multi-type servers. Queueing Systems 70(3):269–298.Google Scholar
Wei Y, Xu J, Yu SH (2023) Constant regret primal-dual policy for multi-way dynamic matching. Performance Evaluation Rev. 51(1):79–80.Google Scholar
Weiss G (2021) Scheduling and Control of Queueing Networks (Cambridge University Press, Cambridge, UK).Google Scholar
Xia L, Zhang ZG, Li QL (2022) A c/μ-rule for job assignment in heterogeneous group-server queues. Production Oper. Management 31(3):1191–1215.Google Scholar
Yang Z, Srikant R, Ying L (2023) Learning while scheduling in multi-server systems with unknown statistics: MaxWeight with Discounted UCB. Proc. Machine Learn. Res. 206:4275–4312.Google Scholar
Yu H, Neely MJ, Wei X (2017) Online convex optimization with stochastic constraints. Adv. Neural Inform. Processing Systems 30:1429–1439.Google Scholar
Zhong Y, Birge JR, Ward A (2022) Learning the scheduling policy in time-varying multiclass many server queues with abandonment. Preprint, submitted April 21, https://doi.org/10.2139/ssrn.4090021.Google Scholar

Volume 16, Issue 2

June 2026

Pages 109-209

Article Information

Supplemental Material

Metrics

Information

Received:December 13, 2024
Accepted:March 18, 2026
Published Online:May 05, 2026

Cite as

Sanne van Kempen, Jaron Sanders, Fiona Sloothaak, Maarten G. Wolf (2026) Learning Payoffs While Routing in Skill-Based Queues. Stochastic Systems 16(2):140-164.

https://doi.org/10.1287/stsy.2024.0095

Keywords

Acknowledgments

The authors thank Thomas van Vuren for his insightful comments and fruitful discussions. This work is part of Valuable AI, a research collaboration between the Eindhoven University of Technology and the Koninklijke KPN N.V.

PDF download

Available Issues

Available Issues

Available Issues

Learning Payoffs While Routing in Skill-Based Queues

References

Volume 16, Issue 2

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News