Learning Payoffs While Routing in Skill-Based Queues

Published Online:https://doi.org/10.1287/stsy.2024.0095

References

  • Adan I, Weiss G (2012) Exact FCFS matching rates for two infinite multitype sequences. Oper. Res. 60(2):475–489.LinkGoogle Scholar
  • Adan I, Weiss G (2014) A skill-based parallel service system under FCFS-ALIS—Steady state, overloads, and abandonments. Stochastic Systems 4(1):250–299.LinkGoogle Scholar
  • Agarwal A, Foster DP, Hsu D, Kakade SM, Rakhlin A (2013) Stochastic convex optimization with bandit feedback. SIAM J. Optim. 23(1):213–240.Google Scholar
  • Agrawal S, Wang Z, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.LinkGoogle Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Google Scholar
  • Bertsimas D (1995) The achievable region method in the optimal control of queueing systems; formulations, bounds and policies. Queueing Systems 21(3–4):337–389.Google Scholar
  • Bertsimas D, Tsitsiklis JN (1997) Introduction to Linear Optimization (Athena Scientific, Belmont, MA).Google Scholar
  • Besbes O, Zeevi A (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723–739.LinkGoogle Scholar
  • Bimpikis K, Markakis MG (2019) Learning and hierarchies in service systems. Management Sci. 65(3):1268–1285.LinkGoogle Scholar
  • Burnetas AN, Kanavetas O (2012) Adaptive policies for sequential sampling under incomplete information and a cost constraint. Daras N, ed. Applications of Mathematics and Informatics in Military Science, Springer Optimization and its Applications, vol. 71 (Springer, New York), 97–112. Google Scholar
  • Burnetas AN, Katehakis MN (1996) Optimal adaptive policies for sequential allocation problems. Adv. Appl. Math. 17(2):122–142.Google Scholar
  • Burnetas AN, Katehakis MN (1997) Optimal adaptive policies for Markov Decision Processes. Math. Oper. Res. 22(1):222–255.LinkGoogle Scholar
  • Burnetas AN, Kanavetas O, Katehakis MN (2017) Asymptotically optimal multi-armed bandit policies under a cost constraint. Probab. Engrg. Infom. Sci. 31(3):284–310.Google Scholar
  • Chen J, Dong J, Shi P (2020) A survey on skill-based routing with applications to service operations management. Queueing Systems 96(1–2):53–82.Google Scholar
  • Choudhury T, Joshi G, Wang W, Shakkottai S (2021) Job dispatching policies for queueing systems with unknown service rates. MobiHoc’21 Proc. Twenty-Second Internat. Sympos. Theory Algorithmic Foundations Protocol Design Mobile Networks Mobile Comput. (Association for Computing Machinery, New York), 181–190.Google Scholar
  • Çinlar E, Agnew R (1968) On the superposition of point processes. J. Roy. Statist. Soc. Ser. B Methodological 30:576–581.Google Scholar
  • Cohen JW (1982) The Single Server Queue. 2nd ed. (Elsevier, Amsterdam).Google Scholar
  • Comte C, Jonckheere M, Sanders J, Senen-Cerda A (2023) Score-aware policy-gradient methods and performance guarantees using local Lyapunov conditions: Applications to product-form stochastic networks and queueing systems. Preprint, submitted December 5, https://arxiv.org/abs/2312.02804v1.Google Scholar
  • Dacre M, Glazebrook K, Niño-Mora J (1999) The achievable region approach to the optimal control of stochastic systems. J. Roy. Statist. Soc. Ser. B Statist. Methodology 61(4):747–791.Google Scholar
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. 21st Annual Conf. Learn. Theory COLT 2008 (Omnipress, Madison, WI), 355–366.Google Scholar
  • Desrosiers J, Lübbecke ME (2005) A primer in column generation. Desaulniers G, Desrosiers J, Solomon MM, eds. Column Generation (Springer, Boston), 1–32.Google Scholar
  • Fu X, Modiano E (2021) Learning-NUM: Network Utility Maximization with unknown utility functions and queueing delay. Proc. Internat. Sympos. Mobile Ad Hoc Networking Comput. MobiHoc (Association for Computing Machinery, New York), 21–30.Google Scholar
  • Fu X, Modiano E (2022) Joint learning and control in stochastic queueing networks with unknown utilities. Proc. ACM Measurement Anal. Comput. Systems 6(3):58.Google Scholar
  • García-Muñoz F, Dávila S, Quezada F (2023) A Benders decomposition approach for solving a two-stage local energy market problem under uncertainty. Appl. Energy 329:120226.Google Scholar
  • Gurvich I, Whitt W (2010) Service-level differentiation in many-server service systems via queue-ratio routing. Oper. Res. 58(2):316–328.LinkGoogle Scholar
  • Harchol-Balter M (2013) Performance Modeling and Design of Computer Systems: Queueing Theory in Action (Cambridge University Press, Cambridge, UK).Google Scholar
  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58(301):13–30.Google Scholar
  • Hsu WK, Xu J, Lin X, Bell MR (2022) Integrated online learning and adaptive control in queueing systems with uncertain payoffs. Oper. Res. 70(2):1166–1181.LinkGoogle Scholar
  • Jia H, Shi C, Shen S (2024) Online learning and pricing for service systems with reusable resources. Oper. Res. 72(3):1203–1241.LinkGoogle Scholar
  • Jiang L, Walrand J (2010) A distributed CSMA algorithm for throughput and utility maximization in wireless networks. IEEE/ACM Trans. Networking 18(3):960–972.Google Scholar
  • Johari R, Kamble V, Kanoria Y (2021) Matching while learning. Oper. Res. 69(2):655–681.LinkGoogle Scholar
  • Kim J-h, Vojnovic M (2021) Scheduling servers with stochastic bilinear rewards. Preprint, submitted December 13, https://arxiv.org/abs/2112.06362v1.Google Scholar
  • Koole GM, Mandelbaum A (2002) Queueing models of call centers: An introduction. Ann. Oper. Res. 113(1–4):41–59.Google Scholar
  • Krishnasamy S, Arapostathis A, Johari R, Shakkottai S (2018) On learning the cμ rule in single and parallel server networks. 2018 56th Annual Allerton Conf. Commun. Control Comput. Allerton 2018 (IEEE, Piscataway, NJ), 153–154.Google Scholar
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Google Scholar
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).Google Scholar
  • Lee D, Vojnovic M (2021) Scheduling jobs with stochastic holding costs. Adv. Neural Inform. Processing Systems 23:19375–19384.Google Scholar
  • Liu B, Xie Q, Modiano E (2019) RL-QN: A reinforcement learning framework for optimal control of queueing systems. 2019 57th Annual Allerton Conf. Commun. Control Comput. Allerton 2019 (IEEE, Piscataway, NJ), 663–670.Google Scholar
  • Liu X, Li B, Shi P, Ying L (2020) POND: Pessimistic–Optimistic oNline Dispatching. Preprint, submitted October 20, https://arxiv.org/abs/2010.09995v1.Google Scholar
  • Long Z, Zhang H, Zhang J, Zhang ZG (2024) The generalized c/μ rule for queues with heterogeneous server pools. Oper. Res. 72(6):2488–2506.LinkGoogle Scholar
  • Sanders J, Borst SC, Van Leeuwaarden JS (2016) Online network optimization using product-form Markov processes. IEEE Trans. Automatic Control 61(7):1838–1853.Google Scholar
  • Shah V, Gulikers L, Massoulié L, Vojnović M (2020) Adaptive matching for expert systems with uncertain task types. Oper. Res. 68(5):1403–1424.LinkGoogle Scholar
  • Shapley LS, Shubik M (1971) The assignment game I: The core. Internat. J. Game Theory 1(1):111–130.Google Scholar
  • Smith WE (1956) Various optimizers for single‐stage production. Naval Res. Logist. Quart. 3:59–66.Google Scholar
  • Steiger J, Li B, Lu N (2022) Learning from delayed semi-bandit feedback under strong fairness guarantees. IEEE INFOCOM 2022 IEEE Conf. Comput. Commun. (IEEE, Piscataway, NJ), 1379–1388.Google Scholar
  • Sun X, Zhao J, Chai S (2023) Congestion-aware matching and learning for service platforms. Preprint, submitted March 14, https://doi.org/10.2139/ssrn.5258944.Google Scholar
  • Tan B, Srikant R (2012) Online advertisement, optimization and stochastic networks. IEEE Trans. Automatic Control 57(11):2854–2868.Google Scholar
  • Tan X, Sun B, Leon-Garcia A, Wu Y, Tsang DH (2020) Mechanism design for online resource allocation: A unified approach. SIGMETRICS’20 Abstracts 2020 SIGMETRICS/Performance Joint Internat. Conf. Measurement Modeling Comput. Systems (Association for Computing Machinery, New York), 11–12.Google Scholar
  • Vera A, Banerjee S (2021) The Bayesian prophet: A low-regret framework for online decision making. Management Sci. 67(3):1368–1391.LinkGoogle Scholar
  • Visschers J, Adan I, Weiss G (2012) A product form solution to a system with multi-type jobs and multi-type servers. Queueing Systems 70(3):269–298.Google Scholar
  • Wei Y, Xu J, Yu SH (2023) Constant regret primal-dual policy for multi-way dynamic matching. Performance Evaluation Rev. 51(1):79–80.Google Scholar
  • Weiss G (2021) Scheduling and Control of Queueing Networks (Cambridge University Press, Cambridge, UK).Google Scholar
  • Xia L, Zhang ZG, Li QL (2022) A c/μ-rule for job assignment in heterogeneous group-server queues. Production Oper. Management 31(3):1191–1215.Google Scholar
  • Yang Z, Srikant R, Ying L (2023) Learning while scheduling in multi-server systems with unknown statistics: MaxWeight with Discounted UCB. Proc. Machine Learn. Res. 206:4275–4312.Google Scholar
  • Yu H, Neely MJ, Wei X (2017) Online convex optimization with stochastic constraints. Adv. Neural Inform. Processing Systems 30:1429–1439.Google Scholar
  • Zhong Y, Birge JR, Ward A (2022) Learning the scheduling policy in time-varying multiclass many server queues with abandonment. Preprint, submitted April 21, https://doi.org/10.2139/ssrn.4090021.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.