Stochastic Gradient Descent with Adaptive Data

Published Online:https://doi.org/10.1287/opre.2024.1014

References

  • Agarwal A, Duchi JC (2012) The generalization ability of online algorithms for dependent data. IEEE Trans. Inform. Theory 59(1):573–587.CrossrefGoogle Scholar
  • Agarwal A, Kakade SM, Lee JD, Mahajan G (2020) Optimality and approximation with policy gradient methods in Markov decision processes. Conf. Learn. Theory (PMLR, New York), 64–66.Google Scholar
  • Agarwal A, Kakade SM, Lee JD, Mahajan G (2021) On the theory of policy gradient methods: Optimality, approximation, and distribution shift. J. Machine Learn. Res. 22(98):1–76.Google Scholar
  • Alvo M, Russo D, Kanoria Y, Lee M (2023) Deep reinforcement learning for inventory networks: Toward reliable policy optimization. Preprint, submitted June 20, https://arxiv.org/abs/2306.11246. Google Scholar
  • Anderson ET, Fitzsimons GJ, Simester D (2006) Measuring and mitigating the costs of stockouts. Management Sci. 52(11):1751–1763.LinkGoogle Scholar
  • Baxendale PH (2005) Renewal theory and computable convergence rates for geometrically ergodic Markov chains. Ann. Appl. Probab. 15(1B):700–738.Google Scholar
  • Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer, Berlin, Heidelberg).Google Scholar
  • Bhandari J, Russo D (2024) Global optimality guarantees for policy gradient methods. Oper. Res. 72(5):1906–1927.LinkGoogle Scholar
  • Bhandari J, Russo D, Singal R (2018) A finite time analysis of temporal difference learning with linear function approximation. Conf. Learn. Theory (PMLR, New York), 1691–1692.Google Scholar
  • Che E, Dong J, Namkoong H (2024) Differentiable discrete event simulation for queuing network control. Preprint, submitted September 5, https://arxiv.org/abs/2409.03740.Google Scholar
  • Chen X, Liu Y, Hong G (2023a) An online learning approach to dynamic pricing and capacity sizing in service systems. Oper. Res. 72(6):2677–2697.LinkGoogle Scholar
  • Chen Z, Maguluri ST, Shakkottai S, Shanmugam K (2023b) A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation. Oper. Res. 72(4):1352–1367.LinkGoogle Scholar
  • Cheung WC, Ma W, Simchi-Levi D, Wang X (2022) Inventory balancing with online learning. Management Sci. 68(3):1776–1807.LinkGoogle Scholar
  • Cui T, Dong J, Jasra A, Tong XT (2025) Convergence speed and approximation accuracy of numerical MCMC. Adv. Appl. Probab. 57(1):101–133.CrossrefGoogle Scholar
  • Dalal G, Szörényi B, Thoppe G, Mannor S (2018) Finite sample analyses for TD(0) with function approximation. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (AAAI Press, Palo Alto, CA).Google Scholar
  • Doucet A, Tadic V (2017) Asymptotic bias of stochastic gradient search. Ann. Appl. Probab. 27(6):3255–3304.Google Scholar
  • Drusvyatskiy D, Xiao L (2023) Stochastic optimization with decision-dependent distributions. Math. Oper. Res. 48(2):954–998.LinkGoogle Scholar
  • Duchi JC, Agarwal A, Johansson M, Jordan MI (2012) Ergodic mirror descent. SIAM J. Optim. 22(4):1549–1578.CrossrefGoogle Scholar
  • Ghadimi S, Lan G (2013) Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.CrossrefGoogle Scholar
  • Glasserman P (1992) Stationary waiting time derivatives. Queueing Syst. 12:369–389.CrossrefGoogle Scholar
  • Glynn PW (1990) Likelihood ratio gradient estimation for stochastic systems. Comm. ACM 33(10):75–84.CrossrefGoogle Scholar
  • Glynn PW, Johari R, Rasouli M (2020) Adaptive experimental design with temporal interference: A maximum likelihood approach. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 15054–15064.Google Scholar
  • Heidelberger P, Cao X-R, Zazanis MA, Suri R (1988) Convergence properties of infinitesimal perturbation analysis estimates. Management Sci. 34(11):1281–1302.LinkGoogle Scholar
  • Hu Y, Wager S (2022) Switchback experiments under geometric mixing. Preprint, submitted September 1, https://arxiv.org/abs/2209.00197.Google Scholar
  • Huh WT, Rusmevichientong P (2014) Online sequential optimization with biased gradients: Theory and applications to censored demand. INFORMS J. Comput. 26(1):150–159.LinkGoogle Scholar
  • Huh WT, Janakiraman G, Muckstadt JA, Rusmevichientong P (2009) An adaptive algorithm for finding the optimal base-stock policy in lost sales inventory systems with censored demand. Math. Oper. Res. 34(2):397–416.LinkGoogle Scholar
  • Huo D, Chen Y, Xie Q (2023) Bias and extrapolation in Markovian linear stochastic approximation with constant stepsizes. Abstract Proc. 2023 ACM SIGMETRICS Internat. Conf. Measurement Modeling Comput. Systems (Association for Computing Machinery, New York), 81–82.Google Scholar
  • Karimi B, Miasojedow B, Moulines E, Wai H-T (2019) Non-asymptotic analysis of biased stochastic approximation scheme. Conf. Learn. Theory (PMLR, New York), 1944–1974.Google Scholar
  • Kim J, Randhawa RS (2018) The value of dynamic pricing in large queueing systems. Oper. Res. 66(2):409–425.LinkGoogle Scholar
  • Krishnasamy S, Sen R, Johari R, Shakkottai S (2021) Learning unknown service rates in queues: A multiarmed bandit approach. Oper. Res. 69(1):315–330.LinkGoogle Scholar
  • Levin DA, Peres Y (2017) Markov Chains and Mixing Times, vol. 107, 2nd ed. (American Mathematical Society, Providence, RI).CrossrefGoogle Scholar
  • Li Q, Wai H-T (2022) State dependent performative prediction with stochastic approximation. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 3164–3186.Google Scholar
  • Li X, Liang J, Chen X, Zhang Z (2025) Convergence and inference of stream SGD, with applications to queueing systems and inventory control. Preprint, submitted September 18, https://arxiv.org/abs/2309.09545.Google Scholar
  • Mei J, Xiao C, Szepesvari C, Schuurmans D (2020) On the global convergence rates of softmax policy gradient methods. Internat. Conf. Machine Learn. (PMLR, New York), 6820–6829.Google Scholar
  • Mendler-Dünner C, Perdomo J, Zrnic T, Hardt M (2020) Stochastic optimization for performative prediction. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 4929–4939.Google Scholar
  • Meyn SP, Tweedie RL (2009) Markov Chains and Stochastic Stability, 2nd ed. (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Mohamed S, Rosca M, Figurnov M, Mnih A (2020) Monte Carlo gradient estimation in machine learning. J. Mach. Learning Res. 21(132):1–62.Google Scholar
  • Moulines E, Bach F (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • Qu G, Wierman A (2020) Finite-time analysis of asynchronous stochastic approximation and q-learning. Conf. Learn. Theory (PMLR, New York), 3185–3205.Google Scholar
  • Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.CrossrefGoogle Scholar
  • Roberts G, Rosenthal J (1997) Geometric ergodicity and hybrid Markov chains. Electron. Commun. Probab. 2:13–25.Google Scholar
  • Roy A, Balasubramanian K, Ghadimi S (2022) Constrained stochastic nonconvex optimization with state-dependent Markov data. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 23256–23270.Google Scholar
  • Rudolf D, Schweizer N (2018) Perturbation theory for Markov chains via Wasserstein distance. Bernoulli 24(4A):2610–2639.Google Scholar
  • Shapiro A, Dentcheva D, Ruszczynski A (2009) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).CrossrefGoogle Scholar
  • Srikant R, Ying L (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Conf. Learn. Theory (PMLR, New York), 2803–2830.Google Scholar
  • Sun T, Sun Y, Yin W (2018) On Markov chain gradient descent. Preprint, submitted September 12, https://arxiv.org/abs/1809.04216.Google Scholar
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
  • Tang J, Chen B, Shi C (2023) Online learning for dual-index policies in dual-sourcing systems. Manufacturing Service Oper. Management 26(2):758–774.LinkGoogle Scholar
  • Walton N, Xu K (2021) Learning and information in stochastic networks and queues. Tutorials in Operations Research: Emerging Optimization Methods and Modeling Techniques with Applications (INFORMS, Catonsville, MD), 161–198.Google Scholar
  • Wang L, Cai Q, Yang Z, Wang Z (2019) Neural policy gradient methods: Global optimality and rates of convergence. Preprint, submitted August 29, https://arxiv.org/abs/1909.01150.Google Scholar
  • Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learn. 8:229–256.CrossrefGoogle Scholar
  • Xiao L (2022) On the convergence rates of policy gradient methods. J. Machine Learn. Res. 23(282):1–36.Google Scholar
  • Xiong H, Xu T, Liang Y, Zhang W (2021) Non-asymptotic convergence of Adam-type reinforcement learning algorithms under Markovian sampling. Proc. AAAI Conf. Artificial Intelligence 35(12):10460–10468.CrossrefGoogle Scholar
  • Xu T, Wang Z, Liang Y (2020) Improving sample complexity bounds for (natural) actor-critic algorithms. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 4358–4369.Google Scholar
  • Yuan R, Gower RM, Lazaric A (2022) A general sample complexity analysis of vanilla policy gradient. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 3332–3380.Google Scholar
  • Zhang H, Chao X, Shi C (2020a) Closing the gap: A learning algorithm for lost-sales inventory systems with lead times. Management Sci. 66(5):1962–1980.LinkGoogle Scholar
  • Zhang K, Koppel A, Zhu H, Basar T (2020b) Global convergence of policy gradient methods to (almost) locally optimal policies. SIAM J. Control Optim. 58(6):3586–3612.CrossrefGoogle Scholar
  • Zhong Y, Birge JR, Ward A (2022) Learning to schedule in multiclass many-server queues with abandonment. Preprint, submitted May 9, http://dx.doi.org/10.2139/ssrn.4090021.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.