Stochastic Gradient Descent with Adaptive Data

Ethan Che
Ethan Che
[email protected]
https://orcid.org/0000-0002-7770-6904
Columbia University, New York, New York 10027
Search for more papers by this author
,
Jing Dong
Corresponding Author
Jing Dong
[email protected]
https://orcid.org/0000-0001-6387-4088
Columbia University, New York, New York 10027
Search for more papers by this author
,
Xin T. Tong
Xin T. Tong
[email protected]
https://orcid.org/0000-0002-8124-612X
National University of Singapore, Singapore 119077
Search for more papers by this author

Columbia University, New York, New York 10027

Search for more papers by this author

Jing Dong

Corresponding Author

Jing Dong

[email protected]

https://orcid.org/0000-0001-6387-4088

Columbia University, New York, New York 10027

Search for more papers by this author

Xin T. Tong

[email protected]

https://orcid.org/0000-0002-8124-612X

National University of Singapore, Singapore 119077

Search for more papers by this author

Published Online:4 Mar 2026https://doi.org/10.1287/opre.2024.1014

References

Agarwal A, Duchi JC (2012) The generalization ability of online algorithms for dependent data. IEEE Trans. Inform. Theory 59(1):573–587.Crossref, Google Scholar
Agarwal A, Kakade SM, Lee JD, Mahajan G (2020) Optimality and approximation with policy gradient methods in Markov decision processes. Conf. Learn. Theory (PMLR, New York), 64–66.Google Scholar
Agarwal A, Kakade SM, Lee JD, Mahajan G (2021) On the theory of policy gradient methods: Optimality, approximation, and distribution shift. J. Machine Learn. Res. 22(98):1–76.Google Scholar
Alvo M, Russo D, Kanoria Y, Lee M (2023) Deep reinforcement learning for inventory networks: Toward reliable policy optimization. Preprint, submitted June 20, https://arxiv.org/abs/2306.11246. Google Scholar
Anderson ET, Fitzsimons GJ, Simester D (2006) Measuring and mitigating the costs of stockouts. Management Sci. 52(11):1751–1763.Link, Google Scholar
Baxendale PH (2005) Renewal theory and computable convergence rates for geometrically ergodic Markov chains. Ann. Appl. Probab. 15(1B):700–738.Google Scholar
Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer, Berlin, Heidelberg).Google Scholar
Bhandari J, Russo D (2024) Global optimality guarantees for policy gradient methods. Oper. Res. 72(5):1906–1927.Link, Google Scholar
Bhandari J, Russo D, Singal R (2018) A finite time analysis of temporal difference learning with linear function approximation. Conf. Learn. Theory (PMLR, New York), 1691–1692.Google Scholar
Che E, Dong J, Namkoong H (2024) Differentiable discrete event simulation for queuing network control. Preprint, submitted September 5, https://arxiv.org/abs/2409.03740.Google Scholar
Chen X, Liu Y, Hong G (2023a) An online learning approach to dynamic pricing and capacity sizing in service systems. Oper. Res. 72(6):2677–2697.Link, Google Scholar
Chen Z, Maguluri ST, Shakkottai S, Shanmugam K (2023b) A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation. Oper. Res. 72(4):1352–1367.Link, Google Scholar
Cheung WC, Ma W, Simchi-Levi D, Wang X (2022) Inventory balancing with online learning. Management Sci. 68(3):1776–1807.Link, Google Scholar
Cui T, Dong J, Jasra A, Tong XT (2025) Convergence speed and approximation accuracy of numerical MCMC. Adv. Appl. Probab. 57(1):101–133.Crossref, Google Scholar
Dalal G, Szörényi B, Thoppe G, Mannor S (2018) Finite sample analyses for TD(0) with function approximation. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (AAAI Press, Palo Alto, CA).Google Scholar
Doucet A, Tadic V (2017) Asymptotic bias of stochastic gradient search. Ann. Appl. Probab. 27(6):3255–3304.Google Scholar
Drusvyatskiy D, Xiao L (2023) Stochastic optimization with decision-dependent distributions. Math. Oper. Res. 48(2):954–998.Link, Google Scholar
Duchi JC, Agarwal A, Johansson M, Jordan MI (2012) Ergodic mirror descent. SIAM J. Optim. 22(4):1549–1578.Crossref, Google Scholar
Ghadimi S, Lan G (2013) Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.Crossref, Google Scholar
Glasserman P (1992) Stationary waiting time derivatives. Queueing Syst. 12:369–389.Crossref, Google Scholar
Glynn PW (1990) Likelihood ratio gradient estimation for stochastic systems. Comm. ACM 33(10):75–84.Crossref, Google Scholar
Glynn PW, Johari R, Rasouli M (2020) Adaptive experimental design with temporal interference: A maximum likelihood approach. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 15054–15064.Google Scholar
Heidelberger P, Cao X-R, Zazanis MA, Suri R (1988) Convergence properties of infinitesimal perturbation analysis estimates. Management Sci. 34(11):1281–1302.Link, Google Scholar
Hu Y, Wager S (2022) Switchback experiments under geometric mixing. Preprint, submitted September 1, https://arxiv.org/abs/2209.00197.Google Scholar
Huh WT, Rusmevichientong P (2014) Online sequential optimization with biased gradients: Theory and applications to censored demand. INFORMS J. Comput. 26(1):150–159.Link, Google Scholar
Huh WT, Janakiraman G, Muckstadt JA, Rusmevichientong P (2009) An adaptive algorithm for finding the optimal base-stock policy in lost sales inventory systems with censored demand. Math. Oper. Res. 34(2):397–416.Link, Google Scholar
Huo D, Chen Y, Xie Q (2023) Bias and extrapolation in Markovian linear stochastic approximation with constant stepsizes. Abstract Proc. 2023 ACM SIGMETRICS Internat. Conf. Measurement Modeling Comput. Systems (Association for Computing Machinery, New York), 81–82.Google Scholar
Karimi B, Miasojedow B, Moulines E, Wai H-T (2019) Non-asymptotic analysis of biased stochastic approximation scheme. Conf. Learn. Theory (PMLR, New York), 1944–1974.Google Scholar
Kim J, Randhawa RS (2018) The value of dynamic pricing in large queueing systems. Oper. Res. 66(2):409–425.Link, Google Scholar
Krishnasamy S, Sen R, Johari R, Shakkottai S (2021) Learning unknown service rates in queues: A multiarmed bandit approach. Oper. Res. 69(1):315–330.Link, Google Scholar
Levin DA, Peres Y (2017) Markov Chains and Mixing Times, vol. 107, 2nd ed. (American Mathematical Society, Providence, RI).Crossref, Google Scholar
Li Q, Wai H-T (2022) State dependent performative prediction with stochastic approximation. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 3164–3186.Google Scholar
Li X, Liang J, Chen X, Zhang Z (2025) Convergence and inference of stream SGD, with applications to queueing systems and inventory control. Preprint, submitted September 18, https://arxiv.org/abs/2309.09545.Google Scholar
Mei J, Xiao C, Szepesvari C, Schuurmans D (2020) On the global convergence rates of softmax policy gradient methods. Internat. Conf. Machine Learn. (PMLR, New York), 6820–6829.Google Scholar
Mendler-Dünner C, Perdomo J, Zrnic T, Hardt M (2020) Stochastic optimization for performative prediction. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 4929–4939.Google Scholar
Meyn SP, Tweedie RL (2009) Markov Chains and Stochastic Stability, 2nd ed. (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Mohamed S, Rosca M, Figurnov M, Mnih A (2020) Monte Carlo gradient estimation in machine learning. J. Mach. Learning Res. 21(132):1–62.Google Scholar
Moulines E, Bach F (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY).Google Scholar
Qu G, Wierman A (2020) Finite-time analysis of asynchronous stochastic approximation and q-learning. Conf. Learn. Theory (PMLR, New York), 3185–3205.Google Scholar
Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.Crossref, Google Scholar
Roberts G, Rosenthal J (1997) Geometric ergodicity and hybrid Markov chains. Electron. Commun. Probab. 2:13–25.Google Scholar
Roy A, Balasubramanian K, Ghadimi S (2022) Constrained stochastic nonconvex optimization with state-dependent Markov data. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 23256–23270.Google Scholar
Rudolf D, Schweizer N (2018) Perturbation theory for Markov chains via Wasserstein distance. Bernoulli 24(4A):2610–2639.Google Scholar
Shapiro A, Dentcheva D, Ruszczynski A (2009) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).Crossref, Google Scholar
Srikant R, Ying L (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Conf. Learn. Theory (PMLR, New York), 2803–2830.Google Scholar
Sun T, Sun Y, Yin W (2018) On Markov chain gradient descent. Preprint, submitted September 12, https://arxiv.org/abs/1809.04216.Google Scholar
Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
Tang J, Chen B, Shi C (2023) Online learning for dual-index policies in dual-sourcing systems. Manufacturing Service Oper. Management 26(2):758–774.Link, Google Scholar
Walton N, Xu K (2021) Learning and information in stochastic networks and queues. Tutorials in Operations Research: Emerging Optimization Methods and Modeling Techniques with Applications (INFORMS, Catonsville, MD), 161–198.Google Scholar
Wang L, Cai Q, Yang Z, Wang Z (2019) Neural policy gradient methods: Global optimality and rates of convergence. Preprint, submitted August 29, https://arxiv.org/abs/1909.01150.Google Scholar
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learn. 8:229–256.Crossref, Google Scholar
Xiao L (2022) On the convergence rates of policy gradient methods. J. Machine Learn. Res. 23(282):1–36.Google Scholar
Xiong H, Xu T, Liang Y, Zhang W (2021) Non-asymptotic convergence of Adam-type reinforcement learning algorithms under Markovian sampling. Proc. AAAI Conf. Artificial Intelligence 35(12):10460–10468.Crossref, Google Scholar
Xu T, Wang Z, Liang Y (2020) Improving sample complexity bounds for (natural) actor-critic algorithms. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 4358–4369.Google Scholar
Yuan R, Gower RM, Lazaric A (2022) A general sample complexity analysis of vanilla policy gradient. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 3332–3380.Google Scholar
Zhang H, Chao X, Shi C (2020a) Closing the gap: A learning algorithm for lost-sales inventory systems with lead times. Management Sci. 66(5):1962–1980.Link, Google Scholar
Zhang K, Koppel A, Zhu H, Basar T (2020b) Global convergence of policy gradient methods to (almost) locally optimal policies. SIAM J. Control Optim. 58(6):3586–3612.Crossref, Google Scholar
Zhong Y, Birge JR, Ward A (2022) Learning to schedule in multiclass many-server queues with abandonment. Preprint, submitted May 9, http://dx.doi.org/10.2139/ssrn.4090021.Google Scholar

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:May 08, 2024
Accepted:January 13, 2026
Published Online:March 04, 2026

Cite as

Ethan Che, Jing Dong, Xin T. Tong (2026) Stochastic Gradient Descent with Adaptive Data. Operations Research 0(0).

https://doi.org/10.1287/opre.2024.1014

Keywords

Acknowledgments

The authors thank the anonymous Associate Editor and reviewers for their constructive and insightful feedback.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Stochastic Gradient Descent with Adaptive Data

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News