Stochastic Gradient Descent with Adaptive Data

