Self-Guided Approximate Linear Programs: Randomized Multi-Shot Approximation of Discounted Cost Markov Decision Processes
Published Online:23 Jul 2024https://doi.org/10.1287/mnsc.2020.00038
References
- (2003) Price-directed replenishment of subsets: Methodology and its application to inventory routing. Manufacturing Service Oper. Management 5(4):348–371.Link, Google Scholar
- (2012) Computing near-optimal policies in generalized joint replenishment. INFORMS J. Comput. 24(1):148–164.Link, Google Scholar
- (2013) Dynamic capacity allocation to customers who remember past service. Management Sci. 59(3):592–612.Link, Google Scholar
- (2019) Multiagent mechanism design without money. Oper. Res. 67(5):1417–1436.Link, Google Scholar
- (2017) Strong duality and sensitivity analysis in semi-infinite linear programming. Math. Programming 161(1–2):451–485.Crossref, Google Scholar
- (2016) Detection of mitotic nuclei in breast histopathology images using localized ACM and Random Kitchen Sink based classifier. 2016 38th Annual Internat. Conf. IEEE Engrg. Medicine Biol. Soc. (EMBC) (IEEE, Piscataway, NJ), 2435–2439.Google Scholar
- (1996) Neuro-Dynamic Programming, vol. 5 (Athena Scientific, Belmont, MA).Google Scholar
- (2012) Non-parametric approximate dynamic programming via the kernel method. Adv. Neural Inf. Process. Syst. 25:386–394.Google Scholar
- (2019) Relaxation analysis for the dynamic knapsack problem with stochastic item sizes. SIAM J. Optim. 29(1):1–30.Crossref, Google Scholar
- (2022) Information relaxations and duality in stochastic dynamic programs: A review and tutorial. Foundations and Trends in Optimization 5(3):246–339.Crossref, Google Scholar
- (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4-part-1):785–801.Link, Google Scholar
- (2005) Uncertain convex programs: Randomized solutions and confidence levels. Math. Programming 102(1):25–46.Crossref, Google Scholar
- (2006) The scenario approach to robust control design. IEEE Trans. Automat. Control. 51(5):742–753.Crossref, Google Scholar
- (2012) Spectral Methods in Fluid Dynamics (Springer Science & Business Media, New York).Google Scholar
- (1996) Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance Math. Econom. 19(1):19–30.Crossref, Google Scholar
- (2014) Coordinating inventory control and pricing strategies for perishable products. Oper. Res. 62(2):284–300.Link, Google Scholar
- (2003) The linear programming approach to approximate dynamic programming. Oper. Res. 51(6):850–865.Link, Google Scholar
- (2004) On constraint sampling in the linear programming approach to approximate dynamic programming. Math. Oper. Res. 29(3):462–478.Link, Google Scholar
- (2012a) Approximate dynamic programming via a smoothed linear program. Oper. Res. 60(3):655–674.Link, Google Scholar
- (2012b) Pathwise optimization for optimal stopping problems. Management Sci. 58(12):2292–2308.Link, Google Scholar
- (2018) Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3):919–948.Link, Google Scholar
- (2006) Tetris: A Study of Randomized Constraint Sampling (Springer London, London), 189–201.Google Scholar
- (2006) Approximate linear-programming algorithms for graph-based Markov decision processes. Proc. 2006 Conf. ECAI 2006 17th Eur. Conf. Artificial Intelligence August 29–September 1, 2006, Riva del Garda, Italy, 590–594.Google Scholar
- (2021) Sample-efficient automated deep reinforcement learning. Internat. Conf. Learning Representations (ICLR, Appleton, WI).Google Scholar
- (2018) Addressing function approximation error in actor-critic methods. Internat. Conf. Machine Learn. (PMLR, New York), 1587–1596.Google Scholar
- (2004) Simulation for American options: Regression now or regression later? Monte Carlo and Quasi-Monte Carlo Methods 2002 (Springer, Berlin, Heidelberg), 213–226.Crossref, Google Scholar
- (2003) Efficient solution algorithms for factored MDPs. J. Artificial Intelligence Res. 19:399–468.Crossref, Google Scholar
- (2024) Learning to walk via deep reinforcement learning. Robot. Sci. Syst. Forthcoming.Google Scholar
- (2020) A universal empirical dynamic programming algorithm for continuous state MDPs. IEEE Trans. Automatic Control 65(1):115–129.Crossref, Google Scholar
- (2004) Pricing American options: A duality approach. Oper. Res. 52(2):258–270.Link, Google Scholar
- (1996) Discrete-time Markov Control Processes: Basic Optimality Criteria, vol. 30 (Springer Science & Business Media, New York).Crossref, Google Scholar
- (2015) Structural properties of the optimal policy for dual-sourcing systems with general lead times. IIE Trans. 47(8):841–850.Crossref, Google Scholar
- (2011) Managing Perishable and Aging Inventories: Review and Future Research Directions (Springer, New York), 393–436.Google Scholar
- (2007) An infinite-dimensional linear programming algorithm for deterministic semi-Markov decision processes on borel spaces. Math. Oper. Res. 32(3):528–550.Link, Google Scholar
- (1998) Error bounds for convex inequality systems. Generalized Convexity, Generalized Monotonicity: Recent Results (Springer US, Boston), 75–110.Google Scholar
- (2020) Revisiting approximate linear programming: Constraint-violation learning with applications to inventory control and energy storage. Management Sci. 66(4):1544–1562.Link, Google Scholar
- (2022) A parameter-free and projection-free restarting level set method for adaptive constrained convex optimization under the error bound condition. Working paper, The University of Iowa, Tippie College of Business, Iowa City.Google Scholar
- (2001) Valuing American options by simulation: A simple least-squares approach. Rev. Financ. Stud. 14(1):113–147.Crossref, Google Scholar
- (2013) Faster ridge regression via the Subsampled Randomized Hadamard Transform. Adv. Neural Inf. Process. Syst. 26:369–377.Google Scholar
- (2010) Air-combat strategy using approximate dynamic programming. J. Guid. Control Dyn. 33(5):1641–1654.Crossref, Google Scholar
- (2013) Correlated random features for fast semi-supervised learning. Adv. Neural Inf. Process. Syst. 26:440–448.Google Scholar
- (2006) Universal kernels. J. Mach. Learn. Res. 7(Dec):2651–2667.Google Scholar
- (2017) Approximate linear programming for logistic Markov decision processes. Proc. Twenty-sixth Internat. Joint Conf. Artificial Intelligence (Melbourne, Australia), 2486–2493.Google Scholar
- (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.Crossref, Google Scholar
- (2012) Foundations of Machine Learning, 1st ed. (MIT Press, Cambridge, MA).Google Scholar
- (2023) A review of the operations literature on real options in energy. Eur. J. Oper. Res. 309(2):469–487.Crossref, Google Scholar
- (2015) Relaxations of approximate linear programs for the real option management of commodity storage. Management Sci. 61(12):3054–3076.Link, Google Scholar
- (2019) Fourier tools are much more powerful than commonly thought. Lobachevskii J. Math. 40(8):1122–1131.Crossref, Google Scholar
- (2019) Deep exploration via randomized value functions. J. Mach. Learn. Res. 20(124):1–62.Google Scholar
- (2003) Reinforcement learning for humanoid robotics. Proc. Third IEEE-RAS Internat. Conf. Humanoid Robots (Karlsruhe, Germany), 1–20.Google Scholar
- (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
- (2007) Random features for large-scale kernel machines. Adv. Neural Inf. Process. Syst. 20:1177–1184.Google Scholar
- (2008a) Uniform approximation of functions with random bases. 2008 46th Annual Allerton Conf. Comm. Control Comput. (IEEE, Piscataway, NJ), 555–561.Google Scholar
- (2008b) Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. Adv. Neural Inf. Process. Syst. 21:1313–1320.Google Scholar
- (1985) Generalized polynomial approximations in Markovian decision processes. J. Math. Anal. Appl. 110(2):568–582.Crossref, Google Scholar
- (2018) On data-dependent random features for improved generalization in supervised learning. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (AAAI Press, Palo Alto, CA).Google Scholar
- (2009) Semi-infinite programming, duality, discretization and optimality conditions. Optimization 58(2):133–161.Crossref, Google Scholar
- (2017) Mastering the game of go without human knowledge. Nature 550(7676):354–359.Crossref, Google Scholar
- (2016) Learning kernels with random features. Adv. Neural Inf. Process. Syst. 29:1298–1306.Google Scholar
- (2017) Markov decision processes for screening and treatment of chronic diseases. Markov Decision Processes in Practice (Springer International Publishing, Cham, Switzerland), 189–222.Crossref, Google Scholar
- (2014) Quadratic Approximation of Cost Functions in Lost Sales and Perishable Inventory Control Problems (Fuqua School of Business, Duke University, Durham, NC).Google Scholar
- (2013) On the approximate linear programming approach for network revenue management problems. INFORMS J. Comput. 26(1):121–134.Link, Google Scholar
- (2010) Stability of error bounds for semi-infinite convex constraint systems. SIAM J. Optim. 20(4):2080–2096.Crossref, Google Scholar
- (2020) Random sketching for neural networks with ReLU. IEEE Trans. Neural Netw. Learn. Syst. 32(2):748–762.Crossref, Google Scholar
- (2018) Scalable spectral clustering using random binning features. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 2506–2515.Google Scholar
- (2019) Maneuver decision of UAV in short-range air combat based on deep reinforcement learning. IEEE Access 8:363–378.Crossref, Google Scholar
- (2008) On the structure of lost-sales inventory models. Oper. Res. 56(4):937–944.Link, Google Scholar

