Self-Guided Approximate Linear Programs: Randomized Multi-Shot Approximation of Discounted Cost Markov Decision Processes

Parshan Pakiman
Parshan Pakiman
[email protected]
https://orcid.org/0000-0001-5372-9135
College of Business Administration, University of Illinois Chicago, Chicago, Illinois 60607
Search for more papers by this author
,
Selvaprabu Nadarajah
Corresponding Author
Selvaprabu Nadarajah
[email protected]
https://orcid.org/0000-0001-6218-5862
College of Business Administration, University of Illinois Chicago, Chicago, Illinois 60607
Search for more papers by this author
,
Negar Soheili
Negar Soheili
[email protected]
https://orcid.org/0000-0002-2114-5319
College of Business Administration, University of Illinois Chicago, Chicago, Illinois 60607
Search for more papers by this author
,
Qihang Lin
Qihang Lin
[email protected]
https://orcid.org/0000-0003-2943-3267
Tippie College of Business, University of Iowa, Iowa City, Iowa 52242
Search for more papers by this author

College of Business Administration, University of Illinois Chicago, Chicago, Illinois 60607

Search for more papers by this author

Selvaprabu Nadarajah

Corresponding Author

Selvaprabu Nadarajah

[email protected]

https://orcid.org/0000-0001-6218-5862

College of Business Administration, University of Illinois Chicago, Chicago, Illinois 60607

Search for more papers by this author

Negar Soheili

[email protected]

https://orcid.org/0000-0002-2114-5319

College of Business Administration, University of Illinois Chicago, Chicago, Illinois 60607

Search for more papers by this author

Qihang Lin

[email protected]

https://orcid.org/0000-0003-2943-3267

Tippie College of Business, University of Iowa, Iowa City, Iowa 52242

Search for more papers by this author

Published Online:23 Jul 2024https://doi.org/10.1287/mnsc.2020.00038

References

Adelman D (2003) Price-directed replenishment of subsets: Methodology and its application to inventory routing. Manufacturing Service Oper. Management 5(4):348–371.Link, Google Scholar
Adelman D, Klabjan D (2012) Computing near-optimal policies in generalized joint replenishment. INFORMS J. Comput. 24(1):148–164.Link, Google Scholar
Adelman D, Mersereau AJ (2013) Dynamic capacity allocation to customers who remember past service. Management Sci. 59(3):592–612.Link, Google Scholar
Balseiro SR, Gurkan H, Sun P (2019) Multiagent mechanism design without money. Oper. Res. 67(5):1417–1436.Link, Google Scholar
Basu A, Martin K, Ryan CT (2017) Strong duality and sensitivity analysis in semi-infinite linear programming. Math. Programming 161(1–2):451–485.Crossref, Google Scholar
Beevi KS, Nair MS, Bindu GR (2016) Detection of mitotic nuclei in breast histopathology images using localized ACM and Random Kitchen Sink based classifier. 2016 38th Annual Internat. Conf. IEEE Engrg. Medicine Biol. Soc. (EMBC) (IEEE, Piscataway, NJ), 2435–2439.Google Scholar
Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynamic Programming, vol. 5 (Athena Scientific, Belmont, MA).Google Scholar
Bhat N, Farias V, Moallemi CC (2012) Non-parametric approximate dynamic programming via the kernel method. Adv. Neural Inf. Process. Syst. 25:386–394.Google Scholar
Blado D, Toriello A (2019) Relaxation analysis for the dynamic knapsack problem with stochastic item sizes. SIAM J. Optim. 29(1):1–30.Crossref, Google Scholar
Brown DB, Smith JE (2022) Information relaxations and duality in stochastic dynamic programs: A review and tutorial. Foundations and Trends in Optimization 5(3):246–339.Crossref, Google Scholar
Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4-part-1):785–801.Link, Google Scholar
Calafiore G, Campi MC (2005) Uncertain convex programs: Randomized solutions and confidence levels. Math. Programming 102(1):25–46.Crossref, Google Scholar
Calafiore GC, Campi MC (2006) The scenario approach to robust control design. IEEE Trans. Automat. Control. 51(5):742–753.Crossref, Google Scholar
Canuto C, Hussaini MY, Quarteroni A, Thomas AZ (2012) Spectral Methods in Fluid Dynamics (Springer Science & Business Media, New York).Google Scholar
Carriere JF (1996) Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance Math. Econom. 19(1):19–30.Crossref, Google Scholar
Chen X, Pang Z, Pan L (2014) Coordinating inventory control and pricing strategies for perishable products. Oper. Res. 62(2):284–300.Link, Google Scholar
De Farias DP, Van Roy B (2003) The linear programming approach to approximate dynamic programming. Oper. Res. 51(6):850–865.Link, Google Scholar
De Farias DP, Van Roy B (2004) On constraint sampling in the linear programming approach to approximate dynamic programming. Math. Oper. Res. 29(3):462–478.Link, Google Scholar
Desai VV, Farias VF, Moallemi CC (2012a) Approximate dynamic programming via a smoothed linear program. Oper. Res. 60(3):655–674.Link, Google Scholar
Desai VV, Farias VF, Moallemi CC (2012b) Pathwise optimization for optimal stopping problems. Management Sci. 58(12):2292–2308.Link, Google Scholar
Drusvyatskiy D, Lewis AS (2018) Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3):919–948.Link, Google Scholar
Farias VF, Van Roy B (2006) Tetris: A Study of Randomized Constraint Sampling (Springer London, London), 189–201.Google Scholar
Forsell N, Sabbadin R (2006) Approximate linear-programming algorithms for graph-based Markov decision processes. Proc. 2006 Conf. ECAI 2006 17th Eur. Conf. Artificial Intelligence August 29–September 1, 2006, Riva del Garda, Italy, 590–594.Google Scholar
Franke JK, Koehler G, Biedenkapp A, Hutter F (2021) Sample-efficient automated deep reinforcement learning. Internat. Conf. Learning Representations (ICLR, Appleton, WI).Google Scholar
Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. Internat. Conf. Machine Learn. (PMLR, New York), 1587–1596.Google Scholar
Glasserman P, Yu B (2004) Simulation for American options: Regression now or regression later? Monte Carlo and Quasi-Monte Carlo Methods 2002 (Springer, Berlin, Heidelberg), 213–226.Crossref, Google Scholar
Guestrin C, Koller D, Parr R, Venkataraman S (2003) Efficient solution algorithms for factored MDPs. J. Artificial Intelligence Res. 19:399–468.Crossref, Google Scholar
Haarnoja T, Ha S, Zhou A, Tan J, Tucker G, Levine S (2024) Learning to walk via deep reinforcement learning. Robot. Sci. Syst. Forthcoming.Google Scholar
Haskell WB, Jain R, Sharma H, Yu P (2020) A universal empirical dynamic programming algorithm for continuous state MDPs. IEEE Trans. Automatic Control 65(1):115–129.Crossref, Google Scholar
Haugh MB, Kogan L (2004) Pricing American options: A duality approach. Oper. Res. 52(2):258–270.Link, Google Scholar
Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov Control Processes: Basic Optimality Criteria, vol. 30 (Springer Science & Business Media, New York).Crossref, Google Scholar
Hua Z, Yu Y, Zhang W, Xu X (2015) Structural properties of the optimal policy for dual-sourcing systems with general lead times. IIE Trans. 47(8):841–850.Crossref, Google Scholar
Karaesmen IZ, Scheller-Wolf A, Deniz B (2011) Managing Perishable and Aging Inventories: Review and Future Research Directions (Springer, New York), 393–436.Google Scholar
Klabjan D, Adelman D (2007) An infinite-dimensional linear programming algorithm for deterministic semi-Markov decision processes on borel spaces. Math. Oper. Res. 32(3):528–550.Link, Google Scholar
Lewis AS, Pang J-S (1998) Error bounds for convex inequality systems. Generalized Convexity, Generalized Monotonicity: Recent Results (Springer US, Boston), 75–110.Google Scholar
Lin Q, Nadarajah S, Soheili N (2020) Revisiting approximate linear programming: Constraint-violation learning with applications to inventory control and energy storage. Management Sci. 66(4):1544–1562.Link, Google Scholar
Lin Q, Ma R, Nadarajah S, Soheili N (2022) A parameter-free and projection-free restarting level set method for adaptive constrained convex optimization under the error bound condition. Working paper, The University of Iowa, Tippie College of Business, Iowa City.Google Scholar
Longstaff FA, Schwartz ES (2001) Valuing American options by simulation: A simple least-squares approach. Rev. Financ. Stud. 14(1):113–147.Crossref, Google Scholar
Lu Y, Dhillon P, Foster DP, Ungar L (2013) Faster ridge regression via the Subsampled Randomized Hadamard Transform. Adv. Neural Inf. Process. Syst. 26:369–377.Google Scholar
McGrew JS, How JP, Williams B, Roy N (2010) Air-combat strategy using approximate dynamic programming. J. Guid. Control Dyn. 33(5):1641–1654.Crossref, Google Scholar
McWilliams B, Balduzzi D, Buhmann JM (2013) Correlated random features for fast semi-supervised learning. Adv. Neural Inf. Process. Syst. 26:440–448.Google Scholar
Micchelli CA, Xu Y, Zhang H (2006) Universal kernels. J. Mach. Learn. Res. 7(Dec):2651–2667.Google Scholar
Mladenov M, Boutilier C, Schuurmans D, Elidan G, Meshi O, Lu T (2017) Approximate linear programming for logistic Markov decision processes. Proc. Twenty-sixth Internat. Joint Conf. Artificial Intelligence (Melbourne, Australia), 2486–2493.Google Scholar
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.Crossref, Google Scholar
Mohri M, Rostamizadeh A, Talwalkar A (2012) Foundations of Machine Learning, 1st ed. (MIT Press, Cambridge, MA).Google Scholar
Nadarajah S, Secomandi N (2023) A review of the operations literature on real options in energy. Eur. J. Oper. Res. 309(2):469–487.Crossref, Google Scholar
Nadarajah S, Margot F, Secomandi N (2015) Relaxations of approximate linear programs for the real option management of commodity storage. Management Sci. 61(12):3054–3076.Link, Google Scholar
Nersessian A (2019) Fourier tools are much more powerful than commonly thought. Lobachevskii J. Math. 40(8):1122–1131.Crossref, Google Scholar
Osband I, Van Roy B, Russo DJ, Wen Z (2019) Deep exploration via randomized value functions. J. Mach. Learn. Res. 20(124):1–62.Google Scholar
Peters J, Vijayakumar S, Schaal S (2003) Reinforcement learning for humanoid robotics. Proc. Third IEEE-RAS Internat. Conf. Humanoid Robots (Karlsruhe, Germany), 1–20.Google Scholar
Powell WB (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
Rahimi A, Recht B (2007) Random features for large-scale kernel machines. Adv. Neural Inf. Process. Syst. 20:1177–1184.Google Scholar
Rahimi A, Recht B (2008a) Uniform approximation of functions with random bases. 2008 46th Annual Allerton Conf. Comm. Control Comput. (IEEE, Piscataway, NJ), 555–561.Google Scholar
Rahimi A, Recht B (2008b) Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. Adv. Neural Inf. Process. Syst. 21:1313–1320.Google Scholar
Schweitzer PJ, Seidmann A (1985) Generalized polynomial approximations in Markovian decision processes. J. Math. Anal. Appl. 110(2):568–582.Crossref, Google Scholar
Shahrampour S, Beirami A, Tarokh V (2018) On data-dependent random features for improved generalization in supervised learning. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (AAAI Press, Palo Alto, CA).Google Scholar
Shapiro A (2009) Semi-infinite programming, duality, discretization and optimality conditions. Optimization 58(2):133–161.Crossref, Google Scholar
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, et al. (2017) Mastering the game of go without human knowledge. Nature 550(7676):354–359.Crossref, Google Scholar
Sinha A, Duchi JC (2016) Learning kernels with random features. Adv. Neural Inf. Process. Syst. 29:1298–1306.Google Scholar
Steimle LN, Denton BT (2017) Markov decision processes for screening and treatment of chronic diseases. Markov Decision Processes in Practice (Springer International Publishing, Cham, Switzerland), 189–222.Crossref, Google Scholar
Sun P, Wang K, Zipkin P (2014) Quadratic Approximation of Cost Functions in Lost Sales and Perishable Inventory Control Problems (Fuqua School of Business, Duke University, Durham, NC).Google Scholar
Tong C, Topaloglu H (2013) On the approximate linear programming approach for network revenue management problems. INFORMS J. Comput. 26(1):121–134.Link, Google Scholar
Van Ngai H, Kruger A, Théra M (2010) Stability of error bounds for semi-infinite convex constraint systems. SIAM J. Optim. 20(4):2080–2096.Crossref, Google Scholar
Wang D, Zeng J, Lin SB (2020) Random sketching for neural networks with ReLU. IEEE Trans. Neural Netw. Learn. Syst. 32(2):748–762.Crossref, Google Scholar
Wu L, Chen PY, Yen IEH, Xu F, Xia Y, Aggarwal C (2018) Scalable spectral clustering using random binning features. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 2506–2515.Google Scholar
Yang Q, Zhang J, Shi G, Hu J, Wu Y (2019) Maneuver decision of UAV in short-range air combat based on deep reinforcement learning. IEEE Access 8:363–378.Crossref, Google Scholar
Zipkin P (2008) On the structure of lost-sales inventory models. Oper. Res. 56(4):937–944.Link, Google Scholar

Volume 71, Issue 4

April 2025

Pages iv-vi, 2751-3636

Article Information

Supplemental Material

Metrics

Information

Received:January 07, 2020
Accepted:December 08, 2023
Published Online:July 23, 2024

Cite as

Parshan Pakiman; , Selvaprabu Nadarajah; , Negar Soheili; , Qihang Lin (2024) Self-Guided Approximate Linear Programs: Randomized Multi-Shot Approximation of Discounted Cost Markov Decision Processes. Management Science 71(4):3384-3404.

https://doi.org/10.1287/mnsc.2020.00038

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Self-Guided Approximate Linear Programs: Randomized Multi-Shot Approximation of Discounted Cost Markov Decision Processes

References

Volume 71, Issue 4

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News