Self-Guided Approximate Linear Programs: Randomized Multi-Shot Approximation of Discounted Cost Markov Decision Processes

Published Online:https://doi.org/10.1287/mnsc.2020.00038

References

  • Adelman D (2003) Price-directed replenishment of subsets: Methodology and its application to inventory routing. Manufacturing Service Oper. Management 5(4):348–371.LinkGoogle Scholar
  • Adelman D, Klabjan D (2012) Computing near-optimal policies in generalized joint replenishment. INFORMS J. Comput. 24(1):148–164.LinkGoogle Scholar
  • Adelman D, Mersereau AJ (2013) Dynamic capacity allocation to customers who remember past service. Management Sci. 59(3):592–612.LinkGoogle Scholar
  • Balseiro SR, Gurkan H, Sun P (2019) Multiagent mechanism design without money. Oper. Res. 67(5):1417–1436.LinkGoogle Scholar
  • Basu A, Martin K, Ryan CT (2017) Strong duality and sensitivity analysis in semi-infinite linear programming. Math. Programming 161(1–2):451–485.CrossrefGoogle Scholar
  • Beevi KS, Nair MS, Bindu GR (2016) Detection of mitotic nuclei in breast histopathology images using localized ACM and Random Kitchen Sink based classifier. 2016 38th Annual Internat. Conf. IEEE Engrg. Medicine Biol. Soc. (EMBC) (IEEE, Piscataway, NJ), 2435–2439.Google Scholar
  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynamic Programming, vol. 5 (Athena Scientific, Belmont, MA).Google Scholar
  • Bhat N, Farias V, Moallemi CC (2012) Non-parametric approximate dynamic programming via the kernel method. Adv. Neural Inf. Process. Syst. 25:386–394.Google Scholar
  • Blado D, Toriello A (2019) Relaxation analysis for the dynamic knapsack problem with stochastic item sizes. SIAM J. Optim. 29(1):1–30.CrossrefGoogle Scholar
  • Brown DB, Smith JE (2022) Information relaxations and duality in stochastic dynamic programs: A review and tutorial. Foundations and Trends in Optimization 5(3):246–339.CrossrefGoogle Scholar
  • Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4-part-1):785–801.LinkGoogle Scholar
  • Calafiore G, Campi MC (2005) Uncertain convex programs: Randomized solutions and confidence levels. Math. Programming 102(1):25–46.CrossrefGoogle Scholar
  • Calafiore GC, Campi MC (2006) The scenario approach to robust control design. IEEE Trans. Automat. Control. 51(5):742–753.CrossrefGoogle Scholar
  • Canuto C, Hussaini MY, Quarteroni A, Thomas AZ (2012) Spectral Methods in Fluid Dynamics (Springer Science & Business Media, New York).Google Scholar
  • Carriere JF (1996) Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance Math. Econom. 19(1):19–30.CrossrefGoogle Scholar
  • Chen X, Pang Z, Pan L (2014) Coordinating inventory control and pricing strategies for perishable products. Oper. Res. 62(2):284–300.LinkGoogle Scholar
  • De Farias DP, Van Roy B (2003) The linear programming approach to approximate dynamic programming. Oper. Res. 51(6):850–865.LinkGoogle Scholar
  • De Farias DP, Van Roy B (2004) On constraint sampling in the linear programming approach to approximate dynamic programming. Math. Oper. Res. 29(3):462–478.LinkGoogle Scholar
  • Desai VV, Farias VF, Moallemi CC (2012a) Approximate dynamic programming via a smoothed linear program. Oper. Res. 60(3):655–674.LinkGoogle Scholar
  • Desai VV, Farias VF, Moallemi CC (2012b) Pathwise optimization for optimal stopping problems. Management Sci. 58(12):2292–2308.LinkGoogle Scholar
  • Drusvyatskiy D, Lewis AS (2018) Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3):919–948.LinkGoogle Scholar
  • Farias VF, Van Roy B (2006) Tetris: A Study of Randomized Constraint Sampling (Springer London, London), 189–201.Google Scholar
  • Forsell N, Sabbadin R (2006) Approximate linear-programming algorithms for graph-based Markov decision processes. Proc. 2006 Conf. ECAI 2006 17th Eur. Conf. Artificial Intelligence August 29–September 1, 2006, Riva del Garda, Italy, 590–594.Google Scholar
  • Franke JK, Koehler G, Biedenkapp A, Hutter F (2021) Sample-efficient automated deep reinforcement learning. Internat. Conf. Learning Representations (ICLR, Appleton, WI).Google Scholar
  • Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. Internat. Conf. Machine Learn. (PMLR, New York), 1587–1596.Google Scholar
  • Glasserman P, Yu B (2004) Simulation for American options: Regression now or regression later? Monte Carlo and Quasi-Monte Carlo Methods 2002 (Springer, Berlin, Heidelberg), 213–226.CrossrefGoogle Scholar
  • Guestrin C, Koller D, Parr R, Venkataraman S (2003) Efficient solution algorithms for factored MDPs. J. Artificial Intelligence Res. 19:399–468.CrossrefGoogle Scholar
  • Haarnoja T, Ha S, Zhou A, Tan J, Tucker G, Levine S (2024) Learning to walk via deep reinforcement learning. Robot. Sci. Syst. Forthcoming.Google Scholar
  • Haskell WB, Jain R, Sharma H, Yu P (2020) A universal empirical dynamic programming algorithm for continuous state MDPs. IEEE Trans. Automatic Control 65(1):115–129.CrossrefGoogle Scholar
  • Haugh MB, Kogan L (2004) Pricing American options: A duality approach. Oper. Res. 52(2):258–270.LinkGoogle Scholar
  • Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov Control Processes: Basic Optimality Criteria, vol. 30 (Springer Science & Business Media, New York).CrossrefGoogle Scholar
  • Hua Z, Yu Y, Zhang W, Xu X (2015) Structural properties of the optimal policy for dual-sourcing systems with general lead times. IIE Trans. 47(8):841–850.CrossrefGoogle Scholar
  • Karaesmen IZ, Scheller-Wolf A, Deniz B (2011) Managing Perishable and Aging Inventories: Review and Future Research Directions (Springer, New York), 393–436.Google Scholar
  • Klabjan D, Adelman D (2007) An infinite-dimensional linear programming algorithm for deterministic semi-Markov decision processes on borel spaces. Math. Oper. Res. 32(3):528–550.LinkGoogle Scholar
  • Lewis AS, Pang J-S (1998) Error bounds for convex inequality systems. Generalized Convexity, Generalized Monotonicity: Recent Results (Springer US, Boston), 75–110.Google Scholar
  • Lin Q, Nadarajah S, Soheili N (2020) Revisiting approximate linear programming: Constraint-violation learning with applications to inventory control and energy storage. Management Sci. 66(4):1544–1562.LinkGoogle Scholar
  • Lin Q, Ma R, Nadarajah S, Soheili N (2022) A parameter-free and projection-free restarting level set method for adaptive constrained convex optimization under the error bound condition. Working paper, The University of Iowa, Tippie College of Business, Iowa City.Google Scholar
  • Longstaff FA, Schwartz ES (2001) Valuing American options by simulation: A simple least-squares approach. Rev. Financ. Stud. 14(1):113–147.CrossrefGoogle Scholar
  • Lu Y, Dhillon P, Foster DP, Ungar L (2013) Faster ridge regression via the Subsampled Randomized Hadamard Transform. Adv. Neural Inf. Process. Syst. 26:369–377.Google Scholar
  • McGrew JS, How JP, Williams B, Roy N (2010) Air-combat strategy using approximate dynamic programming. J. Guid. Control Dyn. 33(5):1641–1654.CrossrefGoogle Scholar
  • McWilliams B, Balduzzi D, Buhmann JM (2013) Correlated random features for fast semi-supervised learning. Adv. Neural Inf. Process. Syst. 26:440–448.Google Scholar
  • Micchelli CA, Xu Y, Zhang H (2006) Universal kernels. J. Mach. Learn. Res. 7(Dec):2651–2667.Google Scholar
  • Mladenov M, Boutilier C, Schuurmans D, Elidan G, Meshi O, Lu T (2017) Approximate linear programming for logistic Markov decision processes. Proc. Twenty-sixth Internat. Joint Conf. Artificial Intelligence (Melbourne, Australia), 2486–2493.Google Scholar
  • Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.CrossrefGoogle Scholar
  • Mohri M, Rostamizadeh A, Talwalkar A (2012) Foundations of Machine Learning, 1st ed. (MIT Press, Cambridge, MA).Google Scholar
  • Nadarajah S, Secomandi N (2023) A review of the operations literature on real options in energy. Eur. J. Oper. Res. 309(2):469–487.CrossrefGoogle Scholar
  • Nadarajah S, Margot F, Secomandi N (2015) Relaxations of approximate linear programs for the real option management of commodity storage. Management Sci. 61(12):3054–3076.LinkGoogle Scholar
  • Nersessian A (2019) Fourier tools are much more powerful than commonly thought. Lobachevskii J. Math. 40(8):1122–1131.CrossrefGoogle Scholar
  • Osband I, Van Roy B, Russo DJ, Wen Z (2019) Deep exploration via randomized value functions. J. Mach. Learn. Res. 20(124):1–62.Google Scholar
  • Peters J, Vijayakumar S, Schaal S (2003) Reinforcement learning for humanoid robotics. Proc. Third IEEE-RAS Internat. Conf. Humanoid Robots (Karlsruhe, Germany), 1–20.Google Scholar
  • Powell WB (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Rahimi A, Recht B (2007) Random features for large-scale kernel machines. Adv. Neural Inf. Process. Syst. 20:1177–1184.Google Scholar
  • Rahimi A, Recht B (2008a) Uniform approximation of functions with random bases. 2008 46th Annual Allerton Conf. Comm. Control Comput. (IEEE, Piscataway, NJ), 555–561.Google Scholar
  • Rahimi A, Recht B (2008b) Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. Adv. Neural Inf. Process. Syst. 21:1313–1320.Google Scholar
  • Schweitzer PJ, Seidmann A (1985) Generalized polynomial approximations in Markovian decision processes. J. Math. Anal. Appl. 110(2):568–582.CrossrefGoogle Scholar
  • Shahrampour S, Beirami A, Tarokh V (2018) On data-dependent random features for improved generalization in supervised learning. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (AAAI Press, Palo Alto, CA).Google Scholar
  • Shapiro A (2009) Semi-infinite programming, duality, discretization and optimality conditions. Optimization 58(2):133–161.CrossrefGoogle Scholar
  • Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, et al. (2017) Mastering the game of go without human knowledge. Nature 550(7676):354–359.CrossrefGoogle Scholar
  • Sinha A, Duchi JC (2016) Learning kernels with random features. Adv. Neural Inf. Process. Syst. 29:1298–1306.Google Scholar
  • Steimle LN, Denton BT (2017) Markov decision processes for screening and treatment of chronic diseases. Markov Decision Processes in Practice (Springer International Publishing, Cham, Switzerland), 189–222.CrossrefGoogle Scholar
  • Sun P, Wang K, Zipkin P (2014) Quadratic Approximation of Cost Functions in Lost Sales and Perishable Inventory Control Problems (Fuqua School of Business, Duke University, Durham, NC).Google Scholar
  • Tong C, Topaloglu H (2013) On the approximate linear programming approach for network revenue management problems. INFORMS J. Comput. 26(1):121–134.LinkGoogle Scholar
  • Van Ngai H, Kruger A, Théra M (2010) Stability of error bounds for semi-infinite convex constraint systems. SIAM J. Optim. 20(4):2080–2096.CrossrefGoogle Scholar
  • Wang D, Zeng J, Lin SB (2020) Random sketching for neural networks with ReLU. IEEE Trans. Neural Netw. Learn. Syst. 32(2):748–762.CrossrefGoogle Scholar
  • Wu L, Chen PY, Yen IEH, Xu F, Xia Y, Aggarwal C (2018) Scalable spectral clustering using random binning features. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 2506–2515.Google Scholar
  • Yang Q, Zhang J, Shi G, Hu J, Wu Y (2019) Maneuver decision of UAV in short-range air combat based on deep reinforcement learning. IEEE Access 8:363–378.CrossrefGoogle Scholar
  • Zipkin P (2008) On the structure of lost-sales inventory models. Oper. Res. 56(4):937–944.LinkGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.