Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Published Online:https://doi.org/10.1287/moor.2023.1375

References

  • [1] Abbasi-Yadkori Y, Bartlett PL, Kanade V, Seldin Y, Szepesvári C (2013) Online learning in Markov decision processes with adversarially chosen transition probability distributions. Proc. 27th Annual Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 2508–2516.Google Scholar
  • [2] Abolhassani M, Ehsani S, Esfandiari H, HajiAghayi M, Kleinberg R, Lucier B (2017) Beating 1-1/e for ordered prophets. Proc. 49th Annual ACM SIGACT Sympos. Theory Comput. (ACM, New York), 61–71.Google Scholar
  • [3] Agrawal S, Devanur NR (2015) Fast algorithms for online stochastic convex programming. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms SODA ‘15 (ACM, New York), 1405–1424.Google Scholar
  • [4] Agrawal S, Devanur NR (2019) Bandits with global convex constraints and objective. Oper. Res. 67(5):1486–1502.LinkGoogle Scholar
  • [5] Agrawal S, Daskalakis C, Mirrokni V, Sivan B (2018) Robust repeated auctions under heterogeneous buyer behavior. 19th ACM Conf. Econom. Comput. (ACM, New York), 171.Google Scholar
  • [6] Alaei S (2014) Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers. SIAM J. Comput. 43(2):930–972.CrossrefGoogle Scholar
  • [7] Amin K, Rostamizadeh A, Syed U (2013) Learning prices for repeated auctions with strategic buyers. Adv. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1169–1177.Google Scholar
  • [8] Amin K, Rostamizadeh A, Syed U (2014) Repeated contextual auctions with strategic buyers. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 622–630.Google Scholar
  • [9] Anari N, Niazadeh R, Saberi A, Shameli A (2019) Nearly optimal pricing algorithms for production constrained and laminar Bayesian selection. Proc. 2019 ACM Conf. Econom. Comput. (ACM, New York), 91–92.Google Scholar
  • [10] Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.LinkGoogle Scholar
  • [11] Arora R, Dekel O, Tewari A (2012) Online bandit learning against an adaptive adversary: From regret to policy regret. Proc. 29th Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 1747–1754.Google Scholar
  • [12] Arora R, Dinitz M, Marinov TV, Mohri M (2018) Policy regret in repeated games. Annual Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6733–6742.Google Scholar
  • [13] Audibert J, Bubeck S (2009) Minimax policies for adversarial and stochastic bandits. COLT ‘09 22nd Conf. Learn. Theory (Montreal).Google Scholar
  • [14] Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.CrossrefGoogle Scholar
  • [15] Azar PD, Kleinberg R, Weinberg SM (2014) Prophet inequalities with limited information. Proc. Twenty-Fifth Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 1358–1377.Google Scholar
  • [16] Azar Y, Chiplunkar A, Kaplan H (2018) Prophet secretary: Surpassing the 1-1/e barrier. Proc. 2018 ACM Conf. Econom. Comput. (ACM, New York), 303–318.Google Scholar
  • [17] Babaioff M, Dughmi S, Kleinberg RD, Slivkins A (2015) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):4:1–4:26.Google Scholar
  • [18] Babaioff M, Immorlica N, Lucier B, Weinberg SM (2015) A simple and approximately optimal mechanism for an additive buyer. ACM SIGecom Exchanges 13(2):31–35.CrossrefGoogle Scholar
  • [19] Badanidiyuru A, Kleinberg R, Slivkins A (2018) Bandits with knapsacks. J. ACM 65(3):13:1–13:55.CrossrefGoogle Scholar
  • [20] Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.LinkGoogle Scholar
  • [21] Beyhaghi H, Golrezaei N, Leme RP, Pal M, Siva B (2018) Improved approximations for free-order prophets and second-price auctions. Preprint, submitted.Google Scholar
  • [22] Blum A, Hartline JD (2005) Near-optimal online auctions. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms SODA ‘05 (SIAM, Vancouver), 1156–1163.Google Scholar
  • [23] Blum A, Kumar V, Rudra A, Wu F (2004) Online learning in online auctions. Theoret. Comput. Sci. 324(2–3):137–146.CrossrefGoogle Scholar
  • [24] Brantley K, Dudík M, Lykouris T, Miryoosefi S, Simchowitz M, Slivkins A, Sun W (2020) Constrained episodic reinforcement learning in concave-convex and knapsack settings. Preprint, submitted June 9, https://doi.org/10.48550/arXiv.2006.05051.Google Scholar
  • [25] Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.LinkGoogle Scholar
  • [26] Bubeck S, Devanur NR, Huang Z, Niazadeh R (2017) Online auctions and multi-scale online learning. Proc. 2017 ACM Conf. Econom. Comput. EC ‘17 (ACM, New York), 497–514.Google Scholar
  • [27] Bubeck S, Devanur NR, Huang Z, Niazadeh R (2019) Multi-scale online learning: Theory and applications to online auctions and pricing. J. Machine Learn. Res. 20(62):62:1–62:37.Google Scholar
  • [28] Cai Y, Daskalakis C, Weinberg SM (2012) Optimal multi-dimensional mechanism design: Reducing revenue to welfare maximization. 2012 IEEE 53rd Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 130–139.Google Scholar
  • [29] Cai Y, Devanur NR, Weinberg SM (2016) A duality based unified approach to Bayesian mechanism design. Proc. Forty-Eighth Annual ACM Sympos. Theory Comput. (ACM, New York), 926–939.Google Scholar
  • [30] Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [31] Chawla S, Miller JB (2016) Mechanism design for subadditive agents via an ex ante relaxation. Proc. 2016 ACM Conf. Econom. Comput. (ACM, New York), 579–596.Google Scholar
  • [32] Chawla S, Devanur NR, Kulkarni J, Niazadeh R (2017) Truth and regret in online scheduling. Proc. 2017 ACM Conf. Econom. Comput. EC ‘17 (ACM, New York), 423–440.Google Scholar
  • [33] Chawla S, Hartline JD, Malec DL, Sivan B (2010) Multi-parameter mechanism design and sequential posted pricing. Proc. 42nd ACM Sympos. Theory Comput. STOC ‘10 (ACM, New York), 311–320.Google Scholar
  • [34] Chawla S, Devanur NR, Holroyd AE, Karlin AR, Martin JB, Sivan B (2017) Stability of service under time-of-use pricing. Proc. 49th Annual ACM SIGACT Sympos. Theory Comput. STOC ‘17 (ACM, New York), 184–197.Google Scholar
  • [35] Correa J, Saona R, Ziliotto B (2018) Prophet secretary through blind strategies. Preprint, submitted July 19, https://doi.org/10.48550/arXiv.1807.07483.Google Scholar
  • [36] Correa J, Saona R, Ziliotto B (2021) Prophet secretary through blind strategies. Math. Programming 190:483–521.CrossrefGoogle Scholar
  • [37] Correa J, Foncea P, Hoeksma R, Oosterwijk T, Vredeveld T (2017) Posted price mechanisms for a random stream of customers. Proc. 2017 ACM Conf. Econom. Comput. (ACM, New York), 169–186.Google Scholar
  • [38] Correa J, Foncea P, Hoeksma R, Oosterwijk T, Vredeveld T (2019) Recent developments in prophet inequalities. ACM SIGecom Exchanges 17(1):61–70.CrossrefGoogle Scholar
  • [39] Dekel O, Hazan E (2013) Better rates for any adversarial deterministic MDP. Proc. 30th Internat. Conf. Machine Learn. ICML ‘13, vol. 28 (JMLR), 675–683.Google Scholar
  • [40] Dekel O, Ding J, Koren T, Peres Y (2014) Bandits with switching costs: T2/3 regret. Sympos. Theory Comput. STOC ‘14 (ACM, New York), 459–467.Google Scholar
  • [41] den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.CrossrefGoogle Scholar
  • [42] Devanur NR, Jain K, Sivan B, Wilkens CA (2019) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. J. ACM 66(1):7:1–7:41.CrossrefGoogle Scholar
  • [43] Duetting P, Feldman M, Kesselheim T, Lucier B (2017) Prophet inequalities made easy: Stochastic optimization by pricing non-stochastic inputs. 2017 IEEE 58rd Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 540–551.Google Scholar
  • [44] Dütting P, Fischer F, Klimm M (2016) Revenue gaps for discriminatory and anonymous sequential posted pricing. Preprint, submitted.Google Scholar
  • [45] Esfandiari H, Hajiaghayi M, Liaghat V, Monemizadeh M (2017) Prophet secretary. SIAM J. Discrete Math. 31(3):1685–1701.CrossrefGoogle Scholar
  • [46] Even-Dar E, Kakade SM, Mansour Y (2004) Experts in a Markov decision process. Adv. Neural Inform. Processing Systems 17 NIPS ‘04 (Curran Associates, Inc., Red Hook, NY), 401–408.Google Scholar
  • [47] Feige U, Koren T, Tennenholtz M (2017) Chasing ghosts: Competing with stateful policies. SIAM J. Comput. 46(1):190–223.CrossrefGoogle Scholar
  • [48] Feldman M, Gravin N, Lucier B (2015) Combinatorial auctions via posted prices. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms SODA ‘15 (SIAM, San Diego), 123–135.Google Scholar
  • [49] Feldman M, Fu H, Gravin N, Lucier B (2013) Simultaneous auctions are (almost) efficient. Proc. Forty-Fifth Annual ACM Sympos. Theory Comput. (ACM, New York), 201–210.Google Scholar
  • [50] Feldman M, Koren T, Livni R, Mansour Y, Zohar A (2016) Online pricing with strategic and patient buyers. Annual Conf. Neural Inform. Processing Systems NIPS ‘16 (Springer, Barcelona, Spain), 3864–3872.Google Scholar
  • [51] Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. Second Eur. Conf. Comput. Learn. Theory EuroCOLT ‘95, vol. 904 (Springer, Barcelona, Spain), 23–37.Google Scholar
  • [52] Göbel O, Hoefer M, Kesselheim T, Schleiden T, Vöcking B (2014) Online independent set beyond the worst-case: Secretaries, prophets, and periods. Internat. Colloquium Automata Language Programming (Springer, Berlin), 508–519.Google Scholar
  • [53] Guan P, Raginsky M, Willett R (2014) From minimax value to low-regret algorithms for online Markov decision processes. Amer. Control Conf. ACC ‘14 (IEEE, Piscataway, NJ), 471–476.Google Scholar
  • [54] Hajiaghayi MT, Kleinberg R, Sandholm T (2007) Automated online mechanism design and prophet inequalities. Proc. 22nd National Conf. Artificial Intelligence, vol. 1 (AAAI, Washington, DC), 58–65.Google Scholar
  • [55] Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.LinkGoogle Scholar
  • [56] Immorlica N, Sankararaman KA, Schapire RE, Slivkins A (2019) Adversarial bandits with knapsacks. 60th IEEE Annual Sympos. Foundations Comput. Sci. FOCS 19 (IEEE, Piscataway, NJ), 202–219.Google Scholar
  • [57] Kalai AT, Vempala SS (2005) Efficient algorithms for online decision problems. J. Comput. System Sci. 71(3):291–307.CrossrefGoogle Scholar
  • [58] Kanoria Y, Nazerzadeh H (2014) Dynamic reserve prices for repeated auctions: Learning from bids. Internat. Conf. Web Internet Econom. (Springer, Berlin), 232.Google Scholar
  • [59] Kesselheim T, Radke K, Tönnis A, Vöcking B (2014) Primal beats dual on online packing lps in the random-order model. Sympos. Theory Comput. STOC ‘14 (ACM, New York), 303–312.Google Scholar
  • [60] Kleinberg R, Weinberg SM (2012) Matroid prophet inequalities. Proc. Forty-Fourth Annual ACM Sympos. Theory Comput. (ACM, New York), 123–136.Google Scholar
  • [61] Kleinberg RD, Leighton FT (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Sympos. Foundations Comput. Sci. FOCS ‘03 (IEEE, Piscataway, NJ), 594–605.Google Scholar
  • [62] Lee E, Singla S (2018) Optimal online contention resolution schemes via ex-ante prophet inequalities. Preprint, submitted June 25, https://doi.org/10.48550/arXiv.1806.09251.Google Scholar
  • [63] Lehmann B, Lehmann D, Nisan N (2006) Combinatorial auctions with decreasing marginal utilities. Games Econom. Behav. 55(2):270–296.CrossrefGoogle Scholar
  • [64] Lehmann D, O’Callaghan L, Shoham Y (2002) Truth revelation in approximately efficient combinatorial auctions. J. ACM 49(5):577–602.CrossrefGoogle Scholar
  • [65] Leme RP, Schneider J (2018) Contextual search via intrinsic volumes. 2018 IEEE 59th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 268–282.Google Scholar
  • [66] Littlestone N, Warmuth MK (1989) The weighted majority algorithm. 30th Annual Sympos. Foundations Comput. Sci. FOCS ‘89 (IEEE, Piscataway, NJ), 256–261.Google Scholar
  • [67] Lucier B (2017) An economic view of prophet inequalities. ACM SIGecom Exchanges 16(1):24–47.CrossrefGoogle Scholar
  • [68] Lykouris T, Simchowitz M, Slivkins A, Sun W (2021) Corruption robust exploration in episodic reinforcement learning. Proc. Machine Learn. Res., 1–4.Google Scholar
  • [69] Mohri M, Munoz A (2014) Optimal regret minimization in posted-price auctions with strategic buyers. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 1871–1879.Google Scholar
  • [70] Neu G, György A, Szepesvári C, Antos A (2014) Online Markov decision processes under bandit feedback. IEEE Trans. Automatic Control 59(3):676–691.CrossrefGoogle Scholar
  • [71] Niazadeh R, Saberi A, Shameli A (2018) Prophet inequalities vs. approximating optimum online. Internat. Conf. Web Internet Econom. (Springer, Berlin), 356–374.Google Scholar
  • [72] Rubinstein A (2016) Beyond matroids: Secretary problem and prophet inequality with general constraints. Proc. Forty-Eighth Annual ACM Sympos. Theory Comput. (ACM, New York), 324–332.Google Scholar
  • [73] Samuel-Cahn E (1984) Comparison of threshold stop rules and maximum for independent nonnegative random variables. Ann. Probab. 12(4):1213–1216.CrossrefGoogle Scholar
  • [74] Yan Q (2011) Mechanism design via correlation gap. Proc. Twenty-Second Annual ACM-SIAM Sympos. Discrete Algorithms (SIAM, Philadelphia, PA), 710–719.Google Scholar
  • [75] Yao AC (1977) Probabilistic computations: Toward a unified measure of complexity. 18th Annual Sympos. Foundations Comput. Sci. FOCS ‘77 (IEEE, Piscataway, NJ), 222–227.Google Scholar
  • [76] Yu JY, Mannor S, Shimkin N (2009) Markov decision processes with arbitrary reward processes. Math. Oper. Res. 34(3):737–757.LinkGoogle Scholar
  • [77] Zhang X, Wu C, Huang Z, Li Z (2018) Occupation-oblivious pricing of cloud jobs via online learning. 2018 IEEE Conf. Comput. Comm. INFOCOM ‘18 (IEEE, Piscataway, NJ), 2456–2464.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.