Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Yuval Emek
Yuval Emek
[email protected]
Faculty of Data and Decision Sciences, Technion—Israel Institute of Technology, 3200003 Haifa, Israel;
Search for more papers by this author
,
Ron Lavi
Ron Lavi
[email protected]
Faculty of Data and Decision Sciences, Technion—Israel Institute of Technology, 3200003 Haifa, Israel;Department of Economics, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom;
Search for more papers by this author
,
Rad Niazadeh
Rad Niazadeh
[email protected]
https://orcid.org/0000-0002-5880-6221
Booth School of Business, University of Chicago, Chicago, Illinois 60637;
Search for more papers by this author
,
Yangguang Shi
Corresponding Author
Yangguang Shi
[email protected]
https://orcid.org/0000-0003-2923-2748
Booth School of Business, University of Chicago, Chicago, Illinois 60637;
Search for more papers by this author

Faculty of Data and Decision Sciences, Technion—Israel Institute of Technology, 3200003 Haifa, Israel;

Faculty of Data and Decision Sciences, Technion—Israel Institute of Technology, 3200003 Haifa, Israel;Department of Economics, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom;

Search for more papers by this author

Rad Niazadeh

[email protected]

https://orcid.org/0000-0002-5880-6221

Booth School of Business, University of Chicago, Chicago, Illinois 60637;

Search for more papers by this author

Yangguang Shi

Corresponding Author

Yangguang Shi

[email protected]

https://orcid.org/0000-0003-2923-2748

Booth School of Business, University of Chicago, Chicago, Illinois 60637;

Search for more papers by this author

Published Online:7 Jun 2023https://doi.org/10.1287/moor.2023.1375

References

[1] Abbasi-Yadkori Y, Bartlett PL, Kanade V, Seldin Y, Szepesvári C (2013) Online learning in Markov decision processes with adversarially chosen transition probability distributions. Proc. 27th Annual Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 2508–2516.Google Scholar
[2] Abolhassani M, Ehsani S, Esfandiari H, HajiAghayi M, Kleinberg R, Lucier B (2017) Beating 1-1/e for ordered prophets. Proc. 49th Annual ACM SIGACT Sympos. Theory Comput. (ACM, New York), 61–71.Google Scholar
[3] Agrawal S, Devanur NR (2015) Fast algorithms for online stochastic convex programming. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms SODA ‘15 (ACM, New York), 1405–1424.Google Scholar
[4] Agrawal S, Devanur NR (2019) Bandits with global convex constraints and objective. Oper. Res. 67(5):1486–1502.Link, Google Scholar
[5] Agrawal S, Daskalakis C, Mirrokni V, Sivan B (2018) Robust repeated auctions under heterogeneous buyer behavior. 19th ACM Conf. Econom. Comput. (ACM, New York), 171.Google Scholar
[6] Alaei S (2014) Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers. SIAM J. Comput. 43(2):930–972.Crossref, Google Scholar
[7] Amin K, Rostamizadeh A, Syed U (2013) Learning prices for repeated auctions with strategic buyers. Adv. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1169–1177.Google Scholar
[8] Amin K, Rostamizadeh A, Syed U (2014) Repeated contextual auctions with strategic buyers. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 622–630.Google Scholar
[9] Anari N, Niazadeh R, Saberi A, Shameli A (2019) Nearly optimal pricing algorithms for production constrained and laminar Bayesian selection. Proc. 2019 ACM Conf. Econom. Comput. (ACM, New York), 91–92.Google Scholar
[10] Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.Link, Google Scholar
[11] Arora R, Dekel O, Tewari A (2012) Online bandit learning against an adaptive adversary: From regret to policy regret. Proc. 29th Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 1747–1754.Google Scholar
[12] Arora R, Dinitz M, Marinov TV, Mohri M (2018) Policy regret in repeated games. Annual Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6733–6742.Google Scholar
[13] Audibert J, Bubeck S (2009) Minimax policies for adversarial and stochastic bandits. COLT ‘09 22nd Conf. Learn. Theory (Montreal).Google Scholar
[14] Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
[15] Azar PD, Kleinberg R, Weinberg SM (2014) Prophet inequalities with limited information. Proc. Twenty-Fifth Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 1358–1377.Google Scholar
[16] Azar Y, Chiplunkar A, Kaplan H (2018) Prophet secretary: Surpassing the 1-1/e barrier. Proc. 2018 ACM Conf. Econom. Comput. (ACM, New York), 303–318.Google Scholar
[17] Babaioff M, Dughmi S, Kleinberg RD, Slivkins A (2015) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):4:1–4:26.Google Scholar
[18] Babaioff M, Immorlica N, Lucier B, Weinberg SM (2015) A simple and approximately optimal mechanism for an additive buyer. ACM SIGecom Exchanges 13(2):31–35.Crossref, Google Scholar
[19] Badanidiyuru A, Kleinberg R, Slivkins A (2018) Bandits with knapsacks. J. ACM 65(3):13:1–13:55.Crossref, Google Scholar
[20] Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
[21] Beyhaghi H, Golrezaei N, Leme RP, Pal M, Siva B (2018) Improved approximations for free-order prophets and second-price auctions. Preprint, submitted.Google Scholar
[22] Blum A, Hartline JD (2005) Near-optimal online auctions. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms SODA ‘05 (SIAM, Vancouver), 1156–1163.Google Scholar
[23] Blum A, Kumar V, Rudra A, Wu F (2004) Online learning in online auctions. Theoret. Comput. Sci. 324(2–3):137–146.Crossref, Google Scholar
[24] Brantley K, Dudík M, Lykouris T, Miryoosefi S, Simchowitz M, Slivkins A, Sun W (2020) Constrained episodic reinforcement learning in concave-convex and knapsack settings. Preprint, submitted June 9, https://doi.org/10.48550/arXiv.2006.05051.Google Scholar
[25] Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.Link, Google Scholar
[26] Bubeck S, Devanur NR, Huang Z, Niazadeh R (2017) Online auctions and multi-scale online learning. Proc. 2017 ACM Conf. Econom. Comput. EC ‘17 (ACM, New York), 497–514.Google Scholar
[27] Bubeck S, Devanur NR, Huang Z, Niazadeh R (2019) Multi-scale online learning: Theory and applications to online auctions and pricing. J. Machine Learn. Res. 20(62):62:1–62:37.Google Scholar
[28] Cai Y, Daskalakis C, Weinberg SM (2012) Optimal multi-dimensional mechanism design: Reducing revenue to welfare maximization. 2012 IEEE 53rd Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 130–139.Google Scholar
[29] Cai Y, Devanur NR, Weinberg SM (2016) A duality based unified approach to Bayesian mechanism design. Proc. Forty-Eighth Annual ACM Sympos. Theory Comput. (ACM, New York), 926–939.Google Scholar
[30] Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[31] Chawla S, Miller JB (2016) Mechanism design for subadditive agents via an ex ante relaxation. Proc. 2016 ACM Conf. Econom. Comput. (ACM, New York), 579–596.Google Scholar
[32] Chawla S, Devanur NR, Kulkarni J, Niazadeh R (2017) Truth and regret in online scheduling. Proc. 2017 ACM Conf. Econom. Comput. EC ‘17 (ACM, New York), 423–440.Google Scholar
[33] Chawla S, Hartline JD, Malec DL, Sivan B (2010) Multi-parameter mechanism design and sequential posted pricing. Proc. 42nd ACM Sympos. Theory Comput. STOC ‘10 (ACM, New York), 311–320.Google Scholar
[34] Chawla S, Devanur NR, Holroyd AE, Karlin AR, Martin JB, Sivan B (2017) Stability of service under time-of-use pricing. Proc. 49th Annual ACM SIGACT Sympos. Theory Comput. STOC ‘17 (ACM, New York), 184–197.Google Scholar
[35] Correa J, Saona R, Ziliotto B (2018) Prophet secretary through blind strategies. Preprint, submitted July 19, https://doi.org/10.48550/arXiv.1807.07483.Google Scholar
[36] Correa J, Saona R, Ziliotto B (2021) Prophet secretary through blind strategies. Math. Programming 190:483–521.Crossref, Google Scholar
[37] Correa J, Foncea P, Hoeksma R, Oosterwijk T, Vredeveld T (2017) Posted price mechanisms for a random stream of customers. Proc. 2017 ACM Conf. Econom. Comput. (ACM, New York), 169–186.Google Scholar
[38] Correa J, Foncea P, Hoeksma R, Oosterwijk T, Vredeveld T (2019) Recent developments in prophet inequalities. ACM SIGecom Exchanges 17(1):61–70.Crossref, Google Scholar
[39] Dekel O, Hazan E (2013) Better rates for any adversarial deterministic MDP. Proc. 30th Internat. Conf. Machine Learn. ICML ‘13, vol. 28 (JMLR), 675–683.Google Scholar
[40] Dekel O, Ding J, Koren T, Peres Y (2014) Bandits with switching costs: T2/3 regret. Sympos. Theory Comput. STOC ‘14 (ACM, New York), 459–467.Google Scholar
[41] den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.Crossref, Google Scholar
[42] Devanur NR, Jain K, Sivan B, Wilkens CA (2019) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. J. ACM 66(1):7:1–7:41.Crossref, Google Scholar
[43] Duetting P, Feldman M, Kesselheim T, Lucier B (2017) Prophet inequalities made easy: Stochastic optimization by pricing non-stochastic inputs. 2017 IEEE 58rd Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 540–551.Google Scholar
[44] Dütting P, Fischer F, Klimm M (2016) Revenue gaps for discriminatory and anonymous sequential posted pricing. Preprint, submitted.Google Scholar
[45] Esfandiari H, Hajiaghayi M, Liaghat V, Monemizadeh M (2017) Prophet secretary. SIAM J. Discrete Math. 31(3):1685–1701.Crossref, Google Scholar
[46] Even-Dar E, Kakade SM, Mansour Y (2004) Experts in a Markov decision process. Adv. Neural Inform. Processing Systems 17 NIPS ‘04 (Curran Associates, Inc., Red Hook, NY), 401–408.Google Scholar
[47] Feige U, Koren T, Tennenholtz M (2017) Chasing ghosts: Competing with stateful policies. SIAM J. Comput. 46(1):190–223.Crossref, Google Scholar
[48] Feldman M, Gravin N, Lucier B (2015) Combinatorial auctions via posted prices. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms SODA ‘15 (SIAM, San Diego), 123–135.Google Scholar
[49] Feldman M, Fu H, Gravin N, Lucier B (2013) Simultaneous auctions are (almost) efficient. Proc. Forty-Fifth Annual ACM Sympos. Theory Comput. (ACM, New York), 201–210.Google Scholar
[50] Feldman M, Koren T, Livni R, Mansour Y, Zohar A (2016) Online pricing with strategic and patient buyers. Annual Conf. Neural Inform. Processing Systems NIPS ‘16 (Springer, Barcelona, Spain), 3864–3872.Google Scholar
[51] Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. Second Eur. Conf. Comput. Learn. Theory EuroCOLT ‘95, vol. 904 (Springer, Barcelona, Spain), 23–37.Google Scholar
[52] Göbel O, Hoefer M, Kesselheim T, Schleiden T, Vöcking B (2014) Online independent set beyond the worst-case: Secretaries, prophets, and periods. Internat. Colloquium Automata Language Programming (Springer, Berlin), 508–519.Google Scholar
[53] Guan P, Raginsky M, Willett R (2014) From minimax value to low-regret algorithms for online Markov decision processes. Amer. Control Conf. ACC ‘14 (IEEE, Piscataway, NJ), 471–476.Google Scholar
[54] Hajiaghayi MT, Kleinberg R, Sandholm T (2007) Automated online mechanism design and prophet inequalities. Proc. 22nd National Conf. Artificial Intelligence, vol. 1 (AAAI, Washington, DC), 58–65.Google Scholar
[55] Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.Link, Google Scholar
[56] Immorlica N, Sankararaman KA, Schapire RE, Slivkins A (2019) Adversarial bandits with knapsacks. 60th IEEE Annual Sympos. Foundations Comput. Sci. FOCS 19 (IEEE, Piscataway, NJ), 202–219.Google Scholar
[57] Kalai AT, Vempala SS (2005) Efficient algorithms for online decision problems. J. Comput. System Sci. 71(3):291–307.Crossref, Google Scholar
[58] Kanoria Y, Nazerzadeh H (2014) Dynamic reserve prices for repeated auctions: Learning from bids. Internat. Conf. Web Internet Econom. (Springer, Berlin), 232.Google Scholar
[59] Kesselheim T, Radke K, Tönnis A, Vöcking B (2014) Primal beats dual on online packing lps in the random-order model. Sympos. Theory Comput. STOC ‘14 (ACM, New York), 303–312.Google Scholar
[60] Kleinberg R, Weinberg SM (2012) Matroid prophet inequalities. Proc. Forty-Fourth Annual ACM Sympos. Theory Comput. (ACM, New York), 123–136.Google Scholar
[61] Kleinberg RD, Leighton FT (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Sympos. Foundations Comput. Sci. FOCS ‘03 (IEEE, Piscataway, NJ), 594–605.Google Scholar
[62] Lee E, Singla S (2018) Optimal online contention resolution schemes via ex-ante prophet inequalities. Preprint, submitted June 25, https://doi.org/10.48550/arXiv.1806.09251.Google Scholar
[63] Lehmann B, Lehmann D, Nisan N (2006) Combinatorial auctions with decreasing marginal utilities. Games Econom. Behav. 55(2):270–296.Crossref, Google Scholar
[64] Lehmann D, O’Callaghan L, Shoham Y (2002) Truth revelation in approximately efficient combinatorial auctions. J. ACM 49(5):577–602.Crossref, Google Scholar
[65] Leme RP, Schneider J (2018) Contextual search via intrinsic volumes. 2018 IEEE 59th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 268–282.Google Scholar
[66] Littlestone N, Warmuth MK (1989) The weighted majority algorithm. 30th Annual Sympos. Foundations Comput. Sci. FOCS ‘89 (IEEE, Piscataway, NJ), 256–261.Google Scholar
[67] Lucier B (2017) An economic view of prophet inequalities. ACM SIGecom Exchanges 16(1):24–47.Crossref, Google Scholar
[68] Lykouris T, Simchowitz M, Slivkins A, Sun W (2021) Corruption robust exploration in episodic reinforcement learning. Proc. Machine Learn. Res., 1–4.Google Scholar
[69] Mohri M, Munoz A (2014) Optimal regret minimization in posted-price auctions with strategic buyers. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 1871–1879.Google Scholar
[70] Neu G, György A, Szepesvári C, Antos A (2014) Online Markov decision processes under bandit feedback. IEEE Trans. Automatic Control 59(3):676–691.Crossref, Google Scholar
[71] Niazadeh R, Saberi A, Shameli A (2018) Prophet inequalities vs. approximating optimum online. Internat. Conf. Web Internet Econom. (Springer, Berlin), 356–374.Google Scholar
[72] Rubinstein A (2016) Beyond matroids: Secretary problem and prophet inequality with general constraints. Proc. Forty-Eighth Annual ACM Sympos. Theory Comput. (ACM, New York), 324–332.Google Scholar
[73] Samuel-Cahn E (1984) Comparison of threshold stop rules and maximum for independent nonnegative random variables. Ann. Probab. 12(4):1213–1216.Crossref, Google Scholar
[74] Yan Q (2011) Mechanism design via correlation gap. Proc. Twenty-Second Annual ACM-SIAM Sympos. Discrete Algorithms (SIAM, Philadelphia, PA), 710–719.Google Scholar
[75] Yao AC (1977) Probabilistic computations: Toward a unified measure of complexity. 18th Annual Sympos. Foundations Comput. Sci. FOCS ‘77 (IEEE, Piscataway, NJ), 222–227.Google Scholar
[76] Yu JY, Mannor S, Shimkin N (2009) Markov decision processes with arbitrary reward processes. Math. Oper. Res. 34(3):737–757.Link, Google Scholar
[77] Zhang X, Wu C, Huang Z, Li Z (2018) Occupation-oblivious pricing of cloud jobs via online learning. 2018 IEEE Conf. Comput. Comm. INFOCOM ‘18 (IEEE, Piscataway, NJ), 2456–2464.Google Scholar

cover image Mathematics of Operations Research

Volume 49, Issue 2

May 2024

Pages 653-1302, C2

Article Information

Metrics

Information

Received:December 14, 2020
Accepted:March 19, 2023
Published Online:June 07, 2023

Cite as

Yuval Emek, Ron Lavi, Rad Niazadeh, Yangguang Shi (2023) Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes. Mathematics of Operations Research 49(2):880-900.

https://doi.org/10.1287/moor.2023.1375

Keywords

Acknowledgments

An earlier conference version of this work was presented at the 34th Conference on Neural Information Processing Systems, and an extended abstract of this paper appeared in the Proceedings of the 34th Conference on Neural Information Processing Systems.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

References

Volume 49, Issue 2

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News