Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes
References
- [1] (2013) Online learning in Markov decision processes with adversarially chosen transition probability distributions. Proc. 27th Annual Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 2508–2516.Google Scholar
- [2] (2017) Beating 1-1/e for ordered prophets. Proc. 49th Annual ACM SIGACT Sympos. Theory Comput. (ACM, New York), 61–71.Google Scholar
- [3] (2015) Fast algorithms for online stochastic convex programming. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms SODA ‘15 (ACM, New York), 1405–1424.Google Scholar
- [4] (2019) Bandits with global convex constraints and objective. Oper. Res. 67(5):1486–1502.Link, Google Scholar
- [5] (2018) Robust repeated auctions under heterogeneous buyer behavior. 19th ACM Conf. Econom. Comput. (ACM, New York), 171.Google Scholar
- [6] (2014) Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers. SIAM J. Comput. 43(2):930–972.Crossref, Google Scholar
- [7] (2013) Learning prices for repeated auctions with strategic buyers. Adv. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1169–1177.Google Scholar
- [8] (2014) Repeated contextual auctions with strategic buyers. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 622–630.Google Scholar
- [9] (2019) Nearly optimal pricing algorithms for production constrained and laminar Bayesian selection. Proc. 2019 ACM Conf. Econom. Comput. (ACM, New York), 91–92.Google Scholar
- [10] (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.Link, Google Scholar
- [11] (2012) Online bandit learning against an adaptive adversary: From regret to policy regret. Proc. 29th Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 1747–1754.Google Scholar
- [12] (2018) Policy regret in repeated games. Annual Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6733–6742.Google Scholar
- [13] (2009) Minimax policies for adversarial and stochastic bandits. COLT ‘09 22nd Conf. Learn. Theory (Montreal).Google Scholar
- [14] (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
- [15] (2014) Prophet inequalities with limited information. Proc. Twenty-Fifth Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 1358–1377.Google Scholar
- [16] (2018) Prophet secretary: Surpassing the 1-1/e barrier. Proc. 2018 ACM Conf. Econom. Comput. (ACM, New York), 303–318.Google Scholar
- [17] (2015) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):4:1–4:26.Google Scholar
- [18] (2015) A simple and approximately optimal mechanism for an additive buyer. ACM SIGecom Exchanges 13(2):31–35.Crossref, Google Scholar
- [19] (2018) Bandits with knapsacks. J. ACM 65(3):13:1–13:55.Crossref, Google Scholar
- [20] (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
- [21] (2018) Improved approximations for free-order prophets and second-price auctions. Preprint, submitted.Google Scholar
- [22] (2005) Near-optimal online auctions. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms SODA ‘05 (SIAM, Vancouver), 1156–1163.Google Scholar
- [23] (2004) Online learning in online auctions. Theoret. Comput. Sci. 324(2–3):137–146.Crossref, Google Scholar
- [24] (2020) Constrained episodic reinforcement learning in concave-convex and knapsack settings. Preprint, submitted June 9, https://doi.org/10.48550/arXiv.2006.05051.Google Scholar
- [25] (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.Link, Google Scholar
- [26] (2017) Online auctions and multi-scale online learning. Proc. 2017 ACM Conf. Econom. Comput. EC ‘17 (ACM, New York), 497–514.Google Scholar
- [27] (2019) Multi-scale online learning: Theory and applications to online auctions and pricing. J. Machine Learn. Res. 20(62):62:1–62:37.Google Scholar
- [28] (2012) Optimal multi-dimensional mechanism design: Reducing revenue to welfare maximization. 2012 IEEE 53rd Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 130–139.Google Scholar
- [29] (2016) A duality based unified approach to Bayesian mechanism design. Proc. Forty-Eighth Annual ACM Sympos. Theory Comput. (ACM, New York), 926–939.Google Scholar
- [30] (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [31] (2016) Mechanism design for subadditive agents via an ex ante relaxation. Proc. 2016 ACM Conf. Econom. Comput. (ACM, New York), 579–596.Google Scholar
- [32] (2017) Truth and regret in online scheduling. Proc. 2017 ACM Conf. Econom. Comput. EC ‘17 (ACM, New York), 423–440.Google Scholar
- [33] (2010) Multi-parameter mechanism design and sequential posted pricing. Proc. 42nd ACM Sympos. Theory Comput. STOC ‘10 (ACM, New York), 311–320.Google Scholar
- [34] (2017) Stability of service under time-of-use pricing. Proc. 49th Annual ACM SIGACT Sympos. Theory Comput. STOC ‘17 (ACM, New York), 184–197.Google Scholar
- [35] (2018) Prophet secretary through blind strategies. Preprint, submitted July 19, https://doi.org/10.48550/arXiv.1807.07483.Google Scholar
- [36] (2021) Prophet secretary through blind strategies. Math. Programming 190:483–521.Crossref, Google Scholar
- [37] (2017) Posted price mechanisms for a random stream of customers. Proc. 2017 ACM Conf. Econom. Comput. (ACM, New York), 169–186.Google Scholar
- [38] (2019) Recent developments in prophet inequalities. ACM SIGecom Exchanges 17(1):61–70.Crossref, Google Scholar
- [39] (2013) Better rates for any adversarial deterministic MDP. Proc. 30th Internat. Conf. Machine Learn. ICML ‘13, vol. 28 (JMLR), 675–683.Google Scholar
- [40] (2014) Bandits with switching costs: T2/3 regret. Sympos. Theory Comput. STOC ‘14 (ACM, New York), 459–467.Google Scholar
- [41] (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.Crossref, Google Scholar
- [42] (2019) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. J. ACM 66(1):7:1–7:41.Crossref, Google Scholar
- [43] (2017) Prophet inequalities made easy: Stochastic optimization by pricing non-stochastic inputs. 2017 IEEE 58rd Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 540–551.Google Scholar
- [44] (2016) Revenue gaps for discriminatory and anonymous sequential posted pricing. Preprint, submitted.Google Scholar
- [45] (2017) Prophet secretary. SIAM J. Discrete Math. 31(3):1685–1701.Crossref, Google Scholar
- [46] (2004) Experts in a Markov decision process. Adv. Neural Inform. Processing Systems 17 NIPS ‘04 (Curran Associates, Inc., Red Hook, NY), 401–408.Google Scholar
- [47] (2017) Chasing ghosts: Competing with stateful policies. SIAM J. Comput. 46(1):190–223.Crossref, Google Scholar
- [48] (2015) Combinatorial auctions via posted prices. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms SODA ‘15 (SIAM, San Diego), 123–135.Google Scholar
- [49] (2013) Simultaneous auctions are (almost) efficient. Proc. Forty-Fifth Annual ACM Sympos. Theory Comput. (ACM, New York), 201–210.Google Scholar
- [50] (2016) Online pricing with strategic and patient buyers. Annual Conf. Neural Inform. Processing Systems NIPS ‘16 (Springer, Barcelona, Spain), 3864–3872.Google Scholar
- [51] (1995) A decision-theoretic generalization of on-line learning and an application to boosting. Second Eur. Conf. Comput. Learn. Theory EuroCOLT ‘95, vol. 904 (Springer, Barcelona, Spain), 23–37.Google Scholar
- [52] (2014) Online independent set beyond the worst-case: Secretaries, prophets, and periods. Internat. Colloquium Automata Language Programming (Springer, Berlin), 508–519.Google Scholar
- [53] (2014) From minimax value to low-regret algorithms for online Markov decision processes. Amer. Control Conf. ACC ‘14 (IEEE, Piscataway, NJ), 471–476.Google Scholar
- [54] (2007) Automated online mechanism design and prophet inequalities. Proc. 22nd National Conf. Artificial Intelligence, vol. 1 (AAAI, Washington, DC), 58–65.Google Scholar
- [55] (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.Link, Google Scholar
- [56] (2019) Adversarial bandits with knapsacks. 60th IEEE Annual Sympos. Foundations Comput. Sci. FOCS 19 (IEEE, Piscataway, NJ), 202–219.Google Scholar
- [57] (2005) Efficient algorithms for online decision problems. J. Comput. System Sci. 71(3):291–307.Crossref, Google Scholar
- [58] (2014) Dynamic reserve prices for repeated auctions: Learning from bids. Internat. Conf. Web Internet Econom. (Springer, Berlin), 232.Google Scholar
- [59] (2014) Primal beats dual on online packing lps in the random-order model. Sympos. Theory Comput. STOC ‘14 (ACM, New York), 303–312.Google Scholar
- [60] (2012) Matroid prophet inequalities. Proc. Forty-Fourth Annual ACM Sympos. Theory Comput. (ACM, New York), 123–136.Google Scholar
- [61] (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Sympos. Foundations Comput. Sci. FOCS ‘03 (IEEE, Piscataway, NJ), 594–605.Google Scholar
- [62] (2018) Optimal online contention resolution schemes via ex-ante prophet inequalities. Preprint, submitted June 25, https://doi.org/10.48550/arXiv.1806.09251.Google Scholar
- [63] (2006) Combinatorial auctions with decreasing marginal utilities. Games Econom. Behav. 55(2):270–296.Crossref, Google Scholar
- [64] (2002) Truth revelation in approximately efficient combinatorial auctions. J. ACM 49(5):577–602.Crossref, Google Scholar
- [65] (2018) Contextual search via intrinsic volumes. 2018 IEEE 59th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 268–282.Google Scholar
- [66] (1989) The weighted majority algorithm. 30th Annual Sympos. Foundations Comput. Sci. FOCS ‘89 (IEEE, Piscataway, NJ), 256–261.Google Scholar
- [67] (2017) An economic view of prophet inequalities. ACM SIGecom Exchanges 16(1):24–47.Crossref, Google Scholar
- [68] (2021) Corruption robust exploration in episodic reinforcement learning. Proc. Machine Learn. Res., 1–4.Google Scholar
- [69] (2014) Optimal regret minimization in posted-price auctions with strategic buyers. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 1871–1879.Google Scholar
- [70] (2014) Online Markov decision processes under bandit feedback. IEEE Trans. Automatic Control 59(3):676–691.Crossref, Google Scholar
- [71] (2018) Prophet inequalities vs. approximating optimum online. Internat. Conf. Web Internet Econom. (Springer, Berlin), 356–374.Google Scholar
- [72] (2016) Beyond matroids: Secretary problem and prophet inequality with general constraints. Proc. Forty-Eighth Annual ACM Sympos. Theory Comput. (ACM, New York), 324–332.Google Scholar
- [73] (1984) Comparison of threshold stop rules and maximum for independent nonnegative random variables. Ann. Probab. 12(4):1213–1216.Crossref, Google Scholar
- [74] (2011) Mechanism design via correlation gap. Proc. Twenty-Second Annual ACM-SIAM Sympos. Discrete Algorithms (SIAM, Philadelphia, PA), 710–719.Google Scholar
- [75] (1977) Probabilistic computations: Toward a unified measure of complexity. 18th Annual Sympos. Foundations Comput. Sci. FOCS ‘77 (IEEE, Piscataway, NJ), 222–227.Google Scholar
- [76] (2009) Markov decision processes with arbitrary reward processes. Math. Oper. Res. 34(3):737–757.Link, Google Scholar
- [77] (2018) Occupation-oblivious pricing of cloud jobs via online learning. 2018 IEEE Conf. Comput. Comm. INFOCOM ‘18 (IEEE, Piscataway, NJ), 2456–2464.Google Scholar

