
Articles In Advance

Information

Received: January 29, 2024
Accepted: March 30, 2026
Published Online: May 15, 2026

Cite as

Jingwei Ji, Renyuan Xu, Ruihao Zhu (2026) Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing. Operations Research 0(0).

https://doi.org/10.1287/opre.2024.0771

Acknowledgments

The authors thank the area editor, associate editor, and reviewers for helpful comments.
