A New Likelihood Ratio Method for Training Artificial Neural Networks

Yijie Peng
https://orcid.org/0000-0003-2584-8131
Department of Management Science and Information Systems, Guanghua School of Management, Peking University, Beijing 100871, China;
Li Xiao
Corresponding Author
https://orcid.org/0000-0002-3063-0869
Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China;
Bernd Heidergott
Department of Operations Analytics, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, Netherlands;
L. Jeff Hong
https://orcid.org/0000-0001-7011-4001
Department of Management Science, School of Management, Fudan University, Shanghai 200433, China;
Henry Lam
https://orcid.org/0000-0002-3193-563X
Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027

Published Online: 17 Sep 2021
https://doi.org/10.1287/ijoc.2021.1088

INFORMS Journal on Computing

Volume 34, Issue 1

January-February 2022

Pages 1-669, C2

Article Information

Received: November 30, 2019
Accepted: April 01, 2021
Published Online: September 17, 2021

Cite as

Yijie Peng, Li Xiao, Bernd Heidergott, L. Jeff Hong, Henry Lam (2021) A New Likelihood Ratio Method for Training Artificial Neural Networks. INFORMS Journal on Computing 34(1):638-655.

https://doi.org/10.1287/ijoc.2021.1088
