An Evolutionary Random Policy Search Algorithm for Solving Markov Decision Processes

Jiaqiao Hu
Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, New York 11794, USA

Michael C. Fu
Robert H. Smith School of Business and Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA

Vahid R. Ramezani
Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA

Steven I. Marcus
Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA

Published Online: 1 May 2007
https://doi.org/10.1287/ijoc.1050.0155

Abstract

This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
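The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the policy-improvement variant or the sampling scheme, so the sketch substitutes a per-state switching rule (copy the action of whichever population policy has the lowest cost-to-go at that state) and a simple mix of uniform sampling and local perturbation. The toy MDP (`cost`, `transition`, five states, actions in [0, 1]), the discount factor, and every parameter name (`exploit_prob`, `radius`, `size`) are assumptions made for illustration only.

```python
import random

GAMMA = 0.9      # discount factor (assumed)
N_STATES = 5     # toy state space {0, ..., 4} (assumed)


def cost(s, a):
    """Hypothetical one-stage cost: holding cost s plus effort cost 2*a^2."""
    return s + 2.0 * a * a


def transition(s, a):
    """Hypothetical dynamics: with probability a move down a state, else move up."""
    down, up = max(s - 1, 0), min(s + 1, N_STATES - 1)
    probs = [0.0] * N_STATES
    probs[down] += a
    probs[up] += 1.0 - a
    return probs


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation: discounted cost-to-go of a fixed policy."""
    J = [0.0] * N_STATES
    while True:
        new_J = [
            cost(s, policy[s])
            + GAMMA * sum(p * J[t] for t, p in enumerate(transition(s, policy[s])))
            for s in range(N_STATES)
        ]
        if max(abs(x - y) for x, y in zip(new_J, J)) < tol:
            return new_J
        J = new_J


def elite_policy(population):
    """Per state, copy the action of the policy with the lowest cost-to-go there."""
    values = [evaluate(pi) for pi in population]
    elite = []
    for s in range(N_STATES):
        best = min(range(len(population)), key=lambda k: values[k][s])
        elite.append(population[best][s])
    return elite


def next_population(elite, size, exploit_prob, radius, rng):
    """Keep the elite; fill the rest by local search or uniform sampling."""
    population = [list(elite)]
    for _ in range(size - 1):
        policy = []
        for s in range(N_STATES):
            if rng.random() < exploit_prob:
                # Local search: perturb the elite action, clipped to [0, 1].
                a = min(1.0, max(0.0, elite[s] + rng.uniform(-radius, radius)))
            else:
                # Exploration: sample the whole action space [0, 1].
                a = rng.random()
            policy.append(a)
        population.append(policy)
    return population


def erps_sketch(iterations=30, size=10, exploit_prob=0.5, radius=0.05, seed=0):
    """Run the sketch; return the last elite policy and the elite values per iteration."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(N_STATES)] for _ in range(size)]
    history = []
    for _ in range(iterations):
        elite = elite_policy(population)
        history.append(evaluate(elite))
        population = next_population(elite, size, exploit_prob, radius, rng)
    return elite, history
```

Calling `erps_sketch()` returns the final elite policy together with its cost-to-go at each iteration. Because the elite is carried into the next population and the switching rule never does worse than any policy it is built from, the recorded values should not increase from one iteration to the next (up to evaluation tolerance). This mirrors the monotone behavior one would expect of a sequence of elite policies, but it says nothing about the convergence result proved in the paper.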

INFORMS Journal on Computing

Volume 19, Issue 2

Spring 2007

Pages 149-312

Article Information

Received: April 01, 2004
Accepted: June 01, 2005
Published Online: May 01, 2007

Cite as

Jiaqiao Hu, Michael C. Fu, Vahid R. Ramezani, Steven I. Marcus (2007) An Evolutionary Random Policy Search Algorithm for Solving Markov Decision Processes. INFORMS Journal on Computing 19(2):161-174.

https://doi.org/10.1287/ijoc.1050.0155
