Scheduling Checks and Saves

Published Online: https://doi.org/10.1287/ijoc.4.1.60

A job is to be run on a machine subject to random failures. Failures are not self-evident. They must be detected by explicit tests, or checks. Checks detect failures to avoid wasting time working with a defective machine. After each successful check one has the option of saving the work just completed. Then, when failures occur, only the work done since the last save must be repeated. Effective use of checks and saves requires a compromise, since these procedures are themselves time-consuming. Scheduling saves alone, when failures are evident as soon as they occur, is often called checkpointing. The novelty of the model studied here stems from not assuming that failures are self-evident. This compounds the usual checkpointing problem by requiring schedules of failure checks as well as saves. This paper gives schedules of checks and saves that minimize the expected total time required to complete a given job. The most general failure mechanism considered is a renewal process. The Poisson process receives special emphasis, as it leads to the simplest results.
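
The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's model or its optimal schedule; it is a deliberately simplified instance under assumptions that are not in the source: the job is split into equal segments, every segment ends with a check, every successful check is followed by a save, failures arrive as a Poisson process only while work is being done, and a failed check triggers an instant repair followed by a rerun of the segment. All function and parameter names are hypothetical.

```python
import math
import random


def expected_total_time(job_len, n_segments, fail_rate, check_cost, save_cost):
    """Closed-form expected completion time under the simplifying assumptions.

    Each attempt at a segment costs (tau + check_cost) and succeeds with
    probability exp(-fail_rate * tau), so the mean number of attempts is
    exp(fail_rate * tau). One save is paid per completed segment.
    """
    tau = job_len / n_segments
    mean_attempts = math.exp(fail_rate * tau)
    return n_segments * ((tau + check_cost) * mean_attempts + save_cost)


def simulate_total_time(job_len, n_segments, fail_rate, check_cost, save_cost, rng):
    """One Monte Carlo run of the same simplified schedule."""
    tau = job_len / n_segments
    total = 0.0
    for _ in range(n_segments):
        while True:
            # Work the segment, then check; a failure is only found at the check.
            total += tau + check_cost
            # The segment is good if the first failure arrives after tau.
            if fail_rate == 0 or rng.expovariate(fail_rate) > tau:
                break
        total += save_cost  # save only after a successful check
    return total


def best_segment_count(job_len, fail_rate, check_cost, save_cost, max_segments=200):
    """Brute-force search for the segment count with the smallest expected time."""
    return min(
        range(1, max_segments + 1),
        key=lambda n: expected_total_time(job_len, n, fail_rate, check_cost, save_cost),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    params = dict(job_len=10.0, fail_rate=0.1, check_cost=0.1, save_cost=0.2)
    n = best_segment_count(**params)
    runs = [simulate_total_time(n_segments=n, rng=rng, **params) for _ in range(20000)]
    print("segments:", n)
    print("closed form:", expected_total_time(n_segments=n, **params))
    print("simulated mean:", sum(runs) / len(runs))
```

Too few segments means long reruns after each detected failure, while too many means the check and save overhead dominates; the brute-force search simply picks the segment count that balances the two in this toy setting. Separating the check schedule from the save schedule, or replacing the Poisson process with a general renewal process, would require a richer model than this sketch.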

INFORMS Journal on Computing, ISSN 1091-9856, was published as ORSA Journal on Computing from 1989 to 1995 under ISSN 0899-1499.
