Optimal Strategy for Item Presentation in a Learning Process

Published Online: https://doi.org/10.1287/mnsc.13.11.773

We treat a dynamic programming problem concerned with an application of tailoring programmed instruction to the individual student. We use a model of learning based on stimulus-sampling theory in which a subject is to be taught n items in the course of N trials. The problem is to determine a strategy of trial-by-trial item selection to maximize the expected terminal level of achievement of the subject; a trial consists of a test on a selected item followed by a reinforcement or teaching action relative to the item. A subject is either in the “conditioned” or “unconditioned” state with respect to an item. His response to a test is either correct or incorrect, and the probability of a correct response depends upon his state; thus, the state is not in exact correspondence with the response. The reinforcement action permits a probabilistic transition from the unconditioned to the conditioned state during a trial. States are not observable; a strategy is based upon the history of responses to items presented up to the current trial. Associated with a subject is a current state probability vector (λ1, λ2, …, λn), where λi is the probability of the conditioned state relative to item i, given the subject's history to date. We prove that the following (locally optimal) strategy is (globally) optimal: In each trial, present any item for which the current probability of the conditioned state is least.
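
The selection rule stated in the abstract can be illustrated with a minimal Python sketch. It is not the paper's formulation: the abstract gives no numerical parameters, so the names and default values of p_correct_cond, p_correct_uncond, and c below are hypothetical, as are the function names. The sketch also assumes that a conditioned item stays conditioned (no forgetting) and that the response updates λi by Bayes' rule before the reinforcement step is applied.

```python
def posterior_after_test(lam, correct, p_correct_cond=1.0, p_correct_uncond=0.25):
    """Bayes update of the conditioned-state probability after one response.

    p_correct_cond / p_correct_uncond are hypothetical probabilities of a
    correct response in the conditioned / unconditioned state.
    """
    like_cond = p_correct_cond if correct else 1.0 - p_correct_cond
    like_uncond = p_correct_uncond if correct else 1.0 - p_correct_uncond
    evidence = lam * like_cond + (1.0 - lam) * like_uncond
    if evidence == 0.0:
        # The observed response has probability zero under the prior;
        # leave the estimate unchanged rather than divide by zero.
        return lam
    return lam * like_cond / evidence


def after_reinforcement(lam, c=0.3):
    """Apply the teaching action: an unconditioned item becomes conditioned
    with hypothetical probability c; a conditioned item is assumed to stay so."""
    return lam + (1.0 - lam) * c


def select_item(lams):
    """Return the index of an item whose conditioned-state probability is least."""
    return min(range(len(lams)), key=lams.__getitem__)


def run_trial(lams, correct, p_correct_cond=1.0, p_correct_uncond=0.25, c=0.3):
    """One trial: pick the least-probable item, test it, then reinforce it.

    `correct` is the observed response to the selected item.
    Returns the selected index and the updated probability vector.
    """
    i = select_item(lams)
    updated = list(lams)
    tested = posterior_after_test(lams[i], correct, p_correct_cond, p_correct_uncond)
    updated[i] = after_reinforcement(tested, c)
    return i, updated
```

For example, run_trial([0.5, 0.2, 0.9], correct=True) selects the item at index 1 and updates only that entry of the vector. Ties are broken here by taking the first minimal index, which is one admissible choice since the abstract allows any item with the least probability.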
