Optimal Strategy for Item Presentation in a Learning Process
Abstract
We treat a dynamic programming problem that arises in tailoring programmed instruction to the individual student. We use a model of learning based on stimulus-sampling theory in which a subject is to be taught n items in the course of N trials. The problem is to determine a strategy of trial-by-trial item selection that maximizes the expected terminal level of achievement of the subject; a trial consists of a test on a selected item followed by a reinforcement or teaching action relative to that item. A subject is in either the “conditioned” or the “unconditioned” state with respect to an item. His response to a test is either correct or incorrect, and the probability of a correct response depends upon his state; thus, the state is not in exact correspondence with the response. The reinforcement action permits a probabilistic transition from the unconditioned to the conditioned state during a trial. States are not observable; a strategy is based upon the history of responses to items presented up to the current trial. Associated with a subject is a current state probability vector (λ1, λ2, …, λn), where λi is the probability that the subject is in the conditioned state with respect to item i, given the subject's history to date. We prove that the following (locally optimal) strategy is (globally) optimal: in each trial, present any item for which the current probability of the conditioned state is least.
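The strategy and the bookkeeping it relies on can be illustrated with a minimal sketch. The abstract gives no numerical parameters, so everything named here is an assumption made for illustration: `p_cond` and `p_uncond` stand for the probabilities of a correct response in the conditioned and unconditioned states, `c` stands for the probability that reinforcement moves the subject from the unconditioned to the conditioned state, all three are taken to be the same for every item, and a conditioned item is assumed to stay conditioned. Under those assumptions, the tested item's λi is revised by Bayes' rule on the observed response and then advanced by the reinforcement step.

```python
from typing import List


def select_item(lam: List[float]) -> int:
    """Return the index of an item whose conditioned-state probability is least.

    Ties are broken by taking the first such index; the strategy allows any.
    """
    return min(range(len(lam)), key=lambda i: lam[i])


def update(lam_i: float, correct: bool,
           p_cond: float, p_uncond: float, c: float) -> float:
    """Revise lambda_i after one trial on item i (hypothetical parameters).

    Step 1: Bayes' rule on the test response.
    Step 2: reinforcement, assumed to condition an unconditioned item
            with probability c and never to undo conditioning.
    """
    like_cond = p_cond if correct else 1.0 - p_cond
    like_uncond = p_uncond if correct else 1.0 - p_uncond
    evidence = lam_i * like_cond + (1.0 - lam_i) * like_uncond
    # If the response had probability zero under the current belief,
    # leave the belief unchanged rather than divide by zero.
    posterior = lam_i * like_cond / evidence if evidence > 0.0 else lam_i
    return posterior + (1.0 - posterior) * c


def run_trial(lam: List[float], correct: bool,
              p_cond: float, p_uncond: float, c: float) -> int:
    """Pick the least-probable item, update its lambda in place, return its index."""
    i = select_item(lam)
    lam[i] = update(lam[i], correct, p_cond, p_uncond, c)
    return i
```

In use, `run_trial` would be called once per trial with the subject's observed response to the selected item; only the entry for the presented item changes, since the other items are neither tested nor reinforced in that trial.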

