Optimal Control of Brownian Inventory Models with Convex Inventory Cost: Discounted Cost Case

We consider an inventory system in which inventory level fluctuates as a Brownian motion in the absence of control. The inventory continuously accumulates cost at a rate that is a general convex function of the inventory level, which can be negative when there is a backlog. At any time, the inventory level can be adjusted by a positive or negative amount, which incurs a fixed positive cost and a proportional cost. The challenge is to find an adjustment policy that balances the inventory cost and adjustment cost to minimize the expected total discounted cost. We provide a tutorial on using a three-step lower-bound approach to solving the optimal control problem under a discounted cost criterion. In addition, we prove that a four-parameter control band policy is optimal among all feasible policies. A key step is the constructive proof of the existence of a unique solution to the free boundary problem. The proof leads naturally to an algorithm to compute the four parameters of the optimal control band policy.


Introduction
Dai and Yao [6] studied the optimal control of Brownian inventory models under the long-run average cost criterion. This paper is a companion of [6]. It studies the same Brownian inventory models, but under the discounted cost criterion. Its main purpose is to provide a tutorial on the powerful lower-bound approach to proving the optimality of a control band policy among all feasible policies. The tutorial is rigorous and, except for the standard Itô formula, self-contained. In addition, this paper contributes to the literature by proving the existence of a "smooth" solution to the free boundary problem with a general convex holding cost function. As a consequence, a four-parameter control band policy is shown to be optimal. Our existence proof also leads naturally to an algorithm to compute the optimal control band parameters.
The introduction in [6] gives detailed descriptions of the Brownian inventory models, control band policies, and the lower-bound approach. It also gives an extensive literature review. Most of the development there, including the motivation for studying non-linear holding cost functions and the literature review, applies to this paper as well and will not be repeated here. In the rest of this introduction, we highlight the development that is specific to the discounted cost case.
As in [6], the inventory position is assumed to be adjustable, either upward or downward. All adjustments are realized immediately without any leadtime delay. Each upward adjustment of amount ξ > 0 incurs a cost K + kξ, where K ≥ 0 and k > 0 are the fixed cost and the variable cost, respectively, for each upward adjustment. Similarly, each downward adjustment of amount ξ > 0 incurs a cost of L + ℓξ with fixed cost L ≥ 0 and variable cost ℓ > 0. In addition, we assume that the holding cost function h : R → R+ is a general convex function that satisfies some minimal assumptions in Assumption 1. The objective is to find a control policy that balances the inventory cost and the adjustment cost so that, starting from any initial inventory level x, the (infinite-horizon) expected total discounted cost is minimized. When both the upward and downward fixed costs are positive, the model is an impulse control problem. When both fixed costs are zero, the corresponding Brownian control problem is a singular or instantaneous control problem. It was demonstrated in Section 6 of [6] that a singular control problem is much easier to solve than an impulse control problem, and that a two-parameter control band policy is optimal; this control band policy can be considered as the limit of a sequence of four-parameter control band policies, each of which is optimal for an impulse control problem. Therefore, in this paper we restrict ourselves to impulse control problems; namely, we assume that K > 0 and L > 0. Although we do not consider the singular control problem or the Brownian control problem in which inventory backlog is not allowed, our proof of the existence of an optimal control band policy for the impulse control problem can be extended to cover these two cases. These extensions were carried out in Sections 6 and 7 of [6] in the average cost setting.
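To make the adjustment cost structure concrete, the following sketch (illustrative names only; not from the paper) accumulates the discounted cost of a finite list of adjustments, charging K + kξ for an upward adjustment ξ > 0 and L + ℓ|ξ| for a downward one:

```python
import math

def discounted_adjustment_cost(adjustments, K, k, L, ell, beta):
    """Discounted cost of a finite list of (time, amount) adjustments.

    adjustments: list of (t, xi) with xi > 0 upward, xi < 0 downward;
    K, k: fixed/variable upward costs; L, ell: fixed/variable downward
    costs; beta: discount rate.  Illustrative sketch, not from the paper.
    """
    total = 0.0
    for t, xi in adjustments:
        if xi > 0:
            total += math.exp(-beta * t) * (K + k * xi)
        elif xi < 0:
            total += math.exp(-beta * t) * (L + ell * (-xi))
    return total
```

For instance, with β = 0 the expression reduces to the plain sum of the individual adjustment costs.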
When the inventory holding cost function is linear, namely,

h(x) = h1 x for x ≥ 0 and h(x) = p1 |x| for x < 0 (1.1)

for some constants p1 > 0 and h1 > 0, Constantinides and Richard [5] proved that a four-parameter control band policy is optimal under the condition that

h1 − βℓ > 0 and p1 − βk > 0. (1.2)

As explained in [5], h1/β is the present value of the holding cost of keeping one unit of inventory from now to infinity. If h1/β ≤ ℓ, it will never be optimal to reduce the inventory level as long as L > 0. Similarly, if p1/β ≤ k, it will never be optimal to increase the inventory level as long as K > 0. Thus, condition (1.2) is also necessary for a four-parameter control band policy to be optimal. Baccarin [1] sketched a proof that a four-parameter control band policy is also optimal when the holding cost is quadratic, given by

h(x) = h1 x + h2 x² for x ≥ 0 and h(x) = p1 |x| + p2 x² for x < 0, (1.3)

where h1 ≥ 0, p1 ≥ 0, h2 > 0 and p2 > 0. In his proof, condition (1.2) is no longer needed as long as h2 > 0 and p2 > 0. Baccarin [1] deferred the detailed proof of the existence of a solution to the four-parameter free boundary problem to an online supplement; unfortunately, this document can no longer be located on the Internet. Assuming K = L > 0 and k = ℓ = 0, Plehn-Dujowich [10] proved that a three-parameter control band policy is optimal when the holding cost function h satisfies

h′ and h″ are continuous; (1.4)
h is strictly concave and single-peaked; (1.5)
|h|, |h′| and |h″| are bounded by a polynomial. (1.6)

Neither the linear cost in (1.1) nor the quadratic cost in (1.3) satisfies the smoothness condition in (1.4).
In this paper, with the holding cost function assumed to be general, satisfying Assumption 1 in Section 2, we prove that a four-parameter control band policy is optimal. Assumption 1 on the convex holding cost function h is considerably weaker than those in the literature; the cost functions in [1, 5, 10] all satisfy Assumption 1. Condition (2.6) in Assumption 1 is analogous to (1.2) and is automatically satisfied for h in (1.3). As in the companion paper [6], we adopt the three-step lower-bound approach in our proof. In the first step, we prove that if there exists a "smooth" test function f that satisfies a set of differential inequalities, then f dominates the value function at every initial inventory level x. In the second step, given a control band policy, it is shown that the value function within the band can be obtained as the unique solution to a second-order differential equation. In the third step, a solution to a free boundary problem is shown to exist and to satisfy the conditions for f in the first step.
The result in step 1 is known as the "verification theorem" in the literature. All three prior papers [1, 5, 10] invoked the verification theorem in Richard [12], which in turn generalized the pioneering work of Bensoussan and Lions [2, 3]. This tutorial advocates the lower-bound approach that was also adopted by Harrison et al. [8] and Harrison and Taksar [9]. The advantage of this approach is that, except for applying the standard Itô formula, it is self-contained; therefore it can readily and rigorously be adapted to other related settings.
The free boundary problem is specified using the well-known "smooth-pasting" method (see, e.g., [4]). Solving the free boundary problem in Step 3 is the most difficult task. We prove the existence of a C1 solution to the free boundary problem that has four free parameters. Though our proof is similar to the one in [5], where a linear holding cost function is used, it is considerably more difficult. Unlike the proof in [5], our proof is also constructive, so it leads naturally to an algorithm to compute the four parameters of the optimal control band. Recently, Feng and Muthuraman [7] developed an algorithm to compute the parameters of an optimal control band policy for the discounted Brownian control problem. They illustrated the convergence of their algorithm through some numerical examples; however, the convergence of their algorithm was not established.
The rest of this paper is organized as follows. In Section 2, we define our Brownian control problem. In Section 3, we present a version of the Itô formula that does not require the test function f to be a C2 function. A lower bound for all feasible policies is established in Section 4. Section 5 shows that under a control band policy, the value function within the band can be obtained as a solution to a second-order ordinary differential equation (ODE). Under the assumption that a free-boundary problem has a unique solution with the desired regularity properties, Section 6 proves that there is a control band policy whose discounted cost achieves the lower bound; thus, the control band policy is optimal among all feasible policies. Section 7, a lengthy section devoted to the construction of the solution to the free-boundary problem, characterizes the parameters of the optimal control band policy; it constitutes the main technical contribution of this paper.

Impulse Brownian Control Models
Let X = {X(t), t ≥ 0} be a Brownian motion with drift µ and variance σ², starting from x. Then X has the representation X(t) = x + µt + σW(t), where W = {W(t), t ≥ 0} is a standard Brownian motion with drift 0 and variance 1, starting from 0. We assume W is defined on some filtered probability space (Ω, F, {Ft}, P) and that W is an {Ft}-martingale. Thus, W is also known as an {Ft}-standard Brownian motion. We use X to model the netput process of the firm. For each t ≥ 0, X(t) represents the inventory level at time t if no control has been exercised by time t. The netput process will be controlled, and the actual inventory level at time t, after controls have been exercised, is denoted by Z(t). The controlled process is denoted by Z = {Z(t), t ≥ 0}. With a slight abuse of terminology, we call Z(t) the inventory level at time t, although when Z(t) < 0, |Z(t)| is the backorder level at time t.
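For intuition only, the uncontrolled netput X(t) = x + µt + σW(t) can be simulated on a time grid (a standard Euler sketch; the function name and discretization are ours, not the paper's):

```python
import math
import random

def simulate_netput(x, mu, sigma, T, n_steps, seed=0):
    """Sample X on a grid of n_steps steps over [0, T] via
    X(t + dt) = X(t) + mu*dt + sigma*sqrt(dt)*N(0, 1)."""
    rng = random.Random(seed)
    dt = T / n_steps
    path = [x]
    for _ in range(n_steps):
        increment = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + increment)
    return path
```

Setting σ = 0 recovers the deterministic drift line x + µt, which is a convenient sanity check on the discretization.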
Controls are dictated by a policy. A policy ϕ is a pair of stochastic processes (Y1, Y2) that satisfies the following properties: (a) for each sample path ω ∈ Ω, Yi(ω, ·) ∈ D, where D is the set of functions on R+ = [0, ∞) that are right continuous on [0, ∞) and have left limits in (0, ∞); here Y1(t) and Y2(t) denote the cumulative upward and downward adjustment, respectively, of the inventory in [0, t]. Under a given policy (Y1, Y2), the inventory level at time t is given by Z(t) = X(t) + Y1(t) − Y2(t). Therefore, Z is a semimartingale, namely, a martingale σW plus a process that is of bounded variation. Because K is assumed to be positive, we restrict attention to upward controls that make finitely many upward adjustments in any finite interval. This is equivalent to requiring Y1 to be a piecewise constant function on each sample path. Under such an upward control, the upward adjustment times can be listed as a discrete sequence {T1(n) : n ≥ 0}, where the nth upward adjustment time can be defined recursively via T1(n) = inf{t > T1(n − 1) : ∆Y1(t) > 0}, where, by convention, T1(0) = 0 and ∆Y1(t) = Y1(t) − Y1(t−). The amount of the nth upward adjustment is denoted by ξ1(n) = ∆Y1(T1(n)). It is clear that specifying such an upward adjustment policy Y1 = {Y1(t), t ≥ 0} is equivalent to specifying a sequence {(T1(n), ξ1(n)) : n ≥ 0}. In particular, given the sequence, one has Y1(t) = ξ1(1) + · · · + ξ1(N1(t)), where N1(t) = max{n ≥ 0 : T1(n) ≤ t} is the number of upward adjustments in [0, t]. Thus, it is sufficient to specify the sequence {(T1(n), ξ1(n)) : n ≥ 0} to describe an upward adjustment policy. Similarly, since L > 0, it is sufficient to specify a sequence {(T2(n), ξ2(n)) : n ≥ 0} to describe a downward adjustment policy. Merging these two sequences, we have the sequence {(Tn, ξn), n ≥ 0}, where Tn is the nth adjustment time of the inventory and ξn is the amount of adjustment at time Tn. When ξn > 0, the nth adjustment is an upward adjustment, and when ξn < 0, the nth adjustment is a downward adjustment.
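The equivalence between a piecewise-constant control Y1 and the sequence {(T1(n), ξ1(n))} can be sketched as follows (hypothetical helper names):

```python
def upward_control(seq, t):
    """Y1(t): sum of the adjustment amounts made at times T <= t,
    given seq = [(T1(n), xi1(n)), ...]."""
    return sum(xi for T, xi in seq if T <= t)

def num_upward_adjustments(seq, t):
    """N1(t) = max{n >= 0 : T1(n) <= t}: the number of upward
    adjustments in [0, t]."""
    return sum(1 for T, xi in seq if T <= t)
```

Both directions of the equivalence are visible here: the sequence determines Y1 through the partial sums, and Y1 determines the sequence through its jump times and jump sizes.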
The policy (Y1, Y2) is adapted if each Tn is an {Ft}-stopping time and each adjustment ξn is F_{Tn−}-measurable. In general, we allow an upward or downward adjustment at time t = 0. By convention, we set Z(0−) = x and call Z(0−) the initial inventory level. By (2.1), Z(0) = x + Y1(0) − Y2(0), which can be different from the initial inventory level Z(0−). Under a feasible policy ϕ = (Y1, Y2) with initial inventory level Z(0−) = x and a discount rate β > 0, the expected total discounted cost DC(x, ϕ) is defined to be

DC(x, ϕ) = Ex [ ∫_0^∞ e^{−βt} h(Z(t)) dt + Σ_{n: Tn < ∞} e^{−βTn} ( (K + kξn) 1{ξn > 0} + (L + ℓ|ξn|) 1{ξn < 0} ) ],

where Ex is the expectation operator conditioned on the initial inventory level being Z(0−) = x. Because of (2.2) and (2.3), this Brownian inventory control model is called the impulse Brownian control model. Clearly, we need to restrict our feasible policies so that this expectation is finite; otherwise, DC(x, ϕ) = ∞. We assume the inventory cost function h : R → R+ satisfies the following assumption. (e) h′(x) has smaller order than e^{λ1 x} as x ↑ ∞, that is,

∫_a^{+∞} e^{−λ1 y} h′(y) dy < ∞, (2.8)

where λ1 = ((µ² + 2βσ²)^{1/2} − µ)/σ² > 0.
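The constant λ1 in (2.8), and its companion λ2 used later in the paper, can be computed directly; the sketch below also checks that z = λ1 and z = −λ2 solve the quadratic (σ²/2)z² + µz − β = 0:

```python
import math

def lambdas(mu, sigma, beta):
    """Return (lambda1, lambda2), both positive, where z = lambda1 and
    z = -lambda2 are the two roots of (sigma^2/2) z^2 + mu z - beta = 0."""
    disc = math.sqrt(mu * mu + 2.0 * beta * sigma * sigma)
    lam1 = (disc - mu) / sigma**2
    lam2 = (disc + mu) / sigma**2
    return lam1, lam2
```

A useful by-product is the identity λ1 λ2 = 2β/σ², which is invoked again in Section 7.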
(b) The requirement that the continuous, convex holding cost function h be once and twice continuously differentiable can be relaxed to hold at all but finitely many points. (c) When lim_{x→∞} h′(x) ≤ ℓβ, the same reasoning as in [5] shows that it will never be optimal to reduce the inventory level as long as L > 0. Similarly, when lim_{x→−∞} h′(x) ≥ −kβ, it will never be optimal to increase the inventory level as long as K > 0.
The following elementary lemma on the holding cost function is useful in later development. By using L'Hôpital's rule, one has, where the last equality is due to (2.9).
(b) We prove (2.13); the proof of (2.12) is similar and is omitted. The first part of (2.6) implies that there exist a constant c1 > 0 and x′ ∈ (a, ∞) such that for any x ≥ x′, where the first inequality is due to the assumption h′(x) ≥ 0 for x > a. By using L'Hôpital's rule, one has lim_{x↑∞} λ2 ∫_a^x e^{λ2(y−a)} h′(y) dy

Itô Formula
In this section, we present the Itô formula, tailored to the discounted setting. In the formula, Γf(x) = (σ²/2) f″(x) + µ f′(x) is the generator of the (µ, σ²)-Brownian motion X, and ∫_0^t e^{−βs} f′(Z(s)) dW(s) is interpreted as an Itô integral.

Lower Bound
In this section, we state and prove a theorem that establishes a lower bound for the optimal expected total discounted cost. This theorem is closely related to the "verification theorem" in the literature. Its proof is self-contained, using the Itô formula in Section 3. Then DC(x, ϕ) ≥ f(x) for each feasible policy ϕ and each initial state Z(0−) = x ∈ R.

Control Band Policies
We use {d, D, U, u} to denote the control band policy associated with parameters d, D, U, and u with d < D < U < u. Let us fix a control band policy ϕ = {d, D, U, u} and an initial inventory level Z(0−) = x. The adjustment amount ξn of the control band policy is given by ξn = D − Z(Tn−) if Z(Tn−) ≤ d and ξn = U − Z(Tn−) if Z(Tn−) ≥ u, for n = 1, 2, . . ., where again Z(t−) denotes the left limit at time t, T0 = 0, and Tn = inf{t > Tn−1 : Z(t−) ∉ (d, u)} is the nth adjustment time. (By convention, we assume Z is right continuous with left limits.) Our first task is to obtain an expression for the value function V̄, where V̄(x) is the expected total discounted cost when the initial inventory level is x. We first present the following lemma.
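The band policy {d, D, U, u} admits a simple discrete-time sketch: whenever the level falls to d or below, adjust up to D; whenever it rises to u or above, adjust down to U. The illustration below (our discretization, not the paper's continuous-time construction) returns the controlled path and the adjustments (Tn, ξn):

```python
import math
import random

def simulate_band_policy(x, d, D, U, u, mu, sigma, T, n_steps, seed=1):
    """Discrete-time sketch of the band policy {d, D, U, u}: when the
    level reaches d or below, jump up to D; at u or above, jump down
    to U.  Returns (path, adjustments) with adjustments = [(t, xi), ...]."""
    rng = random.Random(seed)
    dt = T / n_steps
    z = x
    adjustments = []
    if z <= d:                       # possible adjustment at t = 0
        adjustments.append((0.0, D - z)); z = D
    elif z >= u:
        adjustments.append((0.0, U - z)); z = U
    path = [z]
    for i in range(1, n_steps + 1):
        z += mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if z <= d:                   # upward adjustment to D
            adjustments.append((i * dt, D - z)); z = D
        elif z >= u:                 # downward adjustment to U
            adjustments.append((i * dt, U - z)); z = U
        path.append(z)
    return path, adjustments
```

With σ = 0 and a negative drift, the level drifts down to d at every step and is repeatedly reset to D, which makes the behavior of the policy easy to inspect.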
then for each starting point x ∈ R, the expected total discounted cost DC(x, ϕ) is given by V̄(x).

Remark. Conditions (5.2) and (5.3) imply that V̄ is continuous at d and u.
Proof. Consider the control band policy which, together with (5.5), implies that The analysis for the case x ≥ u is analogous and is omitted.
We end this section by explicitly finding a solution V to (5.1)-(5.3).
Then V is a solution to (5.1)-(5.3). In (5.7) and (5.8), we use the coefficients defined in (5.9)-(5.10). Proof. Let λ1 and λ2 be as above, so that z = λ1 and z = −λ2 are the two solutions of the quadratic equation (σ²/2)z² + µz − β = 0. The homogeneous ordinary differential equation (ODE) Γg − βg = 0 has two independent solutions g1(x) and g2(x), where g1(x) = e^{λ1(x−a)} and g2(x) = e^{−λ2(x−a)}, and a is the minimum point of the convex inventory cost function h. Then the nonhomogeneous ODE (5.1) has a particular solution V0(x) given by (5.6). A general solution V(x) to (5.1) is given by V(x) = A1 g1(x) + B1 g2(x) + V0(x). Boundary conditions (5.2) and (5.3) become (5.11) and (5.12). Using the coefficients defined in (5.9)-(5.10), we see that the boundary conditions (5.11) and (5.12) become a linear system in A1 and B1, from which we obtain the unique solutions for A1 and B1 given in (5.7) and (5.8).
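As a numerical sanity check (not part of the proof), one can verify by finite differences that g(x) = e^{λ1 x} satisfies the homogeneous equation Γg − βg = 0:

```python
import math

def check_homogeneous_solution(mu, sigma, beta, x, h_step=1e-4):
    """Finite-difference residual of (sigma^2/2) g'' + mu g' - beta g
    at x for g(y) = exp(lambda1 * y); it should be numerically zero.
    Illustrative sketch only."""
    lam1 = (math.sqrt(mu**2 + 2.0 * beta * sigma**2) - mu) / sigma**2
    g = lambda y: math.exp(lam1 * y)
    g_prime = (g(x + h_step) - g(x - h_step)) / (2.0 * h_step)
    g_second = (g(x + h_step) - 2.0 * g(x) + g(x - h_step)) / h_step**2
    return 0.5 * sigma**2 * g_second + mu * g_prime - beta * g(x)
```

The same check with λ1 replaced by −λ2 confirms the second independent solution.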

Optimal Policy and Optimal Parameters
Theorem 4.1 suggests the following strategy for obtaining an optimal policy. We hope that a control band policy is optimal; therefore, the first task is to find an optimal policy among all control band policies. We denote this optimal control band policy by ϕ* = {d*, D*, U*, u*}, with expected total discounted cost V̄(x) for any starting point x ∈ R. We hope that V̄ can be used as the function f in Theorem 4.1. To find the corresponding f that satisfies all the conditions of Theorem 4.1, we provide the conditions that should be imposed on the optimal parameters; see Section 5.2 of [6] for an intuitive derivation of these conditions. Under condition (6.3), V is a C1 function on R. Therefore, (6.3) is also known as the "smooth-pasting" condition.
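The displayed optimality conditions (6.2)-(6.3) are not reproduced above. For orientation, in the linear-cost setting of [5] the value-matching and smooth-pasting conditions typically take the following form (a hedged reconstruction; the paper's exact display may differ):

```latex
\begin{aligned}
V(d^*) &= V(D^*) + K + k\,(D^* - d^*), \qquad V'(d^*) = V'(D^*) = -k,\\
V(u^*) &= V(U^*) + L + \ell\,(u^* - U^*), \qquad V'(u^*) = V'(U^*) = \ell.
\end{aligned}
```

With g = V′, the smooth-pasting conditions read g(d*) = g(D*) = −k and g(U*) = g(u*) = ℓ, which is consistent with how g and ḡ are used in the proof of Theorem 6.2.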
In this section, we first prove in Theorem 6.1 the existence of parameters d*, D*, U* and u* such that the value function V, defined on [d*, u*] and corresponding to the control band policy ϕ* = {d*, D*, U*, u*}, satisfies (5.1)-(5.3) and (6.2)-(6.3). As part of the solution, we need to find the boundary points d*, D*, U* and u* from equations (5.1)-(5.3) and (6.2)-(6.3); these equations define a free boundary problem. We then prove in Theorem 6.2 that the function V̄ in (6.1) with parameters d*, D*, U* and u* satisfies all the conditions in Theorem 4.1; therefore, the control band policy ϕ* is optimal among all feasible policies.
To facilitate the presentation of Theorem 6.1, we first find a general solution without worrying about the boundary conditions (5.2) and (5.3). Proposition 1 shows that V is given in the form V(x) = A1 g1(x) + B1 g2(x) + V0(x), where V0 is given in (5.6). Since A1 and B1 are yet to be determined, and they require d*, D*, U* and u*, V is also yet to be determined. Differentiating both sides of (5.1) with respect to x, we have that g(x) in (6.5) can be rewritten as in (6.7), where the third equality uses the assumption that h(a) = 0 and the last equality introduces constants A and B. The following theorem characterizes the optimal parameters (d*, D*, U*, u*) and the parameters A* and B* in (6.7) via the solution g = g_{A,B}. Furthermore, g has a local minimum at x1 < a and a local maximum at x2 > a; g is strictly decreasing on (−∞, x1), strictly increasing on (x1, x2), and strictly decreasing again on (x2, ∞). If g satisfies all the conditions (6.6) and (6.10)-(6.15) in Theorem 6.1, then V(x) in (6.4) clearly satisfies all the conditions (5.1)-(5.3) and (6.2)-(6.3). The proof of Theorem 6.1 is long, and we defer it to Section 7.
Theorem 6.2. Assume that the holding cost function h satisfies Assumption 1. Let d * < D * < U * < u * , along with constants A * and B * , be the unique solution in Theorem 6.1. Then the control band policy ϕ * = {d * , D * , U * , u * } is optimal among all non-anticipating policies.
Proof. Let ḡ(z) = −k for z ≤ d*, ḡ(z) = g(z) for d* < z < u*, and ḡ(z) = ℓ for z ≥ u*. We now show that V̄ satisfies all the conditions in Theorem 4.1. Theorem 4.1 then shows that the expected total discounted cost under any feasible policy is at least V̄(x). Since V̄(x) is the expected total discounted cost under the control band policy ϕ* with starting point x, V̄(x) is the optimal cost and the control band policy ϕ* is optimal among all feasible policies.
First, V̄(x) is in C2((d*, u*)). Conditions (6.10) and (6.11) imply the boundary conditions (5.2) and (5.3); by Theorem 5.1, V̄ defined in (6.17) must be the discounted cost under the control band policy ϕ*. Now we show that V̄(x) satisfies the rest of the conditions in Theorem 4.1. Conditions (6.12) and (6.15) imply that the truncated function V̄(x) is continuous on R. It follows from parts (a) and (b) of Lemma 7.2 that d* < D* < U* < u* holds with the configuration shown in Figure 1. Now we verify that V̄ satisfies (4.3). Let x, y ∈ R with y < x. Then the first inequality follows from ḡ(z) = −k for z ≤ d*, ḡ(z) = g(z) ≥ −k for D* < z < u*, and ḡ(z) = ℓ ≥ −k for z ≥ u*, and the second inequality follows from the fact that ḡ(z) = g(z) ≤ −k for z ∈ [d*, D*]; see Figure 1. Thus (4.3) is proved. It remains to verify that V̄ satisfies (4.4) for x, y ∈ R with y > x.

Optimal Control Band Parameters
This section is devoted to the proof of Theorem 6.1. We separate the proof into a series of lemmas.
Since h is convex, one has h′(x) ≤ h′(y) whenever the derivatives at x < y exist. It follows that lim_{x↑a} h′(x) and lim_{x↓a} h′(x) exist. Define h′(a−) = lim_{x↑a} h′(x) and h′(a+) = lim_{x↓a} h′(x); we have h′(a−) ≤ h′(a+). Recall the function g in (6.7). Using integration by parts, one has (7.1). It follows that (7.2) holds for x > a. We have the following lemma. Similarly we can prove (7.5). □ For each B and each A satisfying B < A, (7.8) holds. For each fixed A and B satisfying (7.8)-(7.9), the local maximizer x2(A, B) is well defined. Remark. (a) The set of (A, B) that satisfies (7.7) and (7.8) is the shaded region in Figure 2. The set of (A, B) that satisfies (7.8) and (7.9) is the shaded region in Figure 3. Proof. We only prove the existence of x1 and the properties of g(x) on (−∞, a). The proof of the existence of x2 and the properties of g(x) on (a, ∞) is similar, and it is omitted.
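The one-sided derivatives h′(a−) and h′(a+) have a simple numerical sketch (illustrative only; with the piecewise-linear h of (1.1) and a = 0, they equal −p1 and h1):

```python
def one_sided_derivatives(h, a, eps=1e-6):
    """Approximate h'(a-) and h'(a+) by one-sided difference quotients;
    for a convex h these satisfy h'(a-) <= h'(a+)."""
    left = (h(a) - h(a - eps)) / eps
    right = (h(a + eps) - h(a)) / eps
    return left, right
```

The inequality h′(a−) ≤ h′(a+) is exactly the convexity of h seen through its one-sided slopes at the kink a.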
In order to prove the existence of x1, we divide B ∈ (B, h′(a+)) into two cases: B ∈ (B, h′(a−)] and B ∈ (h′(a−), h′(a+)).
Note that h is convex, so h″(x) ≥ 0 for all x ∈ R except possibly x = a. Therefore, ∫_x^a e^{λ2(y−a)} h″(y) dy ≥ 0 is decreasing in x ∈ (−∞, a). Then for fixed B ∈ (B, h′(a−)], there exists an x′ ∈ (−∞, a] such that (7.17) holds. We are going to prove that g′(x) is strictly increasing in x ∈ (−∞, x′) and that (7.18) and (7.19) hold. Since g′(x) is continuous and strictly increasing in x ∈ (−∞, x′), (7.18) and (7.19) imply that there exists a unique x1 with g′(x1) = 0. Combining this with (7.20) proves the existence of x1 and the properties of g(x) on (−∞, a). It remains to prove that g′(x) is strictly increasing in x ∈ (−∞, x′), and that (7.12) and (7.18)-(7.20) hold. We first prove that g′(x) is strictly increasing in (−∞, x′); for x ∈ (−∞, x′), the first inequality is due to (7.8), and using (7.3), we further have that the expression in (7.22) is strictly positive for x ∈ (−∞, x′). This proves that g′(x) is strictly increasing in (−∞, x′). To see (7.18), it follows from (7.2); to evaluate the limit, we first use the equality that follows from (2.11). To see (7.19), it follows from (7.2), where the second and last equalities are due to (7.17) and the last inequality is due to (7.8).
We next prove (7.29). From (7.10) we obtain an expression for B. We will show that (7.38) holds; this, together with (7.37), implies that B ↓ B as x1 ↓ −∞. Using the monotonicity between x1 and B (see (7.34)), we must have (7.29). It remains to prove (7.38); the last equality in its proof is due to (2.11). Therefore, we have proved (7.38). The proof of (7.31) and (7.32) is similar. □ Proof. (a) First, fix a B that satisfies (7.39). We consider the value of g_{A,B}(x2(A, B)) for A ∈ (B ∨ h′(a−), A).
> 0. Therefore, using the expression in (6.7) for g, we have, where the second equality uses λ1λ2 = 2β/σ² and the last inequality is due to the first part of (2.6).
It remains to prove (7.52). Next we consider two cases: B ∈ (h′(a−), A ∧ β) and B ∈ (−∞, h′(a−)]. If B ∈ (h′(a−), A ∧ β), then lim_{A↓B} x2(A, B) = a in (7.31) implies the claim, where the inequality is due to the second part of (2.6). In the other case, g_{A,B}(x2(A, B)) > −k. Then (7.55) and (7.56) imply that there exists a unique B(A) ∈ (B, A) satisfying the defining equation, and B(A) is also well defined for A ∈ (A_int, A).

and for A ∈ (A1, A), the quantity in (7.68) is strictly positive, where the second equality is due to g_{A,B(A)}(x2(A, B(A))) = 0, and the inequality follows because the second inequality holds by (7.36) and the equality is due to (7.32). Therefore, the conclusion holds for A ∈ (A′, A) along (A, B(A)).
Applying the Implicit Function Theorem to g_{A,B(A)}(U1(A, B(A))) = ℓ + M1, we have that, where the second equality is due to (7.45), and the inequality is due to U1(A, B(A)) > x1(A, B(A)) and g′_{A,B(A)}(U1(A, B(A))) > 0. Thus the claimed inequality holds for any A ∈ (A′, A); therefore, for any A ∈ (A′, A), it, together with (7.71), implies (7.67). Define (A1, B1) accordingly. It follows from (7.45) that, where the last inequality is due to (7.49). See Figure 5 for the point (A1, B1). (b) Applying the Implicit Function Theorem to Λ2(A*(B), B) = L, we have (7.74). □ Proof. We only need to show that Λ1(A*(B), B) can take any value in (−∞, 0) for B ∈ (B, B1) and is strictly increasing in B.