Estimating a Function and Its Derivatives Under a Smoothness Condition

Published Online:https://doi.org/10.1287/moor.2020.0161

Abstract

We consider the problem of estimating an unknown function f*:RdR and its partial derivatives from a noisy data set of n observations, where we make no assumptions about f* except that it is smooth in the sense that it has square integrable partial derivatives of order m. A natural candidate for the estimator of f* in such a case is the best fit to the data set that satisfies a certain smoothness condition. This estimator can be seen as a least squares estimator subject to an upper bound on some measure of smoothness. Another useful estimator is the one that minimizes the degree of smoothness subject to an upper bound on the average of squared errors. We prove that these two estimators are computable as solutions to quadratic programs, establish the consistency of these estimators and their partial derivatives, and study the convergence rate as n. The effectiveness of the estimators is illustrated numerically in a setting where the value of a stock option and its second derivative are estimated as functions of the underlying stock price.

1. Introduction

We are concerned with the problem of estimating an unknown function f*:RdR and its partial derivatives over a domain [a,b]d of interest when there is no closed-form formula for f*; hence, simulation must be conducted to estimate f*(x) at xRd, or only noisy data on f* are available. We make no assumptions about f* except that it has square integrable partial derivatives of order m.

To place the problem in a more rigorous setting, we consider the situation where we wish to estimate the unknown function f*:RdR and its partial derivatives by taking observations (X1,Y1),,(Xn, Yn) satisfying

Yi=f*(Xi)+εi
for i=1,2,,n, where ((Xi,εi):1in) is a sequence of [a,b]d×R-valued independent and identically distributed (iid) random vectors satisfying E(εi|Xi)=0 and E(εi2|Xi)=σ2< for i=1,2,,n.

Under the assumption that f* has square integrable partial derivatives of order m with m1, our goal is to estimate f* and its partial derivatives of order up to m1 as functions over [a,b]d from the data set (X1,Y1),,(Xn,Yn).

When the only information available on the unknown regression function f* is the fact that it is “smooth” so that it has square integrable partial derivatives of order m, one way to estimate f* is to fit a “smooth” function to the data set, that is, find the solution to the following problem:

MinimizeEn(f)1ni=1n(Yif(Xi))2(1)
over fF for some appropriately chosen set F of functions. The next question is “Which set is appropriate for F?” One natural candidate for F is the set F of functions f:RdR whose partial derivatives of order m are square integrable, that is, F={f:RdR:J(f)<}, where
J(f)|α|=mRd{Dαf(x)}2dx,

α=(α1,,αd) is a d-tuple of nonnegative integers, |α| is the order of α defined by |α|=α1++αd, and Dα denotes the partial differential operator defined by

Dα=α1++αdx1α1xdαd.

However, a question arises as to whether there exists a solution to (1). To guarantee the existence of a solution to (1), one of the properties that F must have is the completeness with respect to a suitably chosen metric (or a pseudometric). It turns out that F must be larger than F to be complete. However, if we let F be the set of “generalized functions” whose “weak” derivatives of order m are square integrable, then F contains F and is large enough to be a semi-Hilbert space in the ·m seminorm, where fmJ(f); see, for example, Meinguet [51, theorem 1 on p. 130]. (A weak derivative of a function f is a generalized version of the classical derivative in the sense that whenever the classical derivative Dαf exists, it is also a weak derivative of f.) Hence, throughout this paper, we will assume F=Fm, where Fm is the space of generalized functions whose weak partial derivatives of order m are square integrable; see Section 2 of this paper and Oden and Reddy [54] for the precise definitions of Fm, generalized functions, and weak derivatives. We will further restrict our attention to the functions f whose smoothness J(f) is bounded by a certain constant. Thus, a natural candidate for the estimator of f* is the solution f^n to the following problem:

Problem (A):MinimizefFmEn(f)subject toJ(f)Un
for some upper bound Un on J(f). As Proposition 1 of this paper suggests, the solution to Problem (A) exists under the assumption that 2m>d and {X1,,Xn} are Pm1-unisolvent. For definitions, see Assumption 1 in Section 3.

The question now becomes how to compute the solution to Problem (A) numerically. As Proposition 6 of this paper states, the solution to Problem (A) can be found by solving a convex programming problem. However, it is not clear how to compute or estimate Un from the data set. This necessitates another approach that can estimate f* using a parameter that can be directly estimated from the data set. As an alternative to Problem (A), we consider the solution g^n to the following problem:

Problem (B):MinimizefFmJ(f)subject toEn(f)Sn
for some upper bound Sn on the average of squared errors. Problem (B) is appealing from a computational point of view; it is intuitively acceptable that Sn should be close to σ2, and σ2 can be readily estimated from the data set.

In the context of nonparametric regression techniques in the statistics literature, Problems (A) and (B) are often contrasted with the following formulation:

Problem (C):MinimizefFmEn(f)+λJ(f),
where λ is a constant called the smoothing parameter. The solution to Problem (C) is often referred to as the “penalized least squares estimator”; see Györfi et al. [27], Wahba [71], and the references therein. One weakness of the penalized least squares estimator is that its performance is highly sensitive to the choice of λ, and the question of how to choose the smoothing parameter λ has been a challenging one. Cross validation (Craven and Wahba [15]) has been a popular method for choosing the smoothing parameter, but it does not guarantee the consistency of the estimators of the derivatives as n. Thus, when the primary purpose of a modeler is to estimate a derivative of f* and he or she does not want to use any parameter that cannot be directly estimated from the data set, Problem (B) is preferred from the computational point of view.

This paper is motivated by a stock trader whose goal is to estimate “gamma,” which is the second derivative of the value, f*, of a stock option as a function of the underlying stock price. The only information that he has is the fact that gamma is a smooth function of the stock price. The trader can use Problem (C) to estimate f* and its second derivative, but he needs to estimate the smoothing parameter λ. Cross validation does not guarantee the consistency of the estimator of the second derivative of f* as n. Moreover, the trader does not want to use any parameter that cannot be directly estimated from a data set. In this situation, Problem (B) better serves his purpose because its only parameter Sn can be directly estimated from the data set. In fact, Problem (B) has been used widely within the numerical analysis community; see, for example, Cox [13, p. 530].

Despite its popularity in the numerical analysis community, little is known about how to compute f^n and g^n numerically and what statistical properties f^n and g^n possess. The contributions of this paper are showing that f^n and g^n are computable as solutions to convex programs, establishing the consistency of f^n and g^n as estimators of f*, establishing the consistency of Dαf^n and Dαg^n as estimators of Dαf* for |α|=1,,m1, and computing the convergence rate of f^n when n. The key contributions of this paper can be summarized as follows.

  1. We identify the relationship between Problems (A) and (B). We prove that for Un within a certain interval, the solution f^n to Problem (A) exists uniquely and becomes a unique solution to Problem (B) with Sn=J(f^n). Conversely, for Sn within a certain interval, the solution g^n to Problem (B) exists uniquely and becomes a unique solution to Problem (A) with Un=En(g^n).

  2. We discuss the computational aspects of Problems (A) and (B). We show that the solutions to Problems (A) and (B) can be found by solving convex programming problems. There exist numerous efficient algorithms that can successfully find the solutions to convex programs (see, for example, Boyd and Vandenberghe [6], Zangwill [76]), so this enables us to easily compute the solutions to Problems (A) and (B).

  3. We establish the consistency of f^n and g^n as estimators of f* and the consistency of Dαf^n and Dαg^n as estimators of Dαf* for |α|=1,,m1 as n. Our main results for Problem (A) state that if UnJ(f*) for n sufficiently large and lim supnUn<, we have

    supx[a,b]d|f^n(x)f*(x)|0almost surely (a.s.),
    and we have
    E(f^n(X)f*(X))20andE(Dαf^n(X)Dαf*(X))20
    for |α|=1,,m1 as n under modest assumptions. On the other hand, our main results for Problem (B) state that under some conditions on g^n and Sn, we have
    supx[a,b]d|g^n(x)f*(x)|0a.s.,
    E(g^n(X)f*(X))20 and E(Dαg^n(X)Dαf*(X))20
    for |α|=1,,m1 as n.

  4. We compute the convergence rate of f^n as n. Specifically, when UnJ(f*) for n sufficiently large and lim supnUn<, we have

    {1ni=1n(f^n(Xi)f*(Xi))2}1/2=Op(nm/(2m+d))
    under the assumption that the εi’s are uniformly sub-Gaussian.

1.1. Literature Review

This paper is a contribution to the literature of nonparametric regression for smooth functions. Problem (C) has been studied extensively in the literature, whereas relatively less attention has been paid to Problems (A) and (B). For Problem (C), the existence of a solution is established by Duchon [19], and the convergence rates are derived by Cox [14], Rice and Rosenblatt [57], Utreras [67], and van de Geer [69]. Craven and Wahba [15] and Utreras [66] discuss how to choose the smoothing parameter through cross validation. Duchon [19] has shown that the solution to Problem (C) has an explicit representation as a sum of basis functions. However, this approach often requires solving a system of linear equations that involves a dense and highly ill-conditioned matrix (see Dierckx [17, p. 145]). Numerically more efficient algorithms are developed by Hutchinson and de Hoog [33] and Luo and Wahba [47]. Schoenberg [60] has shown that a solution to Problem (A) is also a solution to Problem (C) for some λ appropriately chosen, whereas Kersey [35] and Reinsch [56] studied the relationships between Problems (C) and (B). Green et al. [24] computed convergence rates of the Laplacian smoothing estimator, which can be viewed as a discrete approximation of Problem (C). For further references, see, for example, de Boor [16], Dierckx [17], Green and Silverman [23], Györfi et al. [27], Wahba [71], and Wegman and Wright [74]. For Problem (A), when d=1, the existence and uniqueness of the solution are established by Schoenberg [60], and the convergence rates are derived by van de Geer [68]. When d2, we could not find any work concerning how to compute the solution to Problem (A) numerically or what statistical properties the solution to Problem (A) possesses. For Problem (B), when d=1, the existence and uniqueness of the solution are established by Reinsch [56]. However, we could not find any work concerning how to compute the solution to Problem (B) and what statistical properties the solution to Problem (B) possesses either when d=1 or when d2. Therefore, the main contribution of this paper is to establish a theoretical foundation of Problems (A) and (B) for the case when d1 by suggesting a convex programming formulation, establishing consistency, and computing the convergence rate.

This paper is also a contribution to the literature on shape constrained estimation. Many researchers have studied the problem of estimating the unknown regression function f* when f* is known to possess shape properties, such as monotonicity or convexity. One of the most popular approaches is least squares estimation under shape restrictions. For instance, when f* is known to be monotone, one can fit a monotone function to the data set by solving the following quadratic program:

MinimizefEn(f)subject tof(Xi)f(Xj) if XiXj for 1i,jn,(2)
where XiXj denotes coordinate-wise monotonicity. The solution to (2) is referred to as the isotonic regression estimator and has well-established theory. Brunk [7] and Brunk [8] introduced isotonic regression, established consistency, and computed convergence rates when d=1. Mammen [48] considered estimation of a smooth monotone function when d=1. Lim [44] established consistency of isotonic regression and computed its convergence rate when d1 and f* is possibly misspecified. On the other hand, when f* is known to be convex, the constraint of (2) can be replaced by the convexity condition, and the solution to the corresponding quadratic program is referred to as the convex regression estimator. Hildreth [30] first introduced convex regression when d=1; Hanson and Pledger [28] established consistency when d=1; Groeneboom et al. [26] computed the rate of convergence when d=1; Kuosmanen [40] formulated the convex regression estimator as the solution to a quadratic program when d1; Lim and Glynn [46] and Seijo and Sen [61] established consistency of the multivariate convex regression estimator; Lim [44] established consistency of convex regression and computed its convergence rates when d1 and f* is possibly misspecified; Bertsimas and Mundru [4] and Lee et al. [43] proposed computationally efficient algorithms for estimating a concave monotone function and a convex function, respectively; Lim [44] and Yagi et al. [75] proposed hypothesis tests for detecting monotonicity and convexity; Kuosmanen and Johnson [42] discussed an interesting application of production function estimation and proposed a shape constrained estimator for such an application; and Kuosmanen and Johnson [41] established connection between data envelopment analysis and least squares estimation under shape restrictions. Two drawbacks of the convex regression estimator are that (1) it becomes computationally expensive when n increases and that (2) it tends to overfit the data near the boundary of the domain; see, for example, Lim [45, p. 70]. To overcome these shortcomings, Yagi et al. [75] proposed a local polynomial kernel estimator with shape constraints, and Keshvari [36] fitted a convex piecewise linear function to data where the number of linear segments is prespecified. An advantage of Problem (B) is that one can readily compute an estimate of Sn using sample variances. In the context of optimizing behavior in economics, Varian [70] similarly used sample variances to set bounds for acceptable deviations from the fitted function. For a comprehensive survey, see Johnson and Jiang [34] and the references therein. Although the previous literature has focused on monotonicity and convexity constraints, this paper demonstrates that one can work with the smoothness condition only. In particular, Problem (A) replaces the constraint of (2) by a condition on the degree of smoothness (i.e., J(f)Un). Therefore, this paper can be viewed as an addition to the literature on nonparametric regression subject to shape constraints.

This paper can be viewed as a contribution to the literature on metamodel estimation or response surface estimation, which has been an important subject of research in the simulation community. A common goal in this setting is to fit a metamodel to a simulation output, where the simulation runs are time consuming. Therefore, most work has focused on a setting where n is relatively small. One of the earlier works is the response surface methodology that is based on the idea of fitting a polynomial function to the data set locally; see, for example, Myers and Montgomery [52]. When the Yi’s are possibly correlated, the kriging-based methods serve as good alternatives; see, for example, Ankenman et al. [3]. The kriging-based methods, in general, do not assume that the regression function f* is m times differentiable, where m2. Hence, they do not produce any estimator of the derivatives of f*. Rather, the kriging estimator of f*(x) at any x[a,b]d is expressed as the weighted sum of the Yi’s. The weights are dependent on x0 and calculated in a way in which the mean squared deviations are minimized. In Chen et al. [9], under the assumption that the regression function is differentiable, the gradient estimates are observed along with the Yi’s, and they are used to obtain an estimator of f* that shows superior performance to the stochastic kriging estimator (see Ankenman et al. [3]). In the radial basis function estimators, the regression function is approximated by a linear combination of radially symmetric functions, where the coefficients and other parameters are estimated from the data set; see, for example, Franke [21]. The orthogonal series estimators or wavelet estimators represent the regression function by its Fourier series and estimate the coefficients from the data set; see, for example, Donoho and Johnstone [18]. Nonparametric regression methods, such as the kernel regression method (Nadaraya [53], Watson [73]), the local polynomial regression estimators (Cleveland [12]), the smoothing splines (Reinsch [55]), the weighted least squares regression estimators (Ruppert and Wand [58], Salemi et al. [59]), and the nearest neighbor estimators (Shapiro [62], Stone [64]), have received considerable attention in the statistics literature; see, for example, Eubank [20], Green and Silverman [23], Härdle [29], and Wand and Jones [72] for comprehensive surveys. Most of these estimators invariably require the estimation of smoothing parameters that cannot be estimated directly from the data set. In this regard, Problem (B) has the advantage of estimating only one parameter, Sn, which has a direct meaning and can be estimated from the sample variances computed from the data set.

1.2. Organization of This Paper

This paper is organized as follows. Section 2 introduces some definitions and preliminaries. We precisely describe the relationship between Problems (A) and (B) in Section 3, whereas Section 4 is concerned with the convex programming formulations of Problems (A) and (B). Section 5 states the main results regarding consistency and convergence rates, and the numerical results can be found in Section 6. Section 7 provides discussions on future research topics. The proofs of all theorems, corollaries, lemmas, and propositions are provided in the Appendix.

2. Definitions and Preliminaries

For x=(x1,,xd)Rd, its norm is given by x={j=1nxj2}1/2. For a sequence of random variables (Zn:n1) and a sequence of positive real numbers (αn:n1), we say Zn=Op(αn) as n if for any ϵ>0, there exist constants C and N such that P(|Zn/αn|>C)<ϵ for all nN.

When 2m>d, we define Fm by the space of generalized functions whose weak partial derivatives of order m are square integrable, that is,

Fm={fD(Rd):Rd{Dαf(x)}2dx< for all α such that |α|=m},
where D(Rd) is the space of Schwartz distributions. A test function on Rd is a real-valued, infinitely differentiable function with compact support. Let D(Rd) be the set of all test functions on Rd. A functional on D(Rd) is a mapping that assigns to each ϕD(Rd) a real number. A (Schwartz) distribution on Rd is a continuous linear functional on D(Rd). See Adams and Fournier [2, p. 20] or Oden and Reddy [54, p. 14] for detailed definitions. Let (·,·)m denote the semi-inner product on Fm, defined by
(f,g)m|α|=mRd{Dαf(x)}{Dαg(x)}dx
for f,gFm. The associated seminorm ·m is defined by fm(f,f)m1/2. It should be noted that the kernel of this seminorm is the vector space Pm1 of all polynomials of total degree less than or equal to m1 defined on Rd, the dimension of Pm1 is
M(m+d1d),(3)
and Pm1 is spanned by the M monomials of the total degree less than or equal to m1. We will denote the M monomials of total degree less than or equal to m1 by p1,,pM.

It should be also noted that ·m is a seminorm on Fm, but a norm on the equivalence classes in Fm/Pm1, and Fm/Pm1 is a Hilbert space under (·,·)m; see Meinguet [51, theorem 1, p. 130] for details.

For any open-bounded subset ΩRd, we define L2(Ω) by the set of functions f defined on Ω satisfying Ω{f(x)}2dx<. Its L2(Ω) norm is defined as follows:

fL2(Ω)=(Ω{f(x)}2dx)1/2
for fL2(Ω). We also define Wm(Ω) by the set of functions whose weak partial derivatives up to order m are in L2(Ω), that is,
Wm(Ω)={fL2(Ω):DαfL2(Ω) for all α such that |α|m}.

Its Wm(Ω) norm is defined as follows:

fWm(Ω)=(Ω|α|m{Dαf(x)}2dx)1/2
for fWm(Ω). Wm(Ω) is referred to as a Sobolev space of order m on Ω.

3. Relationships Between Problems (A) and (B)

In this section, we study the relationship between Problems (A) and (B). When d=1, Schoenberg [60] established some results similar to Lemma 1, Proposition 3, Proposition 4, and Proposition 5 of this paper. Thus, some of our results in this section can be viewed as extensions of those in Schoenberg [60] to the multidimensional case. Throughout this section, we assume that (X1,Y1),,(Xn,Yn) are given and that the following assumption holds.

Assumption 1.

m is a positive integer satisfying 2m>d, and {X1,,Xn} is a set of mutually distinct points containing a Pm1-unisolvent set. A set of points {x1,,xL}Rd is called Pm1-unisolvent if for any pPm1, the condition p(xj)=0 for all 1jL implies p(x)=0 for all xRd. In particular, when d=1, {x1,,xL} is Pm1-unisolvent if Lm and x1,,xL are mutually distinct.

Propositions 1 and 2 establish the existence of the solutions to Problems (A) and (B) under Assumption 1.

Proposition 1.

Let Un0 be given. Under Assumption 1, there exists a solution to Problem (A). Furthermore, if f^n and f˜n are two solutions to Problem (A), then f^n(Xi)=f˜n(Xi) for 1in.

Proposition 2.

Let Sn0 be given. Under Assumption 1, there exists a solution to Problem (B). Furthermore, if g^n and g˜n are two solutions to Problem (B), then they differ by a polynomial of total degree less than or equal to m1 (i.e., g^ng˜nPm1).

Our next propositions, Propositions 3, 4, and 5, are concerned with the uniqueness of the solutions to Problems (A) and (B). For our analysis, we need to define the following function: Ψn:[0,)R. For any nonnegative real number u, let Ψn(u)=En(fu), where fu is a solution to Problem (A) with Un=u. The following lemma characterizes the shape of Ψn.

Lemma 1.

There exists a minimizer, say fP, of En(f) over fPm1. Under Assumption 1, there exists a unique solution, say fI, to the following problem:

MinimizefFmJ(f) subject tof(Xi)=Yi1in.

Under Assumption 1, Ψn is nonnegative, convex, and strictly decreasing over [0,J(fI)] with Ψn(0)=En(fP) and Ψn(x)=0 for all xJ(fI).

The following propositions discuss the uniqueness of the solutions to Problems (A) and (B) and the relationship between them.

Proposition 3.

Assume Assumption 1. Let 0Un<J(fI) be given. Then, there exists a unique solution f^n to Problem (A), and f^n satisfies J(f^n)=Un. In other words, if f^n and f˜n are two solutions to Problem (A), then f^n(x)=f˜n(x) for all xRd. Furthermore, f^n is also a solution to Problem (B) with Sn=En(f^n).

Proposition 4.

Assume Assumption 1. Let 0<SnEn(fP) be given. Then, there exists a unique solution g^n to Problem (B), and g^n satisfies En(g^n)=Sn. In other words, if g^n and g˜n are two solutions to Problem (B), then g^n(x)=g˜n(x) for all xRd. Furthermore, g^n is also a solution to Problem (A) with Un=J(g^n).

Combining Propositions 3 and 4 yields the following proposition.

Proposition 5.

Assume Assumption 1.

  1. Let 0UnJ(fI) be given. Then, there exists a unique solution f^n to Problem (A), and f^n satisfies J(f^n)=Un. Furthermore, f^n is also a unique solution to Problem (B) with Sn=En(f^n).

  2. Let 0<SnEn(fP) be given. Then, there exists a unique solution g^n to Problem (B), and g^n satisfies En(g^n)=Sn. Furthermore, g^n is also a unique solution to Problem (A) with Un=J(g^n).

The proofs of Lemma 1 and Propositions 15 are provided in Appendix A.

4. Convex Programming Formulations for Problems (A) and (B)

In this section, we discuss the question of how to compute the solutions to Problems (A) and (B) numerically. The following propositions, Propositions 6 and 7, reveal that we can find the solutions to Problems (A) and (B) by solving convex programs. Their proofs are provided in Appendix A.

We start with some definitions. Let s1,,sM be any Pm1-unisolvent set of M fixed points in Rd. (M is given in (3).) Let q1,,qM be the unique polynomials of total degree less than m satisfying qi(sj)=1 if i=j and qi(sj)=0 if ij, that is,

qi(x)=[p1(x)pM(x)][p1(s1)pM(s1)p1(sM)pM(sM)]1·ei
for xRd and 1iM, where ei is the M×1 column vector whose ith entry is one and zero elsewhere. Let R:Rd×RdR be defined by
R(s,t)=(1)m[Km(st)i=1Mqi(t)Km(ssi)j=1Mqj(s)Km(sjt)+i=1Mj=1Mqi(s)qj(t)Km(sisj)]
for s,tRd, where Km:RdR is given by
Km(z)={θm,dz2mdlnz,if 2md is evenθm,dz2md,otherwise
for zRd,
θm,d={(1)d/2+122m1πd/2(m1)!(md/2)!,if 2md is even(1)mΓ(d/2m)22mπd/2(m1)!,otherwise
and Γ(·) denotes the gamma function.

Proposition 6.

Let Un0 be given. Under Assumption 1, a solution to Problem (A) exists by Proposition 1. Furthermore, for any solution f˜n to Problem (A), there exists a solution f^n to Problem (A), satisfying f˜n(Xi)=f^n(Xi) for 1in, which can be represented by

f^n(x)=i=1Mc^ipi(x)+i=1nd^iR(x,Xi)(4)
for xRd, where (c^1,,c^M, d^1,,d^n, f^n(X1),,f^n(Xn)) is a solution to the following convex program in the decision variables c1,,cM, d1,,dn, y1, ,ynR.
Minimize1ni=1n(Yiyi)2subjecttoi=1nj=1nR(Xi,Xj)didjUni=1Mcipi(Xj)+i=1ndiR(Xj,Xi)=yj1jn.(5)

Conversely, any function of form (4), where (c^1,,c^M, d^1,,d^n, f^n(X1),,f^n(Xn)) is any solution to (6), is a solution to Problem (A).

Proposition 7.

Let Sn0 be given. Under Assumption 1, a solution to Problem (B) exists by Proposition 2. Furthermore, g^n is a solution to Problem (B) if and only if g^n is represented by

g^n(x)=i=1Mc^ipi(x)+i=1nd^iR(x,Xi)(6)
for xRd, where (c^1,,c^M, d^1,,d^n, g^n(X1),,g^n(Xn)) is a solution to the following convex program in the decision variables c1,,cM, d1,,dn, y1, ,ynR.
Minimizei=1nj=1nR(Xi,Xj)didjsubjectto1ni=1n(Yiyi)2Sni=1Mcipi(Xj)+i=1ndiR(Xj,Xi)=yj1jn.

Remark 1.

It should be noted that when d=1, the representation in (4) coincides with the form of a natural spline; see Greville [25, equation 4.1 on p. 3] for the details.

Remark 2.

Proposition 6 reveals that for any solution f˜n to Problem (A), there exists a solution f^n to Problem (A) satisfying f˜n(Xi)=f^n(Xi) for 1in, which is a linear combination of the pi’s and R(·,Xi)’s. In addition, Proposition 7 implies that a solution to Problem (B) is always a linear combination of the pi’s and R(·,Xi)’s.

5. Consistency and Rates of Convergence

In this section, we focus on the consistency and convergence rates of the solutions to Problems (A) and (B). Toward this goal, we need the following assumptions.

Assumption 2.

  1. 2m>d.

  2. X,X1,X2, are iid [a,b]d-valued random vectors having a common positive continuous density function τ:[a,b]dR. This implies that there exist τ1,τ2>0 such that τ1τ(x)τ2 for all x[a,b]d.

Assumption 3.

  1. f*Fm.

  2. (X,Y),(X1,Y1),(X2,Y2), is a sequence of iid [a,b]d×R-valued random vectors satisfying Y=f*(X)+ε and Yi=f*(Xi)+εi for i=1,2, .

  3. E(ε|X)=E(εi|Xi)=0 and E(ε2|X)=E(εi2|Xi)=σ2< for i=1,2, .

Assumption 4.

ε,ε1,ε2, are uniformly sub-Gaussian random variables (i.e., there exist positive real numbers A and B such that E(exp(tε))A exp(Bt2) for any tR).

Assumption 5.

  1. E(f*(X)2)<.

  2. f*, restricted to [a,b]d, is not a polynomial of degree less than or equal to m1.

  3. f* is continuous on [a,b]d.

We need Assumption 2(i) for the following reason. When 2m>d is not assumed, a function in Fm is not well defined on a set of measure 0 because the functions in Fm are determined only outside a set of measure 0. Under the assumption 2m>d, it is known that the functions in Fm are continuous, so the point evaluation is well defined; see Meinguet [51, theorem 1 on p. 130].

The next lemma shows that Assumption 2 guarantees a very important property of the Xi’s, the so-called Pm1-unisolvency. The proof of Lemma 2 is provided in Appendix A.

Lemma 2.

Under Assumption 2, {X1,,Xn} is a set of mutually distinct points containing a Pm1-unisolvent set for n sufficiently large a.s. In other words, Assumption 2 implies Assumption 1 for n sufficiently large a.s.

Lemma 2 ensures that the solutions to Problems (A) and (B) exist for n sufficiently large a.s. under Assumption 2.

We are now ready to present our main results. Theorems 1 and 2 reveal that f^n and g^n are consistent estimators of f* and that Dαf^ and Dαg^n are consistent estimators of Dαf* for |α|=1,,m1. Theorem 3 computes a lower bound on the rate of convergence of f^n. We discuss the rate of convergence of g^n in Remark 4.

Theorems 1 and 2 use the following assumptions on f^n and g^n.

Assumption 6.

There exists a constant cA such that lim supnJ(f^n)cA for all n sufficiently large a.s.

Assumption 7.

There exists a constant cB such that lim supnJ(g^n)cB for all n sufficiently large a.s.

Assumption 8.

1ni=1n(f^n(Xi)f*(Xi))22ni=1nεi(f^n(Xi)f*(Xi))+An
for a sequence of random variables (An:n1) satisfying lim supnAn0 a.s.

Assumption 9.

1ni=1n(g^n(Xi)f*(Xi))22ni=1nεi(g^n(Xi)f*(Xi))+Bn
for a sequence of random variables (Bn:n1) satisfying lim supnBn0 a.s.

Although Assumptions 69 are somewhat hard to verify, we can establish them by imposing some conditions on Un and Sn. For example, Assumptions 6, 8, and 9 are implied by Assumptions 10, 11, and 12, respectively.

Assumption 10.

lim supnUn<.

Assumption 11.

J(f*)Un for all n sufficiently large a.s.

Assumption 12.

lim supnSnσ2.

We are now ready to present Theorem 1, Theorem 2, Corollary 1, and Corollary 2.

Theorem 1.

Under Assumptions 2, 3, 6, and 8,

supx[a,b]d|f^n(x)f*(x)|0a.s.,(7)
E(f^n(X)f*(X))20 and E(Dαf^n(X)Dαf*(X))20(8)
for |α|=1,,m1 as n.

Theorem 2.

Under Assumptions 2, 3, 7, and 9,

supx[a,b]d|g^n(x)f*(x)|0a.s.,(9)
E(g^n(X)f*(X))20 and E(Dαg^n(X)Dαf*(X))20(10)
for |α|=1,,m1 as n.

Corollary 1.

Under Assumptions 2, 3, 10, and 11, (7) and (8) hold.

Corollary 2.

Under Assumptions 2, 3, 7, and 12, (9) and (10) hold.

We now turn to the convergence rate of f^n. We need Assumption 13, which is implied by Assumption 11.

Assumption 13.

1ni=1n(f^n(Xi)f*(Xi))22ni=1nεi(f^n(Xi)f*(Xi))
for all n sufficiently large a.s.

Under Assumption 13, the convergence rate of f^n is established in Theorem 3.

Theorem 3.

Under Assumptions 2, 3, 4, 6, and 13,

{1ni=1n(f^n(Xi)f*(Xi))2}1/2=Op(nm/(2m+d))(11)
as n.

Corollary 3.

Under Assumptions 2, 3, 4, 10, and 11, (11) holds.

Remark 3.

It should be noted that the rate m/(2m+d) established in Theorem 3 is identical to the optimal rate of convergence for the estimation of a function of smoothness of order m, which was established by Stone [65]. However, as Corollary 3 states, this rate is achieved under Assumptions 10 and 11, which may be hard to verify if sufficient information on J(f*) is not available. One should also note that there are many estimators that achieve this optimal rate. See Kohler et al. [38] for a kernel estimator; Cox [14] for a smoothing spline estimator; Györfi et al. [27, theorems 4.3, 5.2., and 6.2] for partitioning, kernel, and nearest neighbor estimators, respectively; and Kohler et al. [37] for partitioning and nearest neighbor estimators. Furthermore, Stone [65] established that the optimal rate of convergence for the estimation of the rth derivative of a function of smoothness of order m is (mr)/(2m+d) for r=0,1,,m1, and this rate is achieved by the smoothing spline estimator proposed by Cox [14].

We next discuss a situation where the solution to Problem (B) exists uniquely.

Theorem 4.

Consider the problem:

MinimizeE(c1p1(X)++cMpM(X)f*(X))2(12)
over c1,,cMR. Under Assumption 5, (12) has a solution, and the minimizing value ν* of (12) is positive. Furthermore, if lim supnSn<σ2+ν* a.s., under Assumption 2, Assumption 3(ii), Assumption 3(iii), and Assumption 5, the solution to Problem (B) exists uniquely for n sufficiently large a.s.

The proofs of Theorems 14 and Corollaries 13 are provided in Appendix B.

Remark 4.

The same convergence rate as in Theorem 3 can be obtained for g^n under the assumption

1ni=1n(g^n(Xi)f*(Xi))22ni=1nεi(g^n(Xi)f*(Xi))(13)
for all n sufficiently large a.s. (Arguments similar to those in the proof of Theorem 3 can be applied.) However, we were not able to find a suitable condition on Sn under which (13) holds.

Remark 5.

In the case where the error ε1 is dependent on X1, our results in Theorems 1, 2, 3, and 4 hold with light modifications in the proofs.

Remark 6.

In many practical situations, the variance of the Yi’s is heterogeneous (i.e., Var(ε1) is dependent on X1). A natural candidate for Sn in such a case is SnE(Var(ε1|X1))=E(Var(Y1|X1))=[a,b]dVar(Y1|x)τ(x)dx, where τ(·) is the density function of X1. When the Xi’s are uniformly distributed, Sn can be estimated by computing the average of Var(Yi|Xi) across i{1,,n}.

6. Numerical Results

In this section, we observe the numerical behavior of the solutions to Problems (A), (B), and (C). To describe the detailed procedure, we start by noticing that, in practice, we can often observe Y=f*(X)+ε multiple times at a fixed design point X[a,b]d. Thus, we assume that we can collect iid observations ((Xi,Yij):1in,1jr), where Yij=f*(Xi)+εij for 1in and 1jr and the εij’s, conditional on Xi, are iid copies of ε conditional on X. We then compute the average of the Yij’s across multiple replications (i.e., Y¯i=j=1rYij/r for each i{1,,n}). Next, the Y¯i’s will be used in place of the Yi’s in Problems (A), (B), and (C).

Next, we provide more detailed descriptions on how the solutions to Problems (A)–(C) are obtained.

6.1. Problem (B)

In order to estimate Sn, we observe that Remark 6 suggests using the average of the Var(Yi|Xi)’s as an estimate of Sn when the Xi’s are uniformly distributed. Because we use the Y¯i’s in place of the Yi’s, we use the average of Var(Y¯i|Xi)=(1/r)Var(Yi|Xi) across i{1,,n} as an estimate of Sn. Therefore, our estimate of Sn is

1nri=1nS^i,(14)
where S^i=j=1r(YijY¯i)2/(r1).

We then solve the convex programming problem in Proposition 7 with the Yi’s replaced by the Y¯i’s and Sn replaced by (14) using CVX, a package for solving convex programs developed by Grant and Boyd [22]. The solution obtained this way is denoted by g^n, and its objective value J(g^n) is denoted by U^n. U^n is used as an estimate of Un when solving Problem (A).

6.2. Problem (A)

We next solve the convex programming problem in Proposition 6 by using U^n=J(g^n) as an estimate of Un and the Y¯i’s in place of the Yi’s. CVX is used to solve the convex programming formulation.

6.3. Problem (C)

The key issue here is how to determine the smoothing parameter λ. We use cross validation to determine the smoothing parameter. We also use a fixed sequence of real numbers (λn:n1) decreasing to zero as n. The smoothing parameter λ chosen by cross validation depends on the observed data set ((Xi,Yij):1in,1jr), whereas λn does not depend on the observed data set and depends on n only.

To define the cross validation function, let fλ[k] be the minimizer of

1ni=1,ikn(Y¯if(Xi))2+λJ(f)
over fFm. Then, the cross validation function V(λ) is defined by
V(λ)=1nk=1n(Y¯kfλ[k](Xk))2(15)
for λ0, and the estimator of the smoothing parameter based on cross validation is the minimizer of V(λ) over λ0. We call this estimator λ^CV. The solution to Problem (C) based on cross validation is computed by solving Problem (C) with λ replaced by λ^CV and the Yi’s replaced by the Y¯i’s; see Wahba [71, equations (2.4.23) and (2.4.24) on p. 33] for the details on how Problem (C) is solved using systems of linear equations.

We also solve Problem (C) with the Yi’s replaced by the Y¯i’s and λ replaced by λn for n1. The specific choice of (λn:n1) is given in Sections 6.5 and 6.6.

6.4. How to Select m

One needs a priori knowledge on m to determine the right value of m. It should be noted that the smoothness of a function g:RR is measured by {g(2)(x)}2dx. Thus, if f* itself is believed to be smooth, then we set m=2. If f* is believed to have smooth partial derivatives of order 2, then we set m=4.

6.5. M/M/1 Queue

We consider the case where f*(x) is the steady-state mean waiting time of a customer in a single-server queue under the first come, first served discipline with infinite capacity buffer, exponential service times, exponential interarrival times, unit arrival rate, and a service rate of x[1.5,2.0]. f*(x) is known to be 1/x(x1); see Hillier and Lieberman [31, p. 781]. We chose an M/M/1 queue because there exists a closed-form formula for the steady-state waiting time of a customer, so we can compare our estimators with the true values. We set Xi=1.5+i/(2n)1/(4n) for 1in. For each i{1,,n} and j{1,,r} with r=100, we generated Yij by averaging the waiting times of the first 1,000 customers arriving at the queue when the queue is initialized empty and idle. We next computed the solution g^n to Problem (B) and the solution f^n to Problem (A) with the Yi’s replaced by the Y¯i’s and m=4. Sn is estimated from (14), and Un is estimated from J(g^n). Both Problems (A) and (B) are solved using CVX.

For Problem (C), we computed λ^CV by minimizing V(λ) in (15) over λ{1011,5·1011,1010,5·1010,,1}. In addition to λ^CV, we used three sequences of real numbers, λ(1)=(106/n:n1), λ(2)=(107/n:n1), and λ(3)=(108/n:n1), for λ. To select λ(2), we tried several sequences of real numbers and selected the one generating the lowest empirical integrated mean square error (EIMSE) values for f* and f*(2). The rate of decay for (λn:n1) was chosen to be of order 1/n based on the suggestion made by Cox [13].

To measure the accuracy of g^n, the solution to Problem (B), we computed the EIMSE of g^n as follows:

1ni=1n(g^n(Xi)f*(Xi))2.

We also computed the EIMSE of g^n(2) as follows:

1ni=1n(g^n(2)(Xi)f*(2)(Xi))2,
where g^n(2) and f*(2) denote the second derivatives of g^n and f*, respectively. The EIMSE values of the solutions to Problems (A) and (C) and those of their second derivatives are computed similarly.

Table 1 reports the 95% confidence intervals of the EIMSE values of the estimators of f* computed from Problems (A)–(C) based on 400 iid replications for each n value. The columns labeled as (A) and (B) report the results obtained from Problems (A) and (B), respectively. The columns labeled as CV, λ(1), λ(2), and λ(3) report the results obtained from Problem (C) when λ is set as λ^CV, λ(1), λ(2), and λ(3), respectively. The values in Table 1 are measured in 104. Table 2 reports the 95% confidence intervals of the EIMSE values of the estimators of f*(2) computed from Problems (A)–(C) based on 400 iid replications for each n value.

Table

Table 1. The 95% confidence intervals of the EIMSE values for the estimators of f* computed from Problems (A)–(C) when f* is the steady-state mean waiting time of a customer in an M/M/1 queue. All values are measured in 104.

Table 1. The 95% confidence intervals of the EIMSE values for the estimators of f* computed from Problems (A)–(C) when f* is the steady-state mean waiting time of a customer in an M/M/1 queue. All values are measured in 104.

n(A)(B)(C)
CVλ(1)λ(2)λ(3)
151.21±0.091.14±0.091.24±0.101.13±0.091.13±0.091.16±0.09
250.96±0.070.87±0.060.93±0.060.86±0.060.86±0.060.86±0.06
350.75±0.050.67±0.050.70±0.050.65±0.040.65±0.050.67±0.05
Table

Table 2. The 95% confidence intervals of the EIMSE values for the estimators of f*(2) computed from Problems (A)–(C) when f* is the steady-state mean waiting time of a customer in an M/M/1 queue.

Table 2. The 95% confidence intervals of the EIMSE values for the estimators of f*(2) computed from Problems (A)–(C) when f* is the steady-state mean waiting time of a customer in an M/M/1 queue.

n(A)(B)(C)
CVλ(1)λ(2)λ(3)
157.52±0.426.06±0.4319.63±3.705.34±0.425.32±0.447.07±0.74
256.79±0.365.07±0.3314.73±2.854.35±0.294.21±0.335.84±0.67
355.87±0.294.30±0.249.38±1.773.53±0.203.28±0.244.45±0.50

6.6. Stock Trader’s Problem

We consider the stock trader’s problem that motivated this paper. Consider a trader who trades stock options on a daily basis. Each day, his decision is either buy or sell a stock option, and he bases his decision on gamma, which is the second derivative of the value of the stock option as a function of the underlying stock price. The value of the stock option, f*, typically has no closed-form formula, so simulation is required to estimate the value of the stock option and its second derivative with respect to the underlying stock price. Because simulation takes a long time to conduct, the estimation of f* and f*(2) is done a day before any decision is made. Because f* and f*(2) depend on the underlying stock price and because we cannot precisely predict the underlying stock price for the next day, f* and f*(2) are typically estimated over a range of possible values of the underlying stock price. Hence, the trader’s goal is to estimate gamma, f*(2), over a range of possible values of the underlying stock price. Furthermore, in his estimation, he does not want to use any parameter that is not estimated from the data set.

To place this problem in a more specific setting, we consider the case where f*(x) is the price of a European call option on a nondividend-paying stock when the underlying stock price is x[0,2]. We chose a European call option because there exists a closed-form formula for its price, so we can compare our estimates with the true values. Specifically, we consider the case where the strike price is K=1.3, the risk-free annual interest rate is s=0.03, the stock price volatility is 0.3, the underlying stock has a drift of 0.03, and the time to maturity of the option is T=1 year. It is known that the price of this option is given by

f*(x)=xN(ln(x/K)+(s+σ2/2)TσT)KesTN(ln(x/K)+(sσ2/2)TσT)
for x[0,2], where N(·) is the cumulative probability distribution function of a standard normal random variable; see, for example, Hull [32, p. 295].

To compute our estimators, we set Xi=2i/n1/n for 1in. For each Xi, we assumed that the current stock price is Xi and simulated a sample path (St:0tT) of a geometric Brownian motion up to time T with a drift of 0.03 and a volatility of 0.3. We then computed Yij, the price of the option by computing exp(sT)max(0,STK), where ST is the value of the geometric Brownian motion we generated at time T. This procedure was repeated r=5,000 times, generating ((Xi,Yij):1in,1jr). We next computed the solutions to Problems (A) and (B) with the Yi’s replaced by the Y¯i’s and m=4. Sn is estimated from (14), and Un is estimated from J(g^n). Both Problems (A) and (B) are solved with CVX.

For Problem (C), we computed λ^CV by minimizing V(λ) in (15) over λ{1011,5·1011,1010,5·1010,,1}. In addition to λ^CV, we used three sequences of real numbers, λ(4)=(105/n:n1), λ(5)=(106/n:n1), and λ(6)=(107/n:n1), for λ. To select λ(5), we tried several sequences of real numbers and selected the one generating the lowest EIMSE values for f* and f*(2). The rate of decay for (λn:n1) was chosen to be of order 1/n based on the suggestion made by Cox [13].

Table 3 reports the 95% confidence intervals of the EIMSE values of the estimators of f* computed from Problems (A)–(C) based on 400 iid replications for each n value. The columns labeled as (A) and (B) report the results obtained from Problems (A) and (B), respectively. The columns labeled as CV, λ(4), λ(5), and λ(6) report the results obtained from Problem (C) when λ is set as λ^CV, λ(4), λ(5), and λ(6), respectively. The values in Table 3 are measured in 105.

Table

Table 3. The 95% confidence intervals of the EIMSE values for the estimators of f* computed from Problems (A)–(C) when f* is the price of a European call option. All values are measured in 105.

Table 3. The 95% confidence intervals of the EIMSE values for the estimators of f* computed from Problems (A)–(C) when f* is the price of a European call option. All values are measured in 105.

n(A)(B)(C)
CVλ(4)λ(5)λ(6)
155.24±0.151.72±0.111.22±0.101.23±0.080.94±0.081.02±0.08
251.14±0.071.14±0.070.80±0.060.82±0.040.60±0.050.66±0.05
350.88±0.050.88±0.050.54±0.040.60±0.030.41±0.030.46±0.03

Table 4 reports the 95% confidence intervals of the EIMSE values of the estimators of f*(2) computed from Problems (A)–(C) based on 400 iid replications for each n value. The values in Table 3 are measured in 101.

Table

Table 4. The 95% confidence intervals of the EIMSE values for the estimators of f*(2) computed from Problems (A)–(C) when f* is the price of a European call option. All values are measured in 101.

Table 4. The 95% confidence intervals of the EIMSE values for the estimators of f*(2) computed from Problems (A)–(C) when f* is the price of a European call option. All values are measured in 101.

n(A)(B)(C)
CVλ(4)λ(5)λ(6)
150.96±0.030.57±0.051.72±0.320.46±0.010.36±0.040.90±0.12
250.44±0.030.45±0.032.80±0.580.35±0.010.25±0.030.68±0.09
350.36±0.010.36±0.012.89±0.660.28±0.010.18±0.020.53±0.06

6.7. Observations from Numerical Experiments

When we compare Problems (A) and (B) with Problem (C) with the smoothing parameter estimated from cross validation, Problem (B) shows superior performance when estimating the second derivative of f*. Problem (C) generates extreme estimates in a few percents of time, thereby degrading the overall performance. This phenomenon has been observed by other researchers; see the discussion in Wahba [71, p. 65].

When we compare Problems (A) and (B) with Problem (C) with a fixed smoothing parameter λn for n1, Problem (C) shows superior performance. However, it should be noted that to select λn, we tried several sequences of real numbers and selected the one generating the lowest EIMSE values for f* and f*(2). This was possible because we already knew what f* and f*(2) were in all of our numerical examples. In reality, one does not have a priori knowledge of the exact values of f*(Xi) and f*(2)(Xi) for 1in, so finding (λn:n1) this way is not feasible. However, in our numerical experiments, the results from Problem (C) with a fixed smoothing parameter λn give us a sense of how well Problems (A) and (B) perform compared with the best possible performance of Problem (C). Tables 14 report that the performance of Problems (A) and (B) is as good as that of Problem (C) with the best possible choice of (λn:n1).

6.8. When Multiple Observations Cannot Be Made at a Fixed Design Point in [a,b]d

Throughout Section 6, we assumed that multiple observations can be made at any fixed design point x in [a,b]d. These multiple observations were used to empirically estimate Sn. In many applications, multiple observations at a point x[a,b]d may not be available. In such cases, one needs to use other ways to empirically estimate Sn. One possible approach is partitioning [a,b]d into a finite number of hyperrectangles, computing the sample variance of the Yi’s whose Xi’s fall within each hyperrectangle, and computing the average of these sample variances (weighted by the density function of X evaluated at each hyperrectangle) as an estimate of Sn. This approach is based on the observation that Sn should ideally be E(Var(Y|X)).

To illustrate this approach numerically, we consider the following example: f*:[0,1]R is given by f*(x)=(x1/4)2 for x[0,1], m=4, Xi=i/n1/(2n) for 1in, and Yi=f*(Xi)+εi for 1in, where the εi’s follow a uniform distribution over [0.25,0.25]. We assumed that only one observation Yi=f*(Xi)+εi is made for 1in. To estimate Sn, we partitioned [0, 1] into the five intervals I1=[0,0.2],I2=[0.2,0.4],I3=[0.4,0.6],I4=[0.6,0.8], and I5=[0.8,1.0] and computed the sample variances, say S1,S2,S3,S4, and S5, in which Sk is the sample variance of the Yi’s whose Xi’s fall in Ik for 1k5. We set Sn equal to the average of S1,S2,S3,S4, and S5. We then solved Problem (B) to obtain g^n and g^n(2). This procedure was repeated 1,600 times independently. Using these 1,600 replications, we computed the 95% confidence intervals of the EIMSE values of g^n and g^n(2), respectively. Table 5 reports the 95% confidence intervals for a variety of n values. The EIMSE values in Table 5 decrease as n increases, indicating that the choice of Sn is appropriate.

Table

Table 5. The 95% confidence intervals of the EIMSE values for g^n and g^n(2) when Sn was set as the average of sample variances across five intervals.

Table 5. The 95% confidence intervals of the EIMSE values for g^n and g^n(2) when Sn was set as the average of sample variances across five intervals.

nEIMSE for g^nEIMSE for g^n(2)
300.0028±0.00016.68±0.42
400.0021±0.00015.36±0.37
500.0016±0.00014.03±0.28

7. Conclusions

We conclude this paper with some discussions on future research topics.

7.1. How to Choose Sn When Multiple Observations Are Not Available

In Section 6.8, we discussed a way to empirically estimate Sn when multiple observations are not available at a design point in [a,b]d. In this approach, we divide [a,b]d into smaller hyperrectangles, compute the sample variance of the Yi’s whose Xi’s fall within each hyperrectangle, and compute the average of these sample variances (weighted by the density function of X evaluated at each hyperrectangle) as an estimate of Sn. We conjecture that this approach will generate an asymptotically consistent estimator if the volume of the hyperrectangles decreases to zero and the number of Xi’s within each hyperrectangle increases to infinity as n. A more rigorous study on this conjecture is a good future research topic.

7.2. Combining the Smoothness Condition with Shape Constraints

Because our proposed formulations, Problems (A) and (B), impose only the smoothness condition on the underlying function f*, the proposed estimators do not necessarily produce a function that satisfies additional shape conditions on f*, such as convexity or monotonicity. For example, even if f* is convex, our estimators are not necessarily convex. Figure 1 shows the solution to Problem (B) when f*:[0,1]R is given by f*(x)=(x1/4)2 for x[0,1], m=4, n=10, Xi=i/n1/(2n) for 1in, and Yi=f*(Xi)+εi for 1in, where the εi’s follow a uniform distribution over [0.5,0.5] and Sn=1/12. Even if f* is convex, g^n is not necessarily convex.

Figure 1. The solid line is f*, the dots are the Yi’s, and the dashed line is the solution to Problem (B).

One possible fix to this phenomenon is incorporating additional conditions to our formulation. A good topic for future research is combining shape constraints, such as convexity and monotonicity, with the smoothness condition in our proposed formulations.

7.3. Establishing Consistency Under a Condition That Is Easier to Verify

Corollary 2 is established under Assumption 7, which is hard to verify. We conjecture that the conclusions of Corollary 2 hold under a more natural condition that Snσ2 as n. This remains as an open question. Another open question is computing the convergence rate of g^n under suitable conditions on Sn.

7.4. Extending Our Formulations to Shape-Constrained Estimation Problems

Problem (A) can be seen as an estimation problem of a smooth function by minimizing the sum of squared errors under the smoothness condition J(f)Un. This type of formulation has been used for shape-constrained estimation problems, such as convex regression. For example, Mazumder et al. [49] proposed estimating an unknown convex function by minimizing the sum of squared errors over convex functions satisfying a Lipschitz condition. An interesting question for future research is how to extend Problem (B) to shape-constrained problems, such as convex and isotonic regression problems.

Appendix A. Proofs of Propositions 1–7 and Lemmas 1 and 2

To establish the existence of the solutions to Problems (A) and (B), we define the following functions φn:RnR and ϕn:RnR. For any z=(z1,,zn)Rn, consider the following problem:

Problem(I):MinimizefFmJ(f)  subject tof(Xi)=zi1in.

Under Assumption 1, the solution to Problem (I) exists uniquely; see Meinguet [50, equation 29 and theorem 3 on p. 299]. We will denote the solution to Problem (I) by fz. Define φn(z)=J(fz). On the other hand, ϕn:RnR is defined by ϕn(z)=(1/n)i=1n(Yizi)2 for z=(z1,,zn)Rn. It can be easily seen that φn is convex over Rn and that ϕn is strictly convex over Rn.

For any S0 and U0, we define VURn and TSRn by

VU={zRn:φn(z)U}andTS={zRn:ϕn(z)S}.

Both SU and TS are nonempty, closed, and convex subsets of Rn.

We recall that Ψn:[0,)R is defined as follows. For any nonnegative real number u, Ψn(u)=En(fu), where fu is a solution Problem (A) with Un=u.

Proof of Proposition 1.

Because VU is nonempty, closed, and convex and because ϕn is strictly convex over VU, there exists a unique minimizer z*=(z1*,,zn*) of ϕn over VU and fz* is a solution to Problem (A).

To prove the second part, let f^n and f˜n be two solutions to Problem (A). Then, (f^n(X1),, f^n(Xn)) and (f˜n(X1),,f˜n(Xn)) are two minimizers of ϕn over VU. By the uniqueness of the minimizer of ϕn over VU, it follows that f^n(Xi)=f˜n(Xi) for 1in. 

Proof of Proposition 2.

Because TS is nonempty, closed, and convex and because φn is convex, there exists a minimizer z*=(z1*,,zn*) of φn over TS and fz* is a solution to Problem (B).

To prove the second part, we note that for any fFm, J(f)=fm2, and ·m is a seminorm in Fm and a norm in Fm/Pm1. We will denote an equivalence class in Fm/Pm1 by [f] for fFm.

Let y* and z* be minimizers of φn over TS. Because

[f1/2y*+1/2z*]m1/2[fy*]+1/2[fz*]m(1/2)[fy*]m+(1/2)[fz*]m=[fy*]m
and [fy*]m[f1/2y*+1/2z*]m, we can conclude
1/2[fy*]+1/2[fz*]m=(1/2)[fy*]m+(1/2)[fz*]m,
which implies either [fy*]=0 or [fy*]=c[fz*] for some constant c. In the latter case, c must be equal to one because we have fy*m=fz*m. In either case, fy*fz*Pm1. 

Proof of Lemma 1.

The existence of fP follows from the fact that

1ni=1n(Yi(c1p1(Xi)++cMpM(Xi)))2(A.1)
is convex over (c1,,cM)RM.

The only nontrivial part is that Ψn is strictly decreasing over [0,J(fI)]. Let 0x<yJ(fI) be given. We need to show that Ψn(x)>Ψn(y). Suppose, on the contrary, that

Ψn(x)=Ψn(y).(A.2)

Let fx and fy be the solutions to Problem (A) with U=x and U=y, respectively. Note that (fx(X1),,fx(Xn))Vy and minimizes ϕn over Vy because of (A.2).

We will construct an n-dimensional ball around (fx(X1),,fx(Xn)) that is contained in Vy. If this is proven, it will follow that fx(Xi)=Yi for 1in, and hence, fx interpolates the Yi’s, so J(fI)J(fx). Together with the facts x<J(fI) and J(fx)x, we will reach a contradiction.

To construct such a ball, we notice that the Xi’s are distinct, so we can find δ0>0 such that XiXj>δ0 for ij. For any ϵ(0,ϵ0], we will show that (fx(X1)+ϵ,,fx(Xn)+ϵ) is in Vy. (ϵ0>0 will be specified later.) Define θϵ:RnR by

θϵ(z)=fx(z)+ϵ(χ1(z)++χn(z))
for zRn, where the χi’s are the “mollifiers” defined by
χi(z)={e·exp(δ0/(zXi2δ0)),if zXi<δ00,otherwise
for zRn and 1in. The χi’s have the following properties:
  1. χi(Xi)=1 and χ(Xj)=0 for all ji;

  2. the χi’s are infinitely differentiable; and

  3. J(χi)C for all 1in and some constant C.

It follows that θϵ(Xi)=fx(Xi)+ϵ for 1in, and

J(θϵ)=θϵm2(fxm+ϵχ1++χnm)2J(fx)+2ϵnC1/2fxm+ϵ2n2C,
so taking ϵ0 sufficiently small yields J(θϵ)J(fy)y. Thus, (fx(X1)+ϵ,,fx(Xn)+ϵ) is in Vy. 

Proof of Proposition 4.

Let 0<SnEn(fP) be given. By Proposition 2, there exists a solution g^n to Problem (B). Let U*=Ψn1(Sn) and h˜ be a solution to Problem (A) with Un=U*. We will first show that g^n is also a solution to Problem (A) with Un=U* and satisfies En(g^n)=Sn. By definition, h˜ minimizes En(f) over all functions f satisfying J(f)U* and satisfies

En(h˜)=Sn(A.3)
and
J(h˜)U*.(A.4)

By (A.3), h˜ is a feasible solution to Problem (B), so we should have

J(g^n)J(h˜).(A.5)

By (A.4) and (A.5), we have

J(g^n)U*,(A.6)
and hence, g^n is a feasible solution to Problem (A) with Un=U*. Thus,
En(h˜)En(g^n).(A.7)

Also, g^n is a feasible solution to Problem (B), so we should have

En(g^n)Sn.(A.8)

By (A.3), (A.7), and (A.8), we can conclude

En(g^n)=En(h˜)=Sn,
and hence, together with (A.6), g^n becomes a solution to Problem (A) with Un=U*.

Next, we turn to the uniqueness of the solution to Problem (B). Let g^n and g˜n be two solutions to Problem (B). By Proposition 1, we have

g^ng˜n=p˜n(A.9)
for some p˜nPm1. By the previous arguments, g^n and g˜n are also solutions to Problem (A) with Un=U*. By Proposition 1, we have
g^n(Xi)=g˜n(Xi)(A.10)
for 1in. From (A.9) and (A.10), we have p˜n(Xi)=0 for 1in, which implies p˜n(x)=0 for all xRd by the Pm1-unisolvency of the Xi’s. Hence, g^n(x)=g˜n(x) for all xRd. 

Proof of Proposition 3.

Let 0Un<J(fI) be given. By Proposition 1, there exists a solution f^n to Problem (A). Let S*=Ψn(Un) and h˜ be a solution to Problem (B) with Sn=S*. We will first show that f^n is also a solution to Problem (B) with Sn=S* and satisfies J(f^n)=Un. First, we will prove that

UnJ(h˜).(A.11)

To prove (A.11), we note that by Proposition 4, h˜ is a solution to

minimizefFmEn(f)  subject to J(f)Un
with En(h˜)=S*. If, on the contrary, J(h˜)<Un, then h˜ is also a solution to
minimizefFmEn(f)  subject to J(f)J(h˜)
with En(h˜)=S*, which contradicts the fact that Ψn is strictly decreasing over [0,J(fI)]. Thus, (A.11) is proven.

By Proposition 4, h˜ is a solution to Problem (A), so we have

En(f^n)=En(h˜)=S*.(A.12)

By (A.12), f^n is a feasible solution to Problem (B) with Sn=S*, so we should have

J(h˜)J(f^n).(A.13)

Because f^n is a feasible solution to Problem (A), we have

J(f^n)Un.(A.14)

By combining (A.11), (A.13), and (A.14), we have

J(h˜)=J(f^n)=Un.(A.15)

By (A.12) and (A.15), f^n is also a solution to Problem (B) with S=S* and satisfies J(f^n)=Un.

We next turn to the uniqueness of the solution to Problem (A). Let f^n and f˜n be two solutions to Problem (A). By the previous arguments, f^n and f˜n are also two solutions to Problem (B) with Sn=S*. By Proposition 4, the solution to Problem (B) with Sn=S* is unique, so f^n=f˜n. 

Proof of Proposition 5.

Combine Propositions 3 and 4. 

Proof of Proposition 6.

Let H0=span{p1,,pM} and H1 be the reproducing kernel Hilbert space with the R(s,t)’s as the reproducing kernels. Then, Fm is the direct sum of H0 and H1; see Meinguet [50, equation 13 on p. 295 and equation 20 on p. 296]. Any element f of Fm has the following representation:

f(x)=i=1Mcipi(x)+i=1ndiR(x,Xi)+ρ(x)
for xRd, where ρ is some element in Fm perpendicular to p1(·),,pM(·),R(·,X1),,R(·,Xn) because of the property of Hilbert spaces. Let h(·)=i=1ndiR(·,Xi)+ρ(·). By Meinguet [50, equation 14 on p. 295], for j{1,2,,n},
h(Xj)=(R(Xj,·),h(·))m=(R(Xj,·),i=1ndiR(·,Xi)+ρ(·))m=i=1ndiR(Xj,Xi)
because ρ is perpendicular to R(·,X1),,R(·,Xn) and R(Xj,·) is the representer of evaluation at Xj. Hence, Problem (A) can be expressed as MinimizefFm1ni=1n(Yi(j=1Mcjpj(Xi)+j=1ndjR(Xj,Xi)))2 subject to J(i=1ndiR(·,Xi))+J(ρ)Un. Hence, if f(x)=i=1Mcipi(x)+i=1ndiR(x,Xi)+ρ(x) is a solution to Problem (A), then g(x)=i=1Mcipi(x)+i=1ndiR(x,Xi) is also a solution to Problem (A). Also, for such g, note that
J(g)=J(i=1ndiR(·,Xi))=(i=1ndiR(·,Xi),i=1ndiR(·,Xi))m=i=1nj=1ndidj(R(·,Xi),R(·,Xj))m=i=1nj=1ndidjR(Xi,Xj).

Proof of Proposition 7.

Let H0=span{p1,,pM} and H1 be the reproducing kernel Hilbert space with the R(s,t)’s as the reproducing kernels. Then, Fm is the direct sum of H0 and H1; see Meinguet [50, equation 13 on p. 295 and equation 20 on p. 296]. Any element f of Fm has the following representation:

f(x)=i=1Mcipi(x)+i=1ndiR(x,Xi)+ρ(x)
for xRd, where ρ is some element in F perpendicular to p1(·),,pM(·),R(·,X1),,R(·,Xn) because of the property of Hilbert spaces. By arguments similar to those in the proof of Proposition 6, Problem (B) can be expressed as MinimizefFJ(i=1ndiR(·,Xi))+J(ρ) subject to 1ni=1n(Yi(j=1Mcjpj(Xi)+j=1ndjR(Xi,Xj)))2Sn. Hence, J(ρ)=0 and ρ=0. The rest of the proposition follows from arguments similar to those in the proof of Proposition 6. 

Proof of Lemma 2.

From the assumption that τ is continuous on [a,b]d, the Xi’s are mutually distinct a.s. By Chui [10, theorem 9.4 on p. 135] or Chung and Yao [11], there exists a Pm1-unisolvent set {x1*,,xr*} in [a,b]d with some fixed positive integer r.

Define a function H:Rr×dR by

H(x1,,xr)=det(Px1,,xrTPx1,,xr)
for (x1,,xr)Rr×d, where
Px1,,xr=(p1(x1)pM(x1)p1(xr)pM(xr)).

Then, H(x1,,xr)0 at (x1,,xr)=(x1*,,xr*) because Px1*,,xr*T, Px1*,,xr* is positive definite. Because H is continuous, there exists ϵ0>0 such that

(x1,,xr)(x1*,,xr*)ϵ0
implies H(x1,,xr)0. Note that H(x1,,xr)0 implies that {x1,,xr} is a Pm1-unisolvent. By Assumption 2, there exists a subset {X1*,,Xr*}{X1,,Xn} such that
Xi*xi*<ϵ0/r(A.16)
for 1ir for all n sufficiently large a.s. We note that (A.16) implies
(X1*,,Xr*)(x1*,,xr*)ϵ0,
and hence, {X1*,,Xr*} is a Pm1-unisolvent set. 

Appendix B. Proofs of Theorems 1–4 and Corollaries 1–3

We start by defining a set of functions that will play an important role in the proofs of Theorems 14. We define Am by

Am={fFm:[a,b]d{Dαf(x)}2dxcA+cB+1 for all α such that |α|=m,|f(x)|α0,|f(x)f(y)|α1xyδ for x,y[a,b]d}(B.1)
for some constants α0 and α1, where δ=1 when m2 and δ=1/2 when m=1. We endow Am with a metric d and a pseudometric dn defined as follows:
d(f1,f2)=supx[a,b]d|f1(x)f2(x)|
and
dn(f1,f2)={1ni=1n(f1(Xi)f2(Xi))2}1/2
for f1,f2Am.

In Lemmas B.1 and B.2, we will establish that Assumption 7 guarantees that g^n, restricted to [a,b]d, belongs to Am for n sufficiently large a.s. In the proof of Lemma B.3, we will use the fact that Am can be covered by a finite number of balls with radius ϵ in metric d for any ϵ>0. In the proof of Theorem 3, we will use the fact that the number of balls of radius ϵ needed to cover Am in metric dn is of order exp(ϵd/m).

To prove Theorems 1 and 2, we need Lemmas B.1, B.2, B.3, and B.4. Here is our first lemma.

Lemma B.1.

Let Ω be an open-bounded subset of Rd containing [a,b]d. Under Assumptions 2, 3, 7, and 9, there exists a constant α0, depending on Ω, such that

supxΩ|g^n(x)|α0
for n sufficiently large a.s.

Proof of Lemma B.1.

The proof is divided into three steps.

  • Step 1. We prove (1/n)i=1ng^n(Xi)2 is bounded for n sufficiently large (uniformly on n) a.s. To see this, one notices that by Assumption 3, Assumption 9, and the strong law of large numbers, we have

    1ni=1ng^n(Xi)22ni=1n(g^n(Xi)Yi)2+2ni=1nYi22ni=1n(g^n(Xi)f*(Xi))24ni=1nεi(g^n(Xi)f*(Xi))+2ni=1nεi2+2ni=1nYi22Bn+2ni=1nεi2+2ni=1nYi22σ2+2E(f*(X)+ε)2+1M1(B.2)
    for n sufficiently large a.s. We used the fact that f*Fm and 2m>d, and hence, f* is continuous on Rd and is bounded over [a,b]d.

  • Step 2. We use the Sobolev integral identity to express g^n as the sum of a polynomial of degree less than m and a bounded function over Ω. To fill in the details, note that g^n is continuous over Rd and hence, belongs to L2(Ω) when it is restricted to Ω. By Assumption 7 and Adams and Fournier [1, theorem 2 on p. 717], g^n, restricted to Ω, belongs to Wm(Ω) for n sufficiently large a.s. The Sobolev integral identity (see Oden and Reddy [54, theorem 3.6 on p. 78]) allows us to express

    g^n(x)=u^n(x)+v^n(x)
    for all xΩ, where u^n is a polynomial of degree less than m,
    v^n(x)=Ωxymd|α|=mQα(x,y)Dαg^n(y)dy
    for xΩ, and Qα(x,y), |α|=m, are bounded infinitely differentiable functions of x and y. Hölder’s inequality implies
    |v^n(x)||α|=m{Ωxy2m2ddyΩ{Qα(x,y)Dαg^n(y)}2dy}1/2maxx,yΩ|Qα(x,y)||α|=m{Ωxy2m2ddyΩ{Dαg^n(y)}2dy}1/2(B.3)
    for xΩ.

    Let ρ be the radius of the smallest ball, centered at the origin, containing Ω. Then, changing to the spherical coordinate system yields

    Ωxy2m2ddy02π02ρr2(md)rd1drdθ=2π(2ρ)2(md/2)(B.4)
    by the fact that 2m>d.

    By Assumption 7, (B.3), and (B.4), there exists a constant M2 such that

    supxΩ|v^n(x)|M2(B.5)
    for n sufficiently large a.s.

  • Step 3. We use (B.5) and the Pm1-unisolvency of {X1,,Xn} to show that supxΩ|u^n(x)| is bounded uniformly on n for n sufficiently large a.s.

    First, combining (B.2) and (B.5) yields

    1ni=1nu^n(Xi)22ni=1ng^n(Xi)2+2ni=1nv^n(Xi)22M1+2M22(B.6)
    for n sufficiently large a.s.

Second, we consider the following matrix:

Zn=(p1(X1)p2(X1)pM(X1)p1(Xn)p2(Xn)pM(Xn)).

By the Pm1-unisolvency of the Xi’s, the matrix W(1/n)ZnTZn is positive definite because wTWw=0 with wT=(w1,,wM) implies i=1Mwipi(Xj) =0 for all 1jn, which then implies wi=0 for all 1iM. Let λ1,,λM be the positive eigenvalues of W, and let ρ1,,ρM be the corresponding eigenvectors with ρi=1 for 1iM. Let λmin be the minimum of λ1,,λM.

We next prove that

lim infn λminλ*>0(B.7)
for some constant λ*. Because ρiTWρi=λi for 1iM, we have λminminw=1wTWw, where wT=(w1,,wM). So,
λminminw12+w12++wM2=11ni=1n(w1p1(Xi)++wMpM(Xi))2.

Note that Λ={(w1,,wM)RM:w12++wM2=1} is a nonempty closed subset of RM, and

1ni=1n(w1p1(Xi)++wMpM(Xi))2E(w1p1(X)++wMpM(X))2
as n uniformly on Λ because Λ is bounded and the Xi’s are in [a,b]d.

By Shapiro et al. [63, proposition 5.2 on p. 157],

minw12++wM2=11ni=1n(w1p1(Xi)++wMpM(Xi))2minw12++wM2=1E(w1p1(X)++wMpM(X))2
a.s. as n. Note that minw12++wM2=1E(w1p1(X)++wMpM(X))20 because E(w1p1(X)++wMpM(X))2=0 implies w1==wM=0. Hence,
lim infn λminminw12++wM2=1E(w1p1(X)++wMpM(X))2λ*>0,
proving (B.7).

We finally prove that u^n is bounded on Ω uniformly on n for n sufficiently large. Let u^n be given by u^n=a^n,1p1++a^n,MpM for some a^n,1,,a^n,M. Let a^nT(a^n,1,,a^n,M) and note that the Rayleigh–Ritz theorem implies

1ni=1nu^n(Xi)2=a^nTWa^nλmin(a^n,12++a^n,M2)λmina^n,i2(B.8)
for 1iM.

Combining (B.6), (B.7), and (B.8) yields

|a^n,i|(2M1+2M22)/(λ*/2)(B.9)
for 1iM and all n sufficiently large a.s., and hence,
|u^n(x)|=|a^n,1p1(x)++a^n,MpM(x)|M(2M1+2M22)/(λ*/2)max1iM,xΩ|pi(x)|M3(B.10)
for all xΩ and n sufficiently large a.s., proving Step 3.

Combining (B.5) and (B.10) completes the proof of Lemma B.1. 

Lemma B.2.

Under Assumptions 2, 3, 7, and 9, there exists a constant α1 such that

|g^n(x)g^n(y)|α1xyδ
for any x,y[a,b]d for n sufficiently large a.s., where δ=1 when m2 and δ=1/2 when m=1.

Proof of Lemma B.2.

We consider two cases separately: the case when m=1 and the case when m2.

  • Case 1. m=1. Because 2m>d, it follows that d=1. The fundamental theorem of calculus and the continuity of g^n imply

    g^n(x)g^n(y)=yxg^n(1)(t)dt,
    where g^n(1) is the weak derivative of g^n of the first order. By Assumption 7 and Hölder’s inequality,
    |g^n(x)g^n(y)|{yx{g^n(1)(t)}2dtyxdt}1/2(cB+1)1/2xy1/2
    for yx and n sufficiently large a.s., proving Lemma B.2.

  • Case 2. m2. Let Ω be a bounded open subset of Rd containing [a,b]d.

We first prove that Dαg^n(x) with |α|=1 is bounded uniformly on n and x for n sufficiently large a.s. By Oden and Reddy [54, equation 3.48 on p. 81] and Lemma B.1, lim supng^nWn(Ω)< a.s. Assumption 7 and Adams and Fournier [1, theorem 2 on p. 717] imply that there exists a constant M4 such that

max|β|=1Ω{Dβg^n(x)}2dxM4 and max|β|=2Ω{Dβg^n(x)}2dxM4(B.11)
for n sufficiently large a.s.

When g^n is restricted to Ω, Dαg^n with |α|=1 belongs to W1(Ω) for n sufficiently large a.s. Applying the Sobolev integral identity (Oden and Reddy [54, theorem 3.6 on p. 68]) to Dαg^n with |α|=1 yields

Dαg^n(x)=Ωζ(y)Dαg^n(y)dy+Ωxymd|β|=2Qβ(x,y)Dβg^n(y)dy
for xΩ, where ζ(y) is a continuous bounded function of y and Qβ(x,y), |β|=2, are bounded infinitely differentiable functions of x and y. Hence, for xΩ and |α|=1, combining Hölder’s inequality, (B.4), and (B.11) yields
|Dαg^n(x)|{Ωζ(y)2dy}1/2{ΩDαg^n(y)2dy}1/2+|β|=2{Ωxy2m2ddy}1/2{ΩQβ(x,y)2{Dβg^n(y)}2dy}1/2M5(B.12)
for some constant M5 and n sufficiently large a.s.

We next note that by the extended mean-value theorem, for any x=(x1,,xd) and y=(y1,,yd), satisfying x,x+y[a,b]d, we have

g^n(x+y)g^n(x)=g^nx1(z1)y1++g^nxd(zd)yd
for some z=x+cy with c(0,1), and hence,
|g^n(x+y)g^n(x)|{y12++yd2}1/2{g^nx1(z1)2++g^nxd(zd)2}1/2M6y.

This is for some constant M6 and n sufficiently large a.s.

Combining the two cases, we complete the proof of Lemma B.2. 

Lemma B.3.

Under Assumptions 2, 3, 7, and 9,

1ni=1nεi(g^n(Xi)f*(Xi))0
as n a.s.

Proof of Lemma B.3.

Let ϵ>0 be given. Let Am be defined as in (B.1). Assumption 7, Lemma B.1, and Lemma B.2 imply that under Assumptions 2 and 3, any solution to Problem (B), restricted to [a,b]d, belongs to Am for n sufficiently large a.s. By Kolmogorov and Tihomirov [39, theorem 13], there exist ll(ϵ) and h1,,hl in Am such that for any gAm, there is ii(g){1,,l} satisfying

supx[a,b]d|g(x)hi(x)|ϵ.(B.13)

For each j{1,,l}, we have

1ni=1nεi(g^n(Xi)f*(Xi))=1ni=1nεi(g^n(Xi)hj(Xi))+1ni=1nεi(hj(Xi)f*(Xi))1ni=1n|εi|supx[a,b]d|g^n(x)hj(x)|+max1jl1ni=1nεi(hj(Xi)f*(Xi)),
and hence,
1ni=1nεi(g^n(Xi)f*(Xi))min1jl(supx[a,b]d|g^n(x)hj(x)|)(1ni=1n|εi|)+max1jl1ni=1nεi(hj(Xi)f*(Xi))ϵni=1n|εi|+max1jl1ni=1nεi(hj(Xi)f*(Xi)) by (44)ϵE|ε|+ϵ
for n sufficiently large a.s. by the strong law of large numbers. 

Lemma B.4.

Under Assumptions 2, 3, 7, and 9,

1ni=1n(g^n(Xi)f*(Xi))20
as n a.s.

Proof of Lemma B.4.

Note that

1ni=1n(g^n(Xi)f*(Xi))22ni=1nεi(g^n(Xi)f*(Xi))+Sn1ni=1nεi2.

Lemma B.4 follows from Assumption 9 and Lemma B.3. 

Proof of Theorem 2.

We first use Lemma B.4 and the fact that f* is continuous over [a,b]d to conclude that g^n converges to f* uniformly over [a,b]d a.s.

Fix ϵ>0. We can find a finite number of sets S1,,Sl covering [a,b]d, each having a diameter less than ϵ (i.e., sup{xy:x,ySj}ϵ) and having a nonempty interior. Because f*Fm and hence, is continuous over [a,b]d, Lemma B.2 implies that we can find a constant, say c, satisfying

|g^n(x)g^n(y)|cxyδ and |f*(x)f*(y)|cxyδ
for x,y[a,b]d and n sufficiently large a.s., where δ=1 when m2 and δ=1/2 when m=1.

For each xSj and XiSj,

|g^n(x)f*(x)||g^n(x)g^n(Xi)|+|g^n(Xi)f*(Xi)|+|f*(Xi)f*(x)|cϵδ+|g^n(Xi)f*(Xi)|+cϵδ,
so
supxSj|g^n(x)f*(x)|2cϵδ+i=1n|g^n(Xi)f*(Xi)|I(XiSj)i=1nI(XiSj)2cϵδ+1ni=1n|g^n(Xi)f*(Xi)|ni=1nI(XiSj))2cϵδ+1ni=1n(g^n(Xi)f*(Xi))2ni=1nI(XiSj)).

By Lemma B.4 and Assumption 2, we conclude that

lim supn supxSj|g^n(x)f*(x)|0
a.s. Because there are finitely many Sj’s, we conclude that
supx[a,b]d|g^n(x)f*(x)|0
a.s. as n.

We next use Lemmas B.1, B.2, and B.4 to establish

E(g^n(X)f*(X))20(B.14)
as n.

We need to use the truncated value defined as follows. For any c0 and x0, the truncated value of x, denoted by Tc(x), is defined by

Tc(x)={x,if xc0,otherwise.(B.15)

Note that Tc(x2)={Tc(x)}2 for x0, Tc(x)Tc(y) for 0xy, and T(x+y)T(x)+T(y) for x,y0. Together with the fact that |x+y||x|+|y| for x,yR, we obtain

Tc(x+y)2Tc(x)2+Tc(y2)+2Tc|x|·Tc|y|(B.16)
Tc(x+y)2Tc(x)2Tc(y2)+2Tc|x|·Tc|y|.(B.17)

We will prove that

ETc(g^n(X)f*(X))20(B.18)
as n for each c>0. Once (B.18) is proven, letting c for each n will prove (B.14). To show (B.18), let ϵ>0 and c>0 be given. Let Am be defined as in (B.1). Lemmas B.1 and B.2 imply that, under Assumption 2 and Assumption 3, any solution to Problem (B), restricted to [a,b]d, belongs to Am for n sufficiently large a.s. By Kolmogorov and Tihomirov [39, theorem 13], there exist ll(ϵ) and h1,,hl in Am such that for any gAm, there is ii(g){1,,l} satisfying
supx[a,b]d|g(x)hi(x)|ϵ.(B.19)

For each j{1,,l}, we have

ETc(g^n(X)f*(X))21ni=1nTc(g^n(Xi)f*(Xi))2=ETc(g^n(X)hj(X)+hj(X)f*(X))21ni=1nTc(g^n(Xi)hj(Xi)+hj(Xi)f*(Xi))2ETc(g^n(X)hj(X))2+E(hj(X)f*(X))2+2ETc|g^n(X)hj(X)|Tc|hj(X)f*(X)|1ni=1nTc(g^n(Xi)hj(Xi))21ni=1nTc(hj(Xi)f*(Xi))2+2ni=1nTc|g^n(Xi)hj(Xi)|Tc|hj(Xi)f*(Xi)|(B.16)and(B.17)max1jl{ETc(hj(X)f*(X))21ni=1nTc(hj(Xi)f*(Xi))2}+ETc(g^n(X)hj(X))2+2ETc(g^n(X)hj(X))2ETc(hj(X)f*(X))2+1ni=1nTc(g^n(Xi)hj(Xi))2+21ni=1nTc(g^n(Xi)hj(Xi))21ni=1nTc(hj(Xi)f*(Xi))2
by the Cauchy–Schwarz inequality, so
ETc(g^n(X)f*(X))21ni=1nTc(g^n(Xi)f*(Xi))2max1jl{ETc(hj(X)f*(X))21ni=1nTc(hj(Xi)f*(Xi))2}+min1jlETc(g^n(X)hj(X))2+min1jl2ETc(g^n(X)hj(X))2ETc(hj(X)f*(X))2+min1jl1ni=1nTc(g^n(Xi)hj(Xi))2+min1jl21ni=1nTc(g^n(Xi)hj(Xi))21ni=1nTc(hj(Xi)f*(Xi))2ϵ+2ϵ2+4cϵ
a.s. for n sufficiently large, proving
ETc(g^n(X)f*(X))21ni=1nTc(g^n(Xi)f*(Xi))20
as n. By the dominated convergence theorem, we have
ETc(g^n(X)f*(X))2E[1ni=1nTc(g^n(Xi)f*(Xi))2]0
and
E[1ni=1nTc(g^n(Xi)f*(Xi))2]0
as n, and hence, (B.18) follows.

To prove the last part of Theorem 2, let j{1,,m1} be fixed. By Assumption 2(ii), for any α with |α|=j,

E(Dαg^n(X)Dαf*(X))2τ2|β|j[a,b]d(Dβg^n(x)Dβf*(x))2dx.(B.20)

By Adams and Fournier [1, third equality in theorem 2 on p. 717], there exists a constant C such that the right-hand side of (B.20) is less than or equal to

τ2C|β|m{[a,b]d(Dβg^n(x)Dβf*(x))2dx}j/2m{[a,b]d(g^n(x)f*(x))2dx}(mj)/2m(τ2/τ1)C|β|m{[a,b]d(Dβg^n(x)Dβf*(x))2dx}j/2m{E(g^n(X)f*(X))2}(mj)/2m.

By Adams and Fournier [1, first equality of theorem 2 on p. 717] and the fact that J(g^n)cB+1 for n sufficiently large a.s. and E(g^n(X)f*(X))20 as n,

|β|m{[a,b]d(Dβg^n(x)Dβf*(x))2dx}j/2m
is bounded for n sufficiently large a.s. Therefore, E(Dαg^n(X)Dαf*(X))2 is bounded by a random variable, say Zn, that converges to zero as n a.s. By using TcE(Dαg^n(X)Dαf*(X))2Tc(Zn) for any c>0 and the dominated convergence theorem, we obtain TcE(Dαg^n(X)Dαf*(X))20 as n, and hence, it follows that E(Dαg^n(X)Dαf*(X))20 as n. 

Proof of Theorem 1.

We apply arguments similar to those in Theorem 2 and Lemmas B.1B.4 to establish the following claims.

  • Step 1. By Assumption 6, J(f^n)cA+1 for n sufficiently large a.s.

  • Step 2. Use arguments similar to those in the proof of Lemma B.1 to establish that Assumptions 2, 3, 6, and 8 imply that there exists a constant β0β0(Ω) such that |f^n(x)|β0 for all xΩ and n sufficiently large a.s. for any bounded open subset Ω of Rd containing [a,b]d.

  • Step 3. Use arguments similar to those in the proof of Lemma B.2 to establish that Assumptions 2, 3, 6, and 8 imply that there exists a constant β1 such that |f^n(x)f^n(y)|β1xyδ for x,y[a,b]d and n sufficiently large a.s., where δ=1 when m2 and δ=1/2 when m=1.

  • Step 4. Use arguments similar to those in the proof of Lemma B.3 to establish that Assumptions 2, 3, 6, and 8 imply (1/n)i=1nεi(f^n(Xi)f*(Xi))0 as n a.s.

  • Step 5. Use arguments similar to those in the proof of Lemma B.4 to establish that Assumptions 2, 3, 6, and 8 imply (1/n)i=1n(f^n(Xi)f*(Xi))20 as n a.s.

  • Step 6. Use arguments similar to those in the proof of Theorem 2 to establish supx[a,b]d|f^n(x)f*(x)|0 as n a.s., E(f^n(X)f*(X))20 as n, and E(Dαf^n(X)Dαf*(X))20 as n for any α satisfying |α|m1. 

Proof of Corollary 1.

It suffices to prove that Assumption 10 implies Assumption 6 and that Assumption 11 implies Assumption 8. Because J(f^n)Un for all n, under Assumption 10, we have lim supnJ(f^n)lim supnUn<, so Assumption 6 holds. Note that Assumption 11 implies that f* is a feasible solution to Problem (A) for all n sufficiently large a.s., so En(f^n)En(f*), or equivalently, 1ni=1n(f^n(Xi)f*(Xi))22ni=1nεi(f^n(Xi)f*(Xi)). 

Proof of Corollary 2.

It suffices to prove that Assumption 12 implies Assumption 9. Because En(g^n)Sn,

1ni=1n(g^n(Xi)f*(Xi))22ni=1nεi(g^n(Xi)f*(Xi))+Sn1ni=1nεi2.

So, Assumption 12 implies Assumption 9. 

Proof of Theorem 3.

Let f˜n be a solution to

MinimizefAmEn(f)  subject to J(f)Un.

We note that

{1ni=1n(f˜n(Xi)f*(Xi))2}1/2=Op(nm/(2m+d))(B.21)
implies Theorem 3 because of Assumption 6 and Steps 2 and 3 in the proof of Theorem 1. Hence, the rest of the proof is devoted to proving (B.21).

By Assumption 13, we have

1ni=1n(f˜n(Xi)f*(Xi))22ni=1nεi(f˜n(Xi)f*(Xi)).(B.22)

Hence, if we can prove

1n|i=1nεi(f˜n(Xi)f*(Xi))|/{1ni=1n(f˜n(Xi)f*(Xi))2}(1/2)(1d/2m)=Op(1),(B.23)
then the combination of (B.22) and (B.23) will prove (B.21).

To establish (B.23), we introduce the notion of ϵ-covering sets and the covering number of a metric or pseudometric space (T,d) as follows. For any ϵ>0, an ϵ-covering set of (T,d) is a class of functions in T such that for any fT, there exists hT satisfying d(f,h)<ϵ. The covering number N(ϵ,T,d) of (T,d) is the number of elements of a minimal covering set. In other words,

N(ϵ,T,d)=min{L:There exists f1,,fL such that Ti=1LB(fi,ϵ)},
where B(f,ϵ)={gT:d(f,g)ϵ} for fT.

We note that there exists a positive constant Γ such that for each ϵ>0,

N(ϵ)log(1+N(ϵ,Am,d))Γϵd/m;(B.24)
see, for example, Birman and Solomyak [5]. (B.24) implies
Nn(ϵ)log(1+N(ϵ,Am,dn))Γϵd/m.(B.25)

(B.25) together with Step 2 in the proof of Theorem 1 and Assumption 4 implies (B.23) by van de Geer [69, lemma 8.4], which completes the proof of Theorem 3. 

Proof of Corollary 3.

In the proof of Corollary 2, we already proved that Assumption 10 implies Assumption 6. It suffices to prove that Assumption 11 implies Assumption 13. Under Assumption 11, f* is a feasible solution to Problem (A) for n sufficiently large a.s., so En(f^n)En(f*), or equivalently, 1ni=1n(f^n(Xi)f*(Xi))22ni=1nεi(f^n(Xi)f*(Xi)) for all n sufficiently large a.s. 

Proof of Theorem 4.

Let L2={f:RdR:E(f(X)2)<} be equipped with the seminorm |f|{E(f(X)2)}1/2 for fL2. It should be noted that L2 is a semi-Hilbert space and that Pm1 is a closed subspace of L2. By Assumption 5(i), f*L2. By the projection theorem, there exists fPm1 minimizing E(f(X)f*(X))2 over Pm1. Let ν*E(f(X)f*(X))2. To prove that ν*>0, suppose, on the contrary, that E(f(X)f*(X))2=0. By Assumption 5(iii) and Assumption 2, we reach f*(x)=f(x) for x[a,b]d, which contradicts Assumption 5(ii).

To prove the second part of Theorem 4, let fP be defined as in Lemma 1. We will show that for n sufficiently large,

0<SnEn(fP),(B.26)
a.s., which implies that there exists a unique solution g^n to Problem (B) for n sufficiently large a.s.

To show (B.26), we will first show that

En(fP)min(c1,,cM)RME(c1p1(X)++cMpM(X)f*(X))2+σ2(B.27)
as n. To prove (B.27), we first note that fP is a solution to
min(c1,,cM)RM1ni=1n(c1p1(X)++cMpM(X)f*(Xi))22ni=1nεi(c1p1(X)++cMpM(X)f*(Xi))+1ni=1nεi2
and that
1ni=1n(c1p1(X)++cMpM(X)f*(Xi))22ni=1nεi(c1p1(X)++cMpM(X)f*(Xi))+1ni=1nεi2E(c1p1(X)++cMpM(X)f*(X))2+σ2
as n a.s. by Assumption 3(ii), Assumption 3(iii), Assumption 5(i), and the strong law of large numbers. We then use Shapiro et al. [63, theorem 5.4 on p. 159] and the fact that
E(c1p1(X)++cMpM(X)f*(X))2+σ2(B.28)
is convex in (c1,,cM). The only nontrivial condition of Shapiro et al. [63, theorem 5.4] is that the set of solutions to (B.28) is nonempty and bounded. In the first part of this proof, we showed that there exists a solution to (B.28). To prove that the set of solutions to (B.28) is bounded, we notice that E(c1p1(X)++cMpM(X))2 is positive definite because of Assumption 2 and the fact that [a,b]d contains a Pm1-unisolvent set, and hence, E(c1p1(X)++cMpM(X)f*(X))2 is coercive over (c1,,cM)RM. We then conclude that the set of solutions to (B.28) is bounded. Thus, the application of Shapiro et al. [63, theorem 5.4 on p. 159] yields (B.27). Because the minimizing value of (B.28) is σ2+ν* and lim supnSn<σ2+ν* a.s., we reach (B.26). 

References

  • [1] Adams RA, Fournier J (1977) Cone conditions and properties of Sobolev spaces. J. Math. Anal. Appl. 61(3):713–734.CrossrefGoogle Scholar
  • [2] Adams RA, Fournier JJF (2003) Sobolev Spaces, 2nd ed. (Academic Press, Amsterdam).Google Scholar
  • [3] Ankenman BE, Nelson BL, Staum J (2010) Stochastic kriging for simulation metamodeling. Oper. Res. 58(2):371–382.LinkGoogle Scholar
  • [4] Bertsimas D, Mundru N (2020) Sparse convex regression. INFORMS J. Comput. 33(1):262–279.LinkGoogle Scholar
  • [5] Birman MS, Solomyak MZ (1967) Piecewise-polynomial approximations of functions of the classes Wpα. Math. USSR-Sbornik 2(3):295–317.CrossrefGoogle Scholar
  • [6] Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [7] Brunk HD (1958) On the estimation of parameters restricted by inequalities. Ann. Math. Statist. 29(2):437–454.CrossrefGoogle Scholar
  • [8] Brunk HD (1970) Estimation of isotonic regression. Puri ML, ed. Nonparametric Techniques in Statistical Inference (Cambridge University Press, London), 177–197.Google Scholar
  • [9] Chen X, Ankenman BE, Nelson BL (2013) Enhancing stochastic kriging metamodels with gradient estimators. Oper. Res. 61(2):512–528.LinkGoogle Scholar
  • [10] Chui CK (1988) Multivariate Splines (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [11] Chung KC, Yao TH (1977) On lattices admitting unique Lagrange interpolations. SIAM J. Numerical Anal. 14(4):735–743.CrossrefGoogle Scholar
  • [12] Cleveland WS (1979) Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74(368):829–836.CrossrefGoogle Scholar
  • [13] Cox DD (1983) Asymptotics for M-type smoothing splines. Ann. Statist. 11(2):530–551.CrossrefGoogle Scholar
  • [14] Cox DD (1984) Multivariate smoothing spline functions. SIAM J. Numerical Anal. 21(4):789–813.CrossrefGoogle Scholar
  • [15] Craven P, Wahba G (1979) Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik 31:377–403.CrossrefGoogle Scholar
  • [16] de Boor C (1978) A Practical Guide to Splines (Springer, New York).CrossrefGoogle Scholar
  • [17] Dierckx P (1993) Curve and Surface Fitting with Splines (Oxford University Press, New York).CrossrefGoogle Scholar
  • [18] Donoho DL, Johnstone IM (1994) Ideal spatial adaptation via wavelet shrinkage. Biometrika 81(3):425–455.CrossrefGoogle Scholar
  • [19] Duchon J (1979) Splines minimizing rotation invariant semi-norms in Sobolev spaces. Schempp W, Zeller K, eds. Multivariate Approximation Theory (Birkhäuser-Verlag, Basel, Switzerland), 85–100.Google Scholar
  • [20] Eubank RL (1999) Nonparametric Regression and Spline Smoothing (Marcel Dekker, New York).CrossrefGoogle Scholar
  • [21] Franke R (1982) Scattered data interpolation: Tests of some methods. Math. Comput. 38(157):181–200.Google Scholar
  • [22] Grant M, Boyd S (2014) CVX: MATLAB software for disciplined convex programming, version 2.1. Accessed April 17, 2024, http://cvxr.com/cvx.Google Scholar
  • [23] Green PJ, Silverman BW (1994) Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach (Chapman & Hall, London).CrossrefGoogle Scholar
  • [24] Green A, Balakrishnan S, Tibshirani R (2021) Minimax optimal regression over Sobolev spaces via Laplacian regularization on neighborhood graphs. Banerjee A, Fukumizu K, eds. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 2602–2610.Google Scholar
  • [25] Greville TNE (1969) Theory and Applications of Spline Functions (Academic Press, New York).Google Scholar
  • [26] Groeneboom P, Jongbloed G, Wellner JA (2001) Estimation of a convex function: Characterization and asymptotic theory. Ann. Statist. 29(6):1653–1698.CrossrefGoogle Scholar
  • [27] Györfi L, Kohler M, Krzyżak A, Walk H (2002) A Distribution-Free Theory of Nonparametric Regression (Springer, New York).CrossrefGoogle Scholar
  • [28] Hanson DL, Pledger G (1976) Consistency in concave regression. Ann. Statist. 4(6):1038–1050.CrossrefGoogle Scholar
  • [29] Härdle W (1990) Applied Nonparametric Regression (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [30] Hildreth C (1954) Point estimates of ordinates of concave functions. J. Amer. Statist. Assoc. 49(267):598–619.CrossrefGoogle Scholar
  • [31] Hillier FS, Lieberman GJ (1967) Introduction to Operations Research, 9th ed. (Holden-Day, San Francisco).Google Scholar
  • [32] Hull JC (2006) Options, Futures, and Other Derivatives (Prentice Hall, Hoboken, NJ).Google Scholar
  • [33] Hutchinson MF, de Hoog FR (1985) Noisy data with spline functions. Numerische Mathematik 47:99–106.CrossrefGoogle Scholar
  • [34] Johnson AL, Jiang DR (2018) Shape constraints in economics and operations research. Statist. Sci. 33(4):527–546.CrossrefGoogle Scholar
  • [35] Kersey SN (2003) On the problems of smoothing and near-interpolation. Math. Comput. 72(244):1873–1885.CrossrefGoogle Scholar
  • [36] Keshvari A (2018) Segmented concave least squares: A nonparametric piecewise linear regression. Eur. J. Oper. Res. 266(2):585–594.CrossrefGoogle Scholar
  • [37] Kohler M, Krzyżak A, Walk H (2006) Rates of convergence for partitioning and nearest neighbor regression estimates with unbounded data. J. Multivariate Anal. 97(2):311–323.CrossrefGoogle Scholar
  • [38] Kohler M, Krzyżak A, Walk H (2009) Optimal global rates of convergence for nonparametric regression with unbounded data. J. Statist. Planning Inference 139(4):1286–1296.CrossrefGoogle Scholar
  • [39] Kolmogorov AN, Tihomirov VM (1961) ϵ-Entropy and ϵ-capacity of sets in functional spaces. Amer. Math. Soc. Translations 2(17):277–364.CrossrefGoogle Scholar
  • [40] Kuosmanen T (2008) Representation theorem for convex nonparametric least squares. Econom. J. 11(2):308–325.CrossrefGoogle Scholar
  • [41] Kuosmanen T, Johnson AL (2010) Data envelopment analysis as nonparametric least squares regression. Oper. Res. 58(1):149–160.LinkGoogle Scholar
  • [42] Kuosmanen T, Johnson AL (2017) Modeling joint production of multiple outputs in StoNED: Directional distance function approach. Eur. J. Oper. Res. 262(2):792–801.CrossrefGoogle Scholar
  • [43] Lee C, Johnson AL, Moreno-Centeno E, Kuosmanen T (2013) A more efficient algorithm for convex nonparametric least squares. Eur. J. Oper. Res. 227(2):391–400.CrossrefGoogle Scholar
  • [44] Lim E (2020) The limiting behavior of isotonic and convex regression estimators when the model is misspecified. Electronic J. Statist. 14(1):2053–2097.CrossrefGoogle Scholar
  • [45] Lim E (2021) Consistency of penalized convex regression. Internat. J. Statist. Probab. 10(1):69–78.CrossrefGoogle Scholar
  • [46] Lim E, Glynn PW (2012) Consistency of multidimensional convex regression. Oper. Res. 60(1):196–208.LinkGoogle Scholar
  • [47] Luo Z, Wahba G (1997) Hybrid adaptive splines. J. Amer. Statist. Assoc. 92(437):107–116.CrossrefGoogle Scholar
  • [48] Mammen E (1991) Nonparametric regression under qualitative smoothness assumptions. Ann. Statist. 19(2):741–759.CrossrefGoogle Scholar
  • [49] Mazumder R, Choudhury A, Iyengar G, Sen B (2019) A computational framework for multivariate convex regression and its variants. J. Amer. Statist. Assoc. 114(525):318–331.CrossrefGoogle Scholar
  • [50] Meinguet J (1979) Multivariate interpolation at arbitrary points made simple. Z. Angew. Math. Phys. 30:292–304.CrossrefGoogle Scholar
  • [51] Meinguet J (1984) Surface spline interpolation: Basic theory and computational aspects. Singh SP, Burry JWH, Watson B, eds. Approximation Theory and Spline Functions, NATO ASI Series, vol. 136 (Springer, Dordrecht, Netherlands), 127–142.CrossrefGoogle Scholar
  • [52] Myers RH, Montgomery DC (2002) Response Surface Methodology: Process and Product Optimization Using Designed Experiments (Wiley, New York).Google Scholar
  • [53] Nadaraya EA (1964) On estimating regression. Theory Probab. Appl. 9(1):141–142.CrossrefGoogle Scholar
  • [54] Oden JT, Reddy JN (1976) An Introduction to the Mathematical Theory of Finite Elements (Wiley, New York).Google Scholar
  • [55] Reinsch CH (1967) Smoothing by spline functions. Numerische Mathematik 10:177–183.CrossrefGoogle Scholar
  • [56] Reinsch CH (1971) Smoothing by spline functions II. Numerische Mathematik 16:451–454.CrossrefGoogle Scholar
  • [57] Rice J, Rosenblatt M (1983) Smoothing splines: Regression, derivatives and deconvolution. Ann. Statist. 11(1):141–156.CrossrefGoogle Scholar
  • [58] Ruppert D, Wand MP (1994) Multivariate locally weighted least squares regression. Ann. Statist. 22(3):1346–1370.CrossrefGoogle Scholar
  • [59] Salemi P, Nelson BL, Staum J (2016) Moving least squares regression for high-dimensional stochastic simulation metamodeling. ACM Trans. Model. Comput. Simulation 26:16:1–16:25.CrossrefGoogle Scholar
  • [60] Schoenberg IJ (1964) Spline functions and the problem of graduation. Proc. Natl. Acad. Sci. USA 52(4):947–950.CrossrefGoogle Scholar
  • [61] Seijo E, Sen B (2011) Nonparametric least squares estimation of a multivariate convex regression function. Ann. Statist. 39(3):1633–1657.CrossrefGoogle Scholar
  • [62] Shapiro A (2000) On the asymptotics of constrained local M-estimators. Ann. Statist. 28(3):948–960.CrossrefGoogle Scholar
  • [63] Shapiro A, Dentcheva D, Ruszczyński AP (2009) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [64] Stone CJ (1977) Consistent nonparametric regression. Ann. Statist. 5(4):595–645.CrossrefGoogle Scholar
  • [65] Stone CJ (1980) Optimal rates of convergence for nonparametric estimators. Ann. Statist. 8(6):1348–1360.CrossrefGoogle Scholar
  • [66] Utreras FI (1981) Optimal smoothing of noisy data using spline functions. SIAM J. Sci. Statist. Comput. 2(3):349–362.CrossrefGoogle Scholar
  • [67] Utreras FI (1988) Convergence rates for multivariate smoothing spline functions. J. Approx. Theory 52(1):1–27.CrossrefGoogle Scholar
  • [68] van de Geer S (1990) Estimating a regression function. Ann. Statist. 18(2):907–924.CrossrefGoogle Scholar
  • [69] van de Geer S (2000) Empirical Process in M-Estimation (Cambridge University Press, Cambridge, UK).Google Scholar
  • [70] Varian HR (1985) Non-parametric analysis of optimizing behavior with measurement error. J. Econometrics 30(1):445–458.CrossrefGoogle Scholar
  • [71] Wahba G (1990) Spline Models for Observational Data (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [72] Wand MP, Jones MC (1995) Kernel Smoothing (Chapman & Hall, London).CrossrefGoogle Scholar
  • [73] Watson GS (1964) Smooth regression analysis. Sankhya Ser. A 26(4):359–372.Google Scholar
  • [74] Wegman EJ, Wright IW (1983) Splines in statistics. J. Amer. Statist. Assoc. 78(382):351–365.CrossrefGoogle Scholar
  • [75] Yagi D, Chen Y, Johnson AL, Kuosmanen T (2020) Shape-constrained kernel-weighted least squares: Estimating production functions for Chilean manufacturing industries. J. Bus. Econom. Statist. 38(1):43–54.CrossrefGoogle Scholar
  • [76] Zangwill WI (1969) Nonlinear Programming: A Unified Approach (Prentice-Hall, Hoboken, NJ).Google Scholar