Nonparametric estimation of service time characteristics in infinite-server queues with nonstationary Poisson input

This paper provides a mathematical framework for estimation of the service time distribution and the expected service time of an infinite-server queueing system with a nonhomogeneous Poisson arrival process, in the case of partial information, where only the number of busy servers is observed over time. The problem is reduced to a statistical deconvolution problem, which is solved using Laplace transform techniques and kernel regularization. Upper bounds on the mean squared error of the proposed estimators are derived. Simulation experiments illustrate how the method can be applied and provide insight into its practical performance.


Introduction
The advent of more and larger datasets leads to new questions in operations research and statistics. This paper lies at the intersection of these two fields. In particular, we study the statistical problems of estimating the distribution function and the expectation of service times in an infinite-server queueing model in the case of partial information. The information is incomplete in the sense that the number of busy servers is observed, but individual customers cannot be tracked.

Infinite-server queue
First we provide some background on the M/G/∞ queueing model, which is well studied and can be considered a standard model. In such queues customers arrive according to a homogeneous Poisson process, each customer is served independently of all other customers, and customers never wait for service, because there is an infinite number of servers. The model has a wide variety of applications in, e.g., telecommunication, road traffic and hospital modeling. It can be used as an approximation to M/G/n systems, where n is relatively large with respect to the arrival rate, but the model is also interesting in its own right. For example, it can be interpreted outside queueing theory as a model for the size of a population. In this paper we use queueing terminology (servers, customers, etc.), but these terms can be adjusted to the application. For example, 'customers' could be cars travelling between two locations and their 'service time' could be the travel time. In many applications the arrival rate is not constant but varies over time; we provide some examples below. This observation motivates studying the M_t/G/∞ queue, in which customers arrive according to a nonhomogeneous Poisson process. This model is still particularly tractable (cf. [12]) and amenable to statistical analysis, as shown in this paper.
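The M_t/G/∞ dynamics described above are straightforward to simulate, which is useful for sanity checks later on. The following minimal sketch (the Lewis-Shedler thinning approach and all names are our illustrative choices, not part of the paper) draws one value of the number of busy servers X(t):

```python
import random

def simulate_queue_length(lam, lam_max, t, sample_service):
    """Draw one realization of X(t), the number of busy servers at time t
    in an M_t/G/infinity queue that starts empty at time 0.

    Arrivals on [0, t] are generated by Lewis-Shedler thinning: candidate
    points from a homogeneous Poisson process of rate lam_max are accepted
    with probability lam(s)/lam_max.  An accepted customer arriving at s
    with service time sigma is busy at time t exactly when s + sigma > t."""
    s, busy = 0.0, 0
    while True:
        s += random.expovariate(lam_max)   # next candidate arrival epoch
        if s > t:
            break
        if random.random() < lam(s) / lam_max and s + sample_service() > t:
            busy += 1
    return busy
```

For instance, with constant rate λ(s) ≡ 4 and Exp(1) service times, X(10) is Poisson with parameter 4(1 − e^{−10}) ≈ 4, which the simulation reproduces on average.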

Statistical queueing problems
Queueing theory studies probabilistic properties of random processes in service systems on the basis of a complete specification of system parameters. In this paper we are interested in inverse problems, where unknown characteristics of a system must be inferred from observations of the associated random processes. Typically such observations are incomplete in the sense that individual customers cannot be tracked as they go through the service system. The importance of such statistical inverse problems with incomplete observations was emphasized in [3].
The service time distribution and its expected value are important performance metrics of the infinite-server queue (note that sojourn times are identical to service times in infinite-server queues). Our goal is to estimate these characteristics of the M_t/G/∞ system from observations of the queue-length process. Specifically, let {τ_j, j ∈ Z} be arrival epochs constituting a realization of a nonhomogeneous Poisson point process on the positive real half-line with intensity {λ(t), t ≥ 0}. The service times {σ_j, j ∈ Z} are positive independent random variables with common distribution G, independent of {τ_j, j ∈ Z}. Suppose that we observe the queue length (or: number of busy servers) X(·) restricted to the time interval [0, T]. The goal is to construct estimators of the service time distribution G and the expected service time with provable accuracy guarantees.
In the case of complete data, where it can be seen when each customer arrives and leaves, the above estimation problems are trivial. The difficulty of the incomplete data problem lies in the fact that only macro-level data is observed, i.e. only the number of busy servers is recorded.
Applications
A non-exhaustive list of example applications of our analysis is given below:
• Traffic: Toll plazas or sensors often count the number of cars entering and leaving a certain highway segment. Cars can overtake each other and therefore do not leave in the same order as they arrive. The problem is to determine the distribution of the speed of the cars over that segment. A solution that would make the problem trivial is to identify each car by recording its license plate. This comes with several downsides. For example, it could be costly to implement and there could be privacy concerns. There have been several attempts in the literature to solve this problem in another way, usually by trying to match vehicles at the upstream and downstream points based on non-unique signatures (for example, vehicle length), which can be obtained by a particular type of dual-loop detector, cf. e.g. [8]. Our approach applies to the harder case where single-loop detectors are used, i.e. where only the vehicle counts at the upstream and downstream points are known.
• Communication systems: If two internet routers only track the timestamp at which a packet arrives, then what is the distribution of the packet flow duration between those routers? The importance of such statistical inverse problems was emphasized by [3]. In particular, the recent paper [2] proposes a sampling framework for measuring internet traffic based on the M/G/∞ model.
• Biology: As noted in [18], the production of molecules may be 'bursty', in the sense that periods of high production activity can be followed by periods of low activity.
In [9], this is modeled using an interrupted Poisson process, i.e. a Poisson process that is modulated by a stochastic ON/OFF switch. The number of active molecules can then be modeled as the number of jobs in a modulated infinite-server queue. By expectation-maximization and maximum likelihood techniques it is possible to filter the most likely ON/OFF arrival rate sample path, cf. [19]. Subsequently, using such filtered paths, our methods can provide estimates of the lifetime distribution of molecules when the molecules cannot be tracked separately but only their aggregate number is observed.
In all of these examples it would be unreasonable to assume that the arrival rate is constant. It is known that traffic is busier during rush hours, production of molecules is bursty, and internet activity is higher during the day than at night. Thus inhomogeneity of the Poisson process is an important feature in a variety of applications.

Contribution of the paper
In this paper we deal with the statistical inverse problem of estimating the service time distribution G and the expected service time µ_G in the M_t/G/∞ queue, using only observations of the queue-length process. We approach the problem as a statistical deconvolution problem and develop nonparametric kernel estimators based on Laplace transform techniques. Their accuracy is analyzed over suitable nonparametric classes of service time distributions. In particular, we derive upper bounds on the worst-case root mean squared error (RMSE) of the proposed estimators, and show how properties of the arrival rate function {λ(t), t ≥ 0} and the service time distribution G affect the estimation accuracy. Furthermore, we provide details on the implementation of the estimators. For example, we describe an adaptive estimation procedure for the distribution function and confirm its efficiency by a simulation study.
Our results are based on a formula for the joint moment generating function of the queue-length process at different time instances. We provide a derivation of this result, which to the best of our knowledge has not appeared in the literature before and is of independent interest.

Current literature
Our contribution is related to two different strands of research. First, similar statistical inference questions in queueing theory have been studied before, but for the homogeneous case, i.e. the M/G/∞ system; see, e.g., [7], [5], [22], [14], [15] and references therein. The analysis in the case of the M_t/G/∞ system is vastly different: the M_t/G/∞ queue is non-stationary, while the analysis of the M/G/∞ queue relied on stationary measures. Second, similar deconvolution problems, such as density deconvolution, have been studied in the statistical literature; see, e.g., [25] and [13]. However, this strand of research typically considers models with independent observations and advocates the use of Fourier transform techniques. In contrast, our setting is completely different: the queue-length process in the M_t/G/∞ model is intrinsically dependent, and Fourier-based techniques are not applicable since the arrival rate {λ(t), t ≥ 0} need not be (square) integrable. A connection can be drawn with [4], where a deconvolution problem of estimating the intensity of a nonhomogeneous Poisson process on a finite interval was considered. We also mention the recent work [1], where Laplace transform techniques were applied to signal deconvolution in a Gaussian model.

Outline
The rest of the paper is organized as follows. In Section 2 we present the formula for the joint moment generating function of the queue-length process at different time instances, from which known results of [12] can be derived. In particular, the covariance of the queue length at different points in time can be found, which plays an important role in the subsequent statistical analysis. In Section 3 we formulate the estimation problems and introduce the necessary notation and assumptions. Our main statistical results, Theorems 2, 3 and 4, which provide upper bounds on the risk of estimators of G and µ_G, are given in Sections 4 and 5. Section 6 discusses the numerical implementation of the proposed estimators and presents simulation results. Proofs of the main results of this paper are given in Section 7.

Properties of the queue-length process
In this section we derive some probabilistic results on properties of the queue-length process of the M_t/G/∞ queue; these results provide the basis for the construction of our estimators.
Let {τ_j, j ∈ Z} be arrival epochs constituting a realization of a nonhomogeneous Poisson process on the real line with intensity {λ(t), t ∈ R}. The service times {σ_j, j ∈ Z} are positive independent random variables with common distribution G, independent of {τ_j, j ∈ Z}. Assume that the system has been operating for an infinite time; then the queue-length process {X(t), t ∈ R} is given by

X(t) = Σ_j 1{τ_j ≤ t < τ_j + σ_j},

i.e., X(t) counts the customers that have arrived by time t and are still in service. The next result provides formulas for the Laplace transform of the finite-dimensional distributions of {X(t), t ∈ R}.
and for 0 ≤ a < b ≤ ∞ let

Let t_1 < ··· < t_m be fixed points, and let t_0 = −∞ by convention; then for any θ = (θ_1, ..., θ_m), m ≥ 1, one has

where E_G stands for the expectation with respect to the probability measure P_G generated by the queue-length process {X(t), t ∈ R} when the service time distribution is G.
The following statement is an immediate consequence of Theorem 1 for the specific cases m = 1 and m = 2.
Corollary 1. For any t ∈ R,

ln E_G exp{θX(t)} = (e^θ − 1)H(t),

i.e., X(t) is a Poisson random variable with parameter H(t). Moreover, for any t ≤ τ the joint transform of (X(t), X(τ)) follows from Theorem 1 with m = 2, and therefore

cov_G[X(t), X(τ)] = ∫_{−∞}^t λ(s) Ḡ(τ − s) ds,   Ḡ := 1 − G.    (3)

The results in Corollary 1 are well known; they are proved, e.g., in [12, Section 1] using a completely different technique. We were not able, however, to locate formula (2) in the existing literature.
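Corollary 1 identifies X(t) as Poisson with parameter H(t), the convolution of the arrival rate with the survival function Ḡ. As a quick numerical sketch (the illustrative rate λ(s) = 1 + sin s, exponential services, and the convention that the system starts empty at time 0 are our choices), H(t) can be evaluated by quadrature:

```python
import math

def poisson_parameter(lam, G_bar, t, n=100_000):
    """Numerically evaluate H(t) = int_0^t lam(s) * G_bar(t - s) ds, the
    Poisson parameter of X(t) for a system that starts empty at time 0,
    using the composite trapezoidal rule."""
    if t <= 0:
        return 0.0
    step = t / n
    f = lambda s: lam(s) * G_bar(t - s)
    total = 0.5 * (f(0.0) + f(t)) + sum(f(k * step) for k in range(1, n))
    return total * step
```

For λ(s) = 1 + sin s and Exp(1) services (Ḡ(u) = e^{−u}), H satisfies H'(t) = λ(t) − H(t), with closed form H(t) = 1 − e^{−t}/2 + (sin t − cos t)/2, against which the quadrature can be checked.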

Estimation problems, notation and assumptions
In this section we formulate the estimation problems, introduce the necessary notation and assumptions, and present the general idea behind the construction of estimators of linear functionals of the service time distribution G.

Formulation of estimation problems
Consider an M_t/G/∞ queueing system with Poisson arrivals of intensity {λ(t), t ≥ 0}. Assume that n independent realizations X_1(t), ..., X_n(t) of the queue-length process {X(t), t ≥ 0} are observed on the interval [0, T]. Using the observations X_{n,T} := {X_k(t), 0 ≤ t ≤ T, k = 1, ..., n} we want to estimate the service time distribution G and the expected service time µ_G := ∫_0^∞ [1 − G(t)] dt. In the sequel, we will be interested in estimating the value G(x_0) of G at a single given point x_0 > 0.
It is worth noting that the observation scheme of this paper, where n independent copies of the queue-length process are given, is quite standard in the statistics of non-stationary processes. We refer, e.g., to [20], where nonparametric estimation of the intensity of a nonhomogeneous Poisson process was considered. On the other hand, the accuracy of the estimator also increases as a function of λ_0, where λ_0 scales the arrival rate. In other words, good estimates can be obtained even for n = 1, as long as the arrival rate is high enough.
By estimators Ĝ(x_0) = Ĝ(x_0; X_{n,T}) and μ̂_G = μ̂_G(X_{n,T}) of G(x_0) and µ_G respectively we mean measurable functions of X_{n,T}. Their accuracy will be measured by the worst-case risk over a set G of distributions G. In particular, for a functional class G the risk of Ĝ(x_0) is defined by

sup_{G∈G} ( E_G |Ĝ(x_0) − G(x_0)|² )^{1/2},

while the risk of μ̂_G is defined by

sup_{G∈G} ( E_G |μ̂_G − µ_G|² )^{1/2}.

Our goal is to construct estimators of G(x_0) and µ_G with provable accuracy guarantees over natural classes G of service time distributions.

General idea for estimator construction
It follows from Theorem 1 that the expectation H(t) = E_G[X(t)] of the queue-length process X(t) is related to the service time distribution G via the convolution equation

H(t) = ∫_0^∞ λ(t − s) Ḡ(s) ds,   Ḡ := 1 − G.    (1)

Therefore estimating a linear functional θ(G) of G, such as G(x_0) or µ_G, from observations of X(t) is a statistical inverse problem of the deconvolution type.
We will base the construction of our estimators on the linear functional strategy that is frequently used for solving ill-posed inverse problems. The main idea of this strategy is to find a pair of kernels, say K(x, t) and L(x, t), such that the following two conditions are fulfilled: (i) the integral ∫ K(x, t)Ḡ(t) dt approximates the value θ(G); (ii) the kernel L is related to K via the equation ∫ K(x, t)Ḡ(t) dt = ∫ L(x, t)H(t) dt.
In view of Corollary 1 and by condition (ii), the statistic ∫ L(x, t)X(t) dt is an unbiased estimator of the integral ∫ K(x, t)Ḡ(t) dt, which by (i) approximates the value θ(G). Thus ∫ L(x, t)X(t) dt is a reasonable estimator of θ(G).

Notation and assumptions
In order to construct estimators in our setting we use Laplace transform techniques. For this purpose we require the following notation.
Notation. For a generic locally integrable function ψ on R the bilateral Laplace transform is defined by

ψ̂(z) := ∫_{−∞}^{∞} e^{−zt} ψ(t) dt.

The region of convergence of the integral on the right-hand side is a vertical strip in the complex plane; it will be denoted Σ_ψ. If ψ is supported on [0, ∞), then ψ̂(z) = ∫_0^∞ e^{−zt} ψ(t) dt, and the corresponding region of convergence is a half-plane. The inversion formula for the Laplace transform is

ψ(t) = (1/(2πi)) ∫_{s−i∞}^{s+i∞} ψ̂(z) e^{zt} dz,   s ∈ Σ_ψ.

Assumptions on the arrival rate. As we will show in the sequel, the estimation accuracy depends on the growth of λ(t) at infinity and on the rate of decay of the Laplace transform λ̂(z) over vertical lines in the region of convergence Σ_λ. These properties of the arrival rate are quantified in the next two assumptions.
Several remarks on these assumptions are in order. Assumption 1 states growth conditions on λ: (a) allows exponential growth, while (b) is a polynomial growth condition. The important case of bounded λ is included in Assumption 1(b) and corresponds to p = 0. Here note that a_2 > 0 while a_1 ≥ 0, and the convenient normalization max{a_1, a_2} = 1 is used. The growth conditions (4) and (5) guarantee that λ̂ is absolutely convergent in the half-plane Σ_λ = {z ∈ C : Re(z) > σ_λ} with σ_λ > 0 and σ_λ = 0, respectively. Assumption 2 states that λ̂ does not have zeros in Σ_λ. By the Riemann-Lebesgue lemma, λ̂ decreases along vertical lines in Σ_λ (see, e.g., [10, §23]), and Assumption 2 stipulates the rate of decrease. Note that if γ > 1 in (6) then λ̂ is absolutely integrable along vertical lines and λ(t) can be recovered from λ̂ by integrating over any vertical line Re(z) = σ with σ > σ_λ; otherwise the Laplace inversion formula should be understood in the sense of L²-convergence (cf. [24, Chapter 2, §10]).
The next examples demonstrate that Assumptions 1 and 2 hold in many cases of interest.
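To make the transform notation concrete, the bilateral Laplace transform can be approximated numerically; a small sketch of ours (the truncation window and grid size are ad-hoc choices, and ψ must be negligible outside the window):

```python
import cmath

def bilateral_laplace(psi, z, lo=-50.0, hi=50.0, n=200_000):
    """Approximate the bilateral Laplace transform
    (B psi)(z) = int_{-inf}^{inf} exp(-z*t) * psi(t) dt
    on a truncated window [lo, hi] by the composite trapezoidal rule.
    z must lie in the strip of convergence of the transform."""
    step = (hi - lo) / n
    f = lambda t: cmath.exp(-z * t) * psi(t)
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * step) for k in range(1, n))
    return total * step
```

For example, for the one-sided rate λ(t) = λ_0 1_{[0,∞)}(t) one has λ̂(z) = λ_0/z for Re(z) > 0, which the quadrature reproduces up to discretization error at the jump.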

Estimation of the service time distribution
In this section we consider the problem of estimating the service time distribution.

Estimator construction
The estimator construction follows the linear functional strategy described in Section 3.2.
Let K be a fixed bounded function supported on [0, ∞). For a real number h > 0 we define

L_h(t) := (1/(2πi)) ∫_{s−i∞}^{s+i∞} ( K̂(zh) / λ̂(−z) ) e^{zt} dz,   s < −σ_λ.    (7)

Since the convergence region of λ̂ is the half-plane Σ_λ, and the kernel K will always be chosen so that Σ_K = C, if Assumption 2 holds then λ̂(z) does not have zeros in Σ_λ. Then the function K̂(zh)/λ̂(−z) is analytic in Σ_L := {z ∈ C : Re(z) < −σ_λ}, and the kernel L_h does not depend on the real number s provided that s < −σ_λ.
The next result reveals the relationship between the kernels L_h and K_h(·) := (1/h)K(·/h) and provides motivation for the construction of our estimator.

Lemma 1. Let L_h be defined by (7), with H given by (1), and assume that conditions (8) hold.

Proof. The first condition in (8) ensures that L_h(t) is well defined and finite for each t. In view of the second condition, the order of integration below may be interchanged by Fubini's theorem. Now we show that definition (7) implies that the function L_h solves the equation relating K_h and L_h in condition (ii) of Section 3.2. Indeed, the bilateral Laplace transforms of both sides coincide: it follows from the definition of L_h(t) and the inversion formula for the bilateral Laplace transform (cf. [24, Chapter VI, §5]) that L̂_h(z) = K̂(zh)/λ̂(−z) for all z in a region where both transforms are defined.

Now we are in a position to define the estimator of G(x_0) based on n independent realizations X_1(t), ..., X_n(t) of the queue-length process {X(t), t ≥ 0} observed on the interval [0, T]. Let {t_j^{(k)}} be the ordered set of the arrival and departure epochs of the realization X_k(t), k = 1, ..., n, so that X_k(·) is constant between any two consecutive epochs. The estimator Ĝ_h(x_0) of G(x_0), defined in (10), integrates the kernel L_h against each observed path X_k; since the paths are piecewise constant, this integral reduces to a finite sum over the inter-epoch intervals. The construction depends on the design parameter h, which will be specified in the sequel.
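Because each observed path X_k is constant between consecutive epochs, integrating a kernel against it reduces to a finite sum of one-dimensional integrals. A minimal sketch of this reduction (names are ours; the kernel is a generic callable, and in practice the inner quadrature would be replaced by a closed-form antiderivative of L_h when one is available):

```python
def integrate_kernel_against_path(epochs, values, kernel, npts=512):
    """Compute int kernel(t) * X(t) dt for a piecewise-constant sample path:
    X(t) = values[i] for t in [epochs[i], epochs[i+1]).

    Since X is constant between epochs, the integral equals
    sum_i values[i] * int_{epochs[i]}^{epochs[i+1]} kernel(t) dt;
    each piece is evaluated with the composite trapezoidal rule."""
    total = 0.0
    for i, x in enumerate(values):
        a, b = epochs[i], epochs[i + 1]
        if x == 0 or a == b:
            continue
        step = (b - a) / npts
        s = 0.5 * (kernel(a) + kernel(b)) + sum(kernel(a + k * step) for k in range(1, npts))
        total += x * s * step
    return total
```

A path equal to 2 on [0, 1) and 1 on [1, 2) integrated against kernel(t) = t gives 2·(1/2) + 1·(3/2) = 5/2, a simple consistency check.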

Upper bound on the risk
Now we study the accuracy of the estimator Ĝ_h(x_0) defined in (10). The risk of the estimator Ĝ_h(x_0) will be analyzed under local smoothness and moment assumptions on the probability distribution G. In particular, we will assume that G belongs to a local Hölder class and has a bounded second moment.

Definition 1.
Let A > 0, β > 0 and d > 0. We say that a function f belongs to the class H_{β,x_0}(A, d) if it satisfies the corresponding local Hölder condition of order β in a d-neighborhood of x_0. We also denote by H̄_{β,x_0}(A, M) the subclass of distribution functions G ∈ H_{β,x_0}(A, d) with second moment bounded by M.

Let K be a kernel supported on [0, 1] and satisfying the following conditions:

(K1) For a fixed positive integer m, K satisfies the moment conditions of order m.

(K2) For a positive integer r, the kernel K is r times continuously differentiable on R.

Note that conditions (K1)-(K2) are standard in nonparametric kernel estimation; see, e.g., [23].
Remark 2. Theorem 2 considers asymptotics as n → ∞. Another natural asymptotic regime is the heavy-traffic limit, where the scale parameter λ_0 of the arrival intensity tends to infinity while n = 1. An inspection of the proof shows that the result of Theorem 2 remains true if the asymptotics n → ∞ with fixed λ_0 are replaced by λ_0 → ∞ with fixed n.

Remark 3.
In general the rate of convergence ϕ_n may depend on x_0, and this dependence is primarily determined by the behavior of the arrival rate at infinity. In particular, if λ(t) increases exponentially then the accuracy is proportional to (e^{2σ_λ x_0})^{β/(2β+2γ+1)} and deteriorates rapidly as x_0 grows. Note, however, that if λ is bounded (here σ_λ = 0 and p = 0) then the upper bound does not depend on x_0.
Remark 4. In the cases σ_λ > 0 and σ_λ = 0, p = 0 the statement of the theorem remains intact if boundedness of the second moment of G in the definition of H̄_{β,x_0}(A, M) is replaced by boundedness of the first moment. This is not so in the case σ_λ = 0, p > 0: if only boundedness of the first moment is assumed then the dependence on x_0 is worse for large x_0, as the bound becomes proportional to x_0^{p+1}.

Estimation of the expected service time
In this section we consider the problem of estimating the expected service time µ_G.

Estimator construction
For a real number b > 1/4 (where 1/4 is chosen arbitrarily) let ψ_b(t) be an infinitely differentiable function on R such that

Since ψ_b is infinitely differentiable, the Laplace transform ψ̂_b is an entire function. Define

where σ_λ is the abscissa of convergence of the Laplace transform λ̂ of λ. The following statement is a key result on the properties of the function J_b. The proof of Lemma 2 is omitted, as it goes along the same lines as the proof of Lemma 1. Note that for large b the integral on the right-hand side of the preceding formula approximates µ_G; this fact underlies the construction of our estimator.
The estimator μ̂_G of µ_G is defined in (13); it depends on the tuning parameter b, which will be specified in what follows.

Upper bounds on the risk
The next two statements establish upper bounds on the risk of μ̂_G under different growth conditions on the arrival rate, Assumptions 1(a) and 1(b) respectively.
Theorem 3. Let Assumptions 1(a) and 2 hold, and let G ∈ G(M). Let μ̂_G be the estimator (13) associated with the parameter

where C_1 is a positive constant depending on γ and k_0.
where C_2 is a positive constant depending on γ, k_0 and p only.
Remark 5. The results of Theorems 3 and 4 hold without any smoothness conditions on G. In particular, G may be a discrete distribution with bounded second moment.

Remark 6. Theorems 3 and 4 demonstrate that the accuracy of μ̂_G is primarily determined by the growth of the arrival rate at infinity. In particular, if the arrival rate increases exponentially, i.e. σ_λ > 0, then the risk of μ̂_G converges to zero at a slow logarithmic rate. At the same time, if the arrival rate is bounded, i.e. σ_λ = 0 and p = 0, then the risk tends to zero at the parametric rate. A close inspection of the proof shows that growth of the arrival rate manifests itself in growth of the variance of X(t), which in turn affects the rate of convergence.

Implementation and numerical examples
In this section we provide details on the implementation of the proposed estimators. In particular, we discuss numerical methods for calculating the kernels L_h and J_b and an adaptive scheme for bandwidth selection. We also conduct a small simulation study to illustrate the practical behavior of the proposed estimators.

Implementation issues
In our simulations we consider the Gaussian kernel, i.e. K(x) = (2π)^{−1/2} e^{−x²/2}, and thus K̂(z) = e^{z²/2} for all z ∈ C. The reason we use this kernel is that in some cases L_h can be computed explicitly. If such analytical expressions are not available, we resort to numerical integration and inversion techniques. For consistency, we still use the Gaussian kernel in the numerical cases, even though this is obviously not necessary.
It is computationally somewhat expensive to implement the theoretical kernel J_b in (12), as covered in Theorems 3 and 4, because the integral of ψ_b does not have an explicit form, and neither does its transform. From that point of view it is more attractive to work with a slightly different kernel J_b, built from a standard Gaussian kernel K. In some cases we let ψ be the convolution of the indicator function 1_{[0,b+x_1]}(·) and (1/h)K((· − x_1)/h) for some x_1 < 0. The shift to the left by −x_1 serves to remove some of the bias caused by boundary effects as h approaches zero.

Exact calculation of the kernels L_h and J_b
Before considering cases where the kernels can only be calculated numerically, we present some examples where explicit expressions for L_h and J_b are available.
Example 5 (Constant arrival rate). Suppose that λ(t) = λ_0 1_{[0,∞)}(t); then λ̂(−z) = −λ_0/z for Re(z) < 0. It can be checked that the assumptions of Lemma 1 are satisfied, hence L_h is well defined for c < 0. To estimate the expected service time, we consider the kernel J_b, which in this case can be seen as the difference of two Gaussian kernels: one centered at b and one at 0. As h ↓ 0, it converges to the difference of Dirac delta functions centered at b and 0. In other words, for h ↓ 0 the estimator of the mean service time is (X(b) − X(0))/λ_0. In this special case it is also obvious how such an estimator could be obtained directly, by observing that E_G X(b) = λ_0 ∫_0^b Ḡ(s) ds, which for large b approximates λ_0 µ_G.

Example 6 (High/low switching). Suppose that the arrival rate repeatedly switches between a high rate λ_0 and a low rate λ_1 after each time unit, i.e. λ(t) = λ_0 when ⌊t⌋ is even and λ(t) = λ_1 when ⌊t⌋ is odd, for t ≥ 0. Elementary calculus yields λ̂(z) = (λ_0 + λ_1 e^{−z}) z^{−1} (1 + e^{−z})^{−1}. In the special case that the low rate equals zero, the kernels L_h and J_b can be calculated explicitly; from L_h the estimator of G(x_0) follows, and J_b admits a closed-form expression as well.
Following the same logic as in the previous example, and shifting the kernel J_b to the left by one unit to remove bias, we obtain the corresponding estimator of µ_G as h ↓ 0.

Example 7 (Linearly increasing arrival rate). Suppose that λ(t) = λ_0 t 1_{[0,∞)}(t); then λ̂(z) = λ_0 z^{−2}. Furthermore, with similar calculations as before, we find that J_b is a differentiation kernel centered at b minus a differentiation kernel centered at 0. It can also be argued intuitively that differentiation plays a role in the estimator: in this case E_G X(t) = λ_0 ∫_0^t (t − s)Ḡ(s) ds, so that (d/dt) E_G X(t) = λ_0 ∫_0^t Ḡ(s) ds, and for large b the last integral is an approximation of µ_G.
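The differentiation structure in Example 5 can be checked numerically: for a constant rate λ_0 one has H(t) = λ_0 ∫_0^t Ḡ(s) ds, so integrating the derivative of the Gaussian kernel K_h against H (integration by parts) recovers a smoothed version of Ḡ(x_0). A sketch under illustrative choices of ours (λ_0 = 2, Exp(1) services, x_0 = 1, h = 0.05):

```python
import math

def gaussian_kernel_deriv(u, h):
    """Derivative of the Gaussian kernel K_h(u) = (1/h) K(u/h)."""
    phi = math.exp(-u * u / (2.0 * h * h)) / (h * math.sqrt(2.0 * math.pi))
    return -u / (h * h) * phi

def survival_estimate_constant_rate(H, lam0, x0, h, lo, hi, n=50_000):
    """For a constant arrival rate lam0, H(t) = lam0 * int_0^t Gbar(s) ds,
    hence Gbar(x0) = H'(x0)/lam0; smoothing with K_h and integrating by
    parts gives Gbar(x0) ~ (1/lam0) * int K_h'(x0 - t) H(t) dt
    (composite trapezoidal rule over [lo, hi])."""
    step = (hi - lo) / n
    f = lambda t: gaussian_kernel_deriv(x0 - t, h) * H(t)
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * step) for k in range(1, n))
    return total * step / lam0
```

With H(t) = 2(1 − e^{−t}) this returns approximately Ḡ(1) = e^{−1}, up to a multiplicative smoothing factor e^{h²/2} that vanishes as h ↓ 0.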

Numerical evaluation of the kernels L_h and J_b
For some arrival rates λ(·) there is no analytic form for L_h and J_b. In that case one needs to resort to numerical inversion. Following [11], we discretize the Bromwich integral with a trapezoidal rule. Although [11] assumes that the inverse transform vanishes on the negative half-line, the algorithm still works if this assumption does not hold. For completeness, we state the inversion formula (L_h and L̂_h can be replaced by J_b and Ĵ_b, respectively):

L_h(t) ≈ (e^{ct}/T) [ L̂_h(c)/2 + Σ_{k=1}^{n_max} Re( L̂_h(c + ikπ/T) e^{ikπt/T} ) ],

where the error is controlled by the parameters n_max, c, T and the machine precision.
When applying this formula it is important that c is chosen in the strip of convergence of L̂_h. For example, if L̂_h(z) is defined for z such that Re(z) < 0, we take c negative but close to 0.
We adhered to the guideline |cT| ≈ 30. In principle, picking c closer to the edge of the strip should improve accuracy. However, when c is too close to the edge of the strip, the computation becomes numerically unstable. For more details on choosing the parameters and for bounds on the error, cf. [11].
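The trapezoidal discretization of the Bromwich integral described above can be sketched as follows. The spacing π/T on the vertical line Re(z) = c and the |cT| ≈ 30 guideline follow the text; the implementation itself is our sketch, not the code of [11]:

```python
import cmath
import math

def invert_laplace(F, t, c, P, n_max=50_000):
    """Trapezoidal discretization of the Bromwich integral on the line
    Re(z) = c with frequency spacing pi/P:

        f(t) ~ (exp(c*t)/P) * [ F(c)/2
               + sum_{k=1}^{n_max} Re( F(c + i*k*pi/P) * exp(i*k*pi*t/P) ) ].

    c must lie in the region of convergence of F; keeping |c*P| around 30
    makes the aliasing error (of order exp(-2*c*P)) negligible."""
    acc = 0.5 * F(complex(c, 0.0)).real
    for k in range(1, n_max + 1):
        w = k * math.pi / P
        acc += (F(complex(c, w)) * cmath.exp(complex(0.0, w * t))).real
    return math.exp(c * t) * acc / P
```

As a consistency check, inverting F(z) = 1/z² (whose inverse transform is f(t) = t for t ≥ 0) with c = 1 and P = 30 recovers f(1) ≈ 1 up to the truncation error of the finite sum.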

Adaptive scheme for bandwidth selection
The bandwidth h controls the trade-off between bias and variance when estimating the distribution function: a lower h corresponds to higher variance and lower bias, and vice versa.
In applications the smoothness of G is unknown, so the bandwidth choice given in Theorem 2 is not feasible. In our simulations we implemented a variant of Lepski's adaptive procedure [21], as proposed in [16].
The adaptive bandwidth scheme is as follows:
1. Pick a minimum bandwidth h_min > 0, and define h_i := (1 + α)^i h_min for some α > 0;
2. Estimate the variance of Ĝ_{h_i}(x_0) and denote the estimate by v²_{h_i};
3. Define I_{h_i} as the interval centered at Ĝ_{h_i}(x_0) with half-width proportional to v_{h_i};
4. The estimator is the midpoint of the last nonempty intersection, i.e. of ∩_{i≤j*} I_{h_i} with j* := max{j : ∩_{i≤j} I_{h_i} ≠ ∅}.
Note that the variance of Ĝ_h(x_0) is a double integral of the kernel against the covariance function R(t, τ) := cov_G[X(t), X(τ)], which is determined by (3). We discretize the double integral and estimate the covariance function by its empirical counterpart. In particular, we define a uniform time grid {t_i, i = 1, ..., N} and a covariance matrix R̂ ∈ R^{N×N} whose element (i, j) is the sample covariance between X(t_i) and X(t_j). With this notation, the variance of the estimator is approximated by the corresponding double sum over the grid.
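The interval-intersection rule above can be sketched as follows. The half-width constant kappa and the reading of "the last nonempty interval" as the running intersection of intervals are our assumptions, since the exact constants are not reproduced here:

```python
import math

def lepski_midpoint(estimates, variances, kappa=2.0):
    """Lepski-type bandwidth selection: for bandwidths h_1 < h_2 < ... with
    estimates theta_i and variance estimates v_i^2, intersect the intervals
    I_i = [theta_i - kappa*v_i, theta_i + kappa*v_i] in order and return the
    midpoint of the last nonempty running intersection."""
    lo, hi = -math.inf, math.inf
    mid = None
    for theta, var in zip(estimates, variances):
        half = kappa * math.sqrt(var)
        lo, hi = max(lo, theta - half), min(hi, theta + half)
        if lo > hi:  # intersection became empty: stop at the previous index
            break
        mid = 0.5 * (lo + hi)
    return mid
```

As the bandwidth grows, the variance shrinks while the bias grows; the rule stops as soon as the next (biased) interval is inconsistent with the intersection of all previous ones.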

Simulation results for estimating the distribution function
From the figures we note the following. Firstly, the RMSE is clearly smaller in Case 1a than in Case 1b. This was to be expected, because γ = 0 in Case 1a and γ = 1 in Case 1b, which leads to inferior worst-case minimax convergence rates in theory. Note that in Case 1b the estimates often do not even lie within [0, 1]. In Case 1a we see some slight bias for smaller x_0: estimation for small x appears to be relatively hard in this case, which could be due to the fact that G(x) is steeper for small x. The kernel L_h in Case 1a is quite similar to the standard Gaussian differentiation kernel that would be used in the case of constant λ(·) (when λ(·) ≡ λ_0 is constant, the derivative of ∫_0^t λ_0 Ḡ(s) ds equals λ_0 Ḡ(t)). However, there is an additional 'bump' that compensates for the fact that λ(·) is a cosine function rather than a constant.

Simulation results for estimating the mean
We consider the following numerical settings.
2a. Take λ(t) = 1 + sin(t) and G(x) = 1 − e^{−x}, with x_1 = −1 and h = 0.08 (as small as possible so that the numerical inversion algorithm is still stable, with parameter T = 30).
2b. Let λ be the on-off rate of Example 6 and suppose that the service times are uniformly distributed on [0, 2], so that µ_G = 1. We take b = 25 in estimator (14).
In Case 2a, the results of which are shown in Figure 3, we see that the kernel is approximately a delta function at b + x_1 = 24 (as in the constant-λ(·) case), with some adjustment that accounts for the fact that the arrival rate is a sine function. The histogram shows the distribution of μ̂_G, each estimate based on a single observation. Note that the distribution is approximately normal, with some skew to the left, with mean 1.03 and standard deviation 0.73. Although the standard deviation seems quite large, one must take into account that the kernel J_b is almost zero outside of [15, 25]. During those 10 time units there are approximately 10 arrivals. Note that the kernel J_b in Case 2a is again quite similar to the kernel J_b in the case of constant λ(·) (as in Example 5): there is a Dirac delta function at b + x_1, preceded by a wave that compensates for the fact that λ is not constant.
In Case 2b we used the explicit kernels calculated in Example 6. The bias turns out to be negligible, and the standard deviation after n observations is approximately 1/√n.

Proof of Theorem 1
The proof is similar to that of Proposition 1 in [14]. By conditioning on {τ_j, j ∈ Z} we obtain that

and our current goal is to evaluate the integral on the right-hand side of the last equality. Denote this integral by S_m:

We derive (2) by iterating (15); we have

For any k ≥ 1 this yields

Then we obtain

where by convention we put t_0 = −∞. Taking into account that S_1 = (e^{θ_1} − 1)H(t_1) and using (16), we complete the proof.

Auxiliary results
First we state two auxiliary results that are used in the subsequent proofs.

Proof of Theorem 2
Throughout the proof c_1, c_2, ... stand for positive constants that may depend on β, γ, k_0 and p only. In particular, they do not depend on n, T, or on the parameters x_0, A, M, σ_λ and λ_0. The proof is divided into steps. The numbering of constants starts anew at each step, so that c_1, c_2, ... may differ between steps.

1°. First we show that under the premise of the theorem the estimator is well defined. Since supp(K) = [0, 1], K̂ is an entire function. For any σ ∈ R,

K̂(σ + iω) = ∫_0^1 K(t) e^{−(σ+iω)t} dt,

and, by condition (K2), the kernel K is r times continuously differentiable; hence, repeatedly integrating by parts, we obtain that for any ω and σ

In addition, in view of (K2), for any ω and σ

By Assumption 2, λ̂(z) does not have zeros in the half-plane {z ∈ C : Re(z) > σ_λ}. This implies that K̂(zh)/λ̂(−z) is an analytic function in {z ∈ C : Re(z) < −σ_λ}, and the integration in (7) can be performed on any vertical line Re(z) = s with s < −σ_λ.
Since G ∈ H̄_{β,x_0}(A, M), it holds that H(t) ≤ Mλ_0 e^{σ_λ t} when σ_λ > 0, and H(t) ≤ Mλ_0(a_1 + a_2 t^p) when σ_λ = 0 [cf. (1)]. Combining this with (23) we obtain

where we put

Thus the estimator Ĝ_h(x_0) is well defined, and conditions (8) of Lemma 1 hold.

2°. Now we establish an upper bound on the bias of Ĝ_h(x_0). By Lemma 1 and conditions (K1)-(K2) we have

If σ_λ = 0, p > 0 and δT ≥ 2p, then by Lemma 4

where

Combining these inequalities we obtain

3°. The next step is to bound the variance of Ĝ_h(x_0). Let R(t, τ) := cov_G[X(t), X(τ)]; then

where the first inequality is due to Cauchy-Schwarz and the symmetry of the covariance function in its arguments. Our current goal is to bound from above the expression on the right-hand side of (25). We consider the cases σ_λ > 0 and σ_λ = 0 separately.

(a) First let σ_λ > 0. Then

where in the last line we have used Parseval's identity. By Assumption 2 and in view of (21) and (22),

where we have taken into account that δ < 1/h. This leads to the following bound on the variance:

var_G Ĝ_h(x_0) ≤ c_3 (σ_λ + δ)^{−1} M e^{2(σ_λ+δ)(x_0+h)} (λ_0 n h^{2γ+1})^{−1}.    (26)
(b) Now consider the case σ_λ = 0. In this case (25) and Lemma 3 show that

with some appropriately large absolute constant c_1. Note that δ_* → 0 as n → ∞, because (1/T) ln(λ_0 n) → 0 as n → ∞. With the choice h = h_* and δ = δ_*, it is straightforward to verify in (24) that the bias is bounded above by a multiple of A h_*^β. Then, substituting the expressions for h_* and δ_* in (24) and (26) and taking the limit as n → ∞, we obtain (11) for the case σ_λ > 0.

Proof of Theorems 3 and 4
We provide a unified proof of Theorems 3 and 4. Each step of the proof considers two cases: σ_λ > 0 (corresponding to Theorem 3) and σ_λ = 0 (corresponding to Theorem 4).
Throughout the proof c_1, c_2, ... stand for constants depending on γ, k_0 and p only. In particular, they do not depend on M, λ_0 and σ_λ. To avoid additional technicalities, in the sequel we assume that γ is an integer.

Figure 1
Figure 1 corresponds to Case 1a, and Figure 2 corresponds to Case 1b. In these figures (from left to right, top to bottom), we plotted the RMSE as a function of n based on 50 runs of the estimator of 1 − G(1) = e^{−1} ≈ 0.368, the estimates themselves next to it, and the kernel L_h(t) for several values of h as indicated in the captions. For Case 1 we also included a boxplot showing the performance of these 50 runs for x_0 ranging from 0.5 to 3.5.

Figure 3 :
Figure 3: Results corresponding to Case 2a. On the top left we plotted the kernel J_b, and on the top right we plotted μ̂_G based on a single observation, for 10^5 simulation runs.

Figure 4 :
Figure 4: Results corresponding to Case 2b. We plotted the histogram of estimates, each based on a single observation. All estimates based on a single observation are obviously integer valued. The mean of this sample is 0.999, with a standard deviation of 0.998.