Two-parameter Sample Path Large Deviations for Infinite Server Queues

Let $Q_{\lambda}(t,y)$ be the number of people present at time $t$ with $y$ units of remaining service time in an infinite server system with arrival rate equal to $\lambda>0$. In the presence of a non-lattice renewal arrival process and assuming that the service times have a continuous distribution, we obtain a large deviations principle for $Q_{\lambda}(\cdot)/\lambda$ under the topology of uniform convergence on $[0,T]\times[0,\infty)$. We illustrate our results by obtaining the most likely path, represented as a surface, to ruin in life insurance portfolios, and we also obtain the most likely surfaces to overflow in the setting of loss queues.


Introduction
The asymptotic analysis of queueing systems with many servers in heavy-traffic has received substantial attention, especially in recent years. Among the earliest references that come to mind in connection to this topic is the work of [7] on heavy-traffic limits for the infinite-server queue. Another highly influential paper in the area is [6] in the context of many server Markovian queues, which introduced a scaling that is now known as the "Quality and Efficiency Driven" regime. The ideas in these papers have fueled more recent results in the asymptotic analysis of many server systems such as: [12], [8], [13], [9], [10], in the setting of many server queues, and [5], [2], [11], [14], in the setting of the infinite server queue. The asymptotic analysis of queueing systems with many servers has been motivated by applications in service engineering, in particular in the context of call centers and health-care operations. Another set of application areas that is also very relevant, but that is infrequently mentioned in the analysis of many server systems is that of insurance mathematics. It is clear, for instance, that a portfolio of insurance policies can be directly modeled as an infinite server system; casting insurance portfolios in this framework is particularly appealing in the setting of life insurance as we shall illustrate in Section 5.
So far, most of the asymptotic analysis of many server systems has concentrated on fluid and heavy-traffic approximations. The literature on large deviations analysis for many server queues is not as extensive as that on fluid or diffusion approximations, despite the fact that it is clearly of interest to understand the large deviations behavior of these types of systems. Consider, for instance, the consequences of dropping calls in an emergency call center or being unable to satisfy the demand for critically ill patients in the context of health-care applications. In the insurance setting, it is of interest to estimate ruin probabilities and, perhaps even more importantly, to understand the most likely path (or set of paths) to ruin. Risk theory typically concentrates on ruin probabilities for aggregated models, such as the classical ruin model (see [1]); the results in this paper, as we shall illustrate, provide a systematic way of assessing ruin probabilities for a class of bottom-up models.
Our main contribution in this paper is to provide the first sample-path large deviations analysis of the state descriptor of the infinite server queueing model in heavy-traffic (i.e. as the arrival rate increases to infinity without introducing any scaling on the service times). The statement of our main result, which is given in Theorem 1, features a convenient representation of a good large deviations rate function, under a strong topology, that later we use in some applied settings of interest. For instance, we compute the most likely path to overflow in a loss system, and also the most likely path to ruin for a life insurance portfolio that embeds an infinite server queue with a particular service cost structure. It is important to emphasize that our results take advantage of a convenient representation of the system's description that facilitates the representation of the rate function; detailed discussion on this system's representation is given in Section 2.1. Previous large deviations analysis of the infinite server queue has concentrated on queue-length characteristics only; see, for instance, [4] who develops large deviations for marginal quantities in the case of renewal arrivals, and [15] who develops sample path large deviations for the queue length process of infinite server queues in tandem in the case of Poisson arrivals.
Our large deviations analysis complements results on fluid analysis and diffusion approximations recently obtained for infinite server systems. For example, [11] have shown that the state descriptor of the infinite server queue, suitably parameterized in terms of a two-parameter stochastic process, converges after centering and re-scaling to a Gaussian process; see also [14] who interpret the state descriptor of the infinite server queue as a measure valued process acting on the space of tempered distributions. These recent results, in turn, extend prior work by [5] in the context of discrete and bounded service time distributions, and [2] for the case of Poisson arrivals.
The analysis of the infinite server queue is important as it serves as a building block for other models of interest. For instance, in the setting of loss models one can clearly couple the loss systems with associated infinite server systems, and in the setting of many server queues [13] shows how one can precisely understand queues with multiple servers as a perturbation of infinite server queues. Furthermore, the infinite server model is a classical model in queueing theory that serves as a direct model in important applications. Of particular interest to us, as mentioned earlier, are the applications to insurance mathematics.
The rest of the paper is organized as follows. In Section 2 we introduce our problem setting and provide a statement of our main result. This is a fundamental section and it is divided into three parts. We first shall introduce our assumptions and define our notation. Then we provide the precise mathematical statement of our result, and, finally, we will provide a heuristic argument that allows us to gain some intuition behind our main result. The next two sections then provide the proof of our main result. We first show our result for bounded service times in Section 3. Then, in Section 4, we apply a truncation argument to extend our result to unbounded service times. Finally, in Section 5 we apply our result to computing the most likely paths to rare events in the setting of loss queueing systems and also in the setting of ruin probabilities for large life insurance portfolios.

Assumptions, Notation, and Main Result
The purpose of this section is threefold. First we shall clearly state our assumptions and introduce necessary notation for our development. Second, we shall explain the main large deviations result and provide a heuristic derivation of the rate function that we obtain. Finally, we shall provide a road map for the strategy behind the proof which will be presented in subsequent sections.

Assumptions and Notation
We shall describe an underlying system corresponding to an arrival rate λ. We call the system with λ = 1, i.e. one customer per unit time, our "base system"; eventually we shall send λ to infinity in our asymptotic analysis. We collect our assumptions as follows.
Assumptions and notation concerning the arrival process. For the base system, we assume the interarrival times are non-lattice, i.i.d., positive random variables $(U_n : n \ge 1)$ with finite exponential moments in a neighborhood of the origin; precisely, $\kappa(\theta) := \log E e^{\theta U_n} < \infty$ for some $\theta > 0$. In our $\lambda$-scaled system, the arrivals come $\lambda$ times faster (i.e. the $n$-th interarrival time becomes $U_n/\lambda$). The associated logarithmic moment generating function of the $\lambda$-scaled interarrival times is then $\kappa_\lambda(\theta) := \log E e^{\theta U_n/\lambda} = \kappa(\theta/\lambda) < \infty$ for some $\theta > 0$.
The time at which the $n$-th arrival occurs in the base system is $A_n = U_1 + \dots + U_n$ for $n \ge 1$. We define $A_0 := 0$ and let $N(t) := \max\{n \ge 0 : A_n \le t\}$ be the number of arrivals that have occurred up to time $t$ in the base system. It is important to keep in mind that $N(\cdot)$ increases by one unit at discontinuity points, since we are assuming that the $U_n$'s are positive.
Eventually, we shall increase the arrival rate, so it is sensible to define $N_\lambda(t) := N(\lambda t)$. Define the so-called infinitesimal logarithmic moment generating function for the arrival process via $\psi_N(\theta) = -\kappa^{-1}(-\theta)$ (see [4]). This definition is motivated by the fact that for any $\delta > 0$. Since the $U_n$'s are positive with probability one, we have that $\psi_N(\cdot)$ is continuous and strictly convex on the positive line. We also assume that $\psi_N(\cdot)$ is continuously differentiable throughout $\mathbb{R}$. This assumption is satisfied for most arrival processes, certainly for interarrival times that are strictly positive and such that $\sup\{\kappa(\theta) : \kappa(\theta) < \infty\} = \infty$.
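As a sanity check on the definition $\psi_N(\theta) = -\kappa^{-1}(-\theta)$, the following minimal sketch inverts $\kappa$ numerically for unit-rate exponential interarrival times (the Poisson base system) and compares the result with $e^{\theta} - 1$, the Poisson value used later in the heuristic discussion. The function names, the bisection bracket, and the choice of exponential interarrivals are illustrative assumptions, not part of the paper.

```python
import math

def kappa_exp(theta):
    """Log-MGF of a unit-rate exponential interarrival time (assumed example)."""
    if theta >= 1.0:
        return math.inf
    return -math.log(1.0 - theta)

def inverse_increasing(func, target, lo, hi, iters=200):
    """Solve func(x) = target by bisection, assuming func is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def psi_N(theta, kappa=kappa_exp, lo=-1e6, hi=1.0 - 1e-12):
    """psi_N(theta) = -kappa^{-1}(-theta); the bracket [lo, hi] is an assumption
    that is adequate for moderate values of theta in this example."""
    return -inverse_increasing(kappa, -theta, lo, hi)

# Compare with the closed form exp(theta) - 1 for Poisson arrivals.
for theta in (-1.0, 0.5, 2.0):
    print(theta, psi_N(theta), math.exp(theta) - 1.0)
```

For other interarrival laws one would swap in the corresponding `kappa` and a bracket inside its domain of finiteness.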
Assumptions and notation concerning the service times. We assume that the $n$-th customer that arrives to the base system (i.e. at time $A_n$) brings a service requirement of size $V_n$. The sequence $(V_n : n \ge 1)$ is assumed to be i.i.d. We write $F(x) = P(V_n \le x)$ to denote the associated distribution function evaluated at $x$, and set $\bar F(x) := 1 - F(x)$ to be the tail distribution. Moreover, we assume that $F(\cdot)$ is continuous.
Two-parameter representation of system status. Let $\bar Q_\lambda(t,y)$ denote the number of customers who arrived before or at time $t$ and leave after time $y$ in the $\lambda$-scaled system. In other words, $\bar Q_\lambda(t,y) = \sum_{n=1}^{N_\lambda(t)} I(A_n/\lambda + V_n > y)$, where $I(\cdot)$ denotes the indicator function. We shall assume that the system is initially empty. This is done for simplicity: since we have infinitely many servers, we can incorporate the initial configuration by keeping track of its evolution independently of what occurs subsequently. Given our assumption of an initially empty system, we then have that $\bar Q_\lambda(0,u) = 0$ for all $u \ge 0$. Note that for $t \in [0,T]$ and $y \ge 0$, It is worth comparing the current system representation with the more common one involving the quantity $Q_\lambda(t,u)$, defined as the number of customers in the system at time $t$ who have residual service time larger than $u \ge 0$; more precisely, $Q_\lambda(t,u) = \bar Q_\lambda(t,u+t)$. These two system representations are equivalent in the sense that $(Q_\lambda(t,u) : t \in [0,T], u \ge 0)$ encodes the evolution of the infinite server system and thus such evolution can, in principle, be used to retrieve $(\bar Q_\lambda(t,u) : t \in [0,T], u \ge 0)$. We have chosen the representation based on $\bar Q_\lambda$ to facilitate the representation of the rate function; a more detailed discussion is given towards the end of Section 2.2.1. In addition, the representation based on $\bar Q_\lambda$ allows one to obtain a rich large deviations principle to which one can apply the contraction principle directly for several continuous functions of interest. For instance, it follows immediately that the arrival process $N_\lambda(t) = \bar Q_\lambda(t,0)$ and the departure process $D_\lambda(t) := N_\lambda(t) - \bar Q_\lambda(t,t)$ are continuous functions under the topology that we consider (and that we shall discuss in the next paragraphs). More applications of the contraction principle will be discussed in Section 5.
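The two-parameter descriptor can be illustrated on a small deterministic sample path. The sketch below counts customers that arrived by time $t$ and depart after time $y$, and recovers the residual-time descriptor, the arrival process, and the departure process from it, as described above. All function names and the three-customer example are hypothetical; the sketch assumes strictly positive service times and an initially empty system.

```python
from itertools import accumulate

def scaled_arrival_times(interarrivals, lam):
    """Arrival epochs A_n / lam in the lam-scaled system."""
    return [a / lam for a in accumulate(interarrivals)]

def q_bar(arrivals, services, t, y):
    """Number of customers that arrived by time t and leave after time y."""
    return sum(1 for a, v in zip(arrivals, services) if a <= t and a + v > y)

def q_residual(arrivals, services, t, u):
    """Customers present at time t with residual service larger than u:
    Q(t, u) = Q_bar(t, u + t)."""
    return q_bar(arrivals, services, t, t + u)

def arrivals_by(arrivals, services, t):
    """Arrival process N(t) = Q_bar(t, 0) (service times assumed positive)."""
    return q_bar(arrivals, services, t, 0.0)

def departures_by(arrivals, services, t):
    """Departure process D(t) = N(t) - Q_bar(t, t)."""
    return arrivals_by(arrivals, services, t) - q_bar(arrivals, services, t, t)

# Hypothetical sample path: three customers in the base system (lam = 1).
arr = scaled_arrival_times([0.5, 0.5, 1.0], lam=1.0)  # arrival epochs 0.5, 1.0, 2.0
srv = [1.0, 3.0, 0.2]                                 # departure epochs 1.5, 4.0, 2.2

print(arrivals_by(arr, srv, 2.0), departures_by(arr, srv, 2.0), q_residual(arr, srv, 2.0, 0.5))
```

Speeding up arrivals only rescales the arrival epochs; the service times are left untouched, in line with the scaling used in the text.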
Discussion about the topological space. Let D = {(t, y) : 0 ≤ t ≤ T, y ≥ 0} and let us write || · || C to denote the supremum norm over any set C. The space of functions that we consider for our large deviations principle shall be denoted by L +,∞ (D) and it corresponds to bounded functions with domain in D, such that x (0, u) = 0 for u ≥ 0, x (t, ·) is non increasing, and x (t, ·) vanishes at infinity. We will develop the large deviations principle for the family of stochastic processes (Q λ /λ : λ > 0) on the space L +,∞ (D) endowed with the topology generated by the supremum norm. Following [3] p. 4, the probability measures in path space in our development are assumed to have been completed.
Our large deviations principle for $\bar Q_\lambda/\lambda$ immediately implies, in particular, a large deviations principle in the Skorokhod topology on the space $D_{D_{\mathbb{R}}[0,\infty)}[0,T]$, which is the space of right-continuous-with-left-limits (RCLL) functions $x$, with domain $[0,T]$, that take values in the space of RCLL functions on $[0,\infty)$ taking values in $\mathbb{R}$. That is, at each time point $t \in [0,T]$, $x(t)$ is itself a real-valued RCLL function on $[0,\infty)$. This is precisely the topology considered in [11], who also provide a discussion on the benefits of using this topology relative to other natural (but weaker) alternative options (see Section 2.3 in [11]).
An alternative approach that one might consider given the available results on functional weak convergence analysis of the infinite server queue, such as [14], is to interpret the state descriptor of the infinite server queue as acting on the space of tempered distributions. We believe, however, that this approach, although elegant, has important limitations in terms of assumptions and the class of functions to which the contraction principle can be directly applied to obtain other large deviations principles of interest.

Statement of Our Main Result
We are now ready to state our main result. Letq := (q (t, y) : (t, y) ∈ D) ∈ L +,∞ (D). We say that q ∈ AC + (D) if the following conditions hold: i)q is absolutely continuous on D, and ii) ∂ 2q (t, y)/∂t ∂y = 0 almost everywhere for (t, y) ∈ {(t, y) : 0 ≤ y ≤ t ≤ T }, iii)q (0, y) = 0 for y ≥ 0. Ifq ∈ AC + (D), then we let I (q) be defined via and for each closed set C As an immediate corollary we obtain, as mentioned earlier, a large deviations principle for (Q λ /λ : λ > 0) under the Skorokhod topology in the space D D R [0,∞) [0, T ], discussed in the previous section and introduced in [11].
We shall explain the strategy behind the proof of Theorem 1. First, we shall introduce an auxiliary continuous process, $\tilde Q_\lambda/\lambda$, which shall be defined later in Section 2.3. We will show that $\tilde Q_\lambda/\lambda$ is exponentially equivalent to $\bar Q_\lambda/\lambda$. Second, in addition to the assumptions imposed in Section 2.1, we will assume that there exists a deterministic constant $M \in (0,\infty)$ such that $P(V_n \in [0,M]) = 1$. In the third and last part of the argument we will relax this truncation assumption.
In turn, the first part of the argument (i.e. assuming truncation) is divided into several steps. The first step consists in developing the large deviations principle for $\tilde Q_\lambda/\lambda$ with rate $I(\cdot)$ under the topology of pointwise convergence using the Dawson-Gärtner projective limit theorem. The second step involves showing that $\tilde Q_\lambda/\lambda$ is exponentially tight as $\lambda \to \infty$ under the uniform topology on the compact set $[0,T] \times [0,M]$. The third and last step involves lifting the large deviations principle to the uniform topology.
During the second part we introduce an approximation scheme that proceeds by ignoring the customers that arrive to the system with a service time larger than K. Using a coupling argument, the process that is obtained using this scheme is shown to be a good approximation to the original system for the purpose of computing large deviations probabilities.
However, before we do this, let us provide a heuristic argument in order to guess the form of the rate function. Later we will explain the technical difficulties that need to be addressed.

Guessing the Rate Function: A Heuristic Approach
One can take advantage of the point process representation of the input process (i.e. the arrivals and the service times represented as a marked point process). Let us start with the case of Poisson arrivals; afterwards we shall briefly explain how to adapt the development to the more general case of renewal arrivals. Consider the scaled system with arrival rate $\lambda$ and suppose that $F(\cdot)$ has a density $f(\cdot)$. The number of customers that arrive during the time interval $[t,t+dt]$ and bring a service requirement of size in $[r,r+dr]$ is denoted by $M_\lambda(t+dt,r+dr)$, which is governed by a Poisson distribution with rate $\lambda f(r)\,dt\,dr$. It follows by elementary considerations involving the Poisson distribution that $M_\lambda(t+dt,r+dr)/\lambda$ satisfies a large deviations principle on the real line. In particular, we formally obtain that $P(M_\lambda(t+dt,r+dr)/\lambda \approx \mu(t,r)\,dt\,dr) = \exp(-\lambda J(\mu(t,r))\,dt\,dr)$, where $J(\mu(t,r)) = \sup_{\eta}[\eta\mu(t,r) - f(r)\psi_N(\eta)]$ and $\psi_N(\eta) = \exp(\eta) - 1$. The supremum above is attained formally at $\eta^*(t,r) := \log(\mu(t,r)/f(r))$. So, by pasting independent regions of the form $[t,t+dt] \times [r,r+dr]$ together, one expects that the Poisson random measure $M_\lambda(\cdot)/\lambda$ satisfies a large deviations principle under a suitable topology, so that $P(M_\lambda(\cdot)/\lambda \approx \mu(\cdot)) = \exp(-\lambda \int_0^T \int_0^\infty J(\mu(t,r))\,dr\,dt)$. Now, observe that for all $y \ge t$, $\bar Q_\lambda(t,y) = \int_0^t \int_{y-s}^\infty M_\lambda(s+ds,r+dr)$, and $\bar Q_\lambda(0,y) = 0$ for $y \ge 0$. Now, consider Note that Therefore, if $\bar q(\cdot,\cdot)$ is absolutely continuous and $\bar q(0,y) = 0$ for $y \ge 0$, so that representation (5) is applicable, one can formally compute the rate function of $\bar Q_\lambda(\cdot,\cdot)/\lambda$ evaluated at $\bar q(\cdot,\cdot)$ by evaluating $J(\mu)$ with $t = s$, $y = s + r$ for $s \in [0,T]$ and $r \in [0,\infty)$. In particular, this analysis yields that $I(\bar q)$ should satisfy which is, of course, equivalent to (2) in the Poisson case assuming the existence of a density $f(\cdot)$ for the distribution of the service times. The previous form of the rate function was heuristically obtained assuming that $y \ge t$.
However, since all the information of the infinite server queue is contained in the evolution of (Q λ (t, y) : t ∈ [0, T ], u ≥ 0) and Q λ (t, u) =Q λ (t, t + u), we must have that the rate function should be specified only overq (t, y), For the non-Poisson case one can argue using renewal arguments. We need to compute the log-moment generating function of the vertical strip (M λ (t + dt, r i + dr) : 1 ≤ i ≤ n), where r 1 < r 2 < ... < r n for an arbitrary partition (r i : 1 ≤ i ≤ n). We obtain, using elementary properties of the multinomial distribution together with an application of the key renewal theorem as in [4], as λ → ∞. So, by pasting together vertical strips (i.e. ranging the parameter t) we obtain that the family of random measures M λ (·) /λ is expected to satisfy a large deviations principle under a suitable topology with rate function The rest of the formal analysis proceeds similarly as in the Poisson case. The formal argument just outlined, even if heuristic, suggests a potential approach to developing sample path large deviations forQ λ /λ. Namely, first develop a large deviations for the random measures M λ (·) /λ, and then apply the contraction principle to obtain the desired large deviations result forQ λ /λ. This approach, although intuitive, will not be followed in our development. We found it easier to directly work with the topology that we wish to impose. Part of the problem involved in making the argument based on random measures rigorous in the setting of the topology that is of interest to us is that indicator functions are not continuous, so the contraction principle is not directly applicable if one is to endow the space of measures with the weak convergence topology. Of course, one can proceed by trying a different topology (stronger than weak convergence) or by trying to use the extended contraction principle. 
However, the technical development, we believe, would end up being more involved than the direct approach that we will follow.
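The Poisson step of the heuristic can be checked numerically. The sketch below assumes that the local rate is the Legendre-type transform $J(\mu) = \sup_\eta[\eta\mu - f\,\psi_N(\eta)]$ with $\psi_N(\eta) = e^{\eta} - 1$, which is consistent with the maximizer $\eta^* = \log(\mu/f)$ quoted in the text; substituting $\eta^*$ gives the closed form $\mu\log(\mu/f) - \mu + f$. The function names and the ternary-search bracket are illustrative assumptions.

```python
import math

def psi_poisson(eta):
    """Infinitesimal log-MGF of a unit-rate Poisson process."""
    return math.exp(eta) - 1.0

def local_rate_numeric(mu, f, lo=-50.0, hi=50.0, iters=300):
    """Maximize the concave map eta -> eta*mu - f*psi_poisson(eta) by ternary search.
    Assumes mu > 0, f > 0 and that the maximizer lies in [lo, hi]."""
    def g(eta):
        return eta * mu - f * psi_poisson(eta)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1   # maximizer lies to the right of m1
        else:
            hi = m2   # maximizer lies to the left of m2
    return g(0.5 * (lo + hi))

def local_rate_closed_form(mu, f):
    """Value of the supremum at eta* = log(mu / f)."""
    return mu * math.log(mu / f) - mu + f

# The rate vanishes when the observed intensity mu equals the nominal density f.
print(local_rate_numeric(2.0, 0.5), local_rate_closed_form(2.0, 0.5))
print(local_rate_numeric(0.5, 0.5), local_rate_closed_form(0.5, 0.5))
```

The vanishing of the rate at $\mu = f$ reflects the law of large numbers for the marked Poisson input.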
An additional concern that might arise at this point is our selection ofQ λ /λ in order to represent the system status; as opposed to Q λ /λ, which might appear more natural at first sight. Let us explain whyQ λ /λ is a more convenient object to consider. Note that if q (s, r) =q(s, s + r), then Since Q λ (t, u) /λ =Q λ (t, u + t) /λ and our heuristic analysis suggests that the candidate rate function ofQ λ (t, y) /λ is given by it is then sensible to conjecture, making y = u + t, a representation based on This representation, in turn, suggests that for the rate function to be finite at q (·), one might need to impose as a necessary condition the existence of ∂ 2 q (t, u) /∂ 2 u. Nevertheless, as we shall see in our examples, one might have a finite-valued rate function even in cases in which ∂q (t, ·) /∂u is not even continuous for every value of t ∈ (0, T ).

Construction of an Auxiliary Continuous Process
In order to prove Theorem 1 we introduce an auxiliary approximating continuous process,Q λ , which shall be shown to be exponentially equivalent to the process of interestQ λ in the uniform norm. The construction ofQ λ will be based on polygonal interpolations, so it will be convenient to introduce some notation. First, given (t, y) and (t , y ) where t = t we write Q λ (t, y) ↔ Q λ (t , y ) to denote the straight line that joins the points (t, y, Q λ (t, y)) and (t , y , Q λ (t , y )) in the associated three-dimensional space.
The next step is to join the end points of these straight lines to the end points of adjacent straight lines (suitably matched in the time axis) in order to form segments of adjacent planes. In order to do this matching, note that for each successive $t_i$ and $t_{i+1}$, either $Q_\lambda(t_{i+1},\cdot)$ has one fewer discontinuity point than $Q_\lambda(t_i,\cdot)$ (i.e. a departure occurs at $t_{i+1}$) or one more discontinuity point (i.e. an arrival occurs at $t_{i+1}$); the exception is the last segment from $t_m$ to $t_{m+1} = T$, where there might be no difference between the numbers of discontinuity points of $Q_\lambda(t_m,\cdot)$ and $Q_\lambda(t_{m+1},\cdot)$. Note that batch arrivals are not possible since the interarrival times are positive.
According to the notation introduced earlier for discontinuity points, Suppose a departure occurs at time t i+1 . Then we can label the discontinuous points of , and also the set of straight lines ). These three sets describe a series of adjacent triangular planar sections which jointly form a continuous surface.
Similarly, suppose that an arrival occurs at time t i+1 . Then we can label the discontinuous ). These three sets of straight lines, once again describe a series of adjacent triangular planar sections which jointly form a continuous surface. The last time interval from t m to T is dealt with similarly, with perhaps one less triangle formed if n (t m ) = n (T ).
The continuous function (Q * λ (t, y) : 0 ≤ t ≤ T, y ≥ 0) is defined by concatenating all these adjacent triangular planar regions as one varies t i and t i+1 for i ∈ {0, 1, ..., m}, and setting Q * λ (t, y) = 0 for the region where y is beyond the boundary of the last triangular plane i.e. beyond the lines Q λ (t i , y n(t i ) (t i )) ↔ Q λ (t i+1 , y n(t i+1 ) (t i+1 )), i ∈ {0, 1, ..., m}. It is immediate from the previous construction, and the fact that Q λ (t, ·) is non increasing, that Q * λ (t, ·) is also non increasing for each t ∈ [0, T ].
Then, we define our auxiliary processQ λ (t, y) for y ≥ t viã In order to defineQ λ (t, y) for 0 ≤ y ≤ t ≤ T , first letÑ λ (·) be the continuous process obtained by the polygonal interpolation of N λ (·), so thatÑ where || · || D represents the uniform norm over the set D.

Bounded Service Times
In addition to the assumptions imposed in Section 2 here we also assume that P (V n ∈ [0, M ]) = 1 for M ∈ (0, ∞).
We start by deriving a large deviations principle in the topology of pointwise convergence. The proof of this result will be given at the end of this section. Lemma 1. Let X consist of all the maps from D M to R, and we equip X with the topology of pointwise convergence on D M . ThenQ λ /λ satisfies a large deviations principle with good rate function I(q) defined by In order to lift the large deviations principle indicated in Lemma 1 to the uniform topology we need the following result on exponential tightness; we shall also give the proof of this result at the end of this section.
Lemma 2.Q λ /λ is exponentially tight in C + (D M ) equipped with the topology of uniform convergence.
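Using the previous two lemmas we are ready to state and prove the main result of this section, which is a version of Theorem 1 for the case of bounded service times. Proof. First we verify that $\bar Q_\lambda/\lambda$ and $\tilde Q_\lambda/\lambda$ are exponentially equivalent according to Definition 4.2.10 in [3]. Since the laws of $(\bar Q_\lambda/\lambda, \tilde Q_\lambda/\lambda)$ are induced by a separable stochastic process and the underlying topology is induced by the uniform norm, the set $\{\|\bar Q_\lambda/\lambda - \tilde Q_\lambda/\lambda\| > \eta\}$ is Borel measurable (see Remark b) following Definition 4.2.10 in [3]). Now recall that, by the construction of $\tilde Q_\lambda$, $\|\bar Q_\lambda - \tilde Q_\lambda\| \le 4$ a.s. Hence for any $\eta > 0$, $\limsup_{\lambda \to \infty} \frac{1}{\lambda} \log P(\|\bar Q_\lambda/\lambda - \tilde Q_\lambda/\lambda\| > \eta) = -\infty$. The result then follows by applying Theorem 4.2.13 in [3].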
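The exponential equivalence step rests on a one-line estimate, sketched here under the assumption that the two processes being compared are $\bar Q_\lambda$ and its continuous approximation $\tilde Q_\lambda$, and that their uniform distance is bounded by 4 almost surely, as stated in the proof. After dividing by $\lambda$ the distance is at most $4/\lambda$, so the event of interest is eventually empty:

```latex
\[
P\Big( \big\| \bar Q_\lambda/\lambda - \tilde Q_\lambda/\lambda \big\| > \eta \Big)
\;\le\; P\big( 4/\lambda > \eta \big) \;=\; 0
\qquad \text{for all } \lambda > 4/\eta ,
\]
\[
\text{and therefore}\qquad
\limsup_{\lambda\to\infty} \frac{1}{\lambda}
\log P\Big( \big\| \bar Q_\lambda/\lambda - \tilde Q_\lambda/\lambda \big\| > \eta \Big)
= -\infty \quad \text{for every } \eta > 0 .
\]
```

This is the condition required by Definition 4.2.10 in [3]; the same estimate applies verbatim with any other deterministic bound in place of 4.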

Proofs of Technical Results
Finally, we provide the proofs of Lemmas 1 and 2.
We start with Lemma 1 which takes advantage of the Dawson-Gartner projective limit theorem and thus requires that we obtain an auxiliary large deviations principle for finite dimensional objects defined via for t i−1 < t i , and y j−1 < y j .
Proof of Lemma 1. We will use the Dawson-Gartner projective limit theorem. Consider a collection of points in the plane of the form κ = ((t i , y j ) : 1 ≤ i ≤ m, 0 ≤ j ≤ n), such that 0 := t 0 < t 1 < t 2 < ... < t m ≤ T and 0 := y 0 < y 1 < ... < y n . Moreover, we assume that y l = t l if 0 ≤ l ≤ min (m, n). Let K be the union of such collection of sets κ. Further, let {p κ } κ∈K be the projective system generated by K. We will proceed to obtain a large deviations principle for the projections Q λ (t, y)/λ : (t, y) ∈ κ . However, we will do this by first obtaining a large deviations principle for quantities ∆ ij (λ) /λ and then the large deviations principle for the projections follows using the contraction principle as the Q λ (t, y)/λ : (t, y) ∈ κ will be shown to be continuous functions. Set y n+1 = ∞, so thatQ λ (t, y n+1 ) = 0 for every t ∈ [0, T ]. It is important to note, given the structure of the partition κ, that if 1 ≤ i ≤ m, 1 ≤ j ≤ n, and i > j, then ∆ ij (λ) = 0. Now, similar to the definition of ∆ ij (λ) we define, for 1 ≤ i ≤ m and 1 ≤ j ≤ n + 1, Once again, observe thatQ λ (t, y n+1 ) = 0, and also if i > j, for 1 ≤ i ≤ m, 1 ≤ j ≤ n, we have t i−1 ≥ y j and therefore so indeed we have that (Q λ (t i , y j ) : 1 ≤ i ≤ m, 1 ≤ j ≤ n + 1) can be recovered as a continuous function of the ∆ lr (λ)'s. Since ∆ ij (λ) − ∆ ij (λ) ≤ 16, we have that Consequently, from Lemma 3, the rate function for the projections represented by κ (these projections are denoted by p κ (q)) can be written as To possess a finite I(p κ (q)), the quantity δ ij (κ) :=q(t i , y j−1 ) −q(t i , y j ) −q(t i−1 , y j−1 ) +q(t i−1 , y j ) must satisfy that δ ij (κ) = 0 (11) for i > j, and 1 ≤ i ≤ m, 1 ≤ j ≤ n + 1; otherwise, if δ ij (κ) = 0, the rate function can be made arbitrarily large by picking θ ij = c × sgn(δ ij (κ)) with arbitrarily large constant c > 0 for 1 ≤ j < i ≤ m, as is independent of θ ij 's that have j < i. 
In the representation of the rate function I(p κ (q)) we have also used the fact thatq(t i , y j ) = l≤i,r>j δ lr (κ), withq(0, y j ) = 0, so the relation from the δ ij (κ)'s to theq(t i , y j ) is a one-to-one, continuous function, so that the contraction principle (Theorem 4.2.1, [3]) is invoked for the above representation for I(p κ (q)). We want to show that sup κ∈K I(p κ (q)) is equal to (9), and hence conclude the proof by Dawson-Gartner Theorem (see Theorem 4.6.1, [3]).
Clearly it suffices to concentrate on functionsq such thatq(t, y) = 0 whenever t > T or y > t + M given that we are assuming service times bounded by M . Note that the constraint (11) implies that for anyq, in order that I(q) < ∞, we must have absolute continuity throughout 0 ≤ y ≤ t ≤ T and, moreover, that ∂q(t, y)/∂y∂t = 0 almost everywhere on 0 ≤ y ≤ t ≤ T (see [3] p. 189). We now focus onq(t, y) that is absolutely continuous on [0, T ] × [0, T + M ] and have ∂q(t, y)/∂y∂t = 0 almost everywhere on 0 ≤ y ≤ t ≤ T . Observe that Regarding θ(·, ·) as a step function with jumps at 0 = t 1 < t 2 < · · · < t m ≤ T and 0 ≤ y 0 < y 1 < · · · < y n < y n+1 = T + M , and denote S (D) as the set of all step functions on a given domain D.
To show that sup κ I(p κ (q)) ≥ I(q) where I(q) is as defined in (9) sinceq is absolutely continuous. By dominated convergence we have Similarly, since, as mentioned earlier |θ k (t, y)| ≤ C, by the bounded convergence theorem we have t+M t e θ k (t,y) dF (y − t) → t+M t e θ k (t,y) dF (y − t) and so by the continuity of ψ N (log (·)) we get for any t. Furthermore, the obvious inequality yields

Hence yet another application of dominated convergence gives
Combining (13) and (15) and using the expression in (12), we conclude that sup κ I(p κ (q)) ≥ I(q) (note a shift of variable y in (9)). For the other direction, consider  such that θ k → θ pointwise almost everywhere and that θ k is uniformly bounded; this sequence can be found, for example, by convolving θ with a sequence of mollifiers (i.e. smooth kernels with bandwidth that tends to zero as k → ∞). Exactly the same argument as above would then yield sup κ I(p κ (q)) ≤ I(q). Now, letq ∈ C + (D M ) and suppose thatq is not absolutely continuous. That is, it is not of bounded total variation in the sense of [3] p. 189. Then, for every γ > 0 there exists t 1 (γ) < ... < t m (γ) and y 0 (γ) < ... < y n (γ) such that m i=1 n j=1 δ γ ij ≥ γ, where Following [3] p. 192, we can select θ ij = sgn δ γ ij for the partition introduced earlier that defines δ γ ij , and obtain Since γ > 0 is arbitrary we conclude that sup κ∈K I(p κ (q)) = ∞ as required.

Unbounded Service Times
In this section, we extend our result to unbounded service times. The main idea of the extension beyond the bounded case is to justify that we can ignore, in a certain sense, the customers who arrive with very large service times. Let us first introduce a suitable truncation scheme. For any $K > 0$ and $\bar q \in AC_+(D)$, define $\phi_K(\bar q)(t, u+t)$ through (20), for $t \in [0,T]$ and $y := u + t \ge 0$. Since $\bar q$ is absolutely continuous, $\phi_K(\bar q)(t, u+t)$ is well defined. Moreover, the region over which the integration in (20) is performed corresponds to the triangular area depicted in light color in Figure 2. This region corresponds to the customers that are present at time $t$, have residual service time greater than $y$, and whose initial service time is less than $K$.
Moreover, for a sample path $\bar Q_\lambda$, define $\bar Q_{\lambda,K}$ as the two-parameter process derived from $\bar Q_\lambda$ by ignoring the arrivals with service time greater than $K$ (one way to picture this is that they leave the system immediately upon arrival). Therefore, $\bar Q_{\lambda,K}$ is a two-parameter queue length process corresponding to an infinite server system with i.i.d. interarrival times following the law of $\bar U = \sum_{i=1}^{G} U_i/\lambda$, where $G$ is a geometric r.v. independent of the $U_i$'s such that $P(G = n) = \bar F(K)^{n-1} F(K)$, $n \ge 1$. It is easy to check that the arrival process corresponding to $\bar Q_{\lambda,K}$, i.e. the one obtained by ignoring the arrivals with initial service time larger than $K$, satisfies the conditions in Section 2.1. The service time then has the distribution function $F_K(x) = F(x)/F(K)$ for $x \in [0,K]$. We denote by $(V_n^{(K)} : n \ge 1)$ the sequence of service times in this modified system. Now recall the continuous version of $\bar Q_\lambda$, denoted by $\tilde Q_\lambda$, constructed in Section 2.3. Moreover, define $\tilde Q_{\lambda,K}$ to be the continuous approximation to $\bar Q_{\lambda,K}$ constructed in exactly the same fashion. In addition, for $\bar q \in AC_+(D_K)$ define $I_K(\bar q)$ as in (21), and set $I_K(\bar q) = \infty$ otherwise, where $\psi_N^{(K)}$ is the infinitesimal logarithmic moment generating function corresponding to the truncated arrival process.
Theorem 2 yields thatQ λ,K /λ, satisfies a full large deviations principle with good rate function I K (·). Forq ∈ AC + (D) we shall also evaluate I K (q) according to the expression (21).
Since the geometric r.v. $G$ is independent of the $U_i$'s, we can compute the logarithmic moment generating function of the modified interarrival times, from which we find that the associated infinitesimal logarithmic moment generating function of the arrival process is $\psi_N^{(K)}(\theta) := \psi_N(\log(F(K)e^{\theta} + \bar F(K)))$. Plugging the above expressions into (21), we obtain the expression (22) for $I_K(\bar q)$. At this point our strategy involves two steps. First, we want to show that $\bar Q_{\lambda,K}/\lambda$ and $\tilde Q_{\lambda,K}/\lambda$ are exponentially good approximations, as $K \nearrow \infty$, to $\bar Q_\lambda/\lambda$ and $\tilde Q_\lambda/\lambda$, respectively. The second step consists in using this fact, together with the properties of $I_K(\bar q)$ as $K \nearrow \infty$ and also properties of $I(\bar q)$, to conclude the identification of the rate function of $\tilde Q_\lambda/\lambda$. To execute the first step, we first define a process that counts the number of arrivals with service time larger than $K$ in the $\lambda$-scaled system up to time $t$. Then we obtain the following result, which is proved at the end of this section. Consequently, $\bar Q_{\lambda,K}/\lambda$ and $\tilde Q_{\lambda,K}/\lambda$ are exponentially good approximations, as $K \nearrow \infty$, to $\bar Q_\lambda/\lambda$ and $\tilde Q_\lambda/\lambda$, respectively.
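The effect of the truncation on the arrival process can be checked in the Poisson case. The sketch below works in the base system ($\lambda = 1$, an assumption made to keep the example short), writes $p$ for $F(K)$, and uses two standard facts: the log-MGF of a geometric sum, $\log\big(p\,e^{\kappa(\theta)}/(1 - (1-p)e^{\kappa(\theta)})\big)$, and the composition $\psi_N(\log(p\,e^{\theta} + 1 - p))$ for the thinned counting process. For unit-rate exponential interarrivals both reduce to those of a Poisson process with rate $p$. All function names are hypothetical.

```python
import math

def kappa_exp(theta):
    """Log-MGF of a unit-rate exponential interarrival time (assumed example)."""
    if theta >= 1.0:
        return math.inf
    return -math.log(1.0 - theta)

def kappa_geometric_sum(theta, p, kappa=kappa_exp):
    """Log-MGF of U_1 + ... + U_G with P(G = n) = (1 - p)^(n - 1) * p, n >= 1."""
    k = kappa(theta)
    if math.isinf(k):
        return math.inf
    z = math.exp(k)
    if (1.0 - p) * z >= 1.0:
        return math.inf  # outside the domain of finiteness
    return math.log(p * z / (1.0 - (1.0 - p) * z))

def psi_poisson(theta):
    """Infinitesimal log-MGF of a unit-rate Poisson process."""
    return math.exp(theta) - 1.0

def psi_truncated(theta, p, psi=psi_poisson):
    """psi_N^{(K)}(theta) = psi_N(log(p * e^theta + (1 - p))), with p = F(K)."""
    return psi(math.log(p * math.exp(theta) + (1.0 - p)))

p = 0.7  # hypothetical value of F(K)
# Modified interarrivals are exponential with rate p: log-MGF is -log(1 - theta/p).
print(kappa_geometric_sum(0.3, p), -math.log(1.0 - 0.3 / p))
# Thinned arrivals are Poisson with rate p: psi is p * (e^theta - 1).
print(psi_truncated(0.4, p), p * (math.exp(0.4) - 1.0))
```

The two checks agree with each other through the relation $\psi(\theta) = -\kappa^{-1}(-\theta)$ applied to the modified interarrival times.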
Using the previous lemma we obtain the following result. The proof is straightforward, but following our convention we shall give it at the end of the section.
We now extend the weak large deviations principle into a full large deviations principle with a good rate function using exponential tightness.

Lemma 6. The family $(\tilde Q_\lambda/\lambda:\lambda>0)$ is exponentially tight on $C_+(D)$ and therefore it satisfies a full large deviations principle with good rate function $I^*(\cdot)$.
We proceed to show the identification $I^*(\bar q)=I(\bar q)$. We now collect useful properties that we will need to show this identification.

Lemma 7.
i) For any $\bar q$ such that $I(\bar q)<\infty$, we have $I(\phi_K(\bar q))=I_K(\phi_K(\bar q))=I_K(\bar q)\nearrow I(\bar q)$ as $K\to\infty$; the notation $I_K(\bar q)\nearrow I(\bar q)$ means that $(I_K(\bar q):K>0)$ is non-decreasing in $K$ and convergent to $I(\bar q)$.
ii) For any $\bar q$ such that $I(\bar q)=\infty$, and each $M>0$, there exists a projection $p_\kappa$ (following the notation introduced in the proof of Lemma 1) such that, for large enough $K$,
iii) Finally, with $\kappa$ from ii) there exists $\varepsilon>0$ such that if $\tilde q\in C_+(D)$ and $\|\tilde q-\bar q\|_D<\varepsilon$ then

We are now ready to prove the following important result of this section.

Proof of Theorem 3. Given Lemma 6, all we need to show is that $I^*(\bar q)=I(\bar q)$. Suppose that $\bar q$ is such that $I(\bar q)=\infty$. Then parts ii) and iii) in Lemma 7 imply in particular that for every $M$ there exist $K$, a projection $\kappa$, and $\varepsilon>0$ such that $I_K(p_\kappa(\tilde q))>M$ for any $\tilde q$ with $\|\tilde q-\bar q\|_D<\varepsilon$. Consequently, we conclude, by using the monotonicity of $I_K(\bar q)$ as a function of $K$ and taking subsequences, that Since $I_K(\cdot)$ is a rate function (in particular $I_K(\cdot)$ is lower semicontinuous) we have that and then by part i) of Lemma 7 we conclude that $\sup_{K>0}I_K(\phi_K(\bar q))=I(\bar q)$, thus concluding that $I^*(\bar q)=I(\bar q)$ as claimed.
We finish this section with the proof of Theorem 1.
Proof of Theorem 1. All we need to show is that $\bar Q_\lambda/\lambda$ and its continuous version $\tilde Q_\lambda/\lambda$ are exponentially equivalent. This follows exactly as in the proof of Corollary 1 since $\|\bar Q_\lambda-\tilde Q_\lambda\|_D\le 4$. The measurability issue is again dealt with using separability. The result then follows by applying Theorem 4.2.13 in [3].

Proofs of Technical Results
We now provide the proof of the pending technical results.
Proof of Lemma 5. This result is a direct application of part a) in Theorem 4.2.16 in [3].
Proof of Lemma 6. This is similar to the case with bounded service time, but the conditions for tightness are slightly different given that our domain $D$ is not compact. We must show that for any $\eta,\gamma>0$ we can choose $\rho>0$ small enough such that, for $\delta<\rho$, $\frac{1}{\lambda}\log P(w(\tilde Q_\lambda/\lambda,\delta)>\eta)<-\gamma$ when $\lambda$ is large; this part is basically the same as in the case $D_M$. In addition, however, we also must show that for all $\eta>0$ and every $a>0$ there exists $K>0$ such that Note that $P(w(\tilde Q_\lambda/\lambda,\delta)>\eta)$ and $P(\sup\cdots)$

By Lemma 4, for every $\gamma>0$ we can choose $K$ large enough such that for all $\lambda$ large enough. So condition (24) is enforced and the second term in the sum in (25) is also appropriately controlled. Now, by a similar argument as in Lemma 2 in the previous section, for a chosen $K$, we have, for all small enough $\delta$, for large enough $\lambda$. In sum, we get for large enough $\lambda$. Therefore, exponential tightness follows.

It follows immediately that a weak large deviations principle and exponential tightness imply a full large deviations principle. The goodness of the rate function is then a consequence of exponential tightness together with the weak large deviations principle; see Lemma 1.2.18, p. 8, part b) of [3].
Proof of Lemma 7. We start with part i), assuming that $I(\bar q)<\infty$. Since we have immediately that $I(\phi_K(\bar q))=I_K(\phi_K(\bar q))=I_K(\bar q)$. It is obvious that $I_K(\bar q)$ is non-decreasing in $K$ and that $I_K(\bar q)\le I(\bar q)$. On the other hand, there exists $\theta_n\in C_b([0,T]\times[0,\infty))$ (the space of bounded and continuous functions on $[0,T]\times[0,\infty)$) such that converges to $I(\bar q)$ as $n\to\infty$. Since $I(\bar q)<\infty$, it follows easily that $\partial^2\bar q(\cdot)/\partial t\partial y$ is integrable over $D$. Therefore, given that $\theta_n(\cdot)$ is bounded, uniformly on $t\in[0,T]$. In sum, $\lim_{K\to\infty}I^n_K(\bar q)=I^n(\bar q)$. Therefore, there exists $K_n$ such that $I^n_{K_n}(\bar q)\ge I^n(\bar q)-1/n$. Recall that $I^n_{K_n}(\bar q)\le I_{K_n}(\bar q)$ and consequently we obtain Since $I_K(\bar q)$ increases in $K$, we have $I_K(\bar q)\nearrow I(\bar q)$ as claimed.

For part ii), when $I(\bar q)=\infty$, there are two cases: a) $\bar q$ is not absolutely continuous, and b) $\bar q$ is absolutely continuous. Case b) in turn is divided into two subcases: b.1) $\partial^2\bar q(t,y)/\partial t\partial y$ is not integrable over $D$, and b.2) $\partial^2\bar q(t,y)/\partial t\partial y$ is integrable over $D$. We now analyze all these cases.

For Case a), we can construct a projection $p_\kappa$ with $I_K(p_\kappa(\bar q))>M$ as we did in the proof of Lemma 1.

For Case b), we have that $\bar q$ is absolutely continuous, but so we proceed to study case b.1): Assume that $\partial^2\bar q(t,y)/\partial t\partial y$ is not integrable on $D$. We shall assume that (if this integral is finite, then the integral of the negative part must diverge and the analysis that follows is identical). As in the proof of Lemma 1, given a projection $\kappa$ induced by $0\le t_1<t_2<\cdots<t_m\le T$ and $0\le y_0<y_1<\cdots<y_{n+1}$, define as long as $y_{j-1}\ge t_{i-1}$ (otherwise, if $y_{j-1}<t_{i-1}$, $\delta_{ij}(\kappa)=0$). Then, from (26) and (27), it follows easily that for any $M$ there exists a partition $\kappa$ such that Therefore, for large enough $K$,

Now, for case b.2), suppose that $\partial^2\bar q(t,y)/\partial t\partial y$ is integrable on $D$.
We can find $\theta(t,y)$ such that Following the same line of reasoning as in the proof of part i), we can conclude that there exists $K>0$ such that $I_K(\bar q)>2M$. According to the Dawson-Gärtner theorem, $I_K(\bar q)=\sup_\kappa I_K(p_\kappa(\bar q))$, where the supremum is taken over all projections restricted to $\{t\in[0,T],\ 0\le y\le t+K\}$. As a result, there exists some projection $p_\kappa$ such that $I_K(p_\kappa(\bar q))>M$ and hence we are done.

Now we turn to part iii). So far we have proved that for any $\bar q$ and $M>0$ we can find a projection $p_\kappa$ such that $I_K(p_\kappa(\bar q))>2M$. As discussed in the proof of Lemma 1, we have where $\delta_{ij}(\kappa)$ is induced by the projection $p_\kappa$. By definition, there exists some $\theta_{ij}$ such that For all $\varepsilon>0$ and $\tilde q\in B_\varepsilon(\bar q)$, we have Hence for $\varepsilon=M/(8\sum_{i,j}|\theta_{ij}|)$ and all $\tilde q\in B_\varepsilon(\bar q)$, we have Thus we conclude the result.
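Proof of Lemma 4. Let $N_\lambda^{(K)}(T)$ be the total number of arrivals from time 0 up to $T$ with service time longer than $K$, under the $\lambda$-scaled system. Then following [4] (or as in the proof of Lemma 2) we have that
$$\lim_{\lambda\to\infty}\frac{1}{\lambda}\log E e^{\theta N_\lambda^{(K)}(T)}=T\psi_N(\log(e^{\theta}\bar F(K)+F(K))).$$
Chernoff's bound yields, for any $a>0$ and $\theta>0$,
$$\limsup_{\lambda\to\infty}\frac{1}{\lambda}\log P(N_\lambda^{(K)}(T)>\lambda a)\le-\theta a+T\psi_N(\log(e^{\theta}\bar F(K)+F(K))).$$
As $K\to\infty$ the second term vanishes, because $\bar F(K)\to0$ and $\psi_N(0)=0$. Since $\theta$ can be arbitrarily large, the result follows.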
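To see how the Chernoff exponent in this argument behaves as $K$ grows, the following sketch evaluates it in the Poisson case, where $\psi_N(\theta)=e^{\theta}-1$ and hence $T\psi_N(\log(e^{\theta}\bar F(K)+F(K)))=T\bar F(K)(e^{\theta}-1)$. The Poisson arrivals, the exponential service tail $\bar F(K)=e^{-K}$, and the values of $a$ and $T$ are assumptions made only for illustration.

```python
import math


def chernoff_exponent(theta: float, a: float, T: float, tail_K: float) -> float:
    """-theta*a + T*psi_N(log(e^theta*Fbar(K) + F(K))) for psi_N(theta) = e^theta - 1."""
    return -theta * a + T * tail_K * (math.exp(theta) - 1.0)


def best_exponent(a: float, T: float, tail_K: float) -> float:
    """Smallest Chernoff exponent over theta >= 0 (closed form in the Poisson case)."""
    mean = T * tail_K
    if a <= mean:
        return 0.0  # theta = 0 is optimal, so the bound is trivial
    return chernoff_exponent(math.log(a / mean), a, T, tail_K)


if __name__ == "__main__":
    a, T = 0.5, 1.0  # hypothetical threshold and time horizon
    for K in (1.0, 5.0, 10.0, 20.0):
        tail_K = math.exp(-K)  # Fbar(K) for Exp(1) service times (assumption)
        print(f"K={K:5.1f}  exponent bound={best_exponent(a, T, tail_K):.4f}")
```

The printed bound decreases without limit as $K$ increases, which is the behaviour the proof relies on.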

Examples
This section is devoted to two examples that apply the large deviations principle that we have developed in the previous sections. The first example is on the most likely path to overflow in a loss queue, while the second example is on the ruin of a large life insurance portfolio that embeds an infinite server queue with service cost.
Example 1. (Finite-horizon maximum of queue length process for M/G/∞) Consider an M/G/∞ queue with Poisson arrivals with rate $\lambda$. Suppose that the service times have a density $f(\cdot)$ with respect to the Lebesgue measure. The system initially starts empty. We want to find the optimal large deviations sample path to attain the event $\{\max_{0\le t\le T}\bar Q_\lambda(t,t)/\lambda\ge x\}$, for fixed $T$ and $x$, as $\lambda\to\infty$; this event corresponds precisely to the event of observing a loss in a queue with $\lambda x$ servers, no waiting room, starting empty. Note that $g(\bar q):=\max_{0\le t\le T}\bar q(t,t)$ is a continuous function under the uniform norm, so the contraction principle is directly applicable.
We impose the condition that $\int_0^T\bar F(t)\,dt<x$. This condition implies that the probability for the queue to reach $\lambda x$ decreases exponentially fast as $\lambda\to\infty$ (the role of this condition will become clear when we solve the constrained optimization in a moment).
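To proceed, let us first observe that $\psi_N(\theta)=e^{\theta}-1$. The maximization problem in (2) can be solved and the rate function is immediately recognized as
$$I(\bar q)=\int_0^T\int_t^\infty\left[-\frac{\partial^2\bar q(t,y)}{\partial t\partial y}\log\left(\frac{-\partial^2\bar q(t,y)/\partial t\partial y}{f(y-t)}\right)+\frac{\partial^2\bar q(t,y)}{\partial t\partial y}+f(y-t)\right]dy\,dt,$$
which is easily seen to be a convex function of $\partial^2\bar q(t,y)/\partial t\partial y$. Finding the optimal sample path amounts to solving the minimization problem
$$\min\ I(\bar q)\quad\text{subject to}\quad\max_{0\le s\le T}\int_0^s\int_s^\infty-\frac{\partial^2\bar q(t,y)}{\partial t\partial y}\,dy\,dt\ge x,\qquad(28)$$
which is a convex optimization problem. The quantity $\int_0^s\int_s^\infty-\partial^2\bar q(t,y)/\partial t\partial y\,dy\,dt$ is equal to $\bar q(s,s)$ when $\bar q$ is absolutely continuous, and $\bar q(s,s)$ represents the scaled queue length process at time $s$.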
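To solve (28), we first consider a fixed $s$ in the constraint and then optimize over $s$. When considering $s$ fixed we replace the constraint in (28) by $\bar q(s,s)\ge x$. Under this new constraint, it suffices to look at the time from 0 to $s$ in the objective function; that is, writing $u$ for the fixed time, we now solve
$$\min\int_0^u\int_t^\infty\left[-\frac{\partial^2\bar q}{\partial t\partial y}\log\left(\frac{-\partial^2\bar q/\partial t\partial y}{f(y-t)}\right)+\frac{\partial^2\bar q}{\partial t\partial y}+f(y-t)\right]dy\,dt\quad\text{subject to}\quad\int_0^u\int_u^\infty-\frac{\partial^2\bar q}{\partial t\partial y}\,dy\,dt\ge x.\qquad(29)$$
The solution to (28) is then the optimal sample path from (29), among $0\le u\le T$, that gives the smallest objective.

We now consider (29). Introducing a Lagrange multiplier $\mu\ge0$, we minimize the objective in (29) minus $\mu\left(\int_0^u\int_u^\infty-\partial^2\bar q(t,y)/\partial t\partial y\,dy\,dt-x\right)$. By a formal application of the Euler-Lagrange equations, we differentiate the integrand with respect to $-\partial^2\bar q(t,y)/\partial t\partial y$ to get
$$-\frac{\partial^2\bar q(t,y)}{\partial t\partial y}=\begin{cases}f(y-t)&\text{for }t\le y\le u,\\ \mu f(y-t)&\text{for }y>u,\end{cases}$$
for some $\mu\ge1$ (we replace $e^{\mu}$ by another dummy $\mu$ for convenience). Complementary slackness then implies
$$\mu\int_0^u\int_u^\infty f(y-t)\,dy\,dt=x,$$
which in turn gives $\mu=x/\int_0^u\bar F(t)\,dt$ (note that we have assumed $\int_0^u\bar F(t)\,dt<x$ and so the condition $\mu>1$ is satisfied). As a result,
$$-\frac{\partial^2\bar q(t,y)}{\partial t\partial y}=\begin{cases}f(y-t)&\text{for }t\le y\le u,\\ \dfrac{x}{\int_0^u\bar F(s)\,ds}\,f(y-t)&\text{for }y>u.\end{cases}$$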
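The multiplier and the resulting cost can be checked numerically. The sketch below assumes that the Poisson rate function has the relative-entropy integrand $g\log(g/f)-g+f$ with $g=-\partial^2\bar q/\partial t\partial y$, so that tilting the density by $\mu$ on $\{t\le u<y\}$ costs $m(\mu\log\mu-\mu+1)$ with $m=\int_0^u\bar F(t)\,dt$. The uniform service distribution on $[0,1]$ and the values $u=1$, $x=2$ follow the figures of this example; the function names are hypothetical.

```python
import math


def tail_uniform(t: float) -> float:
    """Fbar(t) for service times uniform on [0, 1]."""
    return min(1.0, max(0.0, 1.0 - t))


def integrated_tail(u: float, tail, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of Fbar over [0, u]."""
    h = u / n
    return sum(tail((k + 0.5) * h) for k in range(n)) * h


def multiplier(u: float, x: float, tail) -> float:
    """mu = x / int_0^u Fbar(t) dt, from complementary slackness."""
    return x / integrated_tail(u, tail)


def cost(u: float, x: float, tail) -> float:
    """Cost of tilting the density by mu on {t <= u < y}: m*(mu*log(mu) - mu + 1)."""
    m = integrated_tail(u, tail)
    mu = x / m
    return m * (mu * math.log(mu) - mu + 1.0)


if __name__ == "__main__":
    u, x = 1.0, 2.0
    m = integrated_tail(u, tail_uniform)
    print("mu   =", multiplier(u, x, tail_uniform))
    print("cost =", cost(u, x, tail_uniform))
    print("x*log(x/m) - x + m =", x * math.log(x / m) - x + m)
```

The last two printed values coincide, which reflects the identity $m(\mu\log\mu-\mu+1)=x\log(x/m)-x+m$ when $\mu=x/m$.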
The optimal sample path $\bar q(t,y)$ leading to the constraint $\bar q(u,u)\ge x$ is given by Transforming into $q(t,y)=\bar q(t,y+t)$ and some simple calculus gives the optimal sample path $q(t,y)=$ , for $y+t>u$.
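For readers who want to evaluate the surface numerically, the sketch below integrates the tilted density directly, using $\bar q(t,y)=\int_0^t\int_y^\infty(-\partial^2\bar q/\partial r\partial z)\,dz\,dr$ and $q(t,y)=\bar q(t,y+t)$. Under these assumptions one gets, for $0\le t\le u$, $q(t,y)=\mu\int_y^{y+t}\bar F(s)\,ds$ when $y+t>u$ and $q(t,y)=\int_y^{y+t}\bar F(s)\,ds+(\mu-1)\int_{u-t}^{u}\bar F(s)\,ds$ when $y+t\le u$. These closed forms are a derivation made here for illustration and may be arranged differently from the expressions in the original display; the service distribution is taken uniform on $[0,1]$ with $u=1$, $x=2$.

```python
def H(s: float) -> float:
    """Integral of Fbar over [0, s] for service times uniform on [0, 1]."""
    s = min(max(s, 0.0), 1.0)
    return s - 0.5 * s * s


def q_typical(t: float, y: float) -> float:
    """Law-of-large-numbers surface: int_y^{y+t} Fbar(s) ds."""
    return H(y + t) - H(y)


def q_optimal(t: float, y: float, u: float = 1.0, x: float = 2.0) -> float:
    """Most likely surface to reach level x at time u (valid for 0 <= t <= u)."""
    mu = x / H(u)
    if y + t > u:
        return mu * (H(y + t) - H(y))
    return (H(y + t) - H(y)) + (mu - 1.0) * (H(u) - H(u - t))


if __name__ == "__main__":
    for t in (0.25, 0.5, 0.75, 1.0):
        row = "  ".join(f"{q_optimal(t, y):.3f}" for y in (0.0, 0.25, 0.5, 0.75))
        print(f"t={t:.2f}: {row}   typical q(t,0)={q_typical(t, 0.0):.3f}")
```

At $t=u$ and $y=0$ the optimal surface equals $x$, while the typical surface equals $\int_0^u\bar F(s)\,ds$.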
In connection to our discussion about the direct rate function representation in terms of $Q_\lambda/\lambda$ (see equation (6), in Section 2.2.1), one can check that $\partial q(t,y)/\partial y$ is not continuous on the line $y=t$ and therefore $\partial^2q(t,y)/\partial y^2$ does not exist, even though the rate function is finite.
Note also that the optimal objective value is
$$x\log\left(\frac{x}{\int_0^u\bar F(t)\,dt}\right)-x+\int_0^u\bar F(t)\,dt.$$
This is the rate function corresponding to the probability $P(\bar Q_\lambda(u,u)\ge\lambda x)=P(Q_\lambda(u,0)\ge\lambda x)$, where $Q_\lambda(u,0)$ is the queue length at time $u$. This rate of decay is consistent with direct calculation using the fact that $Q_\lambda(u,0)$ is a Poisson random variable with mean $\lambda\int_0^u\bar F(t)\,dt$, which gives
$$\lim_{\lambda\to\infty}\frac{1}{\lambda}\log P(Q_\lambda(u,0)\ge\lambda x)=-\left[x\log\left(\frac{x}{\int_0^u\bar F(t)\,dt}\right)-x+\int_0^u\bar F(t)\,dt\right].$$

As a consistency check, our result here can in fact recover the large deviations for the arrival process itself. If one changes the constraint in (29) to $\bar q(u,0)=\int_0^u\int_0^\infty-\frac{\partial^2}{\partial t\partial y}\bar q(t,y)\,dy\,dt\ge x$, the optimal value of (29) then becomes $x[\log(x/u)-1]+u$, which coincides with the exponential decay rate of $P(\mathrm{Poisson}(\lambda u)>\lambda x)$ as $\lambda\to\infty$.

Figures 3 and 4 illustrate both the law of large numbers (i.e. the typical path) and the most likely path to the overflow event $\max_{0\le u\le T}\bar Q_\lambda(u,u)/\lambda\ge x$ for $T=1$, $x=2$. The underlying service time distribution is uniform on the interval $[0,1]$. We can see that the optimal path of $Q(t,y)$ increases gradually over time to overflow at time 1.
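The Poisson consistency check can also be carried out numerically. The sketch below compares the exponent $x\log(x/m)-x+m$, with $m=\int_0^u\bar F(t)\,dt$, against $-\frac{1}{\lambda}\log P(\mathrm{Poisson}(\lambda m)\ge\lambda x)$ computed from the exact Poisson probabilities. The values $m=0.5$ and $x=2$ correspond to uniform $[0,1]$ service times with $u=1$; the function names and the truncation of the tail sum are illustrative choices.

```python
import math


def rate(x: float, m: float) -> float:
    """Large deviations exponent x*log(x/m) - x + m for a Poisson mean lambda*m."""
    return x * math.log(x / m) - x + m


def log_poisson_tail(mean: float, k_min: int, terms: int = 2000) -> float:
    """log P(Poisson(mean) >= k_min), summing `terms` probabilities in log space."""
    logs = [k * math.log(mean) - mean - math.lgamma(k + 1)
            for k in range(k_min, k_min + terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def empirical_decay(lam: int, x: float, m: float) -> float:
    """-(1/lambda) * log P(Poisson(lambda*m) >= lambda*x)."""
    return -log_poisson_tail(lam * m, math.ceil(lam * x)) / lam


if __name__ == "__main__":
    x, m = 2.0, 0.5
    print("limit exponent:", rate(x, m))
    for lam in (100, 1000, 10000):
        print(f"lambda={lam:6d}  -(1/lambda) log P = {empirical_decay(lam, x, m):.5f}")
```

The finite-$\lambda$ values approach the limiting exponent from above, with a gap of order $\log(\lambda)/\lambda$.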
Figure 3: Surface plots of the asymptotic surface $Q_\lambda(t,y)/\lambda$, as $\lambda$ increases, for both an optimal (most likely) path leading to overflow and the unconditional path. Panels: (a) $Q(t,y)$ for the most likely path to overflow; (b) $Q(t,y)$ for the unconditional path.
It is easy to see that, since we assume $\int_0^T\bar F(u)\,du<x$, the rate function (30) is non-increasing in $s$, and as a result an optimal time horizon is $T$. If the service time has bounded support $[0,M]$ with $M<T$, then the selection of any time $s\in[M,T]$ will give an optimal sample path.
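The dependence of the exponent on the horizon can be tabulated directly. The sketch below evaluates $x\log(x/m(s))-x+m(s)$, with $m(s)=\int_0^s\bar F(t)\,dt$, for service times uniform on $[0,1]$ (so the support bound is $M=1$) and a horizon $T=1.5$; these numerical choices and the function names are assumptions for illustration.

```python
import math


def H(s: float) -> float:
    """m(s) = integral of Fbar over [0, s] for service times uniform on [0, 1]."""
    s = min(max(s, 0.0), 1.0)
    return s - 0.5 * s * s


def rate_at(s: float, x: float) -> float:
    """Exponent for reaching level x at time s (requires H(s) < x)."""
    m = H(s)
    return x * math.log(x / m) - x + m


if __name__ == "__main__":
    x = 2.0
    for s in (0.25, 0.5, 0.75, 1.0, 1.25, 1.5):
        print(f"s={s:.2f}  exponent={rate_at(s, x):.4f}")
```

The exponent falls as $s$ grows and is flat for $s\ge M=1$, so every $s\in[M,T]$ attains the minimum.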
Figure 4: Contour plots of the asymptotic surface $Q_\lambda(t,y)/\lambda$, as $\lambda$ increases, for both an optimal (most likely) path leading to overflow and the unconditional path.

Example 2. (Insurance risk process) The net reserve of a life insurance company consists of the premiums collected from policyholders, minus the benefits paid to policyholders in the event of death; often all these payments are discounted to time zero in order to recognize the time value of money. When policyholders arrive at the insurance company over time (an arrival is interpreted as the moment when a contract is signed), one can model the net assets of the insurer as a function of the underlying arrivals and death events of policyholders. Specifically, we shall assume that policyholders arrive according to a Poisson process with rate $\lambda$, and that the times-until-death upon arrival of the policyholders are independent and identically distributed. Moreover, we assume that the time-until-death upon arrival has density $f(\cdot)$, distribution function $F(\cdot)$, and tail distribution $\bar F(\cdot)$. The time-until-death in this setting can be thought of as the service time in the queueing context. We shall assume without loss of generality that the initial net reserve of the company is zero. It is often more convenient to work with the negative net reserve process, also known as the aggregate loss process, defined as the total benefit that the insurer has paid up to time $t$, minus the total premium received up to time $t$. For a policyholder who arrives at time $A_i$, and who dies at time $A_i+V_i<t$, the payoff by the insurer, discounted to time zero, is denoted $h_1(A_i,A_i+V_i)$; here $A_i$ and $V_i$ are the arrival time and the time-until-death at the time of arrival of the policyholder. This quantity, $h_1(s,y)$, for $y\ge s$, captures the benefit paid at $y$ minus the accumulated premium collected from time $s$ to $y$. On the other hand, for a policyholder who has arrived prior to $t$, at time $A_i$, and who is still alive at time $t$, the payoff from the insurer to the policyholder is $h_2(A_i,t)$ (typically $h_2(A_i,t)$ will be negative, as it represents premiums that are paid to the insurer). Here $h_2(s,t)$, for $t\ge s$, captures the premium accumulated from $s$ up to the present time $t$, discounted to obtain the net present value at time zero.
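The bookkeeping behind the aggregate loss process can be sketched in a few lines: each policyholder who has arrived by time $t$ contributes $h_1$ if already dead and $h_2$ if still alive. The payoff functions below are illustrative assumptions (a benefit `b` paid at death, a premium paid continuously at rate `p`, and a constant force of interest `delta`, all discounted to time zero), and the policy data are made up for the example.

```python
import math
from typing import Callable, Iterable, Tuple

Payoff = Callable[[float, float], float]


def aggregate_loss(t: float, policies: Iterable[Tuple[float, float]],
                   h1: Payoff, h2: Payoff) -> float:
    """Sum of discounted payoffs over policyholders (arrival A_i, lifetime V_i) with A_i <= t."""
    total = 0.0
    for arrival, lifetime in policies:
        if arrival > t:
            continue  # not yet a policyholder at time t
        death = arrival + lifetime
        # Death strictly before t pays h1; otherwise the policy is treated as still in force.
        total += h1(arrival, death) if death < t else h2(arrival, t)
    return total


def make_whole_life(b: float, p: float, delta: float) -> Tuple[Payoff, Payoff]:
    """Hypothetical whole-life payoffs, discounted to time zero."""
    def premium(s: float, y: float) -> float:
        return (p / delta) * (math.exp(-delta * s) - math.exp(-delta * y))

    def h1(s: float, y: float) -> float:
        return b * math.exp(-delta * y) - premium(s, y)

    def h2(s: float, t: float) -> float:
        return -premium(s, t)

    return h1, h2


if __name__ == "__main__":
    h1, h2 = make_whole_life(b=1.0, p=0.1, delta=0.05)
    policies = [(0.0, 1.0), (0.5, 10.0), (3.0, 1.0)]  # made-up (A_i, V_i) pairs
    print("S(2.0) =", aggregate_loss(2.0, policies, h1, h2))
```

Here the first policyholder has died before $t=2$, the second is still alive, and the third has not yet arrived, so only the first two contribute.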
Consider, for instance, the setting of whole life insurance policies, that is, policies that pay a benefit $b$ to the family of the policyholder at the time of eventual death, in exchange for a premium which is paid continuously at rate $p$ for as long as the policy is held, from arrival until the time of death. If the interest rate (or force of interest, as it is known in the insurance setting) is constant and equal to $\delta>0$, then
$$h_1(s,y)=be^{-\delta y}-p\int_s^y e^{-\delta r}\,dr,\qquad h_2(s,t)=-p\int_s^t e^{-\delta r}\,dr.$$
The aggregate loss process, $S_\lambda(t)$, is represented as the net present value of the sum of the payoffs for all policyholders who arrive before $t$ and it is given by