Cache Miss Estimation for Non-Stationary Request Processes

The goal of the paper is to evaluate the miss probability of a Least Recently Used (LRU) cache, when it is offered a non-stationary request process given by a Poisson cluster point process. First, we construct a probability space using Palm theory, describing how to consider a tagged document with respect to the rest of the request process. This framework allows us to derive a fundamental integral formula for the expected number of misses of the tagged document. Then, we consider the limit when the cache size and the arrival rate go to infinity in proportion, and use the integral formula to derive an asymptotic expansion of the miss probability in powers of the inverse of the cache size. This enables us to quantify and improve the accuracy of the so-called Che approximation.

1. Introduction. Since the early days of the Web, cache servers have been used to provide users faster document retrieval while saving network resources. In recent years, there has been a renewed interest in the study of these systems, since they are the building blocks of Content Delivery Networks (CDNs), a key component of today's Internet. In fact, these systems nowadays handle around 60% of all video traffic, and it is predicted that this quantity will increase to more than 70% by 2019 [4]. Caches also play an important role in the emergent Information Centric Networking (ICN) architecture, which incorporates them ubiquitously into the network in order to increase its overall capacity [1].
In order to improve network efficiency, cache servers are placed close to the users, and store a subset of the catalog of available documents. Upon a user request for a document:
• If the document is already stored in the cache, then the cache uploads it directly to the user. This event is called a cache hit.
• Else, the request is forwarded to the repository server, which uploads a copy to the user, and possibly to the cache for future requests. This event is called a cache miss.
Since the cost of storage is a constraint, each cache contains only a fraction of the document catalog, and needs to eliminate some documents to free space for new ones. Since the caches must decide to do so in real time, they use simple distributed elimination algorithms, called cache eviction policies.
Hereafter, we will focus our efforts on the Least Recently Used (LRU) cache eviction policy. To simplify the analysis, we will assume that all documents have the same size, and therefore that the disk of the cache can be represented as a list of documents of size C ≥ 1. The LRU policy evicts content upon a user request as follows (see Fig. 1):
• If the requested document is already stored in the cache, then it is moved to the front of the list, while all documents which were in front of it are shifted down by one slot.
• Else, a copy of the requested document is downloaded from the server and placed at the front of the list, while all documents which were already in the list are shifted down by one slot, except the last one, which is eliminated.
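The two LRU rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the simulator used for the experiments of this paper: the function name `simulate_lru` and the representation of requests as a list of document identifiers are assumptions made for the example, and all documents are taken to have unit size as in the text.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity):
    """Return (hits, misses) for an LRU cache holding `capacity` documents.

    `requests` is an iterable of hashable document identifiers
    (a hypothetical input format chosen for this sketch).
    """
    # The OrderedDict keeps documents from least to most recently used,
    # so its end plays the role of the "front of the list" in the text.
    cache = OrderedDict()
    hits = misses = 0
    for doc in requests:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)         # hit: move the document to the front
        else:
            misses += 1
            cache[doc] = None              # miss: insert the document at the front
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used one
    return hits, misses
```

For instance, `simulate_lru(["a", "b", "a", "c", "b", "a"], 2)` replays six requests on a cache of size C = 2; the second request for "b" is a miss because "b" was evicted when "c" entered the cache.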
Intuitively, this simple policy should perform well, since highly requested documents should stay near the front of the list, whereas unpopular ones should be quickly eliminated. Early theoretical studies on LRU caching performance further assumed that the catalog is fixed and finite, and that each document has an intrinsic probability of being requested, independently of the others, thus defining a popularity distribution. The request process is then modeled as an i.i.d. sequence, where at each time step a document is requested according to its popularity. This framework is commonly referred to as the Independent Reference Model (IRM) in the literature, see for instance [13].
While the IRM setting has proved to be a good model for short time-scales, it is not accurate for larger ones. In fact, other phenomena occurring within longer time-scales must also be taken into account, notably the dynamic nature of the catalog and of user preferences.
In order to capture these phenomena, a new model based on Poisson cluster point processes has been independently proposed by Traverso et al. [20] and Olmos et al. [19]. It makes it possible to address the catalog and preference dynamics, and thus to obtain more accurate results at both large and small time scales. Its properties have received only heuristic analysis in these works.
The object of the present paper is to build a sound mathematical framework for the analysis of this model, and to provide rigorous proofs for the estimation of the hit probability, which corresponds to the asymptotic proportion of cache hits among all requests when the number of these requests goes to infinity, or equivalently of the complementary miss probability.
Before describing our main contributions, we briefly review the literature on caching performance, mentioning only papers relevant to our present work; see [10] and the references therein for a more comprehensive bibliography of the subject. The modern treatment of the subject started with Fill and Holst [9], who introduced the embedding of the request sequence into a marked Poisson process, in order to analyze the related problem of the search cost for the Move-to-Front list. Independently, Che et al. [3] also used marked Poisson processes to model the requests. In their work, they express the hit probability of an LRU cache in terms of a family of exit times of the documents from the cache. In order to simplify the analysis, they approximated this family by a single constant called the characteristic time. This heuristic, called the Che approximation in the literature, proved to be empirically accurate even outside of its original setting. The question of quantifying the error incurred in the approximation has been partially answered by Fricker et al. [13], where the authors provide a justification for a Zipf popularity distribution when the cache size C grows to infinity and scales linearly with the catalog size. The error incurred by the approximation is estimated for the exit times, but not, however, for the hit probability.
In the present paper, we succeed in adapting the Che approximation to the more complex setting of the cluster point process model. The approximation accuracy has been considered first by Leonardi and Torrisi [16], who provide limit theorems for the exit time as C goes to infinity, as well as an upper bound of the error on the hit probability. However, the latter bound depends on an additional variable, the optimal value of which is not explicitly given in terms of system parameters.
The contribution of our paper is threefold. Firstly, in Sections 2 and 3, we use the Palm distribution for the system in order to provide a probability space where an "average document" can be tagged and analyzed independently from the rest. Secondly, in Section 4, we use the latter independence structure to obtain an integral formula for the expected number of misses for a document, generalizing the development in [19]. Thirdly, in Section 5, using scaling methods, we deduce from this formula an asymptotic expansion for the average number of misses, showing that the error term in the Che approximation is of order O(1/C). In contrast to the upper bound provided in [16], our error estimation depends simply on system parameters and can be readily calculated. Section 6 is devoted to a numerical study validating the accuracy of the asymptotic expansion. Section 7 contains some concluding remarks, and Section 8 contains all proofs.
2. Document Request Model. Our request model consists of the following cluster point process on the real line $\mathbb{R}$, illustrated in Fig. 2.
A ground process $\Gamma^g$, hereafter called the catalog arrival process, gives the consecutive arrival times of new documents to the catalog. We assume it to be a homogeneous Poisson process with intensity $\gamma > 0$, and denote its generic arrival time by $a$.
The cluster at an arrival time $a$ of $\Gamma^g$ is denoted by $\xi_a$, and is an element of the space $M^\#(\mathbb{R})$ of point processes on $\mathbb{R}$. It represents the document request process for the document arriving to the catalog at that time $a$. We assume that $\xi_a$ is a Cox process directed by a stochastic intensity function $\lambda_a \ge 0$ having the following properties.
• Given $\Gamma^g$, the intensities $\lambda_a$ for $a \in \Gamma^g$ are jointly independent.
• The intensities $\lambda_a$ are causal: each function $t \mapsto \lambda_a(t)$ is zero for $t < a$. Requests for a document thus occur only after its arrival at the catalog.
• The distribution of $\lambda_a$ is stationary: for each arrival time $a \in \mathbb{R}$, the processes $\lambda_a(\cdot)$ and $\lambda_0(\cdot - a)$ have the same distribution.
These three conditions allow us to sample the sequence $(\lambda_a)_{a \in \Gamma^g}$ using independent samples of a canonical intensity function $\lambda$ with support in $[0, \infty)$, suitably shifted to every arrival time $a$. For a document arriving at time $a$, we denote by $\Lambda_a$ both the mean function associated with the request intensity $\lambda_a$ and the average number of requests (with abuse of notation for conciseness):
$$\Lambda_a(t) = \int_a^t \lambda_a(u)\,du, \qquad \Lambda_a = \Lambda_a(\infty) = \int_a^\infty \lambda_a(u)\,du.$$
We assume that $\Lambda_a < \infty$ almost surely, and denote by $\bar\Lambda_a$ the complementary mean function $\bar\Lambda_a(t) = \Lambda_a - \Lambda_a(t)$. When referring to the canonical document, which corresponds to an arrival at time zero, we remove the time index $a$; for instance we write $\Lambda$ and $\Lambda(t)$. The superposition of all processes $\xi_a$ for $a \in \Gamma^g$, given by
$$\Gamma = \sum_{a \in \Gamma^g} \xi_a,$$
constitutes the total request process for all documents. We assume that
(1) $$\int_{-\infty}^{t} \mathbb{E}\left[1 - e^{-(\Lambda_a(t) - \Lambda_a(s))}\right] da < \infty$$
for all times $s \le t$. This is a necessary and sufficient condition for the process $\Gamma$ to be locally finite almost surely, see [5, Theorem 6.3.III].
3. Tagging a Document via Palm Theory. The key to our analysis is to tag one document of the system and treat the remaining process as an external environment. To do this, we follow [6, p. 279]. Let $Q_{u,\nu}$ be the local Palm distribution at point $(u, \nu)$ in $\mathbb{R} \times M^\#(\mathbb{R})$ for the point process $\Gamma = \sum_{a \in \Gamma^g} \delta_{a,\xi_a}$ constituted by the ground process $\Gamma^g$ marked with the document request processes, which is a Poisson point process on $\mathbb{R} \times M^\#(\mathbb{R})$. Define the mark-averaged Palm distribution $Q_u$ on $M^\#(\mathbb{R})$ by averaging the local Palm distributions $Q_{u,\nu}$ with respect to the distribution of the mark $\nu$. Under this distribution $Q_u$, the process has the structure given by the following proposition, illustrated by Fig. 3.
Proposition 1 (Palm Decomposition for Tagged Document) Under the distribution $Q_u$, the process $\Gamma$ has almost surely a point at time $u$. Furthermore:
• The distribution of the mark $\xi_u$ is the same as the original one.
• The distribution of the remaining process $\Gamma \setminus \delta_{u,\xi_u}$ is the same as that of the original process $\Gamma$.
• The mark $\xi_u$ and the process $\Gamma \setminus \delta_{u,\xi_u}$ are independent.
We refer to Section 8.1 for the proof of this proposition. Proposition 1 allows us to consider a probability space for which there is a document arrival at time $a = 0$, almost surely. We call this document the tagged document, and the complementary process the rest.
In the next section, we shall see that for the LRU caching discipline, the independence of the tagged document from the rest allows us to derive a general integral formula for the miss probability.
4. Fundamental Integral Formula. As stated in the previous section, we will consider a tagged document at time zero, so that its associated distribution is the canonical one. For an LRU cache with size $C$, let $N$ and $\mu_C$ be the random number of requests and number of misses for the tagged document, respectively. The total miss probability is defined by the ratio
$$\frac{\mathbb{E}[\mu_C]}{\mathbb{E}[N]},$$
which is also the average per-document miss probability $\mu_C/N$ under the size-biased distribution of $N$. The mixed Poisson random variable $N$ with random mean $\Lambda$ has expectation $\mathbb{E}[N] = \mathbb{E}[\Lambda]$, and it remains to study $\mu_C$. Let $(\Theta_r)_{r=1}^{N}$ be the sequence of request times for the tagged document, with the understanding that it is the empty set if $N = 0$. The first request being always a miss, the number of misses can be written as
(2) $$\mu_C = \mathbf{1}\{N \ge 1\} + \sum_{r=2}^{N} \mathbf{1}\{\text{Request at } \Theta_r \text{ is a miss}\}.$$
Under the LRU policy, a document requested at time s will be erased from the cache at the first time, after the last request for this document, that C distinct other documents have been requested.
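For each $s$ in $\mathbb{R}$, let us define the process $X^s = (X^s_t)_{t \ge s}$ which counts the number of distinct documents in the rest of the process which are requested on the time interval $[s, t]$, and its exit time $T^s_C$ to level $C$. Hence, $T^s_C$ is the time at which a document requested at time $s$, and not requested again in the meantime, is evicted from the cache. Denoting by $F_s(\xi_a)$ the first arrival time of $\xi_a$ in $[s, \infty)$, the process $X^s = (X^s_t)_{t \ge s}$ and exit time $T^s_C$ can be expressed as
(3) $$X^s_t = \sum_{(a,\xi_a) \in \Gamma \setminus \delta_{0,\xi_0}} \mathbf{1}\{F_s(\xi_a) \le t\}, \qquad T^s_C = \inf\{t \ge s : X^s_t \ge C\}.$$

These definitions allow us to express the miss events, for $r \ge 2$, as
$$\{\text{Request at } \Theta_r \text{ is a miss}\} = \{T^{\Theta_{r-1}}_C \le \Theta_r\},$$
since such a miss occurs if and only if at least $C$ distinct other documents have been requested in the interval $[\Theta_{r-1}, \Theta_r]$. Hence (2) can be written as
(4) $$\mu_C = \mathbf{1}\{N \ge 1\} + \sum_{r=2}^{N} \mathbf{1}\{T^{\Theta_{r-1}}_C \le \Theta_r\}.$$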
To proceed further, we study the consequences of the structure of the cluster point process on the structure of the families $X^s$ and $T^s_C$.
Proposition 2 (Characterization of $X^s$ and $T^s_C$) Let $s$ be in $\mathbb{R}$. The process $X^s = (X^s_t)_{t \ge s}$ defined by (3) is an inhomogeneous Poisson process with mean function
(5) $$\Xi_s(t) = \gamma \int_{-\infty}^{t} \mathbb{E}\left[1 - e^{-(\Lambda_a(t) - \Lambda_a(s))}\right] da, \qquad t \ge s.$$
In particular, $T^s_C - s \overset{d}{=} T_C$, where $T_C = T^0_C$ is the exit time of a document requested at time zero.
We refer to Section 8.2 for the proof. Equation (4), Proposition 2, and the independence between the tagged document and the rest of the process now yield an integral formula for $\mathbb{E}[\mu_C]$.

Theorem 3 (Integral Formula for Expected Misses)
The expected number of misses is given by
(6) $$\mathbb{E}[\mu_C] = \mathbb{E}[m(T_C)],$$
where $T_C = T^0_C$ denotes the exit time for a document requested at time zero, see (3), and the function $m$ is defined using the notation in Section 2 by
(7) $$m(t) = \mathbb{E}\left[\int_0^\infty \lambda(u)\, e^{-(\Lambda(u+t) - \Lambda(u))}\, du\right], \qquad t \ge 0.$$
The proof is postponed to Section 8.3. It uses the following result of independent interest.
Proposition 4 (Functionals of Holding Times) Let $\xi$ be an inhomogeneous Poisson process on $[0, \infty)$ with deterministic intensity function $\lambda$. Let the mean function $\Lambda$ satisfy $\Lambda(\infty) < \infty$, so that $\xi$ has a finite random number $N$ of points $(\Theta_r)_{r=1}^{N}$. Then, for any measurable function $\psi : [0, \infty) \to [0, \infty)$,
$$\mathbb{E}\left[\sum_{r=2}^{N} \psi(\Theta_r - \Theta_{r-1})\right] = \int_0^\infty\!\!\int_0^\infty \psi(w)\, \lambda(u)\, \lambda(u+w)\, e^{-(\Lambda(u+w) - \Lambda(u))}\, dw\, du.$$
We refer to Section 8.4 for the proof of this proposition.
The above analysis would apply identically if the random variable $T_C$ were deterministic and equal to some positive constant $t$. This would correspond to the cache discipline known as Time to Live (TTL), where the cache evicts a document after a fixed amount of time $t$. Therefore, $m(t)$ is simply the average number of misses for a TTL cache of eviction time $t$. We can thus regard the number of misses in an LRU cache as a time randomization of the misses in a TTL cache.
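The TTL interpretation of m(t) can be illustrated on a single sample path. The sketch below counts the misses of one document in a TTL cache, assuming (as in the holding-time argument above) that the eviction timer is restarted at every request, so that a request is a miss exactly when it is the first one or when it arrives more than t after the previous one. The function name `ttl_misses` and the list-of-times input are assumptions made for this example; averaging the returned count over independent samples of the request process would give a Monte Carlo estimate of m(t).

```python
def ttl_misses(request_times, t):
    """Count the misses of one document in a TTL cache with eviction time t.

    Assumption of this sketch: the timer is reset at each request, so the
    request at time theta is a miss iff theta - previous_request > t.
    """
    misses = 0
    previous = None
    for theta in sorted(request_times):
        if previous is None or theta - previous > t:
            misses += 1      # first request, or the document had expired
        previous = theta     # every request restarts the timer
    return misses
```

With t = 0 every request at a distinct time is a miss, which matches m(0) = E[Λ], while for very large t only the first request is a miss.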
The integral formula (7) in Theorem 3 can be rewritten using integration by parts as
$$m(t) = \mathbb{E}\left[\int_0^\infty \lambda(u)\, e^{-(\Lambda(u) - \Lambda(u-t))}\, du\right],$$
with the convention $\Lambda(v) = 0$ for $v \le 0$, which can be informally interpreted as follows: the exponential term $e^{-(\Lambda(u) - \Lambda(u-t))}$ is the conditional probability, given $\lambda$, that the document receives no request in the interval $[u - t, u]$. Thus a request at time $u$ will contribute to the intensity of the miss process if there were no requests in the interval $[u - t, u]$, which is exactly a miss event in a $t$-TTL cache. This relationship between the miss probabilities of TTL and LRU caches has already been noted by Fofack et al. [11].
The limit $\lim_{C \to \infty} \mathbb{E}[\mu_C] = m_0$ (see the opening paragraph of Section 5) is not informative, since it basically tells us that the first request for a document is the unique miss for infinite capacity.
A more interesting way to derive asymptotics for $\mathbb{E}[\mu_C]$ is to scale some system parameters with respect to $C$. An intuitively good choice is to scale the catalog arrival rate $\gamma$ proportionally to the cache size $C$. In the following, with the help of the results of the previous sections, we shall provide an asymptotic expansion for $\mathbb{E}[\mu_C]$ as $C$ grows large in this scaling.
The canonical exit time T C is the first passage time to level C of an inhomogeneous Poisson process with mean function Ξ = Ξ 0 , see Proposition 2.
To pursue the analysis, we first prove a key relation between Ξ and m.

Proposition 5 (Relation between Ξ and m)
The functions $\Xi$ and $m$ in (5) and (7) satisfy $\Xi'(t) = \gamma\, m(t)$, and hence $\Xi(t) = \gamma M(t)$, where $M(t) = \int_0^t m(s)\, ds$ for $t \ge 0$.
We refer to Section 8.5 for the proof. Proposition 5 implies that $\Xi(t) = y \Leftrightarrow M(t) = y/\gamma$ and thus that
(8) $$\Xi^{-1}(y) = M^{-1}(y/\gamma)$$
(for definiteness, we consider left-continuous inverses). Moreover, the exit time $T_C$ is the first passage time to level $C$ of an inhomogeneous Poisson process with mean function $\Xi$ (see Proposition 2), and can be expressed as
(9) $$T_C = \Xi^{-1}(\widetilde{T}_C),$$
where $\widetilde{T}_C$ is the first passage time to level $C$ of a unit Poisson process and has a Gamma$(C, 1)$ distribution. From Theorem 3 and (9), we derive that $\mathbb{E}[\mu_C] = \mathbb{E}[m(\Xi^{-1}(\widetilde{T}_C))]$, and (8) eventually yields that
(10) $$\mathbb{E}[\mu_C] = \mathbb{E}\left[m\left(M^{-1}(\widetilde{T}_C/\gamma)\right)\right].$$
Now, the strong law of large numbers yields that $\lim_{C \to \infty} \widetilde{T}_C/C = 1$ almost surely, and thus (10) strongly suggests considering the scaling
(11) $$C = \gamma\theta \quad \text{for some } \theta > 0.$$
This scaling is quite natural, since Little's law ([2, Section 3.1.2]) applied to the cache system yields that $C = \gamma\, \mathbb{E}[T^{\mathrm{in}}_C]$, where
$$T^{\mathrm{in}}_C = \int_{\mathbb{R}} \mathbf{1}\{\text{Object is in the cache at } t\}\, dt$$
is the sojourn time of an object in the cache. Note that we do consider the objects without any requests as entering the system, but we set their sojourn time to $T^{\mathrm{in}}_C = 0$. As a consequence, the asymptotic analysis under the scaling (11) amounts to fixing the average sojourn time $\theta = \mathbb{E}[T^{\mathrm{in}}_C] = C/\gamma$ and the distribution of the canonical intensity function $\lambda$ while letting $C$ grow to infinity.
Under the scaling (11), eq. (10) and $\lim_{C \to \infty} \widetilde{T}_C/C = 1$ a.s. imply, using dominated convergence, that
$$\lim_{C \to \infty} \mathbb{E}[\mu_C] = m(t_\theta), \qquad \text{where } t_\theta = M^{-1}(\theta).$$
In the following, the quantity $t_\theta$ will be called the characteristic time. The asymptotics of $\mathbb{E}[\mu_C]$ will be expressed in terms of $t_\theta$. To this end, we first recall two basic results regarding the Gamma$(C, 1)$ distribution.
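Lemma 6 (Classical Bounds on Gamma Laws) Let $\widetilde{T}_C$ follow a Gamma$(C, 1)$ distribution, and $X_C = \widetilde{T}_C/C$. Then:
(i) For any $C > 1$ and $\eta > 0$,
$$\mathbb{P}(X_C \ge 1 + \eta) \le e^{-C\varphi(1+\eta)} \quad \text{and, for } \eta < 1, \quad \mathbb{P}(X_C \le 1 - \eta) \le e^{-C\varphi(1-\eta)},$$
where $\varphi(x) = x - 1 - \log x$ is the large deviations rate function for the law of large numbers for exponential random variables of mean 1.
(ii) For any $C > 1$ and $k > 1$,
$$\mathbb{E}\left[(X_C - 1)^k\right] = O\left(C^{-\lceil k/2 \rceil}\right).$$
We refer to Section 8.6 for the classical proofs. We now formulate our central result concerning the asymptotics for the average number of misses.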
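Theorem 7. Assume that the function $m$ is twice continuously differentiable on $(0, \infty)$. Then, as $C$ goes to infinity with the scaling $C = \gamma\theta$ for fixed $\theta > 0$, we have
(12) $$\mathbb{E}[\mu_C] = m(t_\theta) + \frac{\theta^2}{2C}\left(\frac{m''(t_\theta)}{m(t_\theta)^2} - \frac{m'(t_\theta)^2}{m(t_\theta)^3}\right) + o\left(\frac{1}{C}\right).$$
We refer to Section 8.7 for the proof. Theorem 7 justifies the accuracy of the estimations that use the Che approximation. In the present setting, this heuristic consists in replacing the exit time $T_C$ in (6) by the constant $t_C = \Xi^{-1}(C)$, therefore estimating $\mathbb{E}[\mu_C]$ by $m(t_C)$. Now, under the scaling $C = \gamma\theta$, the identity (8) entails that
$$t_C = \Xi^{-1}(C) = M^{-1}(C/\gamma) = M^{-1}(\theta) = t_\theta.$$
The quantity $t_C$ is called in the literature the "characteristic time", and this identity justifies this naming for $t_\theta$ as well. More importantly, the asymptotic expansion of $\mathbb{E}[\mu_C]$ in Theorem 7 shows that the error in the Che approximation is of order $1/C$ and specifies it precisely, for large $C$ and fixed average sojourn time $\theta$.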
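To make the role of the characteristic time concrete, the following sketch computes $t_\theta = M^{-1}(\theta)$ by bisection, together with the zero order estimate $m(t_\theta)$ and a first order correction. It is an illustration under simplifying assumptions that are not taken from the paper's experiments: the intensity is a deterministic "box" of height `r` on `[0, l]` (no randomness in the pair (R, L)), for which $m(t) = 1 - e^{-r \min(t,l)} + r\,(l-t)^+ e^{-rt}$ follows from (7); the integral $M$ is evaluated with the composite Simpson rule; and the correction term is the one given by a second order Taylor expansion of $m(M^{-1}(\theta z))$ at $z = 1$ combined with $\mathrm{Var}(\widetilde{T}_C/C) = 1/C$. All function names are hypothetical.

```python
import math


def m_box(t, r, l):
    """m(t) for a deterministic box intensity of height r on [0, l]."""
    return 1.0 - math.exp(-r * min(t, l)) + r * max(l - t, 0.0) * math.exp(-r * t)


def big_m(t, r, l, steps=2000):
    """M(t) = integral of m over [0, t], by the composite Simpson rule."""
    if t == 0.0:
        return 0.0
    h = t / steps  # steps must be even for Simpson's rule
    total = m_box(0.0, r, l) + m_box(t, r, l)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * m_box(i * h, r, l)
    return total * h / 3.0


def characteristic_time(theta, r, l, tol=1e-12):
    """Solve M(t) = theta by bisection (M is increasing since m > 0)."""
    lo, hi = 0.0, 1.0
    while big_m(hi, r, l) < theta:  # enlarge the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if big_m(mid, r, l) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def che_estimates(cache_size, gamma, r, l):
    """Return (t_theta, zero order, first order) estimates of E[mu_C]."""
    theta = cache_size / gamma
    t = characteristic_time(theta, r, l)
    m0 = m_box(t, r, l)
    if t < l:
        # Closed-form derivatives of m_box on (0, l).
        m1 = -r * r * (l - t) * math.exp(-r * t)
        m2 = (r * r + r ** 3 * (l - t)) * math.exp(-r * t)
    else:
        m1 = m2 = 0.0  # m is constant beyond the lifespan l
    correction = 0.5 * theta ** 2 * (m2 / m0 ** 2 - m1 ** 2 / m0 ** 3)
    return t, m0, m0 + correction / cache_size
```

Dividing either estimate by the mean number of requests `r * l` and subtracting from one gives the corresponding hit probability estimate for this toy model. For random (R, L), the same structure would apply with `m_box` replaced by a numerically integrated expectation.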

Remark 8 (Higher Order Expansions)
If the function $m$ has derivatives of higher order, the proof of Theorem 7 together with Lemma 6 allow us to derive higher order expansions of $\mathbb{E}[\mu_C]$ in powers of $1/C$. Specifically, to obtain an expansion at order $n$, we must expand $f_\theta$ to the $2n$-th order, since $\mathbb{E}[(X_C - 1)^{2n}] = O(C^{-n})$ by Lemma 6.
We then eventually obtain
$$\mathbb{E}[\mu_C] = \sum_{k=0}^{2n} \frac{f_\theta^{(k)}(1)}{k!}\, \frac{\phi_k(C)}{C^k} + o\left(C^{-n}\right),$$
where $\phi_k$ is a polynomial of degree $\lfloor k/2 \rfloor$, as shown in the proof of Lemma 6.
Remark 9 (Laplace Asymptotic Method) Theorem 7 can be proved by purely analytical methods. Indeed, (20) can be written in integral form, after using the change of variables $w \to w/C$, as
$$\mathbb{E}[\mu_C] = \frac{C^C}{\Gamma(C)} \int_0^\infty f_\theta(w)\, e^{-C(w - \log w)}\, \frac{dw}{w}.$$
Theorem 7 then follows by expanding this integral using the Laplace method (see [18, (3.15)]) and $\Gamma(C)$ using the Stirling formula. The expansion of the numerator must be performed through a Taylor series of the function $f_\theta$ around the extremal point of the argument $w - \log(w)$ of the exponential term, that is, near $w = 1$. This method is, however, more complicated, since it involves the expansion of both numerator and denominator in powers of $\sqrt{C}$.
The smoothness assumptions on the function m in Theorem 7 can usually be checked readily on a case by case basis, by justifying the interchange of differentiation and expectation in (7) using dominated convergence. Nevertheless, it is difficult to give a general result.
To conclude this section, we show that these smoothness assumptions hold for a class of random intensities λ which is suitable for modeling purposes. This class is built by randomly scaling a deterministic shape function in both domain and range. It includes the families used in previous works [20,19].

Proposition 10 (Twice Continuously Differentiable Example)
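Let $f \in C^1(0, \infty)$ be a strictly positive unimodal function satisfying $\int f = 1$, $\int f^2 < \infty$, and $\int |f'| < \infty$. Let $(R, L)$ be a pair of positive random variables with a smooth joint density, satisfying $\mathbb{E}[R] < \infty$ and $\mathbb{E}[RL] < \infty$. If the canonical document request intensity is of the form
(13) $$\lambda(u) = R\, f(u/L), \qquad u \ge 0,$$
then the function $m$ is $C^2(0, \infty)$ with derivatives given for $t > 0$ by
(14) $$m'(t) = -\mathbb{E}\left[R^2 L \int_0^\infty f(u)\, f(u + t/L)\, e^{-RL\,(F(u+t/L) - F(u))}\, du\right],$$
$$m''(t) = \mathbb{E}\left[R^2 \int_0^\infty f(u)\left(RL\, f(u + t/L)^2 - f'(u + t/L)\right) e^{-RL\,(F(u+t/L) - F(u))}\, du\right],$$
where $F(u) = \int_0^u f(v)\, dv$.
We defer the proof of the proposition to Section 8.8. Note that Proposition 10 only imposes mild conditions on the distribution of $(R, L)$. The admitted shape functions $f$ include exponential and power law decreasing profiles, and Gaussian curves restricted to $[0, \infty)$. In addition, the assumption of $f$ being strictly positive on $[0, \infty)$ can be weakened to that of being positive only in a compact interval; this in turn implies that $f$ is not differentiable everywhere and the second derivative of $m$ will thus contain additional terms from the integral of $f'$. These terms can be obtained by integration by parts (see [12, Th. 3.36] for a generalized form).
One example of such a family with compact support is given by the "Box Model", previously analyzed in [19], which can be constructed by simply taking $f = \mathbf{1}_{[0,1]}$. In this case, $m$ and its derivatives reduce to
(15) $$m(t) = \mathbb{E}\left[1 - e^{-R \min(t, L)} + R\,(L - t)^+ e^{-Rt}\right],$$
$$m'(t) = -\mathbb{E}\left[R^2 (L - t)^+ e^{-Rt}\right], \qquad m''(t) = \mathbb{E}\left[R^2\left(1 + R\,(L - t)\right) e^{-Rt}\, \mathbf{1}\{t < L\}\right].$$
We will use this model for a numerical illustration in the next section.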
6. Numerical Experiments. We provide some numerical results to validate the accuracy of the asymptotic expansion (12), by comparing it to the values obtained from the system simulation. In our experiments, we used the "Box Model" in which the canonical intensity function is given by
$$\lambda(u) = R\, \mathbf{1}_{[0, L]}(u),$$
where the random pair $(R, L)$ represents the request rate and lifespan of a document. In view of (12), we obtain the zero order and first order approximations for the hit probability $q_C$, namely
(16) $$q_C \approx 1 - \frac{m(t_\theta)}{\mathbb{E}[\Lambda]} \quad \text{and} \quad q_C \approx 1 - \frac{1}{\mathbb{E}[\Lambda]}\left[m(t_\theta) + \frac{\theta^2}{2C}\left(\frac{m''(t_\theta)}{m(t_\theta)^2} - \frac{m'(t_\theta)^2}{m(t_\theta)^3}\right)\right].$$
For a given general distribution of $(R, L)$, we cannot deduce explicit expressions for $m$, $m'$, $m''$, $M$, and $M^{-1}$ from (15). In particular, there is usually no formula for $t_\theta$ in terms of $\theta$. In consequence, we resorted to numerical integration and inversion to obtain the hit probability estimates in (16).
As argued in [19], actual data traces suggest that the distributions of the variables $R$ and $L$ are heavy tailed with infinite variance, that is, with tail index $\alpha \in (1, 2)$. For our experiments, we consequently chose $R$ and $L$ to be distributed as independent Pareto-Lomax variables, with probability density $\alpha\sigma^\alpha/(\sigma + x)^{\alpha+1}$ for $x > 0$, with respective parameters $(\alpha = 1.9, \sigma = 22.5)$ and $(\alpha = 1.7, \sigma = 0.07)$. Such values have been taken so that the simulation time is not excessive; they provide a "box" of average width 0.1 and height 25 with high volatility, since neither $R$ nor $L$ has a finite variance.
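Pareto-Lomax variables with the density above can be sampled by inverse transform, since the corresponding distribution function is 1 − (σ/(σ + x))^α. The sketch below is one possible way of doing so and is not claimed to be the sampler used for the experiments; the function names are hypothetical.

```python
import random


def lomax_cdf(x, alpha, sigma):
    """Distribution function of the density alpha*sigma^alpha/(sigma+x)^(alpha+1)."""
    return 1.0 - (sigma / (sigma + x)) ** alpha


def lomax_quantile(u, alpha, sigma):
    """Inverse of lomax_cdf, for u in [0, 1)."""
    return sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def lomax_mean(alpha, sigma):
    """Mean sigma/(alpha - 1), finite only for alpha > 1."""
    return sigma / (alpha - 1.0)


def sample_lomax(alpha, sigma, rng=random):
    """Draw one Pareto-Lomax variate by inverse transform sampling."""
    # rng.random() lies in [0, 1), so 1 - u is never zero.
    return lomax_quantile(rng.random(), alpha, sigma)
```

With the parameters quoted above, `lomax_mean(1.9, 22.5)` and `lomax_mean(1.7, 0.07)` give the average height 25 and the average width 0.1 of the box.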
We generated the request process associated with these intensity functions for various values of γ ranging from 10 to 1,000.For each request sequence, we simulated an LRU cache and obtained the empirical hit probability for various capacities C.
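The simulation procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for the reported figures: documents arriving before time 0 are ignored (so the start of the trace carries a boundary effect), the Box Model parameters of the text are hard-coded, and the names `box_model_trace` and `lru_hit_ratio` are hypothetical.

```python
import random
from collections import OrderedDict


def lomax(alpha, sigma, rng):
    """Pareto-Lomax variate by inverse transform sampling."""
    return sigma * ((1.0 - rng.random()) ** (-1.0 / alpha) - 1.0)


def box_model_trace(gamma, horizon, rng):
    """Return the time-ordered document ids requested by the Box Model.

    Documents arrive as a Poisson process of rate gamma on [0, horizon];
    each one is requested at rate R during a lifespan L after its arrival.
    """
    events = []
    doc_id = 0
    arrival = rng.expovariate(gamma)
    while arrival < horizon:
        rate = lomax(1.9, 22.5, rng)   # request rate R
        life = lomax(1.7, 0.07, rng)   # lifespan L
        if rate > 0.0:
            # Homogeneous Poisson requests of intensity R on [arrival, arrival + L].
            t = arrival + rng.expovariate(rate)
            while t < arrival + life:
                events.append((t, doc_id))
                t += rng.expovariate(rate)
        doc_id += 1
        arrival += rng.expovariate(gamma)
    events.sort()
    return [doc for _, doc in events]


def lru_hit_ratio(trace, capacity):
    """Empirical hit probability of an LRU cache of size `capacity`."""
    cache = OrderedDict()
    hits = 0
    for doc in trace:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)
        else:
            cache[doc] = None
            if len(cache) > capacity:
                cache.popitem(last=False)
    return hits / len(trace) if trace else 0.0
```

A possible use is `trace = box_model_trace(50.0, 20.0, random.Random(1))` followed by `lru_hit_ratio(trace, C)` for several capacities C. Because of the heavy tails, short traces such as this one give noisy estimates, which is why the choice of the simulation time matters.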
To obtain reliable results, the heavy tailed nature of the input distributions requires the use of the stable-law central limit theorem (see [21, Th. 4.5.1]). Specifically, there exists a so-called stable law $S_\alpha(\sigma, \beta, \mu)$ with scaling parameter $\sigma$ and a constant $K_\alpha$ such that, in distribution,
$$\frac{1}{n^{1/\alpha}} \sum_{i=1}^{n} \left(L_i - \mathbb{E}[L]\right) \longrightarrow K_\alpha\, S_\alpha(\sigma, \beta, \mu).$$
This makes it possible to heuristically quantify the convergence rate for the law of large numbers by considering that
$$\frac{1}{n} \sum_{i=1}^{n} L_i - \mathbb{E}[L] \approx \frac{K_\alpha}{n^{1 - 1/\alpha}}\, S_\alpha(\sigma, \beta, \mu)$$
(in the present case, $\alpha = 1.7$ for $L$). We then chose the simulation time $S$ such that the average number of observed documents $n = \gamma S \times \mathbb{E}[1 - e^{-RL}]$ is such that the scaling parameter $K_\alpha/n^{1 - 1/\alpha}$ is smaller than $10^{-3}$ (such a value of $n$ ensures the same accuracy for the request rate $R$ with larger tail index $\alpha = 1.9$). Besides, we also chose $S$ large enough to ensure that there is enough time for all observable documents to appear in the simulated trace.
We show in Fig. 4 some of the resulting hit probability curves from these experiments. We observe that the zero order approximation in (16) is almost exact already for $\gamma = 500$. The error incurred by the approximation for lower $\gamma$ can be corrected by using the first order approximation in (16), as shown in Fig. 5a for $\gamma = 50$. For even lower intensities, this correction might not be enough to approximate the real hit probability, as illustrated in Fig. 5b for $\gamma = 10$; the higher order expansion of Remark 8 would then be needed.
The above numerical results therefore illustrate the accuracy of the asymptotic expansion for the hit probability.
7. Concluding Remarks. In this paper, we have estimated the hit probability of an LRU cache for a traffic model based on a Poisson cluster point process. In this endeavor, we have built using Palm theory a probability space where a tagged document can be analyzed independently from the rest of the process. In the case of the LRU replacement policy, this property is key for the analysis, since it allowed us to derive an integral expression for the expected number of misses of the tagged object.
Using this expression, we were able to obtain an asymptotic expansion of this integral for large C under the scaling C = γθ for fixed θ > 0. This expansion quantifies rigorously and in precise fashion the error made when applying the commonly used "Che approximation". We have further shown that the latter expansion is valid for a sub-class of processes suitable for modeling purposes. Finally, the accuracy of our theoretical results has been illustrated by numerical experiments.
Our framework could be used to analyze other caching policies satisfying that the eviction policy for the canonical document depends only on the rest of the document request process. Examples of such caching policies found in the literature are RANDOM, which evicts a uniformly chosen document when adding a new document to the cache, and FIFO, which works as LRU except that it does not move a requested document that is already in the cache to the front of it. Such alternative policies may be relevant in that the replacement operations are somewhat simpler than LRU, and this may compensate for their probable lesser performance in terms of the hit probability. However, the miss events for these policies are more intricate to analyze, since they depend on the missed requests in the rest Γ \ δ 0,ξ 0 .
Another possible extension of our study would be to take into account the fact that documents have random sizes. These sizes and the cache size $C$ should be measured for instance in bits, packets, or by a continuous value in $\mathbb{R}_+$. The document sizes can be incorporated as additional marks to the cluster point process. In this case, the process $X$ defining the canonical exit time becomes a compound inhomogeneous Poisson process, summing up these file sizes. The exit time to consider for a canonical document of size $S$ [...]

8.1. Proof of Proposition 1. [...] for any measurable function $f : \mathbb{R} \times M^\#(\mathbb{R}) \to \mathbb{R}_+$, where $L$ is the Laplace functional on the original probability space. The Laplace functional $L_u$ under $Q_u$ is consequently given by
$$L_u[f] = L[f]\; \mathbb{E}\left[e^{-f(u, \xi_u)}\right].$$
Note that the expectation in the right-hand side is the Laplace functional of the point process $\delta_{u,\xi_u}$. Since Laplace functionals characterize point processes, the conclusion follows.

8.2. Proof of Proposition 2. Condition (1) implies that $\Xi_s(t) < \infty$ for all $t \ge s$. Among all points $(a, \xi_a)$ in the rest $\Gamma \setminus \delta_{0,\xi_0}$, the process $(X^s_u)_{s \le u \le t}$ counts those such that $F_s(\xi_a)$ falls in $[s, t]$, and for $h \ge 0$ the increment $X^s_{t+h} - X^s_t$ counts those such that $F_s(\xi_a)$ falls in $(t, t + h]$. Since the corresponding two subsets of $\mathbb{R} \times M^\#(\mathbb{R})$ are disjoint and $\Gamma \setminus \delta_{0,\xi_0}$ is Poisson, we conclude that $X^s$ is a counting process with independent increments. In consequence, it is an inhomogeneous Poisson process.
The mean function for this process is then given by
Formula (5) follows from the latter expression and the fact that the mean measure $\eta$ of $\Gamma \setminus \delta_{0,\xi_0}$ is defined by

8.3. Proof of Theorem 3. In the first r.h.s. term of (4), $N$ is a mixed Poisson random variable with random mean $\Lambda$ and thus
(17) $$\mathbb{E}\left[\mathbf{1}\{N \ge 1\}\right] = \mathbb{E}\left[1 - e^{-\Lambda}\right].$$
Consider now the second r.h.s. term of (4). Following [15, p. 106 seq.], since the family $T^s_C$ for $s \ge 0$ is defined on the rest of the process and thus is independent from the request process $\xi = \sum_{r=1}^{N} \delta_{\Theta_r}$ for the tagged document,
where $h : M^\#(\mathbb{R}) \to \mathbb{R}_+$ is the measurable function defined by
We use this to compute the expectation of the l.h.s. of eq. (18), which combined with (4) and (17) yields that
Now, since the canonical intensity $\lambda$ and exit time $T_C$ are independent from the request process of the tagged document, Proposition 4 yields that
where we use for the last equality that, since $\Lambda(\infty) = \Lambda$ and $\Lambda(0) = 0$,
This last equation and dominated convergence imply that $\lim_{t \to \infty} \downarrow m(t) = \mathbb{E}[1 - e^{-\Lambda}]$, which concludes the proof.

8.4. Proof of Proposition 4. Recall that, given that the process $\xi$ has $k$ points, the request times $(\Theta_r)_{r=1}^{k}$ have the distribution of the order statistics of a random variable with density $g(t) = \lambda(t)/\Lambda$ for $t \ge 0$, and thus with c.d.f. $G$ given by $G(t) = \Lambda(t)/\Lambda$ for $t \ge 0$. Let $\bar{G} = 1 - G$ denote the complement of $G$. From order statistics theory, it is known that the holding times $\Theta_r - \Theta_{r-1}$ for $2 \le r \le k$ have density $g_{k,r}$ given for $w \ge 0$ by
Consequently, for $k \ge 2$ we have
and hence
Now, using the Binomial Theorem,
and thus
and we conclude that
and $g(u)\,g(u+w) = \lambda(u)\,\lambda(u+w)/\Lambda^2$. Equation (19) together with the latter intermediate results concludes the proof.
8.6. Proof of Lemma 6. (i). This is the optimized exponential Markov inequality which is used for the upper bound in Cramér's large deviations theorem, see [7, Theorem 2.2.3, Remark (c)].
(ii). Expanding the $k$-th order central moment of $X_C$ in terms of the known moments of $\widetilde{T}_C$ yields that
$$\mathbb{E}\left[(X_C - 1)^k\right] = \frac{\phi_k(C)}{C^k},$$
where $\phi_k$ is a polynomial of degree at most $k$. As shown in [17], the polynomial $\phi_k$ is actually of degree $\lfloor k/2 \rfloor$, which allows us to conclude.
With the scaling $C = \gamma\theta$, Equation (10) can then be written as
(20) $$\mathbb{E}[\mu_C] = \mathbb{E}\left[f_\theta(X_C)\right], \qquad f_\theta(z) = m\left(M^{-1}(\theta z)\right).$$
Let again $X_C = \widetilde{T}_C/C$ as in Lemma 6, and fix $\eta > 0$. Let us decompose the expectation (20) as $A_C + B_C$, where $A_C$ and $B_C$ denote the contributions of the events $\{|X_C - 1| \ge \eta\}$ and $\{|X_C - 1| < \eta\}$, respectively.
For $A_C$, recall that the function $m$ is bounded by $\mathbb{E}[\Lambda] < \infty$, and so is $f_\theta$. Then, by Lemma 6 (i), we have
For $B_C$, we write a Taylor expansion of $f_\theta$ at 1 of order two in the form
We then compute
(21)
in the right-hand side of (21), we use the Cauchy-Schwarz inequality to write
and note that $\mathbb{E}[h_\theta(X_C)^2] = O(1)$ for all $C > 1$ by Lemma 6 (ii). Applying Lemma 6 (i) then eventually shows that
which is, in particular, $o(1/C)$. At this stage, we therefore conclude from (21) and the latter discussion that
(22)
Lastly, we show that the term $E_C$ is $o(1/C)$. To this aim, it is sufficient to show that the sequence $W_C = C \cdot k_\theta(X_C, Y_C)$ for $C > 1$ converges in probability to zero and that it is uniformly integrable ([22, Theorem 13.7]).
• To prove the convergence in probability, note that since
and, in particular, in probability. On the other hand, since $X_C = \widetilde{T}_C/C$ is an average of $C$ i.i.d. random variables with mean 1, the continuous mapping theorem for weak limits implies that $C(X_C - 1)^2$ converges in distribution (the limit distribution is $\chi^2$ with parameter 1, but this specific limit has no importance for the present proof). Finally, since $\mathbf{1}\{|X_C - 1| < \eta\} \to 1$ a.s., Slutsky's theorem ([14, Th. 11.4]) allows us to conclude that
in distribution as $C \to \infty$, and thus in probability as well.
• To prove the uniform integrability of $W_C$, it suffices to show that
for any $C > 1$ and for some constant $K$ depending on $\eta$ only. By Lemma 6, we further have
(22), we thus have proved that
as $C \to \infty$. To conclude the proof, we now express the function $f_\theta$ and its derivatives at 1 in terms of the function $m$ and its derivatives at $t_\theta$. By implicit differentiation, writing $t = M^{-1}(\theta z)$,
$$f_\theta'(z) = \theta\, \frac{m'(t)}{m(t)}, \qquad f_\theta''(z) = \theta^2 \left(\frac{m''(t)}{m(t)^2} - \frac{m'(t)^2}{m(t)^3}\right),$$
and the values of $f_\theta'$ and $f_\theta''$ at $z = 1$ consequently follow. Replacing them into (24), we finally prove the expansion (12), as claimed.
8.8. Proof of Proposition 10. Differentiating (7) under the integral sign, with $\lambda(u)$ expressed by (13), readily gives formulas (14) after using the change of variables $u \to u/L$. The validity of these formulas can then be simply proved by showing that these integrals for $m'$ and $m''$ are finite.
Given $t > 0$ and $L$, define $u^* = u^*(t, L) = \inf\{u : f(u) > f(u + t/L)\}$, so that $f(u) \le f(u + t/L)$ for $u \le u^*$ and $f(u) > f(u + t/L)$ for $u > u^*$. The existence of $u^*$ is ensured by the unimodality of $f$, and we have $u^* = 0$ if and only if $f$ is non-increasing. Finally, define $\tilde{u} = \inf\{u : f(u) = \max f\}$ (see Fig. 6 for a schematic view of these definitions).
where the last inequality is justified by the bound $x e^{-ax} \le 1/(ae)$ for any fixed $a > 0$, and the fact that $\int f = 1$.
For the second derivative $m''(t)$, we introduce the integrals
Using again $x e^{-ax} \le 1/(ae)$, we have
Finally, to deal with $B_2(t)$ we note that $f'(u + t/L) \le 0$ for $u \in [u^*, \infty)$ and thus $|f'(u + t/L)| = -f'(u + t/L)$. We then use integration by parts to obtain
The first term in the latter expression is trivially negative; the second is also negative since $f$ is non-decreasing in $[0, \tilde{u})$. As a consequence both terms can be ignored to obtain
thus concluding the proof.

Fig 1: The LRU eviction policy handling a hit and a miss request on a cache of size C = 5.

Fig 2: A sample of the document arrival and request processes. Top: Each catalog arrival triggers a function representing the request intensity for the corresponding document. Bottom: A sample of the document request processes. Their superposition generates the total request process.

Fig 3: Illustration of request process ξ under the averaged Palm distribution.The original process is decomposed into: (1) the tagged document and (2) the rest of the process.These are mutually independent, and the rest of the process has the same distribution as the original process.

5. Asymptotic Expansion. It holds that $\lim_{t \to \infty} \downarrow m(t) = m_0$, see Theorem 3. Moreover, Proposition 2 yields that the exit time $T_C$ increases to infinity with $C$. Hence, (6) and dominated convergence yield that $\lim_{C \to \infty} \mathbb{E}[\mu_C] = m_0$.

Fig 4: Convergence of the hit probability curves obtained in the experiments to the 0th order Che approximation.

Fig 5: Comparison between hit probability curves obtained in the experiments and their analytic approximations in the case of low γ.

8.7. Proof of Theorem 7. Define the function $f_\theta$ by $f_\theta(z) = m\left(M^{-1}(\theta z)\right)$ for $z \ge 0$.
see Proposition 2, the function h can be rewritten as