In this paper, we analyze the optimal management of local memory systems using the tools of stationary point processes. We provide a rigorous setting of the problem, and we characterize the optimal causal policy that maximizes the stationary hit probability. We then analyze a special case where the request processes come from a scale family and derive a suitable large-scale limit as the catalog size $N \to \infty$ when a fixed fraction c of items can be stored. In this limiting regime, the optimal policy amounts to comparing the stochastic intensity of the process with a fixed threshold, which is defined by a quantile of an appropriate limit distribution. We derive asymptotic performance metrics as well as sharp estimates for the prelimit case. Moreover, we establish a connection with optimal timer-based policies for renewal traffic and monotone hazard rates. We also present detailed validation examples of our results, including some closed-form expressions for the miss probability that are compared with simulations. We also use these examples to exhibit the significant superiority of the optimal policy for the case of regular traffic patterns.

Funding: This research was partially supported by the U.S. Air Force Office of Scientific Research [Grant FA9550-23-1-0350].

1. Introduction

In modern computing systems, a crucial component for improving performance at several layers is the use of a local memory, typically referred to as a cache. The rationale is that storing frequently used items at a readily available location can improve latency, reducing the cost of retrieval from a more remote location. This strategy is pervasive in computing systems: from local caching of instructions at the processor level to texture caching in graphics processing units, disk caching for quickly retrieving data from hard disks, content caching in web applications, geographically distributed caching in content delivery networks, and cloud storage gateways keeping readily available items stored in the cloud data centers.

A local memory consists of a certain amount of memory space that may store a subset of items locally and temporarily; the goal is to appropriately choose from a large catalog the subset of items more likely to be requested next. All of the aforementioned applications can be subsumed into this basic structure. In the case of homogeneous item sizes, the space constraint is reflected in the number of objects C that may be locally stored from the catalog of size $N ≫ C$ .

Classical analyses of local memory systems have focused on modeling the (discrete-time) sequence of item requests. Based on this empirical sequence, the system must determine which items are stored and which must be evicted from memory to make room for others. In a stationary regime, one natural strategy would be to store at all times the C items with the highest popularity measured by their mean intensity of requests; this simple static policy requires, however, popularities to be known. A practical approximation is the least frequently used (LFU) eviction policy; here, item request intensities are dynamically estimated from the arrival sequence, and the least popular ones are evicted. Under mild assumptions on stationary arrivals, LFU will eventually converge to the static policy.

A popular alternative is the least recently used (LRU) policy, which keeps in memory the C most recently requested items. Upon receiving a request, it will store the item if not already present, and in that case, it will evict the oldest item in the sequence of requests. This method is better suited to handle highly correlated demands, where a requested item is more likely to be fetched again (i.e., bursty request patterns). Smoother variants have been analyzed, and also, combinations of both approaches, such as Least Frequent Recently Used (cf. Bilal and Kang 2017), have been proposed. We review the relevant literature below.

A main drawback of the classical analysis is that relevant time (continuous-time) information is neglected; by focusing only on the sequence of requests, the models ignore interrequest times that are important characteristics of the request processes. An alternative approach that gives time the center stage is timer-based (time-to-live (TTL)) caching policies; here, items are stored upon request and evicted only after a certain amount of time has elapsed because their last appearance in the request stream. This is a common approach in internet-based systems, such as domain name system queries and web applications.

A crucial step toward incorporating time information is the seminal paper by Fofack et al. (2014), where the incoming request stream is assumed to be a stationary point process on the real line. The mean intensities of the underlying request processes for each item capture their relative popularity, whereas by modeling the interrequest times, we can express different types of behavior; for instance, heavy-tailed interrequest times are well adapted to bursty arrival patterns. This approach has led to new insights in the analysis of both replacement and timer-based caching policies, which we also review below.

Within this modeling framework, we ask a natural question. What is the optimal memory management policy? By this, we mean the one that maximizes the hit rate, which is the frequency of successful retrievals from local memory. In Ferragut et al. (2016, 2018), the optimality question for TTL caching policies was investigated; in particular, it was shown that if requests form a general stationary simple point process, optimality may be characterized by the hazard rate function of the interarrival distribution. Later, Panigrahy et al. (2022) analyzed optimal replacement policies (i.e., fixed memory systems), and they identified the optimal local storage policy in terms of the stochastic intensity, a generalization of the hazard rate function.

In this paper, our goal is to develop this connection more extensively, bringing to bear the machinery of stationary point processes as in Brémaud (2020) to formally characterize causal memory management policies and the condition for optimality, in particular for general stationary request processes and correlation structures. Subsequently, our aim is to understand the large-scale behavior of the optimal replacement policy for large N under some appropriate assumptions: namely, that request processes are independent and that interarrival times for different items are drawn from a scale family parametrized by a distribution of intensities that has an asymptotic limit. We characterize the optimal memory management as a threshold policy on the hazard rates observed by the system, and we prove that the threshold has a deterministic limit. We also obtain a closed expression for the optimal performance in the large-scale limit as a function of the fundamental parameters of the scale family and the popularity distribution. As a further contribution, we investigate the properties of threshold policies before the asymptotic limit, showing that under monotonicity of the hazard rate function, they become equivalent to timer-based policies. As a consequence, we obtain the convergence of the two types of policies (replacement or TTL) to a common optimum in the asymptotic limit, a striking connection between both approaches. Our results are illustrated by a series of parametric examples, and their properties are further exhibited through stochastic simulations.

1.1. Related Work

Being essential to modern computing systems, the performance analysis of caching and local memory management has a long history. The seminal work on replacement-based algorithms with a fixed memory size started in King (1971) and Gelenbe (1973). Despite its apparent simplicity, the analysis of replacement policies is complex, even for a single cache; King (1971) gives an explicit expression of the hit probability of the LRU policy with exponential complexity. In Gelenbe (1973), it is shown that under the so-called independent reference model (IRM) assumption, where requests are independent and identically distributed, replacement policies, such as first in, first out, achieve the same hit probability. Exact computation is, however, intractable. Dan and Towsley (1990) provide an approximate computational procedure. Their method has been further extended in Rosensweig et al. (2010) to the case of networks of cache systems. Also, in Gast and Houdt (2015), another approach is given for list-based replacement policies.

A related line of work that includes replacement policies is analyses based on the move-to-front (MTF) rule, which is equivalent to LRU. A first step in this direction is the work by Fill (1996), which addresses the limit cost of the move-to-front rule. By exploiting the connection between MTF and LRU, Jelenković (1999) and Jelenković and Radovanović (2008) provide asymptotic expressions for the hit probability under the IRM assumption and Zipf popularities with parameter $β$ . With a fundamentally different approach, based on Laplace transforms, the same limit result is obtained in Barrera and Fontbona (2010). Our asymptotic limit assumptions are related to this latter approach. Few results on replacement policies move beyond the IRM assumptions, such as the works by Jelenković and Radovanović (2003, 2004) and Jelenković et al. (2006), which show some insensitivity properties of LRU in a large-scale regime and dependent requests. However, their approach is only valid for light-tailed popularities, and it is related to our result below on optimal performance for the nonuniformly integrable popularity limit. Extensions to a network of caches working cooperatively to optimize performance are proposed in Borst et al. (2010), Ioannidis et al. (2010), and Ioannidis and Yeh (2016).

Because exact analysis of LRU is difficult, the most popular technique for approximating its performance is the so-called “Che approximation” introduced in Che et al. (2002) for web caches. The crucial point is to define a characteristic time representing the average permanence time common to all files. This assumption is valid in a large-scale regime, and Fricker et al. (2012) perform a second-order analysis to explain why this is a good approximation.

On the other hand, the analysis of policies based on timers is more recent and associated with the growth of internet caches. A first contribution in this line is Jung et al. (2003), with expressions for the steady-state hit probabilities for web pages; this line of work was later extended in Bahat and Makowski (2005) to include update delay. However, a crucial contribution is the introduction of point process theory to capture request correlations and in particular, move beyond the sequence of requests to a continuous-time model with general arrival processes. The foundations for this approach were laid down in Fofack et al. (2012, 2014). In Berger et al. (2014), the analysis is extended to a family of TTL policies with the focus of approximating LRU performance in linear and tree networks, and a different policy is proposed in Berger et al. (2015) with the aim of maximizing hit ratios by variance reduction. In Bianchi et al. (2013), the Che approximation is analyzed in a more general setting, and in Martina et al. (2014), it is extended using TTL cache tools to renewal arrivals, showing good accuracy in the case of small caches and Zipf-law popularities and making the connection between replacement and timer-based policies. The fact that timer-based policies decouple the analysis over independent request streams has led to more amenable generalizations to the case of cache networks (Panigrahy et al. 2017, 2020; Dehghan et al. 2019).

The search for optimal policies with respect to the hit rate has received recent attention. In Ferragut et al. (2016, 2018), the problem of finding the optimal timer-based policy is formulated and related to the hazard rate function of the interrequest times. Using tools from convex optimization, Ferragut et al. (2016, 2018) show that decreasing hazard rates lead to a nontrivial timer-based policy that achieves optimality and characterize its large-scale behavior. However, more regular request traffic with increasing hazard rates does not benefit from caching at all, a striking result that highlights the role of traffic regularity in local memory management systems. More recently in Ferragut et al. (2024), a generalization of timer-based policies to consider prefetching was introduced, leading to a nontrivial optimal policy for increasing hazard rates. A parallel research line developed in Panigrahy et al. (2022) considers the optimality question for replacement-based policies; the structure of the optimal causal policy is identified, incorporating for the first time the notion of stochastic intensity of point processes, a generalization of the hazard rate function. The current paper builds on these immediate precursors.

1.2. Main Contributions

The first contribution of this paper is a general foundation for the problem of optimal causal policies for the local memory or caching problem. The setup is the same as the one considered in Panigrahy et al. (2022), but we provide here a more complete mathematical treatment within the framework of point processes in the real line following Brémaud (2020). In particular, this approach enables us to properly define predictable policies in terms of the underlying natural filtration of the involved processes, which is a key technical issue to be addressed in the proofs. We also aim at greater generality, determining the optimal causal policy for a general superposition of request processes, including the correlated case, as a function of stochastic intensities of the underlying request processes and their correlation structure. We then specialize this result to independent sources.

The second most significant part of the paper concerns the asymptotic behavior of the optimal policy under a large-scale regime (i.e., when the item catalog size goes to infinity). Assuming that the request processes for different items are independent with stochastic intensities from a common scale family and that their popularities have a limiting distribution, we prove that the optimal policy converges to a threshold policy in the stochastic intensities, with a deterministic threshold that follows closed-form expressions. Armed with this result, we characterize the limit of the optimal miss rate in large-scale systems through an explicit formula. Therefore, we provide a universal asymptotic bound on performance for any practical policy.

A third contribution concerns the behavior of deterministic threshold policies before the asymptotic limit; under renewal traffic and additional monotonicity assumptions of the hazard rate function of the interrequest times, it is shown that previously analyzed timer-based policies are, in fact, of the threshold type and in particular, that under renewal assumptions, the optimal timer policy for decreasing hazard rates identified in Ferragut et al. (2018) achieves the universal bound. For the dual case of increasing hazard rates, we further analyze the timer-based prefetching policy introduced in Ferragut et al. (2024) and prove that it also achieves the universal asymptotic bound.

Although the preceding theory is developed for the case of homogeneous item sizes, we also explore the generalization to the heterogeneous case.

In our simulations section, we extensively analyze certain concrete parametric examples that illustrate the theory and serve as a benchmark for classical caching policies in comparison with the optimal. Finally, we provide some remarks on the practical challenges in implementing the optimal policy.

1.3. Organization of the Paper

The paper is organized as follows. In Section 2, we lay out the main tools of point processes and stochastic intensities required to analyze our system. We then define our local memory model, establish the framework for causal policies, and characterize the optimal policy in Section 3. Our main theorem describing the large-scale limit is presented in Section 4, and the ensuing universal performance bound is presented in Section 5. In Section 6, we describe the connection between the optimal policy and timer-based policies. In Section 7, we discuss the extension of the results to heterogeneous item sizes. Simulations and examples are presented in Section 8, and conclusions are given in Section 9.

2. Preliminaries and Notation

Throughout this paper, we will consider stationary point processes defined on a common probability space $(Ω, F, P)$ . We recall now some basic concepts that will be useful in the following and introduce our notation; we refer the reader to Brémaud (2020) for a thorough treatment.

A simple and locally finite point process $Φ$ on the real line is a random and strictly increasing sequence of points $Φ = {τ_{k}}_{k \in Z}$ satisfying $\lim_{k \to \pm \infty} τ_{k} = \pm \infty$ . More formally, $Φ$ can be cast as a random counting measure (i.e., $Φ = \sum_{k} δ_{τ_{k}}$ ), a measurable map from $(Ω, F) \to (M^{#} (R), M^{#} (R))$ . Here, $M^{#} (R)$ is the space of locally finite measures on $R$ taking values in $N \cup {\infty}$ , and $M^{#} (R)$ is the smallest $σ$ -algebra such that for all Borel sets $B \in B (R)$ , $Φ (B) = \sum_{k} 1_{{τ_{k} \in B}}$ is measurable. Moreover, $Φ (B)$ is assumed to be finite for bounded B, and thus, it is a nonnegative integer-valued random variable. By definition, all points $τ_{k}$ are different, and thus, $Φ ({x}) ⩽ 1$ P almost surely (a.s.) for all $x \in R$ . In order to label the points, we follow the usual convention (Brémaud 2020), where $τ_{0} (Φ) ⩽ 0$ and $τ_{1} (Φ) > 0$ . With this convention, $τ_{0} = τ_{0} (Φ)$ represents the first point before the time origin of the process $Φ$ .¹

Let $S_{t} (Φ)$ denote the shift operator for measures in $R$ (i.e., $S_{t} (Φ) (B) ≔ Φ (B + t)$ for all Borel sets $B \subset R$ , where $B + t ≔ {x + t : x \in B}$ ). The point process $Φ$ is stationary if $S_{t} (Φ)$ has the same distribution as $Φ$ for all $t \in R$ . The mean measure of the point process is $λ (B) ≔ E [Φ (B)]$ . If the process is stationary, then this measure is translation invariant and thus, a multiple of the Lebesgue measure on $R$ (i.e., $λ (B) = λ m (B)$ ). The constant $λ$ is called the (average) intensity of the stationary point process. In what follows, we assume $λ > 0$ to avoid the trivial case where the process has no points.

For a simple stationary point process $Φ$ , an important measure is the Palm probability $P_{Φ}^{0}$ . This is a probability measure defined in $(Ω, F)$ that captures the stochastic behavior of the point process when the observer is synchronized with it. In particular, $P_{Φ}^{0} (τ_{0} = 0) = 1$ (i.e., there is $P_{Φ}^{0}$ -a.s. a point at the origin). We refer the reader to Brémaud (2020) for a formal definition.

The key relationship between the Palm probability and the stationary probability is the following inversion formula valid for any nonnegative real-valued measurable function $f : M^{#} (R) \to R_{+}$ :

E [f (Φ)] = λ E_{Φ}^{0} [\int_{0}^{τ_{1}} f (S_{t} (Φ)) d t] .

(1)

That is, in order to know the average value of a property in the stationary measure, we can integrate over one cycle of the process using the Palm measure and scale by $λ$ .

The interarrival distribution of the point process $Φ$ is defined as $F_{0} (t) ≔ P_{Φ}^{0} (τ_{1} - τ_{0} ⩽ t) = P_{Φ}^{0} (τ_{1} ⩽ t)$ . Because the process is simple, $F_{0}$ has support in $R_{+}$ . Moreover, $E_{Φ}^{0} [τ_{1}] = 1 / λ$ , which follows from taking $f \equiv 1$ in (1).

A second important distribution is the age distribution, which is the age of the current interval when the process is observed at a point t not synchronized with it. Because the process is stationary, we can take without loss of generality $t = 0$ , and thus, the age distribution is just

F (t) ≔ P (- τ_{0} ⩽ t) = λ \int_{0}^{t} (1 - F_{0} (s)) d s,

(2)

a result that also follows from (1) with

f (Φ) = 1_{{τ_{0} (Φ) ⩾ - t}}

. Note that these distributions are different in general because of the bias toward larger intervals when sampling in the steady state. A depiction of this sampling effect is shown in Figure 1.

**Figure 1. Interarrival and Age Distribution of a Stationary Point Process**

2.1. Stochastic Intensity

We now introduce the concept of stochastic intensity, which is crucial for the analysis in this paper. First, we need the following definition.

Definition 1.

A filtration ${F_{t}}_{t \in R}$ in $(Ω, F)$ (i.e., an increasing family of $σ$ -algebras contained in $F$ ) is a history of the simple locally finite point process $Φ$ if $Φ ((a, b])$ is $F_{t}$ measurable for all $a < b ⩽ t$ . The natural history of $Φ$ is $F_{t} = σ (Φ ((a, b]) : a < b ⩽ t)$ , the smallest filtration satisfying this property.

We define the stochastic intensity of a process for a given history (Brémaud 2020, definition 5.1.1).

Definition 2.

Let $Φ$ be a simple locally finite point process, and let $F_{t}$ be a history of $Φ$ . If there exists a locally integrable $F_{t}$ -adapted process $φ (t) ⩾ 0$ satisfying

E [Φ ((s, t]) | F_{s}] = E [\int_{s}^{t} φ (u) d u | F_{s}] for all s < t,

(3)

then

φ (t)

is called an

F_{t}

stochastic intensity of

Φ

The process $φ (t)$ acts as the local likelihood of a point appearing at time t given past information. This notion will play a key role in our particular application. Also, directly from the definition, one can prove that $E [φ (t)] = E [φ (0)] = λ$ , the average intensity of the process.

As an example, the stationary Poisson process of intensity $λ$ with its natural history satisfies (3) with $φ (t) \equiv λ$ , a deterministic constant, and thus, the likelihood of a point appearing in time is independent of the past, reflecting the total randomness property of the Poisson process.

We now highlight some key properties of the stochastic intensity that will be useful for our later analysis. The first one is related to predictability; for this, we need the following definition.

Definition 3.

Let ${F_{t}}_{t \in R}$ be a filtration on $(Ω, F)$ . The predictable σ-algebra $P (F .)$ associated with $F_{t}$ is the σ-algebra on $R \times Ω$ generated by the sets of the form

(a, b] \times A, a < b, A \in F_{a} .

A stochastic process $X (t, ω)$ taking values on a measurable space $(E, E)$ is $F_{t}$ predictable if the mapping $(t, ω) \mapsto X (t, ω)$ is $P (F .)$ measurable.

The following properties can be found in Brémaud (2020); any real-valued and left-continuous stochastic process adapted to $F_{t}$ is $F_{t}$ predictable. Moreover, the stochastic intensity of a point process $Φ$ if it exists can always be chosen to be a.s. predictable up to a set of Lebesgue measure 0, and thus, it is essentially unique. Also, the following result provides a smoothing formula for predictable processes based on the stochastic intensity.

Theorem 1.

Let $Φ$ be a simple point process in $R$ with history $F_{t},$ and let $φ (t)$ be an $F_{t}$ -adapted and a.s.-integrable process. Then, $φ (t)$ is an $F_{t}$ -stochastic intensity of $Φ$ if and only if

E [\int Z (t) Φ (d t)] = E [\int Z (t) φ (t) d t]

(4)

holds for all

F_{t}

-predictable processes

Z (t)

Finally, we quote the following result, known as Papangelou’s formula (Brémaud 2020, theorem 7.7.5).

Theorem 2.

Let $Φ$ be a simple point process in $R$ with history $F_{t},$ stochastic intensity $φ (t)$ , and mean intensity $λ$ . If $Z (t)$ is an $F_{t}$ -predictable process and $f (z)$ is a real-valued function, then

E [f (Z (0)) φ (0)] = λ E_{Φ}^{0} [f (Z (0))] .

(5)

For the purposes of our analysis, an important random variable is the stochastic intensity observed when sampling the renewal process at a fixed time; for convenience, it is chosen to be $t = 0$ . We call it the observed stochastic intensity (OSI) and denote it by $X ≔ φ (0)$ .

To find the distribution of the OSI, there are two underlying probability measures to consider.

Definition 4.

Let $X ≔ φ (0)$ be the observed stochastic intensity at time 0. Define

G (x) ≔ P (X ⩽ x) = E [1_{{φ (0) ⩽ x}}]; G_{0} (x) ≔ P_{Φ}^{0} (X ⩽ x) = E_{Φ}^{0} [1_{{φ (0) ⩽ x}}] .

(6)

Here, $G (x)$ is the distribution of the stochastic intensity at a sampling point not synchronized with the process. $G_{0}$ is the distribution of the OSI at the time of a request; because the stochastic intensity can be chosen to be left continuous, $G_{0}$ identifies the stochastic intensity distribution prior to this arrival. A useful inequality that will be needed later is the following lemma.

Lemma 1.

The distribution of the OSI prior to arrival events satisfies

G_{0} (x) ⩽ \frac{x}{λ} .

(7)

Proof.

The result follows from the formula of Papangelou; taking $Z (t) = φ (t)$ and $f (z) = 1 {z ⩽ x}$ in (5), we get

λ G_{0} (x) = λ E_{Φ_{0}}^{0} [1_{{φ (0) ⩽ x}}] = E [φ (0) 1_{{φ (0) ⩽ x}}] ⩽ E [x 1_{{φ (0) ⩽ x}}] ⩽ x .

□

2.2. Renewal Point Processes

As an important special case, consider now that $Φ$ is a stationary renewal process, meaning that the interrequest time sequence ${τ_{k + 1} - τ_{k}}_{k \in Z}$ includes $P_{Φ}^{0}$ independent and identically distributed random variables with common distribution $F_{0}$ . We assume that $F_{0}$ has a density $f_{0}$ and recall the definition of the failure rate or hazard rate function associated with $F_{0}$ :

η (t) = \frac{f_{0} (t)}{1 - F_{0} (t)} .

(8)

Then, if $F_{t}$ is the natural history of the process, the stochastic intensity of $Φ$ is given by Daley and Vere-Jones (2003, chapter 7):

φ (t) = η (t - τ^{-} (t)),

(9)

where

τ^{-} (t) = \sup {τ_{k} : τ_{k} < t}

is the last point before t. Note that

τ^{-} (t)

is a left-continuous process. In particular, because of the renewal property, the local intensity of the process depends only on the age of the current interval and the hazard rate function of the interarrival distribution.

When $F_{0}$ is the exponential distribution as in the Poisson process, $η (t) \equiv λ$ , so the stochastic intensity is constant as previously stated. Another interesting parametric example that we will use for illustration is the case of Pareto interarrival times; it is presented below in Section 2.3.

In the renewal case, we can provide more explicit formulas for the distribution of the observed stochastic intensity $X ≔ φ (0) = η (- τ^{-} (0))$ . In particular, at a time not synchronized with arrivals, we will have

G (x) ≔ P (X ⩽ x) = P (η (- τ_{0}) ⩽ x) = P (- τ_{0} \in η^{- 1} ([0, x])) = \int_{η^{- 1} ([0, x])} F (d t)

(10)

because

- τ_{0} \sim F

, the age distribution. We note that the set

η^{- 1} ([0, x])

will be an interval if the hazard rates are monotone.

If instead, we are evaluating the OSI at an arrival time, we must use the Palm probability $P_{Φ}^{0}$ , for which $τ_{0} \equiv 0$ a.s. and $τ^{-} (τ_{0}) = τ^{-} (0) = τ_{- 1}$ . Therefore, $X = η (- τ_{- 1})$ , and its distribution is given by

G_{0} (x) ≔ P_{Φ}^{0} (X ⩽ x) = P_{Φ}^{0} (η (- τ_{- 1}) ⩽ x) = \int_{η^{- 1} ([0, x])} F_{0} (d t)

(11)

because

- τ_{- 1} \sim F_{0}

under the Palm probability.

2.3. Pareto Interarrival Times

An interesting parametric example of stationary renewal processes, which is considered in Ferragut et al. (2016) for the same application, is when interarrival times follow a heavy-tailed Pareto distribution. In this case, all of the above magnitudes have explicit expressions, which we now compute.

Example 1

(Renewal Pareto Process). In the above setting, choose

F_{0} (t) = 1 - {(\frac{1}{1 + γ t})}^{α}, f_{0} (t) = \frac{α γ}{{(1 + γ t)}^{α + 1}} (t ⩾ 0) .

(12)

Thus, $F_{0}$ is a Pareto distribution (starting at zero) with tail parameter $α > 1$ . The number $γ$ acts as a scale parameter, and by direct computation, in order to have $E_{Φ}^{0} [τ_{1}] = 1 / λ$ , it should be chosen such that $γ = \frac{λ}{α - 1}$ .

From Equations (2) and (8), we can compute

F (t) = 1 - {(\frac{1}{1 + γ t})}^{α - 1}, η (t) = \frac{α γ}{1 + γ t}, (t ⩾ 0) .

(13)

For this example, note that the hazard rate function is decreasing for any choice of the parameters. Therefore, at an arrival time $τ_{k}$ , the stochastic intensity increases (the hazard rate resets to $η (0)$ following (9)), and a subsequent arrival becomes more likely. This gives rise to bursty traffic as depicted in Figure 2.

**Figure 2. Stochastic Intensity of a Renewal Pareto Process Showing Decreasing Hazard Rates**

For this process, we can also compute the distributions of the observed stochastic intensity at nonsynchronized or synchronized times:

G (x) = \int_{η^{- 1} (x)}^{\infty} F (d t) = \int_{\frac{α}{x} - \frac{1}{γ}}^{\infty} F (d t) = {\begin{array}{l} {(\frac{x}{α γ})}^{α - 1}, & 0 ⩽ x ⩽ α γ, \\ 1 & x > α γ . \end{array}

(14)

G_{0} (x) = \int_{η^{- 1} (x)}^{\infty} F_{0} (d t) = \int_{\frac{α}{x} - \frac{1}{γ}}^{\infty} F_{0} (d t) = {\begin{array}{l} {(\frac{x}{α γ})}^{α}, & 0 ⩽ x ⩽ α γ, \\ 1 & x > α γ . \end{array}

(15)

In Figure 3, we depict the distributions $F_{0}$ , F, G, and $G_{0}$ as well as the hazard rate function $η$ for the case $α = 2$ and $γ = 1$ (with average intensity $λ = 1$ ). In this particular case, the nonsynchronized OSI distribution is uniform in [0, 2].

Figure 3. Cumulative Distribution Function (cdf) Shapes for the Pareto Renewal Example with $α = 2, γ = 1$ , and Therefore, $λ = 1$
*Notes*. (a) Interarrival cdf. (b) Age cdf. (c) Hazard rate. (d) Stationary observed hazard rate cdf. (e) Observed hazard rate upon arrival cdf.

3. Local Memory Systems and Optimal Storage Policy

We start by rigorously defining our model for a local memory system. Requests from a catalog of N equally sized items are received (we defer the discussion of heterogeneous sizes to Section 7). We model item requests by stationary stochastic point processes $Φ_{i}$ , $i = 1, \dots, N$ , which are defined on a common probability space and with finite intensity $λ_{i} > 0$ , a measure of the popularity of item i. By appropriate labeling, we can choose $λ_{1} ⩾ λ_{2} ⩾ \dots$ (i.e., the objects are ordered by decreasing popularities). Thus, the complete request process is the superposition $Φ ≔ \sum_{i = 1}^{N} Φ_{i}$ , and its total intensity is $λ^{N} = \sum_{i = 1}^{N} λ_{i}$ .

The local memory is limited, and thus, it can only keep readily available a subset of size $C < N$ of the items. These items can upon request be served from the local memory with lower cost than retrieving them from a central repository. This formulation is quite general, and it subsumes the typical notion of caching, which is useful in many applications.² Mathematically, the local memory can keep at any point in time a subset

C (t) = {i_{1}, \dots, i_{k}} \subset {1, \dots, N} with 0 ⩽ k ⩽ C .

We call the process $C : Ω \times R \to P_{C} (N)$ the storage process of the system, where $P_{C} (N)$ denotes the subsets of ${1, \dots, N}$ with size less than C.

For a given storage process, we can define the hit-counting process of the local memory system as

Ψ_{C} (B) = \sum_{i = 1}^{N} \int_{B} 1_{{i \in C (t)}} Φ_{i} (d t)

(16)

(i.e., the thinned process that counts the requests of objects only if they are stored in memory at the request time). The mean intensity $h_{C}$ of the process $Ψ_{C}$ is called the hit rate, and accordingly, it satisfies $h_{C} ⩽ λ^{N}$ .

The natural objective in this setting is to maximize $h_{C}$ (i.e., the rate at which requests can be served directly from the local memory) by choosing an appropriate policy that defines the storage process $C (t)$ . However, this policy should be causal (i.e., it cannot make use of future information). Otherwise, the policy that only stores the next arriving item achieves a maximum rate using only one unit of memory for any N, and the problem is trivial. This is where the predictability notion introduced in Section 2.1 becomes important.

Let us denote by $F_{t}^{(i)}$ the natural history of the ith request process $Φ_{i}$ , and $φ_{i} (t)$ is its stochastic intensity with respect to $F_{t}^{(i)}$ . Define $F_{t} = σ (F_{t}^{(i)}; i = 1, \dots, N)$ as the aggregated history (i.e., the information about all arrivals up to time t). Note that we are not assuming independence between the request processes. We have the following definition.

Definition 5.

Consider a local memory system with request processes ${Φ_{i} : i = 1 \dots N}$ , aggregated natural history $F_{t}$ , and capacity C. A causal memory management policy is a stationary $F_{t}$ -predictable stochastic process $C (t)$ , with values in the space $P_{C} (N)$ of all subsets of ${1, \dots, N}$ of size at most C (equipped with the discrete $σ$ -algebra).

We would like to characterize the causal policy that maximizes the hit rate $h_{C}$ . We need the following assumption.

Assumption 1.

The aggregated process $Φ$ is simple, and each process $Φ_{i}$ admits a stochastic intensity ${\tilde{φ}}_{i} (t)$ with respect to the aggregated natural history $F_{t}$ for all $i = 1, \dots, N$ .

Note that the stochastic intensity ${\tilde{φ}}_{i} (t)$ need not be equal to the individual stochastic intensity $φ_{i} (t)$ because of crossinformation across arrivals. However, the average intensity should still satisfy $E [{\tilde{φ}}_{i} (0)] = λ_{i}$ . The following example illustrates this fact.

Example 2.

Consider the following process; objects $1 \dots, N$ are requested alternatively one after another. When a request for item i arrives, an exponential distribution with parameter $λ_{i}$ is started, and after expiration, the request for object $i + 1$ arrives (with the cyclic convention $N + 1 \mapsto 1$ ). Let $Φ_{i}$ denote the requests for item i and $Φ$ denote the total request process. Then, it is clear that $Φ_{i}$ is a renewal process with interval length following a hypoexponential (phase-type) distribution (i.e., the sum of independent exponentials with parameters $λ_{i}$ ). In this case, by Equation (9), $φ_{i} (t) = η (t - τ_{i}^{-})$ , where $η$ is the hazard rate function of the resulting hypoexponential, and it is the same for all i.

However, when looking at the aggregated history, it is clear that arrivals for item i can only come at rate $λ_{i}$ after a request from process $i - 1$ ; therefore,

{\tilde{φ}}_{i} (t) = λ_{i} 1_{{τ^{-} (t) = τ_{i - 1}^{-} (t)}}, i = 1, \dots, N,

where

τ^{-} (t)

is the last point before t of the aggregated process.

Assumption 1 is also needed to guarantee the existence of $\tilde{φ}$ ; as an extreme example, consider in particular two processes $Φ_{1}$ and $Φ_{2} = S_{t_{0}} (Φ_{1})$ (i.e., a delayed version of the first with $t_{0} > 0$ fixed). Then, their stochastic intensities with respect to their own natural histories are just delayed versions of one another: $φ_{2} (t) = φ_{1} (t - t_{0})$ . However, in the combined history of both processes, the stochastic intensity of process 2 is zero everywhere except that it degenerates into Dirac $δ$ functions at the $t_{0}$ -shifted points of process 1. Hence, these processes do not have a stochastic intensity in the sense of Definition 2 (which requires a proper stochastic function $φ_{i} (t)$ ) with respect to the combined history of both processes. Assumption 1 ensures that these pathological examples with complete correlation are excluded.

We are now in position to compute the stochastic intensity of the hit process.

Lemma 2.

Let $C (t)$ be a causal policy. The stochastic intensity of its associated hit process $Ψ_{C} (t)$ defined by (16) with respect to the aggregated history $F_{t}$ is

φ_{C} (t) = \sum_{i = 1}^{N} {\tilde{φ}}_{i} (t) 1_{{i \in C (t)}} .

(17)

Proof.

We apply the smoothing Equation (4) in Section 2.1. For an $F_{t}$ -predictable process $Z (t)$ , we write

\begin{array}{l} E [\int_{R} Z (t) Ψ_{C} (d t)] = \sum_{i = 1}^{N} E [\int_{R} Z (t) 1_{{i \in C (t)}} Φ_{i} (d t)] = \sum_{i = 1}^{N} E [\int_{R} Z (t) 1_{{i \in C (t)}} {\tilde{φ}}_{i} (t) d t] \\ = E [\int_{R} Z (t) (\sum_{i = 1}^{N} 1_{{i \in C (t)}} {\tilde{φ}}_{i} (t)) d t] = E [\int_{R} Z (t) φ_{C} (t) d t] . \end{array}

The second equality follows from the smoothing Equation (4) because by Assumption 1, the stochastic intensity of $Φ_{i}$ with respect $F_{t}$ is ${\tilde{φ}}_{i} (t)$ . The last equality is because of the process $Z (t) 1_{{i \in C (t)}}$ being $F_{t}$ predictable. This is the case because $C (t)$ is causal and thus, $P (F .)$ measurable; therefore, $Z (t) 1_{{i \in C (t)}}$ is also $P (F .)$ measurable for all i.

Because the identity holds for any $F_{t}$ -predictable process $Z (t)$ , the converse implication in the smoothing formula establishes (17). □

Define now the policy $C^{*} (t)$ as follows. At any point in time, rank the items by decreasing stochastic intensities ${\tilde{φ}}_{i} (t)$ , and store the C highest; ties may be broken arbitrarily. Note that this policy maximizes the sum of the stochastic intensities of the items in memory

C^{*} (t) ≔ \underset{C \in P_{C} (N)}{arg max} \sum_{i \in C} {\tilde{φ}}_{i} (t) .

(18)

Theorem 3

(Optimal Causal Memory Management Policy). Given a local memory system with capacity C fed by a family of stationary point processes, ${Φ_{i} : i = 1, \dots, N}$ with aggregated natural history $F_{t}$ , satisfying Assumption 1. Then, for any causal policy $C (t)$ , its stationary hit rate $h_{C}$ satisfies

h_{C} = E [φ_{C} (0)] ⩽ E [φ_{C^{*}} (0)] = h_{C^{*}},

with

φ_{C} (t)

as in (17) and

C^{*}

as defined in (18). In addition, the hit process (16) is also simple, and the hit probability

H_{C} ≔ P_{Φ}^{0} (Ψ_{C} ({0}) = 1)

is maximized by the policy

C^{*}

Proof.

Note from Equation (17) and the construction of $C^{*}$ that for any realization,

φ_{C} (0) = \sum_{i = 1}^{N} {\tilde{φ}}_{i} (0) 1_{{i \in C}} ⩽ \max_{{i_{1}, \dots, i_{C}}} \sum_{i \in {i_{1} . \dots, i_{C}}} {\tilde{φ}}_{i} (0) = φ_{C^{*}} (0) .

(19)

The first inequality follows by taking expectations on both sides with respect to the joint probability measure $P$ of the arrival processes (nonsynchronized with arrivals).

To derive the second statement, we begin by observing that by definition (Equation (16)), $Ψ_{C} (B) ⩽ Φ (B)$ for any $B \in B (R)$ , so $Ψ_{C}$ must also be simple. Therefore, it suffices to show that for any causal policy, $h_{C} = λ^{N} H_{C}$ (i.e., the hit rate is the total rate “thinned” by the hit probability). This is a natural property but nontrivial because the thinning is not independent of the arrival process.

From the superposition properties of stationary point processes (Brémaud 2020, example 7.2.11), under Assumption 1, the aggregated process is simple, and for any event A, we must have

P_{Φ}^{0} (A) = \sum_{i = 1}^{N} \frac{λ_{i}}{λ^{N}} P_{Φ_{i}}^{0} (A) .

(20)

To interpret the above, note that given a point in $Φ$ occurring at time 0, $λ_{i} / λ^{N}$ is the probability that this point comes from process i. Now, apply (20) to the event $A = {ω : Ψ_{C} ({0}) = 1}$ :

H_{C} = P_{Φ}^{0} (Ψ_{C} ({0}) = 1) = \sum_{i = 1}^{N} \frac{λ_{i}}{λ^{N}} P_{Φ_{i}}^{0} (Ψ_{C} ({0}) = 1) .

We now compute $P_{Φ_{i}}^{0} (Ψ_{C} ({0}) = 1)$ as

P_{Φ_{i}}^{0} (Ψ_{C} ({0}) = 1) = P_{Φ_{i}}^{0} (\underset{i}{\cup} {Φ_{i} ({0}) = 1, i \in C (0)}) = P_{Φ_{i}}^{0} (i \in C (0))

because under

P_{Φ_{i}}^{0}

P_{Φ_{i}}^{0} ({Φ_{i} ({0}) = 1}) = 1

and

P_{Φ_{i}}^{0} ({Φ_{j} ({0}) = 1}) = 0

for

j \neq i

because of

Φ

being simple. We conclude that

H_{C} = \sum_{i = 1}^{N} \frac{λ_{i}}{λ^{N}} P_{Φ_{i}}^{0} (i \in C (0)) .

(21)

Now, return to (17) and its expectation at $t = 0$ :

h_{C} = E [\sum_{i = 1}^{N} {\tilde{φ}}_{i} (0) 1_{{i \in C (0)}}] = \sum_{i = 1}^{N} E [{\tilde{φ}}_{i} (0) 1_{{i \in C (0)}}] .

Because $1_{{i \in C (t)}}$ is F_t predictable for any causal policy, it follows from Papangelou’s formula (Equation (5)) that

E [{\tilde{φ}}_{i} (0) 1_{{i \in C (0)}}] = λ_{i} E_{Φ_{i}}^{0} [1_{{i \in C (0)}}] = λ_{i} P_{Φ_{i}}^{0} (i \in C (0)) .

Summation over i gives together with (21), $h_{C} = λ^{N} H_{C}$ as claimed, and the result follows. □

Thus, Theorem 3 gives a clean formulation of what the optimal policy is: just keep track of the most likely arrivals in the system given the complete past history of the arrival stream. In particular, this generalizes Panigrahy et al. (2022) to general (possibly correlated) superpositions that provided that they satisfy Assumption 1.

3.1. The Case of Independent Request Streams

In the preceding analysis, computing the optimal policy requires knowledge of the stochastic intensities ${{\tilde{φ}}_{i} (t) : i = 1, \dots, N}$ with respect to the aggregated history. This may require the knowledge of the underlying correlations. This is greatly simplified if the request processes are independent.

Consider now a local memory system with capacity C fed by N-independent stationary request processes $Φ_{i}$ with average intensities $λ_{i} > 0$ as before (in decreasing order). Again, let $F_{t}^{(i)}$ be the natural history of the ith process, and let $F_{t}$ be the aggregated history. Again, let $P_{Φ}^{0}$ denote the Palm probability of the superposition process, and let $P_{Φ_{i}}^{0}$ be the Palm probability of the ith process.

The superposition of independent simple processes is simple (Baccelli and Brémaud 2013, property 1.1.1), and besides Equation (20), we also have (Brémaud 2020, example 7.2.8)

P_{Φ_{i}}^{0} (Φ_{1} \in Γ_{1}, \dots, Φ_{N} \in Γ_{N}) = P_{Φ_{i}}^{0} (Φ_{i} \in Γ_{i}) \prod_{j \neq i} P (Φ_{j} \in Γ_{j})

(22)

for any

Γ_{i} \in M^{#} (R)

and

i = 1, \dots, N

The interpretation of Equation (22) is the following; in order to compute Palm probabilities given that the point comes from process i, we must use the Palm probability for process i and the stationary probability for any other $j \neq i$ , and the processes remain independent.

This, in turn, allows us to prove the following property.

Lemma 3.

If $Φ = \sum_{i} Φ_{i}$ is the superposition process, then $F_{t}$ is a history of $Φ$ . Moreover, $φ_{i} (t)$ is a stochastic intensity for process i with respect to the enlarged history $F_{t}$ , and the total stochastic intensity of $Φ$ is simply $φ (t) = \sum_{i = 1}^{N} φ_{i} (t)$ .

Proof.

First, $Φ ((a, b])$ is $F_{t}$ measurable for all $a < b ⩽ t$ because the sum function is measurable and $F_{t}^{(i)} \subset F_{t}$ . Second, $φ_{i} (t)$ is the stochastic intensity of $Φ_{i}$ with respect to the shared history $F_{t}$ because the conditional expectation given $F_{t}^{(i)}$ coincides with the conditional expectation given $F_{t}$ for any random variable independent of $F_{t}^{(j)}$ with $j \neq i$ . This fact is, in turn, a consequence of the independence of the filtrations ${F_{t}^{(i)}}_{i = 1}^{N}$ and that $F_{t}$ is generated by events of the form $\cap_{i = 1}^{N} A_{i}$ with $A_{i} \in F_{t}^{(i)}$ , $i = 1, \dots, N$ . Finally, that $φ (t)$ is the stochastic intensity of $Φ (t)$ with respect to $F_{t}$ follows immediately by linearity of conditional expectation. □

The above lemma guarantees that the stochastic intensity of process i is not altered by superposition with other independent processes; there is no crossinformation between the processes. In particular, Assumption 1 is satisfied, and therefore, we can replace ${\tilde{φ}}_{i} (t)$ by $φ_{i} (t)$ in Theorem 3 to obtain Theorem 4.

Theorem 4

(Optimal Causal Memory Management Policy, Independent Case). Given a local memory system with capacity C fed by independent stationary point processes, ${Φ_{i} : i = 1, \dots, N}$ with individual stochastic intensities $φ_{i} (t)$ . Then, the optimal causal policy $C^{*}$ corresponds to the following.

At any point in time, rank the items in decreasing order of their individual stochastic intensities $φ_{i} (t)$ .
Store in memory the first C objects in the ranking.

Any other causal policy $C$ will satisfy $h_{C} ⩽ h_{C^{*}}$ and $H_{C} ⩽ H_{C^{*}}$ .

The above theorem characterizes the structure of the optimal causal policy for any superposition of independent request processes (i.e., where no additional information is available about the future other than the natural history of the requests and where there is no crossinformation between them). If the processes are correlated among them, this statement will not be true in general, and we will have to keep track of the shared stochastic intensity with respect to the common history ${\tilde{φ}}_{i} (t)$ .

We now analyze some examples where the above policy can be further characterized. The simplest is the Poisson case.

Example 3

(Poisson Arrivals). In case all processes $Φ_{i}$ are Poisson with $λ_{1} ⩾ \dots ⩾ λ_{N}$ , $φ_{i} (t) \equiv λ_{i}$ . By Theorem 3, the optimal policy $C^{*}$ is the static policy that stores the C objects with higher (average) intensities at all times. This is, of course, connected to the total independence property of Poisson processes, and it is also well known in the caching literature (see, e.g., Garetto et al. 2016) under the name independent reference model.

Example 4

(Renewal Arrivals). In the more general class of renewal arrival processes, by Equation (9), $φ_{i} (t) = η_{i} (t - τ_{i}^{-} (t))$ . In this case, the optimal policy can be recast in the following way.

Rank all contents in decreasing order of the current interval hazard rates, $η_{i} (t - τ_{i}^{-} (t))$ .
Store in memory the first C objects in the ranking.

As we can see, the role of the hazard rates is crucial as already identified in Ferragut et al. (2016) for timer-based caching. Different monotonicity assumptions on these hazard rates lead to completely different optimal policies as we shall see in Section 6.

An issue with the optimal policy is that its main performance metric, the hit rate, cannot be computed exactly except in some special cases. We now derive an asymptotic result that characterizes the optimal policy for large-scale systems and allows us to calculate asymptotic performance limits.

4. Large-Scale Analysis of the Optimal Causal Policy

In this section, we will present results concerning the asymptotic behavior of the optimal policy in a large-scale regime, where both the system load and the memory size scale appropriately to infinity. For this purpose, we need to introduce more structure into the problem; in particular, we will assume that requests for different items come from independent processes from a common scale family in the following sense.

Consider a base process with unit intensity $Φ_{0}$ , natural history $F_{t}^{(0)}$ , and stochastic intensity $φ_{0} (t)$ with respect to $F_{t}^{(0)}$ . In particular, $E [φ_{0} (t)] = 1$ . For our asymptotic analysis, we specify our scale family as satisfying the following.

Assumption 2.

The request processes $Φ_{i}$ for items $i \in {1, \dots, N}$ are independent with stochastic intensities

φ_{i} (t) \overset{d}{=} λ_{i} φ_{0} (λ_{i} t),

where

λ_{i}

is the process average request rate. Without loss of generality, we will assume that these rates are decreasing (i.e.,

λ_{1} ⩾ \dots ⩾ λ_{N} > 0

); equivalently, items are ordered in decreasing popularity.

A construction that generates processes $Φ_{i}$ satisfying this assumption is to take independent copies of $Φ_{0}$ and set $Φ_{i} (B) \overset{d}{=} Φ_{0} (λ_{i} B)$ (i.e., rescale the time axis).

It is direct from Definition 6 and Assumption 2 that if $X_{i} = φ_{i} (0)$ is the observed stochastic intensity for process i, then the following scaling relationships hold:

G^{(i)} (x) ≔ P (X_{i} ⩽ x) = P (φ_{0} (0) ⩽ x / λ_{i}) = G (x / λ_{i});

(23a)

G_{0}^{(i)} (x) ≔ P_{Φ_{i}}^{0} (X_{i} ⩽ x) = P_{Φ_{0}}^{0} (φ_{0} (0) ⩽ x / λ_{i}) = G_{0} (x / λ_{i}) .

(23b)

Example 5

(Scaled Renewal Process). A simple example where all of the above assumptions hold is when the base process is renewal with some base distribution $F_{0} (t)$ with a mean of one. In this case, the interarrival distribution of the ith process is

F_{0}^{(i)} (t) = F_{0} (λ_{i} t) .

We can also express the age distribution $F^{(i)} (t)$ and hazard rate function $η^{(i)} (t)$ for the ith request process in terms of the base case $F (t)$ , $η (t)$ :

\begin{array}{l} F^{(i)} (t) & = λ_{i} \int_{0}^{t} (1 - F_{0} (λ_{i} s)) d s = \int_{0}^{λ_{i} t} (1 - F_{0} (u)) d u = F (λ_{i} t); \\ η^{(i)} (t) & = \frac{f_{0}^{(i)} (t)}{1 - F_{0}^{(i)} (t)} = \frac{λ_{i} f_{0} (λ_{i} t)}{1 - F_{0} (λ_{i} t)} = λ_{i} η (λ_{i} t) . \end{array}

4.1. Threshold Characterization of the Optimal Policy

Under Assumption 2, the optimal policy of Theorem 4 can be recast as follows; at any given time, we have a sample of random variables $X_{i}, i = 1, \dots, N$ , each representing the observed stochastic intensity (for the renewal case, hazard rate since the last request) of the ith request process. Because the processes are independent, the $X_{i}$ ’s are independent but nonidentically distributed; in fact, $X_{i} \sim G^{(i)}$ .

According to Theorem 4, an item will be stored in memory if and only if its OSI is one of the C highest. An alternative way to cast this optimal policy is to consider the empirical distribution of the observed stochastic intensities defined by

{\hat{G}}_{N} (x) = \frac{1}{N} \sum_{i = 1}^{N} 1_{{X_{i} ⩽ x}};

(24)

then, object i is locally stored if and only if

{\hat{G}}_{N} (X_{i}) ⩾ \frac{N - C}{N} = 1 - \frac{C}{N}

Introducing the quantile function (inverse of the empirical distribution)

{\hat{Q}}_{N} (p) ≔ \inf {x : {\hat{G}}_{N} (x) ⩾ p}, p \in [0, 1],

(25)

the condition for storing item i in memory at a certain time may be expressed as

X_{i} ⩾ {\hat{θ}}_{N} ≔ {\hat{Q}}_{N} (1 - \frac{C}{N}) .

(26)

The random variable ${\hat{θ}}_{N}$ acts as a threshold in the OSI that determines the items to optimally store in local memory. Equivalently, ${\hat{θ}}_{N} = Y_{N - C}$ , where ${Y_{i} : i = 1, \dots, N}$ are the order statistics of the sample ${X_{i}, i = 1, \dots, N}$ .

We will pursue asymptotic results for a system with a very large catalog size N. If we can find a suitable limit for the empirical distribution ${\hat{G}}_{N} (x)$ as $N \to \infty$ and if the memory size scales linearly with N such that $\frac{C}{N} \to_{N} c$ , then the quantile ${\hat{θ}}_{N}$ should approach a limit (i.e., the large-scale behavior should resemble a policy with a fixed threshold in OSI).

Because the $X_{i}$ ’s are not identically distributed, we cannot use classical Glivenko–Cantelli arguments for empirical distribution convergence. Nevertheless, we will show that a suitable limit arises under Assumption 2 if the system load as defined by the request intensities also scales appropriately with N.

4.2. Scaling of the Request Intensities

We will construct a sequence of systems indexed by N, the number of arrival streams or in other words, items in its catalog. Denote by ${λ_{i}^{(N)}}_{i = 1}^{N}$ the arrival rates of the system of size N, with the above convention that $λ_{1}^{(N)} ⩾ \dots ⩾ λ_{N}^{(N)} > 0$ . We may think of these points as a discrete measure on the axis $λ \in R_{+}$ of possible process intensities. Normalizing this measure to total unit mass, we may write its distribution function

L_{N} (λ) ≔ \frac{1}{N} \sum_{i = 1}^{N} 1_{{λ_{i}^{(N)} ⩽ λ}} .

(27)

The total arrival rate for system of size N can then be expressed as

λ^{N} = \sum_{i} λ_{i}^{(N)} = N \int_{0}^{\infty} λ L_{N} (d λ) .

Our limit theorems will assume that this family of discrete distributions has a limit with $N \to \infty$ .

Assumption 3.

There exists a fixed distribution L with no atoms at $λ = 0$ such that $L_{N} \Rightarrow L$ when $N \to \infty$ . Here, $\Rightarrow$ denotes usual weak convergence of probability distributions.

To gain some intuition on the above condition, we turn to the following important example.

Example 6

(Scaling for Zipf Popularities). A widely used model of popularity among different items is the so-called Zipf distribution with the request rates $λ_{i} \propto i^{- β}$ , where $β ⩾ 0$ is the tail parameter of the Zipf law. Note that $β = 0$ corresponds to a uniform distribution, whereas as for large $β$ , relative popularities decay very fast.

In order to model this in our setting, we can use the following arrival rates for the Nth system:

λ_{i}^{(N)} = {(\frac{N}{i})}^{β}

with

β ⩾ 0

. Under this scaling, the least popular object has intensity 1 for all N. As N grows, larger intensities are included in the mix. Now, for any

λ > 1

, we have

1 - L_{N} (λ) = \frac{1}{N} \sum_{i = 1}^{N} 1_{{{(\frac{N}{i})}^{β} > λ}} = \frac{1}{N} \sum_{i = 1}^{N} 1_{{i < \frac{N}{λ^{1 / β}}}} = \frac{1}{N} ⌊ \frac{N}{λ^{1 / β}} ⌋ \underset{N \to \infty}{\to} λ^{- 1 / β} .

Because the above convergence is point wise and the limit is continuous, we have $L_{N} (λ) \Rightarrow L (λ)$ given by

L (λ) = 1 - λ^{- 1 / β} for λ ⩾ 1 .

(28)

In the limit, the popularities follow a continuous distribution over $[1, \infty)$ : namely, a standard Pareto distribution with tail parameter $1 / β$ .

If $β ⩾ 1$ (i.e., the popularities are light tailed), some objects are extremely more popular than others; in this case, L does not have a finite mean. If instead, $0 < β < 1$ , where popularities are heavy tailed and thus, more homogeneous, L has finite mean $1 / (1 - β)$ . For $β = 0$ , the system degenerates into every object having the same popularity, and thus, L is the step function at $λ = 1$ .

It is also worth observing that the total arrival rate of the Nth system satisfies

λ^{N} = \sum_{i = 1}^{N} λ_{i}^{(N)} = N^{β} \sum_{i = 1}^{N} \frac{1}{i^{β}} ≕ N^{β} S_{N} (β),

where

S_{N} (β)

is the generalized harmonic series partial sum. Using the well-known equivalents for this series, we have that

λ^{N} = {\begin{array}{l} O (N^{β}) & if β > 1, \\ O (N \log N) & if β = 1, \\ O (N) & if β < 1 . \end{array}

In particular, with our scaling, the total arrival rate $λ^{N} \to \infty$ as $N \to \infty$ , albeit at different rates depending on the tail parameter $β$ .

4.3. Asymptotic Behavior of the Optimal Causal Policy

We now return to our family of systems indexed by N with independent request streams coming from a common scale family with base distribution $F_{0}$ (Assumption 2) and incorporate the scaling in Assumption 3 on the arrival rates ${λ_{i}^{(N)}}_{i = 1}^{N}$ . Namely, their empirical distribution $L_{N} (λ)$ in (27) has a weak limit $L (λ)$ .

If we look at each system at a fixed time $t = 0$ , we obtain a sample of the current observed stochastic intensities ${X_{i}^{(N)} : i = 1, \dots, N}$ . Considered collectively that for all $N ⩾ 1$ , these random variables constitute a triangular array; without loss of generality, we may assume that they are all defined in a common probability space $(Ω, F, P)$ .

For each N, we can define the random function ${\hat{G}}_{N}$ by Equation (24). The main result below concerns the asymptotic behavior of these empirical stochastic intensity distributions.

Theorem 5.

Consider a family of local memory systems indexed by N with request processes satisfying Assumption 2 and with intensities ${λ_{i}^{(N)}}_{i = 1}^{N}$ satisfying Assumption 3. Then,

{\hat{G}}_{N} \underset{N \to \infty}{\Rightarrow} G_{\infty} P a . s .,

where the function

G_{\infty}

is given by

G_{\infty} (x) ≔ \int_{0}^{\infty} G (x / λ) L (d λ) .

(29)

Moreover, assume that the memory size of the Nth system satisfies $\frac{C_{N}}{N} \to_{N} c$ with $0 < c ⩽ 1$ .

Then, if $1 - c$ is a continuity point of the quantile function $Q_{\infty} = G_{\infty}^{- 1}$ , the random threshold ${\hat{θ}}_{N}$ defined by Equation (26) converges P almost surely to $θ^{*} = Q_{\infty} (1 - c)$ .

Proof.

The proof begins by computing the expected value of the random function ${\hat{G}}_{N}$ :

{\bar{G}}_{N} (x) ≔ E [{\hat{G}}_{N} (x)] = E [\frac{1}{N} \sum_{i = 1}^{N} 1_{{X_{i}^{(N)} ⩽ x}}] = \frac{1}{N} \sum_{i = 1}^{N} G^{(i)} (x) = \frac{1}{N} \sum_{i = 1}^{N} G (\frac{x}{λ_{i}^{(N)}}),

(30)

where we have applied the scaling in (23a).

{\bar{G}}_{N}

is a (deterministic) distribution function in

x ⩾ 0

. Using the definition of

L_{N}

, we can rewrite the above as

{\bar{G}}_{N} (x) = \int_{0}^{\infty} G (x / λ) L_{N} (d λ) .

We first show that as distribution functions, ${\bar{G}}_{N} \Rightarrow G_{\infty}$ as $N \to \infty$ . To do so, it is convenient to interpret (30) as follows; consider a pair $(X, Λ_{N})$ of independent random variables in $R_{+}$ with respective distribution functions $G (x) = P (X ⩽ x)$ and $L_{N} (λ) = P (Λ_{N} ⩽ λ)$ . Then, ${\bar{G}}_{N} (x)$ is the distribution of the product $Z_{N} = X Λ_{N}$ . Indeed,

P (Z_{N} ⩽ x) = \sum_{n = 1}^{N} P (X Λ_{N} ⩽ x | Λ_{N} = λ_{i}^{(N)}) P (Λ_{N} = λ_{i}^{(N)}) = \frac{1}{N} \sum_{n = 1}^{N} P (X ⩽ x / λ_{i}^{(N)}) = {\bar{G}}_{N} (x) .

Now, consider the limit in the distribution of the pair $(X, Λ_{N})$ . Because of their independence and because $L_{N} \Rightarrow L$ , the limit corresponds to the distribution of $(X, Λ)$ , a vector of independent random variables with marginal distributions $G (x)$ and $L (λ)$ . By continuity of the map $(x, λ) \mapsto x λ$ , we conclude that $Z_{N} = X Λ_{N} \overset{d}{\to} Z = X Λ$ . It remains to compute the distribution function of the latter product:

P (Z ⩽ x) = \int_{R_{+}^{2}} 1_{{ξ λ ⩽ x}} L (d λ) G (d ξ) = \int_{0}^{\infty} L (d λ) \int_{0}^{\infty} 1_{{ξ ⩽ x / λ}} G (d ξ) = \int_{0}^{\infty} G (x / λ) L (d λ) .

We conclude that ${\bar{G}}_{N} \Rightarrow G_{\infty}$ as $N \to \infty$ . Equivalently, we have point-wise convergence ${\bar{G}}_{N} (x) \to G_{\infty} (x)$ at any continuity point $x ⩾ 0$ of $G_{\infty}$ .

To relate the mean function ${\bar{G}}_{N} (x)$ to the stochastic one ${\hat{G}}_{N} (x)$ , we now resort to Shorack (1979, theorem 2.1), a generalization of the classical Glivenko–Cantelli theorem for empirical distributions for random variables that are not identically distributed. The theorem states that in the probability space $(Ω, F, P)$ where the triangular array ${X_{i}^{(N)} : i = 1, \dots, N, N ⩾ 1}$ is defined, we have

{‖ {\hat{G}}_{N} - {\bar{G}}_{N} ‖}_{\infty} \underset{N \to \infty}{\to} 0 P a . s .

(31)

In particular, with $P$ probability 1, $| {\hat{G}}_{N} (x) - {\bar{G}}_{N} (x) | \to 0$ as $N \to \infty$ for all $x ⩾ 0$ . Combining this with the previously obtained point-wise convergence of ${\bar{G}}_{N} (x)$ , we conclude that $P$ almost surely ${\hat{G}}_{N} (x) \to G_{\infty} (x)$ at any continuity point $x ⩾ 0$ of $G_{\infty}$ , and thus, ${\hat{G}}_{N} \Rightarrow G_{\infty}$ .

Finally, convergence of ${\hat{θ}}_{N}$ now follows from the fact that convergence in distribution implies convergence of quantiles; specifically (van der Vaart 1998, lemma 21.2), ${\hat{G}}_{N} \Rightarrow G_{\infty}$ is equivalent to ${\hat{Q}}_{N} (p) \to Q_{\infty} (p)$ for all continuity points $p \in [0, 1]$ of $Q_{\infty} = G_{\infty}^{- 1}$ . See also Proposition A.2 in Appendix A.1. □

We have thus established that the optimal causal local memory policy converges in the large-scale limit to a deterministic threshold policy in the observed stochastic intensities. Specifically, when memory scales as a fraction c of the catalog, a large-scale system should store at any given time the items whose current OSI exceeds a threshold $θ^{*}$ chosen as the $1 - c$ quantile of the limit distribution $G_{\infty}$ in (29).

5. Asymptotic Optimal Performance

In this section, we analyze the performance achieved by the optimal policy in the large-scale limit. As discussed in Section 3, performance in local memory systems is measured by the hit probability $H_{C}$ or equivalently, the hit rate $h_{C} = λ^{N} H_{C}$ .

We will find it more convenient to derive expressions for the complementary miss probability $M_{C} = 1 - H_{C}$ and the miss rate $m_{C} = λ^{N} M_{C} = λ^{N} - h_{C}$ . Referring back to Equation (21), we can write the following expressions for these quantities in a system of size N.³

M_{C}^{(N)} = \sum_{i = 1}^{N} \frac{λ_{i}^{(N)}}{λ^{N}} P_{Φ_{i}}^{0} (i \notin C (0)) .

(32a)

m_{C}^{(N)} = \sum_{i = 1}^{N} λ_{i}^{(N)} P_{Φ_{i}}^{0} (i \notin C (0)) .

(32b)

The key challenge in the evaluation of the above formulas is to compute $P_{Φ_{i}}^{0} (i \notin C (0))$ (i.e., the probability that an incoming arrival does not find its file in local memory). In the optimal policy $C^{*}$ for fixed N, the determining condition is whether the observed stochastic intensity $X_{i}^{(N)}$ of the requested item finds itself among the C largest. To correctly evaluate this probability, however, we must recognize an asymmetry. The competing $X_{j}^{(N)}, j \neq i$ are being evaluated at a time not synchronized with their arrivals and thus, follow independent distributions $G (\cdot / λ_{j}^{(N)})$ ; for the item in question, we must use the distribution $G_{0} (\cdot / λ_{i}^{(N)})$ that applies to synchronized sampling of the stochastic intensity.

For this reason, to analyze the comparison, it is convenient to define the empirical distribution

{\hat{G}}_{N}^{(- i)} (x) = \frac{1}{N - 1} \sum_{j \neq i} 1_{{X_{j}^{(N)} ⩽ x}}

(33)

of the stochastic intensities of nonrequested items and express the miss condition as follows:

\begin{array}{l} i \notin C (0) & \Leftrightarrow \sum_{j \neq i} 1_{{X_{j}^{(N)} > X_{i}^{(N)}}} ⩾ C \\ \Leftrightarrow \sum_{j \neq i} 1_{{X_{j}^{(N)} ⩽ X_{i}^{(N)}}} ⩽ N - 1 - C . \\ \Leftrightarrow {\hat{G}}_{N}^{(- i)} (X_{i}^{(N)}) ⩽ 1 - \frac{C}{N - 1} ≕ p_{N} . \end{array}

The above equivalence assumes that there are no ties in the comparison of the OSIs; to simplify the analysis to follow, we will make this a standing assumption in this section.

Assumption 4.

For each $N ⩾ 1$ and $1 ⩽ i \neq j ⩽ N$ , $P_{Φ}^{0} (X_{i}^{(N)} = X_{j}^{(N)}) = 0$ .

Returning to (32b), we may now express the performance criterion of the optimal policy as

m_{C^{*}}^{(N)} = \sum_{i = 1}^{N} λ_{i}^{(N)} P_{Φ_{i}}^{0} ({\hat{G}}_{N}^{(- i)} (X_{i}^{(N)}) ⩽ p_{N}) = \sum_{i = 1}^{N} λ_{i}^{(N)} P_{Φ_{i}}^{0} (X_{i}^{(N)} ⩽ {\hat{Q}}_{N}^{(- i)} (p_{N})),

(34)

where

{\hat{Q}}_{N}^{(- i)}

is the inverse (quantile) function of

{\hat{G}}_{N}^{(- i)}

We now outline informally the essence of the analysis that follows. Because the empirical distribution ${\hat{G}}_{N}^{(- i)}$ corresponds to $N - 1$ intensities, which are sampled at a nonsynchronized point, its asymptotic behavior under the assumed scaling should follow the conclusions of Theorem 5 (i.e., converge to the distribution $G_{\infty}$ ). Under $C / N \to c$ , we have $p_{N} \to 1 - c$ , so we should have ${\hat{Q}}_{N}^{(- i)} (p_{N}) \to Q_{\infty} (1 - c) = θ^{*}$ as $N \to \infty$ .

This leads us to consider the approximate formula

m_{C^{*}}^{(N)} \approx \sum_{i = 1}^{N} λ_{i}^{(N)} P_{Φ_{i}}^{0} (X_{i}^{(N)} ⩽ θ^{*}) = \sum_{i = 1}^{N} λ_{i}^{(N)} G_{0} (θ^{*} / λ_{i}^{(N)});

in the last equality, we have invoked the distribution of

X_{i}^{(N)}

synchronized with the arrival under the Palm probability

P_{Φ_{i}}^{0}

. Of course, the approximation is not a rigorous step. We have taken the limit in the quantile but not in the rest of Equation (34); nevertheless, it will help us arrive at the right conjecture.

Invoking the distribution of intensities in (27), we may express the approximation as

\frac{m_{C^{*}}^{(N)}}{N} \approx \frac{1}{N} \sum_{i = 1}^{N} λ_{i}^{(N)} G_{0} (θ^{*} / λ_{i}^{(N)}) = \int_{0}^{\infty} λ G_{0} (θ^{*} / λ) L_{N} (d λ) .

(35)

Under Assumption 3, $L_{N} \Rightarrow L$ . The integrand function $λ G_{0} (θ^{*} / λ)$ is bounded by $θ^{*}$ , invoking (7); if it is continuous, the right-hand side will converge to the corresponding integral with the limit distribution $L (λ)$ . This formula for the asymptotic optimal miss rate is the main conjecture that we will prove in Theorem 6 below, making rigorous the preceding approximate reasoning.

We will require some additional technical conditions.

For continuity of $λ G_{0} (θ^{*} / λ)$ , it will suffice to guarantee it at the atoms of the measure $L (d λ)$ (i.e., the discontinuities of $L (λ)$ ); denote this set by $D_{L}$ , and recall by Assumption 3 that $0 \notin D_{L}$ . Analogously, denote by $D_{G_{0}}$ the set of discontinuities of $G_{0}$ , and denote by $D_{L} \cdot D_{G_{0}} ≔ {λ x : λ \in D_{L}, x \in D_{G_{0}}}$ ; we will require for our limit theorem that $θ^{*} \notin D_{L} \cdot D_{G_{0}}$ .
To carry out the limit result for $m_{C^{*}}$ in (34), we will treat separately two cases in regard to the uniform integrability of the family of popularity distributions ${L_{N}}_{N ⩾ 1}$ . The family is uniformly integrable if
$\forall ϵ > 0, \exists K ⩾ 0 such that \sup_{N ⩾ 1} \frac{1}{N} \sum_{λ_{i}^{(N)} ⩾ K} λ_{i}^{(N)} ⩽ ϵ .$

Uniform integrability is a standard assumption (see, e.g., Billingsley 1999, section 1.3) that connects convergence in distribution with convergence of the first moment. In particular, if $L_{N} \Rightarrow L$ and the ${L_{N}}$ ’s are uniformly integrable, then

\int_{0}^{\infty} λ L_{N} (d λ) \underset{N \to \infty}{\to} \int_{0}^{\infty} λ L (d λ) < \infty .

(36)

A partial converse also holds; if $L_{N} \Rightarrow L$ , with first moments satisfying (36), the family must be uniformly integrable.

Example 7.

The Zipf family considered in Example 6 has continuous limit L and thus, satisfies all necessary continuity assumptions. Furthermore, the family $L_{N}$ is uniformly integrable for $β < 1$ ; for $β ⩾ 1$ , the limit distribution has infinite mean.

We are now ready to state our main performance result.

Theorem 6

(Asymptotic Miss Rate—Uniformly Integrable Case). Suppose ${L_{N}}_{N ⩾ 1}$ is uniformly integrable. Let $c \in (0, 1]$ be such that $1 - c$ is a continuity point of $Q_{\infty}$ and that $θ^{*} ≔ Q_{\infty} (1 - c) \notin D_{L} \cdot D_{G_{0}}$ . If $C / N \to c$ as $N \to \infty$ , then

\lim_{N \to \infty} \frac{m_{C^{*}}^{(N)}}{N} = \int_{0}^{\infty} λ G_{0} (\frac{θ^{*}}{λ}) L (d λ) .

(37)

The proof is given in Appendix A.

The result states that for a uniformly integrable scaling of popularity distributions, under some regularity assumptions, the miss rate of the optimal policy scales linearly with N with a proportionality constant given by the integral in (37), where the distribution function $G_{0}$ of the OSI under synchronous sampling appears explicitly. On the other hand, the nonsynchronous distribution G of the OSI also influences the formula because it determines the distribution $G_{\infty}$ in (29), whose quantile is the asymptotic threshold $θ^{*}$ .

Corollary 1

(Asymptotic Miss Probability—Uniformly Integrable Case). Under the same hypothesis as Theorem 6, the optimal miss probability satisfies

\lim_{N \to \infty} M_{C^{*}}^{(N)} = \frac{\int_{0}^{\infty} λ G_{0} (\frac{θ^{*}}{λ}) L (d λ)}{\int_{0}^{\infty} λ L (d λ)} .

(38)

Proof.

Note that

M_{C^{*}}^{(N)} = \frac{m_{C^{*}}^{(N)}}{λ^{N}} = \frac{m_{C^{*}}^{(N)} / N}{λ^{N} / N};

the limit of the numerator is given in Theorem 6. For the denominator, we use (36) because

\frac{λ^{N}}{N} = \int_{0}^{\infty} λ L_{N} (d λ)

, and we have uniform integrability. □

Note regarding Equation (38) that the numerator is bounded by $θ^{*}$ ; indeed, we have $λ G_{0} (θ^{*} / λ) ⩽ θ^{*}$ from Lemma 1, and L has unit mass. As the first moment of this distribution (in the denominator) becomes larger, the miss probability is smaller. This suggests that for L with infinite first moment, the miss probability will be zero. This is indeed true, but we must provide a separate argument because uniform integrability will not hold in this case. Our strategy will be to show that a suboptimal policy (the static one with $C^{s} = {1, \dots, C}$ ) already achieves vanishing asymptotic miss probability.

Theorem 7

(Asymptotic Miss Probability—Nonintegrable Case). Suppose that $\int_{0}^{\infty} λ L (d λ) = \infty$ . Assume that $C / N \to c \in (0, 1]$ as $N \to \infty$ . Then,

M_{C^{*}}^{(N)} ⩽ M_{C^{s}}^{(N)} \underset{N \to \infty}{\to} 0 .

(39)

Proof.

The first inequality is immediate by optimality of $C^{*}$ . So, we focus on the static policy, where misses are certain for $i > C$ ; therefore,

M_{C^{s}}^{(N)} = \sum_{i = C + 1}^{N} \frac{λ_{i}^{(N)}}{λ^{N}} = \frac{\frac{1}{N} \sum_{i = C + 1}^{N} λ_{i}^{(N)}}{λ^{N} / N} = \frac{\int_{0}^{α_{N}} λ L_{N} (d λ)}{\int_{0}^{\infty} λ L_{N} (d λ)},

(40)

where

α_{N} = L_{N}^{- 1} (p_{N})

is the

p_{N}

th quantile of

L_{N}

and

p_{N} = 1 - \frac{C}{N}

Because $L_{N} \Rightarrow L$ and $p_{N} \to 1 - c < 1$ , we have that $A ≔ \sup_{N} α_{N} < \infty$ . Then,

\underset{N \to \infty}{\lim \sup} \int_{0}^{α_{N}} λ L_{N} (d λ) ⩽ \lim_{N \to \infty} \int_{0}^{A} λ L_{N} (d λ) = \int_{0}^{A} λ L (d λ) < \infty .

(41)

For the denominator, we have (Billingsley 1999, theorem 3.4)

\underset{N \to \infty}{\lim \inf} \int_{0}^{\infty} λ L_{N} (d λ) ⩾ \int_{0}^{\infty} λ L (d λ) = \infty .

(42)

Together, (41) and (42) imply (39). □

As a comment on the above results, we note the following. When the distribution of item intensities is not uniformly integrable as, for example, in the Zipf case with $β ⩾ 1$ , items with the largest intensities dominate the rest; thus, it is natural that the static policy would achieve asymptotic optimality and perfect performance regardless of the interarrival distribution. Note, however, that such convergence may be slow; we refer to Section 8 for examples.

In the more interesting case of uniform integrability with less disparate popularities, any causal memory management policy will incur some performance penalty; the minimum miss probability achieved by the optimal policy can be computed explicitly from the interarrival distribution characteristics and the storage fraction c.

6. Threshold Policies and Their Timer-Based Counterparts

In Section 4, we found that the optimal causal policy could be expressed in terms of a threshold on the observed stochastic intensity. This threshold is stochastically varying but converges in our large-scale regime to a deterministic constant.

This motivates us to look at policies that are defined by a deterministic threshold (i.e., where one keeps in memory items with stochastic intensity larger than $θ$ ). An immediate observation is that in such policies, the memory constraint C would not be satisfied in a strong way at all times. Rather, we must replace it with the soft constraint where the average memory occupation is C. We now make this formal.

6.1. Threshold Policies and Memory Occupation

Consider a local memory system with independent request processes ${Φ_{i} : i = 1 \dots N}$ , aggregated natural history $F_{t}$ , and stochastic intensities $φ_{i} (t)$ . Consider the following threshold policy:

C_{θ} (t) = {i : φ_{i} (t) > θ} \subset P (N)

(43)

(i.e., the store in memory of all of the items that have current stochastic intensity above the threshold $θ$ ).

Because the $Φ_{i}$ ’s are stationary, the intensities $φ_{i} (t)$ are also stationary, and thus, we have a stationary and causal policy. The memory occupation of this policy is now the stationary random process:

U_{θ} (t) = # C_{θ} (t) = \sum_{i = 1}^{N} 1_{{φ_{i} (t) > θ}} .

(44)

Its average can be computed as (using $t = 0$ as a sampling point)

E [U_{θ} (0)] = E [\sum_{i = 1}^{N} 1_{{φ_{i} (0) > θ}}] = \sum_{i = 1}^{N} P (φ_{i} (0) > θ) .

(45)

Note that $E [U_{θ} (0)]$ is decreasing from N to zero as $θ$ goes from zero to $\infty$ . In order to have a fair comparison against a fixed memory constraint, define the threshold $θ_{C}$ as

θ_{C} ≔ \inf {θ : \sum_{i = 1}^{N} P (φ_{i} (0) > θ) ⩽ C} .

(46)

Thus, $θ_{C}$ is the $1 - C / N$ quantile of the average of the distributions of the stochastic intensities observed at time 0. From the right continuity of the distribution functions, it is easy to check that $E [U_{θ_{C}} (0)] ⩽ C$ with equality if $1 - C / N$ is a continuity point of the quantile function. We shall assume this henceforth to simplify exposition.

Because the arrival processes are independent, the random variables $φ_{i} (0)$ are independent, and thus, we have the following proposition.

Proposition 1.

Assume that the above quantile is exact in the sense that $E [U_{θ_{C}} (0)] = C$ . Take $C = c N$ with $0 < c ⩽ 1$ . Then, for any $ε > 0$ ,

P (| \frac{U_{θ_{C}} (0)}{N} - c | > ε) ⩽ 2 e^{- 2 N ε^{2}},

and therefore,

U_{θ_{C}} (0) / N \overset{P}{\to} c

N \to \infty

Proof.

The proof follows from the Hoeffding inequality applied to the independent Bernoulli random variables ${1_{{φ_{i} (0) > θ}}}$ . In fact, for any N and $ε > 0$ ,

P (| \frac{U_{θ_{C}} (0)}{N} - c | > ε) = P (| \sum_{i = 1}^{N} 1_{{φ_{i} (0) > θ_{C}}} - E [\sum_{i = 1}^{N} 1_{{φ_{i} (0) > θ_{C}}}] | > N ε) ⩽ 2 e^{- 2 N ε^{2}} . □

From Proposition 1, we conclude that under independent request processes, the memory occupation of the threshold policy will not deviate much from a fixed memory policy for large N. We will use this to our advantage in Section 8.

6.2. Asymptotic Optimality of Threshold Policies

We focus now on the situation studied in Sections 4 and 5, where by Assumption 2, the request processes come from a common scale family. Furthermore, the traffic intensities satisfy Assumption 3.

In that case, $φ_{i} (0) = X_{i}$ , which is the observed stochastic intensity with distribution $G^{(i)} (x)$ . Therefore, $θ_{C}$ is the solution to the equation

\sum_{i = 1}^{N} [1 - G^{(i)} (θ_{C})] = C .

Invoking Equation (23a) for the scale family and rearranging terms, we have

{\bar{G}}_{N} (θ_{C}) = \frac{1}{N} \sum_{i = 1}^{N} G (\frac{θ_{C}}{λ_{i}}) = 1 - \frac{C}{N}

with the notation introduced in (30); invoking the distribution

L_{N}

of traffic intensities

λ_{i}

, we may also write the above equation as

\int_{0}^{\infty} G (\frac{θ_{C}}{λ}) L_{N} (d λ) = 1 - \frac{C}{N} .

(47)

We are now in position to prove the following.

Proposition 2.

Consider a family of local memory systems indexed by N with request processes satisfying Assumptions 2 and 3. Choose the memory size of the Nth system as $C_{N} = c N$ with $0 < c ⩽ 1$ . If there exists a unique solution to $θ^{*}$ satisfying

G_{\infty} (θ^{*}) = 1 - c,

then the sequence of thresholds

θ_{N} ≔ θ_{C_{N}}

defined by (47) satisfies

θ_{N} \to θ^{*}

The essential step of the proof is to show that ${\bar{G}}_{N} \Rightarrow G_{\infty}$ ; this part is identical to the proof of Theorem 5. The result then follows from the convergence of quantiles.

Let us now analyze the asymptotic performance; considering an object i, its miss probability is given by

P_{Φ_{i}}^{0} (i \notin C_{θ_{N}} (0)) = P_{Φ_{i}}^{0} (X_{i} ⩽ θ_{N}) = G_{0}^{(i)} (θ_{N}) .

Therefore, the total miss probability for system N is given by

M_{θ_{N}} = \sum_{i = 1}^{N} \frac{λ_{i}^{(N)}}{λ^{(N)}} P_{Φ_{i}}^{0} (X_{i} ⩽ θ_{N}) = \sum_{i = 1}^{N} \frac{λ_{i}^{(N)}}{λ^{(N)}} G_{0} (θ_{N} / λ_{i}^{(N)}) = \frac{\int_{0}^{\infty} λ G_{0} (θ_{N} / λ) L_{N} (d λ)}{\int_{0}^{\infty} λ L_{N} (d λ)} .

We can then state a result analogous to Theorem 6 and Corollary 1 for the asymptotic performance.

Proposition 3.

Under the conditions of Theorem 6, the asymptotic miss probability satisfies

\lim_{N \to \infty} M_{θ_{N}} = \frac{\int_{0}^{\infty} λ G_{0} (\frac{θ^{*}}{λ}) L (d λ)}{\int_{0}^{\infty} λ L (d λ)} .

The proof is substantially easier than in the case of Theorem 6 because the threshold $θ_{N}$ is deterministic instead of depending on the state of the remaining processes.

Indeed, by Lemma 1 and the fact that $θ_{N} \to θ^{*}$ , we see that the function $λ G_{0} (θ_{N} / λ)$ is uniformly bounded for all N. Taking proper account of continuity as in Theorem 6, the numerator converges as stated. For the denominator, we invoke uniform integrability.

The main conclusion of this analysis is that under the above assumptions, a deterministic threshold policy with a soft memory constraint is asymptotically equivalent to the optimal causal policy derived in Section 3 both in terms of memory usage and in terms of performance.

We shall show now that there is a strong connection between threshold policies and timer-based ones.

6.3. Connection with Timer-Based Policies: Monotone Hazard Rates

To end this section, we would like to highlight a strong connection between threshold policies and timer-based ones in the case where the requests follow renewal processes. Timer-based or time-to-live caching has been a long-standing idea in local memory systems; here, each time a request for item i occurs, a timer is started, and the item is kept in memory up to the timer expiration. When a new request arrives, the timer is reset. TTL policies were first analyzed in terms of point process requests in Fofack et al. (2014), and the optimal timers under stationary requests were obtained in Ferragut et al. (2016).

Consider the policy defined in (43) with deterministic threshold $θ_{N} = θ_{C_{N}}$ from Equation (46). Assume that the incoming processes are renewal and that the interarrival hazard rates are strictly monotonically decreasing over the distribution domain (as in the Pareto parametric example). Assume for simplicity that the threshold $θ_{N}$ lies in the range of the function $η_{i} (\cdot)$ , and define

T_{i}^{(N)} = η_{i}^{- 1} (θ_{N}) .

Then, because $η_{i}$ is decreasing, we have

φ_{i} (t) > θ_{N} \Leftrightarrow t < T_{i}^{(N)};

in panel (b) of Figure 4, we illustrate this condition. Consequently, in the case of monotonically decreasing hazard rates, the threshold policy is exactly equivalent to TTL caching.⁴ This is consistent with the characterization of optimal timers in Ferragut et al. (2016). In addition, from the asymptotic optimality of threshold policies established above, we further conclude here that TTL caching is asymptotically optimal.

Figure 4. Threshold Policies in Terms of Timers
*Notes*. (a) Timer-based caching for decreasing hazard rates. (b) Timer-based prefetching for increasing hazard rates.

An analogous discussion is valid for monotonically increasing hazard rates; here, the likelihood of a subsequent request is smallest immediately after receiving one, and thus, caching is not useful. Instead, it was proposed in Ferragut et al. (2024) to remove the item and prefetch it at an appropriate later time. In fact, this strategy can also be cast as a threshold policy by observing (see Figure 4) that

φ_{i} (t) > θ_{N} \Leftrightarrow t > T_{i}^{(N)},

so the threshold policy removes the object from memory and recalls it after a time

T_{i}^{(N)}

; these timers coincide with the optimal timer prefetching policy from Ferragut et al. (2024). Once again, the asymptotical optimality of threshold policies allows us to further conclude that timer-based prefetching is asymptotically optimal for request processes with increasing hazard rates.

7. Extension to Heterogeneous Object Sizes

The preceding model assumes that objects have equal sizes; they all have the same impact on memory, and the natural performance criterion is the hit probability determined by whether the object can be (fully) recovered from local memory upon request. In practice, items can have different sizes and therefore, unequal memory cost. Also, one can consider in the performance the value of a partial recovery of the object. We now formalize this approach, which was first proposed in Panigrahy et al. (2022), and we show how the previous results can be extended to incorporate this case.

Consider a local memory system fed by a family of request processes ${Φ_{i} : i = 1, \dots, N}$ with intensities $λ_{i}$ , natural histories $F_{t}^{(i)}$ , and stochastic intensities $φ_{i} (t)$ . Again, let $Φ$ be their superposition, and let $F_{t}$ be the aggregated history. Assume that each object i has a size $z_{i} > 0$ and that the system has total memory C.

A causal policy $\tilde{C} (t)$ will now be an $F_{t}$ -predictable stochastic process $\tilde{C} : Ω \times R \to {[0, 1]}^{N}$ , where $\tilde{C} (t)$ represents the fractions of each file stored in the local memory. For feasibility, the policy should satisfy the memory constraint

\tilde{C} (t) = (x_{1}, \dots, x_{n}) \Rightarrow \sum_{i} z_{i} x_{i} ⩽ C .

(48)

Thus, we are allowing fractional caching (i.e., only part of the item to be stored in memory).

Assume that whenever a request for item i arrives at time t, it can recover from the local memory $z_{i} x_{i} (t)$ and derives a utility from this amount of content $U_{i} (z_{i} x_{i})$ , where $U_{i}$ is a concave function. In this scenario, the analogue of the hit process is the following utility process:

ϒ_{\tilde{C}} (B) = \sum_{i = 1}^{N} \int_{B} U_{i} (z_{i} x_{i} (t)) Φ_{i} (d t) .

(49)

The process $ϒ_{\tilde{C}}$ is a measure-valued process that stores the utility derived from the local memory retrievals. For the special case where $U_{i} (z) = z$ , we recover the byte-counting process in Panigrahy et al. (2022) because it keeps track of the amount of data recovered from the local memory in any time interval.

Now, let this local memory system be fed by a superposition of request processes ${Φ_{i} : i = 1, \dots, N}$ satisfying Assumption 1. We have the analogue of Lemma 2.

Lemma 4.

Let $\tilde{C} (t)$ be a causal policy. The stochastic intensity of the utility process (49) with respect to the aggregated history $F_{t}$ is

u_{\tilde{C}} (t) = \sum_{i = 1}^{N} {\tilde{φ}}_{i} (t) U_{i} (z_{i} x_{i} (t)) .

(50)

The proof is similar to the proof of Lemma 2, changing the predictable process $1_{{i \in C (t)}}$ for the fractional caching version $U_{i} (z_{i} x_{i} (t))$ , where $x_{i} (t)$ is predictable by assumption.

Consider now the policy ${\tilde{C}}^{*} (t)$ defined by solving the following optimization problem:

\max_{x_{i} \in [0, 1]} \sum_{i} {\tilde{φ}}_{i} (t) U_{i} (z_{i} x_{i})

(51a)

subject to \sum_{i} z_{i} x_{i} ⩽ C .

(51b)

We then have the analogue of Theorem 3.

Theorem 8.

Given a local memory system with total memory C fed by a family of stationary point processes, ${Φ_{i} : i = 1, \dots, N}$ with aggregated natural history $F_{t}$ , satisfying Assumption 1. Assume that the item sizes are ${z_{i} : i = 1, \dots, N}$ and that the system may store fractional items. Then, for any causal policy $\tilde{C} (t)$ , the stationary utility rate $u_{\tilde{C}}$ satisfies

u_{\tilde{C}} = E [u_{\tilde{C}} (0)] ⩽ E [u_{{\tilde{C}}^{*}} (0)] = u_{{\tilde{C}}^{*}}

with

u_{C} (t)

as in (50) and

{\tilde{C}}^{*}

as defined in (51).

The proof follows along the same lines of Theorem 3 by observing that the policy ${\tilde{C}}^{*}$ satisfies Equations (51) at all times.

Example 8

(Maximizing the Byte Count). If we take $U_{i} (z) = z$ for all i, then we recover the byte-counting process of Panigrahy et al. (2022). In that case, the optimization problem defining ${\tilde{C}}^{*}$ becomes a linear program, and its solution can be given explicitly in terms of the stochastic intensities ${\tilde{φ}}_{i} (t)$ as follows.

Rank the contents in decreasing order of ${\tilde{φ}}_{i} (t)$ .
Choose the first K objects satisfying
$\sum_{k = 1}^{K - 1} z_{i_{k}} < C and \sum_{k = 1}^{K} z_{i_{k}} > C,$
where $i_{k}$ denotes the ordering permutation of the first step.
Take $x_{i_{k}} (t) = 1$ for $k = 1, \dots, K - 1$ and $x_{i_{K}} (t) = \frac{C - \sum_{i = 1}^{K - 1} z_{i_{k}}}{z_{i_{K}}}$ .

In other words, at all times, store the most likely upcoming requests as dictated by the stochastic intensity up to filling up your memory.

Note that the policy in Example 8 ranks the objects again in terms of the stochastic intensity independently of the object size. In particular, the optimal policy can be cast as a dynamic threshold policy, where the threshold is the stochastic intensity of the Kth object (i.e., the one that completes the memory allocation). Because of this fact, we think that the asymptotic results of Sections 5 and 6 can also be extended to this case under the appropriate scaling assumptions. This topic will be pursued in future work.

8. Parametric Examples and Simulations

In this section, we provide some interesting parametric examples to highlight the power of the results, in particular enabling us to compute sharp estimates for the performance of a local memory system. We must specify the interarrival distribution of our scale family of request processes and the distribution of popularities. We begin with the Pareto–Zipf combination, which was already introduced in examples above. We then analyze a further example with increasing hazard rates.

8.1. Pareto Interarrival Times and Zipf Popularities

Consider first the Pareto interrequest times introduced in Section 2.3 with tail parameter $α$ . As we already mentioned, this family represents bursty traffic with decreasing hazard rates. For the base (unit intensity) process, we have the following distribution for (nonsynchronized) observed stochastic intensity:

G (x) = {(\frac{α - 1}{α} x)}^{α - 1} for 0 ⩽ x ⩽ \frac{α}{α - 1}; G (x) = 1 for x > \frac{α}{α - 1} .

Combine it with the Zipf popularities with tail parameter $β$ introduced in Example 6 with limit distribution

L (λ) = 1 - λ^{- 1 / β} for λ ⩾ 1 .

By appropriately integrating Equation (29), we can obtain the asymptotic distribution:

G_{\infty} (x) = {\begin{array}{l} (1 - c_{α, β}) {(\frac{α - 1}{α} x)}^{α - 1} & x ⩽ \frac{α}{α - 1}; \\ 1 - c_{α, β} {(\frac{α - 1}{α} x)}^{- 1 / β} & x > \frac{α}{α - 1}; \end{array} where c_{α, β} ≔ \frac{(α - 1) β}{(α - 1) β + 1},

(52)

which is a relevant parameter in the following discussion. Note that

G_{\infty} (x)

is continuous and takes the value

1 - c_{α, β}

x = \frac{α}{α - 1}

. A depiction of

G_{\infty} (x)

is given in Figure 5.

Figure 5. Pareto Example with Zipf Popularities, $α = 2$ , and $β = 1$
*Notes*. (a) Limit distribution G_∞. (b) Threshold fixed point.

Because $G_{\infty}$ is strictly increasing, we can now solve for the unique asymptotic threshold $θ^{*}$ as a function of the memory size c. Imposing $G_{\infty} (θ^{*}) = 1 - c$ , we get the following result:

θ^{*} = {\begin{array}{l} (\frac{α}{α - 1}) {(\frac{c_{α, β}}{c})}^{β} & c ⩽ c_{α, β}, \\ (\frac{α}{α - 1}) {(\frac{1 - c}{1 - c_{α, β}})}^{\frac{1}{α - 1}} & c ⩾ c_{α, β} . \end{array}

In the case $β < 1$ , it is easy to see that the measures ${L_{N}}$ are uniformly integrable, and thus, we can compute the miss rate estimate from (37):

\int_{0}^{\infty} λ G_{0} (x / λ) L (d λ) = {\begin{array}{l} (1 - c_{α, β}) {(\frac{α - 1}{α} x)}^{α} & x ⩽ \frac{α}{α - 1}; \\ \frac{1}{1 - β} [1 - α β (1 - c_{α, β}) {(\frac{α - 1}{α} x)}^{1 - 1 / β}] & x ⩾ \frac{α}{α - 1} . \end{array}

Substituting the appropriate threshold and noting that $\int_{0}^{\infty} λ L (d λ) = \frac{1}{1 - β}$ , we reach the following result for the asymptotic miss probability:

M = {\begin{array}{l} 1 - α β (1 - c_{α, β}) {(\frac{c}{c_{α, β}})}^{1 - β} & c ⩽ c_{α, β}; \\ (1 - β) {(1 - c_{α, β})}^{- \frac{1}{α - 1}} {(1 - c)}^{\frac{α}{α - 1}} & c ⩾ c_{α, β} . \end{array}

(53)

In Figure 6, we plot the asymptotic miss probability of Equation (53) for fixed $α$ and c as a function of the popularity tail parameter $β$ . We also compare it with the asymptotic miss probability of the static policy, which is derived in Ferragut et al. (2016) and is given by $1 - c^{1 - β}$ for $0 ⩽ β ⩽ 1$ . The optimal policy has the most advantage when popularities are more homogeneous; in particular, as $β \to 0$ , we have $M \to {(1 - c)}^{\frac{α}{α - 1}} < 1 - c$ for any $α > 1$ . The gain is larger as $α$ decreases; the optimal policy capitalizes on the burstiness of the incoming traffic captured by the hazard rates.

Figure 6. Asymptotic Miss Probability for Zipf Popularities with Varying Parameter $β$
*Notes*. Pareto interrequest times with $α = 2$ and $c = 0.1$ . The static policy was added for comparison.

A second observation is that convergence of the empirical distribution of the OSIs is very fast. In Figure 7, we plot a simulated sample in the steady state of ${\hat{G}}_{N} (x)$ for $N = 100$ items for $α = 2$ and $β = 0.5$ as an example. We see the convergence to $G_{\infty} (x)$ as established in Theorem 5.

Figure 7. The Empirical Distribution of the Observed Hazard Rates for $N = 100$ and Its Limit $G_{\infty} (x)$
*Note*. Pareto requests with $α = 2$ and Zipf popularities with $β = 0.5$ .

Moreover, because convergence to $G_{\infty}$ is fast, convergence of the random threshold ${\hat{θ}}_{N} ≔ {\hat{Q}}_{N} (1 - c) \to θ^{*}$ is also fast. We plot panels (a) and (b) of Figure 8 two examples with the same parameters as above for N = 1,000 and 10,000, respectively. The threshold process $θ_{N} (t)$ is approximately constant for large N around the value $θ^{*}$ .

Figure 8. Threshold Evolution and Large-Scale Limit for Pareto Requests $α = 2$ , $β = 0.5$ , and $c = 0.1$
*Notes*. (a) Catalog size N = 1,000. (b) Catalog size N = 10,000.

This last observation is crucial for developing finite N approximations for the performance. Because of the fact that when $β \to 1$ , the family of distributions ${L_{N}}$ approaches nonuniform integrability, convergence of the miss rate estimate $m_{C^{*}}^{(N)}$ , the total rate $λ^{(N)}$ , and the miss probability $M_{C^{*}}^{(N)}$ is slow around this value as depicted in the simulations shown in Figure 9.

Figure 9. Miss Probability of the Optimal Policy for Pareto Requests, $α = 2$ , $c = 0.1$ , and Varying $β$
*Notes*. The solid curve represents the asymptotic result (53). Dots represent simulation results for different values of N. The dotted lines represent the finite N approximation (54).

In order to compute finite N approximations, we use the intuition developed in Equation (35); that is, we compute the asymptotic threshold $θ^{*}$ by solving $G_{\infty} (θ^{*}) = 1 - c$ and plug in this estimate in place of the random threshold ${\hat{θ}}_{N}$ . Then, we estimate $M_{C^{*}}^{(N)}$ as

M_{C^{*}}^{(N)} \approx \frac{\int_{0}^{\infty} λ G_{0} (θ^{*} / λ) L_{N} (d λ)}{\int_{0}^{\infty} λ L_{N} (d λ)} = \frac{1}{λ^{(N)}} \sum_{i = 1}^{N} λ_{i}^{(N)} G_{0} (θ^{*} / λ_{i}^{(N)}) .

(54)

This turns out to be equivalent to approximating the optimal performance for that of the fixed threshold policy discussed in Section 6, and it is numerically easy to compute, even for distributions that do not have closed-form expressions as in this case. Therefore, this procedure provides sharp estimates of the maximum achievable performance for any policy and finite N. An example of this approximation is depicted in the dashed lines in Figure 9.

8.2. Erlang Interrequest Times and Zipf Popularities

We now turn our attention to a different example, where the hazard rate of the interrequest distribution is increasing. This leads to a totally different behavior; for instance, caching is not a good idea in this setting because upon receiving a request, a subsequent request becomes less likely. It is actually preferable to remove the content from memory and prefetch it again closer to request time (Ferragut et al. 2024).

Of course, the optimal memory management policy designed in Section 5 does this automatically by keeping track of the hazard rates, and all of the asymptotic results derived for it are still valid in this case.

To illustrate this behavior, we choose the interrequest times to be distributed as an Erlang distribution with k stages and appropriate means. For $k = 1$ , this corresponds to the Poisson process because interrequest times become exponential. As k grows, the process approaches deterministic interrequest times, and thus, the traffic pattern becomes more regular.

To be precise, let the base interrequest distribution be Erlang with k stages and mean of one (so $λ = k$ ). This corresponds to the following:

f_{0} (t) = \frac{1}{(k - 1)!} k^{k} t^{k - 1} e^{- k t}, F_{0} (t) = 1 - \sum_{j = 0}^{k - 1} \frac{1}{j!} {(k t)}^{j} e^{- k t} t ⩾ 0 .

This yields the following nice formula for the hazard rate function:

η (t) = k \frac{\frac{{(k t)}^{k - 1}}{(k - 1)!}}{\sum_{j = 0}^{k - 1} \frac{1}{j!} {(k t)}^{j}} = k B (k t, k - 1),

where

B (A, C)

is the classical Erlang-B formula for the blocking probability of telephone systems. For

k > 1

, this becomes strictly increasing in t,

η (0) = 0

, and

η (t) ↑ k

when

t \to \infty

We do not have analytical expressions for the age distribution or for the observed hazard rates, so $G_{\infty}$ must be estimated numerically. We do so for the Zipf popularity distribution limit $L (λ)$ introduced above. We plot the resulting distribution in Figure 10 together with a simulation of the empirical version ${\hat{G}}_{N} (x)$ for N = 1,000, showing good convergence. Also, in Figure 10, we show the threshold ${\hat{θ}}_{N}$ evolution over time for the optimal policy, showing convergence to the numerically computed limit $θ^{*}$ .

Figure 10. Observed Hazard Rates and Threshold Evolution for an Example with Erlang Requests with $k = 4$ Stages and Zipf Popularities with $β = 0.5$ and $c = 0.1$
*Notes*. (a) Empirical distribution of the observed hazard rates for N = 1,000 and its limit G_∞(x). (b) Threshold process evolution and predicted limit θ* numerically computed.

Finally, in Figure 11, we plot the asymptotic miss probability of the optimal policy numerically computed from Equation (38) as well as a simulation of the optimal policy for N = 1,000 as a function of parameter $β$ . As discussed above, convergence to the performance limit is slow when $β$ approaches one, so we compute the performance estimate from (54), which also corresponds to the optimal prefetching policy from Ferragut et al. (2024), showing good fit. As a comparison, the classical LRU policy is also simulated for the traffic pattern. As we discussed above, classical caching performs badly because of the regularity of the request process.

Figure 11. Miss Probability of the Optimal Policy for Erlang Requests with $k = 4$ Stages, $c = 0.1$ , and Varying $β$
*Notes*. The solid curve represents the asymptotic result. Squares represent simulation results (N = 1,000), and the numerical approximation from (54) is shown by the dashed line. For comparison, LRU simulations are also shown.

8.3. Practical Implementation

Although our paper is focused on theoretical results for the optimal management of local memory systems, it is important to discuss possible practical implementations of the optimal policy. From Theorem 3, we know that the key point to implement the optimal policy is correctly estimating the stochastic intensities of the underlying request processes, a well-known hard problem in statistics (Daley and Vere-Jones 2003, chapter 7.5) with few results.

In the case where the incoming processes are independent and renewal processes, the problem boils down to estimating the hazard rate function of each stream based on previously recorded requests. Hazard rate estimation from life tables and for discrete random variables is a well-studied issue, with estimators such as the Kaplan–Meier estimator for the hazard rate and the Nelson–Aalen estimator for the cumulative hazard rate, respectively (Colosimo et al. 2002). In the continuous setting, some sort of kernel smoothing must be done (Tanner and Wong 1983) because by definition, the hazard rate depends on the underlying density of the distribution, so the problem is completely related to kernel density estimation.

Nevertheless, in our case, we can propose a practical algorithm along the lines of the Nelson–Aalen estimator. We first discuss this for a given item i, so we drop the dependence on i to ease the notation. Begin by observing that the hazard rate is given by

η (t) = - \frac{\partial}{\partial t} \log (1 - F_{0} (t)),

where

F_{0}

is the interarrival distribution. In particular,

- \log (1 - F_{0} (t))

is the cumulative hazard rate function. The algorithm will proceed as follows.

Keep track of the observed interrequest times ${{\tilde{τ}}_{k} : k = 1, \dots, M}$ up to time t.
Construct the ordered sample ${{\tilde{τ}}_{k}^{*} : k = 1, \dots, M}$ .
Measure the current elapsed time because the last arrival $t_{0} ≔ t - τ^{-} (t)$ .
Find k such that
${\tilde{τ}}_{k - 1}^{*} ⩽ t_{0} < {\tilde{τ}}_{k}^{*}$
(i.e., the interval in the sample that contains the current elapsed time).
Estimate the hazard rate at elapsed time $t_{0}$ as the slope of the empirical cumulative hazard rate between times ${\tilde{τ}}_{k - 1}^{*}$ and ${\tilde{τ}}_{k}^{*}$ , which can be directly computed as
$\hat{η} (t_{0}) = \frac{\log (1 + \frac{1}{N - k})}{{\tilde{τ}}_{k}^{*} - {\tilde{τ}}_{k - 1}^{*}} .$

Note that the computational complexity of this algorithm is high because we must keep track of a large-enough sample of interarrival times (something that is difficult to achieve for less popular items), and we must do so for each item in the catalog. We must also keep these interarrival times sorted and find the correct interval to evaluate the slope dynamically as time elapses.

This is why more computationally efficient algorithms are needed. In particular, the classical and simple LRU algorithm seems well suited but only in the case of decreasing hazard rates or bursty traffic. For more regular traffic, like in the increasing hazard rate case, the problem is still open.

9. Conclusions

This paper analyzes the optimal management of local memory systems with tools of stationary point processes. As a first contribution, we provided a rigorous setting for this problem, and under suitable assumptions, we characterized the optimal policy (in the sense of maximizing hit rates) to store in memory at every moment the items corresponding to the highest stochastic intensities.

From this starting point, for the case of independent request streams, we analyzed the limiting behavior of the optimal policy as the catalog size $N \to \infty$ when a fixed fraction c of items can be stored. Assuming that item request processes come from a common scale family with intensities that have a limit in distribution, we proved that the optimal policy amounts to comparing the stochastic intensity (observed hazard rate) of the process with a fixed threshold defined by the $1 - c$ quantile of a certain limiting distribution function. We further characterized the asymptotic performance (miss probability) of this optimal policy.

We also analyzed optimal threshold policies for the finite N case that satisfy the memory constraint in the mean. We showed that they have the same limiting behavior as the optimal policy. Moreover, we found that for renewal traffic and monotonic hazard rates in the interrequest distribution, these are equivalent to timer-based policies, where caching or prefetching times are determined by a hazard rate threshold. We also showed that some of the optimality results presented extend quite nicely to the case of heterogeneous object sizes and fractional caching.

Finally, we presented two detailed examples of our results for the standard Zipf model of item popularities and two cases for the interrequest process. For the bursty, decreasing hazard rate case of Pareto interarrival times, we obtain closed-form expressions for the optimal asymptotic threshold and the corresponding performance. We also provide sharp estimates of the optimal performance for finite N validated by detailed stochastic simulations. For the regular, increasing hazard rate case of Erlang interarrival times, closed-form expressions are not available, but we still carry out a numerical validation of the asymptotic threshold and performance. We also use this example to exhibit the significant superiority of the optimal policy in comparison with the popular LRU policy in contrast with the bursty case where LRU has good behavior.

For future research, we may explore the application of our machinery to characterize optimality for other kinds of request traffic: Markov-modulated Poisson processes to account for time variations, more general mixtures of traffic from different sources, and possibly, heterogeneity in item sizes as well as different popularity distributions.

Also, note that as presented, the optimal policy serves as a fundamental limit on the achievable performance but not necessarily a practical algorithm because it requires knowledge of the relevant item intensities and hazard rate distributions. This poses the question of possibly learning this information from the actual request data and alternatively, designing an eviction policy that does not make explicit use of these quantities but approximates optimal performance in practice; to some extent, LRU achieves this in the bursty case, but a counterpart for the increasing hazard rate is currently open.

Appendix A. Proof of Asymptotic Performance

In this section, we will provide a proof for our main result on asymptotic performance of the optimal policy for the uniformly integrable case. Referring back to Section 4, we begin by restating Equation (34):

m_{C^{*}}^{(N)} = \sum_{i = 1}^{N} λ_{i}^{(N)} P_{Φ_{i}}^{0} ({\hat{G}}_{N}^{(- i)} (X_{i}^{(N)}) ⩽ p_{N}) = \sum_{i = 1}^{N} λ_{i}^{(N)} P_{Φ_{i}}^{0} (X_{i}^{(N)} ⩽ {\hat{Q}}_{N}^{(- i)} (p_{N})),

(A.1)

where

p_{N} = 1 - \frac{C}{N - 1}

{\hat{G}}_{N}^{(- i)}

was defined in (33) to be the empirical distribution of the OSIs of items different from i, and

{\hat{Q}}_{N}^{(- i)}

is its corresponding quantile function.

To compute each term on the right, we must invoke the distribution of $X_{i}^{(N)}$ under the Palm probability; it would be more convenient if the right-hand side of the inequality does not depend on i. For this purpose, we will bound the quantiles obtained from ${\hat{G}}_{N}^{(- i)}$ with those of ${\hat{G}}_{N}$ defined in (24), which include all items; as usual, ${\hat{Q}}_{N}$ is the corresponding quantile function. The key observation is that the contribution of the ith item to the empirical distribution ${\hat{G}}_{N}$ of all OSIs is at most $\frac{1}{N}$ and therefore, negligible in the limit as N tends to infinity. More precisely, we have Lemma A.1.

Lemma A.1.

$\max_{1 ⩽ i ⩽ N} {‖ {\hat{G}}_{N} - {\hat{G}}_{N}^{(- i)} ‖}_{\infty} ⩽ \frac{1}{N} .$

Proof.

Recalling the definitions (24) and (33), we can write

\begin{array}{l} N {\hat{G}}_{N} (x) - (N - 1) {\hat{G}}_{N}^{(- i)} (x) & = \sum_{j = 1}^{N} 1_{{X_{j}^{(N)} ⩽ x}} - \sum_{j \neq i} 1_{{X_{j}^{(N)} ⩽ x}} = 1_{{X_{i}^{(N)} ⩽ x}}; \\ \Rightarrow N ({\hat{G}}_{N} (x) - {\hat{G}}_{N}^{(- i)} (x)) & = 1_{{X_{i}^{(N)} ⩽ x}} - {\hat{G}}_{N}^{(- i)} (x) . \end{array}

In the preceding identity, the right-hand side is the difference of two numbers, both in [0, 1]; therefore, $| {\hat{G}}_{N} (x) - {\hat{G}}_{N}^{(- i)} (x) | ⩽ \frac{1}{N}$ . This holds for every x and for each i. □

We will now make use of this lemma to bound the distribution of ${\hat{G}}_{N}^{(- i)} (X_{i}^{(N)})$ as required for (A.1) with that of ${\hat{G}}_{N} (X_{i}^{(N)})$ . However, this change brings a new difficulty because ${\hat{G}}_{N} (\cdot)$ and $X_{i}^{(N)}$ would not be independent. For this reason, to calculate the distribution, we will use a coupling argument and consider independent versions of the corresponding random variables.

Proposition A.1

(Miss Rate Bounds for Fixed N). Let $p_{N}^{\pm} = p_{N} \pm \frac{1}{N}$ , where $p_{N} = 1 - \frac{C}{N - 1}$ . Let $Θ_{N}^{\pm}$ be the probability distribution function of the random variable ${\hat{θ}}_{N}^{\pm} ≔ {\hat{Q}}_{N} (p_{N}^{\pm})$ under the probability $P$ . Then, the miss rate is bounded from above and below by

\int_{0}^{\infty} \int_{0}^{\infty} λ G_{0} (\frac{θ}{λ}) Θ_{N}^{-} (d θ) L_{N} (d λ) ⩽ \frac{m_{C^{*}}^{(N)}}{N} ⩽ \int_{0}^{\infty} \int_{0}^{\infty} λ G_{0} (\frac{θ}{λ}) Θ_{N}^{+} (d θ) L_{N} (d λ) .

(A.2)

Proof.

The proof is based on a coupling argument where we consider independent copies of the OSIs. The distributions of these copies mimic the distributions of the OSIs at an arbitrary time $t = 0$ (that is, under the probability measure $P$ ) and upon an arrival at time $t = 0$ (that is, under the Palm distribution $P_{Φ}^{0}$ ). More precisely, consider two independent, row-independent triangular arrays

{Y_{i}^{N} : N ⩾ 1, 1 ⩽ i ⩽ N} and {Z_{i}^{N} : N ⩾ 1, 1 ⩽ i ⩽ N}

defined on a common probability space

(\tilde{Ω}, \tilde{F}, \tilde{P})

with distributions

Y_{i}^{N} \sim G (\cdot / λ_{i}^{N}) and Z_{i}^{N} \sim G_{0} (\cdot / λ_{i}^{N}) for all 1 ⩽ i ⩽ N and N ⩾ 1 .

Let

{\hat{H}}_{N} (x) = \frac{1}{N} \sum_{j = 1}^{N} 1_{{Y_{j}^{N} ⩽ x}} and {\hat{H}}_{N}^{(- i)} (x) = \frac{1}{N - 1} \sum_{j \neq i} 1_{{Y_{j}^{N} ⩽ x}} .

Analogously to Lemma A.1, we have

\max_{1 ⩽ i ⩽ N} {‖ {\hat{H}}_{N}^{(- i)} - {\hat{H}}_{N} ‖}_{\infty} ⩽ \frac{1}{N} .

(A.3)

For each $N ⩾ 1$ and $1 ⩽ i ⩽ N$ , the random variables

Y_{1}^{N}, \dots, Y_{i - 1}^{N}, Z_{i}^{N}, Y_{i + 1}^{N}, \dots, Y_{N}^{N}

are independent, so if we let

U_{i}^{N} = {\hat{H}}_{N}^{(- i)} (Z_{i}^{N})

, its distribution under

\tilde{P}

will coincide with that of

{\hat{G}}_{N}^{(- i)} (X_{i}^{(N)})

under the Palm probability

P_{Φ_{i}}^{0}

: that is,

P_{Φ_{i}}^{0} ({\hat{G}}_{N}^{(- i)} (X_{i}^{(N)}) ⩽ p_{N}) = \tilde{P} (U_{i}^{N} ⩽ p_{N}) .

(A.4)

To bound the probability on the right-hand side, observe that from (A.3), we have

{{\hat{H}}_{N} (Z_{i}^{N}) ⩽ p_{N}^{-}} \subset {U_{i}^{N} ⩽ p_{N}} \subset {{\hat{H}}_{N} (Z_{i}^{N}) ⩽ p_{N}^{+}};

(A.5)

the proof proceeds by computing the probabilities of the extreme sets in (A.5), noting that under the constructed

\tilde{P}

Z_{i}^{N}

is independent of

{\hat{H}}_{N}

(a function of

Y_{1}^{N}, \dots Y_{N}^{N}

Focusing momentarily on the upper bound, we can write

\tilde{P} ({\hat{H}}_{N} (Z_{i}^{N}) ⩽ p_{N}^{+}) = \tilde{P} (Z_{i}^{N} ⩽ {\hat{Q}}_{N}^{H} (p_{N}^{+})),

(A.6)

where

{\hat{Q}}_{N}^{H}

denotes the quantile function of

{\hat{H}}_{N}

; by construction, the distribution of

{\hat{Q}}_{N}^{H} (p_{N}^{+})

under

\tilde{P}

coincides with that of

{\hat{θ}}_{N}^{+} = {\hat{Q}}_{N} (p_{N}^{+})

under

P

, which by hypothesis, is denoted by

Θ_{N}^{+} (θ)

. Because

Z_{i}^{N}

is independent and has distribution

G_{0} (\cdot / λ_{i}^{N})

, we can compute the right-hand side of (A.6) to be

\int_{0}^{\infty} \tilde{P} (Z_{i}^{N} ⩽ θ) Θ_{N}^{+} (d θ) = \int_{0}^{\infty} G_{0} (\frac{θ}{λ_{i}^{N}}) Θ_{N}^{+} (d θ) .

Multiplying by $λ_{i}^{N}$ and averaging over I, we obtain from (A.5)

\begin{array}{l} \frac{1}{N} \sum_{i = 1}^{n} λ_{i}^{N} \tilde{P} (U_{i}^{N} ⩽ p_{N}) & ⩽ \frac{1}{N} \sum_{i = 1}^{n} \tilde{P} ({\hat{H}}_{N} (Z_{i}^{N}) ⩽ p_{N}^{+}) \\ = \frac{1}{N} \sum_{i = 1}^{n} \int_{0}^{\infty} λ_{i}^{N} G_{0} (\frac{θ}{λ_{i}^{N}}) Θ_{N}^{+} (d θ) = \int_{0}^{\infty} \int_{0}^{\infty} λ G_{0} (\frac{θ}{λ}) Θ_{N}^{+} (d θ) L_{N} (d λ), \end{array}

where the last step invokes the definition of

L_{N}

. An analogous calculation provides the lower bound

\int_{0}^{\infty} \int_{0}^{\infty} λ G_{0} (\frac{θ}{λ}) Θ_{N}^{-} (d θ) L_{N} (d λ) ⩽ \frac{1}{N} \sum_{i = 1}^{n} λ_{i}^{N} \tilde{P} (U_{i}^{N} ⩽ p_{N}) .

Invoking (A.4) and Expression (A.1) for the miss rate, we conclude the proof. □

A.1. Convergence of Quantiles

Our next step is to compute the limit as $N \to \infty$ under suitable conditions of the lower and upper bounds in (A.2). We will use the convergence $L_{N} \Rightarrow L$ of Assumption 3, but we also need to address the convergence of $Θ_{N}^{\pm}$ and the distribution of quantiles for the random function ${\hat{G}}_{N}$ .

Recall that by Theorem 5, we have ${\hat{G}}_{N} \Rightarrow G_{\infty}$ with $P$ probability 1, where $G_{\infty}$ is defined by (29). A standard result that we have already invoked (e.g., van der Vaart 1998, lemma 21.2) is that convergence in distribution implies the convergence of quantiles at a fixed point p provided that the limit quantile function is continuous at p. We will need a slight generalization of this property for the case where quantiles of the sequence are evaluated at a variable point $p_{N}$ convergent to p. This is stated and proven next.

Proposition A.2

(Convergence of Quantiles). Let ${F_{N}}_{N ⩾ 1}$ be a sequence of cumulative distribution functions such that $F_{N} \Rightarrow F$ for some distribution F. Let $Q_{N}$ and Q be the quantile functions of $F_{N}$ and F, respectively. Let $p \in [0, 1]$ be a continuity point of Q and ${p_{N}}_{N ⩾ 1}$ be a sequence in [0, 1] such that $p_{N} \to p$ . Then, $Q_{N} (p_{N}) \to Q (p)$ .

Proof.

In the proof, we use the characterization of weak convergence in terms of the Lévy distance

d_{L} (F, G) ≔ \inf {ϵ > 0 ∣ F (x - ϵ) - ϵ ⩽ G (x) ⩽ F (x + ϵ) + ϵ, \forall x \in R}

between cdfs

F, G : R \to [0, 1]

. Convergence in the Lévy distance

d_{L} (F_{N}, F) \to 0

is equivalent to weak convergence

F_{N} \Rightarrow F

Let $ϵ > 0$ . By continuity of Q at p, there exists $δ > 0$ such that $| q - p | < δ$ implies $| Q (q) - Q (p) | < ϵ / 2$ . Let $δ_{N} ≔ d_{L} (F_{N}, F)$ . Choose $N_{0}$ such that $| p_{N} - p | < δ / 2$ and $δ_{N} ⩽ \min {ϵ, δ} / 2$ for all $N ⩾ N_{0}$ .

We claim that

Q (p_{N} - δ_{N}) - δ_{N} ⩽ Q_{N} (p_{N}) ⩽ Q (p_{N} + δ_{N}) + δ_{N} .

(A.7)

Indeed, for the left inequality of (A.7), let $x \in R$ be such that $F_{N} (x) ⩾ p_{N}$ . Then,

F (x + δ_{N}) ⩾ F_{N} (x) - δ_{N} ⩾ p_{N} - δ_{N},

which implies

x + δ_{N} ⩾ Q (p_{N} - δ_{N})

. Taking infimum over such x, we get

Q_{N} (p_{N}) ⩾ Q (p_{N} - δ_{N}) - δ_{N}

For the right inequality of (A.7), let $x \in R$ be such that $F (x - δ_{N}) ⩾ p_{N} + δ_{N}$ . Then,

F_{N} (x) ⩾ F (x - δ_{N}) - δ_{N} ⩾ p_{N},

which implies

x ⩾ Q_{N} (p_{N})

. Taking infimum over such x, we get

Q (p_{N} + δ_{N}) + δ_{N} ⩾ Q_{N} (p_{N})

. This proves the claim.

Let $N ⩾ N_{0}$ . Then,

Q (p) - ϵ ⩽ Q (p) - \frac{ϵ}{2} - δ_{N} < Q (p_{N} - δ_{N}) - δ_{N} ⩽ Q_{N} (p_{N}),

and

Q_{N} (p_{N}) ⩽ Q (p_{N} + δ_{N}) + δ_{N} < Q (p) + \frac{ϵ}{2} + δ_{N} ⩽ Q (p) + ϵ .

Thus, $| Q_{N} (p_{N}) - Q (p) | < ϵ$ for all $N ⩾ N_{0}$ . □

A.2. Proof of Theorem 6

We are now in a position to complete the proof of our main result on performance.

We first establish that ${\hat{θ}}_{N}^{\pm} ≔ {\hat{Q}}_{N} (p_{N}^{\pm}) \to_{N \to \infty} θ^{*} ≔ Q_{\infty} (1 - c)$ almost surely in $P$ . This follows by invoking Proposition A.2 at each $ω$ in the set where ${\hat{G}}_{N} \Rightarrow G_{\infty}$ , which has unit probability by Theorem 5; note that $p_{N}^{\pm} \to 1 - c$ , which by hypothesis, is a point of continuity of $Q_{\infty}$ .

Almost sure convergence implies convergence in probability and in distribution. Therefore, the distributions $Θ_{N}^{\pm}$ of the random variables ${\hat{θ}}_{N}^{\pm}$ both satisfy $Θ_{N}^{\pm} \Rightarrow δ_{θ^{*}}$ , a unit mass at the limit threshold. As a consequence, under Assumption 3, we have convergence of the product measures $Θ_{N}^{\pm} \otimes L_{N} \Rightarrow δ_{θ^{*}} \otimes L$ . We will use this fact to show that both bounds in (A.2) converge to the same limit, and it coincides with the right-hand side of (37).

Define for this purpose the function $h : R_{+}^{2} \to R_{+}$ by $h (λ, θ) = λ G_{0} (\frac{θ}{λ})$ . Our next claim is that

(Θ_{N}^{\pm} \otimes L_{N}) h^{- 1} \Rightarrow (δ_{θ^{*}} \otimes L) h^{- 1} .

(A.8)

This claim is established in the mapping theorem (Billingsley 1999, theorem 2.7) provided that h is measurable and that its set of discontinuities $D_{h}$ has zero measure in the limit (i.e., $(δ_{θ^{*}} \otimes L) (D_{h}) = 0$ ).

Because the limit measure in $θ$ is a point mass, we have

(δ_{θ^{*}} \otimes L) (D_{h}) = L ({λ : \frac{θ^{*}}{λ} \in D_{G_{0}}}) .

(A.9)

Also, note that $G_{0}$ being a distribution function, its set of discontinuities is countable. Therefore, the measure on the right of (A.9) can only be positive if there exists $λ_{0} \in D_{L}$ (an atom of the measure L) such that $\frac{θ^{*}}{λ_{0}} \in D_{G_{0}}$ . This would imply that $θ^{*} \in D_{L} \cdot D_{G_{0}}$ and is ruled out by hypothesis. Therefore, the claim holds.

To finish the theorem, we must go from the convergence in distribution given in (A.8) to a first-moment condition. Here is where uniform integrability will come into play. It is easiest to express this via independent random variables ${\hat{θ}}_{N}^{\pm}$ and $Λ_{N}$ with respective distributions $Θ_{N}^{\pm}$ and $L_{N}$ with ${\hat{θ}}_{N}^{\pm} \overset{d}{\to} θ^{*}$ and $Λ_{N} \overset{d}{\to} Λ$ , the latter with distribution L. Condition (A.8) is equivalent to the statement

h (Λ_{N}, {\hat{θ}}_{N}^{\pm}) = Λ_{N} G_{0} (\frac{{\hat{θ}}_{N}^{\pm}}{Λ_{N}}) \overset{d}{\to} Λ G_{0} (\frac{θ^{*}}{Λ}) = h (Λ, θ^{*}) .

Because $G_{0} ⩽ 1$ , we have $h (Λ_{N}, {\hat{θ}}_{N}^{\pm}) ⩽ Λ_{N}$ ; now, ${Λ_{N}}_{N ⩾ 1}$ is uniformly integrable by hypothesis, so the same happens with ${h (Λ_{N}, {\hat{θ}}_{N}^{\pm})}_{N ⩾ 1}$ . Now, invoke Billingsley (1999, theorem 3.5) to obtain

\int_{0}^{\infty} \int_{0}^{\infty} λ G_{0} (\frac{θ}{λ}) L_{N} (d λ) Θ_{N}^{\pm} (d θ) = E [Λ_{N} G_{0} (\frac{{\hat{θ}}_{N}^{\pm}}{Λ_{N}})] \underset{N \to \infty}{\to} E [Λ G_{0} (\frac{θ^{*}}{Λ})] = \int_{0}^{\infty} λ G_{0} (\frac{θ^{*}}{λ}) L (d λ) .

Thus, both upper and lower bounds in (A.2) converge to the given limit, which concludes the proof.

Endnotes

¹ Note that $τ_{0}$ is a proper random variable because ${τ_{0} ⩽ t} = {Φ ((t, 0]) = 0}$ and thus, is measurable for all $t ⩽ 0$ . Similarly, any point $τ_{k}$ is a random variable.

² We specifically avoid using the term cache for our system because in caching, normally only the recently requested objects are stored, and our model aims to generalize these policies.

³ We now make explicit the dependence on the system size N because we will study the limits as $N \to \infty$ .

⁴ If $θ_{N}$ is not in the range of valid hazard rates, the equivalence also holds by allowing zero (or infinite) timers (i.e., items that are never (or always) stored).

References

Baccelli F, Brémaud P (2013) Elements of Queueing Theory (Springer-Verlag, Berlin).Google Scholar
Bahat O, Makowski AM (2005) Measuring consistency in TTL-based caches. Performance Evaluation 62(1):439–455.Google Scholar
Barrera J, Fontbona J (2010) The limiting move-to-front search-cost in law of large numbers asymptotic regimes. Ann. Appl. Probab. 20(2):722–752.Google Scholar
Berger DS, Gland P, Singla S, Ciucu F (2014) Exact analysis of TTL cache networks. Performance Evaluation 79:2–23.Google Scholar
Berger DS, Henningsen S, Ciucu F, Schmitt JB (2015) Maximizing cache hit ratios by variance reduction. ACM SIGMETRICS Performance Evaluation Rev. 43(2):57–59.Google Scholar
Bianchi G, Detti A, Caponi A, Blefari Melazzi N (2013) Check before storing: What is the performance price of content integrity verification in LRU caching? ACM SIGCOMM Comput. Comm. Rev. 43(3):59–67.Google Scholar
Bilal M, Kang SG (2017) A cache management scheme for efficient content eviction and replication in cache networks. IEEE Access 5:1692–1701.Google Scholar
Billingsley P (1999) Convergence of Probability Measures, Wiley Series in Probability and Statistics, 2nd ed. (Wiley, Chichester, UK).Google Scholar
Borst S, Gupta V, Walid A (2010) Distributed caching algorithms for content distribution networks. 2010 Proc. IEEE INFOCOM (IEEE, Piscataway, NJ), 1–9.Google Scholar
Brémaud P (2020) Point Process Calculus in Time and Space (Springer, New York).Google Scholar
Che H, Tung Y, Wang Z (2002) Hierarchical web caching systems: Modeling, design and experimental results. IEEE J. Selected Areas Comm. 20(7):1305–1314.Google Scholar
Colosimo E, Ferreira F, Oliveira M, Sousa C (2002) Empirical comparisons between Kaplan–Meier and Nelson–Aalen survival function estimators. J. Statist. Comput. Simulation 72(4):299–308.Google Scholar
Daley DJ, Vere-Jones D (2003) An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods (Springer, New York).Google Scholar
Dan A, Towsley D (1990) An approximate analysis of the LRU and FIFO buffer replacement schemes. Proc. ACM/SIGMETRICS 1990 (ACM, New York), 143–152.Google Scholar
Dehghan M, Massoulie L, Towsley D, Menasche DS, Tay YC (2019) A utility optimization approach to network cache design. IEEE/ACM Trans. Networking 27(3):1013–1027.Google Scholar
Ferragut A, Carrasco M, Paganini F (2024) Caching or pre-fetching? The role of hazard rates. 2024 60th Annual Allerton Conf. Comm. Control Comput. (IEEE, Piscataway, NJ), 1–8.Google Scholar
Ferragut A, Rodriguez I, Paganini F (2016) Optimizing TTL caches under heavy tailed demands. Proc. ACM/SIGMETRICS 2016 (ACM, New York), 101–112.Google Scholar
Ferragut A, Rodríguez I, Paganini F (2018) Optimal timer-based caching policies for general arrival processes. Queueing Systems 88(3–4):207–241.Google Scholar
Fill JA (1996) Limits and rates of convergence for the distribution of search cost under the move-to-front rule. Theoret. Comput. Sci. 164(1):185–206.Google Scholar
Fofack NC, Nain P, Neglia G, Towsley D (2012) Analysis of TTL-based cache networks. Proc. Internat. Conf. Performance Evaluation Methodologies Tools (VALUETOOLS) (IEEE, Piscataway, NJ), 1–10.Google Scholar
Fofack NC, Nain P, Neglia G, Towsley D (2014) Performance evaluation of hierarchical TTL-based cache networks. Comput. Networks 65:212–231.Google Scholar
Fricker C, Robert P, Roberts J (2012) A versatile and accurate approximation for LRU cache performance. Proc. 24th Internat. Teletraffic Congress (IEEE, Piscataway, NJ), 57–64.Google Scholar
Garetto M, Leonardi E, Martina V (2016) A unified approach to the performance analysis of caching systems. ACM Trans. Model. Performance Evaluation Comput. Systems 1(3):1–28.Google Scholar
Gast N, Houdt BV (2015) Transient and steady-state regime of a family of list-based cache replacement algorithms. Proc. ACM/SIGMETRICS 2015 (ACM, New York), 123–136.Google Scholar
Gelenbe E (1973) A unified approach to the evaluation of a class of replacement algorithms. IEEE Trans. Comput. C-22(6):611–618.Google Scholar
Ioannidis S, Yeh E (2016) Adaptive caching networks with optimality guarantees. ACM SIGMETRICS Performance Evaluation Rev. 44(1):113–124.Google Scholar
Ioannidis S, Massoulie L, Chaintreau A (2010) Distributed caching over heterogeneous mobile networks. Proc. ACM SIGMETRICS Internat. Conf. Measurement Model. Comput. Systems (ACM, New York), 311–322.Google Scholar
Jelenković PR (1999) Asymptotic approximation of the move-to-front search cost distribution and least-recently used caching fault probabilities. Ann. Appl. Probab. 9(2):430–464.Google Scholar
Jelenković P, Radovanović A (2003) Asymptotic insensitivity of least-recently-used caching to statistical dependency. Proc. IEEE/INFOCOM 2003 (IEEE, Piscataway, NJ), 438–447.Google Scholar
Jelenković PR, Radovanović A (2004) Least-recently-used caching with dependent requests. Theoret. Comput. Sci. 326(1):293–327.Google Scholar
Jelenković PR, Radovanović A (2008) The persistent-access-caching algorithm. Random Structures Algorithms 33(2):219–251.Google Scholar
Jelenković PR, Radovanović A, Squillante MS (2006) Critical sizing of LRU caches with dependent requests. J. Appl. Probab. 43(4):1013–1027.Google Scholar
Jung J, Berger AW, Balakrishnan H (2003) Modeling TTL-based internet caches. Proc. IEEE/INFOCOM 2003 (IEEE, Piscataway, NJ), 417–426.Google Scholar
King W (1971) Analysis of paging algorithms. Proc. IFIP Congress 1971 (North-Holland Publishing Company, Amsterdam), 485–490.Google Scholar
Martina V, Garetto M, Leonardi E (2014) A unified approach to the performance analysis of caching systems. Proc. IEEE/INFOCOM 2014 (IEEE, Piscataway, NJ), 2040–2048.Google Scholar
Panigrahy NK, Li J, Towsley D (2017) Hit rate vs. hit probability based cache utility maximization. ACM SIGMETRICS Performance Evaluation Rev. 45(2):21–23.Google Scholar
Panigrahy NK, Li J, Towsley D, Hollot CV (2020) Network cache design under stationary requests: Exact analysis and Poisson approximation. Comput. Networks 180:107379.Google Scholar
Panigrahy NK, Nain P, Neglia G, Towsley D (2022) A new upper bound on cache hit probability for non-anticipative caching policies. ACM Trans. Model. Performance Evaluation Comput. Systems 7(2–4):5.Google Scholar
Rosensweig EJ, Kurose J, Towsley D (2010) Approximate models for general cache networks. Proc. IEEE/INFOCOM 2010 (IEEE, Piscataway, NJ), 1–9.Google Scholar
Shorack GR (1979) The weighted empirical process of row independent random variables with arbitrary distribution functions. Statistica Neerlandica 33(4):169–189.Google Scholar
Tanner MA, Wong WH (1983) The estimation of the hazard function from randomly censored data by the kernel method. Ann. Statist. 11(3):989–993.Google Scholar
van der Vaart AW (1998) Asymptotic Statistics, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 3 (Cambridge University Press, Cambridge, UK).Google Scholar

Articles In Advance

Article Information

Metrics

Information

Received:November 29, 2024
Accepted:March 31, 2026
Published Online:May 13, 2026

Cite as

Matias Carrasco, Andres Ferragut, Fernando Paganini (2026) Optimal Local Storage Policy Based on Stochastic Intensities and Its Large-Scale Behavior. Stochastic Systems 0(0).

https://doi.org/10.1287/stsy.2024.0093

Keywords

PDF download

Available Issues

Available Issues

Optimal Local Storage Policy Based on Stochastic Intensities and Its Large-Scale Behavior

Abstract

1. Introduction

1.1. Related Work

1.2. Main Contributions

1.3. Organization of the Paper

2. Preliminaries and Notation

2.1. Stochastic Intensity

2.2. Renewal Point Processes

2.3. Pareto Interarrival Times

3. Local Memory Systems and Optimal Storage Policy

3.1. The Case of Independent Request Streams

4. Large-Scale Analysis of the Optimal Causal Policy

4.1. Threshold Characterization of the Optimal Policy

4.2. Scaling of the Request Intensities

4.3. Asymptotic Behavior of the Optimal Causal Policy

5. Asymptotic Optimal Performance

6. Threshold Policies and Their Timer-Based Counterparts

6.1. Threshold Policies and Memory Occupation

6.2. Asymptotic Optimality of Threshold Policies

6.3. Connection with Timer-Based Policies: Monotone Hazard Rates

7. Extension to Heterogeneous Object Sizes

8. Parametric Examples and Simulations

8.1. Pareto Interarrival Times and Zipf Popularities

8.2. Erlang Interrequest Times and Zipf Popularities

8.3. Practical Implementation

9. Conclusions

Appendix A. Proof of Asymptotic Performance

A.1. Convergence of Quantiles

A.2. Proof of Theorem 6

References

Articles In Advance

Article Information

Metrics

Information

Cite as

Keywords