Open Access

Directed Acyclic Graph-Type Distributed Ledgers via Young-Age Preferential Attachment

Christian Mönch
Corresponding Author
Christian Mönch
[email protected]
https://orcid.org/0000-0002-6531-6482
Institute of Mathematics, Johannes Gutenberg University, 55128 Mainz, Germany;
Search for more papers by this author
,
Amr Rizk
Amr Rizk
[email protected]
https://orcid.org/0000-0002-9385-7729
Institute for Computer Science and Business Information Systems, University of Duisburg-Essen, 45127 Essen, Germany
Search for more papers by this author

Christian Mönch

Corresponding Author

Christian Mönch

[email protected]

https://orcid.org/0000-0002-6531-6482

Institute of Mathematics, Johannes Gutenberg University, 55128 Mainz, Germany;

Search for more papers by this author

Amr Rizk

[email protected]

https://orcid.org/0000-0002-9385-7729

Institute for Computer Science and Business Information Systems, University of Duisburg-Essen, 45127 Essen, Germany

Search for more papers by this author

Published Online:5 Jun 2023https://doi.org/10.1287/stsy.2022.0005

Abstract

Distributed ledger technologies provide a mechanism to achieve ordering among transactions that are scattered on multiple participants with no prerequisite trust relations. This mechanism is essentially based on the idea of new transactions referencing older ones in a chain structure. Recently, directed acyclic graph (DAG)-type distributed ledgers that are based on DAGs were proposed to increase the system scalability through sacrificing the total order of transactions. In this paper, we develop a mathematical model to study the process that governs the addition of new transactions to the DAG-type distributed ledger. We propose a simple model for DAG-type distributed ledgers that are obtained from a recursive young-age preferential attachment scheme (i.e., new connections are made preferably to transactions that have not been in the system for very long). We determine the asymptotic degree structure of the resulting graph and show that a forward component of linear size arises if the edge density is chosen sufficiently large in relation to the “young-age preference” that tunes how quickly old transactions become unattractive.

Funding: The research of C. Mönch is supported by the Deutsche Forschungsgemeinschaft [Grant 443916008].

1. Motivation and Background

1.1. Introduction

Distributed ledger technologies (DLTs), which are based on directed acyclic graphs (DAGs) such as the Tangle (Popov 2016) or Coordicide (Popov et al. 2020), generalize blockchain-based DLTs by weakening the ordering among the transaction blocks that constitute the ledger. Blockchain-based DLTs achieve total ordering of the transactions contained by the ledger through the rule of the longest chain as coined in Nakamoto (2006). Here, transactions are collected into blocks that each include a reference to the previous block such that a chain of ordered transactions emerges. When blocks are created concurrently such that only a partial order exists, the rule of the longest chain dictates that blocks that do not belong to the longest chain are discarded. This is a major factor that impairs the scalability of blockchain-based DLTs in terms of the rate of adding transactions to the ledger.

DAG-type distributed ledgers achieve scalability through allowing partially ordered transaction blocks (i.e., transaction blocks may reference more than one previous block). In this paper, we study the relation of design parameters for DAG-type DLTs based on the transient and limiting properties of the underlying transaction graph. We consider the graph of transaction blocks that emerges through block attachment (i.e., new blocks referencing preceding ones). We focus on two design choices for DAG-type DLTs.

How many preceding blocks should a new block reference?
How should the age of a preceding block impact the reference choice of a new block?

The rationale behind these two choices lies in the formulation of simple, probabilistic block reference rules that guarantee the existence of a giant forward component based on (i) a simple function of the number of preceding blocks to reference and (ii) an age preference parameter. A preference for old or high-degree vertices creates a graph architecture that is locally tree like and features vertices of arbitrary high degree, which differs extremely from the linear structure of blockchain-based DLT. Young-age preferential attachment (YAPA) interpolates between these two structural extremes and generates a sparse, locally tree-like graph in which there are no hubs (i.e., the maximal degree grows only weakly with the system). We provide a simple reference rule that allows us to study the two questions analytically and investigate some transient and asymptotic properties of the transaction block graph, such as the reference degree distribution, degree evolutions, the existence of a giant forward component, and the size of the “detached surface” of vertices in the transaction graph that do not connect to any previous transactions.

1.2. Related Literature About DLTs

Blockchain technology is based on the following fundamental principles. (i) Decentralization. Because of its distributed architecture, a blockchain does not rely on a central server but rather, stores data in a decentralized manner, and it can be updated by any node in the network. (ii) Transparency. Each node (peer) has a local copy of the blockchain, and updates are sent as a broadcast to all nodes. Therefore, any change to the blockchain is transparent to all members of the network. (iii) Open source. The original blockchain protocol introduced with Bitcoin is publicly available and can be used for development. (iv) Autonomy. All peers can update or transmit the stored data. Each node can verify the accuracy of all data received. This eliminates the need for trust between participants. (v) Immutability. Once a transaction block is part of the blockchain, it is stored publicly forever, and its contents (i.e., transactions) cannot be edited without changing the hash value stored in all subsequent blocks. (vi) Anonymity. Data transmission in the blockchain is done via addresses, and therefore, user identities remain hidden.

Current blockchain-based cryptocurrencies are often presented as an alternative to traditional currencies. However, there are questions about the scalability and performance of distributed ledgers compared with established payment systems (Croman et al. 2016, Pappalardo et al. 2017). Currently, the Bitcoin network is only capable of processing very few transactions per second (Blockchain Luxembourg S.A. 2021c), and it takes up to 20 minutes for a newly posted transaction to be processed (Blockchain Luxembourg S.A. 2021b). Established payment processors, on the other hand, process up to four orders more transactions per second, and new transactions are usually processed within a few seconds. A significant influence on the performance of the globally distributed peer-to-peer Bitcoin system (Donet et al. 2014) is the propagation speed of the blocks in the network (Decker and Wattenhofer 2013). For example, for a 1-MB block, which is roughly the average block size (Blockchain Luxembourg S.A. 2021a), it takes about 2.4 minutes (Croman et al. 2016) for 90% of the nodes to receive this block. With the block interval of 15 minutes used by the Bitcoin network, this poses a significant problem, which increasingly leads to the splitting of the blockchain—the so-called blockchain forks (Decker and Wattenhofer 2013). The emergence of such forks significantly decreases the performance of the distributed ledger system.

Although approaches already exist to optimize the propagation speed of information in the distributed ledger network (Fadhil et al. 2016, 2017), these also do not reach the performance of current financial transaction systems. In addition to optimizing the propagation speed, other improvements arise in the communication protocol used (Croman et al. 2016, Gobel and Krzesinski 2017). One way to optimize the current Bitcoin protocol in terms of transmission speed is to minimize unnecessary transmissions by adjusting the sequence of exchanged messages needed to transmit new blocks. Instead of first publishing the availability of information, blocks that do not exceed a certain threshold in terms of size can thus be distributed directly to neighboring nodes. This can increase the effective transmission speed especially for small blocks, where the protocol overhead has a particularly large impact. Decker and Wattenhofer (2013) have shown that such protocol adaptations can reduce blockchain forks by about 50%.

To increase scalability and transaction throughput, nonlinear graph-based distributed ledgers (Lewenberg et al. 2015, Popov 2016, Sompolinsky et al. 2016, Yeow et al. 2017, Sompolinsky et al. 2021) replace the traditional blockchain with a global DAG that contains all recorded transactions. In this graph, the nodes are transaction blocks (consisting of one or more transactions), whereas the edges indicate the validation of previous blocks/transactions. Each added node should validate $k \geq 1$ transactions, where k can be fixed or random. A key advantage of DAG-based systems is that by deviating from the linear longest blockchain rule, scalability can be greatly improved. The longest blockchain rule essentially requires all nodes to know about newly created blocks quickly, thus limiting the creation of blocks to allow for the propagation of this knowledge before a new block is created. Instead of trusting the longest-chain rule as given in blockchain-distributed ledgers, DAG-based distributed ledgers rely on metrics, such as the largest strongly connected cluster of transaction nodes in the transaction graph, to determine the trusted transactions (Sompolinsky et al. 2021).

Beyond financial transactions, there exist many application areas for distributed ledgers; for example, in the context of distributed and secure storage of personal data (Zyskind et al. 2015) or for asynchronous messaging services in the context of distributed computing (Yin et al. 2018). Particularly, the former use case is of high relevance because of increasingly frequent incidents in which personal data are publicly accessible because of security breaches or data misuse. Here, the use of distributed ledgers as a system for distributed access control offers a potential solution. Instead of storing data centrally and granting direct access, queries are mapped as well as sharing information via transactions. Combined with off-chain storage of sensitive data, it is thus possible to develop a distributed ledgers system in which each user retains full control over their own data. In addition to increased security, this has the advantage that laws and other regulations can be implemented directly in the protocol used in the form of policies (Zyskind et al. 2015).

1.2.1. Organization of the Paper.

In the following section, we introduce the model formally and present our mathematical main results. Section 3 contains further discussion of how our results relate to previous work and of the practical implication of our results for DLT applications. Section 5 contains numerical simulations. Finally, detailed proofs of our main results are provided in Section 6.

2. Model and Main Results

2.1. Model Definition

The model we propose is based on young-age preferential attachment. We study a recursive growth parametrized by two numbers, the edge density parameter $α > 0$ and the reinforcement bias $β > 0$ . We use the shorthand notations $[n]$ for ${1, \dots, n}$ and $x \land y, x \lor y$ to denote $\min (x, y)$ and $\max (x, y)$ , respectively. At step n = 1, the graph G₁ is initialized with a vertex labeled one. At any step n > 1, one vertex labeled n is added to the system, and G_n is obtained recursively from $G_{n - 1}$ according to the following attachment rule. For each $m \in [n - 1]$ , n sends an arc to m independently of everything else with probability

\frac{α}{n - 1} {(\frac{m}{n - 1})}^{β} \land 1 .

(1)

We formally write $G_{n} = ([n], E_{n})$ with $E_{n} \subset [n] \times [n]$ for the graph obtained upon adding the nth vertex. If n sends an arc to $m \in [n - 1]$ , we denote this event by $n \to m$ . The notation $n ↠ m$ is shorthand for the existence of a directed path from n to $m \in [n - 1]$ . Note that the direction of the arcs along any path can be recovered from the labels of the vertices that belong to the path (Figure 1).

Figure 1. A Depiction of Sample Realizations of the Young-Age Preferential Attachment Model with Reinforcement Bias β = 2 and Edge Density $α \in {3, 3.5, 4}$
*Notes*. The node size and color correspond to the node indegree. (a) α = 3. (b) $α = 3.5$ . (c) α = 4.

2.2. Key Properties of the YAPA Model

Ignoring the dynamical aspect of our model, it can be viewed as an instance of the directed inhomogeneous random graph model proposed in Cao and Olvera-Cravioto (2020), which allows us to infer some basic properties of the network from known results about this model class. We first establish that the young-age preferential attachment model produces a sparse graph (i.e., the number of arcs is of the same order as the number of vertices).

Proposition 1

(Sparsity). It holds that

\lim_{N \to \infty} \frac{| E_{N} |}{N} = \frac{α}{β + 1} i n L^{1} and almost surely .

Next, we determine the distributional limit of the vertex degrees in G_n. For any vertex $m \in [n]$ , we denote by $D_{n} (m) = (D_{n}^{in} (m), D_{n}^{out} (m))$ its degree (vector) in G_n: that is,

D_{n}^{in} (m) = | {k : k \to m in G_{n}} | and D_{n}^{out} (m) = | {k : k \leftarrow m in G_{n}} | .

Note that the outdegree of a vertex is determined once and for all upon its insertion (i.e., $D_{n}^{out} (m) = D_{m}^{out} (m) ≕ D^{out} (m)$ for all $n \geq m$ ). As can be guessed from the fact that connections occur independently and with probability of order $1 / n$ , the asymptotic degree distribution is bivariate Poisson.

Theorem 1

(Limiting Degree Distribution and Rate of Convergence). Let G_N denote the young-age preferential attachment graph on N vertices. For $U \in [N]$ uniformly chosen among the vertices of G_N, the following asymptotic results hold:

D_{N} (U) = (D_{N}^{in} (U), D^{out} (U)) \to (Z^{in}, Z^{out}) i n distribution and i n L^{1} a s N \to \infty .

Here, $Z^{in}$ and $Z^{out}$ are independent, and $Z^{out}$ is $Poisson (α / (β + 1))$ distributed. The limiting indegree $Z^{in}$ has distribution $mixed-Poisson (Λ)$ with mixing random variable $Λ = α (1 - V^{β}) / β$ , where V is $Uniform [0, 1]$ . Moreover, we have that

d_{TV} (P^{in/out} (N), p^{in/out}) = O (\sqrt{\frac{{(\log N)}^{3}}{N}}), almost surely a s N \to \infty,

(2)

where

P^{in/out} (N) = {(P_{k}^{in/out} (N))}_{k = 0}^{\infty}

denotes the empirical in- or outdegree distribution of G_N,

p^{in/out} = {(p_{k}^{in/out})}_{k = 0}^{\infty}

denotes the distribution of

Z^{in/out}

, and

d_{TV} (\cdot, \cdot)

denotes total variation distance.

Remark 1.

The form of the limiting degree distribution in Theorem 1 is obtained rather directly by applying the machinery of Cao and Olvera-Cravioto (2020), but the error bound (2) is new and does not follow from previous results about inhomogeneous random graphs. Inspection of the proof further yields that an error of the same order can be achieved for any graph on N vertices in which edges are drawn independently with probability at most c/N for some universal c (cf. Lemma 3). We do not believe that the logarithmic term is asymptotically sharp, but the polynomial order $1 / \sqrt{N}$ is consistent with known results for other sparse models with independent edges, such as the Erdös–Rényi graph (cf. van der Hofstad 2017, theorem 5.12).

It follows from Theorem 1 that whereas the outdegree is asymptotically independent of the time at which a node enters the system, the indegree is not. Essentially, the limiting indegree distribution of a given node is also $Poisson (α / (β + 1))$ , but the time needed to collect all incoming links grows proportionally with network size. A node born at the beginning corresponds to V near zero, and a node born at a time close to the total age of the system corresponds to V close to one.

The indegree dynamics of fixed vertices are further characterized by the following bounds.

Proposition 2

(Degree Evolutions). The maximal (total) degree of any vertex in G_N is almost surely $O (\log N)$ . We further have that

for any $n \geq m \geq 1$ ,
$P [D_{n}^{in} (m) > x] \leq \exp [- \frac{α}{β} - x (\log x + \log \frac{β}{α} - \frac{β}{α})], x > α / β;$
$E [D^{out} (m)] = \frac{α}{β + 1} + O (1 / m)$ for any $m \geq 1 .$

Let us now discuss the connectivity of the YAPA network. The forward component of $m \in [n - 1]$ after the arrival of vertex n, $1 \leq m \leq n$ is given by

C_{n} (m) = {k : k ↠ m in G_{n}} \cup {m},

and for the distributed ledger application discussed in Section 1, it is desirable that the network be rooted (i.e.,

C_{n} (1) = [n]

). Because of the stochastic nature of the attachment mechanism, this is typically not the case but can formally be achieved by augmenting the network by some root vertex ρ that deterministically connects to precisely those vertices that have outdegree 0. Of course, the proportion of vertices that fail to connect to any existing vertices should be rather small, which is ensured by the light tails of the limiting Poisson distribution.

Corollary 1

(Detached Surface). We have that

\lim_{N \to \infty} \frac{| {m \in {2, \dots, N} : D^{out} (m) = 0} |}{N} = e^{- α / (β + 1)} almost surely and i n L^{1} .

Although there is still a positive fraction of nodes with no left neighbors, this fraction is not very large; for instance, if we would like to make sure that with probability at least 0.995, a typical vertex sends an arc to at least one vertex, it suffices to choose $α > \log (200) (β + 1) \approx 5.3 (β + 1)$ .

Next, we would like to ensure that the vast majority of vertices that do connect upon arrival in the system are contained in the forward component of only a handful of vertices. This can only be achieved if there exist vertices m such that $C_{N} (m)$ grows linearly with N. Our first result concerning such macroscopic components states that the forward component of a typical vertex is not of linear order.

Proposition 3

(No Giant Forward Components for Typical Vertices). Let $U \in [N]$ be uniformly chosen; then, for any $ε > 0$ , we have

\lim_{N \to \infty} P (| C_{N} (U) | \geq ε N) = 0 .

Remark 2.

As for Theorem 1, the neighborhood around a typically chosen vertex can be described by the local limit approach used in Cao and Olvera-Cravioto (2020). It is not hard to see that the weak local limit of $C_{N} (U)$ is in fact the family tree obtained from the multitype Galton–Watson process given by the following recursion.

Sample $V \in [0, 1]$ uniformly. Generation 0 of the tree comprises the root, which has type V.
For any $k \geq 1$ , given generation k − 1 of the tree, each member m in generation k − 1 of type V_m independently produces a $Poisson$ ( $α (1 - V_{m}^{β}) / β$ )-distributed number of offspring to obtain the kth generation, and each of the offspring thus produced is independently assigned a type drawn uniformly from $[V_{m}, 1]$ .

The proof of Proposition 3 in Section 6 shows that the local limit tree dies out for any choice of $α, β > 0$ . Contrary to the local behavior near a typical vertex, if $α > β + 1$ , most vertices indeed belong to the forward components of a few old vertices. More precisely, let $n_{0} = ⌊ 1 / α ⌋$ ; then, the vertices $1, \dots, n_{0}$ are always sequentially connected by construction, and we thus focus on the forward components of vertices added after n₀.

Theorem 2

(Existence of Giant Forward Components). Let $m > n_{0}$ be given. Then, almost surely,

\lim_{N \to \infty} \frac{| C_{N} (m) |}{N} = {\begin{array}{l} 0, & i f α \leq β + 1, \\ Z (m), & i f α > β + 1, \end{array}

where Z(m) is a nontrivial random variable that is concentrated on the two solutions 0 and

γ > 0

of the fixed point equation

x = 1 - e^{- \frac{α}{β + 1} x}, x \in [0, \infty) .

Theorem 2 is the main result of this article. The apparent differences in the characterizations obtained in Proposition 3 and Theorem 2 are because of the reducibility of the graph. Proposition 3 is concerned with the forward component of a typical (i.e., uniformly chosen) vertex. Such a typical vertex has a birth time of order N and can simply not collect enough direct and indirect references up to time N to develop a large forward component. In particular, applying the machinery of Cao and Olvera-Cravioto (2020) (after some adaptations to account for irreducibility) yields that typical vertices do not have a giant forward component; see the proof of Proposition 3.

In contrast, Theorem 2 states that if an early vertex is kept fixed and the time horizon is extended to infinity, with positive probability, the selected vertex will be the root of a forward component of linear size eventually. We stress that it is not possible to derive statements of this type directly using the general theory of Cao and Olvera-Cravioto (2020), which uses the local weak limit of the graph (i.e., limit distributions of neighborhoods of typical vertices) and therefore, does not provide sufficient information about vertices with index $n = o (N)$ .

3. Further Related Work and Discussion of Results

3.1. Design of Tip Selection for DAG-Type Distributed Ledgers

DAG-type distributed ledgers, such as IOTA (Popov 2016), utilize different algorithms for achieving consensus. State-of-the-art DAG-type DLTs repurpose the act of selecting transactions for validation (i.e., constructing arcs from an arriving node to previous ones for achieving consensus). In a nutshell, the tip selection algorithm in Popov (2016) is based on a biased random walk, and a transaction that is directly or indirectly validated by a large portion of the subsequent transactions is highly probable to be considered valid. Precisely, a tip on the surface of the DAG is selected based on a random walk from genesis to the surface. This random walk is biased in the sense that nodes possess a weight that resembles the number of subsequent nodes that are directly or indirectly attached to this node and that the jump probability of the random walk from a node i with weight w_i to a node j with weight w_j is $exp (- η (w_{i} - w_{j}))$ with a given constant $η > 0$ . For finite time, the value of the constant α may play a role to adjust the tip selection to mimic a uniformly random tip selection or to quickly concentrate the tip selection on a set of young nodes. However, this algorithm is different from our YAPA model as it concentrates the tip selection on certain walks where the node weights are large.

Further works, such as Kusmierz and Gal (2018) and Ferraro et al. (2020), analyze different properties of the DAG-type distributed ledgers, especially the IOTA DAG, which is denoted the Tangle. Kusmierz and Gal (2018) derive approximations of the probabilities that a node is left behind or that a node becomes a permanent tip. Specifically, the first approximation of the probability that a node is left behind is similar to the detached surface evolution in Corollary 1. Ferraro et al. (2020) provide a modified IOTA tip selection algorithm that is based on a combination of two Markov chain Monte Carlo algorithms with different parameters that aims to have all nodes validated. They prove this property using a fluid limit in the arrival rate of transactions.

In Coordicide (Popov et al. 2020), which is a follow-up work on IOTA constituting a modern DAG-type DLT, tip selection is fixed to validating two existing transactions. Here, the authors dispense with tip selection as a consensus mechanism and rather, consider it from a performance point of view (i.e., analyzing the impact of the tip selection on the DAG properties, especially in discouraging “lazy” behavior, which resembles the recurring validation of a fixed set of old transactions).

We consider this work in the same vein (i.e., our YAPA model provides a relationship between the parameters of the tip selection algorithm (that is, the edge density α and the reinforcement bias β) and the DAG-distributed ledger properties). We establish the YAPA model in (1) as a rule to encourage the validation of young transactions. From the calculation of the asymptotic distribution of the outdegree $Z^{out}$ (which turns out to be $Poisson (α / (β + 1))$ ) as well as the nonasymptotic expected outdegree at a given time point $E [D_{k}^{out}]$ , we know the expected number of transactions that are validated given parameters α, β. Further, the reinforcement bias $β > 0$ is understood as a forgetting factor, precisely a parameter to set the preference for validating young transactions (i.e., larger $β > 0$ leads to a stronger preference of younger transactions over older ones). Hence, by choosing α, β while ensuring $α > β + 1$ , the properties of the distributed ledger can be controlled. As is reflected in our results, we study the following properties: (i) the emergence of “giant” components (i.e., a linear increase of the number of direct and indirect validations that a transaction receives over time), (ii) the average number of direct validations received by a transaction over time, and (iii) the number of orphan transactions.

3.2. Related Mathematical Results

Our model is similar in spirit to the one introduced in Lyon and Mahmoud (2020) but with two important differences. First, their model produces a tree, whereas our model produces a directed acyclic graph that is typically not a tree. Second, we modulate the attachment via the parameters α and β to obtain a whole range of instances. This flexibility allows us to tune the model to specific applications and also allows us to observe the connectivity phase transition of Theorem 2. The young-age preferential attachment tree of Lyon and Mahmoud (2020) corresponds to running our model with β = 1 and conditioning on the creation of precisely one arc per time step. As explained in Section 2, our model can be analyzed in the general framework of Cao and Olvera-Cravioto (2020), who adapted the very general framework of inhomogeneous random graphs developed in Bollobás et al. (2007) to directed graphs, thereby generalizing earlier results of Bloznelis et al. (2012). We use some techniques from the (directed) inhomogeneous random graph world to obtain the degree sequence and analyze the emergence of a giant connected component for our model in Proposition 3. However, because $κ (\cdot, \cdot)$ is not reducible and we investigate connected components in terms of directed paths, our main result in Theorem 2 cannot be obtained by a direct application of the results in Bloznelis et al. (2012) and Cao and Olvera-Cravioto (2020). We also emphasize the dynamic nature of our model, which is not captured by the inhomogeneous random graph framework. Furthermore, if we were to ignore the direction of the arcs and consider G_N as an undirected graph, then it is straightforward to deduce that the limit $\lim_{N \to \infty} | C_{N} (U) | / N$ in Proposition 3 would be positive whenever $α > β + 1$ (cf. Bollobás et al. 2007).

The limiting case β = 0 of the YAPA model corresponds to a directed version of the uniformly grown random graph investigated in Bollobás and Riordan (2004) and Bollobás et al. (2005); this is a very natural recursive model that was first studied by Dubins in the 1980 s (cf. Kalikow and Weiss 1988, Shepp 1989, Durrett and Kesten 1990). Although we exclusively consider positive values of β, choosing $β < 0$ would lead to a preference for old-age vertices and produce an effect similar to degree-based preferential attachment models, such as the Barabási–Albert model (Barabási and Albert 1999). Models of this type have been investigated thoroughly in the last 20 years (see, e.g., Bollobás et al. 2001; Dereich and Mörters 2009, 2013; and the references therein). Old-age preferential attachment models produce graphs with entirely different features, such as a scale-free degree distribution, whereas the degrees in the YAPA model are fairly sharply concentrated around their expectation as is exemplified by Proposition 2.

4. Simulation

In the following, we describe evaluation results that are based on simulating the model with attachment probabilities given by (1). Transaction blocks arrive according to a Poisson process with rate λ = 1. We fix the reinforcement bias β = 2 and variate the edge density parameter α. Figure 2 shows the evolution of the connected component size starting at the genesis node (transaction) for the first 10⁵ transactions for different α. Note the linear growth rate of the connected component for $α > β + 1$ . The figure shows the empirical average in addition to the standard deviation (which is depicted as the shaded area) from 20 independent simulation runs. Note that the standard deviation for the subcritical regime α = 2 is large because there is no natural scaling (in N) for the component size.

Figure 2. Evolution of the Connected Component Size (Starting at the Genesis Node) for Reinforcement Bias β = 2 and Variable Edge Density Parameter $α \in {2, 3, 4}$
*Notes*. As an initial condition, we assume that the first 10 nodes reference the genesis node. (a) α = 2. (b) α = 3. (c) α = 4. (d) Variable α.

Figure 3 shows the error $| C_{N} (m) | / N - Z (m)$ as calculated from Theorem 2 for $α = 4, β = 2, m = 10$ . The figure shows clearly the convergence of the ratio $| C_{N} (m) | / N$ to Z(m) with a growing number of vertices N.

Figure 3. Numerical Verification of Theorem 2 for $α = 4, β = 2, m = 10$
*Note*. The error $| C_{N} (m) | / N - Z (m)$ diminishes with an increasing number of vertices N.

Figure 4 shows the properties of the shortest paths to the genesis node for various values of the reinforcement bias β and the edge density parameter α. In light of the impact of the tip selection algorithm parameterization through α and β, the figure reflects the shape of the transaction DAG (i.e., its depth and width in terms of the length of the shortest paths to genesis). In comparison, note that the extreme cases of a DAG of depth 1 or a totally ordered chain produce sum lengths of the shortest paths of size n and $n^{2} / 2$ , respectively. The figure also shows markers for the values of $(α, β)$ that correspond to a mean outdegree $E [Z^{out}] = 2$ , which is used in the context of DAG-type DLT (Popov 2016, Popov et al. 2020) as a deterministic guideline for the number of validated nodes per new node. Although the fraction of nodes not connected to genesis remains constant as given by the detached surface approximation in (5), the shape of the DAG DLT can be controlled by the choice of α, β. Note that a similar argument can be made in terms of the shortest paths to some arbitrary genesis cluster of nodes.

Figure 4. Properties of the Shortest Paths to Genesis for $α \in {4, 5, 6}, β \in {1, \dots, 3}$
*Notes*. The width and depth of the constructed graphs show a strong dependency on β. The figures are obtained from 10² simulation runs each, including $n = 10^{4}$ transactions. The black markers correspond to a mean outdegree $E [Z^{out}] = 2$ . (a) Cumulative length of the shortest paths to genesis in hops. (b) Mean length of the shortest paths to genesis in hops.

5. Implications for DLT Applications

Our work is motivated by questions on the design choices of DAG-based DLT applications, in particular on how the analyzed YAPA referencing scheme that relates newly generated blocks to previously generated ones affects the DLT properties. In the following, we discuss the implications of the obtained results for some design aspects of DLT applications.

5.1. Tip Selection

Tip selection is often convolved with consensus as in Popov (2016) and Müller et al. (2022). In Popov et al. (2020), tip selection is considered separately and set to referencing a fixed number of previous blocks. When tip selection is convolved with consensus, it is usually based on some aggregate weight of the blocks that must be tracked (Popov 2016, Müller et al. 2022). Some works also propose uniform random tip selection on a subset of (nonconflicting) blocks, such as Müller et al. (2022).

The first implication of the YAPA model definition is that users that generate blocks do not have to track or calculate block weights according to more or less complex and time-consuming algorithms. The configuration model is entirely determined by the YAPA parameters α, β. In turn, these determine the required storage and computation capacity for reference operations (i.e., the number of references (hashes) that are required on average as in Proposition 1). Note that each reference is usually tied to one hash operation as well as a linear growth of storage space. This stands in contrast to the deterministic number of references as well as the weight-based referencing methods.

5.2. Laziness and Permanent Tips

The reviewed state of the art makes great effort to avoid lazy behavior (i.e., preventing users from referencing a fixed set of old blocks). The temptation to do this is based on the reuse of computed hashes in old blocks in referencing for new blocks. The YAPA configuration readily dispenses with this problem for $α, β > 0$ . On the other hand, the risk of a block becoming a permanent tip is shown for our model in Corollary 1. This can be considered the cost of using the simple YAPA configuration model as the detached surface contains transaction blocks that are never verified.

5.3. Consensus and Time to Confirmation

The results of this work emphasize that consensus in a DAG-based DLT that uses the proposed YAPA configuration model can be reached. Theorem 2 shows that a consensus method based on the so-called heaviest weighted subchain (Wang et al. 2022) or the so-called heaviest sub-DAG (Müller et al. 2022) is viable under a proper YAPA configuration of $α > β + 1$ . The simulation results validate this behavior for a range of realistic parameters (i.e., average number of references/hashes of $α \in {2, 3, 4}$ and β = 2). Further, the simulation results provide insights in the form if the DLT is controlled by the YAPA configuration α, β. Related works, such as Popov (2016) and Müller et al. (2022), assume a constant Tangle width and that the numbers of tips of the Tangle are stationary with a given mean. Note that in Müller et al. (2022), this is justified by a number of chosen tips equal to one. Our simulations show, on the other hand, that the shape of the Tangle can be controlled by adjusting α and β at a fixed average outdegree (i.e., number of referenced previous blocks). In turn, the shape of the Tangle implicates the causal relations of transaction blocks (i.e., for a given number of blocks, N having a few long chains of blocks (approaching line topologies) with crossreferences is semantically different from having many short chains of blocks (approaching a star topology)). The former form distributes the cumulative weight required for consensus, whereas the latter concentrates this weight on a few blocks. Nevertheless, the result in Theorem 2 shows that consensus can be achieved after long-enough time.

Time to confirmation is a performance metric for the consensus protocol of the DLT. As discussed, the YAPA configuration model for a DLT provides two guarantees. First, the confirmation time in terms of the rate of convergence of the distribution of the number of direct block references received is bounded in Theorem 1. This stands in contrast to heuristics to obtain estimates of the time to confirmation as, for example, in Popov (2016) and Müller et al. (2022), where some difficulty stems from the intractability of the referencing rule.

6. Proofs

The proofs of our main results are divided into three sections. The first section contains the derivation of results that can be obtained in the general framework of Cao and Olvera-Cravioto (2020). The second section provides concentration bounds for degree properties, which are needed in the last section or are interesting in their own right, and the final section contains the proof of our main result, Theorem 2.

6.1. Directed Inhomogeneous Random Graphs and Branching Process Approximation

We call

w_{n - 1} (m) = α {(\frac{m}{n - 1})}^{β}, 1 \leq m + 1 \leq n,

the weight of vertex m upon the arrival of vertex n. An equivalent way of characterizing our model is to fix

N \in N

and rewrite the connection probabilities using the kernel

κ : {[0, 1]}^{2} \to [0, \infty)

given by

κ (x, y) = α x^{β} y^{- (β + 1)} 1 {x < y} .

(3)

Then,

P (n \to m in G_{N}) = (\frac{κ (m / N, n / N)}{N} {(1 + \frac{1}{n - 1})}^{β + 1}) \land 1, 1 \leq m < n < N .

Setting also

ϕ (\frac{m}{N}, \frac{n}{N}) = {(1 + \frac{1}{n - 1})}^{β + 1} - 1 \in O (1 / n),

the pair

(κ, ϕ)

parametrizes an instance of a directed inhomogeneous random graph in the sense of Cao and Olvera-Cravioto (2020). We can thus readily apply several results of Cao and Olvera-Cravioto (2020), by which we immediately establish sparsity of G_N and the form of the limiting degree distribution in the next subsection. Note that, however, κ is clearly a reducible kernel because it has no mass on the lower diagonal. Therefore, the connectivity results for irreducible kernels established in Cao and Olvera-Cravioto (2020) do not apply.

It is straightforward to verify the regularity assumptions listed in Cao and Olvera-Cravioto (2020, assumption 3.1) for our model using κ and $ϕ$ as defined and interpreting $[0, 1]$ together with the Lebesgue measure as the ground space. Proposition 1 is now a special case of the corresponding statement in Cao and Olvera-Cravioto (2020).

Proof of Proposition 1.

By Cao and Olvera-Cravioto (2020, proposition 3.3), we have

\frac{| E_{N} |}{N} \to \int_{0}^{1} \int_{0}^{1} κ (x, y) d x d y = \frac{α}{β + 1} in L^{1} .

Each

X_{n} = | E_{n + 1} | - | E_{n} | = \sum_{m = 1}^{n} 1 {n + 1 \to m}, n \in N,

is a sum of independent Bernoulli random variables with variance

σ_{n}^{2} \leq \sum_{m = 1}^{n} \frac{w_{n} (m)}{n} [(1 - \frac{w_{n} (m)}{n}) \lor 0], n \in N .

In particular, because $w_{n} (m) \leq α$ for all m, n, we have

\sum_{n = 1}^{\infty} \frac{σ_{n}^{2}}{n^{2}} \leq \sum_{n = 1}^{\infty} \frac{α}{n^{2}} < \infty,

and Kolmogorov’s Strong Law of Large Numbers (see, e.g., Feller 1950, theorem 8.3) yields almost sure convergence. □

Proof of Limiting Distribution in Theorem 1.

The limiting distribution described in Theorem 1 is obtained as a special case of the statement of Cao and Olvera-Cravioto (2020, theorem 3.4) when applied to the kernel κ given in (3). □

Proof of Corollary 1.

The detached surface

D (n) = {1 < k \leq n : D^{out} (k) = 0}, n > 2,

comprises all vertices that have arrived and do not attach to any previous vertices. We let

Δ_{n} = | D (n) |

denote the size of the detached surface at time n. L¹ convergence of

Δ_{n} / n

e^{- \frac{α}{β + 1}}

is a direct consequence of Theorem 1 because

E [Δ_{n} / n] + 1 / n

is precisely the probability that a uniformly chosen vertex has outdegree 0. Alternatively, we can calculate the asymptotic density of Δ_n directly. The probability that the kth arriving vertex does not attach to any previous vertex is

p_{k}^{'} = \prod_{i = 1}^{k - 1} (1 - p_{k} (i)), k > 1 .

(4)

Using that $p_{k} (i) \leq \frac{α}{k - 1}$ , we obtain that $\log (1 - p_{k} (i)) = - p_{k} (i) + O (1 / k^{2})$ and thus,

\lim_{k \to \infty} p_{k}^{'} = \lim_{k \to \infty} e^{- \sum_{i = 1}^{k - 1} p_{k} (i)} = e^{- \frac{α}{β + 1}} .

(5)

The outdegrees of different vertices are independent, and therefore, Δ_n is a sum of independent $Bernoulli (p_{k}^{'})$ -distributed random variables. The corresponding variances $p_{k}^{'} (1 - p_{k}^{'})$ are uniformly bounded by one; thus, another invocation of Kolmogorov’s Strong Law of Large Numbers yields almost sure convergence of $Δ_{n} / n$ to $e^{- α / (β + 1)} .$ □

Proof of Proposition 3.

In Cao and Olvera-Cravioto (2020, section 4.3.1), it is shown that the forward exploration started in a uniform vertex can be coupled to a multitype branching process with family tree $T$ . It is straightforward to deduce from the form of κ that this branching process is the Galton–Watson process described in Remark 2. It is well known and not hard to prove that if $| T | < \infty$ almost surely, then we have $P (C_{N} (U) > ε N) \to 0$ as $N \to \infty$ for any $ε > 0,$ and additionally, see, for example, van der Hofstad (2021, corollary 2.27). The survival probability $ζ = P (| T | = \infty)$ of the Galton–Watson process can be characterized via the Volterra-type operator $T = T (κ)$ on $L^{2} [0, 1]$ given by

T f (x) = α x^{β} \int_{x}^{1} y^{- β - 1} f (y) d y, x \in (0, 1),

and see Bollobás et al. (2007) and Cao and Olvera-Cravioto (2020) for a general account of the connection between

T

and T. Note that T is an unbounded operator (because of the divergence of

y^{- β - 1}

at zero), which makes it necessary to use the approximating family of operators

{T_{δ}, δ > 0}

given by

T_{δ} f (x) = α x^{β} \int_{x \lor δ}^{1} y^{- β - 1} f (y) d y, x \in (0, 1), δ > 0 .

Our goal is to show that for any choice of $δ > 0$ , we have that $ζ \leq δ$ , and hence, ζ = 0. Let U_t denote the type of vertex $t \in T$ , and let U₀ denote the type of the root; then,

P (| T | = \infty) \leq δ + P (| T | = \infty, U_{0} > δ) .

Note that ${U_{0} > δ} = {U_{t} > δ \forall t \in T}$ because types must increase along any genealogical path in $T$ . On the event ${U_{t} > δ \forall t \in T}$ , the evolution of the branching process is represented by the operator $T_{δ}$ . More precisely, we have that $T_{δ}^{k} 1 (x), k = 1, 2, \dots$ , where $1 (x) \equiv 1$ denotes constant unit function and describes the expected number of offspring of type $d x$ in the kth generation $T^{k}$ of $T$ ; hence,

E (| T^{k} | | U_{0} > δ) = \frac{1}{1 - δ} \int_{δ}^{1} T_{δ}^{k} 1 (x), d x .

(6)

We assume for the moment that $T_{δ}$ is quasinilpotent for any $δ > 0$ (i.e., $\lim_{k \to \infty} ‖ T_{δ}^{k} ‖ = 0$ , where $‖ \cdot ‖$ denotes the operator norm on $L^{2} [δ, 1]$ ). In particular, this implies that $E (| T^{k} | | U_{0} > δ)$ vanishes as $k \to \infty$ , and hence,

P (| T | = \infty | U_{0} > δ) = \lim_{k \to \infty} P (| T^{k} | \geq 1 | U_{0} > δ) \leq \lim_{k \to \infty} E (| T^{k} | | U_{0} > δ) = 0 .

Thus, it only remains to show that for any fixed $δ > 0, T_{δ}$ is quasinilpotent. Because $y^{- β - 1}$ is bounded on $[δ, 1]$ , it is straightforward to verify that $T_{δ}$ is a compact operator on $L^{2} [δ, 1]$ ; hence, $T_{δ}$ is quasinilpotent precisely if zero is its only eigenvalue (see, e.g., Dunford and Schwartz 1988, section VII 5.13). Assume for contradiction that $T_{δ}$ has the eigenvalue $λ \neq 0$ : that is, that there is some $f \in L^{2} [δ, 1] ∖ {0}$ satisfying for almost every $x \in [δ, 1]$

λ f (x) = T_{δ} f (x) = α x^{β} \int_{x}^{1} y^{- β - 1} f (y) d y .

(7)

Because f is integrable, the right-hand side is a continuous function of x by dominated convergence, and we may thus assume that f has a continuous version g that satisfies (7) for all $x \in (δ, 1)$ . Because g is continuous, it follows from Lebesgue’s differentiation theorem that $d / d x \int_{x}^{1} y^{- 1 - β} g (y) d y$ exists and is equal to $x^{- β - 1} g (x)$ for every $x \in (δ, 1)$ . It follows that $T_{δ} g$ is differentiable and thus, that $g = λ^{- 1} T_{δ} g$ is differentiable (even smooth) on $(δ, 1)$ . In fact, we have, for $x \in (δ, 1)$ ,

\begin{array}{l} λ g' (x) = (T g)' (x) & = α β^{β - 1} \int_{x}^{1} y^{- β - 1} g (y) d y - α x^{- 1} g (x) \\ = \frac{β}{x} T g (x) - \frac{α}{x} g (x) \\ = \frac{λ β - α}{x} g (x) . \end{array}

Furthermore, we have

\lim_{x \to 1} g (x) = \frac{α}{λ} \lim_{x \to 1} \int_{x}^{1} y^{- 1 - β} g (y) d y = 0 .

We conclude that g is a solution to the final value problem

u (1) = 1, u' (x) = c x^{- 1} u (x), with c = β - \frac{α}{λ}

(δ, 1]

. The Picard–Lindelöf theorem now yields that g is unique. If

λ = \frac{α}{β}

, then g vanishes and cannot be an eigenfunction, thus contradicting our initial assumption. If

λ \neq \frac{α}{β}

, then we have

g (x) = x^{β - \frac{α}{λ}} - 1, x \in (δ, 1)

, implying that, for all

x \in (δ, 1)

\begin{array}{l} T g (x) & = α x^{β} \int_{x}^{1} y^{- 1 - \frac{α}{λ}} d y - α x^{β} \int_{x}^{1} y^{- β - 1} d y \\ = λ x^{β - \frac{α}{λ}} - λ x^{β} + \frac{α}{β} x^{β} - \frac{α}{β} \\ = λ g (x) + (λ - λ x^{β} + \frac{α}{β} x^{β} - \frac{α}{β}) . \end{array}

Because g is an eigenfunction for λ, the term in parentheses must vanish (i.e., $λ = \frac{α}{β}$ , which is a contradiction). We conclude that $T_{δ}$ has only the eigenvalue 0 and is thus, quasinilpotent. □

6.2. Concentration Bounds

We begin by stating a straightforward concentration bound on the maximal degree in G_N, which applies not only to our model but also, to any directed inhomogeneous random graph in which κ and $ϕ$ are bounded.

Lemma 1.

Let G_N denote a random graph in which each arc (m, n) is included independently with probability $p_{m n}^{(N)} .$ Suppose that $p_{m n}^{(N)} \leq A / N$ uniformly in m, n for all sufficiently large N and some $A < \infty$ ; then, the maximal total degree Δ_N in G_N satisfies

P (Δ_{N} > 2 A + λ) \leq N e^{- \frac{λ^{2}}{4 A + 2 λ / 3}}

(8)

for any N. In particular, it follows that almost surely

Δ_{N} \in O (\log N) .

Proof.

Let D_n denote the total degree of $n \in V (G_{N})$ . By assumption, D_n is dominated by a $Bin (2 N, A / N)$ random variable; hence,

P (D_{n} \geq 2 A + λ) \leq e^{- \frac{λ^{2}}{4 A + \frac{2 λ}{3}}}, n = 1, \dots, N,

using a classical concentration inequality, such as Chung and Lu (2006, theorem 3.2). Summing over n yields (8), and we note that choosing

λ = λ (N) = C \log N

implies that

N e^{- \frac{λ^{2}}{4 A + 2 λ / 3}} < \infty,

if C is sufficiently large, which allows us to conclude the almost sure bound on Δ_N. □

Proof of Proposition 2.

The statement about the maximal degree is an immediate consequence of Lemma 1. Let us now establish the stated bounds on the degree evolutions. The evolution of the indegree $D_{n}^{in} (m)$ of a fixed vertex m after the arrival of vertex k > m can be modeled using a counting process that jumps by one upon the arrival of the jth vertex after k if this vertex attaches to m. Hence, jumps occur with probability

p_{k + j} (m) = \frac{w_{k + j} (m)}{k + j - 1} \land 1, j \in N_{0},

(9)

for m and k > m fixed. The indegree of vertex m upon the arrival of vertex n is hence given by a generalized binomial distribution and satisfies

E [D_{n}^{in} (m)] = \sum_{j = m + 1}^{n} p_{j} (m) .

(10)

In particular, we have

E [D_{n}^{in} (m)] \leq \frac{α}{β} [1 - {(\frac{m}{n - 1})}^{β}], 1 \leq m < n,

and

E [D_{n}^{in} (m)] \geq \frac{α}{β} [{(1 - \frac{1}{m + 1})}^{β} - {(\frac{m + 1}{n})}^{β}], ⌈ α ⌉ \leq m < n .

The moment-generating function of $D_{n}^{in} (m)$ is given as

E [e^{θ D_{n}^{in} (m)}] = \prod_{j = m + 1}^{n} (1 + p_{j} (m) (e^{θ} - 1)), θ > 0 .

(11)

A bound on the tail of the indegree distribution can now be derived using Markov’s inequality

P [D_{n}^{in} (m) > x] \leq \inf_{θ > 0} e^{- θ x} \prod_{j = m + 1}^{n} (1 + p_{j} (m) (e^{θ} - 1)) .

We obtain

\begin{array}{l} e^{- θ x} \prod_{j = m + 1}^{n} (1 + p_{j} (m) (e^{θ} - 1)) & \leq \exp (- θ x + (e^{θ} - 1) \sum_{j = m + 1}^{n} p_{j} (m)) \\ \leq \exp (- θ x + (e^{θ} - 1) \frac{α}{β}), \end{array}

and the right-hand side is minimized by

θ = \log \frac{x β}{α}

. Thus, for any

n > m \geq 1

, we have the uniform upper tail bound

P [D_{n}^{in} (m) > x] \leq \exp [- \frac{α}{β} - x (\log x + \log \frac{β}{α} - \frac{β}{α})], x > α / β,

which establishes part (i). Turning to part (ii), we recall that the probability that k attaches itself to

i \in [k - 1]

is given by (9). The expected outdegree is hence given by

E [D^{out} (k)] = \sum_{i = 1}^{k - 1} p_{k} (i) = \frac{α}{β + 1} + O (1 / k),

where the error term is simply because of approximating the sum by an integral. □

Proof of the Error Bound in Theorem 1.

We only provided the detailed calculation for the indegree distribution; a similar but simpler argument yields the same bound for the outdegree distribution. Let us set, for each $n \in N$ ,

P_{n} (k) = \frac{1}{n} \sum_{m = 1}^{n} 1 {D_{n}^{in} (m) = k}, k = 0, 1, 2, \dots,

and denote the target distribution by

{(p (k))}_{k = 0}^{\infty}

. We have

p (k) = \int_{0}^{1} e^{- \frac{α}{β} (1 - s^{β})} {(\frac{α}{β} (1 - s^{β}))}^{k} \frac{1}{k!} d s, k = 0, 1, 2, \dots,

and applying the triangle inequality,

\sum_{k = 0}^{\infty} | P_{n} (k) - p (k) | \leq \sum_{k = 0}^{\infty} | P_{n} (k) - E P_{n} (k) | + \sum_{k = 0}^{\infty} | E P_{n} (k) - p (k) |, n \in N .

(12)

Note that the first error is probabilistic and that the second error is essentially just a discretization error because

E P_{n} (k) = \frac{1}{n} \sum_{m = 1}^{n} e^{- \sum_{j = m + 1}^{n} (α \frac{m^{β}}{{(j - 1)}^{β + 1}}) \land 1} \frac{{(\sum_{j = m + 1}^{n} (α \frac{m^{β}}{{(j - 1)}^{β + 1}}) \land 1)}^{k}}{k!}, k \leq n,

(13)

which can be obtained, for example, by differentiating (11). We calculate the bound for each error in the following two lemmas.

Lemma 2.

We have that

\sum_{k = 0}^{\infty} | E P_{n} (k) - p (k) | = O (\frac{\log n}{n}) .

(14)

Proof.

First note that $(α m^{β} / {(j - 1)}^{β + 1}) \land 1 = 1$ only if $m \leq j - 1 \leq n_{0}$ , and thus, the error induced in the calculation of the probability weights by ignoring the truncation is $O (n^{- 1})$ and may thus be ignored. A straightforward integral approximation yields

α m^{β} \sum_{j = m + 1}^{n} {(j - 1)}^{- β - 1} = \frac{α}{β} (1 - {(\frac{m}{n})}^{β}) + O (m^{- 1})

for all sufficiently large n. Because

\frac{d}{d x} e^{- x} x^{k} = (k - x) e^{- x} x^{k - 1}

, we obtain that, for any

C < \infty

, the function

e^{- x} x^{k}

is uniformly Lipschitz continuous in x on

[- C / m, α / β + C / m]

with the m-independent Lipschitz constant

L_{k} = ℓ_{1} k ℓ_{2}^{k - 1}, k \in N,

where

ℓ_{1}, ℓ_{2}

are independent of k. Setting further

L_{0} = K

for some sufficiently large K, we conclude that there is a k-independent constant

C < \infty

, such that

\begin{array}{l} | \frac{1}{n} \sum_{m = 1}^{n} & e^{- \sum_{j = m + 1}^{n} α \frac{m^{β}}{{(j - 1)}^{β + 1}}} {(\sum_{j = m + 1}^{n} α \frac{m^{β}}{{(j - 1)}^{β + 1}})}^{k} \\ - \frac{1}{n} \sum_{m = 1}^{n} e^{- \frac{α}{β} (1 - {(\frac{m}{n})}^{β})} {(\frac{α}{β} (1 - {(\frac{m}{n})}^{β}))}^{k} | \leq \frac{C L_{k}}{n} \sum_{m = 1}^{n} \frac{1}{m} . \end{array}

To approximate the discrete outer sum by an integral, we use that if $f : [a, b] \to R$ is continuously differentiable, then

| \frac{1}{n} \sum_{i = 1}^{n} f (\frac{i (b - a)}{n}) - \int_{a}^{b} f (s) d s | \leq \frac{b - a}{2 n} \sup_{x \in [a, b]} f' (x),

and hence, the constants

L_{k}, k \geq 0

can be used to bound this approximation error as well; we end up with

\begin{array}{l} | \frac{1}{n} \sum_{m = 1}^{n} & e^{- \sum_{j = m + 1}^{n} α \frac{m^{β}}{{(j - 1)}^{β + 1}}} {(\sum_{j = m + 1}^{n} α \frac{m^{β}}{{(j - 1)}^{β + 1}})}^{k} \\ - \int_{0}^{1} e^{- \frac{α}{β} (1 - s^{β})} {(\frac{α}{β} (1 - s^{β}))}^{k} d s | \leq \frac{C_{1} L_{k}}{n} \sum_{m = 1}^{n} \frac{1}{m} + \frac{C_{2}}{n}, \end{array}

where C₁, C₂ are finite constants and independent of n, m, and k. Noting further that

\sum_{k = 0}^{\infty} \frac{L_{k}}{k!} < \infty

, we arrive at (14) because

\sum_{k = 1}^{n} | E P_{n} (k) - p (k) | + \sum_{k = n}^{\infty} p_{k} \leq \frac{(C_{1} + C_{2})}{n} \sum_{m = 1}^{n} \frac{1}{m} \sum_{k = 1}^{n} \frac{L_{k}}{k!} + \sum_{k = n}^{\infty} p_{k} = O (\log n / n)

because

\sum_{k = n}^{\infty} p_{k}

decays exponentially in n. □

Lemma 3.

We have that almost surely

\sum_{k = 0}^{\infty} | P_{n} (k) - E P_{n} (k) | = O (\frac{{(\log n)}^{3 / 2}}{\sqrt{n}}) .

Proof.

Fix k, and write $X_{i} = 1 {D_{n} (i) = k}, i = 1, \dots, n$ . Note that the $X_{i}, i = 1, \dots, n$ are independent, and thus, another invocation of Chernoff’s inequality (see, e.g., Chung and Lu 2006, theorem 3.2) yields that

P (| \sum_{i = 1}^{n} X_{i} - E X_{i} | > n λ) \leq 2 \exp (- \frac{{(λ n)}^{2}}{2 (\sum_{i = 1}^{n} E X_{i} + \frac{λ n}{3})}) = 2 \exp (- \frac{λ^{2} n}{2 (E P_{n} (k) + \frac{λ}{3})}),

(15)

with

λ > 0

. Let E_n denote the event in which there is no vertex

i \in [n]

with

D_{n} (i) \geq C \log n

; then, for any

λ' > 0

\begin{array}{l} P (\sum_{k = 0}^{\infty} | P_{n} (k) - E P_{n} (k) | \geq λ') & \leq P ({\sum_{k = 0}^{\infty} | P_{n} (k) - E P_{n} (k) | \geq λ'} \cap E_{n}) + P (E_{n}^{c}) \\ \leq P (\sum_{k = 0}^{C \log n} | P_{n} (k) - E P_{n} (k) | + \sum_{k = C \log n}^{\infty} E P_{n} (k) \geq λ') + P (E_{n}^{c}) . \end{array}

Arguing as in the proof of Lemma 1, we may choose C so large that $\sum_{n = 1}^{\infty} P (E_{n}^{c}) = B < \infty$ and such that

\sum_{k = C \log n}^{\infty} E P_{n} (k) \leq \frac{1}{n} E # {i \in [n] : D_{n} (i) \geq C \log n} \leq \frac{1}{n}

for all sufficiently large n. Observe further that

\sum_{k = 0}^{C \log n} | P_{n} (k) - E P_{n} (k) | \geq λ' - 1 / n

is only possible if at least one of the terms

| P_{n} (k) - E P_{n} (k) |

exceeds

(λ' - 1 / n) / (C \log n)

, and we conclude that

P (\sum_{k = 0}^{\infty} | P_{n} (k) - E P_{n} (k) | \geq λ') \leq P (E_{n}^{c}) + \sum_{k = 0}^{C \log n} P (| P_{n} (k) - E P_{n} (k) | \geq \frac{λ' - 1 / n}{C \log n}) .

Choosing now $λ = λ (N) = D \sqrt{\log n / n}$ for some sufficiently large D and setting $λ' = λ' (n) = λ C \log n + 1 / n$ allow us to apply (15) to obtain

\sum_{n = 1}^{\infty} P (\sum_{k = 0}^{\infty} | P_{n} (k) - E P_{n} (k) | \geq R \frac{{(\log n)}^{3 / 2}}{\sqrt{n}}) \leq B + C \sum_{n = 1}^{\infty} e^{- λ^{2} n / (2 + 2 λ / 3)} \log n < \infty

for some

R < \infty

, which implies that almost surely

\sum_{k = 0}^{\infty} | P_{n} (k) - E P_{n} (k) | = O (\frac{\log n^{3 / 2}}{\sqrt{n}}) .

□

Combining the previous lemmas concludes the proof of Theorem 1.

6.3. Forward Component Size and Weight Evolution—Proof of Theorem 2

Let $A \subset [n]$ ; then,

w_{n} (A) = \sum_{m \in A} w_{n} (m), n \in N,

and the weight of A upon arrival of vertex n + 1 essentially determines how likely it is that n + 1 connects to a vertex in A. Set

Γ_{n} (m) = | C_{n} (m) |, n_{0} < m \leq n,

and that is,

Γ_{n} (m)

counts how often m has been referenced (directly and indirectly). We have

Γ_{m} (m) = 0

and

Γ_{n + 1} (m) = {\begin{array}{l} Γ_{n} (m), & with probability \prod_{l \in C_{n} (m)} (1 - \frac{w_{n} (l)}{n}), \\ Γ_{n} (m) + 1, & with probability 1 - \prod_{l \in C_{n} (m)} (1 - \frac{w_{n} (l)}{n}), \end{array} n > m .

Hence, transitions of the processes $Γ (m) = {(Γ_{n} (m))}_{n \geq m}$ depend on the weight structure of the forward component of m, not only on its present size. In particular, each $Γ (m)$ is a Markov process with respect to the filtration generated by the graph sequence ${(G_{n})}_{n \geq m}$ but not with respect to the smaller filtration generated by the sequence ${(Γ_{n} (m))}_{n \geq m}$ itself. It is, therefore, more convenient to study the weight processes $W_{n} (m) = w_{n} (C_{n} (m)), n \geq m$ , which have the dynamics

W_{n + 1} (m) = {\begin{array}{l} {(\frac{n}{n + 1})}^{β} W_{n} (m), & with probability \prod_{k \in C_{n} (m)} (1 - \frac{w_{n} (k)}{n}), \\ {(\frac{n}{n + 1})}^{β} W_{n} (m) + α, & with probability 1 - \prod_{k \in C_{n} (m)} (1 - \frac{w_{n} (k)}{n}) . \end{array}

To eliminate the slightly unwieldy products structure of the transition probability, we formulate a stochastic domination result. We use the notation

Δ a_{n} = a_{n + 1} - a_{n}, n = n_{1}, n_{1} + 1, n_{1} + 2, \dots

for the increments of a random or deterministic sequence

{(a_{n})}_{n \geq n_{1}}

and further write

{W_{n + 1} ↑}

for the event that α is added to the score process in step n + 1. The complementary event is denoted by

{W_{n + 1} ↓}

, and we extend this notation to all other processes

X = (X_{n})

for which

Δ X_{n}

can attain precisely two values.

Lemma 4.

Let $Y_{n} (m) = Y_{n} (m; a, b, y)$ denote the Markov chain given by $Y_{m} (m) = y$ and

Δ Y_{n} (m) = {\begin{array}{l} - \frac{b + 1}{n + 1} Y_{n} (m) + \frac{a}{n + 1}, & with probability (1 - e^{- Y_{n} (m)}), \\ - \frac{b + 1}{n + 1} Y_{n} (m), & with probability e^{- Y_{n} (m)}, \end{array} n \geq m .

Then, ${(W_{n} (m) / n)}_{n \geq m}$ stochastically dominates ${(Y_{n} (m; α, β, W_{m} (m) / m))}_{n \geq m}$ .

Proof.

We start by analyzing the transition probabilities of $W_{n} = W_{n} (m) :$

1 - \prod_{k \in C_{n} (m)} (1 - \frac{w_{n} (k)}{n}) = 1 - e^{\sum_{k \in C_{n} (m)} \log (1 - w_{n} (k) / n)}

= 1 - \exp (- \sum_{k \in C_{n} (m)} \sum_{j = 1}^{\infty} \frac{{(\frac{w_{n} (k)}{n})}^{j}}{j}) .

Noting that $w_{n} (k) / n$ is uniformly bounded by $\frac{α}{n}$ , we can estimate

\begin{array}{l} 1 - e^{- \sum_{k \in C_{n} (m)} \frac{w_{n} (k)}{n}} & \leq 1 - \exp (- \sum_{k \in C_{n} (m)} \sum_{j = 1}^{\infty} \frac{{(\frac{w_{n} (k)}{n})}^{j}}{j}) \\ \leq 1 - \exp (- \sum_{k \in C_{n} (m)} (\frac{w_{n} (k)}{n} + (1 + ε) \frac{w_{n} {(k)}^{2}}{2 n^{2}})), \end{array}

(16)

for any

ε > 0

, if n is sufficiently large. Another Taylor expansion yields

1 - \frac{β + 1}{n + 1} \leq {(\frac{n}{n + 1})}^{β + 1} \leq 1 - \frac{β + 1}{n + 1} + \frac{β (β + 1)}{2 {(n + 1)}^{2}}, n \geq m .

(17)

We conclude that

\begin{array}{l} Δ \frac{W_{n}}{n} = (1 - {(\frac{n}{n + 1})}^{β + 1}) \frac{W_{n}}{n} + \frac{α}{n + 1} 1 {n + 1 connects to C_{n} (m)} \\ \geq - \frac{β + 1}{n + 1} \frac{W_{n}}{n} + \frac{α}{n + 1} 1 {n + 1 connects to C_{n} (m)}, \end{array}

and because of the lower bound in (16), the event in the indicator occurs with probability at least

1 - e^{- \sum_{k \in C_{n} (m)} \frac{w_{n} (k)}{n}} = 1 - e^{- W_{n} / n},

given

C_{n} (m)

, which proves the lemma. □

To obtain a stochastic upper bound is only slightly more tricky.

Lemma 5.

Let $Y_{n} (m; a, b, y)$ be defined as in Lemma 4, and let $a > α, b < β$ . There exits n₁ such that for all $n \geq n_{1}$ , we can couple $Y_{n} = Y_{n} (n_{1}; a, b, w)$ and $W_{n} = W_{n} (m)$ conditionally on $\frac{a}{α n_{1}} W_{n_{1}} = w$ such that

Y_{n} \geq \frac{a}{α n} W_{n}, for all n \geq n_{1} .

Proof.

We argue by induction. The claim holds by construction at any given initial time $n_{1} \geq m .$ Now, assume that the coupling has been established for times $n_{1}, \dots, n$ . Then,

Y_{n + 1} = (1 - \frac{b + 1}{n + 1}) Y_{n} + \frac{a}{n + 1} 1 {Y_{n + 1} ↑} \geq (1 - \frac{b + 1}{n + 1}) \frac{a}{α} \frac{W_{n}}{n} + \frac{a}{α} \frac{α}{n + 1} 1 {Y_{n + 1} ↑},

where we have used the definition of Y_n and the induction hypothesis. Now, note that

P (Y_{n + 1} ↑ | W_{n}, Y_{n}) = 1 - e^{- Y_{n}} \geq 1 - e^{- \frac{a}{α n} W_{n}} \geq P (W_{n + 1} ↑ | W_{n}, Y_{n})

by the induction hypothesis and the upper bound in (16) if

n \geq n_{1}

is sufficiently large. We conclude that

W_{n + 1}

and

Y_{n + 1}

can be coupled such that

{W_{n + 1} ↑} \subset {Y_{n + 1} ↑}

, and it follows that under this coupling,

\begin{array}{l} Y_{n + 1} & \geq \frac{a}{α} [(1 - \frac{b + 1}{n + 1}) \frac{W_{n}}{n} + \frac{α}{n + 1} 1 {Y_{n + 1} ↑}] \\ \geq \frac{a}{α} [{(\frac{n}{n + 1})}^{β + 1} \frac{W_{n}}{n} + \frac{α}{n + 1} 1 {W_{n + 1} ↑}] \\ = \frac{a}{α} \frac{W_{n + 1}}{n + 1} \end{array},

where we have used (17) and once more that n₁ can be taken large. □

Lemmas 4 and 5 indicate that the dynamics of ${(C_{n} (m))}_{n \geq m}$ can be analyzed in the terms of the simple processes ${(Y_{n} (m))}_{n \geq m}$ defined in Lemma 4. To this end, fix $a, b > 0$ and Y_m = y, and define for $n \geq m$ ,

M_{n} = Y_{n} - Y_{m} + \sum_{j = m}^{n - 1} {\frac{b + 1}{j + 1} Y_{j} - \frac{a}{j + 1} (1 - e^{- Y_{j}})},

and

Z_{n} = \frac{Y_{n}}{Y_{m}} \exp {\sum_{j = m}^{n - 1} [\frac{b + 1}{j + 1} - \frac{a}{j + 1} \frac{1 - e^{- Y_{j}}}{Y_{j}}]} .

It follows from a straightforward calculation or alternatively, by invoking Brightwell and Luczak (2012, lemmas 2.1 and 2.2) that ${(M_{n})}_{n \geq m}$ is a martingale and that ${(Z_{n})}_{n \geq m}$ is a supermartingale adapted to the filtration ${(F_{n})}_{n \geq m}$ defined by $F_{n} = σ (Y_{j}, m \leq j \leq n)$ .

Lemma 6.

If $0 < a < b + 1$ , then $\lim_{n \to \infty} Y_{n} = 0$ almost surely.

Proof.

Because ${(Z_{n})}_{n \geq m}$ is a nonnegative supermartingale, ${(Z_{n})}_{n \geq m}$ converges almost surely to some random variable $Z_{\infty}$ satisfying $E Z_{\infty} \leq E Z_{m} = 1 .$ Because $a < b + 1$ , it thus follows from

\begin{array}{l} \exp {\sum_{j = m}^{n - 1} [\frac{b + 1}{j + 1} - \frac{α}{j + 1} \frac{1 - e^{- Y_{j}}}{Y_{j}}]} & \geq \exp {\sum_{j = m}^{n - 1} [\frac{b + 1}{j + 1} - \frac{a}{j + 1}]} = n^{1 + b - a + o (1)} \end{array}

that

Z_{\infty}

can only be finite if

{(Y_{n})}_{n \geq m}

converges to zero. Hence,

\lim_{n \to \infty} Y_{n} = 0

almost surely. □

Let us now focus on the case $a > b + 1$ . Performing a Taylor expansion, we see that

a (1 - e^{- y}) - (1 + b) y = (a e^{- y_{*}} - (1 + b)) (y - y_{*}) - \frac{a}{2} ξ (y) e^{- ξ (y)} {(y - y_{*})}^{2},

(18)

where

ξ (y)

is some value between

y_{*}

and y and

y_{*} = y_{*} (a, b)

is the unique positive solution to

1 - e^{- y} = \frac{1 + β}{α} y .

(19)

Using the Martingale ${(M_{n})}_{n \geq m}$ , we can represent Y_n as

\begin{array}{l} Y_{m} = y > 0 \\ Δ Y_{n} = \frac{a}{n + 1} (1 - e^{- Y_{n}}) - \frac{b + 1}{n + 1} Y_{n} + Δ M_{n}, n \geq m . \end{array}

(20)

In view of (18) and (19), the recursion (20) is a stochastic approximation of the stable point $y_{*} .$ We recall the following well-known result about stochastic approximations.

Lemma 7

(Special Case of Pemantle 2007, lemma 2.6). Suppose that ${(X_{n})}_{n \in N}$ is a stochastic process adapted to a filtration ${(G_{n})}_{n \in N}$ satisfying the recurrence relation

Δ X_{n} = c_{n} (f (X_{n}) + ξ_{n + 1}),

where

{(c_{n})}_{n \in N}

is a deterministic sequence,

{(ξ_{n})}_{n \in N}

is a

{(G_{n})}_{n \in N}

-adapted process, and f is a bounded function satisfying

$\sum_{n \in N} c_{n} = \infty, \sum_{n \in N} c_{n}^{2} < \infty$ ;
$E [ξ_{n + 1} | G_{n}] = 0, E [ξ_{n + 1}^{2} | G_{n}] \leq B < \infty$ for all $n \in N$ ; and
either $f ((l_{0}, r_{0})) \subset (- \infty, - δ)$ or $f ((l_{0}, r_{0})) \subset (δ, \infty)$ for some $δ > 0$ and some open interval (l₀, r₀).

Then, we have for any interval $[l, r] \subset (l_{0}, r_{0})$ that

P (X_{n} visits [l, r] only finitely many times) = 1 .

We now have all of the ingredients to complete the proof of Theorem 2.

Proof of Theorem 2.

Let $Γ_{n} = | C_{n} (m) |, n \geq m$ and $W_{n} = w_{n} (C_{n} (m))$ . We estimate

\frac{α}{n^{β}} \sum_{k = m}^{m + Γ_{n} - 1} k^{β} \leq W_{n} \leq \frac{α}{n^{β}} \sum_{k = n - Γ_{n}}^{n} k^{β}, n \geq m .

Approximating the sums by integrals yields

\frac{α}{β + 1} n {(\frac{Γ_{n}}{n})}^{β + 1} + O (n^{- β}) \leq W_{n} \leq \frac{α}{β + 1} n (1 - {(1 - \frac{Γ_{n}}{n})}^{β + 1}) + O (n^{- β}), n \geq m,

from which we conclude that

\lim_{n \to \infty} \frac{Γ_{n}}{n} = Z

exists if and only if

\lim_{n \to \infty} \frac{W_{n}}{n} = w_{\infty}

exists. □

6.3.1. The Subcritical Case $α < β + 1$ .

By Lemma 5, $W_{n} / n$ is eventually dominated by $Y_{n} (n_{1}, α', β', y)$ for some y if $α < α' < β' + 1 < β + 1$ . Because $\lim_{n \to \infty} Y_{n} = 0$ almost surely by Lemma 6, we conclude that

P (Z = w_{\infty} = 0) = 1 .

6.3.2. The Supercritical Case $α > β + 1$ .

We first employ Lemma 4 to dominate $w_{\infty}$ from below by the random limit

Y_{\infty} = Y_{\infty} (α, β) = \lim_{n \to \infty} Y_{n} (m; α, β, W_{m} / m),

which we show to exist now. Rewriting (20), we obtain

Δ Y_{n} = \frac{1}{n + 1} (α (1 - e^{- Y_{n}}) - (β + 1) Y_{n} + (n + 1) Δ M_{n}), n \geq m .

We wish to apply Lemma 7 with $c_{n} = {(n + 1)}^{- 1}, ξ_{n + 1} = (n + 1) Δ M_{n}$ , and

f (x) = 1_{[0, K]} (x) (α (1 - e^{- x}) - (β + 1) x), x \in R .

Let us first check that the condition on ξ_n is satisfied. Clearly, $E [ξ_{n + 1} | Y_{n}] = 0$ because $Δ M_{n}$ is a martingale difference. To see that $E [ξ_{n + 1}^{2} | Y_{n}]$ is bounded, we simply note that

| Δ M_{n} | \leq | Δ Y_{n} | + (1 + Y_{n}) \frac{α + β + 1}{n + 1} \leq \frac{C}{n + 1}

for some constant C by definition of Y_n and the fact that Y_n is bounded; hence, ξ_n is bounded. Let us now analyze the recursion function f. Note that the truncation of f is necessary because of the assumption of boundedness in Lemma 7, but this is not a problem because clearly,

Y_{n} > 0

for all n,

Y_{n} \leq W_{n} / n

because of the coupling, and

W_{n} / n

is easily seen to be bounded by some deterministic value

K > \frac{α}{β + 1}

(mind that

W_{n} / n \leq \frac{α}{β + 1} + o (1)

). Note that f is continuous on

[0, K]

with f(x) = 0 for

x \in [0, K]

if and only if

x \in {0, y_{*}}

, where

y_{*} = y_{*} (α, β)

is given in (19). Note further that f(x) > 0 if

x \in (0, y_{*})

and f(x) < 0 if

x \in (y_{*}, K)

. Continuity of f now implies that for any sufficiently small

ε_{0} > 0

, there exists a

δ_{0} > 0

such that

f (x) > δ_{0} for all x \in (ε_{0}, y_{*} - ε) and f (x) < - δ_{0} for all x \in (y_{*} + ε_{0}, \frac{α}{β + 1} + ε_{0}),

where we note that

y_{*} < α / (β + 1)

for any choice of

α > β + 1

. By Lemma 7, we deduce that any closed bounded interval not including either zero or

y_{*}

is visited only finitely often, and hence,

{(Y_{n})}_{n \geq m}

converges almost surely to a random variable

Y_{\infty}

with distribution

(1 - p) δ_{0} + p δ_{y_{*}}, for some p \in [0, 1] .

We say ${(Y_{n})}_{n \geq m}$ survives if $Y_{\infty} = y_{*}$ and that it dies out if $Y_{\infty} = 0$ . We are going to show now that p > 0 (i.e., ${(Y_{n})}_{n \geq m}$ survives with positive probability). Observe that we have

Δ E [Y_{n} | F_{m}] = \frac{α}{n + 1} E [1 - e^{- Y_{n}} | F_{m}] - \frac{β + 1}{n + 1} E [Y_{n} | F_{m}], n \geq k .

Note that if ${(Y_{n})}_{n \geq m} dies out$ almost surely, then we have that $\lim_{k \to \infty} \sup_{n \geq k} E [Y_{n} | F_{k}] = 0$ almost surely. Because $E [1 - e^{- Y_{n}} | F_{k}] = E [Y_{n} | F_{k}] + O (E {[Y_{n} | F_{k}]}^{2})$ as $E [Y_{n} | F_{k}]$ vanishes, it follows that for every $ε \in (0, α - β - 1)$ , there exists some K₀ such that

Δ E [Y_{n} | F_{k}] \geq \frac{α - β - 1 - ε}{n + 1} E [Y_{n} | F_{k}], n \geq k \geq K_{0} .

Denoting $ζ = α - β - 1 - ε > 0$ and setting $Y_{k} ≕ a (k) > 0$ , it follows that ${(E [Y_{n} | F_{k}])}_{n \geq k}$ dominates the unique solution of the difference equation

a_{k} = a (k), Δ a_{n} = \frac{ζ}{n + 1} a_{n}, n \geq k,

which satisfies

\lim_{n \to \infty} a_{n} = \infty

for any initial condition a(k) > 0. This is a contradiction to

\lim_{k \to \infty} \sup_{n \geq k} E [Y_{n} | F_{k}] = 0

, and we conclude that

p = P ({(Y_{n})}_{n \geq m} survives) > 0 .

Combining Lemmas 4 and 5, we obtain that

y_{*} (α, β) \leq w_{\infty} \leq y_{*} (α + ε, β - ε)

for all sufficiently small ε, and because the root

y_{*} (a, b)

of (19) depends continuously on a and b, we conclude that

w_{\infty} = \lim_{n \to \infty} W_{n} / n = y_{*} (α, β) > 0

almost surely on the event of survival. It follows that, almost surely on survival,

\lim_{n \to \infty} Γ_{n} / n ≕ γ > 0

. Let q_n denote the probability that a uniformly chosen vertex in G_n is contained in

C_{n} (m)

; then,

\lim_{n \to \infty} q_{n} = γ P (\lim_{n \to \infty} Γ_{n} / n > 0)

, and for any

m \geq n

q_{n} = \frac{1}{n} + \frac{1}{n} \sum_{k = m}^{n - 1} P (k + 1 \in C_{n} (m)) = \frac{1}{n} + \frac{1}{n} E (\sum_{k = m}^{n - 1} E [1 {k + 1 \in C_{k + 1} (m)} | G_{k}]) .

Note that $E [1 {k + 1 \in C_{k + 1} (m)} | G_{k}] = 1 - e^{- W_{k} / k}$ and that for any $ε \in (0, 1)$ ,

\sum_{k = ⌈ ε n ⌉ - 1}^{n - 1} E (1 - e^{- W_{k} / k}) \leq \sum_{k = m}^{n - 1} E (1 - e^{- W_{k} / k}) \leq \sum_{k = [ε n] - 1}^{n - 1} E (1 - e^{- W_{k} / k}) + ε n + 1

if n is sufficiently large. We infer that

γ = (1 + β) α^{- 1} y_{*} (α, β)

, and this concludes the proof of the critical case by establishing a one-to-one relation between

Y_{\infty}

and the random variable

Z = Z (m)

as defined in the theorem.

6.3.3. The Critical Case $α = β + 1$ .

From the supercritical case, we infer that almost surely γ = 0 in this case because γ is a monotone function of α and $\lim_{α ↓ β + 1} y_{*} (α, β) = 0$ .

Remark 3.

The original version of lemma 7 in Pemantle (2007) also allows us to include an error term R_n in the recursion as long as $\sum_{n} c_{n} R_{n} < \infty$ . In principle, one can apply this version of the stochastic approximation result directly to $W_{n} / n$ . We have chosen to include the intermediate step via Y_n because it breaks up the error estimates into several parts and makes the argument easier to follow because ${(Y_{n})}_{n \geq m}$ has a rather simple form.
Note that the subcritical result also follows from the supercritical/critical case by monotonicity, but we gave a standalone argument because the supercritical case is more involved than the direct proof. Unfortunately, this simple argument does not carry over to the critical case.

References

Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512.Google Scholar
Blockchain Luxembourg S.A. (2021a) Average block size—Blockchain. Accessed May 19, 2023, https://blockchain.info/de/charts/avg-block-size.Google Scholar
Blockchain Luxembourg S.A. (2021b) Median confirmation time—Blockchain. Accessed May 19, 2023, https://www.blockchain.com/charts/median-confirmation-time.Google Scholar
Blockchain Luxembourg S.A. (2021c) Transaction rate—Blockchain. Accessed May 19, 2023, https://blockchain.info/de/charts/transactions-per-second.Google Scholar
Bloznelis M, Götze F, Jaworski J (2012) Birth of a strongly connected giant in an inhomogeneous random digraph. J. Appl. Probab. 49(3):601–611.Google Scholar
Bollobás B, Riordan O (2004) The phase transition and connectedness in uniformly grown random graphs. Leonardi S, ed. Algorithms and Models for the Web-Graph (WAW 2004), Lecture Notes in Computer Science, vol. 3243 (Springer, Berlin), 1–18. https://doi.org/10.1007/978-3-540-30216-2_1.Google Scholar
Bollobás B, Janson S, Riordan O (2005) The phase transition in the uniformly grown random graph has infinite order. Random Structures Algorithms 26(1–2):1–36.Google Scholar
Bollobás B, Janson S, Riordan O (2007) The phase transition in inhomogeneous random graphs. Random Structures Algorithms 31(1):3–122.Google Scholar
Bollobás B, Riordan O, Spencer J, Tusnády G (2001) The degree sequence of a scale-free random graph process. Random Structures Algorithms 18(3):279–290.Google Scholar
Brightwell G, Luczak M (2012) Vertices of high degree in the preferential attachment tree. Electronic J. Probab. 17(14):1–43.Google Scholar
Cao J, Olvera-Cravioto M (2020) Connectivity of a general class of inhomogeneous random digraphs. Random Structures Algorithms 56(3):722–774.Google Scholar
Chung F, Lu L (2006) Concentration inequalities and martingale inequalities: A survey. Internet Math. 3(1):79–127.Google Scholar
Croman K, Decker C, Eyal I, Gencer AE, Juels A, Kosba A, Miller A, et al. (2016) On scaling decentralized blockchains. Clark J, Meiklejohn S, Ryan P, Wallach D, Brenner M, Rohloff K, eds. Financial Cryptography and Data Security. FC 2016, Lecture Notes in Computer Science, vol. 9604 (Springer, Berlin, Heidelberg), 106–125. https://doi.org/10.1007/978-3-662-53357-4_8.Google Scholar
Decker C, Wattenhofer R (2013) Information propagation in the Bitcoin network. IEEE 13th Internat. Conf. Peer-to-Peer Comput. (P2P) (IEEE, Piscataway, NJ), 1–10. https://doi.org/10.1109/P2P.2013.6688704.Google Scholar
Dereich S, Mörters P (2009) Random networks with sublinear preferential attachment: Degree evolutions. Electronic J. Probab. 14(43):1222–1267.Google Scholar
Dereich S, Mörters P (2013) Random networks with sublinear preferential attachment: The giant component. Ann. Probab. 41(1):329–384.Google Scholar
Donet JAD, Perez-Sola C, Herrera-Joancomarti J (2014) The Bitcoin P2P network. Böhme R, Brenner M, Moore T, Smith M, eds. Financial Cryptography and Data Security. FC 2014, Lecture Notes in Computer Science, vol. 8438 (Springer, Berlin, Heidelberg), 87–102. https://doi.org/10.1007/978-3-662-44774-1_7.Google Scholar
Dunford N, Schwartz JT (1988) Linear Operators. Part II (John Wiley & Sons, Inc., New York).Google Scholar
Durrett R, Kesten H (1990) The Critical Parameter for Connectedness of Some Random Graphs. A Tribute to Paul Erdős (Cambridge University Press, Cambridge, UK), 161–176.Google Scholar
Fadhil M, Owenson G, Adda M (2016) A Bitcoin model for evaluation of clustering to improve propagation delay in Bitcoin network. 2016 IEEE Internat. Conf. Comput. Sci. Engrg. (CSE) and IEEE Internat. Conf. Embedded Ubiquitous Comput. (EUC) and 15th Internat. Sympos. Distributed Comput. Appl. Bus. Engrg. (DCABES) (IEEE, Piscataway, NJ), 468–475. https://doi.org/10.1109/CSE-EUC-DCABES.2016.226.Google Scholar
Fadhil M, Owenson G, Adda M (2017) Locality based approach to improve propagation delay on the Bitcoin peer-to-peer network. IFIP/IEEE Sympos. Integrated Network Service Management (IM) (IEEE, Piscataway, NJ), 556–559. https://doi.org/10.23919/INM.2017.7987328.Google Scholar
Feller W (1950) An Introduction to Probability Theory and Its Applications, vol. I (John Wiley & Sons, Inc., New York).Google Scholar
Ferraro P, King C, Shorten R (2020) On the stability of unverified transactions in a DAG-based distributed ledger. IEEE Trans. Automatic Control 65(9):3772–3783.Google Scholar
Gobel J, Krzesinski AE (2017) Increased block size and Bitcoin blockchain dynamics. 27th IEEE Internat. Telecomm. Networks Appl. Conf. (ITNAC) (IEEE, Piscataway, NJ), 1–6. https://doi.org/10.1109/ATNAC.2017.8215367.Google Scholar
Kalikow S, Weiss B (1988) When are random graphs connected. Israel J. Math. 62(3):257–268.Google Scholar
Kusmierz B, Gal A (2018) Probability of being left behind and probability of becoming permanent tip in the Tangle v0.2. Working paper, IOTA Foundation.Google Scholar
Lewenberg Y, Sompolinsky Y, Zohar A (2015) Inclusive block chain protocols. Böhme R, Okamoto T, eds. Financial Cryptography and Data Security. FC 2015, Lecture Notes in Computer Science, vol. 8975 (Springer, Berlin, Heidelberg), 528–547. https://doi.org/10.1007/978-3-662-47854-7_33.Google Scholar
Lyon MR, Mahmoud HM (2020) Trees grown under young-age preferential attachment. J. Appl. Probab. 57(3):9111–927.Google Scholar
Müller S, Penzkofer A, Polyanskii N, Theis J, Sanders W, Moog H (2022) Tangle 2.0 leaderless Nakamoto consensus on the heaviest DAG. IEEE Access 10:105807–105842. https://doi.org/10.1109/ACCESS.2022.3211422.Google Scholar
Nakamoto S (2006) Bitcoin: A peer-to-peer electronic cash system. White paper.Google Scholar
Pappalardo G, Di Matteo T, Caldarelli G, Aste T (2017) Blockchain inefficiency in the Bitcoin peers network. EPJ Data Sci. 7:30. https://doi.org/10.1140/epjds/s13688-018-0159-3.Google Scholar
Pemantle R (2007) A survey of random processes with reinforcement. Probab. Surveys 4:1–79.Google Scholar
Popov S (2016) The Tangle. Working paper, IOTA Foundation.Google Scholar
Popov S, Moog H, Camargo D, Capossele A, Dimitrov V, Gal A, Greve A, et al. (2020) The Coordicide. Working paper, IOTA Foundation.Google Scholar
Shepp LA (1989) Connectedness of certain random graphs. Israel J. Math. 67(1):23–33.Google Scholar
Sompolinsky Y, Wyborski S, Zohar A (2021) Phantom Ghostdag: A scalable generalization of Nakamoto consensus. Proc. 3rd ACM Conf. Adv. Financial Tech. (AFT ’21) (Association for Computing Machinery, New York), 57–70. https://doi.org/10.1145/3479722.3480990.Google Scholar
Sompolinsky Y, Lewenberg Y, Zohar A (2016) Spectre: A fast and scalable cryptocurrency protocol. IACR Cryptology ePrint Archive 1159. Working paper. https://eprint.iacr.org/2016/1159.Google Scholar
van der Hofstad R (2017) Random Graphs and Complex Networks, vol. 1, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK).Google Scholar
van der Hofstad R (2021) Random Graphs and Complex Networks, vol. 2, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK).Google Scholar
Wang Q, Yu J, Chen S, Xiang Y (2022) SoK: Diving into DAG-based blockchain systems. Preprint, submitted October 29, https://doi.org/10.48550/arXiv.2012.06128.Google Scholar
Yeow K, Gani A, Ahmad RW, Rodrigues JJ, Ko K (2017) Decentralized consensus for edge-centric Internet of things: A review, taxonomy, and research issues. IEEE Access 6:1513–1524. https://doi.org/10.1109/ACCESS.2017.2779263.Google Scholar
Yin H, Guo D, Wang K, Jiang Z, Lyu Y, Xing J (2018) Hyperconnected network: A decentralized trusted computing and networking paradigm. IEEE Network 32(1):112–117.Google Scholar
Zyskind G, Nathan O, Pentland A (2015) Decentralizing privacy: Using blockchain to protect personal data. 2015 IEEE Security Privacy Workshops (IEEE, Piscataway, NJ), 180–184.Google Scholar

Volume 13, Issue 3

September 2023

Pages 321-397

Article Information

Metrics

Information

Received:March 24, 2022
Accepted:April 12, 2023
Published Online:June 05, 2023

Cite as

Christian Mönch, Amr Rizk (2023) Directed Acyclic Graph-Type Distributed Ledgers via Young-Age Preferential Attachment. Stochastic Systems 13(3):377-397.

https://doi.org/10.1287/stsy.2022.0005

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Directed Acyclic Graph-Type Distributed Ledgers via Young-Age Preferential Attachment

Abstract

1. Motivation and Background

1.1. Introduction

1.2. Related Literature About DLTs

1.2.1. Organization of the Paper.

2. Model and Main Results

2.1. Model Definition

2.2. Key Properties of the YAPA Model

3. Further Related Work and Discussion of Results

3.1. Design of Tip Selection for DAG-Type Distributed Ledgers

3.2. Related Mathematical Results

4. Simulation

5. Implications for DLT Applications

5.1. Tip Selection

5.2. Laziness and Permanent Tips

5.3. Consensus and Time to Confirmation

6. Proofs

6.1. Directed Inhomogeneous Random Graphs and Branching Process Approximation

6.2. Concentration Bounds

6.3. Forward Component Size and Weight Evolution—Proof of Theorem 2

6.3.1. The Subcritical Case $α < β + 1$ .

6.3.2. The Supercritical Case $α > β + 1$ .

6.3.3. The Critical Case $α = β + 1$ .

References

Volume 13, Issue 3

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News

Available Issues

Available Issues

Directed Acyclic Graph-Type Distributed Ledgers via Young-Age Preferential Attachment

Abstract

1. Motivation and Background

1.1. Introduction

1.2. Related Literature About DLTs

1.2.1. Organization of the Paper.

2. Model and Main Results

2.1. Model Definition

2.2. Key Properties of the YAPA Model

3. Further Related Work and Discussion of Results

3.1. Design of Tip Selection for DAG-Type Distributed Ledgers

3.2. Related Mathematical Results

4. Simulation

5. Implications for DLT Applications

5.1. Tip Selection

5.2. Laziness and Permanent Tips

5.3. Consensus and Time to Confirmation

6. Proofs

6.1. Directed Inhomogeneous Random Graphs and Branching Process Approximation

6.2. Concentration Bounds

6.3. Forward Component Size and Weight Evolution—Proof of Theorem 2

6.3.1. The Subcritical Case α<β +1.

6.3.2. The Supercritical Case α > β + 1.

6.3.3. The Critical Case α=β+1.

References

Volume 13, Issue 3

Article Information

Metrics

Information

Cite as

Keywords

6.3.1. The Subcritical Case $α < β + 1$ .

6.3.2. The Supercritical Case $α > β + 1$ .

6.3.3. The Critical Case $α = β + 1$ .