Directed Acyclic Graph-Type Distributed Ledgers via Young-Age Preferential Attachment

Published Online:https://doi.org/10.1287/stsy.2022.0005

Abstract

Distributed ledger technologies provide a mechanism to achieve ordering among transactions that are scattered on multiple participants with no prerequisite trust relations. This mechanism is essentially based on the idea of new transactions referencing older ones in a chain structure. Recently, directed acyclic graph (DAG)-type distributed ledgers that are based on DAGs were proposed to increase the system scalability through sacrificing the total order of transactions. In this paper, we develop a mathematical model to study the process that governs the addition of new transactions to the DAG-type distributed ledger. We propose a simple model for DAG-type distributed ledgers that are obtained from a recursive young-age preferential attachment scheme (i.e., new connections are made preferably to transactions that have not been in the system for very long). We determine the asymptotic degree structure of the resulting graph and show that a forward component of linear size arises if the edge density is chosen sufficiently large in relation to the “young-age preference” that tunes how quickly old transactions become unattractive.

Funding: The research of C. Mönch is supported by the Deutsche Forschungsgemeinschaft [Grant 443916008].

1. Motivation and Background

1.1. Introduction

Distributed ledger technologies (DLTs), which are based on directed acyclic graphs (DAGs) such as the Tangle (Popov 2016) or Coordicide (Popov et al. 2020), generalize blockchain-based DLTs by weakening the ordering among the transaction blocks that constitute the ledger. Blockchain-based DLTs achieve total ordering of the transactions contained by the ledger through the rule of the longest chain as coined in Nakamoto (2006). Here, transactions are collected into blocks that each include a reference to the previous block such that a chain of ordered transactions emerges. When blocks are created concurrently such that only a partial order exists, the rule of the longest chain dictates that blocks that do not belong to the longest chain are discarded. This is a major factor that impairs the scalability of blockchain-based DLTs in terms of the rate of adding transactions to the ledger.

DAG-type distributed ledgers achieve scalability through allowing partially ordered transaction blocks (i.e., transaction blocks may reference more than one previous block). In this paper, we study the relation of design parameters for DAG-type DLTs based on the transient and limiting properties of the underlying transaction graph. We consider the graph of transaction blocks that emerges through block attachment (i.e., new blocks referencing preceding ones). We focus on two design choices for DAG-type DLTs.

  • How many preceding blocks should a new block reference?

  • How should the age of a preceding block impact the reference choice of a new block?

The rationale behind these two choices lies in the formulation of simple, probabilistic block reference rules that guarantee the existence of a giant forward component based on (i) a simple function of the number of preceding blocks to reference and (ii) an age preference parameter. A preference for old or high-degree vertices creates a graph architecture that is locally tree like and features vertices of arbitrary high degree, which differs extremely from the linear structure of blockchain-based DLT. Young-age preferential attachment (YAPA) interpolates between these two structural extremes and generates a sparse, locally tree-like graph in which there are no hubs (i.e., the maximal degree grows only weakly with the system). We provide a simple reference rule that allows us to study the two questions analytically and investigate some transient and asymptotic properties of the transaction block graph, such as the reference degree distribution, degree evolutions, the existence of a giant forward component, and the size of the “detached surface” of vertices in the transaction graph that do not connect to any previous transactions.

1.2. Related Literature About DLTs

Blockchain technology is based on the following fundamental principles. (i) Decentralization. Because of its distributed architecture, a blockchain does not rely on a central server but rather, stores data in a decentralized manner, and it can be updated by any node in the network. (ii) Transparency. Each node (peer) has a local copy of the blockchain, and updates are sent as a broadcast to all nodes. Therefore, any change to the blockchain is transparent to all members of the network. (iii) Open source. The original blockchain protocol introduced with Bitcoin is publicly available and can be used for development. (iv) Autonomy. All peers can update or transmit the stored data. Each node can verify the accuracy of all data received. This eliminates the need for trust between participants. (v) Immutability. Once a transaction block is part of the blockchain, it is stored publicly forever, and its contents (i.e., transactions) cannot be edited without changing the hash value stored in all subsequent blocks. (vi) Anonymity. Data transmission in the blockchain is done via addresses, and therefore, user identities remain hidden.

Current blockchain-based cryptocurrencies are often presented as an alternative to traditional currencies. However, there are questions about the scalability and performance of distributed ledgers compared with established payment systems (Croman et al. 2016, Pappalardo et al. 2017). Currently, the Bitcoin network is only capable of processing very few transactions per second (Blockchain Luxembourg S.A. 2021c), and it takes up to 20 minutes for a newly posted transaction to be processed (Blockchain Luxembourg S.A. 2021b). Established payment processors, on the other hand, process up to four orders more transactions per second, and new transactions are usually processed within a few seconds. A significant influence on the performance of the globally distributed peer-to-peer Bitcoin system (Donet et al. 2014) is the propagation speed of the blocks in the network (Decker and Wattenhofer 2013). For example, for a 1-MB block, which is roughly the average block size (Blockchain Luxembourg S.A. 2021a), it takes about 2.4 minutes (Croman et al. 2016) for 90% of the nodes to receive this block. With the block interval of 15 minutes used by the Bitcoin network, this poses a significant problem, which increasingly leads to the splitting of the blockchain—the so-called blockchain forks (Decker and Wattenhofer 2013). The emergence of such forks significantly decreases the performance of the distributed ledger system.

Although approaches already exist to optimize the propagation speed of information in the distributed ledger network (Fadhil et al. 2016, 2017), these also do not reach the performance of current financial transaction systems. In addition to optimizing the propagation speed, other improvements arise in the communication protocol used (Croman et al. 2016, Gobel and Krzesinski 2017). One way to optimize the current Bitcoin protocol in terms of transmission speed is to minimize unnecessary transmissions by adjusting the sequence of exchanged messages needed to transmit new blocks. Instead of first publishing the availability of information, blocks that do not exceed a certain threshold in terms of size can thus be distributed directly to neighboring nodes. This can increase the effective transmission speed especially for small blocks, where the protocol overhead has a particularly large impact. Decker and Wattenhofer (2013) have shown that such protocol adaptations can reduce blockchain forks by about 50%.

To increase scalability and transaction throughput, nonlinear graph-based distributed ledgers (Lewenberg et al. 2015, Popov 2016, Sompolinsky et al. 2016, Yeow et al. 2017, Sompolinsky et al. 2021) replace the traditional blockchain with a global DAG that contains all recorded transactions. In this graph, the nodes are transaction blocks (consisting of one or more transactions), whereas the edges indicate the validation of previous blocks/transactions. Each added node should validate k1 transactions, where k can be fixed or random. A key advantage of DAG-based systems is that by deviating from the linear longest blockchain rule, scalability can be greatly improved. The longest blockchain rule essentially requires all nodes to know about newly created blocks quickly, thus limiting the creation of blocks to allow for the propagation of this knowledge before a new block is created. Instead of trusting the longest-chain rule as given in blockchain-distributed ledgers, DAG-based distributed ledgers rely on metrics, such as the largest strongly connected cluster of transaction nodes in the transaction graph, to determine the trusted transactions (Sompolinsky et al. 2021).

Beyond financial transactions, there exist many application areas for distributed ledgers; for example, in the context of distributed and secure storage of personal data (Zyskind et al. 2015) or for asynchronous messaging services in the context of distributed computing (Yin et al. 2018). Particularly, the former use case is of high relevance because of increasingly frequent incidents in which personal data are publicly accessible because of security breaches or data misuse. Here, the use of distributed ledgers as a system for distributed access control offers a potential solution. Instead of storing data centrally and granting direct access, queries are mapped as well as sharing information via transactions. Combined with off-chain storage of sensitive data, it is thus possible to develop a distributed ledgers system in which each user retains full control over their own data. In addition to increased security, this has the advantage that laws and other regulations can be implemented directly in the protocol used in the form of policies (Zyskind et al. 2015).

1.2.1. Organization of the Paper.

In the following section, we introduce the model formally and present our mathematical main results. Section 3 contains further discussion of how our results relate to previous work and of the practical implication of our results for DLT applications. Section 5 contains numerical simulations. Finally, detailed proofs of our main results are provided in Section 6.

2. Model and Main Results

2.1. Model Definition

The model we propose is based on young-age preferential attachment. We study a recursive growth parametrized by two numbers, the edge density parameter α>0 and the reinforcement bias β>0. We use the shorthand notations [n] for {1,,n} and xy,xy to denote min(x,y) and max(x,y), respectively. At step n = 1, the graph G1 is initialized with a vertex labeled one. At any step n > 1, one vertex labeled n is added to the system, and Gn is obtained recursively from Gn1 according to the following attachment rule. For each m[n1], n sends an arc to m independently of everything else with probability

αn1(mn1)β1.(1)

We formally write Gn=([n],En) with En[n]×[n] for the graph obtained upon adding the nth vertex. If n sends an arc to m[n1], we denote this event by nm. The notation nm is shorthand for the existence of a directed path from n to m[n1]. Note that the direction of the arcs along any path can be recovered from the labels of the vertices that belong to the path (Figure 1).

Figure 1. A Depiction of Sample Realizations of the Young-Age Preferential Attachment Model with Reinforcement Bias β = 2 and Edge Density α{3,3.5,4}
Notes. The node size and color correspond to the node indegree. (a) α = 3. (b) α=3.5. (c) α = 4.

2.2. Key Properties of the YAPA Model

Ignoring the dynamical aspect of our model, it can be viewed as an instance of the directed inhomogeneous random graph model proposed in Cao and Olvera-Cravioto (2020), which allows us to infer some basic properties of the network from known results about this model class. We first establish that the young-age preferential attachment model produces a sparse graph (i.e., the number of arcs is of the same order as the number of vertices).

Proposition 1

(Sparsity). It holds that

limN|EN|N=αβ+1 in L1 and almost surely.

Next, we determine the distributional limit of the vertex degrees in Gn. For any vertex m[n], we denote by Dn(m)=(Dnin(m),Dnout(m)) its degree (vector) in Gn: that is,

Dnin(m)=|{k:km in Gn}| and Dnout(m)=|{k:km in Gn}|.

Note that the outdegree of a vertex is determined once and for all upon its insertion (i.e., Dnout(m)=Dmout(m)Dout(m) for all nm). As can be guessed from the fact that connections occur independently and with probability of order 1/n, the asymptotic degree distribution is bivariate Poisson.

Theorem 1

(Limiting Degree Distribution and Rate of Convergence). Let GN denote the young-age preferential attachment graph on N vertices. For U[N] uniformly chosen among the vertices of GN, the following asymptotic results hold:

DN(U)=(DNin(U),Dout(U))(Zin,Zout) in distribution and in L1 as N.

Here, Zin and Zout are independent, and Zout is Poisson(α/(β+1)) distributed. The limiting indegree Zin has distribution mixed-Poisson(Λ) with mixing random variable Λ=α(1Vβ)/β, where V is Uniform[0,1]. Moreover, we have that

dTV(Pin/out(N),pin/out)=O((log N)3N),almost surely as N,(2)
where Pin/out(N)=(Pkin/out(N))k=0 denotes the empirical in- or outdegree distribution of GN, pin/out=(pkin/out)k=0 denotes the distribution of Zin/out, and dTV(·,·) denotes total variation distance.

Remark 1.

The form of the limiting degree distribution in Theorem 1 is obtained rather directly by applying the machinery of Cao and Olvera-Cravioto (2020), but the error bound (2) is new and does not follow from previous results about inhomogeneous random graphs. Inspection of the proof further yields that an error of the same order can be achieved for any graph on N vertices in which edges are drawn independently with probability at most c/N for some universal c (cf. Lemma 3). We do not believe that the logarithmic term is asymptotically sharp, but the polynomial order 1/N is consistent with known results for other sparse models with independent edges, such as the Erdös–Rényi graph (cf. van der Hofstad 2017, theorem 5.12).

It follows from Theorem 1 that whereas the outdegree is asymptotically independent of the time at which a node enters the system, the indegree is not. Essentially, the limiting indegree distribution of a given node is also Poisson(α/(β+1)), but the time needed to collect all incoming links grows proportionally with network size. A node born at the beginning corresponds to V near zero, and a node born at a time close to the total age of the system corresponds to V close to one.

The indegree dynamics of fixed vertices are further characterized by the following bounds.

Proposition 2

(Degree Evolutions). The maximal (total) degree of any vertex in GN is almost surely O(log N). We further have that

  1. for any nm1,

    P[Dnin(m)>x]exp[αβx(log x+logβαβα)],x>α/β;

  2. E[Dout(m)]=αβ+1+O(1/m) for any m1.

Let us now discuss the connectivity of the YAPA network. The forward component of m[n1] after the arrival of vertex n, 1mn is given by

Cn(m)={k:km in Gn}{m},
and for the distributed ledger application discussed in Section 1, it is desirable that the network be rooted (i.e., Cn(1)=[n]). Because of the stochastic nature of the attachment mechanism, this is typically not the case but can formally be achieved by augmenting the network by some root vertex ρ that deterministically connects to precisely those vertices that have outdegree 0. Of course, the proportion of vertices that fail to connect to any existing vertices should be rather small, which is ensured by the light tails of the limiting Poisson distribution.

Corollary 1

(Detached Surface). We have that

limN|{m{2,,N}:Dout(m)=0}|N=eα/(β+1) almost surely and in L1.

Although there is still a positive fraction of nodes with no left neighbors, this fraction is not very large; for instance, if we would like to make sure that with probability at least 0.995, a typical vertex sends an arc to at least one vertex, it suffices to choose α>log(200)(β+1)5.3(β+1).

Next, we would like to ensure that the vast majority of vertices that do connect upon arrival in the system are contained in the forward component of only a handful of vertices. This can only be achieved if there exist vertices m such that CN(m) grows linearly with N. Our first result concerning such macroscopic components states that the forward component of a typical vertex is not of linear order.

Proposition 3

(No Giant Forward Components for Typical Vertices). Let U[N] be uniformly chosen; then, for any ε>0, we have

limNP(|CN(U)|εN)=0.

Remark 2.

As for Theorem 1, the neighborhood around a typically chosen vertex can be described by the local limit approach used in Cao and Olvera-Cravioto (2020). It is not hard to see that the weak local limit of CN(U) is in fact the family tree obtained from the multitype Galton–Watson process given by the following recursion.

  • Sample V[0,1] uniformly. Generation 0 of the tree comprises the root, which has type V.

  • For any k1, given generation k − 1 of the tree, each member m in generation k − 1 of type Vm independently produces a Poisson(α(1Vmβ)/β)-distributed number of offspring to obtain the kth generation, and each of the offspring thus produced is independently assigned a type drawn uniformly from [Vm,1].

The proof of Proposition 3 in Section 6 shows that the local limit tree dies out for any choice of α,β>0. Contrary to the local behavior near a typical vertex, if α>β+1, most vertices indeed belong to the forward components of a few old vertices. More precisely, let n0=1/α; then, the vertices 1,,n0 are always sequentially connected by construction, and we thus focus on the forward components of vertices added after n0.

Theorem 2

(Existence of Giant Forward Components). Let m>n0 be given. Then, almost surely,

limN|CN(m)|N={0,if αβ+1,Z(m),if α>β+1,
where Z(m) is a nontrivial random variable that is concentrated on the two solutions 0 and γ>0 of the fixed point equation
x=1eαβ+1x,x[0,).

Theorem 2 is the main result of this article. The apparent differences in the characterizations obtained in Proposition 3 and Theorem 2 are because of the reducibility of the graph. Proposition 3 is concerned with the forward component of a typical (i.e., uniformly chosen) vertex. Such a typical vertex has a birth time of order N and can simply not collect enough direct and indirect references up to time N to develop a large forward component. In particular, applying the machinery of Cao and Olvera-Cravioto (2020) (after some adaptations to account for irreducibility) yields that typical vertices do not have a giant forward component; see the proof of Proposition 3.

In contrast, Theorem 2 states that if an early vertex is kept fixed and the time horizon is extended to infinity, with positive probability, the selected vertex will be the root of a forward component of linear size eventually. We stress that it is not possible to derive statements of this type directly using the general theory of Cao and Olvera-Cravioto (2020), which uses the local weak limit of the graph (i.e., limit distributions of neighborhoods of typical vertices) and therefore, does not provide sufficient information about vertices with index n=o(N).

3. Further Related Work and Discussion of Results

3.1. Design of Tip Selection for DAG-Type Distributed Ledgers

DAG-type distributed ledgers, such as IOTA (Popov 2016), utilize different algorithms for achieving consensus. State-of-the-art DAG-type DLTs repurpose the act of selecting transactions for validation (i.e., constructing arcs from an arriving node to previous ones for achieving consensus). In a nutshell, the tip selection algorithm in Popov (2016) is based on a biased random walk, and a transaction that is directly or indirectly validated by a large portion of the subsequent transactions is highly probable to be considered valid. Precisely, a tip on the surface of the DAG is selected based on a random walk from genesis to the surface. This random walk is biased in the sense that nodes possess a weight that resembles the number of subsequent nodes that are directly or indirectly attached to this node and that the jump probability of the random walk from a node i with weight wi to a node j with weight wj is exp(η(wiwj)) with a given constant η>0. For finite time, the value of the constant α may play a role to adjust the tip selection to mimic a uniformly random tip selection or to quickly concentrate the tip selection on a set of young nodes. However, this algorithm is different from our YAPA model as it concentrates the tip selection on certain walks where the node weights are large.

Further works, such as Kusmierz and Gal (2018) and Ferraro et al. (2020), analyze different properties of the DAG-type distributed ledgers, especially the IOTA DAG, which is denoted the Tangle. Kusmierz and Gal (2018) derive approximations of the probabilities that a node is left behind or that a node becomes a permanent tip. Specifically, the first approximation of the probability that a node is left behind is similar to the detached surface evolution in Corollary 1. Ferraro et al. (2020) provide a modified IOTA tip selection algorithm that is based on a combination of two Markov chain Monte Carlo algorithms with different parameters that aims to have all nodes validated. They prove this property using a fluid limit in the arrival rate of transactions.

In Coordicide (Popov et al. 2020), which is a follow-up work on IOTA constituting a modern DAG-type DLT, tip selection is fixed to validating two existing transactions. Here, the authors dispense with tip selection as a consensus mechanism and rather, consider it from a performance point of view (i.e., analyzing the impact of the tip selection on the DAG properties, especially in discouraging “lazy” behavior, which resembles the recurring validation of a fixed set of old transactions).

We consider this work in the same vein (i.e., our YAPA model provides a relationship between the parameters of the tip selection algorithm (that is, the edge density α and the reinforcement bias β) and the DAG-distributed ledger properties). We establish the YAPA model in (1) as a rule to encourage the validation of young transactions. From the calculation of the asymptotic distribution of the outdegree Zout (which turns out to be Poisson(α/(β+1))) as well as the nonasymptotic expected outdegree at a given time point E[Dkout], we know the expected number of transactions that are validated given parameters α, β. Further, the reinforcement bias β>0 is understood as a forgetting factor, precisely a parameter to set the preference for validating young transactions (i.e., larger β>0 leads to a stronger preference of younger transactions over older ones). Hence, by choosing α, β while ensuring α>β+1, the properties of the distributed ledger can be controlled. As is reflected in our results, we study the following properties: (i) the emergence of “giant” components (i.e., a linear increase of the number of direct and indirect validations that a transaction receives over time), (ii) the average number of direct validations received by a transaction over time, and (iii) the number of orphan transactions.

3.2. Related Mathematical Results

Our model is similar in spirit to the one introduced in Lyon and Mahmoud (2020) but with two important differences. First, their model produces a tree, whereas our model produces a directed acyclic graph that is typically not a tree. Second, we modulate the attachment via the parameters α and β to obtain a whole range of instances. This flexibility allows us to tune the model to specific applications and also allows us to observe the connectivity phase transition of Theorem 2. The young-age preferential attachment tree of Lyon and Mahmoud (2020) corresponds to running our model with β = 1 and conditioning on the creation of precisely one arc per time step. As explained in Section 2, our model can be analyzed in the general framework of Cao and Olvera-Cravioto (2020), who adapted the very general framework of inhomogeneous random graphs developed in Bollobás et al. (2007) to directed graphs, thereby generalizing earlier results of Bloznelis et al. (2012). We use some techniques from the (directed) inhomogeneous random graph world to obtain the degree sequence and analyze the emergence of a giant connected component for our model in Proposition 3. However, because κ(·,·) is not reducible and we investigate connected components in terms of directed paths, our main result in Theorem 2 cannot be obtained by a direct application of the results in Bloznelis et al. (2012) and Cao and Olvera-Cravioto (2020). We also emphasize the dynamic nature of our model, which is not captured by the inhomogeneous random graph framework. Furthermore, if we were to ignore the direction of the arcs and consider GN as an undirected graph, then it is straightforward to deduce that the limit limN|CN(U)|/N in Proposition 3 would be positive whenever α>β+1 (cf. Bollobás et al. 2007).

The limiting case β = 0 of the YAPA model corresponds to a directed version of the uniformly grown random graph investigated in Bollobás and Riordan (2004) and Bollobás et al. (2005); this is a very natural recursive model that was first studied by Dubins in the 1980 s (cf. Kalikow and Weiss 1988, Shepp 1989, Durrett and Kesten 1990). Although we exclusively consider positive values of β, choosing β<0 would lead to a preference for old-age vertices and produce an effect similar to degree-based preferential attachment models, such as the Barabási–Albert model (Barabási and Albert 1999). Models of this type have been investigated thoroughly in the last 20 years (see, e.g., Bollobás et al. 2001; Dereich and Mörters 2009, 2013; and the references therein). Old-age preferential attachment models produce graphs with entirely different features, such as a scale-free degree distribution, whereas the degrees in the YAPA model are fairly sharply concentrated around their expectation as is exemplified by Proposition 2.

4. Simulation

In the following, we describe evaluation results that are based on simulating the model with attachment probabilities given by (1). Transaction blocks arrive according to a Poisson process with rate λ = 1. We fix the reinforcement bias β = 2 and variate the edge density parameter α. Figure 2 shows the evolution of the connected component size starting at the genesis node (transaction) for the first 105 transactions for different α. Note the linear growth rate of the connected component for α>β+1. The figure shows the empirical average in addition to the standard deviation (which is depicted as the shaded area) from 20 independent simulation runs. Note that the standard deviation for the subcritical regime α = 2 is large because there is no natural scaling (in N) for the component size.

Figure 2. Evolution of the Connected Component Size (Starting at the Genesis Node) for Reinforcement Bias β = 2 and Variable Edge Density Parameter α{2,3,4}
Notes. As an initial condition, we assume that the first 10 nodes reference the genesis node. (a) α = 2. (b) α = 3. (c) α = 4. (d) Variable α.

Figure 3 shows the error |CN(m)|/NZ(m) as calculated from Theorem 2 for α=4,β=2,m=10. The figure shows clearly the convergence of the ratio |CN(m)|/N to Z(m) with a growing number of vertices N.

Figure 3. Numerical Verification of Theorem 2 for α=4,β=2,m=10
Note. The error |CN(m)|/NZ(m) diminishes with an increasing number of vertices N.

Figure 4 shows the properties of the shortest paths to the genesis node for various values of the reinforcement bias β and the edge density parameter α. In light of the impact of the tip selection algorithm parameterization through α and β, the figure reflects the shape of the transaction DAG (i.e., its depth and width in terms of the length of the shortest paths to genesis). In comparison, note that the extreme cases of a DAG of depth 1 or a totally ordered chain produce sum lengths of the shortest paths of size n and n2/2, respectively. The figure also shows markers for the values of (α,β) that correspond to a mean outdegree E[Zout]=2, which is used in the context of DAG-type DLT (Popov 2016, Popov et al. 2020) as a deterministic guideline for the number of validated nodes per new node. Although the fraction of nodes not connected to genesis remains constant as given by the detached surface approximation in (5), the shape of the DAG DLT can be controlled by the choice of α, β. Note that a similar argument can be made in terms of the shortest paths to some arbitrary genesis cluster of nodes.

Figure 4. Properties of the Shortest Paths to Genesis for α{4,5,6},β{1,,3}
Notes. The width and depth of the constructed graphs show a strong dependency on β. The figures are obtained from 102 simulation runs each, including n=104 transactions. The black markers correspond to a mean outdegree E[Zout]=2. (a) Cumulative length of the shortest paths to genesis in hops. (b) Mean length of the shortest paths to genesis in hops.

5. Implications for DLT Applications

Our work is motivated by questions on the design choices of DAG-based DLT applications, in particular on how the analyzed YAPA referencing scheme that relates newly generated blocks to previously generated ones affects the DLT properties. In the following, we discuss the implications of the obtained results for some design aspects of DLT applications.

5.1. Tip Selection

Tip selection is often convolved with consensus as in Popov (2016) and Müller et al. (2022). In Popov et al. (2020), tip selection is considered separately and set to referencing a fixed number of previous blocks. When tip selection is convolved with consensus, it is usually based on some aggregate weight of the blocks that must be tracked (Popov 2016, Müller et al. 2022). Some works also propose uniform random tip selection on a subset of (nonconflicting) blocks, such as Müller et al. (2022).

The first implication of the YAPA model definition is that users that generate blocks do not have to track or calculate block weights according to more or less complex and time-consuming algorithms. The configuration model is entirely determined by the YAPA parameters α, β. In turn, these determine the required storage and computation capacity for reference operations (i.e., the number of references (hashes) that are required on average as in Proposition 1). Note that each reference is usually tied to one hash operation as well as a linear growth of storage space. This stands in contrast to the deterministic number of references as well as the weight-based referencing methods.

5.2. Laziness and Permanent Tips

The reviewed state of the art makes great effort to avoid lazy behavior (i.e., preventing users from referencing a fixed set of old blocks). The temptation to do this is based on the reuse of computed hashes in old blocks in referencing for new blocks. The YAPA configuration readily dispenses with this problem for α,β>0. On the other hand, the risk of a block becoming a permanent tip is shown for our model in Corollary 1. This can be considered the cost of using the simple YAPA configuration model as the detached surface contains transaction blocks that are never verified.

5.3. Consensus and Time to Confirmation

The results of this work emphasize that consensus in a DAG-based DLT that uses the proposed YAPA configuration model can be reached. Theorem 2 shows that a consensus method based on the so-called heaviest weighted subchain (Wang et al. 2022) or the so-called heaviest sub-DAG (Müller et al. 2022) is viable under a proper YAPA configuration of α>β+1. The simulation results validate this behavior for a range of realistic parameters (i.e., average number of references/hashes of α{2,3,4} and β = 2). Further, the simulation results provide insights in the form if the DLT is controlled by the YAPA configuration α, β. Related works, such as Popov (2016) and Müller et al. (2022), assume a constant Tangle width and that the numbers of tips of the Tangle are stationary with a given mean. Note that in Müller et al. (2022), this is justified by a number of chosen tips equal to one. Our simulations show, on the other hand, that the shape of the Tangle can be controlled by adjusting α and β at a fixed average outdegree (i.e., number of referenced previous blocks). In turn, the shape of the Tangle implicates the causal relations of transaction blocks (i.e., for a given number of blocks, N having a few long chains of blocks (approaching line topologies) with crossreferences is semantically different from having many short chains of blocks (approaching a star topology)). The former form distributes the cumulative weight required for consensus, whereas the latter concentrates this weight on a few blocks. Nevertheless, the result in Theorem 2 shows that consensus can be achieved after long-enough time.

Time to confirmation is a performance metric for the consensus protocol of the DLT. As discussed, the YAPA configuration model for a DLT provides two guarantees. First, the confirmation time in terms of the rate of convergence of the distribution of the number of direct block references received is bounded in Theorem 1. This stands in contrast to heuristics to obtain estimates of the time to confirmation as, for example, in Popov (2016) and Müller et al. (2022), where some difficulty stems from the intractability of the referencing rule.

6. Proofs

The proofs of our main results are divided into three sections. The first section contains the derivation of results that can be obtained in the general framework of Cao and Olvera-Cravioto (2020). The second section provides concentration bounds for degree properties, which are needed in the last section or are interesting in their own right, and the final section contains the proof of our main result, Theorem 2.

6.1. Directed Inhomogeneous Random Graphs and Branching Process Approximation

We call

wn1(m)=α(mn1)β,1m+1n,
the weight of vertex m upon the arrival of vertex n. An equivalent way of characterizing our model is to fix NN and rewrite the connection probabilities using the kernel κ:[0,1]2[0,) given by
κ(x,y)=αxβy(β+1)1{x<y}.(3)

Then,

P(nm in GN)=(κ(m/N,n/N)N(1+1n1)β+1)1,1m<n<N.

Setting also

ϕ(mN,nN)=(1+1n1)β+11O(1/n),
the pair (κ,ϕ) parametrizes an instance of a directed inhomogeneous random graph in the sense of Cao and Olvera-Cravioto (2020). We can thus readily apply several results of Cao and Olvera-Cravioto (2020), by which we immediately establish sparsity of GN and the form of the limiting degree distribution in the next subsection. Note that, however, κ is clearly a reducible kernel because it has no mass on the lower diagonal. Therefore, the connectivity results for irreducible kernels established in Cao and Olvera-Cravioto (2020) do not apply.

It is straightforward to verify the regularity assumptions listed in Cao and Olvera-Cravioto (2020, assumption 3.1) for our model using κ and ϕ as defined and interpreting [0,1] together with the Lebesgue measure as the ground space. Proposition 1 is now a special case of the corresponding statement in Cao and Olvera-Cravioto (2020).

Proof of Proposition 1.

By Cao and Olvera-Cravioto (2020, proposition 3.3), we have

|EN|N0101κ(x,y)dxdy=αβ+1in L1.

Each

Xn=|En+1||En|=m=1n1{n+1m},nN,
is a sum of independent Bernoulli random variables with variance
σn2m=1nwn(m)n[(1wn(m)n)0],nN.

In particular, because wn(m)α for all m, n, we have

n=1σn2n2n=1αn2<,
and Kolmogorov’s Strong Law of Large Numbers (see, e.g., Feller 1950, theorem 8.3) yields almost sure convergence. □

Proof of Limiting Distribution in Theorem 1.

The limiting distribution described in Theorem 1 is obtained as a special case of the statement of Cao and Olvera-Cravioto (2020, theorem 3.4) when applied to the kernel κ given in (3). □

Proof of Corollary 1.

The detached surface

D(n)={1<kn:Dout(k)=0},n>2,
comprises all vertices that have arrived and do not attach to any previous vertices. We let Δn=|D(n)| denote the size of the detached surface at time n. L1 convergence of Δn/n to eαβ+1 is a direct consequence of Theorem 1 because E[Δn/n]+1/n is precisely the probability that a uniformly chosen vertex has outdegree 0. Alternatively, we can calculate the asymptotic density of Δn directly. The probability that the kth arriving vertex does not attach to any previous vertex is
pk=i=1k1(1pk(i)),k>1.(4)

Using that pk(i)αk1, we obtain that log(1pk(i))=pk(i)+O(1/k2) and thus,

limkpk=limkei=1k1pk(i)=eαβ+1.(5)

The outdegrees of different vertices are independent, and therefore, Δn is a sum of independent Bernoulli(pk)-distributed random variables. The corresponding variances pk(1pk) are uniformly bounded by one; thus, another invocation of Kolmogorov’s Strong Law of Large Numbers yields almost sure convergence of Δn/n to eα/(β+1). □

Proof of Proposition 3.

In Cao and Olvera-Cravioto (2020, section 4.3.1), it is shown that the forward exploration started in a uniform vertex can be coupled to a multitype branching process with family tree T. It is straightforward to deduce from the form of κ that this branching process is the Galton–Watson process described in Remark 2. It is well known and not hard to prove that if |T|< almost surely, then we have P(CN(U)>εN)0 as N for any ε>0, and additionally, see, for example, van der Hofstad (2021, corollary 2.27). The survival probability ζ=P(|T|=) of the Galton–Watson process can be characterized via the Volterra-type operator T=T(κ) on L2[0,1] given by

Tf(x)=αxβx1yβ1f(y)dy,x(0,1),
and see Bollobás et al. (2007) and Cao and Olvera-Cravioto (2020) for a general account of the connection between T and T. Note that T is an unbounded operator (because of the divergence of yβ1 at zero), which makes it necessary to use the approximating family of operators {Tδ,δ>0} given by
Tδf(x)=αxβxδ1yβ1f(y)dy,x(0,1),δ>0.

Our goal is to show that for any choice of δ>0, we have that ζδ, and hence, ζ = 0. Let Ut denote the type of vertex tT, and let U0 denote the type of the root; then,

P(|T|=)δ+P(|T|=,U0>δ).

Note that {U0>δ}={Ut>δtT} because types must increase along any genealogical path in T. On the event {Ut>δtT}, the evolution of the branching process is represented by the operator Tδ. More precisely, we have that Tδk1(x),k=1,2,, where 1(x)1 denotes constant unit function and describes the expected number of offspring of type dx in the kth generation Tk of T; hence,

E(|Tk||U0>δ)=11δδ1Tδk1(x),dx.(6)

We assume for the moment that Tδ is quasinilpotent for any δ>0 (i.e., limkTδk=0, where · denotes the operator norm on L2[δ,1]). In particular, this implies that E(|Tk||U0>δ) vanishes as k, and hence,

P(|T|=|U0>δ)=limkP(|Tk|1|U0>δ)limkE(|Tk||U0>δ)=0.

Thus, it only remains to show that for any fixed δ>0,Tδ is quasinilpotent. Because yβ1 is bounded on [δ,1], it is straightforward to verify that Tδ is a compact operator on L2[δ,1]; hence, Tδ is quasinilpotent precisely if zero is its only eigenvalue (see, e.g., Dunford and Schwartz 1988, section VII 5.13). Assume for contradiction that Tδ has the eigenvalue λ0: that is, that there is some fL2[δ,1]{0} satisfying for almost every x[δ,1]

λf(x)=Tδf(x)=αxβx1yβ1f(y)dy.(7)

Because f is integrable, the right-hand side is a continuous function of x by dominated convergence, and we may thus assume that f has a continuous version g that satisfies (7) for all x(δ,1). Because g is continuous, it follows from Lebesgue’s differentiation theorem that d/dxx1y1βg(y)dy exists and is equal to xβ1g(x) for every x(δ,1). It follows that Tδg is differentiable and thus, that g=λ1Tδg is differentiable (even smooth) on (δ,1). In fact, we have, for x(δ,1),

λg(x)=(Tg)(x)=αββ1x1yβ1g(y)dyαx1g(x)=βxTg(x)αxg(x)=λβαxg(x).

Furthermore, we have

limx1g(x)=αλlimx1x1y1βg(y)dy=0.

We conclude that g is a solution to the final value problem

u(1)=1,u(x)=cx1u(x),with c=βαλ
on (δ,1]. The Picard–Lindelöf theorem now yields that g is unique. If λ=αβ, then g vanishes and cannot be an eigenfunction, thus contradicting our initial assumption. If λαβ, then we have g(x)=xβαλ1,x(δ,1), implying that, for all x(δ,1),
Tg(x)=αxβx1y1αλdyαxβx1yβ1dy=λxβαλλxβ+αβxβαβ=λg(x)+(λλxβ+αβxβαβ).

Because g is an eigenfunction for λ, the term in parentheses must vanish (i.e., λ=αβ, which is a contradiction). We conclude that Tδ has only the eigenvalue 0 and is thus, quasinilpotent. □

6.2. Concentration Bounds

We begin by stating a straightforward concentration bound on the maximal degree in GN, which applies not only to our model but also, to any directed inhomogeneous random graph in which κ and ϕ are bounded.

Lemma 1.

Let GN denote a random graph in which each arc (m, n) is included independently with probability pmn(N). Suppose that pmn(N)A/N uniformly in m, n for all sufficiently large N and some A<; then, the maximal total degree ΔN in GN satisfies

P(ΔN>2A+λ)Neλ24A+2λ/3(8)
for any N. In particular, it follows that almost surely
ΔNO(log N).

Proof.

Let Dn denote the total degree of nV(GN). By assumption, Dn is dominated by a Bin(2N,A/N) random variable; hence,

P(Dn2A+λ)eλ24A+2λ3,n=1,,N,
using a classical concentration inequality, such as Chung and Lu (2006, theorem 3.2). Summing over n yields (8), and we note that choosing λ=λ(N)=C log N implies that
Neλ24A+2λ/3<,
if C is sufficiently large, which allows us to conclude the almost sure bound on ΔN. □

Proof of Proposition 2.

The statement about the maximal degree is an immediate consequence of Lemma 1. Let us now establish the stated bounds on the degree evolutions. The evolution of the indegree Dnin(m) of a fixed vertex m after the arrival of vertex k > m can be modeled using a counting process that jumps by one upon the arrival of the jth vertex after k if this vertex attaches to m. Hence, jumps occur with probability

pk+j(m)=wk+j(m)k+j11,jN0,(9)
for m and k > m fixed. The indegree of vertex m upon the arrival of vertex n is hence given by a generalized binomial distribution and satisfies
E[Dnin(m)]=j=m+1npj(m).(10)

In particular, we have

E[Dnin(m)]αβ[1(mn1)β],1m<n,
and
E[Dnin(m)]αβ[(11m+1)β(m+1n)β],αm<n.

The moment-generating function of Dnin(m) is given as

E[eθDnin(m)]=j=m+1n(1+pj(m)(eθ1)),θ>0.(11)

A bound on the tail of the indegree distribution can now be derived using Markov’s inequality

P[Dnin(m)>x]infθ>0eθxj=m+1n(1+pj(m)(eθ1)).

We obtain

eθxj=m+1n(1+pj(m)(eθ1))exp(θx+(eθ1)j=m+1npj(m))exp(θx+(eθ1)αβ),
and the right-hand side is minimized by θ=logxβα. Thus, for any n>m1, we have the uniform upper tail bound
P[Dnin(m)>x]exp[αβx(log x+logβαβα)],x>α/β,
which establishes part (i). Turning to part (ii), we recall that the probability that k attaches itself to i[k1] is given by (9). The expected outdegree is hence given by
E[Dout(k)]=i=1k1pk(i)=αβ+1+O(1/k),
where the error term is simply because of approximating the sum by an integral. □

Proof of the Error Bound in Theorem 1.

We only provided the detailed calculation for the indegree distribution; a similar but simpler argument yields the same bound for the outdegree distribution. Let us set, for each nN,

Pn(k)=1nm=1n1{Dnin(m)=k},k=0,1,2,,
and denote the target distribution by (p(k))k=0. We have
p(k)=01eαβ(1sβ)(αβ(1sβ))k1k!ds,k=0,1,2,,
and applying the triangle inequality,
k=0|Pn(k)p(k)|k=0|Pn(k)EPn(k)|+k=0|EPn(k)p(k)|,nN.(12)

Note that the first error is probabilistic and that the second error is essentially just a discretization error because

EPn(k)=1nm=1nej=m+1n(αmβ(j1)β+1)1(j=m+1n(αmβ(j1)β+1)1)kk!,kn,(13)
which can be obtained, for example, by differentiating (11). We calculate the bound for each error in the following two lemmas.

Lemma 2.

We have that

k=0|EPn(k)p(k)|=O(log nn).(14)

Proof.

First note that (αmβ/(j1)β+1)1=1 only if mj1n0, and thus, the error induced in the calculation of the probability weights by ignoring the truncation is O(n1) and may thus be ignored. A straightforward integral approximation yields

αmβj=m+1n(j1)β1=αβ(1(mn)β)+O(m1)
for all sufficiently large n. Because ddxexxk=(kx)exxk1, we obtain that, for any C<, the function exxk is uniformly Lipschitz continuous in x on [C/m,α/β+C/m] with the m-independent Lipschitz constant
Lk=1k2k1,kN,
where 1,2 are independent of k. Setting further L0=K for some sufficiently large K, we conclude that there is a k-independent constant C<, such that
|1nm=1nej=m+1nαmβ(j1)β+1(j=m+1nαmβ(j1)β+1)k1nm=1neαβ(1(mn)β)(αβ(1(mn)β))k|CLknm=1n1m.

To approximate the discrete outer sum by an integral, we use that if f:[a,b]R is continuously differentiable, then

|1ni=1nf(i(ba)n)abf(s)ds|ba2nsupx[a,b]f(x),
and hence, the constants Lk,k0 can be used to bound this approximation error as well; we end up with
|1nm=1nej=m+1nαmβ(j1)β+1(j=m+1nαmβ(j1)β+1)k01eαβ(1sβ)(αβ(1sβ))kds|C1Lknm=1n1m+C2n,
where C1, C2 are finite constants and independent of n, m, and k. Noting further that k=0Lkk!<, we arrive at (14) because
k=1n|EPn(k)p(k)|+k=npk(C1+C2)nm=1n1mk=1nLkk!+k=npk=O(log n/n)
because k=npk decays exponentially in n. □

Lemma 3.

We have that almost surely

k=0|Pn(k)EPn(k)|=O((log n)3/2n).

Proof.

Fix k, and write Xi=1{Dn(i)=k},i=1,,n. Note that the Xi,i=1,,n are independent, and thus, another invocation of Chernoff’s inequality (see, e.g., Chung and Lu 2006, theorem 3.2) yields that

P(|i=1nXiEXi|>nλ)2 exp((λn)22(i=1nEXi+λn3))=2 exp(λ2n2(EPn(k)+λ3)),(15)
with λ>0. Let En denote the event in which there is no vertex i[n] with Dn(i)C log n; then, for any λ>0,
P(k=0|Pn(k)EPn(k)|λ)P({k=0|Pn(k)EPn(k)|λ}En)+P(Enc)P(k=0C log n|Pn(k)EPn(k)|+k=C log nEPn(k)λ)+P(Enc).

Arguing as in the proof of Lemma 1, we may choose C so large that n=1P(Enc)=B< and such that

k=ClognEPn(k)1nE#{i[n]:Dn(i)C log n}1n
for all sufficiently large n. Observe further that
k=0Clogn|Pn(k)EPn(k)|λ1/n
is only possible if at least one of the terms |Pn(k)EPn(k)| exceeds (λ1/n)/(Clogn), and we conclude that
P(k=0|Pn(k)EPn(k)|λ)P(Enc)+k=0ClognP(|Pn(k)EPn(k)|λ1/nClogn).

Choosing now λ=λ(N)=Dlog n/n for some sufficiently large D and setting λ=λ(n)=λC log n+1/n allow us to apply (15) to obtain

n=1P(k=0|Pn(k)EPn(k)|R(log n)3/2n)B+Cn=1eλ2n/(2+2λ/3)log n<
for some R<, which implies that almost surely
k=0|Pn(k)EPn(k)|=O(log n3/2n).
 □

Combining the previous lemmas concludes the proof of Theorem 1.

6.3. Forward Component Size and Weight Evolution—Proof of Theorem 2

Let A[n]; then,

wn(A)=mAwn(m),nN,
and the weight of A upon arrival of vertex n + 1 essentially determines how likely it is that n + 1 connects to a vertex in A. Set
Γn(m)=|Cn(m)|,n0<mn,
and that is, Γn(m) counts how often m has been referenced (directly and indirectly). We have Γm(m)=0 and
Γn+1(m)={Γn(m),with probability lCn(m)(1wn(l)n),Γn(m)+1,with probability 1lCn(m)(1wn(l)n),n>m.

Hence, transitions of the processes Γ(m)=(Γn(m))nm depend on the weight structure of the forward component of m, not only on its present size. In particular, each Γ(m) is a Markov process with respect to the filtration generated by the graph sequence (Gn)nm but not with respect to the smaller filtration generated by the sequence (Γn(m))nm itself. It is, therefore, more convenient to study the weight processes Wn(m)=wn(Cn(m)),nm, which have the dynamics

Wn+1(m)={(nn+1)βWn(m),with probability kCn(m)(1wn(k)n),(nn+1)βWn(m)+α,with probability 1kCn(m)(1wn(k)n).

To eliminate the slightly unwieldy products structure of the transition probability, we formulate a stochastic domination result. We use the notation

Δan=an+1an,n=n1,n1+1,n1+2,
for the increments of a random or deterministic sequence (an)nn1 and further write {Wn+1} for the event that α is added to the score process in step n + 1. The complementary event is denoted by {Wn+1}, and we extend this notation to all other processes X=(Xn) for which ΔXn can attain precisely two values.

Lemma 4.

Let Yn(m)=Yn(m;a,b,y) denote the Markov chain given by Ym(m)=y and

ΔYn(m)={b+1n+1Yn(m)+an+1,with probability (1eYn(m)),b+1n+1Yn(m),with probability eYn(m),nm.

Then, (Wn(m)/n)nm stochastically dominates (Yn(m;α,β,Wm(m)/m))nm.

Proof.

We start by analyzing the transition probabilities of Wn=Wn(m):

1kCn(m)(1wn(k)n)=1ekCn(m)log(1wn(k)/n)
=1exp(kCn(m)j=1(wn(k)n)jj).

Noting that wn(k)/n is uniformly bounded by αn, we can estimate

1ekCn(m)wn(k)n1exp(kCn(m)j=1(wn(k)n)jj)1exp(kCn(m)(wn(k)n+(1+ε)wn(k)22n2)),(16)
for any ε>0, if n is sufficiently large. Another Taylor expansion yields
1β+1n+1(nn+1)β+11β+1n+1+β(β+1)2(n+1)2,nm.(17)

We conclude that

ΔWnn=(1(nn+1)β+1)Wnn+αn+11{n+1 connects to Cn(m)}β+1n+1Wnn+αn+11{n+1 connects to Cn(m)},
and because of the lower bound in (16), the event in the indicator occurs with probability at least
1ekCn(m)wn(k)n=1eWn/n,
given Cn(m), which proves the lemma. □

To obtain a stochastic upper bound is only slightly more tricky.

Lemma 5.

Let Yn(m;a,b,y) be defined as in Lemma 4, and let a>α,b<β. There exits n1 such that for all nn1, we can couple Yn=Yn(n1;a,b,w) and Wn=Wn(m) conditionally on aαn1Wn1=w such that

YnaαnWn, for all nn1.

Proof.

We argue by induction. The claim holds by construction at any given initial time n1m. Now, assume that the coupling has been established for times n1,,n. Then,

Yn+1=(1b+1n+1)Yn+an+11{Yn+1}(1b+1n+1)aαWnn+aααn+11{Yn+1},
where we have used the definition of Yn and the induction hypothesis. Now, note that
P(Yn+1|Wn,Yn)=1eYn1eaαnWnP(Wn+1|Wn,Yn)
by the induction hypothesis and the upper bound in (16) if nn1 is sufficiently large. We conclude that Wn+1 and Yn+1 can be coupled such that {Wn+1}{Yn+1}, and it follows that under this coupling,
Yn+1aα[(1b+1n+1)Wnn+αn+11{Yn+1}]aα[(nn+1)β+1Wnn+αn+11{Wn+1}]=aαWn+1n+1,
where we have used (17) and once more that n1 can be taken large. □

Lemmas 4 and 5 indicate that the dynamics of (Cn(m))nm can be analyzed in the terms of the simple processes (Yn(m))nm defined in Lemma 4. To this end, fix a,b>0 and Ym = y, and define for nm,

Mn=YnYm+j=mn1{b+1j+1Yjaj+1(1eYj)},
and
Zn=YnYmexp{j=mn1[b+1j+1aj+11eYjYj]}.

It follows from a straightforward calculation or alternatively, by invoking Brightwell and Luczak (2012, lemmas 2.1 and 2.2) that (Mn)nm is a martingale and that (Zn)nm is a supermartingale adapted to the filtration (Fn)nm defined by Fn=σ(Yj,mjn).

Lemma 6.

If 0<a<b+1, then limnYn=0 almost surely.

Proof.

Because (Zn)nm is a nonnegative supermartingale, (Zn)nm converges almost surely to some random variable Z satisfying EZEZm=1. Because a<b+1, it thus follows from

exp{j=mn1[b+1j+1αj+11eYjYj]}exp{j=mn1[b+1j+1aj+1]}=n1+ba+o(1)
that Z can only be finite if (Yn)nm converges to zero. Hence, limnYn=0 almost surely. □

Let us now focus on the case a>b+1. Performing a Taylor expansion, we see that

a(1ey)(1+b)y=(aey(1+b))(yy)a2ξ(y)eξ(y)(yy)2,(18)
where ξ(y) is some value between y and y and y=y(a,b) is the unique positive solution to
1ey=1+βαy.(19)

Using the Martingale (Mn)nm, we can represent Yn as

Ym=y>0ΔYn=an+1(1eYn)b+1n+1Yn+ΔMn,nm.(20)

In view of (18) and (19), the recursion (20) is a stochastic approximation of the stable point y. We recall the following well-known result about stochastic approximations.

Lemma 7

(Special Case of Pemantle 2007, lemma 2.6). Suppose that (Xn)nN is a stochastic process adapted to a filtration (Gn)nN satisfying the recurrence relation

ΔXn=cn(f(Xn)+ξn+1),
where (cn)nN is a deterministic sequence, (ξn)nN is a (Gn)nN-adapted process, and f is a bounded function satisfying
  • nNcn=,nNcn2<;

  • E[ξn+1|Gn]=0,E[ξn+12|Gn]B< for all nN; and

  • either f((l0,r0))(,δ) or f((l0,r0))(δ,) for some δ>0 and some open interval (l0, r0).

Then, we have for any interval [l,r](l0,r0) that

P(Xn visits [l,r] only finitely many times)=1.

We now have all of the ingredients to complete the proof of Theorem 2.

Proof of Theorem 2.

Let Γn=|Cn(m)|,nm and Wn=wn(Cn(m)). We estimate

αnβk=mm+Γn1kβWnαnβk=nΓnnkβ,nm.

Approximating the sums by integrals yields

αβ+1n(Γnn)β+1+O(nβ)Wnαβ+1n(1(1Γnn)β+1)+O(nβ),nm,
from which we conclude that limnΓnn=Z exists if and only if limnWnn=w exists. □

6.3.1. The Subcritical Case α<β +1.

By Lemma 5, Wn/n is eventually dominated by Yn(n1,α,β,y) for some y if α<α<β+1<β+1. Because limnYn=0 almost surely by Lemma 6, we conclude that

P(Z=w=0)=1.

6.3.2. The Supercritical Case α>β+1.

We first employ Lemma 4 to dominate w from below by the random limit

Y=Y(α,β)=limnYn(m;α,β,Wm/m),
which we show to exist now. Rewriting (20), we obtain
ΔYn=1n+1(α(1eYn)(β+1)Yn+(n+1)ΔMn),nm.

We wish to apply Lemma 7 with cn=(n+1)1,ξn+1=(n+1)ΔMn, and

f(x)=1[0,K](x)(α(1ex)(β+1)x),xR.

Let us first check that the condition on ξn is satisfied. Clearly, E[ξn+1|Yn]=0 because ΔMn is a martingale difference. To see that E[ξn+12|Yn] is bounded, we simply note that

|ΔMn||ΔYn|+(1+Yn)α+β+1n+1Cn+1
for some constant C by definition of Yn and the fact that Yn is bounded; hence, ξn is bounded. Let us now analyze the recursion function f. Note that the truncation of f is necessary because of the assumption of boundedness in Lemma 7, but this is not a problem because clearly, Yn>0 for all n, YnWn/n because of the coupling, and Wn/n is easily seen to be bounded by some deterministic value K>αβ+1 (mind that Wn/nαβ+1+o(1)). Note that f is continuous on [0,K] with f(x) = 0 for x[0,K] if and only if x{0,y}, where y=y(α,β) is given in (19). Note further that f(x) > 0 if x(0,y) and f(x) < 0 if x(y,K). Continuity of f now implies that for any sufficiently small ε0>0, there exists a δ0>0 such that
f(x)>δ0 for all x(ε0,yε) and f(x)<δ0 for all x(y+ε0,αβ+1+ε0),
where we note that y<α/(β+1) for any choice of α>β+1. By Lemma 7, we deduce that any closed bounded interval not including either zero or y is visited only finitely often, and hence, (Yn)nm converges almost surely to a random variable Y with distribution
(1p)δ0+pδy,for some p[0,1].

We say (Yn)nm survives if Y=y and that it dies out if Y=0. We are going to show now that p > 0 (i.e., (Yn)nm survives with positive probability). Observe that we have

ΔE[Yn|Fm]=αn+1E[1eYn|Fm]β+1n+1E[Yn|Fm],nk.

Note that if (Yn)nm dies out almost surely, then we have that limksupnkE[Yn|Fk]=0 almost surely. Because E[1eYn|Fk]=E[Yn|Fk]+O(E[Yn|Fk]2) as E[Yn|Fk] vanishes, it follows that for every ε(0,αβ1), there exists some K0 such that

ΔE[Yn|Fk]αβ1εn+1E[Yn|Fk],nkK0.

Denoting ζ=αβ1ε>0 and setting Yka(k)>0, it follows that (E[Yn|Fk])nk dominates the unique solution of the difference equation

ak=a(k),Δan=ζn+1an,nk,
which satisfies limnan= for any initial condition a(k) > 0. This is a contradiction to limksupnkE[Yn|Fk]=0, and we conclude that
p=P((Yn)nm survives)>0.

Combining Lemmas 4 and 5, we obtain that

y(α,β)wy(α+ε,βε)
for all sufficiently small ε, and because the root y(a,b) of (19) depends continuously on a and b, we conclude that w=limnWn/n=y(α,β)>0 almost surely on the event of survival. It follows that, almost surely on survival, limnΓn/nγ>0. Let qn denote the probability that a uniformly chosen vertex in Gn is contained in Cn(m); then, limnqn=γP(limnΓn/n>0), and for any mn,
qn=1n+1nk=mn1P(k+1Cn(m))=1n+1nE(k=mn1E[1{k+1Ck+1(m)}|Gk]).

Note that E[1{k+1Ck+1(m)}|Gk]=1eWk/k and that for any ε(0,1),

k=εn1n1E(1eWk/k)k=mn1E(1eWk/k)k=[εn]1n1E(1eWk/k)+εn+1
if n is sufficiently large. We infer that γ=(1+β)α1y(α,β), and this concludes the proof of the critical case by establishing a one-to-one relation between Y and the random variable Z=Z(m) as defined in the theorem.

6.3.3. The Critical Case α=β+1.

From the supercritical case, we infer that almost surely γ = 0 in this case because γ is a monotone function of α and limαβ+1y(α,β)=0.

Remark 3.

  1. The original version of lemma 7 in Pemantle (2007) also allows us to include an error term Rn in the recursion as long as ncnRn<. In principle, one can apply this version of the stochastic approximation result directly to Wn/n. We have chosen to include the intermediate step via Yn because it breaks up the error estimates into several parts and makes the argument easier to follow because (Yn)nm has a rather simple form.

  2. Note that the subcritical result also follows from the supercritical/critical case by monotonicity, but we gave a standalone argument because the supercritical case is more involved than the direct proof. Unfortunately, this simple argument does not carry over to the critical case.

References

  • Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512.Google Scholar
  • Blockchain Luxembourg S.A. (2021a) Average block size—Blockchain. Accessed May 19, 2023, https://blockchain.info/de/charts/avg-block-size.Google Scholar
  • Blockchain Luxembourg S.A. (2021b) Median confirmation time—Blockchain. Accessed May 19, 2023, https://www.blockchain.com/charts/median-confirmation-time.Google Scholar
  • Blockchain Luxembourg S.A. (2021c) Transaction rate—Blockchain. Accessed May 19, 2023, https://blockchain.info/de/charts/transactions-per-second.Google Scholar
  • Bloznelis M, Götze F, Jaworski J (2012) Birth of a strongly connected giant in an inhomogeneous random digraph. J. Appl. Probab. 49(3):601–611.Google Scholar
  • Bollobás B, Riordan O (2004) The phase transition and connectedness in uniformly grown random graphs. Leonardi S, ed. Algorithms and Models for the Web-Graph (WAW 2004), Lecture Notes in Computer Science, vol. 3243 (Springer, Berlin), 1–18. https://doi.org/10.1007/978-3-540-30216-2_1.Google Scholar
  • Bollobás B, Janson S, Riordan O (2005) The phase transition in the uniformly grown random graph has infinite order. Random Structures Algorithms 26(1–2):1–36.Google Scholar
  • Bollobás B, Janson S, Riordan O (2007) The phase transition in inhomogeneous random graphs. Random Structures Algorithms 31(1):3–122.Google Scholar
  • Bollobás B, Riordan O, Spencer J, Tusnády G (2001) The degree sequence of a scale-free random graph process. Random Structures Algorithms 18(3):279–290.Google Scholar
  • Brightwell G, Luczak M (2012) Vertices of high degree in the preferential attachment tree. Electronic J. Probab. 17(14):1–43.Google Scholar
  • Cao J, Olvera-Cravioto M (2020) Connectivity of a general class of inhomogeneous random digraphs. Random Structures Algorithms 56(3):722–774.Google Scholar
  • Chung F, Lu L (2006) Concentration inequalities and martingale inequalities: A survey. Internet Math. 3(1):79–127.Google Scholar
  • Croman K, Decker C, Eyal I, Gencer AE, Juels A, Kosba A, Miller A, et al. (2016) On scaling decentralized blockchains. Clark J, Meiklejohn S, Ryan P, Wallach D, Brenner M, Rohloff K, eds. Financial Cryptography and Data Security. FC 2016, Lecture Notes in Computer Science, vol. 9604 (Springer, Berlin, Heidelberg), 106–125. https://doi.org/10.1007/978-3-662-53357-4_8.Google Scholar
  • Decker C, Wattenhofer R (2013) Information propagation in the Bitcoin network. IEEE 13th Internat. Conf. Peer-to-Peer Comput. (P2P) (IEEE, Piscataway, NJ), 1–10. https://doi.org/10.1109/P2P.2013.6688704.Google Scholar
  • Dereich S, Mörters P (2009) Random networks with sublinear preferential attachment: Degree evolutions. Electronic J. Probab. 14(43):1222–1267.Google Scholar
  • Dereich S, Mörters P (2013) Random networks with sublinear preferential attachment: The giant component. Ann. Probab. 41(1):329–384.Google Scholar
  • Donet JAD, Perez-Sola C, Herrera-Joancomarti J (2014) The Bitcoin P2P network. Böhme R, Brenner M, Moore T, Smith M, eds. Financial Cryptography and Data Security. FC 2014, Lecture Notes in Computer Science, vol. 8438 (Springer, Berlin, Heidelberg), 87–102. https://doi.org/10.1007/978-3-662-44774-1_7.Google Scholar
  • Dunford N, Schwartz JT (1988) Linear Operators. Part II (John Wiley & Sons, Inc., New York).Google Scholar
  • Durrett R, Kesten H (1990) The Critical Parameter for Connectedness of Some Random Graphs. A Tribute to Paul Erdős (Cambridge University Press, Cambridge, UK), 161–176.Google Scholar
  • Fadhil M, Owenson G, Adda M (2016) A Bitcoin model for evaluation of clustering to improve propagation delay in Bitcoin network. 2016 IEEE Internat. Conf. Comput. Sci. Engrg. (CSE) and IEEE Internat. Conf. Embedded Ubiquitous Comput. (EUC) and 15th Internat. Sympos. Distributed Comput. Appl. Bus. Engrg. (DCABES) (IEEE, Piscataway, NJ), 468–475. https://doi.org/10.1109/CSE-EUC-DCABES.2016.226.Google Scholar
  • Fadhil M, Owenson G, Adda M (2017) Locality based approach to improve propagation delay on the Bitcoin peer-to-peer network. IFIP/IEEE Sympos. Integrated Network Service Management (IM) (IEEE, Piscataway, NJ), 556–559. https://doi.org/10.23919/INM.2017.7987328.Google Scholar
  • Feller W (1950) An Introduction to Probability Theory and Its Applications, vol. I (John Wiley & Sons, Inc., New York).Google Scholar
  • Ferraro P, King C, Shorten R (2020) On the stability of unverified transactions in a DAG-based distributed ledger. IEEE Trans. Automatic Control 65(9):3772–3783.Google Scholar
  • Gobel J, Krzesinski AE (2017) Increased block size and Bitcoin blockchain dynamics. 27th IEEE Internat. Telecomm. Networks Appl. Conf. (ITNAC) (IEEE, Piscataway, NJ), 1–6. https://doi.org/10.1109/ATNAC.2017.8215367.Google Scholar
  • Kalikow S, Weiss B (1988) When are random graphs connected. Israel J. Math. 62(3):257–268.Google Scholar
  • Kusmierz B, Gal A (2018) Probability of being left behind and probability of becoming permanent tip in the Tangle v0.2. Working paper, IOTA Foundation.Google Scholar
  • Lewenberg Y, Sompolinsky Y, Zohar A (2015) Inclusive block chain protocols. Böhme R, Okamoto T, eds. Financial Cryptography and Data Security. FC 2015, Lecture Notes in Computer Science, vol. 8975 (Springer, Berlin, Heidelberg), 528–547. https://doi.org/10.1007/978-3-662-47854-7_33.Google Scholar
  • Lyon MR, Mahmoud HM (2020) Trees grown under young-age preferential attachment. J. Appl. Probab. 57(3):9111–927.Google Scholar
  • Müller S, Penzkofer A, Polyanskii N, Theis J, Sanders W, Moog H (2022) Tangle 2.0 leaderless Nakamoto consensus on the heaviest DAG. IEEE Access 10:105807–105842. https://doi.org/10.1109/ACCESS.2022.3211422.Google Scholar
  • Nakamoto S (2006) Bitcoin: A peer-to-peer electronic cash system. White paper.Google Scholar
  • Pappalardo G, Di Matteo T, Caldarelli G, Aste T (2017) Blockchain inefficiency in the Bitcoin peers network. EPJ Data Sci. 7:30. https://doi.org/10.1140/epjds/s13688-018-0159-3.Google Scholar
  • Pemantle R (2007) A survey of random processes with reinforcement. Probab. Surveys 4:1–79.Google Scholar
  • Popov S (2016) The Tangle. Working paper, IOTA Foundation.Google Scholar
  • Popov S, Moog H, Camargo D, Capossele A, Dimitrov V, Gal A, Greve A, et al. (2020) The Coordicide. Working paper, IOTA Foundation.Google Scholar
  • Shepp LA (1989) Connectedness of certain random graphs. Israel J. Math. 67(1):23–33.Google Scholar
  • Sompolinsky Y, Wyborski S, Zohar A (2021) Phantom Ghostdag: A scalable generalization of Nakamoto consensus. Proc. 3rd ACM Conf. Adv. Financial Tech. (AFT ’21) (Association for Computing Machinery, New York), 57–70. https://doi.org/10.1145/3479722.3480990.Google Scholar
  • Sompolinsky Y, Lewenberg Y, Zohar A (2016) Spectre: A fast and scalable cryptocurrency protocol. IACR Cryptology ePrint Archive 1159. Working paper. https://eprint.iacr.org/2016/1159.Google Scholar
  • van der Hofstad R (2017) Random Graphs and Complex Networks, vol. 1, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK).Google Scholar
  • van der Hofstad R (2021) Random Graphs and Complex Networks, vol. 2, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK).Google Scholar
  • Wang Q, Yu J, Chen S, Xiang Y (2022) SoK: Diving into DAG-based blockchain systems. Preprint, submitted October 29, https://doi.org/10.48550/arXiv.2012.06128.Google Scholar
  • Yeow K, Gani A, Ahmad RW, Rodrigues JJ, Ko K (2017) Decentralized consensus for edge-centric Internet of things: A review, taxonomy, and research issues. IEEE Access 6:1513–1524. https://doi.org/10.1109/ACCESS.2017.2779263.Google Scholar
  • Yin H, Guo D, Wang K, Jiang Z, Lyu Y, Xing J (2018) Hyperconnected network: A decentralized trusted computing and networking paradigm. IEEE Network 32(1):112–117.Google Scholar
  • Zyskind G, Nathan O, Pentland A (2015) Decentralizing privacy: Using blockchain to protect personal data. 2015 IEEE Security Privacy Workshops (IEEE, Piscataway, NJ), 180–184.Google Scholar