Optimization of stochastic lossy transport networks and applications to power grids

Motivated by developments in renewable energy and smart grids, we formulate a stylized mathematical model of a transport network with stochastic load fluctuations. Using an affine control rule, we explore the trade-off between the number of controllable resources in a lossy transport network and the performance gain they yield in terms of expected power losses. Our results are explicit and reveal the interaction between the level of flexibility, the intrinsic load uncertainty and the network structure.


Introduction
A transport network is an abstract model describing a structure in which some commodity is transferred from the "source" nodes of the network to the "sink" nodes according to a specified routing that is determined by some external principle or design, see [11,50]. Examples of transport networks are road networks, railways, pipes, and power grids.
In this work we focus in particular on lossy transport networks where a fraction of the transported good is inevitably lost, having in mind as primary application power systems in which part of the transported electricity is lost due to heat dissipation in the transmission lines.
The main question that we want to address in the present paper is how these transportation losses can be reduced in the scenario in which we have no direct control on the routing, but some of the nodes of the network have controllable loads. This is the case for power systems in which the line flows are determined by physical laws, but at the same time feature an increasing number of controllable energy resources, like energy storage devices, smart buildings and appliances, and electric vehicles.
In the present work we consider a probabilistic model to describe the stochastic fluctuations of the load in (a subset of) the nodes of the network. This is instrumental to model the stochastic load fluctuations due to power demand uncertainty and intermittent generation by renewable energy sources. The current power grids were originally built around conventional power generation systems and therefore they are not equipped to cope with this massive amount of uncertainty, especially in power supply. Managing this uncertainty on such a large scale with existing methods will soon become crucial: in the next decades power grids will have to become more flexible and robust to reduce the likelihood of contingencies and blackouts, whose social and economic impact is enormous.
As mentioned earlier, next to the increasing renewable penetration, there is another powerful trend that is driving this pervasive evolution of power system: the advent of distributed energy resources.
At a high level, all these resources can be seen as "virtual storage/batteries", in the sense that they can dynamically reduce their power consumptions and even inject electricity in the power grid when necessary, see [33,47,2]. Even if at the present stage these resources are not fully incorporated, they have a huge potential: if their penetration increases and we can actively and optimally control them, then they can make power grids more flexible and at the same time effectively mitigate the volatile nature of renewable power generation and allow a higher share of renewable energy sources.
The controllable loads that we consider in this work should be seen as an abstraction of more concrete examples, such as (i) actual energy storage that is neither full nor empty, (ii) distribution grids with ample flexible and/or deferrable load [34,33,38,37] or (iii) conventional generators that can provide balancing services.
The stochastic network model considered in this paper aims to understand the potential that these controllable resources can have in mitigating the load uncertainty and in particular the transportation losses due to the stochastic load fluctuations in the network nodes. Specifically, we consider a network with random sources and sinks modeled by an undirected connected weighted graph G consisting of n nodes and m edges to which are associated non-negative weights β ∈ R m + , and investigate how much could the average total loss be reduced by operating optimally the subset B ⊆ V of nodes with controllable loads.
The metric we consider here to quantify the transportation losses due to stochastic fluctuations is a quadratic function of the load profile that has been introduced in [22] as proxy for the total power losses in AC power grids, as we will review in more detail in the next section. Such a metric generalizes the notion of total effective resistance R tot (G) of the graph G, also known as Kirchhoff index. This is a key quantity that measures how well connected and robust a network is and for this reason has been extensively studied and rediscovered in various contexts, such as complex network analysis [14] and theoretical chemistry (for an overview see [52] and references therein).
In this paper, we address the question of how to optimally operate distributed energy resources to minimize power losses in power systems. Due to imperfect sensing and delayed communication, a real-time perfect coordination between network nodes based on the realized fluctuations is unfeasible. We consider thus a static decentralized policy that amounts to an affine control rule for the controllable loads, which is inspired by the Automatic Generation Control (AGC) mechanism, currently used in power systems, cf. [51]. This control mechanisms prescribes that in case of a power imbalance, all the generators have to either increase or decrease their power generation proportionally to their partecipation factors, which are static nonnegative scalars set beforehand based on economical principles.
In the present work, we envision a scenario where not just a few big conventional generators, but several other distributed energy resources could also join this effort to balance power fluctuations. We thus formulate a constrained optimization problem to find the optimal static load-sharing factors for the controllable loads. We prove that the solution is unique and give a closed-form expression for such an optimal control from which reflects the interplay between the network structure, the location of the controllable nodes, and the correlation structure of the load fluctuations.
In particular, our result shows how the correlation structure between load fluctuations affects the optimal operations of the network resources and this is extremely relevant in power grids with geographically close wind or solar farms, whose power outputs are obviously highly correlated. These insights are derived without making Gaussian assumptions on the distribution of the fluctuations.
We then use this explicit solution to explore the trade-off between the number of controllable resources available in a network and the performance gain they yield, quantified as transportation loss reduction. The analysis builds on and extends that of [54], which considered the scenario with only two controllable loads. In particular, the authors show therein that the expected total loss due to fluctuations can be reduced by 25% in a line network by adding one controllable storage device. In the present paper we extended that insight by showing that for large graphs the total expected loss can be reduced on average by a factor (1 + 1/k)/2 in the scenario where k controllable loads are available. The key insight is larger values of k yield diminishing improvements in terms of total expected loss.
This suggests that, even if power grids are becoming locally more robust, transportation losses can be reduced by up to 50%, and the number of controllable loads or balancing services k quantify how close one can get to this reduction. Though our model is stylized, it provides a simple quantitative estimate on the value of balancing services in a scenario where each node in the network is self-sufficient on average.
We remark that our stylized model is "static", in the sense the optimal control we derive does rebalance the total power mismatch in the network in any scenario, but does not depend on the realized load fluctuations. Indeed, it depends only on the network structure, on the location of the controllable loads and on their average covariance structure of the load fluctuations. For this reason our model is not specific for a precise time-scale and provides insights into the value of balancing services both in real-time operations as well as long-term planning.
Our work provides a new mathematical approach which can be of help for the design of future power grids and of control schemes for distributed energy resources. In this respect, it complements a large body of literature on optimal topology design for power grids [12,22,18] and on optimal control of multiple controllable devices and/or generators where often the designed participation factors are also affine in the stochastic load fluctuations, see e.g., [1,46,35,6,7,19,20,31,23,43,44,45]. Optimal policies for storage management, especially aiming at the mitigation of the uncertainties in wind generation, have been explored in [3,16,17,48], where, however, the physical network is not modeled explicitly. Optimal storage placement can increase network reliability, as shown in [4] by simulation techniques. Storage can also be used for arbitrage, exploiting temporal price differences [10,8] and the impact of storage on energy markets has been studied in [9,15].
The rest of the paper is organized as follows. We provide a detailed model description in Section 2. In Section 3 we investigate the optimal load sharing factors in several scenarios. These results are applied in Section 4 where a scaling law is presented for large networks. A more general optimization problem, which takes into account economic factors and/or further limitations of the controllable loads is presented in Section 5. In Section 6 we report several numerical experiments and, lastly, Section 7 concludes.

Model description
In this paper we model a lossy transport network as a weighted graph (G, w), where the graph G is a simple undirected graph G = (V, E) with |V | = n nodes labelled as V = {1, . . . , n}, and |E| = m edges and β ∈ R m + is the collection of edge weights. In the context of power grids, the nodes of G are often referred to as buses, the edges as transmission lines and the quantity β i,j is the susceptance of the transmission line = (v, w) ∈ E connecting buses i and j. Missing edges can be thought as edges with zero weight.
The weighted Laplacian matrix of the graph G, often referred to also as the susceptance matrix of G in the case of a power system, is the matrix L ∈ R n×n defined for every i, j = 1, . . . , n as It is well-known that L is a real symmetric positive semi-definite matrix. By construction, all the rows of L sum up to zero and thus the matrix L is singular. Under the assumption that G is a connected graph, the eigenvalue zero has multiplicity one and the corresponding eigenvector is equal to the vector with all unit entries, which we denote by 1 ∈ R n . Let L + ∈ R n×n be the Moore-Penrose pseudoinverse of the weighted Laplacian matrix L. Using the eigenspace structure of L, the pseudoinverse L + can be defined as The matrix L + is also real, symmetric, and positive semi-definite. For the proof of these properties of the matrices L and L + and for further spectral properties of graphs, we refer the reader to [41,49]. Denote by p ∈ R n the load profile at the network nodes, where p i is the load at node i for every i = 1, . . . , n. In the context of power grids the i-th entry of the vector p models the power generated (if p i > 0) or consumed (if p i < 0) at node i. We say that a load profile p ∈ R n is balanced if 1 T p = 0.
Given a network with a balanced load profile p ∈ R n , we define its total loss H = H(p) as The scalar quantity H is a quadratic form of the load profile vector p and, as such, is always non-negative, thanks to the fact that L + is a positive semi-definite matrix. The total aggregated loss H will be central in our analysis, being the most natural choice for an efficiency metric in a lossy transport networks such as power systems. More specifically, H has been shown by [22] to be good approximation for total power loss in AC power grids, as we will briefly outline. AC current flows are described in terms of complex amplitudes and lines with complex impedances. In a power transmission networks operating in stationary conditions (i) all voltages are mostly constants and (ii) we can ignore reactive power flows and line conductances since they are an order of magnitude smaller than the active power flows and line susceptances, respectively. Under these two assumptions, it is reasonable to consider the linearized version of the AC Kirchhoff equations for the power flows, i.e., the so-called DC-approximation, called in this way since it resembles in structure the equations describing a resistive network with DC flows, cf. [51,Chapter 6]. If p denotes the vector of active power injections in the network nodes, by keeping only the leading term of such a DC-approximation, the power flows f ∈ R m on the network lines are given by where L is the weighted Laplacian matrix derived looking only the imaginary part of the network admittance matrix and Λ ∈ R m×n is the corresponding weighted edge-vertex incidence matrix otherwise.
Since the power loss on each line is proportional to β −1 f 2 , the aggregated power loss is (up a constant factor capturing the conductance-to-admittance ratio of the lines) equal to where we used the fact that L = Λ T diag(β 1 , . . . , β m ) −1 Λ and that L + LL + = L + . Therefore, modulo a trivial rescaling, the quantity H is an approximation for the total power losses calculated using only the leading order DC-approximation of the AC network flows.
The quantity H has also been considered as a scalar measure of the network tension in [21,25], where it is shown to be monotone along a cascade failure, as long as the network remains connected. Lastly, as we will show later, H can also be seen as a generalized total effective resistance in the sense that it quantifies how robust the network G is against a stochastic load profile with a predefined covariance structure.

Stochastic fluctuations and load-sharing factors
In this work we are particularly interested in a transport network with a stochastic load profile, which means that we take p to be a multivariate random variable.
More precisely, we assume p to be of the form p = µ + ω, where µ ∈ R n is the nominal load profile in the network and ω is a n-dimensional random vector modeling the fluctuations. We henceforth assume that Eω = 0 and that ω has finite second moment. We further denote by Σ ∈ R n×n the covariance matrix of the load fluctuations, namely Σ i,j = cov(ω i , ω j ) = E[ω i ω j ] < ∞. Nodes in which there are no stochastic load fluctuations can be modeled by setting all the entries equal to zero in the corresponding row and column of the matrix Σ. We denote by S ⊆ V the subset of nodes with stochastic load fluctuations and we will henceforth assume that |S| ≥ 1.
We further assume that the nominal load profile is balanced on average, namely 1 T µ = 0; this assumption is reasonable as every 5-15 minutes the setpoint µ is adjusted by solving the so-called Optimal Power Flow (OPF) problem which specifically constraint the net total power to be equal to zero. This assumption, however, does not guarantee that every stochhastic realization of the load profile is balanced: indeed, the total mismatch 1 T p in the network is a random variable, which can be expressed as the net sum of fluctuations, namely Let σ 2 := Var n i=1 ω i be the variance of the sum of the load fluctuations, which also rewrites as Since we assumed that there is at least one node with stochastic load fluctuations, we have σ 2 > 0.
In order to cope with the stochastic fluctuations and, in particular, to keep the network balanced, we assume that load at each node is controllable: for every i = 1, . . . , n, node i can deal with (either generate or store) a controllable fraction α i ∈ R of the realized total mismatch 1 T (p − µ) = n i=1 ω i . In other words, we assume that while using the load-sharing factors α = (α 1 , . . . , α n ) ∈ R n the net load profile p(α) is given by For any j = 1, . . . , n, the term α j n i=1 ω i corresponds to the power generated or stored in the corresponding controllable load when using an affine control responsive to stochastic load fluctuations. We can rewrite the net power load profile p(α) in vector form as where C α ∈ R n×n is the matrix defined as The last equality in (3) follows from the fact that 1 T µ = 0, since C α µ = (I − α 1 T )µ = µ − 0 · α = µ. When the load-sharing factors α are used, the total mismatch in the network is equal to From this expression it is immediate to derive the condition on α that guarantees that the net load profile p(α) is balanced for any realization ω, which is stated in the next lemma.
Lemma 1 (Stochastic load profile balance condition). Consider a network with balanced nominal load profile 1 T µ = 0 and controllable loads using load-sharing factors α ∈ R n . The stochastic load profile p(α) is balanced for every realization of the fluctuations if and only if

Model discussion
Before presenting our results in the next section, we briefly discuss here the motivation behind the assumptions on the distribution of ω and the motivation for considering an affine static control policy for the fluctuations. First of all, the random variable n j=1 ω j , which describes the net power mismatch, can have any sign. We thus tacitly assume that our controllable nodes can respond to both positive and negative fluctuations, meaning that they can both store energy in excess or provide energy if the network needs it. For sake of generality, we impose no further constraints on the random variable n j=1 ω j , and thus the power absorbed/injected by controllable node i, i.e., α i n j=1 ω j , could theoretically take very large (positive and negative) values. This may seem a dubious and unrealistic assumption, but it is not for various reasons, which we now outline. Firstly, in normal operating conditions and on short time scales the quantity n j=1 ω j is typically "small". Indeed, our model is particularly relevant for a prompt response on short-time scales, in which large unforeseen power fluctuations are highly unlikely. If an extremely large net power fluctuation occurs, it would be physically impossible for controllable loads and storage devices to resolve this imbalance by themselves and the network operator needs to take specific ad-hoc emergency actions anyway. In particular, the real-time energy market and a consequent Optimal Power Flow routine on the 5-15 minutes scale will allow for a new safe setpoint µ and for energy reserves to be used/bought. Secondly, since a detailed statistical modeling of demand fluctuations or renewable power production is not within the scope of this paper, we consider a general multi-dimensional distribution for the stochastic fluctuations ω 1 , . . . , ω n . It is however very reasonable to assume that the support of these random variables is bounded, since, for instance, renewable energy sources have cannot produce a negative amount of power nor produce beyond a specific power rating (above which automatic protective relay mechanisms are automatically triggered causing to so-called power curtailment). Lastly, a controllable load that at a specific moment does not have the two-directional flexibility (i.e., that cannot both store and release energy) can temporarily "go offline" and we can account for it by adding the constraint α i = 0 for the corresponding network node.
An affine control modeled using the load-sharing factors α ∈ R n is a clearly simplification, especially when the controllable loads models energy storage. Indeed, it does not incorporate many details, among which possible ramping constraints or the current state of charge. In our model such details are omitted on purpose to have a mathematically tractable optimization problem and to better identify the interplay between load uncertainty and storage operations. A further simplification we make is that we allow to choose α without invoking line limits, as we also do this for mathematical tractability, but the extension we present in Section 5 accounts for these limits and several others.
We remark that these assumptions on the stochastic fluctuations ω and on the affine control of the deviations from the nominal setpoints are fairly common in power systems operations, see e.g. [5,43,44].

Expected total loss: definition and properties
For any 1 ≤ k ≤ n, we can model a network where exactly k nodes have controllable loads by imposing that the remaining n − k nodes have load-sharing factors equal to zero, so that for any such node p i = p i (α). Using the net load profile p(α), the total loss rewrites as and therefore {H(α)} α∈R n is a family of random variables parametrized by the control α. Being a quadratic form and being L + a positive semi-definite matrix, it immediately follows that H(α) is a non-negative random variable for any α ∈ R n . The next proposition, which is proved in Appendix A, shows how, leveraging the properties of the matrices L + and C α , the expected total loss EH(α) rewrites as the sum of two contributions, one stochastic and one deterministic which is not affected by the control α. Furthermore, it rewrites the expected total loss as a quadratic function the load-sharing vector α.
Proposition 2. Consider a network with a balanced nominal load profile µ and zero-mean stochastic fluctuations ω with covariance matrix Σ. Then, the expected total loss using the control α is given by where EH s (α) is the expected total loss due to the stochastic fluctuations. Furthermore, EH s (α) ≥ 0 for every α ∈ R n , and the following identity holds: The first important remark is that the expected total loss EH(α) is a quadratic form in the vector α ∈ R n since it can be rewritten as where A = L + is a positive semi-definite matrix, σ 2 > 0, b = L + Σ1 ∈ R n , and c = tr(ΣL + )/2+ 1 2 µ T L + µ ∈ R + . Furthermore, we can already conclude that the nominal load profile µ has no impact on the optimal control α, since it appears only in the constant term c.
Before investigating what is the optimal load-sharing vector for a given network and covariance structure of the noise, we argue here why the quantity EH(α) can be seen as a generalized notion of effective resistance. In order to do so, we will first recall some classical definitions.
The effective resistance R i,j between a pair of nodes i and j of the network G is defined as the electrical resistance measured across nodes i and j when we look at G as electrical network in which resistors with conductance β −1 1 , . . . , β −1 m are placed at the corresponding network edges. Equivalently, where e i denotes the vector with a 1 in the i-th coordinate and 0's elsewhere. The total effective resistance of a graph G is then defined as The same quantity is also known as Kirchhoff index when the network G is such that all the edge weights are equal to 1. As mentioned in the introduction, the total effective resistance has been used in various contexts [14,52] to quantify how well connected a given network is. In the context of electrical networks the total effective resistance R tot (G) is shown in [18] to be proportional to the average power dissipated in a resistive DC network (G, w) when random i.i.d. currents with zero mean and unit variance are injected at the nodes. We can look at the network (G, w, α) with controllable loads introduced earlier as a flexible transport network, where the load-sharing factors α 1 , . . . , α n can be tuned to respond optimally to specific stochastic load fluctuations. In this respect, we claim that the quantity EH s (α) can be seen as a (rescaled) generalized total effective resistance that measures how "robust" the network (G, w, α) is against stochastic load fluctuations with covariance structure Σ. To further corroborate this claim, we now show that the expected total loss reduces to the classical total effective resistance R tot (G) in the special case where the load-sharing factors are all equal, i.e., α i = 1/n for every i = 1, . . . , n, and the stochastic load fluctuations are i.i.d. random variables with with zero mean and unit variance, i.e., Σ = I. Using (6), the expected total loss due to fluctuations rewrites as where in the last step we use the well-known identity R tot (G) = n · tr(L + ) proved by [24] that relates the total effective resistance of a graph with its spectrum. Rayleigh's monotonicity principle [13] states that the pairwise effective resistance R i,j is a nonincreasing function of the edge weights and, as a consequence, also the total effective resistance R tot (G) is. The following proposition, proved in Appendix A, shows that a similar property also holds for H(α): regardless of the load-sharing vector α, the total power loss does not increase when edges are added or weights are increased.
Proposition 3 (Monotonicity of total power loss). Let G be a weighted connected graph and let G the graph obtained from G by increasing the weight of edge e = (i, j) by β > 0 or by adding the edge e = (i, j) with weight β > 0. For any load-sharing vector α ∈ R n and any realization of the stochastic load fluctuations ω, the following inequality holds and, in particular, EH G (α) ≤ EH G (α).

Optimal load-sharing control
In this section we consider the problem of minimizing the expected total loss EH(α) given a network G and a stochastic load covariance structure Σ.
Let B ⊆ V be the subset of k nodes with controllable loads. We focus first on the scenario in which not all the nodes have controllable loads and thus assume 1 ≤ k < n. Besides the constraint (4), we further add n − k constraints on the optimal load-sharing vector α ∈ R n to account for the absence of controllable loads in the nodes in V \ B, obtaining the following constrained optimization problem in R n : Note we consider only the expected total loss due to stochastic load fluctuations in the objective function, since in view of (5) it differs only by a constant from the expected total loss. We henceforth assume that the k nodes with the controllable loads are those with labels 1, . . . , k, i.e., B = {1, . . . , k}. We can make this assumption without loss of generality as it amounts to a relabelling the network nodes. If this is the case, the rows and columns of matrices L, L + , and Σ and the entries of the vector µ are also rearranged accordingly.
Let P B ∈ {0, 1} n×k be the binary matrix that maps any k-dimensional vector α ∈ R k to the ndimensional vector P B α = ( α 1 , . . . , α k , 0, . . . , 0) ∈ R n . Such a matrix can be defined component-wise as (P B ) i,j := δ {i=j} δ {i≤k} , for i = 1, . . . , n and j = 1, . . . , k, and has the following structure: where I k is the k × k identity matrix and O ∈ R n−k×k is a matrix with all entries equal to zero. In our first main result we present a closed-form expression for the optimal load-sharing factors of k controllables.
Theorem 4 (Optimal load-sharing between k < n controllables). Consider a network with balanced nominal load profile 1 T µ = 0 in which the nodes in B = {1, . . . , k} have controllable loads. The solution of the optimization problem (9) is the load-sharing vector α * = ( α * 0), with where L + B is the k × k principal submatrix of L + , i.e., L + B := P T B L + P B , and t B : If all the nodes with stochastic load fluctuations have controllable loads, i.e., S ⊆ B, then and, in particular, The involved expression (10) for the optimal load-sharing factors reflects the interplay that exists between the network structure, the location of the controllable loads B, and the correlation structure of the load fluctuations in determining the losses.
In the special case where all the nodes with stochastic load fluctuations have controllable loads there is a nice interpretation for the optimal load-sharing factors: indeed α * i is proportional to how much the stochastic fluctuations of node i contribute in relative terms to the variance of the total mismatch, since We further remark that in the case of i.i.d. stochastic fluctuations in all the nodes, the second term in (10) vanishes, since the vector Σ1 lies in the null space of L + (being a multiple of 1), and, therefore, the optimal control is equal to In particular, it does not depend on the variance of the load fluctuations, but only on the network structure and on the location B of the controllable loads, both encoded in the matrix L + B .

Full controllability
In this subsection we focus on the special case where the load is controllable in every node, i.e., B = V , which is not covered in Theorem 4. Indeed, the proof method does not work in this scenario due to the non-invertibility of the graph Laplacian L, and for this reason is treated separately here. The problem of minimizing the expected loss EH(α) when all the nodes have controllable loads can be written as an optimization problem on R n with a single constraint, namely s.t. 1 T α = 1.
As mentioned earlier, thanks to Proposition 2 we can immediately conclude that the optimal loadsharing vector does not depend on the vector µ, which appears only in the constant term in the equality (5) for the expected loss.
The next theorem derives an analytical expression for the optimal solution of this optimization problem.
Theorem 5 (Optimal load-sharing between n controllables). Consider a network G with n nodes and a balanced nominal load profile 1 T µ = 0. The solution of the optimization problem (13) is the load-sharing vector α * given by The highlight of this result is that in the scenario where all nodes have controllable loads, the optimal control α * does not depend on the graph structure, but only on the covariance structure of the fluctuations. The same interpretation as in the special case S ⊆ B of Theorem 4 holds here for the optimal load-sharing factors: α * i is proportional to how much the stochastic fluctuations of node i contribute in relative terms to the variance of the total mismatch, see (12). In particular, when a node i does not have stochastic load fluctuations, then it is optimal not to use the controllable load in that node, since α * i = 0 in view of the fact that the i-th row of Σ is identically zero. It immediately follows from Theorem 5 that when the the stochastic load fluctuations are independent and identically distributed, the optimal load-sharing factors are all equal, namely Furthermore, the expected total loss due to stochastic fluctuations when using the optimal load-sharing factors rewrites as In this special case, the fact that EH s (α * ) ≥ 0 can equivalently be proved as follows: where both the inequalities leverage in a crucial way that all the matrices Σ, 11 T , L + are positive semi-definite. The first inequality follows from the fact that in the space of positive semi-definite matrices trace is a proper inner-product and thus obeys Cauchy-Schwarz inequality, tr(AB) ≤ tr(A 2 ) tr(B 2 ) ∀ A, B positive semi-definite matrices.
The second inequality follows from the fact that tr(A 2 ) ≤ tr(A) 2 for any positive semi-definite matrix A, obtained by applying Cauchy-Schwarz inequality with A = B.

Scaling properties of the expected total loss
In this section we explore the relation between the expected total loss and the number of controllable loads. Even if the intuition suggests that the expected total loss should be a decreasing function in the number of controllable loads, this fact may not be true in general, as the total loss depends both on the location of the controllable loads as well as on the the load-sharing factors. For instance, in Section 6 we present a counterexample of a network in which by adding the one controllable load and readjusting the load-share factors to be all equal, the expected total power loss increases.
To get rid of these heterogeneities and obtain a more transparent result for the impact of the number of controllable loads, we calculate the expected total loss for a fixed number of controllable loads averaging on all their possible locations and assuming they share equally the load. Consider an integer 1 ≤ k ≤ n and denote by B k ⊂ R n the collection of load-sharing vectors with exactly k non-zero identical entries (and thus equal to 1/k, in view of (4)), namely where e i ∈ R n is the vector with the i-th entry equal to 1 and zero elsewhere.
Let H k denote the expected total loss due to stochastic load fluctuations averaging over all their possible locations of k controllable nodes share equally the load, i.e., This average consists of |B k | = n k terms, one for each possible arrangements of k controllable nodes in a network with n nodes. The following theorem states an explicit expression for H k that makes the dependence on the number of controllable node k very explicit, showing in particular that H k is, up to a constant, proportional to 1/k. Theorem 6 (Average total loss with k controllable loads). Consider a network G of n nodes with balanced nominal profile load, i.e., 1 T µ = 0. Then, where C 1 , C 2 ∈ R are two constants that do not depend on k given by 2n(n − 1) and C 2 = C 2 (G) := σ 2 tr(L + ) 2(n − 1) .
Both the constants C 1 (G) and C 2 (G) depend on the graph structure via L + , on its size n, and on the covariance matrix Σ. We remark that the constant C 2 is always strictly positive, as tr(L + ) > 0 (L + being a positive semi-definite matrix) and σ 2 > 0, in view of (2).
We are interested in understanding how the expected total loss scales for large graphs. Assume we can take a sequence of graphs {G n } n∈N of growing size, |V n | = n, and of covariance matrices {Σ n } n∈N with total variance σ 2 n = 1 T Σ n 1, so that the limit γ := lim n→∞ (n − 1) tr(Σ n L + n ) σ 2 n tr(L + n ) exists. Note that the fact that inequality H n ≥ 0 holds for every n ∈ N (see Proposition 2) guarantees that γ ≥ 0. Under these assumptions, Theorem 1 readily implies that as the graph size n grows large the relative gain of having k controllable loads with respect to a single one scales as In the scenario where the load fluctuations are independent and identically distributed, regardless of the graph structure, this asymptotic scaling reads as In Subsection 6.6 we show numerically that the way the expected total loss scales on average with the number of controllable loads as stated in Theorem 6 is pretty accurate in general, even without averaging over all possible locations of the controllable loads.

Generalized model
The optimization problems (9) and (13) considered in Section 3 focus on the network efficiency, but do not prevent the excessive usage of specific controllable loads nor account for economic factors. In this section we present a more general optimization problem that addresses both these issues and that further includes additional constraints typical of the OPF problem, which network operators use to set the operational setpoints of power systems every 5-15 minutes.
Let P be positive definite n × n matrix, R ∈ R n×n , and q ∈ R n . Introducing a non-negative real number ξ ≥ 0, we can consider the following generalized optimization problem: This optimization problem differs from that in (9) in two ways, namely its objective function (15a) has an extra term and it has two new sets of constraints (15d)-(15e). The term α T P α can be seen as a penalty or cost term and the scalar ξ ≥ 0 weights its relative importance with respect to the average total loss EH(α). More specifically, by taking P to be a diagonal matrix with non-negative entries, the additional term α T P α = n i=1 P i,i α 2 i in the objective function penalizes an excessive usage (i.e., large load-sharing factor) of the controllable loads with large coefficients P i,i . This means that we can model less flexible or more costly controllable loads, by tuning the corresponding terms P i,i > 0 accordingly.
The additional constraint (15d) can be used to set any desired upper and/or lower bounds on the load-sharing factor of any particular node. In particular, we could restrict the load-sharing factors for some of the nodes v ∈ B to be non-negative, α v ≥ 0, or their absolute values |α v | < ε v with ad-hoc constants ε v > 0.
Lastly, the constraint (15e) captures the physical limits of the lines, imposing the power flow f (α) on any given network line ∈ E stays below the corresponding capacity f max > 0 of that line. As outlined in Section 2, using the linear approximation (1), the power flows f (α) = ΛL + p(α) are linear in the load profile p(α) and, ultimately, linear also in the load-sharing factors α.
Clearly (15) is still a convex optimization problem: all the constraints are still linear in α and the objective function rewrites as a quadratic form with the matrix L + + ξP appearing in the leading term that is positive definite in view of the assumption made on the matrix P and Proposition 2. It is impossible, however, to derive the optimal control α * in closed form for the generalized problem (15) and thus we cannot explore its structure as we did in the previous sections, but we present some numerical results in next section.
Nonetheless, we argue that the original problem (9) still gives valuable insight as the line capacity limits appearing in (15e) are often redundant. Indeed, the OPF problem automatically sets a nominal setpoint µ so that all the nominal power flows f are within their limits and the optimal control α * should decrease the largest line flows (in absolute value) f (α) even further, as it is intuitive from the expression H(α) = 1 2 ∈E β −1 f 2 (α), cf. Section 2. Moreover, even if we set a specific load-sharing factor α v to be non-negative or smaller than ε v with constraint (15d), the controllable load in that node should still be capable of both storing and to outputting a possibly very large amount of power, as we outlined at the end of Subsection 2.1. Alternatively, a chance-constraint for the stochastic effort α i j ω j of node i could be included in the optimization problem like it has been done in [5]. However, additional assumptions on the multivariate distribution of ω must be made to rewrite the constraint so that the resulting optimization problem is still convex. Other chance-constraints for line limits could be obtained using concentration inequalities as in [39] or using decay rates and large deviation theory as in [40].
We conclude by deriving the optimal control α * when considering the optimization problem with augmented objective function (15a) in the special case where all the nodes have controllable loads, i.e., B = V , but ignoring the constraints (15d)-(15e). In this scenario, the optimal solution can be still calculated in closed form as This expression can be obtained in a similar way of that in Theorem 4, leveraging the fact that the matrix L + + ξP is positive definite.

Variance of the aggregated total loss
It could be of interest choosing the load-sharing factors α 1 , . . . , α n not only to minimize the expected aggregated total loss EH(α), but also its variance Var(H(α)). Aiming to possibly include this in our optimization, we need a closed-form expression for the variance of a quadratic form of a random vector. Unfortunately, this is known only in the case in which the random vector has a multivariate Gaussian distribution. Under this additional assumption, the variance of the aggregated total loss can be calculated explicitly as follows Var(H(α)) = Var(H s (α)) = Var The last step follows from standard results for quadratic forms of multivariate Gaussian random variables [42, Theorems 5.2c and 5.2d] and the fact that Eω = 0.
Following similar steps to those in the proof of Proposition 2 in Appendix A, the expression for Var(H(α)) can further rewritten as a polynomial of degree four in the load-sharing factors α 1 , . . . , α n . This new term could potentially be included in the objective function of the optimization problem (15), but the resulting problem might is not guaranteed to still be convex in general.

Numerical examples
In this section we present some numerical results. First of all, in Subsection 6.1, we compare the performance of our static optimal load-sharing factors α * with that of idealized real-time load-sharing factors α(ω) that dynamically respond to fluctuations. Then, in Subsections 6.2 and 6.3 we study the impact of the covariance matrix and of the relative position of the nodes affected by fluctuations and those housing controllable loads. We present two examples to illustrate that (i) optimal load-sharing factors have opposite signs in Subsection 6.4 and that (ii) expected total loss is not always decreasing in the number of controllable loads in Subsection 6.5. Lastly, in Subsection 6.6 we corroborate the accuracy of the scaling limit derived in Section 4.

Real-time adaptive load-sharing factors
We briefly consider here the scenario in which is possible to (i) have exact knowledge about the realized fluctuations ω and (ii) to have instantaneous coordination between controllable loads. In this setting, a centralized controller could dynamically change the load-sharing factors depending on the realization of the noise ω, trying to minimize the total loss. In this case, the total aggregated power loss H(α, ω) is not a random variable, but simply a quadratic form in α. Thus, for every realization of the noise ω we can find the optimal load-sharing factors α * (ω) ∈ R n that solve From our previous analysis, it is easy to see that this is still a convex optimization problem in α with a quadratic objective function and linear equality constraints. Its solution can be obtained from the corresponding Karush-Kuhn-Tucker (KKT) conditions where π ∈ R is the dual variable corresponding to the constraint 1 T α = 1 and ν ∈ R n is the vector whose non-zero entries are equal to the dual variables of the constraints α v = 0, i.e., ν v = ν ∈ R if v ∈ V \ B and 0 otherwise. In the special case B = V , the vector ν is equal to 0 and the system of equations above can be solved explicitly leveraging the spectral structure of L + to obtain This result corresponds to the unrealistic scenario in which the response of the controllable loads is such that the net load profile p i (α) becomes zero in every node (and no power flows in the lines).
We focus thus the case where B V and compare numerically the performance of the optimal static load-sharing factors α * against the dynamically changing α * (ω) for the same realizations of the fluctuations ω. We consider the IEEE 14-bus test network, a standard test case used for simulations in the power grid literature [53], with correlated noise, see Figure 1. The resulting static optimal control for this network and correlation structure is α * = ( α * 2 , α * 6 , α * 9 ) = (0.3510, 0.2805, 0.3685). We sample 100 realization of ω from a multivariate Gaussian distribution with mean 0 and correlation matrix Σ and for each of them we calculate the total aggregated loss using α * and finding the optimal dynamic load-sharing factors α * (ω). For these realization, the average total loss using the dynamical control is equal to 18250.8, while it result in a 20% higher average total loss, namely 21985.1, when using the static control. However, as shown in Figure 2 below, the dynamic control α * (ω) takes incredibly large (both positive and negative) values, which are highly unrealistic to be implemented in practical settings.

Impact of covariance structure
The next two figures illustrate how the covariance structure influence the optimal control for a small network of 14 nodes, the IEEE 14-bus test network. Figure 3   For these two networks, we compare the optimal static control α * for the three controllable loads in the various cases in which we vary the covariance structure of the noise, see Figure 4. The correlation matrix Σ c in Figure 5(c) has been generated at random, the correlation matrix Σ b used in case (b) is obtained using the diagonal entries of Σ c , and that in case (a) is equal to Σ a = σ 2 I 14 , with σ 2 = 1 T Σ c 1 = 1 T Σ b 1 = 255.75. In this way, all the three correlation matrices have been rescaled so that their total variance is equal in all three cases, namely σ 2 a = σ 2 b = σ 2 c . The correlation matrices for the second network are displayed in Figure 6; they have been obtained analogously, but only a subset of |S| = 9 nodes is affected by stochastic fluctuations, the other one have corresponding rows and columns equal to zero.

Relative position of controllable loads and stochastic nodes
As suggested by Theorem 4, the relative position of the nodes affected by fluctuations and that of the controllable loads play a crucial role in terms of the achievable total power loss, as we will illustrate in the following example on a ring network. Figure 7 visualizes the optimal load-sharing factors in the scenario where there are |B| = 6 controllable nodes and |S| = 4 nodes affected by fluctuations. The subset of nodes affected by fluctuation is fixed, S = {1, 4, 7, 10}, as well as the covariance matrix, but the location of the controllable loads, i.e., the subset B, changes. Figure 7(a) presents the scenario in which S ⊂ B (i.e., all the stochastic nodes have controllable loads) and the optimal control α * for all other nodes in B \ S = {5, 11} is equal to zero, as prescribed by Theorem 4. In the other two cases, in Figures 7(b) and (c), we picked two different subsets B of controllable loads such that S ⊂ B and S ⊂ B c , respectively.
The corresponding values of the expected loss are higher in these cases than in case (a) and suggest that it may be optimal to place the controllable loads in the nodes affected by stochastic fluctuations. Lastly, note in Figure 7(b) that the load-sharing factors for node 1 and 7 are much larger than the other nodes in B, since these two nodes are affected by stochastic fluctuations but the remaining four ones are not.

Negative load-sharing factors
The fact that a load-sharing factor is non-negative means that the corresponding node absorbs part of power excess (if n i=1 ω i > 0) and balance out shortages (if n i=1 ω i < 0). In most of the related work in primary response mechanisms and automatic generation controls for power grids, the participation factors (that our load-sharing factors generalize) are in fact taken to be non-negative, i.e., α v ≥ 0 for all v ∈ B. This assumption tacitly implies that all the controllable generators and storage have "coordinated" actions, i.e., they either all increase or all decrease their power output.
In our formulation of the optimization problems (9) and (13) we do not make such an assumption and load-sharing factors can also be negative, as long as the condition (4) is met. This is crucial as for certain covariance structures of the load fluctuations (especially when there are strong negative correlations) it is optimal to have negative load-sharing factors in some nodes: we illustrate this fact for a small network illustrated in Figure 8.  Table 1 below lists the optimal load-sharing factors α * corresponding to different set of controllable loads in the network in Figure 8 where the load fluctuations covariance structure is assumed to be The best way for the controllable loads to respond to the negative correlations that the load fluctuations have in this network is having the controllable load in node 4 taking actions "mirroring" those of the other three nodes, in the sense that α * 4 < 0 while the load-sharing factors of the other nodes in B are always positive.  Table 1: Sets of controllable loads for the network in Figure 8 and corresponding optimal load-sharing factors 6.5 Non-monotonicity of expected power loss when adding controllable loads An extra controllable load always reduces the expected total power loss if the corresponding optimal load-sharing vector α * is selected, since it corresponds to removing one constraint in the optimization problem (9). However, if the chosen load-sharing factors of the augmented subset of controllable loads are not the optimal ones, adding an extra controllable load does not necessarily reduce the expected total power loss. We illustrate this fact with an example in which the control is always assumed to be equal-share between the controllable loads in B, i.e., α = 1 |B| 1. Consider the network given in Figure 9 and assume that the stochastic loads are i.i.d. with unit variance.  Figure 9: A small network modeled by a graph with n = 7 nodes and m = 9 edges with unit weights Table 2 below lists the expected total power losses for some subsets B of controllable loads and compares them with those for some augmented subset B ∪ {4}. It is evident that in every one of these case, adding an additional controllable load in node 4 without optimally readjusting the load-sharing factors result in a higher expected total loss.  Table 2: Expected total power losses for the network in Figure 9 assuming equal load-sharing factors for some subsets B of controllable loads and then for the subsets augmented with an extra node, namely B ∪ {4}.

Empirical evidence of the scaling law
The scaling law derived in Section 4, despite having been obtained averaging over the possible location of the controllable loads, still give precious insight about how the average total loss decrease with the number of controllable loads. Aiming to corroborate this fact, we consider the IEEE RTS 96-bus test network [53] and track the average total loss while adding one by one controllable loads in random locations and assuming equal share among them. As illustrated by Figure 10, the theoretical scaling for the expected total loss scales with the number of controllable loads stated in Theorem 6 while averaging on all possible locations is in fact very accurate also for a single instance where new controllable locations are randomly added.

Concluding remarks
In this paper we consider a stochastic lossy transport network in which some nodes have controllable loads and derive a closed-form expression for the optimal control when aiming to minimize the average total loss. The model is inspired by power systems where distributed energy resources can be used as virtual storage to mitigate the fluctuations in the power generated by renewable energy sources and in power demand. Our analysis unveils the complex interplay between the network structure, the location of the controllable loads and the covariance structure of the power fluctuations and gives insight in how much the average total loss can be reduced by adding a given number of controllable loads to the network.
We derived explicit optimal load-sharing factors for controllable loads in various scenarios. Our analysis, even if it uses a stylized mathematical model, suggests that the optimal displacement and operations of distributed energy resources must account for the possible correlations of the power fluctuations. For this reason it complements the recent efforts in the electrical engineering community in upgrading the existing models for power grids to account both for the intrinsic volatility of renewable energy generation and storage capabilities, see e.g. [23,27,28,26,30,29].
Lastly, we presented a more general optimization problem that can be instrumental to explore numerically the trade-off between the best operations for the network and the corresponding cost or penalties for the excessive usage of the controllable nodes. This is particularly relevant for the design of primary response mechanisms and automatic generation controls for power grids [1,7,19,20,31,43,46].
which describes precisely the contribution of the stochastic fluctuations to the transportation losses. From the fact that L + is a positive semi-definite matrix it follows that Combining (17), (18), and the fact that E(µ T L + C α ω) = µL + C α Eω = 0 yields that Applying a classic result for quadratic forms of random vector, see e.g. [32,Corollary 3.2b.1], we derive Since Eω = 0, it follows that E(µ T L + C α ω) = µ T L + C α Eω = 0 and thus identity (17) can be rewritten as We now derive identity (6). Recall the following well-known properties of the trace of matrix: (i) The trace is invariant under cyclic permutations, i.e., for any r ∈ N tr(A 1 . . . A r ) = tr(A 2 . . . A r A 1 ) = . . . = tr(A r A 1 . . . A r−1 ).
(ii) The trace of a matrix and of its transpose coincide, i.e., tr(A) = tr(A T ); (iii) The trace of the outer product of two vectors is their inner product, namely First note that we can rewrite The aforementioned properties of the trace yield and tr(Σ1α T L + ) = tr(Σ1(L + α) T ) = tr((Σ1) ⊗ (L + α)) By combining all these equalities and exploiting the linearity of the trace operator, we obtain where we also used the fact that α T L + α is a scalar in the second step and identity (2) in the third step.
Proof. Proof of Proposition 3 Assume that e = (i, j) ∈ (V × V ) is the edge with weight β > 0 that has been added to G or whose edge weight has been increased by β > 0 and let m e = (e i − e j ) ∈ R n be the corresponding non-weighted incidence vector. In both cases the Laplacian matrix of the newly obtained graph G can be written as L G = L G + β m e m T e and, using the generalized version of the Sherman-Morrison formula in [36], we get We can thus rewrite the total loss corresponding to any net load profile p(α) as , and conclude by noticing that m T e L + G m e ≥ 0 and (m T e L + G p(α)) 2 ≥ 0.

B Proof of Theorem 4
In this proof we use the so-called block matrix inversion formula, which is stated in the next lemma.
For any vector α ∈ R n which is such that α i = 0 for every node i ∈ V \ B, there exists a unique k-dimensional vector α ∈ R k such that α = P B α. Using this correspondence and the fact that the nominal load profile is balanced, i.e., 1 T µ = 0, we can rewrite EH s (α) = EH(P B α) Therefore the n-dimensional optimization problem (9) rewrites as a k-dimensional optimization problem with a single constraint, namely where the last inequality follows from the fact that the vector (v 0) is not a multiple of the vector 1 and thus does not lie in the null space of L + . The optimization problem in (23) has then a unique solution, since the corresponding Hessian is positive definite. Let γ ∈ R be the Lagrange multiplier γ associated with the unique equality constraint of the optimization problem (23). The associated Lagrangian associated is Setting b := P T B L + Σ1 ∈ R k , the optimality conditions read or, equivalently, in matrix form Being positive definite, L + B is invertible and its inverse is also positive definite, which means that t B := 1 T (L + B ) −1 1 > 0. In view of the fact that σ 2 t −1 B = 0 and L + B is invertible, we can use the block matrix inversion formula given in Lemma 7 to obtain The solution of the linear system (24) then reads and thus the optimal load-sharing vector α ∈ R k is given by We now focus on the special case where S ⊆ B and prove identity (11). Rewrite L + as a block matrix with L + B ∈ R k×k , L + C ∈ R k×n−k and L + B c ∈ R n−k×n−k . Note that L + B and L + B c are symmetric matrices, since L + is. This is consistent with the former definition of L + B , since We start by noticing that From the assumption S ⊆ B it follows that the covariance matrix can be rewritten as where Σ B ∈ R k×k is itself a covariance matrix (hence, a symmetric positive semi-definite matrix). Trivially, 1 T (Σ B | O)1 = tr(Σ B 11 T ) = tr(Σ11 T ) = σ 2 . Furthermore, The optimal control α * then rewrites as

C Proof of Theorem 5
We already rewrote in (7) the objective function of the optimization problem (13) as a quadratic form in α, where A = L + , b = L + Σ1 ∈ R n , and c = tr(ΣL + )/2 ∈ R + . Note that, for the purpose of solving the optimization problem (13), we can ignore the constant term c. Denote by 0 = λ 1 < λ 2 < . . . < λ n the eigenvalues of the weighted Laplacian matrix L and let v 1 , v 2 , . . . , v n be the corresponding orthonormal basis of eigenvectors. Consider the representation of the vector α ∈ R n in this basis, namely α = a 1 v 1 + a 2 v 2 + . . . + a n v n , with a 1 , . . . , a n ∈ R. In view of the fact that the rows of L sum up to zero, it immediately follows that v 1 = 1 √ n 1. From the constraint (4), i.e., 1 T α = 1, it immediately follows that a 1 = 1/ √ n. Indeed, Defining the real coefficients κ 2 , . . . , κ n as κ i := Σ1, v i = 1 T Σv i , i = 2, . . . , n, and using the representation (25), the two terms of the quadratic form above rewrites as α T L + α = (a 1 v 1 +a 2 v 2 +. . .+a n v n ) T L + (a 1 v 1 +a 2 v 2 +. . .+a n v n ) = a 2 and b T α = 1 T ΣL + α = 1 T ΣL + (a 1 v 1 + a 2 v 2 + . . . + a n v n ) = n i=2 a i λ i where we used twice the fact that L + v 1 = 0. The objective function thus can be rewritten as a i κ i λ i =: g(a 2 , . . . , a n ).
The optimization problem (13) is therefore equivalent to a unconstrained minimization problem in n − 1 variables, a 2 , . . . , a n , which can be expressed using the newly introduced function g : R n−1 → R. The gradient of g can be calculated as ∇g(a 2 , . . . , a n ) = σ 2 λ i a i − κ i λ i i=2,...,n and the Hessian is the diagonal matrix H(g) = σ 2 · diag(λ −1 2 , . . . , λ −1 n ). The Hessian H(g) is constant as it does not depend on a 2 , . . . , a n and is a positive definite matrix, since all its diagonal terms are positive, in view of the fact that λ i > 0 for i = 2, . . . , n and that σ 2 > 0. The problem is then strictly convex and any stationary point satisfying ∇g(a 2 , . . . , a n ) = 0 would then be a minimum for the function g. Solving the optimality condition ∇g(a 2 , . . . , a n ) = 0 yields a * i = κ i σ 2 , i = 2, . . . , n.

D Proof of Theorem 6
The starting point of the proof are two identities that leverage the properties of the pseudoinverse L + of the graph Laplacian. Firstly, where we use the fact that in a graph with n nodes, each nodes belong to exactly n−1 k−1 subsets of k nodes. We further claim that B⊆V : |B|=k i∈B with the convention that n−2 n−1 = 0. Since The first term on the RHS of (29) can be rewritten as In the last step we used the fact that i =j e T i L + e j = −tr(L + ), which immediately follows from which concludes the proof of identity (28). Each load-sharing vector α ∈ B k can be written as α = 1 k i∈B e i , for some B ⊆ V , |B| = k. By Proposition 2, the expected total loss due to stochastic fluctuations when using this load-sharing vector is given by Therefore, Using (27) and (28)