Degree Distribution Preserving Network Sampling: The Case of Relational Learning

Abhijeet Ghoshal
https://orcid.org/0000-0002-0165-4204
University of Illinois–Urbana Champaign, Champaign, Illinois 61820

Syam Menon (Corresponding Author)
https://orcid.org/0000-0003-1028-0862
The University of Texas at Dallas, Richardson, Texas 75080

Sumit Sarkar
https://orcid.org/0000-0003-3045-1024
The University of Texas at Dallas, Richardson, Texas 75080

Published Online: 19 Nov 2025
https://doi.org/10.1287/ijoc.2024.1002

INFORMS Journal on Computing

Articles In Advance

Received: October 21, 2024
Accepted: October 19, 2025
Published Online: November 19, 2025

Cite as

Abhijeet Ghoshal, Syam Menon, Sumit Sarkar (2025) Degree Distribution Preserving Network Sampling: The Case of Relational Learning. INFORMS Journal on Computing 0(0).

https://doi.org/10.1287/ijoc.2024.1002
