How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets
Published Online: 16 Mar 2026 https://doi.org/10.1287/ijds.2025.0093
References
- (2021) Business data sharing through data marketplaces: A systematic literature review. J. Theoret. Appl. Electronic Commerce Res. 16(7):3321–3339.
- (1998) Query learning strategies using boosting and bagging. Shavlik JW, ed. Proc. 15th Internat. Conf. Machine Learn. (ICML 1998) (Morgan Kaufmann Publishers, San Francisco, CA), 1–9.
- (2022) Too much data: Prices and inefficiencies in data markets. Amer. Econom. J. Microeconom. 14(4):218–256.
- (2019) A marketplace for data: An algorithmic solution. Karlin AR, Immorlica N, Johari R, eds. Proc. ACM Conf. Econom. Comput. (EC ’19) (ACM, New York), 701–726.
- (1999) Committee-based sample selection for probabilistic classifiers. J. Artificial Intelligence Res. 11:335–360.
- (1996) The usefulness of optimum experimental designs. J. Roy. Statist. Soc. Ser. B Statist. Methodology 58(1):59–76.
- (2001) One hundred years of the design of experiments on and off the pages of Biometrika. Biometrika 88(1):53–97.
- (2007) Optimum Experimental Designs, with SAS, vol. 34 (OUP, Oxford, UK).
- (2019) Active preference learning based on generalized Gini functions: Application to the multiagent knapsack problem. Singh S, Markovitch S, eds. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (AAAI Press, Palo Alto, CA), 7741–7748.
- (2022) Personal data markets: A narrative review on influence factors of the price of personal data. Nurcan S, Soffer P, Zdravkovic J, eds. Research Challenges in Information Science, Lecture Notes in Business Information Processing, vol. 446 (Springer, Cham, Switzerland), 3–19.
- (2024) Active learning for data streams: A survey. Machine Learn. 113(1):185–239.
- (1996) Active learning with statistical models. J. Artificial Intelligence Res. 4:129–145.
- (2015a) Truthful linear regression. Grünwald P, Hazan E, Kale S, eds. Proc. 28th Conf. Learn. Theory (COLT 2015), Proceedings of Machine Learning Research, vol. 40 (PMLR, Cambridge, MA), 448–483.
- (2015b) Accuracy for sale: Aggregating data with a variance constraint. Roughgarden T, ed. Proc. 2015 Conf. Innovations Theoretical Computer Sci. (ITCS ’15) (Association for Computing Machinery, New York), 317–324.
- (2004) Analysis of a greedy active learning strategy. Thrun SK, Saul LK, Schölkopf B, eds. Advances in Neural Information Processing Systems, vol. 17 (MIT Press, Cambridge, MA), 337–344.
- (2010) Incentive compatible regression learning. J. Comput. Systems Sci. 76(8):759–777.
- Fortune Business Insights (2023) Big data analytics market size, share & COVID-19 impact analysis, by component (solutions, services), by deployment (on-premise, cloud), by enterprise size (large enterprises, SMEs), by industry (BFSI, IT and telecommunications, retail, healthcare, government, manufacturing), and regional forecast, 2023–2030. Accessed September 27, 2024, https://www.fortunebusinessinsights.com/big-data-analytics-market-106179.
- (1997) Selective sampling using the query by committee algorithm. Machine Learn. 28(2):133–168.
- (2011) Selling privacy at auction. Shoham Y, Chen Y, Roughgarden T, eds. Proc. 12th ACM Conf. Electronic Commerce (ACM, New York), 199–208.
- (2011) Adaptive submodularity: Theory and applications in active learning and stochastic optimization. J. Artificial Intelligence Res. 42(1):427–486.
- (2009) Average-case active learning with costs. Gavaldà R, Lugosi G, Zeugmann T, Zilles S, eds. Algorithmic Learn. Theory: 20th Internat. Conf. ALT 2009, Lecture Notes in Artificial Intelligence (Springer, Berlin, Heidelberg), 141–155.
- (2022) Trading data for wind power forecasting: A regression market with lasso regularization. Electric Power Systems Res. 212:108442.
- (2014) A theory of pricing private data. ACM Trans. Database Systems 39(4):1–28.
- (2018) A survey on big data market: Pricing, trading and protection. IEEE Access 6:15132–15154.
- (2024) DAVED: Data acquisition via experimental design for data markets. Preprint, submitted March 20, https://arxiv.org/abs/2403.13893.
- (2021) Role of big data analytics in supply chain management: Current trends and future perspectives. Internat. J. Production Res. 59(6):1875–1900.
- (2020) Exploring how consumer goods companies innovate in the digital age: The role of big data analytics companies. J. Bus. Res. 121:338–352.
- (1990) Knapsack Problems: Algorithms and Computer Implementations (John Wiley & Sons, New York).
- (2021) How to sell a data set? Pricing policies for data monetization. Inform. Systems Res. 32(4):1281–1297.
- OECD (2018) Personalized pricing in the digital era. Accessed October 7, 2024, https://www.oecd.org.
- (2020) A survey on data pricing: From economics to data science. IEEE Trans. Knowledge Data Engrg. 34(10):4586–4608.
- (2022) Regression markets and application to energy forecasting. TOP 30(3):533–573.
- (2011) From theories to queries: Active learning in practice. Guyon I, Cawley G, Dror G, Lemaire V, Statnikov A, eds. Active Learn. Experiment. Design Workshop, JMLR Workshop Conf. Proc., vol. 16 (JMLR.org), 1–18.
- Statista (2024) Amount of data generated worldwide from 2010 to 2025. Accessed October 7, 2024, https://www.statista.com.
- (2020) Data/infrastructure in the smart city: Understanding the infrastructural power of Citymapper app through technicity of data. Big Data Soc. 7(2):2053951720965618.
- (2001) Active Learning: Theory and Applications (Stanford University, Stanford, CA).
- (2020) Analytics in the era of big data: The digital transformations and value creation in industrial marketing. Indust. Marketing Management 86:12–15.
- (2022) Adaptive probabilistic load forecasting for individual buildings. iEnergy 1(3):341–350.
- (2023) Data sharing in energy systems. Adv. Appl. Energy 10:100132.
- (2021) Real estate valuation data set. Accessed November 26, 2024, https://archive.ics.uci.edu/dataset/477/real+estate+valuation+data+set.
- (2017) Data pricing strategy based on data quality. Comput. Indust. Engrg. 112:1–10.
- (2002) On active learning for data acquisition. Proc. 2002 IEEE Internat. Conf. Data Mining (IEEE Computer Society, Los Alamitos, CA), 562–569.
- (2006) Selectively acquiring customer information: A new data acquisition problem and an active learning-based solution. Management Sci. 52(5):697–712.

