Reducing Manual Labeling Effort in Imbalanced Data Sets: Active Learning for Detecting Illicit Massage Business Reviews

Margaret Tobey
https://orcid.org/0000-0001-7539-767X
National Center for Missing & Exploited Children, Alexandria, Virginia 22314
Maria E. Mayorga
https://orcid.org/0000-0002-6399-2153
Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, Raleigh, North Carolina 27695
Sherrie Bosisto
https://orcid.org/0000-0002-5451-650X
Global Emancipation Network, Clermont, Florida 34715
Osman Y. Özaltın
Corresponding Author
https://orcid.org/0000-0002-0093-5645
Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, Raleigh, North Carolina 27695

Published Online: 12 Feb 2026
https://doi.org/10.1287/opre.2023.0625

Abstract

Human trafficking investigators face challenges when processing the sheer volume of publicly available online data. Natural language processing (NLP) models can assist in identifying evidence of exploitation in text data, such as business reviews. However, the scarcity of large and accurately labeled training data sets hinders the potential for NLP-based detection algorithms. Labeling data sets related to human trafficking is challenging because identifying indicators of trafficking requires domain expertise, trafficking cases make up a small portion of the data, and reviewing disturbing content is emotionally demanding for individuals. Active learning optimizes model training by strategically querying the most informative data points’ labels, achieving high accuracy with minimal annotations. We formulate active learning as a decision model and learn a policy through deep reinforcement learning. We evaluate this approach for the imbalanced classification task of detecting Yelp reviews of massage businesses that contain human trafficking risk factors. The active learning policy outperforms the benchmark methods on the scoring metric used for classifier training. Moreover, its strong performance remains consistent even in large batch query settings. The proposed approach is compatible with any scoring metric and is particularly well-suited for imbalanced NLP tasks in which labeling demands substantial time, domain expertise, and emotional effort.
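To make the idea of "strategically querying the most informative data points’ labels" concrete, the Python sketch below shows a generic pool-based batch active learning loop. It is a minimal illustration under stated assumptions, not the method described above: it swaps in uncertainty sampling, a common heuristic, in place of the learned deep reinforcement learning policy; it uses F1 on the minority (positive) class as an example scoring metric; and it relies on a hypothetical one-dimensional toy classifier (`make_toy_model`) instead of an NLP model. All function names are illustrative. Batch queries are modeled by requesting `k` labels per retraining round.

```python
import math
from typing import Callable, Dict, List, Sequence

# Sketch only: uncertainty sampling is a generic stand-in heuristic,
# NOT the deep reinforcement learning query policy described in the abstract.


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 on the positive (minority) class. Accuracy is misleading on imbalanced
    data because predicting "negative" everywhere already scores highly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def uncertainty_batch(probs: Sequence[float], pool: Sequence[int], k: int) -> List[int]:
    """Pick the k unlabeled indices whose P(positive) is closest to 0.5."""
    return sorted(pool, key=lambda i: abs(probs[i] - 0.5))[:k]


def active_learning_loop(
    n_items: int,
    oracle: Callable[[int], int],
    fit_predict_proba: Callable[[Dict[int, int]], List[float]],
    seed: Dict[int, int],
    k: int,
    budget: int,
) -> Dict[int, int]:
    """Pool-based batch active learning: retrain, query k labels, repeat."""
    labeled = dict(seed)  # index -> label
    while len(labeled) < budget:
        pool = [i for i in range(n_items) if i not in labeled]
        if not pool:
            break
        probs = fit_predict_proba(labeled)
        for i in uncertainty_batch(probs, pool, min(k, budget - len(labeled))):
            labeled[i] = oracle(i)  # the costly human-annotation step
    return labeled


def make_toy_model(x: Sequence[float]) -> Callable[[Dict[int, int]], List[float]]:
    """Hypothetical 1-D classifier: sigmoid of the distance to the midpoint of the
    two class means. Assumes positives have larger x and the seed has both classes."""
    def fit_predict_proba(labeled: Dict[int, int]) -> List[float]:
        pos = [x[i] for i, y in labeled.items() if y == 1]
        neg = [x[i] for i, y in labeled.items() if y == 0]
        mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return [1.0 / (1.0 + math.exp(-(xi - mid))) for xi in x]
    return fit_predict_proba


if __name__ == "__main__":
    x = [float(i) for i in range(20)]
    y = [1 if i >= 17 else 0 for i in range(20)]  # 3 positives out of 20
    model = make_toy_model(x)
    labeled = active_learning_loop(len(x), lambda i: y[i], model, {0: 0, 19: 1}, k=3, budget=11)
    preds = [1 if p >= 0.5 else 0 for p in model(labeled)]
    print(len(labeled), "labels queried;", sum(labeled.values()), "positives found")
    print("F1 on full set:", f1_score(y, preds))
```

Because the query rule lives in its own function, a learned policy could in principle replace `uncertainty_batch` without changing the loop, and `f1_score` could be swapped for another scoring metric.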

Funding: This research was partly supported by the Criminal Investigations and Network Analysis (CINA) [Grant Award 17STCIN00001].

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2023.0625.

Articles In Advance

Information

Received: November 13, 2023
Accepted: September 15, 2025
Published Online: February 12, 2026

Cite as

Margaret Tobey, Maria E. Mayorga, Sherrie Bosisto, Osman Y. Özaltın (2026) Reducing Manual Labeling Effort in Imbalanced Data Sets: Active Learning for Detecting Illicit Massage Business Reviews. Operations Research 0(0).

https://doi.org/10.1287/opre.2023.0625

Acknowledgments

The authors thank the associate editor (AE) and the review team for constructive feedback.
