Local Density Estimation in High Dimensions

Xian Wu
https://orcid.org/0000-0003-4650-5066
Simons Institute and University of California Berkeley, Berkeley, California 94705

Moses Charikar
Department of Computer Science, Stanford University, Stanford, California 94305

Vishnu Natchu
Laserlike, Inc., Mountain View, California 94035

Published Online: 19 Jan 2022
https://doi.org/10.1287/moor.2021.1221

Abstract

An important question that arises in the study of high-dimensional vector representations learned from data is the following: given a set $D$ of vectors and a query $q$, estimate the number of points within a specified distance threshold of $q$. We develop two estimators, LSH count and multiprobe count, that use locality-sensitive hashing to preprocess the data and then estimate the answers to such questions accurately and efficiently via importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and sample complexity of our schemes and demonstrate their effectiveness in experiments on a standard word embedding data set.
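To make the problem statement concrete, the following is a minimal sketch of the general idea and not the paper's LSH count or multiprobe count estimators. It assumes cosine similarity as the distance measure and random-hyperplane hashing as the locality-sensitive hash, and it swaps in plain stratified sampling: points are grouped by the Hamming distance between their bucket code and the query's code, and strata closer to the query get a larger sample budget. All names (`simhash`, `build_table`, `estimate_count`, `budget`) are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict


def simhash(v, planes):
    """Return one sign bit per random hyperplane (an assumed LSH family)."""
    return tuple(1 if sum(a * b for a, b in zip(p, v)) >= 0 else 0 for p in planes)


def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def build_table(points, planes):
    """Preprocessing: map each hash code to the indices of the points in that bucket."""
    table = defaultdict(list)
    for idx, p in enumerate(points):
        table[simhash(p, planes)].append(idx)
    return table


def estimate_count(points, table, planes, q, min_cos, budget, rng):
    """Estimate how many points x satisfy cosine(x, q) >= min_cos."""
    q_code = simhash(q, planes)
    # Stratify points by Hamming distance between their bucket code and q's code.
    strata = defaultdict(list)
    for code, members in table.items():
        dist = sum(a != b for a, b in zip(code, q_code))
        strata[dist].extend(members)
    estimate = 0.0
    for dist, members in strata.items():
        # Halve the sample budget with each extra bit of Hamming distance.
        take = min(len(members), max(1, budget >> dist))
        sample = rng.sample(members, take)  # sampling without replacement
        hits = sum(cosine(points[i], q) >= min_cos for i in sample)
        # Scale the sampled hit fraction up to the full stratum size.
        estimate += len(members) * hits / take
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_bits = 16, 8
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    q = [rng.gauss(0, 1) for _ in range(dim)]
    # Synthetic data: a cluster around q plus unrelated random points.
    points = [[c + rng.gauss(0, 0.3) for c in q] for _ in range(100)]
    points += [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
    table = build_table(points, planes)
    truth = sum(cosine(p, q) >= 0.8 for p in points)
    print("estimate:", estimate_count(points, table, planes, q, 0.8, 32, rng))
    print("truth:   ", truth)
```

Each stratum's contribution is unbiased because the sampled hit fraction is scaled by the stratum size, and when the budget covers a whole stratum that stratum is counted exactly. This sketch scans every bucket at query time, which is acceptable only for illustration.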

Mathematics of Operations Research

Volume 47, Issue 4, November 2022, Pages 2547-3399, C2

Article Information

Received: October 15, 2018
Accepted: September 12, 2021
Published Online: January 19, 2022

Cite as

Xian Wu, Moses Charikar, Vishnu Natchu (2022) Local Density Estimation in High Dimensions. Mathematics of Operations Research 47(4):2614-2640.

https://doi.org/10.1287/moor.2021.1221

Acknowledgments

The authors thank Tatsunori Hashimoto and Virag Shah for helpful comments and feedback. This work was initiated when the authors were visiting Laserlike, Inc.
