Instance-Optimality in I/O-Efficient Sampling and Sequential Estimation

Type: Preprint

Publication Date: 2024-10-18

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2410.14643

Abstract

Suppose we have a memory storing $0$s and $1$s and we want to estimate the frequency of $1$s by sampling. We want to do this I/O-efficiently, exploiting that each read gives a block of $B$ bits at unit cost; not just one bit. If the input consists of uniform blocks: either all 1s or all 0s, then sampling a whole block at a time does not reduce the number of samples needed for estimation. On the other hand, if bits are randomly permuted, then getting a block of $B$ bits is as good as getting $B$ independent bit samples. However, we do not want to make any such assumptions on the input. Instead, our goal is to have an algorithm with instance-dependent performance guarantees which stops sampling blocks as soon as we know that we have a probabilistically reliable estimate. We prove our algorithms to be instance-optimal among algorithms oblivious to the order of the blocks, which we argue is the strongest form of instance optimality we can hope for. We also present similar results for I/O-efficiently estimating mean with both additive and multiplicative error, estimating histograms, quantiles, as well as the empirical cumulative distribution function. We obtain our above results on I/O-efficient sampling by reducing to corresponding problems in the so-called sequential estimation. In this setting, one samples from an unknown distribution until one can provide an estimate with some desired error probability. We then provide non-parametric instance-optimal results for several fundamental problems: mean and quantile estimation, as well as learning mixture distributions with respect to $\ell_\infty$ and the so-called Kolmogorov-Smirnov distance.

Locations

  • arXiv (Cornell University) - View - PDF

Similar Works

Action Title Year Authors
+ Instance Optimal Learning 2015 Gregory Valiant
Paul Valiant
+ Estimating Entropy of Distributions in Constant Space 2019 Jayadev Acharya
Sourbh Bhadane
Piotr Indyk
Ziteng Sun
+ Lower bounds for sampling algorithms for estimating the average 1995 Ran Canetti
Guy Even
Oded Goldreich
+ Statistical Inference with Limited Memory: A Survey 2023 Tomer Berg
Or Ordentlich
Ofer Shayevitz
+ Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+α$ Moments 2023 Trung Thanh Dang
Jasper C. H. Lee
Maoyuan Song
Paul Valiant
+ Estimating Entropy of Distributions in Constant Space 2019 Jayadev Acharya
Sourbh Bhadane
Piotr Indyk
Ziteng Sun
+ Data Amplification: Instance-Optimal Property Estimation 2019 Hao Yi
Alon Orlitsky
+ Succinct Sampling on Streams 2007 Vladimir Braverman
Rafail Ostrovsky
Carlo Zaniolo
+ PDF Chat Practical Algorithms for On-Line Sampling 1998 Carlos Domingo
Ricard Gavaldà
Osamu Watanabe
+ PDF Chat Optimal pointwise sampling for $L^2$ approximation 2021 Albert Cohen
Matthieu Dolbeault
+ Frequency Estimation with One-Sided Error 2021 Piotr Indyk
Shyam Narayanan
David P. Woodruff
+ Frequency Estimation with One-Sided Error 2021 Piotr Indyk
Shyam Narayanan
David P. Woodruff
+ Competitive Distribution Estimation 2015 Alon Orlitsky
Ananda Theertha Suresh
+ How bad is worst-case data if you know where it comes from? 2019 Justin Y. Chen
Gregory Valiant
Paul Valiant
+ PDF Chat Distribution Estimation under the Infinity Norm 2024 Aryeh Kontorovich
Amichai Painsky
+ Truly Perfect Samplers for Data Streams and Sliding Windows 2021 Rajesh Jayaram
David P. Woodruff
Samson Zhou
+ PDF Chat Statistical-Computational Trade-offs for Density Estimation 2024 Anders Aamand
Alexandr Andoni
Justin Y. Chen
Piotr Indyk
Shyam Narayanan
Sandeep Silwal
Haike Xu
+ PDF Chat Sampling bounds 2016 Alexander Olevskiı̆
Alexander Ulanovskii
+ PDF Chat Frequency Estimation with One-Sided Error 2022 Piotr Indyk
Shyam Narayanan
David P. Woodruff
+ Sampling to estimate arbitrary subset sums 2005 Nick Duffield
Carsten Lund
Mikkel Thorup

Works That Cite This (0)

Action Title Year Authors

Works Cited by This (0)

Action Title Year Authors