Theory meets Practice at the Median: A Worst Case Comparison of Relative Error Quantile Algorithms

Type: Article

Publication Date: 2021-08-12

Citations: 6

DOI: https://doi.org/10.1145/3447548.3467152

Download PDF

Abstract

Estimating the distribution and quantiles of data is a foundational task in data mining and data science. We study algorithms which provide accurate results for extreme quantile queries using a small amount of space, thus helping to understand the tails of the input distribution. Namely, we focus on two recent state-of-the-art solutions: t-digest and ReqSketch. While t-digest is a popular compact summary which works well in a variety of settings, ReqSketch comes with formal accuracy guarantees at the cost of its size growing as new observations are inserted. In this work, we provide insight into which conditions make one preferable to the other. Namely, we show how to construct inputs for t-digest that induce an almost arbitrarily large error and demonstrate that it fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution. We propose practical improvements to ReqSketch, making it faster than t-digest, while its error stays bounded on any instance. Still, our results confirm that t-digest remains more accurate on the "non-adversarial" data encountered in practice.

Locations

  • arXiv (Cornell University) - View - PDF
  • Warwick Research Archive Portal (University of Warwick) - View - PDF

Similar Works

Action Title Year Authors
+ Theory meets Practice at the Median: a worst case comparison of relative error quantile algorithms 2021 Graham Cormode
Abhinav Mishra
Joseph Ross
Pavel VeselĂ˝
+ Theory meets Practice at the Median: a worst case comparison of relative error quantile algorithms 2021 Graham Cormode
Abhinav Mishra
Joseph S. Ross
Pavel VeselĂ˝
+ Theory meets Practice: worst case behavior of quantile algorithms 2021 Graham Cormode
Abhinav Mishra
Joseph Ross
Pavel VeselĂ˝
+ Asymmetric scale functions for $t$-digests 2020 Joseph Ross
+ Asymmetric scale functions for $t$-digests 2020 Joseph S. Ross
+ PDF Chat Asymmetric scale functions for <i>t</i>-digests 2021 Joseph Ross
+ Differentially Private Quantiles 2021 Jennifer Gillenwater
Matthew Joseph
Alex Kulesza
+ Computing Extremely Accurate Quantiles Using t-Digests 2019 Ted Dunning
Otmar Ertl
+ Differentially Private Quantiles 2021 Jennifer Gillenwater
Matthew Joseph
Alex Kulesza
+ Learned Interpolation for Better Streaming Quantile Approximation with Worst-Case Guarantees 2023 Nicholas Schiefer
Justin Y. Chen
Piotr Indyk
Shyam Narayanan
Sandeep Silwal
Tal Wagner
+ PDF Chat Learned Interpolation for Better Streaming Quantile Approximation with Worst-Case Guarantees 2023 Nicholas Schiefer
Justin Y. Chen
Piotr Indyk
Shyam Narayanan
Sandeep Silwal
Tal Wagner
+ Differentially Private Approximate Quantiles 2021 Haim Kaplan
Shachar Schnapp
Uri Stemmer
+ Bounded Space Differentially Private Quantiles 2022 Daniel Alabi
Omri Ben‐Eliezer
Anamay Chaturvedi
+ PDF Chat Optimal quantile estimation: beyond the comparison model 2024 Meghal Gupta
Mihir Singhal
Hongxun Wu
+ PDF Chat UDDSketch: Accurate Tracking of Quantiles in Data Streams 2020 Italo Epicoco
Catiuscia Melle
Massimo Cafaro
Marco Pulimeno
Giuseppe Morleo
+ UDDSketch: Accurate Tracking of Quantiles in Data Streams 2020 Italo Epicoco
Catiuscia Melle
Massimo Cafaro
Marco Pulimeno
Giuseppe Morleo
+ UDDSketch: Accurate Tracking of Quantiles in Data Streams 2020 Italo Epicoco
Catiuscia Melle
Massimo Cafaro
Marco Pulimeno
Giuseppe Morleo
+ PDF Chat A Tight Lower Bound for Comparison-Based Quantile Summaries 2020 Graham Cormode
Pavel VeselĂ˝
+ Tight Lower Bound for Comparison-Based Quantile Summaries 2019 Graham Cormode
Pavel VeselĂ˝
+ Tight Lower Bound for Comparison-Based Quantile Summaries 2019 Graham Cormode
Pavel VeselĂ˝