Learning-based Sketches for Frequency Estimation in Data Streams without Ground Truth

Type: Preprint

Publication Date: 2024-12-04

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2412.03611

Abstract

Estimating the frequency of items on the high-volume, fast data stream has been extensively studied in many areas, such as database and network measurement. Traditional sketch algorithms only allow to give very rough estimates with limited memory cost, whereas some learning-augmented algorithms have been proposed recently, their offline framework requires actual frequencies that are challenging to access in general for training, and speed is too slow for real-time processing, despite the still coarse-grained accuracy. To this end, we propose a more practical learning-based estimation framework namely UCL-sketch, by following the line of equation-based sketch to estimate per-key frequencies. In a nutshell, there are two key techniques: online training via equivalent learning without ground truth, and highly scalable architecture with logical estimation buckets. We implemented experiments on both real-world and synthetic datasets. The results demonstrate that our method greatly outperforms existing state-of-the-art sketches regarding per-key accuracy and distribution, while preserving resource efficiency. Our code is attached in the supplementary material, and will be made publicly available at https://github.com/Y-debug-sys/UCL-sketch.

Locations

  • arXiv (Cornell University) - View - PDF

Similar Works

Action Title Year Authors
+ SF-Sketch: A Two-Stage Sketch for Data Streams 2020 Lingtong Liu
Yulong Shen
Yibo Yan
Tong Yang
Muhammad Shahzad
Bin Cui
Gaogang Xie
+ PDF Chat QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams 2024 Yiyan Qi
Rundong Li
Pinghui Wang
Yufang Sun
Rui Xing
+ PDF Chat QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams 2024 Yiyan Qi
Rundong Li
Pinghui Wang
Yufang Sun
Rui Xing
+ SF-sketch: A Two-stage Sketch for Data Streams 2017 Tong Yang
Lingtong Liu
Yibo Yan
Muhammad Shahzad
Yulong Shen
Xiaoming Li
Bin Cui
Gaogang Xie
+ PDF Chat Frequency Estimation with One-Sided Error 2022 Piotr Indyk
Shyam Narayanan
David P. Woodruff
+ Frequency Estimation with One-Sided Error 2021 Piotr Indyk
Shyam Narayanan
David P. Woodruff
+ Frequency Estimation with One-Sided Error 2021 Piotr Indyk
Shyam Narayanan
David P. Woodruff
+ Differentially Private Linear Sketches: Efficient Implementations and Applications 2022 Fuheng Zhao
Dan Qiao
Rachel Redberg
Divyakant Agrawal
Amr El Abbadi
Yu-Xiang Wang
+ Improved Frequency Estimation Algorithms with and without Predictions 2023 Anders Aamand
Justin Y. Chen
Huy L. Nguyễn
Sandeep Silwal
Ali Vakilian
+ Bias-Aware Sketches 2016 Jiecao Chen
Qin Zhang
+ Bias-Aware Sketches 2016 Jiecao Chen
Qin Zhang
+ Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions 2018 Daniel Ting
+ Sampling Sketches for Concave Sublinear Functions of Frequencies 2019 Edith Cohen
Ofir Geri
+ Sampling Sketches for Concave Sublinear Functions of Frequencies 2019 Edith Cohen
Ofir Geri
+ PDF Chat Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams 2024 Rana Shahout
Michael Mitzenmacher
+ Composite Hashing for Data Stream Sketches 2018 Arijit Khan
Sixing Yan
+ (Learned) Frequency Estimation Algorithms under Zipfian Distribution 2019 Anders Aamand
Piotr Indyk
Ali Vakilian
+ SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation 2022 Rana Shahout
Roy Friedman
Ran Ben Basat
+ Simple and Efficient Cardinality Estimation in Data Streams. 2020 Seth Pettie
Dingyu Wang
Longhui Yin
+ Double-Hashing Algorithm for Frequency Estimation in Data Streams 2022 Nikita Seleznev
Annamalai Senthil Kumar
C. Bayan Bruss

Works That Cite This (0)

Action Title Year Authors

Works Cited by This (0)

Action Title Year Authors