Improved Consistent Weighted Sampling Revisited

Min-Hash is a popular technique for efficiently estimating the Jaccard similarity of binary sets. Consistent Weighted Sampling (CWS) generalizes the Min-Hash scheme to sketch weighted sets and has drawn increasing interest from the community. Due to its constant-time complexity independent of the values of the weights, Improved CWS (ICWS) is …
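
As background for the Min-Hash scheme mentioned in the abstract, here is a minimal sketch of how a Min-Hash signature can estimate the Jaccard similarity of two unweighted sets. It is an illustration only, not the method of this paper: the function names (`minhash_signature`, `estimate_jaccard`, `exact_jaccard`), the seeded BLAKE2b hash, and the default of 256 hash functions are all assumptions chosen for the example, and the inputs are assumed to be non-empty sets of strings. It does not cover CWS or ICWS, which handle weighted sets.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash; each seed stands in for one independent hash function (assumption)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes: int = 256):
    """For each seeded hash function, keep the minimum hash value over the set.

    Assumes `items` is non-empty; min() raises ValueError on an empty set.
    """
    items = list(items)
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b) -> float:
    """Fraction of signature positions that agree; this estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    set_a = {f"w{i}" for i in range(0, 100)}
    set_b = {f"w{i}" for i in range(50, 150)}
    est = estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b))
    print("estimated:", est, "exact:", exact_jaccard(set_a, set_b))
```

The probability that two sets share the same minimum under a random hash function equals their Jaccard similarity, so averaging the agreement over many hash functions gives an estimate whose error shrinks as the number of hash functions grows.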