Learning to Hash for Indexing Big Data—A Survey

Type: Article

Publication Date: 2015-12-18

Citations: 499

DOI: https://doi.org/10.1109/jproc.2015.2487976

Abstract

The explosive growth in Big Data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, approximate nearest neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., locality-sensitive hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown insufficient in many real-world applications. As a remedy, new approaches incorporating data-driven learning methods in development of advanced hash functions have emerged. Such learning-to-hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes are able to preserve the proximity of neighboring data in the original feature spaces in the hash code spaces. The goal of this paper is to provide readers with systematic understanding of insights, pros, and cons of the emerging techniques. We provide a comprehensive survey of the learning-to-hash framework and representative techniques of various types, including unsupervised, semisupervised, and supervised. In addition, we also summarize recent hashing approaches utilizing the deep learning models. Finally, we discuss the future direction and trends of research in this area.

Locations

  • arXiv (Cornell University) - View - PDF
  • Proceedings of the IEEE - View

Similar Works

Action Title Year Authors
+ Learning to Hash for Indexing Big Data - A Survey 2015 Jun Wang
Wei Liu
Sanjiv Kumar
Shih‐Fu Chang
+ PDF Chat A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search 2019 Deng Cai
+ A Survey on Deep Hashing Methods 2020 Xiao Luo
Haixin Wang
Daqing Wu
Chong Chen
Minghua Deng
Jianqiang Huang
Xian–Sheng Hua
+ PDF Chat A Survey on Deep Hashing Methods 2022 Xiao Luo
Haixin Wang
Daqing Wu
Chong Chen
Minghua Deng
Jianqiang Huang
Xian–Sheng Hua
+ A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search 2016 Deng Cai
+ Fast Locality Sensitive Hashing with Theoretical Guarantee 2023 Zongyuan Tan
Hongya Wang
Bo Xu
Minjie Luo
Ming Du
+ A Survey on Learning to Hash 2016 Jingdong Wang
Ting Zhang
Jingkuan Song
Nicu Sebe
Heng Tao Shen
+ A Survey on Learning to Hash 2016 Jingdong Wang
Ting Zhang
Jingkuan Song
Nicu Sebe
Heng Tao Shen
+ Near-Isometric Binary Hashing for Large-scale Datasets 2016 Amirali Aghazadeh
Andrew Lan
Anshumali Shrivastava
Richard G. Baraniuk
+ PDF Chat Fast Supervised Hashing with Decision Trees for High-Dimensional Data 2014 Guosheng Lin
Chunhua Shen
Qinfeng Shi
Anton van den Hengel
David Suter
+ PDF Chat DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs 2019 Erkun Yang
Tongliang Liu
Cheng Deng
Wei Liu
Dacheng Tao
+ PDF Chat Deep Discrete Supervised Hashing 2018 Qing-Yuan Jiang
Xue Cui
Wu-Jun Li
+ Learning Hash Functions Using Column Generation 2013 Xi Li
Guosheng Lin
Chunhua Shen
Anton van den Hengel
Anthony Dick
+ Learning Hash Functions Using Column Generation 2013 Xi Li
Guosheng Lin
Chunhua Shen
Anton van den Hengel
Anthony Dick
+ PDF Chat Asymmetric Deep Supervised Hashing 2018 Qing-Yuan Jiang
Wu-Jun Li
+ Asymmetric Deep Supervised Hashing 2017 Qing-Yuan Jiang
Wu-Jun Li
+ PDF Chat Online Hashing 2017 Long-Kai Huang
Qiang Yang
Wei‐Shi Zheng
+ DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs 2019 Erkun Yang
Tongliang Liu
Cheng Deng
Wei Liu
Dacheng Tao
+ PDF Chat Scalable Discrete Supervised Hash Learning with Asymmetric Matrix Factorization 2016 Shifeng Zhang
Jianmin Li
Jinma Guo
Bo Zhang
+ Hashing for Similarity Search: A Survey 2014 Jingdong Wang
Heng Tao Shen
Jingkuan Song
Jianqiu Ji

Works That Cite This (142)

Action Title Year Authors
+ PDF Chat Efficient discrete supervised hashing for large-scale cross-modal retrieval 2019 Tao Yao
Yaru Han
Ruxin Wang
Xiangwei Kong
Lianshan Yan
Haiyan Fu
Qi Tian
+ PDF Chat Ordinal Constrained Binary Code Learning for Nearest Neighbor Search 2017 Hong Liu
Rongrong Ji
Yongjian Wu
Feiyue Huang
+ PDF Chat Discrete Multimodal Hashing With Canonical Views for Robust Mobile Landmark Search 2017 Lei Zhu
Zi Huang
Xiaobai Liu
Xiangnan He
Jiande Sun
Xiaofang Zhou
+ PDF Chat A Survey on Deep Hashing Methods 2022 Xiao Luo
Haixin Wang
Daqing Wu
Chong Chen
Minghua Deng
Jianqiang Huang
Xian–Sheng Hua
+ PDF Chat Recognition of M-type stars in the unclassified spectra of LAMOST DR5 using a hash-learning method 2019 Y-X Guo
A-Li Luo
Shuo Zhang
Bing Du
Y-F Wang
J-J Chen
Fang Zuo
Xiao Kong
Y-H Hou
+ PDF Chat Sketch-based manga retrieval using manga109 dataset 2016 Yusuke Matsui
Kota Ito
Yuji Aramaki
Azuma Fujimoto
Toru Ogawa
Toshihiko Yamasaki
Kiyoharu Aizawa
+ Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors 2017 Alexandr Andoni
Thijs Laarhoven
Ilya Razenshteyn
Erik Waingarten
+ PDF Chat High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations 2023 Jianyang Gao
Cheng Long
+ Meta Cross-Modal Hashing on Long-Tailed Data 2021 Runmin Wang
Guoxian Yu
Carlotta Domeniconi
Xiangliang Zhang
+ PDF Chat HHF: Hashing-Guided Hinge Function for Deep Hashing Retrieval 2022 Chengyin Xu
Zenghao Chai
Zhengzhuo Xu
Hongjia Li
Qiruyi Zuo
Lingyu Yang
Chun Yuan