LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries

Type: Preprint

Publication Date: 2024-03-12

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2403.07331

Abstract

With the proliferation of spatio-textual data, Top-k KNN spatial keyword queries (TkQs), which return a list of objects based on a ranking function that evaluates both spatial and textual relevance, have found many real-life applications. Existing geo-textual indexes for TkQs use traditional retrieval models like BM25 to compute text relevance and usually exploit a simple linear function to compute spatial relevance, but their effectiveness is limited. To improve effectiveness, several deep learning models have recently been proposed, but they suffer from severe efficiency issues. To the best of our knowledge, there are no efficient indexes specifically designed to accelerate the top-k search process for these deep learning models. To tackle these issues, we propose a novel technique, which Learns to Index the Spatio-Textual data for answering embedding-based spatial keyword queries (called LIST). LIST features two novel components. First, we propose a lightweight and effective relevance model that is capable of learning both textual and spatial relevance. Second, we introduce a novel machine-learning-based Approximate Nearest Neighbor Search (ANNS) index, which utilizes a new learning-to-cluster technique to group relevant queries and objects together while separating irrelevant queries and objects. Two key challenges in building an effective and efficient index are the absence of high-quality labels and unbalanced clustering results. We develop a novel pseudo-label generation technique to address the two challenges. Experimental results show that LIST significantly outperforms state-of-the-art methods on effectiveness, with improvements of up to 19.21% in NDCG@1 and 12.79% in Recall@10, and is three orders of magnitude faster than the most effective baseline.
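
The abstract describes existing TkQ indexes as pairing a BM25 text score with a simple linear spatial score. As a rough illustration of that kind of baseline ranking (not LIST's learned relevance model), the Python sketch below combines a max-normalized Okapi BM25 score with a linear distance decay. The weight `alpha`, the cut-off `max_dist`, the BM25 defaults `k1=1.2` and `b=0.75`, the normalization, and the names `GeoObject` and `top_k_linear` are all assumptions made for illustration, not details taken from the paper.

```python
import heapq
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class GeoObject:
    """A toy geo-textual object: an id, a 2-D location, and its keywords."""
    oid: str
    x: float
    y: float
    tokens: list


def bm25_scores(query_tokens, objects, k1=1.2, b=0.75):
    """Okapi BM25 score of every object's keyword list for the query."""
    n = len(objects)
    avgdl = sum(len(o.tokens) for o in objects) / n
    df = Counter()
    for o in objects:
        df.update(set(o.tokens))  # document frequency of each keyword
    scores = {}
    for o in objects:
        tf = Counter(o.tokens)
        norm = k1 * (1 - b + b * len(o.tokens) / avgdl)  # length normalization
        s = 0.0
        for t in query_tokens:
            if tf[t]:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores[o.oid] = s
    return scores


def top_k_linear(qx, qy, query_tokens, objects, k=3, alpha=0.5, max_dist=1.0):
    """Rank objects by alpha * spatial + (1 - alpha) * textual relevance.

    Assumed choices (not from the paper): linear distance decay capped at
    max_dist, and BM25 max-normalized into [0, 1] so both terms are comparable.
    """
    text = bm25_scores(query_tokens, objects)
    max_text = max(text.values()) or 1.0  # avoid dividing by zero

    def score(o):
        dist = math.hypot(o.x - qx, o.y - qy)
        spatial = max(0.0, 1.0 - dist / max_dist)
        return alpha * spatial + (1 - alpha) * text[o.oid] / max_text

    return [o.oid for o in heapq.nlargest(k, objects, key=score)]


objects = [
    GeoObject("cafe_a", 0.1, 0.1, ["coffee", "cafe", "wifi"]),
    GeoObject("cafe_b", 0.9, 0.9, ["coffee", "cafe"]),
    GeoObject("gym", 0.3, 0.1, ["gym", "fitness"]),
]
print(top_k_linear(0.1, 0.1, ["coffee"], objects, k=2))  # ['cafe_a', 'cafe_b']
```

In this sketch `alpha` is the only knob trading keyword match against proximity; both relevance terms are fixed formulas rather than learned functions.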

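The abstract describes LIST's index as a learned clustering that places relevant queries and objects in the same cluster, so a query only needs to score the objects in a few clusters. The Python sketch below shows that search pattern in a generic inverted-file style; it is not the paper's method. Cluster assignment here is a hypothetical stand-in (maximum inner product with fixed centroids) in place of LIST's learning-to-cluster model trained on pseudo-labels, and the embeddings are hand-written toy vectors rather than outputs of LIST's relevance model. `ClusterIndex`, `n_probe`, and the centroid values are illustrative assumptions.

```python
import heapq
from collections import defaultdict


def dot(u, v):
    """Inner product, used here as the query-object relevance score."""
    return sum(a * b for a, b in zip(u, v))


class ClusterIndex:
    """Inverted-file style index: objects are bucketed by cluster id, and a
    query scores only the objects in its n_probe best clusters."""

    def __init__(self, centroids):
        # Hypothetical stand-in for a learned cluster model: a vector goes to
        # the centroid with the largest inner product.
        self.centroids = centroids
        self.buckets = defaultdict(list)

    def cluster_scores(self, vec):
        return [dot(vec, c) for c in self.centroids]

    def add(self, oid, vec):
        scores = self.cluster_scores(vec)
        best = max(range(len(scores)), key=scores.__getitem__)
        self.buckets[best].append((oid, vec))

    def search(self, qvec, k=2, n_probe=1):
        scores = self.cluster_scores(qvec)
        probed = heapq.nlargest(n_probe, range(len(scores)), key=scores.__getitem__)
        # Only objects in the probed clusters are scored; the rest are skipped.
        candidates = [item for c in probed for item in self.buckets[c]]
        best = heapq.nlargest(k, candidates, key=lambda item: dot(qvec, item[1]))
        return [oid for oid, _ in best]


# Toy 2-D embeddings standing in for the output of a relevance model.
index = ClusterIndex(centroids=[[1.0, 0.0], [0.0, 1.0]])
for oid, vec in [("o1", [0.9, 0.1]), ("o2", [0.8, 0.3]),
                 ("o3", [0.1, 0.95]), ("o4", [0.2, 0.7])]:
    index.add(oid, vec)
print(index.search([1.0, 0.2], k=2, n_probe=1))  # ['o1', 'o2']
```

Query cost in this pattern grows with the size of the probed buckets, which is one way to see why the abstract singles out unbalanced clustering as a challenge: an oversized cluster makes every query routed to it nearly as expensive as a full scan.
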
Locations

  • arXiv (Cornell University)

Similar Works

  • WISK: A Workload-aware Learned Index for Spatial Keyword Queries (2023) - Yufan Sheng, Xin Cao, Yixiang Fang, Kaiqi Zhao, Jianzhong Qi, Gao Cong, Wenjie Zhang
  • Accelerating Spatio-Textual Queries with Learned Indices (2023) - Georgios Chatzigeorgakidis, Kostas Patroumpas, Dimitrios Skoutas, Spiros Athanasiou
  • Pairing Clustered Inverted Indexes with kNN Graphs for Fast Approximate Retrieval over Learned Sparse Representations (2024) - Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
  • The Case for Learned Spatial Indexes (2020) - Varun Pandey, Alexander van Renen, Andreas Kipf, Ibrahim Sabek, Jialin Ding, Alfons Kemper
  • Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine (2020) - Kuan Fang, Long Zhao, Zhan Shen, Ruixing Wang, RiKang Zhou, LiWen Fan
  • NETR-Tree: An Eifficient Framework for Social-Based Time-Aware Spatial Keyword Query (2019) - Zhixian Yang, Yuanning Gao, Xiaofeng Gao, Guihai Chen
  • OOD-DiskANN: Efficient and Scalable Graph ANNS for Out-of-Distribution Queries (2022) - Shikhar Jaiswal, Ravishankar Krishnaswamy, Ankit Garg, Harsha Vardhan Simhadri, Sheshansh Agrawal
  • Task-agnostic Indexes for Deep Learning-based Queries over Unstructured Data (2020) - Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Matei Zaharia
  • DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search (2024) - Jiuqi Wei, Botao Peng, X. Lee, Themis Palpanas
  • SOLAR: Sparse Orthogonal Learned and Random Embeddings (2020) - Tharun Medini, Beidi Chen, Anshumali Shrivastava
  • SOLAR: Sparse Orthogonal Learned and Random Embeddings (2021) - Tharun Medini, Beidi Chen, Anshumali Shrivastava
  • DeepSSN: A deep convolutional neural network to assess spatial scene similarity (2022) - Danhuai Guo, Shiyin Ge, Zhang Shu, Song Gao, Ran Tao, Yangang Wang
  • EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval (2023) - Ramnath Kumar, Anshul Mittal, Nilesh Gupta, Aditya Kusupati, Inderjit S. Dhillon, Prateek Jain
  • Approximate Nearest Neighbor Search with Window Filters (2024) - Joshua Engels, Benjamin Landrum, Shangdi Yu, Laxman Dhulipala, Julian Shun
  • Constructing Tree-based Index for Efficient and Effective Dense Retrieval (2023) - Haitao Li, Qingyao Ai, Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Zheng Liu, Zhao Cao
  • Efficient Spatial Keyword Search in Trajectory Databases (2012) - Gao Cong, Hua Lu, Beng Chin Ooi, Dongxiang Zhang, Meihui Zhang
  • GLIN: A (G)eneric (L)earned (In)dexing Mechanism for Complex Geometries (2023) - Congying Wang, Jia Yu, Zhuoyue Zhao
  • A Density-Based Approach to the Retrieval of Top-K Spatial Textual Clusters (2016) - Dingming Wu, Christian S. Jensen
Christian S. Jensen

Works That Cite This (0)

Works Cited by This (0)