Yang, W., Wang, S., Chen, Z., Sun, Y., & Peng, Z. (2024). Joinable Search over Multi-source Spatial Datasets: Overlap, Coverage, and Efficiency. arXiv preprint arXiv:2311.13383v2.
This paper addresses the challenge of efficiently finding joinable spatial datasets across multiple independent data sources. It focuses on two problems. The overlap joinable search problem (OJSP) finds the datasets with maximum overlap with a given query dataset. The coverage joinable search problem (CJSP) finds datasets that, together with the query, achieve maximum coverage while remaining spatially connected.
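The following minimal Python sketch makes the two objectives concrete. For illustration only, it assumes that each dataset is discretized into the set of uniform grid cells its points fall in. Under that assumption, overlap is the number of cells a candidate shares with the query, and coverage is the number of distinct cells covered by the query together with the selected datasets. The names `to_cells`, `overlap`, and `coverage`, and the `cell_size` parameter, are hypothetical. The paper's exact definitions may differ.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]  # (column, row) index of a grid cell (assumed representation)


def to_cells(points: Iterable[Tuple[float, float]], cell_size: float) -> Set[Cell]:
    """Discretize 2-D points into the set of grid cells they fall in."""
    return {(int(x // cell_size), int(y // cell_size)) for x, y in points}


def overlap(query: Set[Cell], candidate: Set[Cell]) -> int:
    """Number of grid cells shared by the query and one candidate dataset."""
    return len(query & candidate)


def coverage(query: Set[Cell], selected: Iterable[Set[Cell]]) -> int:
    """Number of distinct cells covered by the query plus the selected datasets."""
    covered = set(query)
    for cells in selected:
        covered |= cells
    return len(covered)
```

Consider a query occupying cells (0, 0) and (1, 0) and a candidate occupying (1, 0) and (2, 0). These two overlap in one cell and together cover three.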
The authors propose a distributed framework built on a novel index structure called DITS (DIstributed Tree-based Spatial index). DITS consists of local indices (DITS-L), built on each individual data source, and a global index (DITS-G), maintained centrally. DITS-L combines a ball tree with an inverted index to accelerate local search, while DITS-G efficiently identifies which data sources are relevant to a query. The framework also employs query distribution strategies to minimize communication costs. For OJSP, the authors propose an efficient filter-verification algorithm based on lower and upper bounds. For CJSP, which is NP-hard, they design a greedy heuristic with spatial merge that uses DITS to verify connectivity and merge results efficiently.
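The two search strategies can be illustrated with a simplified, single-machine Python sketch over the same assumed grid-cell representation. It does not reproduce DITS or the paper's bounds.

- `topk_overlap` is a generic filter-verification loop. Its upper bound is chosen here for illustration: the number of query cells that fall inside a candidate's cell-level bounding box, capped by the candidate's size.
- `greedy_coverage` is a plain greedy heuristic. Under a hypothetical connectivity rule, a dataset counts as connected if it shares or is 8-adjacent to a cell of the region covered so far.

All function names and the connectivity rule are assumptions.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (column, row) index of a grid cell (assumed representation)


def overlap_upper_bound(query: Set[Cell], cells: Set[Cell]) -> int:
    """Cheap bound: any shared cell must lie inside the candidate's bounding box."""
    if not cells:
        return 0
    x0, x1 = min(x for x, _ in cells), max(x for x, _ in cells)
    y0, y1 = min(y for _, y in cells), max(y for _, y in cells)
    inside = sum(1 for x, y in query if x0 <= x <= x1 and y0 <= y <= y1)
    return min(inside, len(cells))


def topk_overlap(query: Set[Cell], datasets: Dict[str, Set[Cell]], k: int) -> List[Tuple[int, str]]:
    """Filter-verification: visit candidates by descending upper bound and stop early."""
    if k <= 0:
        return []
    candidates = sorted(
        ((overlap_upper_bound(query, cells), name) for name, cells in datasets.items()),
        reverse=True,
    )
    heap: List[Tuple[int, str]] = []  # min-heap of (exact overlap, name), at most k entries
    for bound, name in candidates:
        if len(heap) == k and bound <= heap[0][0]:
            break  # filter: no remaining candidate can beat the current k-th result
        exact = len(query & datasets[name])  # verification: compute the exact overlap
        if len(heap) < k:
            heapq.heappush(heap, (exact, name))
        elif exact > heap[0][0]:
            heapq.heapreplace(heap, (exact, name))
    return sorted(heap, reverse=True)


def neighbours(cell: Cell) -> Set[Cell]:
    """The cell itself plus its 8 surrounding cells (hypothetical connectivity rule)."""
    x, y = cell
    return {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}


def greedy_coverage(query: Set[Cell], datasets: Dict[str, Set[Cell]], k: int) -> List[str]:
    """Greedy heuristic: repeatedly add the connected dataset with the largest marginal gain."""
    covered = set(query)
    chosen: List[str] = []
    remaining = dict(datasets)
    while len(chosen) < k and remaining:
        frontier = set().union(*(neighbours(c) for c in covered))
        best_name: Optional[str] = None
        best_gain = 0
        for name, cells in remaining.items():
            if not cells & frontier:
                continue  # not spatially connected to the current result region
            gain = len(cells - covered)  # number of newly covered cells
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # no connected dataset adds new cells
        chosen.append(best_name)
        covered |= remaining.pop(best_name)  # merge the chosen dataset into the result region
    return chosen
```

Sorting candidates by upper bound lets the overlap search stop as soon as no unseen candidate can beat the current k-th result. In the greedy search, a distant dataset is never selected, however many new cells it would add. This loosely mirrors the connectivity requirement in CJSP.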
The proposed distributed framework, with its novel index structure and efficient search algorithms, offers a practical and effective solution for performing overlap and coverage joinable searches over large-scale spatial datasets distributed across multiple sources.
This research contributes to spatial data management by introducing new search problems motivated by real-world applications and by providing an efficient solution for exploring and integrating spatial data from multiple sources.
The paper focuses on static datasets. Future work could explore extending the framework to handle dynamic updates in spatial datasets. Additionally, investigating alternative approximation algorithms for CJSP with potentially better approximation ratios could be beneficial.