Core Concepts
WaZI is a learned and workload-aware variant of the Z-index, optimizing storage layout and search structures for spatial query performance.
Abstract
The paper introduces WaZI, a spatial index that combines machine learning models with the Z-index structure. Rather than relying on a fixed layout, it optimizes the storage layout and search structures for the data distribution and the query workload. The paper presents the cost function formulation, adaptive partitioning, ordering strategies, and a page-skipping mechanism, all aimed at improving query performance. Experimental results show that WaZI reduces range query time compared with state-of-the-art indexes, by 40% on average over the baselines.
Introduction
Learned indexes aim to improve query performance by using machine learning models in place of, or alongside, traditional index structures.
Traditional spatial indexes such as R-trees have limitations when handling large volumes of spatial data.
Related Work
Traditional spatial indexes are categorized into space partitioning-based, data partitioning-based, and data transformation-based indexes.
Learned indexes such as the Recursive Model Index (RMI) have been shown to reduce index size and query latency.
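To make the RMI idea concrete, here is a minimal Python sketch of a two-stage recursive model index over sorted one-dimensional keys, assuming linear models at both stages. The class TwoStageRMI, the helper fit_linear, and the parameter num_leaf_models are illustrative names, not part of any library or of WaZI: a root model routes a key to one of several leaf models, and the chosen leaf model predicts the key's position within a recorded error bound.

```python
from bisect import bisect_left


def fit_linear(xs, ys):
    """Least-squares fit of y ~ a*x + b; returns a constant model for degenerate input."""
    n = len(xs)
    if n == 0:
        return 0.0, 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


class TwoStageRMI:
    """Hypothetical two-stage recursive model index over a non-empty list of keys."""

    def __init__(self, keys, num_leaf_models=4):
        self.keys = sorted(keys)
        if not self.keys:
            raise ValueError("keys must be non-empty")
        n, self.m = len(self.keys), num_leaf_models
        # Stage 1: one linear model maps a key to a leaf-model id in [0, m).
        self.root = fit_linear(self.keys, [p * self.m / n for p in range(n)])
        buckets = [[] for _ in range(self.m)]
        for p, k in enumerate(self.keys):
            buckets[self._route(k)].append((k, p))
        # Stage 2: one linear model per bucket predicts the position;
        # its worst-case error on the stored keys bounds the later search.
        self.leaves = []
        for b in buckets:
            a, c = fit_linear([k for k, _ in b], [p for _, p in b])
            err = max((abs(round(a * k + c) - p) for k, p in b), default=0)
            self.leaves.append((a, c, err))

    def _route(self, key):
        a, b = self.root
        return min(self.m - 1, max(0, int(a * key + b)))

    def lookup(self, key):
        """Return a position of key in the sorted keys, or None if it is absent."""
        n = len(self.keys)
        a, c, err = self.leaves[self._route(key)]
        guess = round(a * key + c)
        # A stored key routed to this leaf must lie in [guess - err, guess + err].
        lo = min(max(0, guess - err), n)
        hi = max(lo, min(n, guess + err + 1))
        i = bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None
```

The stored error bound keeps lookups exact: the models only narrow the search window, and the binary search inside that window finds the key if it is present.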
The Base Z-Index
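The Z-index recursively partitions the space into cells and orders them along a Z-order (Morton) curve, mapping two-dimensional points to one-dimensional keys so that range queries can be answered efficiently.
The monotonicity property of the Z-index underpins range query processing: if a point p is dominated by a point q (p.x ≤ q.x and p.y ≤ q.y), then p's key is no larger than q's. Consequently, every point inside a query rectangle has a key between the keys of the rectangle's lower-left and upper-right corners, so only that key interval needs to be scanned.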
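The monotonicity argument can be checked with a small Python sketch of a plain Z-order index, assuming non-negative integer coordinates that fit in `bits` bits. The functions interleave, build, and range_query are hypothetical names; the sketch sorts points by their Morton key and answers a box query by scanning only the key interval between the box's lower-left and upper-right corners.

```python
from bisect import bisect_left, bisect_right


def interleave(x, y, bits=16):
    """Z-order (Morton) key: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def build(points, bits=16):
    """Sort points by Z-order key; return parallel lists of keys and points."""
    entries = sorted((interleave(x, y, bits), (x, y)) for x, y in points)
    return [k for k, _ in entries], [p for _, p in entries]


def range_query(keys, pts, lo, hi, bits=16):
    """Return the points p with lo <= p <= hi componentwise.

    By monotonicity, every point in the box has a key in [z(lo), z(hi)],
    so only that contiguous slice of the sorted keys is scanned. The slice may
    still contain points outside the box, which the final filter removes.
    """
    start = bisect_left(keys, interleave(lo[0], lo[1], bits))
    end = bisect_right(keys, interleave(hi[0], hi[1], bits))
    return [p for p in pts[start:end]
            if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
```

For example, on an 8 × 8 grid built with bits=3, range_query(keys, pts, (2, 3), (5, 6), bits=3) returns the 16 points in the box. Points that fall in the scanned slice but outside the box are wasted retrievals, the kind of cost a workload-aware layout tries to reduce.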
The WaZI Index
WaZI optimizes partitioning and ordering based on data distribution and query workload.
Its adaptive partitioning and ordering strategies are chosen to minimize the cost of retrieving points during range queries.
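The following Python sketch illustrates the general idea of choosing a partition and a child ordering from a query workload, for a single node only. It is not WaZI's cost function: the cost proxy here (points stored in the children scanned between the first and last child a query touches), the two candidate orderings "Z" and "N", and the names scan_cost and best_split are assumptions made for illustration.

```python
from itertools import product

# Hypothetical child numbering: 0 = lower-left, 1 = lower-right,
# 2 = upper-left, 3 = upper-right.
ORDERINGS = {
    "Z": (0, 1, 2, 3),  # row by row, as in the standard Z-curve
    "N": (0, 2, 1, 3),  # column by column
}


def child_of(p, split):
    """Quadrant of point p for a split at (sx, sy); points on a split line go right/up."""
    return (1 if p[0] >= split[0] else 0) + (2 if p[1] >= split[1] else 0)


def children_hit(query, split):
    """Quadrants whose region intersects the closed query box ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = query
    cols = ([0] if x0 < split[0] else []) + ([1] if x1 >= split[0] else [])
    rows = ([0] if y0 < split[1] else []) + ([2] if y1 >= split[1] else [])
    return {c + r for c, r in product(cols, rows)}


def scan_cost(points, queries, split, order):
    """Cost proxy: for each query, count the points in every child from the first
    to the last intersecting child in `order` (a contiguous scan)."""
    counts = [0, 0, 0, 0]
    for p in points:
        counts[child_of(p, split)] += 1
    rank = {child: i for i, child in enumerate(order)}
    total = 0
    for q in queries:
        ranks = [rank[c] for c in children_hit(q, split)]
        total += sum(counts[order[i]] for i in range(min(ranks), max(ranks) + 1))
    return total


def best_split(points, queries, candidate_splits):
    """Return (cost, split, ordering name) with the lowest cost proxy."""
    return min(
        ((scan_cost(points, queries, s, order), s, name)
         for s in candidate_splits
         for name, order in ORDERINGS.items()),
        key=lambda t: t[0],
    )
```

With tall, narrow queries on the left side of a uniform 8 × 8 grid, for example, the column-major "N" ordering wins because it places the two left quadrants next to each other, so the scan no longer passes through the lower-right quadrant.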
Skipping Mechanism
WaZI introduces look-ahead pointers that let range query processing skip leaf nodes irrelevant to the query.
The paper presents an algorithm for constructing the look-ahead pointers that enable this skipping.
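One simple way to realize look-ahead skipping is sketched below in Python. It is a hypothetical variant, not the paper's construction algorithm: leaves are assumed to be stored in curve order with known bounding boxes, and each leaf keeps a pointer past a fixed-size block of following leaves (the `stride` parameter) together with that block's union bounding box. A scan jumps over the whole block whenever the union box misses the query. The names build_lookahead and scan are illustrative.

```python
def intersects(box, query):
    """True if two closed boxes ((x0, y0), (x1, y1)) overlap."""
    (bx0, by0), (bx1, by1) = box
    (qx0, qy0), (qx1, qy1) = query
    return bx0 <= qx1 and qx0 <= bx1 and by0 <= qy1 and qy0 <= by1


def union(a, b):
    """Smallest box containing both boxes a and b."""
    return ((min(a[0][0], b[0][0]), min(a[0][1], b[0][1])),
            (max(a[1][0], b[1][0]), max(a[1][1], b[1][1])))


def build_lookahead(leaf_boxes, stride=4):
    """For each leaf i, precompute (j, box): j is the first leaf after the
    block i..j-1, and box is the union bounding box of that block."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    n = len(leaf_boxes)
    pointers = []
    for i in range(n):
        j = min(i + stride, n)
        box = leaf_boxes[i]
        for k in range(i + 1, j):
            box = union(box, leaf_boxes[k])
        pointers.append((j, box))
    return pointers


def scan(leaf_boxes, pointers, query):
    """Return (indices of leaves intersecting query, number of leaves visited)."""
    hits, visited, i = [], 0, 0
    while i < len(leaf_boxes):
        visited += 1
        j, block_box = pointers[i]
        if not intersects(block_box, query):
            i = j  # every leaf in the block misses the query: jump past it
            continue
        if intersects(leaf_boxes[i], query):
            hits.append(i)
        i += 1
    return hits, visited
```

Skipping is safe because a block is only skipped when its union box, and therefore every leaf in it, is disjoint from the query. A larger stride allows longer jumps but produces looser union boxes that intersect queries more often.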
Experiments
The experiments use real-world datasets from OpenStreetMap together with skewed, semi-synthetic query workloads.
Compared with the baselines STR, CUR, Flood, QUASII, and Base (the base Z-index), WaZI shows significant improvements in range query performance.
Stats
Our extensive experiments show that the WaZI index improves range query time by 40% on average over the baselines while always performing better or comparably to state-of-the-art spatial indexes.
Quotes
"We propose a generalization of the Z-index that adapts gracefully to both the distribution of spatial data and the workload of range queries."
"Our aim is for the index to be adaptive to the given data and anticipated range queries."