Core Concept
DeepMapping is a novel data abstraction that leverages deep neural networks to integrate compression and indexing for efficient storage and retrieval of tabular data.
Summary
The paper proposes a novel data abstraction called DeepMapping that leverages deep neural networks to balance storage cost, query latency, and runtime memory footprint for tabular data. The key ideas are:
Hybrid Data Representation: DeepMapping couples a compact, multi-task neural network model with a lightweight auxiliary data structure to achieve 100% accuracy without requiring a prohibitively large model.
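The hybrid representation can be sketched as follows (all names here are hypothetical, not the paper's actual API): a learned model approximates the key-to-value mapping, and a small auxiliary table stores only the entries the model mispredicts, so every lookup returns the exact answer without needing a model large enough to memorize everything perfectly.

```python
# Hypothetical sketch of DeepMapping's hybrid representation: the model
# approximates key -> value, and the auxiliary dict holds only the keys
# the model gets wrong, so lookups are always exact.

def build_auxiliary(data, model_predict):
    """Keep only the mispredicted entries in the auxiliary structure."""
    return {k: v for k, v in data.items() if model_predict(k) != v}

def lookup(key, model_predict, aux):
    # The auxiliary structure overrides the model on its few errors.
    return aux.get(key, model_predict(key))

# Toy example: a stand-in "model" that maps a key to its nearest lower ten.
model = lambda k: (k // 10) * 10
data = {1: 0, 12: 10, 17: 20, 25: 20}

aux = build_auxiliary(data, model)   # only key 17 breaks the pattern
print(aux)                           # {17: 20}
print(lookup(17, model, aux))        # 20 (corrected by the auxiliary table)
print(lookup(12, model, aux))        # 10 (answered by the model alone)
```

The appeal is that a well-trained model captures most of the mapping, so the auxiliary table stays small; together they are exact while each alone would be either lossy or large.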
Multi-Task Hybrid Architecture Search (MHAS): MHAS is a neural architecture search algorithm that adaptively tunes the number of shared and private layers and the sizes of the layers to optimize the overall size of the hybrid architecture.
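The objective MHAS optimizes can be illustrated with a minimal sketch (the configurations and byte counts below are invented stand-ins, not numbers from the paper): for each candidate architecture, the footprint to minimize is the model's size plus the size of the auxiliary structure holding its remaining errors, so a slightly larger model can win overall if it shrinks the auxiliary structure enough.

```python
# Hypothetical sketch of the size trade-off MHAS navigates: total
# footprint = model size + auxiliary-structure size for the model's
# residual errors. All sizes and configs below are invented stand-ins.

def hybrid_size(model_bytes, num_errors, bytes_per_error=16):
    """Total footprint of a candidate hybrid architecture."""
    return model_bytes + num_errors * bytes_per_error

# Candidate configs: (shared_layers, private_layers, width) -> (model bytes, errors)
candidates = {
    (1, 1, 64):  (4_000, 900),   # tiny model, many residual errors
    (2, 2, 128): (16_000, 50),   # larger model memorizes far more
    (3, 2, 256): (60_000, 2),    # near-perfect but heavy
}

best = min(candidates, key=lambda c: hybrid_size(*candidates[c]))
print(best)  # (2, 2, 128): the mid-sized model minimizes the combined footprint
```

A real search would train each candidate and measure its actual error set; the point of the sketch is only that the search criterion is the combined size, not model accuracy alone.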
Modification Workflows: DeepMapping supports efficient insert, delete, and update operations by materializing the modifications in the auxiliary structure and triggering model retraining only when the auxiliary structure exceeds a threshold.
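This workflow can be sketched as follows (class and method names are hypothetical): inserts and updates go into an override table, deletes into a tombstone set, and retraining is deferred until the buffered modifications pass a threshold.

```python
# Hypothetical sketch of DeepMapping's modification workflow: changes are
# materialized in auxiliary structures, and model retraining is triggered
# only when the buffered modifications exceed a threshold.

class DeepMappingStore:
    def __init__(self, model_predict, retrain_threshold=1000):
        self.predict = model_predict
        self.overrides = {}        # inserted/updated entries the model doesn't know
        self.tombstones = set()    # keys deleted since the last retraining
        self.retrain_threshold = retrain_threshold

    def get(self, key):
        if key in self.tombstones:
            return None
        if key in self.overrides:
            return self.overrides[key]
        return self.predict(key)

    def put(self, key, value):
        self.tombstones.discard(key)   # re-inserting a deleted key revives it
        self.overrides[key] = value
        self._maybe_retrain()

    def delete(self, key):
        self.overrides.pop(key, None)
        self.tombstones.add(key)
        self._maybe_retrain()

    def _maybe_retrain(self):
        if len(self.overrides) + len(self.tombstones) > self.retrain_threshold:
            pass  # retrain the model on the merged data, then clear the buffers (omitted)

store = DeepMappingStore(model_predict=lambda k: k * 10)
store.put(3, 99)
print(store.get(3))   # 99 (from the override table)
print(store.get(2))   # 20 (from the model)
store.delete(2)
print(store.get(2))   # None (tombstoned)
```

Because reads consult the auxiliary structures before the model, correctness holds immediately after each modification, and the expensive retraining cost is amortized over many changes.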
Extensive experiments on TPC-H, TPC-DS, synthetic, and real-world datasets demonstrate that DeepMapping can better balance storage, retrieval speed, and runtime memory footprint compared to state-of-the-art compression and indexing techniques, especially in memory-constrained environments.
Statistics
The paper provides the following key statistics:
DeepMapping can achieve up to 15x speedup over baselines in memory-constrained environments by alleviating I/O and decompression costs.
DeepMapping can reduce the storage size by up to 43x compared to the second-best baseline.
Quotes
"DeepMapping leverages the impressive memorization capabilities of deep neural networks to provide better storage cost, better latency, and better run-time memory footprint, all at the same time."
"The auxiliary structure design further enables DeepMapping to efficiently deal with insertions, deletions, and updates even without retraining the mapping."