Core Concepts
This paper introduces a compressed data structure that efficiently supports the fundamental rank and select operations on large-alphabet strings.
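For readers unfamiliar with the two operations, the following is a minimal, uncompressed sketch of their standard definitions: rank counts the occurrences of a symbol in a prefix, and select finds the position of the j-th occurrence. The function names, the 0-based positions, and the use of -1 for "not found" are conventions assumed here for illustration; the paper's structure answers the same queries in compressed space rather than by scanning.

```python
def rank(s, c, i):
    """Number of occurrences of symbol c in the prefix s[0:i]."""
    return sum(1 for x in s[:i] if x == c)


def select(s, c, j):
    """0-based position of the j-th occurrence (j >= 1) of c in s.

    Returns -1 if c occurs fewer than j times (a convention assumed here).
    """
    count = 0
    for pos, x in enumerate(s):
        if x == c:
            count += 1
            if count == j:
                return pos
    return -1


# A small string over an integer alphabet.
s = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(rank(s, 5, 9))    # how many 5s appear in the first 9 positions
print(select(s, 5, 3))  # where the third 5 is located
```

Both functions take linear time here; the point of a rank/select data structure is to answer them far faster without storing the string in plain form.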
Abstract
The paper focuses on engineering efficient implementations of the alphabet-partitioning approach for supporting rank and select operations on large-alphabet strings. The main contributions are:
The authors carry out algorithm engineering on the alphabet-partition approach by Barbay et al. [15], obtaining an implementation that uses compressed space while supporting operations s.rank and s.select efficiently in practice. Their approach also yields interesting theoretical trade-offs.
The authors show that their approach yields competitive trade-offs when used for (i) snippet extraction from text databases and (ii) intersection of inverted lists, which are key operations for modern information retrieval systems.
The authors show that the alphabet-partition approach can be used to improve run-length compression of large-alphabet strings formed by r equal-symbol runs. The resulting alternative is competitive in both theory and practice.
The authors show that their alphabet-partitioning scheme can be efficiently implemented on a distributed-memory system.
The authors' implementation of alphabet partitioning is effective and efficient for supporting the fundamental rank and select operations, as well as for supporting several key operations in modern information retrieval systems that manipulate large-alphabet strings.
Stats
The paper does not contain any explicit numerical data or statistics. The focus is on the algorithmic and engineering aspects of the proposed data structure.
Quotes
The paper does not contain any direct quotes that are particularly striking or support the key arguments.