Core concepts
Height-bounded LZ encodings support fast access to arbitrary text positions while keeping the compressed representation small.
Summary
Height-bounded Lempel-Ziv (LZHB) encodings provide fast access to text positions with reduced space usage. Greedy algorithms efficiently find small LZHB representations. Theoretical bounds and practical experiments demonstrate the effectiveness of LZHB encodings in data compression.
Statistics
We show that there exists a constant $c$ such that, for any string of length $n$, the size $\hat{z}_{HB}(c \log n)$ of the optimal (smallest) LZHB encoding whose height is bounded by $c \log n$ is $O(\hat{g}_{rl})$, where $\hat{g}_{rl}$ is the size of the smallest run-length grammar.
Furthermore, we show that there exists a family of strings such that $\hat{z}_{HB}(c \log n) = o(\hat{g}_{rl})$, thus making $\hat{z}_{HB}(c \log n)$ one of the smallest known repetitiveness measures.
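For example, the encoding (1, a), (1, b), (3, 1), (1, c), (5, 2) represents the string ababacbabac, where (1, x) is a literal character and (ℓ, s) copies ℓ characters starting at position s. Giving literals height 0 and each copied position the height of its source plus one, the heights of the 11 positions are: 0, 0, 1, 1, 2, 0, 1, 2, 2, 3, 1.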
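The height computation for this example can be checked mechanically. The following is a minimal sketch, assuming phrases are written as (1, character) for a literal and (length, source) with a 1-based source position for a copy, and assuming a literal has height 0 while a copied position has the height of its source plus one. The function name decode_with_heights is hypothetical.

```python
def decode_with_heights(phrases):
    """Decode an LZ-like encoding and compute the referencing height of each position.

    Assumed convention: (1, char) is a literal; (length, source) copies
    `length` characters starting at the 1-based position `source`.
    """
    text = []
    heights = []
    for length, ref in phrases:
        if isinstance(ref, str):
            # Literal phrase: a root of the referencing forest.
            text.append(ref)
            heights.append(0)
        else:
            src = ref - 1  # convert to a 0-based index
            for k in range(length):
                # Appending one position at a time lets a phrase copy from itself.
                text.append(text[src + k])
                heights.append(heights[src + k] + 1)
    return "".join(text), heights


phrases = [(1, "a"), (1, "b"), (3, 1), (1, "c"), (5, 2)]
text, heights = decode_with_heights(phrases)
print(text)     # ababacbabac
print(heights)  # [0, 0, 1, 1, 2, 0, 1, 2, 2, 3, 1]
```

The maximum of the list (3 here) is the height of the encoding, which is the quantity an LZHB encoding bounds.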
An LZ-like encoding induces an implicit referencing forest: each position inside a copied phrase points to the earlier position it is copied from, and literal positions are the roots.
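The height of an optimal LZ-like encoding can be Θ(n). For instance, the two-phrase encoding (1, a), (n-1, 1) of the unary string of n a's gives position i height i-1.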
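Why the height matters for access can be seen by following references without decoding the text. The sketch below is illustrative only: it stores the phrase start positions in a sorted list and uses Python's bisect as a stand-in for a predecessor data structure, so each hop up the referencing forest costs one predecessor query. The names build_starts and access, and the (1, character) / (length, 1-based source) phrase convention, are assumptions.

```python
from bisect import bisect_right


def build_starts(phrases):
    """Return the 1-based start position of every phrase."""
    starts, pos = [], 1
    for length, _ in phrases:
        starts.append(pos)
        pos += length
    return starts


def access(phrases, starts, i):
    """Return (character at 1-based position i, number of reference hops)."""
    hops = 0
    while True:
        # Predecessor query: index of the phrase whose start is the largest one <= i.
        j = bisect_right(starts, i) - 1
        _, ref = phrases[j]
        if isinstance(ref, str):
            return ref, hops  # reached a literal, the root of this tree
        # Jump to the source position that i was copied from.
        i = ref + (i - starts[j])
        hops += 1


phrases = [(1, "a"), (1, "b"), (3, 1), (1, "c"), (5, 2)]
starts = build_starts(phrases)
print(access(phrases, starts, 10))  # ('a', 3)
```

The hop count equals the height of the queried position, so an encoding of height Θ(n) can make a single access take linear time, while a height bound h caps it at h hops.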
Quotes
"We introduce height-bounded LZ encodings (LZHB), a new family of compressed representations."
"Any LZHB encoding whose referencing height is bounded by h allows access to an arbitrary position using O(h) predecessor queries."
"While computing the optimal LZHB representation seems difficult, linear and near linear time greedy algorithms efficiently find small representations."
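To make the greedy idea concrete, here is a brute-force sketch of one plausible greedy rule: at each position, take the longest earlier match whose copied positions all stay within the height bound h, and fall back to a literal otherwise. This is not the linear or near-linear time algorithm referred to in the quote; the function name greedy_lzhb, the tie-breaking (leftmost source wins), and the exhaustive search over all sources are assumptions made purely for illustration.

```python
def greedy_lzhb(text, h):
    """Greedy height-bounded parse by exhaustive search (illustrative only).

    Returns (phrases, heights). Phrases are (1, char) for a literal or
    (length, source) with a 1-based source position.
    """
    n = len(text)
    heights = []  # heights of positions already encoded (0-based)
    phrases = []
    i = 0
    while i < n:
        best, best_src = [], -1
        for src in range(i):
            cand = []  # heights the new phrase would get if copied from src
            while i + len(cand) < n:
                p = src + len(cand)
                if text[p] != text[i + len(cand)]:
                    break
                # The source may lie inside the phrase being built (self-reference).
                src_height = heights[p] if p < i else cand[p - i]
                if src_height + 1 > h:
                    break  # extending further would exceed the height bound
                cand.append(src_height + 1)
            if len(cand) > len(best):
                best, best_src = cand, src
        if len(best) >= 2:
            phrases.append((len(best), best_src + 1))
            heights.extend(best)
            i += len(best)
        else:
            # A length-1 copy saves nothing over a literal, which has height 0.
            phrases.append((1, text[i]))
            heights.append(0)
            i += 1
    return phrases, heights


phrases, heights = greedy_lzhb("ababacbabac", 10)
print(phrases)  # [(1, 'a'), (1, 'b'), (3, 1), (1, 'c'), (5, 2)]

bounded_phrases, bounded_heights = greedy_lzhb("ababacbabac", 1)
print(len(bounded_phrases), max(bounded_heights))
```

With a generous bound the sketch reproduces the five-phrase encoding of ababacbabac; tightening the bound to 1 trades more phrases for a lower height, which is the space versus access-time trade-off that LZHB encodings expose.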