The paper presents PHOBIC (Perfect Hashing with Optimized Bucket sizes and Interleaved Coding), a technique for constructing minimal perfect hash functions (MPHFs) that builds on the PTHash approach. The key contributions are:
The paper first analyzes the theory of the bucket placement approach to perfect hashing, in which keys are hashed into small buckets and a seed is searched for each bucket so that its keys land in free output positions. It shows that any specialization of this approach requires between log2(e) ≈ 1.44 bits per key and log2(e) + O((log λ)/λ) bits per key in expectation, where λ is the average bucket size. The goal is then to minimize the construction time, for which the authors characterize an asymptotically optimal way of distributing the bucket sizes.
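The bucket placement idea can be illustrated with a minimal sketch. It is not the authors' implementation: the seeded hash `h` (built on BLAKE2b), the function names, and the uniform bucket assignment are assumptions made for illustration, and the optimized bucket size distribution from the paper is not modeled here. Keys are assumed to be distinct strings.

```python
import hashlib

def h(key: str, seed: int, m: int) -> int:
    """Hypothetical seeded hash into [0, m); BLAKE2b is only a stand-in."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m

def build(keys, avg_bucket_size=4):
    """Search one seed per bucket so that all keys map to distinct slots."""
    n = len(keys)
    num_buckets = max(1, n // avg_bucket_size)
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        # Seed 0 is reserved for the key-to-bucket assignment (uniform here).
        buckets[h(k, 0, num_buckets)].append(k)

    taken = [False] * n
    seeds = [0] * num_buckets
    # Larger buckets are placed first, while many slots are still free.
    for b in sorted(range(num_buckets), key=lambda b: -len(buckets[b])):
        if not buckets[b]:
            continue
        seed = 1
        while True:
            pos = [h(k, seed, n) for k in buckets[b]]
            no_internal_clash = len(set(pos)) == len(pos)
            if no_internal_clash and not any(taken[p] for p in pos):
                break
            seed += 1
        for p in pos:
            taken[p] = True
        seeds[b] = seed
    return seeds, n

def query(key, mphf):
    """Evaluate the function: look up the bucket's seed, then hash with it."""
    seeds, n = mphf
    return h(key, seeds[h(key, 0, len(seeds))], n)
```

A query costs two hash evaluations and one seed lookup, which is why this family of constructions targets fast queries. The stored seeds are what the space bound above refers to; this sketch keeps them uncompressed.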
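The second contribution is an interleaved coding scheme for the seeds. The keys are split into partitions, the bucket placement is run within each partition, and the seeds of the i-th buckets of all partitions are stored consecutively. Because the seeds for the i-th bucket of each partition follow the same statistical distribution, a compressor can be tuned for each index i, which improves space efficiency compared to prior approaches that used a single compressor.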
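A rough sketch of the interleaved layout follows. The function names are hypothetical, and the "compressor" is deliberately simplified to a fixed bit width chosen per bucket index; this is a stand-in to show why grouping by index helps, not the encoding used in the paper.

```python
def interleave(seeds):
    """seeds[p][i] is the seed of bucket i in partition p.
    Returns a flat list where all seeds with the same i are adjacent."""
    num_partitions = len(seeds)
    num_buckets = len(seeds[0])
    return [seeds[p][i] for i in range(num_buckets) for p in range(num_partitions)]

def lookup(flat, num_partitions, p, i):
    """Seed of bucket i in partition p inside the interleaved array."""
    return flat[i * num_partitions + p]

def bits_single_width(seeds):
    """Cost if one fixed width (sized for the largest seed) is used everywhere."""
    width = max(s.bit_length() for row in seeds for s in row)
    return width * sum(len(row) for row in seeds)

def bits_per_index_width(seeds):
    """Cost if each bucket index i gets its own fixed width."""
    num_partitions = len(seeds)
    return sum(
        num_partitions * max(row[i].bit_length() for row in seeds)
        for i in range(len(seeds[0]))
    )

# Toy data: later (smaller) buckets are harder to place and get larger seeds.
toy = [[1, 3, 40], [2, 5, 33], [1, 6, 60]]
flat = interleave(toy)
```

On the toy data, the single-width encoding costs 54 bits while the per-index widths cost 33 bits, because the small seeds of the early buckets no longer pay for the large seeds of the late ones.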
The GPU implementation parallelizes the construction over partitions, seeds, and keys, achieving significant speedups over the CPU-only version, especially for larger average bucket sizes.
Experimental results show that PHOBIC is 0.17 bits/key more space efficient than PTHash at the same query time and construction throughput. In a configuration with fast queries, the GPU implementation can construct a perfect hash function at 2.17 bits/key in 28 ns per key, which can then be queried in 37 ns per query on the CPU.