Core Concepts

The paper presents novel applications of the large sieve inequality from analytic number theory to obtain improved algorithms for various sparse pattern matching problems, including Sparse Nonnegative Convolution, Sparse General Convolution, Text-to-Pattern Hamming Distances, and the Constellation problem.

Abstract

The paper studies various problems related to sparse pattern matching, such as Sparse Convolution, Text-to-Pattern Hamming Distances, and the Constellation problem. Many of these problems can be reduced to dense instances using the mod-prime hash function h(x) = x mod p for a random prime p, which has two main drawbacks: (1) the collision probability is O(log N/Q) for primes of magnitude Q, rather than the optimal O(1/Q), and (2) it is difficult to derandomize the choice of the prime p.
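To make the collision bound concrete, here is a minimal sketch of mod-prime hashing (helper names are ours, not the paper's): distinct keys x, y collide exactly when p divides x − y, and since |x − y| < N has at most log_Q N prime factors of size ≥ Q while [Q, 2Q] contains roughly Q/ln Q primes, the collision probability is O(log N / Q).

```python
import random

def primes_in(lo, hi):
    """All primes in [lo, hi] by a simple sieve."""
    sieve = bytearray([1]) * (hi + 1)
    sieve[:2] = b"\x00\x00"
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, hi + 1, i)))
    return [p for p in range(max(lo, 2), hi + 1) if sieve[p]]

def mod_prime_hash(keys, Q):
    """Hash keys from [0, N) into [0, p) for a random prime p in [Q, 2Q].
    Distinct x, y collide iff p | (x - y), giving the O(log N / Q)
    collision probability discussed above."""
    p = random.choice(primes_in(Q, 2 * Q))
    return p, {x: x % p for x in keys}
```

A bucket structure built this way is what the paper's large-sieve analysis sharpens: it bounds how unevenly a fixed key set can spread across residues over many primes at once.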
The main technical contribution of the paper is the use of the large sieve inequality from analytic number theory to partially overcome these drawbacks in certain scenarios. Specifically:
Sparse Nonnegative Convolution:
The paper obtains a Las Vegas algorithm that computes the convolution A ⋆ B of two nonnegative integer vectors A, B in O(t log t) time with 1 - 1/poly(t) probability, where t is the output sparsity.
This simultaneously improves the previous O(t log t log log t)-time Las Vegas algorithm and the O(t log t)-time Monte Carlo algorithm with 2^{-sqrt(log t)} failure probability.
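This is not the paper's algorithm, but a minimal sketch of the hash-and-recover idea such algorithms build on, assuming nonnegative inputs: hash indices modulo a random prime of size proportional to t, store per bucket both the summed value and the value-weighted index sum (so a collision-free bucket reveals its unique output index), and peel recovered entries over rounds. Quadratic cyclic convolution stands in for the FFT, and all helper names are ours.

```python
import random

def small_primes(lo, hi):
    sieve = bytearray([1]) * (hi + 1)
    sieve[:2] = b"\x00\x00"
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, hi + 1, i)))
    return [p for p in range(max(lo, 2), hi + 1) if sieve[p]]

def sparse_conv_nonneg(A, B, t):
    """Sketch of sparse nonnegative convolution: A, B are dicts
    index -> nonnegative value, t is a bound on the output sparsity.
    Since no cancellation occurs, the total recovered mass certifies
    completion (the Las Vegas flavor of the real algorithms)."""
    target = sum(A.values()) * sum(B.values())
    C = {}
    primes = small_primes(2 * t, 8 * t)
    while sum(C.values()) < target:
        m = random.choice(primes)
        c0 = [0] * m  # per-bucket value sum
        c1 = [0] * m  # per-bucket value-weighted index sum
        # cyclic convolution of the hashed vectors (FFT in the real
        # algorithms; quadratic here to keep the sketch short)
        for i, u in A.items():
            for j, v in B.items():
                r = (i + j) % m
                c0[r] += u * v
                c1[r] += u * v * (i + j)
        # peel off entries recovered in earlier rounds
        for k, w in C.items():
            c0[k % m] -= w
            c1[k % m] -= w * k
        for r in range(m):
            if c0[r] > 0 and c1[r] % c0[r] == 0:
                k = c1[r] // c0[r]
                if k % m == r:  # consistent => bucket likely collision-free
                    C[k] = C.get(k, 0) + c0[r]
    return C
```

The paper's contribution is in the analysis, not this skeleton: the large sieve inequality controls collisions well enough to reach O(t log t) time with 1 − 1/poly(t) probability.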
Sparse General Convolution:
For the case where the length N of the input vectors satisfies N ≤ t^{1.99}, the paper gives a Monte Carlo O(t log t) time algorithm for sparse convolution with possibly negative input.
This partially resolves an open question left by previous work on whether Sparse General Convolution can be solved in O(t log t + poly log(N∆)) time.
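A tiny illustration of why the general (possibly negative) case is harder, assuming the bucket encoding used by hashing-based sparse convolution: colliding contributions can cancel, so a zero bucket sum no longer certifies that the bucket is empty, which breaks the cheap verification available for nonnegative inputs.

```python
# Two colliding entries with opposite values cancel in the hashed bucket:
# the bucket sum is zero even though both entries are real output mass.
m = 7                          # illustrative hash modulus
entries = [(3, 5), (10, -5)]   # (index, value): 3 and 10 collide mod 7
bucket = sum(v for i, v in entries if i % m == 3)
assert bucket == 0             # cancellation despite a nonempty bucket
```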
Text-to-Pattern Hamming Distances:
The paper obtains a deterministic O(n√(m log log m))-time algorithm that exactly computes the Hamming distance between a length-m pattern P and every length-m substring of a length-n text T.
This improves the previous O(n√m (log m log log m)^{1/4})-time deterministic algorithm and nearly matches the O(n√m)-time Las Vegas algorithm from the same line of work.
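For context, the classic reduction underlying all such algorithms (a baseline sketch, not the paper's method): per alphabet symbol, cross-correlate the indicator vectors of the text and the pattern; summing over symbols counts matches per alignment, and the Hamming distance is m minus that count. Fast algorithms replace the quadratic correlation below with FFTs.

```python
def cross_correlate(a, b):
    """(a correlated with b) at each valid alignment; done with FFTs in
    the fast algorithms, quadratic here for clarity."""
    n, m = len(a), len(b)
    return [sum(a[i + j] * b[j] for j in range(m)) for i in range(n - m + 1)]

def hamming_distances(T, P):
    """Ham(i) = number of mismatches between P and T[i : i + len(P)]."""
    n, m = len(T), len(P)
    total = [0] * (n - m + 1)
    for c in set(P):
        ta = [1 if x == c else 0 for x in T]
        pa = [1 if x == c else 0 for x in P]
        total = [t + s for t, s in zip(total, cross_correlate(ta, pa))]
    return [m - s for s in total]
```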
The key technical component behind the Text-to-Pattern Hamming Distances result is a variant of the "X + Y lemma" that can be computed deterministically in O(N log(s^2/N) + N log log N) time, where s is the sum of the 1-norms of the input vectors.
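To fix notation for that bound, here is the quadratic baseline the lemma speeds up (our naming, not the paper's): convolving nonnegative integer vectors of total 1-norm s is the "X + Y" view, where A[i] copies of index i and B[j] copies of index j contribute A[i]·B[j] to index i + j.

```python
def conv_via_sumset(A, B):
    """Convolution of nonnegative integer vectors viewed as a multiset
    sumset. This naive O(s^2) loop is what the deterministic X + Y lemma
    improves to O(N log(s^2/N) + N log log N), where N is the output
    length and s the sum of the inputs' 1-norms."""
    N = len(A) + len(B) - 1
    C = [0] * N
    for i, u in enumerate(A):
        if u:
            for j, v in enumerate(B):
                if v:
                    C[i + j] += u * v
    return C
```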

Stats

None

Quotes

None

Key Insights Distilled From

by Ce Jin, Yinzh... at **arxiv.org**, 04-01-2024

Deeper Inquiries

Tools from analytic number theory such as the large sieve inequality may benefit other problems in algorithm design and analysis. Subset Sum is a natural candidate: it asks for a subset of a given set of integers summing to a target value, and number-theoretic structure in the input already plays a role in modern algorithms for it. More broadly, problems involving modular arithmetic and prime-based hashing could profit, since the large sieve inequality bounds how unevenly a set of integers can distribute across residues modulo many primes simultaneously.

The techniques developed in the paper may also extend to better deterministic algorithms for Sparse Nonnegative Convolution and the Constellation problem. The ingredients to adapt are the large sieve analysis of mod-prime hashing, the derandomization machinery, and the sparsity tests introduced in the paper; the difficulty lies in making these components constructive and efficient without randomness. If that adaptation succeeds, improved deterministic bounds for both problems are plausible.

Whether the assumption N ≤ t^{1.99} in the Sparse General Convolution algorithm can be relaxed to general N while keeping O(t log t) time remains open. Doing so would require a sharper analysis of mod-prime hashing and the large sieve inequality in the regime where N is much larger than the output sparsity t, and possibly new hashing and sparsity-testing methods that tolerate the larger universe. With such refinements, an O(t log t)-time algorithm for general N, perhaps with an additive poly log(N∆) term, may be within reach.
