Core Concepts
This paper studies how to efficiently find a diverse set of K longest common subsequences (LCSs) of a set of input strings, where diversity is measured as either the sum or the minimum of the pairwise Hamming distances. The authors analyze the computational complexity of both problems, giving polynomial-time algorithms for bounded K, and a PTAS (for the sum measure) and FPT algorithms for unbounded K.
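To make the two diversity measures concrete, here is a minimal Python sketch. It relies on the fact that all LCSs of a given input have the same length, so Hamming distance between them is well defined. The function names are illustrative, and summing over unordered pairs is an assumption about the normalization, not something the summary states.

```python
from itertools import combinations


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(x, y))


def sum_diversity(strings) -> int:
    """Sum of Hamming distances over all unordered pairs (assumed convention)."""
    return sum(hamming(x, y) for x, y in combinations(strings, 2))


def min_diversity(strings) -> int:
    """Smallest Hamming distance over all unordered pairs (needs >= 2 strings)."""
    return min(hamming(x, y) for x, y in combinations(strings, 2))


# Toy example: the pairwise distances are 1, 2 and 1.
example = ["abc", "abd", "xbd"]
print(sum_diversity(example))  # 4
print(min_diversity(example))  # 1
```

The Max-Sum problem asks for K strings maximizing the first quantity, and the Max-Min problem for K strings maximizing the second.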
Abstract
The paper focuses on the problem of finding a diverse set of longest common subsequences (LCSs) from a set of input strings, considering both sum and minimum diversity measures under Hamming distance.
The key highlights are:
When the number K of LCSs to be selected is bounded by a constant, both the Max-Sum and Max-Min versions of the problem can be solved in polynomial time using dynamic programming.
For unbounded K, the Max-Sum version admits a polynomial-time approximation scheme (PTAS), by leveraging the property that Hamming distance is a metric of negative type.
For unbounded K, the authors also provide fixed-parameter tractable (FPT) algorithms for both the Max-Sum and Max-Min versions, parameterized by K together with r, the common length of the strings being selected from.
The paper shows that both problems become NP-hard when K is part of the input, even for constant string length r ≥ 3.
The parameterized complexity analysis reveals that the problems are W[1]-hard when parameterized by K alone.
The authors work in a more general setting in which the set of candidate strings is given as an edge-labeled directed acyclic graph (DAG), a structure that can succinctly represent the set of all LCSs of the input strings. Their positive results therefore hold for this more general case as well.
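The DAG setting can be illustrated with a small Python sketch. The toy graph, the node names, and the helper functions are all hypothetical. The selection step is a plain exhaustive search over K-subsets, used here only as a baseline for tiny instances. It is not the authors' dynamic programming, PTAS, or FPT algorithm, and it takes exponential time in general because a DAG can encode exponentially many strings.

```python
from itertools import combinations

# Hypothetical toy DAG: node -> list of (edge label, successor).
# Every path from "s" to "t" spells one candidate string of length r = 3.
DAG = {
    "s": [("a", "u"), ("b", "u")],
    "u": [("a", "v"), ("b", "v")],
    "v": [("a", "t"), ("c", "t")],
    "t": [],
}


def strings_of(dag, node, sink):
    """Yield the label sequence of every path from node to sink."""
    if node == sink:
        yield ""
        return
    for label, nxt in dag[node]:
        for suffix in strings_of(dag, nxt, sink):
            yield label + suffix


def hamming(x, y):
    """Hamming distance between two equal-length strings."""
    return sum(a != b for a, b in zip(x, y))


def diversity(subset, agg):
    """Aggregate pairwise distances with agg (sum for Max-Sum, min for Max-Min)."""
    return agg(hamming(x, y) for x, y in combinations(subset, 2))


def most_diverse(cands, k, agg):
    """Exhaustive baseline: try every k-subset (k >= 2) and keep the best."""
    return max(combinations(cands, k), key=lambda s: diversity(s, agg))


cands = sorted(strings_of(DAG, "s", "t"))  # 8 strings, 2 labels per layer
best_sum = most_diverse(cands, 3, sum)
best_min = most_diverse(cands, 3, min)
print(diversity(best_sum, sum))  # 6
print(diversity(best_min, min))  # 2
```

In this toy instance four nodes and six edges encode eight strings, which shows why the DAG is a compact input and why enumerating all of its strings does not scale.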