Core Concepts
The authors present a space-efficient data structure based on the compact directed acyclic word graph (CDAWG) for computing minimal absent words (MAWs) and extended bispecial factors (EBFs), with applications in bioinformatics and data compression.
Abstract
This work studies the computation of minimal absent words (MAWs) and extended bispecial factors (EBFs) using the compact directed acyclic word graph (CDAWG). A string w is a MAW of a string S if w does not occur in S but every proper substring of w does. The focus is on non-trivial MAWs, i.e., those of length at least 2, which have applications in bioinformatics and data compression. The proposed method outputs MAWs and EBFs space-efficiently, in time linear in the size of the output. The relationships among MAWs, minimal rare words (MRWs), minimal unique substrings (MUSs), and EBFs are also explored through the CDAWG.
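For reference, non-trivial MAWs can be enumerated by brute force from a standard characterization: a string aub, where a and b are single characters, is a MAW of S exactly when au and ub occur in S but aub does not. The sketch below is a naive cross-checking aid, not the CDAWG-based method: it materializes all substrings (polynomial time and space, far from the output-linear bound), assumes the alphabet is the set of characters occurring in S, and uses hypothetical helper names.

```python
def substrings(s: str) -> set[str]:
    """All distinct non-empty substrings of s (naive, O(n^2) of them)."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def nontrivial_maws(s: str) -> set[str]:
    """Naive reference: all MAWs of s of length >= 2.

    Assumption: the alphabet is the set of characters occurring in s.
    Uses the characterization: a + u + b (a, b single characters) is a MAW
    iff a + u and u + b occur in s while a + u + b does not.
    """
    subs = substrings(s)
    sigma = sorted(set(s))
    maws = set()
    for u in subs | {""}:  # the middle part u may be empty
        for a in sigma:
            if a + u not in subs:
                continue
            for b in sigma:
                if u + b in subs and a + u + b not in subs:
                    maws.add(a + u + b)
    return maws


print(sorted(nontrivial_maws("abaab")))  # ['aaa', 'aaba', 'bab', 'bb']
```

Such a brute-force enumerator is mainly useful for testing faster implementations on short random strings.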
The authors introduce a CDAWG-based data structure that is more space-efficient than previous approaches and outputs MAWs in time linear in their number. They show how MAWs relate to EBFs through the CDAWG, and further discuss the connections among MRWs, MUSs, and EBFs within the same framework.
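One classical fact helps explain why the CDAWG is a natural host for MAWs: if aub (a, b single characters) is a MAW of S, then the middle part u occurs at least twice in S and is both left- and right-maximal, i.e., u is a maximal repeat, and under the usual conventions the internal nodes of the CDAWG of S correspond to maximal repeats of S. The naive sketch below checks this on a small example. It does not implement the paper's CDAWG construction or its definition of EBFs; it assumes the convention that u is left-maximal if it is a prefix of S or is preceded by two distinct characters (symmetrically for right-maximal), and the helper names are hypothetical.

```python
def occurrences(s: str, u: str) -> list[int]:
    """Start positions of u in s (the empty string occurs at 0..len(s))."""
    return [i for i in range(len(s) - len(u) + 1) if s.startswith(u, i)]


def is_maximal_repeat(s: str, u: str) -> bool:
    """True if u occurs at least twice and is left- and right-maximal in s."""
    occ = occurrences(s, u)
    if len(occ) < 2:
        return False
    # None marks a string boundary: u occurs as a prefix / suffix of s.
    left = {s[i - 1] if i > 0 else None for i in occ}
    right = {s[i + len(u)] if i + len(u) < len(s) else None for i in occ}
    left_maximal = None in left or len(left) >= 2
    right_maximal = None in right or len(right) >= 2
    return left_maximal and right_maximal


def maximal_repeats(s: str) -> set[str]:
    """All maximal repeats of s, including the empty string (naive)."""
    subs = {""} | {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    return {u for u in subs if is_maximal_repeat(s, u)}


# Non-trivial MAWs of "abaab", worked out by hand from the definition.
maws = ["bb", "aaa", "bab", "aaba"]
reps = maximal_repeats("abaab")
print(sorted(reps))                          # ['', 'a', 'ab']
print(all(w[1:-1] in reps for w in maws))    # True
```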
The study examines combinatorial properties of MAWs, EBFs, and MRWs in strings represented by CDAWGs, and proposes algorithms that compute these objects in small space and in time linear in the size of the output.
Stats
Fujishige et al. [16] proposed a data structure of size Θ(n) that can output the set MAW(S) of all MAWs for a given string S of length n in O(n+|MAW(S)|) time.
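The new data structure, based on the CDAWG, outputs MAW(S) in O(|MAW(S)|) time using O(e_min) space, where e_min is the smaller of the sizes (numbers of edges) of the CDAWGs of S and of its reversal.
For any string of length n, e_min < 2n.
There exists a family of strings S of length n such that e_r(S) = Θ(√n), where e_r(S) denotes the number of edges of the CDAWG of S.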
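To make the space parameter concrete, the sketch below computes e_min naively for short strings. It assumes, as is common in the CDAWG literature, that e_r(S) equals the number of right extensions of maximal repeats of S (pairs (u, b) with u a maximal repeat and ub a substring of S, corresponding to CDAWG edges), and that the corresponding left quantity e_l(S) equals e_r of the reversed string. Maximal repeats follow the same prefix/suffix convention as above; the function names are hypothetical, and the code builds no CDAWG.

```python
def _is_maximal_repeat(s: str, u: str) -> bool:
    """u occurs at least twice in s and is left- and right-maximal."""
    occ = [i for i in range(len(s) - len(u) + 1) if s.startswith(u, i)]
    if len(occ) < 2:
        return False
    left = {s[i - 1] if i > 0 else None for i in occ}  # None: u is a prefix
    right = {s[i + len(u)] if i + len(u) < len(s) else None for i in occ}  # None: suffix
    return (None in left or len(left) >= 2) and (None in right or len(right) >= 2)


def e_r(s: str) -> int:
    """Naive count of pairs (u, b): u a maximal repeat of s, ub a substring of s.

    Assumption: this equals the number of edges of the CDAWG of s.
    """
    subs = {""} | {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    total = 0
    for u in subs:
        if _is_maximal_repeat(s, u):
            # Distinct characters that immediately follow some occurrence of u.
            total += len({s[i + len(u)] for i in range(len(s) - len(u)) if s.startswith(u, i)})
    return total


def e_min(s: str) -> int:
    """min(e_r(S), e_l(S)), taking e_l(S) = e_r(reversal of S)."""
    return min(e_r(s), e_r(s[::-1]))


print(e_r("abaab"), e_r("baaba"), e_min("abaab"))  # 5 5 5
```

Running e_min over all short strings of a small alphabet is one way to sanity-check the bound e_min < 2n empirically.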