Key Concepts
This paper presents two families of codes that can efficiently correct synthesis defects that occur during the parallel synthesis of DNA strands for data storage applications.
Summary
The paper investigates errors that occur when DNA strands are synthesized in parallel: a machine extends each strand one nucleotide at a time, following a template supersequence. If the machine fails during a cycle, the strands scheduled to be extended in that cycle do not receive their nucleotide. Such a failed cycle is referred to as a synthesis defect.
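The effect of a defective cycle can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' formal model: the template is taken to be the periodic sequence ACGT repeated, each strand is assumed to be extended greedily at the first cycle whose nucleotide matches its next symbol, and a failed cycle is assumed to simply drop that symbol while the rest of the schedule proceeds unchanged. The names `schedule` and `synthesize` are hypothetical.

```python
def schedule(strand, template):
    """Return, for each symbol of `strand`, the cycle index at which it is
    appended (assumption: greedy extension along the template)."""
    cycles = []
    pos = 0
    for c, nt in enumerate(template):
        if pos < len(strand) and strand[pos] == nt:
            cycles.append(c)
            pos += 1
    if pos != len(strand):
        raise ValueError("template too short for this strand")
    return cycles


def synthesize(strands, template, defective=frozenset()):
    """Simulate parallel synthesis. Symbols scheduled at a defective cycle
    are never appended (assumption: the rest of the schedule is unchanged)."""
    produced = []
    for s in strands:
        sched = schedule(s, template)
        produced.append("".join(ch for ch, c in zip(s, sched) if c not in defective))
    return produced


TEMPLATE = "ACGT" * 3  # assumed periodic template supersequence
STRANDS = ["ACA", "GTC", "CGT"]

print(synthesize(STRANDS, TEMPLATE))       # ['ACA', 'GTC', 'CGT']
print(synthesize(STRANDS, TEMPLATE, {1}))  # ['AA', 'GTC', 'GT']
```

In this toy run, a single defective cycle (cycle 1, nucleotide C) causes a deletion in two of the three strands at once, while the strand that was not scheduled at that cycle is unaffected.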
The authors present two families of codes to correct these synthesis defects:
t-known-synthesis-defect correcting codes (t-KDCCs):
Assume the defective cycles are known.
Provide constructions for t = 1, 2 with redundancy log 4 and log n + 18 log 3, respectively.
Show that knowing the locations of the defects helps narrow down the locations of the resulting deletions.
t-synthesis-defect correcting codes (t-SDCCs):
Assume the defective cycles are unknown.
Provide constructions for t = 1, 2 that use a two-part coding scheme.
The first part localizes the defects to a window of length O(log n) using a set of "defect-locating" strands.
The second part employs a more efficient coding scheme to correct the remaining strands.
Achieve redundancy of λ1(log n)^2 + M log log n for t = 1 and λ2(log n)^2 + 2M log n for t = 2, where M is the number of strands and λ1, λ2 are constants.
The paper also derives a lower bound on the redundancy for single-known-synthesis-defect correcting codes, showing that the constructions are almost optimal.
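To see why knowing the defective cycle narrows down where a deletion can have occurred, the following brute-force sketch enumerates the originals consistent with a received strand. It is an illustration under the same assumptions as before (periodic ACGT template, greedy scheduling, exactly one symbol lost), not a decoder from the paper. The function names are hypothetical.

```python
def schedule(strand, template):
    """Cycle index at which each symbol is appended (assumed greedy model)."""
    cycles = []
    pos = 0
    for c, nt in enumerate(template):
        if pos < len(strand) and strand[pos] == nt:
            cycles.append(c)
            pos += 1
    if pos != len(strand):
        raise ValueError("template too short for this strand")
    return cycles


def candidates_known_defect(received, template, defect_cycle):
    """Originals that yield `received` when exactly the symbol scheduled
    at the known defective cycle is lost."""
    nt = template[defect_cycle]  # only this nucleotide can have been deleted
    found = set()
    for i in range(len(received) + 1):
        original = received[:i] + nt + received[i:]
        try:
            sched = schedule(original, template)
        except ValueError:
            continue
        if sched[i] == defect_cycle:  # inserted symbol must sit at the bad cycle
            found.add(original)
    return found


def candidates_blind(received, alphabet="ACGT"):
    """Originals consistent with an arbitrary single deletion (no side information)."""
    return {received[:i] + a + received[i:]
            for i in range(len(received) + 1) for a in alphabet}


TEMPLATE = "ACGT" * 3
print(sorted(candidates_known_defect("AA", TEMPLATE, 1)))  # ['ACA', 'CAA']
print(len(candidates_blind("AA")))                         # 10
```

In this toy case the known defect location cuts the candidate set from 10 to 2. The remaining ambiguity is what the code's redundancy has to resolve.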