The paper studies the computational complexity of repairing inconsistent databases that violate integrity constraints, where the database values belong to an underlying metric space. The goal is to update the database values to restore consistency while minimizing the total distance between the original values and the repaired ones.
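For concreteness, the objective can be written as a single optimization problem. The notation here is ours and may differ from the paper's: $D$ is the original database, $D'$ a candidate repair with the same cells, $D[c]$ and $D'[c]$ the value of cell $c$ before and after the repair, $\delta$ the metric, and $\Sigma$ the set of integrity constraints.

```latex
\min_{D' \models \Sigma} \; \sum_{c \in \mathrm{cells}(D)} \delta\bigl(D[c],\, D'[c]\bigr)
```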
The authors consider coincidence constraints. These include key constraints, inclusion constraints, and foreign keys, and more generally any restriction, over a fixed set of attributes, on the relationship between the numbers of cells of different labels (attributes) that take the same value.
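To make the idea of a coincidence constraint concrete, the sketch below assumes a simplified model: a database is a list of `(label, value)` cells, and a constraint is a predicate on the label counts observed at one value. The helpers `satisfies`, `key_on_a`, `a_included_in_b`, and `brute_force_repair` are hypothetical illustrations, not the authors' definitions or algorithm. The exhaustive search is exponential and only meant for tiny inputs; it is not how the paper solves the problem, which is APX-hard in general.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Callable, Hashable, Iterable, Sequence
from itertools import product

# Assumed (hypothetical) model: a database is a list of cells, each a
# (label, value) pair; a repair may change values but never labels.
Cell = tuple[str, Hashable]
# A coincidence constraint is modeled as a predicate on the label counts
# observed at one value, e.g. Counter({"A": 2, "B": 1}).
Constraint = Callable[[Counter], bool]


def satisfies(cells: Sequence[Cell], constraint: Constraint) -> bool:
    """Check the constraint at every value taken by at least one cell.

    Values taken by no cell are skipped, i.e. we assume the all-zero
    count vector is always allowed (true for keys and inclusions)."""
    counts_by_value: dict[Hashable, Counter] = {}
    for label, value in cells:
        counts_by_value.setdefault(value, Counter())[label] += 1
    return all(constraint(counts) for counts in counts_by_value.values())


def key_on_a(counts: Counter) -> bool:
    # Key (uniqueness) constraint: at most one A-cell per value.
    return counts["A"] <= 1


def a_included_in_b(counts: Counter) -> bool:
    # Inclusion A ⊆ B: every value used by an A-cell is used by some B-cell.
    return counts["A"] == 0 or counts["B"] >= 1


def brute_force_repair(
    cells: Sequence[Cell],
    constraint: Constraint,
    candidates: Iterable[Hashable],
    dist: Callable[[Hashable, Hashable], float],
) -> tuple[float, list[Cell] | None]:
    """Try every assignment of candidate values to cells and return the
    cheapest consistent one (exponential: tiny inputs only)."""
    best_cost, best = float("inf"), None
    for values in product(candidates, repeat=len(cells)):
        repaired = [(label, v) for (label, _), v in zip(cells, values)]
        if satisfies(repaired, constraint):
            # Total distance between original and repaired values.
            cost = sum(dist(old, new) for (_, old), new in zip(cells, values))
            if cost < best_cost:
                best_cost, best = cost, repaired
    return best_cost, best


# Line metric with a key on A: two A-cells share value 1, so one must move.
line_cells = [("A", 1), ("A", 1), ("A", 5)]
print(brute_force_repair(line_cells, key_on_a, range(7), lambda x, y: abs(x - y))[0])  # 1
```

Modeling a constraint as a predicate on per-value label counts mirrors the description above: it depends only on how many cells of each label coincide at a value, not on which value it is.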
The authors first show that the problem is APX-hard for general metric spaces. They then present an algorithm that solves the problem optimally for tree metrics, which generalize both the line metric (where values are numbers and the distance is their absolute difference) and the discrete metric (where the cost is simply the number of changed values). By combining this algorithm with a classic result on probabilistic tree embeddings, the authors obtain an approximation algorithm for general metrics that achieves a logarithmic ratio with high probability.
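The claim that tree metrics generalize both the line metric and the discrete metric can be checked with the standard constructions: points on a line form a path whose edge weights are the gaps between consecutive points, and the discrete metric is realized among the leaves of a star whose edges all have weight 1/2. The helper `tree_distance` below is a hypothetical illustration of a tree metric, not part of the authors' algorithm.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Hashable, Iterable


def tree_distance(
    edges: Iterable[tuple[Hashable, Hashable, float]], u: Hashable, v: Hashable
) -> float:
    """Length of the unique u-v path in a tree given as weighted (a, b, w) edges."""
    adj: dict[Hashable, list[tuple[Hashable, float]]] = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))
    # Iterative DFS from u; since the graph is a tree, the first time we
    # reach v we have followed the unique u-v path.
    stack = [(u, None, 0.0)]
    while stack:
        node, parent, d = stack.pop()
        if node == v:
            return d
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, d + w))
    raise ValueError(f"{u!r} and {v!r} are not connected")


# Line metric on the points 0, 2, 7: a path with edge weights q - p.
points = [0, 2, 7]
line_tree = [(p, q, q - p) for p, q in zip(points, points[1:])]
print(tree_distance(line_tree, 0, 7))  # 7.0

# Discrete metric on {a, b, c}: a star whose edges have weight 1/2, so any
# two distinct leaves are at distance 1 (the hub is an extra, non-value node).
star_tree = [("hub", x, 0.5) for x in "abc"]
print(tree_distance(star_tree, "a", "b"))  # 1.0
```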
The authors also study a variant of the problem in which the change allowed to each individual value is bounded. In this variant, even deciding whether any legal repair exists is NP-complete for general metrics; for the line metric, the authors present a polynomial-time repair algorithm.
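One way to state this decision problem, in our own notation (which may differ from the paper's): $D$ is the original database, $D'$ a candidate repair, $\delta$ the metric, $\Sigma$ the constraints, and $r_c$ an assumed bound on how far the value of cell $c$ may move (possibly the same bound for every cell). The question is whether

```latex
\exists\, D' \models \Sigma \quad \text{such that} \quad \delta\bigl(D[c],\, D'[c]\bigr) \le r_c \quad \text{for every cell } c .
```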
Key insights extracted from arxiv.org
by Youri Kamins..., arxiv.org, 09-26-2024
https://arxiv.org/pdf/2409.16713.pdf