Core Concepts
The authors propose a novel tree diffing approach called SatDiff, which reformulates structural diffing as a MaxSAT problem. SatDiff generates correct, minimal, and type-safe low-level edit scripts with formal guarantees, and then synthesizes concise high-level edit scripts by merging the low-level edits in an appropriate topological order.
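In a weighted partial MaxSAT problem, every hard clause must be satisfied, each soft clause carries a weight, and the solver minimizes the total weight of the soft clauses it violates. The following is a minimal sketch of that formulation. It assumes a brute-force search in place of a real MaxSAT solver. The toy variables (`DEL_A`, `DEL_B`, `MOVE_B`), clauses, and weights are hypothetical and are not SatDiff's actual encoding.

```python
from itertools import product

def satisfied(clause, model):
    """A clause is a disjunction of literals: +v means variable v is true, -v means false."""
    return any(model[abs(lit)] == (lit > 0) for lit in clause)

def solve_maxsat(num_vars, hard, soft):
    """Brute-force weighted partial MaxSAT: all hard clauses must hold,
    and the total weight of violated soft clauses is minimized."""
    best_cost, best_model = None, None
    for values in product([False, True], repeat=num_vars):
        model = dict(zip(range(1, num_vars + 1), values))
        if not all(satisfied(c, model) for c in hard):
            continue  # violates a hard constraint, so it is not an acceptable solution
        cost = sum(w for w, c in soft if not satisfied(c, model))
        if best_cost is None or cost < best_cost:
            best_cost, best_model = cost, model
    return best_cost, best_model

# Hypothetical toy encoding for a source tree r -> a -> b,
# where node a cannot be reused in the target but its child b can.
DEL_A, DEL_B, MOVE_B = 1, 2, 3
hard = [
    [DEL_A],                  # a must be deleted
    [-DEL_A, DEL_B, MOVE_B],  # deleting a forces its child b to be deleted or moved
]
soft = [
    (1, [-DEL_A]),   # every edit has a cost...
    (2, [-DEL_B]),   # ...and deleting b costs more, since it would need re-inserting
    (1, [-MOVE_B]),
]
cost, model = solve_maxsat(3, hard, soft)
print(cost, model)  # 2 {1: True, 2: False, 3: True}
```

Exhaustive enumeration is exponential in the number of variables, so it serves only to make the hard/soft split concrete. A practical tool would hand the same kind of clauses to a dedicated MaxSAT solver.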
Abstract
The paper addresses the problem of computing differences between tree-structured data, which is critical for software analysis and evolution. Existing approaches, such as Unix diff and other text-level methods, do not consider the structure of the code, making it challenging for users to interpret the generated edit scripts.
The authors present a novel approach called SatDiff, which reformulates the tree diffing problem as a maximum satisfiability (MaxSAT) problem. This lets them use state-of-the-art SAT solvers to search for correct, minimal edits. SatDiff generates low-level edit actions, such as disconnecting edges, deleting nodes, and connecting edges, and then synthesizes high-level edit scripts, consisting of update, move, insert, and delete actions, by combining the low-level edits in the appropriate topological order.
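The merging of low-level edits into high-level ones can be pictured with a small sketch. Everything here is an assumption for illustration, not the paper's algorithm: the `Disconnect`, `Delete`, and `Connect` classes, the `merge` function, and its rules. A disconnect plus a connect of the same node becomes a move. A connect of a fresh node becomes an insert. A delete plus an insert into the same slot becomes an update. Any remaining delete stays a delete. Python's `graphlib.TopologicalSorter` (Python 3.9+) orders the result so that a freshly inserted node comes before anything attached under it.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter

# Hypothetical low-level edit actions, named after those in the summary.
@dataclass(frozen=True)
class Disconnect:
    parent: str
    slot: str
    node: str

@dataclass(frozen=True)
class Delete:
    node: str

@dataclass(frozen=True)
class Connect:
    parent: str
    slot: str
    node: str

def merge(low_level, new_nodes):
    """Merge low-level edits into high-level ones using hypothetical rules."""
    detached = {e.node: (e.parent, e.slot)
                for e in low_level if isinstance(e, Disconnect)}
    deleted = {e.node for e in low_level if isinstance(e, Delete)}
    # Slots vacated by deleted nodes: an insert into one of them becomes an update.
    freed = {detached[n]: n for n in deleted if n in detached}
    high, deps = {}, {}
    for e in low_level:
        if not isinstance(e, Connect):
            continue
        if e.node in new_nodes and (e.parent, e.slot) in freed:
            old = freed.pop((e.parent, e.slot))
            deleted.discard(old)
            high[e.node] = ("update", old, e.node)
        elif e.node in new_nodes:
            high[e.node] = ("insert", e.node, e.parent, e.slot)
        elif e.node in detached:
            high[e.node] = ("move", e.node, e.parent, e.slot)
        # A node attached under a freshly created parent must come after that parent.
        deps[e.node] = {e.parent} if e.parent in new_nodes else set()
    for n in deleted:
        high[n] = ("delete", n)
        deps.setdefault(n, set())
    order = TopologicalSorter(deps).static_order()
    return [high[n] for n in order if n in high]

# Source r(lhs=x, rhs=c1); target r(lhs=g(arg=x), rhs=c2), where g and c2 are new.
edits = [
    Disconnect("r", "lhs", "x"), Connect("r", "lhs", "g"), Connect("g", "arg", "x"),
    Disconnect("r", "rhs", "c1"), Delete("c1"), Connect("r", "rhs", "c2"),
]
for edit in merge(edits, new_nodes={"g", "c2"}):
    print(edit)
```

Six low-level edits collapse into three high-level ones here. The insert of `g` is always emitted before the move of `x` under `g`, which is the kind of ordering constraint a topological sort enforces.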
The key features of SatDiff are:
Correctness and minimality guarantees for the low-level edit scripts, achieved through the encoding of hard and soft constraints in the MaxSAT problem.
Type safety of the intermediate tree produced by each edit action, even though these trees may contain holes.
Conciseness of the high-level edit scripts, which outperform those produced by existing approaches such as truediff and Gumtree.
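One way to picture type safety with holes is the following sketch. Detaching a child leaves a typed hole behind, and only a subtree of the matching sort may later fill that hole. The `SIGNATURE` table, the `Node` and `Hole` classes, and the `well_typed`, `disconnect`, and `connect` helpers are hypothetical, chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical signature: constructor -> (result sort, {slot: required sort}).
SIGNATURE = {
    "Add": ("Exp", {"lhs": "Exp", "rhs": "Exp"}),
    "Var": ("Exp", {}),
    "Num": ("Exp", {}),
    "Return": ("Stmt", {"value": "Exp"}),
}

@dataclass
class Hole:
    sort: str  # the sort that a later connect must supply

@dataclass
class Node:
    kind: str
    children: dict = field(default_factory=dict)

def sort_of(t):
    return t.sort if isinstance(t, Hole) else SIGNATURE[t.kind][0]

def well_typed(t):
    """Every slot must hold a subtree (or a hole) of the sort the slot requires."""
    if isinstance(t, Hole):
        return True
    _, slots = SIGNATURE[t.kind]
    if set(slots) != set(t.children):
        return False
    return all(sort_of(c) == slots[s] and well_typed(c) for s, c in t.children.items())

def disconnect(parent, slot):
    """Detach a child, leaving a hole of the slot's sort so the tree stays well-typed."""
    child = parent.children[slot]
    parent.children[slot] = Hole(SIGNATURE[parent.kind][1][slot])
    return child

def connect(parent, slot, child):
    """Fill a hole, rejecting children whose sort does not match it."""
    hole = parent.children[slot]
    if not isinstance(hole, Hole) or hole.sort != sort_of(child):
        raise TypeError(f"cannot put a {sort_of(child)} into {parent.kind}.{slot}")
    parent.children[slot] = child

# return x + 1  ->  return x, checking the intermediate tree after each edit.
tree = Node("Return", {"value": Node("Add", {"lhs": Node("Var"), "rhs": Node("Num")})})
add = disconnect(tree, "value")
print(well_typed(tree), tree.children["value"])  # True Hole(sort='Exp')
connect(tree, "value", add.children["lhs"])
print(well_typed(tree))  # True
```

Because `disconnect` always leaves a hole of the slot's sort, every intermediate tree passes `well_typed`, while a `connect` that would place a statement into an expression slot raises `TypeError`.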
The authors also present an ablation study demonstrating the effectiveness of their encoding constraints, and a case study examining the discrepancies between the outputs of SatDiff and Gumtree.
Stats
The paper does not contain any explicit numerical data or metrics. The focus is on the algorithmic approach and its theoretical properties.