Core Concepts
This paper proposes a method for simplifying Mixed Boolean-Arithmetic (MBA) expressions by term rewriting over an E-Graph, a data structure that compactly represents many expressions with the same semantics. The approach targets two weaknesses of existing MBA deobfuscation techniques: performance and preservation of semantics.
Abstract
The paper discusses the problem of MBA obfuscation, where programs are transformed into a more complex form using a mixture of Boolean and arithmetic operations to impede reverse engineering and analysis. Existing deobfuscation techniques, such as those based on SMT solvers, have limitations in handling complex MBA expressions.
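To make the idea of MBA obfuscation concrete, the sketch below checks one well-known linear MBA identity, x + y = (x ^ y) + 2 * (x & y), exhaustively on 8-bit words. The identity is a standard textbook example and is not taken from the paper's datasets; the helper names (`plain`, `obfuscated`, `equivalent`) and the 8-bit width are illustrative assumptions.

```python
# Exhaustively check a classic linear MBA identity on 8-bit machine words.
BITS = 8                    # assumed word size, kept small so the check is exhaustive
MASK = (1 << BITS) - 1


def plain(x, y):
    """The original expression: wrap-around addition."""
    return (x + y) & MASK


def obfuscated(x, y):
    """An MBA rewrite of x + y that mixes XOR, AND and arithmetic."""
    return ((x ^ y) + 2 * (x & y)) & MASK


def equivalent(f, g):
    """True if f and g agree on every pair of BITS-wide inputs."""
    values = range(1 << BITS)
    return all(f(x, y) == g(x, y) for x in values for y in values)
```

Calling `equivalent(plain, obfuscated)` confirms that the two forms compute the same function, which is exactly what makes the obfuscated form valid and what a deobfuscator has to undo.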
The key points are:
MBA expressions can be classified into linear and polynomial forms, with the latter being more challenging to simplify.
The E-Graph data structure is introduced as a way to efficiently represent and manipulate multiple expressions with the same semantics during the term rewriting process.
The paper describes the implementation of an MBA expression simplifier using the Rust E-Graph library, including preprocessing steps and the application of basic rewriting rules.
Experimental results show that the proposed E-Graph-based approach simplifies a large portion of the tested MBA expressions (success rates range from 58.23% to 98.40% depending on the dataset), with performance the authors describe as reasonable compared to other deobfuscation techniques.
The authors identify the need to further improve the simplification of polynomial MBA expressions and explore the integration of constant folding techniques to enhance the overall deobfuscation capabilities.
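The E-Graph idea in the key points above can be sketched in a few dozen lines. The following is a minimal illustration, not the authors' implementation and not the Rust library the paper uses: it is a naive Python e-graph (union-find plus a hashcons table) with simple pattern matching and smallest-term extraction. The class `EGraph`, its method names, the `?a`-style pattern variables, and the single rewrite rule are all assumptions made for this example.

```python
class EGraph:
    """Tiny e-graph: union-find over e-classes plus a hashcons of e-nodes."""

    def __init__(self):
        self.parent = []    # union-find parent pointers, one per e-class id
        self.hashcons = {}  # canonical e-node (op, *child_ids) -> e-class id

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def canon(self, node):
        return (node[0], *(self.find(k) for k in node[1:]))

    def add(self, op, *kids):
        node = self.canon((op, *kids))
        if node not in self.hashcons:
            self.parent.append(len(self.parent))
            self.hashcons[node] = len(self.parent) - 1
        return self.find(self.hashcons[node])

    def merge(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.rebuild()
        return self.find(a)

    def rebuild(self):
        # Restore congruence: nodes that became identical force their classes to merge.
        changed = True
        while changed:
            changed, fresh = False, {}
            for node, cid in self.hashcons.items():
                node, cid = self.canon(node), self.find(cid)
                other = self.find(fresh.setdefault(node, cid))
                if other != cid:
                    self.parent[cid] = other
                    changed = True
            self.hashcons = fresh

    def ematch(self, pat, cid, env):
        """Yield variable bindings under which pattern `pat` matches e-class `cid`."""
        if isinstance(pat, str):  # pattern variable such as "?a"
            if pat not in env:
                yield {**env, pat: cid}
            elif self.find(env[pat]) == cid:
                yield env
            return
        op, *pkids = pat
        for node, ncid in list(self.hashcons.items()):
            if self.find(ncid) == cid and node[0] == op and len(node) - 1 == len(pkids):
                envs = [env]
                for pk, k in zip(pkids, node[1:]):
                    envs = [e2 for e in envs for e2 in self.ematch(pk, self.find(k), e)]
                yield from envs

    def instantiate(self, pat, env):
        if isinstance(pat, str):
            return env[pat]
        return self.add(pat[0], *(self.instantiate(k, env) for k in pat[1:]))

    def rewrite(self, rules, iters=3):
        for _ in range(iters):
            classes = {self.find(c) for c in self.hashcons.values()}
            matches = [(rhs, env, cid) for lhs, rhs in rules
                       for cid in classes for env in self.ematch(lhs, cid, {})]
            for rhs, env, cid in matches:  # merging never deletes the old form
                self.merge(cid, self.instantiate(rhs, env))

    def extract(self, cid):
        """Return (size, term) for the smallest term in the e-class of `cid`."""
        best, changed = {}, True
        while changed:
            changed = False
            for node, c in self.hashcons.items():
                c, kids = self.find(c), [self.find(k) for k in node[1:]]
                if all(k in best for k in kids):
                    size = 1 + sum(best[k][0] for k in kids)
                    if c not in best or size < best[c][0]:
                        best[c] = (size, (node[0], *(best[k][1] for k in kids)))
                        changed = True
        return best[self.find(cid)]


# Usage: simplify (x ^ y) + 2 * (x & y) with one hypothetical rewrite rule.
g = EGraph()
x, y, two = g.add("x"), g.add("y"), g.add("2")
root = g.add("+", g.add("^", x, y), g.add("*", two, g.add("&", x, y)))
rules = [(("+", ("^", "?a", "?b"), ("*", ("2",), ("&", "?a", "?b"))),
          ("+", "?a", "?b"))]
g.rewrite(rules)
size, term = g.extract(root)
```

The point of the structure is that a rewrite adds the new form to the same equivalence class instead of replacing the old one, so rule order does not matter and the smallest representative is chosen only at extraction time. A production e-graph library would add indexing, deferred rebuilding, and richer cost functions that this sketch omits.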
Stats
The paper presents the following key statistics:
Tigress dataset: 323 total expressions, 267 successfully simplified (82.66% success rate), 69% simplification ratio, 3.98s average time.
Qsynth Custom EA dataset: 501 total expressions, 493 successfully simplified (98.40% success rate), 65.67% simplification ratio, 72.79s average time.
MBA Solver (Linear) dataset: 1008 total expressions, 818 successfully simplified (81.15% success rate), 93.26% simplification ratio, 41.13s average time.
MBA Solver (Non-polynomial) dataset: 1003 total expressions, 949 successfully simplified (94.61% success rate), 93.26% simplification ratio, 239.04s average time.
MBA Solver (Polynomial) dataset: 1008 total expressions, 587 successfully simplified (58.23% success rate), 94.91% simplification ratio, 27.15s average time.
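As a quick sanity check on the figures above, the success rate of each dataset can be recomputed from its counts. The snippet below is a small sketch: the dictionary `RESULTS` simply restates the totals and successes listed above, and the function name `success_rate` is an illustrative choice.

```python
# (total expressions, successfully simplified) per dataset, as listed above.
RESULTS = {
    "Tigress": (323, 267),
    "Qsynth Custom EA": (501, 493),
    "MBA Solver (Linear)": (1008, 818),
    "MBA Solver (Non-polynomial)": (1003, 949),
    "MBA Solver (Polynomial)": (1008, 587),
}


def success_rate(total, simplified):
    """Percentage of expressions that were successfully simplified."""
    return 100.0 * simplified / total


rates = {name: success_rate(*counts) for name, counts in RESULTS.items()}
```

The recomputed rates agree with the listed percentages to within 0.01 percentage points.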