Deep neural networks (DNNs) may develop abstract internal representations, termed "symbols," which can be extracted and used to understand, improve, and safeguard DNN decision-making.
This paper proposes a novel framework and method for disambiguating multiple class-label predictions in deep-learning image recognition: it determines whether the predicted labels stem from distinct entities in the image or from a single entity that has been misidentified, and it provides verifiable counterfactual proofs that increase confidence in the model's interpretations.
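As a rough illustration of the kind of disambiguation the framework targets, the sketch below checks whether two predicted labels are supported by distinct image regions (suggesting two entities) or by the same region (suggesting one misidentified entity). Everything here is an assumption for illustration, not the paper's method: the gradient-saliency attribution, the IoU threshold of 0.3, the occlusion-based counterfactual check, and the choice of a pretrained ResNet-18 are all placeholders.

```python
# Illustrative sketch only: a saliency-overlap heuristic for two-label
# disambiguation. The paper's actual framework and counterfactual proofs
# are not reproduced here; all thresholds and tests are assumptions.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def saliency_mask(x, class_idx, quantile=0.95):
    """Binary mask of the pixels most responsible for a class score
    (simple input-gradient saliency)."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, class_idx]
    score.backward()
    sal = x.grad.abs().max(dim=1)[0]      # max over color channels -> (1, H, W)
    return sal >= torch.quantile(sal, quantile)

x = torch.rand(1, 3, 224, 224)            # stand-in for a preprocessed image
top2 = model(x).topk(2).indices[0]        # two highest-scoring class labels
m_a = saliency_mask(x, top2[0])
m_b = saliency_mask(x, top2[1])

# Low spatial overlap between the two labels' evidence suggests two
# distinct entities; high overlap suggests one ambiguously labeled entity.
iou = (m_a & m_b).sum().item() / max((m_a | m_b).sum().item(), 1)
print("distinct entities" if iou < 0.3 else "single entity, ambiguous label")

# Crude counterfactual check (an assumption, not the paper's proof
# procedure): occlude the region supporting the top label; if the second
# label still wins, its evidence comes from elsewhere in the image.
x_cf = x * (~m_a).unsqueeze(1).float()
print("second label persists:", model(x_cf).argmax().item() == top2[1].item())
```

In practice a heuristic like this would be run on real preprocessed images rather than random tensors, and a verifiable proof would require a more principled counterfactual construction than simple occlusion.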