Key Concepts
Deep neural networks (DNNs) may develop abstract internal representations, termed "symbols," which can be extracted and used to understand, improve, and safeguard DNN decision-making.
Statistics
Predictions made from symbols extracted at layer 4 reach around 80% accuracy.
The accuracy of symbol-based predictions on adversarial inputs remains high even when the accuracy of the DNN's original predictions drops significantly.
On out-of-distribution (OOD) examples, predictions made using a temporary mapping of symbols reach around 40-50% accuracy.
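
To make "prediction accuracy based on symbols" concrete, here is a minimal sketch of one way such a number could be computed. It is not the method from the summarized work, which this summary does not describe. It assumes, purely for illustration, that a symbol is the index of the nearest k-means centroid of a layer's activations, and that a symbol predicts the class most often seen with it. All names (`kmeans`, `to_symbols`, `fit_symbol_to_label`, `symbol_accuracy`) and the synthetic data are hypothetical.

```python
import numpy as np


def to_symbols(acts, centroids):
    """Map each activation vector to the index of its nearest centroid."""
    d = ((acts[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(acts, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for symbol extraction)."""
    rng = np.random.default_rng(seed)
    centroids = acts[rng.choice(len(acts), size=k, replace=False)]
    for _ in range(iters):
        assign = to_symbols(acts, centroids)
        for j in range(k):
            members = acts[assign == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids


def fit_symbol_to_label(symbols, labels, k, n_classes):
    """For each symbol, pick the class it co-occurs with most often."""
    counts = np.zeros((k, n_classes), dtype=int)
    np.add.at(counts, (symbols, labels), 1)
    return counts.argmax(axis=1)


def symbol_accuracy(symbols, labels, symbol_to_label):
    """Fraction of examples whose symbol maps to the correct class."""
    return float((symbol_to_label[symbols] == labels).mean())


def demo():
    # Synthetic stand-in for hidden-layer activations: two separated blobs.
    rng = np.random.default_rng(0)
    acts = np.concatenate([
        rng.normal(-5.0, 0.5, size=(100, 8)),
        rng.normal(5.0, 0.5, size=(100, 8)),
    ])
    labels = np.array([0] * 100 + [1] * 100)
    centroids = kmeans(acts, k=4)
    symbols = to_symbols(acts, centroids)
    mapping = fit_symbol_to_label(symbols, labels, k=4, n_classes=2)
    return symbol_accuracy(symbols, labels, mapping)


if __name__ == "__main__":
    print(f"symbol-based accuracy on synthetic data: {demo():.2f}")
```

In a real evaluation the symbol-to-label mapping would be fitted on training data and scored on held-out, adversarial, or OOD inputs. The sketch scores on the same data only to stay short.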