Core Concepts
Introducing semantic features as a conceptual framework for transparent and robust white box neural networks.
Abstract
This paper proposes a new paradigm for training models by introducing semantic features into white box neural networks. The focus is on theoretical aspects rather than quantitative metrics, with the aim of building interpretable models. The proof-of-concept model is trained on a Minimum Viable Dataset (MVD): the subset of MNIST containing only the digits "3" and "5". The architecture stacks layers built around semantic features: real-valued, convolutional, affine, and logical features. Training results show high adversarial accuracy without adversarial training, minimal hyperparameter tuning, and fast training on a single CPU. Directions for further research include self-supervised learning with semantic features and exploring more complex logical features.
Stats
A well-motivated proof-of-concept model consists of 4 layers with ~4.8K learnable parameters.
Model achieves human-level adversarial test accuracy without adversarial training.
Training time on a single CPU is around 9 seconds per epoch.
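To make the scale of the model concrete, here is a minimal forward-pass sketch of a 4-layer stack matching the semantic-feature types named in the abstract (real-valued, convolutional, affine, logical). The layer widths, kernel size, and the soft-AND logical pairing are assumptions chosen only so the parameter count lands near the reported ~4.8K; the paper's actual architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen so the parameter count lands near ~4.8K.
N_FILTERS, KERNEL = 8, 5          # convolutional feature layer
N_AFFINE = 4                      # affine feature layer width

conv_w = rng.standard_normal((N_FILTERS, KERNEL, KERNEL)) * 0.1
conv_b = np.zeros(N_FILTERS)
aff_w = rng.standard_normal((N_AFFINE, N_FILTERS * 12 * 12)) * 0.01
aff_b = np.zeros(N_AFFINE)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(img):
    """img: (28, 28) grayscale digit -> (p_three, p_five)."""
    # 1. Real-valued features: normalised pixel intensities.
    x = img / 255.0
    # 2. Convolutional features: valid conv + ReLU + 2x2 mean pool.
    H = 28 - KERNEL + 1                              # 24
    conv = np.zeros((N_FILTERS, H, H))
    for f in range(N_FILTERS):
        for i in range(H):
            for j in range(H):
                conv[f, i, j] = np.sum(x[i:i+KERNEL, j:j+KERNEL] * conv_w[f]) + conv_b[f]
    conv = np.maximum(conv, 0.0)
    pooled = conv.reshape(N_FILTERS, 12, 2, 12, 2).mean(axis=(2, 4))
    # 3. Affine features, squashed into [0, 1].
    a = sigmoid(aff_w @ pooled.ravel() + aff_b)
    # 4. Logical features: soft AND (product) of paired affine features.
    p_three = a[0] * a[1]
    p_five = a[2] * a[3]
    return p_three, p_five

n_params = conv_w.size + conv_b.size + aff_w.size + aff_b.size
print(n_params)  # 4820, i.e. ~4.8K learnable parameters
```

With 200 + 8 convolutional, 4608 + 4 affine, and zero logical-layer parameters, the total is 4820, consistent with the ~4.8K figure above; the logical layer here is deliberately parameter-free, since soft conjunctions need no weights.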
Quotes
"The general nature of the technique bears promise for a paradigm shift towards radically democratised and truly generalizable white box neural networks."
"The discrepancy between animal brains' learning abilities and current neural network limitations indicates a need for simplified AI."
"The model achieves ~92% accuracy under AutoAttack with strong adversarial regime."