Core Concepts
Integrating transfer entropy (TE) feedback connections into the training of convolutional neural networks (CNNs) can accelerate convergence and improve classification accuracy, at the cost of added per-epoch computational overhead.
Summary
The paper explores how to incorporate transfer entropy (TE) into the learning mechanisms of convolutional neural network (CNN) architectures. TE quantifies the directional information transfer between neurons in different layers of the CNN.
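TE from a source neuron to a target neuron is estimated from their activation time series: it measures how much knowing the source's past reduces uncertainty about the target's next state, beyond what the target's own past already explains. Below is a minimal sketch of such an estimator, assuming binary discretization, history length 1, and a plain histogram (plug-in) probability estimate; the paper's exact discretization and history settings are not reproduced here.

```python
import numpy as np

def transfer_entropy(source, target, bins=2):
    """Estimate TE(source -> target) for two discretized activation time
    series, using history length 1 and a plug-in (histogram) probability
    estimator.  Hypothetical helper for illustration, not the paper's code."""
    src = np.digitize(source, np.histogram_bin_edges(source, bins)[1:-1])
    tgt = np.digitize(target, np.histogram_bin_edges(target, bins)[1:-1])

    x_next, x_prev, y_prev = tgt[1:], tgt[:-1], src[:-1]
    te = 0.0
    for xn in np.unique(x_next):
        for xp in np.unique(x_prev):
            for yp in np.unique(y_prev):
                # joint probability p(x_{t+1}, x_t, y_t) from counts
                p_xyz = np.mean((x_next == xn) & (x_prev == xp) & (y_prev == yp))
                if p_xyz == 0:
                    continue
                p_xy = np.mean((x_prev == xp) & (y_prev == yp))   # p(x_t, y_t)
                p_xx = np.mean((x_next == xn) & (x_prev == xp))   # p(x_{t+1}, x_t)
                p_x = np.mean(x_prev == xp)                       # p(x_t)
                # ratio p(x_{t+1} | x_t, y_t) / p(x_{t+1} | x_t)
                te += p_xyz * np.log2((p_xyz / p_xy) / (p_xx / p_x))
    return te
```

In the CNN setting, `source` and `target` would be the activation values of two neurons recorded over a window of training samples.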
Key highlights:
- TE can be used to measure the causal relationships and information flow between artificial neurons in a feedforward neural network.
- The authors introduce a novel training mechanism for CNNs that integrates TE feedback connections between neurons.
- Adding the TE feedback parameter accelerates the training process, as fewer epochs are needed to reach a target accuracy. However, it also adds computational overhead to each epoch.
- The authors find that it is efficient to consider only the inter-neural information transfer of a random subset of neuron pairs from the last two fully connected layers of the CNN (see the sketch after this list).
- The TE acts as a smoothing factor, becoming active only periodically during training, which helps generate stability.
- Experiments are conducted on several benchmark image classification datasets, including CIFAR-10, FashionMNIST, STL-10, SVHN, and USPS.
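The paper's exact update rule is not reproduced here, but the subsampling and periodic-activation ideas from the highlights above can be sketched roughly as follows: sample a fixed random subset of neuron pairs from the last two fully connected layers, estimate TE for those pairs every few epochs, and modulate the corresponding weight updates. The (1 + TE) scaling, the epoch period of 5, and the stand-in data below are all assumptions for illustration; `transfer_entropy` is the estimator sketched earlier.

```python
import numpy as np

# Assumes the `transfer_entropy` helper from the earlier sketch is in scope.

def te_scaling_matrix(src_acts, tgt_acts, pairs, shape):
    """Per-weight scaling factors for the layer connecting the last two fully
    connected layers.  Each sampled (source, target) neuron pair gets the
    factor (1 + TE); every other weight keeps factor 1.  The (1 + TE) form is
    an illustrative assumption, not the authors' formula."""
    scale = np.ones(shape)                        # shape = (n_target, n_source)
    for i, j in pairs:                            # i: source neuron, j: target neuron
        scale[j, i] = 1.0 + transfer_entropy(src_acts[:, i], tgt_acts[:, j])
    return scale

# Illustrative training-loop fragment with stand-in data (not a real CNN).
n_src, n_tgt, n_samples = 128, 10, 512
rng = np.random.default_rng(0)
pairs = list(zip(rng.integers(0, n_src, 64), rng.integers(0, n_tgt, 64)))

scale = np.ones((n_tgt, n_src))
for epoch in range(30):
    # src_acts / tgt_acts stand in for activations recorded during the epoch.
    src_acts = rng.random((n_samples, n_src))
    tgt_acts = rng.random((n_samples, n_tgt))
    grad = rng.standard_normal((n_tgt, n_src))    # stand-in weight gradient

    if epoch % 5 == 0:                            # TE feedback active only periodically
        scale = te_scaling_matrix(src_acts, tgt_acts, pairs, (n_tgt, n_src))

    update = scale * grad                         # TE-modulated gradient step
    # weights -= learning_rate * update           # applied to the real layer weights
```

Restricting TE to a random subset of pairs and refreshing it only every few epochs keeps the estimator's cost small relative to a full forward/backward pass, which is consistent with the overhead trade-off described above.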
Statistics
Epochs needed to reach the target accuracy with and without the TE feedback mechanism:

| Dataset | Target accuracy | Epochs with TE | Epochs without TE |
|---|---|---|---|
| CIFAR-10 | 98% | 5 | 6 |
| FashionMNIST | 97% | 23 | 28 |
| STL-10 | 98% | 5 | 7 |
| SVHN | 94% | 9 | 11 |
| USPS | 99% | 3 | 3 |
Quotes
"Adding the TE feedback parameter accelerates the training process, as fewer epochs are needed. On the flip side, it adds computational overhead to each epoch."
"According to our experiments, it is efficient to consider only the inter-neural information transfer of a random subset of the neuron pairs from the last two fully connected layers."
"The TE acts as a smoothing factor, becoming active only periodically during training, which helps generate stability."