Key Concepts
This research introduces a novel pyramid EATFormer architecture that leverages Vision Transformers and Evolutionary Algorithms to significantly improve traffic sign recognition accuracy and efficiency.
Summary
This research explores the use of Vision Transformers for traffic sign recognition, a critical task for driver assistance systems and autonomous vehicles. The authors propose a novel pyramid EATFormer architecture that combines the strengths of Vision Transformers and Evolutionary Algorithms.
Key highlights:
- Compares the performance of three Vision Transformer variants (PVT, TNT, LNL) and six convolutional neural networks (AlexNet, ResNet, VGG16, MobileNet, EfficientNet, GoogleNet) as baseline models.
- Introduces a pyramid EATFormer backbone that incorporates an Evolutionary Algorithm-based Transformer (EAT) block, consisting of three improved modules: Feed-Forward Network (FFN), Global and Local Interaction (GLI), and Multi-Scale Region Aggregation (MSRA); see the EAT block sketch after this list.
- Designs a Modulated Deformable Multi-head Self-Attention (MD-MSA) module to dynamically model irregular locations; see the MD-MSA sketch after this list.
- Evaluates the proposed approach on the GTSRB and BelgiumTS datasets, demonstrating significant improvements in prediction speed and accuracy compared to state-of-the-art methods.
- Highlights the potential of Vision Transformers for practical applications in traffic sign recognition, benefiting driver assistance systems and autonomous vehicles.
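The summary only names the three EAT modules, so for intuition here is a minimal PyTorch sketch of how one EAT block could compose MSRA, GLI, and FFN with residual connections. The module internals below (dilation rates, the depth-wise conv branches, the learnable branch-balance weight) are illustrative assumptions, not the authors' exact implementation, and normalization layers are omitted for brevity.

```python
import torch
import torch.nn as nn

class MSRA(nn.Module):
    """Multi-Scale Region Aggregation: depth-wise convs at several
    dilation rates, fused by a 1x1 conv (simplified assumption)."""
    def __init__(self, dim, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(dim, dim, 3, padding=d, dilation=d, groups=dim)
            for d in dilations
        ])
        self.fuse = nn.Conv2d(dim * len(dilations), dim, 1)

    def forward(self, x):  # x: (B, C, H, W)
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class GLI(nn.Module):
    """Global and Local Interaction: a global self-attention branch in
    parallel with a local depth-wise conv branch, mixed by a learnable weight."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.alpha = nn.Parameter(torch.tensor(0.5))  # global/local balance

    def forward(self, x):  # x: (B, C, H, W)
        B, C, H, W = x.shape
        seq = x.flatten(2).transpose(1, 2)             # (B, H*W, C)
        g, _ = self.attn(seq, seq, seq)
        g = g.transpose(1, 2).reshape(B, C, H, W)
        return self.alpha * g + (1 - self.alpha) * self.local(x)

class EATBlock(nn.Module):
    """One EAT block: MSRA -> GLI -> FFN, each with a residual connection."""
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()
        self.msra = MSRA(dim)
        self.gli = GLI(dim)
        self.ffn = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1), nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x):
        x = x + self.msra(x)
        x = x + self.gli(x)
        return x + self.ffn(x)

# quick shape check
x = torch.randn(1, 64, 32, 32)
print(EATBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

A pyramid backbone would stack such blocks at progressively smaller spatial resolutions and larger channel widths, in the spirit of PVT-style hierarchies.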
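The MD-MSA idea of "dynamically modeling irregular locations" can likewise be sketched: each spatial location predicts a sampling offset (where to re-sample the feature map) and a modulation scalar (how much to trust the re-sampled feature), then standard multi-head self-attention runs on the modulated result. The offset and modulation heads, the tanh/sigmoid bounding, and the bilinear re-sampling below are assumptions made for a self-contained example, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDMSA(nn.Module):
    """Modulated Deformable MSA sketch: predict per-location offsets and
    modulation, re-sample the feature map, then apply standard MSA."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.offset = nn.Conv2d(dim, 2, 3, padding=1)      # (dx, dy) per location
        self.modulation = nn.Conv2d(dim, 1, 3, padding=1)  # per-location weight
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, C, H, W)
        B, C, H, W = x.shape
        # base sampling grid in [-1, 1]; grid_sample expects (x, y) order
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device),
            indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
        # predicted offsets shift the grid; tanh keeps them bounded
        off = torch.tanh(self.offset(x)).permute(0, 2, 3, 1)  # (B, H, W, 2)
        sampled = F.grid_sample(x, grid + off, align_corners=True)
        # modulation re-weights the irregularly sampled features
        sampled = sampled * torch.sigmoid(self.modulation(x))
        seq = sampled.flatten(2).transpose(1, 2)               # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        return out.transpose(1, 2).reshape(B, C, H, W)

x = torch.randn(1, 64, 16, 16)
print(MDMSA(64)(x).shape)  # torch.Size([1, 64, 16, 16])
```

The appeal for traffic signs is that offsets let attention sample from slightly shifted or distorted positions, which suits signs seen at odd angles, scales, or partial occlusion.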
Statistics
The proposed model achieves an accuracy of 98.41% on the GTSRB dataset, outperforming AlexNet, ResNet, VGG16, EfficientNet, GoogleNet, PVT, and LNL.
On the BelgiumTS dataset, the proposed model achieves an accuracy of 92.16%, outperforming AlexNet by 21.45 percentage points, EfficientNet by 8.08 percentage points, TNT by 9.01 percentage points, and LNL by 7.51 percentage points.
Quotes
"This study explores three variants of Vision Transformers (PVT, TNT, LNL) and six convolutional neural networks (AlexNet, ResNet, VGG16, MobileNet, EfficientNet, GoogleNet) as baseline models."
"We provide a pioneering pyramid EATFormer architecture that incorporates the suggested EA-based Transformer (EAT) block."
"Experimental evaluations on the GTSRB and BelgiumTS datasets demonstrate the efficacy of the proposed approach in enhancing both prediction speed and accuracy."