Key Concept
A novel 3D viewpoint data augmentation technique can significantly improve the performance of deep learning models for wine label recognition, even when training data is extremely limited.
Abstract
The paper introduces a novel 3D viewpoint data augmentation technique to address the challenge of insufficient training data in the field of complex image recognition, specifically for wine label recognition. The proposed method generates visually realistic training samples from a single real-world wine label image, overcoming the challenges posed by the intricate combinations of text and logos.
The key steps of the 3D viewpoint augmentation process are:
- Convert the 3D cylindrical surface of the wine label into a 2D representation by identifying the upper and lower elliptical rims and the two straight longitudinal edges.
- Extract 2D line samples along the label's longitudinal direction, using the vanishing points derived from the longitudinal edges.
- Map the line samples onto an image of a cylindrical surface in a different pose, using the view-invariant cross-ratio so that the wine label keeps the correct perspective.
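The last step rests on a standard property of projective geometry: the cross-ratio of four collinear points does not change under a perspective view change. The following is a minimal Python sketch of that property only, not the authors' implementation. The function names, the sample point positions, and the coefficients of `projective_map` are illustrative assumptions.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by scalar positions on the line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, p=2.0, q=1.0, r=0.1, s=1.0):
    """A 1D projective map x -> (p*x + q) / (r*x + s).

    The coefficients are arbitrary (hypothetical) values standing in for
    the effect of viewing the same line from a different pose.
    """
    return (p * x + q) / (r * x + s)


def fourth_point(a, b, c, cr):
    """Solve cross_ratio(a, b, c, d) == cr for d.

    From cr * (c - b) * (d - a) == (c - a) * (d - b).
    """
    k = cr * (c - b)
    m = c - a
    return (k * a - m * b) / (k - m)


# Four sample positions along one line in the source view.
source = [0.0, 1.0, 2.0, 3.0]
# The same four points as seen after the (assumed) view change.
target = [projective_map(x) for x in source]

# The cross-ratio is the same in both views, so knowing three mapped
# points and the source cross-ratio is enough to place the fourth.
cr_source = cross_ratio(*source)
cr_target = cross_ratio(*target)
recovered = fourth_point(target[0], target[1], target[2], cr_source)
```

Here `recovered` matches `target[3]` up to floating-point error, which is the sense in which the cross-ratio lets sample positions be transferred between two views of the same line.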
The authors then train a Vision Transformer (ViT) on the augmented images with batch-all triplet metric learning to obtain discriminative embedding features for each wine label. This enables one-shot recognition both of wine labels already in the training classes and of newly collected wine labels.
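To make "batch-all" concrete, the sketch below computes a triplet loss over every valid (anchor, positive, negative) combination in a batch of embeddings. It is a plain-Python illustration under stated assumptions, not the paper's training code: the margin value, the Euclidean distance, and averaging over only the triplets with non-zero loss are assumptions, and `batch_all_triplet_loss` is a hypothetical helper name.

```python
import math


def batch_all_triplet_loss(embeddings, labels, margin=0.2):
    """Mean hinge loss over all valid triplets in the batch.

    A triplet (a, p, n) is valid when a != p, labels[a] == labels[p],
    and labels[n] != labels[a]. The margin of 0.2 is an assumed value.
    Returns (mean loss over active triplets, number of active triplets).
    """
    n_items = len(labels)
    active = []
    for a in range(n_items):
        for p in range(n_items):
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = math.dist(embeddings[a], embeddings[p])
            for n in range(n_items):
                if labels[n] == labels[a]:
                    continue
                d_an = math.dist(embeddings[a], embeddings[n])
                loss = max(d_ap - d_an + margin, 0.0)
                if loss > 0.0:
                    active.append(loss)
    mean_loss = sum(active) / len(active) if active else 0.0
    return mean_loss, len(active)


# Two classes with two samples each (toy 2D embeddings).
labels = [0, 0, 1, 1]
separated = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
collapsed = [(0.0, 0.0)] * 4

loss_separated, active_separated = batch_all_triplet_loss(separated, labels)
loss_collapsed, active_collapsed = batch_all_triplet_loss(collapsed, labels)
```

With well-separated classes no triplet violates the margin, so the loss is zero. When all embeddings collapse to one point, every valid triplet is active and the loss equals the margin.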
Experimental results show that the proposed 3D viewpoint augmentation can significantly increase the recognition accuracy by more than 14.6% over conventional 2D data augmentation techniques. The authors also demonstrate that replacing the black background with randomly sourced background images further improves the recognition accuracy by 3.27%.
Statistics
"Confronting the critical challenge of insufficient training data in the field of complex image recognition, this paper introduces a novel 3D viewpoint augmentation technique specifically tailored for wine label recognition."
"Experimental results show a significant increase in recognition accuracy over conventional 2D data augmentation techniques."
"The ViT-S/16 model achieved the highest Top-1 accuracy performance of 91.15% using the proposed 3D viewpoint augmentation."
"After replacing the black background with randomly sourced background images, the accuracy of recognition may increase by 3.27%."
Quotes
"Our proposed solution leverages time-tested computer vision and image processing strategies to expand our training dataset, thereby broadening the range of training samples for deep learning applications."
"By combining 3D viewpoint training data augmentation for metric learning of embedding features, we have developed an efficient and precise wine label recognition system."
"This innovative approach to data augmentation circumvents the constraints of limited training resources."