Core Concepts
Combining pre-trained language models with learnable prompts in a straightforward framework improves few-shot learning performance.
Abstract
This analysis examines the Semantic-based Few-Shot Learning framework proposed by Zhou et al. The study leverages pre-trained language models and learnable prompts to improve few-shot learning tasks. The framework simplifies multi-modal fusion, employs self-ensemble and distillation, and achieves strong results across multiple datasets.
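The self-ensemble and distillation step mentioned above can be sketched as a standard knowledge-distillation objective: a cross-entropy term on the ground-truth labels blended with a KL term against softened teacher predictions. The function name, temperature, and weighting below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend cross-entropy on ground truth with KL divergence to the
    temperature-softened teacher distribution (hedged sketch of distillation)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),   # student log-probs
        F.softmax(teacher_logits / T, dim=-1),       # softened teacher targets
        reduction="batchmean",
    ) * (T * T)  # standard temperature-squared scaling
    return alpha * ce + (1 - alpha) * kd
```

The temperature T controls how much of the teacher's "dark knowledge" (relative probabilities of wrong classes) is transferred; alpha trades off hard-label supervision against the distilled signal.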
Introduction
Few-shot learning remains a challenge despite advancements in deep learning.
Leveraging semantic information aids in recognizing novel classes.
Related Work
Earlier methods such as ProtoNet, MAML, and GNNFSL enhance feature representation for few-shot recognition.
Preliminary
The problem is formulated as recognizing samples from novel classes given only a few labeled examples per class (the N-way K-shot setting).
Meta-training alleviates overfitting by pre-training on a larger base dataset of disjoint classes.
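The episodic N-way K-shot formulation above can be sketched as a sampling routine that draws a small support set (for adaptation) and a query set (for evaluation) from a labeled dataset. The dataset layout and function names here are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=15):
    """Sample one N-way K-shot episode: a labeled support set and a
    disjoint query set, mimicking the test-time few-shot setting.
    `dataset` is assumed to be a list of (example, class_label) pairs."""
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)
    # Pick N classes that have enough examples for support + query.
    eligible = [c for c, xs in by_class.items() if len(xs) >= k_shot + q_queries]
    classes = random.sample(eligible, n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):  # relabel classes 0..N-1
        xs = random.sample(by_class[c], k_shot + q_queries)
        support += [(x, episode_label) for x in xs[:k_shot]]
        query += [(x, episode_label) for x in xs[k_shot:]]
    return support, query
```

A 5-way 1-shot episode with 15 queries per class, as in the experiments, yields 5 support pairs and 75 query pairs.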
Method
Utilizes visual and textual backbones for feature extraction.
Implements multi-modal feature fusion with a simple addition operation.
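Addition-based fusion of the two backbones' outputs can be sketched as projecting the textual feature into the visual feature space and summing. The module name, dimensions, and projection layers below are assumptions, a minimal sketch rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdditiveFusion(nn.Module):
    """Fuse visual and textual features by projecting them to a shared
    dimension and adding them elementwise (sketch of addition-based fusion).
    Dimensions are placeholders, not the paper's actual backbone sizes."""
    def __init__(self, visual_dim=640, text_dim=512, shared_dim=640):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.visual_proj = (nn.Identity() if visual_dim == shared_dim
                            else nn.Linear(visual_dim, shared_dim))

    def forward(self, visual_feat, text_feat):
        # Simple addition replaces heavier attention-based fusion schemes.
        return self.visual_proj(visual_feat) + self.text_proj(text_feat)
```

Because addition has no fusion-specific parameters beyond the projections, it keeps the framework simple, which is the point the summary emphasizes.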
Experiments
Conducted on four datasets: miniImageNet, tieredImageNet, CIFAR-FS, FC100.
SimpleFSL and SimpleFSL++ outperform state-of-the-art methods in 5-way 1-shot tasks.
Conclusion
Emphasizes the importance of pre-trained language models and learnable prompts in enhancing few-shot learning performance.
Stats
Particularly noteworthy is its outstanding performance in the 1-shot learning task, surpassing the current state-of-the-art by an average of 3.3% in classification accuracy.
Quotes
"Our proposed SimpleFSL and SimpleFSL++ both surpass the SOTA SP-CLIP [6] and LEP-CLIP [60] with substantial accuracy gains."
"The exploration of prompt design deserves further investigation in the future."