The author introduces the Text2Model approach, which generates task-specific classifiers from text descriptions alone, addressing limitations of existing zero-shot learning methods and demonstrating strong improvements on classification tasks.
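A minimal sketch of the core idea, assuming a hypernetwork maps encoded class descriptions to the weights of a task-specific linear classifier. All names and sizes here (ClassifierGenerator, the 512-dim features, the random stand-in embeddings) are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch: generate a classifier's weights from class-description embeddings.
# Illustrative only; not Text2Model's actual architecture.
import torch
import torch.nn as nn


class ClassifierGenerator(nn.Module):
    def __init__(self, text_dim: int = 512, image_dim: int = 512):
        super().__init__()
        # Hypernetwork: turns one text embedding into one classifier row.
        self.hyper = nn.Sequential(
            nn.Linear(text_dim, 512),
            nn.ReLU(),
            nn.Linear(512, image_dim),
        )

    def forward(self, class_text_emb: torch.Tensor, image_feat: torch.Tensor):
        # class_text_emb: (num_classes, text_dim) embeddings of this task's
        # class descriptions; image_feat: (batch, image_dim) image features.
        weights = self.hyper(class_text_emb)   # (num_classes, image_dim)
        return image_feat @ weights.t()        # (batch, num_classes) logits


# Usage: build a classifier on the fly from text, with no image labels.
gen = ClassifierGenerator()
text_emb = torch.randn(5, 512)   # stand-in for encoded class descriptions
images = torch.randn(8, 512)     # stand-in for encoded images
logits = gen(text_emb, images)   # (8, 5)
```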
Enhancing zero-shot image classification by leveraging contextual attributes.
PerceptionCLIP, which mimics human visual perception, achieves improved generalization, reduced reliance on spurious features, and better group robustness in zero-shot image classification.
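To illustrate the contextual-attribute idea behind the two summaries above, here is a sketch of two-step zero-shot inference: estimate p(attribute | image) from attribute-only prompts, then marginalize attribute-conditioned class predictions. The encode_text/encode_image helpers and prompt templates are assumed stand-ins for a CLIP-like model, not PerceptionCLIP's exact procedure.

```python
# Sketch: classify conditioned on an inferred contextual attribute,
# p(class | image) = sum_a p(class | image, a) * p(a | image).
import torch


def zero_shot_with_context(encode_text, encode_image, image, classes, attributes):
    img = encode_image(image)                                            # (d,)
    # Step 1: infer the context from attribute-only prompts.
    attr_emb = encode_text([f"a photo taken {a}" for a in attributes])   # (A, d)
    p_attr = torch.softmax(attr_emb @ img, dim=0)                        # (A,)
    # Step 2: attribute-conditioned class prediction, marginalized over
    # the inferred attribute distribution.
    p_cls = torch.zeros(len(classes))
    for a_idx, a in enumerate(attributes):
        cls_emb = encode_text([f"a photo of a {c} {a}" for c in classes])  # (C, d)
        p_cls += p_attr[a_idx] * torch.softmax(cls_emb @ img, dim=0)
    return p_cls


# Random stand-ins for a real text/image encoder pair, just to run the sketch.
encode_image = lambda x: torch.randn(512)
encode_text = lambda prompts: torch.randn(len(prompts), 512)
probs = zero_shot_with_context(encode_text, encode_image, None,
                               ["cat", "dog"], ["in the snow", "indoors"])
```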
Adapting Large Language Models for zero-shot image classification through contrastive learning.
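A sketch of the generic contrastive objective such an adaptation could use: a symmetric InfoNCE loss that pulls matched image and LLM text embeddings together in a shared space. The function name and temperature value are assumptions; the paper's specific training recipe may differ.

```python
# Sketch: symmetric contrastive (InfoNCE-style) alignment of paired
# image and text embeddings. Illustrative, not the paper's exact loss.
import torch
import torch.nn.functional as F


def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    # img_emb, txt_emb: (batch, d), matched row-by-row.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature     # (batch, batch) similarities
    targets = torch.arange(img.size(0))      # matching pairs on the diagonal
    # Average the image-to-text and text-to-image cross-entropies.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```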
A novel framework that combines the strengths of multiple models, including CLIP and DINO, with a confidence-based adaptive weighting mechanism, significantly improving zero-shot image classification performance.
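A sketch of what confidence-based adaptive weighting between two models' predictions could look like. How each model produces class probabilities, and the use of negative entropy as the confidence signal, are illustrative assumptions rather than the framework's actual mechanism.

```python
# Sketch: per-sample mixture of two models' class distributions, weighted
# by each model's confidence (negative entropy). Illustrative only.
import torch


def adaptive_ensemble(probs_a: torch.Tensor, probs_b: torch.Tensor) -> torch.Tensor:
    # probs_*: (batch, num_classes) class distributions from each model.
    def confidence(p: torch.Tensor) -> torch.Tensor:
        # Lower entropy means a more confident prediction.
        entropy = -(p * p.clamp_min(1e-12).log()).sum(dim=-1, keepdim=True)
        return -entropy

    w = torch.softmax(torch.cat([confidence(probs_a),
                                 confidence(probs_b)], dim=-1), dim=-1)
    return w[:, :1] * probs_a + w[:, 1:] * probs_b  # (batch, num_classes)
```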