This paper proposes a new framework that combines large language models (LLMs) with layout-to-image synthesis (LIS) to significantly improve the performance of few-shot object detection models.
DE-ViT is a few-shot object detection method built on a new architecture that requires no fine-tuning.