Key Concept
EventBind is a framework that improves event-based recognition by aligning images, text, and events in a unified representation space.
Abstract
EventBind addresses the challenges of event-based recognition by aligning images, text, and events. The framework consists of an event encoder, a text encoder, and an image encoder, along with a Hierarchical Triple Contrastive Alignment module. Extensive experiments show significant performance improvements in fine-tuning and few-shot settings on various benchmarks.
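To make the idea of aligning three modalities concrete, the following is a minimal sketch of a pairwise contrastive objective over event, image, and text embeddings. It is not the paper's Hierarchical Triple Contrastive Alignment module, whose details are not given in this summary. It assumes a plain symmetric InfoNCE loss averaged over the three modality pairs; the function names (`info_nce`, `triple_alignment_loss`) and the temperature value are hypothetical choices for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _mean_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(batch_a, batch_b, temperature=0.07):
    """Symmetric InfoNCE loss; row i of batch_a is paired with row i of batch_b.

    The temperature is an assumed value, not one taken from the paper.
    """
    a = [normalize(v) for v in batch_a]
    b = [normalize(v) for v in batch_b]
    # Cosine similarity of every (a_i, b_j) pair, scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(va, vb)) / temperature for vb in b]
        for va in a
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_cross_entropy(logits) + _mean_cross_entropy(transposed))


def triple_alignment_loss(event_emb, image_emb, text_emb):
    """Average the pairwise losses over the three modality pairs (an assumption)."""
    return (
        info_nce(event_emb, image_emb)
        + info_nce(event_emb, text_emb)
        + info_nce(image_emb, text_emb)
    ) / 3.0
```

In this sketch the loss is close to zero when matching samples from all three encoders point in the same direction, and it grows when any pair of modalities disagrees, which is the behavior a shared representation space relies on.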
Statistics
EventBind achieves new state-of-the-art accuracy on the N-Caltech101 and N-ImageNet datasets.
EventBind outperforms existing methods by a large margin in fine-tuning and few-shot settings.
EventBind shows remarkable performance in event retrieval tasks with text and image queries.
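Once events, images, and text share one embedding space, retrieval reduces to a nearest-neighbor search. The sketch below ranks stored event embeddings by cosine similarity to a query embedding, which could come from either a text or an image encoder. The helper names (`cosine`, `retrieve_events`) and the toy vectors are hypothetical; the summary does not describe the retrieval procedure actually used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_events(query_embedding, event_embeddings, top_k=3):
    """Return indices of the top_k event embeddings most similar to the query."""
    scored = [
        (cosine(query_embedding, emb), idx)
        for idx, emb in enumerate(event_embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy gallery of event embeddings and a query from a text or image encoder.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.0]
top_matches = retrieve_events(query, gallery, top_k=2)
```

The same function serves both query types, because only the encoder that produces `query` changes.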
Quotes
"Our EventBind achieves new state-of-art accuracy compared with the previous methods."
"EventBind can be flexibly extended to the event retrieval task using text or image queries."