Multimodal Knowledge Transfer for Open-World Video Recognition
The authors propose PCA, a generic knowledge transfer pipeline that improves open-world video recognition by progressively integrating external multimodal knowledge from foundation models. The name reflects its three stages: Percept, Chat, and Adapt.
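To make the idea of a progressive, staged pipeline concrete, here is a minimal sketch. It is not the authors' implementation. The stage bodies are hypothetical toy stand-ins: `percept` pretends to extract visual knowledge, `chat` pretends to produce textual knowledge, and `adapt` performs an assumed residual fusion weighted by a made-up `alpha`. The sketch only shows the structure, where each stage sees everything the earlier stages added.

```python
from typing import Callable, Dict, List

# Shared state passed between stages (hypothetical structure).
State = Dict[str, object]
Stage = Callable[[State], State]


def percept(state: State) -> State:
    # Hypothetical stand-in for a foundation model extracting external
    # visual knowledge from the video; here it just copies the toy features.
    return {**state, "visual": list(state["features"])}


def chat(state: State) -> State:
    # Hypothetical stand-in for a language model that turns the perceived
    # visual knowledge into textual knowledge; here it builds a placeholder string.
    return {**state, "text": f"clip with {len(state['visual'])} feature dims"}


def adapt(state: State, alpha: float = 0.5) -> State:
    # Hypothetical fusion: residual blend of the backbone features with the
    # external visual knowledge. The text is kept alongside as metadata only.
    fused = [f + alpha * v for f, v in zip(state["features"], state["visual"])]
    return {**state, "fused": fused}


def run_pipeline(state: State, stages: List[Stage]) -> State:
    # Apply the stages in order; each returns a new dict, so the input is not mutated.
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline({"features": [1.0, 2.0]}, [percept, chat, adapt])
print(result["fused"])  # [1.5, 3.0]
print(result["text"])   # clip with 2 feature dims
```

Keeping each stage as a plain function from state to state makes the ordering explicit and lets any stage be swapped for a real model without changing the driver loop.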