This technical report presents Faster-TAD, a unified network for temporal action localization (TAL) in untrimmed videos, which achieved competitive results in the ActivityNet Challenge 2022.
This paper systematically demonstrates that temporal action localization (TAL) models can be applied to human activity recognition from inertial sensor data, and shows that they outperform conventional classification based on fixed time windows.
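A minimal sketch of the contrast being drawn, under my own assumptions (none of the names below come from the paper): a fixed-window pipeline labels each window and must merge same-label windows to recover activity segments, so boundaries are quantized to the window grid, whereas a TAL model predicts (start, end, label) intervals directly.

```python
# Illustrative sketch (not the paper's code): fixed-window classification
# assigns one label per window; merging adjacent same-label windows is the
# usual way to recover activity segments from that scheme.
def merge_windows(labels, win_len, stride):
    """Turn per-window labels into (start, end, label) segments in samples."""
    segments = []
    for i, lab in enumerate(labels):
        start = i * stride
        end = start + win_len
        if segments and segments[-1][2] == lab and segments[-1][1] >= start:
            segments[-1] = (segments[-1][0], end, lab)  # extend current run
        else:
            segments.append((start, end, lab))
    return segments

# Example: 6 windows of 100 samples with 50% overlap
labels = ["walk", "walk", "sit", "sit", "sit", "walk"]
print(merge_windows(labels, win_len=100, stride=50))
# [(0, 150, 'walk'), (100, 300, 'sit'), (250, 350, 'walk')]
# A TAL model instead regresses (start, end, label) directly, so segment
# boundaries are not quantized to the window grid.
```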
This research paper presents a novel approach to temporal action localization in videos, combining multimodal and unimodal transformers to achieve state-of-the-art results in the Perception Test Challenge 2024.
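The report does not specify how the two transformers are combined; weighted late fusion of their per-segment detection scores is one common instantiation, sketched below for illustration only (`fuse_detections` and `tiou` are hypothetical names, not the paper's API).

```python
# Hypothetical late-fusion sketch: average scores of matched detections
# from two models; unmatched detections are kept as-is.
def fuse_detections(dets_a, dets_b, w=0.5, iou_thr=0.7):
    """Each detection: (start, end, label, score). Average the scores of
    same-label detections whose temporal IoU exceeds iou_thr."""
    def tiou(x, y):
        inter = max(0.0, min(x[1], y[1]) - max(x[0], y[0]))
        union = max(x[1], y[1]) - min(x[0], y[0])
        return inter / union if union > 0 else 0.0

    fused, used = [], set()
    for a in dets_a:
        match = None
        for j, b in enumerate(dets_b):
            if j not in used and a[2] == b[2] and tiou(a, b) >= iou_thr:
                match = j
                break
        if match is not None:
            b = dets_b[match]
            used.add(match)
            fused.append((a[0], a[1], a[2], w * a[3] + (1 - w) * b[3]))
        else:
            fused.append(a)
    fused += [b for j, b in enumerate(dets_b) if j not in used]
    return fused

a = [(1.0, 3.0, "jump", 0.9)]
b = [(1.2, 3.1, "jump", 0.7), (5.0, 6.0, "run", 0.8)]
print(fuse_detections(a, b))
# [(1.0, 3.0, 'jump', 0.8), (5.0, 6.0, 'run', 0.8)]
```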
The video self-stitching graph network (VSGN), a multi-level cross-scale solution, is proposed for temporal action localization to tackle the large variation in action temporal scale, especially for short actions.
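One reading of the self-stitching step, as a hedged sketch rather than the VSGN implementation: temporally up-scale a short clip's feature sequence and append it to the original, so the same short action also appears at a larger temporal scale for the downstream cross-scale network.

```python
# Illustrative sketch of the self-stitching idea (my reading, not the VSGN
# code): up-scale a clip's features in time and stitch the copy to the
# original along the temporal axis.
import torch
import torch.nn.functional as F

def self_stitch(feats, scale=2):
    """feats: (C, T) feature sequence. Returns (C, T + scale*T) with the
    up-scaled copy appended after the original along the time axis."""
    up = F.interpolate(feats.unsqueeze(0), scale_factor=scale,
                       mode="linear", align_corners=False).squeeze(0)
    return torch.cat([feats, up], dim=1)

clip = torch.randn(256, 64)            # 64 snippets of 256-d features
stitched = self_stitch(clip, scale=2)  # shape: (256, 192)
```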
LoSA, a memory-and-parameter-efficient backbone adapter, enables end-to-end adaptation of large video foundation models for improved temporal action localization in untrimmed videos.
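For intuition only, here is the generic bottleneck-adapter pattern that parameter-efficient backbone adapters build on: the backbone stays frozen and only a small residual module trains. LoSA's actual design (its memory-efficient handling of long untrimmed videos) is more involved; everything below is an illustrative assumption.

```python
# Generic bottleneck-adapter sketch, not LoSA itself: down-project,
# nonlinearity, up-project, residual add, with the backbone frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

backbone = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False           # backbone stays frozen

adapter = Adapter(768)                # only these weights receive gradients
x = torch.randn(2, 128, 768)          # (batch, time, channels)
feats = adapter(backbone(x))
```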