Scaling Up End-to-End Temporal Action Detection with Efficient Adapter Tuning
By introducing a novel temporal-informative adapter and an alternative adapter placement, our method, AdaTAD, achieves state-of-the-art performance on multiple temporal action detection datasets and becomes the first end-to-end approach to outperform the best feature-based methods.