MLLMs are explored for object-level perception in videos, introducing novel tasks and datasets to enhance performance.
MLLMs can effectively handle object-level tasks in videos with the introduction of ElysiumTrack-1M dataset and T-Selector model.