Efficient End-to-end Multi-person Gaze Target Detection with Head-Target Association
GazeHTA is an end-to-end framework for multi-person gaze target detection. It leverages semantic features from a pre-trained diffusion model, improves head priors through head feature re-injection, and establishes explicit associations between heads and gaze targets via a connection map.