Multi-view Temporal Granularity Aligned Aggregation for Enhancing Event-based Lip-reading Performance
A novel multi-view learning method, Multi-view Temporal Granularity Aligned Aggregation (MTGA), is proposed to integrate global spatial features from event frames with local spatio-temporal features from a voxel graph list, improving event-based lip-reading performance.