Core Concepts
A novel Graph Transformer Neural Network (GTNN) algorithm is proposed for efficient event-based motion segmentation without requiring prior knowledge about the scene geometry or camera motion.
Summary
The proposed framework consists of two key components:
- Graph Transformer Neural Network (GTNN) algorithm for event-based motion segmentation:
  - GTNN processes raw event streams as 3D graphs, capturing the spatiotemporal correlations between events through a series of nonlinear transformations.
  - Based on these correlations, GTNN classifies each event as belonging to a moving object (foreground) or to the static background, without any prior knowledge about the scene or camera motion.
  - GTNN uses a point transformer layer together with transition-down and transition-up modules to encode and decode the 3D event graphs.
  - An effective training scheme is proposed to enable efficient training on extensive datasets, leading to faster convergence and improved performance.
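To make the graph construction step above concrete, here is a minimal sketch of turning a batch of raw events into a k-nearest-neighbor 3D graph. The function name, the choice of `k`, and the time-scaling factor are illustrative assumptions, not values from the paper, and a real pipeline would use a spatial index rather than a dense distance matrix:

```python
import numpy as np

def build_event_graph(events, k=8, time_scale=1e3):
    """Build a k-NN graph over events treated as 3D points (x, y, scaled t).

    `events` is an (N, 3) array of (x, y, t) tuples. `k` and `time_scale`
    are assumed hyperparameters chosen for illustration only.
    """
    pts = events.astype(np.float64).copy()
    pts[:, 2] *= time_scale  # bring timestamps to a spatial-like scale

    # Pairwise squared distances; fine for small N, but large event streams
    # would use a k-d tree or GPU neighbor search instead.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-edges

    # For each event, the indices of its k nearest neighbors (the graph edges).
    neighbors = np.argsort(d2, axis=1)[:, :k]
    return neighbors
```

The resulting (N, k) index array defines the edges along which a point-transformer layer would aggregate features.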
- Dynamic Object Mask-aware Event Labeling (DOMEL) approach:
  - DOMEL is an offline approach to generate approximate ground-truth labels for event-based motion segmentation datasets.
  - It leverages the high temporal resolution of event data and the spatial information from corresponding grayscale frames to accurately label events as foreground (moving objects) or background.
  - DOMEL is used to label a new Event-based Motion Segmentation (EMS-DOMEL) dataset, which is released for further research and benchmarking.
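The mask-based labeling idea behind DOMEL can be sketched as follows. This toy version assumes a single binary object mask already aligned to the event stream and omits the temporal-alignment details of the actual pipeline:

```python
import numpy as np

def label_events(events, mask):
    """Assign foreground/background labels to events via a per-frame object mask.

    `events`: (N, 3) integer array of (x, y, t) coordinates.
    `mask`: (H, W) boolean array where True marks moving-object pixels in the
    aligned grayscale frame. A simplified sketch, not the full DOMEL method.
    """
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    # Events falling inside the object mask -> foreground (1), else background (0).
    return mask[y, x].astype(np.uint8)
```

Repeating this per frame, with the mask propagated at the events' much higher temporal resolution, yields approximate per-event ground truth of the kind EMS-DOMEL provides.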
Extensive experiments on publicly available datasets and the new EMS-DOMEL dataset show that GTNN outperforms state-of-the-art event-based motion segmentation approaches in both segmentation accuracy (IoU%) and detection rate (DR%). The gains are most pronounced in the presence of dynamic backgrounds, varying motion patterns, and multiple moving objects of different sizes and velocities.
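The two reported metrics can be computed in simplified per-event form as below. The paper's exact DR definition may differ; the recall-style version here is an assumption for illustration:

```python
import numpy as np

def iou_percent(pred, gt):
    """Intersection-over-Union (%) between binary per-event label vectors."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    inter = np.logical_and(pred, gt).sum()
    return 100.0 * inter / union if union else 100.0

def detection_rate_percent(pred, gt):
    """Fraction of ground-truth foreground events predicted as foreground.

    A recall-style stand-in for DR%; the paper's formulation may differ.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    fg = gt.sum()
    return 100.0 * np.logical_and(pred, gt).sum() / fg if fg else 100.0
```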
Statistics
The proposed GTNN algorithm achieves an average increase of 9.4% in motion segmentation accuracy (IoU%) and 4.5% in detection rate (DR%) compared to state-of-the-art methods.
The EMS-DOMEL dataset contains event sequences captured using DAVIS346c and DVXplorer cameras, with varying numbers of dynamic objects (1-4) and corresponding event counts for foreground (dynamic object events) and background (camera motion events).
Quotes
"Neuromorphic vision sensors are tailored for motion perception due to their asynchronous nature, high temporal resolution, and reduced power consumption."
"Transformers have been recently used for event-based applications and have demonstrated outstanding performance in processing event streams for event-denoising, object detection, and action and gesture recognition."