Core Concepts
SGV3D, a novel Scenario Generalization Framework for Vision-based Roadside 3D Object Detection that addresses the challenge of overfitting to the background and to the specific camera poses of labeled scenes.
Abstract
The paper proposes SGV3D, a novel Scenario Generalization Framework for Vision-based Roadside 3D Object Detection. The key components are:
Background-suppressed BEV Detector with Background-suppressed Module (BMS):
Mitigates background overfitting in vision-centric pipelines by attenuating background features during the 2D-to-bird's-eye-view (BEV) projection.
Semi-supervised Data Generation Pipeline (SSDG):
Generates diverse, well-labeled images under varying camera poses using unlabeled images from new scenes.
Addresses the risk of overfitting to specific camera settings, including intrinsic and extrinsic parameters.
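The paper gives no implementation details in this summary, but the background-suppression idea in BMS can be sketched as weighting image-view features with a soft foreground mask before they are lifted to BEV. The function name, mask source, and the `alpha` residual weight below are assumptions, not the authors' code:

```python
import numpy as np

def suppress_background(features, fg_mask, alpha=0.0):
    """Attenuate background features before 2D-to-BEV projection.

    features: (C, H, W) image-view feature map
    fg_mask:  (H, W) soft foreground mask in [0, 1], e.g. from a
              2D detector or segmentation head (assumed here)
    alpha:    residual weight kept for background regions
              (0.0 fully suppresses them)
    """
    # weight is 1 on foreground pixels and alpha on background pixels
    weight = fg_mask + alpha * (1.0 - fg_mask)
    return features * weight[None, :, :]

# Toy example: 1-channel 2x2 feature map, left column is foreground.
feats = np.ones((1, 2, 2))
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])
out = suppress_background(feats, mask)  # background column is zeroed
```

With `alpha > 0`, some background context survives the projection, which may matter when foreground masks are imperfect; the paper's actual module may differ.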
The authors evaluate their method on two large-scale roadside benchmarks, DAIR-V2X-I and Rope3D. On the DAIR-V2X-I heterogeneous benchmark, SGV3D surpasses leading methods by +42.57%, +5.87%, and +14.89% for the vehicle, pedestrian, and cyclist categories, respectively. On the larger-scale Rope3D heterologous benchmark, SGV3D achieves notable gains of +14.48% and +12.41% for the car and big-vehicle categories.
Stats
The background regions constitute the majority of BEV features in current vision-centric roadside 3D object detectors.
The distance errors for background regions in new scenes are significantly larger than those in the labeled scenes.
Likewise, the foreground distance errors in new scenes are more pronounced than those in the labeled scenes.
Quotes
"Roadside perception can greatly increase the safety of autonomous vehicles by extending their perception ability beyond the visual range and addressing blind spots."
"Current state-of-the-art vision-based roadside detection methods possess high accuracy on labeled scenes but have inferior performance on new scenes."