Improving Learned Bird's-Eye View (BEV) Encoders by Combining Temporal Aggregation in Image and BEV Spaces
Combining temporal aggregation in image and BEV latent spaces can significantly improve the performance of learned BEV encoders for 3D object detection and BEV segmentation tasks.
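To make the two aggregation stages concrete, the following is a minimal toy sketch, not the method described here. Everything in it is an assumption chosen for illustration: the function names (`mean_grids`, `image_to_bev`, `shift_rows`, `encode`), the single-channel grids, the mean over a frame window as image-space aggregation, the transpose standing in for a learned view transform, and the ego-motion-aligned blend with weight `alpha` as BEV-space aggregation.

```python
from typing import List, Optional

Grid = List[List[float]]  # one-channel 2D feature map (assumption: real features are multi-channel)


def mean_grids(grids: List[Grid]) -> Grid:
    """Element-wise mean of equally sized 2D grids."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(g[r][c] for g in grids) / len(grids) for c in range(cols)]
        for r in range(rows)
    ]


def image_to_bev(image_feat: Grid) -> Grid:
    """Hypothetical stand-in for a learned view transform: a plain transpose.

    A real encoder would use camera geometry or attention here.
    """
    return [list(col) for col in zip(*image_feat)]


def shift_rows(bev: Grid, cells: int) -> Grid:
    """Align a past BEV grid to the current ego pose by an integer row shift (zero fill)."""
    rows, cols = len(bev), len(bev[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        src = r - cells
        if 0 <= src < rows:
            out[r] = list(bev[src])
    return out


def encode(
    frames: List[Grid],
    window: int,
    prev_bev: Optional[Grid] = None,
    ego_shift: int = 0,
    alpha: float = 0.5,
) -> Grid:
    """Toy two-stage temporal aggregation: image space first, then BEV space."""
    # Stage 1: aggregate the most recent `window` frames in image feature space.
    fused_image = mean_grids(frames[-window:])
    # Stage 2: lift the fused image features into the BEV grid.
    bev = image_to_bev(fused_image)
    if prev_bev is None:
        return bev
    # Stage 3: blend with the previous BEV state after ego-motion alignment.
    aligned = shift_rows(prev_bev, ego_shift)
    return [
        [alpha * cur + (1.0 - alpha) * old for cur, old in zip(cur_row, old_row)]
        for cur_row, old_row in zip(bev, aligned)
    ]


if __name__ == "__main__":
    frames = [[[1.0, 2.0], [3.0, 4.0]], [[3.0, 4.0], [5.0, 6.0]]]
    prev_bev = [[10.0, 10.0], [20.0, 20.0]]
    print(encode(frames, window=2, prev_bev=prev_bev, ego_shift=1))
```

The sketch only illustrates the ordering of the stages: short-range fusion happens before the view transform, while the BEV stage carries a longer-lived state that must be re-aligned to the current ego pose before blending.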