Learning a Category-Level 3D Object Pose Estimator without Requiring Pose Annotations
We propose a method to learn a category-level 3D object pose estimator without requiring any pose annotations. By leveraging diffusion models to generate multiple views of objects and an image encoder to extract robust features, our method learns 3D pose correspondences from the generated views and outperforms state-of-the-art methods on few-shot category-level pose estimation benchmarks.
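The core idea — generate views at known poses and match a query image to them in feature space — can be sketched with a simple retrieval baseline. Everything below is a stand-in: `render_view` replaces the diffusion model, `extract_features` replaces the pretrained image encoder, and nearest-neighbour matching replaces the learned correspondence; the synthetic images and random projection only keep the example self-contained.

```python
import numpy as np

def render_view(azimuth_deg):
    # Stand-in for a diffusion model generating a view of the object
    # at a given azimuth: a synthetic 16x16 pattern that varies
    # smoothly with the viewing angle.
    a = np.deg2rad(azimuth_deg)
    y, x = np.mgrid[0:16, 0:16]
    return np.sin(x * np.cos(a) + y * np.sin(a))

def extract_features(image):
    # Stand-in for a frozen image encoder: a fixed random projection
    # of the flattened image, L2-normalised.
    proj = np.random.default_rng(0).standard_normal((image.size, 32))
    f = image.reshape(-1) @ proj
    return f / (np.linalg.norm(f) + 1e-8)

def build_template_bank(azimuths):
    # Features of the generated views, each tagged with the pose it
    # was generated at -- the "free" supervision signal.
    return {az: extract_features(render_view(az)) for az in azimuths}

def estimate_pose(query_image, bank):
    # Return the pose of the generated view whose features best match
    # the query (cosine similarity on unit vectors).
    q = extract_features(query_image)
    return max(bank, key=lambda az: float(q @ bank[az]))

bank = build_template_bank(range(0, 360, 30))
query = render_view(60)  # stand-in for a real image taken at 60 degrees
print(estimate_pose(query, bank))  # -> 60
```

The actual method learns correspondences rather than doing raw template retrieval, but the supervision source is the same: poses of generated views are known by construction, so no human pose annotation is needed.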