FastCAD: Real-Time CAD Retrieval and Alignment for Augmented Reality and Robotics
Core Concept:
FastCAD is a real-time method that simultaneously retrieves and aligns CAD models for all objects in a given scene, achieving high-quality shape retrievals by learning CAD embeddings in a contrastive learning framework.
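The summary does not say which contrastive objective FastCAD uses. The sketch below assumes an InfoNCE-style loss: each scanned object's predicted embedding is paired with the embedding of its ground-truth CAD model, and every other CAD model in the batch acts as a negative. The function names, the cosine similarity and the temperature value are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence


# Hypothetical helper: cosine similarity between two embedding vectors.
def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(object_embs: List[Sequence[float]],
             cad_embs: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Assumed InfoNCE loss: object_embs[i] and cad_embs[i] form a positive pair;
    every other CAD embedding in the batch serves as a negative for object i."""
    if len(object_embs) != len(cad_embs):
        raise ValueError("need one ground-truth CAD embedding per object")
    total = 0.0
    for i, query in enumerate(object_embs):
        logits = [cosine(query, key) / temperature for key in cad_embs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(object_embs)


objects = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]  # each object lies next to its own CAD model
swapped = [[0.0, 1.0], [1.0, 0.0]]  # each object lies next to the wrong CAD model
print(info_nce(objects, matched) < info_nce(objects, swapped))  # True
```

Minimising such a loss pulls each object's embedding towards its own CAD model and pushes it away from the others, which is what later makes retrieval a simple nearest-neighbour lookup.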
Abstract:
Digitizing the 3D world into clean, CAD model-based representations is crucial for augmented reality and robotics.
FastCAD is a real-time method that speeds up inference by 50x compared with other methods operating on RGB-D scans, while also outperforming them on the Scan2CAD alignment benchmark.
Introduction:
Representing environments with aligned 3D CAD models is essential for downstream tasks in AR or robotics.
FastCAD introduces an efficient system that retrieves and aligns CAD models in just 50 ms, enabling online application to videos at 10 FPS.
Method:
FastCAD predicts CAD alignments and shape embeddings simultaneously, achieving state-of-the-art alignment accuracy on the Scan2CAD benchmark.
The learned embedding space supports high-quality shape retrieval: even when a less optimal CAD model is retrieved, its shape remains close to that of the target object.
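Once alignments and embeddings are predicted in one step, retrieval can reduce to a nearest-neighbour search over precomputed CAD embeddings. The sketch below assumes cosine similarity as the distance and uses made-up CAD identifiers and 3-D embeddings; `retrieve_top_k` is a hypothetical helper, not FastCAD's API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _unit(v: Sequence[float]) -> List[float]:
    # Normalise a vector to unit length so the dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def retrieve_top_k(query: Sequence[float],
                   cad_database: Dict[str, Sequence[float]],
                   k: int = 1) -> List[Tuple[str, float]]:
    """Rank precomputed CAD embeddings by cosine similarity to a predicted object embedding."""
    q = _unit(query)
    scored = []
    for cad_id, emb in cad_database.items():
        e = _unit(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), cad_id))
    scored.sort(key=lambda item: item[0], reverse=True)  # most similar first
    return [(cad_id, score) for score, cad_id in scored[:k]]


# Hypothetical embeddings; real ones would come from the trained CAD encoder.
database = {
    "chair_a": [1.0, 0.1, 0.0],
    "chair_b": [0.9, 0.2, 0.0],
    "table_a": [0.0, 0.0, 1.0],
}
predicted = [1.0, 0.15, 0.0]  # embedding predicted for one scanned object
print([cad_id for cad_id, _ in retrieve_top_k(predicted, database, k=2)])  # ['chair_a', 'chair_b']
```

In this toy example the runner-up (`chair_b`) is another chair rather than the table, which illustrates why a well-structured embedding space can keep shape accuracy high even when the top match is not retrieved.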
Experimental Setup:
FastCAD is trained and validated on ScanNet with annotations from Scan2CAD, where it outperforms competing methods.
Results:
FastCAD significantly improves alignment accuracy from 43.0% to 48.2% when using videos as input, running at least three times faster than competitors.
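The alignment accuracy figures come from the Scan2CAD benchmark. The sketch below shows how such a metric is typically computed, assuming the commonly used Scan2CAD criteria (matching category, translation error ≤ 20 cm, rotation error ≤ 20°, scale error ≤ 20%), predictions already matched one-to-one with ground truth, and no handling of symmetric objects. The class and function names are hypothetical, and the official evaluation may differ in details such as matching and class averaging.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Alignment:
    category: str
    translation: Sequence[float]  # object centre in metres
    rotation: Sequence[float]     # unit quaternion (w, x, y, z)
    scale: Sequence[float]        # per-axis scale factors


def rotation_error_deg(q_pred: Sequence[float], q_gt: Sequence[float]) -> float:
    # Angle of the relative rotation between two unit quaternions (q and -q are the same rotation).
    dot = min(1.0, abs(sum(a * b for a, b in zip(q_pred, q_gt))))
    return math.degrees(2.0 * math.acos(dot))


def is_correct(pred: Alignment, gt: Alignment,
               max_t: float = 0.20, max_r_deg: float = 20.0, max_s: float = 0.20) -> bool:
    # Assumed Scan2CAD-style thresholds: 20 cm, 20 degrees, 20 % scale; symmetries ignored.
    if pred.category != gt.category:
        return False
    t_err = math.dist(pred.translation, gt.translation)
    r_err = rotation_error_deg(pred.rotation, gt.rotation)
    mean_ratio = sum(p / g for p, g in zip(pred.scale, gt.scale)) / len(gt.scale)
    s_err = abs(mean_ratio - 1.0)
    return t_err <= max_t and r_err <= max_r_deg and s_err <= max_s


def alignment_accuracy(preds: List[Alignment], gts: List[Alignment]) -> float:
    # Fraction of ground-truth objects whose (already matched) prediction is correct.
    return sum(is_correct(p, g) for p, g in zip(preds, gts)) / len(gts)


gt = Alignment("chair", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
half = math.radians(5.0)  # a 10-degree rotation about z has a half-angle of 5 degrees
close = Alignment("chair", (0.1, 0.0, 0.0), (math.cos(half), 0.0, 0.0, math.sin(half)), (1.1, 1.1, 1.1))
far = Alignment("chair", (0.5, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(alignment_accuracy([close, far], [gt, gt]))  # 0.5
```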
Shape accuracy remains high even when retrieving less optimal CAD models from the learned embedding space.
Ablations:
Directly predicting shape embeddings leads to better reconstruction and shape accuracy compared to two-step retrieval methods.
Training the encoder network with auxiliary tasks improves reconstruction and shape accuracy significantly.
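FastCAD adopts an efficient design to achieve fast inference. First, it predicts CAD model alignments and shape embeddings directly and in one pass, handling them in a single step rather than in a two-stage process, which makes its inference faster than that of other methods. Second, for real-time CAD-based reconstruction from RGB video, it only needs to integrate information from the outputs of Ju et al. [2023], so processing completes in 100 ms (50 ms runtime for Ju et al. [2023] + 50 ms runtime for FastCAD).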
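The 100 ms budget and the 10 FPS video rate follow from simple sequential arithmetic. A minimal sketch, assuming the two stages run back to back on every frame; `sequential_pipeline_rate` is a hypothetical helper.

```python
from typing import Dict, Tuple


def sequential_pipeline_rate(stage_ms: Dict[str, float]) -> Tuple[float, float]:
    """Total per-frame latency (ms) and frame rate (FPS) when stages run one after another."""
    total_ms = sum(stage_ms.values())
    return total_ms, 1000.0 / total_ms


total, fps = sequential_pipeline_rate({"Ju et al. [2023]": 50.0, "FastCAD": 50.0})
print(f"{total:.0f} ms per frame -> {fps:.0f} FPS")  # 100 ms per frame -> 10 FPS
```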