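The paper proposes PartDistill, a cross-modal distillation framework that transfers 2D knowledge from vision-language models (VLMs) to facilitate 3D shape part segmentation. PartDistill addresses three major challenges in this task: the lack of 3D segmentation in regions that are invisible or undetected in the 2D projections, inconsistent 2D predictions by VLMs, and the lack of knowledge accumulation across different 3D shapes.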
PartDistill consists of a teacher network and a student network. The teacher uses a VLM to make 2D predictions. The student learns from those 2D predictions while extracting geometrical features from multiple 3D shapes to carry out 3D part segmentation. The framework performs bi-directional distillation: forward distillation transfers the 2D predictions to the student network, and backward distillation improves the quality of the 2D predictions, which in turn enhances the final 3D segmentation.
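The two distillation directions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the confidence-weighted cross-entropy used for the forward direction, and the agreement-based re-scoring used for the backward direction. It also assumes the teacher's 2D predictions have already been back-projected onto 3D points, with `None` marking points that no 2D prediction covers.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one point's part logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def forward_distillation_loss(student_logits, teacher_labels, teacher_conf):
    """Hypothetical forward step: confidence-weighted cross-entropy.

    student_logits: per-point lists of part logits from the 3D student
    teacher_labels: per-point part index from the 2D teacher, or None
                    where no 2D prediction covers the point
    teacher_conf:   per-point confidence of the teacher's prediction
    """
    total, weight_sum = 0.0, 0.0
    for logits, label, conf in zip(student_logits, teacher_labels, teacher_conf):
        if label is None:
            continue  # uncovered points contribute no supervision
        total += conf * -log_softmax(logits)[label]
        weight_sum += conf
    return total / weight_sum if weight_sum > 0 else 0.0

def backward_rescore(student_logits, teacher_labels, teacher_conf):
    """Hypothetical backward step: scale each teacher confidence by the
    probability the student assigns to the teacher's label."""
    new_conf = []
    for logits, label, conf in zip(student_logits, teacher_labels, teacher_conf):
        if label is None:
            new_conf.append(0.0)
        else:
            new_conf.append(conf * math.exp(log_softmax(logits)[label]))
    return new_conf
```

In this sketch, teacher predictions that the student disagrees with end up with lower confidence after `backward_rescore`, so they weigh less in the next call to `forward_distillation_loss`. Because the student is trained on many shapes, that re-scoring is one way knowledge from other shapes could feed back into the 2D predictions.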
PartDistill can also leverage existing generative models to enrich the knowledge sources for distillation. Experiments show that PartDistill surpasses existing methods by substantial margins on two widely used benchmark datasets, ShapeNetPart and PartNetE, with more than 15% and 12% higher mIoU scores, respectively. It consistently outperforms competing methods in both zero-shot and few-shot scenarios, on 3D data represented as point clouds or meshes.
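For readers unfamiliar with the metric, mIoU (mean intersection over union) for part segmentation can be sketched as follows. The function name `shape_miou` is hypothetical, and scoring a part that is absent from both prediction and ground truth as 1.0 is an assumed convention; benchmark evaluation protocols may handle that case differently.

```python
def shape_miou(pred, gt, num_parts):
    """Mean IoU over parts for one shape.

    pred, gt: per-point part indices (sequences of equal length)
    num_parts: number of part categories for this shape's class
    """
    ious = []
    for part in range(num_parts):
        inter = sum(1 for p, g in zip(pred, gt) if p == part and g == part)
        union = sum(1 for p, g in zip(pred, gt) if p == part or g == part)
        if union == 0:
            ious.append(1.0)  # assumed convention: part absent in both
        else:
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

For example, with `pred = [0, 0, 1, 1]` and `gt = [0, 1, 1, 1]`, part 0 has IoU 1/2 and part 1 has IoU 2/3, so the shape's mIoU is 7/12. A dataset-level score would then average this value over shapes or categories, depending on the protocol.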
Key insights distilled from: Ardian Umam,... at arxiv.org, 04-17-2024
https://arxiv.org/pdf/2312.04016.pdf