Key Concepts
T3Bench, the first comprehensive benchmark for evaluating text-to-3D generation methods, provides diverse prompt suites and automatic evaluation metrics that closely correlate with human judgments.
Summary
The paper introduces T3Bench, the first comprehensive benchmark for evaluating text-to-3D generation methods. The benchmark includes:
- Diverse prompt suites with increasing complexity, including single object, single object with surroundings, and multiple objects.
- Two novel evaluation metrics that assess the quality and alignment of the generated 3D scenes:
  - The quality metric combines multi-view text-image scores with regional convolution to detect low quality and view inconsistency.
  - The alignment metric uses multi-view captioning and GPT-4 evaluation to measure text-3D consistency.
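The quality metric's regional aggregation can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name `regional_quality`, the window size, and the min-over-regions aggregation are assumptions; in practice the per-view scores would come from a text-image model such as CLIP applied to renders from evenly spaced camera angles.

```python
import numpy as np

def regional_quality(view_scores: np.ndarray, window: int = 3) -> float:
    """Slide a window over per-view text-image scores and average each
    region; taking the minimum regional mean penalizes scenes where some
    viewpoints are low quality or inconsistent with the others.
    (Illustrative sketch; window size and aggregation are assumptions.)"""
    n = len(view_scores)
    # Views circle the object, so wrap the score sequence around.
    padded = np.concatenate([view_scores, view_scores[: window - 1]])
    regional_means = np.array([padded[i : i + window].mean() for i in range(n)])
    return float(regional_means.min())

# A scene with one bad viewpoint scores lower than a uniformly good one.
uniform = regional_quality(np.full(8, 0.8))
flawed = regional_quality(np.array([0.8] * 7 + [0.2]))
```

Aggregating over regions rather than single views smooths noise in any one render while still exposing a consistently bad viewing arc.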
The authors benchmark 10 prevalent text-to-3D methods on T3Bench and highlight several common challenges, including:
- Density collapse and inconsistency issues with Score Distillation Sampling (SDS) guidance.
- The need for improved geometry initialization and efficiency in current methods.
- The limitations of leveraging 2D diffusion models for 3D generation, particularly in handling out-of-distribution prompts and ensuring view consistency.
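The SDS guidance mentioned in the first challenge refers to DreamFusion-style score distillation, whose gradient (in its commonly cited form, reproduced here for context) skips the diffusion U-Net Jacobian:

$$\nabla_\theta \mathcal{L}_{\text{SDS}} = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right]$$

where $x = g(\theta)$ is a rendered view of the 3D representation with parameters $\theta$, $x_t$ its noised version at timestep $t$, $y$ the text prompt, and $\hat{\epsilon}_\phi$ the frozen 2D diffusion model's noise prediction. Because the guidance is mode-seeking and applied per view, it is prone to the density collapse and cross-view inconsistency the benchmark observes.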
The proposed metrics closely correlate with human judgments, providing a reliable and efficient way to evaluate text-to-3D methods.
Statistics
Current text-to-3D techniques require at least half an hour, and often several hours, per prompt, making it challenging to test with larger sets of prompts.
The 3D-to-2D rendering process and the limitations of 3D captioning frameworks cause an inevitable loss of information during evaluation.
Quotes
"It is a narrow mind which cannot look at a subject from various points of view."
George Eliot