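Motivation
As of 2026, the text-to-3D field has several open-source models, including Hunyuan3D-2, Trellis, InstantMesh, TripoSR, and Shap-E, but it lacks a lightweight, reproducible side-by-side comparison that needs no ground truth. This project provides:
- 10 prompts spanning six categories: simple objects, compound objects, characters, scenes, fine detail, and abstract concepts
- Three quantitative metrics: CLIP Score (text-to-3D alignment), Mesh Quality (geometric quality), and Aesthetic Score (visual appeal)
- A unified generate → render → metrics → report pipeline in which each stage can be cached and rerun independently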
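
As a rough illustration of per-stage caching, the sketch below stores each stage's result as a JSON file keyed by stage, model, and prompt ID, and skips recomputation when that file already exists. Only the four stage names come from this README; `run_stage`, `cache_path`, the directory layout, and the JSON format are hypothetical and may differ from the project's actual code.

```python
import json
from pathlib import Path
from typing import Callable

# Stage names as listed in this README; everything else is an assumption.
STAGES = ("generate", "render", "metrics", "report")


def cache_path(cache_dir: Path, stage: str, model: str, prompt_id: str) -> Path:
    """Hypothetical layout: <cache_dir>/<stage>/<model>__<prompt_id>.json"""
    return cache_dir / stage / f"{model}__{prompt_id}.json"


def run_stage(
    cache_dir: Path,
    stage: str,
    model: str,
    prompt_id: str,
    fn: Callable[[], dict],
    force: bool = False,
) -> dict:
    """Return the cached result for this cell, or compute and cache it.

    Pass force=True to rerun a single stage without touching the others.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    path = cache_path(cache_dir, stage, model, prompt_id)
    if path.exists() and not force:
        return json.loads(path.read_text(encoding="utf-8"))
    result = fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Because each stage writes to its own directory, deleting one stage's directory (or passing `force=True`) reruns that stage alone.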
CLIP Score
Mean cosine similarity between the prompt text and 8 rendered views, computed with OpenCLIP ViT-L/14.
Expected range: 0.20 – 0.32
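
The averaging step can be sketched in plain Python as follows. This assumes the image and text embeddings have already been produced by the CLIP model; the function names `cosine_similarity` and `clip_score` are illustrative and are not taken from the project.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def clip_score(view_embeddings: Sequence[Vector], text_embedding: Vector) -> float:
    """Mean cosine similarity between each rendered view and the prompt text."""
    if not view_embeddings:
        raise ValueError("need at least one view")
    sims = [cosine_similarity(v, text_embedding) for v in view_embeddings]
    return sum(sims) / len(sims)
```

In the benchmark, `view_embeddings` would hold 8 vectors, one per rendered view.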
Mesh Quality
Watertightness, manifoldness, normal consistency, and vertex/face counts, computed with trimesh + pymeshlab.
Reported as multiple sub-metrics
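
To make these sub-metrics concrete, here is a dependency-free sketch that derives comparable checks from a triangle list alone. It is not the trimesh or pymeshlab implementation, and its definitions are simplifying assumptions: "watertight" means every edge is shared by exactly two faces, "edge-manifold" means no edge is shared by more than two, and "winding consistent" means no directed edge is used twice. `mesh_report` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple, Union

Face = Tuple[int, int, int]


def mesh_report(faces: Sequence[Face]) -> Dict[str, Union[int, bool]]:
    """Compute simple topological sub-metrics from triangle vertex indices."""
    undirected: Counter = Counter()  # edge -> number of faces touching it
    directed: Counter = Counter()    # half-edge -> number of uses
    vertices = set()
    for a, b, c in faces:
        vertices.update((a, b, c))
        for u, v in ((a, b), (b, c), (c, a)):
            directed[(u, v)] += 1
            undirected[(min(u, v), max(u, v))] += 1
    edge_counts = list(undirected.values())
    return {
        # Counts only vertices that are referenced by at least one face.
        "num_vertices": len(vertices),
        "num_faces": len(faces),
        # Closed surface: every edge has exactly two incident faces.
        "watertight": bool(faces) and all(n == 2 for n in edge_counts),
        # No edge is shared by three or more faces.
        "edge_manifold": all(n <= 2 for n in edge_counts),
        # Neighboring faces traverse their shared edge in opposite directions.
        "winding_consistent": all(n == 1 for n in directed.values()),
    }
```

A consistently wound tetrahedron passes all three checks. Dropping one face breaks watertightness, and flipping one face breaks winding consistency.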
Aesthetic Score
LAION improved-aesthetic-predictor: a CLIP + MLP model scores each of the 8 rendered views, and the scores are averaged.
Range: 1.0 – 10.0
10 Prompts
Balanced distribution: 2 simple objects + 2 compound objects + 2 characters + 2 scenes + 1 detail + 1 abstract
| ID | Category | Description |
|---|---|---|
| p01 | simple | A red ceramic coffee mug on a white background |
| p02 | simple | A wooden rocking chair |
| p03 | compound | A vintage bicycle with a leather saddle and metal bell |
| p04 | compound | A fantasy treasure chest with golden hinges and emerald gems |
| p05 | character | A cartoon corgi wearing a blue astronaut helmet |
| p06 | character | A medieval knight in silver plate armor holding a longsword |
| p07 | scene | A small Japanese garden with a stone lantern and koi pond |
| p08 | scene | A cyberpunk street food cart at night with neon signs |
| p09 | detail | An ornate Victorian pocket watch with intricate gear engravings |
| p10 | abstract | An abstract sculpture representing ‘flowing time’, smooth marble |
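
The category split can be checked mechanically. The sketch below copies the ID-to-category mapping from the table above and tallies it. The `PROMPT_CATEGORIES` and `category_counts` names are illustrative, and how the project actually stores its prompts is not specified here.

```python
from collections import Counter
from typing import Dict

# ID -> category, copied from the prompt table in this README.
PROMPT_CATEGORIES: Dict[str, str] = {
    "p01": "simple",
    "p02": "simple",
    "p03": "compound",
    "p04": "compound",
    "p05": "character",
    "p06": "character",
    "p07": "scene",
    "p08": "scene",
    "p09": "detail",
    "p10": "abstract",
}


def category_counts(prompts: Dict[str, str]) -> Dict[str, int]:
    """Tally how many prompts fall into each category."""
    return dict(Counter(prompts.values()))
```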
Status
Evaluation in progress. Benchmark results will be posted on this page once the experiments are complete.