Open-source · Reproducible · Quantitative

T2M-Benchmark

A quantitative evaluation of open-source Text-to-3D models, using 10 prompts balanced across categories.

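Motivation

As of 2026, the Text-to-3D field has several open-source models, including Hunyuan3D-2, Trellis, InstantMesh, TripoSR, and Shap-E, but it lacks a lightweight, reproducible, side-by-side comparison that needs no ground truth. This project provides: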

CLIP Score

Mean cosine similarity between the OpenCLIP ViT-L/14 embeddings of 8 rendered views and the embedding of the prompt text.

Expected range: 0.20 – 0.32
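The averaging step described above can be sketched as follows. This is a minimal illustration that assumes the 8 view embeddings and the text embedding have already been computed by the CLIP model; `cosine_similarity` and `clip_score` are hypothetical helper names, not part of this project or of OpenCLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


def clip_score(view_embeddings, text_embedding):
    """Mean cosine similarity between each view embedding and the text embedding.

    Assumption: embeddings are plain sequences of floats produced elsewhere
    (for example by an image/text encoder); this function only averages.
    """
    if not view_embeddings:
        raise ValueError("need at least one view embedding")
    sims = [cosine_similarity(v, text_embedding) for v in view_embeddings]
    return sum(sims) / len(sims)
```

With real CLIP embeddings the per-view similarities are typically well below 1, which is consistent with the expected range quoted above.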

Mesh Quality

Watertightness, manifoldness, normal consistency, and vertex/face counts, computed with trimesh + pymeshlab.

Multi-dimensional sub-metrics
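To make the sub-metrics concrete, here is a standard-library sketch of what such checks amount to on a triangle mesh. It is not the trimesh or pymeshlab implementation and not necessarily how this project computes them; `mesh_report` is a hypothetical helper, and the definitions used (a closed mesh has every edge shared by exactly two faces, a consistently wound mesh traverses each directed edge at most once) are simplifying assumptions.

```python
from collections import Counter


def mesh_report(vertices, faces):
    """Basic topology checks for a triangle mesh.

    vertices: sequence of (x, y, z) tuples
    faces:    sequence of (i, j, k) vertex-index triples
    """
    undirected = Counter()  # how many faces touch each edge
    directed = Counter()    # how often each edge is traversed in a given direction
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            undirected[(min(u, v), max(u, v))] += 1
            directed[(u, v)] += 1

    return {
        "vertex_count": len(vertices),
        "face_count": len(faces),
        # Closed surface: every edge is shared by exactly two faces.
        "watertight": bool(faces) and all(n == 2 for n in undirected.values()),
        # Edge-manifold: no edge is shared by more than two faces.
        "edge_manifold": all(n <= 2 for n in undirected.values()),
        # Consistent winding: neighbouring faces cross a shared edge in opposite directions.
        "winding_consistent": all(n == 1 for n in directed.values()),
    }


# Usage: a tetrahedron with outward-consistent winding.
TETRA_VERTICES = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
TETRA_FACES = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]
report = mesh_report(TETRA_VERTICES, TETRA_FACES)
```

In practice a library such as trimesh exposes comparable properties directly on a loaded mesh; the sketch only shows why these checks need no ground-truth model.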

Aesthetic Score

LAION improved-aesthetic-predictor (CLIP + MLP), which scores each of the 8 rendered views; the scores are then averaged.

Range: 1.0 – 10.0

10 Prompts

Balanced distribution: 2 simple objects + 2 compound objects + 2 characters + 2 scenes + 1 detail + 1 abstract

ID   Category   Description
p01  simple     A red ceramic coffee mug on a white background
p02  simple     A wooden rocking chair
p03  compound   A vintage bicycle with a leather saddle and metal bell
p04  compound   A fantasy treasure chest with golden hinges and emerald gems
p05  character  A cartoon corgi wearing a blue astronaut helmet
p06  character  A medieval knight in silver plate armor holding a longsword
p07  scene      A small Japanese garden with a stone lantern and koi pond
p08  scene      A cyberpunk street food cart at night with neon signs
p09  detail     An ornate Victorian pocket watch with intricate gear engravings
p10  abstract   An abstract sculpture representing ‘flowing time’, smooth marble
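For scripting, the prompt set can be held as plain data and the stated category balance verified programmatically. This is only a sketch: the names `PROMPTS`, `EXPECTED_COUNTS`, and `category_counts` are hypothetical, and the project may store its prompts in a different format.

```python
from collections import Counter

# (id, category, description) triples copied from the table above.
PROMPTS = [
    ("p01", "simple", "A red ceramic coffee mug on a white background"),
    ("p02", "simple", "A wooden rocking chair"),
    ("p03", "compound", "A vintage bicycle with a leather saddle and metal bell"),
    ("p04", "compound", "A fantasy treasure chest with golden hinges and emerald gems"),
    ("p05", "character", "A cartoon corgi wearing a blue astronaut helmet"),
    ("p06", "character", "A medieval knight in silver plate armor holding a longsword"),
    ("p07", "scene", "A small Japanese garden with a stone lantern and koi pond"),
    ("p08", "scene", "A cyberpunk street food cart at night with neon signs"),
    ("p09", "detail", "An ornate Victorian pocket watch with intricate gear engravings"),
    ("p10", "abstract", "An abstract sculpture representing ‘flowing time’, smooth marble"),
]

# The balanced distribution stated above: 2 + 2 + 2 + 2 + 1 + 1.
EXPECTED_COUNTS = {
    "simple": 2,
    "compound": 2,
    "character": 2,
    "scene": 2,
    "detail": 1,
    "abstract": 1,
}


def category_counts(prompts):
    """Count how many prompts fall into each category."""
    return Counter(category for _, category, _ in prompts)
```

Comparing `category_counts(PROMPTS)` with `EXPECTED_COUNTS` gives a quick guard against the distribution drifting if prompts are edited later.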

Status

Evaluation in progress. Benchmark results will be posted on this page once the experiments are complete.