Evaluative Process for 3D Content Creation from High-Resolution Text

This repository presents research on evaluating 3D content creation from high-resolution text inputs, using CLIP-based metrics and loss functions tailored to text-to-3D mesh generation.

Introduction

This project explores advancements in high-resolution text-to-3D mesh generation using OpenAI’s CLIP, differentiable rendering, and various optimization techniques. The focus is on overcoming challenges in current loss functions to improve the quality and fidelity of generated 3D models.

Background

High-Resolution Text-to-3D Mesh Generation

Generating high-resolution 3D meshes from text means turning a description (e.g., "Evergreen tree") into an accurate, detailed 3D shape. The main challenges are the heavy computational cost and the reliance on paired text-3D datasets. Recent methods use zero-shot learning powered by pretrained models such as CLIP, which removes the need for paired 3D supervision.

How CLIP Supports Text-to-3D Generation

Differentiable Rendering: Renders the 3D model into 2D images in a way that lets gradients flow back to the mesh parameters.
CLIP-Based Optimization: Measures the similarity between the rendered images and the text description, and uses that score to drive iterative improvements.

Current Loss Models

Diffusion Prior Loss

Uses a prior trained on text-image pairs to refine the text embedding so that it is more consistent with image embeddings.

Regularization Losses (Geometry)

Keeps the surface smooth using techniques such as Laplacian regularization, which penalizes vertices that stray from the average position of their neighbors.
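As an illustration of the geometry term, here is a minimal sketch of a uniform Laplacian penalty in plain Python. The function name `laplacian_loss` and the vertex/edge list layout are assumptions made for this example, not code from this repository; a real pipeline would express the same idea as a differentiable tensor operation.

```python
def laplacian_loss(vertices, edges):
    """Mean squared distance between each vertex and the centroid of its neighbors.

    vertices: list of (x, y, z) tuples (hypothetical layout)
    edges:    list of (i, j) vertex index pairs
    """
    neighbors = {i: set() for i in range(len(vertices))}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    total = 0.0
    for i, v in enumerate(vertices):
        if not neighbors[i]:
            continue  # an isolated vertex contributes nothing
        n = len(neighbors[i])
        centroid = [sum(vertices[j][k] for j in neighbors[i]) / n for k in range(3)]
        total += sum((v[k] - centroid[k]) ** 2 for k in range(3))
    return total / len(vertices)


# A straight chain of three vertices versus the same chain with a spike.
smooth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
spiky = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
chain = [(0, 1), (1, 2)]
print(laplacian_loss(smooth, chain) < laplacian_loss(spiky, chain))
```

The spiked chain receives the larger penalty, which is the behavior that discourages noisy geometry during optimization.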

Regularization Losses (Texture)

Regularizes texture-related maps, such as normal maps, to avoid artifacts.

CLIP-Based Loss

Measures the cosine similarity between the CLIP embedding of the text prompt and the CLIP embeddings of the rendered images.
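A minimal sketch of this loss, assuming the text and image embeddings have already been produced by a CLIP encoder and are available as plain lists of floats. The helper names and the choice of the negative mean similarity as the loss are assumptions for illustration; this README does not specify the exact form.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_loss(text_embedding, image_embeddings):
    """Negative mean cosine similarity between one prompt and several renders.

    Lower is better: a loss of -1.0 means every render matches the prompt exactly.
    """
    sims = [cosine_similarity(text_embedding, e) for e in image_embeddings]
    return -sum(sims) / len(sims)


# Toy 2D embeddings: one render aligned with the prompt, one orthogonal to it.
print(clip_loss([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0]]))
```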

Viewpoint and Augmentation Losses

Encourages consistent results across camera viewpoints and image augmentations.

The Unsolved Problem of CLIP-Loss

CLIP is trained on 2D images only, and this bias makes it hard to generate consistent 3D shapes. Without sufficient constraints, optimization can produce tangled or noisy meshes. Regularization constraints and viewpoint augmentation mitigate these issues, but they increase computational demands.

Methodology

Baseline Approaches

Dream Fields and CLIP-Mesh: Evaluate single views at specific elevations.
DreamFusion: Averages across azimuths to reduce variance.
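The difference between the two baseline styles can be sketched as follows. This is an illustration only: `score_view` stands in for any function that renders the mesh from a camera position and returns a score, and the z-up spherical convention, the default of eight azimuths, and all function names are assumptions not taken from the cited papers.

```python
import math


def camera_position(azimuth_deg, elevation_deg, radius=1.0):
    """Camera location on a sphere around the object (z-up convention assumed)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (
        radius * math.cos(el) * math.cos(az),
        radius * math.cos(el) * math.sin(az),
        radius * math.sin(el),
    )


def single_view_score(score_view, elevation_deg, azimuth_deg=0.0):
    """Score one view at a fixed elevation."""
    return score_view(camera_position(azimuth_deg, elevation_deg))


def azimuth_averaged_score(score_view, elevation_deg, n_views=8):
    """Average the score over evenly spaced azimuths at one elevation."""
    azimuths = [360.0 * i / n_views for i in range(n_views)]
    scores = [score_view(camera_position(az, elevation_deg)) for az in azimuths]
    return sum(scores) / len(scores)


# Toy scorer that depends strongly on the viewing direction.
toy_score = lambda position: position[0]
print(single_view_score(toy_score, 0.0), azimuth_averaged_score(toy_score, 0.0))
```

With a view-dependent scorer, the single-view result reflects only the chosen camera, while the averaged result does not depend on any one direction.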

Proposed Information-Theoretic Weighting Approach

Preprocessing Pipeline

Generate four distinct views per mesh.
Compute Shannon entropy for each view as a measure of informational complexity.
Apply exponential weighting based on entropy using a hyperparameter p.
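The entropy and weighting steps above can be sketched as follows. The exact weighting formula is not given here, so this example assumes weights proportional to entropy raised to the power p with p >= 0; a form such as exp(p * entropy) would be an equally plausible reading. The function names and the flat list of pixel intensities are also assumptions.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of the intensity distribution of one view."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_weights(entropies, p):
    """Weights proportional to entropy ** p, normalized to sum to 1 (assumed form)."""
    raw = [h ** p for h in entropies]
    total = sum(raw)
    if total == 0:
        # Every view is flat: fall back to uniform weights.
        return [1.0 / len(entropies)] * len(entropies)
    return [r / total for r in raw]


# Four toy views, each a flat list of grayscale intensities.
views = [
    [0, 0, 255, 255],   # two equally likely values
    [0, 255, 0, 255],   # same distribution, different layout
    [0, 85, 170, 255],  # four equally likely values
    [7, 7, 7, 7],       # constant image
]
entropies = [shannon_entropy(v) for v in views]
print(entropies, entropy_weights(entropies, p=1))
```

Under this assumed form, p = 0 reduces to a plain unweighted mean over views, and larger p concentrates weight on the most information-rich views.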

Enhanced Dataset Schema

Each 3D mesh is enriched with:

Human-annotated quality scores (MOS).
View-specific weights and R-Precision scores.
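One possible in-memory layout for such a record is sketched below. The class and field names (`MeshRecord`, `ViewRecord`, `mesh_id`, and so on) are hypothetical and chosen only to mirror the items listed above; the actual storage format used in the notebooks may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ViewRecord:
    """Per-view quantities for one rendered view of a mesh."""
    entropy: float
    weight: float
    r_precision: float


@dataclass
class MeshRecord:
    """One mesh with its human quality score and per-view data."""
    mesh_id: str
    mos: float
    views: List[ViewRecord] = field(default_factory=list)


record = MeshRecord(
    mesh_id="example_mesh",  # placeholder identifier
    mos=3.5,                 # placeholder score, not real data
    views=[ViewRecord(entropy=1.0, weight=0.25, r_precision=1.0) for _ in range(4)],
)
print(len(record.views))
```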

CLIP R-Precision Calculation

Compute individual R-Precision scores for each view.
Apply entropy-based weights.
Generate a weighted mean R-Precision score.
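These three steps can be sketched as follows, assuming that each view already has CLIP similarity scores against a set of candidate prompts, and that a view counts as a hit when the true prompt ranks first. The function names and the list-of-lists input are assumptions for illustration.

```python
def view_r_precision(similarities, true_index):
    """1.0 if the true prompt is the top-ranked candidate for this view, else 0.0."""
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return 1.0 if best == true_index else 0.0


def weighted_r_precision(view_similarities, true_index, weights):
    """Weighted mean of the per-view scores."""
    scores = [view_r_precision(s, true_index) for s in view_similarities]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Four views scored against two candidate prompts; prompt 0 is the true one.
sims = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
print(weighted_r_precision(sims, 0, [0.25, 0.25, 0.25, 0.25]))
print(weighted_r_precision(sims, 0, [0.1, 0.6, 0.2, 0.1]))
```

The second call shows how weighting changes the result: the one view that misses the true prompt carries most of the weight, so the score drops relative to the uniform case.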

Dataset

The project uses the LIRIS lab's Graphics-LPIPS dataset:

Over 343,000 stimuli from 55 source models.
A subset of 3,000 stimuli with human-annotated quality scores (MOS).

Experimental Design

Loss Calculation

Compute weighted CLIP R-Precision scores for various values of p.

Linear Probe Training

Use 5-fold cross-validation to train a linear probe mapping R-Precision scores to MOS values.
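A minimal sketch of this step in plain Python, assuming a single scalar feature (the weighted R-Precision score) per mesh and folds assigned by index without shuffling. The function names are hypothetical; in practice a library such as scikit-learn could replace both helpers.

```python
def fit_linear_probe(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant feature: predict the mean
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def cross_validated_predictions(xs, ys, k=5):
    """Predict each sample with a probe trained on the other k - 1 folds."""
    predictions = [None] * len(xs)
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        slope, intercept = fit_linear_probe(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        for i in test_idx:
            predictions[i] = slope * xs[i] + intercept
    return predictions


# Synthetic, exactly linear toy data (not real MOS values).
scores = [i / 10 for i in range(10)]
mos = [2.0 * s + 1.0 for s in scores]
print(cross_validated_predictions(scores, mos))
```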

Performance Evaluation

Evaluate predicted MOS values against human-annotated ground truth using Mean Squared Error (MSE).
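For reference, the standard definition of MSE is given below. The symbols are assumptions introduced here: N is the number of evaluated stimuli, \hat{y}_i is the MOS predicted by the linear probe for stimulus i, and y_i is the human-annotated MOS.

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2
```

A lower MSE means the weighted R-Precision score is a better linear predictor of human judgments.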

References

Khalid et al., 2022: CLIP-Mesh
Jain et al., 2022: Dream Fields
Nehmé et al., 2023: Graphics-LPIPS Dataset

For more details, refer to the full research paper.
