This project investigates whether LLM-based prompt enhancement can improve compositional text-to-image generation on the T2I-CompBench benchmark.
Text-to-image models like Stable Diffusion often struggle with compositional prompts that require:
- Numeracy: Generating the correct number of objects ("four ships, two tents")
- Spatial Relations: Positioning objects correctly ("a cat behind a dog")
Our approach uses Claude Opus 4.5 to enhance simple prompts with explicit spatial relations and scale-appropriate attributes, then evaluates whether this improves image generation quality.
| Task | Baseline Win Rate | Enhanced Win Rate | Tie Rate |
|---|---|---|---|
| Numeracy | 20% | 51% | 29% |
| 3D Spatial | 26% | 51% | 23% |
Enhanced prompts outperform baseline prompts in ~51% of cases across both tasks.
```
├── T2I-CompBench_dataset/            # Original benchmark prompts
│   ├── numeracy.txt
│   ├── 3d_spatial.txt
│   └── ...
│
├── prompt_enhanced.py                # LLM-based prompt enhancement
├── stable_diffusion_pipeline.py      # Batch image generation with SD3
├── extract_scene_graphs.py           # Extract scene graphs from prompts
├── iterative_scene_prompt.py         # Generate iterative prompts from scene graphs
│
├── numeracy_val/                     # Numeracy evaluation data
│   ├── sampled_prompts.txt           # 100 baseline prompts
│   ├── enhanced_prompts.txt          # 100 enhanced prompts
│   ├── generated_images_baseline/
│   └── generated_images_enhanced/
│
├── 3d_spatial_val/                   # 3D Spatial evaluation data
│   ├── sampled_prompts.txt
│   ├── enhanced_prompts.txt
│   ├── generated_images_baseline/
│   └── generated_images_enhanced/
│
├── evaluate_unbiased.py              # VLM-as-judge evaluation
├── evaluation_results_unbiased.json
└── evaluation_results_unbiased_3d.json
```
```bash
python prompt_enhanced.py --input_file numeracy_val/sampled_prompts.txt \
    --output_file numeracy_val/enhanced_prompts.json
```

Example transformation:
| Original | Enhanced |
|---|---|
| "four ships, two tents and two fish" | "Four toy ships on a table, with two small tents placed behind the ships and two toy fish in front of the ships." |
The enhancement adds:
- Clear spatial relations (behind, in front of, to the left)
- Scale adjustments (toy, miniature) for coherent scenes
- Simple surface placement (on a table, on the ground)
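As an illustration, the instruction sent to the LLM might be assembled roughly as follows. This is a minimal sketch; the template wording is an assumption, not the actual prompt used by `prompt_enhanced.py`:

```python
def build_enhancement_instruction(prompt: str) -> str:
    """Assemble a hypothetical instruction asking an LLM to enhance a T2I prompt."""
    return (
        "Rewrite the following text-to-image prompt so that it:\n"
        "1. States an explicit spatial relation (behind, in front of, to the left of) "
        "between objects.\n"
        "2. Adds scale cues (toy, miniature) so all objects fit one coherent scene.\n"
        "3. Places objects on a simple surface (on a table, on the ground).\n"
        "Keep every object and count from the original.\n\n"
        f"Original prompt: {prompt}"
    )

instruction = build_enhancement_instruction("four ships, two tents and two fish")
print(instruction.splitlines()[-1])
```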
Generate images using Stable Diffusion for both baseline and enhanced prompts. Images should be named:
```
img_0_prompt_text.png
img_1_prompt_text.png
...
```
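A small helper can produce this naming convention; the sanitization rules here (lowercasing, collapsing punctuation to underscores, truncation) are an assumption and should be adapted to whatever the generation script actually does:

```python
import re

def image_filename(index: int, prompt: str, max_len: int = 60) -> str:
    """Build an img_{index}_{prompt_text}.png filename from a prompt.

    Non-alphanumeric runs are collapsed to underscores and the prompt
    text is truncated so paths stay filesystem-friendly.
    """
    text = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"img_{index}_{text[:max_len]}.png"

print(image_filename(0, "four ships, two tents and two fish"))
# img_0_four_ships_two_tents_and_two_fish.png
```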
```bash
export ANTHROPIC_API_KEY="your-api-key"
python evaluate_unbiased.py --limit 10   # Test with 10 pairs first
python evaluate_unbiased.py              # Full evaluation
```

Evaluation methodology:
- Uses Claude Opus 4.5 as an unbiased judge
- Both images evaluated against the same baseline prompt (fair comparison)
- Image order is randomized to prevent position bias
- Neutral labels ("Image A", "Image B") instead of "Baseline/Enhanced"
- Supports Tie when both images perform equally
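The randomization and un-mapping steps above can be sketched as follows. The judge call is stubbed out; `judge` here is a hypothetical callable returning `"A"`, `"B"`, or `"Tie"`, not the actual interface in `evaluate_unbiased.py`:

```python
import random

def judge_pair(baseline_img, enhanced_img, prompt, judge, rng=random):
    """Present two images in random order under neutral labels and map
    the judge's verdict back to baseline/enhanced/tie."""
    swapped = rng.random() < 0.5  # randomize which image is shown as "A"
    image_a, image_b = (enhanced_img, baseline_img) if swapped else (baseline_img, enhanced_img)
    verdict = judge(prompt, image_a, image_b)  # "A", "B", or "Tie"
    if verdict == "Tie":
        return "tie", swapped
    # "A" is the enhanced image exactly when the order was swapped
    is_enhanced = (verdict == "A") == swapped
    return ("enhanced" if is_enhanced else "baseline"), swapped

# Stub judge that always prefers image A, to show the un-mapping:
rng = random.Random(0)
result, swapped = judge_pair("base.png", "enh.png", "four ships", lambda p, a, b: "A", rng=rng)
```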
**`prompt_enhanced.py`** — Enhances simple prompts with spatial relations using Claude.

```bash
python prompt_enhanced.py --test                    # Test mode
python prompt_enhanced.py --limit 10                # Process first 10
python prompt_enhanced.py --input_file prompts.txt  # Custom input
```

**`stable_diffusion_pipeline.py`** — Generates images using Stable Diffusion 3 Medium with memory optimizations for consumer hardware (CPU offload, fp16).
```bash
# Run the pipeline (configured in __main__)
python stable_diffusion_pipeline.py
```

**`extract_scene_graphs.py`** — Extracts structured scene graphs from natural language prompts.
```bash
python extract_scene_graphs.py --input_file T2I-CompBench_dataset/numeracy.txt \
    --output_file scene_graphs_output/numeracy.json
```

**`iterative_scene_prompt.py`** — Generates progressive prompts by traversing scene graphs step-by-step.
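As an illustration, a scene graph and a step-wise traversal might look like this. The schema is an assumption for the sketch, not the exact output format of `extract_scene_graphs.py`:

```python
# Hypothetical scene-graph schema: objects with counts plus pairwise relations.
scene_graph = {
    "objects": [
        {"name": "ships", "count": 4, "attributes": ["toy"]},
        {"name": "tents", "count": 2, "attributes": ["small"]},
        {"name": "fish", "count": 2, "attributes": ["toy"]},
    ],
    "relations": [
        ("tents", "behind", "ships"),
        ("fish", "in front of", "ships"),
    ],
}

def iterative_prompts(graph):
    """Yield progressively richer prompts: all objects first, then one relation at a time."""
    prompt = ", ".join(
        f"{o['count']} {' '.join(o['attributes'])} {o['name']}" for o in graph["objects"]
    )
    yield prompt
    for subj, rel, obj in graph["relations"]:
        prompt += f", with the {subj} {rel} the {obj}"
        yield prompt

for p in iterative_prompts(scene_graph):
    print(p)
```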
```bash
python iterative_scene_prompt.py --test                               # Quick test
python iterative_scene_prompt.py --scene_graph_file scene_graphs.json
```

**`evaluate_unbiased.py`** — Compares baseline vs. enhanced images using a VLM as judge.
```bash
python evaluate_unbiased.py \
    --baseline_prompts numeracy_val/sampled_prompts.txt \
    --enhanced_prompts numeracy_val/enhanced_prompts.txt \
    --baseline_images numeracy_val/generated_images_baseline/generated_images_baseline \
    --enhanced_images numeracy_val/generated_images_enhanced/generated_images_enhanced \
    --output_file evaluation_results.json
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Requirements:
- Python 3.8+
- anthropic >= 0.39.0
- tqdm >= 4.66.0

Environment:

```bash
export ANTHROPIC_API_KEY="your-api-key-here"
```

The evaluation produces:
```json
{
  "summary": {
    "total_pairs": 100,
    "baseline_wins": 20,
    "enhanced_wins": 51,
    "ties": 29,
    "baseline_win_rate": 0.20,
    "enhanced_win_rate": 0.51,
    "tie_rate": 0.29,
    "position_bias_analysis": {
      "times_A_chosen": 34,
      "times_B_chosen": 37,
      "A_rate": 0.479
    }
  }
}
```

Position Bias Analysis: an `A_rate` close to 0.5 indicates the judge showed little position bias.
- Same Prompt for Both: Both images are evaluated against the baseline prompt only
- Randomized Order: Which image is "A" vs "B" is randomized
- Neutral Labels: No "baseline" or "enhanced" labels shown to the judge
- Tie Support: Judge can declare a tie when both images are equally good/bad
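The summary statistics above can be recomputed from per-pair judgments; here is a sketch, where the per-pair record fields (`winner`, `label_chosen`) are assumptions rather than the exact format written by `evaluate_unbiased.py`:

```python
def summarize(records):
    """Aggregate per-pair judgments into win rates and a position-bias check.

    Each record is assumed to look like:
      {"winner": "baseline" | "enhanced" | "tie",
       "label_chosen": "A" | "B" | None}   # None for ties
    """
    n = len(records)
    wins = {"baseline": 0, "enhanced": 0, "tie": 0}
    a_chosen = b_chosen = 0
    for r in records:
        wins[r["winner"]] += 1
        if r["label_chosen"] == "A":
            a_chosen += 1
        elif r["label_chosen"] == "B":
            b_chosen += 1
    decided = a_chosen + b_chosen
    return {
        "total_pairs": n,
        "baseline_win_rate": wins["baseline"] / n,
        "enhanced_win_rate": wins["enhanced"] / n,
        "tie_rate": wins["tie"] / n,
        # A_rate near 0.5 suggests the judge favors neither position
        "A_rate": a_chosen / decided if decided else None,
    }

records = (
    [{"winner": "enhanced", "label_chosen": "A"}] * 3
    + [{"winner": "baseline", "label_chosen": "B"}]
    + [{"winner": "tie", "label_chosen": None}]
)
print(summarize(records))
```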
- Evaluation uses a VLM (Claude) as judge, which may have its own biases
- Image generation uses a single random seed per prompt
- Sample size is 100 pairs per task
If you use this work, please cite:
```bibtex
@misc{prompt_enhancement_t2i,
  title={LLM-Enhanced Prompt Engineering for Compositional Text-to-Image Generation},
  author={Your Name},
  year={2024},
  howpublished={Course Project, CS 771}
}
```

- T2I-CompBench for the benchmark dataset
- Anthropic Claude for the LLM API
- Stable Diffusion for image generation