Research: AIGenBench Benchmark Analysis — Architecture, Dataset Overlap & Submission Strategy by Vihaan-Singhal1 · Pull Request #67 · McMasterAI-Society/DeepFakeDetector

Vihaan-Singhal1 · 2026-02-24T19:21:03Z

Summary

Closes #65

This PR delivers docs/research/aigenbench_analysis.md — a research-paper-quality analysis of the AI-GenBench benchmark covering all three requirements from issue #65.

1. Architecture Analysis of Top-Performing Models

Benchmark results table for all evaluated baselines (ViT-L/14 DINOv2, ViT-L/14 CLIP, ResNet-50 CLIP) with AUROC and accuracy
Full mathematical treatment: multi-head self-attention, DINOv2 self-distillation loss (teacher-student), residual block formula
Key design choices from the companion paper (arXiv:2511.21507): why full fine-tuning beats linear probing, why resize beats multi-crop, why JPEG augmentation is the single most impactful factor
Comparison table of our current models vs AIGenBench SOTA with estimated generalization AUROC and gap analysis
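
For reference, the three expressions named in this section have the following standard forms. The analysis document's own notation is not reproduced in this PR description, so the symbols below are the conventional ones and may differ from those in aigenbench_analysis.md. The distillation term shown is only the image-level teacher-student cross-entropy from DINO; DINOv2 adds a patch-level objective and a regularizer, which are omitted here.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

$$
\mathrm{head}_i = \mathrm{Attention}\!\left(XW_i^{Q},\, XW_i^{K},\, XW_i^{V}\right), \qquad
\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O}
$$

$$
\mathcal{L}_{\mathrm{distill}} = -\sum_{k} p_t^{(k)} \log p_s^{(k)}, \qquad
p_s = \mathrm{softmax}\!\left(\frac{z_s}{\tau_s}\right), \qquad
p_t = \mathrm{softmax}\!\left(\frac{z_t - c}{\tau_t}\right)
$$

$$
\theta_t \leftarrow \lambda\,\theta_t + (1 - \lambda)\,\theta_s, \qquad
y = x + \mathcal{F}(x)
$$

Here $z_s$ and $z_t$ are the student and teacher outputs, $c$ is the teacher centering term, $\tau_s$ and $\tau_t$ are temperatures, the teacher weights $\theta_t$ are an exponential moving average of the student weights $\theta_s$, and $\mathcal{F}$ is the residual branch of a ResNet block.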

2. Dataset Overlap Analysis (Critical Finding)

DRAGON directly consolidates Synthbuster and GenImage — both are AIGenBench source datasets
OpenFake contains SD 1.5/2.1/XL, DALL-E 3, and FLUX — overlapping AIGenBench windows w7–w8
WildFake's GAN content overlaps ForenSynths (windows w0–w3)
Result: contamination in 7 of 9 evaluation windows — current training data cannot be used for a valid leaderboard submission
Includes Jaccard overlap coefficients per window and a clean data decision matrix
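
As a rough illustration of how a per-window Jaccard overlap coefficient could be computed, the sketch below compares a set of training generators against the generators in each evaluation window. The OpenFake generator names come from the list above; the window contents (`window_generators`) are hypothetical placeholders, not the actual AIGenBench window assignments, and the analysis document may define overlap at a different granularity (for example, per image source rather than per generator).

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty sets have zero overlap
    return len(a & b) / len(union)


# Generators listed for OpenFake in the PR description.
training_generators = {"SD 1.5", "SD 2.1", "SDXL", "DALL-E 3", "FLUX"}

# Hypothetical window contents, for illustration only.
window_generators = {
    "w7": {"SDXL", "DALL-E 3", "Midjourney"},
    "w8": {"FLUX", "SD 3"},
}

# Overlap coefficient per window, and the windows with any shared generator.
overlap = {w: jaccard(training_generators, g) for w, g in window_generators.items()}
contaminated = [w for w, j in overlap.items() if j > 0.0]
```

Any window with a non-zero coefficient shares at least one generator with the training data and would count as contaminated under this definition.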

3. Testing Strategy and Submission Process

Dataset access, environment setup, and PyTorch Lightning framework overview
Full DetermAugment pipeline specification (JPEG compression, blur, noise, resize)
Algorithm 1 temporal training pseudocode
Submission requirements (public codebase + arXiv report)
Step-by-step adaptation plan for integrating our models into the AIGenBench framework
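
The temporal protocol can be sketched roughly as follows. This is not Algorithm 1 from the analysis document, which is not reproduced in this PR description. It assumes that windows are ordered chronologically, that training is cumulative over windows 0..t, and that each model is scored on the next unseen window; `train_fn` and `eval_fn` are hypothetical callbacks standing in for the real training and evaluation code.

```python
from statistics import mean


def temporal_splits(num_windows):
    """Yield (train_windows, eval_window) index pairs in chronological order."""
    for t in range(num_windows - 1):
        yield list(range(t + 1)), t + 1


def run_protocol(num_windows, train_fn, eval_fn):
    """Train on windows 0..t, then score on the unseen window t+1.

    Returns the per-step scores and their mean, a simple stand-in for an
    averaged generalization score.
    """
    scores = []
    for train_windows, eval_window in temporal_splits(num_windows):
        model = train_fn(train_windows)           # hypothetical training callback
        scores.append(eval_fn(model, eval_window))  # hypothetical scoring callback
    return scores, mean(scores)


# With 9 windows (w0 to w8) this gives 8 train/evaluate steps.
splits = list(temporal_splits(9))
```

If the benchmark instead trains only on the most recent window at each step, `temporal_splits` would yield `[t]` rather than `list(range(t + 1))`.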

Document Highlights

13 tables covering generator taxonomy, benchmark results, overlap matrices, clean data strategy, performance projections
Math throughout: AUROC integral, temporal generalization score $\bar{G}$, self-attention, DINOv2 distillation loss, F1/Precision/Recall
ASCII visualizations: sliding window protocol diagram, dataset overlap coverage diagram
13 references spanning the full literature from CycleGAN (2017) to DINOv2 (2024)
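
For readers who want to sanity-check the metrics named above, here is a minimal dependency-free sketch. It uses the pairwise form of AUROC (the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half), which equals the area under the ROC curve. The label convention (1 = AI-generated, 0 = real) and the 0.5 decision threshold are assumptions made for the example, not settings taken from the benchmark.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    # Each correctly ordered pair counts 1, each tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(labels, scores, threshold=0.5):
    """Precision, recall and F1 for the positive class at a fixed threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: two real images (label 0) and two generated images (label 1).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(labels, scores)
toy_precision, toy_recall, toy_f1 = precision_recall_f1(labels, scores)
```

The pairwise computation is quadratic in the number of samples, so it is only suitable for small checks; a library implementation would be the practical choice for full evaluation runs.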

Actions Recommended

Upgrade ViT-Base → ViT-L/14 DINOv2 for AIGenBench submission
Add JPEG compression simulation (quality 50–99) to all training augmentation pipelines immediately
Exclude DRAGON and overlap-contaminated OpenFake subsets before any AIGenBench evaluation

🤖 Generated with Claude Code

…erlap, submission strategy

Adds docs/research/aigenbench_analysis.md covering:
- Architecture analysis of top AIGenBench models (ViT-L/14 DINOv2, CLIP, ResNet-50 CLIP) with full math: attention, DINOv2 self-distillation loss, AUROC/F1 formulas
- Critical dataset contamination finding: DRAGON consolidates Synthbuster + GenImage (both AIGenBench sources); OpenFake/WildFake overlap 7 of 9 evaluation windows
- Clean data strategy for valid leaderboard submission
- Testing and submission protocol (PyTorch Lightning framework, DetermAugment pipeline, sliding window evaluation)
- Roadmap: upgrade to ViT-L/14 DINOv2, add JPEG compression augmentation, ensemble with FFT

Closes #65

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

24-page styled PDF generated from aigenbench_analysis.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Vihaan-Singhal1 and others added 2 commits February 24, 2026 19:20

docs: add PDF version of AIGenBench analysis

2b5ede1


Vihaan-Singhal1 wants to merge 2 commits into development from research/aigenbench-analysis
