
run tests on real images #66

Open
coolguy-stack wants to merge 1 commit into development from model/cnn_baseline_test_real_images


coolguy-stack (Collaborator) commented Feb 23, 2026

Description of Changes

Linked Issue

Closes #issue_number

Type of Change

  • Model: Machine learning model development, training, or architecture changes
  • Data: Dataset management, preprocessing, or data pipeline work
  • Web: Frontend dashboard or web interface development
  • API: Backend API development or FastAPI endpoints
  • Research: Experimental features or research exploration
  • Documentation: Documentation updates or technical writing
  • Bug Fix: Fixes a bug or issue
  • Refactor: Code cleanup or optimization without functional changes

Screenshots/Results

Assignment Instructions

OpenImages Real-Only Evaluation & Threshold Calibration

I evaluated the current EfficientNet-B0 baseline on a real-only subset of OpenImagesV7 to measure false positive behavior on natural images.

Setup

  • Dataset: OpenImagesV7 (224 real images, all labels = real)
  • Metrics: false positive rate (FPR) and the distribution of fake-probability scores (prob_fake)
  • Model default threshold (from config): 0.452

Results (real-only)

  • At default threshold 0.452:
    • FPR ≈ 53.6%
    • Score stats: mean ≈ 0.51, p95 ≈ 0.96, max ≈ 0.99
  → The model is overconfident on natural photos: about 5% of these real images score above 0.96, and more than half are flagged as fake at the default threshold.
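A minimal sketch of how the real-only numbers above could be computed from a list of prob_fake scores. It assumes an image is flagged as fake when prob_fake ≥ threshold (the PR does not state whether the comparison is ≥ or >), and `real_only_report` is a hypothetical helper, not part of the project. The 95th percentile uses linear interpolation via Python's `statistics.quantiles`.

```python
import statistics


def real_only_report(prob_fake, threshold):
    """Summarize model scores on a real-only set (every label is 'real').

    Assumption: an image is flagged as fake when prob_fake >= threshold,
    so every flagged image is a false positive.
    """
    n = len(prob_fake)
    false_positives = sum(1 for p in prob_fake if p >= threshold)
    # 'inclusive' = linear interpolation between order statistics;
    # index 94 of the 99 cut points is the 95th percentile.
    p95 = statistics.quantiles(prob_fake, n=100, method="inclusive")[94]
    return {
        "fpr": false_positives / n,
        "mean": statistics.fmean(prob_fake),
        "p95": p95,
        "max": max(prob_fake),
    }


# Toy usage with made-up scores (not the OpenImages results):
report = real_only_report([0.1, 0.3, 0.5, 0.9], threshold=0.452)
print(report)
```

Because every image in this subset is real, any positive prediction is a false positive, so the FPR is simply the fraction of scores at or above the threshold.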

Threshold calibration on OpenImages (real-only)
I swept the decision threshold to find the value that achieves each target FPR on these real images:

| Target FPR | Threshold | Actual FPR |
|---|---|---|
| 20% | 0.8900 | 20.1% |
| 10% | 0.9466 | 10.3% |
| 5% | 0.9607 | 5.36% |
| 1% | 0.9788 | 1.34% |

A practical operating point for ~5% FPR on real photos is threshold = 0.9607.
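One way to reproduce a sweep like this is sketched below. It assumes the threshold for each target is the (1 − target) quantile of the real-image scores with linear interpolation (the same rule as NumPy's default quantile method) and that images are flagged when prob_fake ≥ threshold; the PR does not say which method was used, and `calibrate_thresholds` / `quantile_linear` are hypothetical names.

```python
import math


def quantile_linear(sorted_scores, q):
    """Linear-interpolation quantile of an ascending list (NumPy's default rule)."""
    pos = q * (len(sorted_scores) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_scores[lo] + (sorted_scores[hi] - sorted_scores[lo]) * (pos - lo)


def calibrate_thresholds(real_scores, targets=(0.20, 0.10, 0.05, 0.01)):
    """For each target FPR, take the (1 - target) quantile of real-image scores
    as the threshold and report the FPR actually achieved on those scores.

    Assumption: an image is flagged as fake when prob_fake >= threshold.
    Returns a list of (target_fpr, threshold, actual_fpr) tuples.
    """
    s = sorted(real_scores)
    rows = []
    for target in targets:
        thr = quantile_linear(s, 1.0 - target)
        actual = sum(1 for p in s if p >= thr) / len(s)
        rows.append((target, thr, actual))
    return rows


# Toy usage: 224 evenly spaced, distinct scores (synthetic, not real model output).
toy_scores = [(i + 0.5) / 224 for i in range(224)]
for target, thr, actual in calibrate_thresholds(toy_scores):
    print(f"target={target:.0%}  threshold={thr:.4f}  actual={actual:.2%}")
```

With 224 real images the FPR can only move in steps of 1/224 ≈ 0.45%, so the achieved FPR lands slightly above each target. Under these assumptions (and with no tied scores), the flagged counts are 45, 23, 12 and 3 images, i.e. 20.1%, 10.3%, 5.36% and 1.34%, which is consistent with the Actual FPR column.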

Tradeoff on the in-distribution real/fake test set
On the standard test split (containing both real and fake images), the calibrated threshold removes false positives but misses most fakes:

  • At default threshold 0.452:
    • FPR (real) ≈ 18%
    • TPR (fake) ≈ 88%
  • At calibrated threshold 0.9607:
    • FPR (real) ≈ 0%
    • TPR (fake) ≈ 24%
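Comparing the two operating points on a labeled split only needs FPR on the real images and TPR on the fake images at each threshold. The sketch below assumes labels are booleans (True = fake) and that an image is flagged when prob_fake ≥ threshold; `rates_at_threshold` is a hypothetical helper, and the scores in the example are made up.

```python
def rates_at_threshold(prob_fake, is_fake, threshold):
    """Return (FPR on real images, TPR on fake images) at one threshold.

    Assumptions: is_fake[i] is True for fake images and False for real ones,
    and an image is flagged as fake when prob_fake >= threshold.
    """
    real = [p for p, y in zip(prob_fake, is_fake) if not y]
    fake = [p for p, y in zip(prob_fake, is_fake) if y]
    fpr = sum(p >= threshold for p in real) / len(real)  # real images wrongly flagged
    tpr = sum(p >= threshold for p in fake) / len(fake)  # fake images correctly flagged
    return fpr, tpr


# Toy usage comparing the default and calibrated thresholds:
scores = [0.2, 0.5, 0.97, 0.3, 0.6, 0.99]
labels = [False, False, False, True, True, True]
for thr in (0.452, 0.9607):
    print(thr, rates_at_threshold(scores, labels, thr))
```

Raising the threshold can only lower (or keep) both rates at once, which is why the calibrated threshold trades the 18% FPR for a much lower TPR.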

Additional Comments

Checklist

  • The branch has been rebased with the latest development branch
  • This PR has the project manager assigned as a reviewer
  • This PR has myself assigned as the assignee
