DEEPFAKE DETECTION

This documentation is updated only loosely and may lag behind the code.

$ git clone https://github.com/BumpiestDig10/deepfake-detector.git
$ cd deepfake-detector
$ python -m venv venv
$ source venv/bin/activate  # Linux/macOS
$ venv\Scripts\activate     # Windows
$ pip install -r allRequirements.txt

To run the Deepfake Training Orchestrator (UI Dashboard with all the tools)

On the dashboard, the parameters for every tool have both "Browse Files" and "Browse Folders" buttons; use whichever matches the kind of input that parameter actually expects.

python -m ui.dashboard

To use the detector

Input may be a single image, multiple images, a folder containing one or more images, or a CSV file of features (the column headers must run from "feature_0" to "feature_2047").
Use the --showOutput flag with caution. It is not very refined and consumes a lot of RAM, depending on the number of images to display.

python -m detectors.2048FeatureDetector --input "path/to/input" --modelPath "path/to/model.joblib" --featureExtractor "(OPTIONAL) ResNet50 OR InceptionV3" --weights "(OPTIONAL) imagenet" --output "(OPTIONAL) path/to/outputDirectory" [--showOutput]
# --modelPath is optional if "results/imageModels/ResNet50_imagenet/32kModel/randomForest/best_random_forest_model.joblib" exists.
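
Before passing a feature CSV to the detector, it can help to confirm that the header row really contains all 2048 expected columns. The following is a minimal sketch using only the standard library; `missing_feature_columns` and `check_feature_csv` are hypothetical helpers and are not part of this repository.

```python
import csv

# The detector expects headers feature_0 .. feature_2047 (2048 columns).
EXPECTED_COLUMNS = [f"feature_{i}" for i in range(2048)]


def missing_feature_columns(header):
    """Return the expected feature columns that are absent from a CSV header."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def check_feature_csv(csv_path):
    """Read only the header row of csv_path and report missing feature columns."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return missing_feature_columns(header)
```

An empty list from `check_feature_csv("path/to/features.csv")` means every expected column is present; extra columns are ignored by this check.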

To calculate SHA256 hash of a file

python -m utils.filehash --input "path/to/inputFile"
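
For reference, a SHA256 file hash can be computed with Python's standard `hashlib` module. This is a sketch of the general technique, not the code in `utils/filehash.py`; the function name and chunk size are assumptions.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use constant even for large files.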

To run the InceptionV3 Feature Extractor

python -m utils.featureExtractor.InceptionV3_image_feature_extractor --input "relativePath/to/input_directory" --output "(OPTIONAL) relativePath/to/output_file" --weights "(OPTIONAL) imagenet"

To run the ResNet50 Feature Extractor

python -m utils.featureExtractor.ResNet50_image_feature_extractor --input "relativePath/to/input_directory" --output "(OPTIONAL) relativePath/to/output_file" --weights "(OPTIONAL) imagenet"

To run the Metadata Parser

python -m utils.metadata.metadata_parser --input "relativePath/to/input_directory" --output "(OPTIONAL) relativePath/to/output_file"

To download image datasets from Hugging Face

python -m utils.preprocessor.hf_to_image --dataset "huggingFace/Dataset" --split "(OPTIONAL) train" --output "(OPTIONAL) relativePath/to/output_directory" --token "(OPTIONAL) huggingFaceAccessToken"

To run the Instagram Profile Downloader (no private accounts)

Change the username and password in the script if needed. The credentials included there belong to a burner account and may not work for you.
To download several profiles in sequence, create instaProfile/usernames.txt listing the usernames. The file is not required for a single profile.

python -m utils.preprocessor.insta_profile_download

To run the Reddit Downloader

This tool works as a Chrome extension to bypass Reddit login issues.
Downloads everything to the Downloads/ folder.
Logs can be checked using Chrome's DevTools Console.

- Navigate to chrome://extensions/ using Google Chrome.
- Enable Developer Mode.
- Select "Load Unpacked".
- Select the folder with all files of the extension. Default should be utils/preprocessor/reddit_downloader.

To find and delete duplicate files in a particular directory (Windows-only)

Update the target folder in duplicate_finder.ps1.

Execute the script in PowerShell.

$ ./utils/preprocessor/duplicate_finder.ps1 # From project root directory or
$ ./duplicate_finder.ps1                    # if CWD = utils/preprocessor/
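
On platforms where the PowerShell script cannot be used, the same idea can be approximated in Python. The sketch below assumes duplicates are defined as files with identical content (same SHA256 digest); how duplicate_finder.ps1 actually matches files is not described here, so treat this as an illustration only. It lists duplicates and does not delete anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder):
    """Group files under folder by SHA256 digest; return groups of 2+ files."""
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; fine for images, costly for huge files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group holds the paths that share one digest, so every entry after the first in a group is a candidate for removal.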

To Train a Random Forest Model

Make sure you check the parameter grid before running.

python -m trainers.RandomForestTrainer --input "path/to/features.csv" --output "(OPTIONAL) path/to/outputDirectory" --test_size (OPTIONAL) 0.2 --random_state (OPTIONAL) 420
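
The `--test_size` and `--random_state` options suggest a seeded hold-out split. The sketch below illustrates that idea with the standard library only. It assumes the test count is rounded up, which agrees with the 25,409 / 6,353 split reported in the results for 31,762 images; `split_indices` is a hypothetical helper, not the trainers' actual code.

```python
import math
import random


def split_indices(n_samples, test_size=0.2, random_state=420):
    """Return (train_indices, test_indices) from a seeded shuffle of row indices."""
    n_test = math.ceil(n_samples * test_size)  # assumption: test count rounds up
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)  # same seed gives the same split
    return indices[n_test:], indices[:n_test]
```

Using a fixed seed means a rerun with the same input file evaluates on the same held-out rows, which keeps results comparable across models.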

To Train an XGBoost Model

Make sure you check the parameter grid before running.

python -m trainers.XGBoostTrainer --input "path/to/features.csv" --output "(OPTIONAL) path/to/outputDirectory" --test_size (OPTIONAL) 0.2 --random_state (OPTIONAL) 420

To Train a CatBoost Model

Make sure you check the parameter grid before running.
Known issue: KeyboardInterrupt is not always handled cleanly.

python -m trainers.CatBoostTrainer --input "path/to/features.csv" --output "(OPTIONAL) path/to/outputDirectory" --test_size (OPTIONAL) 0.2 --random_state (OPTIONAL) 420

TODO: (for images branch)

Note

Args:

filehash.py: input
2048FeaturesDetector.py: input, modelPath (optional if the default model exists), featureExtractor (optional), weights (optional), output (optional), showOutput (a toggle; omit this flag if you do not want to view every photo with its prediction)
/utils/featureExtractor/
- InceptionV3_image_feature_extractor.py: input, output (optional), weights (optional)
- ResNet50_image_feature_extractor.py: input, output (optional), weights (optional)
/utils/metadata/
- metadata_parser.py: input, output (optional)
/utils/preprocessor/
- csv_mapNmerge.py: base, label
- hf_to_image.py: dataset, split (optional), output (optional), token (optional)
- real_fake_csv_merger.py: real, fake, output (optional)
/trainers/
- RandomForestTrainer.py: input, output (optional), test_size (optional), random_state (optional)
- XGBoostTrainer.py: input, output (optional), test_size (optional), random_state (optional)
- CatBoostTrainer.py: input, output (optional), test_size (optional), random_state (optional)

Labels:

Real = 1
Fake = 0
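
To make the label convention concrete, here is a small sketch that tags rows from a "real" source with 1 and rows from a "fake" source with 0 before combining them. The column name `label` and the helper `merge_with_labels` are assumptions for illustration; they do not describe how real_fake_csv_merger.py is implemented.

```python
# Label convention used in this project: Real = 1, Fake = 0.
LABELS = {"real": 1, "fake": 0}


def merge_with_labels(real_rows, fake_rows, label_column="label"):
    """Return one list of row dicts, each tagged with its class label."""
    merged = []
    for source, rows in (("real", real_rows), ("fake", fake_rows)):
        for row in rows:
            # Copy the row so the caller's dicts are left unchanged.
            merged.append({**row, label_column: LABELS[source]})
    return merged
```

Because Real is the positive class (1), metrics such as precision are reported with respect to real images unless averaged across classes.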

RESULTS

32k Models

Model	Random Forest	XGBoost
Dataset Type	Images	Images
Dataset Size	Total: 31,762; Real: 15,364; Fake: 16,398	Total: 31,762; Real: 15,364; Fake: 16,398
Feature Extractor	ResNet50 (imagenet)	ResNet50 (imagenet)
Data Split	Train: 80% (25,409 images); Test: 20% (6,353 images)	Train: 80% (25,409 images); Test: 20% (6,353 images)
Parameters	n_estimators: 150, max_depth: null, min_samples_split: 5, min_samples_leaf: 1, max_features: 0.2, bootstrap: false	n_estimators: 100, max_depth: 3, learning_rate: 0.01, min_child_weight: 1, gamma: 0, reg_alpha: 0, reg_lambda: 1, subsample: 0.8, colsample_bytree: 0.8, scale_pos_weight: 1
Report Path	Classification Report | Full Results	Classification Report | Full Results
Accuracy	0.854	0.869
Precision	0.854	0.869
F1 Score	0.854	0.869
Matthews Correlation Coefficient	0.7070566271285679	0.7389886538917796
Cohen's Kappa	0.7070160654228586	0.7388856764484775
Balanced Accuracy	0.8536314517473194	0.8696438988673973
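
For readers unfamiliar with the less common metrics in the table, the sketch below computes balanced accuracy, the Matthews correlation coefficient, and Cohen's kappa from a binary confusion matrix using their standard definitions. The function name is hypothetical and any counts passed to it are the reader's own; nothing here reproduces the confusion matrices behind the reported results.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute agreement metrics from binary confusion-matrix counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Balanced accuracy: mean of the recall on each class.
    balanced_accuracy = (tp / (tp + fn) + tn / (tn + fp)) / 2
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
        "kappa": kappa,
    }
```

On a nearly balanced dataset such as this one, these three metrics tend to track each other closely, which is consistent with the similar MCC and kappa values in the table.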

Datasets Used:

JamieWithofs/Deepfake-and-real-images-4
StyleGan-StyleGan2 Deepfake Face Images
Fake-Vs-Real-Faces (Hard)
Images scraped from Instagram and Reddit

Important

This project is under the MIT license but the datasets used may be under different licenses. You must comply with them all when using any of the trained models from this repository.
