Outlier detection in embeddings
Download clip_visual.onnx from here.
Download model_wat.onnx from here.
Extract features with image_text_features_web and copy them to the folder ./clean/.
train_gmm.py -> trains a GMM on the features from ./clean/
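As a rough illustration of what such a training step could look like, here is a minimal sketch that fits a Gaussian mixture to feature vectors and scores new samples. It assumes scikit-learn's GaussianMixture; the actual train_gmm.py may use a different library, feature file format, or number of components. The function name, the synthetic data, and all parameter values below are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def train_gmm(features: np.ndarray, n_components: int = 2, seed: int = 0) -> GaussianMixture:
    """Fit a GMM to an (n_samples, n_dims) array of clean feature vectors."""
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="full",
        random_state=seed,
    )
    gmm.fit(features)
    return gmm


# Synthetic stand-in for the features in ./clean/ (assumption: the real
# features are CLIP embeddings, which are not reproduced here).
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(500, 8))
outliers = rng.normal(10.0, 1.0, size=(20, 8))

gmm = train_gmm(clean)

# score_samples returns one log-likelihood per sample; higher means
# the sample looks more like the training data.
clean_scores = gmm.score_samples(clean)
outlier_scores = gmm.score_samples(outliers)
print("mean clean score:  ", clean_scores.mean())
print("mean outlier score:", outlier_scores.mean())
```

Samples far from the clean data receive a much lower log-likelihood, which is what makes a single score threshold usable as an outlier filter.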
testing.ipynb -> notebook for comparing the distributions of ./clean/ and ./test/ so that the threshold can be adjusted manually
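The kind of comparison the notebook is used for can be sketched in a few lines: for each candidate threshold, check what share of the clean scores and of the test scores would be rejected. This is only an illustration, not the notebook's code; the function names and score values are made up, and it assumes that a higher score means a more typical image.

```python
def fraction_below(scores, threshold):
    """Share of scores strictly below the threshold (these would be rejected)."""
    return sum(s < threshold for s in scores) / len(scores)


def compare_thresholds(clean_scores, test_scores, thresholds):
    """Return (threshold, clean rejection rate, test rejection rate) rows."""
    return [
        (t, fraction_below(clean_scores, t), fraction_below(test_scores, t))
        for t in thresholds
    ]


# Hypothetical GMM scores for images in ./clean/ and ./test/.
clean_scores = [-5.0, -6.0, -7.0, -8.0]
test_scores = [-6.0, -20.0, -30.0, -7.5]

rows = compare_thresholds(clean_scores, test_scores, [-10.0, -7.0])
for threshold, clean_rate, test_rate in rows:
    print(f"threshold={threshold}: rejects {clean_rate:.0%} of clean, {test_rate:.0%} of test")
```

A good threshold rejects few clean images while still rejecting most of the unwanted ones in the test set.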
anti_sus.py -> ZeroMQ server for filtering outlier images. It receives a batch of RGB numpy images and returns the indices of the good images.
Filtering happens in two steps:
- GMM score threshold
- watermark detection (filters out images with watermarks; the detector is trained on scenery_watermarks)
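The two steps above can be sketched as a single function that takes per-image values and returns the indices of the images that pass. This is a simplified illustration, not the code in anti_sus.py: it assumes the GMM scores and watermark probabilities have already been computed, that a higher GMM score means a more typical image, and that the watermark model outputs a probability. The function name, parameters, and thresholds are hypothetical.

```python
def filter_batch(gmm_scores, watermark_probs, gmm_threshold, watermark_threshold=0.5):
    """Return indices of images that pass both filtering steps."""
    if len(gmm_scores) != len(watermark_probs):
        raise ValueError("gmm_scores and watermark_probs must have the same length")

    good = []
    for i, (score, wm_prob) in enumerate(zip(gmm_scores, watermark_probs)):
        # Step 1: reject images whose GMM score falls below the threshold.
        if score < gmm_threshold:
            continue
        # Step 2: reject images the watermark detector flags.
        if wm_prob >= watermark_threshold:
            continue
        good.append(i)
    return good


# Hypothetical batch of four images.
scores = [-5.0, -50.0, -6.0, -7.0]
wm_probs = [0.1, 0.0, 0.9, 0.2]
good_indices = filter_batch(scores, wm_probs, gmm_threshold=-10.0)
print(good_indices)  # [0, 3]
```

Image 1 is dropped by the GMM threshold and image 2 by the watermark check, so only indices 0 and 3 are returned.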
docker build -t qwertyforce/anti_sus_nomad:1.0.0 --network host -t qwertyforce/anti_sus_nomad:latest ./
docker run -d --network host --name anti_sus_nomad qwertyforce/anti_sus_nomad:1.0.0