[py-tx] Implementation of IVF faiss indices in PDQ #1756

Open: wants to merge 2 commits into base: main

b8zhong (Contributor) commented Feb 6, 2025

Summary

PDQ now uses an IVF Faiss index for better scalability (as we move away from the old implementation).

  • Adds PDQSignalTypeIndex2, which automatically switches between flat and IVF indices based on dataset size
  • Uses an IVF Faiss index for datasets with >= 1000 entries (as you said) and a flat index for smaller datasets
  • Maintains backward compatibility with the existing PDQIndex
  • Updates signal.py to use the new index implementation by default (if we're not swapping yet, I'll change it back)
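
To illustrate the size-based switch described above, here is a minimal sketch, not the code from this PR. The names `choose_index_kind`, `suggest_nlist`, and `build_index` are hypothetical, and the square-root rule for the number of IVF clusters is an assumption made only for this example. The 1000-entry threshold comes from the description, and 256 is the PDQ hash size in bits. `faiss.IndexBinaryFlat` and `faiss.IndexBinaryIVF` are existing Faiss binary index classes, but whether PDQSignalTypeIndex2 uses them this way is not confirmed here.

```python
import math

IVF_THRESHOLD = 1000  # from the PR description: IVF at >= 1000 entries
PDQ_BITS = 256        # PDQ hashes are 256 bits long


def choose_index_kind(n_entries: int, threshold: int = IVF_THRESHOLD) -> str:
    """Return "ivf" for datasets at or above the threshold, else "flat"."""
    return "ivf" if n_entries >= threshold else "flat"


def suggest_nlist(n_entries: int) -> int:
    """Hypothetical heuristic: about sqrt(n) clusters, never fewer than 1."""
    return max(1, math.isqrt(n_entries))


def build_index(n_entries: int):
    """Build an empty Faiss binary index sized for n_entries (assumed usage)."""
    import faiss  # imported lazily so the helpers above work without Faiss

    if choose_index_kind(n_entries) == "flat":
        # Exhaustive Hamming-distance search; needs no training.
        return faiss.IndexBinaryFlat(PDQ_BITS)
    # IVF partitions hashes into clusters; the flat index is the coarse quantizer.
    quantizer = faiss.IndexBinaryFlat(PDQ_BITS)
    index = faiss.IndexBinaryIVF(quantizer, PDQ_BITS, suggest_nlist(n_entries))
    # An IVF index must be trained (index.train) before hashes are added.
    return index
```

With these assumptions, a 999-entry dataset would get a flat index and a 1000-entry dataset would get an IVF index with 31 clusters.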

Test Plan

Run the tests:

python3 -m pytest threatexchange/signal_type/tests/test_pdq_signal_type_index2.py -v -W ignore::DeprecationWarning

Passes (?)

@b8zhong b8zhong requested a review from Dcallies as a code owner February 6, 2025 20:31
@b8zhong b8zhong changed the title Implementation of IVF faiss indices in PD [py-tx] Implementation of IVF faiss indices in PDQ Feb 6, 2025

2 participants