Update benchmarks with PDQ Faiss results #1755

b8zhong · 2025-02-06T17:18:10Z

Summary

Update the benchmark results. benchmark_pdq_faiss_matchers.py had no results in the markdown file, so I added those as well.

Test Plan

Yes

Dcallies

A threshold of 255 is very unrealistic and voids your benchmark! We could be even faster by just returning not index.is_empty(), since 255 is the maximum possible distance.
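To illustrate the point about the threshold, here is a minimal sketch that uses plain Python integers as stand-in 256-bit hashes rather than the library's index classes. The helper names (hamming, match_fraction) and the random data are assumptions made for illustration only. Distances between random hashes cluster around 128, so a threshold of 255 matches essentially everything while 31 matches essentially nothing.

```python
import random

BITS = 256  # PDQ hashes are 256 bits wide


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes stored as ints."""
    return bin(a ^ b).count("1")


def match_fraction(query: int, dataset: list, threshold: int) -> float:
    """Fraction of the dataset within `threshold` bits of the query."""
    hits = sum(1 for h in dataset if hamming(query, h) <= threshold)
    return hits / len(dataset)


# Hypothetical random data, sized like the benchmark's dataset_size.
rng = random.Random(0)
dataset = [rng.getrandbits(BITS) for _ in range(10_000)]
query = rng.getrandbits(BITS)

frac_255 = match_fraction(query, dataset, 255)
frac_31 = match_fraction(query, dataset, 31)
print(f"threshold=255 matches {frac_255:.2%} of the dataset")
print(f"threshold=31  matches {frac_31:.2%} of the dataset")
```

With a threshold this permissive the search does no meaningful filtering, so the timing says little about how the index behaves at a realistic threshold.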

Dcallies · 2025-02-10T19:20:57Z

python-threatexchange/benchmarks/README.MD

+	 faiss_threads :  1
+	 dataset_size :  10000
+	 num_queries :  1000
+	 thresholds :  [255]


This is a very unrealistic threshold! We should probably only be testing at 31. This is why your multihash results are so poor: you are forcing it to search every subtree instead of only the nearby ones.

Lol yep; totally right there. I thought a high threshold enforces strict matching. Changed to 31
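A rough sketch of why a large threshold hurts a multi-hash style index, based on the general pigeonhole argument behind multi-index hashing. The band layout (16 bands of 16 bits) is an assumption for illustration; the actual configuration of PDQMultiHashIndex is not shown in this thread. If two hashes differ by at most t bits in total, at least one band differs by at most t // BANDS bits, so each band only needs to be probed within that radius.

```python
from math import comb

BITS = 256                   # PDQ hash width
BANDS = 16                   # assumed number of hash tables (hypothetical)
BAND_BITS = BITS // BANDS    # 16 bits per band under that assumption


def per_band_radius(threshold: int) -> int:
    """Pigeonhole bound: some band must be within this many bit flips."""
    return threshold // BANDS


def probes_per_band(threshold: int) -> int:
    """Number of bucket keys within the per-band radius of a query key."""
    radius = min(per_band_radius(threshold), BAND_BITS)
    return sum(comb(BAND_BITS, k) for k in range(radius + 1))


for t in (31, 255):
    print(
        f"threshold={t}: radius={per_band_radius(t)}, "
        f"probes per band={probes_per_band(t)} of {2 ** BAND_BITS} buckets"
    )
```

Under these assumptions, a threshold of 31 probes 17 buckets per band, while 255 probes 65535 of the 65536 possible buckets, which is effectively a full scan with extra overhead.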

Dcallies · 2025-02-10T19:22:46Z

python-threatexchange/benchmarks/README.MD

@@ -1,45 +1,81 @@
-# pytx-vpdq
-Benchmark vPDQ implementation in threatexchange library
+# pytx-vPDQ


blocking q: I am confused by why this is here instead of python-threatexchange/vpdq/README.md

Any ideas?

Not sure either... maybe not to clutter the docs? I can move it -- up to you

#1122

Don't think there was a specific reasoning originally here either tho

I think it probably makes more sense in vpdq. If you want to move it in a follow-up, I will leave that for you.

Dcallies · 2025-02-10T19:24:08Z

python-threatexchange/benchmarks/benchmark_vpdq_index.py

-    else:
-        raise ValueError("Invalid test type")


This seems useful to keep in, since otherwise the error will surface on L76 as an undefined variable. This else also defends against future developers adding a new test_type and forgetting to update it here.

Kk; I thought argparse would stop before we reached here, so that's why I did it

Reverted it back tho

It would in normal operation - this else is to defend against a future developer adding it to the argparse and then forgetting to add a case here. This error is meant to save them time debugging.
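A minimal sketch of the defensive pattern being discussed. The test type names and runner strings are hypothetical and are not taken from benchmark_vpdq_index.py. Here "signal_type" has been added to the argparse choices but not to the dispatch, which is exactly the case the else branch is meant to catch.

```python
import argparse

# Hypothetical choices; "signal_type" was added here but not handled below.
TEST_TYPES = ("brute_force", "flat", "signal_type")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="benchmark sketch")
    parser.add_argument("test_type", choices=TEST_TYPES)
    return parser


def pick_runner(test_type: str) -> str:
    if test_type == "brute_force":
        runner = "run_brute_force"
    elif test_type == "flat":
        runner = "run_flat"
    else:
        # Without this branch, `runner` would be unbound and the failure
        # would show up later as an UnboundLocalError far from the cause.
        raise ValueError(f"Invalid test type: {test_type}")
    return runner


if __name__ == "__main__":
    args = build_parser().parse_args(["flat"])
    print(pick_runner(args.test_type))
```

argparse rejects values outside choices, so the else branch is unreachable in normal operation. It only fires when the two lists drift apart, and then it fails at the point of the mistake with a clear message.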

Dcallies · 2025-02-11T15:53:08Z

python-threatexchange/benchmarks/README.MD

@@ -1,45 +1,81 @@
-# pytx-vpdq
-Benchmark vPDQ implementation in threatexchange library
+# pytx-vPDQ


I think it probably makes more sense in vpdq. If you want to move it in a follow-up, I will leave that for you.

Dcallies · 2025-02-11T15:54:37Z

python-threatexchange/benchmarks/README.MD

+	PDQFlatHashIndex - Total Time to search  (s):  0.012083053588867188
+	PDQMultiHashIndex - Total Time to search  (s):  0.01529383659362793


Hmm, this is closer to what I would expect, but this might be worth digging more into later. I think we are choosing the wrong thresholds for these.

Update benchmarks with PDQ Faiss results

ddd2292

b8zhong requested a review from Dcallies as a code owner February 6, 2025 17:18

facebook-github-bot added the CLA Signed label Feb 6, 2025

fixes to benchmark script

673e933

Dcallies requested changes Feb 10, 2025

docs: update benchmark

6527a87

Dcallies approved these changes Feb 11, 2025

Dcallies merged commit 092e081 into facebook:main Feb 11, 2025
6 checks passed

b8zhong deleted the py-tx-benchmarks branch February 11, 2025 22:12

b8zhong mentioned this pull request Feb 12, 2025

[vpdq] Move benchmark files #1760

Open
