SG-37902 - Remove duplicate filter conditions in find and summarize operations #409
SG-37902 - Deduplicate filter conditions in the Python API before sending requests
Overview
This change introduces automatic deduplication of filter conditions in the Python API client. It prevents redundant filter data from being sent to the server, which reduces payload size and network latency. While this adds a small amount of local processing overhead, the resulting decrease in network transfer time improves the end-to-end performance of `find()` and `summarize()` operations. The fix is self-contained within the Python API and requires no server-side changes.

Problem
API requests constructed with duplicate filter conditions are inefficient. They result in:

- Request payloads that are larger than necessary
- Increased network transfer time for `find()` and `summarize()` calls
Solution
A new set of utility functions has been added to `shotgun_api3/shotgun.py` to recursively clean filters before they are sent to the server:

- `remove_duplicate_filters(filters)`: the main entry point that sanitizes a list of filters. It builds a `set` of normalized filter representations to efficiently track and discard duplicates while preserving the original order of each first occurrence.
- The deduplication logic is wrapped in a `try...except` block so that if any unexpected error occurs during deduplication, the original, untouched filters are sent to the server, preventing any client-side crashes.
- The internal `_translate_filters_list` function now calls `remove_duplicate_filters` before processing, making the change transparent to all API methods that use filters (`find`, `find_one`, `summarize`).

By handling deduplication on the client side, the API provides a more efficient and robust integration, saving network bandwidth and reducing overall request time.
Performance Impact
Client-side deduplication has two main effects: a minor processing overhead and a major payload reduction.
Overview of Tests
I tested the performance overhead and payload reduction of the client-side deduplication feature by running a series of filter translation operations. The tests covered scenarios with a small (10), medium (100), and large (1000) number of filters. For each of these sizes, I tested cases with no duplicates (0%), few duplicates (20%), some duplicates (50%), and many duplicates (90%).
Results Summary
The following table shows the median time it takes to prepare the API request (overhead) and the final size of the data sent to the server (payload). A positive percentage in the 'Impact' column means the feature made the process slower.
How the Tests Were Run
I ran the tests by directly calling the filter translation logic within the Python API. To get stable results, I ran each scenario 50 times and recorded the median processing time and resulting payload size.
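A harness in the spirit of the methodology above might look like the sketch below. It is an assumed reconstruction (the actual benchmark script is not shown in this PR): it generates filter lists of a given size and duplicate ratio, times the deduplication step over 50 runs, and reports the median time plus the payload size before and after.

```python
import json
import random
import statistics
import time


def make_filters(n, dup_ratio):
    """Build n simple filters where roughly dup_ratio of them repeat an
    earlier condition. Field names are synthetic."""
    unique = max(1, int(n * (1 - dup_ratio)))
    base = [["sg_field_%d" % i, "is", i] for i in range(unique)]
    filters = base + [random.choice(base) for _ in range(n - unique)]
    random.shuffle(filters)
    return filters


def dedupe(filters):
    """Order-preserving deduplication, as described in the Solution section."""
    seen, out = set(), []
    for f in filters:
        key = json.dumps(f, sort_keys=True)
        if key not in seen:
            seen.add(key)
            out.append(f)
    return out


def benchmark(n, dup_ratio, runs=50):
    """Return (median seconds, payload bytes before, payload bytes after)."""
    filters = make_filters(n, dup_ratio)
    times = []
    deduped = filters
    for _ in range(runs):
        start = time.perf_counter()
        deduped = dedupe(filters)
        times.append(time.perf_counter() - start)
    return (statistics.median(times),
            len(json.dumps(filters)),
            len(json.dumps(deduped)))


median_s, before, after = benchmark(100, 0.5)
print("median %.6fs, payload %d -> %d bytes" % (median_s, before, after))
```

The payload numbers here measure the JSON-serialized filter list, which is a reasonable proxy for the request body size the server would receive.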
Example
The following example demonstrates how the filter list is cleaned before being sent to the server.
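A minimal before/after sketch (the field names and entity IDs are illustrative, not taken from the actual change):

```python
# Hypothetical filter list passed to find(); the second condition is an
# exact duplicate of the first.
filters = [
    ["sg_status_list", "is", "ip"],
    ["sg_status_list", "is", "ip"],  # duplicate, removed client-side
    ["project", "is", {"type": "Project", "id": 65}],
]

# After client-side deduplication, only the first occurrence of each
# condition survives, in its original order.
deduplicated = [
    ["sg_status_list", "is", "ip"],
    ["project", "is", {"type": "Project", "id": 65}],
]
```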
Resulting `conditions` sent to the server (after client-side deduplication): the Python API now generates a cleaner, smaller set of conditions to send in the request body.
Additional examples, including complex nested filters and edge cases, can be found in the new test suite at `tests/test_unit.py`.