SG-37902 - Remove duplicate filter conditions in find and summarize operations #409


camiloleal-globant
SG-37902 - Deduplicate filter conditions in the Python API before sending requests

Overview

This change introduces automatic deduplication of filter conditions within the Python API client. This prevents redundant filter data from being sent to the server, which reduces payload size and minimizes network latency. While this adds a small amount of overhead to local processing, the resulting decrease in network transfer time improves the end-to-end performance of find() and summarize() operations. The fix is self-contained within the Python API and requires no server-side changes.

Problem

API requests constructed with duplicate filter conditions are inefficient. They result in:

  • Larger Payloads: Unnecessary data is sent over the network, consuming bandwidth.
  • Increased Latency: Larger requests take more time to transfer, especially on slower networks.
  • Wasted Server Resources: The server must still receive and parse the redundant information.

Solution

A new set of utility functions has been added to shotgun_api3/shotgun.py to recursively clean filters before they are sent to the server.

  • remove_duplicate_filters(filters): The main entry point that sanitizes a list of filters.
  • Core Logic: The implementation uses a set of normalized filter representations to efficiently track and discard duplicates while preserving the original order of the first occurrence.
  • Resilience: The process is wrapped in a try...except block to ensure that if any unexpected error occurs during deduplication, the original, untouched filters are sent to the server, preventing any client-side crashes.
  • Integration: The _translate_filters_list function now calls remove_duplicate_filters before processing, making the change transparent to all API methods that use filters (find, find_one, summarize).
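
The approach described above can be sketched as follows. This is a minimal illustration, not the exact implementation in `shotgun_api3/shotgun.py`; the `_normalize_filter` helper name and the JSON-based normalization are assumptions for the sketch:

```python
import json


def _normalize_filter(f):
    # Hypothetical normalization helper: produce a hashable, canonical
    # representation of a filter. json.dumps with sort_keys=True makes
    # dict key order irrelevant; default=str covers values that are not
    # natively JSON-serializable (e.g. datetimes).
    return json.dumps(f, sort_keys=True, default=str)


def remove_duplicate_filters(filters):
    # Drop duplicate filter conditions while preserving the order of the
    # first occurrence. If anything unexpected goes wrong, return the
    # original filters untouched so the request still goes through.
    try:
        seen = set()
        result = []
        for f in filters:
            key = _normalize_filter(f)
            if key not in seen:
                seen.add(key)
                result.append(f)
        return result
    except Exception:
        return filters
```

The broad `except` is deliberate: deduplication is an optimization, so any failure should degrade to the original behavior rather than break the request.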

By handling deduplication on the client side, the API creates a more efficient and robust integration, saving network bandwidth and reducing the overall request time.

Performance Impact

Client-side deduplication has two main effects: a minor processing overhead and a major payload reduction.

  • The added processing overhead is negligible for most queries (under 0.1ms for 100 filters) and only becomes potentially noticeable in extreme cases with thousands of filters.
  • This overhead is offset by a significant reduction in payload size—up to 94% in tests—which directly translates to faster, more efficient network requests.
Full Benchmark Report

Overview of Tests

I tested the performance overhead and payload reduction of the client-side deduplication feature by running a series of filter translation operations. The tests covered scenarios with a small (10), medium (100), and large (1000) number of filters. For each of these sizes, I tested cases with no duplicates (0%), few duplicates (20%), some duplicates (50%), and many duplicates (90%).

Results Summary

The following table shows the median time it takes to prepare the API request (overhead) and the final size of the data sent to the server (payload). A positive percentage in the 'Impact' column means the feature made the process slower.

| Test Scenario | Time without Fix (ms) | Time with Fix (ms) | Performance Impact | Payload without Fix (bytes) | Payload with Fix (bytes) | Payload Reduction |
| --- | --- | --- | --- | --- | --- | --- |
| Small (10 filters), 0% Dups | 0.0030 | 0.0110 | +222.86% | 691 | 571 | 17.37% |
| Small (10 filters), 20% Dups | 0.0030 | 0.0110 | +232.35% | 687 | 589 | 14.26% |
| Small (10 filters), 50% Dups | 0.0030 | 0.0100 | +191.43% | 725 | 377 | 48.00% |
| Small (10 filters), 90% Dups | 0.0030 | 0.0080 | +142.86% | 713 | 105 | 85.27% |
| Medium (100 filters), 0% Dups | 0.0330 | 0.0980 | +194.28% | 6729 | 4367 | 35.10% |
| Medium (100 filters), 20% Dups | 0.0320 | 0.0940 | +195.61% | 6660 | 3600 | 45.95% |
| Medium (100 filters), 50% Dups | 0.0320 | 0.0910 | +181.99% | 6731 | 2461 | 63.44% |
| Medium (100 filters), 90% Dups | 0.0320 | 0.0810 | +150.78% | 6653 | 655 | 90.15% |
| Large (1000 filters), 0% Dups | 0.3680 | 0.9600 | +160.96% | 66267 | 33703 | 49.14% |
| Large (1000 filters), 20% Dups | 0.3560 | 0.8870 | +148.96% | 66351 | 26776 | 59.64% |
| Large (1000 filters), 50% Dups | 0.3510 | 0.8320 | +136.97% | 66303 | 17170 | 74.10% |
| Large (1000 filters), 90% Dups | 0.3470 | 0.7570 | +117.88% | 66262 | 3914 | 94.09% |

How the Tests Were Run

I ran the tests by directly calling the filter translation logic within the Python API. To get stable results, I ran each scenario 50 times and recorded the median processing time and resulting payload size.
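A harness along these lines can reproduce the methodology. This is an illustrative sketch, not the actual benchmark script: the `build_filters` generator, the inline `dedupe` stand-in for `remove_duplicate_filters`, and the specific filter shapes are assumptions, so absolute numbers will differ from the table above:

```python
import json
import statistics
import timeit


def build_filters(n, dup_ratio):
    # Hypothetical generator: n simple filters, where roughly dup_ratio
    # of the entries repeat an earlier filter.
    unique = max(1, int(n * (1 - dup_ratio)))
    base = [['sg_field_%d' % i, 'is', i] for i in range(unique)]
    return [base[i % unique] for i in range(n)]


def dedupe(filters):
    # Stand-in for remove_duplicate_filters: same order-preserving,
    # set-based deduplication idea.
    seen, out = set(), []
    for f in filters:
        key = json.dumps(f, sort_keys=True, default=str)
        if key not in seen:
            seen.add(key)
            out.append(f)
    return out


# Run the operation 50 times and take the median, as in the report.
filters = build_filters(100, 0.5)
overhead_ms = statistics.median(
    timeit.repeat(lambda: dedupe(filters), number=1, repeat=50)) * 1000

# Approximate payload size as the length of the serialized filter list.
payload_before = len(json.dumps(filters))
payload_after = len(json.dumps(dedupe(filters)))
print(overhead_ms, payload_before, payload_after)
```

Using the median rather than the mean keeps one-off scheduler hiccups from skewing the timing results.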

Example

The following example demonstrates how the filter list is cleaned before being sent to the server.

# sg is an existing shotgun_api3.Shotgun connection
project_filter = ['project', 'is', {'type': 'Project', 'id': 123}]

# Define filters with duplicates
filters_with_duplicates = [
    project_filter,
    ['sg_status_list', 'is', 'rev'],
    project_filter,  # DUPLICATE
    ['entity', 'type_is', 'Shot'],
    project_filter   # DUPLICATE
]

# This call will now automatically deduplicate the filters
shots = sg.find('Shot', filters_with_duplicates, ['id', 'code'])

Resulting conditions sent to the server (After client-side deduplication):
The Python API now generates a cleaner, smaller set of conditions to send in the request body.

{
  "logical_operator": "and",
  "conditions": [
    {"path": "project", "relation": "is", "values": [{"type": "Project", "id": 123}]},
    {"path": "sg_status_list", "relation": "is", "values": ["rev"]},
    {"path": "entity", "relation": "type_is", "values": ["Shot"]}
  ]
}

Additional examples including complex nested filters and edge cases can be found in the new test suite at tests/test_unit.py.
