@codeflash-ai codeflash-ai bot commented Nov 12, 2025

📄 27% (0.27x) speedup for CogView4DenoiseInvocation._prepare_cfg_scale in invokeai/app/invocations/cogview4_denoise.py

⏱️ Runtime: 39.1 microseconds → 30.8 microseconds (best of 157 runs)

📝 Explanation and details

The optimized code achieves a 27% speedup through three key micro-optimizations:

1. Reduced Attribute Lookups:
The original code reads self.cfg_scale multiple times (up to three times on some code paths). The optimization binds it once to a local variable, cfg_scale = self.cfg_scale, so later uses are fast local-variable reads instead of repeated attribute lookups, which involve dictionary operations in Python.

2. Faster Type Checking:
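Replaced isinstance(self.cfg_scale, float) and isinstance(self.cfg_scale, list) with type(cfg_scale) is float and type(cfg_scale) is list. The type(...) is check is faster because it compares against one exact type and skips the subclass handling that isinstance() performs. The two checks differ only for subclasses: an instance of a float or list subclass (numpy.float64, for example, subclasses float) passes isinstance() but fails the exact check and would fall through to the error branch. The change is therefore safe provided cfg_scale is always exactly a float or a list, which is what the generated tests exercise.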
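The difference between the two checks only shows up with subclasses. A minimal sketch, assuming a hypothetical float subclass named Scale that does not appear in the PR:

```python
class Scale(float):
    """Hypothetical float subclass, used only to illustrate the difference."""


plain = 2.5
derived = Scale(2.5)

# isinstance() accepts the exact type and any subclass of it.
plain_isinstance = isinstance(plain, float)
derived_isinstance = isinstance(derived, float)

# An exact type comparison accepts only the type itself.
plain_exact = type(plain) is float
derived_exact = type(derived) is float

print(plain_isinstance, plain_exact)      # True True
print(derived_isinstance, derived_exact)  # True False
```

If callers can ever pass a subclass instance, the exact check changes behavior for that input, so the swap is a trade of generality for speed rather than a free win.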

3. Early Returns:
Changed the control flow to return directly from each branch instead of assigning to a variable and returning at the end. This eliminates the extra variable assignment and the final return statement execution in most cases.
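The three changes can be sketched side by side. This is a reconstruction from the description above and the behavior the generated tests expect, not the actual diff; the class DenoiseSketch and the method names prepare_before and prepare_after are hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DenoiseSketch:
    """Hypothetical stand-in for the invocation; only cfg_scale is modeled."""

    cfg_scale: Union[float, list]

    def prepare_before(self, num_timesteps: int) -> list:
        # Described "original" shape: repeated self.cfg_scale reads,
        # isinstance() checks, and one return at the end.
        if isinstance(self.cfg_scale, float):
            result = [self.cfg_scale] * num_timesteps
        elif isinstance(self.cfg_scale, list):
            assert len(self.cfg_scale) == num_timesteps
            result = self.cfg_scale
        else:
            raise ValueError(f"Invalid CFG scale type: {type(self.cfg_scale)}")
        return result

    def prepare_after(self, num_timesteps: int) -> list:
        # Described "optimized" shape: one attribute read, exact type
        # comparisons, and a return from each branch.
        cfg_scale = self.cfg_scale
        if type(cfg_scale) is float:
            return [cfg_scale] * num_timesteps
        if type(cfg_scale) is list:
            assert len(cfg_scale) == num_timesteps
            return cfg_scale
        raise ValueError(f"Invalid CFG scale type: {type(cfg_scale)}")
```

For plain float and list inputs both methods return the same value, for example `DenoiseSketch(2.5).prepare_after(3)` gives `[2.5, 2.5, 2.5]`, and both raise AssertionError when a list has the wrong length.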

Performance Impact by Test Case:
The optimizations show consistent improvements across all test scenarios:

  • Basic cases: 21-48% faster (float/list with typical timesteps)
  • Edge cases: 12-22% faster (error conditions, zero timesteps)
  • Large scale: 17-36% faster (1000+ timesteps)
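Per-call timings this small are easy to check locally. The report does not describe its measurement harness beyond "best of 157 runs", so the following is only a rough sketch using the standard-library timeit module; Holder, expand_isinstance, expand_exact_type, and best_time_per_call are hypothetical names, and the numbers will vary by machine and Python version:

```python
import timeit


class Holder:
    """Hypothetical stand-in object exposing only a cfg_scale attribute."""

    def __init__(self, cfg_scale):
        self.cfg_scale = cfg_scale


def expand_isinstance(obj, num_timesteps):
    # Repeated attribute reads plus isinstance().
    if isinstance(obj.cfg_scale, float):
        return [obj.cfg_scale] * num_timesteps
    raise ValueError("unsupported cfg_scale type")


def expand_exact_type(obj, num_timesteps):
    # One attribute read plus an exact type comparison.
    cfg_scale = obj.cfg_scale
    if type(cfg_scale) is float:
        return [cfg_scale] * num_timesteps
    raise ValueError("unsupported cfg_scale type")


def best_time_per_call(func, obj, num_timesteps, repeat=5, number=100_000):
    """Return the best per-call time in seconds over `repeat` timing runs."""
    runs = timeit.repeat(lambda: func(obj, num_timesteps), repeat=repeat, number=number)
    return min(runs) / number


if __name__ == "__main__":
    holder = Holder(2.5)
    for func in (expand_isinstance, expand_exact_type):
        seconds = best_time_per_call(func, holder, 3)
        print(f"{func.__name__}: {seconds * 1e9:.0f} ns per call")
```

Taking the minimum over several repeats, as timeit's documentation suggests, reduces noise from other processes; differences of a few hundred nanoseconds should still be read with caution.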

Why This Matters:
These micro-optimizations are particularly valuable in image generation pipelines where _prepare_cfg_scale could be called frequently during the denoising process. The consistent speedups across all test cases indicate the optimizations don't introduce performance regressions in any scenario while providing meaningful gains in a potentially hot code path.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 70 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 85.7% |

🌀 Generated Regression Tests and Runtime

import pytest
from invokeai.app.invocations.cogview4_denoise import CogView4DenoiseInvocation

# unit tests

# --- BASIC TEST CASES ---

def test_float_cfg_scale_basic():
    # Test with float cfg_scale and small num_timesteps
    invocation = CogView4DenoiseInvocation(cfg_scale=2.5)
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.23μs -> 914ns (34.8% faster)

def test_list_cfg_scale_basic():
    # Test with list cfg_scale of correct length
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0, 3.0])
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.11μs -> 759ns (45.7% faster)

def test_float_cfg_scale_one_step():
    # Test with float cfg_scale and one timestep
    invocation = CogView4DenoiseInvocation(cfg_scale=4.2)
    codeflash_output = invocation._prepare_cfg_scale(1); result = codeflash_output # 949ns -> 783ns (21.2% faster)

def test_list_cfg_scale_one_step():
    # Test with list cfg_scale and one timestep
    invocation = CogView4DenoiseInvocation(cfg_scale=[7.7])
    codeflash_output = invocation._prepare_cfg_scale(1); result = codeflash_output # 1.07μs -> 723ns (47.4% faster)

# --- EDGE TEST CASES ---

def test_list_cfg_scale_wrong_length_raises():
    # Test with list cfg_scale of wrong length (should raise AssertionError)
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(3) # 1.54μs -> 1.35μs (14.1% faster)

def test_cfg_scale_zero_timesteps_float():
    # Test with zero timesteps and float cfg_scale
    invocation = CogView4DenoiseInvocation(cfg_scale=1.5)
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.02μs -> 808ns (25.9% faster)

def test_cfg_scale_zero_timesteps_list():
    # Test with zero timesteps and empty list cfg_scale
    invocation = CogView4DenoiseInvocation(cfg_scale=[])
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.05μs -> 732ns (44.1% faster)

def test_cfg_scale_zero_timesteps_list_nonempty():
    # Test with zero timesteps and non-empty list cfg_scale (should raise AssertionError)
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(0) # 1.52μs -> 1.24μs (22.7% faster)

def test_cfg_scale_invalid_type_int():
    # Test with invalid cfg_scale type (int)
    invocation = CogView4DenoiseInvocation(cfg_scale=5)
    with pytest.raises(ValueError):
        invocation._prepare_cfg_scale(2)

def test_float_cfg_scale_large_timesteps():
    # Test with float cfg_scale and large num_timesteps
    large_steps = 999
    invocation = CogView4DenoiseInvocation(cfg_scale=2.0)
    codeflash_output = invocation._prepare_cfg_scale(large_steps); result = codeflash_output # 1.55μs -> 1.26μs (22.6% faster)

def test_list_cfg_scale_large_timesteps():
    # Test with large list cfg_scale
    large_steps = 999
    cfg_list = [float(i) for i in range(large_steps)]
    invocation = CogView4DenoiseInvocation(cfg_scale=cfg_list)
    codeflash_output = invocation._prepare_cfg_scale(large_steps); result = codeflash_output # 1.22μs -> 870ns (40.5% faster)

def test_cfg_scale_list_length_one_large_timesteps():
    # Test with list of length one and large timesteps (should raise AssertionError)
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(999) # 1.56μs -> 1.38μs (13.2% faster)

def test_cfg_scale_empty_list_large_timesteps():
    # Test with empty list and large timesteps (should raise AssertionError)
    invocation = CogView4DenoiseInvocation(cfg_scale=[])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(999) # 1.52μs -> 1.36μs (12.2% faster)

# --- MISCELLANEOUS CASES ---

def test_cfg_scale_list_with_boolean_element():
    # Test with list containing a boolean element
    invocation = CogView4DenoiseInvocation(cfg_scale=[2.0, True, 4.0])
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.30μs -> 882ns (47.6% faster)

def test_cfg_scale_list_with_negative_elements():
    # Test with list containing negative floats
    invocation = CogView4DenoiseInvocation(cfg_scale=[-1.0, -2.5, -3.3])
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.08μs -> 770ns (39.9% faster)

def test_cfg_scale_float_negative():
    # Test with negative float cfg_scale
    invocation = CogView4DenoiseInvocation(cfg_scale=-5.5)
    codeflash_output = invocation._prepare_cfg_scale(2); result = codeflash_output # 981ns -> 815ns (20.4% faster)

def test_cfg_scale_float_zero():
    # Test with float cfg_scale set to zero
    invocation = CogView4DenoiseInvocation(cfg_scale=0.0)
    codeflash_output = invocation._prepare_cfg_scale(5); result = codeflash_output # 956ns -> 764ns (25.1% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

#------------------------------------------------
from typing import Optional

# imports

import pytest
from invokeai.app.invocations.cogview4_denoise import CogView4DenoiseInvocation

# unit tests

# ---------- BASIC TEST CASES ----------

def test_float_cfg_scale_basic():
    """Test: cfg_scale is a float, num_timesteps is a typical positive integer."""
    invocation = CogView4DenoiseInvocation(cfg_scale=2.5)
    codeflash_output = invocation._prepare_cfg_scale(5); result = codeflash_output # 1.40μs -> 897ns (55.6% faster)

def test_list_cfg_scale_basic():
    """Test: cfg_scale is a list of floats, length matches num_timesteps."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0, 3.0, 4.0])
    codeflash_output = invocation._prepare_cfg_scale(4); result = codeflash_output # 1.14μs -> 747ns (52.1% faster)

def test_float_cfg_scale_one_timestep():
    """Test: cfg_scale is a float, num_timesteps is 1."""
    invocation = CogView4DenoiseInvocation(cfg_scale=7.0)
    codeflash_output = invocation._prepare_cfg_scale(1); result = codeflash_output # 1.00μs -> 777ns (28.8% faster)

def test_list_cfg_scale_one_timestep():
    """Test: cfg_scale is a list with one element, num_timesteps is 1."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[8.5])
    codeflash_output = invocation._prepare_cfg_scale(1); result = codeflash_output # 1.02μs -> 763ns (33.7% faster)

# ---------- EDGE TEST CASES ----------

def test_list_cfg_scale_length_mismatch_shorter():
    """Test: cfg_scale is a list shorter than num_timesteps (should raise AssertionError)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(3) # 1.50μs -> 1.39μs (8.36% faster)

def test_list_cfg_scale_length_mismatch_longer():
    """Test: cfg_scale is a list longer than num_timesteps (should raise AssertionError)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0, 3.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(2) # 1.42μs -> 1.23μs (15.7% faster)

def test_cfg_scale_invalid_type_int():
    """Test: cfg_scale is an int (invalid type, should raise ValueError)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=5)
    with pytest.raises(ValueError):
        invocation._prepare_cfg_scale(3)

def test_zero_timesteps_with_float_cfg_scale():
    """Test: num_timesteps is zero, cfg_scale is float (should return empty list)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=2.5)
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.20μs -> 914ns (31.7% faster)

def test_zero_timesteps_with_list_cfg_scale():
    """Test: num_timesteps is zero, cfg_scale is an empty list."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[])
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.13μs -> 765ns (47.8% faster)

def test_negative_timesteps_float_cfg_scale():
    """Test: num_timesteps is negative, cfg_scale is float (should return list of negative length, which is empty)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=1.0)
    codeflash_output = invocation._prepare_cfg_scale(-2); result = codeflash_output # 996ns -> 882ns (12.9% faster)

def test_negative_timesteps_list_cfg_scale():
    """Test: num_timesteps is negative, cfg_scale is a list (should raise AssertionError)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(-2) # 1.57μs -> 1.33μs (18.2% faster)

# ---------- LARGE SCALE TEST CASES ----------

def test_large_float_cfg_scale():
    """Test: cfg_scale is float, num_timesteps is large (1000)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=3.3)
    codeflash_output = invocation._prepare_cfg_scale(1000); result = codeflash_output # 1.45μs -> 1.24μs (17.7% faster)

def test_large_list_cfg_scale():
    """Test: cfg_scale is a large list (1000 elements), num_timesteps matches."""
    large_list = [float(i) for i in range(1000)]
    invocation = CogView4DenoiseInvocation(cfg_scale=large_list)
    codeflash_output = invocation._prepare_cfg_scale(1000); result = codeflash_output # 1.10μs -> 811ns (36.3% faster)

def test_large_list_cfg_scale_length_mismatch():
    """Test: cfg_scale is a large list (1000 elements), num_timesteps is less (999), should raise AssertionError."""
    large_list = [float(i) for i in range(1000)]
    invocation = CogView4DenoiseInvocation(cfg_scale=large_list)
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(999) # 1.59μs -> 1.38μs (15.5% faster)

def test_large_empty_list_cfg_scale():
    """Test: cfg_scale is an empty list, num_timesteps is 0."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[])
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.04μs -> 823ns (26.7% faster)

def test_large_float_cfg_scale_zero_timesteps():
    """Test: cfg_scale is float, num_timesteps is 0 (should return empty list)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=5.5)
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 898ns -> 751ns (19.6% faster)

# ---------- ADDITIONAL EDGE CASES ----------

def test_cfg_scale_list_of_ints():
    """Test: cfg_scale is a list of ints (should not raise, but returns list as-is)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1, 2, 3])
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.01μs -> 709ns (42.3% faster)

def test_cfg_scale_tuple_type():
    """Test: cfg_scale is a tuple, should raise ValueError."""
    invocation = CogView4DenoiseInvocation(cfg_scale=(1.0, 2.0))
    with pytest.raises(ValueError):
        invocation._prepare_cfg_scale(2)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-CogView4DenoiseInvocation._prepare_cfg_scale-mhvlzzz9`, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 12, 2025 06:18
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 12, 2025