⚡️ Speed up method `CogView4DenoiseInvocation._prepare_cfg_scale` by 27% #141

codeflash-ai · 2025-11-12T06:18:38Z

📄 27% (1.27x) speedup for `CogView4DenoiseInvocation._prepare_cfg_scale` in `invokeai/app/invocations/cogview4_denoise.py`

⏱️ Runtime : 39.1 microseconds → 30.8 microseconds (best of 157 runs)

📝 Explanation and details

The optimized code achieves a 27% speedup through three key micro-optimizations:

1. Reduced Attribute Lookups:
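The original code reads `self.cfg_scale` multiple times (up to 3x on some code paths). The optimization reads it once into a local, `cfg_scale = self.cfg_scale`, and reuses that local. In CPython an instance attribute read goes through the generic attribute lookup (checking the class, then the instance's attribute storage), whereas a local variable read is an indexed load from the frame, so each repeated read adds a small, avoidable cost.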
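A rough way to observe the cost of repeated attribute reads is a `timeit` micro-benchmark. The sketch below uses a hypothetical `Holder` class as a stand-in for the invocation; absolute timings depend on the Python version and machine, and nothing here measures the real `CogView4DenoiseInvocation`.

```python
import timeit


class Holder:
    """Hypothetical stand-in for an object that exposes a cfg_scale attribute."""

    def __init__(self, cfg_scale):
        self.cfg_scale = cfg_scale


def repeated_reads(obj):
    # Three separate attribute lookups on obj.
    return (obj.cfg_scale, obj.cfg_scale, obj.cfg_scale)


def hoisted_read(obj):
    # One attribute lookup, then three cheap local-variable reads.
    cfg_scale = obj.cfg_scale
    return (cfg_scale, cfg_scale, cfg_scale)


def compare(number=200_000):
    """Return total seconds spent in each variant over `number` calls."""
    obj = Holder(2.5)
    return {
        "repeated": timeit.timeit(lambda: repeated_reads(obj), number=number),
        "hoisted": timeit.timeit(lambda: hoisted_read(obj), number=number),
    }


if __name__ == "__main__":
    print(compare())
```

Both functions return the same tuple; only the number of attribute lookups differs, so any timing gap comes from the lookups alone.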

2. Faster Type Checking:
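Replaced `isinstance(self.cfg_scale, float/list)` with `type(cfg_scale) is float/list`. The `type(...) is` check is an exact identity comparison that skips the subclass handling `isinstance()` performs, which makes it slightly cheaper. The two checks are equivalent only when `cfg_scale` is always exactly a `float` or a `list`: a subclass of either (for example `numpy.float64`, which subclasses `float`) would now fall through to the `ValueError` branch. The change is therefore safe as long as the field's validation guarantees those exact types.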
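The difference between the two checks only shows up for subclasses. The minimal sketch below uses a hypothetical `TaggedFloat` subclass to illustrate it. Whether any subclass can actually reach `_prepare_cfg_scale` depends on how the invocation's field validation coerces inputs, which the PR does not state.

```python
class TaggedFloat(float):
    """Hypothetical float subclass (numpy.float64 is a real-world example of one)."""


def accepts_isinstance(value):
    # Original style: accepts float and any subclass of float.
    return isinstance(value, float)


def accepts_exact(value):
    # Optimized style: accepts only objects whose type is exactly float.
    return type(value) is float


plain = 2.5
tagged = TaggedFloat(2.5)

print(accepts_isinstance(plain), accepts_exact(plain))    # True True
print(accepts_isinstance(tagged), accepts_exact(tagged))  # True False
```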

3. Early Returns:
Changed the control flow to return directly from each branch instead of assigning the result to a variable and returning it at the end. On the success paths this removes one store and one load of the intermediate variable.
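The three changes can be sketched together as follows. This is a reconstruction from the description above, not the code in the PR: `CfgHolder`, `prepare_original`, and `prepare_optimized` are hypothetical names, and the behavior (a float is repeated once per timestep, a list must match `num_timesteps` exactly, anything else raises `ValueError`) is assumed from the test docstrings.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class CfgHolder:
    """Hypothetical stand-in for CogView4DenoiseInvocation's cfg_scale field (no validation)."""

    cfg_scale: Union[float, List[float]]


def prepare_original(holder: CfgHolder, num_timesteps: int) -> List[float]:
    # "Before" style: isinstance checks, repeated attribute reads, single return at the end.
    if isinstance(holder.cfg_scale, float):
        result = [holder.cfg_scale] * num_timesteps
    elif isinstance(holder.cfg_scale, list):
        assert len(holder.cfg_scale) == num_timesteps
        result = holder.cfg_scale
    else:
        raise ValueError(f"Invalid cfg_scale type: {type(holder.cfg_scale)}")
    return result


def prepare_optimized(holder: CfgHolder, num_timesteps: int) -> List[float]:
    # 1. Read the attribute once into a local.
    cfg_scale = holder.cfg_scale
    # 2. Exact type checks instead of isinstance.
    if type(cfg_scale) is float:
        # 3. Return directly from each branch.
        return [cfg_scale] * num_timesteps
    if type(cfg_scale) is list:
        assert len(cfg_scale) == num_timesteps
        return cfg_scale
    raise ValueError(f"Invalid cfg_scale type: {type(cfg_scale)}")


if __name__ == "__main__":
    for holder, steps in [(CfgHolder(2.5), 3), (CfgHolder([1.0, 2.0]), 2)]:
        assert prepare_original(holder, steps) == prepare_optimized(holder, steps)
    print(prepare_optimized(CfgHolder(2.5), 3))  # [2.5, 2.5, 2.5]
```

For exact `float` and `list` inputs the two versions agree on results and on which exception is raised; they diverge only for subclasses.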

Performance Impact by Test Case:
Every generated test that reports a timing ran faster with the optimized code. Reported ranges:

- Basic cases: 21-48% faster (float/list with typical timesteps)
- Edge cases: 12-22% faster (error conditions, zero timesteps)
- Large scale: 17-36% faster (~1000 timesteps)
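The percentages appear to follow the convention speedup = old runtime / new runtime − 1, so a 27% speedup corresponds to a runtime reduction of only about 21%. The helper names below are hypothetical; the inputs are the figures reported in this PR.

```python
def speedup(old: float, new: float) -> float:
    """Relative speedup: how much faster the new code runs (old / new - 1)."""
    return old / new - 1


def runtime_reduction(old: float, new: float) -> float:
    """Fraction of the original runtime that was removed."""
    return (old - new) / old


# Overall benchmark, in microseconds: 39.1 -> 30.8.
print(f"{speedup(39.1, 30.8):.1%}")            # 26.9%
print(f"{runtime_reduction(39.1, 30.8):.1%}")  # 21.2%

# One per-test figure: 949 ns -> 783 ns is reported as "21.2% faster".
print(f"{speedup(949, 783):.1%}")              # 21.2%
```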

Why This Matters:
These micro-optimizations matter most in image generation pipelines where `_prepare_cfg_scale` may be called frequently during the denoising process. In absolute terms, the reported savings are roughly 0.1 to 0.5 µs per call, so the end-to-end benefit depends on how often the method runs. None of the generated test cases showed a regression.

✅ Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 70 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 85.7% |

🌀 Generated Regression Tests and Runtime

```python
import pytest
from invokeai.app.invocations.cogview4_denoise import CogView4DenoiseInvocation

# unit tests

# --- BASIC TEST CASES ---

def test_float_cfg_scale_basic():
    # Test with float cfg_scale and small num_timesteps
    invocation = CogView4DenoiseInvocation(cfg_scale=2.5)
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.23μs -> 914ns (34.8% faster)

def test_list_cfg_scale_basic():
    # Test with list cfg_scale of correct length
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0, 3.0])
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.11μs -> 759ns (45.7% faster)

def test_float_cfg_scale_one_step():
    # Test with float cfg_scale and one timestep
    invocation = CogView4DenoiseInvocation(cfg_scale=4.2)
    codeflash_output = invocation._prepare_cfg_scale(1); result = codeflash_output # 949ns -> 783ns (21.2% faster)

def test_list_cfg_scale_one_step():
    # Test with list cfg_scale and one timestep
    invocation = CogView4DenoiseInvocation(cfg_scale=[7.7])
    codeflash_output = invocation._prepare_cfg_scale(1); result = codeflash_output # 1.07μs -> 723ns (47.4% faster)

# --- EDGE TEST CASES ---

def test_list_cfg_scale_wrong_length_raises():
    # Test with list cfg_scale of wrong length (should raise AssertionError)
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(3) # 1.54μs -> 1.35μs (14.1% faster)

def test_cfg_scale_zero_timesteps_float():
    # Test with zero timesteps and float cfg_scale
    invocation = CogView4DenoiseInvocation(cfg_scale=1.5)
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.02μs -> 808ns (25.9% faster)

def test_cfg_scale_zero_timesteps_list():
    # Test with zero timesteps and empty list cfg_scale
    invocation = CogView4DenoiseInvocation(cfg_scale=[])
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.05μs -> 732ns (44.1% faster)

def test_cfg_scale_zero_timesteps_list_nonempty():
    # Test with zero timesteps and non-empty list cfg_scale (should raise AssertionError)
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(0) # 1.52μs -> 1.24μs (22.7% faster)

def test_cfg_scale_invalid_type_int():
    # Test with invalid cfg_scale type (int)
    invocation = CogView4DenoiseInvocation(cfg_scale=5)
    with pytest.raises(ValueError):
        invocation._prepare_cfg_scale(2)

def test_float_cfg_scale_large_timesteps():
    # Test with float cfg_scale and large num_timesteps
    large_steps = 999
    invocation = CogView4DenoiseInvocation(cfg_scale=2.0)
    codeflash_output = invocation._prepare_cfg_scale(large_steps); result = codeflash_output # 1.55μs -> 1.26μs (22.6% faster)

def test_list_cfg_scale_large_timesteps():
    # Test with large list cfg_scale
    large_steps = 999
    cfg_list = [float(i) for i in range(large_steps)]
    invocation = CogView4DenoiseInvocation(cfg_scale=cfg_list)
    codeflash_output = invocation._prepare_cfg_scale(large_steps); result = codeflash_output # 1.22μs -> 870ns (40.5% faster)

def test_cfg_scale_list_length_one_large_timesteps():
    # Test with list of length one and large timesteps (should raise AssertionError)
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(999) # 1.56μs -> 1.38μs (13.2% faster)

def test_cfg_scale_empty_list_large_timesteps():
    # Test with empty list and large timesteps (should raise AssertionError)
    invocation = CogView4DenoiseInvocation(cfg_scale=[])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(999) # 1.52μs -> 1.36μs (12.2% faster)

# --- MISCELLANEOUS CASES ---

def test_cfg_scale_list_with_boolean_element():
    # Test with list containing a boolean element
    invocation = CogView4DenoiseInvocation(cfg_scale=[2.0, True, 4.0])
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.30μs -> 882ns (47.6% faster)

def test_cfg_scale_list_with_negative_elements():
    # Test with list containing negative floats
    invocation = CogView4DenoiseInvocation(cfg_scale=[-1.0, -2.5, -3.3])
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.08μs -> 770ns (39.9% faster)

def test_cfg_scale_float_negative():
    # Test with negative float cfg_scale
    invocation = CogView4DenoiseInvocation(cfg_scale=-5.5)
    codeflash_output = invocation._prepare_cfg_scale(2); result = codeflash_output # 981ns -> 815ns (20.4% faster)

def test_cfg_scale_float_zero():
    # Test with float cfg_scale set to zero
    invocation = CogView4DenoiseInvocation(cfg_scale=0.0)
    codeflash_output = invocation._prepare_cfg_scale(5); result = codeflash_output # 956ns -> 764ns (25.1% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
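The `codeflash_output` pattern amounts to differential testing: run the original and optimized implementations on the same inputs and require identical results, or identical exception types. The harness below is a minimal sketch of that idea under that assumption, not Codeflash's actual implementation; `outcome`, `outputs_match`, `repeat_v1`, and `repeat_v2` are hypothetical.

```python
from typing import Any, Callable, Iterable, Tuple


def outcome(fn: Callable[..., Any], *args: Any) -> Tuple[str, Any]:
    """Return ("ok", value) on success or ("raised", exception type) on failure."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:  # record only the type so both versions can be compared
        return ("raised", type(exc))


def outputs_match(original: Callable[..., Any], optimized: Callable[..., Any], cases: Iterable[tuple]) -> bool:
    """True if both functions agree on every case, including which exception is raised."""
    return all(outcome(original, *case) == outcome(optimized, *case) for case in cases)


def repeat_v1(value, n):
    # Toy "original": isinstance check, assign, single return.
    if isinstance(value, float):
        result = [value] * n
    else:
        raise ValueError("expected float")
    return result


def repeat_v2(value, n):
    # Toy "optimized": exact type check, early return.
    if type(value) is float:
        return [value] * n
    raise ValueError("expected float")


cases = [(2.5, 3), (0.0, 0), (1.0, -2), (5, 2)]
print(outputs_match(repeat_v1, repeat_v2, cases))  # True
```

Adding a float-subclass input to `cases` would make `outputs_match` return `False`, which is the kind of divergence this style of check is meant to surface.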

```python
# imports

import pytest
from invokeai.app.invocations.cogview4_denoise import CogView4DenoiseInvocation

# unit tests

# ---------- BASIC TEST CASES ----------

def test_float_cfg_scale_basic():
    """Test: cfg_scale is a float, num_timesteps is a typical positive integer."""
    invocation = CogView4DenoiseInvocation(cfg_scale=2.5)
    codeflash_output = invocation._prepare_cfg_scale(5); result = codeflash_output # 1.40μs -> 897ns (55.6% faster)

def test_list_cfg_scale_basic():
    """Test: cfg_scale is a list of floats, length matches num_timesteps."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0, 3.0, 4.0])
    codeflash_output = invocation._prepare_cfg_scale(4); result = codeflash_output # 1.14μs -> 747ns (52.1% faster)

def test_float_cfg_scale_one_timestep():
    """Test: cfg_scale is a float, num_timesteps is 1."""
    invocation = CogView4DenoiseInvocation(cfg_scale=7.0)
    codeflash_output = invocation._prepare_cfg_scale(1); result = codeflash_output # 1.00μs -> 777ns (28.8% faster)

def test_list_cfg_scale_one_timestep():
    """Test: cfg_scale is a list with one element, num_timesteps is 1."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[8.5])
    codeflash_output = invocation._prepare_cfg_scale(1); result = codeflash_output # 1.02μs -> 763ns (33.7% faster)

# ---------- EDGE TEST CASES ----------

def test_list_cfg_scale_length_mismatch_shorter():
    """Test: cfg_scale is a list shorter than num_timesteps (should raise AssertionError)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(3) # 1.50μs -> 1.39μs (8.36% faster)

def test_list_cfg_scale_length_mismatch_longer():
    """Test: cfg_scale is a list longer than num_timesteps (should raise AssertionError)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0, 3.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(2) # 1.42μs -> 1.23μs (15.7% faster)

def test_cfg_scale_invalid_type_int():
    """Test: cfg_scale is an int (invalid type, should raise ValueError)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=5)
    with pytest.raises(ValueError):
        invocation._prepare_cfg_scale(3)

def test_zero_timesteps_with_float_cfg_scale():
    """Test: num_timesteps is zero, cfg_scale is float (should return empty list)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=2.5)
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.20μs -> 914ns (31.7% faster)

def test_zero_timesteps_with_list_cfg_scale():
    """Test: num_timesteps is zero, cfg_scale is an empty list."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[])
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.13μs -> 765ns (47.8% faster)

def test_negative_timesteps_float_cfg_scale():
    """Test: num_timesteps is negative, cfg_scale is float (should return an empty list, since a list times a negative int is empty)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=1.0)
    codeflash_output = invocation._prepare_cfg_scale(-2); result = codeflash_output # 996ns -> 882ns (12.9% faster)

def test_negative_timesteps_list_cfg_scale():
    """Test: num_timesteps is negative, cfg_scale is a list (should raise AssertionError)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1.0, 2.0])
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(-2) # 1.57μs -> 1.33μs (18.2% faster)

# ---------- LARGE SCALE TEST CASES ----------

def test_large_float_cfg_scale():
    """Test: cfg_scale is float, num_timesteps is large (1000)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=3.3)
    codeflash_output = invocation._prepare_cfg_scale(1000); result = codeflash_output # 1.45μs -> 1.24μs (17.7% faster)

def test_large_list_cfg_scale():
    """Test: cfg_scale is a large list (1000 elements), num_timesteps matches."""
    large_list = [float(i) for i in range(1000)]
    invocation = CogView4DenoiseInvocation(cfg_scale=large_list)
    codeflash_output = invocation._prepare_cfg_scale(1000); result = codeflash_output # 1.10μs -> 811ns (36.3% faster)

def test_large_list_cfg_scale_length_mismatch():
    """Test: cfg_scale is a large list (1000 elements), num_timesteps is less (999), should raise AssertionError."""
    large_list = [float(i) for i in range(1000)]
    invocation = CogView4DenoiseInvocation(cfg_scale=large_list)
    with pytest.raises(AssertionError):
        invocation._prepare_cfg_scale(999) # 1.59μs -> 1.38μs (15.5% faster)

def test_large_empty_list_cfg_scale():
    """Test: cfg_scale is an empty list, num_timesteps is 0."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[])
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 1.04μs -> 823ns (26.7% faster)

def test_large_float_cfg_scale_zero_timesteps():
    """Test: cfg_scale is float, num_timesteps is 0 (should return empty list)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=5.5)
    codeflash_output = invocation._prepare_cfg_scale(0); result = codeflash_output # 898ns -> 751ns (19.6% faster)

# ---------- ADDITIONAL EDGE CASES ----------

def test_cfg_scale_list_of_ints():
    """Test: cfg_scale is a list of ints (should not raise, but returns list as-is)."""
    invocation = CogView4DenoiseInvocation(cfg_scale=[1, 2, 3])
    codeflash_output = invocation._prepare_cfg_scale(3); result = codeflash_output # 1.01μs -> 709ns (42.3% faster)

def test_cfg_scale_tuple_type():
    """Test: cfg_scale is a tuple, should raise ValueError."""
    invocation = CogView4DenoiseInvocation(cfg_scale=(1.0, 2.0))
    with pytest.raises(ValueError):
        invocation._prepare_cfg_scale(2)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes, run `git checkout codeflash/optimize-CogView4DenoiseInvocation._prepare_cfg_scale-mhvlzzz9` and push.


codeflash-ai bot requested a review from mashraf-222 November 12, 2025 06:18

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Nov 12, 2025
