⚡️ Speed up method `NumpyExtensionArray.kurt` by 7% #316

codeflash-ai · 2025-11-12T08:25:39Z

📄 7% (0.07x) speedup for `NumpyExtensionArray.kurt` in `pandas/core/arrays/numpy_.py`

⏱️ Runtime : 2.08 milliseconds → 1.94 milliseconds (best of 62 runs)
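A "best of N runs" figure like the one above can be approximated with the standard library. The helper below is a minimal sketch, not Codeflash's actual harness; `best_of` and its parameters are hypothetical names introduced here for illustration.

```python
import timeit


def best_of(func, repeat=5, number=100):
    """Return the fastest observed per-call time in seconds.

    `func` is any zero-argument callable. Each of the `repeat` batches
    calls it `number` times; the minimum batch is used because it is the
    measurement least disturbed by other activity on the machine.
    """
    batch_times = timeit.repeat(func, repeat=repeat, number=number)
    return min(batch_times) / number


# Example workload (a stand-in; any zero-argument callable works).
elapsed = best_of(lambda: sum(range(100)), repeat=3, number=10)
```

To time the method discussed in this PR, one could pass something like `lambda: nea.kurt()` for a prepared `NumpyExtensionArray` named `nea`.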

📝 Explanation and details

The optimized code achieves a **7% speedup** through several targeted memory and computation optimizations in the `nankurt` function:

**Key Performance Optimizations:**

1. **Single-call masking**: Replaced `values.copy()` followed by `np.putmask()` with a single `np.where(mask, 0, values)` call. `np.where` still allocates its result, so this removes the separate in-place write rather than the allocation itself. The generated tests with scattered NaNs run roughly 7-12% slower overall, so a benefit for NaN-heavy input is not supported by these measurements.
2. **Early termination for edge cases**: Added upfront checks for insufficient data (`count < 4`) to return `np.nan` immediately, avoiding the statistical computation on inputs for which kurtosis is undefined. This gives the largest speedups (74-84% faster) for inputs with fewer than four valid values, including empty arrays, as shown in the test results.
3. **Multiplication instead of exponentiation**: Changed `adjusted**2` and `adjusted2**2` to `adjusted * adjusted` and `adjusted2 * adjusted2`. NumPy's `**` operator already has a fast path for an exponent of 2, so the gain from this change is likely to be small.
4. **Streamlined conditional masking**: Consolidated mask applications using `np.where()` for both zeroing adjusted values and final result assignments, reducing the number of array traversals.
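The two rewrites in the list above are meant to be behavior-preserving. The snippet below is a small check of that claim on a toy array, assuming only NumPy; the variable names are illustrative and are not taken from the pandas source.

```python
import numpy as np

values = np.array([1.0, np.nan, 2.0, 3.0, 4.0])
mask = np.isnan(values)

# Pattern described as the original: copy, then overwrite masked slots in place.
filled_putmask = values.copy()
np.putmask(filled_putmask, mask, 0)

# Pattern described as the replacement: build the filled array in one call.
# Note that np.where also returns a newly allocated array.
filled_where = np.where(mask, 0, values)

same_fill = np.array_equal(filled_putmask, filled_where)

# Squaring by multiplication versus squaring by exponentiation.
valid_count = int((~mask).sum())
adjusted = np.where(mask, 0, filled_where - filled_where.sum() / valid_count)
same_square = np.allclose(adjusted ** 2, adjusted * adjusted)
```

Both flags come out true here, which is consistent with the PR's claim that only the cost changes and not the result.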

**Performance Impact by Test Case:**

- **Edge cases with insufficient data** (fewer than four valid values, or empty): 74-84% faster due to early returns
- **All-NaN arrays**: 46-56% faster by avoiding unnecessary computations
- **Arrays with scattered NaNs**: roughly 7-12% slower, attributed to additional mask checking overhead
- **NaN-free arrays with at least four values**: typically 2-6% faster, although several integer and constant-array tests measure up to about 3% slower
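The percentages in this report appear to be computed relative to the optimized runtime, that is `(original - optimized) / optimized`. This is inferred from the reported numbers rather than taken from Codeflash documentation, and `speedup_percent` is a hypothetical helper name.

```python
def speedup_percent(original_us, optimized_us):
    """Relative change in runtime, measured against the optimized time.

    Positive values mean the optimized code is faster; negative values
    mean it is slower.
    """
    return (original_us - optimized_us) / optimized_us * 100.0


# Inputs are the rounded timings printed in the test listings.
small_array = speedup_percent(56.0, 32.2)     # reported as 74.0% faster
scattered_nans = speedup_percent(62.7, 71.4)  # reported as 12.2% slower
overall = speedup_percent(2.08, 1.94)         # headline figure, reported as 7%
```

Small differences from the reported figures are expected because the listings show rounded timings.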

The optimizations mainly benefit degenerate inputs (fewer than four valid values, all-NaN arrays) while keeping typical NaN-free inputs within a few percent of the original runtime. The regression for arrays that mix NaNs with valid data (roughly 7-12%) is considered acceptable given the gains on those edge cases.
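The early-return path described above can be sketched in plain NumPy. This is an illustrative reimplementation under stated assumptions, not the pandas `nankurt` source: `kurt_sketch` is a hypothetical name, it handles only 1-D float input, and it omits the floating-point clean-up that a production implementation would need.

```python
import numpy as np


def kurt_sketch(values, skipna=True):
    """Bias-corrected excess kurtosis with an early exit for small inputs."""
    values = np.asarray(values, dtype=np.float64)
    mask = np.isnan(values)
    if not skipna and mask.any():
        return np.nan

    count = values.size - int(mask.sum())
    # Early return: the estimator is undefined for fewer than 4 valid values.
    if count < 4:
        return np.nan

    filled = np.where(mask, 0.0, values)
    mean = filled.sum() / count
    adjusted = np.where(mask, 0.0, filled - mean)
    adjusted2 = adjusted * adjusted          # squared deviations
    m2 = adjusted2.sum()                     # sum of squared deviations
    m4 = (adjusted2 * adjusted2).sum()       # sum of fourth-power deviations
    if m2 == 0:
        return 0.0                           # constant input

    adj = 3 * (count - 1) ** 2 / ((count - 2) * (count - 3))
    numerator = count * (count + 1) * (count - 1) * m4
    denominator = (count - 2) * (count - 3) * m2 ** 2
    return numerator / denominator - adj
```

With this sketch, `[1.0, 2.0, 3.0]`, an empty array, and an all-NaN array all take the early exit, while `[1.0, np.nan, 2.0, 3.0, 4.0]` evaluates to approximately -1.2.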

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 120 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import numpy as np
# imports
import pytest # used for our unit tests
from pandas.core.arrays.numpy_ import NumpyExtensionArray

# ------------------- Unit Tests -------------------

# 1. Basic Test Cases

def test_kurt_basic_normal_distribution():
    # Normal distribution should have kurtosis close to 0 (excess kurtosis)
    arr = np.random.normal(loc=0, scale=1, size=100)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 64.9μs -> 61.5μs (5.50% faster)

def test_kurt_basic_uniform_distribution():
    # Uniform distribution has negative excess kurtosis
    arr = np.random.uniform(low=-1, high=1, size=100)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 58.1μs -> 54.7μs (6.24% faster)

def test_kurt_basic_constant_array():
    # All elements the same: variance is zero, kurtosis should be 0
    arr = np.array([5.0, 5.0, 5.0, 5.0, 5.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 56.1μs -> 52.1μs (7.53% faster)

def test_kurt_basic_integer_array():
    # Integer array, kurtosis should be computed correctly
    arr = np.array([1, 2, 3, 4, 5, 6], dtype=int)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 54.2μs -> 55.4μs (2.10% slower)

def test_kurt_basic_simple_known():
    # Known values, manually computed kurtosis
    arr = np.array([1, 2, 3, 4, 5])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 54.6μs -> 53.4μs (2.28% faster)

# 2. Edge Test Cases

def test_kurt_edge_all_nan():
    # All NaN values: should return np.nan
    arr = np.array([np.nan, np.nan, np.nan, np.nan])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 58.4μs -> 39.9μs (46.4% faster)

def test_kurt_edge_some_nan_skipna_true():
    # Some NaNs, skipna True: should ignore NaNs
    arr = np.array([1.0, np.nan, 2.0, 3.0, 4.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(skipna=True); result = codeflash_output # 59.2μs -> 66.0μs (10.2% slower)
    # Should match the bias-corrected excess kurtosis of [1,2,3,4]
    arr2 = np.array([1.0, 2.0, 3.0, 4.0])
    mean = arr2.mean()
    m2 = np.mean((arr2 - mean) ** 2)  # population second central moment
    m4 = np.mean((arr2 - mean) ** 4)  # population fourth central moment
    n = arr2.size
    # With population moments the leading factor is (n + 1)(n - 1) / ((n - 2)(n - 3))
    numerator = (n + 1) * (n - 1) * m4
    denominator = (n - 2) * (n - 3) * m2 ** 2
    adj = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    expected = numerator / denominator - adj  # -1.2 for [1, 2, 3, 4]
    assert result == pytest.approx(expected)

def test_kurt_edge_some_nan_skipna_false():
    # Some NaNs, skipna False: should return np.nan
    arr = np.array([1.0, np.nan, 2.0, 3.0, 4.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(skipna=False); result = codeflash_output # 52.2μs -> 53.0μs (1.51% slower)

def test_kurt_edge_less_than_four_elements():
    # Fewer than 4 elements: kurtosis is undefined, should return np.nan
    arr = np.array([1.0, 2.0, 3.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 56.0μs -> 32.2μs (74.0% faster)

def test_kurt_edge_zero_variance():
    # All elements identical, zero variance: kurtosis should be 0
    arr = np.array([7.0, 7.0, 7.0, 7.0, 7.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 54.1μs -> 55.1μs (1.71% slower)

def test_kurt_edge_mixed_types():
    # Should raise ValueError if input is not ndarray
    with pytest.raises(ValueError):
        NumpyExtensionArray([1,2,3,4])

def test_kurt_edge_multidimensional():
    # Should raise ValueError if input is not 1-dimensional
    arr = np.array([[1,2],[3,4]])
    with pytest.raises(ValueError):
        NumpyExtensionArray(arr)

def test_kurt_edge_all_integers():
    # Integer dtype, kurtosis should be float
    arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 70.0μs -> 70.6μs (0.761% slower)

def test_kurt_edge_parameter_validation():
    # Unsupported parameters should raise NotImplementedError
    arr = np.array([1,2,3,4,5,6])
    nea = NumpyExtensionArray(arr)
    with pytest.raises(NotImplementedError):
        nea.kurt(dtype=float)
    with pytest.raises(NotImplementedError):
        nea.kurt(out=np.empty(1))
    with pytest.raises(NotImplementedError):
        nea.kurt(keepdims=True)

# 3. Large Scale Test Cases

def test_kurt_large_scale_random():
    # Large array, normal distribution, kurtosis should be close to 0
    arr = np.random.normal(0, 1, 1000)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 80.3μs -> 77.6μs (3.54% faster)

def test_kurt_large_scale_uniform():
    # Large array, uniform distribution, kurtosis should be close to -1.2
    arr = np.random.uniform(-1, 1, 1000)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 64.0μs -> 60.9μs (4.99% faster)

def test_kurt_large_scale_all_nan():
    # Large array, all nan
    arr = np.full(1000, np.nan)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 60.8μs -> 40.7μs (49.2% faster)

def test_kurt_large_scale_some_nan():
    # Large array, some nan values
    arr = np.random.normal(0, 1, 1000)
    arr[::10] = np.nan
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 62.7μs -> 71.4μs (12.2% slower)

def test_kurt_large_scale_integer_array():
    # Large integer array
    arr = np.arange(1000)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 59.1μs -> 61.0μs (3.12% slower)

def test_kurt_large_scale_constant_array():
    # Large constant array, kurtosis should be 0
    arr = np.full(1000, 42.0)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 61.1μs -> 59.0μs (3.48% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
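Because several of these tests are expected to return NaN, an equality check between the original and optimized outputs has to treat two NaNs as equal. The helper below is a hypothetical sketch of such a comparison; how Codeflash actually compares `codeflash_output` values is not stated in this PR.

```python
import math


def outputs_match(original, optimized, rel_tol=1e-9):
    """Compare two scalar results, treating NaN == NaN as a match."""
    if math.isnan(original) and math.isnan(optimized):
        return True
    # math.isclose returns False if exactly one side is NaN.
    return math.isclose(original, optimized, rel_tol=rel_tol)


nan_pair_matches = outputs_match(float("nan"), float("nan"))
close_pair_matches = outputs_match(-1.2, -1.2000000000001)
mismatch_detected = not outputs_match(0.0, float("nan"))
```

A plain `==` comparison would report a false mismatch for the all-NaN and fewer-than-four-element cases, since NaN never compares equal to itself.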

#------------------------------------------------
import math

# Function to test: minimal standalone implementation of NumpyExtensionArray.kurt
import numpy as np
# imports
import pytest # used for our unit tests
from pandas.core.arrays.numpy_ import NumpyExtensionArray

# ------------------------------
# Basic Test Cases
# ------------------------------

def test_kurt_basic_positive():
    # Test with a simple array with positive kurtosis (leptokurtic)
    arr = NumpyExtensionArray(np.array([1, 2, 2, 2, 3, 100]))
    codeflash_output = arr.kurt(); result = codeflash_output # 64.1μs -> 64.0μs (0.087% faster)

def test_kurt_basic_negative():
    # Uniform distribution should have negative kurtosis (platykurtic)
    arr = NumpyExtensionArray(np.array([1, 2, 3, 4, 5, 6]))
    codeflash_output = arr.kurt(); result = codeflash_output # 56.6μs -> 54.7μs (3.57% faster)

def test_kurt_basic_normal():
    # Normal distribution kurtosis should be close to 0 (mesokurtic)
    np.random.seed(0)
    data = np.random.normal(loc=0, scale=1, size=100)
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 61.1μs -> 59.6μs (2.51% faster)

def test_kurt_basic_dtype():
    # Test dtype conversion
    arr = NumpyExtensionArray(np.array([1, 2, 3, 4], dtype=np.int32))
    codeflash_output = arr.kurt(dtype=np.float64); result = codeflash_output

# ------------------------------
# Edge Test Cases
# ------------------------------

def test_kurt_edge_all_same():
    # All elements are the same, variance is 0, kurtosis should be 0
    arr = NumpyExtensionArray(np.array([5, 5, 5, 5, 5, 5]))
    codeflash_output = arr.kurt(); result = codeflash_output # 68.0μs -> 67.4μs (0.827% faster)

def test_kurt_edge_nan_handling_skipna():
    # Test with NaNs, skipna=True should ignore them
    arr = NumpyExtensionArray(np.array([1, 2, np.nan, 4, 5, np.nan, 6]))
    codeflash_output = arr.kurt(skipna=True); result = codeflash_output # 65.6μs -> 71.6μs (8.38% slower)

def test_kurt_edge_nan_handling_no_skipna():
    # skipna=False, if any NaN present, should return nan
    arr = NumpyExtensionArray(np.array([1, 2, np.nan, 4, 5, 6]))
    codeflash_output = arr.kurt(skipna=False); result = codeflash_output # 54.1μs -> 53.3μs (1.51% faster)

def test_kurt_edge_less_than_4_elements():
    # Less than 4 elements, kurtosis is undefined
    arr = NumpyExtensionArray(np.array([1, 2, 3]))
    codeflash_output = arr.kurt(); result = codeflash_output # 53.3μs -> 28.9μs (84.5% faster)

def test_kurt_edge_empty_array():
    # Empty array, kurtosis is undefined
    arr = NumpyExtensionArray(np.array([]))
    codeflash_output = arr.kurt(); result = codeflash_output # 57.4μs -> 33.1μs (73.5% faster)

def test_kurt_edge_all_nan():
    # All elements are NaN, kurtosis is undefined
    arr = NumpyExtensionArray(np.array([np.nan, np.nan, np.nan, np.nan]))
    codeflash_output = arr.kurt(); result = codeflash_output # 57.3μs -> 38.7μs (48.2% faster)

def test_kurt_edge_inf_values():
    # Array with inf values, kurtosis should handle gracefully
    arr = NumpyExtensionArray(np.array([1, 2, np.inf, 4, 5]))
    codeflash_output = arr.kurt(); result = codeflash_output

def test_kurt_edge_negative_inf():
    arr = NumpyExtensionArray(np.array([1, -np.inf, 3, 4, 5]))
    codeflash_output = arr.kurt(); result = codeflash_output

def test_kurt_edge_mixed_inf_nan():
    arr = NumpyExtensionArray(np.array([np.nan, np.inf, -np.inf, 1, 2, 3, 4]))
    codeflash_output = arr.kurt(); result = codeflash_output # 76.8μs -> 82.7μs (7.09% slower)

# ------------------------------
# Large Scale Test Cases
# ------------------------------

def test_kurt_large_scale_normal():
    # Test large array with normal distribution
    np.random.seed(1)
    data = np.random.normal(0, 1, 1000)
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 67.6μs -> 65.1μs (3.91% faster)

def test_kurt_large_scale_uniform():
    # Test large array with uniform distribution
    np.random.seed(2)
    data = np.random.uniform(-1, 1, 1000)
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 61.9μs -> 60.2μs (2.77% faster)

def test_kurt_large_scale_heavy_tail():
    # Test large array with a heavy tail
    np.random.seed(3)
    data = np.concatenate([np.random.normal(0, 1, 995), np.array([100, 200, 300, 400, 500])])
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 62.6μs -> 60.0μs (4.35% faster)

def test_kurt_large_scale_nan_handling():
    # Large array with NaNs
    np.random.seed(4)
    data = np.random.normal(0, 1, 1000)
    data[::100] = np.nan # Insert NaNs
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 62.5μs -> 69.8μs (10.5% slower)

def test_kurt_large_scale_all_nan():
    # Large array, all NaN
    arr = NumpyExtensionArray(np.full(1000, np.nan))
    codeflash_output = arr.kurt(); result = codeflash_output # 60.1μs -> 38.4μs (56.2% faster)

def test_kurt_large_scale_all_same():
    # Large array, all same value
    arr = NumpyExtensionArray(np.full(1000, 7.0))
    codeflash_output = arr.kurt(); result = codeflash_output # 57.4μs -> 59.4μs (3.36% slower)

# ------------------------------
# Additional Robustness Tests
# ------------------------------

def test_kurt_type_error():
    # Should raise ValueError for non-numpy array input
    with pytest.raises(ValueError):
        NumpyExtensionArray([1, 2, 3, 4])

def test_kurt_ndim_error():
    # Should raise ValueError for non-1D array
    with pytest.raises(ValueError):
        NumpyExtensionArray(np.array([[1, 2], [3, 4]]))

def test_kurt_output_type():
    # Should always return a float
    arr = NumpyExtensionArray(np.array([1, 2, 3, 4, 5, 6]))
    codeflash_output = arr.kurt(); result = codeflash_output # 70.2μs -> 70.8μs (0.769% slower)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-NumpyExtensionArray.kurt-mhvqjaxq` and push.


codeflash-ai bot requested a review from mashraf-222 November 12, 2025 08:25

codeflash-ai bot added the `⚡️ codeflash` (Optimization PR opened by Codeflash AI) and `🎯 Quality: High` (Optimization Quality according to Codeflash) labels Nov 12, 2025
