⚡️ Speed up method NumpyExtensionArray.kurt by 7%
#316
📄 7% (0.07x) speedup for NumpyExtensionArray.kurt in pandas/core/arrays/numpy_.py
⏱️ Runtime: 2.08 milliseconds → 1.94 milliseconds (best of 62 runs)
📝 Explanation and details
The optimized code achieves a 7% speedup through several targeted memory and computation changes in the nankurt function.
Key Performance Optimizations:
Memory-efficient masking: Replaced values.copy() followed by np.putmask() with a direct np.where(mask, 0, values). Both versions allocate one new array, so the gain comes mainly from dropping the separate in-place masking pass. The generated benchmarks do not show a benefit for NaN-containing inputs: arrays with scattered NaNs and skipna=True ran 7-12% slower.

Early termination for edge cases: Added an upfront check for insufficient data (count < 4) that returns np.nan immediately, skipping the moment computations when the estimator is undefined. In the test results this makes arrays with fewer than four elements roughly 74-84% faster and all-NaN inputs 46-56% faster.

Optimized array operations: Changed adjusted**2 and adjusted2**2 to multiplication (adjusted * adjusted, adjusted2 * adjusted2) instead of exponentiation. NumPy already routes an array raised to the scalar power 2 through its square fast path, so the gain from this change alone is likely small.

Streamlined conditional masking: Consolidated mask applications using np.where() for both zeroing the adjusted values and assigning the final result, reducing the number of array traversals.
Overall, the changes help most with edge cases (very short and all-NaN inputs), while typical inputs stay within a few percent of the original runtime (from about 3% slower to 7.5% faster). The main trade-off is a 7-12% regression on inputs that contain some NaNs with skipna=True.
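To make the four changes above concrete, here is a minimal 1-D sketch that applies them together. It is not pandas' nankurt: the function name nankurt_1d_sketch is hypothetical, it assumes NaN is the only missing-value marker, ignores the axis argument, and omits pandas' clean-up of tiny floating-point residue.

```python
import numpy as np


def nankurt_1d_sketch(values: np.ndarray, skipna: bool = True) -> float:
    """Hypothetical 1-D sketch of bias-corrected sample excess kurtosis.

    Applies the optimizations described above; not pandas' implementation.
    """
    values = np.asarray(values, dtype=np.float64)
    mask = np.isnan(values)

    # Any NaN with skipna=False makes the result NaN.
    if not skipna and mask.any():
        return np.nan

    # Number of non-missing observations.
    count = values.size - int(mask.sum())

    # Early termination: the estimator is undefined for fewer than 4 values.
    if count < 4:
        return np.nan

    # One np.where call builds the zero-filled array, instead of
    # values.copy() followed by np.putmask().
    filled = np.where(mask, 0.0, values)
    mean = filled.sum() / count

    # Zero the deviations at masked positions so they contribute nothing.
    adjusted = np.where(mask, 0.0, filled - mean)
    adjusted2 = adjusted * adjusted      # instead of adjusted ** 2
    adjusted4 = adjusted2 * adjusted2    # instead of adjusted2 ** 2
    m2 = adjusted2.sum()
    m4 = adjusted4.sum()

    # Zero variance (e.g. a constant array): return 0 rather than dividing by 0.
    if m2 == 0:
        return 0.0

    n = float(count)
    numerator = n * (n + 1) * (n - 1) * m4
    denominator = (n - 2) * (n - 3) * m2 * m2
    adj = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return float(numerator / denominator - adj)
```

For example, nankurt_1d_sketch(np.array([1.0, np.nan, 2.0, 3.0, 4.0])) gives approximately -1.2, the same value as for [1.0, 2.0, 3.0, 4.0], because the NaN is skipped. The early count check runs before any array work, which is where the large gains on short and all-NaN inputs come from.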
✅ Correctness verification report:
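When reading the tests below it helps to have the estimator in view. To our reading of pandas' nanops source, nankurt computes the bias-corrected sample excess kurtosis from sums (not means) of powered deviations. With n non-missing values and mean \bar{x}:

```latex
M_k = \sum_{i=1}^{n} (x_i - \bar{x})^k, \qquad
G_2 = \frac{n(n+1)(n-1)\,M_4}{(n-2)(n-3)\,M_2^2} - \frac{3(n-1)^2}{(n-2)(n-3)}

% Worked example, x = [1, 2, 3, 4]: \bar{x} = 2.5, M_2 = 5, M_4 = 10.25
G_2 = \frac{4 \cdot 5 \cdot 3 \cdot 10.25}{2 \cdot 1 \cdot 25} - \frac{3 \cdot 9}{2 \cdot 1}
    = 12.3 - 13.5 = -1.2
```

The (n-2)(n-3) factors vanish at n = 2 and n = 3, and the estimator is not meaningful below four observations, which is why count < 4 returns NaN; a zero M_2 (constant input) is handled separately.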
🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest  # used for our unit tests
from pandas.core.arrays.numpy_ import NumpyExtensionArray

# ------------------- Unit Tests -------------------
# 1. Basic Test Cases
def test_kurt_basic_normal_distribution():
    # Normal distribution should have kurtosis close to 0 (excess kurtosis)
    arr = np.random.normal(loc=0, scale=1, size=100)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 64.9μs -> 61.5μs (5.50% faster)

def test_kurt_basic_uniform_distribution():
    # Uniform distribution has negative excess kurtosis
    arr = np.random.uniform(low=-1, high=1, size=100)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 58.1μs -> 54.7μs (6.24% faster)

def test_kurt_basic_constant_array():
    # All elements the same: variance is zero, kurtosis should be 0
    arr = np.array([5.0, 5.0, 5.0, 5.0, 5.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 56.1μs -> 52.1μs (7.53% faster)

def test_kurt_basic_integer_array():
    # Integer array, kurtosis should be computed correctly
    arr = np.array([1, 2, 3, 4, 5, 6], dtype=int)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 54.2μs -> 55.4μs (2.10% slower)

def test_kurt_basic_simple_known():
    # Known values, manually computed kurtosis
    arr = np.array([1, 2, 3, 4, 5])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 54.6μs -> 53.4μs (2.28% faster)

# 2. Edge Test Cases
def test_kurt_edge_all_nan():
    # All NaN values: should return np.nan
    arr = np.array([np.nan, np.nan, np.nan, np.nan])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 58.4μs -> 39.9μs (46.4% faster)

def test_kurt_edge_some_nan_skipna_true():
    # Some NaNs, skipna True: should ignore NaNs
    arr = np.array([1.0, np.nan, 2.0, 3.0, 4.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(skipna=True); result = codeflash_output # 59.2μs -> 66.0μs (10.2% slower)
    # Should match the bias-corrected excess kurtosis of [1, 2, 3, 4].
    # nankurt works with sums of powered deviations, not means.
    arr2 = np.array([1.0, 2.0, 3.0, 4.0])
    mean = arr2.mean()
    m2 = np.sum((arr2 - mean) ** 2)
    m4 = np.sum((arr2 - mean) ** 4)
    n = arr2.size
    numerator = n * (n + 1) * (n - 1) * m4
    denominator = (n - 2) * (n - 3) * m2 ** 2
    adj = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    expected = numerator / denominator - adj  # -1.2 for this input
    assert result == pytest.approx(expected)

def test_kurt_edge_some_nan_skipna_false():
    # Some NaNs, skipna False: should return np.nan
    arr = np.array([1.0, np.nan, 2.0, 3.0, 4.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(skipna=False); result = codeflash_output # 52.2μs -> 53.0μs (1.51% slower)

def test_kurt_edge_less_than_four_elements():
    # Fewer than 4 elements: kurtosis is undefined, should return np.nan
    arr = np.array([1.0, 2.0, 3.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 56.0μs -> 32.2μs (74.0% faster)

def test_kurt_edge_zero_variance():
    # All elements identical, zero variance: kurtosis should be 0
    arr = np.array([7.0, 7.0, 7.0, 7.0, 7.0])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 54.1μs -> 55.1μs (1.71% slower)

def test_kurt_edge_mixed_types():
    # Should raise ValueError if input is not ndarray
    with pytest.raises(ValueError):
        NumpyExtensionArray([1,2,3,4])

def test_kurt_edge_multidimensional():
    # Should raise ValueError if input is not 1-dimensional
    arr = np.array([[1,2],[3,4]])
    with pytest.raises(ValueError):
        NumpyExtensionArray(arr)

def test_kurt_edge_all_integers():
    # Integer dtype, kurtosis should be float
    arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 70.0μs -> 70.6μs (0.761% slower)

def test_kurt_edge_parameter_validation():
    # Unsupported numpy-compat parameters are rejected by pandas' argument
    # validation, which raises ValueError (not NotImplementedError)
    arr = np.array([1,2,3,4,5,6])
    nea = NumpyExtensionArray(arr)
    with pytest.raises(ValueError):
        nea.kurt(dtype=float)
    with pytest.raises(ValueError):
        nea.kurt(out=np.empty(1))
    with pytest.raises(ValueError):
        nea.kurt(keepdims=True)

# 3. Large Scale Test Cases
def test_kurt_large_scale_random():
    # Large array, normal distribution, kurtosis should be close to 0
    arr = np.random.normal(0, 1, 1000)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 80.3μs -> 77.6μs (3.54% faster)

def test_kurt_large_scale_uniform():
    # Large array, uniform distribution, kurtosis should be close to -1.2
    arr = np.random.uniform(-1, 1, 1000)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 64.0μs -> 60.9μs (4.99% faster)

def test_kurt_large_scale_all_nan():
    # Large array, all nan
    arr = np.full(1000, np.nan)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 60.8μs -> 40.7μs (49.2% faster)

def test_kurt_large_scale_some_nan():
    # Large array, some nan values
    arr = np.random.normal(0, 1, 1000)
    arr[::10] = np.nan
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 62.7μs -> 71.4μs (12.2% slower)

def test_kurt_large_scale_integer_array():
    # Large integer array
    arr = np.arange(1000)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 59.1μs -> 61.0μs (3.12% slower)

def test_kurt_large_scale_constant_array():
    # Large constant array, kurtosis should be 0
    arr = np.full(1000, 42.0)
    nea = NumpyExtensionArray(arr)
    codeflash_output = nea.kurt(); result = codeflash_output # 61.1μs -> 59.0μs (3.48% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
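The note above says the harness compares original and optimized outputs. A minimal sketch of the same differential idea, applied directly to the two rewrites in the explanation and assuming float64 input: on random arrays with injected NaNs, it checks that copy-plus-np.putmask and np.where produce identical masked arrays, and that squaring by multiplication matches exponentiation. The helper names are hypothetical and not part of pandas or codeflash.

```python
import numpy as np


def mask_with_copy_putmask(values: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Original pattern: copy, then zero the masked positions in place.
    out = values.copy()
    np.putmask(out, mask, 0)
    return out


def mask_with_where(values: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Optimized pattern: build the zero-filled array in a single call.
    return np.where(mask, 0, values)


def check_equivalence(trials: int = 50, size: int = 1000, seed: int = 0) -> bool:
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        values = rng.normal(size=size)
        values[rng.random(size) < 0.1] = np.nan  # inject roughly 10% NaNs
        mask = np.isnan(values)

        a = mask_with_copy_putmask(values, mask)
        b = mask_with_where(values, mask)
        # Same dtype and identical values (no NaNs remain after masking).
        if a.dtype != b.dtype or not np.array_equal(a, b):
            return False

        # Squaring: exponentiation and multiplication should agree.
        adjusted = b - b.mean()
        if not np.allclose(adjusted ** 2, adjusted * adjusted):
            return False
    return True
```

A check like this only confirms the two masking patterns are interchangeable; it says nothing about speed, which still has to be measured separately.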
#------------------------------------------------
import math
# Function to test: minimal standalone implementation of NumpyExtensionArray.kurt
import numpy as np
# imports
import pytest  # used for our unit tests
from pandas.core.arrays.numpy_ import NumpyExtensionArray

# ------------------------------
# Basic Test Cases
# ------------------------------
def test_kurt_basic_positive():
    # Test with a simple array with positive kurtosis (leptokurtic)
    arr = NumpyExtensionArray(np.array([1, 2, 2, 2, 3, 100]))
    codeflash_output = arr.kurt(); result = codeflash_output # 64.1μs -> 64.0μs (0.087% faster)

def test_kurt_basic_negative():
    # Uniform distribution should have negative kurtosis (platykurtic)
    arr = NumpyExtensionArray(np.array([1, 2, 3, 4, 5, 6]))
    codeflash_output = arr.kurt(); result = codeflash_output # 56.6μs -> 54.7μs (3.57% faster)

def test_kurt_basic_normal():
    # Normal distribution kurtosis should be close to 0 (mesokurtic)
    np.random.seed(0)
    data = np.random.normal(loc=0, scale=1, size=100)
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 61.1μs -> 59.6μs (2.51% faster)

def test_kurt_basic_dtype():
    # The numpy-compat dtype argument is not supported by kurt;
    # pandas' argument validation rejects it with ValueError
    arr = NumpyExtensionArray(np.array([1, 2, 3, 4], dtype=np.int32))
    with pytest.raises(ValueError):
        arr.kurt(dtype=np.float64)

# ------------------------------
# Edge Test Cases
# ------------------------------
def test_kurt_edge_all_same():
    # All elements are the same, variance is 0, kurtosis should be 0
    arr = NumpyExtensionArray(np.array([5, 5, 5, 5, 5, 5]))
    codeflash_output = arr.kurt(); result = codeflash_output # 68.0μs -> 67.4μs (0.827% faster)

def test_kurt_edge_nan_handling_skipna():
    # Test with NaNs, skipna=True should ignore them
    arr = NumpyExtensionArray(np.array([1, 2, np.nan, 4, 5, np.nan, 6]))
    codeflash_output = arr.kurt(skipna=True); result = codeflash_output # 65.6μs -> 71.6μs (8.38% slower)

def test_kurt_edge_nan_handling_no_skipna():
    # skipna=False, if any NaN present, should return nan
    arr = NumpyExtensionArray(np.array([1, 2, np.nan, 4, 5, 6]))
    codeflash_output = arr.kurt(skipna=False); result = codeflash_output # 54.1μs -> 53.3μs (1.51% faster)

def test_kurt_edge_less_than_4_elements():
    # Less than 4 elements, kurtosis is undefined
    arr = NumpyExtensionArray(np.array([1, 2, 3]))
    codeflash_output = arr.kurt(); result = codeflash_output # 53.3μs -> 28.9μs (84.5% faster)

def test_kurt_edge_empty_array():
    # Empty array, kurtosis is undefined
    arr = NumpyExtensionArray(np.array([]))
    codeflash_output = arr.kurt(); result = codeflash_output # 57.4μs -> 33.1μs (73.5% faster)

def test_kurt_edge_all_nan():
    # All elements are NaN, kurtosis is undefined
    arr = NumpyExtensionArray(np.array([np.nan, np.nan, np.nan, np.nan]))
    codeflash_output = arr.kurt(); result = codeflash_output # 57.3μs -> 38.7μs (48.2% faster)

def test_kurt_edge_inf_values():
    # Array with inf values, kurtosis should handle gracefully
    arr = NumpyExtensionArray(np.array([1, 2, np.inf, 4, 5]))
    codeflash_output = arr.kurt(); result = codeflash_output

def test_kurt_edge_negative_inf():
    arr = NumpyExtensionArray(np.array([1, -np.inf, 3, 4, 5]))
    codeflash_output = arr.kurt(); result = codeflash_output

def test_kurt_edge_mixed_inf_nan():
    arr = NumpyExtensionArray(np.array([np.nan, np.inf, -np.inf, 1, 2, 3, 4]))
    codeflash_output = arr.kurt(); result = codeflash_output # 76.8μs -> 82.7μs (7.09% slower)

# ------------------------------
# Large Scale Test Cases
# ------------------------------
def test_kurt_large_scale_normal():
    # Test large array with normal distribution
    np.random.seed(1)
    data = np.random.normal(0, 1, 1000)
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 67.6μs -> 65.1μs (3.91% faster)

def test_kurt_large_scale_uniform():
    # Test large array with uniform distribution
    np.random.seed(2)
    data = np.random.uniform(-1, 1, 1000)
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 61.9μs -> 60.2μs (2.77% faster)

def test_kurt_large_scale_heavy_tail():
    # Test large array with a heavy tail
    np.random.seed(3)
    data = np.concatenate([np.random.normal(0, 1, 995), np.array([100, 200, 300, 400, 500])])
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 62.6μs -> 60.0μs (4.35% faster)

def test_kurt_large_scale_nan_handling():
    # Large array with NaNs
    np.random.seed(4)
    data = np.random.normal(0, 1, 1000)
    data[::100] = np.nan  # Insert NaNs
    arr = NumpyExtensionArray(data)
    codeflash_output = arr.kurt(); result = codeflash_output # 62.5μs -> 69.8μs (10.5% slower)

def test_kurt_large_scale_all_nan():
    # Large array, all NaN
    arr = NumpyExtensionArray(np.full(1000, np.nan))
    codeflash_output = arr.kurt(); result = codeflash_output # 60.1μs -> 38.4μs (56.2% faster)

def test_kurt_large_scale_all_same():
    # Large array, all same value
    arr = NumpyExtensionArray(np.full(1000, 7.0))
    codeflash_output = arr.kurt(); result = codeflash_output # 57.4μs -> 59.4μs (3.36% slower)

# ------------------------------
# Additional Robustness Tests
# ------------------------------
def test_kurt_type_error():
    # Should raise ValueError for non-numpy array input
    with pytest.raises(ValueError):
        NumpyExtensionArray([1, 2, 3, 4])

def test_kurt_ndim_error():
    # Should raise ValueError for non-1D array
    with pytest.raises(ValueError):
        NumpyExtensionArray(np.array([[1, 2], [3, 4]]))

def test_kurt_output_type():
    # Should always return a float
    arr = NumpyExtensionArray(np.array([1, 2, 3, 4, 5, 6]))
    codeflash_output = arr.kurt(); result = codeflash_output # 70.2μs -> 70.8μs (0.769% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To edit these changes, run git checkout codeflash/optimize-NumpyExtensionArray.kurt-mhvqjaxq and push.