⚡️ Speed up method DataCol.get_atom_data by 7%
#324
📄 7% (0.07x) speedup for DataCol.get_atom_data in pandas/io/pytables.py
⏱️ Runtime: 624 microseconds → 584 microseconds (best of 15 runs)

📝 Explanation and details
The optimization introduces class-level caching to eliminate redundant PyTables column type lookups. Here's what changed:

Key Optimization:
A new _coltype_cache class attribute stores the mapping from kind strings to PyTables column classes. Because the cache lives on the class, it is shared across all DataCol instances, so repeated kind values bypass the expensive getattr(_tables(), col_name) call entirely.

Why This Provides a Speedup:
The line profiler shows that getattr(_tables(), col_name) consumes 99.6% of the original function's runtime. While _tables() returns a cached module reference, the getattr() lookup on that module for column class names like "Int64Col" or "UInt32Col" is still expensive when called repeatedly. By caching these column type objects at the class level, subsequent calls with the same kind skip both the string-processing logic and the costly getattr() lookup.

Performance Impact:
Runtime drops from 624 microseconds to 584 microseconds, a 7% speedup over the best of 15 runs.
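As a rough sanity check on the mechanism, the standalone micro-benchmark below compares a module-style getattr() lookup against a plain dict hit. It is not the pandas code: mod and cache are stand-ins for the tables module and _coltype_cache, and it deliberately ignores the string-formatting work the cache also skips.

import timeit
import types

mod = types.SimpleNamespace(Int64Col=object)  # stand-in for the tables module
cache = {"int64": object}                     # stand-in for _coltype_cache

# One million lookups each; the dict hit avoids attribute-lookup machinery.
t_getattr = timeit.timeit(lambda: getattr(mod, "Int64Col"), number=1_000_000)
t_cache = timeit.timeit(lambda: cache["int64"], number=1_000_000)
print(f"getattr: {t_getattr:.3f}s   dict cache: {t_cache:.3f}s")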
Workload Benefits:
This optimization is particularly valuable for data processing pipelines that repeatedly create columns of the same types, which is common in pandas HDF5/PyTables operations where the same column schemas are used across multiple operations or datasets.
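For concreteness, here is a minimal, self-contained sketch of the cached lookup. It is not the actual pandas patch: _tables() and the column classes are stubbed (_StubCol, _TABLES_STUB, and DataColSketch are illustrative names), and the kind-to-class-name mapping is inferred from the regression tests below (uint kinds keep their width, period kinds map to Int64Col), not copied from pandas.

import types

class _StubCol:
    # Stand-in for a PyTables Col class (tables.Int64Col etc.).
    def __init__(self, shape):
        self.shape = shape

# Hypothetical stand-in for the cached module reference that _tables() returns.
_TABLES_STUB = types.SimpleNamespace(
    Int64Col=type("Int64Col", (_StubCol,), {}),
    UInt16Col=type("UInt16Col", (_StubCol,), {}),
    Float64Col=type("Float64Col", (_StubCol,), {}),
)

def _tables():
    return _TABLES_STUB

class DataColSketch:
    # Class-level cache: shared by every instance, so each kind pays for
    # the getattr() lookup at most once per process.
    _coltype_cache: dict = {}

    @classmethod
    def get_atom_data(cls, shape, kind):
        coltype = cls._coltype_cache.get(kind)
        if coltype is None:
            # Slow path: derive the column-class name from kind. This mapping
            # is reconstructed from the tests below, not taken from pandas.
            if kind.startswith("uint"):
                col_name = "UInt" + kind[4:] + "Col"
            elif kind.startswith("period"):
                col_name = "Int64Col"
            else:
                col_name = kind.capitalize() + "Col"
            coltype = cls._coltype_cache[kind] = getattr(_tables(), col_name)
        return coltype(shape=shape[0])

# First call resolves the class via getattr(); repeats hit the dict.
print(DataColSketch.get_atom_data((4,), "uint16").shape)     # 4
print(DataColSketch.get_atom_data((6,), "period[D]").shape)  # 6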
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
from types import SimpleNamespace

# imports
import pytest
from pandas.io.pytables import DataCol

# --- Minimal stub for tables module to make tests run without PyTables ---
# We need to provide dummy Col classes for the test to work,
# since the real tables module is not available.
class DummyCol:
    def __init__(self, shape):
        self.shape = shape
class Int8Col(DummyCol): pass
class Int16Col(DummyCol): pass
class Int32Col(DummyCol): pass
class Int64Col(DummyCol): pass
class UInt8Col(DummyCol): pass
class UInt16Col(DummyCol): pass
class UInt32Col(DummyCol): pass
class UInt64Col(DummyCol): pass
class Float32Col(DummyCol): pass
class Float64Col(DummyCol): pass
class StringCol(DummyCol): pass
class BoolCol(DummyCol): pass
class ComplexCol(DummyCol): pass
# ------------------ UNIT TESTS ------------------

# Basic Test Cases
def test_basic_int8():
    # Test with shape (10,) and kind 'int8'
    codeflash_output = DataCol.get_atom_data((10,), 'int8'); col = codeflash_output # 21.8μs -> 20.1μs (8.57% faster)

def test_basic_uint16():
    # Test with shape (5,) and kind 'uint16'
    codeflash_output = DataCol.get_atom_data((5,), 'uint16'); col = codeflash_output # 18.5μs -> 17.9μs (3.58% faster)

def test_basic_float64():
    # Test with shape (1,) and kind 'float64'
    codeflash_output = DataCol.get_atom_data((1,), 'float64'); col = codeflash_output # 17.5μs -> 16.7μs (4.68% faster)

def test_basic_bool():
    # Test with shape (3,) and kind 'bool'
    codeflash_output = DataCol.get_atom_data((3,), 'bool'); col = codeflash_output # 24.5μs -> 22.8μs (7.22% faster)
# Edge Test Cases
def test_shape_one():
    # Test with shape (1,) and kind 'float32'
    codeflash_output = DataCol.get_atom_data((1,), 'float32'); col = codeflash_output # 25.2μs -> 22.4μs (12.3% faster)

def test_shape_large():
    # Test with shape (999,) and kind 'int64'
    codeflash_output = DataCol.get_atom_data((999,), 'int64'); col = codeflash_output # 17.6μs -> 17.5μs (1.01% faster)

def test_kind_period():
    # Test with kind 'period[D]' which should map to Int64Col
    codeflash_output = DataCol.get_atom_data((10,), 'period[D]'); col = codeflash_output # 21.5μs -> 20.7μs (3.79% faster)

def test_kind_uint8_boundary():
    # Test with kind 'uint8'
    codeflash_output = DataCol.get_atom_data((8,), 'uint8'); col = codeflash_output # 18.8μs -> 16.7μs (12.3% faster)

def test_kind_unknown():
    # Test with an unknown kind
    with pytest.raises(AttributeError):
        DataCol.get_atom_data((4,), 'foobar') # 4.53μs -> 5.36μs (15.5% slower)

def test_shape_tuple_length_greater_than_one():
    # Test with shape (5, 2) -- only shape[0] is used
    codeflash_output = DataCol.get_atom_data((5, 2), 'int16'); col = codeflash_output # 19.7μs -> 19.0μs (3.49% faster)

def test_shape_not_tuple():
    # Test with shape as a list
    codeflash_output = DataCol.get_atom_data([7], 'int8'); col = codeflash_output # 17.9μs -> 16.5μs (8.71% faster)

def test_shape_string_kind():
    # Test with a kind that is a string but not a known type
    with pytest.raises(AttributeError):
        DataCol.get_atom_data((1,), 'unknown_kind') # 4.14μs -> 5.12μs (19.3% slower)

def test_large_scale_uint32():
    # Test with large shape (1000,) and kind 'uint32'
    codeflash_output = DataCol.get_atom_data((1000,), 'uint32'); col = codeflash_output # 26.1μs -> 23.5μs (10.9% faster)

def test_large_scale_float32():
    # Test with large shape (999,) and kind 'float32'
    codeflash_output = DataCol.get_atom_data((999,), 'float32'); col = codeflash_output # 19.0μs -> 17.0μs (11.7% faster)
def test_large_scale_multiple_types():
    # Test multiple types in a loop (but under 1000 iterations)
    for kind, coltype in [
        ('int8', Int8Col),
        ('int16', Int16Col),
        ('int32', Int32Col),
        ('int64', Int64Col),
        ('uint8', UInt8Col),
        ('uint16', UInt16Col),
        ('uint32', UInt32Col),
        ('uint64', UInt64Col),
        ('float32', Float32Col),
        ('float64', Float64Col),
        ('string', StringCol),
        ('bool', BoolCol),
        ('complex', ComplexCol)
    ]:
        codeflash_output = DataCol.get_atom_data((123,), kind); col = codeflash_output
# Additional edge: test with empty shape list/tuple
def test_empty_shape():
    # Should raise IndexError since shape[0] is accessed
    with pytest.raises(IndexError):
        DataCol.get_atom_data((), 'int8') # 4.08μs -> 2.30μs (77.4% faster)
    with pytest.raises(IndexError):
        DataCol.get_atom_data([], 'int8') # 1.55μs -> 967ns (60.5% faster)

# Additional edge: test with non-sequence shape
def test_shape_not_sequence():
    # Should raise TypeError since shape[0] is accessed
    with pytest.raises(TypeError):
        DataCol.get_atom_data(5, 'int8') # 3.16μs -> 1.94μs (62.6% faster)

# Additional edge: test with shape as None
def test_shape_none():
    with pytest.raises(TypeError):
        DataCol.get_atom_data(None, 'int8') # 3.23μs -> 1.91μs (69.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
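# A hand-written assertion in the same spirit might look like the following
# (hypothetical; the expected class name matches both the dummy stubs above
# and the real PyTables column classes):
def test_equivalence_example():
    col = DataCol.get_atom_data((10,), 'int64')
    assert type(col).__name__ == 'Int64Col'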
#------------------------------------------------
import pytest
from pandas.io.pytables import DataCol
# function to test (from above)
# (DataCol.get_atom_data is defined in the code block above)

# Basic Test Cases
def test_basic_int_col():
    # Test for 'int32' kind, shape (5,)
    codeflash_output = DataCol.get_atom_data((5,), 'int32'); atom = codeflash_output # 23.5μs -> 22.1μs (6.23% faster)

def test_basic_float_col():
    # Test for 'float64' kind, shape (10,)
    codeflash_output = DataCol.get_atom_data((10,), 'float64'); atom = codeflash_output # 19.0μs -> 16.6μs (14.5% faster)

def test_basic_bool_col():
    # Test for 'bool' kind, shape (3,)
    codeflash_output = DataCol.get_atom_data((3,), 'bool'); atom = codeflash_output # 17.1μs -> 17.0μs (0.712% faster)

def test_basic_uint_col():
    # Test for 'uint16' kind, shape (4,)
    codeflash_output = DataCol.get_atom_data((4,), 'uint16'); atom = codeflash_output # 24.3μs -> 22.6μs (7.53% faster)

def test_basic_period_col():
    # Test for 'period' kind, shape (6,)
    codeflash_output = DataCol.get_atom_data((6,), 'period'); atom = codeflash_output # 16.3μs -> 17.1μs (4.60% slower)
# Edge Test Cases
def test_edge_shape_one():
    # Shape one, should create column with shape=1
    codeflash_output = DataCol.get_atom_data((1,), 'float32'); atom = codeflash_output # 25.5μs -> 22.3μs (14.3% faster)

def test_edge_large_kind_name():
    # Kind with unexpected capitalization
    codeflash_output = DataCol.get_atom_data((2,), 'Int64'); atom = codeflash_output # 17.2μs -> 17.1μs (0.473% faster)

def test_edge_kind_with_spaces():
    # Kind with leading/trailing spaces
    codeflash_output = DataCol.get_atom_data((2,), ' int32 '.strip()); atom = codeflash_output # 16.5μs -> 15.8μs (4.16% faster)

def test_edge_kind_case_insensitive():
    # Kind with mixed case
    codeflash_output = DataCol.get_atom_data((2,), 'Int32'.lower()); atom = codeflash_output # 15.1μs -> 15.5μs (2.39% slower)

def test_edge_unknown_kind_raises():
    # Unknown kind should raise AttributeError
    with pytest.raises(AttributeError):
        DataCol.get_atom_data((1,), 'unknown_kind') # 4.40μs -> 5.15μs (14.6% slower)

def test_edge_shape_tuple_length_greater_than_one():
    # Only first element of shape should be used
    codeflash_output = DataCol.get_atom_data((5, 2), 'float64'); atom = codeflash_output # 19.9μs -> 18.4μs (8.38% faster)

def test_edge_shape_not_tuple():
    # Shape as list
    codeflash_output = DataCol.get_atom_data([8], 'int32'); atom = codeflash_output # 17.1μs -> 15.5μs (10.1% faster)

def test_edge_shape_as_int():
    # Shape as a one-element tuple holding an int
    codeflash_output = DataCol.get_atom_data((9,), 'int32'); atom = codeflash_output # 16.0μs -> 15.0μs (6.18% faster)

def test_edge_uint8_col():
    # Test for 'uint8' kind
    codeflash_output = DataCol.get_atom_data((3,), 'uint8'); atom = codeflash_output # 17.8μs -> 16.2μs (10.0% faster)

def test_edge_period_dtype():
    # Test for 'period[D]' kind, which should map to Int64Col
    codeflash_output = DataCol.get_atom_data((4,), 'period[D]'); atom = codeflash_output # 14.7μs -> 15.4μs (4.25% slower)
# Large Scale Test Cases
def test_large_scale_int_col():
    # Large shape, but under 1000 elements
    codeflash_output = DataCol.get_atom_data((999,), 'int32'); atom = codeflash_output # 16.0μs -> 14.5μs (10.8% faster)

def test_large_scale_float_col():
    codeflash_output = DataCol.get_atom_data((1000,), 'float64'); atom = codeflash_output # 15.7μs -> 15.1μs (4.37% faster)

def test_large_scale_multiple_types():
    # Test many kinds in a loop, but <1000 iterations
    kinds = ['int32', 'float64', 'bool', 'string', 'uint8', 'uint16', 'period']
    for i, kind in enumerate(kinds, 1):
        codeflash_output = DataCol.get_atom_data((i,), kind); atom = codeflash_output
        if kind.startswith('uint'):
            pass
        elif kind == 'period':
            pass
        else:
            pass

def test_large_scale_varied_shapes():
    # Test a range of shapes from 1 to 999
    for n in [1, 10, 100, 500, 999]:
        codeflash_output = DataCol.get_atom_data((n,), 'int64'); atom = codeflash_output # 41.5μs -> 38.7μs (7.44% faster)

def test_large_scale_edge_shape_tuple():
    # Shape as tuple with more than one element, only first used
    codeflash_output = DataCol.get_atom_data((1000, 2), 'float32'); atom = codeflash_output # 17.5μs -> 16.2μs (8.12% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To edit these changes, run git checkout codeflash/optimize-DataCol.get_atom_data-mhw012u8 and push.