Merge remote-tracking branch 'upstream/main' into cln/param1
mroeschke committed Jan 8, 2024
2 parents 731c48a + 3df5771 commit 30521c2
Showing 113 changed files with 734 additions and 725 deletions.
7 changes: 0 additions & 7 deletions .pre-commit-config.yaml
@@ -272,13 +272,6 @@ repos:
language: python
types: [rst]
files: ^doc/source/(development|reference)/
- id: unwanted-patterns-bare-pytest-raises
name: Check for use of bare pytest raises
language: python
entry: python scripts/validate_unwanted_patterns.py --validation-type="bare_pytest_raises"
types: [python]
files: ^pandas/tests/
exclude: ^pandas/tests/extension/
- id: unwanted-patterns-private-function-across-module
name: Check for use of private functions across modules
language: python
10 changes: 8 additions & 2 deletions ci/code_checks.sh
@@ -16,12 +16,18 @@

set -uo pipefail

[[ -z "$1" || "$1" == "code" || "$1" == "doctests" || "$1" == "docstrings" || "$1" == "single-docs" || "$1" == "notebooks" ]] || \
if [[ -v 1 ]]; then
CHECK=$1
else
# script will fail if it uses an unset variable (i.e. $1 is not provided)
CHECK=""
fi

[[ -z "$CHECK" || "$CHECK" == "code" || "$CHECK" == "doctests" || "$CHECK" == "docstrings" || "$CHECK" == "single-docs" || "$CHECK" == "notebooks" ]] || \
{ echo "Unknown command $1. Usage: $0 [code|doctests|docstrings|single-docs|notebooks]"; exit 9999; }

BASE_DIR="$(dirname $0)/.."
RET=0
CHECK=$1

### CODE ###
if [[ -z "$CHECK" || "$CHECK" == "code" ]]; then
6 changes: 6 additions & 0 deletions doc/source/whatsnew/v2.2.0.rst
@@ -786,8 +786,11 @@ Timezones
Numeric
^^^^^^^
- Bug in :func:`read_csv` with ``engine="pyarrow"`` causing rounding errors for large integers (:issue:`52505`)
- Bug in :meth:`Series.__floordiv__` and :meth:`Series.__truediv__` for :class:`ArrowDtype` with integral dtypes raising for large divisors (:issue:`56706`)
- Bug in :meth:`Series.__floordiv__` for :class:`ArrowDtype` with integral dtypes raising for large values (:issue:`56645`)
- Bug in :meth:`Series.pow` not filling missing values correctly (:issue:`55512`)
- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` matching float ``0.0`` with ``False`` and vice versa (:issue:`55398`)
- Bug in :meth:`Series.round` raising for nullable boolean dtype (:issue:`55936`)

Conversion
^^^^^^^^^^
@@ -814,6 +817,7 @@ Interval
- Bug in :class:`Interval` ``__repr__`` not displaying UTC offsets for :class:`Timestamp` bounds. Additionally the hour, minute and second components will now be shown (:issue:`55015`)
- Bug in :meth:`IntervalIndex.factorize` and :meth:`Series.factorize` with :class:`IntervalDtype` with datetime64 or timedelta64 intervals not preserving non-nanosecond units (:issue:`56099`)
- Bug in :meth:`IntervalIndex.from_arrays` when passed ``datetime64`` or ``timedelta64`` arrays with mismatched resolutions constructing an invalid ``IntervalArray`` object (:issue:`55714`)
- Bug in :meth:`IntervalIndex.from_tuples` raising if subtype is a nullable extension dtype (:issue:`56765`)
- Bug in :meth:`IntervalIndex.get_indexer` with datetime or timedelta intervals incorrectly matching on integer targets (:issue:`47772`)
- Bug in :meth:`IntervalIndex.get_indexer` with timezone-aware datetime intervals incorrectly matching on a sequence of timezone-naive targets (:issue:`47772`)
- Bug in setting values on a :class:`Series` with an :class:`IntervalIndex` using a slice incorrectly raising (:issue:`54722`)
@@ -845,6 +849,7 @@ I/O
- Bug in :func:`read_json` not handling dtype conversion properly if ``infer_string`` is set (:issue:`56195`)
- Bug in :meth:`DataFrame.to_excel`, with ``OdsWriter`` (``ods`` files) writing Boolean/string value (:issue:`54994`)
- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``datetime64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`55622`)
- Bug in :meth:`DataFrame.to_stata` raising for extension dtypes (:issue:`54671`)
- Bug in :meth:`~pandas.read_excel` with ``engine="odf"`` (``ods`` files) when a string cell contains an annotation (:issue:`55200`)
- Bug in :meth:`~pandas.read_excel` with an ODS file without cached formatted cell for float values (:issue:`55219`)
- Bug where :meth:`DataFrame.to_json` would raise an ``OverflowError`` instead of a ``TypeError`` with unsupported NumPy types (:issue:`55403`)
@@ -872,6 +877,7 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrame.asfreq` and :meth:`Series.asfreq` with a :class:`DatetimeIndex` with non-nanosecond resolution incorrectly converting to nanosecond resolution (:issue:`55958`)
- Bug in :meth:`DataFrame.ewm` when passed ``times`` with non-nanosecond ``datetime64`` or :class:`DatetimeTZDtype` dtype (:issue:`56262`)
- Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` where grouping by a combination of ``Decimal`` and NA values would fail when ``sort=True`` (:issue:`54847`)
- Bug in :meth:`DataFrame.groupby` for DataFrame subclasses when selecting a subset of columns to apply the function to (:issue:`56761`)
- Bug in :meth:`DataFrame.resample` not respecting ``closed`` and ``label`` arguments for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55282`)
- Bug in :meth:`DataFrame.resample` when resampling on a :class:`ArrowDtype` of ``pyarrow.timestamp`` or ``pyarrow.duration`` type (:issue:`55989`)
- Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55281`)
5 changes: 4 additions & 1 deletion doc/source/whatsnew/v2.3.0.rst
@@ -92,7 +92,8 @@ Other API changes

Deprecations
~~~~~~~~~~~~
-
- Deprecated :meth:`Timestamp.utcfromtimestamp`, use ``Timestamp.fromtimestamp(ts, "UTC")`` instead (:issue:`56680`)
- Deprecated :meth:`Timestamp.utcnow`, use ``Timestamp.now("UTC")`` instead (:issue:`56680`)
-

.. ---------------------------------------------------------------------------
@@ -108,6 +109,8 @@ Performance improvements

Bug fixes
~~~~~~~~~
- Fixed bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)


Categorical
^^^^^^^^^^^
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/strptime.pyx
@@ -132,7 +132,7 @@ cdef bint parse_today_now(
if infer_reso:
creso = NPY_DATETIMEUNIT.NPY_FR_us
if utc:
ts = <_Timestamp>Timestamp.utcnow()
ts = <_Timestamp>Timestamp.now(timezone.utc)
iresult[0] = ts._as_creso(creso)._value
else:
# GH#18705 make sure to_datetime("now") matches Timestamp("now")
16 changes: 16 additions & 0 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -1418,6 +1418,14 @@ class Timestamp(_Timestamp):
>>> pd.Timestamp.utcnow() # doctest: +SKIP
Timestamp('2020-11-16 22:50:18.092888+0000', tz='UTC')
"""
warnings.warn(
# The stdlib datetime.utcnow is deprecated, so we deprecate to match.
# GH#56680
"Timestamp.utcnow is deprecated and will be removed in a future "
"version. Use Timestamp.now('UTC') instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
return cls.now(UTC)

@classmethod
@@ -1438,6 +1446,14 @@ class Timestamp(_Timestamp):
Timestamp('2020-03-14 15:32:52+0000', tz='UTC')
"""
# GH#22451
warnings.warn(
# The stdlib datetime.utcfromtimestamp is deprecated, so we deprecate
# to match. GH#56680
"Timestamp.utcfromtimestamp is deprecated and will be removed in a "
"future version. Use Timestamp.fromtimestamp(ts, 'UTC') instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
return cls.fromtimestamp(ts, tz="UTC")

@classmethod
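
Both hunks above apply the same deprecation pattern: emit a `FutureWarning`, then delegate to the timezone-aware replacement so that behaviour is unchanged while callers migrate. A minimal standard-library sketch of that pattern follows. The `Clock` class is hypothetical and not part of pandas, and a fixed `stacklevel=2` is assumed here in place of the `find_stack_level()` helper used in the diff.

```python
import warnings
from datetime import datetime, timezone


class Clock:
    """Hypothetical stand-in for a class with a deprecated UTC helper."""

    @classmethod
    def now(cls, tz=None):
        # Timezone-aware replacement that callers should migrate to.
        return datetime.now(tz)

    @classmethod
    def utcnow(cls):
        # Warn first, then delegate, so the return value is unchanged
        # during the deprecation period.
        warnings.warn(
            "Clock.utcnow is deprecated and will be removed in a future "
            "version. Use Clock.now(timezone.utc) instead.",
            FutureWarning,
            stacklevel=2,  # assumption: attribute the warning to the caller
        )
        return cls.now(timezone.utc)


# Record warnings so the deprecation can be observed directly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stamp = Clock.utcnow()

print(stamp.tzinfo, [w.category.__name__ for w in caught])
```

Delegating to the replacement keeps a single code path for the actual work, so removing the deprecated method later is a pure deletion.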
98 changes: 72 additions & 26 deletions pandas/_testing/asserters.py
@@ -10,6 +10,7 @@

import numpy as np

from pandas._libs import lib
from pandas._libs.missing import is_matching_na
from pandas._libs.sparse import SparseIndex
import pandas._libs.testing as _testing
@@ -698,9 +699,9 @@ def assert_extension_array_equal(
right,
check_dtype: bool | Literal["equiv"] = True,
index_values=None,
check_exact: bool = False,
rtol: float = 1.0e-5,
atol: float = 1.0e-8,
check_exact: bool | lib.NoDefault = lib.no_default,
rtol: float | lib.NoDefault = lib.no_default,
atol: float | lib.NoDefault = lib.no_default,
obj: str = "ExtensionArray",
) -> None:
"""
@@ -715,7 +716,12 @@ def assert_extension_array_equal(
index_values : Index | numpy.ndarray, default None
Optional index (shared by both left and right), used in output.
check_exact : bool, default False
Whether to compare number exactly. Only takes effect for float dtypes.
Whether to compare number exactly.
.. versionchanged:: 2.2.0
Defaults to True for integer dtypes if none of
``check_exact``, ``rtol`` and ``atol`` are specified.
rtol : float, default 1e-5
Relative tolerance. Only used when check_exact is False.
atol : float, default 1e-8
@@ -739,6 +745,23 @@ def assert_extension_array_equal(
>>> b, c = a.array, a.array
>>> tm.assert_extension_array_equal(b, c)
"""
if (
check_exact is lib.no_default
and rtol is lib.no_default
and atol is lib.no_default
):
check_exact = (
is_numeric_dtype(left.dtype)
and not is_float_dtype(left.dtype)
or is_numeric_dtype(right.dtype)
and not is_float_dtype(right.dtype)
)
elif check_exact is lib.no_default:
check_exact = False

rtol = rtol if rtol is not lib.no_default else 1.0e-5
atol = atol if atol is not lib.no_default else 1.0e-8

assert isinstance(left, ExtensionArray), "left is not an ExtensionArray"
assert isinstance(right, ExtensionArray), "right is not an ExtensionArray"
if check_dtype:
@@ -784,10 +807,7 @@ def assert_extension_array_equal(

left_valid = left[~left_na].to_numpy(dtype=object)
right_valid = right[~right_na].to_numpy(dtype=object)
if check_exact or (
(is_numeric_dtype(left.dtype) and not is_float_dtype(left.dtype))
or (is_numeric_dtype(right.dtype) and not is_float_dtype(right.dtype))
):
if check_exact:
assert_numpy_array_equal(
left_valid, right_valid, obj=obj, index_values=index_values
)
@@ -811,14 +831,14 @@ def assert_series_equal(
check_index_type: bool | Literal["equiv"] = "equiv",
check_series_type: bool = True,
check_names: bool = True,
check_exact: bool = False,
check_exact: bool | lib.NoDefault = lib.no_default,
check_datetimelike_compat: bool = False,
check_categorical: bool = True,
check_category_order: bool = True,
check_freq: bool = True,
check_flags: bool = True,
rtol: float = 1.0e-5,
atol: float = 1.0e-8,
rtol: float | lib.NoDefault = lib.no_default,
atol: float | lib.NoDefault = lib.no_default,
obj: str = "Series",
*,
check_index: bool = True,
@@ -841,7 +861,12 @@ def assert_series_equal(
check_names : bool, default True
Whether to check the Series and Index names attribute.
check_exact : bool, default False
Whether to compare number exactly. Only takes effect for float dtypes.
Whether to compare number exactly.
.. versionchanged:: 2.2.0
Defaults to True for integer dtypes if none of
``check_exact``, ``rtol`` and ``atol`` are specified.
check_datetimelike_compat : bool, default False
Compare datetime-like which is comparable ignoring dtype.
check_categorical : bool, default True
Expand Down Expand Up @@ -877,6 +902,22 @@ def assert_series_equal(
>>> tm.assert_series_equal(a, b)
"""
__tracebackhide__ = True
if (
check_exact is lib.no_default
and rtol is lib.no_default
and atol is lib.no_default
):
check_exact = (
is_numeric_dtype(left.dtype)
and not is_float_dtype(left.dtype)
or is_numeric_dtype(right.dtype)
and not is_float_dtype(right.dtype)
)
elif check_exact is lib.no_default:
check_exact = False

rtol = rtol if rtol is not lib.no_default else 1.0e-5
atol = atol if atol is not lib.no_default else 1.0e-8

if not check_index and check_like:
raise ValueError("check_like must be False if check_index is False")
@@ -931,10 +972,7 @@ def assert_series_equal(
pass
else:
assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
if check_exact or (
(is_numeric_dtype(left.dtype) and not is_float_dtype(left.dtype))
or (is_numeric_dtype(right.dtype) and not is_float_dtype(right.dtype))
):
if check_exact:
left_values = left._values
right_values = right._values
# Only check exact if dtype is numeric
@@ -1061,14 +1099,14 @@ def assert_frame_equal(
check_frame_type: bool = True,
check_names: bool = True,
by_blocks: bool = False,
check_exact: bool = False,
check_exact: bool | lib.NoDefault = lib.no_default,
check_datetimelike_compat: bool = False,
check_categorical: bool = True,
check_like: bool = False,
check_freq: bool = True,
check_flags: bool = True,
rtol: float = 1.0e-5,
atol: float = 1.0e-8,
rtol: float | lib.NoDefault = lib.no_default,
atol: float | lib.NoDefault = lib.no_default,
obj: str = "DataFrame",
) -> None:
"""
Expand Down Expand Up @@ -1103,7 +1141,12 @@ def assert_frame_equal(
Specify how to compare internal data. If False, compare by columns.
If True, compare by blocks.
check_exact : bool, default False
Whether to compare number exactly. Only takes effect for float dtypes.
Whether to compare number exactly.
.. versionchanged:: 2.2.0
Defaults to True for integer dtypes if none of
``check_exact``, ``rtol`` and ``atol`` are specified.
check_datetimelike_compat : bool, default False
Compare datetime-like which is comparable ignoring dtype.
check_categorical : bool, default True
Expand Down Expand Up @@ -1158,6 +1201,9 @@ def assert_frame_equal(
>>> assert_frame_equal(df1, df2, check_dtype=False)
"""
__tracebackhide__ = True
_rtol = rtol if rtol is not lib.no_default else 1.0e-5
_atol = atol if atol is not lib.no_default else 1.0e-8
_check_exact = check_exact if check_exact is not lib.no_default else False

# instance validation
_check_isinstance(left, right, DataFrame)
@@ -1181,11 +1227,11 @@ def assert_frame_equal(
right.index,
exact=check_index_type,
check_names=check_names,
check_exact=check_exact,
check_exact=_check_exact,
check_categorical=check_categorical,
check_order=not check_like,
rtol=rtol,
atol=atol,
rtol=_rtol,
atol=_atol,
obj=f"{obj}.index",
)

@@ -1195,11 +1241,11 @@ def assert_frame_equal(
right.columns,
exact=check_column_type,
check_names=check_names,
check_exact=check_exact,
check_exact=_check_exact,
check_categorical=check_categorical,
check_order=not check_like,
rtol=rtol,
atol=atol,
rtol=_rtol,
atol=_atol,
obj=f"{obj}.columns",
)
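
The `assert_extension_array_equal` and `assert_series_equal` hunks in this file replace literal defaults with a sentinel so the function can tell "argument omitted" apart from "argument passed with its default value". A rough standard-library sketch of that resolution logic is below. `_NoDefault`, `resolve_comparison`, and the two boolean flags are hypothetical names introduced for illustration; in the diff the sentinel is `lib.no_default` and the flags correspond to `is_numeric_dtype(...) and not is_float_dtype(...)` on each side.

```python
import enum


class _NoDefault(enum.Enum):
    # Hypothetical stand-in for the lib.no_default sentinel in the diff.
    no_default = "NO_DEFAULT"


no_default = _NoDefault.no_default


def resolve_comparison(
    check_exact=no_default,
    rtol=no_default,
    atol=no_default,
    *,
    left_is_exact_numeric=False,
    right_is_exact_numeric=False,
):
    """Return (check_exact, rtol, atol) with omitted arguments filled in."""
    if check_exact is no_default and rtol is no_default and atol is no_default:
        # Nothing was specified: compare exactly when either side is a
        # non-float numeric dtype (for example, integers).
        check_exact = left_is_exact_numeric or right_is_exact_numeric
    elif check_exact is no_default:
        # A tolerance was passed explicitly, so use approximate comparison.
        check_exact = False

    # Fill in the numeric defaults only after the decision above, since
    # that decision depends on whether the tolerances were omitted.
    rtol = 1.0e-5 if rtol is no_default else rtol
    atol = 1.0e-8 if atol is no_default else atol
    return check_exact, rtol, atol


print(resolve_comparison(left_is_exact_numeric=True))
print(resolve_comparison(rtol=1e-3, left_is_exact_numeric=True))
print(resolve_comparison(check_exact=False, left_is_exact_numeric=True))
```

With plain `False` and `1.0e-5` defaults, a caller who explicitly asks for a tolerance on integer data would be indistinguishable from one who passed nothing, which is the case the sentinel separates.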

2 changes: 1 addition & 1 deletion pandas/conftest.py
@@ -1971,6 +1971,6 @@ def warsaw(request) -> str:
return request.param


@pytest.fixture()
@pytest.fixture
def arrow_string_storage():
return ("pyarrow", "pyarrow_numpy")
8 changes: 7 additions & 1 deletion pandas/core/algorithms.py
@@ -47,6 +47,7 @@
is_complex_dtype,
is_dict_like,
is_extension_array_dtype,
is_float,
is_float_dtype,
is_integer,
is_integer_dtype,
@@ -1361,7 +1362,12 @@ def diff(arr, n: int, axis: AxisInt = 0):
shifted
"""

n = int(n)
# added a check on the integer value of period
# see https://github.com/pandas-dev/pandas/issues/56607
if not lib.is_integer(n):
if not (is_float(n) and n.is_integer()):
raise ValueError("periods must be an integer")
n = int(n)
na = np.nan
dtype = arr.dtype
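
The hunk above stops silently truncating `periods` with `int(n)` and instead rejects values that are not whole numbers. A small standard-library sketch of the same check follows, assuming a hypothetical helper `validate_periods`. It uses `numbers.Integral` and `float.is_integer()` in place of the pandas helpers `lib.is_integer` and `is_float`, so NumPy scalar handling may differ from the real code.

```python
import numbers


def validate_periods(n):
    """Return n as an int, raising ValueError for non-integral values."""
    # Plain integers pass through; bool is excluded on purpose even
    # though it is an Integral subclass in Python.
    if isinstance(n, numbers.Integral) and not isinstance(n, bool):
        return int(n)
    # Floats are accepted only when they carry no fractional part.
    if isinstance(n, float) and n.is_integer():
        return int(n)
    raise ValueError("periods must be an integer")


print(validate_periods(2), validate_periods(-3.0))
try:
    validate_periods(1.5)
except ValueError as err:
    print(err)
```

Accepting `2.0` while rejecting `1.5` keeps existing callers that pass whole-number floats working, while surfacing the inputs that the old `int(n)` call would have rounded toward zero.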
