Releases: snowflakedb/snowpark-python

Release

12 Mar 23:39
a1ac12c

1.29.1 (2025-03-12)

Snowpark Python API Updates

Bug Fixes

  • Fixed a bug in DataFrameReader.dbapi (PrPr) that prevented its use in stored procedures and snowbooks.

Release

06 Mar 01:40

1.29.0 (2025-03-05)

Snowpark Python API Updates

New Features

  • Added support for the following AI-powered functions in functions.py (Private Preview):
    • ai_filter
    • ai_agg
    • summarize_agg
  • Added support for the new FILE SQL type, with the following related functions in functions.py (Private Preview):
    • fl_get_content_type
    • fl_get_etag
    • fl_get_file_type
    • fl_get_last_modified
    • fl_get_relative_path
    • fl_get_scoped_file_url
    • fl_get_size
    • fl_get_stage
    • fl_get_stage_file_url
    • fl_is_audio
    • fl_is_compressed
    • fl_is_document
    • fl_is_image
    • fl_is_video
  • Added support for importing third-party packages from PyPI using Artifact Repository (Private Preview):
    • Use keyword arguments artifact_repository and artifact_repository_packages to specify your artifact repository and packages respectively when registering stored procedures or user defined functions.
    • Supported APIs are:
      • Session.sproc.register
      • Session.udf.register
      • Session.udaf.register
      • Session.udtf.register
      • functions.sproc
      • functions.udf
      • functions.udaf
      • functions.udtf
      • functions.pandas_udf
      • functions.pandas_udtf
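
As a rough illustration of the two keyword arguments named above, the following sketch wraps Session.udf.register. It assumes `session` is an already-created snowflake.snowpark.Session; the helper name, the repository name, and the package list in the usage note are hypothetical placeholders, and the exact accepted value types should be checked against the API reference.

```python
def register_with_artifact_repository(session, func, repository, packages):
    """Register `func` as a UDF whose packages come from an artifact repository.

    `session` is assumed to be an existing snowflake.snowpark.Session.
    `repository` and `packages` are passed through unchanged to the two
    keyword arguments named in the release note.
    """
    return session.udf.register(
        func,
        artifact_repository=repository,          # your artifact repository
        artifact_repository_packages=packages,   # packages to import from it
    )


def add_one(x: int) -> int:
    # Type hints let Snowpark infer the UDF's input and return types.
    return x + 1
```

A call might look like register_with_artifact_repository(session, add_one, "MY_ARTIFACT_REPO", ["some-package"]), where both values are placeholders for your own repository and packages.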

Bug Fixes

  • Fixed a bug where creating a DataFrame with a large number of values raised an "Unsupported feature 'SCOPED_TEMPORARY'." error if thread-safe session was disabled.
  • Fixed a bug where df.describe raised internal SQL execution error when the dataframe is created from reading a stage file and CTE optimization is enabled.
  • Fixed a bug where df.order_by(A).select(B).distinct() would generate invalid SQL when simplified query generation was enabled using session.conf.set("use_simplified_query_generation", True).
    • Disabled simplified query generation by default.

Improvements

  • Improved version validation warnings for snowflake-snowpark-python package compatibility when registering stored procedures. Now, warnings are only triggered if the major or minor version does not match, while bugfix version differences no longer generate warnings.
  • Bumped cloudpickle dependency to also support cloudpickle==3.0.0 in addition to previous versions.
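
The version rule described above (warn on a major or minor mismatch, stay silent on a bugfix mismatch) can be modeled in a few lines. This is only a sketch of the rule as stated; the function name is hypothetical and it is not the library's actual validation code.

```python
def needs_version_warning(local_version: str, remote_version: str) -> bool:
    """Return True only when the major or minor components differ."""
    local_major_minor = tuple(int(p) for p in local_version.split(".")[:2])
    remote_major_minor = tuple(int(p) for p in remote_version.split(".")[:2])
    return local_major_minor != remote_major_minor


# A bugfix-only difference does not warn; a minor difference does.
bugfix_only = needs_version_warning("1.29.0", "1.29.1")   # False
minor_differs = needs_version_warning("1.29.0", "1.28.0") # True
```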

Snowpark Local Testing Updates

New Features

  • Added support for literal values to range_between window function.

Snowpark pandas API Updates

New Features

  • Added support for applying Snowflake Cortex functions ClassifyText, Translate, and ExtractAnswer.

Improvements

  • Improve error message for pd.to_snowflake, DataFrame.to_snowflake, and Series.to_snowflake when the table does not exist.
  • Improve readability of docstring for the if_exists parameter in pd.to_snowflake, DataFrame.to_snowflake, and Series.to_snowflake.
  • Improve error message for all pandas functions that use UDFs with Snowpark objects.

Bug Fixes

  • Fixed a bug in Series.rename_axis where an AttributeError was being raised.
  • Fixed a bug where pd.get_dummies didn't ignore NULL/NaN values by default.
  • Fixed a bug where repeated calls to pd.get_dummies resulted in a 'Duplicated column name' error.
  • Fixed a bug in pd.get_dummies where passing list of columns generated incorrect column labels in output DataFrame.
  • Update pd.get_dummies to return bool values instead of int.
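
To make the corrected pd.get_dummies behavior concrete, here is a small pure-Python model of it: missing values get no indicator column, and indicators are booleans rather than integers. The function is a hypothetical stand-in for illustration and does not call Snowpark pandas.

```python
def one_hot(values):
    """One-hot encode `values`, ignoring None entries and returning bools."""
    # Missing values are skipped when collecting categories.
    categories = sorted({v for v in values if v is not None})
    # Each category maps to a list of True/False indicators, one per input row.
    return {cat: [v == cat for v in values] for cat in categories}


encoded = one_hot(["a", None, "b", "a"])
# {'a': [True, False, False, True], 'b': [False, False, True, False]}
```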

Release

20 Feb 18:01
2b1e444

1.28.0 (2025-02-20)

Snowpark Python API Updates

New Features

  • Added support for the following functions in functions.py
    • normal
    • randn
  • Added support for the allow_missing_columns parameter to DataFrame.union_by_name and DataFrame.union_all_by_name.

Improvements

  • Improved the random object name generation to avoid collisions.
  • Improved query generation for DataFrame.distinct to generate SELECT DISTINCT instead of SELECT with GROUP BY all columns. To disable this feature, set session.conf.set("use_simplified_query_generation", False).
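
The difference between the two query shapes for distinct can be sketched with a toy SQL builder. The function and the table and column names are hypothetical, and the SQL that Snowpark actually emits may differ in detail.

```python
def distinct_sql(table: str, columns: list, simplified: bool) -> str:
    """Build a de-duplicating query in either of the two shapes described."""
    cols = ", ".join(columns)
    if simplified:
        # Simplified generation: a plain SELECT DISTINCT.
        return f"SELECT DISTINCT {cols} FROM {table}"
    # Older shape: SELECT grouped by every selected column.
    return f"SELECT {cols} FROM {table} GROUP BY {cols}"


simplified_query = distinct_sql("t", ["a", "b"], simplified=True)
grouped_query = distinct_sql("t", ["a", "b"], simplified=False)
```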

Deprecations

  • Deprecated Snowpark Python function snowflake_cortex_summarize. Users can install snowflake-ml-python and use the snowflake.cortex.summarize function instead.
  • Deprecated Snowpark Python function snowflake_cortex_sentiment. Users can install snowflake-ml-python and use the snowflake.cortex.sentiment function instead.

Bug Fixes

  • Fixed a bug where the session-level query tag was overwritten by a stacktrace for dataframes that generate multiple queries. Now, the query tag is set to the stacktrace only if collect_stacktrace_in_query_tag is enabled, for example with session.conf.set("collect_stacktrace_in_query_tag", True).
  • Fixed a bug in Session._write_pandas where it was erroneously passing use_logical_type parameter to Session._write_modin_pandas_helper when writing a Snowpark pandas object.
  • Fixed a bug in options sql generation that could cause multiple values to be formatted incorrectly.
  • Fixed a bug in Session.catalog where empty strings for database or schema were not handled correctly and were generating erroneous sql statements.

Experimental Features

  • Added support for writing pyarrow Tables to Snowflake tables.

Snowpark pandas API Updates

New Features

  • Added support for applying Snowflake Cortex functions Summarize and Sentiment.
  • Added support for list values in Series.str.get.

Bug Fixes

  • Fixed a bug in apply where kwargs were not being correctly passed into the applied function.

Snowpark Local Testing Updates

New Features

  • Added support for the following functions
    • hour
    • minute
  • Added support for NULL_IF parameter to csv reader.
  • Added support for date_format, datetime_format, and timestamp_format options when loading csvs.

Bug Fixes

  • Fixed a bug in DataFrame.join that caused columns to have incorrect typing.
  • Fixed a bug in when statements that caused incorrect results in the otherwise clause.

Release

04 Feb 02:44

1.27.0 (2025-02-03)

Snowpark Python API Updates

New Features

  • Added support for the following functions in functions.py
    • array_reverse
    • divnull
    • map_cat
    • map_contains_key
    • map_keys
    • nullifzero
    • snowflake_cortex_sentiment
    • acosh
    • asinh
    • atanh
    • bit_length
    • bitmap_bit_position
    • bitmap_bucket_number
    • bitmap_construct_agg
    • cbrt
    • equal_null
    • from_json
    • ifnull
    • localtimestamp
    • max_by
    • min_by
    • nth_value
    • nvl
    • octet_length
    • position
    • regr_avgx
    • regr_avgy
    • regr_count
    • regr_intercept
    • regr_r2
    • regr_slope
    • regr_sxx
    • regr_sxy
    • regr_syy
    • try_to_binary
    • base64
    • base64_decode_string
    • base64_encode
    • editdistance
    • hex
    • hex_encode
    • instr
    • log1p
    • log2
    • log10
    • percentile_approx
    • unbase64
  • Added support for specifying a schema string (including implicit struct syntax) when calling DataFrame.create_dataframe.
  • Added support for DataFrameWriter.insert_into/insertInto. This method also supports local testing mode.
  • Added support for DataFrame.create_temp_view to create a temporary view. It will fail if the view already exists.
  • Added support for multiple columns in the functions map_cat and map_concat.
  • Added an option keep_column_order for keeping original column order in DataFrame.with_column and DataFrame.with_columns.
  • Added options to column casts that allow renaming or adding fields in StructType columns.
  • Added support for contains_null parameter to ArrayType.
  • Added support for creating a temporary view via DataFrame.create_or_replace_temp_view from a DataFrame created by reading a file from a stage.
  • Added support for value_contains_null parameter to MapType.
  • Added interactive to telemetry that indicates whether the current environment is an interactive one.
  • Allow session.file.get in a Native App to read file paths starting with / from the current version
  • Added support for multiple aggregation functions after DataFrame.pivot.

Experimental Features

  • Added Catalog class to manage snowflake objects. It can be accessed via Session.catalog.
    • snowflake.core is a dependency required for this feature.
  • Allow user input schema when reading JSON file on stage.
  • Added support for specifying a schema string (including implicit struct syntax) when calling DataFrame.create_dataframe.

Improvements

  • Updated README.md to include instructions on how to verify package signatures using cosign.

Bug Fixes

  • Fixed a bug in local testing mode that caused a column to contain None when it should contain 0.
  • Fixed a bug in StructField.from_json that prevented TimestampTypes with tzinfo from being parsed correctly.
  • Fixed a bug in function date_format that caused an error when the input column was date type or timestamp type.
  • Fixed a bug where a null value could be inserted into a non-nullable column of a DataFrame.
  • Fixed a bug in replace and lit which raised type hint assertion error when passing Column expression objects.
  • Fixed a bug in pandas_udf and pandas_udtf where session parameter was erroneously ignored.
  • Fixed a bug that raised incorrect type conversion error for system function called through session.call.

Snowpark pandas API Updates

New Features

  • Added support for Series.str.ljust and Series.str.rjust.
  • Added support for Series.str.center.
  • Added support for Series.str.pad.
  • Added support for applying Snowpark Python function snowflake_cortex_sentiment.
  • Added support for DataFrame.map.
  • Added support for DataFrame.from_dict and DataFrame.from_records.
  • Added support for mixed case field names in struct type columns.
  • Added support for SeriesGroupBy.unique
  • Added support for Series.dt.strftime with the following directives:
    • %d: Day of the month as a zero-padded decimal number.
    • %m: Month as a zero-padded decimal number.
    • %Y: Year with century as a decimal number.
    • %H: Hour (24-hour clock) as a zero-padded decimal number.
    • %M: Minute as a zero-padded decimal number.
    • %S: Second as a zero-padded decimal number.
    • %f: Microsecond as a decimal number, zero-padded to 6 digits.
    • %j: Day of the year as a zero-padded decimal number.
    • %X: Locale’s appropriate time representation.
    • %%: A literal '%' character.
  • Added support for Series.between.
  • Added support for include_groups=False in DataFrameGroupBy.apply.
  • Added support for expand=True in Series.str.split.
  • Added support for DataFrame.pop and Series.pop.
  • Added support for first and last in DataFrameGroupBy.agg and SeriesGroupBy.agg.
  • Added support for Index.drop_duplicates.
  • Added support for aggregations "count", "median", np.median,
    "skew", "std", np.std, "var", and np.var in
    pd.pivot_table(), DataFrame.pivot_table(), and pd.crosstab().
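
The strftime directives listed above have the same meaning as in Python's standard library, so their effect can be previewed with datetime.strftime. This sketch uses only the standard library (an assumption made for illustration) rather than Series.dt.strftime itself; %X is omitted because its output depends on the locale.

```python
from datetime import datetime

ts = datetime(2025, 2, 3, 14, 5, 9, 123456)

date_part = ts.strftime("%d/%m/%Y")      # day/month/year, zero-padded
time_part = ts.strftime("%H:%M:%S.%f")   # 24-hour time with microseconds
day_of_year = ts.strftime("%j")          # zero-padded day of the year
literal_percent = ts.strftime("100%%")   # %% yields a literal '%'
```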

Improvements

  • Improve performance of DataFrame.map, Series.apply and Series.map methods by mapping numpy functions to snowpark functions if possible.
  • Added documentation for DataFrame.map.
  • Improve performance of DataFrame.apply by mapping numpy functions to snowpark functions if possible.
  • Added documentation on the extent of Snowpark pandas interoperability with scikit-learn.
  • Infer return type of functions in Series.map, Series.apply and DataFrame.map if type-hint is not provided.
  • Added call_count to telemetry that counts method calls including interchange protocol calls.

Release

05 Dec 22:11

1.26.0 (2024-12-05)

Snowpark Python API Updates

New Features

  • Added support for property version and class method get_active_session for Session class.
  • Added new methods and variables to enhance data type handling and JSON serialization/deserialization:
    • To DataType, its derived classes, and StructField:
      • type_name: Returns the type name of the data.
      • simple_string: Provides a simple string representation of the data.
      • json_value: Returns the data as a JSON-compatible value.
      • json: Converts the data to a JSON string.
    • To ArrayType, MapType, StructField, PandasSeriesType, PandasDataFrameType and StructType:
      • from_json: Enables these types to be created from JSON data.
    • To MapType:
      • keyType: keys of the map
      • valueType: values of the map
  • Added support for method appName in SessionBuilder.
  • Added support for include_nulls argument in DataFrame.unpivot.
  • Added support for following functions in functions.py:
    • size to get size of array, object, or map columns.
    • collect_list an alias of array_agg.
    • substring makes len argument optional.
  • Added parameter ast_enabled to session for internal usage (default: False).
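
The effect of the include_nulls argument on unpivot can be pictured with a small pure-Python model: by default rows whose value is missing are dropped, and with include_nulls=True they are kept. The helper and its output column names are hypothetical and do not call Snowpark.

```python
def unpivot(rows, id_col, value_cols, include_nulls=False):
    """Turn each value column of each row into its own (name, value) row."""
    out = []
    for row in rows:
        for col in value_cols:
            value = row[col]
            if value is None and not include_nulls:
                continue  # default behavior: skip missing values
            out.append({id_col: row[id_col], "name": col, "value": value})
    return out


rows = [{"id": 1, "jan": 10, "feb": None}]
without_nulls = unpivot(rows, "id", ["jan", "feb"])
with_nulls = unpivot(rows, "id", ["jan", "feb"], include_nulls=True)
```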

Improvements

  • Added support for specifying the following to DataFrame.create_or_replace_dynamic_table:
    • iceberg_config A dictionary that can hold the following iceberg configuration options:
      • external_volume
      • catalog
      • base_location
      • catalog_sync
      • storage_serialization_policy
  • Added support for nested data types to DataFrame.print_schema
  • Added support for level parameter to DataFrame.print_schema
  • Improved flexibility of DataFrameReader and DataFrameWriter API by adding support for the following:
    • Added format method to DataFrameReader and DataFrameWriter to specify file format when loading or unloading results.
    • Added load method to DataFrameReader to work in conjunction with format.
    • Added save method to DataFrameWriter to work in conjunction with format.
    • Added support to read keyword arguments to options method for DataFrameReader and DataFrameWriter.
  • Relaxed the cloudpickle dependency for Python 3.11 to simplify build requirements. However, for Python 3.11, cloudpickle==2.2.1 remains the only supported version.
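
The format, load, and save methods described above chain together as in the following sketch. It assumes `session` is an existing snowflake.snowpark.Session; the helper name and the stage paths in the usage note are hypothetical, and depending on the files a schema or extra options may also be required.

```python
def copy_csv(session, source_path, target_path):
    """Read CSV files from `source_path` and unload the result to `target_path`."""
    # DataFrameReader.format selects the file format; load reads the files.
    df = session.read.format("csv").load(source_path)
    # DataFrameWriter.format selects the output format; save unloads the result.
    df.write.format("csv").save(target_path)
    return df
```

A call might look like copy_csv(session, "@my_stage/in/", "@my_stage/out/"), with both stage locations standing in for your own.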

Bug Fixes

  • Removed warnings that dynamic pivot features were in private preview, because
    dynamic pivot is now generally available.
  • Fixed a bug in session.read.options where False Boolean values were incorrectly parsed as True in the generated file format.

Dependency Updates

  • Added a runtime dependency on python-dateutil.

Snowpark pandas API Updates

New Features

  • Added partial support for Series.map when arg is a pandas Series or a
    collections.abc.Mapping. No support for instances of dict that implement
    __missing__ but are not instances of collections.defaultdict.
  • Added support for DataFrame.align and Series.align for axis=1 and axis=None.
  • Added support for pd.json_normalize.
  • Added support for GroupBy.pct_change with axis=0, freq=None, and limit=None.
  • Added support for DataFrameGroupBy.__iter__ and SeriesGroupBy.__iter__.
  • Added support for np.sqrt, np.trunc, np.floor, numpy trig functions, np.exp, np.abs, np.positive and np.negative.
  • Added partial support for the dataframe interchange protocol method
    DataFrame.__dataframe__().

Bug Fixes

  • Fixed a bug in df.loc where setting a single column from a series results in unexpected None values.

Improvements

  • Use UNPIVOT INCLUDE NULLS for unpivot operations in pandas instead of sentinel values.
  • Improved documentation for pd.read_excel.

Release

14 Nov 20:22

1.25.0 (2024-11-13)

Snowpark Python API Updates

New Features

  • Added the following new functions in snowflake.snowpark.dataframe:
    • map
  • Added support for passing parameter include_error to Session.query_history to record queries that have error during execution.

Improvements

  • When target stage is not set in profiler, a default stage from Session.get_session_stage is used instead of raising SnowparkSQLException.
  • Allowed lower case or mixed case input when calling Session.stored_procedure_profiler.set_active_profiler.
  • Added distributed tracing using open telemetry APIs for action function in DataFrame:
    • cache_result
  • Removed opentelemetry warning from logging.

Bug Fixes

  • Fixed the pre-action and post-action query propagation when In expressions were used in selects.
  • Fixed a bug that raised an AttributeError when calling Session.stored_procedure_profiler.get_output while Session.stored_procedure_profiler is disabled.

Dependency Updates

  • Added a dependency on protobuf>=5.28 and tzlocal at runtime.
  • Added a dependency on protoc-wheel-0 for the development profile.
  • Require snowflake-connector-python>=3.12.0, <4.0.0 (was >=3.10.0).

Snowpark pandas API Updates

Dependency Updates

  • Updated modin from 0.28.1 to 0.30.1.
  • Added support for all pandas 2.2.x versions.

New Features

  • Added support for Index.to_numpy.
  • Added support for DataFrame.align and Series.align for axis=0.
  • Added support for size in GroupBy.aggregate, DataFrame.aggregate, and Series.aggregate.
  • Added support for snowflake.snowpark.functions.window
  • Added support for pd.read_pickle (Uses native pandas for processing).
  • Added support for pd.read_html (Uses native pandas for processing).
  • Added support for pd.read_xml (Uses native pandas for processing).
  • Added support for aggregation functions "size" and len in GroupBy.aggregate, DataFrame.aggregate, and Series.aggregate.
  • Added support for list values in Series.str.len.

Bug Fixes

  • Fixed a bug where aggregating a single-column dataframe with a single callable function (e.g. pd.DataFrame([0]).agg(np.mean)) would fail to transpose the result.
  • Fixed bugs where DataFrame.dropna() would:
    • Treat an empty subset (e.g. []) as if it specified all columns instead of no columns.
    • Raise a TypeError for a scalar subset instead of filtering on just that column.
    • Raise a ValueError for a subset of type pandas.Index instead of filtering on the columns in the index.
  • Disable creation of scoped read only table to mitigate TableNotFoundError when using dynamic pivot in notebook environment.
  • Fixed a bug in concat when the DataFrame or Series objects being concatenated come from the same DataFrame and axis = 1.
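
The corrected DataFrame.dropna() handling of subset can be summarized with a pure-Python model: no subset means every column, an empty subset means no columns, and a scalar means just that one column. The helpers below are hypothetical, treat only strings as scalar labels for simplicity, and do not reflect the library's internals.

```python
def normalize_subset(subset, all_columns):
    """Return the list of columns whose values dropna should inspect."""
    if subset is None:
        return list(all_columns)   # no subset given: inspect every column
    if isinstance(subset, str):
        return [subset]            # scalar label: inspect just that column
    return list(subset)            # list-like, including an empty one: use as given


def dropna_rows(rows, all_columns, subset=None):
    """Keep rows that have no missing value in the inspected columns."""
    cols = normalize_subset(subset, all_columns)
    return [r for r in rows if all(r[c] is not None for c in cols)]


rows = [{"a": 1, "b": None}, {"a": None, "b": 2}]
all_checked = dropna_rows(rows, ["a", "b"])               # every row has a gap
none_checked = dropna_rows(rows, ["a", "b"], subset=[])   # nothing inspected
only_a = dropna_rows(rows, ["a", "b"], subset="a")        # inspect column a only
```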

Improvements

  • Improve np.where with scalar x value by eliminating unnecessary join and temp table creation.
  • Improve get_dummies performance by flattening the pivot with join.

Snowpark Local Testing Updates

New Features

  • Added support for patching functions that are unavailable in the snowflake.snowpark.functions module.
  • Added support for snowflake.snowpark.functions.any_value

Bug Fixes

  • Fixed a bug where Table.update could not handle VariantType, MapType, and ArrayType data types.
  • Fixed a bug where column aliases were incorrectly resolved in DataFrame.join, causing errors when selecting columns from a joined DataFrame.
  • Fixed a bug where Table.update and Table.merge could fail if the target table's index was not the default RangeIndex.

Release

28 Oct 22:57

1.24.0 (2024-10-28)

Snowpark Python API Updates

New Features

  • Updated Session class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the same Session object.
    • The feature is disabled by default and can be enabled by setting FEATURE_THREAD_SAFE_PYTHON_SESSION to True for account.
    • Updating session configurations, like changing database or schema, when multiple threads are using the session may lead to unexpected behavior.
    • When enabled, some internally created temporary table names returned from DataFrame.queries API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.
  • Added support for 'Service' domain to session.lineage.trace API.
  • Added support for copy_grants parameter when registering UDxF and stored procedures.
  • Added support for the following methods in DataFrameWriter to support daisy-chaining:
    • option
    • options
    • partition_by
  • Added support for snowflake_cortex_summarize.

Improvements

  • Improved function snowflake.snowpark.functions.array_remove so that it can now be used in Python.
  • Disables sql simplification when sort is performed after limit.
    • Previously, df.sort().limit() and df.limit().sort() generated the same query, with sort in front of limit. Now, df.limit().sort() generates a query that applies limit first and then sort.
    • This improves performance of the generated query for df.limit().sort(), because limit stops table scanning as soon as the number of records is satisfied.
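
Why the order of sort and limit matters can be seen with plain Python lists, used here as a stand-in for a table (an assumption for illustration only): sorting before limiting returns the smallest rows overall, while limiting before sorting only orders whichever rows were taken first.

```python
values = [5, 1, 4, 2, 3]

# sort, then limit: the two smallest values of the whole input
sort_then_limit = sorted(values)[:2]

# limit, then sort: the first two values taken, then ordered
limit_then_sort = sorted(values[:2])
```

The two results differ, which is why generating the same query for both call orders was incorrect.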

Bug Fixes

  • Fixed a bug where the automatic cleanup of temporary tables could interfere with the results of async query execution.
  • Fixed a bug in DataFrame.analytics.time_series_agg function to handle multiple data points in same sliding interval.
  • Fixed a bug that created inconsistent casing in field names of structured objects in iceberg schemas.

Snowpark pandas API Updates

New Features

  • Added support for np.subtract, np.multiply, np.divide, and np.true_divide.
  • Added support for tracking usages of __array_ufunc__.
  • Added numpy compatibility support for np.float_power, np.mod, np.remainder, np.greater, np.greater_equal, np.less, np.less_equal, np.not_equal, and np.equal.
  • Added numpy compatibility support for np.log, np.log2, and np.log10
  • Added support for DataFrameGroupBy.bfill, SeriesGroupBy.bfill, DataFrameGroupBy.ffill, and SeriesGroupBy.ffill.
  • Added support for on parameter with Resampler.
  • Added support for timedelta inputs in value_counts().
  • Added support for applying Snowpark Python function snowflake_cortex_summarize.
  • Added support for DataFrame.attrs and Series.attrs.
  • Added support for DataFrame.style.

Improvements

  • Improved generated SQL query for head and iloc when the row key is a slice.
  • Improved error message when passing an unknown timezone to tz_convert and tz_localize in Series, DataFrame, Series.dt, and DatetimeIndex.
  • Improved documentation for tz_convert and tz_localize in Series, DataFrame, Series.dt, and DatetimeIndex to specify the supported timezone formats.
  • Added additional kwargs support for df.apply and series.apply (as well as map and applymap) when using snowpark functions. This allows for some position independent compatibility between apply and functions where the first argument is not a pandas object.
  • Improved generated SQL query for iloc and iat when the row key is a scalar.
  • Removed all joins in iterrows.
  • Improved documentation for Series.map to reflect the unsupported features.
  • Added support for np.may_share_memory which is used internally by many scikit-learn functions. This method will always return false when called with a Snowpark pandas object.

Bug Fixes

  • Fixed a bug where DataFrame and Series pct_change() would raise TypeError when input contained timedelta columns.
  • Fixed a bug where replace() would sometimes propagate Timedelta types incorrectly. replace() on Timedelta now raises NotImplementedError instead.
  • Fixed a bug where DataFrame and Series round() would raise AssertionError for Timedelta columns. Instead raise NotImplementedError for round() on Timedelta.
  • Fixed a bug where reindex fails when the new index is a Series with non-overlapping types from the original index.
  • Fixed a bug where calling __getitem__ on a DataFrameGroupBy object always returned a DataFrameGroupBy object if as_index=False.
  • Fixed a bug where inserting timedelta values into an existing column would silently convert the values to integers instead of raising NotImplementedError.
  • Fixed a bug where DataFrame.shift() on axis=0 and axis=1 would fail to propagate timedelta types.
  • DataFrame.abs(), DataFrame.__neg__(), DataFrame.stack(), and DataFrame.unstack() now raise NotImplementedError for timedelta inputs instead of failing to propagate timedelta types.

Snowpark Local Testing Updates

Bug Fixes

  • Fixed a bug where DataFrame.alias raises KeyError for input column name.
  • Fixed a bug where to_csv on Snowflake stage fails when data contains empty strings.

Release

10 Oct 00:23
bb1ed3d

1.23.0 (2024-10-09)

Snowpark Python API Updates

New Features

  • Added the following new functions in snowflake.snowpark.functions:
    • make_interval
  • Added support for using Snowflake Interval constants with Window.range_between() when the order by column is TIMESTAMP or DATE type.
  • Added support for file writes. This feature is currently in private preview.
  • Added thread_id to QueryRecord to track the thread id submitting the query history.
  • Added support for Session.stored_procedure_profiler.
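
A range frame bounded by an interval, as in the Window.range_between() item above, includes every row whose ordering timestamp falls within the interval of the current row. The following pure-Python sketch models that semantics for a trailing window; the function name and sample data are hypothetical, and it does not call Snowpark or make_interval.

```python
from datetime import datetime, timedelta


def range_window_sums(events, preceding: timedelta):
    """For each (timestamp, value), sum values with timestamp in [ts - preceding, ts]."""
    events = sorted(events)  # order by timestamp, like ORDER BY in a window
    result = []
    for ts, _ in events:
        total = sum(v for t, v in events if ts - preceding <= t <= ts)
        result.append((ts, total))
    return result


events = [
    (datetime(2024, 10, 1), 1),
    (datetime(2024, 10, 2), 2),
    (datetime(2024, 10, 5), 4),
]
sums = range_window_sums(events, timedelta(days=1))
```

With a one-day interval, the second row also picks up the first row's value, while the third row stands alone because no other timestamp lies within a day of it.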

Bug Fixes

  • Fixed a bug where registering a stored procedure or UDxF with type hints would give a warning 'NoneType' has no len() when trying to read default values from function.

Snowpark pandas API Updates

New Features

  • Added support for TimedeltaIndex.mean method.
  • Added support for some cases of aggregating Timedelta columns on axis=0 with agg or aggregate.
  • Added support for by, left_by, right_by, left_index, and right_index for pd.merge_asof.
  • Added support for passing parameter include_describe to Session.query_history.
  • Added support for DatetimeIndex.mean and DatetimeIndex.std methods.
  • Added support for Resampler.asfreq, Resampler.indices, Resampler.nunique, and Resampler.quantile.
  • Added support for resample frequency W, ME, YE with closed = "left".
  • Added support for DataFrame.rolling.corr and Series.rolling.corr for pairwise = False and int window.
  • Added support for string time-based window and min_periods = None for Rolling.
  • Added support for DataFrameGroupBy.fillna and SeriesGroupBy.fillna.
  • Added support for constructing Series and DataFrame objects with the lazy Index object as data, index, and columns arguments.
  • Added support for constructing Series and DataFrame objects with index and column values not present in DataFrame/Series data.
  • Added support for pd.read_sas (Uses native pandas for processing).
  • Added support for applying rolling().count() and expanding().count() to Timedelta series and columns.
  • Added support for tz in both pd.date_range and pd.bdate_range.
  • Added support for Series.items.
  • Added support for errors="ignore" in pd.to_datetime.
  • Added support for DataFrame.tz_localize and Series.tz_localize.
  • Added support for DataFrame.tz_convert and Series.tz_convert.
  • Added support for applying Snowpark Python functions (e.g., sin) in Series.map, Series.apply, DataFrame.apply and DataFrame.applymap.

Improvements

  • Improved to_pandas to persist the original timezone offset for TIMESTAMP_TZ type.
  • Improved dtype results for TIMESTAMP_TZ type to show correct timezone offset.
  • Improved dtype results for TIMESTAMP_LTZ type to show correct timezone.
  • Improved error message when passing non-bool value to numeric_only for groupby aggregations.
  • Removed unnecessary warning about sort algorithm in sort_values.
  • Use SCOPED objects for internally created temp tables. SCOPED objects are stored-procedure scoped if created within a stored procedure, and session scoped otherwise, and are automatically cleaned up at the end of the scope.
  • Improved warning messages for operations that lead to materialization with inadvertent slowness.
  • Removed unnecessary warning message about convert_dtype in Series.apply.

Bug Fixes

  • Fixed a bug where an Index object created from a Series/DataFrame incorrectly updates the Series/DataFrame's index name after an inplace update has been applied to the original Series/DataFrame.
  • Suppressed an unhelpful SettingWithCopyWarning that sometimes appeared when printing Timedelta columns.
  • Fixed inplace argument for Series objects derived from other Series objects.
  • Fixed a bug where Series.sort_values failed if series name overlapped with index column name.
  • Fixed a bug where transposing a dataframe would map Timedelta index levels to integer column levels.
  • Fixed a bug where Resampler methods on timedelta columns would produce integer results.
  • Fixed a bug where pd.to_numeric() would leave Timedelta inputs as Timedelta instead of converting them to integers.
  • Fixed loc set when setting a single row, or multiple rows, of a DataFrame with a Series value.

Release

12 Sep 19:06

1.22.1 (2024-09-11)

This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.

1.22.0 (2024-09-10)

Snowpark Python API Updates

New Features

  • Added the following new functions in snowflake.snowpark.functions:
    • array_remove
    • ln

Improvements

  • Improved documentation for Session.write_pandas by making use_logical_type option more explicit.
  • Added support for specifying the following to DataFrameWriter.save_as_table:
    • enable_schema_evolution
    • data_retention_time
    • max_data_extension_time
    • change_tracking
    • copy_grants
    • iceberg_config: A dictionary that can hold the following iceberg configuration options:
      • external_volume
      • catalog
      • base_location
      • catalog_sync
      • storage_serialization_policy
  • Added support for specifying the following to DataFrameWriter.copy_into_table:
    • iceberg_config: A dictionary that can hold the following iceberg configuration options:
      • external_volume
      • catalog
      • base_location
      • catalog_sync
      • storage_serialization_policy
  • Added support for specifying the following parameters to DataFrame.create_or_replace_dynamic_table:
    • mode
    • refresh_mode
    • initialize
    • clustering_keys
    • is_transient
    • data_retention_time
    • max_data_extension_time

Bug Fixes

  • Fixed a bug in session.read.csv that caused an error when setting PARSE_HEADER = True in an externally defined file format.
  • Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
  • Fixed a bug in session.get_session_stage that referenced a non-existing stage after switching database or schema.
  • Fixed a bug where calling DataFrame.to_snowpark_pandas without explicitly initializing the Snowpark pandas plugin caused an error.
  • Fixed a bug where using the explode function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the outer parameter.

Snowpark Local Testing Updates

New Features

  • Added support for type coercion when passing columns as input to UDF calls.
  • Added support for Index.identical.

Bug Fixes

  • Fixed a bug where the truncate mode in DataFrameWriter.save_as_table incorrectly handled DataFrames containing only a subset of columns from the existing table.
  • Fixed a bug where function to_timestamp does not set the default timezone of the column datatype.

Snowpark pandas API Updates

New Features

  • Added limited support for the Timedelta type, including the following features. Snowpark pandas will raise NotImplementedError for unsupported Timedelta use cases.
    • supporting tracking the Timedelta type through copy, cache_result, shift, sort_index, assign, bfill, ffill, fillna, compare, diff, drop, dropna, duplicated, empty, equals, insert, isin, isna, items, iterrows, join, len, mask, melt, merge, nlargest, nsmallest, to_pandas.
    • converting non-timedelta to timedelta via astype.
    • NotImplementedError will be raised for the rest of methods that do not support Timedelta.
    • support for subtracting two timestamps to get a Timedelta.
    • support for indexing with Timedelta data columns.
    • support for adding or subtracting timestamps and Timedelta.
    • support for binary arithmetic between two Timedelta values.
    • support for binary arithmetic and comparisons between Timedelta values and numeric values.
    • support for lazy TimedeltaIndex.
    • support for pd.to_timedelta.
    • support for GroupBy aggregations min, max, mean, idxmax, idxmin, std, sum, median, count, any, all, size, nunique, head, tail, aggregate.
    • support for GroupBy filtrations first and last.
    • support for TimedeltaIndex attributes: days, seconds, microseconds and nanoseconds.
    • support for diff with timestamp columns on axis=0 and axis=1.
    • support for TimedeltaIndex methods: ceil, floor and round.
    • support for TimedeltaIndex.total_seconds method.
  • Added support for index's arithmetic and comparison operators.
  • Added support for Series.dt.round.
  • Added documentation pages for DatetimeIndex.
  • Added support for Index.name, Index.names, Index.rename, and Index.set_names.
  • Added support for Index.__repr__.
  • Added support for DatetimeIndex.month_name and DatetimeIndex.day_name.
  • Added support for Series.dt.weekday, Series.dt.time, and DatetimeIndex.time.
  • Added support for Index.min and Index.max.
  • Added support for pd.merge_asof.
  • Added support for Series.dt.normalize and DatetimeIndex.normalize.
  • Added support for Index.is_boolean, Index.is_integer, Index.is_floating, Index.is_numeric, and Index.is_object.
  • Added support for DatetimeIndex.round, DatetimeIndex.floor and DatetimeIndex.ceil.
  • Added support for Series.dt.days_in_month and Series.dt.daysinmonth.
  • Added support for DataFrameGroupBy.value_counts and SeriesGroupBy.value_counts.
  • Added support for Series.is_monotonic_increasing and Series.is_monotonic_decreasing.
  • Added support for Index.is_monotonic_increasing and Index.is_monotonic_decreasing.
  • Added support for pd.crosstab.
  • Added support for pd.bdate_range and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both pd.date_range and pd.bdate_range.
  • Added support for lazy Index objects as labels in DataFrame.reindex and Series.reindex.
  • Added support for Series.dt.days, Series.dt.seconds, Series.dt.microseconds, and Series.dt.nanoseconds.
  • Added support for creating a DatetimeIndex from an Index of numeric or string type.
  • Added support for string indexing with Timedelta objects.
  • Added support for Series.dt.total_seconds method.
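
To make a few of the Timedelta features listed above concrete, here is a small sketch. It uses native pandas as a stand-in so that it runs without a Snowflake connection. With Snowpark pandas the import would instead be `import modin.pandas as pd` together with `import snowflake.snowpark.modin.plugin`, and the results are expected to match, though that was not verified here. The sample timestamps are made up for illustration.

```python
import pandas as pd  # native pandas, used here as a stand-in (assumption)

start = pd.Series(pd.to_datetime(["2024-01-01 00:00:00", "2024-01-02 12:00:00"]))
end = pd.Series(pd.to_datetime(["2024-01-01 06:00:00", "2024-01-04 12:00:00"]))

# Subtracting two timestamp columns yields a Timedelta column.
elapsed = end - start

# Adding a Timedelta to a timestamp column shifts every value.
shifted = start + pd.Timedelta(hours=1)

# Binary arithmetic between Timedelta values and a numeric value.
half = elapsed / 2

# pd.to_timedelta parses strings; Series.dt.total_seconds returns floats.
parsed = pd.to_timedelta(pd.Series(["1 days", "90 minutes"]))
seconds = parsed.dt.total_seconds()
```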

Improvements

  • Improved concat and join performance when the operations are performed on Series coming from the same DataFrame, by avoiding unnecessary joins.
  • Refactored quoted_identifier_to_snowflake_type to avoid making metadata queries if the types have been cached locally.
  • Improved pd.to_datetime to handle all local input cases.
  • A lazy index is now created from another lazy index without pulling data to the client.
  • Raised NotImplementedError for Index bitwise operators.
  • Displayed a clearer error message when Index.names is set to a non-list-like object.
  • Raised a warning whenever MultiIndex values are pulled in locally.
  • Improved the warning message for pd.read_snowflake to include the creation reason when temp table creation is triggered.
  • Improved performance for DataFrame.set_index, and for setting DataFrame.index or Series.index, by avoiding checks that require eager evaluation. As a consequence, a ValueError is no longer raised when the new index does not match the length of the current Series/DataFrame object. Instead, when the Series/DataFrame object is longer than the provided index, the Series/DataFrame's new index is filled with NaN values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
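
The length-mismatch rule in the last item above can be modelled in a few lines of plain Python. This is only a sketch of the described behaviour, not the Snowpark pandas implementation, and the function name `align_new_index` is hypothetical.

```python
import math
from typing import Any, List, Sequence


def align_new_index(num_rows: int, new_index: Sequence[Any]) -> List[Any]:
    """Return the labels an object with `num_rows` rows would end up with."""
    # Labels beyond the object's length are ignored.
    labels = list(new_index[:num_rows])
    # Rows without a provided label get NaN instead of raising ValueError.
    labels += [math.nan] * (num_rows - len(labels))
    return labels


shorter = align_new_index(3, ["a", "b"])      # object longer than the index
longer = align_new_index(2, ["a", "b", "c"])  # index longer than the object
```

Here `shorter` holds `"a"`, `"b"`, and a NaN for the third row, while `longer` keeps only `"a"` and `"b"`.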

Bug Fixes

  • Stopped ignoring nanoseconds in pd.Timedelta scalars.
  • Fixed an AssertionError in trees of binary operations.
  • Fixed a bug in Series.dt.isocalendar when using a named Series.
  • Fixed inplace argument for Series objects derived from DataFrame columns.
  • Fixed a bug where Series.reindex and DataFrame.reindex did not update the result index's name correctly.
  • Fixed a bug where Series.take did not error when axis=1 was specified.

Release

05 Sep 20:28

1.21.1 (2024-09-05)

Snowpark Python API Updates

Bug Fixes

  • Fixed a bug where using to_pandas_batches with async jobs caused an error due to improper handling of waiting for asynchronous query completion.