Releases: snowflakedb/snowpark-python
1.29.0 (2025-03-05)
Snowpark Python API Updates
New Features
- Added support for the following AI-powered functions in `functions.py` (Private Preview):
  - `ai_filter`
  - `ai_agg`
  - `summarize_agg`
- Added support for the new FILE SQL type, with the following related functions in `functions.py` (Private Preview):
  - `fl_get_content_type`
  - `fl_get_etag`
  - `fl_get_file_type`
  - `fl_get_last_modified`
  - `fl_get_relative_path`
  - `fl_get_scoped_file_url`
  - `fl_get_size`
  - `fl_get_stage`
  - `fl_get_stage_file_url`
  - `fl_is_audio`
  - `fl_is_compressed`
  - `fl_is_document`
  - `fl_is_image`
  - `fl_is_video`
- Added support for importing third-party packages from PyPI using Artifact Repository (Private Preview):
  - Use keyword arguments `artifact_repository` and `artifact_repository_packages` to specify your artifact repository and packages respectively when registering stored procedures or user defined functions.
  - Supported APIs are:
    - `Session.sproc.register`
    - `Session.udf.register`
    - `Session.udaf.register`
    - `Session.udtf.register`
    - `functions.sproc`
    - `functions.udf`
    - `functions.udaf`
    - `functions.udtf`
    - `functions.pandas_udf`
    - `functions.pandas_udtf`
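
A minimal sketch of how the two Artifact Repository keyword arguments might be passed to `Session.udf.register`. The helper name `register_udf_with_pypi_packages`, the repository name, and the package list are hypothetical; the example assumes the UDF's input and return types are inferred from its type hints, and that the repository is given as a string and the packages as a list of strings.

```python
from typing import Any, List


def register_udf_with_pypi_packages(
    session: Any, repository: str, packages: List[str]
) -> Any:
    """Register a small UDF whose packages come from an artifact repository.

    `session` is expected to be a snowflake.snowpark.Session; it is typed as
    Any so this sketch can be loaded without Snowpark installed.
    """

    def add_one(x: int) -> int:
        # Trivial body; a real UDF would import the requested packages here.
        return x + 1

    # The two keyword arguments below are the ones named in the release notes.
    return session.udf.register(
        add_one,
        artifact_repository=repository,
        artifact_repository_packages=packages,
    )
```

With a live session this would be called as, for example, `register_udf_with_pypi_packages(session, "MY_REPOSITORY", ["scikit-learn"])`, where both values are placeholders.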
Bug Fixes
- Fixed a bug where creating a Dataframe with a large number of values raised an `Unsupported feature 'SCOPED_TEMPORARY'.` error if thread-safe session was disabled.
- Fixed a bug where `df.describe` raised an internal SQL execution error when the dataframe is created from reading a stage file and CTE optimization is enabled.
- Fixed a bug where `df.order_by(A).select(B).distinct()` would generate invalid SQL when simplified query generation was enabled using `session.conf.set("use_simplified_query_generation", True)`.
  - Disabled simplified query generation by default.
Improvements
- Improved version validation warnings for `snowflake-snowpark-python` package compatibility when registering stored procedures. Now, warnings are only triggered if the major or minor version does not match, while bugfix version differences no longer generate warnings.
- Bumped cloudpickle dependency to also support `cloudpickle==3.0.0` in addition to previous versions.
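
The version rule described above (warn on a major or minor mismatch, ignore bugfix differences) can be sketched in plain Python. This is an illustration of the rule only, not the library's implementation; `major_minor` and `should_warn` are hypothetical helper names.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return the (major, minor) pair of a dotted version string."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def should_warn(client_version: str, server_version: str) -> bool:
    """Warn only when the major or minor components differ."""
    return major_minor(client_version) != major_minor(server_version)


print(should_warn("1.29.0", "1.29.3"))  # bugfix difference only
print(should_warn("1.29.0", "1.28.0"))  # minor difference
```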
Snowpark Local Testing Updates
New Features
- Added support for literal values to `range_between` window function.
Snowpark pandas API Updates
New Features
- Added support for applying Snowflake Cortex functions `ClassifyText`, `Translate`, and `ExtractAnswer`.
Improvements
- Improve error message for `pd.to_snowflake`, `DataFrame.to_snowflake`, and `Series.to_snowflake` when the table does not exist.
- Improve readability of docstring for the `if_exists` parameter in `pd.to_snowflake`, `DataFrame.to_snowflake`, and `Series.to_snowflake`.
- Improve error message for all pandas functions that use UDFs with Snowpark objects.
Bug Fixes
- Fixed a bug in `Series.rename_axis` where an `AttributeError` was being raised.
- Fixed a bug where `pd.get_dummies` didn't ignore NULL/NaN values by default.
- Fixed a bug where repeated calls to `pd.get_dummies` resulted in 'Duplicated column name error'.
- Fixed a bug in `pd.get_dummies` where passing a list of columns generated incorrect column labels in the output DataFrame.
- Update `pd.get_dummies` to return bool values instead of int.
1.28.0 (2025-02-20)
Snowpark Python API Updates
New Features
- Added support for the following functions in `functions.py`:
  - `normal`
  - `randn`
- Added support for `allow_missing_columns` parameter to `Dataframe.union_by_name` and `Dataframe.union_all_by_name`.
Improvements
- Improved the random object name generation to avoid collisions.
- Improved query generation for `Dataframe.distinct` to generate `SELECT DISTINCT` instead of `SELECT` with `GROUP BY` all columns. To disable this feature, set `session.conf.set("use_simplified_query_generation", False)`.
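
To illustrate why the two query shapes mentioned above are interchangeable, the sketch below runs both against an in-memory SQLite table. SQLite stands in for Snowflake here purely as an assumption for demonstration; the table `t` and its rows are made up, and the SQL that Snowpark actually emits is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, "y"), (2, "z"), (2, "z")],
)

# Shape 1: SELECT DISTINCT over all columns.
distinct_rows = sorted(conn.execute("SELECT DISTINCT a, b FROM t").fetchall())

# Shape 2: SELECT with GROUP BY on all columns.
group_by_rows = sorted(conn.execute("SELECT a, b FROM t GROUP BY a, b").fetchall())

conn.close()
print(distinct_rows == group_by_rows)
```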
Deprecations
- Deprecated Snowpark Python function `snowflake_cortex_summarize`. Users can install snowflake-ml-python and use the snowflake.cortex.summarize function instead.
- Deprecated Snowpark Python function `snowflake_cortex_sentiment`. Users can install snowflake-ml-python and use the snowflake.cortex.sentiment function instead.
Bug Fixes
- Fixed a bug where session-level query tag was overwritten by a stacktrace for dataframes that generate multiple queries. Now, the query tag will only be set to the stacktrace if `session.conf.set("collect_stacktrace_in_query_tag", True)`.
- Fixed a bug in `Session._write_pandas` where it was erroneously passing `use_logical_type` parameter to `Session._write_modin_pandas_helper` when writing a Snowpark pandas object.
- Fixed a bug in options sql generation that could cause multiple values to be formatted incorrectly.
- Fixed a bug in `Session.catalog` where empty strings for database or schema were not handled correctly and were generating erroneous sql statements.
Experimental Features
- Added support for writing pyarrow Tables to Snowflake tables.
Snowpark pandas API Updates
New Features
- Added support for applying Snowflake Cortex functions `Summarize` and `Sentiment`.
- Added support for list values in `Series.str.get`.
Bug Fixes
- Fixed a bug in `apply` where kwargs were not being correctly passed into the applied function.
Snowpark Local Testing Updates
New Features
- Added support for the following functions:
  - `hour`
  - `minute`
- Added support for NULL_IF parameter to csv reader.
- Added support for `date_format`, `datetime_format`, and `timestamp_format` options when loading csvs.
Bug Fixes
- Fixed a bug in Dataframe.join that caused columns to have incorrect typing.
- Fixed a bug in when statements that caused incorrect results in the otherwise clause.
1.27.0 (2025-02-03)
Snowpark Python API Updates
New Features
- Added support for the following functions in `functions.py`:
  - `array_reverse`
  - `divnull`
  - `map_cat`
  - `map_contains_key`
  - `map_keys`
  - `nullifzero`
  - `snowflake_cortex_sentiment`
  - `acosh`
  - `asinh`
  - `atanh`
  - `bit_length`
  - `bitmap_bit_position`
  - `bitmap_bucket_number`
  - `bitmap_construct_agg`
  - `cbrt`
  - `equal_null`
  - `from_json`
  - `ifnull`
  - `localtimestamp`
  - `max_by`
  - `min_by`
  - `nth_value`
  - `nvl`
  - `octet_length`
  - `position`
  - `regr_avgx`
  - `regr_avgy`
  - `regr_count`
  - `regr_intercept`
  - `regr_r2`
  - `regr_slope`
  - `regr_sxx`
  - `regr_sxy`
  - `regr_syy`
  - `try_to_binary`
  - `base64`
  - `base64_decode_string`
  - `base64_encode`
  - `editdistance`
  - `hex`
  - `hex_encode`
  - `instr`
  - `log1p`
  - `log2`
  - `log10`
  - `percentile_approx`
  - `unbase64`
- Added support for specifying a schema string (including implicit struct syntax) when calling `DataFrame.create_dataframe`.
- Added support for `DataFrameWriter.insert_into/insertInto`. This method also supports local testing mode.
- Added support for `DataFrame.create_temp_view` to create a temporary view. It will fail if the view already exists.
- Added support for multiple columns in the functions `map_cat` and `map_concat`.
- Added an option `keep_column_order` for keeping original column order in `DataFrame.with_column` and `DataFrame.with_columns`.
- Added options to column casts that allow renaming or adding fields in StructType columns.
- Added support for `contains_null` parameter to ArrayType.
- Added support for creating a temporary view via `DataFrame.create_or_replace_temp_view` from a DataFrame created by reading a file from a stage.
- Added support for `value_contains_null` parameter to MapType.
- Added `interactive` to telemetry that indicates whether the current environment is an interactive one.
- Allow `session.file.get` in a Native App to read file paths starting with `/` from the current version.
- Added support for multiple aggregation functions after `DataFrame.pivot`.
Experimental Features
- Added `Catalog` class to manage snowflake objects. It can be accessed via `Session.catalog`.
  - `snowflake.core` is a dependency required for this feature.
- Allow user input schema when reading JSON file on stage.
- Added support for specifying a schema string (including implicit struct syntax) when calling `DataFrame.create_dataframe`.
Improvements
- Updated README.md to include instructions on how to verify package signatures using `cosign`.
Bug Fixes
- Fixed a bug in local testing mode that caused a column to contain None when it should contain 0.
- Fixed a bug in `StructField.from_json` that prevented TimestampTypes with `tzinfo` from being parsed correctly.
- Fixed a bug in function `date_format` that caused an error when the input column was date type or timestamp type.
- Fixed a bug in dataframe where a null value could be inserted into a non-nullable column.
- Fixed a bug in `replace` and `lit` which raised a type hint assertion error when passing `Column` expression objects.
- Fixed a bug in `pandas_udf` and `pandas_udtf` where the `session` parameter was erroneously ignored.
- Fixed a bug that raised an incorrect type conversion error for system functions called through `session.call`.
Snowpark pandas API Updates
New Features
- Added support for `Series.str.ljust` and `Series.str.rjust`.
- Added support for `Series.str.center`.
- Added support for `Series.str.pad`.
- Added support for applying Snowpark Python function `snowflake_cortex_sentiment`.
- Added support for `DataFrame.map`.
- Added support for `DataFrame.from_dict` and `DataFrame.from_records`.
- Added support for mixed case field names in struct type columns.
- Added support for `SeriesGroupBy.unique`.
- Added support for `Series.dt.strftime` with the following directives:
  - %d: Day of the month as a zero-padded decimal number.
  - %m: Month as a zero-padded decimal number.
  - %Y: Year with century as a decimal number.
  - %H: Hour (24-hour clock) as a zero-padded decimal number.
  - %M: Minute as a zero-padded decimal number.
  - %S: Second as a zero-padded decimal number.
  - %f: Microsecond as a decimal number, zero-padded to 6 digits.
  - %j: Day of the year as a zero-padded decimal number.
  - %X: Locale’s appropriate time representation.
  - %%: A literal '%' character.
- Added support for `Series.between`.
- Added support for `include_groups=False` in `DataFrameGroupBy.apply`.
- Added support for `expand=True` in `Series.str.split`.
- Added support for `DataFrame.pop` and `Series.pop`.
- Added support for `first` and `last` in `DataFrameGroupBy.agg` and `SeriesGroupBy.agg`.
- Added support for `Index.drop_duplicates`.
- Added support for aggregations `"count"`, `"median"`, `np.median`, `"skew"`, `"std"`, `np.std`, `"var"`, and `np.var` in `pd.pivot_table()`, `DataFrame.pivot_table()`, and `pd.crosstab()`.
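
The `strftime` directives listed above for `Series.dt.strftime` follow the standard Python format codes. As a rough illustration, the sketch below applies several of them with the standard library's `datetime.strftime` rather than Snowpark pandas; the timestamp is an arbitrary example value, and `%X` is left out because its output depends on the locale.

```python
from datetime import datetime

# Arbitrary example timestamp: 3 February 2025, 14:05:09 and 123 microseconds.
ts = datetime(2025, 2, 3, 14, 5, 9, 123)

date_part = ts.strftime("%d/%m/%Y")        # day, month, year with century
time_part = ts.strftime("%H:%M:%S.%f")     # 24-hour time with microseconds
day_of_year = ts.strftime("%j")            # zero-padded day of the year
literal_percent = ts.strftime("100%%")     # %% yields a literal percent sign

print(date_part, time_part, day_of_year, literal_percent)
```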
Improvements
- Improve performance of `DataFrame.map`, `Series.apply` and `Series.map` methods by mapping numpy functions to snowpark functions if possible.
- Added documentation for `DataFrame.map`.
- Improve performance of `DataFrame.apply` by mapping numpy functions to snowpark functions if possible.
- Added documentation on the extent of Snowpark pandas interoperability with scikit-learn.
- Infer return type of functions in `Series.map`, `Series.apply` and `DataFrame.map` if type-hint is not provided.
- Added `call_count` to telemetry that counts method calls including interchange protocol calls.
1.26.0 (2024-12-05)
Snowpark Python API Updates
New Features
- Added support for property `version` and class method `get_active_session` for `Session` class.
- Added new methods and variables to enhance data type handling and JSON serialization/deserialization:
  - To `DataType`, its derived classes, and `StructField`:
    - `type_name`: Returns the type name of the data.
    - `simple_string`: Provides a simple string representation of the data.
    - `json_value`: Returns the data as a JSON-compatible value.
    - `json`: Converts the data to a JSON string.
  - To `ArrayType`, `MapType`, `StructField`, `PandasSeriesType`, `PandasDataFrameType` and `StructType`:
    - `from_json`: Enables these types to be created from JSON data.
  - To `MapType`:
    - `keyType`: keys of the map
    - `valueType`: values of the map
- Added support for method `appName` in `SessionBuilder`.
- Added support for `include_nulls` argument in `DataFrame.unpivot`.
- Added support for following functions in `functions.py`:
  - `size` to get size of array, object, or map columns.
  - `collect_list` an alias of `array_agg`.
  - `substring` makes `len` argument optional.
- Added parameter `ast_enabled` to session for internal usage (default: `False`).
Improvements
- Added support for specifying the following to `DataFrame.create_or_replace_dynamic_table`:
  - `iceberg_config` A dictionary that can hold the following iceberg configuration options:
    - `external_volume`
    - `catalog`
    - `base_location`
    - `catalog_sync`
    - `storage_serialization_policy`
- Added support for nested data types to `DataFrame.print_schema`.
- Added support for `level` parameter to `DataFrame.print_schema`.
- Improved flexibility of `DataFrameReader` and `DataFrameWriter` API by adding support for the following:
  - Added `format` method to `DataFrameReader` and `DataFrameWriter` to specify file format when loading or unloading results.
  - Added `load` method to `DataFrameReader` to work in conjunction with `format`.
  - Added `save` method to `DataFrameWriter` to work in conjunction with `format`.
  - Added support to read keyword arguments to `options` method for `DataFrameReader` and `DataFrameWriter`.
- Relaxed the cloudpickle dependency for Python 3.11 to simplify build requirements. However, for Python 3.11, `cloudpickle==2.2.1` remains the only supported version.
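
The `format`, `load`, and `save` methods described above are meant to be chained. The sketch below shows one plausible way to combine them; the helper name `copy_parquet_between_stages` and the stage paths are hypothetical, and the choice of Parquet is an assumption made so that no schema has to be supplied.

```python
from typing import Any


def copy_parquet_between_stages(session: Any, source: str, target: str) -> Any:
    """Read Parquet files from one stage location and unload them to another.

    `session` is expected to be a snowflake.snowpark.Session; it is typed as
    Any so this sketch can be loaded without Snowpark installed.
    """
    # Reader side: `format` selects the file format, `load` reads the path.
    df = session.read.format("parquet").load(source)
    # Writer side: `format` selects the file format, `save` unloads the result.
    df.write.format("parquet").save(target)
    return df
```

With a live session this might be invoked as `copy_parquet_between_stages(session, "@my_stage/in/", "@my_stage/out/")`, where both stage paths are placeholders.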
Bug Fixes
- Removed warnings that dynamic pivot features were in private preview, because dynamic pivot is now generally available.
- Fixed a bug in `session.read.options` where `False` Boolean values were incorrectly parsed as `True` in the generated file format.
Dependency Updates
- Added a runtime dependency on `python-dateutil`.
Snowpark pandas API Updates
New Features
- Added partial support for `Series.map` when `arg` is a pandas `Series` or a `collections.abc.Mapping`. No support for instances of `dict` that implement `__missing__` but are not instances of `collections.defaultdict`.
- Added support for `DataFrame.align` and `Series.align` for `axis=1` and `axis=None`.
- Added support for `pd.json_normalize`.
- Added support for `GroupBy.pct_change` with `axis=0`, `freq=None`, and `limit=None`.
- Added support for `DataFrameGroupBy.__iter__` and `SeriesGroupBy.__iter__`.
- Added support for `np.sqrt`, `np.trunc`, `np.floor`, numpy trig functions, `np.exp`, `np.abs`, `np.positive` and `np.negative`.
- Added partial support for the dataframe interchange protocol method `DataFrame.__dataframe__()`.
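
The `Series.map` restriction above distinguishes plain mappings and `collections.defaultdict` from other `dict` subclasses that define `__missing__`. The sketch below shows what that distinction looks like in plain Python. `FallbackDict` and `is_unsupported_series_map_arg` are hypothetical names used for illustration only and are not how Snowpark pandas performs the check.

```python
from collections import defaultdict
from collections.abc import Mapping
from typing import Any


class FallbackDict(dict):
    """A dict subclass that computes a value for absent keys via __missing__."""

    def __missing__(self, key: Any) -> str:
        return f"unknown:{key}"


def is_unsupported_series_map_arg(arg: Any) -> bool:
    """True for dict instances that define __missing__ but are not defaultdicts."""
    return (
        isinstance(arg, dict)
        and hasattr(type(arg), "__missing__")
        and not isinstance(arg, defaultdict)
    )


print(is_unsupported_series_map_arg({"a": 1}))            # plain dict
print(is_unsupported_series_map_arg(defaultdict(int)))     # defaultdict
print(is_unsupported_series_map_arg(FallbackDict(a=1)))    # custom __missing__
```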
Bug Fixes
- Fixed a bug in `df.loc` where setting a single column from a series results in unexpected `None` values.
Improvements
- Use UNPIVOT INCLUDE NULLS for unpivot operations in pandas instead of sentinel values.
- Improved documentation for pd.read_excel.
1.25.0 (2024-11-13)
Snowpark Python API Updates
New Features
- Added the following new functions in `snowflake.snowpark.dataframe`:
  - `map`
- Added support for passing parameter `include_error` to `Session.query_history` to record queries that raise an error during execution.
Improvements
- When target stage is not set in profiler, a default stage from `Session.get_session_stage` is used instead of raising `SnowparkSQLException`.
- Allowed lower case or mixed case input when calling `Session.stored_procedure_profiler.set_active_profiler`.
- Added distributed tracing using open telemetry APIs for action function in `DataFrame`:
  - `cache_result`
- Removed opentelemetry warning from logging.
Bug Fixes
- Fixed the pre-action and post-action query propagation when `In` expressions were used in selects.
- Fixed a bug that raised an `AttributeError` while calling `Session.stored_procedure_profiler.get_output` when `Session.stored_procedure_profiler` is disabled.
Dependency Updates
- Added a dependency on `protobuf>=5.28` and `tzlocal` at runtime.
- Added a dependency on `protoc-wheel-0` for the development profile.
- Require `snowflake-connector-python>=3.12.0, <4.0.0` (was `>=3.10.0`).
Snowpark pandas API Updates
Dependency Updates
- Updated `modin` from 0.28.1 to 0.30.1.
- Added support for all `pandas` 2.2.x versions.
New Features
- Added support for `Index.to_numpy`.
- Added support for `DataFrame.align` and `Series.align` for `axis=0`.
- Added support for `size` in `GroupBy.aggregate`, `DataFrame.aggregate`, and `Series.aggregate`.
- Added support for `snowflake.snowpark.functions.window`.
- Added support for `pd.read_pickle` (Uses native pandas for processing).
- Added support for `pd.read_html` (Uses native pandas for processing).
- Added support for `pd.read_xml` (Uses native pandas for processing).
- Added support for aggregation functions `"size"` and `len` in `GroupBy.aggregate`, `DataFrame.aggregate`, and `Series.aggregate`.
- Added support for list values in `Series.str.len`.
Bug Fixes
- Fixed a bug where aggregating a single-column dataframe with a single callable function (e.g. `pd.DataFrame([0]).agg(np.mean)`) would fail to transpose the result.
- Fixed bugs where `DataFrame.dropna()` would:
  - Treat an empty `subset` (e.g. `[]`) as if it specified all columns instead of no columns.
  - Raise a `TypeError` for a scalar `subset` instead of filtering on just that column.
  - Raise a `ValueError` for a `subset` of type `pandas.Index` instead of filtering on the columns in the index.
- Disable creation of scoped read only table to mitigate `TableNotFoundError` when using dynamic pivot in notebook environment.
- Fixed a bug in concat when the dataframe or series objects come from the same dataframe and axis = 1.
Improvements
- Improve np.where with scalar x value by eliminating unnecessary join and temp table creation.
- Improve get_dummies performance by flattening the pivot with join.
Snowpark Local Testing Updates
New Features
- Added support for patching functions that are unavailable in the `snowflake.snowpark.functions` module.
- Added support for `snowflake.snowpark.functions.any_value`.
Bug Fixes
- Fixed a bug where `Table.update` could not handle `VariantType`, `MapType`, and `ArrayType` data types.
- Fixed a bug where column aliases were incorrectly resolved in `DataFrame.join`, causing errors when selecting columns from a joined DataFrame.
- Fixed a bug where `Table.update` and `Table.merge` could fail if the target table's index was not the default `RangeIndex`.
1.24.0 (2024-10-28)
Snowpark Python API Updates
New Features
- Updated `Session` class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the same `Session` object.
  - The feature is disabled by default and can be enabled by setting `FEATURE_THREAD_SAFE_PYTHON_SESSION` to `True` for account.
  - Updating session configurations, like changing database or schema, when multiple threads are using the session may lead to unexpected behavior.
  - When enabled, some internally created temporary table names returned from `DataFrame.queries` API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.
- Added support for 'Service' domain to `session.lineage.trace` API.
- Added support for `copy_grants` parameter when registering UDxF and stored procedures.
- Added support for the following methods in `DataFrameWriter` to support daisy-chaining:
  - `option`
  - `options`
  - `partition_by`
- Added support for `snowflake_cortex_summarize`.
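
As a sketch of what sharing one thread-safe `Session` across threads might look like, the helper below runs one DataFrame action per table from a thread pool. The helper name `count_tables_concurrently`, the table names, and the worker count are hypothetical, and the example assumes the thread-safe session feature has been enabled for the account as described above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List


def count_tables_concurrently(session: Any, table_names: List[str]) -> List[int]:
    """Run one row-count action per table, sharing a single session.

    `session` is expected to be a snowflake.snowpark.Session; it is typed as
    Any so this sketch can be loaded without Snowpark installed.
    """

    def count_rows(name: str) -> int:
        # Each worker thread issues its own DataFrame action on the shared session.
        return session.table(name).count()

    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves the order of table_names in the results.
        return list(pool.map(count_rows, table_names))
```

Per the note above, changing the session's database or schema while these threads are running may lead to unexpected behavior, so such configuration is best done before the pool starts.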
Improvements
- Improved function `snowflake.snowpark.functions.array_remove`; it is now possible to use it in Python.
- Disabled SQL simplification when sort is performed after limit.
  - Previously, `df.sort().limit()` and `df.limit().sort()` generated the same query with sort in front of limit. Now, `df.limit().sort()` will generate a query that reads `df.limit().sort()`.
  - Improved performance of the generated query for `df.limit().sort()`, because limit stops table scanning as soon as the number of records is satisfied.
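
The two orderings discussed above are not equivalent in general, which is why they should not be collapsed into one query. A plain-Python analogy on a list (not Snowpark code; the values are arbitrary) makes the difference visible:

```python
rows = [5, 1, 4, 2, 3]
n = 2

# Analogue of df.sort().limit(n): sort everything, then keep the first n.
sort_then_limit = sorted(rows)[:n]

# Analogue of df.limit(n).sort(): keep the first n as stored, then sort those.
limit_then_sort = sorted(rows[:n])

print(sort_then_limit)   # [1, 2]
print(limit_then_sort)   # [1, 5]
```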
Bug Fixes
- Fixed a bug where the automatic cleanup of temporary tables could interfere with the results of async query execution.
- Fixed a bug in `DataFrame.analytics.time_series_agg` function to handle multiple data points in same sliding interval.
- Fixed a bug that created inconsistent casing in field names of structured objects in iceberg schemas.
Deprecations
- Deprecation warnings will be triggered when using snowpark-python with Python 3.8. For more details, please refer to https://docs.snowflake.com/en/developer-guide/python-runtime-support-policy.
Snowpark pandas API Updates
New Features
- Added support for `np.subtract`, `np.multiply`, `np.divide`, and `np.true_divide`.
- Added support for tracking usages of `__array_ufunc__`.
- Added numpy compatibility support for `np.float_power`, `np.mod`, `np.remainder`, `np.greater`, `np.greater_equal`, `np.less`, `np.less_equal`, `np.not_equal`, and `np.equal`.
- Added numpy compatibility support for `np.log`, `np.log2`, and `np.log10`.
- Added support for `DataFrameGroupBy.bfill`, `SeriesGroupBy.bfill`, `DataFrameGroupBy.ffill`, and `SeriesGroupBy.ffill`.
- Added support for `on` parameter with `Resampler`.
- Added support for timedelta inputs in `value_counts()`.
- Added support for applying Snowpark Python function `snowflake_cortex_summarize`.
- Added support for `DataFrame.attrs` and `Series.attrs`.
- Added support for `DataFrame.style`.
Improvements
- Improved generated SQL query for `head` and `iloc` when the row key is a slice.
- Improved error message when passing an unknown timezone to `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex`.
- Improved documentation for `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex` to specify the supported timezone formats.
- Added additional kwargs support for `df.apply` and `series.apply` (as well as `map` and `applymap`) when using snowpark functions. This allows for some position independent compatibility between apply and functions where the first argument is not a pandas object.
- Improved generated SQL query for `iloc` and `iat` when the row key is a scalar.
- Removed all joins in `iterrows`.
- Improved documentation for `Series.map` to reflect the unsupported features.
- Added support for `np.may_share_memory` which is used internally by many scikit-learn functions. This method will always return false when called with a Snowpark pandas object.
Bug Fixes
- Fixed a bug where `DataFrame` and `Series` `pct_change()` would raise `TypeError` when input contained timedelta columns.
- Fixed a bug where `replace()` would sometimes propagate `Timedelta` types incorrectly through `replace()`. Instead raise `NotImplementedError` for `replace()` on `Timedelta`.
- Fixed a bug where `DataFrame` and `Series` `round()` would raise `AssertionError` for `Timedelta` columns. Instead raise `NotImplementedError` for `round()` on `Timedelta`.
- Fixed a bug where `reindex` fails when the new index is a Series with non-overlapping types from the original index.
- Fixed a bug where calling `__getitem__` on a DataFrameGroupBy object always returned a DataFrameGroupBy object if `as_index=False`.
- Fixed a bug where inserting timedelta values into an existing column would silently convert the values to integers instead of raising `NotImplementedError`.
- Fixed a bug where `DataFrame.shift()` on axis=0 and axis=1 would fail to propagate timedelta types.
- `DataFrame.abs()`, `DataFrame.__neg__()`, `DataFrame.stack()`, and `DataFrame.unstack()` now raise `NotImplementedError` for timedelta inputs instead of failing to propagate timedelta types.
Snowpark Local Testing Updates
Bug Fixes
- Fixed a bug where `DataFrame.alias` raises `KeyError` for input column name.
- Fixed a bug where `to_csv` on Snowflake stage fails when data contains empty strings.
1.23.0 (2024-10-09)
Snowpark Python API Updates
New Features
- Added the following new functions in `snowflake.snowpark.functions`:
  - `make_interval`
- Added support for using Snowflake Interval constants with `Window.range_between()` when the order by column is TIMESTAMP or DATE type.
- Added support for file writes. This feature is currently in private preview.
- Added `thread_id` to `QueryRecord` to track the thread id submitting the query history.
- Added support for `Session.stored_procedure_profiler`.
Improvements
Bug Fixes
- Fixed a bug where registering a stored procedure or UDxF with type hints would give a warning `'NoneType' has no len() when trying to read default values from function`.
Snowpark pandas API Updates
New Features
- Added support for `TimedeltaIndex.mean` method.
- Added support for some cases of aggregating `Timedelta` columns on `axis=0` with `agg` or `aggregate`.
- Added support for `by`, `left_by`, `right_by`, `left_index`, and `right_index` for `pd.merge_asof`.
- Added support for passing parameter `include_describe` to `Session.query_history`.
- Added support for `DatetimeIndex.mean` and `DatetimeIndex.std` methods.
- Added support for `Resampler.asfreq`, `Resampler.indices`, `Resampler.nunique`, and `Resampler.quantile`.
- Added support for `resample` frequency `W`, `ME`, `YE` with `closed = "left"`.
- Added support for `DataFrame.rolling.corr` and `Series.rolling.corr` for `pairwise = False` and int `window`.
- Added support for string time-based `window` and `min_periods = None` for `Rolling`.
- Added support for `DataFrameGroupBy.fillna` and `SeriesGroupBy.fillna`.
- Added support for constructing `Series` and `DataFrame` objects with the lazy `Index` object as `data`, `index`, and `columns` arguments.
- Added support for constructing `Series` and `DataFrame` objects with `index` and `column` values not present in `DataFrame`/`Series` `data`.
- Added support for `pd.read_sas` (Uses native pandas for processing).
- Added support for applying `rolling().count()` and `expanding().count()` to `Timedelta` series and columns.
- Added support for `tz` in both `pd.date_range` and `pd.bdate_range`.
- Added support for `Series.items`.
- Added support for `errors="ignore"` in `pd.to_datetime`.
- Added support for `DataFrame.tz_localize` and `Series.tz_localize`.
- Added support for `DataFrame.tz_convert` and `Series.tz_convert`.
- Added support for applying Snowpark Python functions (e.g., `sin`) in `Series.map`, `Series.apply`, `DataFrame.apply` and `DataFrame.applymap`.
Improvements
- Improved `to_pandas` to persist the original timezone offset for TIMESTAMP_TZ type.
- Improved `dtype` results for TIMESTAMP_TZ type to show correct timezone offset.
- Improved `dtype` results for TIMESTAMP_LTZ type to show correct timezone.
- Improved error message when passing non-bool value to `numeric_only` for groupby aggregations.
- Removed unnecessary warning about sort algorithm in `sort_values`.
- Use SCOPED object for internal create temp tables. The SCOPED objects will be stored sproc scoped if created within stored sproc, otherwise will be session scoped, and the object will be automatically cleaned at the end of the scope.
- Improved warning messages for operations that lead to materialization with inadvertent slowness.
- Removed unnecessary warning message about `convert_dtype` in `Series.apply`.
Bug Fixes
- Fixed a bug where an `Index` object created from a `Series`/`DataFrame` incorrectly updates the `Series`/`DataFrame`'s index name after an inplace update has been applied to the original `Series`/`DataFrame`.
- Suppressed an unhelpful `SettingWithCopyWarning` that sometimes appeared when printing `Timedelta` columns.
- Fixed `inplace` argument for `Series` objects derived from other `Series` objects.
- Fixed a bug where `Series.sort_values` failed if series name overlapped with index column name.
- Fixed a bug where transposing a dataframe would map `Timedelta` index levels to integer column levels.
- Fixed a bug where `Resampler` methods on timedelta columns would produce integer results.
- Fixed a bug where `pd.to_numeric()` would leave `Timedelta` inputs as `Timedelta` instead of converting them to integers.
- Fixed `loc` set when setting a single row, or multiple rows, of a DataFrame with a Series value.
1.22.1 (2024-09-11)
This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.
1.22.0 (2024-09-10)
Snowpark Python API Updates
New Features
- Added the following new functions in `snowflake.snowpark.functions`:
  - `array_remove`
  - `ln`
Improvements
- Improved documentation for `Session.write_pandas` by making `use_logical_type` option more explicit.
- Added support for specifying the following to `DataFrameWriter.save_as_table`:
  - `enable_schema_evolution`
  - `data_retention_time`
  - `max_data_extension_time`
  - `change_tracking`
  - `copy_grants`
  - `iceberg_config` A dictionary that can hold the following iceberg configuration options:
    - `external_volume`
    - `catalog`
    - `base_location`
    - `catalog_sync`
    - `storage_serialization_policy`
- Added support for specifying the following to `DataFrameWriter.copy_into_table`:
  - `iceberg_config` A dictionary that can hold the following iceberg configuration options:
    - `external_volume`
    - `catalog`
    - `base_location`
    - `catalog_sync`
    - `storage_serialization_policy`
- Added support for specifying the following parameters to `DataFrame.create_or_replace_dynamic_table`:
  - `mode`
  - `refresh_mode`
  - `initialize`
  - `clustering_keys`
  - `is_transient`
  - `data_retention_time`
  - `max_data_extension_time`
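
A rough sketch of passing an `iceberg_config` dictionary to `DataFrameWriter.save_as_table`, using only option names from the list above. The helper name `save_as_iceberg_table` and every value in the dictionary (volume name, catalog, location, policy) are placeholder assumptions and would need to match objects that exist in your account.

```python
from typing import Any, Dict

# Keys come from the release notes; all values are illustrative placeholders.
iceberg_config: Dict[str, str] = {
    "external_volume": "my_external_volume",
    "catalog": "SNOWFLAKE",
    "base_location": "my/base/location",
    "storage_serialization_policy": "OPTIMIZED",
}


def save_as_iceberg_table(df: Any, table_name: str, config: Dict[str, str]) -> None:
    """Write a Snowpark DataFrame to a table with the given iceberg options.

    `df` is expected to be a snowflake.snowpark.DataFrame; it is typed as Any
    so this sketch can be loaded without Snowpark installed.
    """
    df.write.save_as_table(table_name, iceberg_config=config)
```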
Bug Fixes
- Fixed a bug in `session.read.csv` that caused an error when setting `PARSE_HEADER = True` in an externally defined file format.
- Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
- Fixed a bug in `session.get_session_stage` that referenced a non-existing stage after switching database or schema.
- Fixed a bug where calling `DataFrame.to_snowpark_pandas` without explicitly initializing the Snowpark pandas plugin caused an error.
- Fixed a bug where using the `explode` function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the `outer` parameter.
Snowpark Local Testing Updates
New Features
- Added support for type coercion when passing columns as input to UDF calls.
- Added support for `Index.identical`.
Bug Fixes
- Fixed a bug where the truncate mode in `DataFrameWriter.save_as_table` incorrectly handled DataFrames containing only a subset of columns from the existing table.
- Fixed a bug where function `to_timestamp` does not set the default timezone of the column datatype.
Snowpark pandas API Updates
New Features
- Added limited support for the `Timedelta` type, including the following features. Snowpark pandas will raise `NotImplementedError` for unsupported `Timedelta` use cases.
  - supporting tracking the Timedelta type through `copy`, `cache_result`, `shift`, `sort_index`, `assign`, `bfill`, `ffill`, `fillna`, `compare`, `diff`, `drop`, `dropna`, `duplicated`, `empty`, `equals`, `insert`, `isin`, `isna`, `items`, `iterrows`, `join`, `len`, `mask`, `melt`, `merge`, `nlargest`, `nsmallest`, `to_pandas`.
  - converting non-timedelta to timedelta via `astype`.
  - `NotImplementedError` will be raised for the rest of methods that do not support `Timedelta`.
  - support for subtracting two timestamps to get a Timedelta.
  - support indexing with Timedelta data columns.
  - support for adding or subtracting timestamps and `Timedelta`.
  - support for binary arithmetic between two `Timedelta` values.
  - support for binary arithmetic and comparisons between `Timedelta` values and numeric values.
  - support for lazy `TimedeltaIndex`.
  - support for `pd.to_timedelta`.
  - support for `GroupBy` aggregations `min`, `max`, `mean`, `idxmax`, `idxmin`, `std`, `sum`, `median`, `count`, `any`, `all`, `size`, `nunique`, `head`, `tail`, `aggregate`.
  - support for `GroupBy` filtrations `first` and `last`.
  - support for `TimedeltaIndex` attributes: `days`, `seconds`, `microseconds` and `nanoseconds`.
  - support for `diff` with timestamp columns on `axis=0` and `axis=1`.
  - support for `TimedeltaIndex` methods: `ceil`, `floor` and `round`.
  - support for `TimedeltaIndex.total_seconds` method.
- Added support for index's arithmetic and comparison operators.
- Added support for `Series.dt.round`.
- Added documentation pages for `DatetimeIndex`.
- Added support for `Index.name`, `Index.names`, `Index.rename`, and `Index.set_names`.
- Added support for `Index.__repr__`.
- Added support for `DatetimeIndex.month_name` and `DatetimeIndex.day_name`.
- Added support for `Series.dt.weekday`, `Series.dt.time`, and `DatetimeIndex.time`.
- Added support for `Index.min` and `Index.max`.
- Added support for `pd.merge_asof`.
- Added support for `Series.dt.normalize` and `DatetimeIndex.normalize`.
- Added support for `Index.is_boolean`, `Index.is_integer`, `Index.is_floating`, `Index.is_numeric`, and `Index.is_object`.
- Added support for `DatetimeIndex.round`, `DatetimeIndex.floor` and `DatetimeIndex.ceil`.
- Added support for `Series.dt.days_in_month` and `Series.dt.daysinmonth`.
- Added support for `DataFrameGroupBy.value_counts` and `SeriesGroupBy.value_counts`.
- Added support for `Series.is_monotonic_increasing` and `Series.is_monotonic_decreasing`.
- Added support for `Index.is_monotonic_increasing` and `Index.is_monotonic_decreasing`.
- Added support for `pd.crosstab`.
- Added support for `pd.bdate_range` and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both `pd.date_range` and `pd.bdate_range`.
- Added support for lazy `Index` objects as `labels` in `DataFrame.reindex` and `Series.reindex`.
- Added support for `Series.dt.days`, `Series.dt.seconds`, `Series.dt.microseconds`, and `Series.dt.nanoseconds`.
- Added support for creating a `DatetimeIndex` from an `Index` of numeric or string type.
- Added support for string indexing with `Timedelta` objects.
- Added support for `Series.dt.total_seconds` method.
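
Several of the `Timedelta` features above (subtracting two timestamps, the `days`/`seconds`/`microseconds` attributes, `total_seconds`, adding a timedelta to a timestamp) have direct counterparts in Python's standard `datetime` module, which `pandas.Timedelta` builds on. The sketch below uses only the standard library as an analogy, with arbitrary example timestamps; it does not exercise Snowpark pandas itself.

```python
from datetime import datetime, timedelta

start = datetime(2024, 9, 10, 8, 0, 0)
end = datetime(2024, 9, 11, 9, 30, 15, 250)

# Subtracting two timestamps yields a timedelta.
delta = end - start

# Component attributes: whole days, leftover seconds, leftover microseconds.
components = (delta.days, delta.seconds, delta.microseconds)

# Total duration expressed in seconds as a float.
total = delta.total_seconds()

# Adding a timedelta to a timestamp yields another timestamp.
shifted = start + timedelta(hours=1)

print(components, total, shifted)
```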
Improvements
- Improve concat, join performance when operations are performed on series coming from the same dataframe by avoiding unnecessary joins.
- Refactored `quoted_identifier_to_snowflake_type` to avoid making metadata queries if the types have been cached locally.
- Improved `pd.to_datetime` to handle all local input cases.
- Create a lazy index from another lazy index without pulling data to client.
- Raised `NotImplementedError` for Index bitwise operators.
- Display a clearer error message when `Index.names` is set to a non-list-like object.
- Raise a warning whenever MultiIndex values are pulled in locally.
- Improve warning message for `pd.read_snowflake` to include the creation reason when temp table creation is triggered.
- Improve performance for `DataFrame.set_index`, or setting `DataFrame.index` or `Series.index`, by avoiding checks that require eager evaluation. As a consequence, when the new index does not match the current `Series`/`DataFrame` object length, a `ValueError` is no longer raised. Instead, when the `Series`/`DataFrame` object is longer than the provided index, the `Series`/`DataFrame`'s new index is filled with `NaN` values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
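
The length-mismatch behavior described in the last item can be modeled in a few lines of plain Python. This is only a toy model of the documented outcome, not how Snowpark pandas computes it; `fit_index_to_length` is a hypothetical helper name.

```python
import math
from typing import Any, List, Sequence


def fit_index_to_length(new_index: Sequence[Any], n_rows: int) -> List[Any]:
    """Model the index that results from assigning `new_index` to n_rows rows.

    Extra index values beyond n_rows are ignored; missing positions are
    filled with NaN.
    """
    fitted = list(new_index[:n_rows])
    fitted.extend([math.nan] * (n_rows - len(fitted)))
    return fitted


print(fit_index_to_length(["a", "b"], 3))        # shorter index: padded with NaN
print(fit_index_to_length(["a", "b", "c"], 2))   # longer index: extras ignored
```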
Bug Fixes
- Stopped ignoring nanoseconds in `pd.Timedelta` scalars.
- Fixed AssertionError in tree of binary operations.
- Fixed bug in `Series.dt.isocalendar` using a named Series.
- Fixed `inplace` argument for Series objects derived from DataFrame columns.
- Fixed a bug where `Series.reindex` and `DataFrame.reindex` did not update the result index's name correctly.
- Fixed a bug where `Series.take` did not error when `axis=1` was specified.
1.21.1 (2024-09-05)
Snowpark Python API Updates
Bug Fixes
- Fixed a bug where using `to_pandas_batches` with async jobs caused an error due to improper handling of waiting for asynchronous query completion.