From 88e172d14d1a3ea8a6ef59ef3f327cd515d17c29 Mon Sep 17 00:00:00 2001 From: Bryce Mecum Date: Thu, 23 Jan 2025 16:29:56 -0800 Subject: [PATCH] Website: Add blog post for 19.0.0 (#580) Co-authored-by: Antoine Pitrou Co-authored-by: Sutou Kouhei Co-authored-by: Adam Reeve Co-authored-by: David Li Co-authored-by: Sutou Kouhei Co-authored-by: Matt Topol Co-authored-by: Rossi Sun Co-authored-by: Alenka Frim --- _posts/2025-01-16-19.0.0-release.md | 239 ++++++++++++++++++++++++++++ 1 file changed, 239 insertions(+) create mode 100644 _posts/2025-01-16-19.0.0-release.md diff --git a/_posts/2025-01-16-19.0.0-release.md b/_posts/2025-01-16-19.0.0-release.md new file mode 100644 index 00000000000..c6eee12ae8f --- /dev/null +++ b/_posts/2025-01-16-19.0.0-release.md @@ -0,0 +1,239 @@ +--- +layout: post +title: "Apache Arrow 19.0.0 Release" +date: "2025-01-16 00:00:00" +author: pmc +categories: [release] +--- + + +The Apache Arrow team is pleased to announce the 19.0.0 release. This release +covers over 2 months of development work and includes [**202 resolved +issues**][1] on [**330 distinct commits**][2] from [**67 distinct +contributors**][2]. See the [Install Page](https://arrow.apache.org/install/) to +learn how to get the libraries for your platform. + +The release notes below are not exhaustive and only expose selected highlights +of the release. Many other bugfixes and improvements have been made: we refer +you to the [complete changelog][3]. + +## Community + +Since the 18.1.0 release, Adam Reeve and Laurent Goujon have been invited to +become committers. Gang Wu has been invited to join the Project Management +Committee (PMC). + +Thanks for your contributions and participation in the project! + +## Release Highlights + +A [bug](https://github.com/apache/arrow/issues/45283) has been identified in the +19.0.0 versions of the C++ and Python libraries which prevents reading Parquet +files written by Arrow Rust v53.0.0 or higher. The files written by Arrow Rust +are correct and the bug was in the patch adding support for Parquet's +[SizeStatistics](https://github.com/apache/parquet-format/pull/197) feature to +Arrow C++ and Python. See [#45293](https://github.com/apache/arrow/issues/45283) +for more details including a potential workaround. + +As a result, we plan to create a 19.0.1 release to include a fix for this which +should be available in next few weeks. + +## Columnar Format + +We've added a new experimental specification for representing statistics on +Arrow Arrays as Arrow Arrays. This is useful for preserving and exchanging +statistics between systems such as when converting Parquet data to Arrow. See +[the statistics schema +documentation](https://arrow.apache.org/docs/format/StatisticsSchema.html) for +details. + +We've expanded the Arrow C Device Data Interface to include an experimental +Async Device Stream Interface. While the existing Arrow C Device Data Interface +is a pull-oriented API, the Async interface provides a push-oriented design for +other workflows. See the +[documentation](https://arrow.apache.org/docs/format/CDeviceDataInterface.html#async-device-stream-interface) +for more information. It currently has implementations in the C++ and Go +libraries. + +## Arrow Flight RPC Notes + +The precision of a Timestamp (used for timeouts) is now nanoseconds on all +platforms; previously it was platform-dependent. This may be a breaking change +depending on your use case. +([#44679](https://github.com/apache/arrow/issues/44679)) + +The Python bindings now support various new fields that were added to +FlightEndpoint/FlightInfo (like `expiration_time`). +([#36954](https://github.com/apache/arrow/issues/36954)) + +## C++ Notes + +### Compute + +- It is now possible to cast from a struct type to another struct type with +additional columns, provided the additional columns are nullable +([#44555)](https://github.com/apache/arrow/issues/44555). +- The compute function `expm1` has been added to compute `exp(x) - 1` with better +accuracy when the input value is close to 0 +([#44903](https://github.com/apache/arrow/issues/44903)). +- Hyperbolic trigonometric functions and their reciprocals have also been added. +([#44952](https://github.com/apache/arrow/issues/44952)). +- The new Decimal32 and Decimal64 types have been further supported by allowing +casting between numeric, string, and other decimal types +([#43956](https://github.com/apache/arrow/issues/43956)). + +### Acero + +- Added AVX2 support for decoding row tables in the Swiss join specialization of +hash joins, enabling up to 40% performance improvement for build-heavy +workloads. ([#43693](https://github.com/apache/arrow/issues/43693)) + +### Filesystems + +- The S3 filesystem has gained support for server-side encryption with customer +provided keys, aka SSE-C. +([#43535](https://github.com/apache/arrow/issues/43535)) +- The S3 filesystem also gained an option to disable the SIGPIPE signals that +may be emitted on some network events. +([#44695](https://github.com/apache/arrow/issues/44695)) +- The Azure filesystem now supports SAS token authentication. +([#44308](https://github.com/apache/arrow/issues/44308)). + +### Flight RPC + +- The precision of a Timestamp (used for timeouts) is now nanoseconds on all + platforms; previously it was platform-dependent. This may be a breaking change +depending on your use case. + ([#44679](https://github.com/apache/arrow/issues/44679)) +- The Python bindings now support various new fields that were added to + FlightEndpoint/FlightInfo (like `expiration_time`). + ([#36954](https://github.com/apache/arrow/issues/36954)) +- The UCX backend has been deprecated and is scheduled for removal. + ([#45079](https://github.com/apache/arrow/issues/45079)) + +### Parquet + +- The initial footer read size can now be configured to reduce the number of +potential round-trips on hi-latency filesystems such as S3. +([#45015](https://github.com/apache/arrow/issues/45015)) +- The new `SizeStatistics` format feature has been implemented, though it is +disabled by default when writing. +([#40592](https://github.com/apache/arrow/issues/40592)) +- We've added a new method to the ParquetFileReader class, +[GetReadRanges](https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet17ParquetFileReader13GetReadRangesERKNSt6vectorIiEERKNSt6vectorIiEE7int64_t7int64_t), +which can calculate the byte ranges necessary to read a given set of columns and +row groups. This may be useful to pre-buffer file data via caching mechanisms. +([#45092](https://github.com/apache/arrow/issues/45092)) +- We've added `arrow::Result`-returning variants for +`parquet::arrow::OpenFile()` and +`parquet::arrow::FileReader::GetRecordBatchReader()`. +([#44784](https://github.com/apache/arrow/issues/44784), +[#44808](https://github.com/apache/arrow/issues/44808)) + +## C# Notes + +- The `PrimitiveArrayBuilder` constructor has been made public to allow writing + custom builders. ([#23995](https://github.com/apache/arrow/issues/23995)) +- Improved the performance of looking up schema fields by name. + ([#44575](https://github.com/apache/arrow/issues/44575)) + +## Java, Go, and Rust Notes + +The Java, Go, and Rust Go projects have moved to separate repositories outside +the main Arrow [monorepo](https://github.com/apache/arrow). + +- For notes on the latest release of the [Java +implementation](https://github.com/apache/arrow-java), see the latest [Arrow +Java changelog][7]. +- For notes on the latest release of the [Rust + implementation](https://github.com/apache/arrow-rs) see the latest [Arrow Rust + changelog][5]. +- For notes on the latest release of the [Go +implementation](https://github.com/apache/arrow-go), see the latest [Arrow Go +changelog][6]. + +## Linux Packaging Notes + +- Debian: Fixed keyring format to support newer libapt (e.g., used by + Trixie). ([#45118](https://github.com/apache/arrow/issues/45118)) + +## Python Notes + +New features: + +- The upcoming pandas 3.0 [string + dtype](https://pandas.pydata.org/pdeps/0014-string-dtype.html) is now + supported by PyArrow's `to_pandas` routine. In the future, when using pandas >=3.0, + the new pandas behavior will be enabled by default. You can opt into + the new behavior under pandas >=2.3 by setting `pd.options.future.infer_string + = True`. This may be considered a breaking change. + ([#43683](https://github.com/apache/arrow/issues/43683)) +- Support for 32-bit and 64-bit decimal types was added. + ([#44713](https://github.com/apache/arrow/issues/44713)) +- Arrow PyCapsule stream objects are supported in `write_dataset`. + ([#43410](https://github.com/apache/arrow/issues/43410)) +- New Flight features have been exposed. + ([#36954](https://github.com/apache/arrow/issues/36954)) +- Bindings for `JsonExtensionType` and `JsonArray` were added. + ([#44066](https://github.com/apache/arrow/issues/44066)) +- Hyperbolic trigonometry functions added to the Arrow C++ compute kernels are + also available in PyArrow. + ([#44952](https://github.com/apache/arrow/issues/44952)) + +Other improvements: + +- `strings_to_categorical` keyword in `to_pandas` can now be used for string + view type. ([#45175](https://github.com/apache/arrow/issues/45175)) +- `from_buffers` is updated to work with `StringView`. + ([#44651](https://github.com/apache/arrow/issues/44651)) +- Version suffixes are also set for Arrow Python C++ (`libarrow_python*`) + libraries. ([#44614](https://github.com/apache/arrow/issues/44614)) + +## Ruby and C GLib Notes + +### Ruby + +- Added basic support for JRuby with an implementation based on Arrow Java + ([#44346](https://github.com/apache/arrow/pull/44346)). The plan is to release + this as a gem once it covers a base set of features. See + [#45324](https://github.com/apache/arrow/issues/45324) for more information. +- Added support for 32bit and 64bit decimal, binary view, and string view. See + [issues + listed](https://github.com/apache/arrow/issues?q=is%3Aclosed%20milestone%3A19.0.0%20label%3A%22Component%3A%20GLib%22) + in the 19.0.0 milestone for more details. +- Fixed a bug that empty struct list can't be built. + ([#44742](https://github.com/apache/arrow/issues/44742)) +- Fixed a bug that `record_batch[:column].size` raises an exception. + ([#45119](https://github.com/apache/arrow/issues/45119)) + +### C GLib + +- Added support for 32bit and 64bit decimal, binary view, and string view. See + [issues listed in the 19.0.0 + milestone](https://github.com/apache/arrow/issues?q=is%3Aclosed%20milestone%3A19.0.0%20label%3A%22Component%3A%20GLib%22) + for more details. + +[1]: https://github.com/apache/arrow/milestone/66?closed=1 +[2]: {{ site.baseurl }}/release/19.0.0.html#contributors +[3]: {{ site.baseurl }}/release/19.0.0.html#changelog +[4]: {{ site.baseurl }}/docs/r/news/ +[5]: https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md +[6]: https://github.com/apache/arrow-go/releases +[7]: https://github.com/apache/arrow-java/releases