[DRAFT] 32.0.0 release notes #17677

Apache Druid 32.0.0 contains over 335 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 52 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes and incompatible changes before you upgrade to Druid 32.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

# Important features, changes, and deprecations

This section contains important information about new and existing features.

# ANSI-SQL compatibility and query results

Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant has been removed:

  • druid.generic.useDefaultValueForNull=true
  • druid.expressions.useStrictBooleans=false
  • druid.generic.useThreeValueLogicForNativeFilters=false

These configs no longer affect your query results; only SQL-compliant behavior is supported.

If the configs are set to the legacy behavior, Druid services will fail to start.

If you relied on the legacy behavior, you must update your queries before you upgrade, or your results will be incorrect.

For more information about how to update your queries, see the migration guide.

#17568 #17609
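
As a hedged illustration of the kind of query update the migration guide covers (the table and column names below are hypothetical): under the removed druid.generic.useDefaultValueForNull=true behavior, NULL numeric values behaved like 0, so a filter such as x < 10 also matched rows where x was NULL. Under SQL-compliant three-valued logic, you must match those rows explicitly.

```python
# Hypothetical example: migrating a filter that relied on legacy null handling.
# With druid.generic.useDefaultValueForNull=true, a NULL x behaved like 0 and
# matched "x < 10". Under SQL-compliant logic, comparisons against NULL are
# unknown, so NULL rows must be matched explicitly.
legacy_query = "SELECT COUNT(*) FROM my_table WHERE x < 10"

migrated_query = "SELECT COUNT(*) FROM my_table WHERE x < 10 OR x IS NULL"
```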

# Java support

Java support in Druid has been updated:

  • Java 8 support has been removed
  • Java 11 support is deprecated

We recommend that you upgrade to Java 17.

#17466

# Hadoop-based ingestion

Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion.

# New Overlord APIs

APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service (a usage sketch follows below):

  • Mark all (non-overshadowed) segments of a datasource as used:
    POST /druid/indexer/v1/datasources/{dataSourceName}

  • Mark all segments of a datasource as unused:
    DELETE /druid/indexer/v1/datasources/{dataSourceName}

  • Mark multiple (non-overshadowed) segments as used:
    POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed

  • Mark multiple segments as unused:
    POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused

  • Mark a single segment as used:
    POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}

  • Mark a single segment as unused:
    DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}

#17545

#17386
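
A minimal sketch of calling one of the relocated endpoints with Python's requests library; the Overlord URL, datasource, and segment ID below are placeholders for your deployment:

```python
import requests

# Placeholders: adjust for your deployment.
OVERLORD = "http://localhost:8090"
DATASOURCE = "wikipedia"
SEGMENT_ID = "wikipedia_2024-01-01T00:00:00.000Z_2024-01-02T00:00:00.000Z_2025-01-01T00:00:00.000Z"

# Mark a single segment as unused via the Overlord (previously a Coordinator API).
resp = requests.delete(
    f"{OVERLORD}/druid/indexer/v1/datasources/{DATASOURCE}/segments/{SEGMENT_ID}"
)
resp.raise_for_status()
print(resp.status_code, resp.text)
```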

# Functional area and related changes

This section contains detailed release notes organized by functional area.

# Web console

# Explore view (experimental)

Several improvements have been made to the Explore view in the web console.

#17627

# Segment timeline view

The segment timeline is now more interactive and no longer forces day granularity.

#17521

# Other web console improvements

  • The timezone picker now always shows your timezone #17521
  • UNNEST is now supported for autocomplete suggestions #17521
  • Tables now support less than and greater than filters #17521
  • You can now resize the side panels in the Query view #17387
  • Added the expectedLoadTimeMillis segment loading metric to the web console #17359

# Ingestion

# Numbers for CSV and TSV input formats

Use the new optional config tryParseNumbers for CSV and TSV input formats to control how numbers are treated. If enabled, numbers in the input are parsed as follows:

  • integers as the long data type
  • floating-point numbers as the double data type

By default, this configuration is set to false, so numbers are treated as strings (see the sketch below).

#17082
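
A minimal sketch of a CSV inputFormat that opts in, assuming a header row (findColumnsFromHeader is the standard CSV option; tryParseNumbers comes from the change above):

```python
# Fragment of an ingestion spec's ioConfig.inputFormat, expressed as a Python dict.
csv_input_format = {
    "type": "csv",
    "findColumnsFromHeader": True,
    "tryParseNumbers": True,  # integers become long, floating-point values become double
}
```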

# Other ingestion improvements

  • Reduced the direct memory requirement for tasks that don't process queries by not reserving query buffers for them #16887
  • JSON-based and SQL-based ingestion now support request headers when using an HTTP input source (see the sketch after this list) #16974
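
A hedged sketch of an HTTP input source carrying a custom header; the requestHeaders field name is our reading of the linked change, and the URL and header values are placeholders. Druid may also restrict which headers are allowed through server-side configuration:

```python
# Fragment of an ioConfig.inputSource, expressed as a Python dict.
# requestHeaders (assumed field name) carries custom HTTP headers.
http_input_source = {
    "type": "http",
    "uris": ["https://example.com/path/to/data.json"],
    "requestHeaders": {"X-Custom-Header": "some-value"},
}
```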

# SQL-based ingestion

# Other SQL-based ingestion improvements
  • SQL-based ingestion now supports dynamic parameters for queries other than SELECT, such as REPLACE (see the sketch after this list) #17126
  • Improved thread names to include the stage ID and worker number to help with troubleshooting #17324
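
A minimal sketch of submitting a parameterized REPLACE through the SQL task endpoint; the datasource, filter, and Router URL are placeholders, while the query/parameters payload fields follow the standard Druid SQL API:

```python
import requests

# Placeholder Router/Broker URL and hypothetical datasources.
ROUTER = "http://localhost:8888"

payload = {
    "query": """
        REPLACE INTO target_table OVERWRITE ALL
        SELECT * FROM source_table WHERE dim = ?
        PARTITIONED BY DAY
    """,
    # Dynamic parameters bind to the "?" placeholders in order.
    "parameters": [{"type": "VARCHAR", "value": "some_value"}],
}

resp = requests.post(f"{ROUTER}/druid/v2/sql/task", json=payload)
resp.raise_for_status()
print(resp.json())  # response includes the ingestion task's ID
```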

# Streaming ingestion

# Control how many segments get merged for publishing

You can now use the maxColumnsToMerge property in your supervisor spec to control how many segments are merged in a single phase when merging segments for publishing. The limit applies to the total number of columns across the set of segments being merged; if it's exceeded, segment merging occurs in multiple phases. Druid merges at least two segments in each phase, regardless of this setting. A spec sketch follows.

#17030
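
A minimal sketch of where this could sit in a supervisor spec; placing it in tuningConfig alongside other merge settings is our assumption here, and the value is illustrative, not a recommendation:

```python
# Fragment of a streaming supervisor spec, expressed as a Python dict.
supervisor_spec_fragment = {
    "type": "kafka",
    "tuningConfig": {
        "type": "kafka",
        # Cap on the total number of columns across the segments merged in one
        # phase; merging falls back to multiple phases when exceeded.
        "maxColumnsToMerge": 10000,
    },
}
```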

# Other streaming ingestion improvements
  • Druid now properly supports early/late rejection periods when stopTasksCount is configured and streaming tasks run longer than the configured task duration #17442
  • Improved segment publishing when resubmitting supervisors or when task publishing takes a long time #17509

# Querying

# Window queries

The following fields are deprecated for window queries that use the MSQ task engine: maxRowsMaterializedInWindow and partitionColumnNames. They will be removed in a future release.

#17433

# Join hints

SQL JOIN queries now support hints, which let you specify the join algorithm to use at a per-join level. Join hints recursively affect subqueries. A hedged sketch follows.

#17541
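
The linked change doesn't spell out the hint grammar here, so the following is a hedged sketch only: it assumes Calcite-style /*+ ... */ hint syntax and borrows the sortMerge name from the existing sqlJoinAlgorithm context values. Treat both as assumptions, not confirmed syntax; the tables are hypothetical.

```python
# Hedged sketch of a per-join hint. Hint placement follows Calcite convention;
# the hint name sortMerge is an assumption borrowed from sqlJoinAlgorithm.
hinted_query = """
SELECT /*+ sortMerge */ w.channel, COUNT(*) AS cnt
FROM wikipedia w
JOIN languages l ON w.language = l.language_code
GROUP BY w.channel
"""
```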

# Other querying improvements

  • Added automatic query prioritization based on the period of the segments scanned in a query. You can set the duration threshold in ISO format using druid.query.scheduler.prioritization.segmentRangeThreshold (see the sketch after this list) #17009
  • Improved error handling for incomplete queries: Druid now returns a trailer header to indicate the error #16672
  • Improved scan queries to account for column types in more situations #17463
  • Improved lookups so that they can now iterate over fetched data #17212
  • Improved projections so that they can contain only aggregators and no grouping columns #17484
  • Removed microseconds as a supported unit for EXTRACT #17247
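
A minimal sketch of the related Broker runtime properties, shown as a Python dict for illustration; enabling the threshold prioritization strategy is assumed context here, and the period value is illustrative:

```python
# Query prioritization settings; in practice these are lines in the Broker's
# runtime.properties file rather than Python code.
prioritization_props = {
    # Assumed context: the threshold-based prioritization strategy is enabled.
    "druid.query.scheduler.prioritization.strategy": "threshold",
    # New in 32.0.0: queries scanning segments covering a period longer than
    # this ISO 8601 duration get deprioritized (value is illustrative).
    "druid.query.scheduler.prioritization.segmentRangeThreshold": "P7D",
}
```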

# Cluster management

# Reduced metadata IO

The Overlord runtime property druid.indexer.tasklock.batchAllocationReduceMetadataIO can help reduce IO during segment allocation. Setting this flag to true (the default) allows the Overlord to fetch only the necessary segment payloads during segment allocation.

#17496

# Other cluster management improvements

  • Druid can now run non-G1 Garbage Collectors with JAVA_OPTS #17078
  • You no longer have to configure a temporary storage directory on the Middle Manager for durable storage or exports. If it isn't configured, Druid uses the task directory #17015 #17335
  • Improved autoscaling on supervisors so that tasks don't get published needlessly #17335
  • Improved recovery time for Overlord leadership after ZooKeeper nodes are bounced #17535
  • Improved Druid to be more resilient to service leadership changes caused by ZooKeeper outages #17546
  • Removed the following unused Coordinator dynamic configs: mergeBytesLimit and mergeSegmentsLimit #17384

# Data management

# Sorting columns for compaction with the MSQ task engine

Compaction that uses the MSQ task engine now supports sorting segments by non-time columns. If forceSegmentSortByTime is set in the compaction config or the inferred schema, the following happens (a config sketch follows this list):

  • Druid skips adding __time explicitly as the first column to the dimension schema, since it already comes as part of the schema
  • Column mappings propagate __time in the order specified by the schema
  • forceSegmentSortByTime is set in the MSQ query context
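
A hedged sketch of a compaction config fragment that opts into non-time-first sorting; the datasource and dimension names are hypothetical, and forceSegmentSortByTime follows the standard dimensionsSpec flag:

```python
# Fragment of a compaction config using the MSQ engine, expressed as a Python dict.
compaction_config_fragment = {
    "dataSource": "my_datasource",
    "engine": "msq",
    "dimensionsSpec": {
        # When false, __time need not sort first; list it where it should sort.
        "forceSegmentSortByTime": False,
        "dimensions": ["countryName", "__time", "channel"],
    },
}
```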

# Other data management improvements

  • Improved centralized datasource schemas so that different permutations of the same column order do not result in distinct schemas #17044
  • Changed compaction tasks to always handle multivalue dimensions as arrays if the column schema is not explicitly specified #17110

# Metrics and monitoring

# New metrics for GroupByStatsMonitor

Druid now emits the following metrics for GroupBy queries:

  • mergeBuffer/used: number of merge buffers used
  • mergeBuffer/acquisitionTimeNs: total time required to acquire merge buffers
  • mergeBuffer/acquisition: number of queries that acquired a batch of merge buffers
  • groupBy/spilledQueries: number of queries that spilled onto disk
  • groupBy/spilledBytes: number of bytes spilled to disk
  • groupBy/mergeDictionarySize: size of the merging dictionary

#17360

# CgroupV2 monitors (experimental)

The following monitors for cgroupv2 are now available (an enablement sketch follows):

  • CPU: org.apache.druid.java.util.metrics.CgroupV2CpuMonitor
  • Disk usage: org.apache.druid.java.util.metrics.CgroupV2DiskMonitor
  • Memory: org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor

#16905
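
Monitors are enabled through the standard druid.monitoring.monitors runtime property, which takes a JSON array of monitor class names; a minimal sketch:

```python
# Value for druid.monitoring.monitors, shown as a Python list for illustration.
# In runtime.properties it appears as a JSON array on a single line.
monitors = [
    "org.apache.druid.java.util.metrics.CgroupV2CpuMonitor",
    "org.apache.druid.java.util.metrics.CgroupV2DiskMonitor",
    "org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor",
]
```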

# Other metrics and monitoring improvements

  • Added ingest/notices/queueSize, ingest/notices/time, and ingest/pause/time metrics to the statsd emitter #17487 #17468
  • Added duty group as a dimension for the coordinator.global.time metric for the statsd-emitter #17320
  • The service/heartbeat metric now reports the status on the Peon #17488
  • Changed real-time segment metrics so that they are for each Sink instead of for each FireHydrant. This is a return to emission behavior prior to improvements to real-time query performance made in 30.0.0 #17170
  • Changed query stats to be first before intervals in getNativeQueryLine logging so that the stats are retained if the query object gets truncated #17326

# Extensions

# Delta Lake

  • The Delta Lake input source now supports the decimal data type, which is handled as a double. If a value cannot fit in a double, it is ingested as a string #17376
  • You can now filter by snapshot version, even if the version is 0 #17367

# gRPC queries

A new contributor extension enables a gRPC API for SQL and native queries, which means that gRPC-based clients can use the extension to issue SQL queries. Use this extension for simple queries.

For more information, see gRPC query extension for Druid.

#15982

# Kubernetes

  • Middle Manager-less ingestion using Kubernetes is now more resilient to Overlord restarts #17446
  • You can now pass empty arrays to the type and dataSource keys in the selector-based pod template selection strategy #17400
  • Improved the TaskRunner to expose the getMaximumCapacity field #17107

# Iceberg

The Iceberg extension now supports the AWS Glue Iceberg catalog.

#17392

# Documentation improvements

# Upgrade notes and incompatible changes

# Upgrade notes

# Front-coded dictionaries

In Druid 32.0.0, the front-coded dictionaries feature will be turned on by default. Front-coded dictionaries reduce storage and improve performance by optimizing for strings whose front parts look similar.

Once this feature is on, you cannot easily downgrade to an earlier version that does not support the feature.

For more information, see Migration guide: front-coded dictionaries.

If you're already using this feature, you don't need to take any action.
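
For reference, the encoding is controlled through indexSpec.stringDictionaryEncoding in a task's tuningConfig. A minimal sketch, assuming the documented frontCoded and utf8 encoding types and a bucketSize of 4:

```python
# Fragment of a task tuningConfig, expressed as a Python dict.
tuning_config_fragment = {
    "indexSpec": {
        # Choose the encoding explicitly; use {"type": "utf8"} instead to keep
        # the previous encoding, e.g. to preserve the ability to downgrade.
        "stringDictionaryEncoding": {"type": "frontCoded", "bucketSize": 4},
    },
}
```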

# Incompatible changes

# Developer notes

  • Improved dependency support between extensions. When an extension depends on another extension, it now tries to use the dependency's class loader to find the required classes #16973

# Dependency updates

The following dependencies have had their versions bumped:

  • javax.xml.bind is no longer used. Druid now uses jakarta.xml.bind #17370
  • Several web console dependencies have been updated. For a full list, see #17381, #17365, and #17363
  • Removed file-loader dependency for the web console #17346
  • Guice from 4.2.2 to 5.1.0
  • git-commit-id-maven-plugin from 4.9.10 to 9.0.1 #17571
  • Netty from version 4.1.108.Final to 4.1.116.Final