HIVE-29133: Support Z-order for Iceberg tables via CREATE TABLE #6095

kokila-19 · 2025-09-23T06:42:17Z

What changes were proposed in this pull request?

Added Z-order support for Iceberg tables via CREATE TABLE DDL

Why are the changes needed?

To support Z-order indexing, which can improve data clustering and query performance on Iceberg tables.

Does this PR introduce any user-facing change?

Yes, this PR adds new syntax support:
CREATE TABLE test_zorder (
    id int,
    text string)
WRITE LOCALLY ZORDER BY id, text
STORED BY iceberg
STORED AS orc;

How was this patch tested?

qtest

Copilot

Pull Request Overview

This PR adds Z-order support for Iceberg tables via CREATE TABLE DDL syntax, enabling improved data clustering and query performance. The implementation introduces a new syntax "WRITE LOCALLY ZORDER BY" alongside supporting infrastructure for Z-order indexing.

Adds new DDL syntax "WRITE LOCALLY ZORDER BY" for creating Iceberg tables with Z-order sorting
Implements Z-order functionality through a custom UDF that uses Iceberg's ZOrderByteUtils
Extends the parser and analyzer to handle Z-order specifications in CREATE TABLE statements

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

File	Description
CreateTableAnalyzer.java	Adds Z-order AST processing and JSON serialization for CREATE TABLE
ZorderFields.java	New data structure for holding Z-order field metadata
ZOrderFieldDesc.java	New descriptor class for individual Z-order columns
HiveParser.g	Adds grammar rules for ZORDER syntax parsing
GenericUDFIcebergZorder.java	New UDF implementing Z-order value calculation using Iceberg utilities
HiveIcebergStorageHandler.java	Integrates Z-order sorting into Iceberg write operations
BaseHiveIcebergMetaHook.java	Handles Z-order metadata persistence in table properties

Copilot · 2025-09-27T13:33:23Z

...rg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/udf/GenericUDFIcebergZorder.java

+    if (arguments.length < 2) {
+      throw new UDFArgumentException("iceberg_zorder requires at least 2 arguments");


The minimum argument check should allow for 1 argument since Z-order can be applied to a single column. Change the condition to arguments.length < 1.

Suggested change

-    if (arguments.length < 2) {
-      throw new UDFArgumentException("iceberg_zorder requires at least 2 arguments");
+    if (arguments.length < 1) {
+      throw new UDFArgumentException("iceberg_zorder requires at least 1 argument");

Z-order is designed for multi-dimensional clustering. It can be applied to a single column, but the result is no different from a normal sort. Calculating z-indices, ordering, and rewriting files during compaction would add overhead without any benefit, so supporting a single column is not required.
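To illustrate the point about single-column Z-order, here is a minimal sketch of bit interleaving. It is not the code in GenericUDFIcebergZorder or Iceberg's ZOrderByteUtils; the class name ZOrderSketch and the method zValue are hypothetical, and the sketch assumes non-negative int inputs that fit in bitsPerColumn bits.

```java
final class ZOrderSketch {
  /**
   * Interleaves the low bitsPerColumn bits of each column into one z-value.
   * Bit b of column c lands at position b * n + c, where n is the column count.
   * Assumes non-negative inputs; hypothetical helper for illustration only.
   */
  static long zValue(int bitsPerColumn, int... cols) {
    int n = cols.length;
    if ((long) bitsPerColumn * n > 63) {
      throw new IllegalArgumentException("interleaved value would not fit in a long");
    }
    long z = 0;
    for (int bit = 0; bit < bitsPerColumn; bit++) {
      for (int c = 0; c < n; c++) {
        long b = (cols[c] >> bit) & 1;
        z |= b << (bit * n + c);
      }
    }
    return z;
  }

  public static void main(String[] args) {
    // Two columns: the bits of both inputs are mixed into one ordering key.
    System.out.println(zValue(8, 1, 0)); // 1
    System.out.println(zValue(8, 0, 1)); // 2
    System.out.println(zValue(8, 1, 1)); // 3
    // One column: there is nothing to interleave with, so the key is the value itself.
    System.out.println(zValue(8, 5));    // 5
  }
}
```

With a single column the z-value equals the input, so sorting by it gives the same order as sorting by the column directly, which is the reason given above for requiring at least two arguments.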

Copilot · 2025-09-27T13:33:24Z

...rg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/udf/GenericUDFIcebergZorder.java

+    if (value == null) {
+      // For NULL values, we have primitive buffer size of 8 with values of 0
+      return ByteBuffer.wrap(new byte[ZOrderByteUtils.PRIMITIVE_BUFFER_SIZE]).array();
+    }


Creating a new ByteBuffer and byte array for each null value is inefficient. Consider creating a static final byte array for null values to reuse across calls.
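A minimal sketch of the reuse idea, under stated assumptions: the class NullZOrderBytes and its methods are hypothetical, and the local constant is assumed to mirror ZOrderByteUtils.PRIMITIVE_BUFFER_SIZE, which the quoted code comment describes as 8. Note that ByteBuffer.wrap(arr).array() returns the same backing array, so the wrap in the quoted snippet adds nothing beyond allocating the array.

```java
final class NullZOrderBytes {
  // Assumption: mirrors ZOrderByteUtils.PRIMITIVE_BUFFER_SIZE (8 per the quoted comment).
  static final int PRIMITIVE_BUFFER_SIZE = 8;

  // Shared all-zero array, allocated once. Safe to share only if no caller writes into it.
  private static final byte[] NULL_BYTES = new byte[PRIMITIVE_BUFFER_SIZE];

  /** Returns the shared zero-filled array; callers must treat it as read-only. */
  static byte[] sharedNullBytes() {
    return NULL_BYTES;
  }

  /** Returns a fresh zero-filled array, as the quoted code effectively does per call. */
  static byte[] freshNullBytes() {
    return new byte[PRIMITIVE_BUFFER_SIZE];
  }
}
```

The trade-off is that a shared array is only correct while every consumer treats it as read-only; if downstream code mutates the returned bytes, the per-call allocation is the safer choice.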

Copilot · 2025-09-27T13:33:24Z

iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/BaseHiveIcebergMetaHook.java

+  }
+
+  private boolean isZOrderJSON(String jsonString) {
+    return jsonString.contains("zorderFields");


Using string contains for JSON detection is unreliable and could produce false positives. Consider parsing the JSON to check for the actual structure or use a more robust detection method.

Suggested change

-    return jsonString.contains("zorderFields");
+    try {
+      com.fasterxml.jackson.databind.JsonNode node = JSON_OBJECT_MAPPER.readTree(jsonString);
+      return node.has("zorderFields");
+    } catch (Exception e) {
+      return false;
+    }

zhangbutao · 2025-09-27T13:53:44Z

iceberg/iceberg-handler/src/test/queries/positive/iceberg_create_locally_zordered_table.q

+set hive.optimize.shared.work.merge.ts.schema=true;
+
+-- Validates z-order on CREATE via clause.
+CREATE TABLE default.zorder_it_nulls (


How can we enable the z-order feature for an already created Iceberg table?
Can we use the ALTER TABLE syntax to enable z-order for an already created Iceberg table?

@zhangbutao, what are your thoughts on #6095 (comment) ?

@zhangbutao Current PR implements end-to-end functionality for ZOrder, including support for CREATE and INSERT.
I will have two more commits coming in for alter and compaction zorder support.
Next PR will focus on adding alter table support for Zorder.

@zhangbutao, what are your thoughts on #6095 (comment) ?

I am ok with the syntax.

@zhangbutao Current PR implements end-to-end functionality for ZOrder, including support for CREATE and INSERT. I will have two more commits coming in for alter and compaction zorder support. Next PR will focus on adding alter table support for Zorder.

@kokila-19 That's ok.
BTW, could you please supplement the SQL documentation as well? I don't want such great features to go unused due to a lack of documentation, as it's likely that many users aren't even aware of them.
To be honest, I've even forgotten a lot of the semantics we implemented for Iceberg, haha.
Perhaps it's about time we gradually improve our Hive documentation too. :)
Thanks.

Current PR implements end-to-end functionality for ZOrder, including support for CREATE and INSERT.

@kokila-19, what is the syntax for INSERT? Or does INSERT use the Z-order from the schema/spec?

INSERT INTO ice_orc WRITE ORDERED BY ZORDER(id, text) SELECT * FROM source_table;

@deniskuzZ
INSERT will use the Z-order spec stored in the table properties sort.order and sort.columns, which will be provided in the INSERT/ALTER command.

@zhangbutao Certainly, documentation is a must. I've already created a ticket for it: HIVE-29135 and I plan to add the documentation once the feature is complete.

Perhaps it's about time we gradually improve our Hive documentation too. :)

I agree that it's time we improve Hive's documentation, and I'm happy to contribute to that effort.

deniskuzZ · 2025-09-29T15:22:29Z

iceberg/iceberg-handler/src/test/queries/positive/iceberg_create_locally_zordered_table.q

+CREATE TABLE default.zorder_it_nulls (
+    id int,
+    text string)
+WRITE LOCALLY ZORDER by id, text


@kokila-19, syntax should be the following:

WRITE [LOCALLY] ORDERED BY zorder (id, text)

alter cmd

ALTER TABLE ice_orc SET WRITE [LOCALLY] ORDERED BY zorder (id, text);

compaction cmd:

OPTIMIZE TABLE ice_orc rewrite data zorder by (id, text); // optional, if not specified, should use zorder from table spec/schema

i think we can kill the LOCALLY or make it optional to be consistent with Spark

I was using the existing LOCALLY ORDERED BY syntax implemented in #5541.
If I have to remove the LOCALLY keyword for Z-order, I believe we should remove it there as well, to be consistent within Hive.
Or we could make LOCALLY optional for both orderings.

OPTIMIZE TABLE ice_orc rewrite data zorder by (id, text); // optional, if not specified, should use zorder from table spec/schema

Note during implementation:
If the Z-order columns in the table spec differ from the columns provided in the OPTIMIZE TABLE command for compaction, this should throw an error.

not sure. delta and spark iceberg support that. we can call optimize with zorder and later alter the write zorder.

Spark only supports Z-order in rewrite_data_files (during compaction), it does not have Z-order support for DDL queries. So, they will not have this scenario.

@deniskuzZ
i think we can kill the LOCALLY or make it optional to be consistent with Spark
I will have this change in the follow up PR.

sonarqubecloud · 2025-10-03T07:13:29Z

Quality Gate passed

Issues
14 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

asf-ci-hive added the tests pending label Sep 23, 2025

kokila-19 changed the title ~~HIVE-29133: Support Z-order indexing for Iceberg tables via CREATE TA…~~ HIVE-29133: Support Z-order for Iceberg tables via CREATE TABLE Sep 23, 2025

asf-ci-hive added tests unstable and removed tests pending labels Sep 23, 2025

kokila-19 force-pushed the zorder branch from 14958c1 to 6c1106f Compare September 24, 2025 07:56

asf-ci-hive added tests pending tests unstable tests passed and removed tests unstable tests pending labels Sep 24, 2025

zhangbutao requested a review from Copilot September 27, 2025 13:32

Copilot AI reviewed Sep 27, 2025

zhangbutao reviewed Sep 27, 2025

deniskuzZ reviewed Sep 29, 2025

Commit becc480: HIVE-29133: Support Z-order indexing for Iceberg tables via CREATE TABLE DDL

kokila-19 force-pushed the zorder branch from 6c1106f to becc480 Compare October 3, 2025 05:51

asf-ci-hive added tests pending and removed tests passed labels Oct 3, 2025

asf-ci-hive added tests unstable and removed tests pending labels Oct 3, 2025
