
szlta (Contributor) commented Nov 25, 2025

Introduces a new Hive table property, metadata_hash (stored exclusively in HMS), that tracks the hash of the current table metadata.
HiveTableOperations uses it to perform an integrity check, ensuring that metadata.json has not been tampered with when table encryption is used.
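As an illustration of the kind of check described above, here is a minimal sketch. It assumes SHA-256 as the hash algorithm and Base64 as the encoding stored in the metadata_hash property; the class and method names (MetadataHashCheck, hashOf, verify) are hypothetical, and the merged HiveTableOperations code may compute and compare the hash differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;

// Hypothetical helper; a sketch of the idea, not the PR's implementation.
final class MetadataHashCheck {
  // Property name from the PR description; kept only in HMS, never inside metadata.json.
  static final String METADATA_HASH = "metadata_hash";

  private MetadataHashCheck() {}

  // Assumption: SHA-256 over the raw metadata.json bytes, Base64-encoded.
  static String hashOf(byte[] metadataJsonBytes) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      return Base64.getEncoder().encodeToString(digest.digest(metadataJsonBytes));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 not available", e);
    }
  }

  // Compares the hash recorded in HMS with the hash of the metadata file that was just read.
  static void verify(Map<String, String> hmsTableParams, byte[] metadataJsonBytes) {
    String expected = hmsTableParams.get(METADATA_HASH);
    if (expected == null) {
      throw new IllegalStateException("Missing " + METADATA_HASH + " for encrypted table");
    }
    String actual = hashOf(metadataJsonBytes);
    // MessageDigest.isEqual performs a constant-time comparison of the two byte arrays.
    if (!MessageDigest.isEqual(
        expected.getBytes(StandardCharsets.UTF_8), actual.getBytes(StandardCharsets.UTF_8))) {
      throw new SecurityException("metadata.json does not match " + METADATA_HASH);
    }
  }
}
```

The point of keeping the expected hash in HMS rather than next to the file is that someone able to rewrite metadata.json in storage cannot also update the reference value without write access to the metastore.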

szlta (Contributor Author) commented Nov 25, 2025

@ggershinsky, this relates to our discussion in #13225 (comment).

Let me know whether you think this approach is right; if so, I'll work on the finishing touches and tests.

ggershinsky (Contributor)

Thanks @szlta , this approach looks good to me.

@szlta szlta force-pushed the hive_integrity_check branch from b170f1a to 9afe8ff Compare December 2, 2025 16:02
@github-actions github-actions bot added the spark label Dec 2, 2025
@szlta szlta changed the title [WiP] Hive: Metadata integrity check for encrypted tables Hive: Metadata integrity check for encrypted tables Dec 2, 2025
@szlta szlta marked this pull request as ready for review December 2, 2025 16:04
public void close() throws IOException {}

public byte[] getHash() {
return digest.digest();
Contributor

nit: should getHash() throw an exception if the stream has not been closed?

Contributor Author

You're right, it's needed: calling digest() resets the internal state of the MessageDigest, so any further calls would produce a wrong value. I've now added closing logic.
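The fix discussed in this thread can be sketched as a hashing stream that computes the digest exactly once on close() and refuses to return it before then. This is a minimal sketch under stated assumptions: the class name HashingOutputStream, the caching of the result, and the choice of IllegalStateException are hypothetical and not taken from the merged code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class; illustrates the close-then-getHash contract from the review.
final class HashingOutputStream extends OutputStream {
  private final MessageDigest digest;
  private boolean closed = false;
  private byte[] hash = null;

  HashingOutputStream(String algorithm) throws NoSuchAlgorithmException {
    this.digest = MessageDigest.getInstance(algorithm);
  }

  @Override
  public void write(int b) throws IOException {
    ensureOpen();
    digest.update((byte) b);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    ensureOpen();
    digest.update(b, off, len);
  }

  @Override
  public void close() {
    if (!closed) {
      closed = true;
      // digest() resets the MessageDigest, so compute the result once and cache it.
      hash = digest.digest();
    }
  }

  byte[] getHash() {
    if (!closed) {
      throw new IllegalStateException("Stream must be closed before reading the hash");
    }
    // Return a copy so callers cannot modify the cached value.
    return hash.clone();
  }

  private void ensureOpen() throws IOException {
    if (closed) {
      throw new IOException("Stream is closed");
    }
  }
}
```

Caching the digest in close() makes repeated getHash() calls return the same value, which avoids the reset problem described above.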

@ggershinsky ggershinsky (Contributor) left a comment

Thanks @szlta !

szlta added 3 commits December 5, 2025 10:20
Introducing new Hive table property metadata_hash (to be stored
exclusively in HMS) that tracks the hash of the current table
metadata.
It is used in HiveTableOperations to carry out integrity check
and ensure that the metadata.json has not been tampered with
when table encryption is used.
Change-Id: I72dad6d8dbc2338299236e495bc76ba60fab7db8
Change-Id: Ie8d978b022ab67bfa3b3238ebc3c5d4f25d0a843
@szlta szlta force-pushed the hive_integrity_check branch from 67b209e to e614a79 Compare December 5, 2025 09:23
ggershinsky (Contributor)

cc @huaxingao

Change-Id: Ie5a0310a1b2eb71ed171c1eaa4c4529ada136565
@huaxingao huaxingao merged commit bd8d289 into apache:main Dec 8, 2025
44 checks passed
huaxingao (Contributor)

Thanks @szlta for the PR! Thanks @ggershinsky for the review!

szlta (Contributor Author) commented Dec 9, 2025

Thanks for the review, @ggershinsky and @huaxingao!

3 participants