
Conversation

@codope (Owner) commented Feb 25, 2025

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

<artifactId>hudi-common</artifactId>
<version>${dep.hudi.version}</version>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
@codope (Owner Author):

Will hbase dependencies still be required after integration with filegroup reader (even in tests)?

Collaborator:

We still need it because the Hudi writer needs this dependency to write HFiles and the MDT. If we want to get rid of it even as a test dependency, we'll need to add artifacts of generated Hudi tables with MDT enabled.

@codope (Owner Author):

I'm just worried about pushback from Trino committers. Should be OK for tests for now.

Collaborator:

This dependency is removed in 1.0.x, and tests are passing.
May I know which execution path will actually require this dependency? I can craft a test case specifically for such scenarios.

<dependency>
<!--Used to test execution in task executor after de-serializing-->
<groupId>com.esotericsoftware</groupId>
<artifactId>kryo</artifactId>
@codope (Owner Author):

If it is only needed for tests, then let's just define it in test scope? Will it still be required after integrating with the filegroup reader?
Also, should we use kryo-shaded?

Collaborator:

I'll check why I added this.


BigDecimal convert(int precision, int scale, Object value)
{
Schema schema = new Schema.Parser().parse(format("{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":%d,\"scale\":%d}", precision, scale));
@codope (Owner Author):

A new Avro schema is created on each call. If many decimals with the same precision and scale are processed, consider caching the schema to avoid repeated parsing overhead; a sketch follows below.
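One possible shape for that cache, as a minimal sketch: a connector-local static holder keyed by (precision, scale). The DecimalSchemas class and the long key packing are illustrative, not from this PR; only the parse call is taken from the snippet above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.avro.Schema;

import static java.lang.String.format;

final class DecimalSchemas
{
    // Cache parsed Avro decimal schemas keyed by (precision, scale).
    // Packing both ints into one long avoids allocating a key object per lookup.
    private static final Map<Long, Schema> CACHE = new ConcurrentHashMap<>();

    private DecimalSchemas() {}

    static Schema decimalSchema(int precision, int scale)
    {
        long key = ((long) precision << 32) | (scale & 0xFFFFFFFFL);
        return CACHE.computeIfAbsent(key, ignored -> new Schema.Parser().parse(
                format("{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":%d,\"scale\":%d}", precision, scale)));
    }
}
```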

Collaborator:

will fix


@Override
public List<StoragePathInfo> listFiles(StoragePath path) throws IOException {
FileIterator fileIterator = fileSystem.listFiles(convertToLocation(path));
@codope (Owner Author):

This looks like a duplicate of listDirectEntries. Consider refactoring the two listing methods into a single helper that returns the list of StoragePathInfo objects, then calling that helper from both listDirectEntries and listFiles; a sketch follows below.

btw, are these methods actually called in some path?
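A rough sketch of such a helper, assuming Trino's FileEntry accessors (location(), length(), lastModified()) and a StoragePathInfo constructor of the shape (path, length, isDirectory, blockReplication, blockSize, modificationTime); the exact Hudi constructor should be verified against the version in use.

```java
import java.io.IOException;
import java.util.List;

import com.google.common.collect.ImmutableList;
import io.trino.filesystem.FileEntry;
import io.trino.filesystem.FileIterator;
import org.apache.hudi.storage.StoragePath;
import org.apache.hudi.storage.StoragePathInfo;

private List<StoragePathInfo> toStoragePathInfos(FileIterator fileIterator)
        throws IOException
{
    ImmutableList.Builder<StoragePathInfo> result = ImmutableList.builder();
    while (fileIterator.hasNext()) {
        FileEntry entry = fileIterator.next();
        // Assumed StoragePathInfo constructor -- verify against the
        // org.apache.hudi.storage.StoragePathInfo in the Hudi version used.
        result.add(new StoragePathInfo(
                new StoragePath(entry.location().toString()),
                entry.length(),
                false, // listFiles yields only files, never directories
                (short) 0,
                0,
                entry.lastModified().toEpochMilli()));
    }
    return result.build();
}

@Override
public List<StoragePathInfo> listFiles(StoragePath path)
        throws IOException
{
    return toStoragePathInfos(fileSystem.listFiles(convertToLocation(path)));
}
```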

Collaborator:

Will refactor.

They are called inside Hudi's file system view, so they have to be implemented correctly.

@Override
public Page getNextPage() {
if (logRecordMap == null) {
try (HoodieMergedLogRecordScanner logScanner = getMergedLogRecordScanner(storage, basePath, split, readerSchema)) {
@mxmarkovics commented Aug 15, 2025:

There is a bug here: since the HoodieMergedLogRecordScanner is created in the try-with-resources header, logScanner.close() is called when the try block completes (see here), which also calls close on the log record map object. After the try block exits, logRecordMap is no longer null, but all of its entries have been removed. In my testing this is exactly what happens, and it doesn't seem possible for the if (logRecord != null) branch below to ever be hit currently, so the snapshot table is effectively read as a read-optimized table, returning only the stale/last-compacted data.

This can be fixed by moving the log scanner instantiation inside the try block instead of the try-with-resources header, as the log record map is closed later anyway; a sketch follows below.
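A minimal sketch of the proposed fix, reusing the names from the snippet above (storage, basePath, split, readerSchema, logRecordMap are this PR's fields) and assuming logRecordMap is populated from the scanner's getRecords(); error handling is simplified.

```java
if (logRecordMap == null) {
    try {
        // Instantiate inside the try body, NOT in try-with-resources:
        // logScanner.close() would also close -- and empty -- the merged
        // record map it hands out, leaving logRecordMap non-null but empty.
        HoodieMergedLogRecordScanner logScanner =
                getMergedLogRecordScanner(storage, basePath, split, readerSchema);
        logRecordMap = logScanner.getRecords();
        // logRecordMap is closed explicitly later, once this page source is done.
    }
    catch (Exception e) {
        throw new RuntimeException(e);
    }
}
```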

public void buildRecordInPage(PageBuilder pageBuilder, IndexedRecord record,
Map<Integer, String> partitionValueMap, boolean SkipMetaColumns) {
pageBuilder.declarePosition();
int startChannel = SkipMetaColumns ? HOODIE_META_COLUMNS.size() : 0;
@mxmarkovics commented Aug 15, 2025:

This causes a bug (also in the other implementation below) when there are hudi meta columns present in the table schema itself and are selected as the pagebuilder blocks will contain entries for these columns, however they will always be skipped which can cause index out of bounds errors and/or type mismatch errors that get swallowed due to the block entries and type entries being out of sync. The proposed solution is to instead of passing in a boolean of whether or not to skipMetaColumns pass in an int from HudiSnapshotPageSource caller which is the appropriate number of meta columns to be skipped, which is just the total number of meta columns that exist - number of meta columns in the data columns.
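A minimal sketch of that change, keeping the method shape from the snippet above; metaColumnsInDataColumns is a hypothetical count of Hudi meta columns that appear among the selected data columns, not a name from this PR.

```java
// Caller side (e.g. in HudiSnapshotPageSource): skip only the meta columns
// that are NOT part of the selected data columns.
int metaColumnsToSkip = HOODIE_META_COLUMNS.size() - metaColumnsInDataColumns;
buildRecordInPage(pageBuilder, record, partitionValueMap, metaColumnsToSkip);

public void buildRecordInPage(PageBuilder pageBuilder, IndexedRecord record,
        Map<Integer, String> partitionValueMap, int metaColumnsToSkip)
{
    pageBuilder.declarePosition();
    // Starting past exactly metaColumnsToSkip fields keeps the record fields
    // aligned with the page builder's blocks and declared types.
    int startChannel = metaColumnsToSkip;
    // ... build the remaining channels as before ...
}
```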

