Core: Interface based DataFile reader and writer API - PoC #12298

pvary · 2025-02-17T15:03:50Z

Here is what the PR does:

Created 3 interface classes which are implemented by the file formats:
- ReadBuilder - Builder for reading data from data files
- AppenderBuilder - Builder for writing data to data files
- ObjectModel - Providing ReadBuilders, and AppenderBuilders for the specific data file format and object model pair
Updated the Parquet, Avro, ORC implementation for this interfaces, and deprecated the old reader/writer APIs
Created interface classes which will be used by the actual readers/writers of the data files:
- AppenderBuilder - Builder for writing a file
- DataWriterBuilder - Builder for generating a data file
- PositionDeleteWriterBuilder - Builder for generating a position delete file
- EqualityDeleteWriterBuilder - Builder for generating an equality delete file
- No ReadBuilder here - the file format reader builder is reused
Created a WriterBuilder class which implements the interfaces above (AppenderBuilder/DataWriterBuilder/PositionDeleteWriterBuilder/EqualityDeleteWriterBuilder) based on a provided file format specific AppenderBuilder
Created an ObjectModelRegistry which stores the available ObjectModels, and engines and users could request the readers (ReadBuilder) and writers (AppenderBuilder/DataWriterBuilder/PositionDeleteWriterBuilder/EqualityDeleteWriterBuilder) from.
Created the appropriate ObjectModels:
- GenericObjectModels - for reading and writing Iceberg Records
- SparkObjectModels - for reading (vectorized and non-vectorized) and writing Spark InternalRow/ColumnarBatch objects
- FlinkObjectModels - for reading and writing Flink RowData objects
- An arrow object model is also registered for vectorized reads of Parquet files into Arrow ColumnarBatch objects
Updated the production code where the reading and writing happens to use the ObjectModelRegistry and the new reader/writer interfaces to access data files
Kept the testing code intact to ensure that the new API/code is not breaking anything

core/src/main/java/org/apache/iceberg/io/datafile/DataFileServiceRegistry.java

snazy · 2025-02-18T15:52:38Z

core/src/main/java/org/apache/iceberg/io/datafile/DataFileServiceRegistry.java

+    public Key(FileFormat fileFormat, String dataType, String builderType) {
+      this.fileFormat = fileFormat;
+      this.dataType = dataType;
+      this.builderType = builderType;


I was thinking of defining the default one using a priority (int) based approach and let the one with the highest priority be the default one. WDYT?

We have a concrete example for this: Comet vectorized parquet reader spark.sql.iceberg.parquet.reader-type

I think it is good if the reader/writer choice is a conscious decision, and not happening based on some behind the scenes algorithm.

+1 for simplicity. This code should not determine things like whether Comet is used. This should have a single purpose, which is to standardize how object models plug in.

Moved the config to properties, and the builder method will create the different readers based on this config

snazy · 2025-02-18T15:56:34Z

core/src/main/java/org/apache/iceberg/io/datafile/AppenderBuilder.java

+import org.apache.iceberg.io.FileAppender;
+
+/** Builder API for creating {@link FileAppender}s. */
+public interface AppenderBuilder extends InternalData.WriteBuilder {


Wonder that AppenderBuilder has a base interface but the other builders don't.
Guess it might help to have a common DataFileIoBuilder interface defining the common builder attributes (table, schema, properties, meta). It's a bit of an "adventure in Java generics", but doable.

If you take a look at the other PRs (#12164, #12069), you can see that first, I took that adventurous route, but the result was too many classes/interfaces and casts.

This PR is aiming for the minimal set of changes, and the InternalData.WriteBuilder is already introduced to Iceberg by #12060. We either need to widen that interface or inherit from it here.

I'm also confused by this inheritance. We're extending but overriding everything and it's not clear to me what we really gain by going with this approach. It looks like it ends up as a completely different builder that produces the same build result.

The goal with the PR was to show the minimal changes required to make the idea work.
We either create a different builder class for the InternalData.WriteBuilder and the DataFile.WriteBuilder, or we need to have inheritance of the interfaces.

Based on our discussion below we might end up using a different strategy, so revisit this comment later.

snazy · 2025-02-18T15:57:46Z

data/src/main/java/org/apache/iceberg/data/GenericReader.java

+    return DataFileServiceRegistry.read(
+            task.file().format(), Record.class.getName(), input, fileProjection, partition)
+        .split(task.start(), task.length())
+        .caseSensitive(caseSensitive)
+        .reuseContainers(reuseContainers)
+        .filter(task.residual())
+        .build();


I like these simplifications!

liurenjie1024

Thanks @pvary for this proposal, I left some comments.

liurenjie1024 · 2025-02-21T09:55:30Z

core/src/main/java/org/apache/iceberg/io/datafile/ReaderBuilder.java

+
+  /** Enables reusing the containers returned by the reader. Decreases pressure on GC. */
+  @Override
+  default ReaderBuilder reuseContainers() {


Seems it should not be here? These are parquet reader specific.

It is also used by Avro.
See:

iceberg/core/src/main/java/org/apache/iceberg/avro/AvroIterable.java

Line 51 in 604422b

this.reuseContainers = reuseContainers;

liurenjie1024 · 2025-02-21T09:57:56Z

core/src/main/java/org/apache/iceberg/io/datafile/DataFileServiceRegistry.java

+   * @param rowType of the native input data
+   * @return {@link DataWriterBuilder} for building the actual writer
+   */
+  public static <S> DataWriterBuilder dataWriterBuilder(


I don't quite understand in what case need this? I think append would be enough?

I will check this. We might be able to remove this.

Based on the current approach, the file format api implementation creates the appender, and the PR creates the writers for the different data/delete files

liurenjie1024 · 2025-02-21T09:59:32Z

core/src/main/java/org/apache/iceberg/io/datafile/DataFileServiceRegistry.java

+   * @return {@link AppenderBuilder} for building the actual writer
+   */
+  public static <S, B extends EqualityDeleteWriterBuilder<B>>
+      EqualityDeleteWriterBuilder<B> equalityDeleteWriterBuilder(


I don't think file format should consider eqaulity deletion/pos deletion here.

Current Avro positional delete writer behaves differently than Parquet/ORC positional delete writers.
In case of the positional delete files the schema provided to the Avro writer should omit the PATH and the POS fields, and only needs the actual table schema. The writer handles the PATH/POS fields by static code:

iceberg/core/src/main/java/org/apache/iceberg/avro/Avro.java

Lines 614 to 618 in b8fdd84

public void write(PositionDelete<D> delete, Encoder out) throws IOException {

PATH_WRITER.write(delete.path(), out);

POS_WRITER.write(delete.pos(), out);

rowWriter.write(delete.row(), out);

}

The Parquet/ORC positional delete writers behave in the same way. They expect the same input.

If we are ready for a more invasive change we can harmonize the writers.
I have aimed for a minimal changeset to allow easier acceptance for the PR.

The appender doesn't need to know about these, but the file formats and the writer implementations need this

liurenjie1024 · 2025-02-21T10:04:29Z

core/src/main/java/org/apache/iceberg/io/datafile/DataFileServiceRegistry.java

+   * issues.
+   */
+  private static final class Registry {
+    private static final Map<Key, ReaderService> READ_BUILDERS = Maps.newConcurrentMap();


This is more like a convention problem, I think maybe we just need to store in FileFormatService in registry?

Sometimes we don't have writers (arrow), or we have multiple readers vectorized/non-vectorized readers. Also Parquet has Comet reader. So I kept the writers and the readers separate

liurenjie1024 · 2025-02-21T10:05:38Z

core/src/main/java/org/apache/iceberg/io/datafile/DataFileServiceRegistry.java

+  /** Key used to identify readers and writers in the {@link DataFileServiceRegistry}. */
+  public static class Key {
+    private final FileFormat fileFormat;
+    private final String dataType;


Is this things like arrow, internal row?

Yeah,
Currenly we have:

Record - generic readers/writers

ColumnarBatch (arrow) - arrow

RowData - Flink

InternalRow - Spark

ColumnarBatch (spark) - Spark batch

pvary · 2025-02-21T14:10:03Z

I will start to collect the differences here between the different writer types (appender/dataWriter/equalityDeleteWriter/positionalDeleteWriter) for reference:

Writer context is different between delete and data files. This contains TableProperties/Configurations which could be different between delete and data files. For example for parquet: RowGroupSize/PageSize/PageRowLimit/DictSize/Compression etc. For ORC and Avro we have some similar changing configs
Specific writer functions for position deletes to write out the PositionDelete records
Positional delete PathTransformFunction to convert writer data type for the path to file format data type

danielcweeks · 2025-02-21T19:31:33Z

core/src/main/java/org/apache/iceberg/io/datafile/DataWriterBuilder.java

+import org.apache.iceberg.io.DataWriter;
+
+/** Builder API for creating {@link DataWriter}s. */
+public interface DataWriterBuilder {


Why not put the builder interface into the DataWriter class and put it in the same package? It seems odd to me that we're introducing this new datafile package.

The classes which has to be implemented by the file formats are kept in the io package, but moved the others to the data package

core/src/main/java/org/apache/iceberg/avro/Avro.java

core/src/main/java/org/apache/iceberg/io/datafile/DataFileServiceRegistry.java

...urces/META-INF/services/org.apache.iceberg.io.datafile.DataFileServiceRegistry$WriterService

rdblue · 2025-02-22T00:10:36Z

While I think the goal here is a good one, the implementation looks too complex to be workable in its current form.

The primary issue that we currently have is adapting object models (like Iceber's internal StructLike, Spark's InternalRow, or Flink's RowData) to file formats so that you can separately write object model to format glue code and have it work throughout support for an engine. I think a diff from the InternalData PR demonstrates it pretty well:

-    switch (format) {
-      case AVRO:
-        AvroIterable<ManifestEntry<F>> reader =
-            Avro.read(file)
-                .project(ManifestEntry.wrapFileSchema(Types.StructType.of(fields)))
-                .createResolvingReader(this::newReader)
-                .reuseContainers()
-                .build();
+    CloseableIterable<ManifestEntry<F>> reader =
+        InternalData.read(format, file)
+            .project(ManifestEntry.wrapFileSchema(Types.StructType.of(fields)))
+            .reuseContainers()
+            .build();
 
-        addCloseable(reader);
+    addCloseable(reader);
 
-        return CloseableIterable.transform(reader, inheritableMetadata::apply);
+    return CloseableIterable.transform(reader, inheritableMetadata::apply);
-
-      default:
-        throw new UnsupportedOperationException("Invalid format for manifest file: " + format);
-    }

This shows:

Rather than a switch, the format is passed to create the builder
There is no longer a callback passed to create readers for the object model (createResolvingReader)

In this PR, there are a lot of other changes as well. I'm looking at one of the simpler Spark cases in the row reader.

The builder is initialized from DataFileServiceRegistry and now requires a format, class name, file, projection, and constant map:

    return DataFileServiceRegistry.readerBuilder(
            format, InternalRow.class.getName(), file, projection, idToConstant)

There are also new static classes in the file. Each creates a new service and each service creates the builder and object model:

  public static class AvroReaderService implements DataFileServiceRegistry.ReaderService {
    @Override
    public DataFileServiceRegistry.Key key() {
      return new DataFileServiceRegistry.Key(FileFormat.AVRO, InternalRow.class.getName());
    }

    @Override
    public ReaderBuilder builder(
        InputFile inputFile,
        Schema readSchema,
        Map<Integer, ?> idToConstant,
        DeleteFilter<?> deleteFilter) {
      return Avro.read(inputFile)
          .project(readSchema)
          .createResolvingReader(schema -> SparkPlannedAvroReader.create(schema, idToConstant));
    }

The createResolvingReader line is still there, just moved into its own service class instead of in branches of a switch statement.

In addition, there are now a lot more abstractions:

A builder for creating an appender for a file format
A builder for creating a data file writer for a file format
A builder for creating an equality delete writer for a file format
A builder for creating a position delete writer for a file format
A builder for creating a reader for a file format
A "service" registry (what is a service?)
A "key"
A writer service
A reader service

I think that the next steps are to focus on making this a lot simpler, and there are some good ways to do that:

Focus on removing boilerplate and hiding the internals. For instance, Key, if needed, should be an internal abstraction and not complexity that is exposed to callers
The format-specific data and delete file builders typically wrap an appender builder. Is there a way to handle just the reader builder and appender builder?
Is the extra "service" abstraction helpful?
Remove ServiceLoader and use a simpler solution. I think that formats could simply register themselves like we do for InternalData. I think it would be fine to have a trade-off that Iceberg ships with a list of known formats that can be loaded, and if you want to replace that list it's at your own risk.
Standardize more across the builders for FileFormat. How idToConstant is handled is a good example. That should be passed to the builder instead of making the whole API more complicated. Projection is the same.

pvary · 2025-02-24T10:34:16Z

While I think the goal here is a good one, the implementation looks too complex to be workable in its current form.

I'm happy that we agree with the goals. I created a PR to start the conversation. If there are willing reviewers we can introduce more invasive changes to archive a better API. I'm all for it!

The primary issue that we currently have is adapting object models (like Iceber's internal StructLike, Spark's InternalRow, or Flink's RowData) to file formats so that you can separately write object model to format glue code and have it work throughout support for an engine.

I think we need to keep this direct transformations to prevent the performance loss which would be caused by multiple transformations between object model -> common model -> file format.

We have a matrix of transformation which we need to encode somewhere:

Source	Target
Parquet	StructLike
Parquet	InternalRow
Parquet	RowData
Parquet	Arrow
Avro	...
ORC	...

[..]

Rather than a switch, the format is passed to create the builder

There is no longer a callback passed to create readers for the object model (createResolvingReader)

The InternalData reader has one advantage over the data file readers/writers. The internal object model is static for these readers/writers. For the DataFile readers/writers we have multiple object models to handle.

[..]
I think that the next steps are to focus on making this a lot simpler, and there are some good ways to do that:

Focus on removing boilerplate and hiding the internals. For instance, Key, if needed, should be an internal abstraction and not complexity that is exposed to callers

If we allow adding new builders for the file formats we can remove a good chunk of the boilerplate code. Let me see how this would look like

The format-specific data and delete file builders typically wrap an appender builder. Is there a way to handle just the reader builder and appender builder?

We need to refactor the Avro positional delete write for this, or add a positionalWriterFunc. Also need to consider that the format specific configurations which are different for the appenders and the delete files (DELETE_PARQUET_ROW_GROUP_SIZE_BYTES vs. PARQUET_ROW_GROUP_SIZE_BYTES)

Is the extra "service" abstraction helpful?

If we are ok with having a new Builder for the readers/writers, then we don't need the service. It was needed to keep the current APIs and the new APIs compatible.

Remove ServiceLoader and use a simpler solution. I think that formats could simply register themselves like we do for InternalData. I think it would be fine to have a trade-off that Iceberg ships with a list of known formats that can be loaded, and if you want to replace that list it's at your own risk.

Will do

Standardize more across the builders for FileFormat. How idToConstant is handled is a good example. That should be passed to the builder instead of making the whole API more complicated. Projection is the same.

Will see what could be arcived

core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java

core/src/main/java/org/apache/iceberg/formats/ContentFileWriteBuilder.java

core/src/main/java/org/apache/iceberg/avro/AvroFormatModel.java

rdblue · 2025-11-20T22:55:45Z

core/src/main/java/org/apache/iceberg/formats/ReadBuilder.java

+   * @param value config value
+   * @return this for method chaining
+   */
+  ReadBuilder<D, S> set(String key, String value);


Looks like both set and setAll are not called. Are they really needed in this API?

If they are not used, then I'd prefer to add them later if and when they are necessary.

They are used on the Parquet.ReadBuilder. So they will be used once we migrate all of the use-cases.

iceberg/spark/v4.0/spark/src/jmh/java/org/apache/iceberg/spark/data/parquet/SparkParquetReadersFlatDataBenchmark.java

Lines 151 to 162 in 0d4d3a5

Parquet.read(Files.localInput(dataFile))

.project(SCHEMA)

.readSupport(new ParquetReadSupport())

.set("org.apache.spark.sql.parquet.row.requested_schema", sparkSchema.json())

.set("spark.sql.parquet.binaryAsString", "false")

.set("spark.sql.parquet.int96AsTimestamp", "false")

.set("spark.sql.caseSensitive", "false")

.set("spark.sql.parquet.fieldId.write.enabled", "false")

.set("spark.sql.parquet.inferTimestampNTZ.enabled", "false")

.set("spark.sql.legacy.parquet.nanosAsLong", "false")

.callInit()

.build()) {

core/src/main/java/org/apache/iceberg/formats/ReadBuilder.java

spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkFormatModels.java

rdblue · 2025-11-20T23:56:56Z

spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkFormatModels.java

+        new ParquetFormatModel<ColumnarBatch, StructType, DeleteFilter<InternalRow>>(
+            VectorizedSparkParquetReaders.CometColumnarBatch.class,
+            StructType.class,
+            VectorizedSparkParquetReaders::buildCometReader));


Is this correct? I think this should be calling VectorizedSparkParquetReaders.buildReader instead of buildCometReader. That is the one that checks the incoming config and chooses whether to use Comet or the regular Arrow reader.

I also think that the buildCometReader method that this references isn't needed because this is the only place where it is called and it ignores the config map passed in.

Okay, I went to investigate where config was used and then argue that it should be removed. The replacement is to register a different type and I see that's actually what is happening here. This is correct because the registered output type is CometColumnarBatch.

So the real problem is that there is an extra and unnecessary argument passed to these methods, the config string map. Registering a different batch type is the right way to handle the registration, so that Spark can use the registered reader and make the decision about using Arrow vs Comet entirely independent of the file format models.

This one is a questionable decision. I didn't intend to put this commit on this branch. I just created it to test how it could work.

Currently (on main) the Comet/Arrow decision is done in the BaseBatchReader:

iceberg/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java

Lines 96 to 105 in 0d4d3a5

.createBatchedReaderFunc(

fileSchema -> {

if (parquetConf.readerType() == ParquetReaderType.COMET) {

return VectorizedSparkParquetReaders.buildCometReader(

requiredSchema, fileSchema, idToConstant, deleteFilter);

} else {

return VectorizedSparkParquetReaders.buildReader(

requiredSchema, fileSchema, idToConstant, deleteFilter);

}

})

My original idea was to add another parameter to the FormatModelRegistry.readBuilder method, like readBuilder(returnType, readerType, inputFile), and based on the readerType it could chose the Arrow, or the Comet reader.
Since this was only used for Spark/Parquet, your suggestion was to hide it behind a config, and push this decision to the VectorizedSparkParquetReaders.buildReader. This is how I wanted to keep this PR.

I was playing around how can I change this, and use the currently proposed API to move this decision back to the caller. That is why I experimented with a Hacky solution to register the Comet vectorized reader to the File Format API. I did not like what I have seen. Mostly because I had to create a CometColumnarBatch for the sole reason to differentiate between the Arrow and the Comet reader. So I abandoned the idea but forgot to revert the commit 😢

rdblue · 2025-11-20T23:58:31Z

spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkFileWriterFactory.java


+  @Override
+  public PositionDeleteWriter<InternalRow> newPositionDeleteWriter(
+      EncryptedOutputFile file, PartitionSpec spec, StructLike partition) {


This is only here to preserve the existing functionality that we are not moving into the new file interfaces, right?

Yes. This will be removed in 1.12.0.

rdblue · 2025-11-21T00:11:23Z

...rk/src/main/java/org/apache/iceberg/spark/data/vectorized/VectorizedSparkParquetReaders.java

+      MessageType fileSchema,
+      Map<Integer, ?> idToConstant,
+      DeleteFilter<InternalRow> deleteFilter,
+      Map<String, String> config) {


As I noted in SparkFormatModels, I think that this config is no longer needed becuase this buildReader method is not used.

Let's talk about it at the other place: #12298 (comment)

rdblue · 2025-11-21T00:14:39Z

parquet/src/main/java/org/apache/iceberg/parquet/ParquetFormatModel.java

+        MessageType messageType,
+        Map<Integer, ?> constantValues,
+        F deleteFilter,
+        Map<String, String> config);


This should be removed (see previous comments).

Let's talk about it at the other place: #12298 (comment)

rdblue · 2025-11-21T00:20:26Z

parquet/src/main/java/org/apache/iceberg/parquet/ParquetFormatModel.java

+        Schema schema,
+        MessageType messageType,
+        Map<Integer, ?> constantValues,
+        F deleteFilter,


I don't think that the delete filter should be part of this interface. I realize that it is currently part of how vectorized readers are built, but the row deletes should be independent of the read plan and reader. That also simplifies the types here because you don't need a specific filter type (which would probably need to be changed to DeleteFilter<D> anyway).

The delete filter extracts information from each row, so it does not require being part of the reader or vectorized reader. It can and should be applied by engines rather than being passed through this API.

There are several uses for the DeleteFilter in vectorized reads:

Handling the _deleted metadata column when needed

RowId mapping to delete the rows

Maintaining the counter for the delete records for metrics

This is most probably a quite sizeable change.
I will check what I can do with it

Created a PR to see how this could work: #14652

rdblue · 2025-11-21T00:32:25Z

parquet/src/main/java/org/apache/iceberg/parquet/ParquetFormatModel.java

+    }
+  }
+
+  private abstract static class ReadBuilderWrapper<D, S, F> implements ReadBuilder<D, S> {


I think this can be simplified quite a bit:

The batch reader function and the row reader function can be combined because the delete filter and config do not need to be passed through. That means the sub-classes of this builder are not needed

Without subclasses, this does not need to expose methods for fetching its state. I also refactored this class and concrete children locally and removed the need to expose its state, but it's easier to just remove the subclasses.

This doesn't need to track the Iceberg schema. Instead, this should register a binary reader function with Parquet so that Parquet is responsible for passing the Iceberg schema.

Also, shouldn't this account for the DF schema?

I know that it isn't used at the moment, but the engine type is a type param here and we have a use case where it will be needed later (requesting shredded variant reads) so I don't think this is complete without it.

This depends on #12298 (comment) and #12298 (comment)

Otherwise the differences are bigger

Most of this became irrelevant after the DeleteFilter change (#14065)

These points might still be worth discussing:

This doesn't need to track the Iceberg schema. Instead, this should register a binary reader function with Parquet so that Parquet is responsible for passing the Iceberg schema.

Are you suggesting updating the underlying Parquet class to add a method like:
ReadBuilder.createBatchedReaderFunc(BiFunction<Schema, MessageType, VectorizedReader<?>> newReaderFunction)?
I’ve made this change, but it turned out to be a bit more involved to keep it consistent with Parquet.BinaryReaderFunction. Please, review!

Also, shouldn't this account for the DF schema?

I originally included it, but several reviewers noted that it’s not currently used. So, asked to remove it for now and add it back when needed. I’m open to either approach.

parquet/src/main/java/org/apache/iceberg/parquet/ParquetFormatModel.java

rdblue · 2025-11-21T00:54:31Z

parquet/src/main/java/org/apache/iceberg/parquet/ParquetFormatModel.java

+    this(type, schemaType, readerFunction, null, writerFunction);
+  }
+
+  public ParquetFormatModel(


It makes sense to me to have a model that is for batch reading. But why not also have options for reading a single row format or writing a single row format?

For that matter, maybe reads and writes should be registered separately?

The early decision was that we don't allow only readers/writers. The File Format needs to implement readers and writers both to be accepted as a supported File Format.

I see the vectorized readers as an exception.

…quetFormatModel.ReadBuilderWrapper

github-actions bot added spark parquet arrow core data flink ORC labels Feb 17, 2025

pvary force-pushed the file_Format_api_without_base branch 6 times, most recently from 907089c to 313c2d5 Compare February 18, 2025 09:13

snazy reviewed Feb 18, 2025

View reviewed changes

pvary force-pushed the file_Format_api_without_base branch 2 times, most recently from c528a52 to 9975b4f Compare February 20, 2025 09:45

pvary changed the title ~~WIP: Interface based FileFormat API~~ WIP: Interface based DataFile reader and writer API Feb 20, 2025

liurenjie1024 reviewed Feb 21, 2025

View reviewed changes

danielcweeks reviewed Feb 21, 2025

View reviewed changes

core/src/main/java/org/apache/iceberg/avro/Avro.java Outdated Show resolved Hide resolved

danielcweeks reviewed Feb 21, 2025

View reviewed changes

core/src/main/java/org/apache/iceberg/io/datafile/DataFileServiceRegistry.java Outdated Show resolved Hide resolved

rdblue reviewed Feb 21, 2025

View reviewed changes

...urces/META-INF/services/org.apache.iceberg.io.datafile.DataFileServiceRegistry$WriterService Outdated Show resolved Hide resolved

pvary force-pushed the file_Format_api_without_base branch 5 times, most recently from c488d32 to 71ec538 Compare February 25, 2025 16:53

pvary force-pushed the file_Format_api_without_base branch 3 times, most recently from cb12c93 to e6c6147 Compare November 12, 2025 17:08

pvary mentioned this pull request Nov 14, 2025

Spark: Encapsulate parquet objects for Comet #13786

Open