Commit
[SPARK-45815][SQL][STREAMING] Provide an interface for other Streaming sources to add `_metadata` columns

### What changes were proposed in this pull request?
Currently, only the native V1 file-based streaming source can read the `_metadata` column: https://github.com/apache/spark/blob/370870b7a0303e4a2c4b3dea1b479b4fcbc93f8d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala#L63

Our goal is to create an interface that allows other streaming sources to add `_metadata` columns. For instance, we would like the Delta Streaming source, which you can find here: https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaDataSource.scala#L49, to extend this interface and provide the `_metadata` column for its underlying storage format, such as Parquet.

### Why are the changes needed?
A generic interface to enable other streaming sources to expose and add `_metadata` columns.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43692 from Yaohua628/spark-45815.

Authored-by: Yaohua Zhao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
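
### Illustrative sketch
The opt-in idea described under "What changes were proposed" can be pictured with a minimal, Spark-free sketch. Every name below (`ProvidesMetadataColumns`, `StreamingSource`, `ParquetBackedSource`, `outputColumns`) is hypothetical and is not the interface this patch adds. The sketch only assumes that a source declares the fields of its `_metadata` struct and that the planner appends the column for sources that opted in.

```scala
// Hypothetical, Spark-free sketch: none of these names are Spark or Delta APIs.
object MetadataSketch {
  // One field inside the `_metadata` struct, e.g. file_path: string.
  final case class MetadataField(name: String, dataType: String)

  // Base type for any streaming source in this toy model.
  trait StreamingSource {
    def dataColumns: Seq[String]
  }

  // A source opts in by mixing in this trait and listing its metadata fields.
  trait ProvidesMetadataColumns {
    def metadataFields: Seq[MetadataField]
  }

  // Stand-in for a Delta-like source whose underlying files are Parquet.
  object ParquetBackedSource extends StreamingSource with ProvidesMetadataColumns {
    val dataColumns: Seq[String] = Seq("id", "value")
    val metadataFields: Seq[MetadataField] = Seq(
      MetadataField("file_path", "string"),
      MetadataField("file_size", "long")
    )
  }

  // A source that did not opt in.
  object PlainSource extends StreamingSource {
    val dataColumns: Seq[String] = Seq("id")
  }

  // Planner side: append `_metadata` only for sources that opted in.
  def outputColumns(source: StreamingSource): Seq[String] = source match {
    case m: ProvidesMetadataColumns if m.metadataFields.nonEmpty =>
      source.dataColumns :+ "_metadata"
    case _ =>
      source.dataColumns
  }
}
```

With this shape, `MetadataSketch.outputColumns(MetadataSketch.ParquetBackedSource)` includes `_metadata`, while `MetadataSketch.PlainSource` keeps only its data columns, so sources that do not opt in are unaffected.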