Write lineage metadata to centralized file on hive #856

Open
zacaydcloudera opened this issue Jan 28, 2025 · 0 comments

Hi @wajda,
I used the following config on Cloudera, with HDFS as the storage:
spark.jars=hdfs:///tmp/spark-2.4-spline-agent-bundle_2.11-2.2.1.jar
spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.mode=ENABLED
spark.spline.lineageDispatcher=hdfs
spark.spline.lineageDispatcher.hdfs.outputDir=hdfs:///tmp/spline/lineage/
spark.spline.lineageDispatcher.hdfs.fileNamePrefix=lineage_
spark.spline.lineageDispatcher.hdfs.fileBufferSize=4096
spark.spline.lineageDispatcher.hdfs.filePermissions=777
spark.driver.memory=4g

However, the lineage was written to the target file location used in the script; the agent seems to ignore spark.spline.lineageDispatcher.hdfs.outputDir=hdfs:///tmp/spline/lineage/.
Is there a way to write the lineage to a centralized location, and to name each JSON file after its execution_plan_id?
Thanks
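
To make the request concrete, here is a minimal sketch of the desired result as a post-processing step. It is not a Spline feature. It assumes the lineage files are JSON, that they all share one file name (`_LINEAGE` here, an assumption), that the plan ID sits under an `id` key either at the top level or inside an `executionPlan` object (also an assumption), and that the files are reachable through a local or mounted path rather than the HDFS API. The function name `centralize_lineage` and its parameters are hypothetical.

```python
import json
import shutil
from pathlib import Path


def centralize_lineage(source_root, central_dir, file_name="_LINEAGE", id_key="id"):
    """Copy every lineage file found under source_root into central_dir.

    Each copy is named lineage_<plan id>.json. The defaults for file_name
    and id_key are assumptions about the agent's output, not documented facts.
    """
    central = Path(central_dir)
    central.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_root).rglob(file_name)):
        with path.open(encoding="utf-8") as fh:
            doc = json.load(fh)
        if not isinstance(doc, dict):
            continue  # not the JSON object layout this sketch expects
        # Assumed layout: the plan is either nested or is the document itself.
        plan = doc.get("executionPlan", doc)
        plan_id = plan.get(id_key) if isinstance(plan, dict) else None
        if plan_id is None:
            continue  # no plan ID found, so leave the file where it is
        target = central / f"lineage_{plan_id}.json"
        shutil.copyfile(path, target)
        copied.append(target)
    return copied
```

Calling `centralize_lineage("/data/output", "/data/spline/lineage")` (both paths hypothetical) would leave the original files in place and collect one renamed copy per plan in the central directory.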

@cerveada cerveada added this to Spline Jan 28, 2025
@github-project-automation github-project-automation bot moved this to New in Spline Jan 28, 2025