I am currently testing the Spark lineage feature in OpenMetadata with MySQL as the database. I have successfully created some simple Spark lineage pipelines. However, I've noticed that when a job contains complex transformations such as GROUP BY, window functions, etc., lineage generation fails and I get an error similar to the one below:
Py4JJavaError: An error occurred while calling o156.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError
java.lang.StackOverflowError
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1428)
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
...
I'm wondering whether there are any specific constraints or best practices that Spark jobs must follow when using Spark lineage in OpenMetadata; I couldn't find detailed documentation on this.
Has anyone encountered this issue before? Any insights on how to resolve it?
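One thing I am experimenting with as a workaround: a `StackOverflowError` during task serialization generally means the JVM exhausted its thread stack while walking a deeply nested object graph, and complex plans (GROUP BY, window functions) tend to deepen that graph. Raising the JVM thread stack size is a generic Spark-side mitigation, not an OpenMetadata-specific or confirmed fix; the `-Xss` value and the script name below are just examples:

```shell
# Increase driver and executor thread stack size (JVM default is often ~512k-1m).
# These are standard Spark JVM options, not OpenMetadata settings.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Xss16m" \
  --conf "spark.executor.extraJavaOptions=-Xss16m" \
  my_lineage_job.py
```

I would still like to know whether the lineage listener itself imposes limits on plan complexity.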