fix(ingest): Fix JSON serialization for reports with tuple/enum keys #15450
+262
−10
What
Fixes JSON serialization error when IngestionStageReport (and other reports) contain tuple or enum dictionary keys.
Why
The DataHub GC source and other sources using stage tracking fail with:
This occurs because:
- IngestionStageReport.ingestion_stage_durations uses tuple keys: (IngestionHighStage, str)
- IngestionStageReport.ingestion_high_stage_seconds uses enum keys: IngestionHighStage
- Report.to_pure_python_obj() doesn't properly convert these to JSON-compatible strings

The JSON specification requires object keys to be strings. Python's json module converts int, float, bool, and None keys to strings automatically, but tuple and enum keys must be explicitly converted.
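The failure mode and the explicit key conversion described above can be sketched as follows. This is a minimal, standalone illustration, not the code from this PR: `HighStage`, `key_to_str`, and `to_jsonable` are hypothetical names, and the comma-joined format for tuple keys is an assumption chosen only for the example.

```python
import json
from enum import Enum
from typing import Any


class HighStage(Enum):
    """Hypothetical stand-in for IngestionHighStage."""

    PROFILING = "Profiling"


def key_to_str(key: Any) -> Any:
    """Hypothetical helper: turn a dict key into one json.dumps accepts."""
    if isinstance(key, Enum):
        # Prefer the enum's .value; stringify it if it is not a JSON-friendly primitive.
        value = key.value
        return value if isinstance(value, (str, int, float, bool)) else str(value)
    if isinstance(key, tuple):
        # Assumption: join the converted parts with ", ". Any stable format would do.
        return ", ".join(str(key_to_str(part)) for part in key)
    if key is None or isinstance(key, (str, int, float, bool)):
        return key  # json.dumps already handles these key types
    return str(key)


def to_jsonable(obj: Any) -> Any:
    """Recursively convert keys and enum values so the result is JSON-serializable."""
    if isinstance(obj, dict):
        return {key_to_str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, Enum):
        return obj.value
    return obj


durations = {(HighStage.PROFILING, "table_a"): 1.5}  # tuple keys
seconds = {HighStage.PROFILING: 2.0}  # enum keys

# Dumping the raw dict raises TypeError because of the tuple key.
try:
    json.dumps(durations)
    raw_dump_failed = False
except TypeError:
    raw_dump_failed = True

# After conversion, the same data serializes cleanly.
serialized = json.dumps(to_jsonable({"durations": durations, "seconds": seconds}))
```

Converting keys before calling `json.dumps` is needed because the `default=` hook of `json.dumps` applies only to values, never to dictionary keys.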
How
Updated Report.to_pure_python_obj() method in metadata-ingestion/src/datahub/ingestion/api/report.py:
- .value if available, fallback to string
Testing
Unit Tests Added
- TopKDict with tuple keys
- IngestionStageReport serialization
Related Issues
Fixes #15445