-
Notifications
You must be signed in to change notification settings - Fork 3.3k
feat(ingest): Add OAuth Callback Support for Kafka Sink Producer Config #15444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
max-datahub
wants to merge
6
commits into
datahub-project:master
Choose a base branch
from
max-datahub:feat/kafka-producer-oauth-cb-support
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
feat(ingest): Add OAuth Callback Support for Kafka Sink Producer Config #15444
max-datahub
wants to merge
6
commits into
datahub-project:master
from
max-datahub:feat/kafka-producer-oauth-cb-support
+109
−73
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Enable MSK IAM authentication and other OAuth mechanisms for Kafka sinks by adding oauth_cb resolution to producer_config, matching the existing consumer_config functionality. Changes: - Add @field_validator for producer_config in KafkaProducerConnectionConfig to resolve oauth_cb string paths to callable functions - Add unit tests for producer oauth_cb validation and resolution - Update documentation with producer oauth_cb examples including MSK IAM Fixes issue where datahub-kafka sink would fail with 'expected oauth_cb property as a callable function' when trying to use MSK IAM authentication. The fix reuses the existing CallableConsumerConfig class, maintaining consistency between consumer and producer authentication patterns.
…tion - Create _resolve_kafka_oauth_callback() helper to eliminate code duplication - Both KafkaConsumerConnectionConfig and KafkaProducerConnectionConfig now use the same helper - Simplify unit tests to avoid duplicating OAuth validation tests already covered by Kafka source tests - Add documentation to the helper function explaining its purpose This refactoring improves code maintainability by having a single source of truth for OAuth callback resolution logic.
1. Refactor tests for clarity (addresses review feedback): - Rename test_kafka_sink_producer_config_accepts_oauth_cb → test_kafka_sink_producer_config_without_oauth_cb - Add test_kafka_sink_producer_config_with_oauth_cb to verify oauth_cb resolution works - Keep test_kafka_sink_oauth_cb_rejects_callable for edge case validation - Now have clear separation: backward compatibility test, positive test, negative test 2. Enhance OAuth callback deployment documentation: - Add explicit pip install command for acryl-datahub-actions>=1.3.1.2 - Document built-in callbacks (MSK IAM, Azure Event Hubs) with package requirements - Add PYTHONPATH configuration examples for custom callbacks Test coverage: - ✅ Standard config without oauth_cb works (no breaking change) - ✅ OAuth callback string path resolves to callable (new functionality) - ✅ Direct callable objects are rejected (edge case validation)
…llback resolver This change completes OAuth support for Kafka producers by adding proper initialization and refactoring the OAuth callback resolver class for clarity. ## Changes: ### 1. Rename CallableConsumerConfig → KafkaOAuthCallbackResolver - Renamed class for better clarity and generalization - Updated all references across codebase (kafka.py, kafka.py source, etc.) - More accurately reflects that this resolver works for both consumers and producers ### 2. Producer OAuth Initialization (kafka_emitter.py) - Added OAuth callback initialization in DatahubKafkaEmitter.__init__() - Calls producer.poll(0) on startup when OAuth is detected - Critical for OAuth mechanisms like AWS MSK IAM where tokens must be obtained before first message send - Without this, producers fail with authentication errors ### 3. Kafka Callback Error Handling Fix (datahub_kafka.py) - Fixed _KafkaCallback.kafka_callback signature to match actual Kafka callback - Changed from (err: Optional[Exception], msg: str) to (err, msg) - Properly handles KafkaError and Message objects from confluent-kafka - Converts KafkaError to Exception for consistent error handling downstream ### 4. Documentation Simplification - Simplified custom OAuth callback deployment guidance - Removed verbose examples, kept essential information - Easier to maintain and understand ## Context: This builds on previous work that added oauth_cb validation for producer_config. Together, these changes enable full MSK IAM authentication support for Kafka sinks, matching the existing functionality available for Kafka sources. ## Testing: ✅ End-to-end: File→Kafka sink with MSK IAM OAuth (8 events in 0.91s) ✅ Unit tests: OAuth producer validation tests pass ✅ Linting: ruff checks pass ✅ Existing tests: All Kafka source OAuth tests continue to pass
- Add unit tests for OAuth producer initialization in DatahubKafkaEmitter - Test verifies poll(0) is called when OAuth is configured - Test verifies poll(0) is NOT called without OAuth (backwards compatibility) - Add type annotations to kafka_callback (err: Any, msg: Any) to satisfy mypy - All linting passes (ruff check, ruff format, mypy) - All tests pass (2 new tests added) This closes the test coverage gap identified in the backwards compatibility review.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
community-contribution
PR or Issue raised by member(s) of DataHub Community
ingestion
PR or Issue related to the ingestion of metadata
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add OAuth Callback Support for Kafka Sink Producer Config
Summary
This PR completes OAuth callback support for Kafka producers, enabling MSK IAM authentication and custom OAuth mechanisms for Kafka sinks. This matches the existing functionality available for Kafka sources (consumers).
Problem Statement
Customers using Kafka sinks with MSK IAM authentication were encountering:
The
oauth_cbparameter validation and resolution existed only for consumer configs, but DataHub's Kafka sink uses producer configs. This caused authentication failures when trying to use MSK IAM or custom OAuth with Kafka sinks.Changes
1. Rename
CallableConsumerConfig→KafkaOAuthCallbackResolverkafka.py,kafka.pysource,kafka_emitter.py)kafka_consumer_config.py,kafka.py,ingestion/source/kafka/kafka.py2. Producer OAuth Initialization (
kafka_emitter.py)DatahubKafkaEmitter.__init__()producer.poll(0)on startup when OAuth is detected3. Kafka Callback Error Handling Fix (
datahub_kafka.py)_KafkaCallback.kafka_callbacksignature to match actual Kafka delivery callback(err: Optional[Exception], msg: str)to(err, msg)KafkaErrorandMessageobjects fromconfluent-kafkaKafkaErrortoExceptionfor consistent error handling downstreamNoneType: Noneerrors in sink callbacks4. Documentation Simplification
kafka.mdWhy Producer Initialization is Required
The initial
poll(0)call inDatahubKafkaEmitter.__init__()is not just validation - it's required for OAuth authentication to work:poll()is called on the producerpoll()is called, the producer doesn't have an authentication token and can't connect to the brokerpoll(0)forces the OAuth callback to execute immediately, acquiring the auth token before any messages are sentWhy consumers don't need this: Consumers have an implicit polling loop for receiving messages, which automatically triggers the OAuth callback. Producers don't have this natural polling mechanism, so it must be explicit.
Testing
✅ Unit Tests
pytest tests/unit/test_kafka_sink.py -k oauth✅ End-to-End Testing
Successfully tested File → Kafka sink with MSK IAM OAuth:
Test environment: Kubernetes with IRSA (IAM Roles for Service Accounts), MSK with IAM authentication
✅ Linting
ruff checkpassesContext
This PR builds on previous work (PR #15420) that added
oauth_cbvalidation forproducer_config. Together, these changes enable complete MSK IAM authentication support for Kafka sinks, achieving parity with Kafka sources.Configuration Example
Kafka Sink with MSK IAM (now works):
Requirement:
pip install acryl-datahub-actions>=1.3.1.2Deployment Notes
Checklist