For context on getting started with ingestion, check out our metadata ingestion guide.
To install this plugin, run `pip install 'acryl-datahub[datahub-rest]'`.
Pushes metadata to DataHub using the GMS REST API. The advantage of the REST-based interface is that any errors are reported immediately.
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
source:
  # source configs

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
Note that a `.` is used to denote nested fields in the YAML recipe.
Field | Required | Default | Description |
---|---|---|---|
`server` | ✅ | | URL of DataHub GMS endpoint. |
`timeout_sec` | | 30 | Per-HTTP request timeout. |
`retry_max_times` | | 1 | Maximum times to retry if the HTTP request fails; the delay between retries increases exponentially. |
`retry_status_codes` | | [429, 502, 503, 504] | Retry the HTTP request on these status codes as well. |
`token` | | | Bearer token used for authentication. |
`extra_headers` | | | Extra headers which will be added to the request. |
`max_threads` | | 1 | Experimental: max parallelism for REST API calls. |
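As an illustrative sketch of how the optional fields above fit together (the token and header values are placeholders, not a definitive configuration):

```yaml
source:
  # source configs

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
    token: "<your-bearer-token>"  # placeholder; only needed if your GMS requires auth
    timeout_sec: 60
    retry_max_times: 3
    retry_status_codes: [429, 502, 503, 504]
    extra_headers:
      X-Custom-Header: "example-value"  # placeholder header
```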
For context on getting started with ingestion, check out our metadata ingestion guide.
To install this plugin, run `pip install 'acryl-datahub[datahub-kafka]'`.
Pushes metadata to DataHub by publishing messages to Kafka. The advantage of the Kafka-based interface is that it's asynchronous and can handle higher throughput.
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
source:
  # source configs

sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
```
Note that a `.` is used to denote nested fields in the YAML recipe.
Field | Required | Default | Description |
---|---|---|---|
`connection.bootstrap` | ✅ | | Kafka bootstrap URL. |
`connection.producer_config.<option>` | | | Passed to [confluent_kafka.SerializingProducer](https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.SerializingProducer). |
`connection.schema_registry_url` | ✅ | | URL of the schema registry being used. |
`connection.schema_registry_config.<option>` | | | Passed to [confluent_kafka.schema_registry.SchemaRegistryClient](https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.schema_registry.SchemaRegistryClient). |
The options in the producer config and schema registry config are passed through to the Kafka SerializingProducer and SchemaRegistryClient, respectively.
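For instance, here is a sketch of a recipe that passes SASL authentication options to the producer and basic-auth credentials to the schema registry. The option names are standard confluent-kafka settings; the host names and credential values are placeholders:

```yaml
source:
  # source configs

sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "broker:9092"  # placeholder broker address
      producer_config:
        # Standard confluent-kafka producer options, passed through as-is
        security.protocol: "SASL_SSL"
        sasl.mechanism: "PLAIN"
        sasl.username: "<username>"  # placeholder credential
        sasl.password: "<password>"  # placeholder credential
      schema_registry_url: "https://schema-registry:8081"
      schema_registry_config:
        # Standard SchemaRegistryClient option, passed through as-is
        basic.auth.user.info: "<username>:<password>"  # placeholder credential
```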
For a full example with a number of security options, see this example recipe.
If you've got any questions on configuring this sink, feel free to ping us on our Slack!