docs: Ingestion Source Docs Template (#4275)
* testing img.shield for status

* update to hyperlink

* changing link format

* adding status options

* updating prerequisites and quickstart

* update to ingestion docs

* updating template with collapse details

* adding linebreak between pip install commands

* Removed incomplete sentence

* typo fix

* pushing current changes

* testing logos in markdown table

* markdown table fix

* markdown table fix

* adding in additional logos

* transposing markdown table

* settling on final table format

* adding commented-out source template to sidebar.js

* moving reference sidebar and adding trailing comma

* fixing docs build
maggiehays authored Mar 30, 2022
1 parent 9ba3610 commit 0be0689
Showing 4 changed files with 183 additions and 4 deletions.
1 change: 1 addition & 0 deletions docs-website/sidebars.js
@@ -104,6 +104,7 @@ module.exports = {
{
Guides: [
"metadata-ingestion/adding-source",
//"metadata-ingestion/source-docs-template",
"docs/how/add-custom-ingestion-source",
"docs/how/add-custom-data-platform",
"docs/platform-instances",
34 changes: 30 additions & 4 deletions metadata-ingestion/README.md
@@ -1,17 +1,43 @@
# Intro to Metadata Ingestion
# Introduction to Metadata Ingestion

![Python version 3.6+](https://img.shields.io/badge/python-3.6%2B-blue)

This module hosts an extensible Python-based metadata ingestion system for DataHub.
This supports sending data to DataHub using Kafka or through the REST API.
It can be used through our CLI tool, with an orchestrator like Airflow, or as a library.
## Metadata Ingestion Sources

We apply a Support Status to each Metadata Source to help you understand the integration reliability at a glance.

![Certified](https://img.shields.io/badge/support%20status-certified-brightgreen): Certified Sources are well-tested and widely adopted by the DataHub Community. We expect the integration to be stable with few user-facing issues.

![Incubating](https://img.shields.io/badge/support%20status-incubating-blue): Incubating Sources are ready for DataHub Community adoption but have not been tested for a wide variety of edge cases. We eagerly solicit feedback from the Community to strengthen the connector; minor version changes may arise in future releases.

![Testing](https://img.shields.io/badge/support%20status-testing-lightgrey): Testing Sources are available for experimentation by DataHub Community members, but may change without notice.
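
The three badges above share one URL pattern and differ only in the status label and color. As a minimal sketch (the helper name `support_status_badge` is hypothetical and not part of DataHub), the Markdown for a badge could be generated like this:

```python
# Colors taken from the badge URLs used in this README.
STATUS_COLORS = {
    "certified": "brightgreen",
    "incubating": "blue",
    "testing": "lightgrey",
}


def support_status_badge(status: str) -> str:
    """Return the Markdown image for a support status badge (hypothetical helper)."""
    key = status.lower()
    if key not in STATUS_COLORS:
        raise ValueError(f"unknown support status: {status!r}")
    url = f"https://img.shields.io/badge/support%20status-{key}-{STATUS_COLORS[key]}"
    return f"![{key.capitalize()}]({url})"


print(support_status_badge("Certified"))
```

Keeping the status-to-color mapping in one place avoids the badge colors drifting apart across source docs.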

### Supported Metadata Ingestion Sources

#### Dataset Sources/SQL Sources

| Support Status | Dataset Sources/SQL Sources |
| --- | --- |
| ![Certified](https://img.shields.io/badge/support%20status-certified-brightgreen) | <img src="/docs-website/static/img/logos/platforms/athena.svg" alt="Athena" width="30"/> |
| ![Incubating](https://img.shields.io/badge/support%20status-incubating-blue) | <img src="/docs-website/static/img/logos/platforms/elasticsearch.svg" alt="Elasticsearch" width="30"/> |
| ![Testing](https://img.shields.io/badge/support%20status-testing-lightgrey) | |

#### BI Tools

| Support Status | BI Tools |
| --- | --- |
| ![Certified](https://img.shields.io/badge/support%20status-certified-brightgreen) | <img src="/docs-website/static/img/logos/platforms/looker.svg" alt="Looker" width="30"/> <img src="/docs-website/static/img/logos/platforms/superset.svg" alt="Superset" width="30"/> |
| ![Incubating](https://img.shields.io/badge/support%20status-incubating-blue) | <img src="/docs-website/static/img/logos/platforms/metabase.svg" alt="Metabase" width="30"/> |
| ![Testing](https://img.shields.io/badge/support%20status-testing-lightgrey) | <img src="/docs-website/static/img/logos/platforms/tableau.png" alt="Tableau" width="30"/> |

## Getting Started

### Prerequisites

Before running any metadata ingestion job, make sure that the DataHub backend services are all running. If you are trying this out locally, see the [CLI guide](../docs/cli.md) to install the CLI and learn which options it offers. You can refer to the CLI usage guide there as you work through this page.

### Core Concepts

## Recipes

A recipe is a configuration file that tells our ingestion scripts where to pull data from (source) and where to put it (sink).
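
To make the source/sink split concrete, here is a minimal sketch of a recipe expressed as a Python dictionary, with a small check that both halves are present. The `source_name` and `option1` values are the placeholders from the docs template; the sink type, the server URL, and the `check_recipe` helper are assumptions for illustration only.

```python
from typing import Any, Dict

# A recipe as a plain dictionary. "source_name" and "option1" are template
# placeholders; the sink type and server URL are assumed example values.
recipe: Dict[str, Any] = {
    "source": {
        "type": "source_name",
        "config": {"option1": "value1"},
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}


def check_recipe(recipe: Dict[str, Any]) -> None:
    """Raise ValueError unless the recipe names both a source and a sink type."""
    for section in ("source", "sink"):
        block = recipe.get(section)
        if not isinstance(block, dict) or "type" not in block:
            raise ValueError(f"recipe needs a '{section}' section with a 'type'")


check_recipe(recipe)  # passes silently for the recipe above
```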
2 changes: 2 additions & 0 deletions metadata-ingestion/adding-source.md
@@ -51,6 +51,8 @@ Tests go in the `tests` directory. We use the [pytest framework](https://pytest.

### 7. Write docs

Create a copy of [`source-docs-template.md`](./source-docs-template.md) and edit all relevant components.

Add the plugin to the table under [CLI Sources List](../docs/cli.md#sources), and add the source's documentation underneath the [sources folder](https://github.com/datahub-project/datahub/tree/master/metadata-ingestion/source_docs).
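
The copy step described above could be scripted roughly as follows. This is a sketch under assumptions: the `copy_docs_template` helper is hypothetical, and naming the new file `<source_name>.md` inside `source_docs` is an assumed convention rather than something this guide specifies.

```python
import shutil
from pathlib import Path


def copy_docs_template(repo_root: Path, source_name: str) -> Path:
    """Copy the docs template into source_docs/ under an assumed file name."""
    ingestion_dir = repo_root / "metadata-ingestion"
    template = ingestion_dir / "source-docs-template.md"
    target = ingestion_dir / "source_docs" / f"{source_name}.md"
    if target.exists():
        # Refuse to overwrite docs that someone has already written.
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, target)
    return target
```

After copying, edit the new file by hand: set the support status badge, fill in the concept mapping, and replace the placeholder recipes.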

### 8. Add SQL Alchemy mapping (if applicable)
150 changes: 150 additions & 0 deletions metadata-ingestion/source-docs-template.md
@@ -0,0 +1,150 @@
# Source Name

<!-- Set Support Status -->
![Certified](https://img.shields.io/badge/support%20status-certified-brightgreen)
![Incubating](https://img.shields.io/badge/support%20status-incubating-blue)
![Testing](https://img.shields.io/badge/support%20status-testing-lightgrey)

## Integration Details

<!-- Plain-language description of what this integration is meant to do. -->
<!-- Include details about where metadata is extracted from (i.e., logs, source API, manifest, etc.) -->

### Concept Mapping

<!-- This should be a manual mapping of concepts from the source to the DataHub Metadata Model -->
<!-- Authors should provide as much context as possible about how this mapping was generated, including assumptions made, known shortcuts, & any other caveats -->

This ingestion source maps the following Source System Concepts to DataHub Concepts:

<!-- Remove all unnecessary/irrelevant DataHub Concepts -->

| Source Concept | DataHub Concept | Notes |
| -- | -- | -- |
| | [Data Platform](docs/generated/metamodel/entities/dataPlatform.md) | |
| | [Dataset](docs/generated/metamodel/entities/dataset.md) | |
| | [Data Job](docs/generated/metamodel/entities/dataJob.md) | |
| | [Data Flow](docs/generated/metamodel/entities/dataFlow.md) | |
| | [Chart](docs/generated/metamodel/entities/chart.md) | |
| | [Dashboard](docs/generated/metamodel/entities/dashboard.md) | |
| | [User (a.k.a CorpUser)](docs/generated/metamodel/entities/corpuser.md) | |
| | CorpGroup | |
| | Domain | |
| | Container | |
| | Tag | |
| | GlossaryTerm | |
| | GlossaryNode | |
| | Assertion | |
| | DataProcess | |
| | MlFeature | |
| | MlFeatureTable | |
| | MlModel | |
| | MlModelDeployment | |
| | MlPrimaryKey | |
| | SchemaField | |
| | DataHubPolicy | |
| | DataHubIngestionSource | |
| | DataHubSecret | |
| | DataHubExecutionRequest | |
| | DataHubRetention | |

### Supported Capabilities

<!-- This should be an auto-generated table of supported DataHub features/functionality -->
<!-- Each capability should link out to a feature guide -->

| Capability | Status | Notes |
| --- | :-: | --- |
| Data Container | | Enabled by default |
| Detect Deleted Entities | | Requires recipe configuration |
| Data Domain | | Requires transformer |
| Dataset Profiling | | Requires `acryl-datahub[source-usage-name]` |
| Dataset Usage | | Requires `acryl-datahub[source-usage-name]` |
| Extract Descriptions | | Enabled by default |
| Extract Lineage | | Enabled by default |
| Extract Ownership | | Enabled by default |
| Extract Tags | | Requires transformer |
| Partition Support | | Not applicable to source |
| Platform Instance | | Not applicable to source |
| ... | | |

## Metadata Ingestion Quickstart

### Prerequisites

In order to ingest metadata from [Source Name], you will need:

* e.g., Python version, source version, source access requirements
* e.g., steps to configure source access
* ...

### Install the Plugin(s)

Run the following commands to install the relevant plugin(s):

`pip install 'acryl-datahub[source-name]'`

`pip install 'acryl-datahub[source-usage-name]'`

### Configure the Ingestion Recipe(s)

Use the following recipe(s) to get started with ingestion.

_For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes)._

#### `'acryl-datahub[source-name]'`

```yml
source:
  type: source_name
  config:
    # Required fields
    option1: value1

sink:
  # sink configs
```

<details>
<summary>View All Recipe Configuration Options</summary>

| Field | Required | Default | Description |
| --- | :-: | :-: | --- |
| `field1` | ✅ | `default_value` | A required field with a default value |
| `field2` | ❌ | `default_value` | An optional field with a default value |
| `field3` | ❌ | | An optional field without a default value |
| ... | | | |
</details>
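
The options table above distinguishes required fields, optional fields with defaults, and optional fields without defaults. A minimal sketch of how such a field specification could be applied to user-supplied config follows. `FieldSpec` and `resolve_config` are hypothetical names, not DataHub APIs, and the sketch assumes that a default, when present, satisfies a required field.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One row of the options table (hypothetical structure)."""
    name: str
    required: bool
    default: Optional[Any] = None


# Mirrors the placeholder rows in the options table.
FIELDS: List[FieldSpec] = [
    FieldSpec("field1", required=True, default="default_value"),
    FieldSpec("field2", required=False, default="default_value"),
    FieldSpec("field3", required=False),
]


def resolve_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in defaults, then fail if a required field still has no value."""
    resolved = dict(user_config)
    for spec in FIELDS:
        if spec.name not in resolved and spec.default is not None:
            resolved[spec.name] = spec.default
        if spec.required and resolved.get(spec.name) is None:
            raise ValueError(f"missing required field: {spec.name}")
    return resolved


print(resolve_config({"field3": "x"}))
```

Values the user supplies always win over defaults; `field3` stays absent unless it is set explicitly.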

#### `'acryl-datahub[source-usage-name]'`

```yml
source:
  type: source-usage-name
  config:
    # Required Fields
    option1: value1

    # Options
    top_n_queries: 10

sink:
  # sink configs
```

<details>
<summary>View All Recipe Configuration Options</summary>

| Field | Required | Default | Description |
| --- | :-: | :-: | --- |
| `field1` | ✅ | `default_value` | A required field with a default value |
| `field2` | ❌ | `default_value` | An optional field with a default value |
| `field3` | ❌ | | An optional field without a default value |
| ... | | | |
</details>

## Troubleshooting

### [Common Issue]

[Provide description of common issues with this integration and steps to resolve]
