Write reusable SQL code blocks and compile them into a dbt project through YAML configuration to keep your data logic DRY.
If you're unfamiliar with [dbt](https://github.com/dbt-labs/dbt-core), it is the leading SQL data wrangling toolset (IMO), adding really nice features and code modularity on top of SQL. dbt-factory aims to abstract dbt one level further to increase DRYness and make pipelines easier to automate.
pip install dbtf dbt-core==1.6.6 dbt-duckdb==1.6.1
Create a blank dbt-factory project:
dbtf init
Run:
cd dbt_project
dbtf run
- Add templates into dbt_project/templates, using ## as the templating string (e.g. ##table_2## is replaced with the value given for table_2 in factory_config.yml)
- Add logic into dbt_project/factory_config.yml, e.g.:
nodes:
  test_append:
    template: append
    dependencies:
      table_1: test_table_1
      table_2: test_table_2
This uses the template templates/append.sql and replaces the ##table_1## and ##table_2## placeholders in it with test_table_1 and test_table_2.
- Add data into dbt_project/seeds
- Run:
  dbtf run
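To make the ## substitution from the steps above concrete, here is a minimal Python sketch of the idea. It is not dbt-factory's actual implementation: the function name render_template and the contents of the append template are assumptions made for illustration, and the real templates/append.sql may look different.

```python
import re

# Hypothetical template text standing in for templates/append.sql.
# The use of dbt's ref() here is an assumption, not taken from the project.
APPEND_TEMPLATE = """\
select * from {{ ref('##table_1##') }}
union all
select * from {{ ref('##table_2##') }}
"""


def render_template(template: str, dependencies: dict[str, str]) -> str:
    """Replace every ##name## placeholder with dependencies[name]."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in dependencies:
            # Fail loudly instead of leaving a raw placeholder in the SQL.
            raise KeyError(f"no dependency configured for ##{key}##")
        return dependencies[key]

    return re.sub(r"##(\w+)##", substitute, template)


# Mirrors the dependencies block of the test_append node shown above.
rendered = render_template(
    APPEND_TEMPLATE,
    {"table_1": "test_table_1", "table_2": "test_table_2"},
)
print(rendered)
```

Raising on an unknown placeholder is a design choice of this sketch: a missing entry in the configuration surfaces immediately rather than as a SQL compile error later.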
- Construct pipelines without knowing SQL
- Construct pipelines without learning dbt
- Avoid duplicated code
- Automate pipeline creation more easily, since your code can interact with YAML instead of SQL (advanced users)
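As a rough illustration of the automation point, the sketch below generates a factory_config.yml body from Python data instead of writing SQL by hand. The helper build_factory_config, the region names, and the table names are all hypothetical; only the nodes / template / dependencies layout comes from the example configuration earlier in this README.

```python
def build_factory_config(nodes: dict[str, dict]) -> str:
    """Render node specs as YAML text in the factory_config.yml layout."""
    lines = ["nodes:"]
    for node_name, spec in nodes.items():
        lines.append(f"  {node_name}:")
        lines.append(f"    template: {spec['template']}")
        lines.append("    dependencies:")
        for placeholder, table in spec["dependencies"].items():
            lines.append(f"      {placeholder}: {table}")
    return "\n".join(lines) + "\n"


# Hypothetical input: one append node per region.
regions = ["emea", "apac"]
nodes = {
    f"orders_{region}_all": {
        "template": "append",
        "dependencies": {
            "table_1": f"orders_{region}_2022",
            "table_2": f"orders_{region}_2023",
        },
    }
    for region in regions
}

config_text = build_factory_config(nodes)
print(config_text)
```

The YAML is written by hand here to keep the sketch dependency-free, which only works because every value is a plain identifier; a YAML library would be the safer choice for values that need quoting.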
Inspired by Airflow's DAG Factory
dbt-factory by Conrad Bezuidenhout is licensed under CC BY-NC-SA 4.0