Hands-on dbt project for building a Medallion-style pipeline on Databricks, with practical notes and transcript-driven checkpoints.
This repository includes:
- a working dbt project in start_dbt
- bronze, silver, and gold model layers
- generic and singular tests
- seeds, macros, analyses, and snapshots
- learning notes and transcript artifacts
- Build a clear Bronze -> Silver -> Gold data workflow.
- Practice dbt source config, refs, testing, snapshots, and macros.
- Capture technical learning in concise notes for revision/interview prep.
.
|-- start_dbt/
| |-- dbt_project.yml
| |-- models/
| | |-- source/
| | |-- bronze/
| | |-- silver/
| | `-- gold/
| |-- tests/
| |-- snapshots/
| |-- seeds/
| |-- macros/
| `-- analyses/
|-- Notes/
|-- transcripts/
|-- scripts/
|-- pyproject.toml
`-- main.py
- Declared in start_dbt/models/source/sources.yml.
- Uses target-aware catalog selection with {{ target.catalog }}.
- Staging-like models over source tables.
- Includes baseline quality checks such as:
  - not_null
  - unique
  - accepted_values
  - custom generic non-negative test
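As a rough illustration of a staging-like bronze model, the sketch below selects from a declared source. The file name, the source name `source`, and the table name `fact_sales` are assumptions for illustration; substitute the names actually declared in sources.yml.

```sql
-- Hypothetical bronze model, e.g. start_dbt/models/bronze/bronze_sales.sql.
-- 'source' and 'fact_sales' are placeholder names (assumptions), not taken
-- from this repository's sources.yml.
select *
from {{ source('source', 'fact_sales') }}
```

Because the catalog is resolved in sources.yml through {{ target.catalog }}, a model written this way should not need to hard-code a catalog name.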
- Business-friendly transformed models.
- Current examples include:
  - silver_returns: cleaned and aggregated returns metrics
  - silver_salesinfo: enriched sales view with customer/product context
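A silver model of the aggregated kind might look roughly like the sketch below. It is not the repository's silver_returns; the upstream model name `bronze_returns` and the columns `product_sk` and `returned_qty` are hypothetical.

```sql
-- Hypothetical sketch of an aggregated returns model.
-- bronze_returns, product_sk, and returned_qty are assumed names.
select
    product_sk,
    count(*) as return_count,
    sum(returned_qty) as total_returned_qty
from {{ ref('bronze_returns') }}
group by product_sk
```

Using ref() rather than a hard-coded table name lets dbt infer that the silver model depends on the bronze one and build them in order.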
- Consumption-ready model for latest item state:
  - source_gold_items
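One common way to derive a "latest state per item" model is to rank rows by update time and keep the newest one. The sketch below assumes an upstream model named `bronze_items` and a key column `id`, both hypothetical; only `updateDate` comes from this README.

```sql
-- Hypothetical sketch: keep the most recent row per item.
-- bronze_items and id are assumed names; updateDate is the timestamp
-- column mentioned in the snapshot notes.
with ranked as (
    select
        *,
        row_number() over (
            partition by id
            order by updateDate desc
        ) as row_num
    from {{ ref('bronze_items') }}
)

select *
from ranked
where row_num = 1  -- row_num stays in the output; drop it if not wanted
```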
- Snapshot config is in start_dbt/snapshots/gold_items.yml.
- Uses timestamp strategy on updateDate for SCD2-style history tracking.
- Catalog is environment-aware via {{ target.catalog }}.
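Once the snapshot has run, current and historical versions can be told apart by the validity columns dbt adds. The query below is a minimal sketch that assumes the snapshot is named `gold_items` (inferred from the file name, not confirmed) and that the default `dbt_valid_to` column name and null-for-current behavior have not been overridden.

```sql
-- Sketch: select only the current version of each snapshotted row.
-- 'gold_items' is an assumed snapshot name; dbt_valid_to is the default
-- snapshot metadata column and is null for the current record by default.
select *
from {{ ref('gold_items') }}
where dbt_valid_to is null
```

Dropping the where clause returns the full SCD2-style history, with dbt_valid_from and dbt_valid_to bounding each version.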
- Configured in model properties files.
- Built-in tests such as not_null, unique, and accepted_values are applied to key fields.
- Defined in start_dbt/tests/generic/test_generic_non_negative.sql.
- Reused as test_generic_non_negative.
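For orientation, a custom generic test of this kind usually has the shape sketched below. This is not a copy of the repository file; the test name is chosen only so that it matches the name quoted above, and the actual definition may differ.

```sql
-- Sketch of a generic non-negative test (assumed shape, not the repo file).
-- A generic test passes when the query returns zero rows.
{% test test_generic_non_negative(model, column_name) %}

select *
from {{ model }}
where {{ column_name }} < 0

{% endtest %}
```

dbt passes `model` and `column_name` automatically when the test is attached to a column in a properties file, so the same definition can be reused on any numeric column.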
- start_dbt/tests/test_non_negative_values_fact_sales.sql
- Returns failing rows when gross_amount or net_amount is negative.
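A singular test matching that description could be written roughly as follows. The model name `fact_sales` is inferred from the test file name and is an assumption; the real test may read from a source or a differently named model.

```sql
-- Sketch of a singular test: any row returned counts as a failure.
-- 'fact_sales' is an assumed model name; gross_amount and net_amount are
-- the columns named in this README.
select *
from {{ ref('fact_sales') }}
where gross_amount < 0
   or net_amount < 0
```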
- start_dbt/macros/multiply.sql: utility macro for expression reuse
- start_dbt/macros/generate_schema.sql: custom schema naming behavior
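The two macros might look something like the sketch below. The parameter names `col_a` and `col_b` are hypothetical, and the schema macro shown is one widely used override pattern, not necessarily the behavior implemented in this repository. Whatever the file is called, dbt only applies a schema override when the macro itself is named generate_schema_name.

```sql
{# Sketch of a small expression macro; col_a and col_b are assumed names. #}
{% macro multiply(col_a, col_b) %}
    {{ col_a }} * {{ col_b }}
{% endmacro %}

{# Sketch of a schema override: use the custom schema as-is when one is
   configured, otherwise fall back to the target schema. Assumed behavior. #}
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

In a model, the first macro would be called as `{{ multiply('unit_price', 'quantity') }}`, where both column names are placeholders.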
- start_dbt/seeds/mapping.csv
- Seed schema is configured under seeds in dbt_project.yml.
- Python 3.13+
- dbt-core >= 1.11.6
- dbt-databricks >= 1.11.6
- Access to a Databricks SQL Warehouse
Dependencies are listed in pyproject.toml.
From repository root:
python -m venv .venv
source .venv/bin/activate
pip install -e .

If you already use uv:
uv sync
source .venv/bin/activate

The project profile name is start_dbt.
Recommended:
- keep credentials in ~/.dbt/profiles.yml
- avoid storing Databricks tokens in tracked repository files
Run these from start_dbt:
dbt debug
dbt seed
dbt run
dbt test
dbt snapshot
dbt docs generate
dbt docs serve

Targeted examples:
dbt run --select models/bronze
dbt run --select models/silver
dbt test --select silver_returns

- notes/: compact module-style notes from setup to deployment workflow
Typical flow used in this repo:
- Make model/test changes.
- Run dbt run + dbt test locally.
- Stage only intended files.
- Use clear commit messages (feature/docs split when useful).
- Push to origin/main.
- If dbt debug fails on project path, ensure commands run inside start_dbt.
- If test failures occur on accepted_values, inspect casing and whitespace.
- If source references fail, verify catalog/schema/table names in sources.yml.
This project is open source under the MIT License. See the LICENSE file for details.