Skill Boost: Data Modeling Techniques

Overview

Welcome to the Data Modeling Skill Boost training session! This hands-on workshop explores three distinct data modeling techniques using the same dataset (TPC-H). You'll experience firsthand how different modeling approaches affect query complexity, maintainability, and analytical ease.

What You'll Learn

This training uses the TPC-H benchmark dataset (a standard dataset for database benchmarking) to implement three different modeling techniques:

Dimensional Modeling (Kimball) - Star schema with facts and dimensions
Data Vault 2.0 - Flexible, auditable enterprise data warehouse
One Big Table (OBT) - Fully denormalized approach

Each technique is implemented as a separate dbt project using DuckDB. You'll write queries against each model to answer business questions and compare the experience across approaches.

Getting Started

Option 1: GitHub Codespaces (Recommended)

Click the "Open in GitHub Codespaces" badge above. Everything is pre-configured and will be ready in 3-5 minutes!

The setup automatically:

Installs uv and all dependencies
Creates three DuckDB databases with TPC-H data
Configures the Python environment

Option 2: Local Setup

Prerequisites:

Python 3.11 or higher
Git
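Before installing, the two prerequisites above can be verified with a small script. This is only a sketch: the function name `check_prerequisites` and its parameters are hypothetical and are not part of the repository.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in the prerequisites


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    # Default to the running interpreter; a caller may pass a version tuple instead.
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, 3.11 or higher required"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("git") is None:
        problems.append("git not found on PATH")
    return problems


issues = check_prerequisites()
print("\n".join(issues) if issues else "Prerequisites look fine.")
```

The `version` and `which` parameters exist only so the check can be exercised without changing the local machine.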

Installation:

# Clone the repository
git clone https://github.com/datamindedacademy/skill-boost-data-modeling
cd skill-boost-data-modeling

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate

# Follow the setup instructions in each project directory to initialize data and run dbt

Project Structure

.
├── dimensional-modeling/    # Star schema implementation
├── data-vault-20/          # Data Vault 2.0 implementation
├── one-big-table/          # Denormalized OBT implementation
├── questions/              # Business questions to answer
└── pyproject.toml          # Python dependencies

The Three Modeling Techniques

Each modeling approach's directory describes its philosophy, benefits, and drawbacks, and lists references for further reading. Explore each to understand the trade-offs:

dimensional-modeling/ - Star schema with facts and dimensions (Kimball approach)
data-vault-20/ - Flexible, auditable enterprise warehouse with hubs, links, and satellites
one-big-table/ - Fully denormalized single table approach
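To make the structural differences concrete, here is a minimal sketch that answers one question (revenue per nation) against a miniature version of each model. It rests on several assumptions: Python's built-in `sqlite3` stands in for DuckDB so the example runs with the standard library only, and all table and column names (`dim_customer`, `hub_order`, `obt_orders`, and so on) are hypothetical simplifications rather than the names used in the dbt projects. The Data Vault satellites also omit load dates and hash diffs, which a real implementation would carry.

```python
import sqlite3

# sqlite3 is a stand-in for DuckDB; all table names below are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimensional model: one fact table plus one dimension
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, nation TEXT);
CREATE TABLE fact_orders (order_key INTEGER PRIMARY KEY, customer_key INTEGER, total_price REAL);
INSERT INTO dim_customer VALUES (1, 'Alice', 'FRANCE'), (2, 'Bob', 'GERMANY');
INSERT INTO fact_orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0);

-- Data Vault: hubs hold business keys, links hold relationships, satellites hold attributes
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id INTEGER);
CREATE TABLE hub_order (order_hk TEXT PRIMARY KEY, order_id INTEGER);
CREATE TABLE link_order_customer (order_hk TEXT, customer_hk TEXT);
CREATE TABLE sat_customer (customer_hk TEXT, name TEXT, nation TEXT);
CREATE TABLE sat_order (order_hk TEXT, total_price REAL);
INSERT INTO hub_customer VALUES ('c1', 1), ('c2', 2);
INSERT INTO hub_order VALUES ('o10', 10), ('o11', 11), ('o12', 12);
INSERT INTO link_order_customer VALUES ('o10', 'c1'), ('o11', 'c1'), ('o12', 'c2');
INSERT INTO sat_customer VALUES ('c1', 'Alice', 'FRANCE'), ('c2', 'Bob', 'GERMANY');
INSERT INTO sat_order VALUES ('o10', 100.0), ('o11', 50.0), ('o12', 70.0);

-- One Big Table: everything flattened into one row per order
CREATE TABLE obt_orders (order_id INTEGER, customer_name TEXT, customer_nation TEXT, total_price REAL);
INSERT INTO obt_orders VALUES
    (10, 'Alice', 'FRANCE', 100.0), (11, 'Alice', 'FRANCE', 50.0), (12, 'Bob', 'GERMANY', 70.0);
""")

# The same business question, phrased once per model.
QUERIES = {
    "dimensional": """
        SELECT d.nation, SUM(f.total_price)
        FROM fact_orders f
        JOIN dim_customer d ON d.customer_key = f.customer_key
        GROUP BY d.nation ORDER BY d.nation
    """,
    "data_vault": """
        SELECT sc.nation, SUM(so.total_price)
        FROM link_order_customer l
        JOIN sat_order so ON so.order_hk = l.order_hk
        JOIN sat_customer sc ON sc.customer_hk = l.customer_hk
        GROUP BY sc.nation ORDER BY sc.nation
    """,
    "obt": """
        SELECT customer_nation, SUM(total_price)
        FROM obt_orders
        GROUP BY customer_nation ORDER BY customer_nation
    """,
}

results = {name: con.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

All three queries return the same rows; what differs is how many tables each one has to touch to get there.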

Training Exercises

Navigate to the questions/ directory to find business questions to answer using each modeling technique. Compare:

Query Complexity: How many joins? How readable is the SQL?
Performance: How fast do queries execute?
Flexibility: How easy is it to answer new questions?
Maintainability: How would schema changes impact the model?
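The first two comparison points can be measured roughly with a couple of helpers. The sketch below is an assumption-laden illustration: `count_joins` and `time_query` are hypothetical helper names, the tables are made-up miniatures, and `sqlite3` is used in place of DuckDB so the code runs without extra dependencies. Counting `JOIN` keywords is only a crude proxy for complexity, and timings on toy data say nothing about behavior at TPC-H scale.

```python
import re
import sqlite3
import time


def count_joins(sql: str) -> int:
    """Count JOIN keywords as a rough proxy for query complexity."""
    return len(re.findall(r"\bJOIN\b", sql, flags=re.IGNORECASE))


def time_query(con, sql: str, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        con.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical miniature tables; sqlite3 stands in for DuckDB.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, nation TEXT);
CREATE TABLE fact_orders (order_key INTEGER PRIMARY KEY, customer_key INTEGER, total_price REAL);
CREATE TABLE obt_orders (order_id INTEGER, customer_nation TEXT, total_price REAL);
INSERT INTO dim_customer VALUES (1, 'FRANCE'), (2, 'GERMANY');
INSERT INTO fact_orders VALUES (10, 1, 100.0), (11, 2, 70.0);
INSERT INTO obt_orders VALUES (10, 'FRANCE', 100.0), (11, 'GERMANY', 70.0);
""")

star_sql = """
SELECT d.nation, SUM(f.total_price)
FROM fact_orders f
JOIN dim_customer d ON d.customer_key = f.customer_key
GROUP BY d.nation
"""
obt_sql = "SELECT customer_nation, SUM(total_price) FROM obt_orders GROUP BY customer_nation"

for label, sql in [("star", star_sql), ("obt", obt_sql)]:
    print(f"{label}: {count_joins(sql)} join(s), best run {time_query(con, sql):.6f}s")
```

Taking the best of several runs reduces noise from caching and other processes; for the workshop questions, the same helpers could be pointed at a connection to each project's database instead.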

Visualizing with Apache Superset

Explore your data models visually using Apache Superset with automatic DuckDB connections to all three modeling approaches.

Quick Start

# One-time setup (first time only)
./superset-cli setup

# Start Superset
./superset-cli start

# Wait 1-2 minutes, then access at http://localhost:8088
# Login: admin / admin

# Add DuckDB database connections (if not already added)
./superset-cli connect

# Stop Superset when done
./superset-cli stop

Once connected, you can query and visualize data from all three models.

Tip: Run ./superset-cli help to see all available commands.
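Instead of waiting a fixed 1-2 minutes after starting Superset, the address can be polled until it responds. The helper below is a sketch under assumptions: `wait_for_url` is a hypothetical name, not a command provided by `superset-cli`, and it treats any HTTP response (including an error status) as a sign that the server is up.

```python
import time
import urllib.error
import urllib.request

# Ignore proxy settings from the environment, since the target is a local address.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_url(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll url until it answers with any HTTP response or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server is up, even if it answered with an error status
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_url("http://localhost:8088")` after `./superset-cli start` returns `True` once the login page is reachable, or `False` if nothing answers within the timeout.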

Additional Resources

Contributing

Found an issue or have suggestions? Open an issue or submit a pull request!

License

This training material is provided for educational purposes.
