andrevsilva98/paradime-dbt-movie-challenge

dbt™ Data Modeling Challenge - Movie Edition

Welcome to the Paradime dbt™ Data Modeling Challenge - Movie Edition!

Table of Contents

  1. Getting Started
  2. Competition Details
  3. Building Your Project
  4. Example Submission

Getting Started

Step 1: Registration and Verification

Step 2: Account Set-Up

After verification, you'll receive two emails from Paradime:

  1. Snowflake Account Credentials: Contains your Snowflake account details. Search for an email with subject line "Start Your Movie Data Modeling Challenge – Your Snowflake Credentials."
  2. Paradime Platform Invitation: An invitation to access the Paradime Platform. Search for an email with the subject line "[Paradime] Activate your account."

Step 3: Paradime Account Configuration

  • Access Paradime: Use the provided credentials to log into your account. Join the Paradime workspace using the invite email.
  • Snowflake Integration: Add Snowflake credentials (Username, Password, Role, Database) to Paradime.
  • Act Fast - Limited Time Activation: The links to activate your Paradime account expire within 24 hours!

Note: A step-by-step tutorial is available in your Snowflake credentials email, "Start Your Movie Data Modeling Challenge – Your Snowflake Credentials".

Step 4: Kickstart Your Project

  • Create a New Branch: Open the Paradime Editor and create a new branch. Your branch name should follow this format: "movie-<your_email>"
  • Start Developing: Begin crafting SQL queries, developing dbt™ models, and generating insights!

Step 5: Create your Lightdash Cloud Account

Note: When you log in to Snowflake, your default role is public. Switch to the role provided in the Snowflake credentials email (e.g., "[your_name]_transformer").

Need Help?: Check out this step-by-step video tutorial, and join the #movie-competition channel on Slack for assistance.


Competition Details


Building Your Project

Now that you're set up, you have until May 26th, 2024, to complete and submit your project!

Step 1: Getting to Know the Tools

Step 2: Getting to Know the Movie Data

Paradime has pre-loaded your Snowflake account with three Movie datasets. These datasets contain roughly 1,700,000 rows of detailed Movie and TV Show data. Please note that they are not entirely accurate; they're simply a starting point, and you will need to bring in your own datasets to truly excel in this challenge.

  • In Snowflake: Directly explore the datasets in Snowflake for hands-on analysis.
  • GitHub Repository Resources:
    • Staging Files: These files provide a preliminary view and structure of the datasets available in this repository.
    • schema.yml File: This file contains schema definitions, helping you understand the data models and their relationships.
  • Paradime Catalog UI: Use the Paradime Catalog UI for an interactive exploration of the datasets, featuring intuitive search and navigation.

Step 3: Generating Insights

Your primary goal is to construct dbt™ models that unearth compelling insights that will captivate Movie fans. These three datasets are your starting point, and as you bring in additional data, the possibilities for discovery are virtually limitless. This is your playground to innovate and explore the depths of Movie and TV data.

Before diving in, ensure you're familiar with the Judging Criteria so you've got a chance to win the $500-$1500 Amazon gift cards!

What's in it for you?

Need a spark of inspiration?

Check out the example submission from Paradime's recent NBA Data Modeling Challenge, as well as the winning submissions from that challenge.

Additionally, here are some questions you might consider answering:

  • Highest-grossing films of all time. Data required: omdb_movies and/or tmdb_movies. You might also bring in third-party data to understand the highest-grossing films by country.
  • Highest/lowest ROI films of all time. Data required: omdb_movies and/or tmdb_movies; see the columns "budget", "revenue", and "box office".
  • Actors who appear in the most films. Data required: omdb_movies, column "actors".
  • Highest-grossing directors and writers. Data required: omdb_movies, columns "director" and "writer".

Creating Data Visualizations

Submitting Your Project

Submission Deadline: May 26th, 2024

Once your project is complete, please submit the following materials to Parker Rogers ([email protected]) with the subject line "<your_name> - Movie Data Modeling Challenge Submission":

  • GitHub Branch: Send the link to your GitHub branch containing your dbt™ models.
  • README.md: Include a README file that narrates your project's story, methodology, and insights. Check out this example README from our previous NBA Data Modeling challenge.
  • Data Visualizations and Insights: Showcase your findings, ideally within your README.md. For inspiration, refer to these example visualizations from our previous NBA Data Modeling challenge.

If you're having issues submitting your project, watch this interactive tutorial.

We look forward to seeing your creative and insightful analyses!

Example Submission template

Here's an example project that fulfills all requirements and would be eligible for cash prizes. Feel free to use this template for your submission. We also recommend diving into the winning submissions from our recent NBA Data Modeling Challenge for inspiration.

Table of Contents

  1. Introduction
  2. Data Sources
  3. Methodology
  4. Visualizations
  5. Conclusions

Introduction

A simple intro. Example: "Explore my project for the dbt™ Data Modeling Challenge - Movie Edition, hosted by Paradime! This project dives into the analysis and visualization of Movie and TV data!"

Data Sources and Data Lineage

My analysis leverages four key data sets:

  • data set name #1
  • data set name #2
  • data set name #3
  • data set name #4

Data Lineage

  • Copy and paste your data lineage image here. Watch this YouTube Tutorial to learn how.

Methodology

Tools Used

  • Paradime for SQL, dbt™.
  • Snowflake for data storage and computing.
  • Lightdash for data visualization.
  • Other tool(s) used and why.


Visualizations

Visualization 1

  • Visualization title

  • Intro sentence to the visualization

  • Image of the visualization

  • Insights

Visualization 2

  • Visualization title

  • Intro sentence to the visualization

  • Image of the visualization

  • Insights

Visualization 3

  • Visualization title

  • Intro sentence to the visualization

  • Image of the visualization

  • Insights

Conclusions

Share a clear and concise conclusion of your findings!
