
NBA DATA LAKE

The NBA Data Lake is an end-to-end data analytics pipeline designed to fetch, process, store, and transform NBA player data for analytics and visualization. This README provides an overview of the project structure, components, and instructions to set up and use the system.

Project Overview

The NBA Data Lake performs the following tasks:

  • Fetches raw NBA player data from the sportsdata.io API.

  • Stores the raw data in an S3 bucket.

  • Processes and cleans the data using an AWS Glue ETL job.

  • Queries the processed data using Amazon Athena.

  • Visualizes the data in Amazon QuickSight.
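The fetch-and-store steps above can be sketched in plain Python. This is a minimal illustration, not the code in src/nba_data_lake.py: it assumes the API returns a JSON array of player objects and that the raw file is written to S3 as newline-delimited JSON (one object per line), which is the layout Athena's JSON SerDe expects. The `raw-data/` prefix, the file name, and the sample fields are hypothetical.

```python
import json
from typing import Any, Iterable


def to_ndjson(records: Iterable[dict[str, Any]]) -> str:
    """Serialize records as newline-delimited JSON, one compact object per line."""
    # Athena's JSON SerDe expects every record on its own line, so a JSON
    # array returned by the API is split into lines before upload.
    return "\n".join(json.dumps(record, separators=(",", ":")) for record in records)


def raw_object_key(prefix: str = "raw-data", name: str = "nba_player_data.jsonl") -> str:
    """Build the S3 key for the raw file (hypothetical prefix and file name)."""
    return f"{prefix.rstrip('/')}/{name}"


if __name__ == "__main__":
    # Made-up sample records standing in for the API response.
    players = [{"PlayerID": 1, "Team": "AAA"}, {"PlayerID": 2, "Team": "BBB"}]
    body = to_ndjson(players)
    print(raw_object_key())
    print(body)
    # With boto3 installed and credentials configured, the upload would be:
    # boto3.client("s3").put_object(Bucket="your-bucket", Key=raw_object_key(), Body=body)
```

Keeping serialization separate from the upload makes the formatting logic testable without AWS access.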

Architecture

The architecture comprises the following AWS components:

  • Amazon S3: Stores raw and processed data.

  • AWS Glue:

    • Crawler: Catalogs the data and creates a schema in the Glue Data Catalog.

    • ETL Job: Transforms raw data into a structured format.

  • Amazon Athena: Queries the processed data for analytics.

  • Amazon QuickSight: Provides data visualization and reporting.

  • AWS IAM: Manages permissions for resources and services.
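In this architecture the Glue crawler registers the table that Athena queries. For illustration, an equivalent table over newline-delimited JSON can also be declared directly in Athena with the OpenX JSON SerDe. The helper below is a hedged sketch; the table name, columns, and bucket are hypothetical placeholders, not values taken from this project.

```python
def athena_json_table_ddl(table: str, columns: dict[str, str], location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement over newline-delimited JSON in S3."""
    # Athena reads every object under the LOCATION prefix; end it with a slash.
    if not location.endswith("/"):
        location += "/"
    # Backticks quote column names in Athena DDL statements.
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table, columns, and bucket.
    print(athena_json_table_ddl(
        "nba_players",
        {"PlayerID": "int", "FirstName": "string", "Team": "string"},
        "s3://example-bucket/raw-data",
    ))
```

The resulting statement can be pasted into the Athena query editor or submitted with boto3's Athena `start_query_execution`.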

[Figure: Architecture diagram]

Project Structure

    nba-data-lake/
    ├── media/                      # Media files
    ├── src/
    │   └── nba_data_lake.py        # Main Python script
    ├── .env.example                # Template for environment variables
    ├── .gitignore                  # Files ignored by Git
    ├── manifest.json               # Manifest file for QuickSight
    ├── README.md                   # Project documentation
    └── requirements.txt            # Python dependencies
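manifest.json tells QuickSight which S3 objects to import. The repository's own file is not reproduced here; the sketch below only shows the general shape of a QuickSight S3 manifest, assuming a JSON dataset under a hypothetical prefix.

```python
import json


def quicksight_manifest(uri_prefix: str, file_format: str = "JSON") -> str:
    """Return a QuickSight S3 manifest that covers every object under uri_prefix."""
    manifest = {
        # URIPrefixes selects all files under a prefix; URIs would list files one by one.
        "fileLocations": [{"URIPrefixes": [uri_prefix]}],
        # Applies the same parsing settings to every file in the manifest.
        "globalUploadSettings": {"format": file_format},
    }
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(quicksight_manifest("s3://example-bucket/processed/"))
```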

Getting Started

Prerequisites

  • An AWS account with access to S3, Glue, Athena, and QuickSight.

  • The AWS CLI, installed and configured.

  • An API key from sportsdata.io.
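The API key authenticates each request to sportsdata.io. A minimal standard-library sketch of building such a request is below; the endpoint URL and the use of the `Ocp-Apim-Subscription-Key` header are assumptions to confirm against the sportsdata.io NBA API documentation, not details taken from this project's script.

```python
import urllib.request

# Assumed endpoint for the NBA players list; confirm it in the sportsdata.io docs.
PLAYERS_URL = "https://api.sportsdata.io/v3/nba/scores/json/Players"


def build_players_request(api_key: str, url: str = PLAYERS_URL) -> urllib.request.Request:
    """Prepare, but do not send, a GET request authenticated with the API key."""
    # Assumption: the key is passed in the Ocp-Apim-Subscription-Key header.
    return urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": api_key}, method="GET"
    )


if __name__ == "__main__":
    req = build_players_request("your-api-key")
    print(req.get_method(), req.full_url)
    # To actually send it (needs a valid key and network access):
    # import json
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     players = json.load(resp)
```

Note that urllib normalizes header names with `str.capitalize()`, so the header is stored as `Ocp-apim-subscription-key`; HTTP header names are case-insensitive, so the server sees the same header.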

Steps

  1. Clone the repository:
    git clone https://github.com/oyogbeche/nba-data-lake.git
    cd nba-data-lake
  2. Copy .env.example to .env and fill in the values it lists.
  3. Create and activate a virtual environment:
    python -m venv myenv
    myenv\Scripts\activate        # Windows
    source myenv/bin/activate     # macOS/Linux
  4. Install the dependencies:
    pip install -r requirements.txt
  5. Configure your AWS credentials:
    aws configure
  6. Run the application:
    python src/nba_data_lake.py
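The script reads its settings from .env; how src/nba_data_lake.py loads that file is not shown here, and python-dotenv is a common choice. As a dependency-free sketch, a KEY=VALUE parser could look like this. The variable names in the example are hypothetical; use the ones listed in .env.example.

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines, # comments, and lines without '='."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. 'abc' or "abc".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read a .env file and export its values without overriding existing variables."""
    with open(path, encoding="utf-8") as fh:
        values = parse_env(fh.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    # Hypothetical variable names; use the ones listed in .env.example.
    sample = "SPORTS_DATA_API_KEY=your-key\nAWS_BUCKET_NAME='your-bucket'\n"
    print(parse_env(sample))
```

This parser ignores inline comments and multi-line values; a library such as python-dotenv covers more of the .env syntax.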
