The NBA Data Lake is an end-to-end data analytics pipeline designed to fetch, process, store, and transform NBA player data for analytics and visualization. This README provides an overview of the project structure, components, and instructions to set up and use the system.
The NBA Data Lake performs the following tasks:

- Fetches raw NBA data from an external API.
- Stores the raw data in an S3 bucket.
- Processes and cleans the data using an AWS Glue ETL job.
- Queries the processed data using Amazon Athena.
- Visualizes the data in Amazon QuickSight.
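As a rough sketch, the fetch-and-store steps might look like the following. The endpoint URL, request header, bucket name, and S3 key layout are illustrative assumptions, not values taken from the project:

```python
import json
import urllib.request
from datetime import date


def raw_s3_key(run_date: date, dataset: str = "players") -> str:
    """Build a date-partitioned S3 key for a raw data drop (hypothetical layout)."""
    return (f"raw/{dataset}/year={run_date.year}"
            f"/month={run_date.month:02d}/day={run_date.day:02d}/{dataset}.json")


def fetch_players(api_key: str, url: str) -> list:
    """Fetch raw player records from the external API (placeholder endpoint/header)."""
    req = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def upload_raw(players: list, bucket: str, run_date: date) -> None:
    """Upload the raw JSON payload to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the pure helpers above have no AWS dependency
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=raw_s3_key(run_date),
        Body=json.dumps(players),
    )
```

The date-partitioned key layout keeps each day's raw drop separate, which makes reprocessing a single day cheap and plays well with Glue crawlers.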
The architecture comprises the following AWS components:

- Amazon S3: Stores raw and processed data.
- AWS Glue:
  - Crawler: Catalogs the data and creates a schema in the Glue Data Catalog.
  - ETL Job: Transforms raw data into a structured format.
- Amazon Athena: Queries the processed data for analytics.
- Amazon QuickSight: Provides data visualization and reporting.
- AWS IAM: Manages permissions for resources and services.
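To make the Glue ETL stage concrete, here is a minimal sketch of the kind of cleaning such a job performs. The column names (`PlayerID`, `Team`, `Position`, etc.) are illustrative assumptions, not the project's actual schema:

```python
def clean_player_records(raw: list) -> list:
    """Keep a fixed set of columns and drop records missing a player ID.

    Column names are hypothetical; the real Glue job defines its own schema.
    """
    columns = ("PlayerID", "FirstName", "LastName", "Team", "Position")
    cleaned = []
    for record in raw:
        if record.get("PlayerID") is None:
            continue  # a record without a key is unusable downstream
        # Project onto the known columns, filling gaps with empty strings
        cleaned.append({col: record.get(col, "") for col in columns})
    return cleaned
```

Normalizing to a fixed column set like this is what lets Athena query the processed data with a stable schema.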
```
nba-data-lake/
├── media/               # Media files
│
├── src/
│   └── nba_data_lake.py # Python script
│
├── .env.example         # Configurable environment variables
├── .gitignore           # Ignored files
├── manifest.json        # Manifest file for QuickSight
├── README.md            # Project documentation
└── requirements.txt     # Python dependencies
```
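The `manifest.json` tells QuickSight where to find the processed files in S3. A minimal example of the standard manifest shape, with a placeholder bucket path and format:

```json
{
  "fileLocations": [
    {"URIPrefixes": ["s3://my-nba-data-lake/processed/"]}
  ],
  "globalUploadSettings": {"format": "JSON"}
}
```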
Prerequisites

- An AWS account with access to the following services: S3, Glue, Athena, and QuickSight.
- AWS CLI installed and configured.
- An API key from sportsdata.io.
1. Clone the repository:

   ```bash
   git clone https://github.com/oyogbeche/nba_data_lake.git
   cd nba_data_lake
   ```

2. Configure the `.env` file using `.env.example`.

3. Create and activate a virtual environment:

   ```bash
   python -m venv myenv
   myenv\Scripts\Activate
   ```

4. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

5. Configure AWS credentials:

   ```bash
   aws configure
   ```

6. Run the application:

   ```bash
   python src/nba_data_lake.py
   ```
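Once the pipeline has run, you can query the cataloged data through Athena. A hedged sketch with boto3; the database name, table name (`nba_players`), and results prefix are placeholders, not the project's actual identifiers:

```python
def athena_output_uri(bucket: str, prefix: str = "athena-results") -> str:
    """Build the S3 URI where Athena writes query results (layout is an assumption)."""
    return f"s3://{bucket}/{prefix}/"


def run_sample_query(database: str, bucket: str) -> str:
    """Submit a sample query to Athena and return its execution id.

    Database/table names are placeholders; requires boto3 and AWS credentials.
    """
    import boto3
    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString="SELECT FirstName, LastName, Team FROM nba_players LIMIT 10",
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": athena_output_uri(bucket)},
    )
    return response["QueryExecutionId"]
```

Athena writes result files to the `OutputLocation`, so the bucket must be writable by the role that submits the query.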