feat: Add 2024-2025 NBA season scraping and analytics #8

Merged
ruwzeta merged 1 commit into main from feature/2025-stats-and-analytics
May 22, 2025

@ruwzeta ruwzeta commented May 22, 2025

This commit introduces several enhancements:

  1. Updated Data Scraping (PyScripts/dataextract25.py):

    • The script now scrapes three types of player statistics for the 2024-2025 NBA season from basketball-reference.com:
      • Per-Game Stats (e.g., PTS, AST, REB)
      • Advanced Stats (e.g., PER, WS, BPM)
      • Shooting Stats (e.g., FG% by distance, %FGA by type)
    • Scraped data is saved into respective CSV files:
      • nba_per_game_stats_2024_25.csv
      • nba_advanced_stats_2024_25.csv
      • nba_shooting_stats_2024_25.csv
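A minimal sketch of the table-to-CSV step, using only the standard library. It does not reproduce `PyScripts/dataextract25.py`: the `TableParser` class, the `table_html_to_csv` function, and the sample HTML are hypothetical, and no network request is made. Stats tables of this kind often repeat the header row inside the body; dropping those rows is an assumption about the page, not something this PR confirms.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def table_html_to_csv(html: str) -> str:
    """Convert the rows of an HTML table to CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return ""
    header, *body = parser.rows
    # Assumption: any body row identical to the header is a repeated header.
    body = [row for row in body if row != header]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return buf.getvalue()


# Placeholder data, not real player statistics.
SAMPLE = """
<table>
  <thead><tr><th>Player</th><th>PTS</th><th>AST</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>25.1</td><td>6.3</td></tr>
    <tr><th>Player</th><th>PTS</th><th>AST</th></tr>
    <tr><td>Player B</td><td>18.4</td><td>9.0</td></tr>
  </tbody>
</table>
"""

print(table_html_to_csv(SAMPLE))
```

Writing the returned text to a file such as `nba_per_game_stats_2024_25.csv` would complete the step described above.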
  2. MongoDB Integration (PyScripts/MongoDB.py, PyScripts/dataextract25.py):

    • Added functionality to store all three scraped datasets into a MongoDB database.
    • Data is organized into collections per stat type and season (e.g., per_game_stats_2025).
    • Note: The integration could not be run in the test environment because the MongoDB URI failed DNS resolution. The integration code is complete but has not been exercised against a live database there.
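The collection layout described above could be sketched as follows. The helper names `collection_name` and `store` are hypothetical and are not taken from `PyScripts/MongoDB.py`. The `db` argument is assumed to behave like a pymongo `Database` object, and clearing the collection before inserting is an assumed design choice to keep re-runs from duplicating rows.

```python
def collection_name(stat_type: str, season_end_year: int) -> str:
    """Build a name such as 'per_game_stats_2025'."""
    return f"{stat_type}_stats_{season_end_year}"


def store(db, stat_type: str, season_end_year: int, rows: list) -> int:
    """Replace the contents of one stat/season collection with `rows`.

    `rows` is a list of dicts, for example as produced by csv.DictReader.
    Returns the number of documents written.
    """
    coll = db[collection_name(stat_type, season_end_year)]
    coll.delete_many({})        # assumption: each run fully replaces the season's data
    if rows:                    # insert_many rejects an empty list in pymongo
        coll.insert_many(rows)
    return len(rows)
```

With pymongo installed, `db` would typically come from `MongoClient(uri)["some_database"]`, where the database name is a placeholder.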
  3. New Analytics Notebook (notebooks/PlayersStatsAnalysis_2024_25.ipynb):

    • A new Jupyter notebook analyzes the 2024-2025 season data.
    • The notebook includes steps for:
      • Loading the three new CSV datasets.
      • Merging the data into a comprehensive player DataFrame.
      • Performing Exploratory Data Analysis (EDA), including leaderboards for key statistics and visualizations (bar charts, histograms, correlation heatmap).
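The merge step might look roughly like the sketch below, assuming pandas. The function `merge_player_stats` and the join keys `Player` and `Team` are assumptions, since the notebook's actual column names are not shown in this PR. Small in-memory frames with placeholder values stand in for the three CSV files.

```python
import pandas as pd


def merge_player_stats(per_game, advanced, shooting, keys=("Player", "Team")):
    """Left-join advanced and shooting stats onto the per-game table.

    Columns that appear in more than one table keep their per-game name;
    the copies from the other tables get a suffix so nothing is overwritten.
    """
    keys = list(keys)
    merged = per_game.merge(advanced, on=keys, how="left", suffixes=("", "_adv"))
    merged = merged.merge(shooting, on=keys, how="left", suffixes=("", "_shoot"))
    return merged


# Placeholder data, not real player statistics.
per_game = pd.DataFrame(
    {"Player": ["A", "B"], "Team": ["X", "Y"], "PTS": [20.0, 10.0], "G": [70, 60]}
)
advanced = pd.DataFrame(
    {"Player": ["A", "B"], "Team": ["X", "Y"], "PER": [22.5, 14.0], "G": [70, 60]}
)
shooting = pd.DataFrame({"Player": ["A"], "Team": ["X"], "FG%": [0.48]})

merged = merge_player_stats(per_game, advanced, shooting)

# A simple leaderboard: the top scorer by points per game.
leaders = merged.nlargest(1, "PTS")
print(leaders[["Player", "PTS"]])
```

A left join keeps every per-game row even when a player is missing from another table, which leaves `NaN` in the unmatched columns rather than silently dropping the player.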
  4. Updated Documentation (README.md):

    • The project README now documents the new data sources, scraping script, generated files, MongoDB integration, and analytics notebook.

gitguardian bot commented May 22, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 17331532 | Triggered | MongoDB Credentials | f23edf4 | PyScripts/MongoDB.py |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
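One common way to follow step 2 is to read the connection string from the environment instead of hardcoding it. The sketch below is an illustration only: the function `get_mongo_uri` and the variable name `MONGODB_URI` are assumptions and do not come from `PyScripts/MongoDB.py`.

```python
import os


def get_mongo_uri(env_var: str = "MONGODB_URI") -> str:
    """Return the MongoDB connection string stored in an environment variable.

    Raises RuntimeError when the variable is unset or empty, so a missing
    secret fails loudly instead of falling back to a value in source control.
    """
    uri = os.environ.get(env_var)
    if not uri:
        raise RuntimeError(
            f"Set the {env_var} environment variable to the MongoDB connection string."
        )
    return uri
```

With pymongo, the result would be passed as `MongoClient(get_mongo_uri())`. The exposed credential still needs to be revoked and rotated as described in step 3, because it remains in the commit history.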

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@ruwzeta ruwzeta merged commit e9cd44a into main May 22, 2025
1 check failed