This R project analyzes baseball pitcher and hitter statistics using Principal Component Analysis (PCA). The repo covers data scraping from Inside Edge and the Grinnell College athletics website, data preprocessing, PCA visualization, and interactive analysis tools for exploring player performance and trends within the Grinnell College baseball team.
- Athletic Stats: Scraped from Grinnell College’s official athletics website (https://pioneers.grinnell.edu/sports/baseball/stats).
- Advanced Metrics: Scraped from Inside Edge's evaluation reports for both hitters and pitchers.
- Preprocessed PCA Data: Included in the repo as zipped CSVs (PitcherPCA.zip and Hitters.zip).
Final330App/
├── data/ # Contains PitcherPCA.zip and Hitters.zip
├── doc/ # Contains user help guide
├── Final330App.R # get_player_athletic_stats, get_player_edge_stats, perform_player_pca & interpret_pc
├── README.md
└── .gitignore
Returns cleaned data for "pitcher" or "hitter" from zipped CSVs.
Scrapes and cleans Grinnell College baseball stats for a given year.
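
As a rough illustration of the table-parsing part of this step, the sketch below uses the rvest package on an inline HTML snippet. The table layout, the made-up rows, and the helper name `parse_stats_table` are all assumptions for illustration; the real stats page may be structured differently, and this is not the project's own scraping function.

```r
library(rvest)

# Inline HTML standing in for a stats page (made-up rows, assumed layout)
page <- minimal_html('
  <table>
    <tr><th>Player</th><th>AVG</th><th>AB</th></tr>
    <tr><td>Player A</td><td>.310</td><td>120</td></tr>
    <tr><td>Player B</td><td>.275</td><td>98</td></tr>
  </table>')

# Hypothetical helper: pull the first table and tidy the column names
parse_stats_table <- function(doc) {
  tbl <- html_table(html_element(doc, "table"))
  names(tbl) <- tolower(names(tbl))
  as.data.frame(tbl)
}

hitters <- parse_stats_table(page)
hitters
```

For a live page, `read_html()` on the stats URL would take the place of `minimal_html()`, assuming the page serves its tables as static HTML.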
Scrapes player-specific advanced metrics from Inside Edge reports.
Performs PCA and returns both transformed data and loading vectors.
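
A minimal sketch of this step using base R's `prcomp()`, assuming the stats sit in a data frame with numeric columns. The data are simulated placeholders and `run_pca_sketch` is a hypothetical name, not the project's own implementation.

```r
# Simulated placeholder stats (not real player data)
set.seed(330)
stats <- data.frame(
  era  = rnorm(20, mean = 4.5, sd = 1.0),
  whip = rnorm(20, mean = 1.4, sd = 0.2),
  k9   = rnorm(20, mean = 7.5, sd = 1.5),
  bb9  = rnorm(20, mean = 3.5, sd = 0.8)
)

# Hypothetical helper: keep numeric columns, center and scale, then run PCA
run_pca_sketch <- function(df) {
  numeric_df <- df[vapply(df, is.numeric, logical(1))]
  fit <- prcomp(numeric_df, center = TRUE, scale. = TRUE)
  list(
    scores        = as.data.frame(fit$x),        # transformed data, one row per player
    loadings      = fit$rotation,                # loading vectors, one column per PC
    var_explained = fit$sdev^2 / sum(fit$sdev^2) # share of variance per PC
  )
}

result <- run_pca_sketch(stats)
head(result$scores)
round(result$var_explained, 3)
```

Scaling is switched on here because stats such as ERA and strikeouts per nine innings are on different scales; without it, the highest-variance stat would dominate the first component.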
Identifies top contributing stats to each principal component.
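
One way to read off the top contributors is to rank stats by the absolute value of their loadings on a component. The sketch below assumes a loading matrix with stats as row names and components as column names; the values are made up, and `top_contributors` is a hypothetical helper rather than the project's function.

```r
# Hypothetical helper: rank stats by absolute loading on one component
top_contributors <- function(loadings, pc = "PC1", n = 3) {
  column <- loadings[, pc]
  ordered <- column[order(abs(column), decreasing = TRUE)]
  head(ordered, n)
}

# Illustrative loading matrix (made-up values, not project output)
loadings <- matrix(
  c(0.60, -0.55, 0.10,  0.57,
    0.05,  0.20, 0.95, -0.23),
  ncol = 2,
  dimnames = list(c("era", "whip", "k9", "bb9"), c("PC1", "PC2"))
)

top_contributors(loadings, "PC1", n = 2)  # era and bb9 have the largest |loading|
top_contributors(loadings, "PC2", n = 1)  # k9 dominates PC2
```

The sign of each loading is kept in the output, since it shows whether a stat pushes a player's score up or down along that component.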
TODO: add an option to return the combined dataset (original + scraped) and allow the user to scrape multiple years.
- PCA plots categorizing players by season (e.g. 2016 Team, 2023 Team, Other Years)
- Component interpretations revealing which stats drive variance in performance
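
The season grouping behind such a plot can be sketched in base R as follows. The scores are simulated, and both the `season_group` helper and the column names are assumptions for illustration, not the project's plotting code.

```r
# Hypothetical helper: bucket seasons into the highlighted teams and the rest
season_group <- function(year, highlight = c(2016, 2023)) {
  labels <- ifelse(year %in% highlight, paste(year, "Team"), "Other Years")
  factor(labels, levels = c(paste(highlight, "Team"), "Other Years"))
}

# Simulated PC scores (placeholders, not project output)
set.seed(330)
scores <- data.frame(
  PC1  = rnorm(12),
  PC2  = rnorm(12),
  year = rep(c(2016, 2019, 2021, 2023), each = 3)
)
scores$group <- season_group(scores$year)

# Scatter plot of the first two components, colored by season group
plot(scores$PC1, scores$PC2, col = as.integer(scores$group), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(scores$group),
       col = seq_along(levels(scores$group)), pch = 19)
```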
This project was built to explore how data analytics can uncover underlying patterns in player performance, distinguish great seasons from average ones, and provide insights useful for coaching and recruitment.
- Alyssa Trapp
- Sean Tashjian
- Nifemi Ogunmesa
- Grinnell College Athletics for public stats
- Inside Edge for player evaluation tools
- This repo is purely for educational/non-commercial use.