An interactive Streamlit web application that automates Exploratory Data Analysis (EDA) for any uploaded CSV file.
It provides dataset diagnostics, missing value analysis, statistical summaries, and visual insights through univariate and bivariate analysis.
- Dataset shape (rows, columns)
- Data preview (top rows)
- Schema summary:
  - Column names
  - Data types
  - Missing value counts and percentages
- Smart warnings:
  - Columns with high missing values
  - High-cardinality categorical columns (ID-like features)
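
The schema summary and smart warnings above could be computed roughly as follows. This is a minimal sketch, not the code in `main.py`: the function names `schema_summary` and `smart_warnings`, the 30% missing-value threshold, and the 0.9 uniqueness ratio are all illustrative assumptions.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def schema_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing percentage."""
    missing = df.isna().sum()
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_count": missing,
        # max(..., 1) avoids dividing by zero on an empty file
        "missing_pct": (missing / max(len(df), 1) * 100).round(2),
    })


def smart_warnings(df: pd.DataFrame,
                   missing_threshold: float = 30.0,    # assumed cutoff, in percent
                   cardinality_ratio: float = 0.9):    # assumed unique/non-null ratio
    """Collect human-readable warnings about suspicious columns."""
    warnings = []
    summary = schema_summary(df)

    # Columns where a large share of values is missing
    for col in summary.index[summary["missing_pct"] > missing_threshold]:
        warnings.append(f"{col}: {summary.loc[col, 'missing_pct']}% missing")

    # Non-numeric columns where almost every value is unique (ID-like)
    for col in df.columns:
        if is_numeric_dtype(df[col]) or is_datetime64_any_dtype(df[col]):
            continue
        non_null = df[col].dropna()
        if len(non_null) and non_null.nunique() / len(non_null) > cardinality_ratio:
            warnings.append(f"{col}: high cardinality, possibly an ID-like column")
    return warnings
```

In a Streamlit app, each returned string could be passed to `st.warning`, and the summary frame to `st.dataframe`.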
- Summary statistics: count, mean, median, std, skewness
- Histograms with KDE
- Boxplots for outlier visualization
- Skewness interpretation labels
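
As an illustration of the numeric summary and skewness labels above, here is a hedged sketch. The helper names `skew_label` and `numeric_summary` are hypothetical, and the 0.5 and 1.0 cutoffs are a common rule of thumb rather than values confirmed by this project.

```python
import pandas as pd


def skew_label(skew: float) -> str:
    """Map a skewness value to a label (0.5 / 1.0 cutoffs are assumed)."""
    if pd.isna(skew):
        return "undefined"
    if abs(skew) < 0.5:
        return "approximately symmetric"
    direction = "right" if skew > 0 else "left"
    strength = "moderately" if abs(skew) < 1.0 else "highly"
    return f"{strength} {direction}-skewed"


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, median, std, and skewness for every numeric column."""
    num = df.select_dtypes(include="number")
    if num.shape[1] == 0:
        return pd.DataFrame()  # nothing numeric to summarise
    out = num.agg(["count", "mean", "median", "std", "skew"]).T
    out["skew_label"] = out["skew"].map(skew_label)
    return out
```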
- Unique count and mode
- Top-N category distributions
- Percent share + “Others” bucket
- High-cardinality warnings
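
The Top-N distribution with percent share and an "Others" bucket might look like the sketch below. The function name `top_n_with_others` and its defaults are assumptions made for illustration.

```python
import pandas as pd


def top_n_with_others(series: pd.Series, n: int = 10,
                      others_label: str = "Others") -> pd.DataFrame:
    """Counts and percent share of the n most frequent categories,
    with all remaining categories summed into a single bucket."""
    counts = series.value_counts(dropna=True)  # sorted, most frequent first
    top = counts.head(n)
    rest = counts.iloc[n:].sum()
    if rest > 0:
        top = pd.concat([top, pd.Series({others_label: rest})])
    return pd.DataFrame({
        "count": top,
        "percent": (top / top.sum() * 100).round(2),
    })
```

Folding the tail into one bucket keeps bar charts readable when a column has many rare categories, while the percentages still sum to 100.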
- Pearson and Spearman correlation options
- Masked correlation heatmap (lower triangle)
- Top correlated feature pairs table (interactive Top-N)
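
One way to build the heatmap mask and the top correlated pairs table is sketched below. The names `lower_triangle_mask` and `top_correlated_pairs` are hypothetical, and ranking pairs by absolute correlation is an assumption about how "top" is defined.

```python
import numpy as np
import pandas as pd


def lower_triangle_mask(corr: pd.DataFrame) -> np.ndarray:
    """Boolean mask that hides the diagonal and upper triangle,
    so only the lower triangle of the heatmap is drawn."""
    return np.triu(np.ones(corr.shape, dtype=bool))


def top_correlated_pairs(df: pd.DataFrame, method: str = "pearson",
                         top_n: int = 10) -> pd.DataFrame:
    """Unique feature pairs ranked by absolute correlation."""
    corr = df.select_dtypes(include="number").corr(method=method)
    cols = corr.columns
    # Indices strictly above the diagonal: each pair appears exactly once
    i, j = np.triu_indices(len(cols), k=1)
    pairs = pd.DataFrame({
        "feature_a": cols[i],
        "feature_b": cols[j],
        "correlation": corr.to_numpy()[i, j],
    }).dropna(subset=["correlation"])
    pairs = pairs.sort_values("correlation", key=lambda s: s.abs(),
                              ascending=False)
    return pairs.head(top_n).reset_index(drop=True)
```

The mask can be passed to Seaborn as `sns.heatmap(corr, mask=lower_triangle_mask(corr))`, and `method` accepts `"pearson"` or `"spearman"` as in `DataFrame.corr`.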
- Scatter plot
- Correlation value
- Grouped boxplot (numeric distribution across categories)
- Option to limit category count for readability
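
Limiting the category count before drawing a grouped boxplot can be as simple as keeping the most frequent categories. This is a sketch under that assumption; `limit_categories` and its `max_categories` parameter are illustrative names, not the app's actual API.

```python
import pandas as pd


def limit_categories(df: pd.DataFrame, cat_col: str,
                     max_categories: int = 10) -> pd.DataFrame:
    """Keep only rows whose category is among the most frequent ones."""
    keep = df[cat_col].value_counts().head(max_categories).index
    return df[df[cat_col].isin(keep)]
```

The filtered frame could then be plotted with `sns.boxplot(data=filtered, x=cat_col, y=num_col)`.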
- Crosstab table
- Stacked bar chart (count or percentage mode)
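
The crosstab with a count or percentage mode could be produced with `pd.crosstab`, as in this minimal sketch. The wrapper name `crosstab_table` is hypothetical, and row-wise normalisation is an assumption about what "percentage mode" means here.

```python
import pandas as pd


def crosstab_table(df: pd.DataFrame, row: str, col: str,
                   percent: bool = False) -> pd.DataFrame:
    """Cross-tabulate two categorical columns as counts or row percentages."""
    if percent:
        # normalize="index" makes each row sum to 1; scale to percent
        return pd.crosstab(df[row], df[col], normalize="index") * 100
    return pd.crosstab(df[row], df[col])
```

Calling `.plot(kind="bar", stacked=True)` on the returned table draws the matching stacked bar chart with Matplotlib.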
- Python
- Streamlit
- Pandas / NumPy
- Matplotlib
- Seaborn
eda-automation-tool/
│
├── main.py
├── requirements.txt
└── README.md
pip install -r requirements.txt
streamlit run main.py
- Quick dataset understanding before ML modeling
- Missing value diagnostics & schema validation
- Feature relationship analysis
- Detecting outliers and class/category imbalance
Ishatva Singh Panwar