Yarr! Welcome to the Developer Insights Analytics Dashboard, a comprehensive data analysis treasure chest that helps data analysts explore and visualize developer survey data! This be a flexible, full-stack application built with modern data analysis practices in mind.
Plus: Complete GitHub Copilot Training Materials! This project includes comprehensive exercises from beginner to advanced level, teaching you to master AI-assisted development with real-world scenarios.
```
python-fullstack/
│
├── .gitignore
├── README.md
├── requirements.txt
├── pyproject.toml
│
├── data/
│   └── kaggle_so_2023_data.zip        # Stack Overflow 2023 survey data (auto-extracted on first run)
│
├── app/
│   ├── __init__.py
│   ├── main.py                        # Main FastAPI application
│   ├── data_config.py                 # Data source configuration & analysis
│   └── templates/
│       └── index.html                 # Analytics dashboard frontend
│
├── docs/
│   └── specifications/
│       └── project_specs.md           # Technical specifications
│
├── exercises/                         # GitHub Copilot Training Materials
│   ├── copilot-cheatsheet.md          # Quick reference guide
│   ├── copilot-beginner-exercises.md  # Foundation skills (2-3 hrs)
│   ├── copilot-advanced-exercises.md  # Mode selection guide
│   ├── copilot-advanced-mode-a.md     # Hands-on track (8-12 hrs)
│   ├── copilot-advanced-mode-b.md     # Agent-driven track (12-18 hrs)
│   └── copilot-ci-exercise.md         # DevOps with Act
│
└── tests/
    ├── __init__.py
    └── test_main.py                   # Comprehensive test suite
```
- Backend: Python 3.10+ with FastAPI
- Data Analysis: Pandas with flexible data source management
- Web Server: Uvicorn with auto-reload
- Frontend: HTML5, JavaScript (ES6+), Chart.js with interactive controls
- API Design: RESTful with Pydantic models and comprehensive error handling
- Testing: Pytest with full API coverage
This application is designed specifically for data analysts who need:
- Multiple Technology Categories: Languages, Databases, Platforms, Web Frameworks
- Configurable Results: Choose top 10, 15, 20, or 25 results
- Real-time Analysis: Interactive dashboard with instant results
- Comparison Views: "Have Worked With" vs "Want to Work With" analysis
- Modular Design: Easy to add new data sources
- Schema Validation: Built-in data validation and error handling
- Multiple Format Support: CSV with automatic schema detection
- Data Quality Insights: Response counts and unique technology metrics
The application features an intelligent data management system designed for data analysts who work with multiple datasets:
- Zero Configuration: Drop any survey data zip file into the `data/` folder
- Auto-Extraction: Zip files are automatically extracted on application startup (see the sketch after this list)
- Smart Detection: CSV files are automatically discovered and configured
- Technology Analysis: Columns with semicolon-separated tech lists are auto-detected
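A minimal sketch of how that startup auto-extraction could work, assuming a flat `data/` layout (the function name `extract_new_archives` is illustrative, not the project's actual code):

```python
import zipfile
from pathlib import Path

DATA_DIR = Path("data")

def extract_new_archives(data_dir: Path = DATA_DIR) -> list[Path]:
    """Extract every zip in data/ that has no matching extracted folder yet."""
    extracted = []
    for archive in sorted(data_dir.glob("*.zip")):
        target = data_dir / archive.stem        # e.g. data/kaggle_so_2023_data/
        if not target.exists():                 # already extracted on a previous run
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
            extracted.append(target)
    return extracted
```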
- Stack Overflow 2023: `kaggle_so_2023_data.zip` (20MB compressed → 151MB extracted)
  - Extracts to the `kaggle_so_2023_data/` folder on first application startup
  - Contains `survey_results_public.csv` with 89,000+ developer responses
  - Includes `survey_results_schema.csv` with column definitions
  - Pre-configured with 8 technology analysis categories
Perfect for data analysts working with multiple survey datasets:
1. Prepare Your Data:

   ```
   your_survey_data/
   ├── main_survey_responses.csv   # Main data (any CSV name works)
   ├── schema_definitions.csv      # Optional (detected by "schema" in name)
   └── documentation.txt           # Additional files (ignored)
   ```

2. Create Zip Archive:

   ```bash
   zip -r your_survey_2024.zip your_survey_data/
   ```

3. Deploy to Application:

   ```bash
   cp your_survey_2024.zip /path/to/project/data/
   # Application auto-detects and configures on next startup
   ```

4. Automatic Configuration (a sketch follows this list):
   - Main data file detected (largest CSV, or one with "survey"/"results" in its name)
   - Schema file detected (filename contains "schema")
   - Technology columns identified (names contain "language", "database", "platform", etc.)
   - New data source registered and available in the dashboard
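The step-4 heuristics could be approximated as follows; the keyword list and tie-breaking rules here are assumptions drawn from the bullets above, not the project's exact logic:

```python
from pathlib import Path

# Keywords assumed from the description above; the real list may differ
TECH_KEYWORDS = ("language", "database", "platform", "webframe")

def detect_files(extracted_dir: Path) -> tuple[Path, Path | None]:
    """Pick the main data CSV (and optional schema CSV) from an extracted folder."""
    csvs = sorted(extracted_dir.rglob("*.csv"),
                  key=lambda p: p.stat().st_size, reverse=True)
    schema = next((p for p in csvs if "schema" in p.name.lower()), None)
    candidates = [p for p in csvs if p != schema]
    # Prefer a file named like survey results; otherwise fall back to the largest CSV
    main = next((p for p in candidates
                 if "survey" in p.name.lower() or "results" in p.name.lower()),
                candidates[0])
    return main, schema

def detect_tech_columns(columns: list[str]) -> list[str]:
    """Keep columns whose names mention a technology keyword."""
    return [c for c in columns if any(k in c.lower() for k in TECH_KEYWORDS)]
```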
- Primary Format: CSV files with semicolon-separated technology lists (see the pandas sketch after this list)
- Column Detection: Automatic detection of technology-related columns
- Schema Support: Optional schema files for column descriptions
- Size Limit: Zip files should be under GitHub's 100MB limit
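Because the whole analysis hinges on those semicolon-separated lists, here is a hedged pandas sketch of a top-N count over such a column (the helper `top_technologies` is illustrative; the column and file names come from the Stack Overflow example above):

```python
import pandas as pd

def top_technologies(csv_path: str,
                     column: str = "LanguageHaveWorkedWith",
                     top_n: int = 10) -> pd.DataFrame:
    """Count technologies in a semicolon-separated survey column."""
    series = pd.read_csv(csv_path, usecols=[column])[column].dropna()
    # Split each response into individual technologies and count occurrences
    counts = series.str.split(";").explode().str.strip().value_counts()
    result = counts.head(top_n).rename("count").to_frame()
    result["percent"] = (result["count"] / len(series) * 100).round(1)
    return result

# Example:
# top_technologies("data/kaggle_so_2023_data/survey_results_public.csv")
```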
```
data/
├── kaggle_so_2023_data.zip        # Stack Overflow 2023
├── kaggle_so_2023_data/           # Auto-extracted
├── github_dev_survey_2024.zip     # Your GitHub survey
├── github_dev_survey_2024/        # Auto-extracted
├── company_internal_survey.zip    # Internal survey
├── company_internal_survey/       # Auto-extracted
└── .gitignore                     # Excludes CSV files, includes zips
```
Each data source becomes automatically available in the dashboard with detected technology categories!
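That `.gitignore` could be as small as the following sketch (an assumed setup, not necessarily the project's actual file): keep the compressed archives under version control and ignore everything they extract to.

```
# data/.gitignore (assumed contents)
# Ignore everything in data/ ...
*
# ...except the zip archives and this file
!*.zip
!.gitignore
```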
Data Contents:
- `survey_results_public.csv` - Main survey responses (151MB)
- `survey_results_schema.csv` - Data schema and column descriptions
- `so_survey_2023.pdf` - Survey documentation
- `README_2023.txt` - Additional information
Make sure ye have Python 3.10+ installed, then install the required packages:
```bash
# Activate the virtual environment (if ye haven't already)
source venv/bin/activate   # On macOS/Linux
# or
venv\Scripts\activate      # On Windows

# Install the treasure chest of dependencies
pip install -r requirements.txt
```
Start the FastAPI server like hoisting the main sail:
```bash
# Run the application with auto-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
Once the server be running, open yer browser and navigate to:
- Interactive Dashboard: http://localhost:8000
- API Documentation: http://localhost:8000/docs (FastAPI auto-generated)
- Data Sources API: http://localhost:8000/api/data-sources
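For programmatic access, say from a notebook, the JSON endpoints can be queried directly. A small sketch using the third-party `requests` library against the data sources endpoint listed above:

```python
import requests  # pip install requests

# List the available data sources and their analysis capabilities
resp = requests.get("http://localhost:8000/api/data-sources", timeout=10)
resp.raise_for_status()
for source in resp.json():
    print(source)
```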
To run the comprehensive test suite:
```bash
# Run all tests with verbose output
pytest -v

# Run tests with coverage report
pytest --cov=app --cov-report=html

# Run specific test categories
pytest tests/test_main.py::test_technology_analysis_endpoint -v
```
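A representative test can exercise the API in-process with FastAPI's `TestClient`; the test below is an illustrative sketch, not a copy of `tests/test_main.py`:

```python
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_data_sources_endpoint_returns_list():
    """The data sources endpoint should answer with a JSON array."""
    response = client.get("/api/data-sources")
    assert response.status_code == 200
    assert isinstance(response.json(), list)
```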
- Description: Lists all available data sources and their analysis capabilities
- Response: Array of data source information with available columns
- Description: Flexible technology usage analysis with multiple parameters
- Parameters:
  - `source`: Data source name (default: "stackoverflow_2023")
  - `column`: Technology category to analyze (default: "LanguageHaveWorkedWith")
  - `top_n`: Number of results to return (1-50, default: 10)
- Response: Comprehensive analysis results with metadata
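In FastAPI, parameters with these names, defaults, and bounds can be declared roughly like this (the route path and return shape are assumptions; only the parameter names and defaults come from the description above):

```python
from fastapi import FastAPI, Query

app = FastAPI()

@app.get("/api/analysis")  # illustrative route; the actual path may differ
async def technology_analysis(
    source: str = Query("stackoverflow_2023", description="Data source name"),
    column: str = Query("LanguageHaveWorkedWith", description="Technology category"),
    top_n: int = Query(10, ge=1, le=50, description="Number of results to return"),
):
    # FastAPI validates top_n against the 1-50 range before this body runs
    return {"source": source, "column": column, "top_n": top_n}
```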
- Description: Returns schema information for a data source
- Response: Data structure and column definitions
- Description: Backward-compatible endpoint for original specification
- Response: Top 10 programming languages in legacy format
- Description: Interactive analytics dashboard
- Response: Full-featured HTML dashboard with controls
- Data Source Selection: Choose from available datasets
- Technology Categories: 8+ different analysis dimensions
  - Programming Languages (Used/Wanted)
  - Databases (Used/Wanted)
  - Platforms (Used/Wanted)
  - Web Frameworks (Used/Wanted)
- Result Customization: Adjustable result counts
- Real-time Updates: Instant analysis with loading indicators
- Interactive Bar Charts: Hover details with percentages
- Color-coded Categories: Professional color schemes
- Responsive Design: Works on all screen sizes
- Export Ready: High-quality charts suitable for presentations
- Response Counts: Total survey responses analyzed
- Technology Coverage: Number of unique technologies found
- Data Quality: Insights into data completeness
- Source Attribution: Clear data provenance
This application be designed with extensibility in mind! Future versions could include:
- Cross-tabulation Analysis: Technology combinations and correlations
- Trend Analysis: Year-over-year comparisons when historical data is available
- Demographic Breakdowns: Analysis by experience level, company size, location
- Salary Analysis: Compensation trends by technology stack
- Multiple Data Sources: Support for different survey years and sources
- Data Refresh Automation: Scheduled data updates and processing
- Data Quality Monitoring: Automated validation and completeness checks
- Custom Data Uploads: Allow analysts to upload their own datasets
- Multiple Chart Types: Scatter plots, heatmaps, time series
- Interactive Filtering: Dynamic data exploration with multiple dimensions
- Export Capabilities: PDF reports, CSV exports, chart images
- Dashboard Customization: Save and share custom analysis configurations
- User Authentication: Multi-user support with role-based access
- API Rate Limiting: Production-ready API with proper throttling
- Database Integration: PostgreSQL/MongoDB for larger datasets
- Caching Layer: Redis for improved performance with large datasets
This application follows data analysis best practices:
- Reproducible Analysis: All analysis parameters are configurable and documented
- Data Validation: Built-in checks for data quality and completeness
- Error Handling: Graceful handling of missing data and edge cases
- Performance Optimization: Efficient data processing for large datasets
- API-First Design: Easy integration with other analysis tools and notebooks
- Comprehensive Testing: Full test coverage ensures reliability
- Modular Architecture: Easy to extend with new data sources and analysis types
- Clean Code Principles: Well-documented, maintainable codebase
- Type Safety: Pydantic models for API contract enforcement (see the sketch after this list)
- Async Support: Built for high-performance concurrent requests
- Docker Ready: Easy containerization for deployment
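As a concrete illustration of the Type Safety point above, a response model for the analysis endpoint might look like this sketch (all field names are assumptions inferred from the metadata this README describes):

```python
from pydantic import BaseModel, Field

class TechnologyCount(BaseModel):
    name: str
    count: int
    percentage: float = Field(ge=0, le=100)

class AnalysisResponse(BaseModel):
    source: str
    column: str
    total_responses: int
    results: list[TechnologyCount]
```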
- All code be commented in proper pirate fashion, yarr!
This treasure be open source - use it freely for yer data analysis adventures, but remember to give credit where it be due!
Built with ❤️ and ☕ by data analyst pirates who love clean code, robust analysis, and beautiful visualizations
Perfect for:
- Data Analysis: Survey data exploration, technology trend analysis, multi-source data integration
- Learning: Modern full-stack development with Python/FastAPI and data science applications
- GitHub Copilot Training: Comprehensive exercises from beginner to advanced AI-assisted development
- DevOps: Local CI/CD workflows and development optimization
| Exercise | Link | Description | Requirements |
|---|---|---|---|
| Cheatsheet | copilot-cheatsheet.md | Quick reference for context variables and commands | None |
| Beginner | copilot-beginner-exercises.md | Foundation Copilot skills | Basic Python, VS Code + Copilot |
| Advanced Mode A | copilot-advanced-mode-a.md | Hands-on development track | Intermediate Python/FastAPI |
| Advanced Mode B | copilot-advanced-mode-b.md | Agent-driven development track | Advanced Python, architecture knowledge |
| CI/CD with Act | copilot-ci-exercise.md | Local GitHub Actions workflow | Docker Desktop, Act installation |
Choose based on your experience level and preferred learning style. Start with the Cheatsheet for quick reference or Beginner exercises if new to Copilot.