"Information should not be displayed all at once; let people gradually become familiar with it." - Edward Tufte
Tired of playing "Find the Important Number" in your terminal while your ML model trains? Watching your GPU temperature shouldn't feel like decoding the Matrix.
Transform complex GPU metrics into intuitive visual patterns
Do you find yourself:
- Constantly switching between terminal windows?
- Squinting at rapidly changing numbers?
- Missing critical spikes in GPU usage?
- Wondering if that temperature is actually concerning?
- Spending mental energy parsing dense data when you should be focusing on your work?
We've reimagined GPU monitoring with human-centered design principles. Instead of parsing dense terminal output, our dashboard leverages intuitive visual affordances - color gradients that immediately signal temperature states, progress bars that show memory usage at a glance, and trend indicators that make pattern recognition effortless.
This beautiful, real-time GPU monitoring dashboard transforms complex metrics into an intuitive interface. Built with React and Flask, it reduces cognitive load through thoughtful information hierarchy and visual signifiers, letting you focus on your work while maintaining awareness of your GPU's health.
Key Design Principles:
- Reduced Cognitive Load: Visual patterns over dense numbers
- Intuitive Signifiers: Color-coding that maps to severity levels
- Information Hierarchy: Critical metrics prominently displayed
- Pattern Recognition: Trend visualization for quick analysis
- Attention Management: Alerts that demand attention only when needed
Traditional nvidia-smi command line output - dense numbers requiring constant cognitive processing
Dashboard showing Ollama running the Mistral-Small model - clear resource utilization
Intensive GPU stress testing with gpu-burn - immediate visual alerts
Responsive design automatically adapts to any screen size
- Frontend: React 18, TypeScript, Vite
- Backend: Python, Flask
- System Tools: nvidia-smi, gpu-burn, CUDA Toolkit
- Clone the repository:

  ```bash
  git clone https://github.com/jackccrawford/nvidia-gpu-perf-monitor.git
  ```

- Install backend dependencies:

  ```bash
  cd backend
  pip install -r requirements.txt
  ```

- Install frontend dependencies:

  ```bash
  cd frontend
  npm install
  ```

- Start the application:

  ```bash
  ./restart.sh
  ```
The `restart.sh` script handles:
- Stopping any existing instances of the frontend and backend services
- Starting the Flask backend server
- Starting the React frontend development server
- Ensuring proper startup sequence and port availability
Pro Tip: Always use `restart.sh` to start/restart the application. It ensures a clean startup and proper service coordination.

If you need to start the services manually (not recommended):

```bash
# Backend
cd backend
python gpu_service.py

# Frontend (in a new terminal)
cd frontend
npm run dev
```
The project includes scripts for managing the frontend and backend services:
- `restart.sh`: Gracefully stops and restarts both the frontend and backend services for reliable operation.
- `stop_servers.sh`: Stops both services gracefully, freeing their ports and ensuring no residual processes remain.
These scripts are designed for development use and may require adjustments for production environments.
The frontend application uses a polling mechanism to fetch GPU statistics at regular intervals. Recent improvements include:
- Enhanced logging to track fetch operation timing and identify potential delays.
- Improved error handling to capture detailed error information during data fetching.
Recent efforts focused on resolving an intermittent refresh issue in the frontend. Key actions included:
- Adding detailed console logging to monitor API calls and responses.
- Verifying backend API response consistency and improving state management in the React component.
These changes have stabilized the application's performance, ensuring accurate and timely GPU monitoring.
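As a rough illustration of this polling loop, the sketch below shows one way a React component could fetch GPU statistics on a fixed interval, with fetch-timing logs and error handling. The `/api/gpu-stats` endpoint, the `GpuStats` fields, and the 2-second interval are assumptions for illustration, not the project's actual API.

```typescript
import { useEffect, useState } from 'react';

// Hypothetical response shape - the real backend may expose different fields.
interface GpuStats {
  temperature: number;   // °C
  utilization: number;   // %
  memoryUsed: number;    // MiB
  memoryTotal: number;   // MiB
}

const POLL_INTERVAL_MS = 2000; // assumed refresh rate

export function useGpuStats() {
  const [stats, setStats] = useState<GpuStats | null>(null);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    let cancelled = false;

    const fetchStats = async () => {
      const startedAt = performance.now();
      try {
        const response = await fetch('/api/gpu-stats'); // assumed endpoint
        if (!response.ok) {
          throw new Error(`HTTP ${response.status}`);
        }
        const data: GpuStats = await response.json();
        if (!cancelled) {
          setStats(data);
          setError(null);
        }
        // Log fetch duration to spot slow or delayed polls
        console.debug(`GPU stats fetched in ${Math.round(performance.now() - startedAt)} ms`);
      } catch (err) {
        console.error('GPU stats fetch failed:', err);
        if (!cancelled) {
          setError(err instanceof Error ? err.message : String(err));
        }
      }
    };

    fetchStats();                                       // fetch immediately on mount
    const timer = setInterval(fetchStats, POLL_INTERVAL_MS);
    return () => {                                      // clean up on unmount
      cancelled = true;
      clearInterval(timer);
    };
  }, []);

  return { stats, error };
}
```

A component using such a hook can render `stats` directly and surface `error` when a fetch fails, mirroring the logging and state-management improvements described above.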
Example: monitoring an ML training run.

```bash
# Run your ML training
python train.py --model large --epochs 100
```
Monitor in real-time:
- GPU utilization during training
- Memory consumption patterns
- Temperature trends
- Process-specific metrics
Example: stress testing with gpu-burn.

```bash
# Run gpu-burn
./gpu-burn 60  # 1 minute stress test - watch closely!
```
Monitor in dashboard:
- Temperature peaks
- Error detection
- Duration tracking
- Performance metrics
CAUTION: Never run stress tests for extended periods. This can damage your GPU!
Color-coded temperature ranges for intuitive monitoring:
- Red (≥85°C): Danger zone
- Orange (80-84°C): Warning
- Yellow (70-79°C): Normal gaming temp
- Green (65-69°C): Ideal temperature
- Blue (50-64°C): Cool
- Indigo (<50°C): Very cool
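As a minimal sketch of how these thresholds could drive the color coding in the frontend, assuming a simple helper function (the function name and color tokens below are illustrative, not taken from the project):

```typescript
// Map a GPU temperature (°C) to the severity color documented above.
// Illustrative only - the real dashboard may use different tokens or gradients.
export function temperatureColor(tempC: number): string {
  if (tempC >= 85) return 'red';     // Danger zone
  if (tempC >= 80) return 'orange';  // Warning
  if (tempC >= 70) return 'yellow';  // Normal gaming temp
  if (tempC >= 65) return 'green';   // Ideal temperature
  if (tempC >= 50) return 'blue';    // Cool
  return 'indigo';                   // Very cool
}
```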
- No sensitive data collection
- Local-only operation
- Process information filtered
- Safe subprocess execution
- Error handling for all operations
This project is licensed under the MIT License - see the LICENSE file for details.
- NVIDIA for nvidia-smi toolkit
- gpu-burn for stress testing
- React for frontend framework
- Flask for backend framework
- Icons by Heroicons and Phosphor
Made with ❤️ for the GPU community
Developed with assistance from Codeium Windsurf