This document outlines the migration of ThinkDSP from CPU-only to GPU-accelerated computation using CUDA, CuPy, and cuSignal.
The goal is to add GPU acceleration while maintaining:
- API Compatibility: Public API remains unchanged
- CPU Fallback: All operations work on CPU-only machines
- Educational Value: Code remains readable and educational
A thin backend layer (thinkdsp_gpu/backend.py) provides:
- Automatic GPU detection and selection
- Unified interface for CPU/GPU operations
- Conversion utilities for host/device transfers
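A minimal sketch of what such a backend layer could look like. The function names (`get_backend`, `to_cpu`, `to_device`) and the module layout are assumptions, not the project's final API:

```python
# Hypothetical sketch of thinkdsp_gpu/backend.py: auto-detection,
# environment override, and host/device conversion utilities.
import os
import numpy as np

try:
    import cupy as cp
    _HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = None
    _HAS_GPU = False


def get_backend():
    """Return the active array module (numpy or cupy).

    Honors the THINKDSP_BACKEND=cpu|gpu environment variable and
    falls back to CPU whenever no CUDA device is available.
    """
    choice = os.environ.get("THINKDSP_BACKEND", "auto")
    if choice == "cpu" or not _HAS_GPU:
        return np
    return cp


def to_cpu(array):
    """Copy a device array back to host memory; pass NumPy arrays through."""
    if cp is not None and isinstance(array, cp.ndarray):
        return cp.asnumpy(array)
    return np.asarray(array)


def to_device(array):
    """Move a host array onto the GPU when one is active."""
    return get_backend().asarray(array)
```

Because `get_backend()` returns a module, downstream code can call `xp.fft.rfft`, `xp.convolve`, etc. without knowing which device it runs on.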
| CPU Operation | GPU Equivalent | Notes |
|---|---|---|
| `numpy.fft.fft` | `cupy.fft.fft` | Direct mapping |
| `numpy.fft.ifft` | `cupy.fft.ifft` | Direct mapping |
| `numpy.fft.rfft` | `cupy.fft.rfft` | Direct mapping |
| `numpy.fft.irfft` | `cupy.fft.irfft` | Direct mapping |
| `numpy.fft.fftfreq` | `cupy.fft.fftfreq` | Direct mapping |
| `numpy.fft.rfftfreq` | `cupy.fft.rfftfreq` | Direct mapping |
| `numpy.fft.fftshift` | `cupy.fft.fftshift` | Direct mapping |
| `numpy.fft.ifftshift` | `cupy.fft.ifftshift` | Direct mapping |
| `numpy.convolve` | `cusignal.convolve` or `cupy.convolve` | Prefer cuSignal for signal processing |
| `numpy.hamming` | `cusignal.windows.hamming` | Window functions via cuSignal |
| `scipy.fftpack.dct` | `cusignal.dct` or CuPy implementation | DCT via cuSignal |
| `scipy.fftpack.idct` | `cusignal.idct` or CuPy implementation | IDCT via cuSignal |
| `scipy.signal.gaussian` | `cusignal.windows.gaussian` | Gaussian window via cuSignal |
| `scipy.signal.fftconvolve` | `cusignal.fftconvolve` | FFT-based convolution |
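Because CuPy mirrors NumPy's namespace, most of the direct mappings above can be handled by one generic dispatch helper rather than per-function branches. A sketch using CuPy's `get_array_module` with a NumPy-only fallback (the `spectrum` wrapper is illustrative, not the library's method):

```python
# Dispatch on the input array's module so the same code runs the NumPy
# or CuPy FFT. cupy.get_array_module is CuPy's own helper; the fallback
# below keeps CPU-only machines working.
import numpy as np

try:
    import cupy as cp
    get_array_module = cp.get_array_module
except ImportError:
    def get_array_module(*args):
        return np


def spectrum(ys):
    """Compute the real FFT of ys with whichever backend owns the array."""
    xp = get_array_module(ys)
    return xp.fft.rfft(ys)
```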
- Default: Auto-detect GPU if CuPy is available and a CUDA device exists
- Override: `THINKDSP_BACKEND=cpu|gpu` environment variable
- Fallback: Always fall back to CPU if GPU is unavailable
- RAPIDS-style: Keep arrays on GPU once moved
- Minimize transfers: Only convert to CPU for:
- Matplotlib plotting (requires CPU arrays)
- File I/O (WAV files)
- NumPy-only operations (e.g., scipy.stats)
- Preserve original dtypes (float32/float64)
- CuPy arrays maintain dtype compatibility with NumPy
- Document any numerical precision differences
- Convert GPU arrays to CPU only at plotting boundaries
- Use the `backend.to_cpu()` utility for matplotlib operations
- Keep computation on GPU until visualization
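A sketch of that plotting boundary, assuming a `to_cpu` helper like the one the backend is meant to provide; `spectrum_for_plot` is a hypothetical name:

```python
# Computation stays on the device; arrays cross to the host only when
# matplotlib needs them.
import numpy as np

try:
    import cupy as cp
except ImportError:
    cp = None


def to_cpu(a):
    """Copy device arrays to host memory; pass NumPy arrays through."""
    if cp is not None and isinstance(a, cp.ndarray):
        return cp.asnumpy(a)
    return np.asarray(a)


def spectrum_for_plot(ts, ys):
    """Return (freqs, amps) as host arrays ready for ax.plot()."""
    xp = np if cp is None else cp.get_array_module(ys)
    amps = xp.abs(xp.fft.rfft(ys))          # stays on the GPU if ys does
    freqs = xp.fft.rfftfreq(len(ys), d=float(ts[1] - ts[0]))
    return to_cpu(freqs), to_cpu(amps)      # host copies happen only here
```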
- Create backend abstraction layer
- Implement auto-detection
- Add conversion utilities
- Environment variable override
- Replace FFT operations in `Wave.make_spectrum()`
- Replace FFT operations in `Spectrum.make_wave()`
- Replace convolution in `Wave.convolve()`
- Replace window functions (hamming, etc.)
- Replace DCT operations
- Spectrogram computation
- Filter operations (low_pass, high_pass, etc.)
- Signal generation (keep on CPU for now, or move to GPU)
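As one example of the replacements above, swapping `numpy.convolve` for cuSignal's equivalent can be a conditional import. The `convolve` wrapper here is illustrative, and the assumption is that cuSignal may simply be absent:

```python
# Choose the convolution backend once at import time; the public
# signature stays identical to numpy.convolve either way.
import numpy as np

try:
    import cusignal

    def convolve(a, b, mode="full"):
        """GPU convolution via cuSignal (mirrors numpy.convolve)."""
        return cusignal.convolve(a, b, mode=mode)

except ImportError:

    def convolve(a, b, mode="full"):
        """CPU fallback: plain NumPy convolution."""
        return np.convolve(a, b, mode=mode)
```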
- Correctness tests (CPU vs GPU comparison)
- Performance benchmarks
- Notebook updates
- Update README with GPU instructions
- Add GPU status cells to notebooks
- Performance documentation
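A correctness test along these lines might compare each GPU result against its NumPy reference and skip cleanly on CPU-only machines (the function name is illustrative):

```python
# CPU/GPU correctness check: CuPy's FFT should match NumPy's within a
# float64 tolerance. Without a GPU the comparison is skipped, so the
# same file doubles as a CPU smoke test.
import numpy as np

try:
    import cupy as cp
except ImportError:
    cp = None


def check_fft_matches(n=1024, rtol=1e-10):
    """Return True when the GPU rfft agrees with NumPy (or no GPU exists)."""
    rng = np.random.default_rng(0)
    ys = rng.standard_normal(n)
    expected = np.fft.rfft(ys)
    if cp is None:
        return True  # no GPU available: nothing to compare
    actual = cp.asnumpy(cp.fft.rfft(cp.asarray(ys)))
    return np.allclose(actual, expected, rtol=rtol)
```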
- `code/thinkdsp.py` - Main DSP library (uses backend)
- `thinkdsp_gpu/backend.py` - Backend abstraction (NEW)
- `thinkdsp_gpu/__init__.py` - Package init (NEW)
- `environment.yml` - Add GPU dependencies
- `requirements.txt` - Add optional GPU deps
- `pyproject.toml` - Update dependencies
- `README.md` - GPU installation and usage
- `docs/GPU_MIGRATION.md` - This file
- `docs/PERFORMANCE.md` - Benchmark results
- `tests/test_backend.py` - Backend tests (NEW)
- `tests/test_correctness.py` - CPU/GPU correctness (NEW)
- `tests/test_performance.py` - Performance benchmarks (NEW)
- DCT Implementation: cuSignal may not provide a DCT, so a CuPy-based implementation may be needed
- scipy.stats: Some statistical operations remain CPU-only (scipy dependency)
- File I/O: WAV file reading/writing requires CPU arrays
- Small Arrays: GPU overhead may not benefit very small signals (< 1K samples)
- Large FFTs (> 10K samples): 5-50x speedup expected
- Convolutions: 3-20x speedup depending on kernel size
- Spectrograms: 5-30x speedup for large signals
- Small operations: May be slower due to GPU overhead
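These figures are rough expectations, not measurements; a benchmark like the following sketch (hypothetical `time_fft` helper) could verify them per machine. Note that CuPy kernel launches are asynchronous, so fair GPU timings need explicit synchronization:

```python
# Best-of-N wall-clock timing for an n-point real FFT under a given
# array module (numpy or cupy).
import time
import numpy as np


def time_fft(xp, n, repeats=5):
    """Return the best wall time over `repeats` runs of xp.fft.rfft."""
    ys = xp.asarray(np.random.default_rng(0).standard_normal(n))
    xp.fft.rfft(ys)  # warm-up: keeps plan creation out of the timings
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        xp.fft.rfft(ys)
        # NOTE: when xp is cupy, call cp.cuda.Device().synchronize()
        # here -- kernel launches return before the work completes.
        best = min(best, time.perf_counter() - start)
    return best
```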
- All existing code continues to work
- CPU path is default if GPU unavailable
- No breaking changes to public API
- Notebooks work in both CPU and GPU modes