This repository contains all code from the YouTube series "GPGPU Programming with CUDA" by CoffeeBeforeArch.
Suggestions for specific content can be sent to: [email protected]
An up-to-date list of all series is available at: Google Sheets
Operating Systems: Windows 10 and Ubuntu 18.04
IDE: Visual Studio 2017
Text Editor: Vim
GPU: NVIDIA GTX 1050 Ti
CUDA version: 10.0, 9.1
Video | Concepts | Files |
---|---|---|
CUDA Crash Course: Visual Studio 2017 Environment Setup | Setup, Linker, Visual Studio, Environment, Build Paths | vs_setup.cu |
CUDA Crash Course: Programming in Linux | NVCC, NVprof, Vector Addition | vector_add.cu |
Video | Concepts | Files |
---|---|---|
GPGPU Programming with CUDA: Vector Add | GPU Threads, Memory Allocation, Memory Copy, GPU Kernels, Running Kernels | vector_add.cu |
GPGPU Programming with CUDA: Vector Add with Unified Memory | Unified Memory, Prefetching | vector_add_um.cu |
Video | Concepts | Files |
---|---|---|
GPGPU Programming with CUDA: Matrix Multiplication | 2-D Threadblocks, Aligned Memory Accesses | matrix_mul.cu |
GPGPU Programming with CUDA: Tiled Matrix Multiplication | Shared Memory, Cache Tiling, Performance Analysis, Optimization | tiled_matrix_mul.cu |
CUDA Crash Course: Why Coalescing Matters | Transposing Matrices, Coalescing Techniques | alignment_matrix_mul.cu |
Video | Concepts | Files |
---|---|---|
CUDA Crash Course: cuBLAS for Vector Add | cuBLAS, SAXPY | simple_cublas.cu |
CUDA Crash Course: cuBLAS for Matrix Multiplication | Column-Major Order, SGEMM, cuRAND | cublas_matrix_mul.cu |
Video | Concepts | Files |
---|---|---|
CUDA Crash Course: Sum Reduction Part 1 | Sum Reduction, Warp Divergence | sum_reduction_diverged.cu |
CUDA Crash Course: Sum Reduction Part 2 | Expensive Operations, Optimization, Warp Divergence | sum_reduction_bank_conflicts.cu |
CUDA Crash Course: Sum Reduction Part 3 | Optimization, Shared Memory Bank Conflicts | sum_reduction_no_conflicts.cu |
CUDA Crash Course: Sum Reduction Part 4 | Optimization, Idle Threads | sum_reduction_reduce_idle_threads.cu |
CUDA Crash Course: Sum Reduction Part 5 | Optimization, Device Function, Loop Unrolling | sum_reduction_device_function.cu |
CUDA Crash Course: Sum Reduction Part 6 | Cooperative Groups, Synchronization, Atomic Instructions | sum_reduction_cooperative_groups.cu |
Video | Concepts | Files |
---|---|---|
CUDA Crash Course: Naive 1-D Convolution | 1-D Convolution | convolution.cu |
CUDA Crash Course: 1-D Convolution with Constant Memory | Constant Memory, Constant Cache | convolution.cu |
CUDA Crash Course: Tiled 1-D Convolution | Shared Memory, Tiling | convolution.cu |
CUDA Crash Course: 1-D Convolution Cache Simplification | Shared Memory, Tiling, Programmability | convolution.cu |
CUDA Crash Course: 2-D Convolution | 2-D Convolution, Multi-Dimensional Thread Blocks | convolution.cu |
Video | Concepts | Files |
---|---|---|
CUDA Crash Course: Optimizing Histogram Kernels | Global Atomics, Shared Memory Atomics, Histograms, gnuplot | histogram.cu histogram.cu |
Video | Concepts | Files |
---|---|---|
CUDA Crash Course: Video Corrections | TB Calculations, Verification | vector_add.cu matrix_mul.cu |