Starred repositories
Convert .ninja_log files to Chrome's about:tracing format.
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Benchmarking unity builds on real C++ projects.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
A Python framework for high performance GPU simulation and graphics
Flax is a neural network library for JAX that is designed for flexibility.
Task-based datasets, preprocessing, and evaluation for sequence models.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
A tool to classify GPU kernels and gather statistics on them.
📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/
An efficient GPU resource sharing system with fine-grained control for Linux platforms.
An Aspiring Drop-In Replacement for NumPy at Scale
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
AddressSanitizer, ThreadSanitizer, MemorySanitizer
CUDA Python: Performance meets Productivity
Development repository for the Triton language and compiler
⚡ Dynamically generated stats for your GitHub READMEs
The Triton Inference Server provides an optimized cloud and edge inferencing solution.