Expand CUDA intrinsics coverage to 120+ operations #8
Merged
mivertowski merged 1 commit into … on Dec 11, 2025
This commit significantly expands the CUDA codegen transpiler's intrinsic coverage from 45+ to 120+ GPU intrinsics across 13 categories:

## New Intrinsics Added

### Synchronization (3 new)
- sync_threads_count, sync_threads_and, sync_threads_or

### Atomic Operations (5 new)
- atomic_and, atomic_or, atomic_xor, atomic_inc, atomic_dec

### Math Functions (8 new)
- trunc, fmod, remainder, copysign, cbrt, hypot, plus warp_size

### Trigonometric (7 new)
- asin, acos, atan, atan2, sincos, sinpi, cospi

### Hyperbolic (6 new)
- sinh, cosh, tanh, asinh, acosh, atanh

### Exponential/Logarithmic (14 new)
- exp2, exp10, expm1, log2, log10, log1p, ldexp, scalbn, ilogb
- lgamma, tgamma, erf, erfc, erfinv, erfcinv

### Classification (8 new)
- isnan, isinf, isfinite, isnormal, signbit, nextafter, fdim, nan

### Warp Operations (8 new)
- warp_match_any, warp_match_all
- warp_reduce_add/min/max/and/or/xor

### Bit Manipulation (8 new)
- popc, clz, ctz, ffs, brev, byte_perm
- funnel_shift_left, funnel_shift_right

### Memory Operations (3 new)
- ldg, prefetch_l1, prefetch_l2

### Special Functions (13 new)
- rcp, fdividef, saturate
- j0, j1, jn, y0, y1, yn (Bessel functions)
- normcdf, normcdfinv, cyl_bessel_i0, cyl_bessel_i1

### Clock/Timing (3 new)
- clock, clock64, nanosleep

## 3D Stencil Support
- Added pos.up(buf) and pos.down(buf) for 3D volumetric kernels
- Full 3D at() offset support: pos.at(buf, dx, dy, dz)

## DSL Module
- Comprehensive CPU fallback implementations for all intrinsics
- 20+ new tests for DSL functions

## Testing
- 171 total tests (up from 143)
- New test coverage for all intrinsic categories
- Tests for intrinsic flags, categories, and CUDA output

## Documentation
- Updated README with complete intrinsic reference
- Updated CLAUDE.md with new capabilities
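The PR mentions tests for intrinsic flags, categories, and CUDA output but does not show how a DSL name maps to its CUDA spelling. The sketch below assumes a simple lookup table; `Category`, `Intrinsic`, `lookup`, and the `min_sm` field are hypothetical names, not the crate's API. The CUDA function names, and the compute-capability floors of 7.0 for `__match_any_sync`/`__match_all_sync`/`__nanosleep` and 8.0 for the `__reduce_*_sync` family, follow the CUDA C++ Programming Guide. Only a subset of the 120+ entries is shown.

```rust
// Hypothetical intrinsic registry: illustrative only, not the transpiler's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Barrier,
    Atomic,
    Warp,
    Bit,
    Memory,
    Special,
    Timing,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Intrinsic {
    /// CUDA device function the DSL name lowers to.
    pub cuda: &'static str,
    pub category: Category,
    /// Minimum compute capability as major * 10 + minor; 0 means no floor is modeled here.
    pub min_sm: u32,
}

/// Maps a DSL intrinsic name to its CUDA spelling, category, and SM floor.
pub fn lookup(dsl_name: &str) -> Option<Intrinsic> {
    type C = Category;
    let (cuda, category, min_sm) = match dsl_name {
        "sync_threads_count" => ("__syncthreads_count", C::Barrier, 0),
        "sync_threads_and" => ("__syncthreads_and", C::Barrier, 0),
        "sync_threads_or" => ("__syncthreads_or", C::Barrier, 0),
        "atomic_inc" => ("atomicInc", C::Atomic, 0),
        "atomic_dec" => ("atomicDec", C::Atomic, 0),
        "warp_match_any" => ("__match_any_sync", C::Warp, 70),
        "warp_match_all" => ("__match_all_sync", C::Warp, 70),
        "warp_reduce_add" => ("__reduce_add_sync", C::Warp, 80),
        "warp_reduce_max" => ("__reduce_max_sync", C::Warp, 80),
        "popc" => ("__popc", C::Bit, 0),
        "clz" => ("__clz", C::Bit, 0),
        "brev" => ("__brev", C::Bit, 0),
        "byte_perm" => ("__byte_perm", C::Bit, 0),
        "funnel_shift_left" => ("__funnelshift_l", C::Bit, 0),
        "funnel_shift_right" => ("__funnelshift_r", C::Bit, 0),
        "ldg" => ("__ldg", C::Memory, 0),
        "fdividef" => ("__fdividef", C::Special, 0),
        "saturate" => ("__saturatef", C::Special, 0),
        "clock64" => ("clock64", C::Timing, 0),
        "nanosleep" => ("__nanosleep", C::Timing, 70),
        _ => return None,
    };
    Some(Intrinsic { cuda, category, min_sm })
}
```

Keeping the SM floor next to the name lets a code generator reject, at transpile time, a kernel that calls `warp_reduce_add` while targeting an architecture below 8.0.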
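CPU fallbacks for the bit-manipulation group are mostly thin wrappers over Rust's integer methods. The two that need care are `byte_perm` and the funnel shifts, whose CUDA semantics operate on the 64-bit concatenation of two words. This is a minimal sketch assuming `u32` arguments throughout (CUDA's `__clz` and `__ffs` actually take `int`); the function names reuse the DSL names listed above but the bodies are not taken from dsl.rs. CUDA has no `__ctz`, so `ctz` here assumes a trailing-zero count that returns 32 for zero.

```rust
// CPU fallbacks mirroring CUDA's 32-bit integer intrinsics. Names reuse the DSL
// names from the PR; bodies and u32 signatures are illustrative assumptions.

/// __popc: number of set bits.
pub fn popc(x: u32) -> u32 {
    x.count_ones()
}

/// __clz: number of leading zero bits (32 for x == 0).
pub fn clz(x: u32) -> u32 {
    x.leading_zeros()
}

/// Trailing zero count (32 for x == 0). CUDA has no __ctz; this convention is assumed.
pub fn ctz(x: u32) -> u32 {
    x.trailing_zeros()
}

/// __ffs: 1-based position of the least significant set bit, 0 if x == 0.
pub fn ffs(x: u32) -> u32 {
    if x == 0 { 0 } else { x.trailing_zeros() + 1 }
}

/// __brev: reverse the bit order.
pub fn brev(x: u32) -> u32 {
    x.reverse_bits()
}

/// __byte_perm: result byte n is input byte number s[4n+2:4n], where input
/// bytes 0..=3 come from x and bytes 4..=7 come from y.
pub fn byte_perm(x: u32, y: u32, s: u32) -> u32 {
    let input = (u64::from(y) << 32) | u64::from(x);
    (0..4u32).fold(0u32, |r, n| {
        let sel = (s >> (4 * n)) & 0x7;
        let byte = ((input >> (8 * sel)) & 0xff) as u32;
        r | (byte << (8 * n))
    })
}

/// __funnelshift_l: shift hi:lo left by (shift & 31), keep the upper 32 bits.
pub fn funnel_shift_left(lo: u32, hi: u32, shift: u32) -> u32 {
    let v = (u64::from(hi) << 32) | u64::from(lo);
    ((v << (shift & 31)) >> 32) as u32
}

/// __funnelshift_r: shift hi:lo right by (shift & 31), keep the lower 32 bits.
pub fn funnel_shift_right(lo: u32, hi: u32, shift: u32) -> u32 {
    let v = (u64::from(hi) << 32) | u64::from(lo);
    (v >> (shift & 31)) as u32
}
```

Passing the same word as both halves turns a funnel shift into a rotate, which is the usual reason kernels reach for it.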
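The warp and block collectives have no direct CPU equivalent, so a fallback has to emulate the whole warp or block at once. The sketch assumes a full 32-lane warp (mask `0xffff_ffff`) with per-lane values held in an array; the signatures are hypothetical and only reuse the DSL names from the list above. On hardware, the match operations need compute capability 7.0 (Volta) and the `__reduce_*_sync` operations need 8.0.

```rust
// Hypothetical CPU emulation of warp- and block-level collectives for a full
// 32-lane warp. Illustrative bodies, not the crate's implementation.
pub const WARP_SIZE: usize = 32;

/// __match_any_sync: for each lane, the mask of lanes holding an equal value.
pub fn warp_match_any(values: &[u32; WARP_SIZE]) -> [u32; WARP_SIZE] {
    let mut masks = [0u32; WARP_SIZE];
    for (i, &vi) in values.iter().enumerate() {
        for (j, &vj) in values.iter().enumerate() {
            if vi == vj {
                masks[i] |= 1u32 << j;
            }
        }
    }
    masks
}

/// __match_all_sync: (full mask, true) if every lane agrees, else (0, false).
pub fn warp_match_all(values: &[u32; WARP_SIZE]) -> (u32, bool) {
    if values.iter().all(|&v| v == values[0]) {
        (u32::MAX, true)
    } else {
        (0, false)
    }
}

/// __reduce_add_sync on unsigned values; the sum wraps modulo 2^32.
pub fn warp_reduce_add(values: &[u32; WARP_SIZE]) -> u32 {
    values.iter().fold(0u32, |acc, &v| acc.wrapping_add(v))
}

/// __reduce_max_sync: maximum value across the warp.
pub fn warp_reduce_max(values: &[u32; WARP_SIZE]) -> u32 {
    values.iter().copied().max().unwrap_or(0)
}

/// __syncthreads_count: number of threads in the block whose predicate is true.
pub fn sync_threads_count(preds: &[bool]) -> u32 {
    preds.iter().filter(|&&p| p).count() as u32
}

/// __syncthreads_and: true only if every thread's predicate is true.
pub fn sync_threads_and(preds: &[bool]) -> bool {
    preds.iter().all(|&p| p)
}

/// __syncthreads_or: true if any thread's predicate is true.
pub fn sync_threads_or(preds: &[bool]) -> bool {
    preds.iter().any(|&p| p)
}
```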
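The 3D stencil accessors `pos.up(buf)`, `pos.down(buf)`, and `pos.at(buf, dx, dy, dz)` come down to index arithmetic on a flattened volume. This sketch assumes x-fastest storage and takes "up"/"down" to mean z + 1 and z - 1; the PR states neither convention, and `GridPos3`, `center`, and `laplacian7` are hypothetical names. Out-of-range neighbors panic instead of being clamped, so it assumes the caller visits interior cells only.

```rust
// Hypothetical 3D grid position illustrating the indexing behind up/down/at.
// Assumptions: x varies fastest, then y, then z; "up" = z + 1, "down" = z - 1.
#[derive(Debug, Clone, Copy)]
pub struct GridPos3 {
    pub x: usize,
    pub y: usize,
    pub z: usize,
    pub nx: usize,
    pub ny: usize,
    pub nz: usize,
}

impl GridPos3 {
    /// Linear index of cell (x, y, z) in the flattened buffer.
    fn linear(&self, x: usize, y: usize, z: usize) -> usize {
        (z * self.ny + y) * self.nx + x
    }

    /// Value at the current cell.
    pub fn center(&self, buf: &[f32]) -> f32 {
        buf[self.linear(self.x, self.y, self.z)]
    }

    /// Value at offset (dx, dy, dz); panics if the neighbor lies outside the volume.
    pub fn at(&self, buf: &[f32], dx: isize, dy: isize, dz: isize) -> f32 {
        let shift = |c: usize, d: isize, n: usize| -> usize {
            c.checked_add_signed(d)
                .filter(|&v| v < n)
                .expect("stencil offset out of bounds")
        };
        let x = shift(self.x, dx, self.nx);
        let y = shift(self.y, dy, self.ny);
        let z = shift(self.z, dz, self.nz);
        buf[self.linear(x, y, z)]
    }

    /// Neighbor at z + 1 (assumed meaning of "up").
    pub fn up(&self, buf: &[f32]) -> f32 {
        self.at(buf, 0, 0, 1)
    }

    /// Neighbor at z - 1 (assumed meaning of "down").
    pub fn down(&self, buf: &[f32]) -> f32 {
        self.at(buf, 0, 0, -1)
    }
}

/// 7-point Laplacian with unit grid spacing at an interior cell.
pub fn laplacian7(pos: &GridPos3, buf: &[f32]) -> f32 {
    pos.at(buf, 1, 0, 0) + pos.at(buf, -1, 0, 0)
        + pos.at(buf, 0, 1, 0) + pos.at(buf, 0, -1, 0)
        + pos.up(buf) + pos.down(buf)
        - 6.0 * pos.center(buf)
}
```

For a field f(x, y, z) = x² + y² + z², each axis contributes a second difference of 2, so the 7-point Laplacian evaluates to 6 at every interior cell.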
mivertowski added a commit that referenced this pull request on Dec 11, 2025
- Add CUDA Codegen Intrinsics Expansion section to CHANGELOG
- Update README with 120+ intrinsics count and 3D stencil patterns
- Update docs/13-cuda-codegen.md with complete intrinsics reference
- Fix clippy excessive_precision warnings in dsl.rs erf() function
- Format code with cargo fmt

Changes from merged PR #8:
- Expanded GPU intrinsics from ~45 to 120+ operations
- Added 11 atomic operations (and, or, xor, inc, dec, etc.)
- Added 3D stencil intrinsics (up, down, at with dz)
- Added warp match/reduce operations (Volta+/SM 8.0+)
- Added bit manipulation, memory, special, and timing ops
- Updated tests from 143 to 171

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
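Most of the new atomics map directly onto `std::sync::atomic` (`atomic_and` onto `fetch_and`, and so on), but CUDA's `atomicInc` and `atomicDec` wrap against a caller-supplied limit rather than at the type's maximum, so a CPU fallback needs a compare-and-swap loop. A minimal sketch, assuming `u32` counters and sequentially consistent ordering; the function names reuse the DSL names and are not taken from the crate.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// atomicAnd equivalent: returns the previous value.
pub fn atomic_and(addr: &AtomicU32, v: u32) -> u32 {
    addr.fetch_and(v, Ordering::SeqCst)
}

/// atomicInc semantics: new = if old >= limit { 0 } else { old + 1 }; returns old.
pub fn atomic_inc(addr: &AtomicU32, limit: u32) -> u32 {
    addr.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |old| {
        Some(if old >= limit { 0 } else { old + 1 })
    })
    .unwrap() // the closure always returns Some, so the update cannot fail
}

/// atomicDec semantics: new = if old == 0 || old > limit { limit } else { old - 1 }; returns old.
pub fn atomic_dec(addr: &AtomicU32, limit: u32) -> u32 {
    addr.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |old| {
        Some(if old == 0 || old > limit { limit } else { old - 1 })
    })
    .unwrap()
}
```

Because `atomic_inc(counter, n - 1)` cycles through 0..n, it works as a lock-free ring-buffer slot counter without a separate modulo step.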
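The commit mentions clearing clippy's `excessive_precision` lint in the DSL's `erf()`. That lint fires when a float literal has more significant digits than its type can represent, which is easy to trigger when pasting published coefficients into `f32` code. The sketch below uses Abramowitz and Stegun formula 7.1.26 (absolute error at most 1.5e-7) with `f64` coefficients; it is one common approximation, not necessarily the one dsl.rs uses, and `erff` is a hypothetical `f32` wrapper.

```rust
/// erf(x) via Abramowitz & Stegun 7.1.26. The coefficients are f64 literals,
/// which hold these 7-10 significant digits without excess precision.
pub fn erf(x: f64) -> f64 {
    const P: f64 = 0.327_591_1;
    const A1: f64 = 0.254_829_592;
    const A2: f64 = -0.284_496_736;
    const A3: f64 = 1.421_413_741;
    const A4: f64 = -1.453_152_027;
    const A5: f64 = 1.061_405_429;

    // erf is odd: evaluate on |x| and restore the sign afterwards.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + P * x);
    // Horner form of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
    let poly = t * (A1 + t * (A2 + t * (A3 + t * (A4 + t * A5))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Hypothetical f32 entry point that evaluates in f64 and rounds once at the end.
pub fn erff(x: f32) -> f32 {
    erf(f64::from(x)) as f32
}
```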