Add CUDA attention kernels, gradient norms, and CI improvements #69
Open
Eamon2009 wants to merge 31 commits into
Conversation
* docs: report [run_20260530_165216] (~791 tok/s)

  Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms.
  Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000
  - Achieved best validation loss of 4.1319 at step 3900

* docs:report [run_20260530_165216](~791 tok/s) (#61)

  Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms.
  Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000
  - Achieved best validation loss of 4.1319 at step 3900

  Co-authored-by: Max <eamon5174@gmail.com>

* feat(cuda): add attention forward and backward kernel declarations

  Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning.

---------

Co-authored-by: Max <eamon5174@gmail.com>
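The commit above only adds declarations, so the kernel bodies and exact signatures are not visible here. As a rough illustration of what an attention forward pass with head partitioning computes, here is a minimal CPU reference sketch. Everything in it is an assumption made for illustration: the function name `attention_forward_reference`, the row-major `[seq_len, num_heads * head_dim]` layout, the `causal` flag, and the choice of plain scaled dot-product attention. It is not the PR's `quadtrix::cuda::attention_forward`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference (not the PR's kernel): softmax(Q K^T / sqrt(d)) V.
// Assumed layout: row-major [seq_len, num_heads * head_dim] for q, k, v, and out.
inline std::vector<float> attention_forward_reference(
    const std::vector<float>& q, const std::vector<float>& k,
    const std::vector<float>& v, std::size_t seq_len, std::size_t num_heads,
    std::size_t head_dim, bool causal) {
    const std::size_t width = num_heads * head_dim;
    assert(q.size() == seq_len * width);
    assert(k.size() == seq_len * width);
    assert(v.size() == seq_len * width);

    std::vector<float> out(seq_len * width, 0.0f);
    std::vector<float> scores(seq_len, 0.0f);
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));

    for (std::size_t h = 0; h < num_heads; ++h) {
        const std::size_t off = h * head_dim;  // column offset of this head
        for (std::size_t i = 0; i < seq_len; ++i) {
            // With causal masking, query i only attends to keys 0..i.
            const std::size_t limit = causal ? i + 1 : seq_len;

            // Scaled dot products, tracking the max for a stable softmax.
            float max_score = -std::numeric_limits<float>::infinity();
            for (std::size_t j = 0; j < limit; ++j) {
                float dot = 0.0f;
                for (std::size_t d = 0; d < head_dim; ++d) {
                    dot += q[i * width + off + d] * k[j * width + off + d];
                }
                scores[j] = dot * scale;
                max_score = std::max(max_score, scores[j]);
            }

            // Softmax over the visible keys.
            float denom = 0.0f;
            for (std::size_t j = 0; j < limit; ++j) {
                scores[j] = std::exp(scores[j] - max_score);
                denom += scores[j];
            }

            // Weighted sum of value rows.
            for (std::size_t j = 0; j < limit; ++j) {
                const float w = scores[j] / denom;
                for (std::size_t d = 0; d < head_dim; ++d) {
                    out[i * width + off + d] += w * v[j * width + off + d];
                }
            }
        }
    }
    return out;
}
```

A reference like this is mainly useful for comparing against GPU output on small inputs. It ignores streams, batching, and dtype handling, which the real declarations may or may not cover.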
- Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8).
- Implements `dtype_name` and `dtype_size` metadata helper functions.
- Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation.
- Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro.
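The helpers listed above could look roughly like the following host-only sketch. The names `DType`, `dtype_name`, `dtype_size`, `Status`, and `checked_mul` come from the commit message. The `sketch` namespace, the field names, the `Ok`/`Error` factories, and every signature are assumptions, since the actual header is not shown in this conversation. The CUDA error macros are left out because they depend on the CUDA runtime.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <utility>

namespace sketch {  // hypothetical namespace, not the PR's

enum class DType { F32, F16, BF16, I32, U8 };

// Human-readable name for a dtype.
inline const char* dtype_name(DType t) {
    switch (t) {
        case DType::F32:  return "f32";
        case DType::F16:  return "f16";
        case DType::BF16: return "bf16";
        case DType::I32:  return "i32";
        case DType::U8:   return "u8";
    }
    return "unknown";
}

// Size of one element in bytes.
inline std::size_t dtype_size(DType t) {
    switch (t) {
        case DType::F32:  return 4;
        case DType::F16:  return 2;
        case DType::BF16: return 2;
        case DType::I32:  return 4;
        case DType::U8:   return 1;
    }
    return 0;
}

// Non-throwing error carrier (assumed shape).
struct Status {
    bool ok;
    std::string message;

    static Status Ok() { return {true, ""}; }
    static Status Error(std::string msg) { return {false, std::move(msg)}; }
};

// Multiplies a * b into *out, reporting overflow instead of wrapping.
inline Status checked_mul(std::size_t a, std::size_t b, std::size_t* out) {
    if (a != 0 && b > std::numeric_limits<std::size_t>::max() / a) {
        return Status::Error("size computation overflows size_t");
    }
    *out = a * b;
    return Status::Ok();
}

}  // namespace sketch
```

Under these assumptions, an allocation size would be computed as `checked_mul(element_count, dtype_size(dtype), &bytes)` and the returned `Status` inspected before calling the allocator.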
- Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants.
- Declares the `gelu_forward` and `gelu_backward` kernel entrypoints.
- Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`.
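To make the two modes concrete, here is a scalar CPU sketch of both variants and their derivatives. It assumes that `Approximate` means the common tanh approximation, which the commit message does not confirm. The `gelu_sketch` namespace and the scalar signatures are hypothetical, and the real entrypoints operate on device buffers with an optional stream.

```cpp
#include <cmath>

namespace gelu_sketch {  // hypothetical namespace, not the PR's

enum class GeluMode { Exact, Approximate };

constexpr float kSqrt2OverPi = 0.7978845608f;  // sqrt(2 / pi)
constexpr float kInvSqrt2    = 0.7071067812f;  // 1 / sqrt(2)
constexpr float kInvSqrt2Pi  = 0.3989422804f;  // 1 / sqrt(2 * pi)
constexpr float kCubic       = 0.044715f;      // tanh-approximation coefficient

// Exact:       0.5 * x * (1 + erf(x / sqrt(2)))
// Approximate: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
inline float gelu_forward(float x, GeluMode mode = GeluMode::Approximate) {
    if (mode == GeluMode::Exact) {
        return 0.5f * x * (1.0f + std::erf(x * kInvSqrt2));
    }
    const float t = std::tanh(kSqrt2OverPi * (x + kCubic * x * x * x));
    return 0.5f * x * (1.0f + t);
}

// Returns grad_out * d(gelu)/dx evaluated at x.
inline float gelu_backward(float x, float grad_out,
                           GeluMode mode = GeluMode::Approximate) {
    if (mode == GeluMode::Exact) {
        const float cdf = 0.5f * (1.0f + std::erf(x * kInvSqrt2));
        const float pdf = kInvSqrt2Pi * std::exp(-0.5f * x * x);
        return grad_out * (cdf + x * pdf);
    }
    const float t  = std::tanh(kSqrt2OverPi * (x + kCubic * x * x * x));
    const float du = kSqrt2OverPi * (1.0f + 3.0f * kCubic * x * x);
    return grad_out * (0.5f * (1.0f + t) + 0.5f * x * (1.0f - t * t) * du);
}

}  // namespace gelu_sketch
```

The two modes agree closely for typical activations, so a scalar reference like this can serve as a tolerance check when validating the GPU kernels.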
…ker builds

Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options.
Added macOS binary build and release steps to CI workflow.
Removed dependency on build-macos-x64 for the release job.
Owner
Author
/run-checks

✅ All checks passed!
Co-Authored-By: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com>
Co-Authored-By: Eamon Sippy <eamon112009@gmail.com>
Removed s390x build configurations and added a step to write detailed release notes.
Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend.
- Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants.
- Verifies existence of the local PyTorch `.pt` model checkpoint before starting.
- Configures environment variables dynamically for Uvicorn (FastAPI) and Vite.
- Handles cross-origin setups (CORS) linking ports interactively.
- Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals.
- Automatically launches the frontend application in the system web browser.
Owner
Author
/run-checks

✅ All checks passed!
Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9.
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](actions/github-script@v7...v9)

---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: '9'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
No description provided.