Releases: mivertowski/RustCompute

v1.1.0

20 Apr 20:29

v1.1.0 — Multi-GPU runtime + VynGraph NSAI integration

Second release. Adds multi-GPU migration over NVLink P2P, per-tenant
K2K isolation, PROV-O provenance, hot rule reload, live introspection
streaming, and TLA+ formally verified protocols. Validated on 2× H100
NVL (Azure NC80adis_H100_v5, NV12 NVLink topology).

Headline results (2× NVIDIA H100 NVL, NVLink 12-link):
- NVLink P2P migration: 8.7× faster than host-staging at 16 MiB
- Multi-GPU K2K sustained bandwidth: 258 GB/s (81% of 318 GB/s peak)
- K2K tier hierarchy measured directly: SMEM 6.7us / DSMEM 9-15us /
  HBM 10-18us (all three tiers via cluster_hbm_k2k kernel)
- Lifecycle rule overhead: 23 ns mean / 30 ns p99, flat across all
  5 rules (Spawn/Activate/Quiesce/Terminate/Restart)
- Sustained throughput: 5.10M ops/s, CV 0.66% over 4× 60s trials
- Cross-tenant leak count: 0 across 13 isolation tests
- Formal verification: 6/6 TLA+ specs pass TLC, no counterexamples

Single-GPU v1.0 baseline preserved:
- 8,698× faster persistent actor injection vs cuLaunchKernel
- 3,005× faster than CUDA Graph replay
- 0.628 us cluster.sync() (2.98× vs grid.sync())
- 0.544 ns zero-copy serialization

New since v1.0.0:
- Multi-GPU runtime facade (cuCtxEnablePeerAccess + cuMemcpyPeerAsync)
- NVLink topology probe + PlacementHint::NvlinkPreferred
- 3-phase actor migration with CRC32 byte-for-byte verification
- PROV-O provenance header (8 relations, chain walk, signature hook)
- Multi-tenant K2K (per-tenant sub-brokers, AuditTag{org_id, engagement_id},
  quota enforcement, cross-tenant rejection audit)
- Hot rule reload (CompiledRule artifact, version-monotonic, rollback)
- Live introspection streaming (EWMA, drop-tolerant ring)
- Six TLA+ specs + TLC model-checking pipeline
- HBM tier direct K2K measurement via cluster_hbm_k2k kernel
- Intra-block warp work stealing (warp_work_steal kernel)
- Delta checkpoints (content_digest, delta_from, applied_with_delta)
- cudarc 0.19.3 upgrade; RUSTUP_TOOLCHAIN stabilized to 1.95 in CI
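The CRC32 byte-for-byte verification used in the 3-phase migration can be sketched as follows. This is the standard bitwise CRC-32 (IEEE polynomial), shown as a CPU illustration of the check, not the crate's actual implementation; `migration_verified` is a hypothetical helper name.

```rust
// Standard reflected CRC-32 (poly 0xEDB88320, init/final-xor 0xFFFFFFFF).
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // 0x0 or 0xFFFF_FFFF
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

// Migration check: checksum the serialized actor state on the source GPU,
// re-checksum after the peer copy lands, and compare.
fn migration_verified(src_state: &[u8], dst_state: &[u8]) -> bool {
    crc32(src_state) == crc32(dst_state)
}

fn main() {
    let state = vec![0xABu8; 4096];
    assert!(migration_verified(&state, &state.clone()));
    // Standard CRC-32 check value for "123456789":
    assert_eq!(crc32(b"123456789"), 0xCBF4_3926);
    println!("crc32 ok");
}
```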

See CHANGELOG.md for full details and
docs/benchmarks/v1.1-2x-h100-results.md for the reproducible,
paper-quality benchmark suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

v0.4.2: Warp-Shuffle Reductions, __nanosleep, libcu++ Atomics

06 Feb 22:13

What's New

This release upgrades the CUDA codegen with practical findings from CUDA hardware research, targeting CC 6.0+ GPUs with the existing cudarc 0.18.2 runtime.

Warp-Shuffle Block Reductions

  • Two-phase warp-shuffle reduction replaces tree reduction in all generated CUDA reduction code
  • Phase 1: Intra-warp __shfl_down_sync(0xFFFFFFFF, val, offset) — zero __syncthreads() calls
  • Phase 2: Cross-warp reduction via shared memory — one __syncthreads() call
  • Reduces barrier count from O(log N) to 1 per block reduction (e.g., 9 → 1 for 512-thread blocks)
  • Applied to: persistent FDTD energy reduction, standalone block/grid reduce helpers, and all inline reduction generators
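The two-phase shape can be illustrated on the CPU as follows. This Rust sketch mirrors the structure of the generated CUDA, not the generated code itself: `warp_reduce` plays the role of the `__shfl_down_sync` loop, and the final pass over per-warp partials corresponds to the single shared-memory step behind the one `__syncthreads()`.

```rust
// Phase 1 analogue: the offset-halving loop of __shfl_down_sync, run on a
// 32-lane "warp" with no barrier.
fn warp_reduce(mut lane: [f32; 32]) -> f32 {
    let mut offset = 16;
    while offset > 0 {
        for i in 0..offset {
            lane[i] += lane[i + offset]; // shfl_down(val, offset) + add
        }
        offset /= 2;
    }
    lane[0]
}

// Full 512-thread block reduction: 16 warps reduce independently, then the
// 16 partials are combined once (the single cross-warp barrier step).
fn block_reduce(vals: &[f32; 512]) -> f32 {
    let mut partials = [0.0f32; 16];
    for (w, chunk) in vals.chunks_exact(32).enumerate() {
        let mut lane = [0.0f32; 32];
        lane.copy_from_slice(chunk);
        partials[w] = warp_reduce(lane); // phase 1, no barrier
    }
    partials.iter().sum() // phase 2, one barrier in the generated code
}

fn main() {
    let vals = [1.0f32; 512];
    assert_eq!(block_reduce(&vals), 512.0);
    println!("two-phase reduce ok");
}
```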

__nanosleep() Power Efficiency

  • Persistent FDTD idle spin-wait now uses __nanosleep() instead of volatile counter loop
  • Software grid barrier spin-loop uses __nanosleep(100) to reduce power consumption
  • Configurable via PersistentFdtdConfig::with_idle_sleep(ns) (default: 1000ns)

libcu++ Ordered Atomics (opt-in)

  • Opt-in cuda::atomic_ref support for H2K/K2H queue operations and software barriers
  • Uses memory_order_acquire/memory_order_release instead of __threadfence_system() pairs
  • Software barrier uses cuda::thread_scope_device (narrower scope) with memory_order_acq_rel
  • Compile-time CUDA 11.0+ version guard
  • Enable via PersistentFdtdConfig::with_libcupp_atomics(true)
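A minimal mock of the configuration surface named above, for illustration. The builder method names (`with_idle_sleep`, `with_libcupp_atomics`) and the 1000 ns default come from these notes; the struct layout is an assumption, not the crate's real type.

```rust
// Hypothetical stand-in for the codegen config described in the notes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PersistentFdtdConfig {
    idle_sleep_ns: u32,    // passed to __nanosleep() in the idle spin-wait
    libcupp_atomics: bool, // opt-in cuda::atomic_ref codegen
}

impl PersistentFdtdConfig {
    fn new() -> Self {
        // Notes: 1000 ns default sleep; libcu++ atomics are opt-in.
        Self { idle_sleep_ns: 1000, libcupp_atomics: false }
    }
    fn with_idle_sleep(mut self, ns: u32) -> Self {
        self.idle_sleep_ns = ns;
        self
    }
    fn with_libcupp_atomics(mut self, enabled: bool) -> Self {
        self.libcupp_atomics = enabled;
        self
    }
}

fn main() {
    let cfg = PersistentFdtdConfig::new()
        .with_idle_sleep(100)
        .with_libcupp_atomics(true);
    assert_eq!(cfg.idle_sleep_ns, 100);
    assert!(cfg.libcupp_atomics);
    println!("config ok");
}
```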

Files Changed

  • crates/ringkernel-cuda-codegen/src/persistent_fdtd.rs — config fields, nanosleep, warp-shuffle reduction, libcu++ atomics
  • crates/ringkernel-cuda-codegen/src/reduction_intrinsics.rs — warp-shuffle upgrade for all reduction helpers

Test Results

  • 215 codegen unit tests + 12 integration tests — all passing
  • 6 CUDA GPU execution tests — verified on RTX 2000 Ada (CC 8.9)
  • Full workspace — zero failures

Full Changelog: v0.4.1...v0.4.2

v0.4.1

06 Feb 21:19

What's New

Property-Based Testing

  • 13 proptest property tests for queue invariants (FIFO ordering, capacity bounds, stats consistency) and HLC properties (total ordering, causality preservation, pack/unpack round-trip)
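The FIFO-ordering and capacity-bound invariants those tests exercise look like the predicate below, written against a plain `VecDeque` stand-in. The real tests run proptest against the crate's queues; this sketch just states the property being checked.

```rust
use std::collections::VecDeque;

// Property: for any input sequence and capacity, (a) the queue never holds
// more than `cap` items, and (b) pops return exactly the accepted pushes,
// in push order.
fn fifo_and_capacity_hold(input: &[u32], cap: usize) -> bool {
    let mut q: VecDeque<u32> = VecDeque::new();
    let mut accepted = Vec::new();
    for &x in input {
        if q.len() < cap {
            q.push_back(x); // push is rejected once the queue is full
            accepted.push(x);
        }
        if q.len() > cap {
            return false; // capacity bound violated
        }
    }
    let popped: Vec<u32> = std::iter::from_fn(|| q.pop_front()).collect();
    popped == accepted // FIFO ordering
}

fn main() {
    assert!(fifo_and_capacity_hold(&[3, 1, 4, 1, 5, 9, 2, 6], 4));
    assert!(fifo_and_capacity_hold(&[], 0));
    println!("queue invariants ok");
}
```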

Ecosystem Feature Bundles

  • web = axum + tower + grpc
  • data = arrow + polars
  • monitoring = tracing-integration + prometheus

Codebase Consolidation

  • Shared DSL marker functions — 27 functions deduplicated across CUDA and WGSL codegen backends (~300 lines removed)
  • unavailable_backend! macro — single macro replaces triplicated backend stubs (~100 lines removed)
  • Structured logging — replaced eprintln! with tracing macros across 6 crates
  • Unsafe documentation: // SAFETY: comments on all ~80 unsafe blocks in GPU code
  • Hot-path #[inline] — queue operations, HLC timestamps, control block accessors

Bug Fixes

  • Tenant suspension now correctly deactivates tenants (was a no-op)
  • Handler registration returns Result instead of panicking on duplicate ID
  • TLS session resumption stores actual session ticket data
  • CloudWatch audit sink returns explicit error instead of silently dropping events

Security Upgrades

  • jsonwebtoken 9.2 → 10.3.0 (fixes type confusion auth bypass)
  • pyo3 0.22 → 0.24.2 (fixes buffer overflow in PyString)
  • iced 0.13 → 0.14.0 (fixes lru Stacked Borrows violation)
  • bytes 1.11.0 → 1.11.1 (fixes integer overflow in BytesMut)
  • time 0.3.44 → 0.3.47 (fixes stack exhaustion DoS)

Stats

  • 1,416 tests passing, 0 failures, 96 GPU-only ignored
  • Zero clippy warnings
  • Net -224 lines of code (consolidation)

Install

[dependencies]
ringkernel = "0.4.1"

Full Changelog: v0.4.0...v0.4.1

v0.4.0: GPU Infrastructure Generalization & Python Bindings

25 Jan 21:23

Highlights

This release extracts ~7,000 lines of proven GPU infrastructure from RustGraph into RingKernel, making these capabilities available to all RingKernel users.

New: Python Bindings (ringkernel-python)

PyO3-based Python wrapper with full async/await support:

import ringkernel
import asyncio

async def main():
    runtime = await ringkernel.RingKernel.create(backend="cpu")
    kernel = await runtime.launch("processor", ringkernel.LaunchOptions())
    await kernel.terminate()
    await runtime.shutdown()

asyncio.run(main())

Features:

  • Async/await with sync fallbacks
  • HLC timestamps and K2K messaging
  • CUDA device enumeration and GPU memory pool management
  • Benchmark framework with regression detection
  • Hybrid CPU/GPU dispatcher with adaptive thresholds
  • Resource guard for memory limit enforcement
  • Type stubs for IDE support

New: PTX Compilation Cache

Disk-based PTX caching for faster kernel loading with SHA-256 content hashing and compute capability awareness.
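The keying idea can be sketched as follows: key on source content plus compute capability, so identical CUDA source compiled for different GPUs never shares a cache entry. `DefaultHasher` stands in for the SHA-256 the notes describe, and `ptx_cache_key` is a hypothetical helper name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Cache key = content hash of the CUDA source + target compute capability.
// (Stand-in hash; the release notes specify SHA-256.)
fn ptx_cache_key(cuda_src: &str, cc_major: u32, cc_minor: u32) -> String {
    let mut h = DefaultHasher::new();
    cuda_src.hash(&mut h);
    format!("{:016x}-sm_{}{}", h.finish(), cc_major, cc_minor)
}

fn main() {
    let src = "__global__ void k() {}";
    // Deterministic for the same source and target:
    assert_eq!(ptx_cache_key(src, 8, 9), ptx_cache_key(src, 8, 9));
    // Compute-capability aware: sm_89 and sm_90 get distinct entries.
    assert_ne!(ptx_cache_key(src, 8, 9), ptx_cache_key(src, 9, 0));
    println!("cache key ok");
}
```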

New: GPU Stratified Memory Pool

Size-stratified GPU VRAM pool with 6 size classes (256B-256KB), O(1) allocation from free lists.
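The size-class lookup behind the O(1) allocation can be sketched as follows. The notes state only the class count and range (6 classes, 256 B to 256 KiB), so the geometric 4× spacing shown here is an assumption; each class keeps its own free list, and allocation is a pop from the matching list.

```rust
// Assumed 4x-spaced classes covering 256 B .. 256 KiB.
const CLASSES: [usize; 6] = [256, 1_024, 4_096, 16_384, 65_536, 262_144];

// Smallest class that fits the request; None means the request exceeds the
// largest class and would fall through to a direct allocation.
fn size_class(bytes: usize) -> Option<usize> {
    CLASSES.iter().copied().find(|&c| bytes <= c)
}

fn main() {
    assert_eq!(size_class(100), Some(256));
    assert_eq!(size_class(256), Some(256));
    assert_eq!(size_class(257), Some(1_024));
    assert_eq!(size_class(262_144), Some(262_144));
    assert_eq!(size_class(262_145), None);
    println!("size classes ok");
}
```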

New: Multi-Stream Execution Manager

Multi-stream CUDA execution for compute/transfer overlap with event-based synchronization.

New: Benchmark Framework

Comprehensive benchmarking with regression detection, baseline comparison, and multiple report formats (Markdown, JSON, LaTeX).

New: Hybrid CPU-GPU Dispatcher

Intelligent workload routing with adaptive threshold learning between CPU and GPU execution.

New: Resource Guard

Memory limit enforcement with safety margins and RAII reservation patterns.

New: Kernel Mode Selector

Intelligent kernel launch configuration based on workload profile and GPU architecture.


See CHANGELOG.md for full details.

v0.3.2: GPU Profiling Infrastructure

21 Jan 09:54

What's New

GPU Profiling Infrastructure

  • CUDA event-based timing and NVTX markers
  • Memory allocation tracking
  • Chrome trace export for visualization

Publishing Fixes

  • Fixed publish script to add User-Agent header for crates.io API
  • Updated dependency versions across all crates for v0.3.2 publishing
  • ringkernel-ir, ringkernel-graph, ringkernel-montecarlo now use workspace versions

Crates Published

  • ringkernel-core, ringkernel-cuda-codegen, ringkernel-wgpu-codegen
  • ringkernel-derive, ringkernel-cpu, ringkernel-cuda, ringkernel-wgpu, ringkernel-metal
  • ringkernel-codegen, ringkernel-ecosystem, ringkernel-audio-fft
  • ringkernel (main crate)

See crates.io/crates/ringkernel for the published crates.

v0.3.1: Enterprise Readiness

19 Jan 20:16

This release adds comprehensive enterprise-grade features for production deployments.

🔐 Enterprise Security

  • Real Cryptography: AES-256-GCM, ChaCha20-Poly1305, Argon2 key derivation
  • Secrets Management: SecretStore trait with key rotation, caching, and chained stores
  • K2K Message Encryption: Kernel-to-kernel encryption with forward secrecy
  • TLS/mTLS Support: Full TLS with rustls, certificate rotation, SNI resolution

🔑 Authentication & Authorization

  • Authentication Providers: ApiKeyAuth, JwtAuth (RS256/HS256), ChainedAuthProvider
  • RBAC: Role-based access control with deny-by-default PolicyEvaluator
  • Multi-tenancy: TenantContext, ResourceQuota, usage tracking

📊 Observability

  • OpenTelemetry: OTLP export to Jaeger, Honeycomb, Datadog, Grafana Cloud
  • Structured Logging: Multi-sink logger with trace correlation (JSON/Text)
  • Alert Routing: Severity-based routing with deduplication (Slack, Teams, PagerDuty)
  • Remote Audit Sinks: Syslog, CloudWatch Logs, Elasticsearch

⚡ Rate Limiting

  • Algorithms: TokenBucket, SlidingWindow, LeakyBucket
  • Builder API: Fluent configuration with RateLimiterBuilder
  • Distributed: SharedRateLimiter for multi-instance deployments
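The TokenBucket variant follows the standard arithmetic matching the builder's `rate`/`burst` parameters: refill continuously at `rate` tokens per second, cap at `burst`, spend one token per request. This is a sketch of the algorithm, not this crate's internals.

```rust
// Minimal token bucket: starts full at `burst` tokens.
struct TokenBucket {
    tokens: f64,
    rate: f64,  // tokens refilled per second
    burst: f64, // bucket capacity
}

impl TokenBucket {
    fn new(rate: f64, burst: f64) -> Self {
        Self { tokens: burst, rate, burst }
    }
    /// `elapsed_secs` is the time since the previous call.
    fn allow(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + self.rate * elapsed_secs).min(self.burst);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Mirrors .rate(1000).burst(100) from the builder example.
    let mut rl = TokenBucket::new(1000.0, 100.0);
    assert!((0..100).all(|_| rl.allow(0.0))); // full burst succeeds
    assert!(!rl.allow(0.0)); // 101st back-to-back request is rejected
    assert!(rl.allow(0.005)); // 5 ms later: 5 tokens refilled, allowed again
    println!("token bucket ok");
}
```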

🔧 Operational Excellence

  • Automatic Recovery: Configurable policies per failure type (Restart, Migrate, Checkpoint, Notify, Escalate, Circuit)
  • Operation Timeouts: Deadline propagation with Timeout and Deadline types
  • Recovery Manager: Retry tracking, cooldown periods, automatic escalation

📦 Feature Flags

[dependencies]
ringkernel-core = { version = "0.3.1", features = ["enterprise"] }

# Or select specific features:
ringkernel-core = { version = "0.3.1", features = ["crypto", "auth", "tls", "rate-limiting", "alerting"] }

📈 Metrics

  • Test Coverage: 900+ tests (up from 825+)
  • Crates Published: 21 crates to crates.io

🚀 Quick Start

use ringkernel_core::prelude::*;

// Enterprise runtime with production preset
let runtime = RuntimeBuilder::new()
    .production()
    .build()?;

// API key authentication
let auth = ApiKeyAuth::new()
    .add_key("sk-prod-abc123", Identity::new("service-a"));

// Rate limiting
let limiter = RateLimiterBuilder::new()
    .algorithm(RateLimitAlgorithm::TokenBucket)
    .rate(1000)
    .burst(100)
    .build();

Full Changelog

See CHANGELOG.md for complete details.

v0.3.0: Multi-Kernel Dispatch, Memory Pools, Global Reductions

19 Jan 09:34

GPU-native persistent actor model framework for Rust. This release adds multi-kernel dispatch, memory pools, global reduction primitives, and two new crates.

Highlights

  • 21 crates published to crates.io - Full workspace now available
  • 825+ tests across the workspace
  • cudarc 0.18.2 and wgpu 27.0 support

New Features

Multi-Kernel Dispatch and Persistent Message Routing

  • #[derive(PersistentMessage)] macro for GPU kernel dispatch
  • KernelDispatcher component with builder pattern and metrics
  • CUDA handler dispatch code generator (CudaDispatchTable)
  • Queue tiering system (QueueTier, QueueFactory, QueueMonitor)

Memory Pool Management

  • StratifiedMemoryPool with 5 size buckets (256B to 64KB)
  • AnalyticsContext for grouped buffer lifecycle
  • PressureHandler for memory pressure monitoring
  • CUDA ReductionBufferCache and WebGPU StagingBufferPool

Global Reduction Primitives

  • ReductionOp enum: Sum, Min, Max, And, Or, Xor, Product
  • ReductionBuffer<T> using mapped memory (zero-copy host read)
  • Multi-phase kernel execution with SyncMode (Cooperative, SoftwareBarrier, MultiLaunch)
  • PageRank example with dangling node handling
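The scalar meaning of the ReductionOp variants can be written out as a CPU reference, each fold starting from the operation's identity element. The crate evaluates these on-GPU via the multi-phase kernels; this sketch only pins down the semantics.

```rust
// The seven variants listed above.
enum ReductionOp { Sum, Min, Max, And, Or, Xor, Product }

// CPU reference: fold with the identity element of each operation.
fn reduce(op: &ReductionOp, xs: &[u64]) -> u64 {
    use ReductionOp::*;
    match op {
        Sum => xs.iter().fold(0, |a, &x| a.wrapping_add(x)),
        Product => xs.iter().fold(1, |a, &x| a.wrapping_mul(x)),
        Min => xs.iter().copied().fold(u64::MAX, u64::min),
        Max => xs.iter().copied().fold(0, u64::max),
        And => xs.iter().fold(u64::MAX, |a, &x| a & x),
        Or => xs.iter().fold(0, |a, &x| a | x),
        Xor => xs.iter().fold(0, |a, &x| a ^ x),
    }
}

fn main() {
    let xs = [7u64, 5, 6];
    assert_eq!(reduce(&ReductionOp::Sum, &xs), 18);
    assert_eq!(reduce(&ReductionOp::Min, &xs), 5);
    assert_eq!(reduce(&ReductionOp::Max, &xs), 7);
    assert_eq!(reduce(&ReductionOp::And, &xs), 4); // 0b111 & 0b101 & 0b110
    assert_eq!(reduce(&ReductionOp::Xor, &xs), 4); // 7 ^ 5 ^ 6
    println!("reductions ok");
}
```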

CUDA NVRTC Compilation

  • compile_ptx() function for runtime CUDA compilation
  • Downstream crates can compile CUDA without direct cudarc dependency

Domain System

  • 20 business domains with reserved type ID ranges
  • #[message(domain = "FraudDetection")] attribute
  • Domains: GraphAnalytics, FraudDetection, ProcessIntelligence, Banking, etc.

New Crates

  • ringkernel-montecarlo - Philox RNG, antithetic variates, control variates, importance sampling
  • ringkernel-graph - CSR matrix, BFS, SCC (Tarjan/Kosaraju), Union-Find, SpMV

Breaking Changes

  • cudarc API updated to 0.18.2 (module loading, kernel launch builder pattern)
  • wgpu API updated to 27.0 (Arc-based resources)

Installation

[dependencies]
ringkernel = "0.3.0"

# Optional backends
ringkernel-cuda = "0.3.0"
ringkernel-wgpu = "0.3.0"

Full Changelog: v0.2.0...v0.3.0

RingKernel v0.2.0

14 Jan 16:48

What's Changed

  • Claude/persistent kernel implementation by @mivertowski in #9

Full Changelog: v0.1.3...v0.2.0

v0.1.3 - Dependency Updates & CI Fixes

17 Dec 14:18

Highlights

  • wgpu 27.0 - Major update with Arc-based resource tracking (~40% performance improvement in some workloads)
  • Dependency updates - tokio 1.48, axum 0.8, tonic 0.14, egui 0.31, winit 0.30
  • CI/CD fixes - Workspace builds without CUDA/nvcc installed

What's Changed

Dependencies Updated

| Package | From | To |
|---------|------|----|
| wgpu | 0.19 | 27.0 |
| tokio | 1.35 | 1.48 |
| thiserror | 1.0 | 2.0 |
| axum | 0.7 | 0.8 |
| tower | 0.4 | 0.5 |
| tonic | 0.11 | 0.14 |
| prost | 0.12 | 0.14 |
| egui/egui-wgpu/egui-winit | 0.27 | 0.31 |
| winit | 0.29 | 0.30 |
| glam | 0.27 | 0.29 |
| metal | 0.27 | 0.31 |
| arrow | 52 | 54 |
| polars | 0.39 | 0.46 |
| rayon | 1.10 | 1.11 |
| actix-rt | 2.9 | 2.10 |

Deferred Updates

  • iced: Kept at 0.13 (0.14 requires major application API rewrite)
  • rkyv: Kept at 0.7 (0.8 has incompatible data format)

CI/CD Improvements

  • CUDA features are now opt-in (not default)
  • Workspace builds succeed without nvcc installed
  • Feature-gated CUDA tests with #[cfg(feature = "cuda")]

See CHANGELOG.md for full details.

v0.1.2

11 Dec 09:55

- **WaveSim3D** - 3D acoustic wave simulation with realistic physics
  - Full 3D FDTD wave propagation solver
  - Binaural audio rendering with HRTF support
  - Volumetric ray marching visualization
  - GPU-native actor system for distributed simulation

- Expanded GPU intrinsics from ~45 to 120+ operations across 13 categories
- Atomic operations: and, or, xor, inc, dec
- 3D stencil intrinsics: up, down, at(dx, dy, dz)
- Warp match operations (Volta+, SM 7.0+) and warp reduce operations (SM 8.0+)
- Bit manipulation, memory, special, and timing ops
- 171 tests (up from 143)

- Added required-features to CUDA-only wavesim binaries
- Updated GitHub Actions release workflow

See CHANGELOG.md for full details.