v1.1.0 — Multi-GPU runtime + VynGraph NSAI integration
Second release. Adds multi-GPU migration over NVLink P2P, per-tenant
K2K isolation, PROV-O provenance, hot rule reload, live introspection
streaming, and protocols specified in TLA+ and model-checked with TLC.
Validated on 2× H100 NVL (Azure NC80adis_H100_v5, NV12 NVLink topology).
Headline results (2× NVIDIA H100 NVL, NVLink 12-link):
- NVLink P2P migration: 8.7× faster than host-staging at 16 MiB
- Multi-GPU K2K sustained bandwidth: 258 GB/s (81% of 318 GB/s peak)
- K2K tier hierarchy measured directly: SMEM 6.7us / DSMEM 9-15us /
  HBM 10-18us (all three tiers via cluster_hbm_k2k kernel)
- Lifecycle rule overhead: 23 ns mean / 30 ns p99, flat across all
  5 rules (Spawn/Activate/Quiesce/Terminate/Restart)
- Sustained throughput: 5.10M ops/s, CV 0.66% over 4× 60s trials
- Cross-tenant leak count: 0 across 13 isolation tests
- Formal verification: 6/6 TLA+ specs pass TLC, no counterexamples
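
For readers reproducing the sustained-throughput figure, the CV quoted above is the coefficient of variation: standard deviation divided by mean across trials. A minimal sketch follows. It assumes the sample (n - 1) standard deviation, which this note does not specify, and the function name is illustrative rather than part of the runtime's API.

```rust
/// Coefficient of variation as a percentage: stddev / mean * 100.
/// Returns None when there are fewer than two samples or the mean is zero.
pub fn coefficient_of_variation_pct(samples: &[f64]) -> Option<f64> {
    let n = samples.len();
    if n < 2 {
        return None;
    }
    let mean = samples.iter().sum::<f64>() / n as f64;
    if mean == 0.0 {
        return None;
    }
    // Assumption: sample variance with Bessel's correction (divide by n - 1).
    let variance = samples
        .iter()
        .map(|x| (x - mean).powi(2))
        .sum::<f64>()
        / (n as f64 - 1.0);
    Some(variance.sqrt() / mean.abs() * 100.0)
}
```

Passing the per-trial ops/s values from each 60 s run yields the percentage directly; a population (divide by n) variant would report a slightly smaller value for the same trials.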
Single-GPU v1.0 baseline preserved:
- 8,698× faster persistent actor injection vs cuLaunchKernel
- 3,005× faster than CUDA Graph replay
- 0.628 us cluster.sync() (2.98× vs grid.sync())
- 0.544 ns zero-copy serialization
New since v1.0.0:
- Multi-GPU runtime facade (cuCtxEnablePeerAccess + cuMemcpyPeerAsync)
- NVLink topology probe + PlacementHint::NvlinkPreferred
- 3-phase actor migration with CRC32 byte-for-byte verification
- PROV-O provenance header (8 relations, chain walk, signature hook)
- Multi-tenant K2K (per-tenant sub-brokers, AuditTag{org_id, engagement_id},
  quota enforcement, cross-tenant rejection audit)
- Hot rule reload (CompiledRule artifact, version-monotonic, rollback)
- Live introspection streaming (EWMA, drop-tolerant ring)
- Six TLA+ specs + TLC model-checking pipeline
- HBM tier direct K2K measurement via cluster_hbm_k2k kernel
- Intra-block warp work stealing (warp_work_steal kernel)
- Delta checkpoints (content_digest, delta_from, applied_with_delta)
- cudarc 0.19.3 upgrade; RUSTUP_TOOLCHAIN stabilized to 1.95 in CI
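
To illustrate the CRC32 byte-for-byte verification mentioned for actor migration, here is a minimal host-side sketch of a post-copy check. This note does not describe the three migration phases or the runtime's actual API, so `crc32`, `verify_copy`, and `VerifyError` are hypothetical names, and the standard CRC-32 (IEEE 802.3) polynomial is an assumption.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
/// Assumption: the runtime may use a different CRC variant or a table-driven form.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical error type for a failed post-migration check.
#[derive(Debug, PartialEq, Eq)]
pub enum VerifyError {
    LengthMismatch { src_len: usize, dst_len: usize },
    ChecksumMismatch { src_crc: u32, dst_crc: u32 },
    ByteMismatch { offset: usize },
}

/// Compares a source snapshot with the migrated copy: length first,
/// then CRC32, then every byte. Returns the shared checksum on success.
pub fn verify_copy(source: &[u8], dest: &[u8]) -> Result<u32, VerifyError> {
    if source.len() != dest.len() {
        return Err(VerifyError::LengthMismatch {
            src_len: source.len(),
            dst_len: dest.len(),
        });
    }
    let (src_crc, dst_crc) = (crc32(source), crc32(dest));
    if src_crc != dst_crc {
        return Err(VerifyError::ChecksumMismatch { src_crc, dst_crc });
    }
    // CRC32 can collide, so a matching checksum is confirmed byte by byte.
    if let Some(offset) = source.iter().zip(dest).position(|(a, b)| a != b) {
        return Err(VerifyError::ByteMismatch { offset });
    }
    Ok(src_crc)
}
```

The checksum gives a cheap value to log or compare across devices, while the final byte comparison covers the rare case of a CRC collision.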
See CHANGELOG.md for full detail and
docs/benchmarks/v1.1-2x-h100-results.md for the reproducible
paper-quality benchmark suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>