Non-Record: TTSM — Typical Ternary State-Space Model, 2.0032 bpb #1999

Open
dd-dent wants to merge 1 commit into openai:main from dd-dent:submission/ttsm-ternary-ssm

dd-dent commented Apr 30, 2026

TTSM: Typical Ternary State-Space Model

val_bpb: 2.0032 (seed 42)
Track: non-record, 10min/16MB
Artifact: 12,039,626 bytes
Params: 11M (7.8M ternary at 1.6 bits/param, 3.3M fp16/fp32 dynamics)
Hardware: 8×H100 SXM, 154 ms/step, 3889 steps in 600s
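
As a rough sanity check on the figures above: 1.6 bits/param is exactly what base-3 packing of five ternary values into one byte gives (3^5 = 243 ≤ 256). The sketch below assumes that packing scheme; the PR does not say which encoding is actually used, and `pack_trits`, `unpack_trits`, `ternary_bytes` and `expected_steps` are hypothetical names. The parameter counts are the rounded values quoted above, so the results are approximate.

```python
import math

# Figures quoted in the summary above (rounded in the source).
TERNARY_PARAMS = 7_800_000
STEP_SECONDS = 0.154
BUDGET_SECONDS = 600

# Assumption: 5 ternary values per byte, i.e. 8 / 5 = 1.6 bits per parameter.
TRITS_PER_BYTE = 5


def pack_trits(trits):
    """Pack values in {-1, 0, +1} into bytes, five per byte, base 3."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        chunk = trits[i:i + TRITS_PER_BYTE]
        value = 0
        for t in reversed(chunk):
            value = value * 3 + (t + 1)  # map -1, 0, +1 to digits 0, 1, 2
        out.append(value)  # at most 242, so it always fits in one byte
    return bytes(out)


def unpack_trits(data, count):
    """Inverse of pack_trits; count drops the padding in the last byte."""
    trits = []
    for byte in data:
        for _ in range(TRITS_PER_BYTE):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


def ternary_bytes(params=TERNARY_PARAMS):
    """Bytes needed for the ternary weights under the 5-per-byte assumption."""
    return math.ceil(params / TRITS_PER_BYTE)


def expected_steps(budget=BUDGET_SECONDS, step=STEP_SECONDS):
    """Whole steps that fit in the budget at the quoted step time."""
    return int(budget / step)


if __name__ == "__main__":
    print("ternary weight bytes:", ternary_bytes())
    print("steps in budget:", expected_steps())
```

Under these assumptions the ternary weights take about 1.56 MB of the 12.0 MB artifact, and the quoted 154 ms/step is consistent with the reported 3889 steps to within about 0.2%.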

First ternary SSM submission. Mamba-1 selective SSM with B/C projections quantized to {-1,0,+1} via STE. Hidden state remains fp16 — protected from quantization errors at both write gate (B) and readout selector (C).
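
To make the STE idea concrete, here is a minimal pure-Python sketch of ternary quantization with a straight-through backward pass for a single projection row. It is not the submission's code: absmean scaling is an assumption (the PR does not state the scaling rule), gradients are written out by hand instead of using an autograd framework, and all function names are hypothetical.

```python
def ternarize(weights):
    """Round weights to codes in {-1, 0, +1} plus one shared scale.

    Absmean scaling is an assumption, not taken from the PR.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def forward(weights, x):
    """Dot product computed with the ternary weights, as one B or C row would."""
    codes, scale = ternarize(weights)
    return scale * sum(c * xi for c, xi in zip(codes, x))


def ste_grad(x, upstream):
    """Straight-through estimator: the backward pass treats ternarize()
    as the identity, so d(out)/d(w_i) is taken to be x_i."""
    return [upstream * xi for xi in x]


def sgd_step(weights, x, target, lr=0.1):
    """One squared-error SGD step on the latent full-precision weights."""
    out = forward(weights, x)
    upstream = 2.0 * (out - target)  # d(out - target)^2 / d(out)
    grads = ste_grad(x, upstream)
    return [w - lr * g for w, g in zip(weights, grads)]


if __name__ == "__main__":
    latent = [0.9, -0.05, -1.2, 0.3]
    print(ternarize(latent))
    print(sgd_step(latent, [1.0, 1.0, 1.0, 1.0], target=1.0))
```

The point of the estimator is that latent weights whose code is currently 0 still receive a gradient, so they can cross the rounding threshold later in training. The hidden state never passes through `ternarize`, which matches the "state remains fp16" boundary described above.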

Key findings

  • State is protected: the ternary boundary sits at the B/C gates, not in the hidden state. DeltaNet k_t is the harder problem. mradassaad (PR #1890, "Non-record: Mamba-3 Hybrid + Multi-Epoch TTT + Dynamics-Protected Quant — 1.1456 bpb (3-seed mean)") independently reached the same dynamics-protection boundary.
  • Reversed-scan backward: the backward recurrence is the forward scan reversed in time, so the forward Triton kernel can be reused. 15 lines, 31s→1.2s/step (26×). Generalizes to any reversible recurrence.
  • NS=5 > NS=10: a less-orthogonal Muon step acts as a diversity regularizer under ternary STE. 52× worse orthogonality, better val_bpb.
  • Overtraining degrades quality: there is a phase transition at step ~3000, so the 600s budget is coincidentally near-optimal.
  • Frozen conv > trained conv: +0.07 bpb. Discovered after submission run; GPU availability precluded rerun.
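
The reversed-scan finding can be illustrated on a scalar linear recurrence h_t = a_t * h_{t-1} + u_t. The adjoint obeys lam_t = dL/dh_t + a_{t+1} * lam_{t+1}, which is the same recurrence run backward in time with the coefficients shifted by one step, so the forward routine can be called on reversed inputs. This is a pure-Python sketch under that simplification, not the Triton kernel from the submission; `scan` and `scan_backward` are hypothetical names, and a real Mamba-1 scan carries per-channel state.

```python
def scan(a, u):
    """Forward scan: h_t = a_t * h_{t-1} + u_t, starting from h = 0."""
    h, prev = [], 0.0
    for a_t, u_t in zip(a, u):
        prev = a_t * prev + u_t
        h.append(prev)
    return h


def scan_backward(a, h, grad_h):
    """Gradients w.r.t. a and u, obtained by reusing scan() on reversed inputs.

    grad_h[t] is the direct upstream gradient dL/dh_t.
    """
    # Shift coefficients so position t holds a_{t+1}; the last entry is
    # multiplied by the zero initial state, so its value does not matter.
    a_shift = a[1:] + [0.0]
    # Run the forward routine on time-reversed data, then reverse the result.
    lam = scan(a_shift[::-1], grad_h[::-1])[::-1]
    h_prev = [0.0] + h[:-1]
    grad_a = [l * hp for l, hp in zip(lam, h_prev)]
    grad_u = lam
    return grad_a, grad_u


if __name__ == "__main__":
    a = [0.9, 0.5, -0.3, 0.8]
    u = [1.0, -2.0, 0.5, 1.5]
    h = scan(a, u)
    # Example loss L = sum(h), so dL/dh_t = 1 for every t.
    print(scan_backward(a, h, [1.0] * len(h)))
```

Because the backward pass is just another call to `scan`, any speedup in the forward kernel carries over to the backward pass, which is the design choice the bullet above describes.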

Compliance

  • All seeds train in ≤600s
  • All artifacts ≤16,000,000 bytes
  • No SLOT, no pre-quant TTT, no n-gram cache

Attributions

See records/track_non_record_16mb/2026-04-30_TTSM_TernarySSM/README.md for full writeup.

First ternary SSM submission. Mamba-1 with B/C projections quantized
to {-1,0,+1}. State is protected: ternary boundary at gates, not
in hidden state. Reversed-scan Triton backward (26x speedup).
11M params, 12MB artifact, 8xH100 SXM.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
dd-dent changed the title from "Submission: TTSM — Ternary Selective State-Space Model, 2.0032 bpb" to "Non-record: TTSM — Ternary Selective State-Space Model, 2.0032 bpb" (Apr 30, 2026)
dd-dent changed the title from "Non-record: TTSM — Ternary Selective State-Space Model, 2.0032 bpb" to "Non-Record: TTSM — Ternary Selective State-Space Model, 2.0032 bpb" (Apr 30, 2026)
dd-dent changed the title from "Non-Record: TTSM — Ternary Selective State-Space Model, 2.0032 bpb" to "Non-Record: TTSM — Typical Ternary State-Space Model, 2.0032 bpb" (Apr 30, 2026)