@@ -0,0 +1,22 @@
# Mamba-3 Hybrid SSM + SP8192 + Legal TTT — 1.1473 bpb

Non-record submission. See PR description for full details.

## Run

```bash
# Install Mamba-3
bash setup_mamba3.sh

# Generate SP8192 data (~35 min)
cd data && python3 download_hf_docs_and_tokenize.py \
--output-root . --tokenizer-config tokenizer_specs_8192.json --skip-byte

# Train + eval
VOCAB_SIZE=8192 NUM_LAYERS=7 NUM_ATTN_LAYERS=2 USE_BIGRAM_HASH=0 TRAIN_SEQ_LEN=4096 \
WARMDOWN_ITERS=2600 WARMDOWN_SHAPE=linear MUON_EQ_R=1 \
LATE_QAT_THRESHOLD=0.15 USE_GPTQ=1 QUANT_BITS=6 QUANT_BITS_EMBED=8 GPTQ_NUM_SEQS=32 \
EVAL_OVERLAP=1024 USE_LZMA=1 EVAL_TEMP=0.9 \
WEIGHT_DECAY=0.04 MUON_MOMENTUM=0.99 MATRIX_LR=0.025 \
torchrun --nproc_per_node=8 train_mamba3_hybrid.py
```
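The run command passes every hyperparameter as an environment variable. As a rough illustration of how such a command maps to a typed configuration, here is a minimal sketch. It assumes the training script reads these settings from the environment; `parse_run_env` and `RUN_ENV` are hypothetical names for illustration, not taken from `train_mamba3_hybrid.py`, which may parse its settings differently. Only a subset of the variables is shown.

```python
from typing import Mapping

# Values copied from the run command above (subset only).
RUN_ENV = {
    "VOCAB_SIZE": "8192",
    "NUM_LAYERS": "7",
    "NUM_ATTN_LAYERS": "2",
    "TRAIN_SEQ_LEN": "4096",
    "QUANT_BITS": "6",
    "QUANT_BITS_EMBED": "8",
    "MATRIX_LR": "0.025",
    "WEIGHT_DECAY": "0.04",
}


def parse_run_env(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: convert string env values into typed settings."""
    num_layers = int(env["NUM_LAYERS"])
    num_attn_layers = int(env["NUM_ATTN_LAYERS"])
    if not 0 <= num_attn_layers <= num_layers:
        raise ValueError("NUM_ATTN_LAYERS must be between 0 and NUM_LAYERS")
    return {
        "vocab_size": int(env["VOCAB_SIZE"]),
        "num_layers": num_layers,
        "num_attn_layers": num_attn_layers,
        # Assumption: every layer that is not attention is an SSM layer.
        "num_ssm_layers": num_layers - num_attn_layers,
        "train_seq_len": int(env["TRAIN_SEQ_LEN"]),
        "quant_bits": int(env["QUANT_BITS"]),
        "quant_bits_embed": int(env["QUANT_BITS_EMBED"]),
        "matrix_lr": float(env["MATRIX_LR"]),
        "weight_decay": float(env["WEIGHT_DECAY"]),
    }


cfg = parse_run_env(RUN_ENV)
print(cfg)
```

Under that assumption, `NUM_LAYERS=7` and `NUM_ATTN_LAYERS=2` give the "5 SSM + 2 attn" split described in the submission blurb. In a real script, passing `os.environ` instead of `RUN_ENV` would read the values set on the command line.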
@@ -0,0 +1,6 @@
torch>=2.9.1
triton>=3.5.0
mamba-ssm>=2.3.1
sentencepiece
einops
numpy
@@ -0,0 +1,11 @@
{
"author": "mradassaad",
"github_id": "mradassaad",
"name": "Mamba-3 Hybrid SSM + SP8192 + Legal TTT",
"blurb": "7L Mamba-3 SISO hybrid (5 SSM + 2 attn), SP8192, 25.2M params. AR GPTQ with INT8 embed + embed Hessian. Chunk score-first TTT (SGD lr=0.010). Stateful-overlap eval.",
"date": "2026-04-15",
"val_loss": 2.96361204,
"val_bpb": 1.14730259,
"bytes_total": 15930354,
"bytes_code": 104754
}
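The reported numbers can be cross-checked against each other. The sketch below assumes `val_loss` is the mean cross-entropy in nats per token and `val_bpb` is bits per byte over the same validation text; neither unit is stated in the metadata itself. Under those assumptions the two fields imply an average token length in bytes, and the difference between `bytes_total` and `bytes_code` gives the size of everything that is not code (presumably the compressed weights).

```python
import json
import math

# Fields copied from the submission metadata above.
SUBMISSION_JSON = """
{
  "val_loss": 2.96361204,
  "val_bpb": 1.14730259,
  "bytes_total": 15930354,
  "bytes_code": 104754
}
"""

meta = json.loads(SUBMISSION_JSON)

# Assumption: val_loss is in nats per token, so dividing by ln(2) gives bits per token.
bits_per_token = meta["val_loss"] / math.log(2)

# Assumption: val_bpb is bits per byte on the same text, so the ratio is bytes per token.
bytes_per_token = bits_per_token / meta["val_bpb"]

# Everything in the artifact that is not source code.
non_code_bytes = meta["bytes_total"] - meta["bytes_code"]

print(f"bits per token:  {bits_per_token:.4f}")
print(f"bytes per token: {bytes_per_token:.4f}")
print(f"non-code bytes:  {non_code_bytes}")
```

With these inputs the implied average is roughly 3.73 bytes per token, which is a plausible compression ratio for an 8192-entry SentencePiece vocabulary, and the non-code portion comes to 15,825,600 bytes.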