
[Non-record] H-Net MAMBA Outer-Layer Ablation: OL2 collapses, OL1 converges to 1.5194 INT6 BPB #1757

Open
aiejvn wants to merge 1 commit into openai:main from aiejvn:submission/mamba-hnet

Conversation

@aiejvn aiejvn commented Apr 20, 2026

Investigates the effect of MAMBA outer-layer count on H-Net training stability within a fixed 9-layer budget. Increasing from OL1 to OL2 (2 enc + 5 main + 2 dec) caused catastrophic training collapse at step 2000 — val BPB spiked from 1.81 to 2.53 in a single validation window — while also running 2.2× slower per step (~9.2s vs ~4.3s). Reverting to OL1 (1 enc + 7 main + 1 dec) stabilized training across all 20k steps, converging to 1.4770 float BPB and 1.5194 INT6 sliding-window BPB. Val BPB was still decreasing at the 20k cutoff, suggesting longer runs would improve further.
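For reference, the two layer splits compared in this ablation can be sketched as follows. This is a minimal illustration of the fixed 9-layer budget; the class and field names are hypothetical and not taken from the actual H-Net codebase.

```python
from dataclasses import dataclass


# Illustrative sketch of the ablation configurations; names are hypothetical.
@dataclass(frozen=True)
class HNetLayerBudget:
    enc_layers: int   # MAMBA outer layers on the encoder side
    main_layers: int  # main-network layers
    dec_layers: int   # MAMBA outer layers on the decoder side

    @property
    def total(self) -> int:
        return self.enc_layers + self.main_layers + self.dec_layers


# OL1: stable across all 20k steps (1.4770 float BPB, 1.5194 INT6 BPB)
ol1 = HNetLayerBudget(enc_layers=1, main_layers=7, dec_layers=1)

# OL2: collapsed at step 2000 and ran ~2.2x slower per step
ol2 = HNetLayerBudget(enc_layers=2, main_layers=5, dec_layers=2)

# Both variants spend the same fixed 9-layer budget.
assert ol1.total == ol2.total == 9
```

The point of the fixed budget is that OL2 buys its extra outer layers by shrinking the main network from 7 to 5 layers, so the ablation isolates the outer-layer count rather than total model depth.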

