MSLTM was improved #2

Open
emanuelbertey wants to merge 19 commits into thmasq:master from emanuelbertey:master
@emanuelbertey

The mLSTM was improved, the BEL tokenizer was added and stabilized, and text generation tests were added in 3 modes.

emanuelbertey and others added 19 commits January 13, 2026 16:43
The mLSTM was improved, the BEL tokenizer was added and stabilized, and text generation tests were added in 3 modes.
Update: xLSTM Hybrid Architecture and Character Emergence

Core Stability: Stable execution of a 3-block hybrid (sLSTM-mLSTM-sLSTM) with state persistence. Successfully bypassed gradient instability in the mLSTM block.

Metric Performance: Reached ~26% accuracy using a 1024-token BPE tokenizer.

Structural Learning: The model retains formatting well, including autonomous generation of new character identities (e.g., "KALINA") and consistent archaic linguistic suffixes, which suggests it has learned high-level morphological patterns.

Specs: hidden size 256, 2 mLSTM heads, input projection enabled.
xlstm
paper anfitrion
| Component | State | Fidelity |
|---|---|---|
| Memory | `[batch, heads, head_dim, head_dim]` | ✅ Matrix per head |
| Gates | Scalars per head | ✅ |
| log_weights | `F[t] - F[k] + i[k]` | ✅ |
| m_t stabilization | `max(max_k, log_initial_contrib)` | ✅ |
| Numerator | `weights * qk * v` with scaling 1/√d | ✅ (practical improvement) |
| Denominator | `(n_parallel * q).sum_dim(3)` with `max(\|·\|, 1)` | ✅ |
| State update | `last_scale*C_0 + sum w*(v@k^T)` | ✅ |
| Output gate | After normalization | ✅ |
| Multi-head | | ✅ Yes |

sLSTM:

| Component | Paper | Code | Line | Fidelity |
|---|---|---|---|---|
| Gates | ĩ, f̃, z̃, õ | `i_log, f_log, z, o` | 166-169 | ✅ |
| m_t stabilization | m_t = max(f̃_t + m_{t-1}, ĩ_t) | `m_new = m_prev_plus_f.max_pair(i_log)` | 171 | ✅ |
| Stabilized gates | i_t = exp(ĩ_t - m_t) | `i_exp = (i_log - m_new).exp()` | 174 | ✅ |
| Stabilized gates | f_t = exp(f̃_t + m_{t-1} - m_t) | `f_exp = (m_prev_plus_f - m_new).exp()` | 175 | ✅ |
| Input content | z_t = tanh(z̃_t) | `z = z_gate.tanh()` | 168 | ✅ |
| Output gate | o_t = σ(õ_t) | `o = sigmoid(o_gate)` | 169 | ✅ |
| Cell update | c_t = f_t ⊙ c_{t-1} + i_t ⊙ z_t | `c_new = f_exp * cell + i_exp * z` | 178 | ✅ |
| Normalizer | n_t = f_t ⊙ n_{t-1} + i_t | `n_new = f_exp * normalizer + i_exp` | 179 | ✅ |
| Output | h_t = o_t ⊙ (c_t / n_t) | `h_new = o * (c_new / n_stable)` | 182 | ✅ |
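
To make the sLSTM rows above concrete, here is a minimal scalar sketch of one stabilized step in plain Rust, with no tensor library. It is an illustration under stated assumptions, not the code in this PR: the `SlstmState` struct, the `slstm_step` signature, and the `1e-6` floor used for `n_stable` are hypothetical (the listing above does not show how `n_stable` is derived). The local variable names follow the Code column.

```rust
/// Per-unit sLSTM state: cell c, normalizer n, stabilizer m.
/// (Hypothetical struct for illustration; not taken from the PR.)
#[derive(Debug, Clone, Copy)]
struct SlstmState {
    c: f64,
    n: f64,
    m: f64,
}

fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// One stabilized sLSTM step for a single scalar unit.
/// `i_log`, `f_log`, `z_gate`, `o_gate` are the gate pre-activations.
/// Returns the hidden output h_t and the updated state.
fn slstm_step(
    s: SlstmState,
    i_log: f64,
    f_log: f64,
    z_gate: f64,
    o_gate: f64,
) -> (f64, SlstmState) {
    // m_t = max(f_log + m_{t-1}, i_log)
    let m_prev_plus_f = f_log + s.m;
    let m_new = m_prev_plus_f.max(i_log);

    // Both exponents are <= 0 by construction, so exp() cannot overflow.
    let i_exp = (i_log - m_new).exp();
    let f_exp = (m_prev_plus_f - m_new).exp();

    let z = z_gate.tanh();
    let o = sigmoid(o_gate);

    // c_t = f_t * c_{t-1} + i_t * z_t and n_t = f_t * n_{t-1} + i_t
    let c_new = f_exp * s.c + i_exp * z;
    let n_new = f_exp * s.n + i_exp;

    // Assumption: floor the normalizer before dividing (value is a guess).
    let n_stable = n_new.max(1e-6);
    let h = o * (c_new / n_stable);

    (h, SlstmState { c: c_new, n: n_new, m: m_new })
}
```

Starting from a zero state, calling `slstm_step` with `i_log = 1000.0` still yields a finite output, whereas a naive `exp(1000.0)` overflows to infinity in `f64`. Subtracting `m_new` scales `c_new` and `n_new` by the same factor, so their ratio is unchanged.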
hfggh
ff
mlstm, mingru, model, block, slstm