MSLTM was improved#2
Open
emanuelbertey wants to merge 19 commits into
Open
Conversation
The MSLTM was improved, BEL tokenizer was added and stabilized, and text generation tests were added in 3 modes.
Update: xLSTM Hybrid Architecture and Character Emergence Core Stability: Stable execution of a 3-block hybrid (sLSTM-mLSTM-sLSTM) with state persistence. Successfully bypassed gradient instability in the mLSTM block. Metric Performance: Reached ~26% Accuracy using a 1024 BPE tokenizer. Structural Learning: The model demonstrates advanced formatting retention, including autonomous generation of new character identities (e.g., "KALINA") and consistent archaic linguistic suffixes, proving high-level morphological pattern recognition. Specs: Hidden size 256, 2 mLSTM Heads, Input Projection enabled.
paper anfitrion
Component State Fidelity
Memory [batch, heads, head_dim, head_dim] ✅ Matrix per head
Gates Scalars per head ✅
log_weights F[t] - F[k] + i[k] ✅
m_t stabilization max(max_k, log_initial_contrib) ✅
Numerator weights * qk * v with scaling 1/√d ✅ (practical improvement)
Denominator (n_parallel * q).sum_dim(3) with max(|·|,1) ✅
State update last_scale*C_0 + sum w*(v@k^T) ✅
Output gate After normalization ✅
Multi-head ✅ Yes , slstm: Gates: ĩ, f̃, z̃, õ i_log, f_log, z, o 166-169 ✅
m_t stabilization: m_t = max(f̃_t + m_{t-1}, ĩ_t) m_new = m_prev_plus_f.max_pair(i_log) 171 ✅
Stabilized gates: i_t = exp(ĩ_t - m_t) i_exp = (i_log - m_new).exp() 174 ✅
Stabilized gates: f_t = exp(f̃_t + m_{t-1} - m_t) f_exp = (m_prev_plus_f - m_new).exp() 175 ✅
Input content: z_t = tanh(z̃_t) z = z_gate.tanh() 168 ✅
Output gate: o_t = σ(õ_t) o = sigmoid(o_gate) 169 ✅
Cell update: c_t = f_t ⊙ c_{t-1} + i_t ⊙ z_t c_new = f_exp * cell + i_exp * z 178 ✅
Normalizer: n_t = f_t ⊙ n_{t-1} + i_t n_new = f_exp * normalizer + i_exp 179 ✅
Output: h_t = o_t ⊙ (c_t / n_t) h_new = o * (c_new / n_stable) 182 ✅
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The MSLTM was improved, BEL tokenizer was added and stabilized, and text generation tests were added in 3 modes.