diff --git a/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/README.md b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/README.md
new file mode 100644
index 0000000000..b46151d951
--- /dev/null
+++ b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/README.md
@@ -0,0 +1,54 @@
+# Record: SP8192 + VarLen Attention + Doc-Independent LoRA TTT + Fused MLP — val_bpb 1.0777 (3-seed mean)
+
+**val_bpb = 1.0777** (3-seed mean, std 0.0003) | **~15.99 MB** | 8xH100 SXM
+
+## 3-Seed Results
+
+| Seed | **LoRA TTT BPB** | val_loss (nats) | Artifact (bytes) |
+|------|------------------|-----------------|------------------|
+| 42 | **1.0775** | 2.7834 | 15,991,008 |
+| 314 | **1.0776** | 2.7834 | 15,993,539 |
+| 999 | **1.0780** | 2.7845 | 15,991,008 |
+| **Mean** | **1.0777** | **2.7838** | |
+
+Merged SOTA (PR #1493): **1.0810 BPB / 2.7920 nats**. Delta: **-0.0082 nats**. Clears the 0.005-nat threshold by 0.0032 nats.
+
+## Novel Contribution: Fused Triton MLP via importlib Wrapper
+
+This submission integrates the Triton TMA fused MLP kernel from PR #1523 into PR #1536's VarLen + LoRA TTT stack using a novel **importlib-based code loader**. Triton's `@jit` reads each kernel's source through `inspect.getsourcelines()`, which fails when the code is run with a plain `exec()` from a compressed wrapper, because no source file exists on disk. Our solution:
+
+```python
+_s = importlib.util.spec_from_file_location('__main__', temp_file)
+_m = importlib.util.module_from_spec(_s)
+_m.__name__ = '__main__'
+_s.loader.exec_module(_m)
+```
+
+The wrapper writes the decompressed code to a temp file and loads it as a proper Python module, so Triton can JIT-compile the kernels while the submission wrapper stays at ~30KB.
+
+## Full Stack
+
+1. **VarLen Attention** — Flash Attention 3 `flash_attn_varlen_func` with document boundaries (PR #1536 @dexhunter, PR #1530 @samacqua)
+2. **Doc-Independent LoRA TTT** — LoRA adapters (rank 96) trained per-document during score-first eval, no inter-document dependence (PR #1536 @dexhunter)
+3. **Fused Triton TMA MLP** — `fc → LeakyReLU(0.5) → square` in one kernel, +5% throughput (PR #1523 @abaybektursun)
+4. **Triple Depth Recurrence** (L3-5, 17 virtual layers) + **Parallel Residuals** (L7+)
+5. **Parameter Banking** + **Muon 0.97** + **QK-Gain 5.25** + **SDClip** + **Brotli**
+
+## Compliance (Track B — Score-First LoRA TTT)
+
+- LoRA TTT: each document is scored BEFORE its LoRA weight update. No inter-document dependence.
+- No SLOT, no hash embedding, no pre-quant TTT, no n-gram cache, no ETLB
+- All four conditions from Issue #1017 satisfied
+- All artifacts < 16MB, train < 600s, eval < 600s
+
+## Reproduction
+
+```bash
+pip install brotli python-minifier
+MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192 --skip-manifest
+SEED=42 torchrun --standalone --nproc_per_node=8 train_gpt.py
+```
+
+## Credits
+
+PR #1536 @dexhunter (VarLen + LoRA TTT base), PR #1523 @abaybektursun (fused Triton MLP + banking), PR #1530 @samacqua (VarLen concept), PR #1394 @clarkkev (SP8192 + SDClip), PR #1493 @bigbag (merged #1 hyperparameters)
diff --git a/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/submission.json b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/submission.json
new file mode 100644
index 0000000000..b2243c2c76
--- /dev/null
+++ b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/submission.json
@@ -0,0 +1 @@
+{"author":"aryanbhosale","github_id":"aryanbhosale","name":"SP8192 + VarLen Attention + Doc-Independent LoRA TTT + Fused MLP + Triple Recurrence + Parallel Residuals","date":"2026-04-11","track":"10min_16mb","val_bpb":1.07769581,"val_bpb_std":0.00025118,"seeds":[42,314,999],"seed_results":{"42":{"val_bpb":1.07754544,"val_loss":2.78341179,"artifact_bytes":15991008},"314":{"val_bpb":1.07755689,"val_loss":2.78344135,"artifact_bytes":15993539},"999":{"val_bpb":1.07798510,"val_loss":2.78454746,"artifact_bytes":15991008}},"hardware":"8xH100 80GB
SXM","pytorch_version":"2.9.1+cu128","technique_summary":"SP8192 + VarLen Attention + Doc-Independent LoRA TTT + Fused Triton MLP (importlib wrapper) + Triple Recurrence (L3-5) + Parallel Residuals (L7+) + Muon 0.97 + QK-Gain 5.25 + SDClip + Brotli"} diff --git a/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_gpt.py b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_gpt.py new file mode 100644 index 0000000000..a3ffbce832 --- /dev/null +++ b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_gpt.py @@ -0,0 +1,14 @@ +import lzma as L,base64 as B,importlib.util as IU,os,sys,fcntl +_c=L.decompress(B.b85decode("{Wp48S^xk9=GL@E0stWa8~^|S5YJf5;lx~8@?8Kln@VT6Qap3Y@@2YR==kt^V3b|z;4_%p4p-u~Q2v~X2+~vf(+18k!VGi-te#rHhO<(rinAzxYrg~~QZ7*9X?;Iq}Z_LW|y7TL&VA+w*HJF8+d(x^`on(EEI8AxJ`9@%CIQ+*n#Ev)%TL=J)M1$4gXanYH?YGbjKEZ4W7r4pR1e(HfkH&nT!6DC7lPUHu)tn%K6?ATwT{Y)24wUn3gT2DIlcIL2a^LAMPQ}A*}Hu6Xm-O66MhR4e5spc0G<2XSF{k&7nIrK>TfSE!!vBLP0!rbmA~D^3iUQeaMm95NN#M){yK@VMw-og(j-C54KVC`)V7DMDl5hfg6W`{h<<)1lh%}|0E-t*a<%Z~HiPhR!DEI>6jpn(fbr!1P`@cHAc7C;2hB0e5in+3fFL6kc~+d219diY(Awy>zA5W;O$DtIVtDH77aer@-fZ&~FiL<>kehUoH|BcZA$M#rCyO1DN;pOb44IwQbUX&5w3)>6(p?_Htj;@=&l(3A9{uGV9RF(4AkZ2$w$xiD?PFLi=GJduHKv*9eke0ez!7>>xTuPi_5|?cM~{p=>NX@&B~YHk-Xkh3(VGn~+FnA_6?JvTdQ)aGUh5o5rX@H%7=`$KpE-l&9*zflPQmJ@)=*k#2UQA0wx{-RBE)bnI7Bh8$#nSEP+}i0U8l=uaPw4eZyUhty-7reod)?1`7W9L-H*P))ezQQeJeE;F4R?t;vJcB*h0-P2kAg%^b)4g7{iG$rp0D2&ms6LW&Ch}9@%*P9C}k@2{A3H+Tz&8#sodiZ3WDK_mKHO$)P*8{`tPH)c}V-@1p|QIb&1vk-5{yoU|xk&YBZ0l}bPm-X#k0klw#x1VTfL;GgOV^>H`S^be94P7!O*ws_Z_3zVC680@$gV~K|DZge&UoAY2z{hvK>sO=rb|+LAx5@a!4M)Pyj6*E^fCnf2ifVC|mmB>CCM*Q^we8T1A&J<#;B+IJ2+}}fiXcBC<U&kIS?{7f|Uxv$f9+O;#`>MM_NUDgXf%j-vf4g;$+_QgCHG=YKtaf}`K`*3lEUer#kzTo&t(-t-WSdG|HxdSIjiYXE-~O$m}TE_mX-X8(YVcQ%p{QamT&N`923CY&54W>73?Ig7+ov8McUTg~R|_B|1hm&CrzJsMJ1UXB5u)G3M{_auRx8ZP76#a{z$V%buS+nYP7^*TfVa2~5nB!dWjxo$1m@pr2$U2O04kcb$UZzQyVnrx&=5E9AW)-S#BrCvV$nL?tnwHDw@6Wy6SGxaYd1>eKv_O_r~;P^YDOHU`EOUcSa6YOlB;
U0JZO8)#oJej&~uXCpKTs>fyAW=zZ;%i{zw6@&d1ZXRA;XoILC+{uUn-3AXC=5lm9#(crqR~vk)0|otMWn(!4x8w5D77zjN9YcsM%}>fa)XFM(wZO(K1oppD803iMv%G9uR_Q0r#QD1Ag8~EmrA{MpD@3$w$cL>(5+as!iR_ZcmyvJzH>_V|+4pb&|`;Q**w1u>DVZEWD@&?lEKEu7LjumiP=wIID)wn`5NGmA8^pzsagwuRO0hp_wK>>{sOz?4OvJ^OQ*bN$aue8-~h%T!=((_x$q&Ew^+!04bJi*3U_K=Su1qJ5Y&$-CXtw|c2=A~DTT}M4@MIqo)xB@ooYJV2rT94gOukzC;rj^2dmo{fePGwu#xN*8K90L;ySxc7O&@I#Q9T1wEFQ$qMGSAD?P~72iewWgG5qF&3*Q`RHLJK0iy}*-lezl6tOYC19msCGprXU0x?0%EOl5}%(E|T_Mqnb%JToQ@nDBRMHULylgU|5ns!E+vvDm3Mrm;rp_-Yk6P1?9XLIP{&S4DSRuq`tYi^IcS)wHY96jAUyXMBApBE|+|u-Y=`Txcr+;4wtN^BZx@%r~Wx#dCNxh0uDCdJF;$oy)eO{Sh9+-MK+3WBntLrH9CyC8tdyZ*uWjw|_u;EO}+;DM1O(qKTW8$x^f0-QUbl*6N-$^E7-TR_D$w30>TCJ^7A-MhlaGx*`9;yo(qE#P}c96Zt#ukFI0XAWVr&o?{ia?}_gvjiScJc1R5yaxMCpgRoQ1WvL}ftn;*JsEfGj5d(9e;#*~pf#N0LZeU?Eo6sHOUM$NRPZvnyP7RaFo>{(UpX8IPT++6aliPIV(;zHqN#bx;XxJH=@18G&<9*@VkksXB{EX<-EG5`IXGw7r7RRdv8g??=4ipJ%YPN!GOnRJ{0u$&eWAzJFDLL$WQ}QTbox^vu*uM!Ldg~op;Hbz!Df)+w^NPtEn&ic%fkYScM7Otso05gnJeL|%JY$>CwH=IW$^JkT$;3{79nuLwD=RqyQsAuaV!4my4LridlO%i$-dAkN>?DyhV_!{3?;3__najJfP<ktfu%C9h>)y7T~?Ru)AWxQ@hO_Az1P{u1PZ3pfYtrTb`W}voP?8fq{Q9_F52#EKPm~`;tOCvU`I)VceCVWvNJS%qXb451EVkk>1kM=z~aD1qDstiQqdMbXsF&n>b<3VK3cU?7>7?ZHg;04q#CT(oY2Uw;LeK}*y-FLLH^5?%MwaV;wiG4Aw)>1$>y0jK|+6Q(oF+acd5A_F0(ODe@)Fvmm;0|<0l-J!1pl4E*L99X{B#TRDFZ2u!-o!p()ZYc~;&%*gwlzsv2`ykc7igI_5TQIo3^7Og4peaGD!F}@!fFSKWF(o8=wm^r>I~1$dJN<|_+bJ7K$T6}jZ6ho{a2q$7_28H4eHaA2EhKd#knbpPEthTF(-=(1dJNq&!^6SPdQH^ghbW;G~BAmqnv9;`sp9BEYUc5N%>ynw*@{XK+a82YY=92o{Y+MQ+rXworEUcNa011H)FCsh$1MND;}6ahgny#dYfdb2H$hop`J$L8`#sPhF9%5eEu{_q95&a->Wc=N)f89$Ug*1kfac_Wg5clKaWP>&v1KmaoX=HW*oKqry7J8lXw7gn0YLO=vewXkJc36axJPl(*d{Yq1)zi7gr?S+||Ggqx||14Cv9Ww}D;Hl1PUdG93Qnr#75NpNlyf8c+gaSJ`>&GukXD`&E5A#)omwu89RJUtgYZT_*EzWtSfVAkh{9do2WIx_FXWXWo%-`=hg2QX#fBJr{^0ss%j9NX~qcN0DNUP=w8D@^;>``AE}L`5{^o}9YR#kpRA6TwKiBN6aziWPE)=nae8^D07bW@u}Rq44NnHhcmwMC;?^zeJJ6B?aefv#GaJhy$}k$g||g`knBz}rQtuC2|?_ejyh3YT&YmLLy5StKbWpn-V#m?|=f{f;!vojCkKUFh`XpiY!MZAh{ELo_zUXQJ%ZW57obx@jMoY^PcgA%{WCvhBbei{|P?*ou6#A!MjV5(@I0A
s@F>$)j9BNAWy=w$xhq57m9sIimZ~3282SiRM7<4-ersZU4hLL^HtDqolB>OR=Ls+KH~VH(H@TUFts`SNejF0f#|xgK|K&j+N`l(wL1q#JUWT&iA(Vp5VyLPS68OY<7H_usyHWyrJddbyiR4A`z@2tKh~OY)|?RT+`~A(7#T12cveho*^_zd%mtD@A5VmZB^>JJ}B@gZB&qmq10CqmGF~i$YYz*5xDi}l`^wQtO^ui7b|l@UUZ0S=E?8en&ieXo_blS53{v%kxbKVzO3OI?*WbI(Y8`qx2CJDW;c(PM6Bxi=_f9bx9Ok5p!fS`Evuk%v4zAJ=T4YVI@w#PkbmGklA!NfLtj0_6|NCdkS8~_JqNaD@uGd%bAIk)QUYjp3hI3aTd<{Qa^Q_yB?oYU~{>N8J3YeIDfv*(U5J6E~D;5hz-`!jdOLnNw8)fWOMQB_vEo|IJI52%>j%|l{;zTm-@4eajlD+&uFL0dPN(Qf3LGCfbk@7+O&%Ip~=B=~2udn@O`))$3Ch|tp^NqcDbsh=FMIhUKvii?0oU(wL`Mgu{lYOs%)f&LakdlGIO^4BTT0Vrnsq-PWT=a@P)R$0Zwq%p3@-~&1}igC`MFq{H!ejcg6daBVla}nf@m{t>(vs{`<tvPwt)Ep(*8jclfO+=LchCaGzQtQxhpqeMurL1sPA6ho>gcNo-4HYUBGE=TK8y*(%t=GiWYt4{=a)E+PKG3|P3NoeuFM@%;sokmY6wEUC)FeP6Ay^Q;w&Z7LyyoAzr5bx$?2Z*vZb}W!Cf};sptW!&O6kct}qNTsmj%Uq$BDLS8@;y$u^v>f1+MAa}ihOF|jby18R|!DAV$46@1GpG`dIldJ|Qzy=r{wq1THv_6q50e8HHmyw=QI#oYuoM%{I&nLnmsS;}RuaS&*TI>?7)pg2x-@qjYq_L8iYZ2AGbMQ9&JqUAJvvmDkvjGbNhrx0zbuitd@pBSjfg>QA>7d4ZhFgj#J?&damZhE=w(T=b*(e?`)OLdY=z}Ozg^PdEm)=zX+KB~j%8L*EFt}z2Z;215WyAcGf-Q4qukSy?e~~nL39z6v-8w3l9b}wK$dE6PRuSNk0vkfE#(m%nI1}e>dFKwvcMY|Do(=P>V1MpZc@T@jS}x$dIU7|s&9A?U3_QU2;G}YdGju%mR~wjggJ*xNM@m{s+;B$Hb)`(V{_jH*pO3pm?uvNMWD(k7H&4<=ppZ;Gk$OxuOsbMQK9CJVfjE?u@6CoeL)9x^O26Nk{*#A?({Fi8V^EXc2gf0tMSaelW4cBzp^3=}Uo03eHb-XzykK=nCu~{q%;~&pfaEh{&prY9Y=FLdwJs2pj>VaSKl1@-DVSx(JRpfz>xr6VWqN_!(;F=3nCXX~W-)&v)N-(tCP)H?IUD5L_0>G1NRk%V_8N^%cNHQRd@2Ig>*a{r9FL&XBJ62mq|&9Js6yC+_P5^$1w`!qcdM-g!b;FKYo6(rP}d1_>GFpR|A`mHU#2B=!ya;RU|y~mJOzRDx}%X00fi+i>M|VVr#40cTFkE!lQ8r6eG#z?KO@MiW9yiOAQSRxUt<51Z}=2|xiI($0CFDAjm8x}yMi31y{6FSK}*F&?9_nOwAMAeR3^$C@3{*&9}IQ~p~S&~*q`U7Yc_HVwePG0a=URpJhPU>=v}6@a`5XjQ&Jn1IKCUYd#?!ng-thDH?e^O>QMJfg>9>GCW>-B_fb^k$cesLXu-iLmF3_XwVw!BQ`f&Wi{VOrzAU&8O=qg=O^u@og`-{RHquzkzO-pBH_MxRb7=mLrYw=`6UOGTkI%(b`N%Hdiie%)4bxqs2zoZd%g$_+raiq#Czw-)8FgR|%|i;x3%^o!ez4yrVWWS*mX@boyV`XMGrJQ?CC(9G%-3U9wx*7A(tBnrsP32=KuYWT4`em9sk!A->4In-%tlU6bFSGMXV?GtEJw8H=HQuY;i8@7*sy>KOWK2xxt-T&G$R1Eo-qk6Wm4Erorjb+X^!4kW@YXS0`4o)Lv)W-?H}CuwTN0yxpEo>o
Jx+)kpVsS50k9W;}L6VVw!6rkU}q9VJ1cQ!~43r#&dG+j6M&YU@U@aez-;SqIfXk{N^E!GV$XM;=9tnQi-=c)cf>f_$bE=E2Lwx*h5NCrGOqzxsbc=G(CbY(8J@IRS`kj8#Ps2U^4t3Aj~)YeHrSVji~`Ba@lnt+4rep9XXJTQh+d^%s7CS&Ncq)-&bSuz0e*Oilq-0+Q=^$AR*VB-TnZOH^{y3O%L=1hUBsGPY~+wDjkIQyCe`;uq6t|`Np$(BMBt8CVbyFp)Ej*16<^O6-}lD90y<#;^ihCNGrHsc0_QKt()pouG7YO13Wev%Q$afl+frn8y(nv|yXJG`jcyNRjj(ZcV-*U^#*qIW#hy-bCM*Rpj{5|;UNt~s13h;OuKNDn#2~z6EaRe5!a_?ED6S?i){YizKh^IbA<^*SS>fijU9|XuXHmzE+ZN44}Ofhr`3XKD;k>60PW&AaZH1=^aBa}2Bh5{~@ehFcEkA|zom7+1`A)GzV}IzZo2O!)Vv($u72MT4y!k8fHs%oOX#%NIv|S}S-#YPcnd8Jy~2QQGu=t4#&29lngc&*pm%l}ZQ0;&^@RP0_OXDS|Xwf#8T#f7+B|hT6$)zrTRMtrhyNPo;wWl9v^IXYQwxUPC%$-0AZoA%Vj0(VvK64(%_6wJZbCS6L~+NExOU+p9%V(^G@TmNr(Sz6DkiY{vmiQpuyUTBK&E5}XXbjP@0gqrAV>_jJt#fo3|60e%nTIz!jbwDT=~xjrka_P5pr?>}RR6ebl|O=T&69fGC*gm9_)1+nRf9g7%me5+DAV)4nGztMs$z%>QrUm0Bk3UdXyCANb|XP{33WvMAk@UC(c-H`bHFrzNe-oCHZAdqCNWiAVsheVz0FYW8W^iiA&hxT2uP>|~EEoW)E+nuOzZou$iV!v%;M-kJi&|QvC)e%65{&64qhJ)vWZg7^42bl5-wS@G_OtO4lOPc2az8Y#(@}B=c904yXT>;r7#H^_FW@W3aW263^O$xC>%27hZMl;COTvdn8Tf)w`$~wIRi)78@#a%Nu|NjQpJau0jB&DSRQzU0ga6uZCD%Ft~#S4J8Qq=V-;R-qF`wLaPopxY0o2NpkAEg@LvM!KfyArNg?fd^uJp1k|Xanz!-``XV>TI{_`qKsNwb#c!tg(HQYQpx4I4X`Ji_-x$3T$tGp_BlXu>>v4&e(_35l*YdES=mSG%S7J(?zW|Y?on*NASegZFGVx>6z+V0bQy-u1Do09PHiz=>u|5(Go?!|6a`}A;d$JzVTXJ#3LsqAZ$K&wMe<7B0m&4>5ozJy92Q^1AVd3h*0za9dE5FL-P!gAdk^UhgVvsOU)jSgk|W?4&3@vT9J3);AIkOOxZfmxqqCuia)Eryt=Y8K;LE{HR9%vIN?sn*0EzVyzw|WZ%+>|(+`6vXINTxTlfrc}}BO?Ns7%J@l*>SV8Rh1^dfTvf>ILooXcw#QKhovjzXXs7r&VUJ-0R*lXl=<^}K$;#=wj+IR(lHX7ES#*4-IzaHj-_xE?NVD|3d(Y*;kAZreVLb30)HpVom*mnsKy6XfPP>tb|%;d`NbMjbRh`EsyH0BLO|3+MHrtmU_c-3B^YKI{HJ^69s#U2Bz_SIgJa@JU-I;ErHUJCmnW2-4r9~C8=18pzSk=*t*Nxzq>7CDl^ZRKjzMLLdi-F8z$2b;i3FnG5udw&_b4}8nwZw#Y$+>D6`@(jfO--F)rRHvieeR`kqK+=Kb9{{I}SN63yF8AR+`vT6fV()$U3})LY$3GLwm+T_1`|^Rfak07>EC1;qyI~nn`UfsV>)WdgKQ+^hi1eclzQS+v=Hz09ohXP^)ovNHgTw3$scuOs?&USrk{~C(j1qK$z`QnO`FJpJhI8U7bqFW2tulcErQ2#BNXwLk_>utb55IKzojdiU^ZXc(G>(Rfbb93<8Tm6cpy(%S|2S_IBlaGcANB0UNgL&cTTRb+z2kKo1BJUDy%pE*3^!*?^E6qTzNUb6vz1VS1>KA0qRyGauAYr>Eg0g(uZL)
bCe=s?K&S&FhEm)-S+Za!Q$$4hYH$@64o8H@@^S6PT=S)dwcJDs&Jrd|T0$5%=4q^7g$ImeGrZH^FyDDFgd@f6GsjMTZhNF$hRSxjQ($!z~zV85AMddQ|#AIKr+uPCdl;TV(JRe>9RBe1P!y|_UYe~V|hE*wnp<-zP38(RsP_Fa$_@h8QDpe}*Dx(i*BkD#wsty}`j25+!PXJE!KiWxH@(;3(SWRAF5~supy>;OQ^?Vu0u+l=R#PcC>eFHC$HMz5ag{U}hZ|LmwWbUVhx260!wVFDvKA+%(qq})}tZnP{p3EXr9mK5xe1zlUG7&a7?sBuHC6}--kh%aYd;^)OG_DC*R!UL3tiR&p#04c=I%ytAv<6FzQDvoX_G_I&fc!Pz;cDBbwnj0voEuM1~GGgaFkMYD!zutmPYamVIV{huy8Xr3~HXX1Dcv4-{vNDo5n%;)%F9`LU5^KXV0U23ah!Uz&~yNcWIjci!t3Eh9Ufs7rsEF|@cU_9z4$thyApQ-n5H}ua7-=E%^Gui!uJ?+K0m1U7EXgzlM6(CTr--?|=2)fEW#wyJ)cjy8Ie1UJ>#zJrT*ww)xUUbQJM6rprGxeI04y4uc<$e-~1yWO^y92>hS~{xGmC4L&;$1G?F$0{FE45s&@;VkzuQ)T`Mg>K(UfT#-djCQ*qyR@`J{$zL{Ga)NhAxmf?#k$veip`lBJm$Fe(L!d?;lUd@B52Ll`DqlN}~r#`;tx=RzVNnd!e@Mvy-8m^&5xfI1o>i_JBtc2|L#}UZqw8Zb72E1*w~*uI%DY3?7)ez!c_aBT`8HMG>nOI@FP2BJr3>AyXPBqKUI=Pnn@aer5^8)t~SGm-X`ad#k+DcnJM50Z~boNWs;^%eci|0MXwNwjgKan-dk%Ww@2?&E{n#Vu8g1ZZTvCL~BNMTRh??kHB|(45sIlpzgx6)Vo+CzjI4JT0sR#7p>>E`4JXXahas-E-X0&x2S$jhGn%NA3qcZ3k9YOHv#>^X%l1^IE|GH**ab2Zpw25e*sWvkCWb96G2@erJ$yoy60ybA(|&S4Bl#Ed#HLv9Na*xH6&;?e~?xEpQ;V?y4}M@K5^=Z9sN*IvNqcxa5pvGrN^>BfJ0hjEqg-GT|`Z>FsN9+%f~db^lhevX0)n^kisbtz$>YA%TWz>VO7E^L!pIq}-a62QcVW|lO#?!zvPB0CHe&3A?$>Yq(P<@eW01Z295^YN43DI)kG8L=?dkBn-&mlc8J)x2GKFH65RAwb@cYE1IEzGC(s#T!4)}lrQUf^cIrz2q{G5O1aCyXvzRs8_)EcgHwD$|_{LI~Y!`@qZCQEo@5*s|5>3)l>?35V4+3h!R9S4$_8CR?x#08LH;2S8q6)t!q;ZjEjT-_E-;8_lbJpaWl3L^tGUY!7tLg5ZYPcq@r(ikpj9nW{)4WOL01>jOmi?5sCW{<)H#@E3?ps2FB~VRht`=6nm^Am~a1a8&n7`NgA}AuyJ06>!TcwaOKOnt1;@@57xeGhErcuTl=!6sDO}(&rGnF9_q_+5ruJ?j9o#$C$`UWT7-w4N^cPZfFII^C|b}FYL3MGuG`)TJS-0ygSn2E6Q(~moU8F+aJ%6`Mz|qDHC&#}Mv$t>6afW0AyOK#vkMG0Xoy=fBw>H2?__@Jb{Qo4gOXY_Cp@c@LouPZ`aWxYrX&jOe2IL>5|?oDy;erKu3dsVKp#6z1*Uyw@cm(Yf~6R84J_UC?3+f(klUxlw&r@mP0hCN6pz_M^juKtgLrm#ddoQLL|8p1dh*lrAZ~!@q}(VUYp9V<4-AB`aD(A#F_$h?Lt1^2sHjcfwE%|@ybwafneXBoVx3{CGd*t9WBn@YCDUhDSdGXqqwI&QJPsAwMyc9Skj<9?)_!umxnk8ug3!M5+sy2B?%#TNrrfa$J*-L{IY76nL~?;|db0;;38>|?n6{ku6jNI;EveCCH}$L@IcY%!7mc%T5xE{lXN<3L%h$7$Vw?T@kUg($H(Rchi#VSC+Ib=A4iZKq$3t&rIn=u^0X3D0v^n
7V#w9u3Y_UwE^mEYaPOw$inA2d;KfOx;946LX%IP3@x=s$pwN9$iF~J^J+HjDpg{m6&{u=9OA&g0m@X-5B7ZM`@xD$uMQQOlu&{LIfZ8=nO{_8Pd@aglEk>%duLdZn>ksNwhM>Tdc1_`zvtk8BZomlJukHn*p1X%;lt4%I|G!JK=$;P7~)6#a8=-7`&4p?Sw7WY}wYK%oGLBe4~x&&yl&r_S~Y8V58uWm-KYvd>!*4A>j?&>;TZO(jeb1Md{Y6U?U?In=q)fIA7rKe^|M>2G;f8VW9hrU=889-aG?Zkz37Rv;OFGiQqN&3i*}pXEs0x<#s$#n*8n0vDlk*L0OCOYsU(xuYV4Db^U3Ju3lNs5K}cnp(fB9n0$WR2Pa|gUpB3>2bpA-2_{pximOCaBg@CPl5{7Pw^U@O2u}K;V0Xorv2-azx=JWd*E=RQWusT3R?`3MEs-*uQK~zGt=HDNnw1z_o|6M{1nM{|`>Xk$(@&aluQh2yr`yKqCBzaxvGW(ak?9f?1iA;pOqG23JH=C5_uj`xah>4LIuY0ub65nT0MY$74syIXuyI+8WS)Q1ss6@(Z^ZbrW>8^A78QQEiwUxG6EBn?aT7Jjlmw64ht~4Tb7T^Rx;4S_X&a0#G6s1$41+1S(7$gIRGzQ9eW|HwfGf_u*uJA=cn{F5ckP2ezI5&5}gLWlla>A%$09RBX5yJM8FygdT0*jdKO8Rt(Z2G`!SBho_JmwtdPlq7fLz}K)EesDX*E0{I@Zzi&)2;iU_fL*Q{IO*#l`sBJRTMKfV{qo4%rwV*ZE^POPWU(fP!=<>*eyElmgajrC`)v?S5Ii3q45>^~>k6d701T9C%SnibqTa&qx*MHq7{E3pE2Tz7zj$i-{OlDRqS%kJ{=Ie?aE*yFJ_!PG~ypFbX6m5c8%?=j&kf|k=(60lUBjDg$@ElT{rpT&xe&d8@Km(+BlKIIbdc92Kb^F0Vv$OpAF1j(7vPNI(pZrO7yly5X!lcKeXMcIJ_V{ruP*O^Z*Ozh3GUd5Y=mP&%ZG(5Nk`=OtCaNU?nIAxnpBbs94x-+M+P*>dW0Yr+7}+lx4SJJ7s&qO)uEYRljt)=s%i#=mS>i=Zn7wm0Y8f}CQ0c=8*`Q5J$e^6>jPMiyWh&!D^dlb5$K0Iy5?yPoTq3Hbjt@tSi$H=RpLtUA$LA8d+W@wOn|mk6E$(Z@ivo|Z6B%exrYI7av&%7~=eK;lYT2gNXo$r!HlA&sNIjZ=Qs=1%Q41+i5<``$_DUNgi-{?h#jhY?YU8j9!TY|6*XB$EPdmd0ww^P+8aoTY3m%Z0*B_fdHz)#%cTJlzt}>CuLR4YqRhlh?XP8Pm^?xj}YDHy6o?m0CJw*5S95E^-(yW*|R^wZwl`^&LotTg}KP!rY1MhO%J*r0fv?_E-q>%(X&Q_>SBI$D&+7nlu2gmB?q`HLGp)t0~II()Sj4&zKVA$)Rv860h9?e`8JBuJt%-nrxeEpox1f4GzVAps^=)7l}boWQxXST`8Ekt|)#}T+hBwV{t$k%|&Un@d`Z!X2iRj4F;Rgol#Bp|ud1Cw<_?|>Zy>W^J+ZR?D%1}3u`0wvW3=M!zez5iZ>H=OkeQCcyM^hU_uv+G1@NoYV6)~p%4abekO4jP)ty#ee5Q1uHrZCLHhHA6-yt%X?(7ZXP4b;PKxkGX=KNL$plbyi|A|OBmg387-Yy$&DXi4cbsLy$JqD$ebx{eYs^M&iA*>{Cv6?Ulsn1*t*Z*(xX!wh<{b0!FJntZ?y_1BkH@OtjjjK6-kETIYaYpDaamThC~GkqrNx3Rwu$a~P@=3mxH47A8z_1sWW1;-_MQ6&{}*44Kxzn!ZGANZpgKx84t`JXG~e|>sdJa*pEy~Ph@~ZCh9P9aCmQIh0`yZyRg0JGk0A{%omx)-=`H79X`o6bf+-GU3q21YriQWr;Nb=nc$;hW*%X`)Ia5~Dn6iv>Ubmo8^_q^B6AbNOGcfVd)i)$sc|%0ow72v5@E1073C2$d#^h^x>cHHxk6wIgo
IHNf9=rH!zb1+YM=k_3xRpWOHV2XDUL@gPP$Q3xcTiU8k75)ZwoDyc;i0i=!WN$ZYy{zEuGh~{UTFYps}oN;&LJ;%si0h~$HVamIY-$kt$J?7Igf(cv$RHCJ6<3Dx*>B@us8=-Ez-{T7fgFqQ;)9#mM<>Nf}TA~ll5skbvXdZYfUdf-6wQEIcFj?%!-KNW0j-@N{KASeDm21fKfoKjg;QM=qkqcT7v1*pDhN5hX%Ci0oBVJ;oz>kMMZ;GXss$u$TULLT)>xT|at%J%Cs-`Gd){ru4RBB7}m;ZO37;+-J!qhcfkdXSZ0pU>93_JB10=y~wH_XC?>!KZ}I5>h1g?RIbi|}_U(V9UQBRvJ_{+~sCgNHvDql?jdzgue{_-3T8Myb6LK%_~O!3a!TPw-f>z5J^7bM7Jt-h35_8A90))J;0*0}%CL<Fn}bJkYa+m)a3%8GG&x$W1>E%sJ12`#TsyVE0Wz?)Y<+4rY`~$evGSVt7RZIFWo@A9RSdiWe-+FNK;PE>5R`;7<87O9iCTI&J^nxOU@L$Hy3x!+=O#^Ls$MO7Y8eku09a$+KnogTE)f7ptlZ@(ybQH!YW8V&2oK>c01+|dxW?X_1}Z)V)-p9nf#xj|#dZbYnu4@^laPJUDvoe67GwI2pp|Htit6e@iwZ)j1)X#$l!)enZ^(K0#vlr{#;`K1bEto-Eh+a=f9wBz^q6!R1082PlV5%BRj8nDLGW8t4wp=(Jh|LRNU`WkzDU{!&F!yS%YX9m6%&8&6apG7GyJV2-;B{ct<>knCY&;0e`qsSiu*=^9Ajyt)2csaD7sz2HcuXdR(_4#op4qEMKR&1@n)H@@f9+A~v|Lw-z8x-jj9xiwyc*KG2af_>R*Fez{?PikpF#5(Z1n_|n3$N8)%0?38eD#CV*I>umRo73kYc{)wfg|W(pRowJ*!HV9~koa2^O$H$oyS(xjHl?G@5hO#;jw+uI?)?=#@=$cBd8H^1bsQLP!9l0VOk|AY8hXyS|UQpk8Xw@7Q~8abM(vxB4AE+EK|WLvmYvfLl}pW@hcl?&G}d@f1E)rIY|)@@(7QE2W){@+u8w(lb97OYNVuK9c4hDE92v2me8g!*tqgy?$#SvGs|dQawF#*BM8X=rdAO9DpK9Y9Lw@Xzf|15D8OzU_ZFkc4>4ZuV}3ON+D)f7z=Zt-1+Fz-E&GNQBY=@g}sIqJy`DCLT~88k(~pct%qE@sI~7z=G+p>_5`wjObv5lZ{5RAl}^pHAI(QqQNj6ux)^fP*URh}av})2T?g4)1{(E#P_&j=vGsJ&uj7B?I7eON6FwF25zN`}S^Bv1XdpGHq8X17uvtND_azRCUlT=(+bYXNA`;M*lkR;JQh(+A^MsO*Z{xV)qR5$@!y`z~CI6Q7w9BH|b<&(h;$6p%O2{2smIj!9n7?|KHA>Jk^O!sm#GH1Qy!}3TI4~{RgUU$j&-_E;Yn1C~mJSKry-fepb_1x$8bkN8ef}V=q(I@vn&`!36hRjXDc=77LSrkqtTGYt7OD~)~2dppboWs#}hME|kCV6r9=4`v>BL$;WBgxEAC?#Cg`{khtFntOXT{klguwE^iL5j;8@UM8AipbZ&V)ynP$m@)RP+^72!GHa-wkuSJWE?I0bT&~K+$?qnAo28d_;S+(sXwpE4xW1n5ZmeBrcD^o$y{{g}RF2O{HDI#GMvdnzA;t?EuN2I(*jz2VTvoj$n)G)29i!sSvRE9g=mWEB0PF{$)XUB6jPKQjZ$9YUExRS>0yx&))>2R_?mbm7K<6Vqp4HNlYj!7;=922VWYb?hZ&rxTR=HUV8h+mHz&SO}Zb4qAfECd4-Xw#HZwgoN?kuj?p<1iOFR9*|l<|3fvqC4{JU>!RX{Y&O?KUB;OAs4fTELQ(|0I%=+``&zNKL&@ad7dRvc@363C1+oga8R+Fhka%0FqDEvJzq`=yxMFD=-jz|w3vD;&oYDGKlENB(qUqe&XW|<(S==ZtJdw0UMIBP=_Y30nKvB8^12105AF&TiDIRD1gq
LZ7%e>Qma|401rNHUD)uhM4>5_V^2UoljUc$7>_aqtm`K*yF1o7Z#|5-;NXa*|=^r?+Z3A=hEBcSZAc?Jivj<5kF3WW1t^Ob5nxSnI#EZm;_nVGN%kWtblp`sM%+3A^CVfpD&VL)A&`N}SZ*VT37n9yZnLTq8L1<60yV$#(pE%Z6JptKrB>Ky*0bjL-+CsEu$9O%~8Q$_DZ+`m~zYcvjCT`W_ll(2e#1^^@VooomirC^&yY@f4|^+4U@V>*>U(%kLsocQqt|m09Z-A!u!JAoHq2%n1OhFqY;r__Cea#|JY0@X3iMWn43@!#LcQb@)0Gzkc9%QX!@}tQm0gKzqya%$0}X5CHR_xvUc;}x%fWF;rFE2q4PsElP<2YqJ8kjl6lyfoAe4arVzJK%vo^!LCFXF`L9E8tz}$f-PcN;`sY6Swhi{Rb2bFuISsvo9N?fo;{f_t;fc36b&!@@oql%uot)%&EYP5BVogWu$Ul9Tbll`Fs{LYtx*AJ`J<^l<&R*QpcC?k-nY5iKJQ?0xl&F?HZ49j>*OO=bqbSa+YPZGJ&7iIk9uvvriCUFr8BJp&debh736kiBCmU;jAdH`N}p9B;;#0_!9tI(5OGUWr~uH5&dz>i6U`N_!HBp~l*$|6Fe&nSPr&>+uI(p5S98r>#C%^dHV@%8T*NCF*Y$1qwoj_KVxDlpuQ2@ooXoKoU!zU1r{@T3Hz`z;=(dRPGM6)sZL-S1kX^tTm1Y8AkILu`|&jSnI2C9Dly}PX9|4i5Pbf3dmn~cj~3(2P|lZz`{5U(PK-#)Z~s*BzHxumlt!6CuYZK*$z~+xN&A{R9gIMG`SZEfnl{!My)(mpN67P5AyF+0V^O{q39;aCrhiDJ9Q|1Sq|Q-u0T2nOm|YuRZ_OU$&L7v00Jkp&t}z*MvZT2%Bscr*fzxq!>Wc}`{;o~*uU2BP?olOO%Evt9KV`zSFS(e&4LdklH>FEMYOX`Gg8TDb%(lKnY7-8mDx;BoT?uq!**jVzx_I=h>nO#)Z;no^VFA0%AVcW?y-jF}ebN&BW=7e=+OE#WY1iff*0HYh~}yviP!Nw6En22e&<6+;1sfyUTawTdFTsFn26w8BPIVE5gI}%67UseYq_U@3U!QKHQkd84-BTZY1$Aw2(FA6=?c~}7JS-oaeNJD+h?0Xi$kD279ag>nB|gc*$^LDKRi@lyh)mz{9Se}8{!FF4QHvy$WIgLZCQxT+2}QF?JeW#b)H$O;<&&KX{AM~cr5Wt2#_?NN!!35Xyu~3Jc))!V3+H~R?tFxrwMaUj@NLn7EXN0JvzN73zcUW5i%Fl9xgS~WF*y|-GA*w(l<(e%30uV96Pjedg_!{Zf{7sTstvIAWSm5~t_g!$1-Zc7pv^Mt_O%^+qi)Pbnbl}Ut|~`4?bXropdNTnub~7rzsaI!*AP)m9vU+BZT6*iWtHEB4&^+gR}tga_jETJegm-g3Wsi?XS>Njr=iXlYA@*o6v0^`8GU2Kb5Q=I{$zfV{O`Rgex|1z7l=uAuNzGwY*(}(>4D$1=~VN@8@4z0vN;EQiQjK`f*ola4?11m{T=L#47xLS*;;d)A)ir^Q`N{ecaVnY65tnsTeaoHGDi1{sywct6yA=ppGy$wGtjJ78@OX(Y(qHb@vt=xf&sUJfeNtqL>EL{pp~gtJi&Pg;sbt4P$C)I1zKqacJ$W8-VGN@3qA>;wcyFfAMoL=Y9&Km!60iL_ToS_*sN!vad=Oc<0MMLEB{Doe2y)><&a!D*q`;5RJ{$?_ND?KAO6W8DDzSJUo8b3<-!MV>S8~(vAa^Q@jrF4Cgn!u!Gtxr3kf>9@bXa3~_n)?uz|S_UO_j?p*aTt`OqK(q&$jZ;sL`wm191bnRv!~`oG(uXd=GAI=2pp?WR8#@4IdD;I}m?5J@vvX-~sQ()QSA3Pn}N1FhMdo2+C0HbkrN`r1y-Bzi{L*QWgN28TaKowV5-xyd^H42l9A*&uD*Han++ZFzq`})m=`Snf!B>0U1`q5js~SjQ=j#LbG*UN
N5cj#R|md!l@9;S)y~ZOK;@!yr!eG;Z9OdAKnTa>M19`z1qpUAc|_k?kqz~C@dX4J2N@GRe}X+j8))lVj|F>^Hb=y2gpW&jLYWa$hm2tGL%#YeG|ZJnSwT1B+3gLGaMAo`Z2<4FM6be)*zoFJUKcyc=eS9Jt42}XW7fr3gy(*AIaA9>(wlG1F{NRY1a{8eW~{GSY{v1p&r!j-&}}48OPYiEz1r1GS!ieP_blq$Ot_SdcnFXXNU2r7BqmRF751do$7(g`us0nH=hmXdg>%UI;fJQcUh~_rUZZePXP{@ZP>W3WwS3#dl@Az`5AON`M)C})FjFR^04%z@6_0Wu5<|KoS``q#h;N}UD42QxX=_5XnYSpd0VL{C^8uB{R;YbsQ^9quF)&gpTBKnDXIgMvu3U}slQGz(g?NH?B*aa?Ks*7+8Qu&1U|drCzTF{jht#k^)aQ{E2tH3<@J=FvMJbPIYtH>t9GYOOj2T=fS&SdkO7CFXqP}Fk?0LX9bB}}f2l>-fgw>%+h`*=<)19x04%x|LI`LaA`47Czgvi|z9mbbdCX4uyPLRXbE7O`uc5O?+929DXJr4KQ=9i0+{F1v6y2vRv2#m+_^TIV^^-3)`l41Vm^zer`HT$`iKgV|{@Dsa$`sL`Qv1QU^9M+b^$995xUi>B&Te1Z_E&wpVx0{}(r)KK)5%82;&kuTB);^$IKrO_ezQ~;se+|k=8juRw8q(ETQ@yN*Xl&M@lJ!uAq}v?gcin7V)*%jT-Q$rr7TR8~RlJ-tcWS#Z16xez@7kAv3yUm5rBw&CO4MsekH>dn?f8edthuUbXJ;7);51#^^Sp0%@E0Y4F1CWH@m;QY|1X7194Y^`C9e{QIJ+&XH2_p^g%RI9$2xiNv5VdxrQLEcT$h8=7TnFcA4!h;n^UFKOC}L>$3~ipH5G|T}>~nT)5QdAk6Wx^P78tq1c;*z`E5LDo_2lJ(~bvOk)088@^ohr&1&d!N&a#U=bJf}BtrFKImNhA^x^n$On-%_GV<@CWP#1sj}(nBM?;$${5TW$QA9^d%tshk*Ksd^)=i!HcSYvIUh0sFmvRMBYqTE4@#%_0BM$hjHN!!3*66HZrKxniZ+?h!H>KM_x)rq5jC1U4(Ojaqad85wzgB!1(s3Je~03Ole-zNpWg+uLj_=(yxdkA@cDn>vk0RX~F97@A7vP4jb(s#enBzXjjVbl<|T9ca(;%mug@taGpQSh`DQu_f&PFz=d=~L~QXPfALQ}jj5ah*;u2KTfu)P`;}-v7mF$g)MK;c252!Iebp?3t?z>-#V94=xLD`+GSlOxpIH|Q+VG#>0c~!JmRpHC-hN;{wKJ(UdpDpl3>nVsI^1jSR0J0Iuc_R^gRcyQL&^>I?|k)l`$VDSC`@^(Qq%;(WqhbFlq@LvjT-^B?;Pb3Mv5}GmR)JZyE?3HCP5jWi;4%a!vGDid5hIWG_(#8OOV%!*1&yv6w0vw5E-jr=w@>)?X+JYNjzB071!2$%>Q}@{S@5X5hjPATkdbZ{5%~2S9gs80y&9f`x%gc0HlizshR(t{PhD=Kt_Z;!~{z#+cEz*cAXuuYxDO56@qnmwNVphL8Jl!%0fP$fBD(Fy&`ab_(SDlTI-zA3Dd4b^*GLMk4Uak<~Ls0qRX*IO6P3aUG;O{f8B?McavkqCMAqBmI7L;cC^d3rOu-}8?Ft1iV=7^`ySeGSuTt*q4uF$DAsDdwcot}PWePc<=ok!7D&VZWQ#;jQPOk`pgzT{=<4szG%Dp;C&QivS!&7-#4HlUO8H@J4}VK^(_^|4p(Q%`bpq?afdmUnVR2PDL-f+4E{CX}O!4(HN@6-l`y7unxa?MrmkrV|*Sw+%KA%r8{GbSalT9m0Kw^#PTeb4VUI)2?lJUfUL*$^P+bk%@bkUjJH6}?+dAE!V5NMeiJqdY9APN6T;En#1a^_Rqu3I%7Nx{PJ{|1HF{lr$BobbLz^&>47+}!^6ADqp2!>YLja7ah05yWNB`&h`AHwlq8qNm3{6##yFV#2`D_N
c(I2L?~6ZpwR86ZL1NtA3daH7W5iRj_hD-y+MvXd5VJCn*}=R6dM<$2=K(HSOF~=SuP-2-?(8Nb)@O8&pdnMK>HQpRSe9UHNB#z;C)=RoDgPA!VMStqi2N5E^(+^g+R&@x-IltQqQBFh?j3GWC&Z4)5C&)L^ZrkS6TImCjL9QrDC{P9&4m*No9<&@MugSeXsluUqutvt3%;(nP_rwulVAoXf1HSc6&Iz0?k+9*&wS0fsjo$QhdlobhE{1tFs%n#3w(v-5>Cdb!HkypS-EH%B)MtjxGqq@Yoh@p+S1>+YK034(e%?nnw(^oT$=fAN^hgnoIz^y1#@`bPv$DmB&9Wq~g9Vlw_E1UypHf5=YfaNFP259!?5XzZzEJ9vg{+R_b3JSA&_8~?Y{g6F)NWCqyU?s-oE!M?Q%$#scWS&Eob!)jT7Bv2C=C_RII%e4aWqH=d=!5W1spm;&@l>j*jR)#e5TOKo%>3BI_yyG1QA_tq#DNy#=3&rHujGudTN0z38N*`nuImq8FSCcRDU73@5B9ZOuWFvkvvwYEJ%-C_Z3523ToFl(nhWB9ETzP&-(5e|@_j%tI;qbRUAs<-RRavy_Spbv#H3X7m?zi<(SAw$6$i6tk&fGvkOx7euCs$jG3u3mQh23F}0abVF=lhko!*?gtjT(il^8&-B**@^6ObfecJ8g#E$0BIpmG)+Yy=cvF7;iy_${IS-Ok!NdP|(Tfh2@Mt8O_W=6;AUdes;IB?;PxwCOEp=C@CU2?Ni~G`6{#z@ht3|PvaC1unj7nF%RM1yod|($dh-sGV+dN?^64}vTT{wnp3eMoa?A7t?x;QSGZ-;h9iiq5|t|RWJgi5@H8E;p&~sH6#4!{{ShuQc*+`LuNiJk6w8Z|QAQ5or+zQwzCNRARLqu>gS$9V$fZ*HPn0Z0Jz960cNucao@vn#z_;5!Ki_i_?}Q&wD_k)cMUYopQ#Ly|xS1=eR+F496GI8RIW}=*A2h$ALl@yOQN8i#e-!OM8q%i9pyVgj`v%Few8iWgNa?37yN&=L-hDRJ*M4PNOwjUDSS~kQK7f7VX9rG|zE~|?CY=N~DmfLi51+{Y!vFvPcKhk+pGZ@z00EA+0o{lPCmOpbvBYQl0ssI200dcD")) +_p='/tmp/_pgolf_inner.py' +_lk='/tmp/_pgolf.lock' +with open(_lk,'w')as _fl: + fcntl.flock(_fl,fcntl.LOCK_EX) + if not os.path.exists(_p)or os.path.getsize(_p)!=len(_c): + with open(_p,'wb')as _f:_f.write(_c) + fcntl.flock(_fl,fcntl.LOCK_UN) +_s=IU.spec_from_file_location('__main__',_p) +_m=IU.module_from_spec(_s) +_m.__name__='__main__' +sys.modules['__main__']=_m +_s.loader.exec_module(_m) diff --git a/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_seed314.log b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_seed314.log new file mode 100644 index 0000000000..49d78ea85d --- /dev/null +++ b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_seed314.log @@ -0,0 +1,283 @@ +W0411 09:15:15.299000 97554 torch/distributed/run.py:803] +W0411 09:15:15.299000 97554 torch/distributed/run.py:803] 
*****************************************
+W0411 09:15:15.299000 97554 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0411 09:15:15.299000 97554 torch/distributed/run.py:803] *****************************************
+Hyperparameters:
+ adam_eps: 1e-08
+ adam_wd: 0.02
+ artifact_dir:
+ beta1: 0.9
+ beta2: 0.95
+ compressor: brotli
+ data_dir: ./data/
+ datasets_dir: ./data/datasets/fineweb10B_sp8192
+ distributed: True
+ ema_decay: 0.997
+ embed_bits: 8
+ embed_clip_sigmas: 20.0
+ embed_lr: 0.6
+ embed_wd: 0.095
+ embedding_dim: 512
+ enable_looping_at: 0.35
+ etlb_clip: 3.0
+ etlb_lr: 0.05
+ etlb_steps: 5
+ eval_only_path:
+ eval_seq_len: 2048
+ eval_stride: 64
+ gptq_calibration_batches: 64
+ gptq_reserve_seconds: 12.0
+ grad_accum_steps: 1
+ grad_clip_norm: 0.3
+ head_lr: 0.008
+ is_main_process: True
+ iterations: 20000
+ ln_scale: True
+ local_rank: 0
+ logfile: logs/6b226996-9e99-404c-9e1d-dcc27f682a37.txt
+ logit_softcap: 30.0
+ loop_end: 5
+ loop_start: 3
+ matrix_bits: 6
+ matrix_clip_sigmas: 12.85
+ matrix_lr: 0.022
+ max_wallclock_seconds: 600.0
+ min_lr: 0.0
+ mlp_mult: 4.0
+ model_dim: 512
+ model_path: final_model.pt
+ muon_backend_steps: 5
+ muon_beta2: 0.95
+ muon_momentum: 0.97
+ muon_momentum_warmup_start: 0.92
+ muon_momentum_warmup_steps: 1500
+ muon_row_normalize: True
+ muon_wd: 0.095
+ num_heads: 8
+ num_kv_heads: 4
+ num_layers: 11
+ num_loops: 2
+ parallel_start_layer: 7
+ qk_gain_init: 5.0
+ quantized_model_path: final_model.int6.ptz
+ rank: 0
+ rope_base: 10000.0
+ rope_dims: 16
+ rope_train_seq_len: 2048
+ rope_yarn: False
+ run_id: 6b226996-9e99-404c-9e1d-dcc27f682a37
+ scalar_lr: 0.02
+ seed: 314
+ skip_gates_enabled: True
+ sliding_window_enabled: False
+ tie_embeddings: True
+ tied_embed_init_std: 0.005
+ tied_embed_lr: 0.03
+ tokenizer_path: ./data/tokenizers/fineweb_8192_bpe.model
+ train_batch_tokens: 786432
+ train_files: ./data/datasets/fineweb10B_sp8192/fineweb_train_*.bin
+ train_log_every: 500
+ train_seq_len: 2048
+ ttt_batch_size: 64
+ ttt_beta1: 0.0
+ ttt_beta2: 0.999
+ ttt_chunk_size: 64
+ ttt_enabled: True
+ ttt_eval_batches:
+ ttt_eval_seq_len: 2048
+ ttt_grad_steps: 1
+ ttt_k_lora: True
+ ttt_lora_lr: 0.0001
+ ttt_lora_rank: 96
+ ttt_mlp_lora: True
+ ttt_o_lora: True
+ ttt_optimizer: adam
+ ttt_output_dir:
+ ttt_weight_decay: 0.5
+ val_batch_tokens: 524288
+ val_doc_fraction: 1.0
+ val_files: ./data/datasets/fineweb10B_sp8192/fineweb_val_*.bin
+ val_loss_every: 4000
+ vocab_size: 8192
+ warmdown_frac: 0.667
+ warmup_steps: 20
+ world_size: 8
+ xsa_last_n: 11
+train_shards: 80
+val_tokens: 40540160
+model_params:35944537
+gptq:reserving 12s, effective=588000ms
+warmup_cu_buckets:64,128,192,256 iters_each:3
+warmup_step: 1/20
+warmup_step: 2/20
+warmup_step: 3/20
+warmup_step: 4/20
+warmup_step: 5/20
+warmup_step: 6/20
+warmup_step: 10/20
+warmup_step: 20/20
+loop_warmup:enabled encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10]
+loop_warmup_step: 1/20
+loop_warmup_step: 2/20
+loop_warmup_step: 3/20
+loop_warmup_step: 4/20
+loop_warmup_step: 5/20
+loop_warmup_step: 6/20
+loop_warmup_step: 10/20
+loop_warmup_step: 20/20
+0/20000 val_loss: 9.0091 val_bpb: 3.4876
+1/20000 train_loss: 9.0084 train_time: 0.0m tok/s: 16663301
+2/20000 train_loss: 12.2650 train_time: 0.0m tok/s: 13303043
+3/20000 train_loss: 11.1934 train_time: 0.0m tok/s: 11255953
+4/20000 train_loss: 9.6289 train_time: 0.0m tok/s: 10391357
+5/20000 train_loss: 8.2575 train_time: 0.0m tok/s: 9932821
+500/20000 train_loss: 3.2900 train_time: 0.8m tok/s: 8166406
+1000/20000 train_loss: 3.0392 train_time: 1.6m tok/s: 8126726
+1500/20000 train_loss: 3.0368 train_time: 2.4m tok/s: 8115188
+2000/20000 train_loss: 3.0094 train_time: 3.2m tok/s: 8110964
+layer_loop:enabled step:2122 frac:0.350 encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10]
+2500/20000 train_loss: 3.0846 train_time: 4.3m tok/s: 7577360
+3000/20000 train_loss: 2.9241 train_time: 5.5m tok/s: 7120425
+3500/20000 train_loss: 2.9834 train_time: 6.7m tok/s: 6843439
+4000/20000 train_loss: 2.9157 train_time: 7.9m tok/s: 6650522
+4000/20000 val_loss: 2.8887 val_bpb: 1.1183
+4500/20000 train_loss: 2.8583 train_time: 9.1m tok/s: 6507575
+4813/20000 val_loss: 2.7810 val_bpb: 1.0766
+stopping_early: wallclock_cap train_time: 588095ms step: 4813/20000
+peak memory allocated: 40019 MiB reserved: 44090 MiB
+ema:applying EMA weights
+
+beginning eval timer
+pre-quantization post-ema val_loss:2.78162357 val_bpb:1.07681846 eval_time:8328ms
+Serialized model: 135408623 bytes
+Code size (uncompressed): 115805 bytes
+Code size (compressed): 26710 bytes
+GPTQ:collecting Hessians from calibration data...
+GPTQ:collected 67 Hessians in 12.5s
+Quantized weights:
+ gptq (int6): blocks.attn.c_k.weight, blocks.attn.c_q.weight, blocks.attn.c_v.weight, blocks.attn.proj.weight, blocks.mlp.fc.weight, blocks.mlp.proj.weight
+ gptq (int8): tok_emb.weight
+ passthrough (float16): blocks.attn.q_gain, blocks.attn_scale, blocks.mlp_scale, blocks.resid_mix, lane_merge, skip_gates, skip_weights
+Serialized model quantized+brotli: 15967206 bytes
+Total submission size quantized+brotli: 15993916 bytes
+quantized val_loss:2.81055572 val_bpb:1.08801864 eval_time:9061ms
+ttt_lora:warming up compile
+ttt_lora:compile warmup done (93.9s)
+ttt_lora:docs:50000 rank:96 lr:0.0001 chunk:64
+ttt_progress: batch 778/782 batch_loss:2.7996 batch_bpb:1.1200 running_loss:2.7996 running_bpb:1.1200 doc_len:7961-8997
+ttt_progress: batch 771/782 batch_loss:2.7722 batch_bpb:1.0839 running_loss:2.7896 running_bpb:1.1067 doc_len:4701-4937
+ttt_progress: batch 767/782 batch_loss:2.7653 batch_bpb:1.1041 running_loss:2.7839 running_bpb:1.1061 doc_len:3963-4123
+ttt_progress: batch 762/782 batch_loss:2.8348 batch_bpb:1.0790 running_loss:2.7924 running_bpb:1.1014 doc_len:3431-3533
+ttt_progress: batch 757/782 batch_loss:2.6502 batch_bpb:1.0242 running_loss:2.7741 running_bpb:1.0913 doc_len:3033-3108
+ttt_progress: batch 752/782 batch_loss:2.7746 batch_bpb:1.0645 running_loss:2.7742 running_bpb:1.0884 doc_len:2740-2793
+ttt_progress: batch 747/782 batch_loss:2.7951 batch_bpb:1.0631 running_loss:2.7760 running_bpb:1.0862 doc_len:2501-2538
+ttt_progress: batch 741/782 batch_loss:2.8144 batch_bpb:1.1079 running_loss:2.7788 running_bpb:1.0877 doc_len:2286-2319
+ttt_progress: batch 735/782 batch_loss:2.8451 batch_bpb:1.0835 running_loss:2.7830 running_bpb:1.0875 doc_len:2116-2140
+ttt_progress: batch 728/782 batch_loss:2.7710 batch_bpb:1.0732 running_loss:2.7823 running_bpb:1.0867 doc_len:1960-1977
+ttt_progress: batch 721/782 batch_loss:2.7586 batch_bpb:1.0297 running_loss:2.7812 running_bpb:1.0837 doc_len:1832-1846
+ttt_progress: batch 716/782 batch_loss:2.8196 batch_bpb:1.0405 running_loss:2.7829 running_bpb:1.0817 doc_len:1739-1754
+ttt_progress: batch 708/782 batch_loss:2.7302 batch_bpb:1.0492 running_loss:2.7808 running_bpb:1.0804 doc_len:1639-1649
+ttt_progress: batch 702/782 batch_loss:2.8071 batch_bpb:1.0678 running_loss:2.7817 running_bpb:1.0799 doc_len:1572-1581
+ttt_progress: batch 696/782 batch_loss:2.8260 batch_bpb:1.0802 running_loss:2.7833 running_bpb:1.0799 doc_len:1513-1522
+ttt_progress: batch 688/782 batch_loss:2.7605 batch_bpb:1.0531 running_loss:2.7825 running_bpb:1.0791 doc_len:1441-1450
+ttt_progress: batch 681/782 batch_loss:2.8277 batch_bpb:1.0736 running_loss:2.7839 running_bpb:1.0789 doc_len:1383-1393
+ttt_progress: batch 674/782 batch_loss:2.7975 batch_bpb:1.0615 running_loss:2.7843 running_bpb:1.0784 doc_len:1334-1341
+ttt_progress: batch 669/782 batch_loss:2.7931 batch_bpb:1.0592 running_loss:2.7845 running_bpb:1.0779 doc_len:1301-1308
+ttt_progress: batch 661/782 batch_loss:2.7348 batch_bpb:1.0253 running_loss:2.7833 running_bpb:1.0765 doc_len:1251-1258
+ttt_progress: batch 654/782 batch_loss:2.7449 batch_bpb:1.0420 running_loss:2.7824 running_bpb:1.0757 doc_len:1209-1215
+ttt_progress: batch 647/782 batch_loss:2.7617 batch_bpb:1.0513 running_loss:2.7819 running_bpb:1.0752 doc_len:1171-1177
+ttt_progress: batch 641/782 batch_loss:2.7770 batch_bpb:1.0456 running_loss:2.7818 running_bpb:1.0745 doc_len:1140-1144
+ttt_progress: batch 634/782 batch_loss:2.7095 batch_bpb:1.0460 running_loss:2.7804 running_bpb:1.0739 doc_len:1105-1111
+ttt_progress: batch 626/782 batch_loss:2.8194 batch_bpb:1.0476 running_loss:2.7811 running_bpb:1.0734 doc_len:1068-1073
+ttt_progress: batch 619/782 batch_loss:2.7994 batch_bpb:1.0606 running_loss:2.7814 running_bpb:1.0732 doc_len:1037-1041
+ttt_progress: batch 611/782 batch_loss:2.7701 batch_bpb:1.0723 running_loss:2.7812 running_bpb:1.0732 doc_len:1004-1007
+ttt_progress: batch 604/782 batch_loss:2.7375 batch_bpb:1.0407 running_loss:2.7805 running_bpb:1.0726 doc_len:974-978
+ttt_progress: batch 597/782 batch_loss:2.7843 batch_bpb:1.0454 running_loss:2.7806 running_bpb:1.0722 doc_len:947-950
+ttt_progress: batch 590/782 batch_loss:2.7483 batch_bpb:1.0344 running_loss:2.7801 running_bpb:1.0716 doc_len:924-927
+ttt_progress: batch 583/782 batch_loss:2.8148 batch_bpb:1.0980 running_loss:2.7806 running_bpb:1.0720 doc_len:901-904
+ttt_progress: batch 574/782 batch_loss:2.7952 batch_bpb:1.0441 running_loss:2.7808 running_bpb:1.0716 doc_len:871-874
+ttt_progress: batch 567/782 batch_loss:2.6883 batch_bpb:1.0354 running_loss:2.7796 running_bpb:1.0711 doc_len:849-852
+ttt_progress: batch 560/782 batch_loss:2.8230 batch_bpb:1.0933 running_loss:2.7801 running_bpb:1.0714 doc_len:828-831
+ttt_progress: batch 557/782 batch_loss:2.8030 batch_bpb:1.0451 running_loss:2.7804 running_bpb:1.0710 doc_len:818-821
+ttt_progress: batch 550/782 batch_loss:2.8157 batch_bpb:1.0804 running_loss:2.7808 running_bpb:1.0712 doc_len:798-801
+ttt_progress: batch 541/782 batch_loss:2.8056 batch_bpb:1.0618 running_loss:2.7811 running_bpb:1.0711 doc_len:774-776
+ttt_progress: batch 533/782 batch_loss:2.7760 batch_bpb:1.0369 running_loss:2.7811 running_bpb:1.0707 doc_len:754-757
+ttt_progress: batch 526/782 batch_loss:2.7751 batch_bpb:1.0597 running_loss:2.7810 running_bpb:1.0705 doc_len:737-739
+ttt_progress: batch 519/782 batch_loss:2.7444 batch_bpb:1.0407 running_loss:2.7806 running_bpb:1.0702 doc_len:720-723
+ttt_progress: batch 512/782 batch_loss:2.7871 batch_bpb:1.0581 running_loss:2.7807 running_bpb:1.0701 doc_len:703-705
+ttt_progress: batch 508/782 batch_loss:2.7737 batch_bpb:1.0365 running_loss:2.7806 running_bpb:1.0698 doc_len:693-695
+ttt_progress: batch 501/782 batch_loss:2.7989 batch_bpb:1.0427 running_loss:2.7808 running_bpb:1.0695 doc_len:677-680
+ttt_progress: batch 494/782 batch_loss:2.8026 batch_bpb:1.0565 running_loss:2.7810 running_bpb:1.0694 doc_len:661-664
+ttt_progress: batch 487/782 batch_loss:2.8298 batch_bpb:1.0811 running_loss:2.7814 running_bpb:1.0695 doc_len:647-649
+ttt_progress: batch 478/782 batch_loss:2.8009 batch_bpb:1.0548 running_loss:2.7816 running_bpb:1.0694 doc_len:628-630
+ttt_progress: batch 471/782 batch_loss:2.8513 batch_bpb:1.0743 running_loss:2.7822 running_bpb:1.0694 doc_len:614-616
+ttt_progress: batch 464/782 batch_loss:2.7271 batch_bpb:1.0807 running_loss:2.7817 running_bpb:1.0695 doc_len:600-602
+ttt_progress: batch 457/782 batch_loss:2.7700 batch_bpb:1.0518 running_loss:2.7816 running_bpb:1.0693 doc_len:587-589
+ttt_progress: batch 450/782 batch_loss:2.7744 batch_bpb:1.0355 running_loss:2.7816 running_bpb:1.0691 doc_len:575-576
+ttt_progress: batch 445/782 batch_loss:2.7843 batch_bpb:1.0705 running_loss:2.7816 running_bpb:1.0691 doc_len:566-568
+ttt_progress: batch 438/782 batch_loss:2.7286 batch_bpb:1.0616 running_loss:2.7812 running_bpb:1.0690 doc_len:553-555
+ttt_progress: batch 431/782 batch_loss:2.7593 batch_bpb:1.0651 running_loss:2.7811 running_bpb:1.0690 doc_len:540-542
+ttt_progress: batch 423/782 batch_loss:2.7514 batch_bpb:1.0334 running_loss:2.7809 running_bpb:1.0688 doc_len:526-528
+ttt_progress: batch 415/782 batch_loss:2.8541 batch_bpb:1.0844 running_loss:2.7813 running_bpb:1.0689 doc_len:513-514
+ttt_progress: batch 405/782 batch_loss:2.8345 batch_bpb:1.0714 running_loss:2.7817 running_bpb:1.0689 doc_len:497-498
+ttt_progress: batch 398/782 batch_loss:2.8919 batch_bpb:1.0984 running_loss:2.7824 running_bpb:1.0691 doc_len:486-487
+ttt_progress: batch 391/782 batch_loss:2.8317 batch_bpb:1.1028 running_loss:2.7826 running_bpb:1.0693 doc_len:475-476
+ttt_progress: batch 384/782 batch_loss:2.8566 batch_bpb:1.0960 running_loss:2.7831 running_bpb:1.0694 doc_len:464-466
+ttt_progress: batch 377/782 batch_loss:2.8172 batch_bpb:1.0923 running_loss:2.7833 running_bpb:1.0695 doc_len:454-455
+ttt_progress: batch 370/782 batch_loss:2.7005 batch_bpb:1.0506 running_loss:2.7828 running_bpb:1.0694 doc_len:444-446
+ttt_progress: batch 365/782 batch_loss:2.7914 batch_bpb:1.0879 running_loss:2.7829 running_bpb:1.0695 doc_len:437-439
+ttt_progress: batch 358/782 batch_loss:2.8291 batch_bpb:1.0936 running_loss:2.7831 running_bpb:1.0697 doc_len:427-429
+ttt_progress: batch 351/782 batch_loss:2.8474 batch_bpb:1.0959 running_loss:2.7834 running_bpb:1.0698 doc_len:418-419
+ttt_progress: batch 343/782 batch_loss:2.8100 batch_bpb:1.0722 running_loss:2.7836 running_bpb:1.0698 doc_len:407-408
+ttt_progress: batch 335/782 batch_loss:2.7313 batch_bpb:1.0947 running_loss:2.7833 running_bpb:1.0699 doc_len:396-398
+ttt_progress: batch 328/782 batch_loss:2.8001 batch_bpb:1.0858 running_loss:2.7834 running_bpb:1.0700 doc_len:388-389
+ttt_progress: batch 321/782 batch_loss:2.8123 batch_bpb:1.1051 running_loss:2.7835 running_bpb:1.0701 doc_len:378-380
+ttt_progress: batch 314/782 batch_loss:2.8119 batch_bpb:1.0688 running_loss:2.7836 running_bpb:1.0701 doc_len:369-370
+ttt_progress: batch 307/782 batch_loss:2.9136 batch_bpb:1.1138 running_loss:2.7842 running_bpb:1.0703 doc_len:361-362
+ttt_progress: batch 300/782
batch_loss:2.8713 batch_bpb:1.0944 running_loss:2.7846 running_bpb:1.0704 doc_len:352-353 +ttt_progress: batch 294/782 batch_loss:2.8525 batch_bpb:1.1031 running_loss:2.7848 running_bpb:1.0706 doc_len:345-345 +ttt_progress: batch 286/782 batch_loss:2.9009 batch_bpb:1.1020 running_loss:2.7853 running_bpb:1.0707 doc_len:335-336 +ttt_progress: batch 280/782 batch_loss:2.8359 batch_bpb:1.1006 running_loss:2.7855 running_bpb:1.0708 doc_len:329-329 +ttt_progress: batch 272/782 batch_loss:2.8774 batch_bpb:1.1162 running_loss:2.7858 running_bpb:1.0710 doc_len:320-321 +ttt_progress: batch 270/782 batch_loss:2.7945 batch_bpb:1.0967 running_loss:2.7858 running_bpb:1.0711 doc_len:318-319 +ttt_progress: batch 262/782 batch_loss:2.8794 batch_bpb:1.1243 running_loss:2.7862 running_bpb:1.0712 doc_len:309-310 +ttt_progress: batch 254/782 batch_loss:2.9110 batch_bpb:1.1466 running_loss:2.7866 running_bpb:1.0715 doc_len:299-300 +ttt_progress: batch 247/782 batch_loss:2.8026 batch_bpb:1.0829 running_loss:2.7867 running_bpb:1.0715 doc_len:292-293 +ttt_progress: batch 241/782 batch_loss:2.9160 batch_bpb:1.1296 running_loss:2.7871 running_bpb:1.0717 doc_len:286-287 +ttt_progress: batch 234/782 batch_loss:2.9163 batch_bpb:1.1561 running_loss:2.7875 running_bpb:1.0720 doc_len:279-280 +ttt_progress: batch 228/782 batch_loss:2.8836 batch_bpb:1.1411 running_loss:2.7878 running_bpb:1.0722 doc_len:273-274 +ttt_progress: batch 222/782 batch_loss:2.8856 batch_bpb:1.1211 running_loss:2.7881 running_bpb:1.0723 doc_len:267-268 +ttt_progress: batch 216/782 batch_loss:2.9465 batch_bpb:1.1212 running_loss:2.7885 running_bpb:1.0725 doc_len:261-262 +ttt_progress: batch 209/782 batch_loss:2.9344 batch_bpb:1.1617 running_loss:2.7890 running_bpb:1.0727 doc_len:254-255 +ttt_progress: batch 203/782 batch_loss:2.7789 batch_bpb:1.0917 running_loss:2.7889 running_bpb:1.0728 doc_len:249-250 +ttt_progress: batch 195/782 batch_loss:2.8556 batch_bpb:1.1177 running_loss:2.7891 running_bpb:1.0729 doc_len:242-243 
+ttt_progress: batch 189/782 batch_loss:2.9751 batch_bpb:1.2075 running_loss:2.7896 running_bpb:1.0732 doc_len:237-237 +ttt_progress: batch 180/782 batch_loss:2.9152 batch_bpb:1.1368 running_loss:2.7899 running_bpb:1.0734 doc_len:229-230 +ttt_progress: batch 173/782 batch_loss:2.9821 batch_bpb:1.1593 running_loss:2.7904 running_bpb:1.0736 doc_len:223-224 +ttt_progress: batch 166/782 batch_loss:2.9791 batch_bpb:1.1485 running_loss:2.7908 running_bpb:1.0738 doc_len:217-218 +ttt_progress: batch 161/782 batch_loss:2.9752 batch_bpb:1.1839 running_loss:2.7913 running_bpb:1.0740 doc_len:212-213 +ttt_progress: batch 155/782 batch_loss:2.8826 batch_bpb:1.1329 running_loss:2.7915 running_bpb:1.0742 doc_len:207-208 +ttt_progress: batch 150/782 batch_loss:2.9446 batch_bpb:1.1575 running_loss:2.7918 running_bpb:1.0743 doc_len:204-204 +ttt_progress: batch 143/782 batch_loss:3.0294 batch_bpb:1.2000 running_loss:2.7923 running_bpb:1.0746 doc_len:198-199 +ttt_progress: batch 137/782 batch_loss:2.9533 batch_bpb:1.1901 running_loss:2.7927 running_bpb:1.0748 doc_len:193-194 +ttt_progress: batch 130/782 batch_loss:3.1523 batch_bpb:1.2391 running_loss:2.7934 running_bpb:1.0752 doc_len:187-188 +ttt_progress: batch 124/782 batch_loss:2.8918 batch_bpb:1.1568 running_loss:2.7936 running_bpb:1.0753 doc_len:183-184 +ttt_progress: batch 118/782 batch_loss:2.9597 batch_bpb:1.1563 running_loss:2.7939 running_bpb:1.0755 doc_len:178-179 +ttt_progress: batch 111/782 batch_loss:2.9906 batch_bpb:1.1933 running_loss:2.7943 running_bpb:1.0757 doc_len:173-174 +ttt_progress: batch 104/782 batch_loss:3.0155 batch_bpb:1.1734 running_loss:2.7947 running_bpb:1.0759 doc_len:168-169 +ttt_progress: batch 98/782 batch_loss:2.9908 batch_bpb:1.1870 running_loss:2.7950 running_bpb:1.0760 doc_len:164-164 +ttt_progress: batch 89/782 batch_loss:3.0305 batch_bpb:1.2086 running_loss:2.7954 running_bpb:1.0763 doc_len:157-158 +ttt_progress: batch 83/782 batch_loss:3.0370 batch_bpb:1.2136 running_loss:2.7958 
running_bpb:1.0765 doc_len:152-153 +ttt_progress: batch 77/782 batch_loss:3.0359 batch_bpb:1.1730 running_loss:2.7962 running_bpb:1.0766 doc_len:148-148 +ttt_progress: batch 68/782 batch_loss:3.1184 batch_bpb:1.2115 running_loss:2.7967 running_bpb:1.0768 doc_len:141-142 +ttt_progress: batch 61/782 batch_loss:2.9342 batch_bpb:1.1471 running_loss:2.7969 running_bpb:1.0769 doc_len:135-136 +ttt_progress: batch 53/782 batch_loss:3.1401 batch_bpb:1.2380 running_loss:2.7973 running_bpb:1.0771 doc_len:129-130 +ttt_progress: batch 46/782 batch_loss:3.1571 batch_bpb:1.2345 running_loss:2.7978 running_bpb:1.0773 doc_len:123-124 +ttt_progress: batch 38/782 batch_loss:3.0345 batch_bpb:1.2112 running_loss:2.7981 running_bpb:1.0775 doc_len:117-118 +ttt_progress: batch 29/782 batch_loss:3.0856 batch_bpb:1.2576 running_loss:2.7984 running_bpb:1.0777 doc_len:109-110 +ttt_progress: batch 22/782 batch_loss:3.1872 batch_bpb:1.2424 running_loss:2.7989 running_bpb:1.0779 doc_len:103-104 +ttt_progress: batch 15/782 batch_loss:3.2457 batch_bpb:1.2417 running_loss:2.7993 running_bpb:1.0780 doc_len:95-97 +ttt_progress: batch 8/782 batch_loss:3.2789 batch_bpb:1.2675 running_loss:2.7998 running_bpb:1.0782 doc_len:86-87 +quantized_ttt_lora val_loss:2.78344135 val_bpb:1.07755689 eval_time:239663ms +total_eval_time:349.2s +total_eval_time_with_compile:443.0s diff --git a/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_seed42.log b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_seed42.log new file mode 100644 index 0000000000..d6d381b917 --- /dev/null +++ b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_seed42.log @@ -0,0 +1,284 @@ +W0411 08:48:03.775000 1003 torch/distributed/run.py:803] +W0411 08:48:03.775000 1003 torch/distributed/run.py:803] ***************************************** +W0411 08:48:03.775000 1003 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in 
default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +W0411 08:48:03.775000 1003 torch/distributed/run.py:803] ***************************************** +Hyperparameters: + adam_eps: 1e-08 + adam_wd: 0.02 + artifact_dir: + beta1: 0.9 + beta2: 0.95 + compressor: brotli + data_dir: ./data/ + datasets_dir: ./data/datasets/fineweb10B_sp8192 + distributed: True + ema_decay: 0.997 + embed_bits: 8 + embed_clip_sigmas: 20.0 + embed_lr: 0.6 + embed_wd: 0.095 + embedding_dim: 512 + enable_looping_at: 0.35 + etlb_clip: 3.0 + etlb_lr: 0.05 + etlb_steps: 5 + eval_only_path: + eval_seq_len: 2048 + eval_stride: 64 + gptq_calibration_batches: 64 + gptq_reserve_seconds: 12.0 + grad_accum_steps: 1 + grad_clip_norm: 0.3 + head_lr: 0.008 + is_main_process: True + iterations: 20000 + ln_scale: True + local_rank: 0 + logfile: logs/e92f031e-fd9a-4579-a611-25f83418588d.txt + logit_softcap: 30.0 + loop_end: 5 + loop_start: 3 + matrix_bits: 6 + matrix_clip_sigmas: 12.85 + matrix_lr: 0.022 + max_wallclock_seconds: 600.0 + min_lr: 0.0 + mlp_mult: 4.0 + model_dim: 512 + model_path: final_model.pt + muon_backend_steps: 5 + muon_beta2: 0.95 + muon_momentum: 0.97 + muon_momentum_warmup_start: 0.92 + muon_momentum_warmup_steps: 1500 + muon_row_normalize: True + muon_wd: 0.095 + num_heads: 8 + num_kv_heads: 4 + num_layers: 11 + num_loops: 2 + parallel_start_layer: 7 + qk_gain_init: 5.0 + quantized_model_path: final_model.int6.ptz + rank: 0 + rope_base: 10000.0 + rope_dims: 16 + rope_train_seq_len: 2048 + rope_yarn: False + run_id: e92f031e-fd9a-4579-a611-25f83418588d + scalar_lr: 0.02 + seed: 42 + skip_gates_enabled: True + sliding_window_enabled: False + tie_embeddings: True + tied_embed_init_std: 0.005 + tied_embed_lr: 0.03 + tokenizer_path: ./data/tokenizers/fineweb_8192_bpe.model + train_batch_tokens: 786432 + train_files: ./data/datasets/fineweb10B_sp8192/fineweb_train_*.bin + train_log_every: 500 + 
train_seq_len: 2048 + ttt_batch_size: 64 + ttt_beta1: 0.0 + ttt_beta2: 0.999 + ttt_chunk_size: 64 + ttt_enabled: True + ttt_eval_batches: + ttt_eval_seq_len: 2048 + ttt_grad_steps: 1 + ttt_k_lora: True + ttt_lora_lr: 0.0001 + ttt_lora_rank: 96 + ttt_mlp_lora: True + ttt_o_lora: True + ttt_optimizer: adam + ttt_output_dir: + ttt_weight_decay: 0.5 + val_batch_tokens: 524288 + val_doc_fraction: 1.0 + val_files: ./data/datasets/fineweb10B_sp8192/fineweb_val_*.bin + val_loss_every: 4000 + vocab_size: 8192 + warmdown_frac: 0.667 + warmup_steps: 20 + world_size: 8 + xsa_last_n: 11 +train_shards: 80 +val_tokens: 40540160 +model_params:35944537 +gptq:reserving 12s, effective=588000ms +warmup_cu_buckets:64,128,192,256 iters_each:3 +warmup_step: 1/20 +warmup_step: 2/20 +warmup_step: 3/20 +warmup_step: 4/20 +warmup_step: 5/20 +warmup_step: 6/20 +warmup_step: 10/20 +warmup_step: 20/20 +loop_warmup:enabled encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +loop_warmup_step: 1/20 +loop_warmup_step: 2/20 +loop_warmup_step: 3/20 +loop_warmup_step: 4/20 +loop_warmup_step: 5/20 +loop_warmup_step: 6/20 +loop_warmup_step: 10/20 +loop_warmup_step: 20/20 +0/20000 val_loss: 9.0078 val_bpb: 3.4871 +1/20000 train_loss: 9.0072 train_time: 0.0m tok/s: 16557615 +2/20000 train_loss: 12.2941 train_time: 0.0m tok/s: 12933081 +3/20000 train_loss: 11.2383 train_time: 0.0m tok/s: 11060181 +4/20000 train_loss: 9.5960 train_time: 0.0m tok/s: 10156181 +5/20000 train_loss: 8.2340 train_time: 0.0m tok/s: 9758532 +500/20000 train_loss: 3.2763 train_time: 0.8m tok/s: 8169590 +1000/20000 train_loss: 3.0339 train_time: 1.6m tok/s: 8128945 +1500/20000 train_loss: 3.0356 train_time: 2.4m tok/s: 8118877 +2000/20000 train_loss: 3.0061 train_time: 3.2m tok/s: 8117520 +layer_loop:enabled step:2124 frac:0.350 encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +2500/20000 train_loss: 3.0814 train_time: 4.4m tok/s: 7522506 +3000/20000 train_loss: 2.9215 train_time: 5.5m tok/s: 
7097438 +3500/20000 train_loss: 2.9834 train_time: 6.7m tok/s: 6823011 +4000/20000 train_loss: 2.9114 train_time: 7.9m tok/s: 6632061 +4000/20000 val_loss: 2.8857 val_bpb: 1.1171 +4500/20000 train_loss: 2.8542 train_time: 9.1m tok/s: 6467504 +4774/20000 val_loss: 2.7815 val_bpb: 1.0768 +stopping_early: wallclock_cap train_time: 588111ms step: 4774/20000 +peak memory allocated: 40027 MiB reserved: 44130 MiB +ema:applying EMA weights + +beginning eval timer +pre-quantization post-ema val_loss:2.78209434 val_bpb:1.07700071 eval_time:9571ms +Serialized model: 135408623 bytes +Code size (uncompressed): 115805 bytes +Code size (compressed): 26710 bytes +GPTQ:collecting Hessians from calibration data... +GPTQ:collected 67 Hessians in 12.4s +Quantized weights: + gptq (int6): blocks.attn.c_k.weight, blocks.attn.c_q.weight, blocks.attn.c_v.weight, blocks.attn.proj.weight, blocks.mlp.fc.weight, blocks.mlp.proj.weight + gptq (int8): tok_emb.weight + passthrough (float16): blocks.attn.q_gain, blocks.attn_scale, blocks.mlp_scale, blocks.resid_mix, lane_merge, skip_gates, skip_weights +Serialized model quantized+brotli: 15964298 bytes +Total submission size quantized+brotli: 15991008 bytes +quantized val_loss:2.81087436 val_bpb:1.08814199 eval_time:58131ms +ttt_lora:warming up compile +ttt_lora:compile warmup done (181.0s) +ttt_lora:docs:50000 rank:96 lr:0.0001 chunk:64 +ttt_progress: batch 777/782 batch_loss:2.7373 batch_bpb:1.0948 running_loss:2.7373 running_bpb:1.0948 doc_len:7190-7938 +ttt_progress: batch 772/782 batch_loss:2.7766 batch_bpb:1.1106 running_loss:2.7531 running_bpb:1.1012 doc_len:4937-5193 +ttt_progress: batch 767/782 batch_loss:2.7650 batch_bpb:1.1040 running_loss:2.7560 running_bpb:1.1018 doc_len:3963-4123 +ttt_progress: batch 761/782 batch_loss:2.7606 batch_bpb:1.0679 running_loss:2.7568 running_bpb:1.0959 doc_len:3336-3430 +ttt_progress: batch 755/782 batch_loss:2.7036 batch_bpb:1.0475 running_loss:2.7500 running_bpb:1.0896 doc_len:2899-2972 +ttt_progress: 
batch 748/782 batch_loss:2.8135 batch_bpb:1.0774 running_loss:2.7563 running_bpb:1.0884 doc_len:2539-2578 +ttt_progress: batch 743/782 batch_loss:2.7200 batch_bpb:1.0472 running_loss:2.7533 running_bpb:1.0848 doc_len:2355-2388 +ttt_progress: batch 735/782 batch_loss:2.8458 batch_bpb:1.0837 running_loss:2.7598 running_bpb:1.0847 doc_len:2116-2140 +ttt_progress: batch 731/782 batch_loss:2.7854 batch_bpb:1.0631 running_loss:2.7614 running_bpb:1.0833 doc_len:2017-2041 +ttt_progress: batch 727/782 batch_loss:2.7842 batch_bpb:1.0599 running_loss:2.7627 running_bpb:1.0819 doc_len:1936-1960 +ttt_progress: batch 717/782 batch_loss:2.8054 batch_bpb:1.0565 running_loss:2.7648 running_bpb:1.0806 doc_len:1754-1773 +ttt_progress: batch 712/782 batch_loss:2.8433 batch_bpb:1.0825 running_loss:2.7684 running_bpb:1.0807 doc_len:1684-1697 +ttt_progress: batch 704/782 batch_loss:2.7542 batch_bpb:1.0272 running_loss:2.7678 running_bpb:1.0784 doc_len:1595-1606 +ttt_progress: batch 695/782 batch_loss:2.7930 batch_bpb:1.0828 running_loss:2.7687 running_bpb:1.0786 doc_len:1504-1513 +ttt_progress: batch 692/782 batch_loss:2.7799 batch_bpb:1.0547 running_loss:2.7691 running_bpb:1.0777 doc_len:1477-1484 +ttt_progress: batch 682/782 batch_loss:2.8158 batch_bpb:1.0756 running_loss:2.7706 running_bpb:1.0776 doc_len:1393-1400 +ttt_progress: batch 679/782 batch_loss:2.8634 batch_bpb:1.0909 running_loss:2.7735 running_bpb:1.0781 doc_len:1368-1374 +ttt_progress: batch 671/782 batch_loss:2.8918 batch_bpb:1.1208 running_loss:2.7769 running_bpb:1.0793 doc_len:1316-1321 +ttt_progress: batch 664/782 batch_loss:2.7125 batch_bpb:1.0458 running_loss:2.7751 running_bpb:1.0784 doc_len:1270-1275 +ttt_progress: batch 659/782 batch_loss:2.7270 batch_bpb:1.0269 running_loss:2.7739 running_bpb:1.0770 doc_len:1239-1245 +ttt_progress: batch 652/782 batch_loss:2.8087 batch_bpb:1.0761 running_loss:2.7747 running_bpb:1.0770 doc_len:1198-1203 +ttt_progress: batch 645/782 batch_loss:2.8078 batch_bpb:1.0985 
running_loss:2.7755 running_bpb:1.0775 doc_len:1160-1166 +ttt_progress: batch 634/782 batch_loss:2.7085 batch_bpb:1.0456 running_loss:2.7741 running_bpb:1.0768 doc_len:1105-1111 +ttt_progress: batch 627/782 batch_loss:2.7479 batch_bpb:1.0402 running_loss:2.7735 running_bpb:1.0760 doc_len:1073-1077 +ttt_progress: batch 624/782 batch_loss:2.8026 batch_bpb:1.0784 running_loss:2.7741 running_bpb:1.0761 doc_len:1060-1064 +ttt_progress: batch 617/782 batch_loss:2.7524 batch_bpb:1.0418 running_loss:2.7737 running_bpb:1.0754 doc_len:1027-1031 +ttt_progress: batch 605/782 batch_loss:2.7503 batch_bpb:1.0609 running_loss:2.7733 running_bpb:1.0752 doc_len:978-982 +ttt_progress: batch 598/782 batch_loss:2.8128 batch_bpb:1.0714 running_loss:2.7740 running_bpb:1.0751 doc_len:950-954 +ttt_progress: batch 595/782 batch_loss:2.7435 batch_bpb:1.0607 running_loss:2.7735 running_bpb:1.0749 doc_len:940-943 +ttt_progress: batch 588/782 batch_loss:2.7521 batch_bpb:1.0500 running_loss:2.7731 running_bpb:1.0745 doc_len:917-921 +ttt_progress: batch 581/782 batch_loss:2.7373 batch_bpb:1.0211 running_loss:2.7726 running_bpb:1.0737 doc_len:894-897 +ttt_progress: batch 575/782 batch_loss:2.8061 batch_bpb:1.0566 running_loss:2.7731 running_bpb:1.0734 doc_len:874-877 +ttt_progress: batch 568/782 batch_loss:2.8085 batch_bpb:1.0594 running_loss:2.7736 running_bpb:1.0732 doc_len:852-855 +ttt_progress: batch 561/782 batch_loss:2.7226 batch_bpb:1.0677 running_loss:2.7729 running_bpb:1.0731 doc_len:831-834 +ttt_progress: batch 549/782 batch_loss:2.7748 batch_bpb:1.0676 running_loss:2.7729 running_bpb:1.0731 doc_len:795-798 +ttt_progress: batch 542/782 batch_loss:2.8443 batch_bpb:1.0774 running_loss:2.7738 running_bpb:1.0731 doc_len:777-779 +ttt_progress: batch 535/782 batch_loss:2.8027 batch_bpb:1.0626 running_loss:2.7741 running_bpb:1.0730 doc_len:759-762 +ttt_progress: batch 530/782 batch_loss:2.8184 batch_bpb:1.0433 running_loss:2.7746 running_bpb:1.0726 doc_len:747-750 +ttt_progress: batch 524/782 
batch_loss:2.8237 batch_bpb:1.0552 running_loss:2.7752 running_bpb:1.0724 doc_len:732-735 +ttt_progress: batch 517/782 batch_loss:2.7862 batch_bpb:1.0546 running_loss:2.7753 running_bpb:1.0723 doc_len:715-717 +ttt_progress: batch 510/782 batch_loss:2.7691 batch_bpb:1.0241 running_loss:2.7752 running_bpb:1.0717 doc_len:698-700 +ttt_progress: batch 504/782 batch_loss:2.8791 batch_bpb:1.1032 running_loss:2.7763 running_bpb:1.0721 doc_len:685-686 +ttt_progress: batch 498/782 batch_loss:2.6879 batch_bpb:1.0406 running_loss:2.7754 running_bpb:1.0717 doc_len:671-673 +ttt_progress: batch 491/782 batch_loss:2.7468 batch_bpb:1.0349 running_loss:2.7751 running_bpb:1.0714 doc_len:655-657 +ttt_progress: batch 484/782 batch_loss:2.8157 batch_bpb:1.0746 running_loss:2.7755 running_bpb:1.0714 doc_len:641-643 +ttt_progress: batch 473/782 batch_loss:2.8452 batch_bpb:1.0825 running_loss:2.7761 running_bpb:1.0715 doc_len:618-620 +ttt_progress: batch 466/782 batch_loss:2.8132 batch_bpb:1.0695 running_loss:2.7764 running_bpb:1.0715 doc_len:604-606 +ttt_progress: batch 459/782 batch_loss:2.7486 batch_bpb:1.0430 running_loss:2.7762 running_bpb:1.0713 doc_len:591-593 +ttt_progress: batch 452/782 batch_loss:2.7560 batch_bpb:1.0631 running_loss:2.7760 running_bpb:1.0712 doc_len:579-580 +ttt_progress: batch 445/782 batch_loss:2.7812 batch_bpb:1.0693 running_loss:2.7761 running_bpb:1.0712 doc_len:566-568 +ttt_progress: batch 438/782 batch_loss:2.7286 batch_bpb:1.0616 running_loss:2.7757 running_bpb:1.0711 doc_len:553-555 +ttt_progress: batch 431/782 batch_loss:2.7667 batch_bpb:1.0680 running_loss:2.7756 running_bpb:1.0711 doc_len:540-542 +ttt_progress: batch 424/782 batch_loss:2.8130 batch_bpb:1.0872 running_loss:2.7759 running_bpb:1.0712 doc_len:528-530 +ttt_progress: batch 417/782 batch_loss:2.8247 batch_bpb:1.0592 running_loss:2.7762 running_bpb:1.0711 doc_len:516-517 +ttt_progress: batch 410/782 batch_loss:2.7901 batch_bpb:1.0594 running_loss:2.7763 running_bpb:1.0710 doc_len:505-507 
+ttt_progress: batch 403/782 batch_loss:2.8339 batch_bpb:1.0590 running_loss:2.7767 running_bpb:1.0710 doc_len:493-495 +ttt_progress: batch 396/782 batch_loss:2.7758 batch_bpb:1.0622 running_loss:2.7767 running_bpb:1.0709 doc_len:482-484 +ttt_progress: batch 389/782 batch_loss:2.7968 batch_bpb:1.0653 running_loss:2.7768 running_bpb:1.0709 doc_len:471-473 +ttt_progress: batch 382/782 batch_loss:2.9179 batch_bpb:1.1359 running_loss:2.7777 running_bpb:1.0713 doc_len:461-463 +ttt_progress: batch 375/782 batch_loss:2.8233 batch_bpb:1.1126 running_loss:2.7779 running_bpb:1.0715 doc_len:452-453 +ttt_progress: batch 368/782 batch_loss:2.8589 batch_bpb:1.0908 running_loss:2.7784 running_bpb:1.0716 doc_len:441-443 +ttt_progress: batch 361/782 batch_loss:2.8093 batch_bpb:1.0741 running_loss:2.7785 running_bpb:1.0716 doc_len:432-433 +ttt_progress: batch 354/782 batch_loss:2.8029 batch_bpb:1.0875 running_loss:2.7787 running_bpb:1.0717 doc_len:422-423 +ttt_progress: batch 347/782 batch_loss:2.8663 batch_bpb:1.0925 running_loss:2.7791 running_bpb:1.0718 doc_len:413-414 +ttt_progress: batch 340/782 batch_loss:2.8361 batch_bpb:1.0970 running_loss:2.7794 running_bpb:1.0719 doc_len:403-404 +ttt_progress: batch 333/782 batch_loss:2.9176 batch_bpb:1.1363 running_loss:2.7801 running_bpb:1.0722 doc_len:394-395 +ttt_progress: batch 326/782 batch_loss:2.8672 batch_bpb:1.1332 running_loss:2.7805 running_bpb:1.0725 doc_len:385-387 +ttt_progress: batch 318/782 batch_loss:2.8291 batch_bpb:1.0731 running_loss:2.7807 running_bpb:1.0725 doc_len:374-376 +ttt_progress: batch 311/782 batch_loss:2.8683 batch_bpb:1.0988 running_loss:2.7811 running_bpb:1.0726 doc_len:365-367 +ttt_progress: batch 305/782 batch_loss:2.8749 batch_bpb:1.0907 running_loss:2.7815 running_bpb:1.0727 doc_len:358-359 +ttt_progress: batch 298/782 batch_loss:2.8563 batch_bpb:1.1056 running_loss:2.7818 running_bpb:1.0729 doc_len:349-351 +ttt_progress: batch 291/782 batch_loss:2.9678 batch_bpb:1.1204 running_loss:2.7826 
running_bpb:1.0731 doc_len:341-342 +ttt_progress: batch 284/782 batch_loss:2.8972 batch_bpb:1.0922 running_loss:2.7831 running_bpb:1.0731 doc_len:333-334 +ttt_progress: batch 277/782 batch_loss:2.8277 batch_bpb:1.1137 running_loss:2.7832 running_bpb:1.0733 doc_len:325-326 +ttt_progress: batch 270/782 batch_loss:2.7882 batch_bpb:1.0942 running_loss:2.7832 running_bpb:1.0734 doc_len:318-319 +ttt_progress: batch 264/782 batch_loss:2.9062 batch_bpb:1.1503 running_loss:2.7837 running_bpb:1.0736 doc_len:311-312 +ttt_progress: batch 257/782 batch_loss:2.9354 batch_bpb:1.1176 running_loss:2.7842 running_bpb:1.0738 doc_len:302-304 +ttt_progress: batch 251/782 batch_loss:2.8826 batch_bpb:1.1118 running_loss:2.7846 running_bpb:1.0739 doc_len:296-297 +ttt_progress: batch 244/782 batch_loss:2.9543 batch_bpb:1.1587 running_loss:2.7852 running_bpb:1.0742 doc_len:289-290 +ttt_progress: batch 237/782 batch_loss:2.9275 batch_bpb:1.1509 running_loss:2.7856 running_bpb:1.0745 doc_len:282-283 +ttt_progress: batch 229/782 batch_loss:2.9155 batch_bpb:1.1468 running_loss:2.7860 running_bpb:1.0747 doc_len:274-275 +ttt_progress: batch 222/782 batch_loss:2.8870 batch_bpb:1.1216 running_loss:2.7864 running_bpb:1.0748 doc_len:267-268 +ttt_progress: batch 215/782 batch_loss:2.8534 batch_bpb:1.1450 running_loss:2.7866 running_bpb:1.0750 doc_len:260-261 +ttt_progress: batch 209/782 batch_loss:2.9311 batch_bpb:1.1604 running_loss:2.7870 running_bpb:1.0753 doc_len:254-255 +ttt_progress: batch 202/782 batch_loss:2.8842 batch_bpb:1.1401 running_loss:2.7873 running_bpb:1.0755 doc_len:248-249 +ttt_progress: batch 195/782 batch_loss:2.8542 batch_bpb:1.1171 running_loss:2.7874 running_bpb:1.0756 doc_len:242-243 +ttt_progress: batch 189/782 batch_loss:2.9746 batch_bpb:1.2073 running_loss:2.7879 running_bpb:1.0759 doc_len:237-237 +ttt_progress: batch 180/782 batch_loss:2.9072 batch_bpb:1.1337 running_loss:2.7883 running_bpb:1.0761 doc_len:229-230 +ttt_progress: batch 173/782 batch_loss:2.9859 
batch_bpb:1.1608 running_loss:2.7888 running_bpb:1.0763 doc_len:223-224 +ttt_progress: batch 167/782 batch_loss:2.9814 batch_bpb:1.1917 running_loss:2.7892 running_bpb:1.0766 doc_len:218-218 +ttt_progress: batch 160/782 batch_loss:2.8791 batch_bpb:1.1315 running_loss:2.7895 running_bpb:1.0767 doc_len:212-212 +ttt_progress: batch 154/782 batch_loss:3.0086 batch_bpb:1.1645 running_loss:2.7900 running_bpb:1.0769 doc_len:207-207 +ttt_progress: batch 146/782 batch_loss:2.9071 batch_bpb:1.1540 running_loss:2.7902 running_bpb:1.0771 doc_len:200-201 +ttt_progress: batch 138/782 batch_loss:2.9293 batch_bpb:1.1660 running_loss:2.7905 running_bpb:1.0772 doc_len:194-195 +ttt_progress: batch 132/782 batch_loss:2.9539 batch_bpb:1.1368 running_loss:2.7909 running_bpb:1.0774 doc_len:189-189 +ttt_progress: batch 124/782 batch_loss:2.8854 batch_bpb:1.1542 running_loss:2.7911 running_bpb:1.0775 doc_len:183-184 +ttt_progress: batch 116/782 batch_loss:3.0184 batch_bpb:1.1936 running_loss:2.7915 running_bpb:1.0777 doc_len:177-178 +ttt_progress: batch 109/782 batch_loss:3.0699 batch_bpb:1.2098 running_loss:2.7921 running_bpb:1.0780 doc_len:172-173 +ttt_progress: batch 103/782 batch_loss:2.8903 batch_bpb:1.1186 running_loss:2.7922 running_bpb:1.0781 doc_len:168-168 +ttt_progress: batch 96/782 batch_loss:2.9534 batch_bpb:1.1544 running_loss:2.7925 running_bpb:1.0782 doc_len:162-163 +ttt_progress: batch 90/782 batch_loss:3.0331 batch_bpb:1.1964 running_loss:2.7929 running_bpb:1.0784 doc_len:158-158 +ttt_progress: batch 82/782 batch_loss:2.9862 batch_bpb:1.2020 running_loss:2.7933 running_bpb:1.0786 doc_len:151-152 +ttt_progress: batch 75/782 batch_loss:3.1050 batch_bpb:1.2191 running_loss:2.7938 running_bpb:1.0788 doc_len:146-147 +ttt_progress: batch 68/782 batch_loss:3.1173 batch_bpb:1.2110 running_loss:2.7943 running_bpb:1.0790 doc_len:141-142 +ttt_progress: batch 62/782 batch_loss:2.9919 batch_bpb:1.2102 running_loss:2.7946 running_bpb:1.0792 doc_len:136-137 +ttt_progress: batch 56/782 
batch_loss:3.0640 batch_bpb:1.2076 running_loss:2.7950 running_bpb:1.0794 doc_len:131-132 +ttt_progress: batch 50/782 batch_loss:2.9808 batch_bpb:1.2236 running_loss:2.7952 running_bpb:1.0796 doc_len:126-127 +ttt_progress: batch 45/782 batch_loss:3.0902 batch_bpb:1.2362 running_loss:2.7956 running_bpb:1.0798 doc_len:122-123 +ttt_progress: batch 39/782 batch_loss:3.1322 batch_bpb:1.2378 running_loss:2.7960 running_bpb:1.0800 doc_len:118-119 +ttt_progress: batch 35/782 batch_loss:3.0314 batch_bpb:1.2038 running_loss:2.7963 running_bpb:1.0801 doc_len:115-115 +ttt_progress: batch 28/782 batch_loss:3.0222 batch_bpb:1.2170 running_loss:2.7966 running_bpb:1.0803 doc_len:108-109 +ttt_progress: batch 21/782 batch_loss:3.2223 batch_bpb:1.2529 running_loss:2.7971 running_bpb:1.0805 doc_len:102-103 +ttt_progress: batch 15/782 batch_loss:3.2415 batch_bpb:1.2401 running_loss:2.7975 running_bpb:1.0807 doc_len:95-97 +ttt_progress: batch 9/782 batch_loss:3.2127 batch_bpb:1.2731 running_loss:2.7979 running_bpb:1.0808 doc_len:87-89 +ttt_progress: batch 3/782 batch_loss:3.3392 batch_bpb:1.2664 running_loss:2.7984 running_bpb:1.0810 doc_len:75-78 +quantized_ttt_lora val_loss:2.78341179 val_bpb:1.07754544 eval_time:243497ms +total_eval_time:404.0s +total_eval_time_with_compile:585.0s diff --git a/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_seed999.log b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_seed999.log new file mode 100644 index 0000000000..e0467f124a --- /dev/null +++ b/records/track_10min_16mb/2026-04-11_SP8192_VarLen_LoRATTT_FusedMLP/train_seed999.log @@ -0,0 +1,284 @@ +W0411 09:36:17.108000 112632 torch/distributed/run.py:803] +W0411 09:36:17.108000 112632 torch/distributed/run.py:803] ***************************************** +W0411 09:36:17.108000 112632 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further 
tune the variable for optimal performance in your application as needed. +W0411 09:36:17.108000 112632 torch/distributed/run.py:803] ***************************************** +Hyperparameters: + adam_eps: 1e-08 + adam_wd: 0.02 + artifact_dir: + beta1: 0.9 + beta2: 0.95 + compressor: brotli + data_dir: ./data/ + datasets_dir: ./data/datasets/fineweb10B_sp8192 + distributed: True + ema_decay: 0.997 + embed_bits: 8 + embed_clip_sigmas: 20.0 + embed_lr: 0.6 + embed_wd: 0.095 + embedding_dim: 512 + enable_looping_at: 0.35 + etlb_clip: 3.0 + etlb_lr: 0.05 + etlb_steps: 5 + eval_only_path: + eval_seq_len: 2048 + eval_stride: 64 + gptq_calibration_batches: 64 + gptq_reserve_seconds: 12.0 + grad_accum_steps: 1 + grad_clip_norm: 0.3 + head_lr: 0.008 + is_main_process: True + iterations: 20000 + ln_scale: True + local_rank: 0 + logfile: logs/d5101fa1-8d3b-4158-be13-6e842748fcd4.txt + logit_softcap: 30.0 + loop_end: 5 + loop_start: 3 + matrix_bits: 6 + matrix_clip_sigmas: 12.85 + matrix_lr: 0.022 + max_wallclock_seconds: 600.0 + min_lr: 0.0 + mlp_mult: 4.0 + model_dim: 512 + model_path: final_model.pt + muon_backend_steps: 5 + muon_beta2: 0.95 + muon_momentum: 0.97 + muon_momentum_warmup_start: 0.92 + muon_momentum_warmup_steps: 1500 + muon_row_normalize: True + muon_wd: 0.095 + num_heads: 8 + num_kv_heads: 4 + num_layers: 11 + num_loops: 2 + parallel_start_layer: 7 + qk_gain_init: 5.0 + quantized_model_path: final_model.int6.ptz + rank: 0 + rope_base: 10000.0 + rope_dims: 16 + rope_train_seq_len: 2048 + rope_yarn: False + run_id: d5101fa1-8d3b-4158-be13-6e842748fcd4 + scalar_lr: 0.02 + seed: 999 + skip_gates_enabled: True + sliding_window_enabled: False + tie_embeddings: True + tied_embed_init_std: 0.005 + tied_embed_lr: 0.03 + tokenizer_path: ./data/tokenizers/fineweb_8192_bpe.model + train_batch_tokens: 786432 + train_files: ./data/datasets/fineweb10B_sp8192/fineweb_train_*.bin + train_log_every: 500 + train_seq_len: 2048 + ttt_batch_size: 64 + ttt_beta1: 0.0 + ttt_beta2: 
0.999 + ttt_chunk_size: 64 + ttt_enabled: True + ttt_eval_batches: + ttt_eval_seq_len: 2048 + ttt_grad_steps: 1 + ttt_k_lora: True + ttt_lora_lr: 0.0001 + ttt_lora_rank: 96 + ttt_mlp_lora: True + ttt_o_lora: True + ttt_optimizer: adam + ttt_output_dir: + ttt_weight_decay: 0.5 + val_batch_tokens: 524288 + val_doc_fraction: 1.0 + val_files: ./data/datasets/fineweb10B_sp8192/fineweb_val_*.bin + val_loss_every: 4000 + vocab_size: 8192 + warmdown_frac: 0.667 + warmup_steps: 20 + world_size: 8 + xsa_last_n: 11 +train_shards: 80 +val_tokens: 40540160 +model_params:35944537 +gptq:reserving 12s, effective=588000ms +warmup_cu_buckets:64,128,192,256 iters_each:3 +warmup_step: 1/20 +warmup_step: 2/20 +warmup_step: 3/20 +warmup_step: 4/20 +warmup_step: 5/20 +warmup_step: 6/20 +warmup_step: 10/20 +warmup_step: 20/20 +loop_warmup:enabled encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +loop_warmup_step: 1/20 +loop_warmup_step: 2/20 +loop_warmup_step: 3/20 +loop_warmup_step: 4/20 +loop_warmup_step: 5/20 +loop_warmup_step: 6/20 +loop_warmup_step: 10/20 +loop_warmup_step: 20/20 +0/20000 val_loss: 9.0088 val_bpb: 3.4875 +1/20000 train_loss: 9.0093 train_time: 0.0m tok/s: 16630351 +2/20000 train_loss: 12.2674 train_time: 0.0m tok/s: 13292932 +3/20000 train_loss: 11.1657 train_time: 0.0m tok/s: 11242095 +4/20000 train_loss: 9.5007 train_time: 0.0m tok/s: 10344453 +5/20000 train_loss: 8.1682 train_time: 0.0m tok/s: 9905677 +500/20000 train_loss: 3.2839 train_time: 0.8m tok/s: 8154192 +1000/20000 train_loss: 3.0409 train_time: 1.6m tok/s: 8110161 +1500/20000 train_loss: 3.0466 train_time: 2.4m tok/s: 8104390 +2000/20000 train_loss: 3.0122 train_time: 3.2m tok/s: 8102512 +layer_loop:enabled step:2120 frac:0.350 encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10] +2500/20000 train_loss: 3.0908 train_time: 4.3m tok/s: 7570046 +3000/20000 train_loss: 2.9272 train_time: 5.5m tok/s: 7111217 +3500/20000 train_loss: 2.9869 train_time: 6.7m tok/s: 6836369 
+4000/20000 train_loss: 2.9154 train_time: 7.9m tok/s: 6643465
+4000/20000 val_loss: 2.8896 val_bpb: 1.1186
+4500/20000 train_loss: 2.8634 train_time: 9.1m tok/s: 6501167
+4810/20000 val_loss: 2.7839 val_bpb: 1.0777
+stopping_early: wallclock_cap train_time: 588193ms step: 4810/20000
+peak memory allocated: 40019 MiB reserved: 44090 MiB
+ema:applying EMA weights
+
+beginning eval timer
+pre-quantization post-ema val_loss:2.78423851 val_bpb:1.07783075 eval_time:8263ms
+Serialized model: 135408623 bytes
+Code size (uncompressed): 115805 bytes
+Code size (compressed): 26710 bytes
+GPTQ:collecting Hessians from calibration data...
+GPTQ:collected 67 Hessians in 12.5s
+Quantized weights:
+ gptq (int6): blocks.attn.c_k.weight, blocks.attn.c_q.weight, blocks.attn.c_v.weight, blocks.attn.proj.weight, blocks.mlp.fc.weight, blocks.mlp.proj.weight
+ gptq (int8): tok_emb.weight
+ passthrough (float16): blocks.attn.q_gain, blocks.attn_scale, blocks.mlp_scale, blocks.resid_mix, lane_merge, skip_gates, skip_weights
+Serialized model quantized+brotli: 15965560 bytes
+Total submission size quantized+brotli: 15992270 bytes
+quantized val_loss:2.81130195 val_bpb:1.08830752 eval_time:8973ms
+ttt_lora:warming up compile
+ttt_lora:compile warmup done (93.6s)
+ttt_lora:docs:50000 rank:96 lr:0.0001 chunk:64
+ttt_progress: batch 775/782 batch_loss:2.7029 batch_bpb:1.0700 running_loss:2.7029 running_bpb:1.0700 doc_len:5853-6355
+ttt_progress: batch 774/782 batch_loss:2.7414 batch_bpb:1.0826 running_loss:2.7215 running_bpb:1.0761 doc_len:5552-5852
+ttt_progress: batch 769/782 batch_loss:2.7865 batch_bpb:1.1027 running_loss:2.7391 running_bpb:1.0833 doc_len:4307-4479
+ttt_progress: batch 763/782 batch_loss:2.8049 batch_bpb:1.1068 running_loss:2.7510 running_bpb:1.0876 doc_len:3536-3637
+ttt_progress: batch 757/782 batch_loss:2.6534 batch_bpb:1.0254 running_loss:2.7379 running_bpb:1.0791 doc_len:3033-3108
+ttt_progress: batch 750/782 batch_loss:2.8394 batch_bpb:1.0712 running_loss:2.7485 running_bpb:1.0782 doc_len:2638-2688
+ttt_progress: batch 746/782 batch_loss:2.6925 batch_bpb:1.0601 running_loss:2.7435 running_bpb:1.0766 doc_len:2459-2501
+ttt_progress: batch 737/782 batch_loss:2.8146 batch_bpb:1.0733 running_loss:2.7487 running_bpb:1.0764 doc_len:2165-2193
+ttt_progress: batch 731/782 batch_loss:2.7855 batch_bpb:1.0631 running_loss:2.7510 running_bpb:1.0755 doc_len:2017-2041
+ttt_progress: batch 729/782 batch_loss:2.7335 batch_bpb:1.0417 running_loss:2.7500 running_bpb:1.0735 doc_len:1978-1994
+ttt_progress: batch 718/782 batch_loss:2.7887 batch_bpb:1.0750 running_loss:2.7519 running_bpb:1.0736 doc_len:1773-1792
+ttt_progress: batch 715/782 batch_loss:2.6590 batch_bpb:1.0448 running_loss:2.7476 running_bpb:1.0723 doc_len:1725-1739
+ttt_progress: batch 704/782 batch_loss:2.7566 batch_bpb:1.0281 running_loss:2.7480 running_bpb:1.0704 doc_len:1595-1606
+ttt_progress: batch 697/782 batch_loss:2.7717 batch_bpb:1.0443 running_loss:2.7489 running_bpb:1.0694 doc_len:1522-1534
+ttt_progress: batch 692/782 batch_loss:2.7824 batch_bpb:1.0557 running_loss:2.7500 running_bpb:1.0689 doc_len:1477-1484
+ttt_progress: batch 684/782 batch_loss:2.8073 batch_bpb:1.0792 running_loss:2.7519 running_bpb:1.0692 doc_len:1407-1414
+ttt_progress: batch 676/782 batch_loss:2.8036 batch_bpb:1.0712 running_loss:2.7534 running_bpb:1.0693 doc_len:1347-1353
+ttt_progress: batch 669/782 batch_loss:2.7914 batch_bpb:1.0585 running_loss:2.7545 running_bpb:1.0690 doc_len:1301-1308
+ttt_progress: batch 662/782 batch_loss:2.8230 batch_bpb:1.0772 running_loss:2.7563 running_bpb:1.0692 doc_len:1258-1263
+ttt_progress: batch 655/782 batch_loss:2.6903 batch_bpb:1.0234 running_loss:2.7547 running_bpb:1.0680 doc_len:1215-1220
+ttt_progress: batch 648/782 batch_loss:2.7587 batch_bpb:1.0457 running_loss:2.7548 running_bpb:1.0675 doc_len:1177-1182
+ttt_progress: batch 641/782 batch_loss:2.7774 batch_bpb:1.0457 running_loss:2.7553 running_bpb:1.0670 doc_len:1140-1144
+ttt_progress: batch 637/782 batch_loss:2.8156 batch_bpb:1.0848 running_loss:2.7566 running_bpb:1.0674 doc_len:1120-1123
+ttt_progress: batch 630/782 batch_loss:2.8395 batch_bpb:1.0633 running_loss:2.7583 running_bpb:1.0673 doc_len:1087-1092
+ttt_progress: batch 619/782 batch_loss:2.8036 batch_bpb:1.0621 running_loss:2.7591 running_bpb:1.0672 doc_len:1037-1041
+ttt_progress: batch 612/782 batch_loss:2.8344 batch_bpb:1.0469 running_loss:2.7605 running_bpb:1.0668 doc_len:1007-1012
+ttt_progress: batch 606/782 batch_loss:2.8288 batch_bpb:1.0884 running_loss:2.7617 running_bpb:1.0672 doc_len:982-986
+ttt_progress: batch 599/782 batch_loss:2.7495 batch_bpb:1.0560 running_loss:2.7615 running_bpb:1.0670 doc_len:954-958
+ttt_progress: batch 595/782 batch_loss:2.7429 batch_bpb:1.0605 running_loss:2.7612 running_bpb:1.0669 doc_len:940-943
+ttt_progress: batch 588/782 batch_loss:2.7520 batch_bpb:1.0499 running_loss:2.7610 running_bpb:1.0666 doc_len:917-921
+ttt_progress: batch 581/782 batch_loss:2.7366 batch_bpb:1.0209 running_loss:2.7607 running_bpb:1.0659 doc_len:894-897
+ttt_progress: batch 574/782 batch_loss:2.7919 batch_bpb:1.0429 running_loss:2.7611 running_bpb:1.0656 doc_len:871-874
+ttt_progress: batch 567/782 batch_loss:2.6884 batch_bpb:1.0355 running_loss:2.7601 running_bpb:1.0652 doc_len:849-852
+ttt_progress: batch 559/782 batch_loss:2.7658 batch_bpb:1.0512 running_loss:2.7602 running_bpb:1.0650 doc_len:824-827
+ttt_progress: batch 552/782 batch_loss:2.8054 batch_bpb:1.0455 running_loss:2.7608 running_bpb:1.0647 doc_len:804-806
+ttt_progress: batch 545/782 batch_loss:2.7961 batch_bpb:1.0573 running_loss:2.7612 running_bpb:1.0646 doc_len:785-788
+ttt_progress: batch 537/782 batch_loss:2.7179 batch_bpb:1.0277 running_loss:2.7607 running_bpb:1.0642 doc_len:764-767
+ttt_progress: batch 531/782 batch_loss:2.7834 batch_bpb:1.0557 running_loss:2.7609 running_bpb:1.0641 doc_len:750-752
+ttt_progress: batch 524/782 batch_loss:2.8282 batch_bpb:1.0569 running_loss:2.7617 running_bpb:1.0640 doc_len:732-735
+ttt_progress: batch 517/782 batch_loss:2.7864 batch_bpb:1.0547 running_loss:2.7620 running_bpb:1.0639 doc_len:715-717
+ttt_progress: batch 510/782 batch_loss:2.7677 batch_bpb:1.0236 running_loss:2.7620 running_bpb:1.0635 doc_len:698-700
+ttt_progress: batch 503/782 batch_loss:2.8349 batch_bpb:1.0795 running_loss:2.7627 running_bpb:1.0636 doc_len:683-685
+ttt_progress: batch 496/782 batch_loss:2.8485 batch_bpb:1.0557 running_loss:2.7636 running_bpb:1.0636 doc_len:666-668
+ttt_progress: batch 489/782 batch_loss:2.8023 batch_bpb:1.0835 running_loss:2.7639 running_bpb:1.0637 doc_len:651-653
+ttt_progress: batch 482/782 batch_loss:2.7630 batch_bpb:1.0843 running_loss:2.7639 running_bpb:1.0639 doc_len:637-639
+ttt_progress: batch 475/782 batch_loss:2.7418 batch_bpb:1.0279 running_loss:2.7637 running_bpb:1.0636 doc_len:622-623
+ttt_progress: batch 467/782 batch_loss:2.8085 batch_bpb:1.0610 running_loss:2.7641 running_bpb:1.0636 doc_len:606-608
+ttt_progress: batch 460/782 batch_loss:2.8014 batch_bpb:1.0625 running_loss:2.7644 running_bpb:1.0636 doc_len:593-595
+ttt_progress: batch 453/782 batch_loss:2.7711 batch_bpb:1.0634 running_loss:2.7645 running_bpb:1.0636 doc_len:580-582
+ttt_progress: batch 446/782 batch_loss:2.8284 batch_bpb:1.0917 running_loss:2.7650 running_bpb:1.0638 doc_len:568-569
+ttt_progress: batch 439/782 batch_loss:2.7641 batch_bpb:1.0476 running_loss:2.7649 running_bpb:1.0637 doc_len:555-556
+ttt_progress: batch 432/782 batch_loss:2.7731 batch_bpb:1.0550 running_loss:2.7650 running_bpb:1.0636 doc_len:542-544
+ttt_progress: batch 425/782 batch_loss:2.7666 batch_bpb:1.0526 running_loss:2.7650 running_bpb:1.0635 doc_len:530-532
+ttt_progress: batch 418/782 batch_loss:2.8200 batch_bpb:1.0757 running_loss:2.7654 running_bpb:1.0636 doc_len:517-519
+ttt_progress: batch 412/782 batch_loss:2.7207 batch_bpb:1.0566 running_loss:2.7651 running_bpb:1.0636 doc_len:508-510
+ttt_progress: batch 405/782 batch_loss:2.8302 batch_bpb:1.0697 running_loss:2.7655 running_bpb:1.0636 doc_len:497-498
+ttt_progress: batch 398/782 batch_loss:2.8847 batch_bpb:1.0956 running_loss:2.7663 running_bpb:1.0638 doc_len:486-487
+ttt_progress: batch 391/782 batch_loss:2.8249 batch_bpb:1.1002 running_loss:2.7666 running_bpb:1.0640 doc_len:475-476
+ttt_progress: batch 384/782 batch_loss:2.8544 batch_bpb:1.0951 running_loss:2.7671 running_bpb:1.0642 doc_len:464-466
+ttt_progress: batch 377/782 batch_loss:2.8136 batch_bpb:1.0909 running_loss:2.7674 running_bpb:1.0644 doc_len:454-455
+ttt_progress: batch 370/782 batch_loss:2.6910 batch_bpb:1.0469 running_loss:2.7670 running_bpb:1.0643 doc_len:444-446
+ttt_progress: batch 362/782 batch_loss:2.8251 batch_bpb:1.0682 running_loss:2.7673 running_bpb:1.0643 doc_len:433-434
+ttt_progress: batch 355/782 batch_loss:2.7223 batch_bpb:1.0727 running_loss:2.7671 running_bpb:1.0643 doc_len:423-424
+ttt_progress: batch 348/782 batch_loss:2.8247 batch_bpb:1.0734 running_loss:2.7674 running_bpb:1.0644 doc_len:414-415
+ttt_progress: batch 341/782 batch_loss:2.8823 batch_bpb:1.1034 running_loss:2.7679 running_bpb:1.0646 doc_len:404-406
+ttt_progress: batch 334/782 batch_loss:2.8781 batch_bpb:1.1075 running_loss:2.7685 running_bpb:1.0648 doc_len:395-396
+ttt_progress: batch 327/782 batch_loss:2.7855 batch_bpb:1.0814 running_loss:2.7685 running_bpb:1.0649 doc_len:387-388
+ttt_progress: batch 319/782 batch_loss:2.8357 batch_bpb:1.1125 running_loss:2.7689 running_bpb:1.0651 doc_len:376-377
+ttt_progress: batch 312/782 batch_loss:2.7449 batch_bpb:1.0715 running_loss:2.7688 running_bpb:1.0651 doc_len:367-368
+ttt_progress: batch 305/782 batch_loss:2.8700 batch_bpb:1.0888 running_loss:2.7692 running_bpb:1.0652 doc_len:358-359
+ttt_progress: batch 298/782 batch_loss:2.8548 batch_bpb:1.1050 running_loss:2.7696 running_bpb:1.0654 doc_len:349-351
+ttt_progress: batch 291/782 batch_loss:2.9621 batch_bpb:1.1183 running_loss:2.7703 running_bpb:1.0656 doc_len:341-342
+ttt_progress: batch 284/782 batch_loss:2.8961 batch_bpb:1.0918 running_loss:2.7708 running_bpb:1.0657 doc_len:333-334
+ttt_progress: batch 277/782 batch_loss:2.8214 batch_bpb:1.1112 running_loss:2.7710 running_bpb:1.0659 doc_len:325-326
+ttt_progress: batch 270/782 batch_loss:2.7950 batch_bpb:1.0969 running_loss:2.7711 running_bpb:1.0660 doc_len:318-319
+ttt_progress: batch 263/782 batch_loss:2.8448 batch_bpb:1.1080 running_loss:2.7714 running_bpb:1.0661 doc_len:310-311
+ttt_progress: batch 256/782 batch_loss:2.8922 batch_bpb:1.1338 running_loss:2.7718 running_bpb:1.0664 doc_len:301-302
+ttt_progress: batch 249/782 batch_loss:2.9102 batch_bpb:1.1592 running_loss:2.7723 running_bpb:1.0667 doc_len:294-295
+ttt_progress: batch 242/782 batch_loss:2.9142 batch_bpb:1.1142 running_loss:2.7728 running_bpb:1.0668 doc_len:287-288
+ttt_progress: batch 235/782 batch_loss:2.9404 batch_bpb:1.1177 running_loss:2.7733 running_bpb:1.0670 doc_len:280-281
+ttt_progress: batch 228/782 batch_loss:2.8797 batch_bpb:1.1395 running_loss:2.7737 running_bpb:1.0672 doc_len:273-274
+ttt_progress: batch 221/782 batch_loss:2.8423 batch_bpb:1.1407 running_loss:2.7739 running_bpb:1.0675 doc_len:266-267
+ttt_progress: batch 214/782 batch_loss:2.9458 batch_bpb:1.1333 running_loss:2.7744 running_bpb:1.0677 doc_len:259-260
+ttt_progress: batch 208/782 batch_loss:2.8266 batch_bpb:1.1161 running_loss:2.7745 running_bpb:1.0678 doc_len:254-254
+ttt_progress: batch 199/782 batch_loss:2.9524 batch_bpb:1.1316 running_loss:2.7750 running_bpb:1.0680 doc_len:246-247
+ttt_progress: batch 193/782 batch_loss:2.8953 batch_bpb:1.1665 running_loss:2.7754 running_bpb:1.0682 doc_len:240-241
+ttt_progress: batch 186/782 batch_loss:2.9442 batch_bpb:1.1763 running_loss:2.7758 running_bpb:1.0685 doc_len:234-235
+ttt_progress: batch 179/782 batch_loss:2.9498 batch_bpb:1.1522 running_loss:2.7763 running_bpb:1.0687 doc_len:228-229
+ttt_progress: batch 172/782 batch_loss:3.0047 batch_bpb:1.1817 running_loss:2.7768 running_bpb:1.0690 doc_len:222-223
+ttt_progress: batch 165/782 batch_loss:2.9561 batch_bpb:1.1697 running_loss:2.7773 running_bpb:1.0692 doc_len:216-217
+ttt_progress: batch 158/782 batch_loss:2.9039 batch_bpb:1.1496 running_loss:2.7776 running_bpb:1.0694 doc_len:210-211
+ttt_progress: batch 151/782 batch_loss:2.8091 batch_bpb:1.1071 running_loss:2.7777 running_bpb:1.0695 doc_len:204-205
+ttt_progress: batch 145/782 batch_loss:2.9108 batch_bpb:1.1421 running_loss:2.7780 running_bpb:1.0697 doc_len:200-200
+ttt_progress: batch 137/782 batch_loss:2.9520 batch_bpb:1.1896 running_loss:2.7783 running_bpb:1.0699 doc_len:193-194
+ttt_progress: batch 128/782 batch_loss:2.8466 batch_bpb:1.0929 running_loss:2.7785 running_bpb:1.0700 doc_len:186-187
+ttt_progress: batch 122/782 batch_loss:2.9017 batch_bpb:1.1610 running_loss:2.7787 running_bpb:1.0701 doc_len:181-182
+ttt_progress: batch 115/782 batch_loss:2.8688 batch_bpb:1.1576 running_loss:2.7789 running_bpb:1.0703 doc_len:176-177
+ttt_progress: batch 108/782 batch_loss:2.8767 batch_bpb:1.1048 running_loss:2.7791 running_bpb:1.0704 doc_len:171-172
+ttt_progress: batch 101/782 batch_loss:2.9690 batch_bpb:1.1653 running_loss:2.7794 running_bpb:1.0705 doc_len:166-167
+ttt_progress: batch 94/782 batch_loss:2.9922 batch_bpb:1.1801 running_loss:2.7798 running_bpb:1.0707 doc_len:160-161
+ttt_progress: batch 87/782 batch_loss:3.0285 batch_bpb:1.2105 running_loss:2.7802 running_bpb:1.0710 doc_len:155-156
+ttt_progress: batch 80/782 batch_loss:2.9108 batch_bpb:1.1923 running_loss:2.7805 running_bpb:1.0712 doc_len:150-151
+ttt_progress: batch 75/782 batch_loss:3.0900 batch_bpb:1.2132 running_loss:2.7810 running_bpb:1.0714 doc_len:146-147
+ttt_progress: batch 68/782 batch_loss:3.1191 batch_bpb:1.2118 running_loss:2.7815 running_bpb:1.0716 doc_len:141-142
+ttt_progress: batch 63/782 batch_loss:3.0145 batch_bpb:1.2158 running_loss:2.7818 running_bpb:1.0718 doc_len:137-138
+ttt_progress: batch 57/782 batch_loss:3.0574 batch_bpb:1.2324 running_loss:2.7822 running_bpb:1.0720 doc_len:132-133
+ttt_progress: batch 51/782 batch_loss:3.0393 batch_bpb:1.2148 running_loss:2.7826 running_bpb:1.0722 doc_len:127-128
+ttt_progress: batch 46/782 batch_loss:3.1417 batch_bpb:1.2285 running_loss:2.7831 running_bpb:1.0724 doc_len:123-124
+ttt_progress: batch 39/782 batch_loss:3.1467 batch_bpb:1.2435 running_loss:2.7835 running_bpb:1.0726 doc_len:118-119
+ttt_progress: batch 35/782 batch_loss:3.0306 batch_bpb:1.2035 running_loss:2.7838 running_bpb:1.0728 doc_len:115-115
+ttt_progress: batch 28/782 batch_loss:3.0325 batch_bpb:1.2211 running_loss:2.7841 running_bpb:1.0730 doc_len:108-109
+ttt_progress: batch 21/782 batch_loss:3.2073 batch_bpb:1.2471 running_loss:2.7846 running_bpb:1.0732 doc_len:102-103
+ttt_progress: batch 15/782 batch_loss:3.2296 batch_bpb:1.2356 running_loss:2.7851 running_bpb:1.0733 doc_len:95-97
+ttt_progress: batch 8/782 batch_loss:3.2702 batch_bpb:1.2642 running_loss:2.7855 running_bpb:1.0735 doc_len:86-87
+ttt_progress: batch 2/782 batch_loss:3.1484 batch_bpb:1.1677 running_loss:2.7858 running_bpb:1.0736 doc_len:70-75
+quantized_ttt_lora val_loss:2.78454746 val_bpb:1.07798510 eval_time:235786ms
+total_eval_time:346.1s
+total_eval_time_with_compile:439.8s