Extra SLERP parameters

#1
by sometimesanotion - opened

These are interesting SLERP directives you've used! I've tried your recipe with minor tweaks at sometimesanotion/Qwen2.5-14B-MinusLike-Slerp-Experimental, using Arcee's mergekit-gui space. Any guesses as to how these SLERP merges will score?

I must say the new techniques used in these projects truly impress me; I only learned about them recently. In some of my new experimental projects, such as tempesthenno-nuslerp-001, I've drawn significant inspiration from your remarkable project Lamarck-14B-v0.6, and I believe @bamec66557's Qwen-2.5-14B-MINUS will also serve as a model for the next steps in my learning.

However, I have some personal concerns. As computational costs keep falling, can we push the boundaries even further? While @arcee-ai's research and work are highly valuable references, I'm concerned their approach may eventually hit a ceiling on "real performance" that further optimization can't break through (regardless of evaluation method); perhaps we're already approaching that limit, at least for 14B models. So what should our next direction be: reinforcement learning, or simply larger models? Personally, I don't think scaling up is a reliable approach, since once a model's size has been increased, we likely have no way to scale it back down.
