It would be good to have a section in the top-level README on initializing a new pre-train model at various common sizes (up to 70B) for each architecture.
For example:
For RWKV, to initialize a 70B pre-train model, set flags: example.py --n_embd XXXXX etc.
For a transformer, to initialize a 70B pre-train model, set flags: example2.py --some_flag etc.
Do this for each architecture at 3B, 7B, 34B, and 70B (or just a few small and large examples each).
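To illustrate the kind of mapping from flags to model size being requested, here is a rough sketch that estimates the parameter count of a standard GPT-style decoder from its layer count and embedding width. It uses the common approximation of about 12 * n_layer * n_embd^2 weights in the blocks plus one embedding table. The function name, the `n_layer` and `vocab_size` parameters, and the candidate shapes are assumptions for illustration only, not this repository's flags or defaults, and other architectures such as RWKV have different per-layer counts.

```python
def approx_transformer_params(n_layer: int, n_embd: int, vocab_size: int) -> int:
    """Rough parameter count for a GPT-style decoder (hypothetical helper).

    Assumes each block holds about 4 * n_embd^2 attention weights and
    8 * n_embd^2 MLP weights (4x hidden expansion), and ignores biases,
    layer norms, and positional embeddings.
    """
    block_params = 12 * n_layer * n_embd ** 2
    embedding_params = vocab_size * n_embd  # single tied embedding table
    return block_params + embedding_params


if __name__ == "__main__":
    # Candidate shapes are illustrative guesses, not recommended settings.
    for n_layer, n_embd in [(32, 2560), (32, 4096), (48, 7168), (80, 8192)]:
        total = approx_transformer_params(n_layer, n_embd, vocab_size=32000)
        print(f"n_layer={n_layer} n_embd={n_embd} -> ~{total / 1e9:.1f}B params")
```

A README table built this way would still need to be checked against the real parameter count of each initialized model, since the approximation ignores architecture-specific details.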
On Wed, Feb 7, 2024, 9:44 AM bennmann ***@***.***> wrote:
It would be good to have a section in the top level read.me on
initializing a new pre-train model of various common sizes (up to 70B) for
each architecture.
ie:
For RWKV to initialize a pre-train model of size 70B set flags:
example.py --n_embd XXXXX etc
For transformer to initialize a pre-train model of size 70B set flags:
example2.py --some_flag etc
Do this for each arch at 3B, 7B, 34B, 70B (or just a few small and large
examples each)