Number of parameters
Just for my understanding, why is it called a 7B if it has 8.54B parameters in the safetensors?
@HugoLaurencon I think they are trying to compete with Mistral-7B, so they picked the name to make it seem smaller than it actually is, since 7B is the more popular parameter size. If Google has a better explanation they can pitch in here.
(Disclaimer: I'm not from the Gemma development team, and this explanation is to the best of my understanding.) The model itself contains close to 7B parameters. However, the number you see on the model page on HF also includes the embedding layer, which adds to the overall count (but is not usually counted as part of the model size). If you look at the Mistral 7B model, its HF page also shows slightly more than 7B parameters. The vocabulary for Gemma 7B is much larger (~8x), though, which results in a noticeably larger number of params shown on HF.
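If you want to verify this yourself, here is a minimal sketch (assuming the `transformers` and `accelerate` libraries, and that you have accepted the license for the gated `google/gemma-7b` repo) that counts total parameters versus parameters outside the input embedding table, without allocating any real weights:

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model on the "meta" device: no weights are downloaded or allocated,
# we only need the parameter shapes to count them.
config = AutoConfig.from_pretrained("google/gemma-7b")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

total = sum(p.numel() for p in model.parameters())
embedding = sum(p.numel() for p in model.get_input_embeddings().parameters())

print(f"total params:         {total / 1e9:.2f}B")
print(f"embedding params:     {embedding / 1e9:.2f}B")
print(f"non-embedding params: {(total - embedding) / 1e9:.2f}B")
```

Note that tied input/output embeddings are only counted once by `model.parameters()`, so the "total" line should line up with the figure shown on the HF page.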
Ok thanks! I'll leave this open in case people want to look at it, but feel free to close.
CodeGemma (https://goo.gle/codegemma) uses the term "size class".
I think it's better to represent it as an 8B model. Yes, Mistral-7B has slightly more than 7B parameters (7.24B), but we call it 7B by rounding to the nearest integer. It's a similar case with Llama-2, which has fewer than 7B parameters (6.7B iirc) but gets rounded up to 7B. :)
See related discussion at https://huggingface.co/google/gemma-7b/discussions/24#65d68271d22ae470a08c7629
Hi @HugoLaurencon, sorry for the late response. Many of those are embedding parameters, which we often do not count in the total parameter count for papers and releases. With respect to the emerging 7B class of open models, we've targeted the same use cases as other models in the 7B class from a hardware and software compatibility standpoint -- so it should be strictly transferable for many, if not all, 7B-class use cases.
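For concreteness, here is a back-of-the-envelope calculation of the embedding share. The config values below (`vocab_size`, `hidden_size`) are taken from the published gemma-7b config as I understand it, so treat this as a rough sketch rather than an official breakdown:

```python
# Rough arithmetic, assuming the gemma-7b config values (not an official breakdown):
vocab_size = 256_000   # Gemma vocabulary (~8x larger than Mistral 7B's 32,000)
hidden_size = 3_072    # embedding dimension

embedding_params = vocab_size * hidden_size   # size of the input embedding table
total_params = 8.54e9                         # total reported on the HF model page

print(f"embedding params:            {embedding_params / 1e9:.2f}B")   # ~0.79B
print(f"params excluding embeddings: {(total_params - embedding_params) / 1e9:.2f}B")
```

So a sizeable chunk of the 8.54B figure is the embedding table, and the remainder is what lands in the "7B class" when embeddings are excluded from the count.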
Thank you.