Bug in parameter count for GPTQ quantized models

#1068
by Qubitium - opened

@alozowski I think I found a bug related to GPTQ (and likely other quantized models) in the Open LLM Leaderboard. The parameter count appears to be off by a factor of ~4. The GPTQ model below is a quant of Llama-3.2-1B-Instruct, but the parameter filter will not show it until I raise the upper bound to ~6B in the UI.

Leaderboard UI: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?params=-1%2C6&search=modelcloud
Model: https://huggingface.co/ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1

Open LLM Leaderboard org

Hi @Qubitium,

Thanks for opening the discussion here! Currently, the Leaderboard calculates the number of parameters for GPTQ models using this method (line 118). Could you suggest any improvements to this calculation? It would be appreciated!

@alozowski The current code assumes a fixed factor of 8, which means it assumes GPTQ quants are 4-bit (int4), with 8 int4 values packed into one int32. But GPTQ supports several bit widths, including 2, 3, 4, and 8, so the factor should come from the checkpoint's quantization config. That is one bug.
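For reference, something along these lines could replace the hardcoded ×8. This is only a rough sketch, not the Leaderboard's current code: `estimated_params_billions` is a hypothetical helper, and it assumes a recent `huggingface_hub` where `ModelInfo.safetensors.parameters` exposes per-dtype element counts, and that the packed GPTQ tensors are stored as int32.

```python
# Rough sketch only (not the Leaderboard's actual code): estimate the unpacked
# parameter count of a GPTQ repo by reading the real bit width from
# quantization_config instead of hardcoding a x8 factor.
import json

from huggingface_hub import HfApi, hf_hub_download


def estimated_params_billions(repo_id: str) -> float:
    """Hypothetical helper: approximate parameter count in billions."""
    info = HfApi().model_info(repo_id)

    # GPTQ checkpoints record their bit width in config.json; it can be 2, 3, 4 or 8.
    config_path = hf_hub_download(repo_id, "config.json")
    with open(config_path) as f:
        bits = json.load(f).get("quantization_config", {}).get("bits")

    # safetensors metadata lists element counts per dtype. Packed GPTQ tensors
    # (qweight/qzeros) are stored as int32, so each int32 element holds roughly
    # 32 // bits quantized weights; fp16/bf16 tensors (scales, embeddings,
    # norms) already count one element per parameter.
    per_dtype = info.safetensors.parameters  # e.g. {"I32": ..., "F16": ...}
    total = 0
    for dtype, count in per_dtype.items():
        if dtype == "I32" and bits:
            total += count * (32 // bits)  # approximate for 3-bit, exact for 2/4/8
        else:
            total += count
    return round(total / 1e9, 3)


if __name__ == "__main__":
    print(estimated_params_billions("ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortex-v1"))
```

Note that the unpack factor is only applied to the int32 tensors here; multiplying the whole safetensors total by 8 would also inflate the fp16 scales, embeddings, and norms by 8×, which may be part of why the reported size is so far off.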

But even if the current code did, on the surface, handle the 4-bit case correctly, I still don't know why the end value is ~4x larger than reality. I may have more time on Monday to dig into this if you haven't fixed it already.
