I would request that this PR be expanded to include additional arguments for `TransformersModel`, such as `hf_token`, `attn_implementation`, etc. These arguments would then have to be passed through to `AutoModelForCausalLM.from_pretrained()`.
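A minimal sketch of the forwarding this comment asks for, assuming a hypothetical `load_model` helper (not the actual smolagents internals): extra keyword arguments are passed straight through to `from_pretrained()`, which already accepts `token` and `attn_implementation`.

```python
# Hypothetical sketch, not the real smolagents code: forward arbitrary
# model kwargs (e.g. attn_implementation, quantization_config, token)
# from the wrapper down to AutoModelForCausalLM.from_pretrained().
from transformers import AutoModelForCausalLM


def load_model(model_id: str, **model_kwargs):
    # model_kwargs is passed through unchanged, so any parameter that
    # from_pretrained() understands works without wrapper changes.
    return AutoModelForCausalLM.from_pretrained(model_id, **model_kwargs)
```

With this pattern, a caller could write `load_model("some/model", attn_implementation="sdpa", token=my_token)` and no new wrapper parameters would be needed for future `from_pretrained()` options.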
Describe the bug
The `AutoModelForCausalLM.from_pretrained()` function supports a `quantization_config` parameter that can be used to enable quantization. However, `smolagents.TransformersModel()` does not pass this parameter through, so quantization is not supported.
Code to reproduce the error
Error logs (if any)
There are no logs. I can see that the model does not fit in my GPU memory, although it should fit when using 8-bit quantization.
Expected behavior
The model should load in 8-bit mode.
Package versions:
accelerate==1.4.0
aiofiles==24.1.0
aiohappyeyeballs==2.5.0
aiohttp==3.11.13
aiosignal==1.3.2
altair==5.5.0
annotated-types==0.7.0
anyio==4.8.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==25.1.0
backoff==2.2.1
bcrypt==4.3.0
beautifulsoup4==4.13.3
bitsandbytes==0.45.3
blinker==1.9.0
build==1.2.2.post1
cachetools==5.5.2
certifi==2025.1.31
cffi==1.17.1
chardet==5.2.0
charset-normalizer==3.4.1
chroma-hnswlib==0.7.6
chromadb==0.6.3
click==8.1.8
coloredlogs==15.0.1
cryptography==44.0.2
dataclasses-json==0.6.7
Deprecated==1.2.18
distro==1.9.0
docx2txt==0.8
duckduckgo_search==7.5.1
durationpy==0.9
emoji==2.14.1
eval_type_backport==0.2.2
exceptiongroup==1.2.2
fastapi==0.115.11
filelock==3.17.0
filetype==1.2.0
flatbuffers==25.2.10
frozenlist==1.5.0
fsspec==2025.3.0
gitdb==4.0.12
GitPython==3.1.44
google-auth==2.38.0
googleapis-common-protos==1.69.1
greenlet==3.1.1
grpcio==1.71.0
h11==0.14.0
html5lib==1.1
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
httpx-sse==0.4.0
huggingface-hub==0.29.3
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
importlib_resources==6.5.2
Jinja2==3.1.6
jiter==0.9.0
joblib==1.4.2
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kubernetes==32.0.1
langchain==0.3.20
langchain-chroma==0.2.2
langchain-community==0.3.19
langchain-core==0.3.43
langchain-huggingface==0.1.2
langchain-text-splitters==0.3.6
langdetect==1.0.9
langsmith==0.3.13
lxml==5.3.1
markdown-it-py==3.0.0
markdownify==1.1.0
MarkupSafe==3.0.2
marshmallow==3.26.1
mdurl==0.1.2
mmh3==5.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.1.0
mypy-extensions==1.0.0
narwhals==1.30.0
nest-asyncio==1.6.0
networkx==3.4.2
nltk==3.9.1
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
oauthlib==3.2.2
olefile==0.47
onnxruntime==1.21.0
openai==1.66.2
opentelemetry-api==1.30.0
opentelemetry-exporter-otlp-proto-common==1.30.0
opentelemetry-exporter-otlp-proto-grpc==1.30.0
opentelemetry-instrumentation==0.51b0
opentelemetry-instrumentation-asgi==0.51b0
opentelemetry-instrumentation-fastapi==0.51b0
opentelemetry-proto==1.30.0
opentelemetry-sdk==1.30.0
opentelemetry-semantic-conventions==0.51b0
opentelemetry-util-http==0.51b0
orjson==3.10.15
outcome==1.3.0.post0
overrides==7.7.0
packaging==24.2
pandas==2.2.3
pillow==11.1.0
posthog==3.19.1
primp==0.14.0
propcache==0.3.0
protobuf==5.29.3
psutil==7.0.0
pyarrow==19.0.1
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycparser==2.22
pydantic==2.10.6
pydantic-settings==2.8.1
pydantic_core==2.27.2
pydeck==0.9.1
Pygments==2.19.1
pypdf==5.3.1
pypdfium2==4.30.1
PyPika==0.48.9
pyproject_hooks==1.2.0
PySocks==1.7.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-iso639==2025.2.18
python-magic==0.4.27
python-oxmsg==0.0.2
pytz==2025.1
PyYAML==6.0.2
RapidFuzz==3.12.2
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rich==13.9.4
rpds-py==0.23.1
rsa==4.9
safetensors==0.5.3
scikit-learn==1.6.1
scipy==1.15.2
selenium==4.29.0
sentence-transformers==3.4.1
shellingham==1.5.4
six==1.17.0
smmap==5.0.2
smolagents==1.10.0
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.6
SQLAlchemy==2.0.39
starlette==0.46.1
streamlit==1.43.2
sympy==1.13.1
tenacity==9.0.0
threadpoolctl==3.5.0
tokenizers==0.21.0
toml==0.10.2
tomli==2.2.1
torch==2.6.0
tornado==6.4.2
tqdm==4.67.1
transformers==4.49.0
trio==0.29.0
trio-websocket==0.12.2
triton==3.2.0
typer==0.15.2
typing-inspect==0.9.0
typing-inspection==0.4.0
typing_extensions==4.12.2
tzdata==2025.1
unstructured==0.16.25
unstructured-client==0.31.1
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
watchdog==6.0.0
watchfiles==1.0.4
webencodings==0.5.1
websocket-client==1.8.0
websockets==15.0.1
wrapt==1.17.2
wsproto==1.2.0
yarl==1.18.3
zipp==3.21.0
zstandard==0.23.0
Additional context
I want to fit the model into my GPU memory using 8-bit quantization, because the standard 16-bit float weights are too large.