
[BUG] quantization_config is not supported in TransformersModel #969

Open
AlekseyMalyshev opened this issue Mar 13, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@AlekseyMalyshev

Describe the bug

The following function accepts a `quantization_config` parameter that can be used to enable quantization:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device_map,
    torch_dtype=torch_dtype,
    trust_remote_code=trust_remote_code,
    quantization_config=quantization_config,
)
```

However, `smolagents.TransformersModel()` does not pass this parameter through to `from_pretrained()`, so quantization is not supported.
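The fix amounts to forwarding the parameter. The pure-Python sketch below (using a stand-in `from_pretrained_stub` rather than the real `AutoModelForCausalLM.from_pretrained`, and a hypothetical `load_model` wrapper) illustrates the pass-through pattern: the extra setting is only added to the kwargs when provided, then handed to the underlying loader unchanged.

```python
def from_pretrained_stub(model_id, **kwargs):
    """Stand-in for AutoModelForCausalLM.from_pretrained; records its kwargs."""
    return {"model_id": model_id, **kwargs}

def load_model(model_id, device_map="auto", torch_dtype=None,
               quantization_config=None):
    """Hypothetical wrapper showing how TransformersModel could forward
    quantization_config to the underlying from_pretrained call."""
    kwargs = {"device_map": device_map, "torch_dtype": torch_dtype}
    if quantization_config is not None:
        kwargs["quantization_config"] = quantization_config
    return from_pretrained_stub(model_id, **kwargs)

# The quantization setting survives the wrapper intact.
loaded = load_model("some/model", quantization_config={"load_in_8bit": True})
assert loaded["quantization_config"] == {"load_in_8bit": True}
```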

Code to reproduce the error

```python
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
return TransformersModel(
    model_id=model_id,
    quantization_config=quantization_config,
    max_new_tokens=10240,
)
```

Error logs (if any)
There are no logs. I can see that the model does not fit in my GPU memory, even though it should with 8-bit quantization.

Expected behavior
The model should load in 8-bit mode.

Package versions:

accelerate==1.4.0
aiofiles==24.1.0
aiohappyeyeballs==2.5.0
aiohttp==3.11.13
aiosignal==1.3.2
altair==5.5.0
annotated-types==0.7.0
anyio==4.8.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==25.1.0
backoff==2.2.1
bcrypt==4.3.0
beautifulsoup4==4.13.3
bitsandbytes==0.45.3
blinker==1.9.0
build==1.2.2.post1
cachetools==5.5.2
certifi==2025.1.31
cffi==1.17.1
chardet==5.2.0
charset-normalizer==3.4.1
chroma-hnswlib==0.7.6
chromadb==0.6.3
click==8.1.8
coloredlogs==15.0.1
cryptography==44.0.2
dataclasses-json==0.6.7
Deprecated==1.2.18
distro==1.9.0
docx2txt==0.8
duckduckgo_search==7.5.1
durationpy==0.9
emoji==2.14.1
eval_type_backport==0.2.2
exceptiongroup==1.2.2
fastapi==0.115.11
filelock==3.17.0
filetype==1.2.0
flatbuffers==25.2.10
frozenlist==1.5.0
fsspec==2025.3.0
gitdb==4.0.12
GitPython==3.1.44
google-auth==2.38.0
googleapis-common-protos==1.69.1
greenlet==3.1.1
grpcio==1.71.0
h11==0.14.0
html5lib==1.1
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
httpx-sse==0.4.0
huggingface-hub==0.29.3
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
importlib_resources==6.5.2
Jinja2==3.1.6
jiter==0.9.0
joblib==1.4.2
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kubernetes==32.0.1
langchain==0.3.20
langchain-chroma==0.2.2
langchain-community==0.3.19
langchain-core==0.3.43
langchain-huggingface==0.1.2
langchain-text-splitters==0.3.6
langdetect==1.0.9
langsmith==0.3.13
lxml==5.3.1
markdown-it-py==3.0.0
markdownify==1.1.0
MarkupSafe==3.0.2
marshmallow==3.26.1
mdurl==0.1.2
mmh3==5.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.1.0
mypy-extensions==1.0.0
narwhals==1.30.0
nest-asyncio==1.6.0
networkx==3.4.2
nltk==3.9.1
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
oauthlib==3.2.2
olefile==0.47
onnxruntime==1.21.0
openai==1.66.2
opentelemetry-api==1.30.0
opentelemetry-exporter-otlp-proto-common==1.30.0
opentelemetry-exporter-otlp-proto-grpc==1.30.0
opentelemetry-instrumentation==0.51b0
opentelemetry-instrumentation-asgi==0.51b0
opentelemetry-instrumentation-fastapi==0.51b0
opentelemetry-proto==1.30.0
opentelemetry-sdk==1.30.0
opentelemetry-semantic-conventions==0.51b0
opentelemetry-util-http==0.51b0
orjson==3.10.15
outcome==1.3.0.post0
overrides==7.7.0
packaging==24.2
pandas==2.2.3
pillow==11.1.0
posthog==3.19.1
primp==0.14.0
propcache==0.3.0
protobuf==5.29.3
psutil==7.0.0
pyarrow==19.0.1
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycparser==2.22
pydantic==2.10.6
pydantic-settings==2.8.1
pydantic_core==2.27.2
pydeck==0.9.1
Pygments==2.19.1
pypdf==5.3.1
pypdfium2==4.30.1
PyPika==0.48.9
pyproject_hooks==1.2.0
PySocks==1.7.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-iso639==2025.2.18
python-magic==0.4.27
python-oxmsg==0.0.2
pytz==2025.1
PyYAML==6.0.2
RapidFuzz==3.12.2
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rich==13.9.4
rpds-py==0.23.1
rsa==4.9
safetensors==0.5.3
scikit-learn==1.6.1
scipy==1.15.2
selenium==4.29.0
sentence-transformers==3.4.1
shellingham==1.5.4
six==1.17.0
smmap==5.0.2
smolagents==1.10.0
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.6
SQLAlchemy==2.0.39
starlette==0.46.1
streamlit==1.43.2
sympy==1.13.1
tenacity==9.0.0
threadpoolctl==3.5.0
tokenizers==0.21.0
toml==0.10.2
tomli==2.2.1
torch==2.6.0
tornado==6.4.2
tqdm==4.67.1
transformers==4.49.0
trio==0.29.0
trio-websocket==0.12.2
triton==3.2.0
typer==0.15.2
typing-inspect==0.9.0
typing-inspection==0.4.0
typing_extensions==4.12.2
tzdata==2025.1
unstructured==0.16.25
unstructured-client==0.31.1
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
watchdog==6.0.0
watchfiles==1.0.4
webencodings==0.5.1
websocket-client==1.8.0
websockets==15.0.1
wrapt==1.17.2
wsproto==1.2.0
yarl==1.18.3
zipp==3.21.0
zstandard==0.23.0

Additional context

I want to fit the model into my GPU using 8-bit quantization, because the standard 16-bit float weights are too large.
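For a rough sense of why 8-bit loading matters here, memory footprint for the weights alone scales linearly with bytes per parameter, so int8 halves the fp16 requirement. A small worked example (the 7B parameter count is just an illustrative assumption, and real usage adds overhead for activations and the KV cache):

```python
def model_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB: params * bytes / 2**30."""
    return n_params * bytes_per_param / 1024**3

n_params = 7e9                                   # e.g. a 7B-parameter model
fp16 = model_memory_gib(n_params, 2)             # ~13.0 GiB in float16
int8 = model_memory_gib(n_params, 1)             # ~6.5 GiB with 8-bit weights
print(f"fp16: {fp16:.1f} GiB, int8: {int8:.1f} GiB")
```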

@AlekseyMalyshev AlekseyMalyshev added the bug Something isn't working label Mar 13, 2025
@AlekseyMalyshev
Author

I created a PR: #968

@prasiyer

I would request that this PR be expanded to accept additional arguments in `TransformersModel`, such as `hf_token`, `attn_implementation`, etc. These arguments would then have to be passed through to `AutoModelForCausalLM.from_pretrained()`.
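Rather than adding one named parameter per setting, the broader request could be met with a generic passthrough. The sketch below is a hypothetical design (stub loader, invented `ModelWrapper` class, example token string): any keyword argument not consumed by the wrapper itself is forwarded verbatim, so `hf_token`, `attn_implementation`, and future loader options need no dedicated plumbing.

```python
def from_pretrained_stub(model_id, **kwargs):
    """Stand-in for AutoModelForCausalLM.from_pretrained; records its kwargs."""
    return {"model_id": model_id, **kwargs}

class ModelWrapper:
    """Hypothetical TransformersModel-like class with a **kwargs passthrough."""
    def __init__(self, model_id, max_new_tokens=512, **model_kwargs):
        self.max_new_tokens = max_new_tokens          # consumed by the wrapper
        self.model = from_pretrained_stub(model_id, **model_kwargs)  # the rest

w = ModelWrapper("some/model", max_new_tokens=1024,
                 hf_token="hf_xxx", attn_implementation="sdpa")
assert w.model["attn_implementation"] == "sdpa"
```

The trade-off of this design is weaker discoverability: typos in forwarded kwargs only surface when the underlying loader rejects them.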
