I would request that this PR be expanded to include additional arguments for `TransformersModel`, such as `hf_token`, `attn_implementation`, etc. These arguments would then have to be passed through to `AutoModelForCausalLM.from_pretrained()`.
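A minimal sketch of the forwarding this comment asks for, assuming a hypothetical `load_model` helper (not the actual smolagents internals): extra keyword arguments are passed straight through to `from_pretrained()`, which already accepts `token` and `attn_implementation`.

```python
# Hypothetical sketch, not the real smolagents code: forward arbitrary
# model kwargs (e.g. attn_implementation, quantization_config, token)
# from the wrapper down to AutoModelForCausalLM.from_pretrained().
from transformers import AutoModelForCausalLM


def load_model(model_id: str, **model_kwargs):
    # model_kwargs is passed through unchanged, so any parameter that
    # from_pretrained() understands works without wrapper changes.
    return AutoModelForCausalLM.from_pretrained(model_id, **model_kwargs)
```

With this pattern, a caller could write `load_model("some/model", attn_implementation="sdpa", token=my_token)` and no new wrapper parameters would be needed for future `from_pretrained()` options.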
Describe the bug
The `AutoModelForCausalLM.from_pretrained()` function supports a `quantization_config` parameter that can be used to enable quantization. However, `smolagents.TransformersModel()` does not pass this parameter through, so quantization is not supported.
Code to reproduce the error
Error logs (if any)
There are no logs. I can see that the model does not fit in my GPU memory, although it should fit when using 8-bit quantization.
Expected behavior
The model should load in 8-bit mode.
Package versions:
accelerate==1.4.0
aiofiles==24.1.0
aiohappyeyeballs==2.5.0
aiohttp==3.11.13
aiosignal==1.3.2
altair==5.5.0
annotated-types==0.7.0
anyio==4.8.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==25.1.0
backoff==2.2.1
bcrypt==4.3.0
beautifulsoup4==4.13.3
bitsandbytes==0.45.3
blinker==1.9.0
build==1.2.2.post1
cachetools==5.5.2
certifi==2025.1.31
cffi==1.17.1
chardet==5.2.0
charset-normalizer==3.4.1
chroma-hnswlib==0.7.6
chromadb==0.6.3
click==8.1.8
coloredlogs==15.0.1
cryptography==44.0.2
dataclasses-json==0.6.7
Deprecated==1.2.18
distro==1.9.0
docx2txt==0.8
duckduckgo_search==7.5.1
durationpy==0.9
emoji==2.14.1
eval_type_backport==0.2.2
exceptiongroup==1.2.2
fastapi==0.115.11
filelock==3.17.0
filetype==1.2.0
flatbuffers==25.2.10
frozenlist==1.5.0
fsspec==2025.3.0
gitdb==4.0.12
GitPython==3.1.44
google-auth==2.38.0
googleapis-common-protos==1.69.1
greenlet==3.1.1
grpcio==1.71.0
h11==0.14.0
html5lib==1.1
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
httpx-sse==0.4.0
huggingface-hub==0.29.3
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
importlib_resources==6.5.2
Jinja2==3.1.6
jiter==0.9.0
joblib==1.4.2
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kubernetes==32.0.1
langchain==0.3.20
langchain-chroma==0.2.2
langchain-community==0.3.19
langchain-core==0.3.43
langchain-huggingface==0.1.2
langchain-text-splitters==0.3.6
langdetect==1.0.9
langsmith==0.3.13
lxml==5.3.1
markdown-it-py==3.0.0
markdownify==1.1.0
MarkupSafe==3.0.2
marshmallow==3.26.1
mdurl==0.1.2
mmh3==5.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.1.0
mypy-extensions==1.0.0
narwhals==1.30.0
nest-asyncio==1.6.0
networkx==3.4.2
nltk==3.9.1
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
oauthlib==3.2.2
olefile==0.47
onnxruntime==1.21.0
openai==1.66.2
opentelemetry-api==1.30.0
opentelemetry-exporter-otlp-proto-common==1.30.0
opentelemetry-exporter-otlp-proto-grpc==1.30.0
opentelemetry-instrumentation==0.51b0
opentelemetry-instrumentation-asgi==0.51b0
opentelemetry-instrumentation-fastapi==0.51b0
opentelemetry-proto==1.30.0
opentelemetry-sdk==1.30.0
opentelemetry-semantic-conventions==0.51b0
opentelemetry-util-http==0.51b0
orjson==3.10.15
outcome==1.3.0.post0
overrides==7.7.0
packaging==24.2
pandas==2.2.3
pillow==11.1.0
posthog==3.19.1
primp==0.14.0
propcache==0.3.0
protobuf==5.29.3
psutil==7.0.0
pyarrow==19.0.1
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycparser==2.22
pydantic==2.10.6
pydantic-settings==2.8.1
pydantic_core==2.27.2
pydeck==0.9.1
Pygments==2.19.1
pypdf==5.3.1
pypdfium2==4.30.1
PyPika==0.48.9
pyproject_hooks==1.2.0
PySocks==1.7.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-iso639==2025.2.18
python-magic==0.4.27
python-oxmsg==0.0.2
pytz==2025.1
PyYAML==6.0.2
RapidFuzz==3.12.2
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rich==13.9.4
rpds-py==0.23.1
rsa==4.9
safetensors==0.5.3
scikit-learn==1.6.1
scipy==1.15.2
selenium==4.29.0
sentence-transformers==3.4.1
shellingham==1.5.4
six==1.17.0
smmap==5.0.2
smolagents==1.10.0
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.6
SQLAlchemy==2.0.39
starlette==0.46.1
streamlit==1.43.2
sympy==1.13.1
tenacity==9.0.0
threadpoolctl==3.5.0
tokenizers==0.21.0
toml==0.10.2
tomli==2.2.1
torch==2.6.0
tornado==6.4.2
tqdm==4.67.1
transformers==4.49.0
trio==0.29.0
trio-websocket==0.12.2
triton==3.2.0
typer==0.15.2
typing-inspect==0.9.0
typing-inspection==0.4.0
typing_extensions==4.12.2
tzdata==2025.1
unstructured==0.16.25
unstructured-client==0.31.1
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
watchdog==6.0.0
watchfiles==1.0.4
webencodings==0.5.1
websocket-client==1.8.0
websockets==15.0.1
wrapt==1.17.2
wsproto==1.2.0
yarl==1.18.3
zipp==3.21.0
zstandard==0.23.0
Additional context
I want to fit the model into my GPU memory using 8-bit quantization, because the standard 16-bit float weights are too large.