Conversation

@ericcurtin
Collaborator

So we can load safetensors natively, just like gguf

@ericcurtin ericcurtin marked this pull request as draft November 28, 2025 20:32
@ericcurtin
Collaborator Author

ericcurtin commented Nov 28, 2025

safetensors files tend to be downloaded/pulled more often than gguf ones. This PR introduces support for both formats.

It will also let us run performance comparisons between vLLM and llama.cpp using the exact same model files.
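For context, the safetensors container that this PR parses is deliberately simple: an 8-byte little-endian unsigned integer giving the length of a JSON header, the JSON header itself (tensor name → dtype, shape, byte offsets), then the raw tensor data. A minimal Python sketch of reading that header (this is an illustration of the file format, not the PR's actual C++ implementation):

```python
import io
import json
import struct

def read_safetensors_header(f):
    """Read the JSON metadata header from a safetensors stream."""
    # First 8 bytes: little-endian u64 with the JSON header's byte length.
    (n,) = struct.unpack("<Q", f.read(8))
    # Next n bytes: JSON mapping tensor names to dtype/shape/data_offsets.
    return json.loads(f.read(n))

# Build a tiny in-memory safetensors file: one F32 tensor of shape [2].
meta = {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
blob = json.dumps(meta).encode()
buf = io.BytesIO(struct.pack("<Q", len(blob)) + blob + struct.pack("<2f", 1.0, 2.0))

print(read_safetensors_header(buf)["weight"]["shape"])  # → [2]
```

Because the header is plain JSON with explicit byte offsets, a loader can mmap the file and address each tensor's data directly, much like gguf.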

@ericcurtin ericcurtin force-pushed the support-safetensors branch 2 times, most recently from a963646 to 43efc4d Compare November 30, 2025 12:48
@ericcurtin ericcurtin force-pushed the support-safetensors branch 3 times, most recently from 6144e45 to e5e869d Compare November 30, 2025 16:04
So we can load these natively just like gguf

Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin ericcurtin marked this pull request as ready for review November 30, 2025 16:11
@ericcurtin ericcurtin requested a review from CISC as a code owner November 30, 2025 16:11
@ericcurtin
Collaborator Author

I think this is good for review now. I did most of the testing with:

llama-server -hf HuggingFaceTB/SmolLM2-135M-Instruct

@ericcurtin
Collaborator Author

@ggerganov @CISC PTAL
