bartowski

AI & ML interests: Official model curator for https://lmstudio.ai/


Switching to author_model-name

I posted a poll on Twitter, and others have also expressed interest in my adopting the convention of including the author's name in the model path when I upload.

It has a couple of advantages. First and foremost, of course, it makes clear who uploaded the original model (did Qwen upload Qwen2.6? Or did someone fine-tune Qwen2.5 and name it 2.6 for fun?).

Second, it avoids collisions: if multiple people upload a model under the same name and I try to quant both, the resulting repo names would normally collide, leaving me unable to upload both.
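
To make that concrete, here's a minimal sketch of the mapping (the helper name and the example repos are invented for illustration; the "_" separator is the convention described above):

```python
def quant_repo_name(source_repo: str) -> str:
    """Map 'author/Model-Name' to 'author_Model-Name-GGUF' (hypothetical helper)."""
    author, model = source_repo.split("/", 1)
    return f"{author}_{model}-GGUF"

# Without the author prefix, both of these would land on the same
# quant repo name, "Qwen2.6-7B-GGUF", and collide:
print(quant_repo_name("Qwen/Qwen2.6-7B"))      # Qwen_Qwen2.6-7B-GGUF
print(quant_repo_name("someuser/Qwen2.6-7B"))  # someuser_Qwen2.6-7B-GGUF
```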

I'll be implementing the change next week; there are just two final details I'm unsure about:

First, should the files also inherit the author's name?

Second, what should happen when the author name plus the model name pushes us past the character limit?

I haven't decided how to handle either case yet, so feedback is welcome, but I'm also just providing this as a heads-up. One possible way to handle the length limit is sketched below.
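
Purely a speculation, nothing is decided: one option is to trim the author prefix first, since the model name carries most of the identifying information. MAX_LEN is a placeholder here, not the actual Hub limit:

```python
MAX_LEN = 96  # placeholder; substitute the real Hub repo-name limit

def capped_repo_name(author: str, model: str, max_len: int = MAX_LEN) -> str:
    name = f"{author}_{model}-GGUF"
    if len(name) <= max_len:
        return name
    # Trim the author prefix to fit; the model name is kept intact
    # because it carries more identifying information.
    budget = max_len - len(f"_{model}-GGUF")
    if budget > 0:
        return f"{author[:budget]}_{model}-GGUF"
    # The model name alone doesn't fit either; fall back to the old
    # unprefixed name and flag it for manual review.
    return f"{model}-GGUF"
```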
Looks like Q4_0_N_M file types are going away

Before you panic, there's a new "preferred" method: online repacking (I prefer the term on-the-fly). If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, from using intrinsics instead of assembly, but intrinsics are more maintainable).
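
If you're curious what "repacking into interleaved rows" actually means, here's a toy numpy sketch (not llama.cpp's actual code, and simplified: real Q4_0 blocks hold a scale plus 32 packed 4-bit weights). The idea is that block i of four consecutive rows ends up contiguous in memory, so a 4-wide SIMD kernel can load all four at once:

```python
import numpy as np

def repack_interleaved(rows: np.ndarray, group: int = 4) -> np.ndarray:
    """Interleave quant blocks from `group` consecutive rows.

    rows: (n_rows, n_blocks, block_bytes) array of per-row quant blocks.
    Returns a flat buffer where block i of rows r..r+group-1 sit next to
    each other, roughly what the Q4_0_4_4 layout did ahead of time.
    """
    n_rows, n_blocks, block_bytes = rows.shape
    assert n_rows % group == 0, "row count must be a multiple of the group size"
    grouped = rows.reshape(n_rows // group, group, n_blocks, block_bytes)
    # Swap the row-within-group and block axes: block-major within each group
    interleaved = grouped.transpose(0, 2, 1, 3)
    return np.ascontiguousarray(interleaved).reshape(-1)
```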

You can see the reference PR here:

https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 models (unless backwards compatibility is added back in), but Q4_0 should run at the same speeds (though it may currently be bugged on some platforms)

As such, I'll stop making the Q4_0_N_M formats soon, probably by the end of this week unless something changes, but you should be safe to download the Q4_0 quants and use those!

IQ4_NL also supports repacking, though not in as many shapes yet, and it should get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon, since those machines run inference on the GPU and don't benefit from repacking the weights
