Granite Four #13550

gabe-l-hart · 2025-05-14T20:13:13Z

Description

This PR is the end-point for architecture support for Granite 4.0 (#13269 . It incorporates a number of changes from other in-flight branches that will need to be merged first:

Mamba2 model support: llama : initial Mamba-2 support #9126
Hybrid recurrent cache: Hybrid recurrent cache #13979
llama : support Jamba hybrid Transformer-Mamba models #7531 (needed for further hybrid cache refactors)

Additionally, this PR replaces some work done on other PRs / branches:

Initial Bamba support: Bamba architecture #10810
- Bamba is fully supported on this branch, so the old PR can be closed in favor of this PR
Refactored Bamba support: https://github.com/gabe-l-hart/llama.cpp/tree/BambaArchitectureRefactor
- I've used this branch as an A/B comparison along the way, but will abandon it now
Draft Granite 4.0 support: https://github.com/gabe-l-hart/llama.cpp/tree/GraniteFourDraft
- Build off of the previous Bamba work, this will also be abandoned in favor of this PR
Initial work on Jamba: llama : support Jamba hybrid Transformer-Mamba models #7531
- This work is quite out-of-date and would be a lot of work to overhaul to the refactors on master.
- I had planned to include Jamba support in this branch, but on further inspection, it looks like the Jamba architecture has some additional bells-and-whistles (eg sliding-window-attention) that would need further work, so my plan is to leave Jamba off for now and possibly tackle it later (hopefully it's much easier than the original branch!)

Outstanding Questions

Besides the upstream PRs, there are a few questions to answer before this PR is merge ready:

This PR contains several changes to llama-kv-cache beyond those in feat: Hybrid unified/recurrent cache #13276, but they depend on the addition of hparams.recurrent_layer_arr which is only populated correctly if there is a valid model architecture to check against. Should I move all of these changes to the hybrid cache PR or keep them here where the model architectures become real?
Is there a more efficient way to implement hparams.recurrent_layer_arr? Using a max-layer-size std::array doesn't feel quite right.
There are still some numerical differences between the attention outputs when running Bamba and granite-4.0-tiny-shared-preview on this branch vs the respective draft branches, so I need to determine if this is due to changes in the attention implementation (ie "working as expected") or a bug somewhere.
The use of dymamic_cast to get the right cache type could be expensive (though it's likely negligible relative to the tensor math). Should we do something more clever to handle different cache types in llama-graph?
The switch statement for determining the type of KV cache to allocate in llama-model.cpp seems redundant with llama_model_is_recurrent and llama_model_is_hybrid. Should we use those functions instead and eliminate the duplicate logic and additional place to tweak for new recurrent / hybrid models?

Testing

To test out this branch, I've been using the following models:

granite-4.0-tiny-preview: https://huggingface.co/ibm-granite/granite-4.0-tiny-preview
Bamba-9B-v1: https://huggingface.co/ibm-ai-platform/Bamba-9B-v1
- NOTE: v2 is out (here), but I already had v1 from previous branches and stuck with that for consistency
mamba2-370m-hf: https://huggingface.co/AntonV/mamba2-370m-hf

Details

This PR has a lot of changes in it, some of which are isolated in the prereq-PRs above. In addition to the general mamba2 and llama_kv_cache_hybrid changes, this PR does the following:

python side

Add conversion support for BambaForCausalLM and GraniteMoeHybridForCausalLM
- This includes one small tweak to gguf_writer.py that allows duplicate key/value pairs through add_key_value if (and only if) they match both value and type with the existing key. This is a convenience for hybrid models so that the converter doesn't need to rewrite the hparam conversion from multiple parents.
- This also adds the new HybridAttention section under Keys in constants.py to hold attention.layer_indices. OPEN QUESTION: Should this just go under Attention?

c++ side

Add a new public API function llama_model_is_hybrid akin to llama_model_is_recurrent
- I also split up both this function and llama_model_is_recurrent into llm_arch_is_* implemented in llama-arch.* and llama_model_is_* implemented in llama-model.*. This was done so that they could be used during model initialization before the model itself can be passed as the argument, specifically to determine how to populate hparams.recurrent_layer_arr (see below).
Add hparams.recurrent_layer_arr and support parsing it
- The current implementation pre-allocates it as a fixed-length array which doesn't feel quite right.
Add an optional layer id to hparams.n_embd_k_s / hparams.n_embd_v_s
- This is done because for hybrid models, the values may be different by layer.
- I plumbed through as many usages of these methods as I could find to properly pass the layer index, but there are some places where it's not available which default to layer 0. This should be fine since none of those places interact with the hybrid caching.
Add hparams.recurrent_layer(uint32_t) to check whether a given layer is recurrent
Model name/param/arch plumbing for bamba and granitemoeshared in llama-arch.* (the boring part!)
(possibly breaking) Add hparams as an additional argument to the llama_model.create_memory method
- This is done so the hparams can be given to the cache construction and used to determine which layers are recurrent for hybrid cache creation
In llama-graph, anywhere that a specific cache type needs to be fetched, it is grabbed using new methods get_recurrent_cache / get_unified_cache. These methods use dynamic_cast to handle both non-hybrid caches and hybrid caches.
Add support for instantiating the hybrid cache in llama-model.cpp
Add model support for bamba and granitemoehybrid in llama-model
- Most of this is "business as usual," but that breaks down when trying to avoid code duplication for the hybrid architecture
- To avoid code duplication, I hoisted build_mamba_layer / build_mamba2_layer from llm_build_mamba and build_attention_layer / build_layer_ffn from llm_build_granite into static methods on their respective classes. This makes for some gross function signatures where member data needs to be explicitly passed, but it allows the hybrid model architecture(s) to use these methods without complex inheritance.
- I tried an alternative route using diamond inheritance, but this would have required some kind of "don't actually initialize the graph" switch in the parent model builders' constructors to avoid trying to build the parent model graphs during initialization of the hybrid class.

This will be necessary to support Jamba (and other recurrent models mixed with Attention). Doesn't compile yet, and finding a slot isn't yet done correctly for recurrent states.

* llama : begin work on support for variable GQA This will also be useful for Jamba if we consider the Mamba layers to have 0 KV heads. * llama : gracefully fail when not finding hybrid slot

* ggml : simplify SSM-related operators * llama : make recurrent state slot allocation contiguous * llama : adapt internal uses of batches to llama_ubatch

This reduces overhead when running hellaswag on thousands of sequences with very small 100k params Mamba models.

This otherwise was a problem when running the HellaSwag benchmark with small batch sizes, making it crash.

This removes the need for ggml_ssm_conv!!! But performance seems slighly worse on my system, especially for prompt processing. Maybe ggml_mul_mat isn't optimized for small row sizes? More performance testing is necessary until GGML_OP_SSM_CONV is removed. * ggml : make ggml_ssm_scan not modify its source tensors * llama : fix shared recurrent tail cell count for small ubatch sizes Otherwise it was impossible to run the 'parallel' example with '-ub 1' with a Mamba or Jamba model.

* ggml : allow GGML_OP_CONCAT to work on non-contiguous tensors The implementation already supported it, and this makes Mamba's conv step slightly faster.

This can be changed back later if the name change is wrong. I was renaming the functions anyway to generalize kv-cache-related functions to hybrid and recurrent model architectures. I think llama_past is a better name than llama_cache for a combined kv cache and recurrent state cache, because the states it contains pretty much always come before the newly-added ones for any particular sequence. Also 'llama_past_clear' sounds more obvious in what it does than 'llama_kv_cache_clear'. The future is what the models generate. (For embeddings, the kv cache isn't really used anyway) Still, I'm open to better suggestions.

* origin/master: llama : remove llm_graph_input_one (ggml-org#14603) Signed-off-by: Gabe Goodhart <[email protected]>

gabe-l-hart · 2025-07-09T21:11:16Z

I've removed the virtual inheritance now and collapsed Bamba and GraniteMoeHybrid into simply GraniteHybrid which covers all permutations of hybrid architectures (w/ and w/out the granite multipliers on top of llama) and dense/MoE/MoE+shared.

src/llama-model.cpp

Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

This matches how recurrent vs attention heads are identified for Jamba Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

ggerganov

Can merge after @compilade approves.

src/llama-model-loader.cpp

CISC · 2025-07-10T08:30:59Z

src/llama-model.cpp

+                ml.get_key(LLM_KV_ROPE_SCALING_FINETUNED, rope_finetuned, false);
+                hparams.rope_finetuned = rope_finetuned;
+
+                // A layer is recurrent IFF the n_head_kv value is set to 0


Suggested change

// A layer is recurrent IFF the n_head_kv value is set to 0

// A layer is recurrent IF the n_head_kv value is set to 0

I actually meant IFF as in if and only if. Happy to change it if that's too obscure though

Heh, never heard of that abbreviation, one lives and learns... :)

Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

* origin/master: cmake : do not search for curl libraries by ourselves (ggml-org#14613) SYCL: Initial set_rows kernel implementation (ggml-org#14562) llama : minor coding style fix for smollm3 (ggml-org#14605) cmake : bump llguidance version to v1.0.1 (ggml-org#14609) cmake : llguidance build parser library only (ggml-org#14608) cuda : support Falcon-H1 state size for SSM_SCAN (ggml-org#14602) Signed-off-by: Gabe Goodhart <[email protected]>

* origin/master: Smoldocling support (ggml-org#14597) Docs: script to auto-generate ggml operations docs (ggml-org#14598)

CISC · 2025-07-10T20:09:57Z

@compilade You have the final say. :)

convert_hf_to_gguf.py

compilade · 2025-07-10T19:01:35Z

convert_hf_to_gguf.py

            ]

        return super().modify_tensors(data_torch, name, bid)


+@ModelBase.register("GraniteMoeHybridForCausalLM", "BambaForCausalLM")
+class GraniteHybridModel(Mamba2Model, GraniteMoeModel):


Multiple inheritance in Python works by using methods from the first class where it's present. (At least according to https://stackoverflow.com/questions/3277367/how-does-pythons-super-work-with-multiple-inheritance)

In this case, it means methods from Mamba2Model will be used for mostly everything, and GraniteMoeModel will be used for its prepare_tensors override (from LlamaModel somewhere in its parent hierarchy), unless I'm misunderstanding the order.

The resolution order seems to be

$ python3 >>> import convert_hf_to_gguf >>> convert_hf_to_gguf.GraniteHybridModel.__mro__ (<class 'convert_hf_to_gguf.GraniteHybridModel'>, <class 'convert_hf_to_gguf.Mamba2Model'>, <class 'convert_hf_to_gguf.GraniteMoeModel'>, <class 'convert_hf_to_gguf.GraniteModel'>, <class 'convert_hf_to_gguf.LlamaModel'>, <class 'convert_hf_to_gguf.TextModel'>, <class 'convert_hf_to_gguf.ModelBase'>, <class 'object'>)

(Noting this here, because I had to check how that works, not because there's a problem).

Yep, that's right. I like the suggestions below to be more explicit.

convert_hf_to_gguf.py

compilade · 2025-07-10T19:08:48Z

convert_hf_to_gguf.py

+        return [(self.map_tensor_name(name), data_torch)]
+
+    def set_gguf_parameters(self):
+        GraniteMoeModel.set_gguf_parameters(self)


If all the key-values are overwritten below (which might not be the case, I did not verify), then it could be simpler to not call the parent set_gguf_parameters.

There's at least one part of GraniteMoeModel that should be kept, so I'm inclined to keep this as is

convert_hf_to_gguf.py

The gist is to be explicit about which base class is being used with the multiple inheritance setup Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

gabe-l-hart · 2025-07-10T20:29:35Z

@compilade thanks for catching my slop! 🏓 back to you (fixed all except the question about set_gguf_parameters)

convert_hf_to_gguf.py

compilade · 2025-07-10T20:39:17Z

gguf-py/gguf/gguf_writer.py

-        if any(key in kv_data for kv_data in self.kv_data):
+        # Warn about duplicate keys if they differ by value or type
+        if any(
+            (
+                key in kv_data
+                and (kv_data[key].value != val or kv_data[key].type != vtype)
+            )
+            for kv_data in self.kv_data
+        ):
            logger.warning(f'Duplicated key name {key!r}, overwriting it with new value {val!r} of type {vtype.name}')


My concern with this is how this would hide redundant overrides (meaning they might not get noticed to be removed).

In a way, the warnings kind of encourage making set_gguf_parameters easier to follow (so that the overrides don't happen). I could be wrong, though.

For example, these seem to be the duplicate keys for which warnings are hidden by this section:

diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py index 2df43ba11..28188b43d 100755 --- a/convert_hf_to_gguf.py +++ b/convert_hf_to_gguf.py @@ -6550,13 +6550,6 @@ class GraniteHybridModel(Mamba2Model, GraniteMoeModel): def set_gguf_parameters(self): GraniteMoeModel.set_gguf_parameters(self) - ## General Params ## - self.gguf_writer.add_embedding_length(self.d_model) - self.gguf_writer.add_block_count(self.block_count) - self.gguf_writer.add_context_length(self.hparams.get("max_position_embeddings", 0)) - self.gguf_writer.add_vocab_size(self.hparams["vocab_size"]) - self.gguf_writer.add_feed_forward_length(self.hparams["intermediate_size"]) - ## Mamba mixer params ## self.gguf_writer.add_ssm_conv_kernel(self.find_hparam(["conv_kernel", "d_conv"])) self.gguf_writer.add_ssm_state_size(self.find_hparam(["state_size", "d_state"])) @@ -6573,14 +6566,8 @@ class GraniteHybridModel(Mamba2Model, GraniteMoeModel): ] if rope_dim := self.hparams.get("attn_rotary_emb"): self.gguf_writer.add_rope_dimension_count(rope_dim) - self.gguf_writer.add_head_count(self.hparams["num_attention_heads"]) self.gguf_writer.add_head_count_kv(head_count_kv_vec) - ## Feed Forward Params ## - self.gguf_writer.add_layer_norm_rms_eps( - self.find_hparam(["layer_norm_epsilon", "rms_norm_eps"], optional=True) or 1e-5 - ) - ## If Bamba, use rope, otherwise don't use_rope = "BambaForCausalLM" in self.hparams["architectures"] self.gguf_writer.add_rope_scaling_finetuned(use_rope)

Yeah, that's a good point. I think given that we've moved pretty significantly away from the complex inheritance on the c++ side, it ight make sense to see about doing something similar here since the multiple inheritance is definitely causing confusing code here as well. Let me take a look at how to simplify this.

This part isn't really made confusing by multiple inheritance, but by the long (but linear) family tree of GraniteMoeModel(GraniteModel(LlamaModel(TextModel))) for the inheritance of the set_gguf_parameters method.

It's a dual edged sword, since we are getting more layers of inheritance it's easy to add duplicates below without noticing, and then adding lots of noise above about something that isn't really an issue. Noise leads to complacency.

Yeah, agreed. I guess the real question is whether we want to support using parents directly for gguf params in a multiple-inheritance situation. If it's single inheritance, there should be no reason to overwrite what a parent did. It's possible that a child uses a different key for the same value as a parent, but that would cause the parent's lookup to not find the key and the child's lookup would have a different value (which I think is actually happening here).

Heh, very good point! Scope creep for sure

Ok removed. That leaves a few known override warnings:

WARNING:gguf.gguf_writer:Duplicated key name 'granitehybrid.embedding_length', overwriting it with new value 1536 of type UINT32 WARNING:gguf.gguf_writer:Duplicated key name 'granitehybrid.block_count', overwriting it with new value 40 of type UINT32 WARNING:gguf.gguf_writer:Duplicated key name 'granitehybrid.vocab_size', overwriting it with new value 49160 of type UINT32 WARNING:gguf.gguf_writer:Duplicated key name 'granitehybrid.feed_forward_length', overwriting it with new value 512 of type UINT32 WARNING:gguf.gguf_writer:Duplicated key name 'granitehybrid.attention.head_count', overwriting it with new value 12 of type UINT32 WARNING:gguf.gguf_writer:Duplicated key name 'granitehybrid.attention.head_count_kv', overwriting it with new value [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0] of type ARRAY WARNING:gguf.gguf_writer:Duplicated key name 'granitehybrid.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-05 of type FLOAT32 WARNING:gguf.gguf_writer:Duplicated key name 'granitehybrid.context_length', overwriting it with new value 1048576 of type UINT32

I think this can be resolved here with a comment in the converter and the warnings reinstanted?

Ok removed. That leaves a few known override warnings

All of these (except the head_count_kv one (and the context length, I forgot that)) can be avoided by applying the patch from #13550 (comment).

The source hparams fields seem to be mostly the same both times they are set (small difference with layer_norm_epsilon, but both times the actually-used field is rms_norm_eps (for granite-4.0-tiny-random at least)).

If you think it's clearer to keep it this way, this is fine with me too.

🤦 Nope, you're totally right. I'm concurrently trying to update my draft of bumping llama.cpp in ollama and multitasking poorly. I'll push with those removed.

…alue After further discussion, this encourages sloppy overwriting in the model converters Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

convert_hf_to_gguf.py

Co-authored-by: Francis Couture-Harpin <[email protected]> (thanks for the sharp eyes and patience!) Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

gabe-l-hart · 2025-07-10T22:09:16Z

Ok, thanks for the patience @compilade, I think it should be good-to-go once CI is green!

gabe-l-hart · 2025-07-11T02:43:51Z

@compilade @ggerganov @CISC Thank you so much for all the help and discussion in getting this merged! It's really great to have it officially in, and even more so for all the other great hybrid recurrent models that have come in during the process.

compilade added 30 commits April 3, 2024 20:47

wip: llama : separate recurrent states from the KV cache

271104c

This will be necessary to support Jamba (and other recurrent models mixed with Attention). Doesn't compile yet, and finding a slot isn't yet done correctly for recurrent states.

llama : use std::find for seq_nodes in llama_rs_cache

8db1e4d

llama : state checkpoints for recurrent models

0028010

llama : correctly handle more edge cases for the rs cache

0c8b3b2

Merge branch 'master' into compilade/refactor-kv-cache

d66849f

llama : rename many llama_kv_cache_* functions

a09db95

Merge branch 'master' into compilade/refactor-kv-cache

c460ff1

llama : remove useless return value for some llama_cache_* functions

b6fafd1

Merge branch 'master' into compilade/refactor-kv-cache

b7ec12e

Merge branch 'master' into compilade/refactor-kv-cache

3b57b55

llama : rethink recurrent state cell counts

7e13f19

* llama : begin work on support for variable GQA This will also be useful for Jamba if we consider the Mamba layers to have 0 KV heads. * llama : gracefully fail when not finding hybrid slot

llama : support Jamba

cbc743e

Merge branch 'master' into compilade/refactor-kv-cache

0fd13e9

llama : fix BERT inference without KV cache

61a88a1

convert-hf : check for unprocessed Jamba experts

ea2e63e

convert-hf : support Mini-Jamba conversion

fc59407

llama : fix Jamba quantization sanity checks

181dadf

llama : sequence-length-aware batch splitting

3a414b0

Merge branch 'master' into compilade/refactor-kv-cache

4e4c41e

llama : use equal-sequence-length sub-batches for recurrent models

3587a94

* ggml : simplify SSM-related operators * llama : make recurrent state slot allocation contiguous * llama : adapt internal uses of batches to llama_ubatch

Merge branch 'master' into compilade/refactor-kv-cache

5d3c7b9

llama : fix batch split output count for embeddings

72eea49

llama : minimize swaps when reordering logits

18d1c14

This reduces overhead when running hellaswag on thousands of sequences with very small 100k params Mamba models.

llama : fix edge case finding batch seq_id of split recurrent cell

61200ef

This otherwise was a problem when running the HellaSwag benchmark with small batch sizes, making it crash.

llama : avoid copies for simple batch splits

eb589d5

llama : fix .base() compilation error on Windows

17f6c1e

llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL

fee3c1d

* ggml : allow GGML_OP_CONCAT to work on non-contiguous tensors The implementation already supported it, and this makes Mamba's conv step slightly faster.

Merge branch 'master' into compilade/refactor-kv-cache

6840ac0

Merge remote-tracking branch 'origin/master' into GraniteFour

2b36420

* origin/master: llama : remove llm_graph_input_one (ggml-org#14603) Signed-off-by: Gabe Goodhart <[email protected]>

compilade reviewed Jul 9, 2025

View reviewed changes

src/llama-model.cpp Outdated Show resolved Hide resolved

src/llama-model.cpp Outdated Show resolved Hide resolved

src/llama-model.cpp Show resolved Hide resolved

gabe-l-hart added 3 commits July 9, 2025 21:06

feat: Log mamba params for Granite Hybrid

dcf51e0

Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

fix: Remove unused ssm_in_b

5b44f4e

Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

refactor: Remove ATTENTION_LAYER_INDICES hparam in favor of n_head_kv

4e9fef1

This matches how recurrent vs attention heads are identified for Jamba Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

ggerganov approved these changes Jul 10, 2025

View reviewed changes

ggerganov requested review from compilade and CISC July 10, 2025 06:14

CISC approved these changes Jul 10, 2025

View reviewed changes

gabe-l-hart added 3 commits July 10, 2025 07:26

fix: Remove unused template expansion for get_arr

d02d3dd

Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

Merge remote-tracking branch 'origin/master' into GraniteFour

d7d5b01

* origin/master: Smoldocling support (ggml-org#14597) Docs: script to auto-generate ggml operations docs (ggml-org#14598)

compilade reviewed Jul 10, 2025

View reviewed changes

fix: Review cleanup in convert_hf_to_gguf

f43a8dc

The gist is to be explicit about which base class is being used with the multiple inheritance setup Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

compilade reviewed Jul 10, 2025

View reviewed changes

gabe-l-hart added 3 commits July 10, 2025 15:01

fix: Undo hidden warnings about duplicate identical keys in add_key_v…

63f1ed8

…alue After further discussion, this encourages sloppy overwriting in the model converters Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

fix: If not using ROPE, context is "infinite"

f1485d2

Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

doc: Add a comment outlining expected duplicate key warnings

04883fc

Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

compilade approved these changes Jul 10, 2025

View reviewed changes

convert_hf_to_gguf.py Outdated Show resolved Hide resolved

fix: Remove unnecessary duplicate keys in converter

e53632b

Co-authored-by: Francis Couture-Harpin <[email protected]> (thanks for the sharp eyes and patience!) Branch: GraniteFour Signed-off-by: Gabe Goodhart <[email protected]>

CISC merged commit 0aedae0 into ggml-org:master Jul 11, 2025
51 checks passed

ggerganov added the hot Something that is hot label Jul 11, 2025

gabe-l-hart deleted the GraniteFour branch July 11, 2025 14:51

This was referenced Jul 11, 2025

Add support for IBM Granite-4.0-Tiny-Preview ollama/ollama#10557

Open

Granite four (llama.cpp bump 443e7e7+) ollama/ollama#11195

Open

	// A layer is recurrent IFF the n_head_kv value is set to 0
	// A layer is recurrent IF the n_head_kv value is set to 0

Granite Four #13550

Granite Four #13550

Conversation

gabe-l-hart commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Outstanding Questions

Testing

Details

python side

c++ side

Uh oh!

gabe-l-hart commented Jul 9, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ggerganov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

CISC commented Jul 10, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gabe-l-hart commented Jul 10, 2025

Uh oh!

Uh oh!

compilade Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

compilade Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

compilade Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gabe-l-hart commented Jul 10, 2025

Uh oh!

Uh oh!

gabe-l-hart commented Jul 11, 2025

Uh oh!

Uh oh!

gabe-l-hart commented May 14, 2025 •

edited

Loading

compilade Jul 10, 2025 •

edited

Loading

compilade Jul 10, 2025 •

edited

Loading

compilade Jul 10, 2025 •

edited

Loading