Skip to content

Conversation

@Phylliida
Copy link

This adds extra functions

ggml_conv_2d_circular
ggml_conv_2d_dw_circular
ggml_conv_2d_dw_direct_circular
ggml_conv_transpose_2d_p0_circular
ggml_conv_2d_direct_circular
ggml_pad_circular
ggml_pad_ext_circular

That have equivalent signatures to the non-circular versions (I considered modifying the existing ones, but didn't want to break existing code). Instead of padding with zeros, they act "on a torus" and loop x and y around.

I implemented this for CUDA, CPU, and Vulkan, as those are the primary backends people use in KoboldCpp/Stable Diffusion Cpp to generate images. For other backends, it'll fall back to non-circular.

This can be used to make seamless textures, see leejet/stable-diffusion.cpp#914 for an example and the changes needed on the image generation side. For some models (Stable Diffusion) simply calling the circular functions is sufficient, for other models (Qwen Image) you need to modify Rope embeddings slightly as well (so they cleanly loop).

I ran CI tests and added tests for these, but happy to answer any questions/modify things as needed.

int d1); // dilation dimension 1


GGML_API struct ggml_tensor * ggml_conv_2d_circular(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally prefer the wrapping to be an option on existing commands (either add an optional parameter to existing functions, or do something like ggml_mul_mat_set_prec to modify it after it's created. But the core maintainers should decide. I just don't want to end up with 2^N different convolution functions as these additional options keep getting added.

Copy link
Author

@Phylliida Phylliida Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think an optional parameter is an option (iiuc) since they are c api? (could hack with macros, but not ideal) But I'm open to some state modifying thing if that's what they want

uint32_t H_idx = OH_idx * p.s1 + KH_idx_b * p.d1 - p.p1;
uint32_t W_idx = OW_idx * p.s0 + KW_idx_b * p.d0 - p.p0;
#endif
H_pos = (p.circular != 0) ? wrap_coord(int(H_idx), p.H) : H_idx;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes a pretty significant slowdown in test-backend-ops -o CONV_2D perf for some cases. Once #16978 lands, circular should become a spec constant and this problem will go away.

@Acly
Copy link
Collaborator

Acly commented Nov 4, 2025

I am wondering, is it possible to add only a variant of ggml_pad with circular padding, use that as separate operation before the convolutions, then do the convolution without padding? How much slower is that?

Adding circular padding natively to all convolutions on all/most backends is a lot of investment. I'm not sure how common it is, so it would be interesting to know the trade-off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs testing Everything test related Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants