
SDXL crash with embeddings and clip on CPU #656


Open
wbruna opened this issue Apr 15, 2025 · 2 comments · May be fixed by #657


wbruna commented Apr 15, 2025

On master-10c6501, SDXL embeddings crash with an assertion failure, either on a CPU build or when passing --clip-on-cpu. The Vulkan backend works fine.

The following test is with CyberRealisticPony_v7 and its positive embedding CyberRealisticPony_POSV1, but every model+embedding combination I tried seemed to crash in the same way, on ggml-cpu.c:

./sd --model ./cyberrealisticPony_v7.safetensors --embd-dir . -p CyberRealisticPony_POSV1 --steps 1 --cfg-scale 1 --clip-on-cpu
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Vega 11 Graphics (RADV RAVEN) (radv) | uma: 1 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
[INFO ] stable-diffusion.cpp:197  - loading model from './cyberrealisticPony_v7.safetensors'
[INFO ] model.cpp:908  - load ./cyberrealisticPony_v7.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:244  - Version: SDXL 
[INFO ] stable-diffusion.cpp:277  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f32
[WARN ] stable-diffusion.cpp:287  - !!!It looks like you are using SDXL model. If you find that the generated images are completely black, try specifying SDXL VAE FP16 Fix with the --vae parameter. You can find it here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors
[INFO ] stable-diffusion.cpp:324  - CLIP: Using CPU backend
  |==================================================| 2641/2641 - 500.00it/s
[INFO ] stable-diffusion.cpp:503  - total params memory size = 6751.89MB (VRAM 4994.54MB, RAM 1757.36MB): clip 1757.36MB(RAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:522  - loading model from './cyberrealisticPony_v7.safetensors' completed, taking 6.45s
[INFO ] stable-diffusion.cpp:556  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:690  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1246 - apply_loras completed, taking 0.00s
[INFO ] model.cpp:908  - load ./CyberRealisticPony_POSV1.safetensors using safetensors format
  |==================================================| 2/2 - 0.00it/s
/tmp/sdcpp/ggml/src/ggml-cpu/ggml-cpu.c:9684: /tmp/sdcpp/ggml/src/ggml-cpu/ggml-cpu.c:9684: /tmp/sdcpp/ggml/src/ggml-cpu/ggml-cpu.c:9684: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed/tmp/sdcpp/ggml/src/ggml-cpu/ggml-cpu.c:9684: GGML_ASSERT(i01 >= 0 && i01 < ne01) failedGGML_ASSERT(i01 >= 0 && i01 < ne01) failed

GGML_ASSERT(i01 >= 0 && i01 < ne01) failed

[New LWP 82838]
[New LWP 82845]
[New LWP 82846]
[New LWP 82847]
warning: process 82837 is already traced by process 82848
ptrace: Operation not permitted.
No stack.
The program is not being run.
warning: process 82837 is already traced by process 82848
ptrace: Operation not permitted.
No stack.
The program is not being run.
warning: process 82837 is already traced by process 82848
ptrace: Operation not permitted.
No stack.
The program is not being run.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f85d30f2c17 in __GI___wait4 (pid=82848, stat_loc=0x7ffed6595704, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x00007f85d30f2c17 in __GI___wait4 (pid=82848, stat_loc=0x7ffed6595704, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30      in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000562553811811 in ggml_abort ()
#2  0x000056255376434c in ggml_compute_forward_get_rows ()
#3  0x000056255378588d in ggml_graph_compute_thread.isra ()
#4  0x00007f85d35b60b6 in GOMP_parallel () from /lib/x86_64-linux-gnu/libgomp.so.1
#5  0x00005625537883af in ggml_graph_compute ()
#6  0x00005625537887e2 in ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) ()
#7  0x0000562553827edc in ggml_backend_graph_compute ()
#8  0x0000562553693469 in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) ()
#9  0x0000562553697bab in FrozenCLIPEmbedderWithCustomWords::get_learned_condition_common(ggml_context*, int, std::vector<int, std::allocator<int> >&, std::vector<float, std::allocator<float> >&, int, int, int, int, bool) ()
#10 0x00005625537132dc in FrozenCLIPEmbedderWithCustomWords::get_learned_condition(ggml_context*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, int, int, bool) ()
#11 0x0000562553686818 in generate_image(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, float, float, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t const*, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*) ()
#12 0x0000562553689481 in txt2img ()
#13 0x00005625535f5173 in main ()
[Inferior 1 (process 82837) detached]
Aborted (core dumped)

Disabling the assertion avoids the crash, so I'm guessing the CPU backend is catching a non-critical error ignored by the Vulkan backend.
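
For context, the backtrace places the failing check in `ggml_compute_forward_get_rows`, which looks up rows of a table by index. The snippet below is only a rough sketch of that kind of bounds-checked row gather, not ggml's actual code; `gather_rows` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a row-gather op: copies the rows of `table`
// (n_rows x row_len, row-major) selected by `ids` into the result.
std::vector<float> gather_rows(const std::vector<float>& table,
                               int64_t n_rows, int64_t row_len,
                               const std::vector<int32_t>& ids) {
    std::vector<float> out;
    out.reserve(ids.size() * static_cast<size_t>(row_len));
    for (int32_t id : ids) {
        // Same shape of check as the failing assertion:
        // i01 >= 0 && i01 < ne01
        if (id < 0 || id >= n_rows) {
            throw std::out_of_range("row index outside the table");
        }
        out.insert(out.end(),
                   table.begin() + id * row_len,
                   table.begin() + (id + 1) * row_len);
    }
    return out;
}
```

If a check like this is removed, an out-of-range index reads whatever memory lies past the table instead of aborting, so a run that no longer crashes may still be reading unintended data.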

@stduhpf , this seems related to your SDXL embeddings fix?

Contributor

stduhpf commented Apr 15, 2025

Okay, I'm not sure what exactly is causing the crash, but this made me realize my "fix" for SDXL embeddings was pure placebo. The embeddings are loaded but never used... Fixing that other problem seems to make the crash with --clip-on-cpu go away, so it's probably related.

Edit: OK, not pure placebo. Embeddings were only applied to clip-l and not clip-g, so the effect was greatly diminished.
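
One way a clip-l-only application could lead to an out-of-range row index is sketched below. This is purely a hypothesis for illustration: it assumes custom embedding tokens are assigned ids past the end of the existing table, which the thread does not confirm, and `EmbeddingTable`, `append_custom`, and `id_in_range` are hypothetical names, not stable-diffusion.cpp's API.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical token-embedding table: rows of length `dim`, row-major.
struct EmbeddingTable {
    int64_t dim;
    std::vector<float> data;
    int64_t n_rows() const { return static_cast<int64_t>(data.size()) / dim; }
};

// Appends a custom embedding vector and returns the token id assigned to it.
// Assumption: a custom token takes the next id after the existing rows.
int32_t append_custom(EmbeddingTable& t, const std::vector<float>& vec) {
    if (static_cast<int64_t>(vec.size()) != t.dim) {
        throw std::invalid_argument("embedding length does not match table dim");
    }
    int32_t id = static_cast<int32_t>(t.n_rows());
    t.data.insert(t.data.end(), vec.begin(), vec.end());
    return id;
}

// True when `id` would pass a check of the form i01 >= 0 && i01 < ne01.
bool id_in_range(const EmbeddingTable& t, int32_t id) {
    return id >= 0 && id < t.n_rows();
}
```

Under these assumptions, extending only the clip-l table yields a token id that is valid for clip-l but one past the last row of an unextended clip-g table, which is the kind of index a bounds check would reject.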

stduhpf linked a pull request Apr 15, 2025 that will close this issue
Author

wbruna commented Apr 15, 2025

Just tested your fix with a few embeddings and models; it seems to be working now. Thanks!
