First impressions info dump #922
Replies: 21 comments
-
Thanks for the feedback.
-
very nice
i see
oh, i overlooked that one. Running with verbose output, the tokenizer really looks like it needs some work; really surprised the image came out that good.
good to hear.
can't wait 😄
-
I am using
-
I'm using
-
ah yes, a fellow Ubuntu 20.04 user stuck on LTS 🤣
-
Cool stuff! Here is a sample run on M2 Ultra:

```
$ ./sd -m ../models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat" -t 12
[INFO] stable-diffusion.cpp:2191 - loading model from '../models/sd-v1-4-ggml-model-f16.bin'
[INFO] stable-diffusion.cpp:2216 - ftype: f16
[INFO] stable-diffusion.cpp:2261 - params ctx size = 1970.08 MB
[INFO] stable-diffusion.cpp:2401 - loading model from '../models/sd-v1-4-ggml-model-f16.bin' completed, taking 0.72s
[INFO] stable-diffusion.cpp:2482 - condition graph use 13.11MB of memory: static 10.17MB, dynamic = 2.93MB
[INFO] stable-diffusion.cpp:2482 - condition graph use 13.11MB of memory: static 10.17MB, dynamic = 2.93MB
[INFO] stable-diffusion.cpp:2824 - get_learned_condition completed, taking 0.12s
[INFO] stable-diffusion.cpp:2832 - start sampling
[INFO] stable-diffusion.cpp:2676 - step 1 sampling completed, taking 5.42s
[INFO] stable-diffusion.cpp:2676 - step 2 sampling completed, taking 5.35s
[INFO] stable-diffusion.cpp:2676 - step 3 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 4 sampling completed, taking 5.35s
[INFO] stable-diffusion.cpp:2676 - step 5 sampling completed, taking 5.30s
[INFO] stable-diffusion.cpp:2676 - step 6 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 7 sampling completed, taking 5.36s
[INFO] stable-diffusion.cpp:2676 - step 8 sampling completed, taking 5.47s
[INFO] stable-diffusion.cpp:2676 - step 9 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 10 sampling completed, taking 5.37s
[INFO] stable-diffusion.cpp:2676 - step 11 sampling completed, taking 5.33s
[INFO] stable-diffusion.cpp:2676 - step 12 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 13 sampling completed, taking 5.33s
[INFO] stable-diffusion.cpp:2676 - step 14 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 15 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 16 sampling completed, taking 5.33s
[INFO] stable-diffusion.cpp:2676 - step 17 sampling completed, taking 5.39s
[INFO] stable-diffusion.cpp:2676 - step 18 sampling completed, taking 5.36s
[INFO] stable-diffusion.cpp:2676 - step 19 sampling completed, taking 5.34s
[INFO] stable-diffusion.cpp:2676 - step 20 sampling completed, taking 5.38s
[INFO] stable-diffusion.cpp:2691 - diffusion graph use 623.74MB of memory: static 69.53MB, dynamic = 554.21MB
[INFO] stable-diffusion.cpp:2837 - sampling completed, taking 107.12s
[INFO] stable-diffusion.cpp:2771 - vae graph use 2177.12MB of memory: static 1153.12MB, dynamic = 1024.00MB
[INFO] stable-diffusion.cpp:2844 - decode_first_stage completed, taking 17.86s
[INFO] stable-diffusion.cpp:2850 - txt2img completed in 125.10s, with a runtime memory usage of 2177.12MB and parameter memory usage of 1969.94MB
save result image to 'output.png'
```
Looks like
-
Thank you for the feedback, and thank you for creating such an amazing library in ggml.
OK, I will sort out the code of the new operators and upstream it later. I'm also considering whether to upstream the "dynamic mode".
I've tried it before, but it seems that combining
-
Any plans for SDXL?
-
I'm willing to implement SDXL once I've improved the support for SD 1.x and added support for SD 2.x.
-
Took a stab at a larger resolution, 768x768. Unsurprisingly it takes way (way) longer:
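A rough back-of-envelope on why (my own numbers, assuming the standard SD 1.x latent downscale factor of 8; not measured from this run):

```python
# Why 768x768 costs much more than 512x512: the UNet works on a latent that is
# 1/8 of the image resolution, and its self-attention layers scale roughly
# quadratically with the number of latent tokens. Rough estimate, not a measurement.
def latent_tokens(width, height, factor=8):
    return (width // factor) * (height // factor)

t512 = latent_tokens(512, 512)   # 64 * 64 = 4096 tokens
t768 = latent_tokens(768, 768)   # 96 * 96 = 9216 tokens

print(f"tokens: {t512} -> {t768} ({t768 / t512:.2f}x more)")      # 2.25x
print(f"self-attention cost ratio: ~{(t768 / t512) ** 2:.1f}x")   # ~5.1x
```

Convolutions scale roughly with the 2.25x token count, but the attention layers sit closer to the ~5x end, so the overall slowdown lands well above 2.25x.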
-
Wow, this is so cool. Easy to convert existing models, quantization... very nice. https://github.com/bes-dev/stable_diffusion.openvino <- this is way faster though, probably due to it using OpenVINO.
-
I've implemented a memory optimization, and now when using txt2img with fp16 precision to generate a 512x512 image, it only requires 2.3GB.
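That figure is consistent with a rough split between weights and runtime buffers (my own arithmetic, reusing the f16 parameter size from the log earlier in the thread):

```python
# Rough sanity check of the ~2.3GB txt2img @ 512x512 fp16 figure.
# params_mb comes from the run log earlier in the thread; the split into
# weights vs. runtime buffers is my own estimate, not taken from the code.
params_mb = 1969.94            # f16 weights (text encoder + UNet + VAE)
total_mb = 2.3 * 1024          # the ~2.3GB figure quoted above
print(f"implied runtime buffers: ~{total_mb - params_mb:.0f} MB")   # ~385 MB
```

In other words, most of the 2.3GB is the f16 weights themselves, with only a few hundred MB of working buffers on top.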
-
Oh, yeah. Now I'm working hard to make it run faster.
-
is this already on master? because i reran my diffusion above and got similar timings and memory usage (though the memory reporting seems to have changed)
-
Since you are generating 768x768 images, the runtime memory will grow. There is still room for optimization.
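A rough sketch of where that growth comes from (illustrative channel count and f32 activations assumed; the actual buffer layout in stable-diffusion.cpp may differ):

```python
# How per-layer activation memory grows with resolution: every latent feature
# map is (H/8) x (W/8) x channels, so its size scales with the image area.
# The 320-channel figure is illustrative; real sizes depend on the graph layout.
def feature_map_mb(width, height, channels, factor=8):
    h, w = height // factor, width // factor
    return h * w * channels * 4 / 2**20   # one f32 feature map, in MB

for res in (512, 768):
    print(f"{res}x{res}: a 320-channel feature map is "
          f"~{feature_map_mb(res, res, 320):.1f} MB")
# 512x512 -> ~5 MB, 768x768 -> ~11 MB per map (2.25x), summed over every
# layer whose output the dynamic buffer has to hold.
```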
-
@leejet i don't think that is how that label is supposed to be used 😄
-
You're right, I made a mistake. I accidentally clicked on it while browsing; it wasn't my intention.
-
Any chance we could get OpenVINO support? Would help a lot!
-
Hey, finally stable diffusion for ggml 😄
Did a test run.
Pain point: the extra Python libs for conversion. Got a pip install error because I have an incompatible version of something installed already; convert.py worked anyway though. :)
Timings: I used the q8_0 quantization and ran with different thread counts. I have a 12-core (24-thread) CPU and took the timing of a sampling step.
Additional question: what about prompt weighting syntax like ((cinematic:1.3))?
edit: added f16 timings
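For anyone who wants to repeat that kind of sweep, something along these lines works; the model path is a placeholder and the flags are the ones visible in the run log above, so double-check ./sd --help for your build:

```python
# Hypothetical driver for timing ./sd at different thread counts.
# The binary/model paths are placeholders; -m/-p/-t match the log above,
# but verify the flags against ./sd --help for your build.
import subprocess
import time

MODEL = "../models/sd-v1-4-ggml-model-q8_0.bin"   # placeholder path
PROMPT = "a lovely cat"

for threads in (4, 6, 8, 12, 16, 24):
    start = time.time()
    subprocess.run(
        ["./sd", "-m", MODEL, "-p", PROMPT, "-t", str(threads)],
        check=True,
        capture_output=True,   # keep the per-step log quiet; parse it if needed
    )
    print(f"{threads:>2} threads: {time.time() - start:.1f}s total")
```

Timing whole runs includes model load and VAE decode, so for per-step numbers it is better to parse the "step N sampling completed" lines from the captured output.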