I need to generate a 1536x1536 image (or any other large hires image), but it doesn't work. I'm willing to do whatever it takes, even if a single generation takes an hour or more. Are there extensions that make this possible?
Please help me.
venv "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-1.10.1
Commit hash: 545cb6bf1187a11475ce2b28b3f7f99938cddf3d
Using ZLUDA in D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\.zluda
WARNING: no ROCm agent was found!
Launching Web UI with arguments: --use-zluda --no-half --upcast-sampling --precision full --theme dark --skip-version-check --always-batch-cond-uncond --opt-sub-quad-attention --disable-nan-check
You are using PyTorch below version 2.3. Some optimizations will be disabled.
Total VRAM 8192 MB, total RAM 16328 MB
pytorch version: 2.2.1+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 580 2048SP [ZLUDA] : native
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
Using pytorch cross attention
Using pytorch attention for VAE
ONNX: version=1.19.2 provider=CPUExecutionProvider, available=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
ControlNet preprocessor location: D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\models\ControlNetPreprocessor
*** Error loading script: pa.py
Traceback (most recent call last):
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\scripts.py", line 525, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\script_loading.py", line 13, in load_module
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\extensions\sd-webui-prevent-artifact\scripts\pa.py", line 55, in <module>
sd_hijack_clip.FrozenCLIPEmbedderWithCustomWordsBase.process_tokens = process_tokens
AttributeError: module 'modules.sd_hijack_clip' has no attribute 'FrozenCLIPEmbedderWithCustomWordsBase'
---
2024-10-12 23:07:54,293 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'D:\\Stable Diffusion\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\snowpony_v10.safetensors', 'hash': '7a851477'}, 'additional_modules': ['D:\\Stable Diffusion\\stable-diffusion-webui-amdgpu-forge\\models\\VAE\\sdxl.vae.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 31.6s (prepare environment: 1.9s, import torch: 19.7s, initialize shared: 2.2s, load scripts: 2.7s, initialize google blockly: 0.2s, create ui: 3.0s, gradio launch: 1.7s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7168.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7168.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'D:\\Stable Diffusion\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\snowpony_v10.safetensors', 'hash': '7a851477'}, 'additional_modules': ['D:\\Stable Diffusion\\stable-diffusion-webui-amdgpu-forge\\models\\VAE\\sdxl.vae.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'unet': 1680, 'vae': 250, 'text_encoder': 197, 'text_encoder_2': 518, 'ignore': 0}
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
IntegratedAutoencoderKL Unexpected: ['model_ema.decay', 'model_ema.num_updates']
K-Model Created: {'storage_dtype': torch.float16, 'computation_dtype': torch.float16}
Model loaded in 12.2s (unload existing model: 0.3s, forge model load: 11.9s).
activating extra network lora with arguments [<modules.extra_networks.ExtraNetworkParams object at 0x0000024052A59C30>, <modules.extra_networks.ExtraNetworkParams object at 0x0000024052A5AB00>, <modules.extra_networks.ExtraNetworkParams object at 0x0000024052A59F90>]: AttributeError
Traceback (most recent call last):
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\extensions-builtin\sd_forge_lora\networks.py", line 94, in load_networks
net = load_network(name, network_on_disk)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\extensions-builtin\sd_forge_lora\networks.py", line 63, in load_network
net.mtime = os.path.getmtime(network_on_disk.filename)
AttributeError: 'NoneType' object has no attribute 'filename'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\extra_networks.py", line 135, in activate
extra_network.activate(p, extra_network_args)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\extensions-builtin\sd_forge_lora\extra_networks_lora.py", line 45, in activate
networks.load_networks(names, te_multipliers, unet_multipliers, dyn_dims)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\extensions-builtin\sd_forge_lora\networks.py", line 96, in load_networks
errors.display(e, f"loading network {network_on_disk.filename}")
AttributeError: 'NoneType' object has no attribute 'filename'
[Unload] Trying to free 3051.58 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 7347.49 MB, Model Require: 1559.68 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 4763.81 MB, All loaded to GPU.
Moving model(s) has taken 5.07 seconds
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 5587.12 MB ... Done.
[Unload] Trying to free 11609.04 MB for cuda:0 with 0 models keep loaded ... Current free memory is 5586.86 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 7333.84 MB, Model Require: 4897.05 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 1412.79 MB, All loaded to GPU.
Moving model(s) has taken 17.84 seconds
0%| | 0/25 [00:14<?, ?it/s]
Traceback (most recent call last):
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules_forge\main_thread.py", line 30, in work
self.result = self.func(*self.args, **self.kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\txt2img.py", line 123, in txt2img_function
processed = processing.process_images(p)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\processing.py", line 818, in process_images
res = process_images_inner(p)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\processing.py", line 1053, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\processing.py", line 1430, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\sd_samplers_kdiffusion.py", line 240, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\sd_samplers_common.py", line 272, in launch_sampling
return func()
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\sd_samplers_kdiffusion.py", line 240, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\k_diffusion\sampling.py", line 595, in sample_dpmpp_2m
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\modules\sd_samplers_cfg_denoiser.py", line 199, in forward
denoised, cond_pred, uncond_pred = sampling_function(self, denoiser_params=denoiser_params, cond_scale=cond_scale, cond_composition=cond_composition)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\sampling\sampling_function.py", line 362, in sampling_function
denoised, cond_pred, uncond_pred = sampling_function_inner(model, x, timestep, uncond, cond, cond_scale, model_options, seed, return_full=True)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\sampling\sampling_function.py", line 303, in sampling_function_inner
cond_pred, uncond_pred = calc_cond_uncond_batch(model, cond, uncond_, x, timestep, model_options)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\sampling\sampling_function.py", line 273, in calc_cond_uncond_batch
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\modules\k_model.py", line 45, in apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\nn\unet.py", line 713, in forward
h = module(h, emb, context, transformer_options)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\nn\unet.py", line 83, in forward
x = layer(x, context, transformer_options)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\nn\unet.py", line 321, in forward
x = block(x, context=context[i], transformer_options=transformer_options)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\nn\unet.py", line 181, in forward
return checkpoint(self._forward, (x, context, transformer_options), None, self.checkpoint)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\nn\unet.py", line 12, in checkpoint
return f(*args)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\nn\unet.py", line 235, in _forward
n = self.attn1(n, context=context_attn1, value=value_attn1, transformer_options=extra_options)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\nn\unet.py", line 154, in forward
out = attention_function(q, k, v, self.heads, mask)
File "D:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\backend\attention.py", line 335, in attention_pytorch
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.00 GiB. GPU 0 has a total capacity of 8.00 GiB of which 6.57 GiB is free. Of the allocated memory 10.17 GiB is allocated by PyTorch, and 395.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
CUDA out of memory. Tried to allocate 5.00 GiB. GPU 0 has a total capacity of 8.00 GiB of which 6.57 GiB is free. Of the allocated memory 10.17 GiB is allocated by PyTorch, and 395.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
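As a rough sanity check on the size of the allocation in the traceback above, here is a minimal sketch of how large a fully materialised self-attention score tensor gets as the image grows. Everything in it is an assumption rather than something read from the log: an 8x VAE downscale, an SDXL attention level at half the latent resolution with 10 heads, a batch of 2 (cond + uncond), and fp16 scores. The function name `attention_scores_gib` is made up for illustration. It is not meant to reproduce the exact 5.00 GiB figure, since the log does not show the resolution of the failing run.

```python
def attention_scores_gib(width, height, downscale, heads, batch=2, bytes_per_elem=2):
    """Size in GiB of one fully materialised self-attention score tensor.

    Assumptions (not taken from the log): the VAE maps 8x8 pixels to one
    latent cell, and the UNet level shrinks the latent by `downscale`.
    """
    tokens = (width // (8 * downscale)) * (height // (8 * downscale))
    # One score per (query token, key token) pair, per head, per batch item.
    elements = batch * heads * tokens * tokens
    return elements * bytes_per_elem / 2**30


for side in (1024, 1536):
    gib = attention_scores_gib(side, side, downscale=2, heads=10)
    print(f"{side}x{side}: {gib:.2f} GiB")
```

Under these assumptions the score tensor grows with the square of the token count, so going from 1024x1024 to 1536x1536 (1.5x per side) multiplies it by about 5, which is why a card that handles the smaller size can fail on the larger one.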
GPU: RX 580 8 GB
CPU: Intel Xeon E3-1270 v3
RAM: 16 GB
One more problem:
I have to restart SD after the first generation every time, because on the second generation the video card does not free its memory. The console then prints "Low GPU vram warning", the memory fills up even more, and generation gets slower.
Why does this happen, and can it be fixed? To say it up front: clearing %temp% does not help. (The first generation is at the bottom; there are no warnings there.)