|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Stable Diffusion Turbo - Text-to-Image Generation\n", |
| 8 | + "\n", |
| 9 | + "\n", |
| 10 | + "\n", |
| 11 | + "This example shows how you can use the power of a GPU to quickly generate AI images from text prompts in Saturn Cloud. This code runs on a single GPU of a Jupyter server resource.\n", |
| 12 | + "\n", |
| 13 | + "This is an example of a text-to-image generation model using Stable Diffusion Turbo, which can create photorealistic or artistic images from natural language descriptions. The model uses a diffusion process that starts with random noise and gradually refines it into a coherent image based on your text prompt. The \"Turbo\" version is optimized for speed, generating high-quality images in just 1-4 steps instead of the typical 50+ steps.\n", |
| 14 | + "\n", |
| 15 | + "This notebook creates an interactive Gradio interface where users can type any prompt and instantly see generated images. The interface can be viewed in the notebook or deployed to Saturn Cloud for continuous hosting." |
| 16 | + ] |
| 17 | + }, |
| 18 | + { |
| 19 | + "cell_type": "markdown", |
| 20 | + "metadata": {}, |
| 21 | + "source": [ |
| 22 | + "First, we ensure all required packages are installed. This will only install packages that are missing from your environment. We use specific versions of `diffusers`, `transformers`, and `accelerate` to maintain compatibility with the Stable Diffusion Turbo model." |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "code", |
| 27 | + "execution_count": null, |
| 28 | + "metadata": {}, |
| 29 | + "outputs": [], |
| 30 | + "source": [ |
| 31 | +    "!pip install -q \"diffusers[torch]==0.35.1\" transformers==4.56.2 accelerate==1.10.1 \"gradio>=4.0.0\" safetensors \"pillow>=9.5.0\"" |
| 32 | + ] |
| 33 | + }, |
| 34 | + { |
| 35 | + "cell_type": "markdown", |
| 36 | + "metadata": {}, |
| 37 | + "source": [ |
| 38 | + "This code mainly relies on the Diffusers library for loading the Stable Diffusion model, Gradio for creating the interactive interface, and PyTorch for GPU acceleration. The architecture and components are automatically handled." |
| 39 | + ] |
| 40 | + }, |
| 41 | + { |
| 42 | + "cell_type": "code", |
| 43 | + "execution_count": null, |
| 44 | + "metadata": {}, |
| 45 | + "outputs": [], |
| 46 | + "source": [ |
| 47 | + "import torch\n", |
| 48 | + "from diffusers import AutoPipelineForText2Image\n", |
| 49 | + "import gradio as gr\n", |
| 50 | + "from PIL import Image\n", |
| 51 | + "import numpy as np\n", |
| 52 | + "import matplotlib.pyplot as plt" |
| 53 | + ] |
| 54 | + }, |
| 55 | + { |
| 56 | + "cell_type": "markdown", |
| 57 | + "metadata": {}, |
| 58 | + "source": [ |
| 59 | + "We load the SDXL-Turbo model from Hugging Face Hub, which is optimized for fast single-step image generation. The model is approximately 7GB and will be downloaded on first run." |
| 60 | + ] |
| 61 | + }, |
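| 62 | +  { |
| 63 | +   "cell_type": "markdown", |
| 64 | +   "metadata": {}, |
| 65 | +   "source": [ |
| 66 | +    "This quick check is not part of the original pipeline; it is an optional sketch that confirms PyTorch can actually see a CUDA GPU before the large download starts, since `pipe.to(\"cuda\")` below requires one." |
| 67 | +   ] |
| 68 | +  }, |
| 69 | +  { |
| 70 | +   "cell_type": "code", |
| 71 | +   "execution_count": null, |
| 72 | +   "metadata": {}, |
| 73 | +   "outputs": [], |
| 74 | +   "source": [ |
| 75 | +    "import torch\n", |
| 76 | +    "\n", |
| 77 | +    "# Optional sanity check: the pipeline below requires a CUDA device\n", |
| 78 | +    "print(\"CUDA available:\", torch.cuda.is_available())\n", |
| 79 | +    "if torch.cuda.is_available():\n", |
| 80 | +    "    print(\"GPU:\", torch.cuda.get_device_name(0))" |
| 81 | +   ] |
| 82 | +  }, |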
| 62 | + { |
| 63 | + "cell_type": "code", |
| 64 | + "execution_count": null, |
| 65 | + "metadata": {}, |
| 66 | + "outputs": [], |
| 67 | + "source": [ |
| 68 | + "print(\"Loading Stable Diffusion Turbo model...\")\n", |
| 69 | + "print(\"(This may take a few minutes on first run as the model downloads)\\n\")\n", |
| 70 | + "\n", |
| 71 | + "# Load the model with optimizations for GPU\n", |
| 72 | + "pipe = AutoPipelineForText2Image.from_pretrained(\n", |
| 73 | + " \"stabilityai/sdxl-turbo\",\n", |
| 74 | + " torch_dtype=torch.float16,\n", |
| 75 | + " variant=\"fp16\"\n", |
| 76 | + ")\n", |
| 77 | + "\n", |
| 78 | + "# Move model to GPU\n", |
| 79 | + "pipe = pipe.to(\"cuda\")\n", |
| 80 | + "\n", |
| 81 | + "pipe.enable_attention_slicing()\n", |
| 82 | + "\n", |
| 83 | + "print(\"✓ Model loaded successfully!\")" |
| 84 | + ] |
| 85 | + }, |
| 86 | + { |
| 87 | + "cell_type": "markdown", |
| 88 | + "metadata": {}, |
| 89 | + "source": [ |
| 90 | +    "Here we define the core function that interacts with the loaded pipeline. The `generate_image` function takes a text prompt and uses the diffusion model to create an image. SDXL-Turbo is trained to work with a single denoising step and without classifier-free guidance, so we default to `num_inference_steps=1` and `guidance_scale=0.0`, its intended configuration. The function also accepts an optional seed for reproducible results." |
| 91 | + ] |
| 92 | + }, |
| 93 | + { |
| 94 | + "cell_type": "code", |
| 95 | + "execution_count": null, |
| 96 | + "metadata": {}, |
| 97 | + "outputs": [], |
| 98 | + "source": [ |
| 99 | + "def generate_image(prompt, num_inference_steps=1, guidance_scale=0.0, seed=None):\n", |
| 100 | + " \"\"\"\n", |
| 101 | + " Generate an image from a text prompt.\n", |
| 102 | + "\n", |
| 103 | + " Args:\n", |
| 104 | + " prompt: Text description of the desired image\n", |
| 105 | + " num_inference_steps: Number of denoising steps (1 for Turbo)\n", |
| 106 | + " guidance_scale: How strictly to follow the prompt (0.0 for Turbo)\n", |
| 107 | + " seed: Random seed for reproducibility (None for random)\n", |
| 108 | + "\n", |
| 109 | + " Returns:\n", |
| 110 | + " PIL Image\n", |
| 111 | + " \"\"\"\n", |
| 112 | + " # Set seed for reproducibility if provided\n", |
| 113 | + " generator = None\n", |
| 114 | +    "    device = pipe.device  # match the pipeline's device (CUDA here)\n", |
| 115 | + " if seed is not None:\n", |
| 116 | + " generator = torch.Generator(device=device).manual_seed(seed)\n", |
| 117 | + "\n", |
| 118 | + " # Generate image\n", |
| 119 | + " image = pipe(\n", |
| 120 | + " prompt=prompt,\n", |
| 121 | + " num_inference_steps=num_inference_steps,\n", |
| 122 | + " guidance_scale=guidance_scale,\n", |
| 123 | + " generator=generator\n", |
| 124 | + " ).images[0]\n", |
| 125 | + "\n", |
| 126 | + " return image" |
| 127 | + ] |
| 128 | + }, |
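| 129 | +  { |
| 130 | +   "cell_type": "markdown", |
| 131 | +   "metadata": {}, |
| 132 | +   "source": [ |
| 133 | +    "Seeding works because two `torch.Generator`s created with the same seed emit identical noise, so the diffusion process starts from the same latent. A CPU-only illustration of this (not part of the pipeline):" |
| 134 | +   ] |
| 135 | +  }, |
| 136 | +  { |
| 137 | +   "cell_type": "code", |
| 138 | +   "execution_count": null, |
| 139 | +   "metadata": {}, |
| 140 | +   "outputs": [], |
| 141 | +   "source": [ |
| 142 | +    "import torch\n", |
| 143 | +    "\n", |
| 144 | +    "# Same seed, same noise: this is what makes seeded generations reproducible\n", |
| 145 | +    "g1 = torch.Generator(device=\"cpu\").manual_seed(42)\n", |
| 146 | +    "g2 = torch.Generator(device=\"cpu\").manual_seed(42)\n", |
| 147 | +    "print(torch.equal(torch.randn(4, generator=g1), torch.randn(4, generator=g2)))  # True" |
| 148 | +   ] |
| 149 | +  }, |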
| 129 | + { |
| 130 | + "cell_type": "markdown", |
| 131 | + "metadata": {}, |
| 132 | + "source": [ |
| 133 | + "To showcase the model's ability to create variations, we define a function to generate a grid of images from the same prompt using different random seeds. This is useful for exploring different interpretations of a single concept. The function creates a matplotlib figure displaying all generated images in a grid layout with their corresponding seeds." |
| 134 | + ] |
| 135 | + }, |
| 136 | + { |
| 137 | + "cell_type": "code", |
| 138 | + "execution_count": null, |
| 139 | + "metadata": {}, |
| 140 | + "outputs": [], |
| 141 | + "source": [ |
| 142 | + "# Gradio Blocks Interface for Image Grid Generation\n", |
| 143 | + "def generate_image_grid_gradio(prompt, num_images=4, seed_start=42):\n", |
| 144 | + " \"\"\"\n", |
| 145 | + " Generate a grid of images for Gradio interface.\n", |
| 146 | + " \n", |
| 147 | + " Args:\n", |
| 148 | + " prompt: Text description\n", |
| 149 | + " num_images: Number of variations to generate\n", |
| 150 | + " seed_start: Starting seed\n", |
| 151 | + " \n", |
| 152 | + " Returns:\n", |
| 153 | + " matplotlib figure\n", |
| 154 | + " \"\"\"\n", |
| 155 | + " images = []\n", |
| 156 | + " seeds = []\n", |
| 157 | + " \n", |
| 158 | + " for i in range(num_images):\n", |
| 159 | + " seed = seed_start + i\n", |
| 160 | + " image = generate_image(prompt, seed=seed)\n", |
| 161 | + " images.append(image)\n", |
| 162 | + " seeds.append(seed)\n", |
| 163 | + " \n", |
| 164 | + " # Create grid layout\n", |
| 165 | + " grid_size = int(np.ceil(np.sqrt(num_images)))\n", |
| 166 | +    "    # squeeze=False keeps axes a 2-D array so axes.flat works for any grid size\n", |
| 167 | +    "    fig, axes = plt.subplots(grid_size, grid_size, figsize=(10, 10), squeeze=False)\n", |
| 172 | + " \n", |
| 173 | + " fig.suptitle(f'Prompt: \"{prompt}\"', fontsize=14, fontweight='bold')\n", |
| 174 | + " \n", |
| 175 | + " for idx, ax in enumerate(axes.flat):\n", |
| 176 | + " if idx < len(images):\n", |
| 177 | + " ax.imshow(images[idx])\n", |
| 178 | + " ax.set_title(f'Seed: {seeds[idx]}', fontsize=8)\n", |
| 179 | + " ax.axis('off')\n", |
| 180 | + " \n", |
| 181 | + " plt.tight_layout()\n", |
| 182 | + " return fig" |
| 183 | + ] |
| 184 | + }, |
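| 185 | +  { |
| 186 | +   "cell_type": "markdown", |
| 187 | +   "metadata": {}, |
| 188 | +   "source": [ |
| 189 | +    "As a side note, the grid above is sized as the smallest square that fits `num_images` panels, i.e. `ceil(sqrt(n))`. A standalone sketch of that arithmetic:" |
| 190 | +   ] |
| 191 | +  }, |
| 192 | +  { |
| 193 | +   "cell_type": "code", |
| 194 | +   "execution_count": null, |
| 195 | +   "metadata": {}, |
| 196 | +   "outputs": [], |
| 197 | +   "source": [ |
| 198 | +    "import numpy as np\n", |
| 199 | +    "\n", |
| 200 | +    "# Smallest n x n grid that holds num_images panels (mirrors the layout logic above)\n", |
| 201 | +    "def grid_size_for(num_images):\n", |
| 202 | +    "    return int(np.ceil(np.sqrt(num_images)))\n", |
| 203 | +    "\n", |
| 204 | +    "print([grid_size_for(n) for n in [1, 2, 4, 5, 9]])  # [1, 2, 2, 3, 3]" |
| 205 | +   ] |
| 206 | +  }, |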
| 185 | + { |
| 186 | + "cell_type": "markdown", |
| 187 | + "metadata": {}, |
| 188 | + "source": [ |
| 189 | + "Now let's create an advanced interactive web interface using Gradio's Blocks API. This interface provides more control, allowing users to generate a grid of image variations from a single prompt. It includes sliders to control the number of images and the starting seed for reproducibility." |
| 190 | + ] |
| 191 | + }, |
| 192 | + { |
| 193 | + "cell_type": "code", |
| 194 | + "execution_count": null, |
| 195 | + "metadata": {}, |
| 196 | + "outputs": [], |
| 197 | + "source": [ |
| 198 | + "# Create Gradio Blocks interface\n", |
| 199 | + "with gr.Blocks(title=\"🎨 AI Image Grid Generator\") as demo:\n", |
| 200 | + " gr.Markdown(\"# 🎨 AI Image Grid Generator\")\n", |
| 201 | + " gr.Markdown(\"Generate multiple variations of the same prompt with different seeds\")\n", |
| 202 | + " \n", |
| 203 | + " with gr.Row():\n", |
| 204 | + " with gr.Column():\n", |
| 205 | + " prompt_input = gr.Textbox(\n", |
| 206 | + " label=\"Prompt\",\n", |
| 207 | + " placeholder=\"Describe the image you want to generate...\",\n", |
| 208 | + " lines=2\n", |
| 209 | + " )\n", |
| 210 | + " num_images = gr.Slider(\n", |
| 211 | + " minimum=1,\n", |
| 212 | + " maximum=9,\n", |
| 213 | + " value=4,\n", |
| 214 | + " step=1,\n", |
| 215 | + " label=\"Number of Images\"\n", |
| 216 | + " )\n", |
| 217 | + " seed_start = gr.Slider(\n", |
| 218 | + " minimum=0,\n", |
| 219 | + " maximum=1000,\n", |
| 220 | + " value=42,\n", |
| 221 | + " step=1,\n", |
| 222 | + " label=\"Starting Seed\"\n", |
| 223 | + " )\n", |
| 224 | + " generate_btn = gr.Button(\"Generate Image Grid\", variant=\"primary\")\n", |
| 225 | + " \n", |
| 226 | + " with gr.Column():\n", |
| 227 | + " output_plot = gr.Plot(label=\"Generated Images\")\n", |
| 228 | + " \n", |
| 229 | + " examples = gr.Examples(\n", |
| 230 | + " examples=[\n", |
| 231 | + " [\"a cat wearing sunglasses on a beach\", 4, 42],\n", |
| 232 | + " [\"a futuristic city at sunset with flying cars\", 4, 123],\n", |
| 233 | + " [\"a magical forest with glowing mushrooms\", 4, 456]\n", |
| 234 | + " ],\n", |
| 235 | + " inputs=[prompt_input, num_images, seed_start]\n", |
| 236 | + " )\n", |
| 237 | + " \n", |
| 238 | + " generate_btn.click(\n", |
| 239 | + " fn=generate_image_grid_gradio,\n", |
| 240 | + " inputs=[prompt_input, num_images, seed_start],\n", |
| 241 | + " outputs=output_plot\n", |
| 242 | + " )\n", |
| 243 | + "\n", |
| 244 | + "print(\"✓ Gradio Grid interface created!\")\n", |
| 245 | + "demo.launch(share=False)" |
| 246 | + ] |
| 247 | + }, |
| 248 | + { |
| 249 | + "cell_type": "code", |
| 250 | + "execution_count": null, |
| 251 | + "metadata": {}, |
| 252 | + "outputs": [], |
| 253 | + "source": [ |
| 254 | + "def gradio_generate(prompt, seed, num_steps):\n", |
| 255 | + " \"\"\"\n", |
| 256 | + " Wrapper function for Gradio interface.\n", |
| 257 | + " \n", |
| 258 | + " Args:\n", |
| 259 | + " prompt: Text description\n", |
| 260 | + " seed: Random seed (-1 for random)\n", |
| 261 | + " num_steps: Number of inference steps (1-4 for Turbo)\n", |
| 262 | + " \n", |
| 263 | + " Returns:\n", |
| 264 | + " PIL Image\n", |
| 265 | + " \"\"\"\n", |
| 266 | + " # Convert -1 to None for random seed\n", |
| 267 | + " actual_seed = None if seed == -1 else seed\n", |
| 268 | + " \n", |
| 269 | + " # Generate and return image\n", |
| 270 | + " return generate_image(prompt, num_inference_steps=num_steps, seed=actual_seed)\n", |
| 271 | + "\n", |
| 272 | + "\n", |
| 273 | + "# Create Gradio interface\n", |
| 274 | + "demo = gr.Interface(\n", |
| 275 | + " fn=gradio_generate,\n", |
| 276 | + " inputs=[\n", |
| 277 | + " gr.Textbox(\n", |
| 278 | + " label=\"Prompt\",\n", |
| 279 | + " placeholder=\"Describe the image you want to generate...\",\n", |
| 280 | + " lines=3\n", |
| 281 | + " ),\n", |
| 282 | + " gr.Slider(\n", |
| 283 | + " minimum=-1,\n", |
| 284 | + " maximum=10000,\n", |
| 285 | + " value=-1,\n", |
| 286 | + " step=1,\n", |
| 287 | + " label=\"Seed (-1 for random)\"\n", |
| 288 | + " ),\n", |
| 289 | + " gr.Slider(\n", |
| 290 | + " minimum=1,\n", |
| 291 | + " maximum=4,\n", |
| 292 | + " value=1,\n", |
| 293 | + " step=1,\n", |
| 294 | + " label=\"Inference Steps (1 = fastest, 4 = highest quality)\"\n", |
| 295 | + " )\n", |
| 296 | + " ],\n", |
| 297 | + " outputs=gr.Image(label=\"Generated Image\", type=\"pil\"),\n", |
| 298 | + " title=\"🎨 Stable Diffusion Turbo - Text-to-Image Generator\",\n", |
| 299 | + " description=\"Generate AI images from text prompts using Stable Diffusion Turbo. Type your prompt and click Submit!\",\n", |
| 300 | + " examples=[\n", |
| 301 | + " [\"a cat wearing sunglasses on a beach\", 42, 1],\n", |
| 302 | + " [\"a futuristic city at sunset with flying cars\", 123, 1],\n", |
| 303 | + " [\"a magical forest with glowing mushrooms and fairy lights\", 456, 1],\n", |
| 304 | + " [\"portrait of a wise old wizard with a long beard\", 789, 1],\n", |
| 305 | + " [\"a steampunk robot playing piano in a Victorian mansion\", 999, 1],\n", |
| 306 | + " [\"an underwater city with bioluminescent coral and fish\", 555, 1]\n", |
| 307 | + " ],\n", |
| 308 | + " theme=gr.themes.Soft(),\n", |
| 309 | +    "    flagging_mode=\"never\"\n", |
| 310 | + ")\n", |
| 311 | + "\n", |
| 312 | + "print(\"\\n✓ Gradio interface created!\")\n", |
| 313 | + "print(\" Run the next cell to launch the interface.\")" |
| 314 | + ] |
| 315 | + }, |
| 316 | + { |
| 317 | + "cell_type": "markdown", |
| 318 | + "metadata": {}, |
| 319 | + "source": [ |
| 320 | + "### Launch the Interface\n", |
| 321 | + "\n", |
| 322 | +    "Run this cell to launch the interactive Gradio interface. You can use it directly in the notebook, or pass `share=True` to `launch()` if you want a temporary public URL. The interface connects the UI elements to the model, allowing real-time image generation." |
| 323 | + ] |
| 324 | + }, |
| 325 | + { |
| 326 | + "cell_type": "code", |
| 327 | + "execution_count": null, |
| 328 | + "metadata": {}, |
| 329 | + "outputs": [], |
| 330 | + "source": [ |
| 331 | + "# Launch the interface\n", |
| 332 | + "demo.launch(share=False, debug=False)" |
| 333 | + ] |
| 334 | + }, |
| 335 | + { |
| 336 | + "cell_type": "markdown", |
| 337 | + "metadata": {}, |
| 338 | + "source": [ |
| 339 | + "The interface is now live! You can:\n", |
| 340 | + "- Type any prompt in the text box\n", |
| 341 | + "- Adjust the seed for reproducibility\n", |
| 342 | + "- Change inference steps (1 = fastest, 4 = best quality)\n", |
| 343 | + "- Click example prompts to try them instantly\n", |
| 344 | + "- Generate unlimited images!\n", |
| 345 | + "\n", |
| 346 | + "**Tip:** For best results, be descriptive in your prompts. Instead of \"a cat\", try \"a fluffy orange cat sitting on a windowsill at sunset\"." |
| 347 | + ] |
| 348 | + } |
| 349 | + ], |
| 350 | + "metadata": { |
| 351 | + "kernelspec": { |
| 352 | + "display_name": "Python 3 (ipykernel)", |
| 353 | + "language": "python", |
| 354 | + "name": "python3" |
| 355 | + }, |
| 356 | + "language_info": { |
| 357 | + "codemirror_mode": { |
| 358 | + "name": "ipython", |
| 359 | + "version": 3 |
| 360 | + }, |
| 361 | + "file_extension": ".py", |
| 362 | + "mimetype": "text/x-python", |
| 363 | + "name": "python", |
| 364 | + "nbconvert_exporter": "python", |
| 365 | + "pygments_lexer": "ipython3", |
| 366 | + "version": "3.13.7" |
| 367 | + } |
| 368 | + }, |
| 369 | + "nbformat": 4, |
| 370 | + "nbformat_minor": 4 |
| 371 | +} |