Summary
ImageGenerationEvent currently emits only .progress(step:total:) and .completed(URL). Consuming apps can show a "Step X of Y" spinner but cannot show the image forming during the denoising loop. The diffusion backends already iterate per-step latents internally — the data exists, it just isn't surfaced through the public event stream.
Adding a preview event would let apps show a live in-progress preview (arguably the biggest "delight" UX win for a local image-gen app) without each consumer having to reach into backend internals.
Current state
Sources/ManifoldModelCatalog/ImageGenerationEvent.swift — two cases only (.progress, .completed).
Sources/ManifoldMLX/MLXDiffusionBackend.swift and Sources/ManifoldMLX/Diffusion/Flux/FluxDiffusionBackend.swift already step through latents (generateLatents() / denoiser.i), so intermediate representations are available mid-loop.
Proposed change
Add a preview case to ImageGenerationEvent, e.g.:
/// Optional low-res/decoded preview of the image as it forms during the
/// denoising loop. Emitted on a subset of steps (backend-throttled).
case preview(step: Int, total: Int, url: URL) // or CGImage / raw pixels
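As a rough illustration of the consumer side, here is a minimal sketch assuming the file-URL payload variant. The enum below is a stand-in that mirrors the two existing cases plus the proposed one; the real type may use different labels or payloads, and statusLine(for:) is a hypothetical helper, not part of the package.

```swift
import Foundation

// Stand-in for the public event type with the proposed case added.
// Assumption: the preview payload is a file URL, like .completed.
enum ImageGenerationEvent {
    case progress(step: Int, total: Int)
    case preview(step: Int, total: Int, url: URL)
    case completed(URL)
}

// Hypothetical consumer-side helper: maps each event to a status string.
// A real app would load the image at `url` into its view instead.
func statusLine(for event: ImageGenerationEvent) -> String {
    switch event {
    case .progress(let step, let total):
        return "Step \(step) of \(total)"
    case .preview(let step, let total, let url):
        return "Preview at step \(step) of \(total): \(url.lastPathComponent)"
    case .completed(let url):
        return "Done: \(url.lastPathComponent)"
    }
}
```

Because the new case is additive, consumers with an exhaustive switch over the enum would need to add one branch, while consumers using a default branch would keep compiling unchanged.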
Design questions to decide:
- Payload: file URL (consistent with .completed, avoids CoreGraphics in the inference layer) vs. CGImage/pixel buffer (no disk write per preview tick). Decoding a latent to a viewable image each tick has a cost: a URL re-encode per step may be too heavy, and a lightweight in-memory buffer throttled to every N steps may be better.
- Throttling: previews should be opt-in and/or throttled (e.g. every 2-4 steps, or a previewStride on ImageGenerationConfig) so short 1-4 step Turbo/Schnell runs and long 20-50 step runs both behave sensibly.
- Opt-in: gate behind a config flag (emitPreviews: Bool / previewStride: Int?) so backends that can't cheaply decode intermediates can no-op.
- Cost: VAE-decoding intermediate latents adds GPU work per preview; document the perf tradeoff.
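The throttling and opt-in questions above could be resolved with a small per-step decision, sketched below. Everything here is an assumption for discussion: PreviewConfig stands in for fields that do not exist on ImageGenerationConfig today, shouldEmitPreview is a hypothetical helper, and the default of roughly eight previews per run is an arbitrary starting point rather than a measured value.

```swift
// Hypothetical stand-in for the proposed ImageGenerationConfig fields.
struct PreviewConfig {
    var emitPreviews: Bool = false   // opt-in: previews are off by default
    var previewStride: Int? = nil    // nil lets the backend choose a default
}

/// Returns true when a preview should be emitted after `step` (1-based) of `total`.
/// The final step is skipped because .completed already delivers the finished image.
func shouldEmitPreview(step: Int, total: Int, config: PreviewConfig) -> Bool {
    guard config.emitPreviews, total > 1, step >= 1, step < total else {
        return false
    }
    // Assumed default: about 8 previews per run, never more than one per step.
    let effectiveStride = max(1, config.previewStride ?? (total / 8))
    return step % effectiveStride == 0
}
```

With these assumptions, a 20-step run with previewStride 4 would decode previews at steps 4, 8, 12 and 16, while a 4-step Turbo/Schnell run with no explicit stride would decode after steps 1, 2 and 3. A backend that cannot cheaply decode intermediates could ignore the flag and never emit the case.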
Motivation / consumer
LocalImage (the macOS showcase app) wants to show the image emerging during generation instead of just a step counter. It can't today because the public stream has no preview channel. This belongs in MK so every consumer benefits and backends own the latent-decode logic.
Filed from the LocalImage integration review.