Context
WWDC 2026 announced that AFM 3 (Apple's third-generation on-device model) is 'natively multimodal' — developer docs confirm: 'Multimodal prompts let you pass images alongside text so your app can reason about visual content, and Vision framework tools like OCR and barcode readers are available for your model to call directly, all on-device.'
FoundationBackend currently advertises supportsVision: false because the pre-WWDC FoundationModels SDK (Xcode 26.4, module 1.4.34) had no image-input surface. This issue tracks wiring the new API.
Ref: #20 (closed — FoundationBackend was the last unchecked checkbox)
Investigation required first (Phase 0)
A post-WWDC Xcode beta may have shipped with the image API. Before implementing, probe the installed SDK:
- Check for an image / attachment case on Transcript.Segment
- Check for a Data- or CGImage-accepting Prompt initializer
- Check for a PromptRepresentable conformance on image types
- Check PromptBuilder for new content cases
If the SDK doesn't yet expose image input, document the gap and revisit when the SDK ships.
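One way to run the Phase 0 checks is to scan the FoundationModels module interface text for image-related declarations and review the hits by hand. The sketch below assumes you have located the SDK's .swiftinterface file yourself; the helper name candidateLines and the keyword list are hypothetical, and a hit only flags a line for review, it does not confirm that the API exists.

```swift
import Foundation

/// Returns the trimmed lines of `interfaceText` that mention any of `keywords`
/// (case-insensitive), so a human can review them against the Phase 0 checklist.
/// `candidateLines` is a hypothetical helper, not part of any SDK.
func candidateLines(in interfaceText: String, keywords: [String]) -> [String] {
    let lowered = keywords.map { $0.lowercased() }
    var hits: [String] = []
    for rawLine in interfaceText.split(separator: "\n") {
        let line = rawLine.trimmingCharacters(in: .whitespaces)
        let haystack = line.lowercased()
        // Keep the line if it contains at least one keyword.
        if lowered.contains(where: { haystack.contains($0) }) {
            hits.append(line)
        }
    }
    return hits
}

// Usage sketch (the path is an assumption; find the interface file in your installed SDK):
// let text = try String(contentsOfFile: pathToInterface, encoding: .utf8)
// for line in candidateLines(in: text, keywords: ["image", "attachment", "CGImage"]) {
//     print(line)
// }
```

An empty result for all keywords would support the "document the gap and revisit" outcome; any hits still need to be read in context, since a keyword can match unrelated declarations.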
Implementation (once SDK confirmed)
- supportsVision: true in FoundationBackend.capabilities
- MessagePart.image(data:mimeType:) → SDK image type in generate() (follow ClaudeBackend / OpenAIBackend pattern)
- FoundationBackend doc comment; describe the wiring
- FoundationBackend.probeIsReady()
Cross-cutting
- UI: PhotoAttachmentButton / VisionInputButton already ship; no UI changes needed once vision is wired
- Open question: does image routing go on-device or to Private Cloud Compute? The answer affects the on-device positioning claim.
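As a rough illustration of the MessagePart.image(data:mimeType:) mapping step in the implementation list, here is a minimal sketch. Only the .image(data:mimeType:) case and the supportsVision flag come from this issue; the .text case, Capabilities, PlaceholderPromptPart, VisionMappingError, and mapParts are hypothetical stand-ins. In particular, PlaceholderPromptPart is not a FoundationModels type; the real SDK image type is still unknown until Phase 0 completes.

```swift
import Foundation

// Shape taken from the issue text; the `.text` case is an assumption.
enum MessagePart {
    case text(String)
    case image(data: Data, mimeType: String)
}

// Placeholder for whatever image-capable prompt type the SDK turns out to expose.
// This is NOT a FoundationModels type.
enum PlaceholderPromptPart: Equatable {
    case text(String)
    case image(Data)
}

// Hypothetical error type for this sketch.
enum VisionMappingError: Error, Equatable {
    case visionUnsupported
    case unsupportedMimeType(String)
}

// Hypothetical stand-in for FoundationBackend.capabilities.
struct Capabilities {
    var supportsVision: Bool
}

/// Converts backend-neutral message parts into placeholder prompt parts,
/// rejecting images when the backend does not advertise vision support.
func mapParts(_ parts: [MessagePart],
              capabilities: Capabilities) throws -> [PlaceholderPromptPart] {
    var result: [PlaceholderPromptPart] = []
    for part in parts {
        switch part {
        case .text(let string):
            result.append(.text(string))
        case .image(let data, let mimeType):
            guard capabilities.supportsVision else {
                throw VisionMappingError.visionUnsupported
            }
            guard mimeType.hasPrefix("image/") else {
                throw VisionMappingError.unsupportedMimeType(mimeType)
            }
            result.append(.image(data))
        }
    }
    return result
}
```

Failing loudly when supportsVision is false keeps the capability flag and the generate() path in agreement, so a caller never silently loses an attached image.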