fix: correct MIME type detection for Computer Use screenshots #261

Closed
amDosion wants to merge 2 commits into claude-code-best:main from amDosion:fix/cu-screenshot-mime-detection

Collaborator

@amDosion amDosion commented Apr 14, 2026

Summary

  • detectMimeFromBase64() compared raw magic-byte values against the characters of the base64-encoded string. Base64 encoding transforms byte values, so no condition ever matched and the function always returned image/png
  • When the Windows backend produced JPEG screenshots, the API returned 400 invalid_request_error: image was specified using image/png but appears to be image/jpeg
  • Fix: decode the first 12 raw bytes from the base64 string and check the standard magic-byte signatures directly (PNG/JPEG/WebP/GIF)
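
The failure mode described above can be reproduced in a few lines. This is a minimal sketch of the comparison pattern the summary describes, not the original code from toolCalls.ts; the variable names (`jpegBytes`, `buggyIsJpeg`, `fixedIsJpeg`) are illustrative assumptions.

```typescript
import { Buffer } from "node:buffer";

// First bytes of a JPEG file: SOI marker (FF D8) followed by an APP0 marker (FF E0).
const jpegBytes = Buffer.from([0xff, 0xd8, 0xff, 0xe0]);
const b64 = jpegBytes.toString("base64");

// Buggy pattern: comparing a base64 character code against a raw byte value.
// Base64 output is plain ASCII, so charCodeAt(0) can never equal 0xFF or 0x89.
const buggyIsJpeg = b64.charCodeAt(0) === 0xff;

// Fixed pattern: decode back to bytes first, then compare the raw values.
const decoded = Buffer.from(b64, "base64");
const fixedIsJpeg =
  decoded[0] === 0xff && decoded[1] === 0xd8 && decoded[2] === 0xff;

console.log({ b64, buggyIsJpeg, fixedIsJpeg });
```

Because every character check fails, a detector written in the first style falls through to its default, which matches the always-image/png behavior reported here.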

Changes

packages/@ant/computer-use-mcp/src/toolCalls.ts — rewrite detectMimeFromBase64():

  • Decode Buffer.from(b64.slice(0, 16), "base64") to get the first 12 raw bytes (16 base64 characters decode to exactly 12 bytes)
  • PNG: 89 50 4E 47 (4-byte signature)
  • JPEG: FF D8 FF (3-byte, covers all JFIF/EXIF/DQT variants)
  • WebP: dual check RIFF (bytes 0-3) + WEBP (bytes 8-11), won't false-positive on WAV/AVI
  • GIF: GIF prefix (covers GIF87a and GIF89a)
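
The checks listed above could be sketched as follows. This is a hypothetical reimplementation under stated assumptions, not the code in the PR: the `ImageMime` type and the `startsWith` helper are invented for illustration, and the image/png fallback is assumed to carry over from the default the original function returned.

```typescript
import { Buffer } from "node:buffer";

type ImageMime = "image/png" | "image/jpeg" | "image/webp" | "image/gif";

// Hypothetical sketch; the actual function in toolCalls.ts may differ.
function detectMimeFromBase64(b64: string): ImageMime {
  // 16 base64 characters decode to 12 bytes, enough for every check below.
  const bytes = Buffer.from(b64.slice(0, 16), "base64");

  // True when `sig` appears in the decoded bytes starting at `offset`.
  const startsWith = (sig: number[], offset = 0): boolean =>
    bytes.length >= offset + sig.length &&
    sig.every((value, i) => bytes[offset + i] === value);

  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return "image/png";
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  // "RIFF" at bytes 0-3 plus "WEBP" at bytes 8-11, so WAV/AVI do not match.
  if (
    startsWith([0x52, 0x49, 0x46, 0x46]) &&
    startsWith([0x57, 0x45, 0x42, 0x50], 8)
  ) {
    return "image/webp";
  }
  if (startsWith([0x47, 0x49, 0x46])) return "image/gif";

  // Assumed fallback: the default the PR says the old function returned.
  return "image/png";
}

console.log(detectMimeFromBase64("/9j/4AAQSkZJRg=="));
```

Note that in this sketch an unrecognized input, including a RIFF container that is not WebP, falls through to the default rather than raising an error.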

Verification

  • Codex (GPT-5.4) independently verified all base64 prefixes via Buffer.from().toString('base64') computation
  • tsc --noEmit passes with zero new errors
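
The prefix computation mentioned in the first verification item can be repeated locally. The sketch below is an assumption about what such a check might look like, not the script that was actually run; the `signatures` and `prefixes` names are illustrative.

```typescript
import { Buffer } from "node:buffer";

// Magic-byte signatures named in this PR (RIFF is the first half of the WebP check).
const signatures: Record<string, number[]> = {
  png: [0x89, 0x50, 0x4e, 0x47],
  jpeg: [0xff, 0xd8, 0xff],
  riff: [0x52, 0x49, 0x46, 0x46],
  gif: [0x47, 0x49, 0x46],
};

// Encode each signature on its own to see how it appears in base64 form.
const prefixes: Record<string, string> = {};
for (const [name, bytes] of Object.entries(signatures)) {
  prefixes[name] = Buffer.from(bytes).toString("base64");
}

console.log(prefixes);
```

Only signatures whose length is a multiple of 3 bytes map to a fixed base64 prefix. For the 4-byte signatures, the characters after the first four depend on the bytes that follow in the real file, which is one reason decoding first is more robust than matching base64 text.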

Test plan

  • Run Computer Use screenshot on Windows — confirm no more 400 API error
  • Verify PNG screenshots still detected correctly
  • Verify JPEG screenshots now correctly labeled as image/jpeg

Summary by CodeRabbit

Release Notes

  • New Features
    • Added language preference selection (en/zh/auto) via /lang command
    • Introduced background session management: ps, logs, kill, and attach commands for session control
    • Added template jobs workflow for creating and managing reusable prompts
    • Enhanced assistant remote attachment with explicit session ID support
    • Expanded autonomous mode with managed autonomy flows and proactive orchestration
    • Added daemon status and stop commands for lifecycle management
    • Improved notification delivery and file sharing through bridge sessions

unraid added 2 commits April 13, 2026 20:22
…y, KAIROS activation, openclaw autonomy

Squashed merge of:
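1. fix/mcp-tsc-errors: fix tsc errors and test failures after the upstream MCP refactor
2. feat/pipe-mute-disconnect: Pipe IPC logical disconnect, /lang command, mute state machine
3. feat/stub-recovery-all: implement all stub recovery (tasks 001-012)
4. feat/kairos-activation: unblock KAIROS activation + tool implementations
5. codex/openclaw-autonomy-pr: autonomy permission system, run records, managed flows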

Conflicts resolved:
- src/commands/assistant/assistant.tsx (stub-recovery + kairos)
- src/services/api/openai/__tests__/queryModelOpenAI.test.ts (mcp-fix + autonomy)

Tested: bun test (2695 pass, 0 fail)
The original detectMimeFromBase64() compared raw byte magic numbers
(0x89, 0xFF, etc.) against charCodeAt(0) of a base64-encoded string.
Base64 encoding transforms byte values, so none of the conditions ever
matched and the function always returned the default "image/png" —
causing API 400 errors when screenshots were actually JPEG.

Fix: decode the first 12 raw bytes from the base64 string and check
standard magic byte signatures directly:
- PNG:  89 50 4E 47
- JPEG: FF D8 FF (covers all marker variants)
- WebP: RIFF header + WEBP at bytes 8-11 (precise, won't match WAV/AVI)
- GIF:  "GIF" prefix (covers GIF87a and GIF89a)
Contributor

coderabbitai bot commented Apr 14, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This PR introduces a comprehensive autonomy subsystem with daemon state management, background session handling, template job creation, language preferences, and pipe muting infrastructure. It adds BG_SESSIONS and TEMPLATES build features, implements CLI handlers for background sessions and templates, establishes baseline test coverage, and adds bridge integration for push notifications and file uploads.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Build & Dev Configuration | `build.ts`, `scripts/dev.ts`, `tsconfig.json` | Added BG_SESSIONS and TEMPLATES feature flags to build and dev defaults; extended TypeScript path aliases for mcp-client and agent-tools packages. |
| Documentation & Task Planning | `docs/features/stub-recovery-design-1-4.md`, `docs/task/task-00*.md`, `docs/test-plans/openclaw-autonomy-baseline.md`, `02-kairos (1).md`, `.gitignore` | Added comprehensive task documentation for daemon status/stop, background sessions, template jobs, and assistant session attach; introduced KAIROS hidden assistant mode documentation and autonomy baseline test specification. |
| Daemon State & Lifecycle | `src/daemon/state.ts`, `src/daemon/__tests__/state.test.ts`, `src/daemon/main.ts` | Introduced daemon state persistence via JSON file (reading/writing/querying status), implemented status and stop subcommands, added lifecycle hooks for state management. |
| Assistant Mode & Gating | `src/assistant/gate.ts`, `src/assistant/index.ts`, `src/assistant/sessionDiscovery.ts`, `src/assistant/AssistantSessionChooser.*` | Removed dependency on getKairosActive() from assistant enablement; replaced stubs with real session discovery and React chooser component; updated session selection logic and system prompt addendum loading. |
| CLI Command Handlers | `src/cli/bg.ts`, `src/cli/handlers/ant.ts`, `src/cli/handlers/templateJobs.ts`, `src/cli/rollback.ts`, `src/cli/up.ts` | Implemented full background session management (ps/logs/kill/attach/--bg handlers); replaced task/log/error handlers with working CRUD flows; implemented template job lifecycle handlers; added rollback and up command implementations. |
| New Commands & Command Registry | `src/commands/lang/index.ts`, `src/commands/lang/lang.ts`, `src/commands/autonomy.ts`, `src/commands/assistant/assistant.tsx`, `src/commands/torch.ts`, `src/commands/init.ts`, `src/commands.ts` | Added lang command for language preference, autonomy command for run/flow inspection and control, new /assistant command with daemon installation wizard, reserved torch debug command; updated init prompt with autonomy agents path. |
| Job & Template System | `src/jobs/state.ts`, `src/jobs/templates.ts`, `src/jobs/classifier.ts`, `src/jobs/__tests__/*.test.ts` | Added job state persistence (creation, reading, reply appending); implemented template discovery and loading from filesystem; added job status classification from assistant messages; added comprehensive test coverage. |
| Autonomy Framework Core | `src/utils/autonomyAuthority.ts`, `src/utils/autonomyFlows.ts`, `src/utils/autonomyRuns.ts`, `src/utils/autonomyPersistence.ts` | Introduced autonomy authority loading (agents/heartbeat files), managed flow orchestration, autonomy run tracking with persistence, file-lock-based concurrent access control for autonomy subsystem. |
| Autonomy Integration Points | `src/utils/handlePromptSubmit.ts`, `src/cli/print.ts`, `src/hooks/useScheduledTasks.ts`, `src/proactive/useProactive.ts`, `src/screens/REPL.tsx` | Integrated autonomy run lifecycle tracking into prompt submission; updated proactive tick and scheduled-task handling to use autonomy queued commands; modified REPL to handle autonomy-aware queueing. |
| Pipe Muting & Master-Slave Sync | `src/utils/pipeMuteState.ts`, `src/utils/pipeTransport.ts`, `src/utils/pipePermissionRelay.ts`, `src/hooks/usePipeIpc.ts`, `src/hooks/usePipeRelay.ts`, `src/hooks/usePipeMuteSync.ts`, `src/hooks/useMasterMonitor.ts`, `src/commands/send/send.ts` | Added in-memory mute state tracking and send-override management; introduced relay_mute/relay_unmute pipe message types; implemented master-side mute synchronization with permission relay gating; added muted-message handling and automatic denial responses. |
| Teammate Integration | `src/tasks/InProcessTeammateTask/InProcessTeammateTask.tsx`, `src/tasks/InProcessTeammateTask/types.ts`, `src/utils/swarm/inProcessRunner.ts`, `src/utils/swarm/spawnInProcess.ts` | Extended teammate message injection with autonomy run metadata; updated pending message structure to carry autonomyRunId and origin; added autonomy run lifecycle tracking in in-process teammate execution. |
| Language & Localization | `src/utils/language.ts`, `src/utils/config.ts`, `src/services/awaySummary.ts` | Added language preference system with auto/en/zh support; integrated language resolution for away summary prompts; extended global config with preferredLanguage field. |
| Tool Bridge Integration | `packages/builtin-tools/src/tools/PushNotificationTool/PushNotificationTool.ts`, `packages/builtin-tools/src/tools/SendUserFileTool/SendUserFileTool.ts` | Updated tool signatures to accept context; implemented bridge delivery for push notifications and file uploads when appState.replBridgeEnabled is true; added graceful fallback behavior. |
| MIME Detection Enhancement | `packages/@ant/computer-use-mcp/src/toolCalls.ts` | Improved detectMimeFromBase64 to match magic byte sequences (PNG, JPEG, WebP, GIF) instead of relying on first-character heuristics. |
| Feature Flag Management | `src/services/analytics/growthbook.ts`, `src/main.tsx` | Added tengu_kairos_assistant to local gate defaults; updated fast-path gating for feature values; adjusted assistant activation logic to check both isAssistantForced() and raw options. |
| Proactive & Task Management | `src/proactive/__tests__/state.baseline.test.ts`, `src/proactive/useProactive.ts`, `src/utils/taskSummary.ts`, `src/hooks/useAwaySummary.ts` | Added proactive state baseline tests; updated proactive hook to work with autonomy queued commands; implemented background-session-aware task summary generation; added send override cleanup on task completion. |
| Baseline Test Suites | `src/__tests__/context.baseline.test.ts`, `src/commands/__tests__/proactive.baseline.test.ts`, `src/commands/__tests__/autonomy.test.ts`, `src/services/api/openai/__tests__/queryModelOpenAI.*`, `src/services/langfuse/__tests__/langfuse.*` | Added comprehensive baseline tests for context/language, proactive commands, autonomy commands; isolated OpenAI stream adapter and Langfuse tracing tests; established baseline assertions for cron tasks, scheduler, and autonomy authority. |
| Cron & Scheduling Tests | `src/utils/__tests__/cronTasks.baseline.test.ts`, `src/utils/__tests__/cronScheduler.baseline.test.ts`, `src/utils/__tests__/autonomyAuthority.test.ts`, `src/utils/__tests__/autonomyRuns.test.ts`, `src/utils/__tests__/language.test.ts`, `src/utils/__tests__/pipeMuteState.test.ts`, `src/utils/__tests__/taskSummary.test.ts` | Added comprehensive baseline and integration tests for cron task persistence, scheduler helpers, autonomy authority loading, autonomy run lifecycle, language resolution, pipe mute state, and task summary generation. |
| Test Infrastructure | `src/services/api/openai/__tests__/streamAdapter.test.ts`, `src/jobs/__tests__/state.test.ts`, `src/jobs/__tests__/templates.test.ts`, `src/jobs/__tests__/classifier.test.ts`, `tests/mocks/file-system.ts` | Added/updated isolated test suites for job state persistence, template discovery, job classification; improved test file-system mock to ensure parent directory creation. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • KonghaYao

🐰 The Rabbit's Tale of Autonomy

In meadows where daemons run free,
Sessions dance with templates and spree,
Languages bloom in en and zh,
While pipes hush their songs with a "shhh"—
New flows flourish, managed with glee! 🌿✨

@amDosion amDosion closed this Apr 14, 2026
@amDosion amDosion deleted the fix/cu-screenshot-mime-detection branch April 14, 2026 10:18