openai · MoizIbnYousaf · Apr 25, 2026 · Apr 26, 2026 · chatgpt-codex-connector · Apr 25, 2026
diff --git a/plugins/codex/agents/codex-image.md b/plugins/codex/agents/codex-image.md
@@ -0,0 +1,64 @@
+---
+name: codex-image
+description: Proactively use when the user wants Codex to generate an image. Drafts a craft-grade prompt that respects the six community-tested rules for high-end image models, then forwards exactly one task call to the Codex companion runtime so Codex can call its native image generation tool.
+tools: Bash
+skills:
+  - codex-cli-runtime
+  - gpt-5-4-prompting
+  - image
+---
+
+You are a thin forwarding wrapper around the Codex companion task runtime, specialized for image generation.
+
+Your only job is to:
+
+1. Apply the `image` skill to turn the user's image intent into a craft-grade prompt that respects all six rules (style first, quoted text, prompt as context, explicit pixel dimensions, a constraints block, fresh generation).
+2. Wrap that prompt in a single Codex `task` instruction that tells Codex to call its native image generation tool with the prompt.
+3. Forward that single instruction to the Codex companion task runtime, then immediately ask the runtime which PNG(s) actually landed on disk.
+4. Return the runtime's stdout verbatim, including the trailing `==Generated PNG(s)==` block from `latest-images`.
+
+Selection guidance:
+
+- Use this subagent only when the user wants Codex to generate an image.
+- Do not handle review, debugging, refactor, or non-image generation requests. Those belong to `codex-rescue`.
+
+Why we always run `latest-images` after `task`:
+
+Codex's native image generation tool always saves PNGs to `~/.codex/generated_images/<thread-id>/ig_*.png`. Codex's text response can mention a different path, but that text is not authoritative — the file is in the native location. We always end the Bash call by invoking `latest-images --since <ms>` so the user sees the real absolute path. If the user supplied `--out <path>`, `latest-images --copy-to <path>` copies the real PNG to that location and reports the copied path.
+
+Forwarding rules:
+
+- Use exactly one `Bash` call. That call chains three steps in order:
+  1. `SINCE_MS=$(node -e 'console.log(Date.now())')` captured BEFORE invoking Codex.
+  2. `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task --write "<wrapped prompt>"` — the Codex run.
+  3. `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" latest-images --since "$SINCE_MS"` — append the real saved paths. If the user supplied `--out <path>`, add `--copy-to "<path>"` to the `latest-images` call.
+  Use `set +e` (or capture the task exit code) so step 3 still runs even if step 2 returned non-zero. Exit with the task step's exit code so callers see whether Codex itself succeeded.
+- Always pass `--write` to the `task` call so Codex can save the generated PNG.
+- If the user did not explicitly choose `--background` or `--wait`, prefer foreground. Single image generations are usually fast.
+- If the user asked for a series of images or multi-step image work, prefer background.
+- You may use the `gpt-5-4-prompting` skill to tighten the wrapping `<task>` block, but the inner image prompt itself must be drafted via the `image` skill rules.
+- Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, cancel jobs, summarize output, or do any follow-up work of your own.
+- Do not call `review`, `adversarial-review`, `status`, `result`, or `cancel`. This subagent only chains `task` and `latest-images`.
+- Leave model unset by default. Only add `--model` when the user explicitly asks for a specific Codex model. If they ask for `spark`, map it to `gpt-5.3-codex-spark`.
+- Treat `--effort <value>`, `--model <value>`, `--background`, `--wait`, and `--out <path>` as routing controls. Do not include them in the task text you pass through.
+
+Image prompt drafting rules:
+
+- Apply every rule from the `image` skill: lead with style and intended use, quote every literal string the user wants visible, end with an explicit pixel-dimension line.
+- If the user supplied dimensions or a ratio, honor them and convert ratios to explicit pixel dimensions.
+- If the user supplied no dimensions, infer from intent using the defaults table in the `image` skill (landscape `1536x1024` is the safe default).
+- Do not ask follow-up questions. The slash command already prompted the user once; commit to a craft-grade prompt from whatever intent you received.
+
+Wrapping the task for Codex:
+
+The wrapping instruction sent to Codex must be a single `<task>` block with these elements (use the `gpt-5-4-prompting` skill for the XML structure):
+
+- `<task>`: tell Codex to use its built-in image generation tool to render the prompt below verbatim. Make it explicit that the prompt is the artifact and must not be paraphrased, shortened, or "improved."
+- `<image_prompt>`: the drafted image prompt, verbatim, with all double-quoted literal strings preserved exactly.
+- `<completeness_contract>`: Codex must call its native `image_generation` tool exactly once. The subagent will discover the actual saved path via `latest-images` after the turn ends, so Codex does not need to print the path itself or copy the file.
+- `<action_safety>`: do not modify any file outside the chosen output directory. Do not run unrelated commands. Do not edit a previously generated image as a reference; generate fresh from the prompt.
+
+Response style:
+
+- Do not add commentary before or after the chained Bash output. The user sees Codex's stdout followed immediately by the `==Generated PNG(s)==` block from `latest-images`.
+- If the Bash call fails or Codex cannot be invoked, return nothing.
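The three chained steps in the forwarding rules above can be sketched as a small command builder. This is a minimal sketch, not code from this PR: `buildImageChain` and `shellQuote` are hypothetical names, and the single-quote escaping is an assumption about how the wrapped prompt could be passed safely. The image prompt is required to contain double-quoted literal strings, so wrapping it in plain double quotes on the command line would need extra escaping.

```javascript
// Sketch only: buildImageChain and shellQuote are hypothetical helpers, not part of this PR.

// Wrap a value in single quotes so the shell treats it as one literal word.
// An embedded single quote is closed, escaped, and reopened: ' becomes '\''
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Compose the three chained steps from the forwarding rules as one Bash script.
function buildImageChain({ wrappedPrompt, model, out }) {
  const companion = 'node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs"';
  // Per the forwarding rules, the `spark` alias maps to gpt-5.3-codex-spark.
  const resolvedModel = model === "spark" ? "gpt-5.3-codex-spark" : model;

  const taskArgs = ["task", "--write"];
  if (resolvedModel) taskArgs.push("--model", shellQuote(resolvedModel));
  taskArgs.push(shellQuote(wrappedPrompt));

  const latestArgs = ["latest-images", "--since", '"$SINCE_MS"'];
  if (out) latestArgs.push("--copy-to", shellQuote(out));

  return [
    // Step 1: timestamp captured BEFORE Codex runs.
    "SINCE_MS=$(node -e 'console.log(Date.now())')",
    // Keep going after a failed task so step 3 still reports what landed.
    "set +e",
    // Step 2: the Codex run.
    `${companion} ${taskArgs.join(" ")}`,
    "TASK_STATUS=$?",
    // Step 3: report the PNG(s) that actually landed on disk.
    `${companion} ${latestArgs.join(" ")}`,
    // Surface the task step's exit code to the caller.
    'exit "$TASK_STATUS"'
  ].join("\n");
}

console.log(buildImageChain({
  wrappedPrompt: '<task>Render verbatim.</task><image_prompt>Sign reading "Noon & Co."</image_prompt>',
  out: "/tmp/covers/noon.png"
}));
```

Capturing `TASK_STATUS` rather than relying on `set -e` keeps the `latest-images` report in the output even when the Codex run fails, which is the behavior the forwarding rules ask for.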
diff --git a/plugins/codex/commands/image.md b/plugins/codex/commands/image.md
@@ -0,0 +1,32 @@
+---
+description: Generate an image by handing a craft-grade prompt to Codex through the shared runtime so Codex can call its native image generation tool
+argument-hint: "[--background|--wait] [--model <model|spark>] [--out <path>] [what you want the image to show]"
+allowed-tools: Bash(node:*), AskUserQuestion, Agent
+---
+
+Invoke the `codex:codex-image` subagent via the `Agent` tool (`subagent_type: "codex:codex-image"`), forwarding the raw user request as the prompt.
+`codex:codex-image` is a subagent, not a skill — do not call `Skill(codex:codex-image)` (no such skill) or `Skill(codex:image)` (that re-enters this command and hangs the session). The command runs inline so the `Agent` tool stays in scope; forked general-purpose subagents do not expose it.
+The final user-visible response must be Codex's output verbatim.
+
+Raw user request:
+$ARGUMENTS
+
+Execution mode:
+
+- If the request includes `--background`, run the `codex:codex-image` subagent in the background.
+- If the request includes `--wait`, run the `codex:codex-image` subagent in the foreground.
+- If neither flag is present, default to foreground. Most single-image generations finish in well under a minute.
+- `--background` and `--wait` are execution flags for Claude Code. Do not forward them to `task`, and do not treat them as part of the natural-language image intent.
+- `--model` is a runtime-selection flag for the Codex side (the model that drives the image generation tool). Preserve it for the forwarded `task` call, but do not treat it as part of the image intent.
+- `--out` is an optional absolute path for the saved PNG. Codex always saves to its native `generated_images` directory; when `--out` is supplied, the subagent's `latest-images --copy-to` step copies the PNG there. Preserve `--out` for the subagent.
+
+Operating rules:
+
+- The subagent is a thin forwarder only. It uses one `Bash` call that chains `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task --write ...` with `latest-images`, and returns the combined stdout as-is.
+- Return the Codex companion stdout verbatim to the user.
+- Do not paraphrase, summarize, rewrite, or add commentary before or after it.
+- Do not ask the subagent to inspect the repository, monitor progress, poll `/codex:status`, fetch `/codex:result`, call `/codex:cancel`, or do follow-up work of its own.
+- Leave model unset on the Codex side unless the user explicitly asks for one. If they ask for `spark`, map it to `gpt-5.3-codex-spark`.
+- This command is write-capable on the Codex side because Codex needs to save the resulting PNG to disk. Always pass `--write`. The optional copy to the user's `--out` path is done by `latest-images --copy-to`, not by Codex.
+- If the helper reports that Codex is missing or unauthenticated, stop and tell the user to run `/codex:setup`.
+- If the user did not supply an image intent, ask what the image should show.
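The execution-mode rules above separate routing flags from the natural-language image intent. A minimal sketch of that split follows, assuming whitespace-separated arguments. `splitImageRequest` is a hypothetical helper, not part of this PR, and it does not handle quoted paths that contain spaces.

```javascript
// Sketch only: splitImageRequest is a hypothetical helper, not part of this PR.
// Assumes whitespace-separated tokens; quoted values containing spaces are not handled.
const VALUE_FLAGS = new Set(["--model", "--out", "--effort"]);
const BOOLEAN_FLAGS = new Set(["--background", "--wait"]);

function splitImageRequest(raw) {
  const tokens = raw.trim().split(/\s+/).filter(Boolean);
  const routing = {};
  const intentWords = [];

  for (let i = 0; i < tokens.length; i += 1) {
    const token = tokens[i];
    if (BOOLEAN_FLAGS.has(token)) {
      routing[token.slice(2)] = true;
    } else if (VALUE_FLAGS.has(token) && i + 1 < tokens.length) {
      routing[token.slice(2)] = tokens[i + 1];
      i += 1; // skip the value that was just consumed
    } else {
      // Everything else (including a trailing value flag with no value) stays in the intent.
      intentWords.push(token);
    }
  }

  return { routing, intent: intentWords.join(" ") };
}

const parsed = splitImageRequest("--wait --model spark --out /tmp/fox.png a red fox in snow");
console.log(parsed.routing); // { wait: true, model: 'spark', out: '/tmp/fox.png' }
console.log(parsed.intent);  // a red fox in snow
```

An empty `intent` corresponds to the last operating rule: the command should ask what the image should show.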
diff --git a/plugins/codex/scripts/codex-companion.mjs b/plugins/codex/scripts/codex-companion.mjs
@@ -80,11 +80,108 @@ function printUsage() {
       "  node scripts/codex-companion.mjs task [--background] [--write] [--resume-last|--resume|--fresh] [--model <model|spark>] [--effort <none|minimal|low|medium|high|xhigh>] [prompt]",
       "  node scripts/codex-companion.mjs status [job-id] [--all] [--json]",
       "  node scripts/codex-companion.mjs result [job-id] [--json]",
-      "  node scripts/codex-companion.mjs cancel [job-id] [--json]"
+      "  node scripts/codex-companion.mjs cancel [job-id] [--json]",
+      "  node scripts/codex-companion.mjs latest-images --since <epoch-ms> [--copy-to <abs-path>] [--json]"
     ].join("\n")
   );
 }
 
+function handleLatestImages(argv) {
+  const { options } = parseCommandInput(argv, {
+    valueOptions: ["since", "copy-to"],
+    booleanOptions: ["json"]
+  });
+  const sinceRaw = options["since"];
+  const copyTo = options["copy-to"];
+  const asJson = Boolean(options["json"]);
+
+  if (!sinceRaw) {
+    throw new Error("latest-images requires --since <epoch-ms>");
+  }
+  const sinceMs = Number(sinceRaw);
+  if (!Number.isFinite(sinceMs)) {
+    throw new Error(`latest-images --since must be a millisecond epoch, got: ${sinceRaw}`);
+  }
+
+  const root = path.join(process.env.HOME || process.env.USERPROFILE || ".", ".codex", "generated_images");
+  const matches = [];
+  if (fs.existsSync(root)) {
+    const stack = [root];
+    while (stack.length > 0) {
+      const dir = stack.pop();
+      let entries;
+      try {
+        entries = fs.readdirSync(dir, { withFileTypes: true });
+      } catch {
+        continue;
+      }
+      for (const entry of entries) {
+        const full = path.join(dir, entry.name);
+        if (entry.isDirectory()) {
+          stack.push(full);
+        } else if (entry.isFile() && /\.png$/i.test(entry.name)) {
+          let stat;
+          try {
+            stat = fs.statSync(full);
+          } catch {
+            continue;
+          }
+          if (stat.mtimeMs >= sinceMs) {
+            matches.push({ path: full, mtimeMs: stat.mtimeMs });
+          }
+        }
+      }
+    }
+  }
+
+  matches.sort((a, b) => a.mtimeMs - b.mtimeMs); // oldest first, so the -1, -2 copy suffixes follow write order
+
+  const copied = [];
+  if (copyTo && matches.length > 0) {
+    const looksLikeFile = /\.png$/i.test(copyTo);
+    if (looksLikeFile && matches.length === 1) {
+      fs.mkdirSync(path.dirname(copyTo), { recursive: true });
+      fs.copyFileSync(matches[0].path, copyTo);
+      copied.push(path.resolve(copyTo));
+    } else {
+      const targetDir = looksLikeFile ? path.dirname(copyTo) : copyTo;
+      fs.mkdirSync(targetDir, { recursive: true });
+      const basenameRoot = looksLikeFile
+        ? path.basename(copyTo, path.extname(copyTo))
+        : "codex-image";
+      matches.forEach((match, index) => {
+        const suffix = matches.length === 1 ? "" : `-${index + 1}`;
+        const target = path.join(targetDir, `${basenameRoot}${suffix}.png`);
+        fs.copyFileSync(match.path, target);
+        copied.push(path.resolve(target));
+      });
+    }
+  }
+
+  const sourcePaths = matches.map((m) => m.path);
+  if (asJson) {
+    console.log(JSON.stringify({ sources: sourcePaths, copied }, null, 2));
+    return;
+  }
+  if (sourcePaths.length === 0) {
+    process.stdout.write("==Generated PNG(s)==\n(none — no images written by the image_generation tool during this window)\n==/Generated PNG(s)==\n");
+    return;
+  }
+  const lines = ["==Generated PNG(s)=="];
+  if (copied.length > 0) {
+    for (const target of copied) {
+      lines.push(target);
+    }
+    lines.push(`(originals in ~/.codex/generated_images/, copied to the path${copied.length > 1 ? "s" : ""} above)`);
+  } else {
+    for (const source of sourcePaths) {
+      lines.push(source);
+    }
+  }
+  lines.push("==/Generated PNG(s)==");
+  process.stdout.write(lines.join("\n") + "\n");
+}
+
 function outputResult(value, asJson) {
   if (asJson) {
     console.log(JSON.stringify(value, null, 2));
@@ -1015,6 +1112,9 @@ async function main() {
     case "cancel":
       await handleCancel(argv);
       break;
+    case "latest-images":
+      handleLatestImages(argv);
+      break;
     default:
       throw new Error(`Unknown subcommand: ${subcommand}`);
   }
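The `--copy-to` naming rule in `handleLatestImages` is the least obvious part of the hunk above, so here is a worked restatement of it as a pure function. This is a sketch for illustration, not code from this PR: `planCopyTargets` is a hypothetical name, and it assumes an absolute POSIX-style path with `/` separators instead of using `node:path`.

```javascript
// Sketch only: planCopyTargets is a hypothetical restatement of the --copy-to naming rule.
// Assumes copyTo is an absolute POSIX-style path; the real code uses node:path and path.resolve.
function planCopyTargets(copyTo, count) {
  if (count === 0) return [];
  const looksLikeFile = /\.png$/i.test(copyTo);

  // One image and a .png target: copy to exactly that path.
  if (looksLikeFile && count === 1) return [copyTo];

  const lastSlash = copyTo.lastIndexOf("/");
  const targetDir = looksLikeFile ? copyTo.slice(0, lastSlash) : copyTo.replace(/\/+$/, "");
  const basenameRoot = looksLikeFile
    ? copyTo.slice(lastSlash + 1).replace(/\.png$/i, "")
    : "codex-image";

  const targets = [];
  for (let index = 0; index < count; index += 1) {
    // Numbered suffixes are only added when more than one image was found.
    const suffix = count === 1 ? "" : `-${index + 1}`;
    targets.push(`${targetDir}/${basenameRoot}${suffix}.png`);
  }
  return targets;
}

console.log(planCopyTargets("/tmp/out/cover.png", 1)); // [ '/tmp/out/cover.png' ]
console.log(planCopyTargets("/tmp/out/cover.png", 2)); // [ '/tmp/out/cover-1.png', '/tmp/out/cover-2.png' ]
console.log(planCopyTargets("/tmp/out", 1));           // [ '/tmp/out/codex-image.png' ]
```

Because matches are sorted by `mtimeMs` ascending before copying, `-1` is the oldest PNG in the `--since` window.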

diff --git a/plugins/codex/skills/image/SKILL.md b/plugins/codex/skills/image/SKILL.md
@@ -0,0 +1,70 @@
+---
+name: image
+description: Internal guidance for drafting craft-grade image prompts that Codex will pass to its native image generation tool inside the Codex Claude Code plugin
+user-invocable: false
+---
+
+# Image Prompting
+
+Use this skill only inside the `codex:codex-image` subagent.
+
+Modern frontier image models (GPT Image 2 and successors) plan, reference, critique, and iterate before rendering. Treat the prompt as context, not a description. Diffusion-era prompt habits leave most of the model's capability unused.
+
+Codex has a stable built-in `image_generation` feature. The subagent does not need to write a script or call any external API — it just hands a craft-grade prompt to Codex with a `task` instruction telling Codex to use its native image tool.
+
+## The six rules (community-tested in the first thirty days post-launch)
+
+1. **Lead with style and intended use.** The first words carry the highest visual weight. Open with the medium and aesthetic — "Premium editorial magazine cover...", "High-fidelity iOS UI screenshot...", "Photoreal editorial food photograph, shot on a Leica Q3 full-frame..." — before naming the subject.
+2. **Quote every literal string.** Anything that must appear in the rendered image — labels, taglines, button copy, dates, file paths, handles, captions, all of it — goes inside double quotes inside the prompt. Quoting engages the high-accuracy text rendering path. Typography drifts when you do not.
+3. **Treat the prompt as context.** Pack palette hex values, brand rules, anti-patterns, polish details, and named font families into the prompt. The model reasons over them.
+4. **Aspect ratio = explicit pixel dimensions.** End every prompt with a literal line like `Output in exactly 1536px x 1024px (3:2 ratio) landscape format.` Do not rely on a bare ratio string. Map the user's intent or supplied ratio into pixel dimensions before sending.
+5. **Constraints block is mandatory.** A dedicated paragraph of what NOT to do — typically as long as the subject section. The most underused part of an image prompt.
+6. **Generate fresh, do not edit.** Image-to-image is still unreliable. If the user pastes a reference image, extract its qualities into words and regenerate from text only. Tell Codex explicitly to generate fresh, not to use a previous image as a starting point.
+
+## Crafting checklist
+
+Build the inner image prompt in this exact order. Every section is mandatory unless flagged optional.
+
+1. **Style + intended use.** Open with the medium and aesthetic. For photoreal work, name the camera, lens, film stock, and lighting condition — specificity is realism.
+2. **Scene.** Where, when, lighting, mood, weather, time of day. One paragraph.
+3. **Subject.** The focal point. Pose, action, expression, materials. For people, lock in consistent traits (hair, build, age, distinguishing features).
+4. **Details.** Background, props, micro-details. For photoreal work, include a believable-imperfections list (a stray seed, a juice bead on a thumbnail, a paper-cut on the index finger). Imperfection is the difference between AI-photo and editorial-photo.
+5. **Quoted text.** Every literal string in the image, in double quotes, with exact punctuation, spacing, and casing. Be obsessive — `"Noon & Co."` not `Noon and Co`.
+6. **Constraints.** A dedicated block of what NOT to do. Typical entries: no drop shadows, no fake bokeh, no glare, no lens flare; no emoji, no SF Symbols, no Apple defaults; five fingers per hand, correct knuckle spacing, no fused anatomy; two type families only — name them; no QR codes, no URLs, no hashtags; no additional text beyond what is quoted.
+7. **Output dimensions.** Final line, always. Format: `Output in exactly [W]px x [H]px ([ratio]) [orientation].`
+
+## Output dimension defaults
+
+When the user does not provide dimensions, infer from intent:
+
+| Intent signal | Pixel dimensions | Ratio | Orientation |
+|---|---|---|---|
+| Generic / ad / hero | `1536px x 1024px` | 3:2 | landscape |
+| Square social card | `1024px x 1024px` | 1:1 | square |
+| Wide social card | `1792px x 1024px` | 7:4 | landscape |
+| Portrait phone screen | `1024px x 1792px` | 4:7 | portrait |
+| Magazine cover | `1024px x 1280px` | 4:5 | portrait |
+| Presentation slide | `1536px x 1024px` | 3:2 | landscape |
+| App icon | `1024px x 1024px` | 1:1 | square |
+
+State the targeted dimensions inside the prompt body itself. Codex's image tool reads the prompt and sizes accordingly.
+
+## Wrapping for Codex
+
+The drafted image prompt is the inner content. The subagent wraps it in a `<task>` block (per the `gpt-5-4-prompting` skill) instructing Codex to:
+
+- Use its native image generation tool.
+- Pass the inner `<image_prompt>` verbatim — no paraphrasing, no shortening, no "improvement."
+- Save the resulting PNG. Codex does not need to print the saved path; the subagent discovers it with `latest-images` after the turn ends.
+- Leave `--out <path>` handling to the subagent: `latest-images --copy-to <path>` copies the saved PNG (creating the directory if needed) and reports the copied path.
+- Generate fresh — do not use any prior image as a reference or seed.
+
+Codex's image tool handles the API call and file save; the subagent reports the saved path via `latest-images`. The subagent does not write or run any image-generation code itself.
+
+## What you are NOT doing
+
+- Not writing a script that calls an external image API. Codex's native tool handles it.
+- Not running discovery interviews. The slash command may have asked once. The subagent commits to a craft-grade prompt from whatever intent it received.
+- Not summarizing the prompt back. The subagent's only output is Codex's stdout.
+- Not editing the prompt after Codex returns. The prompt is the artifact.
+- Not chaining into other commands. This skill scopes a single forwarded `task` call, followed only by the `latest-images` path report.
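The output-dimension defaults in `SKILL.md` can be read as a lookup from ratio to the mandatory final prompt line. This is a minimal sketch, not code from this PR: `DIMENSION_DEFAULTS` and `outputLine` are hypothetical names, the pixel values are copied from the defaults table, and the line format follows the example in rule 4. How to size ratios outside the table is not specified in the source, so the sketch returns `null` for them.

```javascript
// Sketch only: DIMENSION_DEFAULTS and outputLine are hypothetical names.
// Pixel values are copied from the "Output dimension defaults" table in SKILL.md.
const DIMENSION_DEFAULTS = {
  "3:2": { width: 1536, height: 1024, orientation: "landscape" },
  "1:1": { width: 1024, height: 1024, orientation: "square" },
  "7:4": { width: 1792, height: 1024, orientation: "landscape" },
  "4:7": { width: 1024, height: 1792, orientation: "portrait" },
  "4:5": { width: 1024, height: 1280, orientation: "portrait" }
};

// Build the final line of the image prompt for a given ratio.
// With no ratio, use the 3:2 landscape default named in the agent file.
// Ratios outside the table return null; the source does not say how to size them.
function outputLine(ratio = "3:2") {
  if (!Object.hasOwn(DIMENSION_DEFAULTS, ratio)) return null;
  const { width, height, orientation } = DIMENSION_DEFAULTS[ratio];
  return `Output in exactly ${width}px x ${height}px (${ratio} ratio) ${orientation} format.`;
}

console.log(outputLine());
// Output in exactly 1536px x 1024px (3:2 ratio) landscape format.
console.log(outputLine("4:5"));
// Output in exactly 1024px x 1280px (4:5 ratio) portrait format.
```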