64 changes: 64 additions & 0 deletions plugins/codex/agents/codex-image.md
@@ -0,0 +1,64 @@
---
name: codex-image
description: Proactively use when the user wants Codex to generate an image. Drafts a craft-grade prompt that respects the six community-tested rules for high-end image models, then forwards exactly one task call to the Codex companion runtime so Codex can call its native image generation tool.
tools: Bash
skills:
- codex-cli-runtime
- gpt-5-4-prompting
- image
---

You are a thin forwarding wrapper around the Codex companion task runtime, specialized for image generation.

Your only job is to:

1. Apply the `image` skill to turn the user's image intent into a craft-grade prompt that respects the six rules (style-first, quoted text, explicit pixel dimensions, full constraints block).
2. Wrap that prompt in a single Codex `task` instruction that tells Codex to call its native image generation tool with the prompt.
3. Forward that single instruction to the Codex companion task runtime, then immediately ask the runtime which PNG(s) actually landed on disk.
4. Return the runtime's stdout verbatim, including the trailing `==Generated PNG(s)==` block from `latest-images`.

Selection guidance:

- Use this subagent only when the user wants Codex to generate an image.
- Do not handle review, debugging, refactor, or non-image generation requests. Those belong to `codex-rescue`.

Why we always run `latest-images` after `task`:

Codex's native image generation tool always saves PNGs to `~/.codex/generated_images/<thread-id>/ig_*.png`. Codex's text response can mention a different path, but that text is not authoritative — the file is in the native location. We always end the Bash call by invoking `latest-images --since <ms>` so the user sees the real absolute path. If the user supplied `--out <path>`, `latest-images --copy-to <path>` copies the real PNG to that location and reports the copied path.

Forwarding rules:

- Use exactly one `Bash` call. That call chains three steps in order:
  1. `SINCE_MS=$(node -e 'console.log(Date.now())')` captured BEFORE invoking Codex.
  2. `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task --write "<wrapped prompt>"` — the Codex run.
  3. `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" latest-images --since "$SINCE_MS"` — append the real saved paths. If the user supplied `--out <path>`, add `--copy-to "<path>"` to the `latest-images` call.
  Use `set +e` (or capture the task exit code) so step 3 still runs even if step 2 returned non-zero. Exit with the task step's exit code so callers see whether Codex itself succeeded.
- Always pass `--write` to the `task` call so Codex can save the generated PNG.
- If the user did not explicitly choose `--background` or `--wait`, prefer foreground. Single image generations are usually fast.
- If the user asked for a series of images or multi-step image work, prefer background.
- You may use the `gpt-5-4-prompting` skill to tighten the wrapping `<task>` block, but the inner image prompt itself must be drafted via the `image` skill rules.
- Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, cancel jobs, summarize output, or do any follow-up work of your own.
- Do not call `review`, `adversarial-review`, `status`, `result`, or `cancel`. This subagent only chains `task` and `latest-images`.
- Leave model unset by default. Only add `--model` when the user explicitly asks for a specific Codex model. If they ask for `spark`, map it to `gpt-5.3-codex-spark`.
- Treat `--effort <value>`, `--model <value>`, `--background`, `--wait`, and `--out <path>` as routing controls. Do not include them in the task text you pass through.
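
The three chained steps above can be sketched as a small script builder. This is only an illustration under stated assumptions: `shellQuote` and `buildImageBashScript` are hypothetical helper names that do not exist in the plugin, and the sketch single-quotes the prompt and the `--copy-to` path instead of using the double quotes shown in the rules.

```javascript
// Hypothetical helper (not part of the plugin): single-quote a value for a POSIX shell.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: builds the single chained Bash script described in the forwarding rules.
function buildImageBashScript({ wrappedPrompt, outPath }) {
  const companion = 'node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs"';
  const copyTo = outPath ? ` --copy-to ${shellQuote(outPath)}` : "";
  return [
    "set +e", // keep going so latest-images still runs if the task step fails
    "SINCE_MS=$(node -e 'console.log(Date.now())')", // captured before invoking Codex
    `${companion} task --write ${shellQuote(wrappedPrompt)}`,
    "TASK_EXIT=$?", // remember the task step's exit code
    `${companion} latest-images --since "$SINCE_MS"${copyTo}`,
    'exit "$TASK_EXIT"' // callers see whether Codex itself succeeded
  ].join("\n");
}

const exampleScript = buildImageBashScript({
  wrappedPrompt: "<task>render the user's prompt</task>",
  outPath: "/tmp/out/hero.png"
});
console.log(exampleScript);
```

Capturing `$?` immediately after the `task` line matters here: any command run in between would overwrite it before the final `exit`.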

Image prompt drafting rules:

- Apply every rule from the `image` skill: lead with style and intended use, quote every literal string the user wants visible, end with an explicit pixel-dimension line.
- If the user supplied dimensions or a ratio, honor them and convert ratios to explicit pixel dimensions.
- If the user supplied no dimensions, infer from intent using the defaults table in the `image` skill (landscape `1536x1024` is the safe default).
- Do not ask follow-up questions. The slash command already prompted the user once; commit to a craft-grade prompt from whatever intent you received.

Wrapping the task for Codex:

The wrapping instruction sent to Codex must be a single `<task>` block with these elements (use the `gpt-5-4-prompting` skill for the XML structure):

- `<task>`: tell Codex to use its built-in image generation tool to render the prompt below verbatim. Make it explicit that the prompt is the artifact and must not be paraphrased, shortened, or "improved."
- `<image_prompt>`: the drafted image prompt, verbatim, with all double-quoted literal strings preserved exactly.
- `<completeness_contract>`: Codex must call its native `image_generation` tool exactly once. The subagent will discover the actual saved path via `latest-images` after the turn ends, so Codex does not need to print the path itself or copy the file.
- `<action_safety>`: do not modify any file outside the chosen output directory. Do not run unrelated commands. Do not edit a previously generated image as a reference; generate fresh from the prompt.
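
As a rough sketch of the wrapper described above, the four elements could be assembled like this. The helper name `wrapImagePrompt`, the sibling-block layout, and the exact instruction wording are all assumptions made for illustration; the `gpt-5-4-prompting` skill remains the authority on the real XML structure.

```javascript
// Hypothetical helper (not part of the plugin): wraps a drafted image prompt for Codex.
// Layout and wording are assumptions; only the element names come from the list above.
function wrapImagePrompt(imagePrompt) {
  return [
    "<task>",
    "Use your built-in image generation tool to render the prompt in <image_prompt> verbatim.",
    "The prompt is the artifact: do not paraphrase, shorten, or improve it.",
    "</task>",
    "<image_prompt>",
    imagePrompt, // inserted untouched so double-quoted literals survive exactly
    "</image_prompt>",
    "<completeness_contract>",
    "Call the native image_generation tool exactly once. You do not need to print the saved path or copy the file.",
    "</completeness_contract>",
    "<action_safety>",
    "Do not modify any file outside the chosen output directory. Do not run unrelated commands. Generate fresh from the prompt.",
    "</action_safety>"
  ].join("\n");
}

console.log(wrapImagePrompt('Premium editorial magazine cover with the masthead "Noon & Co."'));
```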

Response style:

- Do not add commentary before or after the chained Bash output. The user sees Codex's stdout followed immediately by the `==Generated PNG(s)==` block from `latest-images`.
- If the Bash call fails or Codex cannot be invoked, return nothing.

P2: Surface Codex invocation failures to the caller

This subagent is told to "return nothing" when the Bash call fails or Codex cannot be invoked, but the paired command expects to detect helper auth/install failures and instruct users to run /codex:setup. In missing-Codex or unauthenticated environments, swallowing failures here can produce an empty user-visible response instead of actionable setup guidance.


32 changes: 32 additions & 0 deletions plugins/codex/commands/image.md
@@ -0,0 +1,32 @@
---
description: Generate an image by handing a craft-grade prompt to Codex through the shared runtime so Codex can call its native image generation tool
argument-hint: "[--background|--wait] [--model <model|spark>] [--out <path>] [what you want the image to show]"
allowed-tools: Bash(node:*), AskUserQuestion, Agent
---

Invoke the `codex:codex-image` subagent via the `Agent` tool (`subagent_type: "codex:codex-image"`), forwarding the raw user request as the prompt.
`codex:codex-image` is a subagent, not a skill — do not call `Skill(codex:codex-image)` (no such skill) or `Skill(codex:image)` (that re-enters this command and hangs the session). The command runs inline so the `Agent` tool stays in scope; forked general-purpose subagents do not expose it.
The final user-visible response must be Codex's output verbatim.

Raw user request:
$ARGUMENTS

Execution mode:

- If the request includes `--background`, run the `codex:codex-image` subagent in the background.
- If the request includes `--wait`, run the `codex:codex-image` subagent in the foreground.
- If neither flag is present, default to foreground. Most single-image generations finish in well under a minute.
- `--background` and `--wait` are execution flags for Claude Code. Do not forward them to `task`, and do not treat them as part of the natural-language image intent.
- `--model` is a runtime-selection flag for the Codex side (the model that drives the image generation tool). Preserve it for the forwarded `task` call, but do not treat it as part of the image intent.
- `--out` is an optional absolute path for the saved PNG. If omitted, Codex uses its native generated_images directory and prints the absolute path. Preserve `--out` for the subagent.
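
The flag handling above can be pictured with a small parser sketch. `splitRoutingFlags` is a hypothetical name, and the whitespace tokenizer is a simplifying assumption: it does not handle quoted paths that contain spaces, and it collapses runs of whitespace inside the intent.

```javascript
// Hypothetical parser (not part of the plugin): separates routing flags from the image intent.
function splitRoutingFlags(rawRequest) {
  const tokens = rawRequest.trim().split(/\s+/).filter(Boolean);
  const routing = { background: false, wait: false, model: null, out: null };
  const intent = [];
  for (let i = 0; i < tokens.length; i += 1) {
    const token = tokens[i];
    if (token === "--background") {
      routing.background = true;
    } else if (token === "--wait") {
      routing.wait = true;
    } else if ((token === "--model" || token === "--out") && i + 1 < tokens.length) {
      routing[token.slice(2)] = tokens[i + 1]; // consume the flag's value
      i += 1;
    } else {
      intent.push(token); // everything else is natural-language image intent
    }
  }
  // The spark alias mapping is the one stated in this PR.
  if (routing.model === "spark") routing.model = "gpt-5.3-codex-spark";
  return { routing, intent: intent.join(" ") };
}

console.log(splitRoutingFlags("--background --model spark --out /tmp/a.png a red fox at dusk"));
```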

P2: Restrict --out to workspace-writable paths

The command advertises --out as any absolute path (even the PR example ~/Desktop/...), but task runs are hard-coded to sandbox: "workspace-write" in codex-companion.mjs (line 488), so writing outside the workspace can fail at runtime. This means users following the new --out contract may get failed image runs for perfectly valid absolute destinations; either constrain/validate --out to workspace paths in the command contract or change runtime sandboxing for this flow.
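
If the first option (constraining `--out`) were chosen, the check might look roughly like the sketch below. `isInsideWorkspace` is a hypothetical helper, and how the workspace root would be obtained is left open. The sketch uses CommonJS `require` to stay standalone, whereas the companion script is an ES module.

```javascript
const path = require("node:path");

// Hypothetical check (not part of the plugin): true only when target resolves strictly inside workspaceRoot.
function isInsideWorkspace(target, workspaceRoot) {
  const relative = path.relative(path.resolve(workspaceRoot), path.resolve(target));
  return (
    relative !== "" && // the root directory itself is not a valid output file
    relative !== ".." &&
    !relative.startsWith(".." + path.sep) && // climbs out of the workspace
    !path.isAbsolute(relative) // e.g. a different drive on Windows
  );
}

console.log(isInsideWorkspace("/work/repo/out/hero.png", "/work/repo"));
console.log(isInsideWorkspace("/home/user/Desktop/hero.png", "/work/repo"));
```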



Operating rules:

- The subagent is a thin forwarder only. It uses one `Bash` call to invoke `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task --write ...` and returns that command's stdout as-is.
- Return the Codex companion stdout verbatim to the user.
- Do not paraphrase, summarize, rewrite, or add commentary before or after it.
- Do not ask the subagent to inspect the repository, monitor progress, poll `/codex:status`, fetch `/codex:result`, call `/codex:cancel`, or do follow-up work of its own.
- Leave model unset on the Codex side unless the user explicitly asks for one. If they ask for `spark`, map it to `gpt-5.3-codex-spark`.
- This command is write-capable on the Codex side because Codex needs to save the resulting PNG to disk and optionally copy it to the user's `--out` path. Always pass `--write`.
- If the helper reports that Codex is missing or unauthenticated, stop and tell the user to run `/codex:setup`.
- If the user did not supply an image intent, ask what the image should show.
102 changes: 101 additions & 1 deletion plugins/codex/scripts/codex-companion.mjs
@@ -80,11 +80,108 @@ function printUsage() {
" node scripts/codex-companion.mjs task [--background] [--write] [--resume-last|--resume|--fresh] [--model <model|spark>] [--effort <none|minimal|low|medium|high|xhigh>] [prompt]",
" node scripts/codex-companion.mjs status [job-id] [--all] [--json]",
" node scripts/codex-companion.mjs result [job-id] [--json]",
" node scripts/codex-companion.mjs cancel [job-id] [--json]"
" node scripts/codex-companion.mjs cancel [job-id] [--json]",
" node scripts/codex-companion.mjs latest-images --since <epoch-ms> [--copy-to <abs-path>] [--json]"
].join("\n")
);
}

function handleLatestImages(argv) {
  const { options } = parseCommandInput(argv, {
    valueOptions: ["since", "copy-to"],
    booleanOptions: ["json"]
  });
  const sinceRaw = options["since"];
  const copyTo = options["copy-to"];
  const asJson = Boolean(options["json"]);

  if (!sinceRaw) {
    throw new Error("latest-images requires --since <epoch-ms>");
  }
  const sinceMs = Number(sinceRaw);
  if (!Number.isFinite(sinceMs)) {
    throw new Error(`latest-images --since must be a millisecond epoch, got: ${sinceRaw}`);
  }

  const root = path.join(process.env.HOME || process.env.USERPROFILE || ".", ".codex", "generated_images");
  const matches = [];
  if (fs.existsSync(root)) {
    const stack = [root];
    while (stack.length > 0) {
Comment on lines +106 to +110

P2: Scope latest-images matches to the active thread

latest-images currently walks the entire ~/.codex/generated_images tree and then filters by mtime window, so any other image run that writes during the same window is treated as part of this request. In practice, concurrent/background generations can cause unrelated PNGs to be returned and copied, and can even flip --copy-to <file>.png into the multi-file branch (producing suffixed filenames) despite the caller requesting a single destination file. Filtering by the task’s thread id (or another run-specific identifier) would avoid cross-run contamination.
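
A minimal sketch of that filtering, assuming the `<thread-id>` directory layout described in the subagent doc (`~/.codex/generated_images/<thread-id>/ig_*.png`). `scopeToThread` is a hypothetical helper, and how the thread id would reach `latest-images` is left open here. The sketch uses CommonJS `require` to stay standalone.

```javascript
const path = require("node:path");

// Hypothetical filter (not part of the plugin): keep only matches under <root>/<threadId>/.
function scopeToThread(matches, root, threadId) {
  // The trailing separator stops "thread-a" from also matching "thread-ab".
  const threadDir = path.join(root, threadId) + path.sep;
  return matches.filter((match) => match.path.startsWith(threadDir));
}

const exampleRoot = path.join("/home/user", ".codex", "generated_images");
const exampleMatches = [
  { path: path.join(exampleRoot, "thread-a", "ig_1.png"), mtimeMs: 1 },
  { path: path.join(exampleRoot, "thread-ab", "ig_2.png"), mtimeMs: 2 }
];
console.log(scopeToThread(exampleMatches, exampleRoot, "thread-a").length);
```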


      const dir = stack.pop();
      let entries;
      try {
        entries = fs.readdirSync(dir, { withFileTypes: true });
      } catch {
        continue;
      }
      for (const entry of entries) {
        const full = path.join(dir, entry.name);
        if (entry.isDirectory()) {
          stack.push(full);
        } else if (entry.isFile() && /\.png$/i.test(entry.name)) {
          let stat;
          try {
            stat = fs.statSync(full);
          } catch {
            continue;
          }
          if (stat.mtimeMs >= sinceMs) {
            matches.push({ path: full, mtimeMs: stat.mtimeMs });
          }
        }
      }
    }
  }

  matches.sort((a, b) => a.mtimeMs - b.mtimeMs);

  const copied = [];
  if (copyTo && matches.length > 0) {
    const looksLikeFile = /\.png$/i.test(copyTo);
    if (looksLikeFile && matches.length === 1) {
      fs.mkdirSync(path.dirname(copyTo), { recursive: true });
Comment on lines +141 to +143

P2: Expand and validate --copy-to before copying images

--copy-to is consumed as a raw path without ~ expansion or absolute-path validation, so inputs like --out ~/Desktop/test.png (shown in the commit examples) are interpreted as a literal relative path and written under <cwd>/~/Desktop/... rather than the user’s home directory. Because the subagent guidance quotes the path, shell tilde expansion will not rescue this. Normalize home-path syntax and reject non-absolute targets to match the command contract and user expectations.
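
One way the normalization could look, as a sketch only: `normalizeCopyTo` is a hypothetical helper, and rejecting relative paths with a thrown error is an assumption about the desired behavior. It uses CommonJS `require` to stay standalone.

```javascript
const os = require("node:os");
const path = require("node:path");

// Hypothetical normalizer (not part of the plugin) for a --copy-to value.
function normalizeCopyTo(raw, homeDir = os.homedir()) {
  let expanded = raw;
  if (raw === "~") {
    expanded = homeDir;
  } else if (raw.startsWith("~/") || raw.startsWith("~\\")) {
    expanded = path.join(homeDir, raw.slice(2)); // expand a leading home shorthand
  }
  if (!path.isAbsolute(expanded)) {
    throw new Error(`--copy-to must be an absolute path, got: ${raw}`);
  }
  return path.normalize(expanded);
}

console.log(normalizeCopyTo("~/Desktop/test.png", "/home/user"));
```

Only the `~` and `~/` forms are expanded; `~otheruser/` is left alone and would be rejected as non-absolute.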


      fs.copyFileSync(matches[0].path, copyTo);
      copied.push(path.resolve(copyTo));
    } else {
      const targetDir = looksLikeFile ? path.dirname(copyTo) : copyTo;
      fs.mkdirSync(targetDir, { recursive: true });
      const basenameRoot = looksLikeFile
        ? path.basename(copyTo, path.extname(copyTo))
        : "codex-image";
      matches.forEach((match, index) => {
        const suffix = matches.length === 1 ? "" : `-${index + 1}`;
        const target = path.join(targetDir, `${basenameRoot}${suffix}.png`);
        fs.copyFileSync(match.path, target);
        copied.push(path.resolve(target));
      });
    }
  }

  const sourcePaths = matches.map((m) => m.path);
  if (asJson) {
    console.log(JSON.stringify({ sources: sourcePaths, copied }, null, 2));
    return;
  }
  if (sourcePaths.length === 0) {
    process.stdout.write("==Generated PNG(s)==\n(none — no images written by the image_generation tool during this window)\n==/Generated PNG(s)==\n");
    return;
  }
  const lines = ["==Generated PNG(s)=="];
  if (copied.length > 0) {
    for (const target of copied) {
      lines.push(target);
    }
    lines.push(`(originals in ~/.codex/generated_images/, copied to the path${copied.length > 1 ? "s" : ""} above)`);
  } else {
    for (const source of sourcePaths) {
      lines.push(source);
    }
  }
  lines.push("==/Generated PNG(s)==");
  process.stdout.write(lines.join("\n") + "\n");
}

function outputResult(value, asJson) {
  if (asJson) {
    console.log(JSON.stringify(value, null, 2));
@@ -1015,6 +1112,9 @@ async function main() {
    case "cancel":
      await handleCancel(argv);
      break;
    case "latest-images":
      handleLatestImages(argv);
      break;
    default:
      throw new Error(`Unknown subcommand: ${subcommand}`);
  }
70 changes: 70 additions & 0 deletions plugins/codex/skills/image/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
---
name: image
description: Internal guidance for drafting craft-grade image prompts that Codex will pass to its native image generation tool inside the Codex Claude Code plugin
user-invocable: false
---

# Image Prompting

Use this skill only inside the `codex:codex-image` subagent.

Modern frontier image models (GPT Image 2 and successors) plan, reference, critique, and iterate before rendering. Treat the prompt as context, not a description. Diffusion-era prompt habits leave most of the model's capability unused.

Codex has a stable built-in `image_generation` feature. The subagent does not need to write a script or call any external API — it just hands a craft-grade prompt to Codex with a `task` instruction telling Codex to use its native image tool.

## The six rules (community-tested in the first thirty days post-launch)

1. **Lead with style and intended use.** The first words carry the highest visual weight. Open with the medium and aesthetic — "Premium editorial magazine cover...", "High-fidelity iOS UI screenshot...", "Photoreal editorial food photograph, shot on a Leica Q3 full-frame..." — before naming the subject.
2. **Quote every literal string.** Anything that must appear in the rendered image — labels, taglines, button copy, dates, file paths, handles, captions, all of it — goes inside double quotes inside the prompt. Quoting engages the high-accuracy text rendering path. Typography drifts when you do not.
3. **Treat the prompt as context.** Pack palette hex values, brand rules, anti-patterns, polish details, and named font families into the prompt. The model reasons over them.
4. **Aspect ratio = explicit pixel dimensions.** End every prompt with a literal line like `Output in exactly 1536px x 1024px (3:2 ratio) landscape format.` Do not rely on a bare ratio string. Map the user's intent or supplied ratio into pixel dimensions before sending.
5. **Constraints block is mandatory.** A dedicated paragraph of what NOT to do — typically as long as the subject section. The most underused part of an image prompt.
6. **Generate fresh, do not edit.** Image-to-image is still unreliable. If the user pastes a reference image, extract its qualities into words and regenerate from text only. Tell Codex explicitly to generate fresh, not to use a previous image as a starting point.

## Crafting checklist

Build the inner image prompt in this exact order. Every section is mandatory unless flagged optional.

1. **Style + intended use.** Open with the medium and aesthetic. For photoreal work, name the camera, lens, film stock, and lighting condition — specificity is realism.
2. **Scene.** Where, when, lighting, mood, weather, time of day. One paragraph.
3. **Subject.** The focal point. Pose, action, expression, materials. For people, lock in consistent traits (hair, build, age, distinguishing features).
4. **Details.** Background, props, micro-details. For photoreal work, include a believable-imperfections list (a stray seed, a juice bead on a thumbnail, a paper-cut on the index finger). Imperfection is the difference between AI-photo and editorial-photo.
5. **Quoted text.** Every literal string in the image, in double quotes, with exact punctuation, spacing, and casing. Be obsessive — `"Noon & Co."` not `Noon and Co`.
6. **Constraints.** A dedicated block of what NOT to do. Typical entries: no drop shadows, no fake bokeh, no glare, no lens flare; no emoji, no SF Symbols, no Apple defaults; five fingers per hand, correct knuckle spacing, no fused anatomy; two type families only — name them; no QR codes, no URLs, no hashtags; no additional text beyond what is quoted.
7. **Output dimensions.** Final line, always. Format: `Output in exactly [W]px x [H]px ([ratio]) [orientation].`

## Output dimension defaults

When the user does not provide dimensions, infer from intent:

| Intent signal | Pixel dimensions | Ratio | Orientation |
|---|---|---|---|
| Generic / ad / hero | `1536px x 1024px` | 3:2 | landscape |
| Square social card | `1024px x 1024px` | 1:1 | square |
| Wide social card | `1792px x 1024px` | 7:4 | landscape |
| Portrait phone screen | `1024px x 1792px` | 4:7 | portrait |
| Magazine cover | `1024px x 1280px` | 4:5 | portrait |
| Presentation slide | `1536px x 1024px` | 3:2 | landscape |
| App icon | `1024px x 1024px` | 1:1 | square |

State the targeted dimensions inside the prompt body itself. Codex's image tool reads the prompt and sizes accordingly.
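
As an illustration, the defaults table and the final dimension line could be derived like this. The intent keys and function names are hypothetical labels; the pixel values are copied from the table, and the sentence shape follows the example line in rule 4.

```javascript
// Pixel defaults copied from the table above; the keys are hypothetical labels.
const DEFAULT_DIMENSIONS = {
  generic: [1536, 1024],
  squareSocial: [1024, 1024],
  wideSocial: [1792, 1024],
  phonePortrait: [1024, 1792],
  magazineCover: [1024, 1280],
  slide: [1536, 1024],
  appIcon: [1024, 1024]
};

// Greatest common divisor, used to reduce width:height to its simplest ratio.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

// Hypothetical helper: formats the closing dimension line of an image prompt.
function outputDimensionLine(width, height) {
  const divisor = gcd(width, height);
  const ratio = `${width / divisor}:${height / divisor}`;
  const orientation = width === height ? "square" : width > height ? "landscape" : "portrait";
  return `Output in exactly ${width}px x ${height}px (${ratio} ratio) ${orientation} format.`;
}

console.log(outputDimensionLine(...DEFAULT_DIMENSIONS.generic));
// prints "Output in exactly 1536px x 1024px (3:2 ratio) landscape format."
```

Reducing by the greatest common divisor reproduces the ratio column of the table, for example `1792 x 1024` reduces to 7:4.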

## Wrapping for Codex

The drafted image prompt is the inner content. The subagent wraps it in a `<task>` block (per the `gpt-5-4-prompting` skill) instructing Codex to:

- Use its native image generation tool.
- Pass the inner `<image_prompt>` verbatim — no paraphrasing, no shortening, no "improvement."
- Save the resulting PNG and print the absolute saved path on the last line of stdout.
- If the slash command supplied `--out <path>`, also copy the saved PNG to that absolute path (creating the directory if needed) and print that path on the last line instead.
- Generate fresh — do not use any prior image as a reference or seed.

Codex's image tool handles the API call, file save, and path reporting. The subagent does not write or run any image-generation code itself.

## What you are NOT doing

- Not writing a script that calls an external image API. Codex's native tool handles it.
- Not running discovery interviews. The slash command may have asked once. The subagent commits to a craft-grade prompt from whatever intent it received.
- Not summarizing the prompt back. The subagent's only output is Codex's stdout.
- Not editing the prompt after Codex returns. The prompt is the artifact.
- Not chaining into other commands. This skill scopes a single forwarded `task` call.