Merged
22 changes: 22 additions & 0 deletions lib/chat/tools/__tests__/createPromptSandboxStreamingTool.test.ts
@@ -239,6 +239,28 @@ describe("createPromptSandboxStreamingTool", () => {
});
});

describe("description explains runId background task behavior", () => {
it("tells LLM not to interpret empty output when runId is present", () => {
const tool = createPromptSandboxStreamingTool("acc_1", "key_1");

expect(tool.description).toContain("runId");
expect(tool.description).toContain("background task");
expect(tool.description).toContain("do NOT");
});

it("tells LLM not to poll task status", () => {
const tool = createPromptSandboxStreamingTool("acc_1", "key_1");

expect(tool.description).toContain("Do NOT automatically poll");
});

it("tells LLM to let user know they can ask for status", () => {
const tool = createPromptSandboxStreamingTool("acc_1", "key_1");

expect(tool.description).toContain("ask you to check");
});
});

describe("description mentions release management", () => {
it("includes release management as a primary use case", () => {
const tool = createPromptSandboxStreamingTool("acc_1", "key_1");
7 changes: 6 additions & 1 deletion lib/chat/tools/createPromptSandboxStreamingTool.ts
@@ -52,7 +52,12 @@ export function createPromptSandboxStreamingTool(
"file operations, data analysis, content generation, and any multi-step task. " +
"The sandbox has skills for managing RELEASE.md documents, generating deliverables, and more. " +
"Reuses the account's existing running sandbox or creates one from the latest snapshot. " +
"Streams output in real-time.",
"Streams output in real-time. " +
"IMPORTANT: When the result contains a `runId`, it means the sandbox is being set up for the first time " +
"and the command was dispatched to a background task. The output will be empty because the task is still running. " +
"The UI automatically shows a live progress view for background tasks — do NOT summarize or interpret the empty output. " +
"Simply tell the user their request is being processed in the sandbox and the results will appear in the task progress view above. " +
Comment on lines +56 to +59

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for UI components that handle runId or task progress views

echo "=== Searching for runId handling in UI/app components ==="
# ripgrep's built-in `ts` type already matches *.tsx; `tsx` is not a defined type on its own
rg -n 'runId' --type=ts | grep -E '(app/|components/)' | head -20

echo ""
echo "=== Searching for task progress view components ==="
rg -n -i 'progress.*view|task.*progress|background.*task' --type=ts | grep -E '(app/|components/)' | head -20

echo ""
echo "=== Searching for Trigger.dev run subscription patterns ==="
rg -n 'useRealtimeRun|subscribeToRun|task.*status' --type=ts | head -20

echo ""
echo "=== Checking app directory structure ==="
# Group the -name tests so -type f applies to both patterns
find app -type f \( -name "*.ts" -o -name "*.tsx" \) | head -20

Repository: recoupable/api

Length of output: 975


🏁 Script executed:

#!/bin/bash
# Search for UI components in other common locations

echo "=== Checking for components directory ==="
find . -type d -name "components" | head -10

echo ""
echo "=== Checking for pages directory ==="
find . -type d -name "pages" | head -10

echo ""
echo "=== Search for runId in all TypeScript/JavaScript files ==="
rg 'runId' --type=ts | head -30

echo ""
echo "=== Check createPromptSandboxStreamingTool.ts for imports and context ==="
head -100 lib/chat/tools/createPromptSandboxStreamingTool.ts | grep -E '^import|^export|runId|progress'

echo ""
echo "=== Verify: does promptSandboxStreaming return runId? ==="
rg -A10 'return.*runId|runId.*:' lib/ | head -40

Repository: recoupable/api

Length of output: 6782


Remove or revise the inaccurate claim about the UI progress view.

The description references "The UI automatically shows a live progress view for background tasks," but this repository is a backend API-only codebase with no frontend UI components. The claim will mislead the LLM and ultimately confuse users about where results will appear.

Revise to:

"IMPORTANT: When the result contains a `runId`, it means the sandbox is being set up for the first time " +
"and the command was dispatched to a background task. The output will be empty because the task is still running, " +
"so do NOT summarize or interpret the empty output. " +
"Simply tell the user their request is being processed in the sandbox and the results will appear once the task completes. " +

The runId behavior itself is correctly documented based on the backend implementation, but remove references to UI components that don't exist in this codebase.
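
When rewording the description, note that the tests added in this PR assert exact substrings, and substring matching is case-sensitive. The sketch below is a rough check, not code from the repository: `missingPhrases` is a hypothetical helper, and the two candidate strings are shortened stand-ins for the real description. It suggests that starting a sentence with "Do NOT" would likely drop the lowercase `do NOT` phrase the first test expects, while a mid-sentence "do NOT" keeps it.

```typescript
// Substrings asserted by createPromptSandboxStreamingTool.test.ts in this PR.
const requiredPhrases: readonly string[] = [
  "runId",
  "background task",
  "do NOT",
  "Do NOT automatically poll",
  "ask you to check",
];

// Hypothetical helper (not part of the repo): lists the required phrases
// that a candidate description does not contain. String.prototype.includes
// is case-sensitive, like a substring assertion on a string.
function missingPhrases(description: string): string[] {
  return requiredPhrases.filter((phrase) => !description.includes(phrase));
}

// Shortened stand-in for the polling sentence (assumption, not the real text).
const pollingNote =
  "Do NOT automatically poll the task status. The user can ask you to check on it.";

// Candidate A: the instruction starts its own sentence, so it is capitalized.
const sentenceInitial =
  "When the result contains a `runId`, the command was dispatched to a background task. " +
  "Do NOT summarize the empty output. " +
  pollingNote;

// Candidate B: the instruction continues a sentence, so "do" stays lowercase.
const midSentence =
  "When the result contains a `runId`, the command was dispatched to a background task, " +
  "so do NOT summarize the empty output. " +
  pollingNote;

console.log(missingPhrases(sentenceInitial)); // [ 'do NOT' ]
console.log(missingPhrases(midSentence)); // []
```

If the capitalized wording is preferred, the corresponding test assertion would need to change along with it.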

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/chat/tools/createPromptSandboxStreamingTool.ts` around lines 56 - 59,
Update the long instruction string in createPromptSandboxStreamingTool (the
multi-line message built around the `runId` behavior) to remove the inaccurate
"UI automatically shows a live progress view" claim and replace it with the
provided wording: state that when a `runId` is present the sandbox is being set
up, the output will be empty while the background task runs, do NOT summarize
the empty output, and tell the user the request is being processed and results
will appear once the task completes; locate the string construction in
createPromptSandboxStreamingTool.ts (the block concatenating multiple quoted
segments) and substitute the revised text exactly as suggested.

"Do NOT automatically poll or check the task status — instead, let the user know they can ask you to check on it whenever they want.",
inputSchema: promptSandboxSchema,
execute: async function* ({ prompt }, { abortSignal }) {
yield { status: "booting" as const, output: "" };